Web services for predictive text analytics

Language

Language identification detects the language a text is written in. Different languages use different characters. For example, Russian (Кириллица), Chinese (汉字) and Arabic (العربية) are easy to distinguish. Languages that share a script (e.g., the Latin alphabet, abc) often have cues that set them apart (e.g., é ↔ ë).

language code
Afrikaans af
Amharic am
Aragonese an
Arabic ar
Assamese as
Azerbaijani az
Belarusian be
Bulgarian bg
Bengali bn
Breton br
Bosnian bs
Catalan ca
Czech cs
Welsh cy
Danish da
German de
Dzongkha dz
Greek el
English en
Esperanto eo
Spanish es
Estonian et
Basque eu
Persian fa
Finnish fi
Faroese fo
French fr
Irish ga
Galician gl
Gujarati gu
Hebrew he
Hindi hi
Croatian hr
Haitian ht
Hungarian hu
Armenian hy
Indonesian id
Icelandic is
Italian it
Japanese ja
Javanese jv
Georgian ka
Kazakh kk
Central Khmer km
Kannada kn
Korean ko
Kurdish ku
Kirghiz ky
Latin la
Luxembourgish lb
Lao lo
Lithuanian lt
Latvian lv
Malagasy mg
Macedonian mk
Malayalam ml
Mongolian mn
Marathi mr
Malay ms
Maltese mt
Bokmål, Norwegian nb
Nepali ne
Dutch nl
Nynorsk, Norwegian nn
Norwegian no
Occitan oc
Oriya or
Punjabi pa
Polish pl
Pushto ps
Portuguese pt
Quechua qu
Romanian ro
Russian ru
Kinyarwanda rw
Northern Sami se
Sinhalese si
Slovak sk
Slovenian sl
Albanian sq
Serbian sr
Swedish sv
Swahili sw
Tamil ta
Telugu te
Thai th
Tagalog tl
Turkish tr
Uyghur ug
Ukrainian uk
Urdu ur
Vietnamese vi
Volapük vo
Walloon wa
Xhosa xh
Chinese zh
Zulu zu

Request

URL https://api.textgain.com/1/language
Parameter Value
q your text (max. 3,000 characters)
key your personal key
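
Longer texts have to be cut down to the 3,000 character limit before they are sent. The sketch below shows one possible way to do this on the client side. The function name split_text is hypothetical and not part of the API, and breaking on whitespace is an assumption about what is useful, not something the service prescribes.

```python
def split_text(text, limit=3000):
    """Split text into chunks of at most `limit` characters.

    Chunks break on whitespace where possible; runs of whitespace
    are collapsed to a single space in the process.
    """
    chunks = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is cut into fixed-size slices.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


chunks = split_text("word " * 1000)
print(len(chunks), max(len(c) for c in chunks))
```

Each chunk can then be sent as a separate q value.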

Response

The server returns the predicted language code and confidence, as a JSON string.

Q https://api.textgain.com/1/language?q=Loved+this+book!&key=***
A {"language": "en", "confidence": 0.95}
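
A request like the one above can be assembled with the Python standard library. This is a minimal sketch, not an official client: the names BASE_URL, build_url and call are hypothetical, and YOUR_KEY stands in for a personal key. The call function needs network access and a valid key, so it is defined here but not executed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.textgain.com/1/"


def build_url(endpoint, q, key, **params):
    """Build the request URL; urlencode takes care of escaping the text."""
    query = {"q": q, "key": key}
    query.update(params)  # optional parameters such as lang
    return BASE_URL + endpoint + "?" + urlencode(query)


def call(endpoint, q, key, **params):
    """Send the request and decode the JSON reply (needs network and a valid key)."""
    with urlopen(build_url(endpoint, q, key, **params)) as response:
        return json.loads(response.read().decode("utf-8"))


print(build_url("language", "Loved this book!", "YOUR_KEY"))

# Decoding the example answer shown above:
reply = json.loads('{"language": "en", "confidence": 0.95}')
print(reply["language"], reply["confidence"])
```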

Genre

Genre classification predicts the type of text, based on its length, tone of voice and content.

genre    length  tone    content
article  +       formal  names & dates
blog     +       casual  pronouns (me, my)
mail             formal  interjections (hi, thanks)
news             formal  names & dates
review           casual  adjectives (good, bad)
status           casual  smileys :-)

Request

URL https://api.textgain.com/1/genre
Parameter Value
q your text (max. 3,000 characters)
key your personal key

Response

The server returns the predicted genre and confidence, as a JSON string.

Q https://api.textgain.com/1/genre?q=Loved+this+book!&key=***
A {"genre": "review", "confidence": 0.95}
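
Every prediction comes with a confidence score, so a caller may want to ignore uncertain answers. The sketch below shows one way to do that. The helper name accept and the default threshold of 0.8 are assumptions for illustration; the API does not define a cut-off.

```python
import json


def accept(reply, field, threshold=0.8, fallback=None):
    """Return reply[field] if the confidence reaches the threshold, else fallback."""
    if reply.get("confidence", 0.0) >= threshold:
        return reply.get(field, fallback)
    return fallback


reply = json.loads('{"genre": "review", "confidence": 0.95}')
print(accept(reply, "genre"))                  # review
print(accept(reply, "genre", threshold=0.99))  # None
```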

Part-of-speech tags

Part-of-speech tagging identifies sentence breaks and word types. Words have different roles depending on how they are used. For example, the word shop can be a noun (a shop, object) or a verb (to shop, action).

part-of-speech  tag   %   example
noun            NOUN  30  car, Google
verb            VERB  14  be, have, do
punctuation     PUNC  11  . ! ? : ; ,
preposition     PREP  10  of, in, to, with
determiner      DET    9  a, an, the
adjective       ADJ    7  great, new, big
adverb          ADV    4  very, most
number          NUM    4  forty-two
pronoun         PRON   3  I, you, we, her
conjunction     CONJ   2  and, or, but
foreign         X      2  adieu
particle        PRT    2  's, to + VERB
interjection    INTJ   1  yes, oh, wow
* Percentages are indicative for English

Request

URL https://api.textgain.com/1/tag
Parameter Value
q your text (max. 3,000 characters)
key your personal key
lang en, es, cs, da, de, fr, it, nl, pt, ru, sv, sw

Response

The server returns a JSON string with a list of sentences. Each sentence is a list of phrases. Each phrase is a list of {word, tag} values:

Q https://api.textgain.com/1/tag?q=I+didn't+like+the+book.&lang=en&key=***
A
{"text": [
           [
             [ {"word": "I"   , "tag": "PRON"} ], 
             [ {"word": "did" , "tag": "VERB"}, 
               {"word": "n't" , "tag": "ADV" }, 
               {"word": "like", "tag": "VERB"} ], 
             [ {"word": "the" , "tag": "DET" }, 
               {"word": "book", "tag": "NOUN"} ], 
             [ {"word": "."   , "tag": "PUNC"} ]
           ]
         ], "confidence": 0.95}
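
The nested sentence / phrase / word structure is often easier to work with once flattened. The following sketch walks a reply of the shape described above. The function name tagged_words is hypothetical, and the reply is a shortened, hand-written sample in the same format.

```python
def tagged_words(reply):
    """Yield (sentence index, word, tag) for every word in a tag reply."""
    for i, sentence in enumerate(reply["text"]):
        for phrase in sentence:
            for token in phrase:
                yield i, token["word"], token["tag"]


# Hand-written sample: one sentence with two phrases.
reply = {"text": [
    [
        [{"word": "the", "tag": "DET"}, {"word": "book", "tag": "NOUN"}],
        [{"word": ".", "tag": "PUNC"}],
    ]
], "confidence": 0.95}

nouns = [word for _, word, tag in tagged_words(reply) if tag == "NOUN"]
print(nouns)  # ['book']
```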

Hyphenation / Syllables

Readability metrics often rely on syllable counts, and hyphenation and syllabification go hand in hand. This classifier outputs hyphenation patterns and syllable counts. It is fairly robust to noisy language (see the misspelled awsome in the example response below).

word         language  hyphenation     # syllables
fasten       en        fas-ten         2
duidelijk    nl        dui-de-lijk     3
bâtiment     fr        bâ-ti-ment      3
caliente     es        ca-lien-te      3
sprezzatura  it        sprez-za-tu-ra  4

Request

URL https://api.textgain.com/1/syllables
Parameter Value
q your text (max. 3,000 characters)
key your personal key
lang af, as, be, bg, bn, ca, cop, cs, cu, cy, da, de (de-1901, de-1996, de-ch-1901), el-monoton, el-polyton, en (en-gb, en-us), eo, es, et, eu, fi, fr, fur, ga, gl, grc, gu, hi, hr, hsb, hu, hy, ia, id, is, it, ka, kmr, kn, la, la-x-classic, la-x-liturgic, lt, lv, ml, mn-cyrl, mr, mul-ethi, nb, nl, nn, oc, or, pa, pl, pms, pt, rm, ro, ru, sa, sh-cyrl, sh-latn, sk, sl, sr-cyrl, sv, ta, te, th, tk, tr, uk, zh-latn-pinyin

Response

The server returns a JSON string with a list of {word, hyphenation, n_syllables} values:

Q https://api.textgain.com/1/syllables?q=awsome+party&lang=en&key=***
A
{"text": [
           {"hyphenation": ["aw", "some"], "n_syllables": 2, "word": "awsome"},
           {"hyphenation": ["par", "ty"], "n_syllables": 2, "word": "party"}
         ]}
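
As an illustration of how the reply might feed a readability metric, the sketch below joins the hyphenation parts and computes the average number of syllables per word. The function names are hypothetical, and the reply is the example answer shown above.

```python
reply = {"text": [
    {"hyphenation": ["aw", "some"], "n_syllables": 2, "word": "awsome"},
    {"hyphenation": ["par", "ty"], "n_syllables": 2, "word": "party"},
]}


def hyphenate(reply, separator="-"):
    """Join the hyphenation parts of each word with a separator."""
    return [separator.join(item["hyphenation"]) for item in reply["text"]]


def syllables_per_word(reply):
    """Average syllable count over all words (0.0 for an empty reply)."""
    words = reply["text"]
    if not words:
        return 0.0
    return sum(item["n_syllables"] for item in words) / len(words)


print(hyphenate(reply))           # ['aw-some', 'par-ty']
print(syllables_per_word(reply))  # 2.0
```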

Concepts

Concept extraction identifies keywords, key phrases and ‘named entities’ – names of persons, products, organizations, locations, dates, and so on. Keywords are nouns that appear frequently in a text, often near its start. Named entities frequently start with a capital letter (e.g., Barack Obama). Concept extraction can be used, for example, to summarize a text or to check whether two texts discuss similar topics.

concept example
named entity iPhone, Iraq, Obama
noun car, phone

Request

URL https://api.textgain.com/1/concepts
Parameter Value
q your text (max. 3,000 characters)
key your personal key
lang en, es, de, fr, it, nl
top 10 (optional)

Response

The server returns the top concepts, as a JSON string.

Q https://api.textgain.com/1/concepts?q=Loved+this+book!&lang=en&key=***
A {"concepts": ["book"]}
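
One simple way to compare whether two texts discuss similar topics is to measure the overlap between their concept lists. The sketch below uses the Jaccard index (shared concepts divided by all concepts). This is an illustrative choice, not a method the service itself provides, and the function name concept_overlap and the second concept list are hypothetical.

```python
def concept_overlap(concepts_a, concepts_b):
    """Jaccard index of two concept lists, ignoring case (0.0 if both are empty)."""
    a = {c.lower() for c in concepts_a}
    b = {c.lower() for c in concepts_b}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


first = {"concepts": ["book"]}             # example answer shown above
second = {"concepts": ["Book", "author"]}  # hypothetical second reply
print(concept_overlap(first["concepts"], second["concepts"]))  # 0.5
```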

Sentiment

Sentiment analysis predicts whether a text is objective (fact) or subjective (opinion). Subjective text contains adverbs and adjectives with a positive or negative ‘polarity’ that capture the author’s personal opinion (e.g., an excellent opportunity or a bad product).

rating polarity example
★★★★☆ +1.0 awesome, excellent
★★★☆☆ +0.0 neutral, scientific
★★☆☆☆ −1.0 bad, expensive

Request

URL https://api.textgain.com/1/sentiment
Parameter Value
q your text (max. 3,000 characters)
key your personal key
lang en, es, ar, da, de, fr, it, ja, nl, no, pl, ru, sv, sw, zh

Response

The server returns the predicted polarity and confidence, as a JSON string.

Q https://api.textgain.com/1/sentiment?q=Loved+this+book!&lang=en&key=***
A {"polarity": 1.0, "confidence": 0.70}
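
When several texts are scored (for example a batch of reviews), the individual polarities can be combined into one figure. The sketch below weights each polarity by its confidence. This weighting is an assumption for illustration, the function name mean_polarity is hypothetical, and the second reply is made up.

```python
def mean_polarity(replies):
    """Confidence-weighted mean polarity of several sentiment replies."""
    total_weight = sum(r["confidence"] for r in replies)
    if total_weight == 0:
        return 0.0
    weighted = sum(r["polarity"] * r["confidence"] for r in replies)
    return weighted / total_weight


replies = [
    {"polarity": 1.0, "confidence": 0.70},  # example answer shown above
    {"polarity": 0.0, "confidence": 0.70},  # hypothetical neutral text
]
print(mean_polarity(replies))
```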

Age

Age prediction estimates whether a text is written by an adolescent or an adult. Online, adolescents use more informal language, including abbreviated utterances (omg, wow) and mood (awesome, lame). Adolescents tend to talk about school, parents, and partying. Adults tend to talk about work, children and health, and use more complex sentence structures.

age range
adolescent 25−
adult 25+

Request

URL https://api.textgain.com/1/age
Parameter Value
q your text (max. 3,000 characters)
key your personal key
lang en, es, de, fr, nl

Response

The server returns the predicted age range and confidence, as a JSON string.

Q https://api.textgain.com/1/age?q=OMG+coool&lang=en&key=***
A {"age": "25-", "confidence": 0.75}

Gender

Gender prediction estimates whether a text is written by a man or a woman. Statistically, women tend to talk more about people and relationships (family, friends), while men are more interested in objects and things (e.g., cars, games). As a result, women will use more personal pronouns (I, you, we) in a social context and men will use more determiners (a, an, the) and more quantifiers (one, many).

gender code
man m
woman f

Request

URL https://api.textgain.com/1/gender
Parameter Value
q your text (max. 3,000 characters)
key your personal key
lang en, es, da, de, fi, fr, it, nl, no, pl, pt, sv
by who wrote it? (optional)

Response

The server returns the predicted gender and confidence, as a JSON string.

Q https://api.textgain.com/1/gender?q=I+like+it&by=Amy&lang=en&key=***
A {"gender": "f", "confidence": 0.95}

Gender Tagging

Gender tagging assigns a male, female or neutral tag to each word in a text. These tags are estimated from observed language usage by male and female writers. Gender tagging differs from gender prediction in that it attempts to measure what the respective genders have been observed to write about, as opposed to how they write. Note: these values are taken from large corpora and reflect the gender bias patterns present in the data.

gendertag code
male m
female f
neutral n

Request

URL https://api.textgain.com/1/gendertag
Parameter Value
q your text (max. 3,000 characters)
key your personal key
lang en, nl

Response

The server returns a JSON string with a list of words. Each word is a {word, gender} value:

Q https://api.textgain.com/1/gendertag?q=She+destroys+spiders&lang=en&key=***
A {"text": [{"gender": "f", "word": "She"}, {"gender": "m", "word": "destroys"}, {"gender": "f", "word": "spiders"}]}
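
The per-word tags can be summarized as the share of each tag in the text. A minimal sketch, using the example answer above; the function name tag_shares is hypothetical.

```python
from collections import Counter

reply = {"text": [
    {"gender": "f", "word": "She"},
    {"gender": "m", "word": "destroys"},
    {"gender": "f", "word": "spiders"},
]}


def tag_shares(reply):
    """Fraction of words carrying each tag (m, f, n); empty dict for an empty reply."""
    counts = Counter(item["gender"] for item in reply["text"])
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: counts[tag] / total for tag in ("m", "f", "n")}


shares = tag_shares(reply)
print(shares)
```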

Education

Education prediction estimates whether a text displays basic or advanced writing skills. Statistically, people with higher education will use more formal language, with more punctuation marks ( , ; : ), correct spelling and capitalization, longer words and fewer emoticons (cf. idk lol just talkin ☺☺☺).

education level code
high (MBA, PhD, ...) +
low -

Request

URL https://api.textgain.com/1/education
Parameter Value
q your text (max. 3,000 characters)
key your personal key

Response

The server returns the predicted education level and confidence, as a JSON string.

Q https://api.textgain.com/1/education?q=AWSOME+PARTY+!!!&key=***
A {"education": "-", "confidence": 0.80}

Personality

Personality prediction estimates whether a text is written by an extraverted or an introverted person. Extraverts tend to be more sociable, assertive and playful, while introverts are more solitary, reserved and shy. As a result, extraverts use we more often, along with more positive adjectives and less formal language. Introverts use I more often and employ a broader vocabulary.

trait code
extraversion E
introversion I

Request

URL https://api.textgain.com/1/personality
Parameter Value
q your text (max. 3,000 characters)
key your personal key
lang en, nl

Response

The server returns the predicted personality and confidence, as a JSON string.

Q https://api.textgain.com/1/personality?q=I+love+it!&lang=en&key=***
A {"personality": "E", "confidence": 0.60}
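
The short codes returned by the age, gender, education and personality services can be turned back into readable labels and collected into one profile. The sketch below is only an illustration: the names LABELS and profile are hypothetical, and the mapping is copied from the tables in this document. In particular, "25+" as the returned code for adults is assumed from the age table, since no example answer shows it.

```python
# Code tables as listed in this document ("25+" is assumed from the age table).
LABELS = {
    "age": {"25-": "adolescent", "25+": "adult"},
    "gender": {"m": "man", "f": "woman"},
    "education": {"+": "high", "-": "low"},
    "personality": {"E": "extraversion", "I": "introversion"},
}


def profile(replies):
    """Map {service name: decoded reply} to {service name: (label, confidence)}.

    Unknown codes are passed through unchanged.
    """
    result = {}
    for field, reply in replies.items():
        code = reply[field]
        label = LABELS.get(field, {}).get(code, code)
        result[field] = (label, reply["confidence"])
    return result


replies = {
    "age": {"age": "25-", "confidence": 0.75},
    "education": {"education": "-", "confidence": 0.80},
    "personality": {"personality": "E", "confidence": 0.60},
}
print(profile(replies))
```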