Web services for predictive text analytics

Language

Language identification detects the language a text is written in. Languages that use different scripts, such as Russian (Кириллица), Chinese (汉字) and Arabic (العربية), are easy to distinguish. Languages that share a script (e.g., the Latin alphabet, abc) often have cues that set them apart (e.g., é ↔ ë).

language code
Afrikaans af
Amharic am
Aragonese an
Arabic ar
Assamese as
Azerbaijani az
Belarusian be
Bulgarian bg
Bengali bn
Breton br
Bosnian bs
Catalan ca
Czech cs
Welsh cy
Danish da
German de
Dzongkha dz
Greek el
English en
Esperanto eo
Spanish es
Estonian et
Basque eu
Persian fa
Finnish fi
Faroese fo
French fr
Irish ga
Galician gl
Gujarati gu
Hebrew he
Hindi hi
Croatian hr
Haitian ht
Hungarian hu
Armenian hy
Indonesian id
Icelandic is
Italian it
Japanese ja
Javanese jv
Georgian ka
Kazakh kk
Central Khmer km
Kannada kn
Korean ko
Kurdish ku
Kirghiz ky
Latin la
Luxembourgish lb
Lao lo
Lithuanian lt
Latvian lv
Malagasy mg
Macedonian mk
Malayalam ml
Mongolian mn
Marathi mr
Malay ms
Maltese mt
Norwegian (Bokmål) nb
Nepali ne
Dutch nl
Nynorsk, Norwegian nn
Norwegian no
Occitan oc
Oriya or
Punjabi pa
Polish pl
Pushto ps
Portuguese pt
Quechua qu
Romanian ro
Russian ru
Kinyarwanda rw
Northern Sami se
Sinhalese si
Slovak sk
Slovenian sl
Albanian sq
Serbian sr
Swedish sv
Swahili sw
Tamil ta
Telugu te
Thai th
Tagalog tl
Turkish tr
Uyghur ug
Ukrainian uk
Urdu ur
Vietnamese vi
Volapük vo
Walloon wa
Xhosa xh
Chinese zh
Zulu zu

Request

URL https://api.textgain.com/1/language
Parameter Value
q your text (max. 3,000 characters)
key your personal key

Response

The server returns the predicted language code and confidence, as a JSON string.

Q https://api.textgain.com/1/language?q=Loved+this+book!&key=***
A {"language": "en", "confidence": 0.95}
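A request like the one above can be assembled with Python's standard library. This is a minimal sketch, not an official client: the helper names build_url and parse_prediction are hypothetical, YOUR_KEY is a placeholder, and the network call is left out so that the example runs offline.

```python
import json
from urllib.parse import urlencode

# Base URL taken from the request examples in this document.
BASE = "https://api.textgain.com/1/"

def build_url(service, **params):
    """Return the GET URL for a service (hypothetical helper)."""
    return BASE + service + "?" + urlencode(params)

def parse_prediction(body, field):
    """Decode a JSON response and return (value, confidence)."""
    data = json.loads(body)
    return data[field], data["confidence"]

url = build_url("language", q="Loved this book!", key="YOUR_KEY")

# Sample response copied from the example above, not fetched from the server.
sample = '{"language": "en", "confidence": 0.95}'
language, confidence = parse_prediction(sample, "language")

print(url)
print(language, confidence)
```

To send the request for real, the URL could be passed to urllib.request.urlopen and the returned body handed to parse_prediction. The same two helpers should work for the other services that return a single prediction with a confidence.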

Genre

Genre classification predicts the type of text, based on its length, tone of voice and content.

genre    length  tone    content
article  +       formal  names & dates
blog     +       casual  pronouns (me, my)
mail             formal  interjections (hi, thanks)
news             formal  names & dates
review           casual  adjectives (good, bad)
status           casual  smileys :-)

Request

URL https://api.textgain.com/1/genre
Parameter Value
q your text (max. 3,000 characters)
key your personal key

Response

The server returns the predicted genre and confidence, as a JSON string.

Q https://api.textgain.com/1/genre?q=Loved+this+book!&key=***
A {"genre": "review", "confidence": 0.95}

Part-of-speech tags

Part-of-speech tagging identifies sentence breaks and word types. Words have different roles depending on how they are used. For example, the word shop can be a noun (a shop, object) or a verb (to shop, action).

part-of-speech  tag   %*  example
noun            NOUN  30  car, Google
verb            VERB  14  be, have, do
punctuation     PUNC  11  . ! ? : ; ,
preposition     PREP  10  of, in, to, with
determiner      DET    9  a, an, the
adjective       ADJ    7  great, new, big
adverb          ADV    4  very, most
number          NUM    4  forty-two
pronoun         PRON   3  I, you, we, her
conjunction     CONJ   2  and, or, but
foreign         X      2  adieu
particle        PRT    2  's, to + VERB
interjection    INTJ   1  yes, oh, wow
* Percentages are indicative for English

Request

URL https://api.textgain.com/1/tag
Parameter Value
q your text (max. 3,000 characters)
key your personal key
lang en, es, cs, da, de, fr, it, no, nl, pt, ru, sv, sw

Response

The server returns a JSON string with a list of sentences. Each sentence is a list of phrases. Each phrase is a list of {word, tag} values:

Q https://api.textgain.com/1/tag?q=I+didn't+like+the+book.&lang=en&key=***
A
{"text": [
           [
             [ {"word": "I"   , "tag": "PRON"} ], 
             [ {"word": "did" , "tag": "VERB"}, 
               {"word": "n't" , "tag": "ADV" }, 
               {"word": "like", "tag": "VERB"} ], 
             [ {"word": "the" , "tag": "DET" }, 
               {"word": "book", "tag": "NOUN"} ], 
             [ {"word": "."   , "tag": "PUNC"} ]
           ]
         ], "confidence": 0.95}
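The nesting of sentences, phrases and words can be unpacked with a single comprehension. The following is a sketch under the assumption that the response has exactly the shape shown above; the function name tagged_words is hypothetical and the sample data is written out by hand rather than fetched from the server.

```python
# Sample response in the shape shown above (not fetched from the server).
response = {
    "text": [
        [
            [{"word": "I", "tag": "PRON"}],
            [{"word": "did", "tag": "VERB"},
             {"word": "n't", "tag": "ADV"},
             {"word": "like", "tag": "VERB"}],
            [{"word": "the", "tag": "DET"},
             {"word": "book", "tag": "NOUN"}],
            [{"word": ".", "tag": "PUNC"}],
        ]
    ],
    "confidence": 0.95,
}

def tagged_words(response):
    """Flatten sentences -> phrases -> words into (word, tag) pairs."""
    return [
        (token["word"], token["tag"])
        for sentence in response["text"]
        for phrase in sentence
        for token in phrase
    ]

pairs = tagged_words(response)
nouns = [word for word, tag in pairs if tag == "NOUN"]

print(pairs)
print(nouns)
```

Flattening loses the phrase boundaries, so keep the nested lists when phrase structure matters.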

Passive Voice

The use of the passive voice helps you to draw attention away from the agent of the action. Stylistically, however, it is often frowned upon, because it reduces readability. This classifier identifies the verbs involved in the passive voice of a sentence.

Request

URL https://api.textgain.com/1/passive
Parameter Value
q your text (max. 3,000 characters)
key your personal key
lang en, de, nl

Response

The server returns a JSON string with a list of {word, passive} values:

Q https://api.textgain.com/1/passive?lang=en&q=My+beautiful+car+was+completely+totaled.&key=***
A
{"text": [
            {"passive": 0, "word": "My"}, 
            {"passive": 0, "word": "beautiful"},
            {"passive": 0, "word": "car"},
            {"passive": 1, "word": "was"},
            {"passive": 0, "word": "completely"},
            {"passive": 1, "word": "totaled"},
            {"passive": 0, "word": "."}
         ]}
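Collecting the flagged words from such a response takes one filter. This sketch assumes the response shape shown above; the function name passive_words is hypothetical and the sample data is copied from the example rather than fetched.

```python
# Sample response copied from the example above (not fetched from the server).
response = {"text": [
    {"passive": 0, "word": "My"},
    {"passive": 0, "word": "beautiful"},
    {"passive": 0, "word": "car"},
    {"passive": 1, "word": "was"},
    {"passive": 0, "word": "completely"},
    {"passive": 1, "word": "totaled"},
    {"passive": 0, "word": "."},
]}

def passive_words(response):
    """Return the words flagged as part of a passive construction."""
    return [item["word"] for item in response["text"] if item["passive"] == 1]

verbs = passive_words(response)
print(verbs)
```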

Hyphenation / Syllables

Readability metrics often rely on syllable counts, and hyphenation and syllabification go hand in hand. This classifier outputs hyphenation patterns and syllable counts. It is fairly robust to noisy language (see the misspelled *awsome in the example response below).

word         language  hyphenation      # syllables
fasten       en        fas-ten          2
duidelijk    nl        dui-de-lijk      3
bâtiment     fr        bâ-ti-ment       3
caliente     es        ca-lien-te       3
sprezzatura  it        sprez-za-tu-ra   4

Request

URL https://api.textgain.com/1/syllables
Parameter Value
q your text (max. 3,000 characters)
key your personal key
lang af, as, be, bg, bn, ca, cop, cs, cu, cy, da, de (de-1901, de-1996, de-ch-1901), el-monoton, el-polyton, en (en-gb, en-us), eo, es, et, eu, fi, fr, fur, ga, gl, grc, gu, hi, hr, hsb, hu, hy, ia, id, is, it, ka, kmr, kn, la, la-x-classic, la-x-liturgic, lt, lv, ml, mn-cyrl, mr, mul-ethi, no, nl, nn, oc, or, pa, pl, pms, pt, rm, ro, ru, sa, sh-cyrl, sh-latn, sk, sl, sr-cyrl, sv, ta, te, th, tk, tr, uk, zh-latn-pinyin

Response

The server returns a JSON string with a list of {word, hyphenation, n_syllables} values:

Q https://api.textgain.com/1/syllables?q=awsome+party&lang=en&key=***
A
{"text": [
           {"hyphenation": ["aw", "some"], "n_syllables": 2, "word": "awsome"},
           {"hyphenation": ["par", "ty"], "n_syllables": 2, "word": "party"}
         ]}
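As an illustration of how syllable counts feed a readability metric, the sketch below computes the Flesch Reading Ease score from a response of this shape. The choice of metric is ours, not something the service prescribes. The sentence count is passed in by the caller because the sample response does not contain one, and the function name flesch_reading_ease is hypothetical.

```python
# Sample response copied from the example above (not fetched from the server).
response = {"text": [
    {"hyphenation": ["aw", "some"], "n_syllables": 2, "word": "awsome"},
    {"hyphenation": ["par", "ty"], "n_syllables": 2, "word": "party"},
]}

def flesch_reading_ease(response, n_sentences=1):
    """Flesch Reading Ease from per-word syllable counts.

    n_sentences must be supplied by the caller (an assumption of this sketch).
    """
    words = response["text"]
    n_words = len(words)
    n_syllables = sum(word["n_syllables"] for word in words)
    return (206.835
            - 1.015 * (n_words / n_sentences)
            - 84.6 * (n_syllables / n_words))

# Rebuild the hyphenated form of each word from its syllable list.
hyphenated = ["-".join(word["hyphenation"]) for word in response["text"]]
score = flesch_reading_ease(response)

print(hyphenated)
print(round(score, 3))
```

Two words are far too few for a meaningful score; the sample only shows the arithmetic.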

Concepts

Concept extraction identifies keywords, key phrases and ‘named entities’ – names of persons, products, organizations, locations, dates, and so on. Keywords are nouns that appear frequently in a text, often near its start. Named entities frequently start with a capital letter (e.g., Barack Obama). Concept extraction can be used to summarize a text, or to check whether two texts discuss similar topics.

concept       example
named entity  iPhone, Iraq, Obama
noun          car, phone

Request

URL https://api.textgain.com/1/concepts
Parameter Value
q your text (max. 3,000 characters)
key your personal key
lang ar, en, es, de, fr, it, nl
top 10 (optional)

Response

The server returns the top concepts, as a JSON string.

Q https://api.textgain.com/1/concepts?q=Loved+this+book!&lang=en&key=***
A {"concepts": ["book"]}
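One simple way to compare the topics of two texts is to measure how much their concept lists overlap. The sketch below uses Jaccard similarity, which is our choice of measure rather than anything the service provides. The function name concept_overlap and both concept lists are hypothetical.

```python
def concept_overlap(concepts_a, concepts_b):
    """Jaccard similarity of two concept lists, between 0.0 and 1.0."""
    a, b = set(concepts_a), set(concepts_b)
    if not a and not b:
        return 0.0  # nothing to compare
    return len(a & b) / len(a | b)

# Hypothetical concept lists, as if returned for two different texts.
review = ["book", "author", "plot"]
blog = ["book", "movie", "plot"]

overlap = concept_overlap(review, blog)
print(overlap)  # 2 shared concepts out of 4 distinct ones
```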

Geocoding

Geocoding looks for place names in a text (in any language) and returns a list of possible locations, along with their longitude, latitude and country. Note that the results are exhaustive: for example, both Berlin in Germany and Berlín in Colombia will be returned. The results are sorted according to population size (if known).
If you want language-specific filtering (for instance, if you don't want to consider From, the town in Norway), you can combine this web service with the POS tagger and retain only the NOUNs.

field example
{'country': 'Afghanistan'} ولنزرده
{'type': 'country'} তিউনিসিয়া (Tunisia)
{'longitude': 4.43539} Borgerhout
{'latitude': -34.92866} Adelaide
{'population': 12294193} Mexico City

Target countries

country code
Afghanistan AF
Aland Islands AX
Albania AL
Algeria DZ
American Samoa AS
Andorra AD
Angola AO
Anguilla AI
Antarctica AQ
Antigua and Barbuda AG
Argentina AR
Armenia AM
Aruba AW
Australia AU
Austria AT
Azerbaijan AZ
Bahamas BS
Bahrain BH
Bangladesh BD
Barbados BB
Belarus BY
Belgium BE
Belize BZ
Benin BJ
Bermuda BM
Bhutan BT
Bolivia BO
Bonaire, Saint Eustatius and Saba BQ
Bosnia and Herzegovina BA
Botswana BW
Bouvet Island BV
Brazil BR
British Indian Ocean Territory IO
British Virgin Islands VG
Brunei BN
Bulgaria BG
Burkina Faso BF
Burundi BI
Cambodia KH
Cameroon CM
Canada CA
Cape Verde CV
Cayman Islands KY
Central African Republic CF
Chad TD
Chile CL
China CN
Christmas Island CX
Cocos Islands CC
Colombia CO
Comoros KM
Cook Islands CK
Costa Rica CR
Croatia HR
Cuba CU
Curacao CW
Cyprus CY
Czechia CZ
Democratic Republic of the Congo CD
Denmark DK
Djibouti DJ
Dominica DM
Dominican Republic DO
Ecuador EC
Egypt EG
El Salvador SV
Equatorial Guinea GQ
Eritrea ER
Estonia EE
Ethiopia ET
Falkland Islands FK
Faroe Islands FO
Fiji FJ
Finland FI
France FR
French Guiana GF
French Polynesia PF
French Southern Territories TF
Gabon GA
Gambia GM
Georgia GE
Germany DE
Ghana GH
Gibraltar GI
Greece GR
Greenland GL
Grenada GD
Guadeloupe GP
Guam GU
Guatemala GT
Guernsey GG
Guinea GN
Guinea-Bissau GW
Guyana GY
Haiti HT
Heard Island and McDonald Islands HM
Honduras HN
Hong Kong HK
Hungary HU
Iceland IS
India IN
Indonesia ID
Iran IR
Iraq IQ
Ireland IE
Isle of Man IM
Israel IL
Italy IT
Ivory Coast CI
Jamaica JM
Japan JP
Jersey JE
Jordan JO
Kazakhstan KZ
Kenya KE
Kiribati KI
Kosovo XK
Kuwait KW
Kyrgyzstan KG
Laos LA
Latvia LV
Lebanon LB
Lesotho LS
Liberia LR
Libya LY
Liechtenstein LI
Lithuania LT
Luxembourg LU
Macao MO
Macedonia MK
Madagascar MG
Malawi MW
Malaysia MY
Maldives MV
Mali ML
Malta MT
Marshall Islands MH
Martinique MQ
Mauritania MR
Mauritius MU
Mayotte YT
Mexico MX
Micronesia FM
Moldova MD
Monaco MC
Mongolia MN
Montenegro ME
Montserrat MS
Morocco MA
Mozambique MZ
Myanmar MM
Namibia NA
Nauru NR
Nepal NP
Netherlands NL
Netherlands Antilles AN
New Caledonia NC
New Zealand NZ
Nicaragua NI
Niger NE
Nigeria NG
Niue NU
Norfolk Island NF
North Korea KP
Northern Mariana Islands MP
Norway NO
Oman OM
Pakistan PK
Palau PW
Palestinian Territory PS
Panama PA
Papua New Guinea PG
Paraguay PY
Peru PE
Philippines PH
Pitcairn PN
Poland PL
Portugal PT
Puerto Rico PR
Qatar QA
Republic of the Congo CG
Reunion RE
Romania RO
Russia RU
Rwanda RW
Saint Barthelemy BL
Saint Helena SH
Saint Kitts and Nevis KN
Saint Lucia LC
Saint Martin MF
Saint Pierre and Miquelon PM
Saint Vincent and the Grenadines VC
Samoa WS
San Marino SM
Sao Tome and Principe ST
Saudi Arabia SA
Senegal SN
Serbia RS
Serbia and Montenegro CS
Seychelles SC
Sierra Leone SL
Singapore SG
Sint Maarten SX
Slovakia SK
Slovenia SI
Solomon Islands SB
Somalia SO
South Africa ZA
South Georgia and the South Sandwich Islands GS
South Korea KR
South Sudan SS
Spain ES
Sri Lanka LK
Sudan SD
Suriname SR
Svalbard and Jan Mayen SJ
Swaziland SZ
Sweden SE
Switzerland CH
Syria SY
Taiwan TW
Tajikistan TJ
Tanzania TZ
Thailand TH
Timor Leste TL
Togo TG
Tokelau TK
Tonga TO
Trinidad and Tobago TT
Tunisia TN
Turkey TR
Turkmenistan TM
Turks and Caicos Islands TC
Tuvalu TV
U.S. Virgin Islands VI
Uganda UG
Ukraine UA
United Arab Emirates AE
United Kingdom GB
United States US
United States Minor Outlying Islands UM
Uruguay UY
Uzbekistan UZ
Vanuatu VU
Vatican VA
Venezuela VE
Vietnam VN
Wallis and Futuna WF
Western Sahara EH
Yemen YE
Zambia ZM
Zimbabwe ZW

Request

URL https://api.textgain.com/1/geocode
Parameter Value
q your text (max. 3,000 characters)
key your personal key

Response

The server returns the candidate locations as a JSON string: an array with one entry for each word that may indicate a location, sorted by population size.

Q https://api.textgain.com/1/geocode?q=Eindhoven+is+pretty+far+from+Россия.&key=***
A [{"population": 140702000, "country": "Russia", "longitude": 105.318756, "latitude": 61.52401, "type": "country", "place_name": "Россия"}, {"population": 209620, "country": "Netherlands", "longitude": 5.47778, "latitude": 51.44083, "type": "town", "place_name": "Eindhoven"}, {"population": 76538, "country": "United States", "longitude": -98.18362, "latitude": 26.1948, "type": "town", "place_name": "far"}, {"population": 0, "country": "Russia", "longitude": 54.98333, "latitude": 52.4, "type": "town", "place_name": "Россия"}, {"population": 0, "country": "South Africa", "longitude": 18.64289, "latitude": -33.98233, "type": "town", "place_name": "Eindhoven"}]
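Because the results are exhaustive, a caller will usually want to narrow them down. The sketch below keeps only the most populous candidate for each place name and optionally restricts the names to an allowed set, for example the words the POS tagger marked as NOUN. The function name most_populous and its names parameter are hypothetical; the sample data is copied from the response above.

```python
# Sample response copied from the example above (not fetched from the server).
candidates = [
    {"population": 140702000, "country": "Russia", "longitude": 105.318756,
     "latitude": 61.52401, "type": "country", "place_name": "Россия"},
    {"population": 209620, "country": "Netherlands", "longitude": 5.47778,
     "latitude": 51.44083, "type": "town", "place_name": "Eindhoven"},
    {"population": 76538, "country": "United States", "longitude": -98.18362,
     "latitude": 26.1948, "type": "town", "place_name": "far"},
    {"population": 0, "country": "Russia", "longitude": 54.98333,
     "latitude": 52.4, "type": "town", "place_name": "Россия"},
    {"population": 0, "country": "South Africa", "longitude": 18.64289,
     "latitude": -33.98233, "type": "town", "place_name": "Eindhoven"},
]

def most_populous(candidates, names=None):
    """Map each place name to its most populous candidate.

    names: optional set of words to keep (e.g. those tagged NOUN).
    """
    best = {}
    for place in candidates:
        name = place["place_name"]
        if names is not None and name not in names:
            continue
        if name not in best or place["population"] > best[name]["population"]:
            best[name] = place
    return best

best = most_populous(candidates)
filtered = most_populous(candidates, names={"Eindhoven", "Россия"})

print(sorted(best))
print(sorted(filtered))
```

Preferring the largest population is only a heuristic, and the surrounding context of a text may point to a smaller place with the same name.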

Translation

A simple translation engine that finds English translations for words in a text.

Request

URL https://api.textgain.com/1/translate
Parameter Value
q your text (max. 3,000 characters)
key your personal key
lang ar

Response

The server returns a JSON string with a list of {word, translation} dictionaries, one for each word in the input that could be translated.

Q https://api.textgain.com/1/translate?q=ويكيبيديا&lang=ar&key=***
A [{"translation": ["Wikipedia"], "word": "ويكيبيديا"}]

Sentiment

Sentiment analysis predicts whether a text is objective (fact) or subjective (opinion). Subjective text contains adverbs and adjectives with a positive or negative ‘polarity’ that capture the author’s personal opinion (e.g., an excellent opportunity or a bad product).

rating polarity example
★★★★☆ +1.0 awesome, excellent
★★★☆☆ +0.0 neutral, scientific
★★☆☆☆ −1.0 bad, expensive

Request

URL https://api.textgain.com/1/sentiment
Parameter Value
q your text (max. 3,000 characters)
key your personal key
lang en, es, ar, da, de, fr, it, ja, nl, no, pl, pt, ru, sv, sw, zh

Response

The server returns the predicted polarity and confidence, as a JSON string.

Q https://api.textgain.com/1/sentiment?q=Loved+this+book!&lang=en&key=***
A {"polarity": 0.88, "confidence": 0.70}
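Applications often need a discrete label rather than a raw polarity score. The sketch below maps polarity and confidence to a label; the thresholds (0.1 for polarity, 0.5 for confidence), the label names and the function name sentiment_label are all assumptions made for illustration, not values defined by the service.

```python
def sentiment_label(polarity, confidence, threshold=0.1, min_confidence=0.5):
    """Map a polarity in [-1.0, +1.0] to a coarse label.

    Both thresholds are illustrative assumptions and should be tuned.
    """
    if confidence < min_confidence:
        return "uncertain"
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

# Sample response copied from the example above (not fetched from the server).
response = {"polarity": 0.88, "confidence": 0.70}
label = sentiment_label(response["polarity"], response["confidence"])
print(label)
```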

Age

Age prediction estimates whether a text is written by an adolescent or an adult. Online, adolescents use more informal language, including abbreviated utterances (omg, wow) and mood words (awesome, lame). Adolescents tend to talk about school, parents, and partying. Adults tend to talk about work, children, and health, and use more complex sentence structures.

age        range
adolescent 25−
adult      25+

Request

URL https://api.textgain.com/1/age
Parameter Value
q your text (max. 3,000 characters)
key your personal key
lang en, es, de, fr, nl

Response

The server returns the predicted age range and confidence, as a JSON string.

Q https://api.textgain.com/1/age?q=OMG+coool&lang=en&key=***
A {"age": "25-", "confidence": 0.75}

Gender

Gender prediction estimates whether a text is written by a man or a woman. Statistically, women tend to talk more about people and relationships (family, friends), while men are more interested in objects and things (e.g., cars, games). As a result, women will use more personal pronouns (I, you, we) in a social context and men will use more determiners (a, an, the) and more quantifiers (one, many).

gender code
man m
woman f

Request

URL https://api.textgain.com/1/gender
Parameter Value
q your text (max. 3,000 characters)
key your personal key
lang en, es, da, de, fi, fr, it, nl, no, pl, pt, sv
by name of the author, e.g., Amy (optional)

Response

The server returns the predicted gender and confidence, as a JSON string.

Q https://api.textgain.com/1/gender?q=I+like+it&by=Amy&lang=en&key=***
A {"gender": "f", "confidence": 0.95}

Gender Tagging

Gender tagging assigns a male, female or neutral tag to each word in a text. The tags are estimated from observed language use by male and female writers. Gender tagging differs from gender prediction in that it indicates which words each gender has been observed to use more often in writing, rather than measuring male versus female writing style.
Disclaimer: these values are taken from large corpora and reflect the gender bias patterns present in the data.

gendertag code
male m
female f
neutral n

Request

URL https://api.textgain.com/1/gendertag
Parameter Value
q your text (max. 3,000 characters)
key your personal key
lang de, en, es, fr, it, nl, pl, pt, sv

Response

The server returns a JSON string with a list of {word, gender} values, one for each word:

Q https://api.textgain.com/1/gendertag?q=She+destroys+spiders&lang=en&key=***
A {"text": [{"gender": "f", "word": "She"}, {"gender": "m", "word": "destroys"}, {"gender": "f", "word": "spiders"}]}
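The per-word tags can be summarized by counting them. This sketch assumes the response shape shown above and uses sample data copied from the example rather than fetched from the server.

```python
from collections import Counter

# Sample response copied from the example above (not fetched from the server).
response = {"text": [
    {"gender": "f", "word": "She"},
    {"gender": "m", "word": "destroys"},
    {"gender": "f", "word": "spiders"},
]}

# Count how many words carry each tag (m, f or n).
counts = Counter(item["gender"] for item in response["text"])

# Share of each tag over all words in the text.
total = sum(counts.values())
shares = {tag: n / total for tag, n in counts.items()}

print(counts)
print(shares)
```

As the disclaimer above notes, these counts reflect bias patterns in the underlying corpora and say nothing certain about the author of a particular text.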

Education

Education prediction estimates whether a text displays basic or advanced writing skills. Statistically, people with higher education will use more formal language, with more punctuation marks ( , ; : ), correct spelling and capitalization, longer words and fewer emoticons (cf. idk lol just talkin ☺☺☺).

education level code
high (MBA, PhD, ...) +
low -

Request

URL https://api.textgain.com/1/education
Parameter Value
q your text (max. 3,000 characters)
key your personal key

Response

The server returns the predicted education level and confidence, as a JSON string.

Q https://api.textgain.com/1/education?q=AWSOME+PARTY+!!!&key=***
A {"education": "-", "confidence": 0.80}

Personality

Personality prediction estimates whether a text is written by an extraverted or an introverted person. Extraverts tend to be more sociable, assertive and playful, while introverts are more solitary, reserved and shy. As a result, extraverts will use we more often, along with more positive adjectives and less formal language. Introverts will use I more often and employ a broader vocabulary.

trait code
extraversion E
introversion I

Request

URL https://api.textgain.com/1/personality
Parameter Value
q your text (max. 3,000 characters)
key your personal key
lang en, nl

Response

The server returns the predicted personality and confidence, as a JSON string.

Q https://api.textgain.com/1/personality?q=I+love+it!&lang=en&key=***
A {"personality": "E", "confidence": 0.60}