Text analytics web services

We have a wide range of web services available. On this page, we describe what they do. For the technical details, check out our DevOps Center.

How does it work?

Textgain provides a set of URLs to which you can send requests, i.e. text to analyze. Note the ?q in the example below. You need to send your personal key with each request (replace ***). The server responds with a JSON string, a standardized, compact data format.

$r = 'https://api.textgain.com/1/age?q=lets+roll&key=***';
$r = file_get_contents($r);  // send the request, read the JSON string
$r = json_decode($r, true);  // true: decode into an associative array
echo $r['age'];

Requests

  • Example: https://api.textgain.com/1/age?q=lets+roll&key=***
  • Requests are sent by secure HTTPS (verify).
  • Up to 100 free requests per day, no key needed.
  • Up to 3,000 characters per request (1 page).
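
The request rules above can be sketched as a small helper. This is only an illustration: the function name textgain_url() and the YOUR_KEY placeholder are hypothetical, and only the /1/age endpoint and the q and key parameters are taken from this page.

```php
<?php
// Hypothetical helper (not part of any official client): builds a request URL.
function textgain_url(string $text, string $key, string $endpoint = 'age'): string
{
    // Count characters (not bytes); the page states 3,000 characters per request.
    if (preg_match_all('/./us', $text) > 3000) {
        throw new InvalidArgumentException('Text exceeds 3,000 characters.');
    }
    // http_build_query() URL-encodes the values (a space becomes +).
    $query = http_build_query(['q' => $text, 'key' => $key]);
    return 'https://api.textgain.com/1/' . $endpoint . '?' . $query;
}

echo textgain_url('lets roll', 'YOUR_KEY'), "\n";
// prints https://api.textgain.com/1/age?q=lets+roll&key=YOUR_KEY
```

Building the query string this way avoids hand-encoding spaces and punctuation in the text you send.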

Response

The server returns:
  • a JSON string. For example: {"age": "25-", "confidence": 0.75}
  • an HTTP 429 “Too Many Requests” status code if the daily limit is exceeded.
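
A minimal sketch of handling both outcomes, assuming you already have the HTTP status code and the response body as separate values. The function name textgain_parse() and the choice of exceptions are hypothetical.

```php
<?php
// Hypothetical helper: turns a status code and a response body into an array.
function textgain_parse(int $status, string $body): array
{
    if ($status === 429) {
        // The daily request limit was exceeded.
        throw new RuntimeException('Daily request limit exceeded (HTTP 429).');
    }
    // true: decode JSON objects into associative arrays.
    $data = json_decode($body, true);
    if (!is_array($data)) {
        throw new UnexpectedValueException('Response is not a JSON object.');
    }
    return $data;
}

$result = textgain_parse(200, '{"age": "25-", "confidence": 0.75}');
echo $result['age'], ' (', $result['confidence'], ")\n"; // prints 25- (0.75)
```

With file_get_contents(), the status line is available in $http_response_header after the call, and the ignore_errors HTTP context option makes PHP return the body for error responses as well.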

    Profiling

    Discover the author behind a text through writing style analysis.

    Age Prediction

    Age prediction estimates whether a text is written by an adolescent or an adult. Online, adolescents use more informal language, including abbreviated utterances (omg, wow) and mood words (awesome, lame). Adolescents tend to talk about school, parents, and partying. Adults tend to talk about work, children, and health, and use more complex sentence structures.


    REST-API >

    Gender Prediction

    Gender prediction estimates whether a text is written by a man or a woman. Statistically, women tend to talk more about people and relationships (family, friends), while men are more interested in objects and things (e.g., cars, games). As a result, women will use more personal pronouns (I, you, we) in a social context and men will use more determiners (a, an, the) and more quantifiers (one, many).

    REST-API >

    Gender Tagging

    Gender tagging assigns each word in a text a male, female or neutral tag. These tags are estimated from the observed language use of male and female writers. Gender tagging differs from gender prediction in that it indicates which words each gender has been observed to use more in writing, rather than measuring typical male vs. female writing style.

    REST-API >

    Education Prediction

    Education prediction estimates whether a text displays basic or advanced writing skills. Statistically, people with higher education use more formal language, more punctuation marks (, ; :), correct spelling and capitalization, longer words and sentences, and fewer emoji (cf. idk lol just talkin ☺☺☺).


    REST-API >

    Personality Prediction

    Personality prediction estimates whether a text is written by an extraverted or an introverted person. Extraverts tend to be more sociable, assertive and playful, while introverts are more solitary, reserved and shy. As a result, extraverts use "we" more often, along with more positive adjectives and less formal language. Introverts use "I" more often and employ a broader vocabulary.

    REST-API >

    Sentiment Analysis

    Measure whether people are communicating in a positive, neutral or negative way.

    Sentiment analysis

    Sentiment analysis predicts whether a text is objective (fact) or subjective (opinion). Subjective text contains adverbs and adjectives with a positive or negative ‘polarity’ that capture the author’s personal opinion (e.g., an excellent opportunity or a bad product).


    REST-API >

    Sentiment tagging

    Sentiment tagging assigns each token in a text a sentiment score that expresses its polarity. Unlike sentiment analysis, it is not sensitive to more complex linguistic patterns, such as negation and modality. You can use it as an alternative way of calculating the overall sentiment, or to extract the subjective terms in your document.
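
To illustrate the idea of per-token polarity scores, here is a toy sketch. The lexicon, the scores and the function names are invented for this example and say nothing about Textgain's actual model or output format.

```php
<?php
// Toy polarity lexicon, invented for illustration only.
$lexicon = ['excellent' => 1.0, 'good' => 0.5, 'bad' => -0.5, 'awful' => -1.0];

// Tag each token with a polarity score; unknown words score 0.0.
function tag_sentiment(string $text, array $lexicon): array
{
    $tokens = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $tags = [];
    foreach ($tokens as $token) {
        $tags[] = [$token, $lexicon[$token] ?? 0.0];
    }
    return $tags;
}

// Overall sentiment: the mean score of the subjective (non-zero) tokens.
function mean_polarity(array $tags): float
{
    $scores = array_filter(array_column($tags, 1), fn($s) => $s != 0.0);
    return $scores ? array_sum($scores) / count($scores) : 0.0;
}

$tags = tag_sentiment('Not a bad product, an excellent opportunity', $lexicon);
echo mean_polarity($tags), "\n"; // prints 0.25
```

Note that "not a bad product" still contributes a negative score here: token-level tagging ignores the negation, which is the limitation described above.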

    REST-API >

    Concept Extraction & Conversion

    Extract concepts from text and apply conversions, such as geocoding, anonymization or even simple word-by-word translation.

    Concept extraction

    Concept extraction identifies keywords, key phrases and ‘named entities’ – names of persons, products, organizations, locations, dates, and so on. Keywords are nouns that appear frequently in a text, often near the start. Named entities frequently start with a capital letter (e.g., Barack Obama). Concept extraction can be used to summarize a text or, for example, to check whether two texts discuss similar topics.
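
As a rough illustration of frequency-based keywords and topic comparison, the sketch below picks the most frequent non-stop-words of a text and measures the overlap between two keyword lists. It is a simplification under stated assumptions: it does not detect nouns or named entities, and the stop word list and function names are invented for the example.

```php
<?php
// Tiny stop word list, invented for illustration.
const STOP_WORDS = ['the', 'a', 'an', 'and', 'of', 'to', 'in', 'is', 'on', 'are'];

// Return the $k most frequent words of a text, ignoring stop words.
function top_keywords(string $text, int $k = 3): array
{
    $words = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $counts = array_count_values(array_diff($words, STOP_WORDS));
    arsort($counts); // most frequent first
    return array_slice(array_keys($counts), 0, $k);
}

// Jaccard overlap between two keyword lists: shared / total distinct keywords.
function keyword_overlap(array $a, array $b): float
{
    $union = array_unique(array_merge($a, $b));
    return $union ? count(array_intersect($a, $b)) / count($union) : 0.0;
}

$text = 'The cat sat. The cat slept. A dog barked at the cat and the dog.';
print_r(top_keywords($text, 2)); // cat (3 occurrences), then dog (2)
```

An overlap close to 1.0 suggests that two texts share most of their keywords; a value of 0.0 means they share none.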

    REST-API >

    Geocoding

    Geocoding looks for place names in a text (in any language) and returns a list of possible locations, along with their longitude and latitude and country of origin. Note that the results are exhaustive! For example, Berlin, Germany as well as Berlin in Colombia (Berlín) will be returned. The results are sorted according to population size (if known).



    REST-API >

    Concept translation

    A simple translation engine that finds English translations for words in a text. This is a word-based translation model and should not be considered a machine translation solution.

    REST-API >

    Lexicon & Readability

    These services are designed to help you grammatically analyze your documents and measure their readability.

    Lemmatization

    Lemmatization involves the morphological analysis of words to reduce them to their dictionary form (lemma). It is more powerful than stemming, which simply strips affixes without taking into account a word's part-of-speech and allomorphic transformations. For example, "bathing" would be stemmed to "bath", but would be lemmatized as "bathe".
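
The contrast between the two can be sketched with toy functions. Both are deliberately naive and hypothetical: a real stemmer applies many more rules, and a real lemmatizer uses a full dictionary and morphological analysis rather than the three-entry table assumed here.

```php
<?php
// Naive stemmer: strips a few common English endings, nothing more.
function toy_stem(string $word): string
{
    return preg_replace('/(ing|ed|s)$/', '', $word);
}

// Toy lemmatizer: looks up word + part-of-speech in a tiny hand-made table.
function toy_lemma(string $word, string $pos): string
{
    $lemmas = [
        'bathing/VERB' => 'bathe',
        'went/VERB'    => 'go',
        'mice/NOUN'    => 'mouse',
    ];
    // Fall back to the word itself when it is not in the table.
    return $lemmas["$word/$pos"] ?? $word;
}

echo toy_stem('bathing'), ' vs ', toy_lemma('bathing', 'VERB'), "\n";
// prints bath vs bathe
```

The stemmer has no way to turn "went" into "go" or "mice" into "mouse"; that requires knowledge about the word, which is what the lookup table stands in for.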

    REST-API >

    Part-of-Speech Tagging

    Part-of-speech tagging identifies sentence breaks and word types. Words have different roles depending on how they are used. For example, the word shop can be a noun (a shop, object) or a verb (to shop, action).




    REST-API >

    Passive Voice

    The passive voice helps you to draw attention away from the agent of the action. Stylistically, however, it is often frowned upon, because it reduces readability. This classifier identifies the verbs that form the passive voice in a sentence.
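
For intuition only, a passive construction in English can be roughly approximated as a form of "to be" followed by a past participle. The sketch below uses a regular expression for this; it is a crude, hypothetical heuristic and not the classifier described here. It misses irregular participles (e.g., "was sold") and intervening adverbs, and it can produce false positives on words that merely end in -ed or -en.

```php
<?php
// Rough heuristic: a form of "to be" followed by a word ending in -ed or -en.
function find_passive(string $sentence): array
{
    $pattern = '/\b(am|is|are|was|were|be|been|being)\s+(\w+(?:ed|en))\b/i';
    preg_match_all($pattern, $sentence, $matches, PREG_SET_ORDER);
    // Keep only the auxiliary and the participle of each match.
    return array_map(fn($m) => [$m[1], $m[2]], $matches);
}

print_r(find_passive('The window was broken by the ball.'));
// finds one pair: "was" + "broken"
```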

    REST-API >

    Syllable Counts / Hyphenation

    Readability metrics often rely on syllable counts. Hyphenation and syllabification go hand in hand. This classifier outputs hyphenation patterns and syllable counts. It is fairly robust to noisy language (e.g., the misspelling *awsome).
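
A common rule of thumb for English is to count groups of consecutive vowels as syllables. The sketch below implements that heuristic, assuming plain lowercase-able ASCII words; it is not the classifier described here and will miscount many words.

```php
<?php
// Heuristic syllable count for an English word: one syllable per vowel group.
function count_syllables(string $word): int
{
    $word = strtolower($word);
    // Drop a final "e", which is usually silent, but keep "le" as in "table".
    $word = preg_replace('/(?<!l)e$/', '', $word);
    // Count runs of consecutive vowels (y is treated as a vowel).
    $groups = preg_match_all('/[aeiouy]+/', $word);
    // Every word has at least one syllable.
    return max(1, $groups);
}

foreach (['readability', 'hyphenation', 'table', 'awsome'] as $word) {
    echo $word, ': ', count_syllables($word), "\n";
}
// prints readability: 5, hyphenation: 4, table: 2, awsome: 2 (one per line)
```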


    REST-API >

    Identification

    Determine what language or genre your documents are in.

    Language identification

    Language identification detects the language a text is written in. Different languages use different characters. For example, Russian (Кириллица), Chinese (汉字) and Arabic (العربية) are easy to distinguish. Languages that use the same characters (e.g., Latin alphabet, abc) often have cues that set them apart (e.g., é ↔ ë).
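
The first step, telling writing systems apart, can be sketched with Unicode script properties in a regular expression. This is an illustrative simplification: the function name is hypothetical, and a script only narrows down the candidate languages (Cyrillic is also used for Ukrainian and Bulgarian, Latin for hundreds of languages).

```php
<?php
// Guess the dominant writing system of a text from Unicode script properties.
function detect_script(string $text): string
{
    $best = 'Unknown';
    $max = 0;
    foreach (['Cyrillic', 'Han', 'Arabic', 'Latin'] as $script) {
        // Count the characters that belong to this script.
        $count = preg_match_all('/\p{' . $script . '}/u', $text);
        if ($count > $max) {
            $max = $count;
            $best = $script;
        }
    }
    return $best;
}

echo detect_script('汉字'), "\n"; // prints Han
```

Distinguishing languages within one script needs further cues, such as characteristic letters or frequent words.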

    REST-API >

    Genre classification

    Genre classification predicts the type of text, based on its length, tone of voice and content.





    REST-API >
