Publications

An overview of our publications on natural language processing and analytics.



Discover our scientific publications

Firearms violence incident monitor – Methodological report

This methodological report describes the development of the online Gun Violence Incidents Monitor (gunviolence.eu). The monitor aims to identify firearms incidents in all EU Member States automatically and in near real time, and to make the results easily accessible to all stakeholders. Publicly available media articles are used to identify these incidents. The monitor deploys different AI methods, such as machine learning (ML) and large language models (LLMs), to automate the process of identifying, assessing, clustering and analysing media articles on firearms violence and firearms seizures in the EU.

2023

Automatic detection of cyberbullying in social media text

While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages, and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying.

2018

Tracing Online Misogyny: An analysis of misogynist ideologies and practices from a German-international perspective

Digital hate directed against women is a pressing issue that has been a public concern for years. Women are systematically attacked and denigrated on the internet. According to a recent representative study by the competence network against online hate, women are one of the groups that are most frequently affected. At the same time, few mechanisms have been developed to protect them.

2023

The Digital Misogynoir Report: Ending the dehumanising of Black women on social media

The Digital Misogynoir Report presents an analysis from Glitch & Textgain, revealing the alarming prevalence of digital misogynoir on social media platforms. This report discusses current efforts by tech companies and the UK government to create safer online spaces, as it exposes the distressing reality of dehumanising abuse targeted at Black women. Through rigorous statistical analysis of social media posts, we urgently call for action to combat the violent dehumanisation of Black women online.

2023

Automatic Detection of Online Jihadist Hate Speech

We developed a system that automatically detects online jihadist propaganda by using techniques from Natural Language Processing and Machine Learning. The system is trained on a corpus of 45,000 subversive Twitter messages indexed from October 2014 to December 2016. We present a qualitative and quantitative analysis of the jihadist rhetoric in the corpus, examine the network of users, outline the technical procedure used to train the system, and discuss examples of use.

2018

QAnon: Spreading Conspiracy Theories on Twitter

Between 1 October and 5 November 2020, we analysed social media messages related to QAnon conspiracy theories, using Natural Language Processing (NLP) technology. This report outlines the findings of our quantitative analysis, as well as the qualitative analysis by the partners of the Get The Trolls Out! project.

2020

Multilingual Cross-domain Perspectives on Online Hate Speech

In this report, we present a study of eight corpora and demonstrate the NLP techniques that we used to collect and analyze jihadist, extremist, racist, and sexist content. Analysis of the multilingual corpora shows that the different contexts share certain characteristics in their rhetoric. To expose the main features, we have focused on text classification, text profiling, keyword and collocation extraction, along with manual annotation and qualitative study.

2018

Text-Based Age and Gender Prediction for Online Safety Monitoring

This paper explores the capabilities of text-based age and gender prediction geared towards the application of detecting harmful content and conduct on social media. More specifically, we focus on the use case of detecting sexual predators who try to “groom” children online and possibly provide false age and gender information in their user profiles. We perform age and gender classification experiments on a dataset of Dutch chat posts, and evaluate and compare binary age classifiers trained to separate younger and older authors according to different age boundaries. We show that use-case applicable performance levels can be achieved for the classification of minors versus adults, thereby providing a useful component in a cybersecurity monitoring tool for social network moderators.

2016

Using a Personality-Profiling Algorithm to Investigate Political Microtargeting

Political advertisers have access to increasingly sophisticated microtargeting techniques. One such technique is tailoring ads to the personality traits of citizens. Questions have been raised about the effectiveness of this political microtargeting (PMT) technique. In two experiments, we investigate the causal effects of personality-congruent political ads. The results show evidence that citizens are more strongly persuaded by political ads that match their own personality traits. These findings feed into relevant and timely contributions to a salient academic and societal debate.

2020

Online hatred of women in the Incels.me forum – Linguistic analysis and automatic detection

This paper presents a study of a (now suspended) incel discussion forum and its users, involuntary celibates or incels, a virtual community of isolated men without a sexual life, who see women as the cause of their problems and often use the forum for misogynistic hate speech and other forms of incitement. The aim of this study is to shed light on the group dynamics of the incel community, by applying mixed-methods quantitative and qualitative approaches to analyze how the users of the forum create in-group identity and how they construct major out-groups, particularly women.

2019

Analysis of Memetic Warfare

If you ever wondered whether we are secretly ruled by alien reptilian overlords, when the Dark Enlightenment’s acceleration will begin, what the tailless amphibian wildlife in Kekistan looks like, or how to spot an Antifa tank – seek no further! Descend into the bizarre subcultures on 4chan, Telegram, Gab, and mainstream social media platforms we love so much.


Discover our technical reports

4chan & 8chan embeddings (TGTR-1)

We indexed over 30 million anonymous messages from the publicly available /pol/ message boards on 4chan and 8chan and compiled them into a language model. The trained word embeddings (±0.4GB) are available for free and may be useful for further study on online harmful content.

2020

Melancholy, Anxiety & Loneliness (TGTR-2)

We created a new NLP resource for assessing online depression. It is available for Dutch and captures expressions of anger, fear and sadness, along with various fine-grained mental states like despair, disappointment, hope, guilt, loneliness, melancholy, stress, relief and worry.

2020

Profanity & Offensive Words (TGTR-3)

We created an explainable NLP resource for online harmful language. It is currently available for English, German, French and Dutch, capturing verbal expressions of violent, dehumanising, discriminatory and toxic language, field-tested in real-life settings.

2020

GeenStijl.nl embeddings (TGTR-4)

We indexed over 8M public messages from GeenStijl to train a word embedding model that captures the language representations in the dataset. The trained word embeddings (±150MB) are available for free and may be useful for further study on online harmful content.

2021

Onze echokamers (TGTR-5)

We analysed how echo chambers can be mapped using public data from public accounts of news sites, influencers, and politicians. In this report, we describe the current state of affairs in the Dutch-speaking region.

2021

Online misogyny (TGTR-6)

We indexed more than 100,000 public anonymous posts from online fringe media platforms such as 4chan containing keywords referring to women, and mapped misogynist narratives using NLP techniques.

2021

Online antisemitism (TGTR-7)

We created a fine-grained AI system for the detection of antisemitism. This Explainable AI will identify English and German antisemitic expressions of violence, dehumanisation and conspiracies in short text messages.

2021


Discover our media coverage

Nuanced Text Analysis at Scale: Toxicity Detection and Digital Salafism

Textgain is featured in ISD’s Digital Dispatch, where ISD researchers outline the rationale and advantages of applying toxicity analysis approaches to the study of Salafi online content, and describe how it was applied during ISD’s recent research project mapping the evolving online Salafi ecosystem.

2022

Textgain featured in “Text Analytics APIs 2018”

Textgain is featured in Text Analytics APIs 2018: A Consumer Guide, a comprehensive report on the state of the art in text analytics APIs.

2018

Textgain featured in “Benchmark studie over artificiële intelligentie”

PwC published a study of AI vendors in Flanders. Textgain is featured in this overview as a spin-off of the University of Antwerp.

2018

