Browse through our publications on language processing technology and social media trend detection.
We offer a small range of software applications for data management and analysis, including tools for digitising non-digital archives. We also offer tailored and standard training programs to expand your knowledge of what happens in the online world.
We collected over 8 million messages from the controversial Dutch websites GeenStijl and Dumpert to train a word embedding model that captures the toxic language found in the dataset. The trained word embeddings (about 150 MB) are freely available and may be useful for further study of toxic online discourse.
Freely downloadable word embeddings, trained on 4chan and 8chan data.
Freely downloadable Dutch word embeddings, trained on massive amounts of data.