Contact us

Datasets

Browse through our publications on language processing technology and social media trend detection.


Discover our datasets

We offer a small range of data management and analysis software applications, including applications for digitising non-digital archives. We also offer tailored and standard training programs to expand your knowledge of what happens in the online world.

Geenstijl.nl embeddings

We collected over 8 million messages from the controversial Dutch websites GeenStijl and Dumpert to train a word embedding model that captures the toxic language found in the dataset. The trained word embeddings (approximately 150 MB) are released for free and may be useful for further study of toxic online discourse.

Download

4chan & 8chan Word Embeddings

Freely downloadable word embeddings, trained on 4chan and 8chan data.

Download

Dutch Word Embeddings

Freely downloadable Dutch word embeddings, trained on massive amounts of data.

Download

Want to discuss one of our products in more detail?

A short get-together is always more enlightening.

Would our products be right for you? How could they help you? Just talk to one of our experts.

Get in touch with us!