● Pattern: Python toolkit for data mining, Natural Language Processing (NLP), Machine Learning (ML) and network analytics. Now curated by the University of Antwerp and Google Summer of Code. https://github.com/clips/pattern
● Grasp: faster, smaller and easier than Pattern, with new XAI tools. We’re still writing the docs, but feel free to check out the code and learn from these powerful and demystified algorithms. https://github.com/textgain/grasp
Some of our core datasets are freely available, commercially or for research:
- 8chan embeddings: a unique resource by Google Summer of Code student Pierre Voué, trained on 30M+ toxic messages from 8chan/pol/, for studying online polarization and radicalization. https://textgain.com/8chan
We can build it for you, but we’d rather explain it to you. All of our team members are part-time lecturers with 15+ years of experience. Some of our free study reports:
- On sexism: Online hatred of women in the Incels.me forum: Linguistic analysis and automatic detection. Sylvia Jaki, Tom De Smedt, Maja Gwóźdź, Rudresh Panchal, Alexander Rossa & Guy De Pauw (2019). JLAC.
- On extremism: Multilingual cross-domain perspectives on online hate speech. Tom De Smedt, Sylvia Jaki, Eduan Kotzé, Leïla Saoud, Maja Gwóźdź, Guy De Pauw & Walter Daelemans (2018). CTRS.
- On jihadism: Automatic detection of online jihadist hate speech. Tom De Smedt, Guy De Pauw & Pieter Van Ostaeyen (2018). CTRS.
- On NLP & ML: Pattern for Python. Tom De Smedt & Walter Daelemans (2012). JMLR.
Here are some of the AI for Good projects that we are working on. All project teams are gender-balanced and represent different ethnicities and ideologies. Reach out if you want to join our community:
- Project Grey (2018-2020) is co-funded by the Internal Security Fund (ISF) of the European Commission, to raise awareness about online polarization.
- RHETORiC (2019-2021) is co-funded by IMEC.ICON and investigates tools that help news editors and consumers detect and counter polarization on social media and support civil discourse.
- DeTact (2019-2021) is co-funded by the Rights, Equality and Citizenship fund (REC) of the European Commission, to investigate tech for conflict resolution.
- Commit (2019-2021) is co-funded by the Internal Security Fund (ISF) of the European Commission, to raise awareness about online polarization.
- IMSyPP (2020-2022) is co-funded by the Rights, Equality and Citizenship fund (REC) of the European Commission, to investigate tech for conflict resolution.
- Factcheck Vlaanderen (2019) was funded by the Flemish Journalism Fund (VJF) to establish the first press fact-checking platform in Belgium.
- Africa’s Voices (2018) was privately funded to develop cutting-edge Swahili language technology for monitoring social media and listening to Africa’s voices.
Driven by empathy and a vision of a better world, we consider time our most prized resource. Here’s an overview of projects that we engage in voluntarily outside of Textgain:
- Guy De Pauw (CEO) also supervises the African Language Technology group (https://www.aflat.org) and chairs the Text Analytics for Cybersecurity and Online Safety conference (https://ta-cos.org).
- Tom De Smedt (CTO) also supervises PhDs in the Arts to develop creative chatbots for hospitalized children. PhD Ludivine Lechat was awarded the 2018 Smart Care prize (https://www.healthcare-executive.be/nl/nieuws/medisch-nieuws/belfius-smart-care-award-interactieve-sprookjes-voor-kinderen-in-een-ziekenhuis.html). Tom’s work on extremism was awarded the 2019 Research Prize of the Auschwitz Foundation.
- Gijs van Beek (CBDO) also advises the Leiden Islam Academy (http://www.leidenislamacademie.nl) and chairs the House of Deep Democracy (http://houseofdeepdemocracy.nl). He is active in various projects that bridge the gap between technology and society (e.g., https://marokko.nl), specializing in discrimination and extremism. His work has been awarded the Dutch National Health Innovation Award (https://zorgvoorjeouders.nl).