Plenum.be collects 183 million words from debates in the Belgian Chamber of Representatives. OCR is used to convert the scanned pages into text. We updated the process and the website with a wonderful research tool courtesy of UAntwerp digital humanities.
Both the website and the process needed an update: a more user-friendly interface on the outside, and a new database and taxonomy on the inside.
Our solutions touched many aspects of Plenum.be. We implemented MongoDB, a document database that is well suited to text records, scales horizontally and works through an API.
We introduced an improved taxonomy using Ocelot NLP techniques, along with improved OCR techniques such as automatic layout analysis.
We also added an API-powered search function, and added speaker and intervention-type metadata to investigate how this would improve accessibility.
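To illustrate how speaker and intervention-type metadata can narrow a text search, here is a minimal in-memory sketch. It is not the Plenum.be implementation: the `Intervention` fields, the `search` function and the sample records are all hypothetical, and a real deployment would run the query against the database through the API rather than over a Python list.

```python
from dataclasses import dataclass, asdict


@dataclass
class Intervention:
    """One spoken contribution in a debate (hypothetical schema)."""
    speaker: str
    intervention_type: str  # illustrative values such as "question" or "reply"
    date: str               # ISO date of the sitting
    text: str


def search(records, query, speaker=None, intervention_type=None):
    """Return records whose text contains the query, narrowed by metadata.

    Matching is a case-insensitive substring test; metadata filters are
    applied only when a value is given.
    """
    needle = query.lower()
    hits = []
    for record in records:
        if needle not in record.text.lower():
            continue
        if speaker is not None and record.speaker != speaker:
            continue
        if intervention_type is not None and record.intervention_type != intervention_type:
            continue
        hits.append(asdict(record))
    return hits


# Made-up sample data, for illustration only.
records = [
    Intervention("Speaker A", "question", "1950-01-10", "What is the budget for the railways?"),
    Intervention("Speaker B", "reply", "1950-01-10", "The railways budget is under review."),
    Intervention("Speaker A", "remark", "1950-01-11", "The harbour works are delayed."),
]

results = search(records, "railways", speaker="Speaker A")
print(results)
```

Without the metadata, a search for "railways" returns every mention; with it, a researcher can restrict the results to one speaker or one kind of intervention.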
For the UX design we led an ideation session at the start of the project, which will feed into wireframes later on.
Everything is deployed on a local server.
This project is in progress.