Two years of vera.ai: Ontotext's Nedelina Mitankina on entity linking and event extraction, and how this relates to the Database of Known Fakes

In mid-September 2024 the vera.ai project celebrated its second birthday. On this occasion, Ontotext's Nedelina Mitankina looks back at some of the project's achievements and points to what lies ahead. She does so primarily from the Ontotext perspective and the focus of the company's work: the development and enhancement of the so-called Database of Known Fakes (DBKF), and what has been done, and is planned, in the fields of entity linking and event extraction.

The contribution below is a slightly edited version of an article that first appeared on the Ontotext website.

The vera.ai project turned two, and it continues to improve its AI-powered counter-disinformation toolbox

With the vera.ai project turning two, we look back at the project’s accomplishments in terms of AI-focused research and technology. The consortium partners have made notable strides in the fields of synthetic image and audio detection algorithms, claim extraction, and coordinated sharing behavior monitoring.

From Ontotext’s perspective, our main contribution to the vera.ai AI-powered counter-disinformation toolbox is the Database of Known Fakes (DBKF), which has been highlighted as a helpful instrument in the CORDIS EU research portal. DBKF is a great example of how Ontotext GraphDB can bring value to business users and verification professionals alike. The database is an integrated solution that gathers debunking content in different formats, enriches it, and interlinks it in the knowledge graph with meaningful metadata. This in turn enables various advanced searches over the data. 

One of the key DBKF functionalities that we have enhanced in Year 2 of the vera.ai project is multilingual concept identification: recognizing that names for people (Taylor Swift in English / Тейлър Суифт in Bulgarian), locations (Netherlands in English / Pays-Bas in French), organizations (WHO in English / L’OMS in French) and other general terms (human rights in English / Menschenrechte in German) may refer to one and the same concept in different languages. Entity linking is a natural fit for this task, not least because the semantic information and relationships in the knowledge graph help immensely at the entity disambiguation step.

Through extensive research and experiments, we have come up with state-of-the-art AI models for multilingual entity linking that enable working with concepts in 100+ languages and help connect and analyze linguistically diverse content. As a result, the DBKF concept search has been reimplemented to provide accuracy and true multilinguality for verification experts’ queries. For example, linking mentions of “human rights” and “droits de l’homme” to a common reference concept (Wikidata human rights, Q8458) enables the DBKF to “know” that they are equivalents and to return documents containing such mentions in different languages as part of the concept search results. 
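To give a flavour of the underlying idea (and not of the actual trained entity-linking models used in the DBKF), the hedged Python sketch below resolves surface forms in two languages against the public Wikidata SPARQL endpoint; both look-ups are expected to surface the item cited above, Q8458. The endpoint URL is real, but the helper function and query shape are illustrative only.

```python
import requests

# Illustrative only: the DBKF relies on trained multilingual entity-linking
# models, not plain label look-ups. This sketch just demonstrates the idea
# that surface forms in different languages resolve to one canonical item.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def items_with_label(label: str, lang: str) -> list[str]:
    """Return Wikidata item IRIs whose label in `lang` exactly matches `label`."""
    query = (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
        f'SELECT ?item WHERE {{ ?item rdfs:label "{label}"@{lang} }} LIMIT 5'
    )
    resp = requests.get(
        WIKIDATA_SPARQL,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "dbkf-concept-demo/0.1 (illustrative example)"},
        timeout=60,
    )
    resp.raise_for_status()
    return [b["item"]["value"] for b in resp.json()["results"]["bindings"]]


# Both look-ups should point at the common reference concept cited above (Q8458).
print(items_with_label("human rights", "en"))
print(items_with_label("droits de l'homme", "fr"))
```

In the DBKF itself, disambiguation additionally draws on the relationships stored in the knowledge graph, which a simple exact-label match like the one above does not capture.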

Another important task we have worked to automate with the help of AI methods is event extraction. Research in this direction has been conducted within the scope of another relevant EU-funded project, VIGILANT. The resulting state-of-the-art event extraction algorithm, adapted and extended to disinformation content, has been wrapped in a service and enables DBKF users to perform a faceted search across 36 event types (such as attacks, demonstrations and rule changes).
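As an illustration of what a faceted event search over an RDF store could look like, here is a hedged SPARQL sketch run against a GraphDB repository. The repository URL, the dbkf: namespace and the property and class names are hypothetical placeholders; the article does not disclose the actual DBKF schema or API.

```python
import requests

# Hypothetical GraphDB repository; GraphDB exposes SPARQL at /repositories/<id>.
GRAPHDB_ENDPOINT = "http://localhost:7200/repositories/dbkf"

# Hypothetical schema: debunks annotated with event instances typed by facet.
QUERY = """
PREFIX dbkf: <http://example.org/dbkf#>
SELECT ?debunk ?title WHERE {
  ?debunk a dbkf:Debunk ;
          dbkf:title ?title ;
          dbkf:mentionsEvent ?event .
  ?event a dbkf:Attack .          # one of the 36 event-type facets
}
LIMIT 20
"""

resp = requests.get(
    GRAPHDB_ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["debunk"]["value"], "-", row["title"]["value"])
```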

In the spirit of vera.ai’s co-creation and collaborative approach with end-user stakeholders, the Ontotext team has also invested effort in improving various UI and UX aspects of the DBKF, based on the project evaluators’ feedback. For example, at document level we now provide excerpts from the text where specific concepts or events appear, which improves the explainability of results generated by AI algorithms.

Stepping into year three of the project, our efforts focus on harnessing AI for the complex task of detecting disinformation narratives. Within Year 2 we extended our semantic model to enable adding narratives to the knowledge graph and experimented with different algorithms for clustering debunks in order to identify recurring narratives. Next, we will expose narrative clusters to DBKF end users and enable the exploration and hierarchical representation of complex narrative themes. Stay tuned: we will share more on this in the coming months!
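As a purely illustrative sketch (the article does not detail which clustering algorithms were tried), the following Python snippet groups a handful of toy debunk texts using TF-IDF features and agglomerative clustering; texts that land in the same cluster would be candidates for a shared narrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

# Toy stand-ins for debunk texts; the real DBKF content is far richer.
debunks = [
    "Claim that vaccine X alters DNA debunked by health authorities.",
    "No, vaccine X does not change human DNA, experts confirm.",
    "Viral video of flooded airport is from 2017, not the recent storm.",
    "Old footage recirculated as current storm damage at the airport.",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(debunks)

# A distance threshold is used instead of a fixed cluster count, since the
# number of narratives circulating at any time is not known in advance.
clustering = AgglomerativeClustering(
    n_clusters=None,
    metric="cosine",
    linkage="average",
    distance_threshold=0.8,
)
labels = clustering.fit_predict(vectors.toarray())

for label, text in zip(labels, debunks):
    print(label, text)
```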

If you want to learn more about Ontotext’s AI capabilities and how these could serve your particular use case, check out our AI in Action series of blog posts.

To keep up to date with vera.ai partners’ contributions to AI-assisted verification, follow the consortium’s Twitter and YouTube channels.

Note: This article first appeared on the Ontotext website on 11 October 2024. This is a slightly edited version. 

Author: Nedelina Mitankina (Ontotext)

Editor: Jochen Spangenberg (DW)

vera.ai is co-funded by the European Commission under grant agreement ID 101070093, and the UK and Swiss authorities. This website reflects the views of the vera.ai consortium and respective contributors. The EU cannot be held responsible for any use which may be made of the information contained herein.