A project by vera.ai, entitled “The Persistence and Resilience of Misinformation in the Face of Fact Checking”, was facilitated by Fabio Giglietto and Massimo Terenzi (University of Urbino Carlo Bo) and Richard Rogers (University of Amsterdam) at the Digital Methods Initiative (DMI) Winter School at the University of Amsterdam. We are pleased to share the results, which enrich the scientific work of vera.ai's work package 4.4 on the impact of disinformation, led by UvA. These outcomes were first presented to a group of Master's students, PhD students and postdoctoral researchers in a poster session on Friday, January 12th, 2024.
The study is based on a dataset of 38,099 Facebook stories shared between 2017 and 2022 and identified as problematic by Meta's third-party fact-checkers. A first interactive map shows the news items grouped by linguistic cluster. To identify the topics, OpenAI's GPT models and Mistral AI's embedding tools were used in a three-step process: the stories were translated into English with GPT-3.5-turbo, and embeddings were generated for all story titles and descriptions via Mistral AI's API. The embeddings were then clustered into 87 distinct topics with the k-means algorithm, grouping stories by thematic similarity. The clustering was performed with embeddings from both the OpenAI and Mistral AI APIs, showing clear consistency between the two models' results, as displayed in this graph. Finally, GPT-4 was used to label the clusters. The resulting map made it possible to identify clusters for a wide range of topics.
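For readers curious about how such a pipeline fits together, a minimal sketch is shown below. The model names (gpt-3.5-turbo, mistral-embed, gpt-4), the Mistral embeddings endpoint, the prompts, and all function names are illustrative assumptions, not the project's exact code or configuration.

```python
# Sketch of the pipeline: translate -> embed -> cluster with k-means -> label clusters.
# Assumes the OpenAI Python SDK (>=1.0), the `requests` library and scikit-learn.
# Model names, prompts and the Mistral endpoint are illustrative assumptions.
import os
import requests
import numpy as np
from openai import OpenAI
from sklearn.cluster import KMeans

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
MISTRAL_KEY = os.environ["MISTRAL_API_KEY"]

def translate(text: str) -> str:
    """Step 1: translate a story title/description into English with GPT-3.5-turbo."""
    resp = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": f"Translate into English, keeping the meaning intact:\n{text}"}],
    )
    return resp.choices[0].message.content.strip()

def embed(texts: list[str]) -> np.ndarray:
    """Step 2: request embeddings from Mistral AI's embeddings API (mistral-embed)."""
    resp = requests.post(
        "https://api.mistral.ai/v1/embeddings",
        headers={"Authorization": f"Bearer {MISTRAL_KEY}"},
        json={"model": "mistral-embed", "input": texts},
        timeout=60,
    )
    resp.raise_for_status()
    return np.array([item["embedding"] for item in resp.json()["data"]])

def cluster(embeddings: np.ndarray, k: int = 87) -> np.ndarray:
    """Group stories into k thematic clusters with k-means."""
    return KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(embeddings)

def label_cluster(sample_titles: list[str]) -> str:
    """Step 3: ask GPT-4 for a short topic label describing a cluster."""
    joined = "\n".join(f"- {t}" for t in sample_titles)
    resp = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": "Give a short topic label (max 6 words) for these fact-checked "
                              f"story titles:\n{joined}"}],
    )
    return resp.choices[0].message.content.strip()
```

The consistency between the two embedding providers' cluster solutions could, for instance, be quantified with a measure such as scikit-learn's adjusted_rand_score over the two sets of cluster assignments; this is a plausible approach rather than the project's documented method.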
The quality of these labels was assessed on a stratified sample of 400 stories, divided among three separate groups of participants who individually evaluated the labels on a five-point Likert scale. On average, the labels were rated as “Very Accurate” or “Accurate” in 75% of cases.
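As an illustration, the headline figure could be reproduced from the collected ratings in a few lines of pandas; the file and column names below are hypothetical placeholders, not the project's actual data layout.

```python
# Share of cluster labels rated "Very Accurate" or "Accurate" on the 5-point scale.
# File name and column names ("cluster", "rating") are hypothetical placeholders.
import pandas as pd

ratings = pd.read_csv("label_evaluations.csv")           # one row per evaluated story
positive = ratings["rating"].isin(["Very Accurate", "Accurate"])
print(f"Accurate or better: {positive.mean():.0%}")       # ~75% in the study

# A per-cluster breakdown helps spot topics where the GPT-4 labels underperform.
print(positive.groupby(ratings["cluster"]).mean().sort_values())
```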
Three thematic focuses were then selected from among the topics that emerged in the cluster map.
One of these research spotlights (fig. 1) concerned consumer fraud scams on Facebook. It revealed a surprising homogeneity in the language of the scams and highlighted how these fraudulent schemes are concentrated in the retail sector and tend to occur in distinct temporal spikes. Notably, despite fact-checking measures, these scams continue to proliferate on the platform.
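One simple way to surface such temporal spikes is to count stories per week and flag weeks that sit well above a rolling baseline. The sketch below assumes a hypothetical CSV with a publication date column and is not the project's actual method.

```python
# Flag weekly spikes of scam-related stories: weeks more than 2 std above a trailing baseline.
# The file name and its "published_at" column are hypothetical placeholders.
import pandas as pd

scams = pd.read_csv("scam_stories.csv", parse_dates=["published_at"])
weekly = scams.set_index("published_at").resample("W").size()

baseline = weekly.rolling(window=8, min_periods=4).mean()
spread = weekly.rolling(window=8, min_periods=4).std()
spikes = weekly[weekly > baseline + 2 * spread]
print(spikes)
```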
A second focus (fig. 2) was on misinformation regarding Emmanuel Macron and the French government. The false stories were found to concentrate on recurring themes such as corruption and political plans. The investigation provided a detailed picture of successive waves of misinformation and of how they respond to, or influence, particular political and social events.
The third area of interest (fig. 3) focused on health-related stories. An examination of the misinformation titles led to their categorization into esoteric, sensationalist, and authoritative styles, with the study highlighting a clear predominance of sensationalist titles.
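A minimal sketch of how titles could be sorted into these three styles with a zero-shot prompt is shown below; the prompt wording, model choice and function name are assumptions for illustration, not the categorization procedure actually used in the study.

```python
# Zero-shot classification of a health-related title into one of three styles.
# Prompt wording and model choice are illustrative assumptions.
from openai import OpenAI

client = OpenAI()
STYLES = ["esoteric", "sensationalist", "authoritative"]

def classify_style(title: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": "Classify the style of this health-related story title as exactly one of "
                              f"{', '.join(STYLES)}. Answer with the single word only.\n\n{title}"}],
    )
    answer = resp.choices[0].message.content.strip().lower()
    return answer if answer in STYLES else "unclassified"
```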
Author: Massimo Terenzi (UNIURB)
Editor: Anna Schild (DW)