Datasets 

Here you can find a list of datasets produced in the course of the project, including the respective links and references.

You may also want to check out our presence on Zenodo, where we also list datasets, or the Data Management Plan.

“IDMT Audio Provenance Analysis Dataset” Milica Gerhardt, Luca Cuccovillo & Patrick Aichroth
April 11, 2024

This dataset contains two distinct collections tailored for evaluating audio provenance analysis solutions within specified scenarios: Singular Composition and Multi-Source Composition. For a comprehensive understanding of these scenarios and the process behind generating the test files, please consult the referenced publication.

This dataset accompanies the referenced publication. If you use it, please cite: M. Gerhardt, L. Cuccovillo and P. Aichroth, "Audio Provenance Analysis in Heterogeneous Media Sets," 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 2024, pp. 4387-4396, doi: 10.1109/CVPRW63382.2024.00442.

Show dataset

"M3DSYNTH: A DATASET OF MEDICAL 3D IMAGES WITH AI-GENERATED LOCAL MANIPULATIONS" Giada Zingarini, Davide Cozzolino, Riccardo Corvi, Giovanni Poggi & Luisa Verdoliva
April 01, 2024

M3Dsynth is a large dataset of manipulated Computed Tomography (CT) lung images. We create manipulated images by injecting or removing lung cancer nodules in real CT scans, using three different methods based on Generative Adversarial Networks (GAN) or Diffusion Models (DM), for a total of 8,577 manipulated samples. Experiments show that these images easily fool automated diagnostic tools. We also tested several state-of-the-art forensic detectors and demonstrated that, once trained on the proposed dataset, they are able to accurately detect and localize manipulated synthetic content, even when training and test sets are not aligned, showing good generalization ability.

Show dataset

"EUvsDisinfo: a Dataset for Multilingual Detection of Pro-Kremlin Disinformation in News Articles (Dataset)" João Leite, Olesya Razuvayevskaya, Kalina Bontcheva & Carolina Scarton
January 15, 2024

This is the dataset and metadata accompanying the paper submission titled "EUvsDisinfo: a Dataset for Multilingual Detection of Pro-Kremlin Disinformation in News Articles".

Show dataset

"VERITE: A Robust Benchmark for Multimodal Misinformation Detection Accounting for Unimodal Bias" Stefanos-Iordanis Papadopoulos, Christos Koutlis, Symeon Papadopoulos & Panagiotis Petrantonakis
January 08, 2024

VERITE (VERification of Image-TExt pairs) is an annotated evaluation benchmark for multimodal (image-caption) misinformation detection that accounts for unimodal biases.

Show dataset

"Synthbuster: Towards Detection of Diffusion Model Generated Images" Quentin Bammey
November 02, 2023

Dataset of 9,000 AI-generated images, described in the paper “Synthbuster: Towards Detection of Diffusion Model Generated Images” (Quentin Bammey, 2023, Open Journal of Signal Processing).

Show dataset

Show related article

vera.ai is co-funded by the European Commission under grant agreement ID 101070093, and the UK and Swiss authorities. This website reflects the views of the vera.ai consortium and respective contributors. The EU cannot be held responsible for any use which may be made of the information contained herein.