Dataset described in: Quentin Bammey, "Synthbuster: Towards Detection of Diffusion Model Generated Images," in IEEE Open Journal of Signal Processing, vol. 5, pp. 1-9, 2024, doi: https://doi.org/10.1109/OJSP.2023.3337714.
This dataset contains synthetic, AI-generated images from 9 different models.
1000 images were generated per model. The images are loosely based on RAISE-1k images (Dang-Nguyen, Duc-Tien, et al. "Raise: A raw images dataset for digital image forensics." Proceedings of the 6th ACM multimedia systems conference. 2015.). For each image of the RAISE-1k dataset, a description was generated using the Midjourney /describe function and CLIP interrogator (https://github.com/pharmapsychotic/clip-interrogator/). Each of these prompts was then manually edited to make the results as photorealistic as possible and to remove the names of living persons and artists.
In addition, for methods that take generation parameters, these were randomly selected within reasonable ranges.
The prompts and parameters used for each method can be found in the `prompts.csv` file.
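As a starting point for working with `prompts.csv`, the following minimal sketch reads the file and lists its columns. It does not assume any particular column names, since they are not documented here. The path `synthbuster/prompts.csv`, the UTF-8 encoding, and the presence of a header row are assumptions, and `load_prompts` is a hypothetical helper name.

```python
import csv
from pathlib import Path


def load_prompts(csv_path):
    """Return (column_names, rows) from a CSV file that has a header row.

    Each row is a dict mapping column name to the raw string value.
    """
    # Assumption: the file is UTF-8 encoded and comma-separated.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


if __name__ == "__main__":
    # Assumed location of the file after unzipping the archive.
    path = Path("synthbuster/prompts.csv")
    if path.exists():
        columns, rows = load_prompts(path)
        print("columns:", columns)
        print("number of rows:", len(rows))
```

Inspecting the printed column names first is the safest way to find out which fields hold the prompt and which hold the per-method parameters.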
This dataset can be used to evaluate AI-generated image detection methods. We recommend pairing the generated images with the real RAISE-1k images to evaluate whether a method can distinguish between the two. RAISE-1k images are not included in this dataset; they can be downloaded separately at http://loki.disi.unitn.it/RAISE/download.html.
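One possible way to summarize such a real-versus-generated comparison is the area under the ROC curve (AUC). The sketch below is an illustration only and is not necessarily the metric used in the paper. It assumes a hypothetical detector that outputs one score per image, with higher scores meaning "more likely synthetic". The function name `roc_auc` and the example scores are made up for this illustration.

```python
def roc_auc(real_scores, synthetic_scores):
    """Area under the ROC curve, computed from pairwise comparisons.

    Equals the probability that a randomly chosen synthetic image receives
    a higher score than a randomly chosen real image (ties count as 0.5).
    """
    if not real_scores or not synthetic_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for s in synthetic_scores:
        for r in real_scores:
            if s > r:
                wins += 1.0
            elif s == r:
                wins += 0.5
    return wins / (len(real_scores) * len(synthetic_scores))


if __name__ == "__main__":
    # Hypothetical detector outputs, for illustration only.
    real = [0.10, 0.35, 0.20]       # scores on real RAISE-1k images
    synthetic = [0.80, 0.30, 0.95]  # scores on generated images
    print("AUC:", roc_auc(real, synthetic))
```

The pairwise loop is quadratic, which is acceptable at this scale (1000 real against 1000 generated images per model). Computing the AUC separately for each generative model shows which generators a detector handles well and which it misses.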
None of the images have undergone degradations such as JPEG compression or resampling, which leaves room to add your own degradations and test robustness to various transformations in a controlled manner.
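A minimal sketch of two such controlled degradations, assuming the third-party Pillow library is installed. The helper names `jpeg_compress` and `resample`, the default quality of 90, the scale factor of 0.5, and the choice of bicubic interpolation are illustrative assumptions, not settings taken from the paper.

```python
import io

from PIL import Image  # third-party: pip install Pillow


def jpeg_compress(image, quality=90):
    """Return a copy of `image` after one JPEG encode/decode round trip."""
    buffer = io.BytesIO()
    # JPEG has no alpha channel, so convert to RGB before saving.
    image.convert("RGB").save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    result = Image.open(buffer)
    result.load()  # force decoding while the buffer is still available
    return result


def resample(image, scale=0.5):
    """Return `image` resized by `scale` using bicubic interpolation."""
    width, height = image.size
    new_size = (max(1, round(width * scale)), max(1, round(height * scale)))
    return image.resize(new_size, resample=Image.BICUBIC)


if __name__ == "__main__":
    # A synthetic test image stands in for a dataset image here.
    original = Image.new("RGB", (256, 192), (120, 30, 200))
    degraded = jpeg_compress(resample(original, scale=0.5), quality=75)
    print(degraded.size, degraded.mode)
```

Applying the same degradation with the same parameters to both the generated and the real images keeps the comparison fair, so that a detector is not simply picking up the presence or absence of compression.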
Zenodo record: https://zenodo.org/records/10066460
Download dataset directly: https://zenodo.org/records/10066460/files/synthbuster.zip?download=1
Author: Quentin Bammey (ENS)
Editor: Anna Schild (DW)