June 10, 2025 — Austrian synthetic data firm MOSTLY AI has launched a $100,000 prize challenge to raise awareness of how synthetic data can be used to create open-access datasets for businesses, AI developers and other organizations. The company said this is the largest prize for a hackathon-style synthetic data development challenge.
Entrants will be tasked with creating an accurate and privacy-safe synthetic dataset from a real dataset. Submissions will be scored and ranked on the synthetic dataset’s accuracy relative to the real dataset and the extent of its anonymization, as well as on the solution’s compute efficiency, ease of use, and generalizability to other datasets.
The MOSTLY AI Prize aims to position synthetic data as a responsible and accessible resource for open-source AI innovation. It comes at a time when AI developers and enterprises alike are increasingly seeking high-quality data to train, refine, and run AI models.
AI is incredibly data-hungry, but privacy regulations put strict barriers on data sharing. This has fueled demand for open-access synthetic data solutions that can mimic real data while anonymizing it, allowing for frictionless movement of data for AI training and development.
The MOSTLY AI Prize aims to put synthetic data generation at the heart of the AI conversation – increasing awareness of its use cases and supporting open data sharing as a safe, scalable solution for innovation and research. Submissions for the prize close on Thursday 3 July, with winners announced on 9 July.
The Prize consists of two independent synthetic data challenges: the Flat Data Challenge and the Sequential Data Challenge.
Flat data is stored in a simple table format, where each column represents a different type of information: for example, a list of patients with columns recording each patient’s age, gender, and height. Sequential data occurs in a specific order and captures how things change over time, such as the daily value of a stock over the past year or longitudinal health records.
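As a rough illustration of the difference between the two data types, the sketch below builds a tiny flat table and a tiny sequential dataset in Python. It does not reflect the challenge's actual datasets or file formats; every field name and value, and the helper `to_sequences`, are invented for this example.

```python
from collections import defaultdict

# Flat data: one row per subject, each column a different attribute.
# All values are made up for illustration.
flat_patients = [
    {"patient_id": 1, "age": 34, "gender": "F", "height_cm": 168},
    {"patient_id": 2, "age": 51, "gender": "M", "height_cm": 180},
]

# Sequential data: several rows per subject, and their order matters.
# The rows are deliberately shuffled here to show that order must be restored.
sequential_prices = [
    {"ticker": "AAA", "day": 2, "close": 10.5},
    {"ticker": "AAA", "day": 1, "close": 10.0},
    {"ticker": "BBB", "day": 1, "close": 20.0},
    {"ticker": "AAA", "day": 3, "close": 10.2},
    {"ticker": "BBB", "day": 2, "close": 19.5},
]


def to_sequences(rows, key, order):
    """Group rows by `key` and sort each group by the `order` field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {k: sorted(v, key=lambda r: r[order]) for k, v in groups.items()}


# One time-ordered sequence per ticker (hypothetical example data).
sequences = to_sequences(sequential_prices, key="ticker", order="day")
aaa_closes = [row["close"] for row in sequences["AAA"]]
```

In the flat table, shuffling the rows loses nothing, because each row stands alone. In the sequential data, the meaning depends on keeping each subject's rows together and in order.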
The launch of the MOSTLY AI Prize follows hot on the heels of the firm’s release of the world’s first industry-grade open-source toolkit for the creation of privacy-safe synthetic data. Entrants are welcome to use the MOSTLY AI toolkit to complete the challenge, but this is not a requirement.
MOSTLY AI raised $25 million in a Series B funding round in 2022 and has raised $31 million in total since its launch. The firm’s global clients include Citi Bank, the U.S. Department of Homeland Security, Erste Group, Telefonica, and two of the five largest U.S. banks.
Alexandra Ebert, Chief AI and Data Democratization Officer at MOSTLY AI, said: “The demand for privacy-safe data has never been higher. AI developers, businesses, and organizations of all sizes are all clamoring for access to data as they look to train, tune, and deploy AI – and they’re turning to synthetic data as the solution.
“The MOSTLY AI Prize is all about putting synthetic data front and center. We want people to understand how it can be used and position open data sharing as a safe and accessible option for researchers, innovators, and developers.
“We’re still at the starting line of what AI can achieve. Open data access is key to unlocking its full potential – but achieving that will require the wider adoption of synthetic data tools.
“The MOSTLY AI Prize is a call-to-action for anyone with an interest in data and AI. We want as many people as possible to get involved, to not only foster innovation but to increase the accessibility of synthetic data generation.”