
Unstructured data makes up over 90% of the enterprise data estate, yet most of it goes untapped. It sits in PDFs, contracts, emails, and meeting transcripts, locked away in formats that traditional data tools can’t easily process or govern. For years, enterprises have focused on managing the clean, tabular world of structured data, while leaving the messy and unlabeled stuff in the dark.
Collibra says it plans to change that with its acquisition of Deasy Labs, a startup focused on automating the classification and enrichment of unstructured content. According to Collibra, the deal will allow it to extend its governance platform beyond structured data sources, enabling organizations to bring documents, transcripts, and emails into the same oversight framework used for databases and spreadsheets.
The acquisition comes as more companies move beyond AI experiments and start embedding large language models (LLMs) into daily workflows. These systems are only as good as the data behind them, and that’s where many organizations are hitting a wall. Structured records can show what happened, but they rarely explain why. The context is often buried in internal documents that traditional data platforms haven’t been built to handle.
That’s the gap Collibra says it hopes to close. “As organizations scale their use of AI, the ability to unlock the value of unstructured data becomes critical,” said Felix Van de Maele, the company’s co-founder and CEO. “Deasy Labs gives us the ability to tag, filter, and enrich this dark data at scale—automatically turning unstructured files into structured, meaningful, and trusted data assets ready for AI. This is a leap forward for the industry, and for Collibra’s vision of unified data and AI governance.”
That mission now picks up with Deasy Labs, a young company built specifically to tackle this problem. The startup was founded in 2023 by engineers and product leads who had worked on data quality and AI systems at McKinsey, QuantumBlack, and Amazon. Backed by Y Combinator and a $3 million seed round from General Catalyst and RTP Global, the team focused on one goal: helping enterprises unlock value from unstructured content without relying on costly, manual processes.
Their platform uses a mix of machine learning and LLMs to scan documents, transcripts, and reports, and automatically generate metadata—everything from document versions and access flags to summaries and topic tags. It’s designed to fit into modern AI pipelines, including retrieval-augmented generation (RAG) systems, giving companies a way to make unstructured data more searchable, safer, and usable without rebuilding their stack.
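To make the idea of metadata enrichment feeding a RAG pipeline concrete, here is a minimal Python sketch. It is not Deasy's or Collibra's API: the `Document` class, the `enrich` and `retrievable` functions, and the keyword lists are all hypothetical, and simple keyword matching stands in for the machine learning and LLM classifiers the article describes.

```python
from dataclasses import dataclass, field

# Hypothetical topic keywords; a real system would use ML/LLM classifiers instead.
TOPIC_KEYWORDS = {
    "contracts": ("agreement", "termination", "liability"),
    "support": ("ticket", "refund", "complaint"),
}


@dataclass
class Document:
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


def enrich(doc: Document) -> Document:
    """Attach topic tags and an access flag (keyword stand-in for a model)."""
    lowered = doc.text.lower()
    doc.metadata["topics"] = sorted(
        topic
        for topic, words in TOPIC_KEYWORDS.items()
        if any(word in lowered for word in words)
    )
    # Assumed convention: any document mentioning "confidential" is restricted.
    doc.metadata["restricted"] = "confidential" in lowered
    return doc


def retrievable(docs, topic):
    """Keep only documents a retriever may search: on-topic and unrestricted."""
    return [
        d
        for d in docs
        if topic in d.metadata.get("topics", [])
        and not d.metadata.get("restricted", False)
    ]


docs = [
    enrich(d)
    for d in [
        Document("a", "Master services agreement with termination clause."),
        Document("b", "CONFIDENTIAL: liability terms under negotiation."),
        Document("c", "Customer opened a ticket asking for a refund."),
    ]
]
candidates = retrievable(docs, "contracts")
print([d.doc_id for d in candidates])
```

The point of the sketch is the ordering: metadata is generated once at ingestion, and the retrieval step filters on it before any text reaches a model, so restricted or off-topic files never enter the prompt.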
“We started Deasy to help organizations make sense of the massive volume of unstructured content they deal with every day,” said Reece Griffiths, co-founder of the company. “Now, by joining Collibra, we get to scale that work faster—and bring it into a platform that’s already trusted by some of the most advanced data teams in the world.”
For Collibra users, the immediate benefit is clarity. Teams that once had to rely on external tools or tedious manual processes to manage documents can now surface structure and meaning directly within the Collibra platform. That means faster onboarding of new data, better visibility into what’s stored where, and fewer blind spots when building AI workflows.
Collibra plans to bring Deasy’s technology into its platform gradually, starting with automated tagging and classification features for large volumes of documents. The resulting metadata can then be used to apply rules, track usage, or feed search and discovery tools, just as teams already do with structured data.
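As a rough illustration of applying rules to generated metadata, the Python sketch below checks a small catalog against two policies. The rule names, metadata fields (`owner`, `restricted`, `reviewed`), and file names are assumptions made for this example and do not reflect Collibra's actual rule engine.

```python
# Hypothetical rules: each pairs a name with a predicate that returns True
# when the metadata violates the policy.
RULES = [
    ("needs_owner", lambda m: not m.get("owner")),
    (
        "restricted_without_review",
        lambda m: m.get("restricted", False) and not m.get("reviewed", False),
    ),
]


def violations(metadata: dict) -> list[str]:
    """Return the names of all rules this metadata record violates."""
    return [name for name, failed in RULES if failed(metadata)]


# Assumed example catalog: file name mapped to its generated metadata.
catalog = {
    "contract.pdf": {"owner": "legal", "restricted": True, "reviewed": False},
    "notes.txt": {"restricted": False},
}

report = {name: violations(meta) for name, meta in catalog.items()}
print(report)
```

Because the rules read only metadata, the same check could in principle run over a database table or a PDF, which is the sense in which unstructured files join the oversight framework used for structured data.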
In practical terms, this gives Collibra a stronger foothold in how AI projects are managed from the ground up. Rather than treating governance as something that happens after the fact, the company is positioning itself as part of the data prep process, making sure that what flows into LLMs is well-organized and reliable. It’s a shift from being just a system of record to becoming an active part of how AI decisions are made.
That broader vision is getting validation from industry analysts. “Unifying governance across all structured and unstructured data into trusted, governed data assets is no longer optional,” said Sanjeev Mohan, Principal at SanjMo and former Gartner Analyst.
“Metadata-driven automation is key to unlocking the hidden value in documents, emails, and transcripts as it brings much-needed visibility and control to the least governed parts of the data estate. By bringing unstructured data into the fold of unified governance, Collibra is taking a critical step toward operationalizing AI at scale with confidence.”
Looking ahead, Collibra says it will focus on adding more automation to help customers manage both data and AI more easily. Industry experts see potential for even more. Mohan noted that Deasy’s technology could help build AI tools tailored to specific industries, whether it’s analyzing banking records or pulling insights from call center transcripts.