Page cover

How Will Generative AI Affect Open Data?

Conversational generative AI such as ChatGPT is changing the data use landscape. Although most conversational agents are not optimised for finding and retrieving datasets, they do hold some promise for supporting users to make sense of datasets.

Generative AI can also support automatic data-to-text approaches for data summarisation. Data-to-text generation can produce summarisation sentences, which could then be edited by the publisher.

Croissant is a metadata format to help standardize machine learning (ML) datasets. The aim of Croissant is to make ML datasets easily discoverable and usable across tools and platforms. Croissant is easy to adopt because it doesn’t require changing the data itself or how it is represented. Instead, it adds a layer of metadata that represents the contents of the dataset in a standardized way, describing key attributes and properties.

We discuss the particular issues of opening data that is intended for machine learning uses in Data Quality.

Find out more about how to make your machine learning datasets discoverable and usable with Croissant

Last updated