
Can an LLM write a data contract and validate itself?

24. 06. 2026

Overview

Can an LLM write a data contract and validate itself? In this blog post, we explore how AI can help automate data contract creation by combining careful prompting, ODCS documentation, and deterministic syntax validation through VS Code diagnostics. The result is a semi-automated workflow that speeds up repetitive contract writing while keeping humans responsible for business rules, SLAs, and data quality decisions.

We have been working with data contracts for over a year now to further push data governance and to take the next step in the Data as a Product paradigm. In our efforts to implement this technology we successfully created a system that takes a data contract and uses it to validate a data schema.

One repetitive task was writing new data contracts for new data sources. It is not hard work, but it gets tiring after the first few times, especially when there are many tables and columns and the contract becomes very long. Naturally, we immediately thought of using an LLM to relieve us of our anguish.

Working out the concept

The first attempts worked relatively well. We gave an LLM the task of using data contracts we had already written as examples to create data contracts for new schemas. The generated contracts had flaws and had to be examined in detail, especially to verify:

  • whether the information about connections, data types, and data quality metrics was correct, and
  • whether the generated contract satisfied the Open Data Contract Standard (ODCS), which we used.

All the mistakes were corrected manually, which once again got tiring and led us to ask ourselves: “Can an LLM write a data contract and validate itself afterwards?”

Because the semantics of a data contract, such as data quality rules or SLAs, come from business needs defined by people who understand them, we decided to focus on the ODCS syntax.

To get an answer to that question we listed the possible paths we could take:

  1. Write careful instructions on how to validate the contract.
  2. Give an example of a data contract by which the LLM would validate new contracts.
  3. Give the LLM the documentation on ODCS and tell it to use that for self-validation.

The important thing was that we did not want the validation process to be left to probability: it must be deterministic, so that every syntax error is caught and corrected. Luckily for us, the ODCS JSON Schema files were added to the JSON Schema Store in mid-2024, which means a contract’s syntax can be validated automatically in code editors such as IntelliJ or VS Code.
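
To make “deterministic” concrete: a schema check returns the same verdict for the same document every time. The sketch below is a minimal, standard-library-only illustration. `MINI_SCHEMA` is a hand-written stand-in, not the real ODCS JSON Schema, and its field names and allowed values are assumptions chosen for illustration. In our setup the editor performs this check against the published schema.

```python
# Illustrative stand-in for a JSON Schema; NOT the real ODCS schema.
MINI_SCHEMA = {
    "required": ["apiVersion", "kind", "id", "version", "status"],
    "enums": {"kind": ["DataContract"]},
}


def validate(contract: dict, schema: dict = MINI_SCHEMA) -> list[str]:
    """Return a sorted list of problems; an empty list means the check passed."""
    problems = []
    for key in schema["required"]:
        if key not in contract:
            problems.append(f"missing required property '{key}'")
    for key, allowed in schema["enums"].items():
        if key in contract and contract[key] not in allowed:
            problems.append(f"'{key}' must be one of {allowed}, got {contract[key]!r}")
    return sorted(problems)


# A hypothetical draft with two missing fields and one invalid value.
draft = {"apiVersion": "v3.0.0", "kind": "Contract", "id": "orders-001"}
for problem in validate(draft):
    print(problem)
```

Running the check twice on the same draft always yields the same list, which is exactly the property a probabilistic self-review by the LLM cannot give us.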

So, the following idea came to mind:

Let’s give an LLM the task of generating a new data contract from a data contract specification and the ODCS documentation. While generating, the LLM should ask for any required information that is missing from the specification. The LLM must then validate the generated contract using the VS Code diagnostics MCP tool, through which it retrieves the syntax errors reported by the editor. A human checks that the information itself is correct.
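
The control flow of this idea can be sketched as a small loop. This is only an illustration under assumptions: `generate`, `diagnose`, and `repair` are hypothetical callables standing in for the LLM and the editor diagnostics, and the `max_rounds` cap is our addition to keep the loop from running forever.

```python
from typing import Callable


def generate_until_valid(
    generate: Callable[[], str],
    diagnose: Callable[[str], list[str]],
    repair: Callable[[str, list[str]], str],
    max_rounds: int = 3,
) -> tuple[str, list[str]]:
    """Generate a contract, then alternate diagnose/repair until no errors remain."""
    contract = generate()
    errors = diagnose(contract)
    rounds = 0
    while errors and rounds < max_rounds:
        contract = repair(contract, errors)
        errors = diagnose(contract)
        rounds += 1
    return contract, errors


# Toy stand-ins so the sketch runs without an LLM or an editor.
def fake_generate() -> str:
    return "kind: DataContract\nstatus: active\n"


def fake_diagnose(text: str) -> list[str]:
    return [] if "apiVersion:" in text else ["missing required property 'apiVersion'"]


def fake_repair(text: str, errors: list[str]) -> str:
    return "apiVersion: v3.0.0\n" + text


contract, remaining = generate_until_valid(fake_generate, fake_diagnose, fake_repair)
print(remaining or "no syntax errors left; values still need human review")
```

The loop only ever clears syntax errors. Whether the values in the contract are right is deliberately left outside it, for a human to check.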

We went with it, and it worked for our simple PoC example. How exactly did we do it?

Implementation

We used an LLM to generate a sample data contract specification from sample notes.

After that we put ODCS documentation and the data contract specification into our project root.

For editing we used VS Code as the code editor and Claude Code in the terminal. Claude was connected to our internal company LLM, called Jarvis. We installed the diagnostics MCP from the VS Code Marketplace and added it to Claude Code. After that we gave Claude this prompt:

In the formal_contracts folder there is a data contract specification I have with a customer about the data I am exposing to them to query and use. I need to create a data contract for it so I can automatize the process of checking the data quality, payment, etc. for the contract I have. The ODCS_DOC folder contains a documentation on data contracts- what a data contract consists of and how to build it. Use the data contract documentation to create a new data contract document in the data-contracts folder called my_contract.odcs.yaml. Make sure to put all the required information into the data contract. If there is some information you don’t have but it is not required just do not put it into the data contract. If there is some information you don’t have but it is required ask me for it. If you are not sure how to populate some field ask me for additional information.

Claude started cooking and soon enough asked questions:

The questions always came with suggested answers, which is a neat feature: it lets you go through them quickly and makes the whole process more user-friendly, even for business users.

Once it was done, there were some errors in syntax, e.g.:

So we gave it this prompt:

Validate the contract using diagnostics. If there are any errors fix them. When fixing the errors use diagnostics output and data contract documentation in the ODCS_DOC folder to find out what is wrong and fix it. Do not use any other resources.

Claude used the diagnostics MCP, found the errors, and fixed them!

The final data contract looked like this:

Final notes

We repeated the whole process multiple times and arrived at the following insights:

  1. The generation process is probabilistic, so the same prompt did not always succeed equally well.
    The best way to mitigate that is to put more detail into the prompt. Two problems are worth mentioning. The first is the generation of quality rules. In most cases, the LLM used SQL-type quality rules without being instructed to, probably because it concluded that we needed them since we were working with a PostgreSQL database. In other cases, we got quality rule types we did not need, so it was useful to add a guideline on this to the prompt.
    The second problem was the LLM using other data contracts in the repository as examples and adding data from them to the new contract. We resolved this by adding a guideline that forbids using any resources other than those mentioned in the prompt.
  2. Generation and validation prompts can be combined into one, but the process works better and is cleaner when they are kept separate.
  3. Combining LLMs with a linter is a good step forward in automating data contract writing.
    Furthermore, with a good prompt and well-written documentation, you can count on getting a syntactically correct data contract. You only need to check the values.
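
The first two insights can be combined in a small helper that keeps the generation and validation prompts separate while attaching the same guidelines to both. This is a sketch only: the guideline wording and the shortened task texts below are hypothetical and not the exact prompts we used.

```python
# Hypothetical guideline wording, based on the two problems described above.
GUARDRAILS = [
    "Use only the resources named in this prompt; do not read other contracts in the repository.",
    "Write quality rules of type SQL only.",
]


def build_prompt(task: str, guardrails: list[str] = GUARDRAILS) -> str:
    """Append the same numbered guidelines to any task description."""
    rules = "\n".join(f"{i}. {rule}" for i, rule in enumerate(guardrails, start=1))
    return f"{task}\n\nGuidelines:\n{rules}"


# Two separate prompts, sent one after the other rather than glued together.
generation_prompt = build_prompt(
    "Create my_contract.odcs.yaml from the specification in the formal_contracts folder."
)
validation_prompt = build_prompt(
    "Validate the contract using diagnostics and fix any errors."
)
print(generation_prompt)
```

Keeping the guidelines in one place means a newly discovered failure mode only has to be written down once to apply to both steps.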

We believe this is just a first small step and a lot more can be done. Possible next steps:

  • Build software that uses the Schema Store outside an IDE to perform this automatic validation, making it more accessible to business users.
  • Add an automated process that ties generation and validation together by also performing the quality checks on the data.
  • Use a data catalog such as Actian or Collibra to define the data products, which an LLM then reads to populate a data contract. This approach moves away from the loose textual format, but it is a possible source of information and may be the first choice for someone who already works with a catalog.
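
As a rough illustration of the second idea, running a quality check on the data itself, the sketch below executes one SQL-type rule and compares the result with an expected value. Everything in it is an assumption for illustration: the `orders` table, the rule, and its `query`/`mustBe` keys are loosely modeled on SQL quality rules and should be checked against the ODCS documentation, and an in-memory SQLite database stands in for the PostgreSQL database we actually used.

```python
import sqlite3

# Hypothetical quality rule, shaped like an SQL-type rule in a contract.
rule = {
    "type": "sql",
    "query": "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL",
    "mustBe": 0,
}


def run_sql_rule(conn: sqlite3.Connection, rule: dict) -> bool:
    """Run the rule's query and compare the single returned value to `mustBe`."""
    (value,) = conn.execute(rule["query"]).fetchone()
    return value == rule["mustBe"]


# In-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, None)])

print("rule passed" if run_sql_rule(conn, rule) else "rule failed")
```

The rule and its threshold still have to come from someone who knows the business. The code only automates the checking.
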

DISCLAIMER: Human in the loop

LLMs can work out ideas in detail from even scarce inputs. Even so, it would be wrong to leave it to them to define business requirements such as SLAs or data quality. These need to come from a person who understands the business processes and needs and can see the value of enforcing the data contract.

Conclusion

Our example shows that LLMs can significantly streamline data contract creation when paired with the right safeguards. By combining careful prompting, ODCS documentation, and automated syntax validation through VS Code diagnostics, we transformed a tedious manual task into a reliable semi-automated workflow. But here’s what we’re really curious about:

What has your experience been like? Have you tried using AI to automate any part of your data governance or data quality processes? What challenges have you faced?

We’re always looking to learn from others walking the same path. If you’re tackling similar challenges or just getting started with data contracts, we’d love to hear from you. Whether you want to share your experiences, exchange ideas, or explore how we might work together on your data quality and governance needs, feel free to reach out.

After all, the best solutions often come from collaboration.
