Which Problems With Schemas Did We Have Using Apicurio Registry?

02. 01. 2023

Overview

The fourth Schema Registry blog is out! Read more about performance and functional testing!

Schema Registration

There are multiple ways to register a schema in Apicurio Registry: through its UI, by sending POST requests with some tool, by using existing plugins, or by configuring the project to auto-register the schemas it uses. Registering through the UI or via POST requests wasn't suitable for us because it did not fit our architecture. There was an additional problem: the schemas could not contain any whitespace if the methods of the Serde library we used were to recognize them. Auto-registering schemas wasn't a long-term option either, because it didn't provide a uniform way to register schemas from multiple projects. It did serve as a short-term solution, though, until we figured out how to register schemas properly in another way.
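For illustration, registering a schema with a plain POST request could look roughly like the sketch below. It only builds the request with the JDK HTTP client and does not send it. The base URL, the /groups/{groupId}/artifacts path, and the X-Registry-* headers are assumptions based on the Apicurio Registry 2.x core REST API, and the class name RegisterSchemaRequest is hypothetical; none of these details are confirmed by this post.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds (but does not send) a request that registers an Avro schema.
final class RegisterSchemaRequest {

    private RegisterSchemaRequest() {
    }

    // registryBaseUrl is assumed to look like http://localhost:8080/apis/registry/v2
    static HttpRequest build(String registryBaseUrl, String groupId, String artifactId, String schemaJson) {
        return HttpRequest.newBuilder()
                .uri(URI.create(registryBaseUrl + "/groups/" + groupId + "/artifacts"))
                .header("Content-Type", "application/json")
                // Assumed Registry 2.x headers carrying the artifact id and type.
                .header("X-Registry-ArtifactId", artifactId)
                .header("X-Registry-ArtifactType", "AVRO")
                .POST(HttpRequest.BodyPublishers.ofString(schemaJson))
                .build();
    }
}
```

The request could then be sent with java.net.http.HttpClient; the schema body would have to be whitespace-free if the limitation described above applies.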

The plugin fitted our needs best, since it let us register many schemas in a short period of time. We had a choice between a Maven and a Gradle plugin, and we opted for Maven because Apicurio provides an official implementation of it. We had to create a separate Maven project whose only purpose was to register schemas when executed. Using the plugin wasn't without problems, though. When an Avro schema is generated as a Java object, we had to define how the Avro string type would be represented: either as CharSequence or as String. This is set in the configuration in the pom.xml file (CharSequence in our case). We used CharSequence because, with String, the Avro Maven plugin adds Java-specific properties such as "avro.java.string": "String" to the generated code, which may prevent schema evolution. This is a problem when using Apicurio; when using Confluent Serdes, it can be overridden by setting avro.remove.java.properties=true in the Avro serializer configuration.
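As a minimal sketch of the Confluent-side setting mentioned above, the serializer configuration could be assembled like this. The property avro.remove.java.properties comes from the text; the other two keys are standard Kafka and Confluent settings, while the class name and the idea of passing the registry URL as a parameter are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that collects producer settings for Confluent's Avro serializer.
final class ConfluentAvroSerializerSettings {

    private ConfluentAvroSerializerSettings() {
    }

    static Map<String, Object> build(String schemaRegistryUrl) {
        Map<String, Object> config = new HashMap<>();
        config.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        config.put("schema.registry.url", schemaRegistryUrl);
        // Strips Java-specific properties such as "avro.java.string" from the schema.
        config.put("avro.remove.java.properties", true);
        return config;
    }
}
```

The resulting map would be merged into the usual producer properties before the producer is created.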

Also, in the configuration it was necessary to import the Avro schemas that are referenced in other files first, so that they are recognized in the schemas that reference them. After that, artifacts with references could be registered. There were few examples on the Web of exactly how to write references so that schemas register correctly. We used syntax like this:

<reference>
    <name>test.ErrorV1</name>
    <groupId>default</groupId>
    <artifactId>test.ErrorV1</artifactId>
    <version>${apicurio.artifacts.version}</version>
    <type>AVRO</type>
    <file>${project.basedir}/src/main/avro/common/v1/ErrorV1.avsc</file>
    <ifExists>RETURN_OR_UPDATE</ifExists>
</reference>

In this example we used the full schema name for both the name and artifactId tags, although it is not necessary for the name tag. The groupId is optional, so we left its value as default. Unlike with Confluent, schema versions can be decimal numbers. We also had to define the reference type, the file in which the referenced schema is written, and what should be done with the reference if it already exists.
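The requirement that referenced schemas come before the schemas that use them amounts to a dependency ordering. The following is a rough sketch of how such an order could be computed from a map of schema name to referenced schema names. It is not part of the Maven plugin; the class RegistrationOrder and the schema test.AccountV1 are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: orders schemas so that every referenced schema precedes its users.
final class RegistrationOrder {

    private RegistrationOrder() {
    }

    static List<String> order(Map<String, List<String>> references) {
        Set<String> done = new LinkedHashSet<>();
        Set<String> inProgress = new HashSet<>();
        // Sorted iteration keeps the result deterministic.
        for (String schema : new TreeSet<>(references.keySet())) {
            visit(schema, references, done, inProgress);
        }
        return new ArrayList<>(done);
    }

    private static void visit(String schema, Map<String, List<String>> references,
                              Set<String> done, Set<String> inProgress) {
        if (done.contains(schema)) {
            return;
        }
        if (!inProgress.add(schema)) {
            throw new IllegalArgumentException("Circular reference involving " + schema);
        }
        // Depth-first: place all referenced schemas before the schema itself.
        for (String referenced : references.getOrDefault(schema, List.of())) {
            visit(referenced, references, done, inProgress);
        }
        inProgress.remove(schema);
        done.add(schema);
    }
}
```

For a map in which test.AccountV1 references test.ErrorV1, the result lists test.ErrorV1 first, which is the order in which the artifacts would need to appear in the configuration.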

Using Apicurio Registry in combination with Apicurio's Maven plugin and the Confluent Serdes library requires schemas to be formatted and registered without any whitespace or newlines for them to be validated successfully. When Confluent's Maven plugin is used instead of Apicurio's in this combination, schemas can be in their normal format.
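One way to produce such whitespace-free schema files is to strip whitespace outside of string literals before registration. The sketch below does this with plain JDK code; the class name SchemaMinifier is hypothetical, and a JSON library could do the same job. It assumes the input is valid JSON.

```java
// Hypothetical helper: removes whitespace and newlines from a JSON (Avro) schema.
final class SchemaMinifier {

    private SchemaMinifier() {
    }

    static String minify(String json) {
        StringBuilder out = new StringBuilder(json.length());
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                // Inside a string literal everything is copied, including spaces.
                out.append(c);
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
                out.append(c);
            } else if (!Character.isWhitespace(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String pretty = "{\n  \"type\": \"record\",\n  \"name\": \"ErrorV1\",\n  \"fields\": []\n}";
        // prints {"type":"record","name":"ErrorV1","fields":[]}
        System.out.println(minify(pretty));
    }
}
```

Whitespace inside string values, such as doc texts, is preserved, so the schema content itself does not change.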

We used suffixes such as V1 on schema names to make schema versioning easier to understand: the suffix identifies the schema's major version and shows whether there have been breaking changes since the previous version, that is, changes that make versions of the same schema incompatible. For example, when we add a new optional field to a schema, the new schema is still compatible with the old one, so there is no need for a new suffix. If, on the other hand, we add a new required field, the schemas become incompatible, and we indicate that in the schema's name by changing its suffix.
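The naming rule can be expressed as a small, deliberately simplified sketch. It models a schema only as a map from field name to "has a default value" and treats a change as compatible when every added field has a default. Real Avro compatibility checks cover more cases, such as type changes. The class and method names are hypothetical, and schema names are assumed to end in V followed by a number.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the suffix convention; not a full Avro compatibility check.
final class SchemaVersionNaming {

    private static final Pattern VERSIONED_NAME = Pattern.compile("^(.*V)(\\d+)$");

    private SchemaVersionNaming() {
    }

    // Each map goes from field name to whether the field has a default value.
    static boolean canReadOldData(Map<String, Boolean> oldFields, Map<String, Boolean> newFields) {
        for (Map.Entry<String, Boolean> field : newFields.entrySet()) {
            boolean added = !oldFields.containsKey(field.getKey());
            if (added && !field.getValue()) {
                return false; // a new required field breaks compatibility
            }
        }
        return true;
    }

    // Keeps the name for compatible changes, bumps the suffix otherwise.
    static String nextName(String currentName, boolean compatible) {
        if (compatible) {
            return currentName;
        }
        Matcher m = VERSIONED_NAME.matcher(currentName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Expected a name ending in V<number>: " + currentName);
        }
        return m.group(1) + (Integer.parseInt(m.group(2)) + 1);
    }
}
```

With this sketch, adding an optional field to ErrorV1 keeps the name ErrorV1, while adding a required field yields ErrorV2.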

Schema Resolving – References

We faced some problems resolving schemas that contained references while using Apicurio Registry. The problems showed up once we decided to replace the Apicurio SerDe library with Confluent's counterpart; before that, everything worked just fine. How schema resolving works in Apicurio SerDe is shown further in the blog. Apicurio's Confluent-compatible API, which should provide compatibility with Confluent Serde, does not support schemas with references, so the only workaround was not to use references in schemas at all, and that's what we did.

Processor Performance Testing

We tried to performance test our processors with Apache JMeter, but we didn't manage to get the tests working with JMeter and its Kafka load generator plugin, KLoadGen. There are other JMeter plugins, such as Pepper-Box and Kafkameter, but unlike KLoadGen they do not support the Avro format. At the time, we were using Apicurio Registry with Apicurio's Serdes, and the problem was that KLoadGen did not support Apicurio's Serdes, so we couldn't send messages to the desired topics. Our alternative was to write unit tests that send a large number of events to the input topics to simulate the processors' behavior under heavy load.

Conclusion

We stumbled upon some problems using Apicurio Registry, which is no surprise considering that it is still a relatively immature technology, but we managed to overcome all the problems we encountered.

We mentioned performance and functional testing in this blog, and in the next one we will talk more about unit testing.

Schema registry blog series (4 of 6)