Protobuf - Integration with Kafka



We have covered quite a lot of examples of Protobuf and its data types. In this chapter, let us take another example and see how Protobuf integrates with a Schema Registry used by Kafka. Let us first understand what a "schema registry" is.

Schema Registry

Kafka is one of the most widely used messaging queues. It is used to apply the publisher-subscriber model at scale. More information about Kafka can be found here − https://www.tutorialspoint.com/apache_kafka/index.htm

However, at a basic level, a Kafka producer sends a message, i.e., a piece of information, which a Kafka consumer can read. This sending and consuming of messages is where we need a schema. It is especially required in large-scale organizations where multiple teams read from and write to Kafka topics. Kafka provides a way to store this schema in a schema registry; the schema is then registered/fetched when the producer/consumer produces/consumes a message.

There are two major benefits of maintaining a schema −

  • Compatibility − In larger organizations, the team producing a message must not break the downstream tools that consume it. The schema registry ensures that schema changes remain backward compatible.

  • Efficient encoding − Sending a field name and its type with every message is space- and compute-inefficient. With a schema in place, we do not need to send this metadata with each message.
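To see what "efficient encoding" means on the wire: Confluent serializers do not embed the schema in each record. Instead, they prepend a small header − a magic byte (0) followed by the 4-byte, big-endian ID of the schema in the registry − before the serialized payload (the Protobuf serializer additionally writes a short list of message indexes after the ID, omitted here). Below is a minimal sketch of that framing; the class and method names are illustrative and not part of any Confluent API −

```java
import java.nio.ByteBuffer;

// Sketch of the Confluent wire-format framing: magic byte + schema id + payload.
// Class and method names are illustrative, not part of any Confluent API.
public class WireFormatFraming {
   static byte[] frame(int schemaId, byte[] protobufPayload) {
      ByteBuffer buf = ByteBuffer.allocate(5 + protobufPayload.length);
      buf.put((byte) 0);          // magic byte: identifies the framing version
      buf.putInt(schemaId);       // 4-byte big-endian registry schema id
      buf.put(protobufPayload);   // serialized Protobuf bytes follow
      return buf.array();
   }

   static int schemaIdOf(byte[] framed) {
      ByteBuffer buf = ByteBuffer.wrap(framed);
      if (buf.get() != 0) {
         throw new IllegalArgumentException("Unknown magic byte");
      }
      return buf.getInt();        // a consumer uses this id to fetch the schema
   }

   public static void main(String[] args) {
      byte[] framed = frame(1, new byte[] { 0x0A, 0x03 });
      System.out.println("schema id = " + schemaIdOf(framed));
   }
}
```

Because only the 4-byte ID travels with each record, the full schema needs to be fetched (and cached) from the registry just once per schema, not once per message.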

The schema registry supports Avro, Google Protobuf and JSON Schema as schema languages. Schemas in these languages can be stored in the schema registry. For this tutorial, we need a working Kafka setup and a Schema Registry setup.

For the installation of Kafka, refer to the official Apache Kafka documentation.

Once you have Kafka installed, you can then setup the Schema Registry by updating the /etc/schema-registry/schema-registry.properties file.

# where the schema registry should listen
listeners=http://0.0.0.0:8081

# Schema Registry uses Kafka underneath, so we need to tell it where the Kafka brokers are
kafkastore.bootstrap.servers=PLAINTEXT://hostname:9092,SSL://hostname2:9092

Once done, you can run −

sudo systemctl start confluent-schema-registry

With the setup out of the way, let us start using Google Protobuf along with the Schema Registry.

Kafka Producer with Protobuf Schema

Let us continue with our theater example. We will use the following Protobuf schema −

syntax = "proto3";
package theater;
option java_package = "com.tutorialspoint.theater";

message Theater {
   string name = 1;
   string address = 2;
  
   int32 total_capacity = 3;
   int64 mobile = 4;
   float base_ticket_price = 5;
  
   bool drive_in = 6;
  
   enum PAYMENT_SYSTEM {
      CASH = 0;
      CREDIT_CARD = 1;
      DEBIT_CARD = 2;
      APP = 3;
   }
   PAYMENT_SYSTEM payment = 7;
   repeated string snacks = 8;
   
   map<string, int32> movieTicketPrice = 9;
}
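Looking ahead to the compatibility guarantee discussed above: a backward-compatible evolution of this schema adds new fields under unused tag numbers and never reuses or renumbers existing ones. For example, a hypothetical later version (not used in this tutorial) could add −

```protobuf
message Theater {
   // ... fields 1 to 9 exactly as above ...

   // New field under a fresh tag number; older consumers simply ignore it.
   string website = 10;
}
```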

Now, let us create a simple Kafka writer which would write the message encoded in this format to the Kafka topic. But for doing that, first, we need to add a few dependencies to our Maven POM −

  • Kafka Client to use Kafka producer and consumer

  • Kafka Protobuf serializer to serialize and deserialize the message

  • Slf4j simple to ensure we get logs from Kafka

<dependency>
   <groupId>org.apache.kafka</groupId>
   <artifactId>kafka-clients</artifactId>
   <version>2.5.0</version>
</dependency>

<!-- https://mvnrepository.com/artifact/io.confluent/kafka-protobuf-serializer -->
<dependency>
   <groupId>io.confluent</groupId>
   <artifactId>kafka-protobuf-serializer</artifactId>
   <version>5.5.1</version>
</dependency>

<!-- https://mvnrepository.com/artifact/org.slf4j/slf4j-simple -->
<dependency>
   <groupId>org.slf4j</groupId>
   <artifactId>slf4j-simple</artifactId>
   <version>1.7.30</version>
</dependency>

Once this is done, let us now create a Kafka producer. This producer will create and send a message which will contain the theater object.

package com.tutorialspoint.kafka;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import com.tutorialspoint.theater.TheaterOuterClass.Theater;
import com.tutorialspoint.theater.TheaterOuterClass.Theater.PAYMENT_SYSTEM;

public class KafkaProtbufProducer {
   public static void main(String[] args) throws Exception{
      String topicName = "testy1";
      Properties props = new Properties();
      props.put("bootstrap.servers", "localhost:9092");
      props.put("client.id", "foo");
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
      props.put("value.serializer", "io.confluent.kafka.serializers.protobuf.KafkaProtobufSerializer");
      props.put("schema.registry.url", "http://localhost:8081");
      props.put("auto.register.schemas", "true");
      
      Producer<String, Theater> producer = new KafkaProducer<>(props);
      producer.send(new ProducerRecord<String, Theater>(topicName, "SilverScreen", getTheater())).get();
      System.out.println("Sent to Kafka: \n" + getTheater());
      producer.flush();
      producer.close();
   }
   public static Theater getTheater() {
      List<String> snacks = new ArrayList<>();
      snacks.add("Popcorn");
      snacks.add("Coke");
      snacks.add("Chips");
      snacks.add("Soda");
           
      Map<String, Integer> ticketPrice = new HashMap<>();
      ticketPrice.put("Avengers Endgame", 700);
      ticketPrice.put("Captain America", 200);
      ticketPrice.put("Wonder Woman 1984", 400);
                  
      Theater theater = Theater.newBuilder()
         .setName("Silver Screener")
         .setAddress("212, Maple Street, LA, California")
         .setDriveIn(true)
         .setTotalCapacity(320)
         .setMobile(98234567189L)
         .setBaseTicketPrice(22.45f)
         .setPayment(PAYMENT_SYSTEM.CREDIT_CARD)
         .putAllMovieTicketPrice(ticketPrice)
         .addAllSnacks(snacks)
         .build();
      return theater;
   }
}

Here is a list of a few points that we need to be aware of −

  • We need to pass the Schema Registry URL to the Producer.

  • We also need to pass the correct Protobuf Serializer which is specific to the Schema Registry.

  • The Schema Registry automatically registers the schema of the theater object the first time we send a message, because auto.register.schemas is set to true.

  • Lastly, we create a theater object from our auto-generated Java code, and that is what we send.

Let us now compile and execute the code −

mvn clean install ; java -cp .\target\protobuf-tutorial-1.0.jar com.tutorialspoint.kafka.KafkaProtbufProducer

We will get to see the following output −

[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version: 2.5.0
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId: 66563e712b0b9f84
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1621692205607
[kafka-producer-network-thread | producer-1] INFO org.apache.kafka.clients.Metadata - 
[Producer clientId=producer-1] Cluster ID: 7kwQVXjYSz--bE47MiXmjw

Sent to Kafka

name: "Silver Screener"
address: "212, Maple Street, LA, California"
total_capacity: 320
mobile: 98234567189
base_ticket_price: 22.45
drive_in: true
payment: CREDIT_CARD
snacks: "Popcorn"
snacks: "Coke"
snacks: "Chips"
snacks: "Soda"
movieTicketPrice {
   key: "Avengers Endgame"
   value: 700
}
movieTicketPrice {
   key: "Captain America"
   value: 200
}
movieTicketPrice {
   key: "Wonder Woman 1984"
   value: 400
}
[main] INFO org.apache.kafka.clients.producer.KafkaProducer - 
[Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.

It means that our message has been sent.

Now, let us confirm that the schema has been stored in the Schema Registry.

curl -X GET http://localhost:8081/subjects | jq

The subject name follows the "topicName" + "-key"/"-value" convention; since we used the Protobuf serializer only for the record value, we see −

[
   "testy1-value"
]
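The "-value" suffix comes from the Schema Registry's default subject naming strategy (TopicNameStrategy), which derives the subject from the topic name plus whether the schema describes the record key or the record value. A sketch of that rule, with hypothetical helper names −

```java
// Sketch of the default TopicNameStrategy subject naming.
// Class and method names are illustrative, not a Confluent API.
public class SubjectNames {
   // Value schemas are registered under "<topic>-value"
   static String valueSubject(String topic) {
      return topic + "-value";
   }

   // Key schemas (had we used a Protobuf key) would go under "<topic>-key"
   static String keySubject(String topic) {
      return topic + "-key";
   }

   public static void main(String[] args) {
      System.out.println(valueSubject("testy1")); // testy1-value
   }
}
```

This is why registering a Protobuf value schema on topic testy1 produced the subject testy1-value above.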

We can also see the schema which is stored by the registry −

curl -X GET http://localhost:8081/schemas/ids/1 | jq

{
   "schemaType": "PROTOBUF",
   "schema": "syntax = \"proto3\";\npackage theater;\n\noption java_package = \"com.tutorialspoint.theater\";\n\nmessage Theater {\n  string name = 1;\n  string address = 2;\n  int32 total_capacity = 3;\n  int64 mobile = 4;\n  float base_ticket_price = 5;\n  bool drive_in = 6;\n  .theater.Theater.PAYMENT_SYSTEM payment = 7;\n  repeated string snacks = 8;\n  repeated .theater.Theater.MovieTicketPriceEntry movieTicketPrice = 9;\n\n  message MovieTicketPriceEntry {\n    option map_entry = true;\n\n    string key = 1;\n    int32 value = 2;\n  }\n  enum PAYMENT_SYSTEM {\n    CASH = 0;\n    CREDIT_CARD = 1;\n    DEBIT_CARD = 2;\n    APP = 3;\n  }\n}\n"
}

Kafka Consumer with Protobuf Schema

Let us now create a Kafka consumer. This consumer will consume the message which contains the theater object.

package com.tutorialspoint.kafka;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import com.tutorialspoint.theater.TheaterOuterClass.Theater;

public class KafkaProtbufConsumer {
   public static void main(String[] args) {
      String topicName = "testy1";
      Properties props = new Properties();
      props.put("bootstrap.servers", "localhost:9092");
      props.put("group.id", "theater-consumer");
      props.put("auto.offset.reset", "earliest");
      props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
      props.put("value.deserializer", "io.confluent.kafka.serializers.protobuf.KafkaProtobufDeserializer");
      props.put("schema.registry.url", "http://localhost:8081");
      
      // tell the deserializer which auto-generated class the payload maps to
      props.put("specific.protobuf.value.type", Theater.class.getName());

      try (KafkaConsumer<String, Theater> consumer = new KafkaConsumer<>(props)) {
         consumer.subscribe(Collections.singletonList(topicName));
         while (true) {
            ConsumerRecords<String, Theater> records = consumer.poll(Duration.ofMillis(1000));
            for (ConsumerRecord<String, Theater> record : records) {
               System.out.println("offset = " + record.offset()
                  + ", key = " + record.key()
                  + ", value = " + record.value());
            }
         }
      }
   }
}

Here is a list of points that we need to be aware of −

  • We need to pass the Schema Registry URL to the Consumer.

  • We also need to pass the correct Protobuf Deserializer which is specific to the Schema Registry.

  • The Schema Registry is contacted to fetch the stored schema of the theater object when the message is deserialized.

  • Lastly, the deserializer hands us back an instance of the auto-generated Theater class, and that is what we print.

Let us now compile and execute the code −

mvn clean install ; java -cp .\target\protobuf-tutorial-1.0.jar com.tutorialspoint.kafka.KafkaProtbufConsumer

offset = 0, key = SilverScreen, value =

name: "Silver Screener"
address: "212, Maple Street, LA, California"
total_capacity: 320
mobile: 98234567189
base_ticket_price: 22.45
drive_in: true
payment: CREDIT_CARD
snacks: "Popcorn"
snacks: "Coke"
snacks: "Chips"
snacks: "Soda"
movieTicketPrice {
   key: "Captain America"
   value: 200
}
movieTicketPrice {
   key: "Wonder Woman 1984"
   value: 400
}
movieTicketPrice {
   key: "Avengers Endgame"
   value: 700
}

So, as we can see, the message written into Kafka was correctly consumed by the consumer. In addition, the registry stored the schema, which can also be accessed through its REST API.
