
Protocol Buffers in Python (with Kafka)

Mar 7th, 2023

Anthony Morast

Sterling Trading Tech C++ Software Engineer


Protocol Buffers (or protobufs for short), developed by Google, are a mechanism for defining structured messages in the .proto format that can be serialized and passed between processes and languages. From the project homepage,

Protocol Buffers are a language-neutral, platform-neutral extensible mechanism for serializing structured data.

Put simply, protobufs let you define a message once, compile it into a language of choice, and use it for interprocess communication. The messages need not be used for interprocess communication, however; they are handy tools for standalone applications as well.

Kafka

Apache Kafka is open-source software for distributed event streaming with a focus on high-throughput, high-performance applications. Kafka is a very powerful tool, and a full introduction is beyond the purview of this article. Here, we need only understand that Kafka is used for passing messages between applications: it runs as a standalone service that maintains queues, referred to as topics.

One simple way to think about Kafka is as a Python dictionary that maps topic names (strings) to queues of messages.
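As a rough sketch of that analogy (an in-memory toy, not real Kafka):

```python
# A toy, in-memory analogy for Kafka: topic names map to ordered lists
# of messages. Real Kafka adds persistence, partitioning, consumer
# offsets, and replication on top of this basic idea.
topics = {}

def produce(topic, message):
    """Append a message to the end of a topic's queue."""
    topics.setdefault(topic, []).append(message)

def consume(topic, offset=0):
    """Read every message on the topic, starting at the given offset."""
    return topics.get(topic, [])[offset:]

produce("trades", "AAPL bought @ 150.25")
produce("trades", "MSFT sold @ 310.00")
print(consume("trades"))            # the whole queue
print(consume("trades", offset=1))  # only messages after the first
```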

A Kafka producer will append a message (of any type) to one or many queues. A Kafka consumer will then subscribe to these topics and parse incoming messages. The consumer can parse only new messages, the entire queue which is stored on disk, or any subset of messages in the queue depending on the configuration.

Obviously, Kafka has much more functionality than this, but for the uninitiated this basic understanding is enough to follow how Kafka is used for interprocess communication later in this post.

Protobufs in Python

For the sake of example, we will consider a Python service that publishes stock trades and stock quotes to a Kafka topic. A separate service will subscribe to these messages and consume them from the Kafka topics.

First, the protobuf messages are defined in the .proto format. An extensive explanation of the message format can be found in the project’s documentation.
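The original listing is not reproduced here; a definition consistent with the five message types described below might look like the following (field names and numbering are illustrative):

```proto
// example.proto -- illustrative reconstruction of the five message types.
syntax = "proto3";

package example;

// Identifies which message the wrapper carries.
enum MessageType {
  UNKNOWN = 0;
  TRADE = 1;
  TRADE_LIST = 2;
  QUOTE = 3;
  QUOTE_LIST = 4;
}

// A single stock trade.
message Trade {
  string symbol = 1;       // ticker symbol, e.g. "AAPL"
  string timeInForce = 2;  // e.g. "GTC", "DAY"
  double price = 3;        // price at which the trade was placed
}

// A list of trades; 'repeated' becomes a list in Python, a vector in C++.
message TradeList {
  repeated Trade trades = 1;
}

// A stock quote: symbol plus bid and ask prices.
message Quote {
  string symbol = 1;
  double bid = 2;
  double ask = 3;
}

message QuoteList {
  repeated Quote quotes = 1;
}

// Wraps any one of the messages above for sending over the wire.
message MessageWrapper {
  MessageType msgType = 1;
  oneof payload {
    Trade trade = 2;
    TradeList tradeList = 3;
    Quote quote = 4;
    QuoteList quoteList = 5;
  }
}
```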

There are 5 message types created, i.e. 5 different protobuf messages.

  1. Trade — a message that contains information about a stock trade. In particular, the stock symbol that was traded, the time-in-force (Good-til-Cancelled, Day, etc.), and the price at which the trade was placed.
  2. TradeList — Simply a list of Trade messages defined above using protobuf’s repeated keyword which creates, for example, a list in Python or a vector in C++.
  3. Quote — A stock quote that contains the stock symbol, the bid price, and the ask price.
  4. QuoteList — A list of Quote objects.
  5. MessageWrapper — A wrapper used to send any of the above messages over the wire. It contains a msgType field, which references the MessageType enum and is used to determine which message was received, and a payload defined with protobuf’s oneof keyword. The oneof keyword is followed by a list of fields, only one of which can be set in the wrapper at a time.

With the message types defined in a file called example.proto, we can use the protobuf compiler, protoc, to generate message classes in our language of choice. protoc can be installed by following the instructions in the project’s documentation. To generate Python classes for these messages, the following command is used.
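Assuming protoc is on the PATH and example.proto is in the current directory, the invocation for Python is:

```shell
# Generate Python message classes (example_pb2.py) next to example.proto.
protoc --python_out=. example.proto
```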

Specifying the output directory for your language of choice generates code for that language; among the available command line options are --cpp_out, --java_out, --python_out, --csharp_out, --ruby_out, --objc_out, and --php_out.

One or many of these options are followed by a list of .proto files; in our case there is only one, example.proto. The files generated by protoc are difficult to read, but coding examples abound on the use of these objects in different languages, and the protobuf documentation has example usage for each.

For example, in the Python code below the protobuf messages are used to create and print lists of Trades and Quotes.
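A sketch of such a script, assuming the generated module is named example_pb2 (protoc's default for example.proto) and illustrative field names (symbol, bid, ask on Quote; symbol, timeInForce, price on Trade):

```python
# Sketch only: requires the example_pb2 module generated by protoc
# via: protoc --python_out=. example.proto
import random

import example_pb2

def make_quote_list(n):
    """Build a QuoteList containing n randomly generated quotes."""
    quote_list = example_pb2.QuoteList()
    for _ in range(n):
        quote = quote_list.quotes.add()  # 'repeated' fields grow via add()
        quote.symbol = random.choice(["AAPL", "MSFT", "GOOG"])
        quote.bid = round(random.uniform(100, 200), 2)
        quote.ask = round(quote.bid + random.uniform(0.01, 0.50), 2)
    return quote_list

def make_trade_list(n):
    """Build a TradeList containing n randomly generated trades."""
    trade_list = example_pb2.TradeList()
    for _ in range(n):
        trade = trade_list.trades.add()
        trade.symbol = random.choice(["AAPL", "MSFT", "GOOG"])
        trade.timeInForce = random.choice(["GTC", "DAY"])
        trade.price = round(random.uniform(100, 200), 2)
    return trade_list

if __name__ == "__main__":
    # Protobuf messages print as a readable text representation.
    print(make_quote_list(10))
    print(make_trade_list(5))
```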

As seen here, using these messages is very straightforward, and the same can be said for other languages. My own experience is limited to using protobufs for communication between C++ and Python, and the messages are as easy to use in C++ as they are in Python.

Interprocess Communication With Protobufs

To demonstrate interprocess communication with protobufs via Kafka, a Python Kafka consumer and producer will be created; note that I already have a Kafka server running locally. Two methods are needed: one to send messages via Kafka and one to consume them.
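A minimal sketch of those two methods, using the kafka-python package (the original post does not name its client library) and assuming a broker on localhost:9092:

```python
# Sketch: requires kafka-python (pip install kafka-python) and a Kafka
# broker listening on localhost:9092. Both methods use the trades topic.
from kafka import KafkaConsumer, KafkaProducer

TOPIC = "trades"
BROKER = "localhost:9092"

def produce(msg):
    """Publish raw serialized bytes to the trades topic."""
    producer = KafkaProducer(bootstrap_servers=BROKER)
    producer.send(TOPIC, msg)
    producer.flush()  # block until the message is actually delivered

def consume(callback):
    """Invoke callback(raw_bytes) for every message on the trades topic."""
    consumer = KafkaConsumer(TOPIC, bootstrap_servers=BROKER,
                             auto_offset_reset="earliest")
    for record in consumer:      # blocks, yielding messages as they arrive
        callback(record.value)   # record.value holds the raw message bytes
```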

These methods are defined in the Python logic above. As described in the comments, the consumer and producer will use the trades topic.

With the producer method the example code above can be extended to include methods that publish TradeLists and QuoteLists by passing them into the produce(msg) method.
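A sketch of those two additions, assuming the generated example_pb2 module with the field and enum names used earlier, and a produce(msg) helper that sends raw bytes:

```python
# Sketch: wraps a QuoteList or TradeList in the MessageWrapper and sends
# the serialized bytes via produce(msg). Field/enum names are illustrative.
import example_pb2

def SendQuoteListViaKafka(quote_list):
    """Wrap a QuoteList in a MessageWrapper and publish it."""
    wrapper = example_pb2.MessageWrapper()
    wrapper.msgType = example_pb2.QUOTE_LIST
    wrapper.quoteList.CopyFrom(quote_list)  # sets the oneof payload
    produce(wrapper.SerializeToString())    # serialize to raw bytes

def SendTradeListViaKafka(trade_list):
    """Wrap a TradeList in a MessageWrapper and publish it."""
    wrapper = example_pb2.MessageWrapper()
    wrapper.msgType = example_pb2.TRADE_LIST
    wrapper.tradeList.CopyFrom(trade_list)
    produce(wrapper.SerializeToString())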

The code above is similar to that in the “Protobufs in Python” section. Two methods were added:

  1. SendQuoteListViaKafka — sends a list of quotes (QuoteList) message wrapped in the protobuf message wrapper via Kafka.
  2. SendTradeListViaKafka — sends a list of trades (TradeList) message wrapped in the protobuf message wrapper via Kafka.

On the receiving end, a Python script runs forever consuming messages as they arrive on the “trades” topic.
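A sketch of the consumer side, assuming the example_pb2 module and a consume(callback) helper that hands raw bytes to a callback:

```python
# Sketch: parse raw bytes back into a MessageWrapper, dispatch on
# msgType, and print the payload. Names are illustrative.
import example_pb2

def processMsg(raw_bytes):
    """Rebuild the wrapper from bytes and print whichever payload it holds."""
    wrapper = example_pb2.MessageWrapper()
    wrapper.ParseFromString(raw_bytes)
    if wrapper.msgType == example_pb2.QUOTE_LIST:
        print("Received quotes:\n", wrapper.quoteList)
    elif wrapper.msgType == example_pb2.TRADE_LIST:
        print("Received trades:\n", wrapper.tradeList)

if __name__ == "__main__":
    consume(processMsg)  # runs forever, consuming messages as they arrive
```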

The logic above passes the processMsg() method as the callback function to the Kafka consumer. When a message is received and consumed, the raw bytes of the message are passed to this method and used to create a MessageWrapper object. With this object, the message type (QuoteList or TradeList in this case) can be determined and the payload accessed from the wrapper. In this code, nothing is done with the messages except printing their contents. Running the two scripts allows us to see the messages coming in via Kafka.

Shown here are the 10 Quotes and 5 Trades that were randomly generated by the TradeProducer script and received and printed by the TradeConsumer. That is, two separate processes are using the same Kafka topic to communicate via protobuf messages.

Conclusion

This post demonstrated the use of Google’s Protocol Buffers (protobufs) and Apache Kafka for interprocess communication. What was not shown was the ability to produce messages in one language (e.g. Python, C++, or Java) and consume them in a completely different language, since protobuf messages are language-agnostic. This allows for very complex, high-performance applications. Another powerful feature of Kafka, also not shown here, is that multiple producers and consumers can use the same topics, allowing a complex web of applications to communicate with one another.

By: Anthony Morast, Sterling Trading Tech C++ Software Engineer

"I am a professional software engineer and an amateur mathematician. My main interests are programming, machine learning, fluid dynamics, and a few others."
