
Introduction
As we get closer to the Conduit 1.0 release, we have started conducting a series of benchmarks on our most popular connectors. We began with MongoDB and Kafka, and this time we were eager to run some tests using one of our built-in connectors.
More specifically, we wanted to put Conduit to the test head-to-head against Kafka Connect, moving data from Postgres to Kafka. Our goal was to see just how much performance we could squeeze out of Conduit while keeping resource usage reasonable.
The results were very promising. Conduit moved data faster than Kafka Connect in both CDC and snapshot operations, and did it while using dramatically less memory (in some cases, over 98% less). In this post, we’ll break down how we ran the tests, share the numbers, and show where Conduit really shines.
Methodology
Performance Measurement
To ensure consistency and accuracy, we used our own recently launched benchmarking tool, Benchi. Benchi collects throughput data using Conduit’s built-in metrics and Kafka’s JMX metrics, while CPU and memory usage is monitored through Docker runtime stats. This setup lets us compare both tools under identical, automated conditions using the following metrics:
- Message Throughput (messages per second)
- CPU Utilization
- Memory Usage
Snapshots vs CDC
Snapshot and CDC workloads have different performance profiles, so we made sure to configure them accordingly. Thankfully, Benchi allows us to do that very easily. The main differences in the setup were:
- Snapshot: All test data is loaded first; only once that is done does the pipeline start running.
- CDC: Streaming is started and paused, data is inserted, then streaming resumes, forcing the pipeline into CDC mode.
This setup ensured that both tools processed the same data under the same conditions in each mode (CDC or snapshot).
Setup
All benchmarks ran on a t2.xlarge AWS EC2 instance (4 vCPUs, 16 GB RAM, 120 GB gp3 EBS volume). Kafka and Postgres ran in Docker containers, with a single Kafka broker and a single Postgres instance. While we did try different EC2 instance types, we settled on a t2.xlarge since it had enough capacity to give Kafka Connect a fair chance. Conduit itself can certainly run your pipelines in a much more constrained environment, massively reducing your cost.
For each test, we inserted 20 million records into the Postgres instance, using the following schema:
CREATE TABLE employees (
id INT NOT NULL,
name VARCHAR(255),
email VARCHAR(255),
full_time BOOLEAN NOT NULL DEFAULT TRUE,
position VARCHAR(100),
hire_date DATE NOT NULL,
salary REAL CHECK (salary >= 0),
updated_at TIMESTAMPTZ DEFAULT NOW(),
created_at TIMESTAMPTZ DEFAULT NOW(),
PRIMARY KEY (id)
);
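The benchmark harness takes care of loading this data, but to give an idea of what that looks like, here is a minimal Go sketch that generates 20 million rows matching this schema on a local Postgres instance. The connection string and generated values are illustrative placeholders, not what the benchmarks actually use.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // Postgres driver
)

func main() {
	// Placeholder connection string; adjust to your environment.
	db, err := sql.Open("postgres", "postgres://user:password@localhost:5432/benchmark?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Generate 20 million rows in a single statement using generate_series.
	// Columns with defaults (full_time, created_at, updated_at) are omitted.
	_, err = db.Exec(`
		INSERT INTO employees (id, name, email, position, hire_date, salary)
		SELECT
			g,
			'name_' || g,
			'user_' || g || '@example.com',
			'engineer',
			CURRENT_DATE,
			50000 + (g % 1000)
		FROM generate_series(1, 20000000) AS g`)
	if err != nil {
		log.Fatal(err)
	}
	log.Println("inserted 20,000,000 rows")
}
```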
Conduit
We chose the latest Conduit release, v0.13.4, with the Postgres connector, using the new pipeline engine. Pipelines used initial_only for snapshots and logrepl with logical replication slots for CDC.
Kafka Connect
We ran Kafka Connect v7.8.1 with the Debezium Postgres connector, default worker settings, a 10 GB heap (KAFKA_HEAP_OPTS: "-Xms10G -Xmx10G"), and tuned batch/queue sizes.
Full configurations are here and here.
Running the Benchmarks
To reproduce these results, you can simply spin up your own EC2 instance and follow these steps:
curl -L https://github.com/ConduitIO/streaming-benchmarks/archive/refs/heads/main.zip -o streaming-benchmarks.zip
unzip streaming-benchmarks.zip
cd streaming-benchmarks-main && make install-tools
make run-postgres-kafka-cdc
make run-postgres-kafka-snapshot
Results
Here’s how Conduit and Kafka Connect compare in both modes:
| Mode | Tool | Message Rate (msg/s) | CPU (%) | Memory (MB) |
|---|---|---|---|---|
| CDC | Conduit | 48,060 | 110.2 | 110.2 |
| CDC | Kafka Connect | 44,889 | 147.1 | 6,863 |
| Snapshot | Conduit | 70,753 | 231.0 | 2,234 |
| Snapshot | Kafka Connect | 68,783 | 184.2 | 2,729 |
In CDC mode, Conduit's combination of higher throughput and significantly lower memory usage makes the biggest difference. We consider this a huge win, since pipelines typically spend most of their time in CDC mode, so that efficiency directly impacts day-to-day operations: it can massively reduce your infrastructure costs, or simply expand the options for where you can run your pipelines.
For snapshots, the throughput gap ended up being smaller, though Conduit still came out ahead. Memory consumption was again lower than Kafka Connect's, but at the cost of higher CPU usage.
Key Findings
On schema support: even though Conduit can maintain the schema of structured data throughout the pipeline, we decided to disable schema extraction on the source, since it was not strictly needed here and we wanted to reduce overhead. This is done by setting both sdk.schema.extract.key.enabled and sdk.schema.extract.payload.enabled to false in the Postgres source connector, and it had a direct impact on performance.
By implementing a ReadN method (supported thanks to our Connector SDK), we were able to read multiple records at a time, pulling batches of changes in a single operation. Implementing this method in the Postgres source connector resulted in a 7.2% improvement on CDC and a 2.4% boost on snapshots.
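To illustrate the idea (this is not the actual connector code), here is a minimal Go sketch of what a ReadN-style batched read can look like on top of an iterator that yields one change at a time. The Record and iterator types are simplified stand-ins for the real SDK types.

```go
package source

import "context"

// Record is a simplified stand-in for the SDK's record type.
type Record struct {
	Key     []byte
	Payload []byte
}

// iterator is a simplified stand-in for the connector's snapshot/CDC iterator,
// which yields one change at a time.
type iterator interface {
	Next(ctx context.Context) (Record, error)
}

// Source sketches a source connector that supports batched reads.
type Source struct {
	it iterator
}

// ReadN pulls up to n records in a single call instead of one record per call.
// Batching reads like this is what produced the 7.2% (CDC) and 2.4% (snapshot)
// improvements mentioned above.
func (s *Source) ReadN(ctx context.Context, n int) ([]Record, error) {
	records := make([]Record, 0, n)
	for len(records) < n {
		rec, err := s.it.Next(ctx)
		if err != nil {
			// Simplification: a real implementation distinguishes "no more data
			// buffered right now" from hard errors before returning a partial batch.
			if len(records) > 0 {
				return records, nil
			}
			return nil, err
		}
		records = append(records, rec)
	}
	return records, nil
}
```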
CDC Mode
Conduit delivered 7% higher throughput (48,060 msg/s vs. 44,889 msg/s) and used 98% less memory (110 MB vs. 6,863 MB). CPU usage was also 25% lower (110% vs. 147%).
Snapshot Mode
When configuring the Postgres source, specifying your desired batch size via the connector configuration parameters snapshot.fetchSize and sdk.batch.size makes a real difference. The best value we found was 75,000, though we arrived at that number purely through experimentation. For Conduit, we felt comfortable bumping this number up, since memory consumption is clearly not an issue for it.
In the end, throughput was 3% higher for Conduit (70,753 msg/s vs. 68,783 msg/s), with 18% less memory used (2,234 MB vs. 2,729 MB). However, CPU usage was 25% higher (231% vs. 184%).
Future Improvements
We believe there is still potential to continue increasing speed by experimenting with different methods for moving data between goroutines. When we conducted tests using channels with various batch sizes and buffering strategies, we saw dramatic differences in performance depending on how data was grouped and transferred.
For instance, sending 20 million objects one at a time over an unbuffered channel took around 5.5 seconds, while simply adding a buffer of size 50 brought that down to 1.8 seconds. The real breakthrough came when we increased the batch size to 1,000 or even 10,000—at that point, the total time dropped to just 80 ms, regardless of channel buffering.
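As a rough illustration of that experiment (not the exact harness we used), the following Go sketch compares sending items one at a time over unbuffered and buffered channels against sending them in batches:

```go
package main

import (
	"fmt"
	"time"
)

const total = 20_000_000

// sendOneByOne sends every item individually over a channel with the given buffer size.
func sendOneByOne(buffer int) time.Duration {
	ch := make(chan int, buffer)
	done := make(chan struct{})
	go func() {
		for range ch { // consumer drains the channel
		}
		close(done)
	}()
	start := time.Now()
	for i := 0; i < total; i++ {
		ch <- i
	}
	close(ch)
	<-done
	return time.Since(start)
}

// sendBatched groups items into slices of batchSize before sending them.
func sendBatched(batchSize int) time.Duration {
	ch := make(chan []int) // unbuffered on purpose; batching dominates either way
	done := make(chan struct{})
	go func() {
		for range ch { // consumer drains the batches
		}
		close(done)
	}()
	start := time.Now()
	batch := make([]int, 0, batchSize)
	for i := 0; i < total; i++ {
		batch = append(batch, i)
		if len(batch) == batchSize {
			ch <- batch
			batch = make([]int, 0, batchSize)
		}
	}
	if len(batch) > 0 {
		ch <- batch
	}
	close(ch)
	<-done
	return time.Since(start)
}

func main() {
	fmt.Println("unbuffered, one at a time:", sendOneByOne(0))
	fmt.Println("buffer of 50, one at a time:", sendOneByOne(50))
	fmt.Println("batches of 1,000:", sendBatched(1000))
	fmt.Println("batches of 10,000:", sendBatched(10000))
}
```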
Based on these results, we definitely think it is worth exploring batch sending from the CDC and snapshot iterators; there is a good chance we can achieve even greater performance by sending records in groups rather than one at a time. And since Conduit is JVM-free, we expect future improvements to bring even higher throughput without any concern about resource consumption. 🚀
Let’s Chat!
Curious about these benchmarks? Have ideas for new tests, or want to share your own results? Join us on Discord or start a GitHub discussion.