Introducing Conduit 0.6

With Conduit 0.6, we’re inching closer to the 1.0 release. Conduit is an important building block in the Meroxa platform for streaming data to and from a variety of data stores. Starting with Conduit 0.5, we’ve made a concerted effort to focus on features and bug fixes that help developers operate Conduit in production environments. This benefits both the Meroxa platform and everyone who uses Conduit today.

Significant Features

More ways to install Conduit

Let’s face it: there are many different ways a developer or a DevOps team might want to install software on their machines or in a production environment. That’s why all of our releases starting with 0.6 can be installed via:

Connector Lifecycle Events

Before Conduit 0.6, if you wanted to build a Conduit connector, the connector needed to respond to a handful of events from Conduit itself: `Configure`, `Open`, `Read`, `Write`, `Ack`, and `Teardown`. Conduit emitted these events to the connector whenever a pipeline ran. At first, these events seemed more than enough to cover the needs of various data stores and the ways to connect to them. In practice, they didn’t cover the extra actions a connector might want to take. Let’s say you wrote a Change Data Capture connector for Postgres. In this connector, you need to open a replication slot on the database and close the slot when you’re done streaming data. With the new lifecycle events in Conduit 0.6, you can open the replication slot in the Source `OnCreate` event and close it in the Source `OnDelete` event when the connector is removed.

In Conduit 0.6, we’ve introduced a few more events throughout the connector’s lifecycle. These events include:

Source OnCreate
Source OnUpdate
Source OnDelete
Destination OnCreate
Destination OnUpdate
Destination OnDelete

With these extra events, you have more control over what your connector does, and when, once Conduit includes it in a pipeline. For more information, check the original Design Doc and the associated issues.

Parallel Processors

Previously, when you added a processor to a Conduit pipeline, it processed records sequentially as they were pulled from the upstream data source. With the release of Parallel Processors, you can specify a number of workers, and Conduit will distribute incoming records across them. This allows processors to keep up with high-velocity pipelines. Keep in mind that the workers may process records out of order, but records still flow out of the processor in the order they came in.

To kick the tires on this, you’ll need to include the number of `workers` you want in your pipeline configuration file:

version: 2.0

pipelines:
  - id: pipeline1
    processors:
      - id: proc1
        type: js
        workers: 1

If you don’t include `workers` in your processor definition, the default will be `1`.

To learn more about Parallel Processors, go check out the PR!

Looking forward to 1.0

One of our main principles on the Conduit team is to make sure that what we say Conduit does is actually what you get. This is why we’ve been so focused on making sure Conduit operates as expected. In terms of feature development, we want 1.0 to signify that Conduit won’t have any major breaking changes. This provides guarantees around how you can expect to interface with and develop against Conduit. At this time, we don’t expect any major breaking changes to the internal APIs of Conduit or the connector spec. Once we spend more time with Conduit in Meroxa’s production environment, we’ll have the information we need to decide whether those APIs need to change.

So what does the next set of capabilities and features look like? We’re diligently working on a Conduit Kubernetes Operator. For advanced production environments, this will make running a Conduit service that much easier, with all of the needed behaviors around starting, stopping, and restarting pipelines built in. But that’s just one of the many capabilities we’re looking to add before we get to 1.0. Check out all of the milestones in GitHub for more information.

We’d love your feedback too!

As we start gearing up for 1.0, we’d love to get your feedback! If you want to see the full list of what was included in this release, check out the Conduit Changelog and the documentation. Also, feel free to join us on Discord or Twitter.