Testing streaming systems and architectures can be difficult because you need to mock data and have an upstream system continuously push that mock data. This post is about how to set up Conduit’s data generator connector.
The generator connector is built into Conduit, so you don't need to download an external connector to get started. It lets you control the content it generates (a struct or a file), the format (structured or raw payloads), and the amount and frequency of data generated. With this connector, you'll be able to test the flow of data through your streaming systems.
Our example will be a simple pipeline, with a generator source and a file destination. The generator source will be generating records, which will then be written to a file.
Let’s go over the configuration options for the generator source in this example (also described in the README):
format.type and format.options
These two parameters are both required and specify the contents of generated records. format.options has different meanings depending on format.type.
format.type can be structured, raw, or file. If structured is used, records with structured payloads will be generated. In that case, format.options needs to be a list of name-type pairs, where type can be one of int, string, time, bool. The generator above will create records with structured payloads, where we will have an id integer field, a name field (of type string), a company field (of type string as well) and a trial field (of type boolean).
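As a sketch, the source settings for the structured payload described above might look like the following (the field names here are illustrative, and the exact settings keys should be checked against the connector's README):

```json
{
  "format.type": "structured",
  "format.options": "id:int,name:string,company:string,trial:bool",
  "recordCount": "5",
  "readTime": "10ms"
}
```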
The same is true when format.type is raw, except that the structs are serialized as JSON strings and then converted to bytes.
To use a file as the payload, we need to set format.type to file. format.options is then expected to be a file path.
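For the file case, the settings reduce to just these two parameters (the path below is a placeholder):

```json
{
  "format.type": "file",
  "format.options": "/path/to/payload-file"
}
```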
readTime
Simulates the time needed to read a record. In this example, a record will be read every 10 milliseconds.
recordCount
The number of records the generator will generate, or -1 for no limit. In our example, 5 records will be generated.
burst.sleepTime and burst.generateTime
These two options make it possible to simulate bursts: the connector sleeps for burst.sleepTime (generating no records), then generates records for burst.generateTime, and then repeats the cycle. The connector always starts with the sleeping phase. The cycles end when recordCount has been reached, or never end if recordCount is set to -1.
Here, the connector will sleep for 15s, then generate records for the next 30s, with each record taking 1ms to generate. Once the 30s are over, the cycle repeats. recordCount is set to 2000, so the cycles will stop after 2000 records have been generated.
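Put together, a burst configuration matching the cycle described above could look like this (the format fields are illustrative, and the exact settings keys should be checked against the connector's README):

```json
{
  "format.type": "structured",
  "format.options": "id:int,name:string",
  "recordCount": "2000",
  "readTime": "1ms",
  "burst.sleepTime": "15s",
  "burst.generateTime": "30s"
}
```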
Creating the file destination
Now let’s create a place for all the generated records to be written to. We’ll configure a file destination:
curl -X POST 'http://localhost:8080/v1/connectors' -d '
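The request body for the file destination might look something like this sketch (the pipeline ID is a placeholder, and the exact field names should be verified against Conduit's HTTP API reference):

```json
{
  "type": "TYPE_DESTINATION",
  "plugin": "builtin:file",
  "pipeline_id": "<your-pipeline-id>",
  "config": {
    "name": "file-destination",
    "settings": {
      "path": "/home/conduitdev/projects/conduit/file-destination.txt"
    }
  }
}
```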
Since we’re generating only 5 records, and are simulating a 10-millisecond read time, we should be able to see the records in the destination pretty much instantaneously. If you check the contents of /home/conduitdev/projects/conduit/file-destination.txt, you should see something like this:
That’s all it takes! If you have any questions, suggestions, or just generally want to talk about streaming data, feel free to start a GitHub discussion or have a conversation with us on Discord. And don’t forget to follow us on Twitter if you aren’t already.