With Web 2.0 being decades old even those outside of the software engineering world are familiar with the term. The success of Web 2.0 has led to systems that produce unprecedented volumes of data. This deluge of data has created the need for another type of app: the data app.
A data app is an application that uses real-time or near-real-time events to solve a problem. This is in contrast to web apps, which are focused on the classic and well-known HTTP request/response model. With web apps, the underlying data architecture and processing are offloaded to backend systems, separate from the frontend system with the UI for the end-user.
Data apps are the perfect solution to the growing complexity of data-driven applications and the complex data architecture required to process all that data. However, there is a lot of confusion around what makes data apps different from web apps.
In this article, we’ll compare web apps with data apps. We’ll look at their relationship with interaction models and how data apps might solve problems that web apps aren’t equipped to solve. We’ll close by looking at an example data app built using Turbine.
Let’s dive in.
What is a Web App?
Generally speaking, most developers are familiar with the concepts surrounding web apps. Web apps use the classic HTTP request and response model to interact with users and generate data from those interactions.
In most cases, the REST API with its CRUD concept has become the de facto approach to dealing with the backend data flow and interactions generated by most web applications.
Typically, most web apps are made of a frontend, which is more UI-related, generating events and data, while the backend system of REST APIs and other supporting services deal with the processing and movement of the data.
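To make the CRUD pattern concrete, here is a minimal sketch of the four operations a REST backend exposes. A plain dict stands in for the datastore, and the function names and status codes are illustrative, not tied to any particular framework:

```python
# A dict stands in for the datastore behind a REST backend; the
# four functions mirror the CRUD operations (all names illustrative).
store = {}

def create(resource_id, payload):
    # POST: store a new resource.
    store[resource_id] = payload
    return 201  # HTTP "Created"

def read(resource_id):
    # GET: fetch a resource, or report that it does not exist.
    if resource_id in store:
        return 200, store[resource_id]
    return 404, None

def update(resource_id, payload):
    # PUT: replace an existing resource.
    if resource_id not in store:
        return 404, None
    store[resource_id] = payload
    return 200, payload

def delete(resource_id):
    # DELETE: remove a resource.
    if resource_id in store:
        del store[resource_id]
        return 204  # HTTP "No Content"
    return 404
```

In a real web app, an HTTP framework would route requests to handlers like these, but the data flow is the same.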
What is a Data App?
A data app is an application that uses events to solve the same or similar data problems as the backend systems driving many web apps.
Data apps are more focused, seeking primarily to solve the following technical problems:
- Persisting/syncing data and events between and on data infrastructure.
- Transforming and manipulating data between and on data infrastructure.
- Other common data processing tasks between and on data infrastructure.
In most web apps, the core functionality is often to create, consume, or present data. Data apps are a natural evolution towards better design, architecture, and support for the high-volume data-driven software world many developers and engineers find themselves in.
Data apps and architecture
One important aspect of a data app that distinguishes it from a web app is the tightening of concerns between infrastructure and code. While web apps typically involve both a front-end layer and a back-end layer, data apps operate on the back-end only, interacting directly with the data infrastructure. With the common use cases of real-time or near real-time data, the complexity of the code and the architecture built to support these high-volume data sets has become a serious burden and hurdle for many developers.
Interaction Models
Before diving further into data apps, let’s take a side tour of a topic related to software design: interaction models. In this context, interaction models can help us understand the fundamental differences between web apps and data apps. We’ll look at the two major types of interaction models: user-to-system interactions and system-to-system interactions.
User-to-system interaction models
User-to-system interaction models are common in the software design of web apps. With the rise in popularity of UX design, we’ve seen an increased emphasis on the interaction between the end-user and the system (the web app).
In this context, software design is all about modeling the system in a way that helps the end-user interact with the application to perform certain tasks. This could simply be the way a user navigates and interacts with a page or performs certain actions and updates to the system.
System-to-system interactions models
On the other hand, the system-to-system interaction model has an entirely different goal in mind. System-to-system interactions are often modeled around how different pieces of infrastructure interact and work together to analyze and process data.
Consider a real-world example: a continuous incoming stream of user clicks from a frontend system that must be processed and made available in a company’s Data Lake for analysis by downstream business units.
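To make this concrete, one such click event might look like the following. The field names here are purely illustrative, not a real schema, and newline-delimited JSON is just one common format for landing events in a Data Lake:

```python
import json

# Hypothetical shape of a single click event emitted by the frontend;
# every field name here is illustrative, not a real schema.
click_event = {
    "user_id": "u-123",
    "page": "/pricing",
    "element": "signup-button",
    "timestamp": "2022-06-01T12:00:00Z",
}

# Before landing in the Data Lake, events are typically serialized;
# here we produce one newline-delimited JSON record.
record_line = json.dumps(click_event, sort_keys=True)
```

A stream of such records, arriving continuously, is exactly the kind of input a data app is built to process.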
Closing the gap between web and data apps
For today’s web apps, a common area of complexity and limitation centers around the system-to-system interaction model. While web apps excel at addressing user-to-system interactions, the lines can get blurry when it comes to processing the data generated by those interactions.
At a high level, many questions arise when engineers and developers try to hash out responsibilities when it comes to data processing. How much data transformation and handling can be done by the web app? Should the web app do any of it, or should all data be handed off to other systems to process?
As an example, the engineers working on web apps typically aren’t deeply familiar with the complexities of streaming data processing. Often, this sort of work is handed off to another backend system and a team that is responsible for data processing.
How can data apps solve these complex data processing problems while retaining the familiarity of web apps in code and project structure? One of those ways is with turbine-py, a Python package built specifically for creating data apps.
But first, let’s dive into the benefits that data apps provide and how they help engineers solve complex data processing problems.
How Data Apps Solve Problems
It’s well known that streaming with real-time or near-real-time data processing is important for modern data processing applications, but it’s also incredibly complicated. Data apps solve these issues by abstracting away the complexity of the underlying streaming infrastructure.
Data apps are built in such a way that they can handle event-driven streams of data, respond in real-time, and scale to use cloud-native best practices. Engineers can focus on building applications that solve complex problems rather than worrying about the complexity of processing streaming data or the infrastructure needed to support those technologies. Typically, managing these technologies correctly requires a dedicated team of engineers.
Benefits of Data Apps
Data apps — like those built with Turbine — have several benefits that extend from this reduction of complexity.
First, data apps free developers from managing complex infrastructure and cloud-related operations, letting them spend their time and energy on the code that matters: the application code itself.
Also, the speed at which new engineers can become familiar with and contribute to codebases increases dramatically. When less time is spent understanding streaming architecture and managing those resources, more effort can be spent on the core of the application logic.
Let’s look at a simple data app built using Turbine to see these benefits in action.
Example of a Data App Using Turbine
Currently, Turbine data apps can be written with Go, Python, JavaScript, and Ruby. In this example, we will use Python. We’ll solve a data processing problem that is common for many organizations.
In our sample problem, we have streaming records generated by our users in a web app, and those records need to be processed into a Data Lake, with transformation applied for later analytics by business users.
Turbine fits the use case for this problem perfectly, providing a data app framework for responding to real-time data while being able to scale in the cloud.
Tooling setup
First, we install the Meroxa CLI to help with the scaffolding of a Turbine data app. We follow these installation instructions. We set up our Meroxa account and then log in via the CLI.
$ brew tap meroxa/taps; brew install meroxa
Next, we install the turbine-py package. Then, we initialize our Python data app, creating a clean template.
$ pip3 install turbine-py
$ meroxa app init data-warehouse --lang python --path ~/src
Now we are ready to start developing our Python data app! When we initialized our app, the following files were automatically generated for us as our template:
- main.py
- app.json
- __init__.py
- fixtures
  - demo-cdc.json
  - demo-no-cdc.json
Writing our first Turbine data app
There are five important concepts for writing Turbine data apps:
- Turbine class (provides needed functionality)
- Data processing function(s)
- Resources (datastores)
- Records (collection of data)
- Write (push data out)
The Turbine class itself provides access to the necessary components to build your data app with minimal code. Of course, you will have one or more data processing functions or methods to apply transformations to your records.
Resources in Turbine will allow you to connect to your data sources. Records are simply a collection of data that your data app will process. Lastly, writing will push the processed data back out of the data app. You can configure your Resources and Destinations in Meroxa.
Since we don’t need to worry about the complexity of consuming a stream of records or the technical requirements related to the source streaming technology, we can focus on writing the transformation function that takes individual records and transforms them as needed.
Writing the Code
We will write the code for our data app in main.py, which will be our entry point.
First, we will import the needed Python packages into our main.py code.
import typing as t

from turbine import Turbine
from turbine.runtime import Record
Next, we will write our Python class that inherits from the Turbine class to process our streaming user records.
class DataLake:
    @staticmethod
    async def run(turbine: Turbine):
        source = await turbine.resources("user_activity")
        records = await source.records("click_stream")
        processed = await turbine.process(records, transform)
        destination_db = await turbine.resources("data_lake")
        await destination_db.write(processed, "user_analytics")
This simple class is straightforward to follow, as the Turbine data app abstracts away the details of complex stream processing. There are four simple steps encapsulated inside our run method.
- Connect to a Meroxa-configured source system.
- Pull streaming records from the source.
- Transform the streaming records as needed, yielding the set of processed records.
- Connect to a Meroxa-configured destination to write our processed records.
With the data flow of our app written, the only remaining step is to write the transformation function that will process our streaming user records. In our example case, our clickstream records contain a field with first and last names concatenated together, like “John Doe.” We simply need to split this field into separate fields (first_name and last_name) before ingesting it into a Data Lake.
def transform(user_stream: t.List[Record]) -> t.List[Record]:
    updated = []
    for user_click in user_stream:
        user_click_to_update = user_click.value
        full_name = user_click_to_update["payload"]["user"]["name"].split(" ")
        first_name = full_name[0]
        last_name = full_name[1]
        updated.append(
            Record(
                key=user_click.key,
                value={"first_name": first_name, "last_name": last_name},
                timestamp=user_click.timestamp,
            )
        )
    return updated
With a little configuration and setup, our Turbine data app can ingest and process complex streaming data, and it does so with very few lines of code!
Conclusion
Data apps, though relatively new, bring with them a whole host of benefits. These benefits include the efficiency and streamlining of processes along with the simplicity of onboarding new engineers. Building data apps with a tool like Turbine is a perfect approach to today’s complex real-time and near-real-time data processing needs. The ability to approach a normally complicated data problem with a straightforward codebase — while offloading the complexity related to architecture and streaming data — is a game-changer for developers.