Real-Time Fraud Detection with Turbine and Novelty Detector

Most fraud detection is based on numeric data. Why? Because it's easier. Categorical data is hard to analyze, and virtually impossible to analyze in real time. Yet behavioral and profile data can provide the information needed to detect an anomaly. And we’re not talking about just scoring the categorical data to make the models easier. With Meroxa Turbine and thatDot Novelty Detector, accessing and analyzing categorical data just got a lot easier.

Turbine is Meroxa’s real-time data application framework that makes it easy to turn your data pipelines into data applications. The vision for the Meroxa Data Platform and Turbine is to empower software engineers to build and deploy Data Apps: data processing applications that manipulate, enrich, and analyze data to solve problems and derive value for the business.

An appealing aspect of the Turbine framework is that it enables the use of highly specialized tools such as thatDot’s Novelty Detector. Novelty Detector is a real-time anomaly detection tool that uses categorical data to surface anomalies you might not otherwise find, while greatly reducing false positives.

Together, these two tools can help you build a data infrastructure powerful enough to handle large volumes of data and that can quickly identify anomalies. This can be a valuable addition to any software stack, as it can help you and your customers avoid costly mistakes and quickly identify and fix problems.

In this post we’ll outline a simple Turbine Data App that leverages Novelty Detector to highlight novel, noteworthy, or otherwise interesting user activities in real time.

Prerequisites:

  1. Sign up for a Meroxa account and install the latest Meroxa CLI.
  2. Set up your Novelty Detector environment and obtain credentials.
  3. Clone the example to your local machine:

git clone git@github.com:meroxa/novelty.git

Since this example uses Go, you will need to have Go installed.

How it works:

The novelty Turbine app takes user activity data (e.g. user A carried out action B at time T) from a PostgreSQL database and streams it in real time to the Novelty Detector server. The Novelty Detector server scores each "observation" for novelty and adds anomaly metadata, and the result is then written back to the PostgreSQL database.

Here’s an example Novelty Detector response payload:

{
	"observation": [
		"my",
		"sample",
		"observation"
	],
	"score": 0.36231689108923804,
	"totalObsScore": 0.36231689108923804,
	"sequence": 3,
	"probability": 0.6666666666666666,
	"uniqueness": 0.9943363088569088,
	"infoContent": 0.5849625007211563,
	"mostNovelComponent": {
		"index": 2,
		"value": "observation",
		"novelty": 0.5849625007211563
	}
}

A full explanation of each field of the payload can be found in the Novelty Detector Usage Guide, but a few of the more interesting payload elements are worth noting:

  • observation - the observation originally passed into Novelty Detector, included for reference.
  • score - the overall measure of how novel the observation is. The value is always between 0 and 1, where zero is entirely normal and not anomalous, and one is highly novel and clearly anomalous.
  • mostNovelComponent - an object consisting of index, value, and novelty. The index and value identify the most novel component of the observation, and novelty indicates how novel that component is.
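To work with these fields in Go, the response can be decoded into a struct. The following is a minimal sketch based only on the sample payload above; the type names, `parseNoveltyResult`, and `isNovel` are hypothetical helpers for illustration and are not part of the example repository or of Novelty Detector itself. The threshold is an assumption you would tune for your own data.

```go
package main

import "encoding/json"

// novelComponent mirrors the "mostNovelComponent" object in the sample payload.
type novelComponent struct {
	Index   int     `json:"index"`
	Value   string  `json:"value"`
	Novelty float64 `json:"novelty"`
}

// noveltyResult mirrors the fields shown in the sample payload.
// Field names are taken from that sample; nothing else is assumed.
type noveltyResult struct {
	Observation        []string       `json:"observation"`
	Score              float64        `json:"score"`
	TotalObsScore      float64        `json:"totalObsScore"`
	Sequence           int            `json:"sequence"`
	Probability        float64        `json:"probability"`
	Uniqueness         float64        `json:"uniqueness"`
	InfoContent        float64        `json:"infoContent"`
	MostNovelComponent novelComponent `json:"mostNovelComponent"`
}

// parseNoveltyResult decodes a raw JSON response body (hypothetical helper).
func parseNoveltyResult(body []byte) (noveltyResult, error) {
	var res noveltyResult
	err := json.Unmarshal(body, &res)
	return res, err
}

// isNovel reports whether the score meets a caller-chosen threshold
// (hypothetical helper; the threshold value is up to the caller).
func isNovel(res noveltyResult, threshold float64) bool {
	return res.Score >= threshold
}
```

With the sample payload, `isNovel(res, 0.3)` would be true and `isNovel(res, 0.9)` would be false, since the score is roughly 0.36.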

A key aspect of Novelty Detector, and one of the reasons it pairs so well with Turbine, is its simplicity of operation: once you have connected Turbine to Novelty Detector, it starts scoring observations without requiring any other configuration or setup.

Code:

The core of the Data App looks much like any typical Turbine app, but there are a couple of sections worth digging into.

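func formatObservation(r turbine.Record) []string {
	// These type assertions assume every field is present and well-typed.
	country := r.Payload.Get("country").(string)
	city := r.Payload.Get("city").(string)
	email := r.Payload.Get("email").(string)
	userID := r.Payload.Get("user_id").(float64)
	tsFloat := r.Payload.Get("timestamp").(float64)

	// Bucket the timestamp into a coarse time-of-day label.
	tod, err := timeOfDay(fmt.Sprint(int(tsFloat)))
	if err != nil {
		log.Printf("error in formatObservation: %s", err.Error())
		return nil
	}
	log.Printf("tod: %+v", tod)

	// user_id is assumed to be integer-valued. Format it as an integer:
	// fmt.Sprint on a float64 renders large values in exponent form (e.g. 1e+06).
	obs := []string{tod, country, city, email, fmt.Sprint(int64(userID))}
	log.Printf("obs: %+v", obs)
	return obs
}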

Here we’re formatting the observation as an array of categorical data, starting with the value with the lowest cardinality (or the most significant).
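Note that the bare type assertions in formatObservation will panic if a field is missing or has an unexpected type. As a rough sketch of a more defensive variant, the function below builds the same observation order using comma-ok assertions. It assumes the payload has already been decoded into a plain `map[string]interface{}`, which stands in for the Turbine record payload; `observationFromFields` is a hypothetical name, not part of the example repository.

```go
package main

import "fmt"

// observationFromFields builds the categorical observation in the same order
// as formatObservation: time of day, country, city, email, user ID.
// The map stands in for the record payload (an assumption for illustration).
func observationFromFields(fields map[string]interface{}, tod string) ([]string, error) {
	obs := []string{tod}

	// String fields: return an error instead of panicking on a bad value.
	for _, key := range []string{"country", "city", "email"} {
		s, ok := fields[key].(string)
		if !ok {
			return nil, fmt.Errorf("field %q is missing or not a string", key)
		}
		obs = append(obs, s)
	}

	// JSON numbers decode to float64; user_id is assumed to be integer-valued.
	id, ok := fields["user_id"].(float64)
	if !ok {
		return nil, fmt.Errorf("field %q is missing or not a number", "user_id")
	}
	obs = append(obs, fmt.Sprint(int64(id)))

	return obs, nil
}
```

Returning an error rather than nil lets the caller decide whether to skip the record, log it, or route it elsewhere.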

A particularly interesting optimization is the bucketing of time data in the form of the timeOfDay function.

// timeOfDay maps a Unix timestamp (seconds, as a decimal string) to a coarse
// bucket: morning, afternoon, evening or night.
func timeOfDay(t string) (string, error) {
	intTime, err := strconv.ParseInt(t, 10, 64)
	if err != nil {
		return "", err
	}

	// time.Unix returns local time, so the buckets follow the host's time zone.
	hour := time.Unix(intTime, 0).Hour()

	const (
		splitAfternoon = 12
		splitEvening   = 17
		splitNight     = 21
	)

	switch {
	case hour < splitAfternoon:
		return "morning", nil
	case hour < splitEvening:
		return "afternoon", nil
	case hour < splitNight:
		return "evening", nil
	default:
		return "night", nil
	}
}

The function takes a Unix timestamp value and maps it to morning, afternoon, evening, or night.

You can find the full example for this data app on GitHub. We can't wait to see what you build 🚀

Additional resources:

Meroxa, Turbine