Skip to main content
Version: sdf-beta2

Split

Dataflows can split events data into multiple topics. The split behavior is defined in the sinks section of the service definition. Similar to merge, the split operation can transform the event to match a target topic schema.

  sinks:
- type: topic
id: <topic-id>
transforms:
- operator: <operator-name>
...
- type: topic
id: <topic-id>
transforms:
- operator: <operator-name>

The transforms section defines the transformation to apply to the event before sending it to the target topic. The operator type depends on the desired business logic.

Split Example

We'll create a dataflow that reads person records from the usertopic and splits them into child and adult topics based on age.

1. Create a Dataflow file

Create a directory split-test:

$ mkdir split-test; cd split-test

Add the following dataflow.yaml file:

# dataflow.yaml
apiVersion: 0.5.0

meta:
name: split
version: 0.1.0
namespace: examples

config:
converter: json

types:
person:
type: object
properties:
name:
type: string
age:
type: i32

topics:
user:
schema:
value:
type: person
child:
schema:
value:
type: person
adult:
schema:
value:
type: person
services:
split-service:
sources:
- type: topic
id: user
sinks:
- type: topic
id: child
transforms:
- operator: filter
run: |
fn is_child(person: Person) -> Result<bool> {
Ok(person.age < 18)
}
- type: topic
id: adult
transforms:
- operator: filter
run: |
fn is_adult(person: Person) -> Result<bool> {
Ok(person.age >= 18)
}

In this example, we used the filter operator as the target schema matches the source. If there is a mismatch, use filter-map and update the record accordingly.

Run dataflow:

$ sdf run

Add --ui if you want to see the graphical representation of the dataflow.

2. Test the Dataflow

Produce to user:

$ fluvio produce user
{"name":"Andrew","age":16}
{"name":"Jackson","age":17}
{"name":"Randy","age":32}
{"name":"Alice","age":28}
{"name":"Linda","age":15}

Consume from child:

$ fluvio consume child -Bd
{"age":16,"name":"Andrew"}
{"age":17,"name":"Jackson"}
{"age":15,"name":"Linda"}

Consume from adult:

$ fluvio consume adult -Bd
{"age":32,"name":"Randy"}
{"age":28,"name":"Alice"}

In summary, split-service utilizes sinks to divide the data into two different topics: child and adult. The child filter operator passes records with ages less than 18 years old. The adult filter operator passes records with an age greater or equal to 18 years old.

References