Version: sdf-beta1.1

Split

Dataflows can split event data into multiple topics. The split behavior is defined in the sinks section of the service definition. Similar to merge, the split operation can transform the event to match the schema of a target topic.

  sinks:
    - type: topic
      id: <topic-id>
      transforms:
        - operator: <operator-name>
          ...
    - type: topic
      id: <topic-id>
      transforms:
        - operator: <operator-name>

The transforms section defines the transformations to apply to the event before sending it to the target topic. The choice of operator depends on the desired business logic.
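
For a sense of the shape these run blocks take, here is a minimal sketch of a filter operator's function (the function name and predicate are placeholders, not part of a real dataflow): it receives the typed record declared under types and returns Result<bool>, where true forwards the event to this sink.

// Sketch of a filter run function; `keep_for_sink` is a placeholder name.
fn keep_for_sink(person: Person) -> Result<bool> {
    // Placeholder predicate; the real rule depends on the business logic.
    Ok(!person.name.is_empty())
}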

Split Example

We'll create a dataflow that reads person records from the user topic and splits them into the child and adult topics based on age.

1. Create a Dataflow file

Create a directory split-test:

$ mkdir split-test; cd split-test

Add the following dataflow.yaml file:

# dataflow.yaml
apiVersion: 0.5.0

meta:
  name: split
  version: 0.1.0
  namespace: examples

config:
  converter: json

types:
  person:
    type: object
    properties:
      name:
        type: string
      age:
        type: i32

topics:
  user:
    schema:
      value:
        type: person
  child:
    schema:
      value:
        type: person
  adult:
    schema:
      value:
        type: person
services:
  split-service:
    sources:
      - type: topic
        id: user
    sinks:
      - type: topic
        id: child
        transforms:
          - operator: filter
            run: |
              fn is_child(person: Person) -> Result<bool> {
                  Ok(person.age < 18)
              }
      - type: topic
        id: adult
        transforms:
          - operator: filter
            run: |
              fn is_adult(person: Person) -> Result<bool> {
                  Ok(person.age >= 18)
              }

In this example, we used the filter operator because the target schema matches the source schema. If there is a mismatch, use filter-map instead and update the record accordingly, as in the sketch below.
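
As a rough illustration of that case, here is a hedged sketch of a filter-map run function; the Adult type and the Result<Option<_>> signature are assumptions for this sketch, not part of the example above:

// Hypothetical filter-map: keep only adults and reshape the record into an
// assumed `Adult` target type. Ok(None) drops the record; Ok(Some(_)) emits
// the converted record to the sink topic.
fn to_adult(person: Person) -> Result<Option<Adult>> {
    if person.age >= 18 {
        Ok(Some(Adult { name: person.name }))
    } else {
        Ok(None)
    }
}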

Run the dataflow:

$ sdf run

Add --ui if you want to see the graphical representation of the dataflow.

2. Test the Dataflow

Produce to user:

$ fluvio produce user
{"name":"Andrew","age":16}
{"name":"Jackson","age":17}
{"name":"Randy","age":32}
{"name":"Alice","age":28}
{"name":"Linda","age":15}

Consume from child:

$ fluvio consume child -Bd
{"age":16,"name":"Andrew"}
{"age":17,"name":"Jackson"}
{"age":15,"name":"Linda"}

Consume from adult:

$ fluvio consume adult -Bd
{"age":32,"name":"Randy"}
{"age":28,"name":"Alice"}

In summary, split-service uses sinks to divide the data into two topics: child and adult. The child sink's filter operator passes records with an age under 18, while the adult sink's filter operator passes records with an age of 18 or older.
