Dataflow Concepts

Dataflows are pipelines that run jobs and steps configured via YAML.

Dataflow Influences

Bridge dataflows borrow heavily from the structure of other types of pipelines. For example, the design was influenced by Ansible Playbooks and Azure YAML Pipelines (and its younger cousin, GitHub Actions).

Dataflow Basics

Steps run actions

The basic building blocks of dataflows are steps. Each step performs an operation by running an action via an adapter. Adapters are interfaces to various technologies.

Examples of actions that steps might run include:

  • Display a message on the console
  • Set a variable
  • Call an API using http/https requests
  • Write to a file (e.g. txt/csv/json/yaml)
  • Read a file (e.g. txt/csv/json/yaml)
  • Query a SQL database
  • Send an e-mail

And more… see the step actions for a list of available adapters.
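
As a sketch of how several such actions might be combined in one pipeline, consider the steps below. The action names file.read and email.send are hypothetical illustrations, not confirmed adapter names; only stdout.output is documented on this page:

    steps:
      - name: ReadReport
        action: file.read        # hypothetical file adapter action
        params: report.csv
      - name: NotifyTeam
        action: email.send       # hypothetical e-mail adapter action
        params:
          to: team@example.com
          subject: 'Daily report'
      - name: LogDone
        action: stdout.output    # the default adapter shown below
        params: 'Report processed'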

Hello World

As a starting point, consider the simple use case of ‘Hello World’. Running the following dataflow YAML would print “Hello World!” to the console:

    steps:
      - params: 'Hello World!'

This is an example of a single-job, single-step pipeline where the pipeline, stage, and job are implied.

It also shows an example of the default adapter, stdout. To illustrate, the following YAML does exactly the same as the above example:

    steps:
      - name: HelloWorld
        action: stdout.output
        params: 'Hello World!'

And taking it one step further, the following example does the same, but with a few additional concepts:

    name: HelloWorldJobExample
    variables:
        msg: 'Hello World!'
    jobs:
      - name: MyFirstJob
        steps:
          - name: OutputHelloWorld
            params: '{{ msg }}'

The above is again a single-job, single-step pipeline, but with the job and step explicitly named. It also previews the concepts of variables and context.

Dataflow namespace

Context References

Dataflows maintain an in-memory model of the pipelines. This uses the following structure:

pipelines.<name>.stages.<name>.jobs.<name>.steps.<name>

In string form, this structure is referred to as a “full context reference”. If no name is provided in the configuration for a given level, a number is inserted. For example, if no names are provided at all, the full context reference for the first step would be:

pipelines.0.stages.0.jobs.0.steps.0
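
To sketch how named and unnamed levels mix, consider the configuration below (the job name MyFirstJob is illustrative). Since the pipeline, stage, and step are unnamed, numbers are inserted for those levels:

    jobs:
      - name: MyFirstJob
        steps:
          - params: 'Hello World!'

The full context reference for the step would be pipelines.0.stages.0.jobs.MyFirstJob.steps.0.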

Relative Context References

When running a pipeline, dataflows keep track of the “current context”, which identifies where the run is in the processing of the YAML configuration. Relative context references are a shorthand form of context references based on the current context.

For example, if there were two unnamed steps in a job, then while running the second step (step 1), the previous step could be referenced simply as steps.0. The full context reference starting with pipelines can also be used.

Relative context references can start at any level, so for this example, all of the references below are equivalent:

  • steps.0
  • jobs.0.steps.0
  • stages.0.jobs.0.steps.0
  • pipelines.0.stages.0.jobs.0.steps.0

Context References for Keywords

Any option in a step can use values from the rvrdata namespace. This can be done at runtime by appending the suffix __ref to any option name and supplying a context reference as its value. For example, using “steps.STEPNAME.data” will retrieve the data generated by that step at runtime.
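
As a sketch, the second step below retrieves the first step's runtime data by appending __ref to its params option. The ReadConfig step and the file.read action are illustrative assumptions; stdout.output is the default adapter described above:

    steps:
      - name: ReadConfig
        action: file.read        # hypothetical file adapter action
        params: settings.yaml
      - name: ShowConfig
        action: stdout.output
        params__ref: steps.ReadConfig.data   # resolved to ReadConfig's data at runtime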

Variables can also be referenced using this syntax by appending the __ref suffix to a variable name (new in rvrdata-1.1.0).

Note that variable names that include the strings “pipelines.”, “stages.”, “jobs.” or “steps.” should not be used, to avoid confusion with context references in the rvrdata namespace.

Referencing values using the __ref suffix is the most efficient way to retrieve runtime values in dataflows. When a single value is needed, this approach performs better than Jinja because the __ref syntax references the keyword value directly in memory at runtime, whereas Jinja renders the value as text and then parses it as YAML. For large values, that rendering and parsing adds processing time.
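
To contrast the two approaches, both steps in the sketch below would pass the same value. The first references it directly in memory; the second assumes Jinja expressions can address the same context path, and renders the value to text before it is parsed back as YAML (the ReadConfig step name is illustrative):

    steps:
      - name: Direct
        action: stdout.output
        params__ref: steps.ReadConfig.data       # direct in-memory reference (preferred)
      - name: Rendered
        action: stdout.output
        params: '{{ steps.ReadConfig.data }}'    # rendered to text, then parsed as YAML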

Dataflow Schema

See YAML for more detail about the structure of dataflow syntax.