loop

Loops are used to run a step repeatedly. Under the loop keyword, an array of items can be supplied. The step is run once for each item in the array.

loop: [ array | dict ]
pause: float

pause

pause injects a delay between iterations of a step. As a protection, there is a default delay of 1s. This avoids overwhelming downstream systems, such as APIs that the step calls. The default can be overridden by setting pause to a different number of seconds.
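As a minimal sketch of overriding the default, the step below sets a longer delay. The step name and the value 2.5 are illustrative assumptions; the remaining keys follow the examples elsewhere on this page.

```yaml
steps:
  - name: print_items_slowly   # hypothetical step name
    params:
        output: |
          {{ item }}
    loop:
      - "itemA"
      - "itemB"
    pause: 2.5   # wait 2.5 seconds between iterations instead of the default 1s
```

Setting pause to 0, as the examples on this page do, removes the delay entirely, which is reasonable when the step does not call an external system.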

Errors in loop

The default behavior when steps encounter errors is to exit the pipeline. This is true for loops as well. If an error is encountered in an iteration, the loop will log a WARNING and stop.

Sometimes errors are expected and the desired behavior is to have the loop continue. To enable this, set condition: True on the step (see condition). The loop will then continue to iterate through all items in spite of errors.

Note that this means every item will be run through the loop even if it is malformed, so a misconfigured loop can generate an error on every iteration. It is recommended to test the loop to ensure it is configured correctly before enabling this option.
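A sketch of a loop that is allowed to continue past a failing item might look like the following. The step name and the item values are hypothetical, and it is an assumption that the second item causes the str_to_utc_str filter to raise an error.

```yaml
steps:
  - name: convert_dates   # hypothetical step name
    params:
        output: |
          {{ item|str_to_utc_str('%Y-%m-%d') }}
    loop:
      - "2023-11-01"
      - "not-a-date"      # assumed to raise an error in the filter
      - "2023-11-03"
    condition: True       # keep iterating even if an item fails
    pause: 0
```

Without condition: True, the loop would be expected to stop at the second item and never reach the third.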

Examples

Loop with scalar values

The following YAML snippet shows how a loop can be defined:

steps:
  - name: print_items
    notes: This step prints out a list of items.
    params:
        output: |
          {{ item }}
    loop:
      - "itemA"
      - "itemB"
    pause: 0

The dataflow above would render the following:

itemA
itemB

Loop through dictionary

Normally an array of items is used for a loop. It is also possible to supply a dictionary. In this case, the keys of the dictionary will be used to generate an array of items, one item per key.

Each item then exposes the key and its value as item.key and item.value.

For example:

steps:
  - name: list_values
    params:
        output: |
          {{ item.key }} is set to {{ item.value }}
    pause: 0
    loop: {
      "A": "value1",
      "B": "value2"
    }

would render:

A is set to value1
B is set to value2

Loop using data from context reference

A typical pattern is to have a step read in or prepare an array of data and then loop through the data as items. This can be done using context references.

The following YAML snippet loops through records read from a .csv file:

steps:
  - name: get_items
    notes: List has header values of `col1`, `col2`, `col3`.
    action: csv.read
    params:
        filename: "simple_list.csv"

  - name: print_items
    notes: This step prints out a list of items.
    params:
        output: |
          {{ item.col1 }}
    loop__ref: steps.get_items.data
    pause: 0

Loop items not available on initial render

This behavior changed in rvrdata-0.9.3 and applies to all later versions.

Dataflow steps may contain jinja statements that reference the item variable. This variable is not set until the loop data is retrieved. Under some circumstances, this means that statements that expect item to have a value should first test whether item is None.

For example, the following would trigger an exception for rvrdata-0.9.2 and earlier:

- name: no_initial_item
  params:
      output: |
        {{ item|str_to_utc_str('%Y-%m-%d') }}
  loop:
    - "2023-11-01"

This example step should work for rvrdata-0.9.3 and later.

To correct it for earlier versions, consider the following:

- name: no_initial_item_with_test
  params:
      output: |
        {{ item|str_to_utc_str('%Y-%m-%d') if item }}
  loop:
    - "2023-11-01"

By adding if item to the statement, jinja will only evaluate the filter if item has a value. This avoids the exception on initial render.

Note that even though this case is handled in later versions of rvrdata, it is still good practice to guard against missing or incorrect data using techniques like this.
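The same guard can be extended with an else branch so that a missing value renders a visible placeholder rather than nothing. This is a sketch that assumes templates follow standard jinja inline-if semantics; the step name, the empty second item, and the fallback text are hypothetical.

```yaml
- name: item_with_fallback   # hypothetical step name
  params:
      output: |
        {{ item|str_to_utc_str('%Y-%m-%d') if item else 'no date supplied' }}
  loop:
    - "2023-11-01"
    - ""                     # empty value: the filter is skipped and the fallback text is used
  pause: 0
```

Because an empty string is falsy in jinja, the filter is only applied to the first item, and the second item should fall through to the fallback text.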