loop
Loops are used to run steps repeatedly.
Under the loop keyword, an array of items can be supplied.
The step runs once for each item in the array.
pause
pause injects a delay between iterations of a step.
As a safeguard, there is a default delay of 1 second.
This avoids overwhelming downstream systems, such as APIs called by the step.
The default can be overridden by setting pause to some other number of seconds.
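As an illustration, the sketch below spaces iterations 5 seconds apart. The step name, the item values, and the 5-second figure are illustrative assumptions; the key layout mirrors the examples later in this section.

```yaml
steps:
  - name: print_items_slowly   # hypothetical step name
    params:
      output: |
        {{ item }}
    loop:
      - "itemA"
      - "itemB"
    pause: 5   # wait 5 seconds between iterations instead of the default 1
```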
Errors in loop
The default behavior when a step encounters an error is to exit the pipeline. This is true for loops as well: if an error is encountered in an iteration, the loop logs a WARNING and stops.
Sometimes, errors are expected and the desired behavior is to have the loop continue.
To enable this, set condition: True on the step (see condition).
The loop will then continue to iterate through all items in spite of errors.
Note that this means every item will be run through the loop even if it is malformed, so a misconfigured loop can generate an error for every item. It is recommended to test the loop to ensure it is configured correctly before enabling this option.
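A minimal sketch of a step that keeps iterating after a failed item, assuming condition sits at the step level next to loop, as described above. The step name and item values are hypothetical.

```yaml
steps:
  - name: print_items_tolerant   # hypothetical step name
    params:
      output: |
        {{ item }}
    loop:
      - "itemA"
      - "itemB"
    condition: True   # keep going with the remaining items if an iteration fails
```

Running a loop like this against a short list first makes it easier to tell expected errors from configuration mistakes.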
Examples
Loop with scalar values
The following YAML snippet shows how a loop can be defined:
steps:
  - name: print_items
    notes: This step prints out a list of items.
    params:
      output: |
        {{ item }}
    loop:
      - "itemA"
      - "itemB"
    pause: 0
The dataflow above would render the following:
Loop through dictionary
Normally an array of items is used for a loop. It is also possible to supply a dictionary. In this case, the dictionary is converted into an array of items, one per key.
Each item then exposes item.key and item.value.
For example:
steps:
  - name: list_values
    params:
      output: |
        {{ item.key }} is set to {{ item.value }}
    pause: 0
    loop: {"A": "value1", "B": "value2"}
would render
Loop using data from context reference
A typical pattern is to have a step read in or prepare an array of data and then loop through the data as items. This can be done using context references.
The following YAML snippet loops through records read from a .csv file:
steps:
  - name: get_items
    notes: List has header values of `col1`, `col2`, `col3`.
    action: csv.read
    params:
      filename: "simple_list.csv"
  - name: print_items
    notes: This step prints out a list of items.
    params:
      output: |
        {{ item.col1 }}
    loop__ref: steps.get_items.data
    pause: 0
Loop items not available on initial render
This behavior changed in rvrdata-0.9.3 and later.
Dataflow steps may contain Jinja statements that reference the item variable.
This variable is not set until the loop data is retrieved.
Under some circumstances, this means that statements that expect item to have a value should first test whether item is None.
For example, the following would trigger an exception for rvrdata-0.9.2 and earlier:
- name: no_initial_item
  params:
    output: |
      {{ item|str_to_utc_str('%Y-%m-%d') }}
  loop:
    - "2023-11-01"
This example step should work for rvrdata-0.9.3 and later.
To correct it for earlier versions, consider the following:
- name: no_initial_item_with_test
  params:
    output: |
      {{ item|str_to_utc_str('%Y-%m-%d') if item }}
  loop:
    - "2023-11-01"
With if item added to the statement, Jinja evaluates the filter only when item has a value.
This avoids the exception on initial render.
Even though later versions of rvrdata handle this case, it is still good practice to guard against missing or incorrect data using techniques like this.
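The same guard can also supply an explicit fallback. The sketch below uses Jinja's inline if/else expression, so the filter is only applied when item has a value. The step name and the empty-string fallback are illustrative choices, and str_to_utc_str is the filter from the example above.

```yaml
- name: no_initial_item_with_fallback   # hypothetical step name
  params:
    output: |
      {{ item|str_to_utc_str('%Y-%m-%d') if item else '' }}
  loop:
    - "2023-11-01"
```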