Steps show_context
Steps can show their runtime context using the show_context feature (new in rvrdata-1.1.0).
When building or testing dataflows, it can be helpful to see the variables or data available for a given step. This “runtime context” is from the in-memory namespace identified by the context reference.
The show_context output feature gives a view into the current context.
Using this feature generates a context report that can include the following:
- the variables present and their data types
- the data generated by the step
- the context references available to the step
The report is rendered to stdout (default) or can be saved to a .yml file.
show_context
On any step, show_context can be enabled in the output section similar to the following:
steps:
  - name: show_context_with_data_types
    # action, params and other settings go here.
    output:
      show_context:
        - show_vars: True
This will render the context report out to stdout without affecting the data of the step.
show_context options
The YAML below shows the available options for show_context.
steps:
  - name: show_context_options
    params:
      silent: True  # not required; shows that the step does not need to write to stdout
      output: "sample output"
    output:
      show_context:
        - ref: "steps.show_context_options"  # defaults to current step context ref
          show_ref_warnings: True|False  # defaults to False
          show_vars: True|False  # defaults to False
          show_vars_values: True|False  # defaults to True if show_vars True
          show_data: True|False  # defaults to False
          show_data_count: 1  # defaults to 1 if show_data True
          # Use 0 to show data types only.
          # Use -1 to show all data - use with care for large datasets.
          # For strings, shows the first data_count * 1000 characters.
          # For dict, shows the first characters of the string representation.
          show_step: True|False  # defaults to False
          enabled: True|False  # defaults to True, can be used as a toggle to disable output
      show_context_filename: "folder/context_file.yml"  # output to YAML file instead of stdout
Note that show_context is a list of context references.
This allows more than one reference to be included in the context report.
It can also be used to show the data for a specific context reference, such as an attribute of data.
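As a minimal sketch of a list with more than one reference, the fragment below reports on the current step and on one attribute of an earlier step's data. The step names (first_step, second_step) are hypothetical, and only options described on this page are used.

```yaml
steps:
  - name: first_step  # hypothetical step name
    # action, params and other settings go here.
  - name: second_step  # hypothetical step name
    # action, params and other settings go here.
    output:
      show_context:
        - show_vars: True                    # no ref given: reports on the current step
        - ref: "steps.first_step.data"       # data produced by the earlier step
        - ref: "steps.second_step.data"      # data of the current step, by explicit ref
```

Each list entry should appear as its own entry in the context report, so related references can be compared in a single run.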
Option details
Each option is described below.
- enabled (bool) defaults to True, implied so not required.
  - Can be used as a toggle to disable output.
- ref (str) defaults to current step context.
  - Object dot notation for the context reference in the namespace.
  - Can be used to return a specific data value.
- show_ref_warnings (bool) defaults to False.
  - Generates a log warning message if the reference is not available.
  - References not found will be reported as such in the context report but will not affect how the dataflow runs (which is why the warnings are disabled by default).
- show_step (bool) defaults to False.
  - Includes the rendered step values in the context report.
  - Useful to see the Jinja rendered at runtime.
- show_vars (bool) defaults to False.
  - Generates a list of variables with their data types.
- show_vars_values (bool) defaults to True if show_vars True.
  - Shows the string representation of the variable values.
- show_data (bool) defaults to False.
  - Care should be taken when working with large datasets. See show_data_count below.
- show_data_count (int) defaults to 1 if show_data True.
  - Set to 0 to show data types only.
  - Set to -1 to show all data (this can overwhelm stdout for large datasets - see working with large datasets).
- show_data_key (str) for dict data, performs a get using the key to get records.
  - Uses show_data_count for records found with the get key.
- show_context_filename (str) optional path to YAML file.
  - If present, the context report will be saved to the file instead of rendered to stdout.
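To illustrate how a few of these options combine, here is a hedged sketch. The step name is hypothetical, and the behavior noted in the comments is inferred from the option descriptions above rather than from a tested run.

```yaml
steps:
  - name: inspect_options  # hypothetical step name
    # action, params and other settings go here.
    output:
      show_context:
        - show_step: True            # include the rendered step values
        - show_vars: True
          show_vars_values: False    # list variables and their types, omit the values
        - show_data: True
          show_data_count: 0         # data types only
          enabled: False             # entry stays in the file but is switched off
```

Keeping a disabled entry in place with enabled: False can be handier than deleting it when the same report is needed again later.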
Context Report
The summary report generated for the context is in YAML format.
It is not included in the data of a step and does not affect the dataflow namespace; it only reports on it.
By default the report is rendered to stdout.
Use the show_context_filename option to save the report to a YAML file.
Context Report Format
The context report shows values from the in-memory namespace at the time the report is generated. This is important to remember because it reflects exactly what the step has access to. Subsequent steps in a dataflow that have not yet run will not be available. As a consequence, all results are shown by full context reference.
The following sample YAML shows the structure:
contexts:
  "full.context.ref":  # e.g. "pipelines.name.stages..."
    current:  # e.g. default, current refers to current step
      variables:
      available_context_refs:
      data:
      rendered_step:
    "ref":  # each specific ref will appear as a key
      error:  # error returned for bad reference
      type:  # data type
      values:  # Reference value
This report will appear on stdout or be saved to a file if the show_context_filename option is used.
Working with Large Datasets
Dataflows are often used to retrieve large sets of data from APIs or from files. This can easily go into the 10,000s, 100,000s or beyond.
As a result, rendering all records to stdout is not recommended, and show_data_count: -1 should be used with care.
It can overwhelm the buffer size of stdout. This can cause problems because stdout might be used in many places, each with different limits, including:
- locally to a command-line terminal,
- within stdout of docker engine,
- captured by a parent process (e.g. script.run action)
It is suggested that before exporting all data, a smaller data count be used to sample the records (e.g. 1, 5 or 10).
If all records are needed, it is recommended that the show_context_filename option be used when including all data records in a context report using show_data_count: -1.
This will save the data to a .yml file instead of rendering it to stdout.
Remember there needs to be enough available disk space for the file.
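The two-stage approach described above might look like the following sketch. The step name and file path are hypothetical, and placing show_context_filename next to show_context in the output section is an assumption based on the options example earlier on this page.

```yaml
# Stage 1: sample a few records on stdout.
steps:
  - name: get_records  # hypothetical step name
    # action, params and other settings go here.
    output:
      show_context:
        - show_data: True
          show_data_count: 5   # small sample first
```

```yaml
# Stage 2: once the sample looks right, write all records to a file.
steps:
  - name: get_records  # hypothetical step name
    # action, params and other settings go here.
    output:
      show_context:
        - show_data: True
          show_data_count: -1  # all records - avoid sending this to stdout
      show_context_filename: "folder/all_records.yml"  # hypothetical path; assumed placement
```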
Checking data values
One useful feature of show_context is the ability to introspect specific data values.
The ref option can be used to specify any object property in the namespace using dot notation.
This includes examining the .data of the current step or a different step.
If the context reference string provided does not match anything available to the step, a message noting this will be included in the report.
A WARNING log message can also be generated if show_ref_warnings is enabled.
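A minimal sketch of checking specific values follows. The step names are hypothetical; the second reference stands in for any reference that may or may not exist when the report is generated.

```yaml
steps:
  - name: check_values  # hypothetical step name
    # action, params and other settings go here.
    output:
      show_context:
        - ref: "steps.check_values.data"      # data of the current step
        - ref: "steps.earlier_step.data"      # hypothetical step; may not exist
          show_ref_warnings: True             # also log a WARNING if this ref is not found
```

Without show_ref_warnings, a missing reference is still noted in the report itself, so the option is only needed when the problem should also surface in the logs.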
Working with complex data
Not all data types can be represented as strings. And some responses have properties that are methods, not attributes. In these cases, the context report will not be able to render them.
Consider the following example:
steps:
  - name: get_cookies
    action: requests.get
    params:
      url: "https://httpbin.org/cookies"
    output:
      show_context:
        - show_data: True
        - ref: steps.get_cookies.data.text
        # Note: - ref: steps.get_cookies.data.json is a method
        - ref: steps.get_cookies.data.cookies
          show_data_count: 0  # disables show value - cookie jar does not render as a string
This example will display three sets of data for the context report:
- the current step profile
- the text value of the HTTP response
- the data type for response.cookies (to see if cookies are present or not)
For the cookies reference, show_data_count is set to 0 to disable display of the value.
This is because the cookie jar does not have a string representation, so a WARNING would be generated if showing the values was enabled.
It would also stop the render from producing nicely formatted YAML.
With the values disabled (show_data_count: 0), the context report shows “type: RequestsCookieJar” and shows the count.
show_context_runs
The show_context feature generates output after a step is run.
This enables it to show data and other results of the step.
For steps that contain a while or a loop, there is an additional feature called show_context_runs.
This shows the context at the end of each individual run.
It follows the same syntax as show_context with options provided in a list.
For example, the following dataflow runs the stdout.output action twice:
steps:
  - name: while_show_context_runs
    action: stdout.output
    params:
      output: |
        Run {{ 0 if not current else current.runs }}
    while: "{{ True if not current.data or current.runs < 2 else False }}"
    show_context_runs:
      - show_vars: True
      - ref: "current.data"
Notice the if not current in the output Jinja.
This is there because the first time the action is run, current is an empty dict.
This means that current.runs is None so nothing would be rendered (i.e. “{{ current.runs }}” would just be an empty string).
This is important to remember: current has no values until after an action runs.
Below is the equivalent example using a loop:
steps:
  - name: loop_show_context_runs
    params:
      output: |
        Run {{ item }}
    loop: [0, 1]
    show_context_runs:
      - ref: "current.data"
show_context_runs output to file
The show_context_filename option is available for show_context_runs.
This overrides the default output to stdout and dumps the YAML to the supplied file path.
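As a rough sketch, the loop example could be extended as shown below. The file path is hypothetical, and placing show_context_filename at the same level as show_context_runs is an assumption, since this page does not show where the key sits for runs.

```yaml
steps:
  - name: loop_show_context_runs
    params:
      output: |
        Run {{ item }}
    loop: [0, 1]
    show_context_runs:
      - ref: "current.data"
    show_context_filename: "folder/context_runs.yml"  # hypothetical path; assumed placement
```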