Steps show_context

Steps can show their runtime context using the show_context feature (new in rvrdata-1.1.0).

When building or testing dataflows, it can be helpful to see the variables or data available to a given step. This “runtime context” comes from the in-memory namespace identified by the context reference.

The show_context output feature gives a view into the current context.

Using this feature generates a context report that can include the following:

the variables present and their data types
the data generated by the step
the context references available to the step

The report is rendered to stdout (default) or can be saved to a .yml file.

show_context

On any step, show_context can be enabled in the output section similar to the following:

steps:
  - name: show_context_with_data_types
    # action, params and other settings go here.
    output:
      show_context:
        - show_vars: True

This will render the context report out to stdout without affecting the data of the step.

show_context options

The YAML below shows the available options for show_context.

steps:
  - name: show_context_options
    params:
      silent: True  # optional; included to show that the step's own stdout output is not needed
      output: "sample output"
    output:
      show_context:
        - ref: "steps.show_context_options"  # defaults to current step context ref
          show_ref_warnings: True|False  # defaults to False
          show_vars: True|False  # defaults to False
          show_vars_values: True|False  # defaults to True if show_vars True
          show_data: True|False  # defaults to False
          show_data_count: 1  # defaults to 1 if show_data True
            # Use 0 to show data types only.
            # Use -1 to show all data - use with care for large datasets.
            # For strings, shows the first data_count * 1000 characters.
            # For dict, shows the first characters of the string representation.
          show_step: True|False  # defaults to False
          enabled: True|False  # defaults to True, can be used as a toggle to disable output
      show_context_filename: "folder/context_file.yml"  # output to YAML file instead of stdout

Note that show_context is a list of context references. This allows more than one reference to be included in the context report. It can also be used to show the data for a specific context reference, such as an attribute of data.
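
As a minimal sketch of listing more than one reference, the fragment below reports on the current step and on the data of an earlier step. The step names get_users and load_config are hypothetical and only serve as an illustration; the options themselves are the ones listed above.

steps:
  - name: get_users
    # action, params and other settings go here.
    output:
      show_context:
        - show_vars: True  # first entry: ref omitted, so the current step context is used
        - ref: "steps.load_config.data"  # second entry: data of a hypothetical earlier step

Each list entry is reported under its own key in the context report, so the two references can be read side by side.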

Option details

Each option is described below.

enabled (bool) defaults to True, implied so not required.
Can be used as a toggle to disable output.
ref (str) defaults to current step context.
Object dot notation for the context reference in the namespace.
Can be used to return a specific data value.
show_ref_warnings (bool) defaults to False.
Generates a log warning message if the reference is not available.
References that are not found are reported as such in the context report but do not affect how the dataflow runs, which is why the warnings are disabled by default.
show_step (bool) defaults to False.
Includes the rendered step values in the context report.
Useful to see the jinja rendered at runtime.
show_vars (bool) defaults to False.
Generates a list of variables with their data types.
show_vars_values (bool) defaults to True if show_vars True.
Shows the string representation of the variable values.
show_data (bool) defaults to False.
Includes the data generated by the step in the context report.
Care should be taken when working with large datasets. See show_data_count below.
show_data_count (int) defaults to 1 if show_data True.
Set to 0 to show data types only.
Set to -1 to show all data (this can overwhelm stdout for large datasets; see Working with Large Datasets).
show_data_key (str) for dict data, performs a get using the key to get records.
Uses show_data_count for records found with the get key.
show_context_filename (str) optional path to YAML file.
If present, the context report is saved to the file instead of being rendered to stdout.
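
The fragment below is a rough sketch of how show_data_key might be combined with show_data_count. It assumes that show_data_key sits alongside the other per-reference options, which this page does not show explicitly, and the key name "records" is hypothetical.

steps:
  - name: show_context_with_data_key
    # action, params and other settings go here.
    output:
      show_context:
        - show_data: True
          show_data_key: "records"  # hypothetical key in the step's dict data
          show_data_count: 5  # applies to the records found under the key
          enabled: True  # set to False to switch this entry off without deleting it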

Context Report

The summary report generated for the context is in YAML format. It is not included in the data of a step and does not affect the dataflow namespace; it only reports on it.

By default the report is rendered to stdout. Use the filename option to save the report to a YAML file.

Context Report Format

The context report shows values from the in-memory namespace at the time the report is generated. This is important to remember because it reflects what the step has access to. Later steps in the dataflow that have not yet run will not be available. As a consequence, all results are shown by full context reference.

The following sample YAML shows the structure:

contexts:
  "full.context.ref":  # e.g. "pipelines.name.stages..."
    current:  # e.g. default, current refers to current step
      variables:
      available_context_refs:
      data:
      rendered_step:
    "ref":  # each specific ref will appear as a key
      error:  # error returned for bad reference
      type:  # data type
      values:  # Reference value

This report will appear on stdout or be saved to a file if the filename option is used.

Working with Large Datasets

Dataflows are often used to retrieve large sets of data from APIs or from files. These can easily reach tens or hundreds of thousands of records, or beyond.

As a result, rendering all records to stdout is not recommended, and show_data_count: -1 should be used with care.

Doing so can overwhelm the stdout buffer. This can cause a range of issues because stdout may be consumed in many places, each with different limits, including:

locally to a command-line terminal,
within stdout of docker engine,
captured by a parent process (e.g. script.run action)

Before exporting all data, it is suggested to sample the records with a smaller data count (e.g. 1, 5 or 10).

If all records are needed, it is recommended to combine show_data_count: -1 with the show_context_filename option. This saves the report to a .yml file instead of rendering it to stdout. Remember that there needs to be enough available disk space for the file.
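
A minimal sketch of this sample-first approach, using only the options described on this page, might look like the following. The step name sample_then_export is hypothetical and the file path is the placeholder used earlier.

steps:
  - name: sample_then_export
    # action, params and other settings go here.
    output:
      show_context:
        - show_data: True
          show_data_count: 5  # sample a few records first
          # show_data_count: -1  # switch to this for all records once the sample looks right
      show_context_filename: "folder/context_file.yml"  # write the report to a file instead of stdout

Keeping the file option in place while sampling means that the later switch to -1 does not send the full dataset to stdout.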

Checking data values

One useful feature of show_context is the ability to introspect specific data values. The ref option can be used to specify any object property in the namespace using dot notation.

This includes examining the .data of the current step or a different step.

If the context reference string provided does not match anything available to the step, a message noting this will be included in the report. A WARNING log message can also be generated if show_ref_warnings is enabled.
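
As an illustrative sketch, the fragment below inspects the data of another step and also includes a reference that is expected to be missing. The step names check_previous_data, fetch_records and not_a_step are hypothetical.

steps:
  - name: check_previous_data
    # action, params and other settings go here.
    output:
      show_context:
        - ref: "steps.fetch_records.data"  # data of a hypothetical earlier step
        - ref: "steps.not_a_step.data"  # hypothetical reference that does not exist
          show_ref_warnings: True  # also log a WARNING for the missing reference

The missing reference is noted in the context report, and the dataflow itself runs as it would without it.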

Working with complex data

Not all data types can be represented as strings, and some responses have properties that are methods rather than attributes. In these cases, the context report cannot render them.

Consider the following example:

steps:
  - name: get_cookies
    action: requests.get
    params:
      url: "https://httpbin.org/cookies"
    output:
      show_context:
        - show_data: True
        - ref: steps.get_cookies.data.text
        # Note: steps.get_cookies.data.json is a method, not an attribute, so it is not used as a ref here
        - ref: steps.get_cookies.data.cookies
          show_data_count: 0  # disables show value - cookie jar does not render as a string

This example will display three sets of data for the context report:

the current step profile
the text value of the HTTP response
the data type for the response.cookies (to see if cookies are present or not)

For the cookies reference, show_data_count is set to 0 to disable output of the value. This is because cookies do not have a string representation, so a WARNING would be generated if showing the values was enabled. It would also stop the render from producing nicely formatted YAML.

With value display disabled (show_data_count: 0), the context report shows “type: RequestsCookieJar” and shows the count.

show_context_runs

The show_context feature generates output after a step is run. This enables it to show data and other results of the step.

For steps that contain a while or a loop, there is an additional feature called show_context_runs. This shows the context at the end of each individual run.

It follows the same syntax as show_context with options provided in a list.

For example, the following dataflow runs the stdout.output action twice:

steps:
  - name: while_show_context_runs
    action: stdout.output
    params:
      output: |
        Run {{ 0 if not current else current.runs }}
    while: "{{ True if not current.data or current.runs < 2 else False }}"
    show_context_runs:
      - show_vars: True
      - ref: "current.data"

Notice the "if not current" check in the output jinja. It is there because the first time the action runs, current is an empty dict. This means that current.runs is None, so nothing would be rendered (i.e. “{{ current.runs }}” would just be an empty string).

This is important to remember:

current has no values until after an action runs.

Below is the equivalent example using a loop:

steps:
  - name: loop_show_context_runs
    action: stdout.output
    params:
      output: |
        Run {{ item }}
    loop: [0, 1]
    show_context_runs:
      - ref: "current.data"

show_context_runs output to file

The show_context_filename option is available for show_context_runs. This overrides the default output to stdout and dumps the YAML to the supplied file path.