Skip to content

rvrsql

Access data of sql style databases

To facilitate communication with databases rvrsql which wraps sqlalchemy is introduced in rvrdata as an adapter. Examples of databases supported include: MySQL, PostgreSQL, SQLite.

How it works

execute is the only action present in rvrsql adapter which handles database connection and execution of sql statements on the connected database.

Example dataflow to connect to postgresql database:

name: Postgresdataflow
notes: Connect to postgres database with pg8000 driver
variables:
  database_url: "postgresql+pg8000://user:password@host:port/database_name"
  sql_query: "raw sql query goes here"
steps:
  - name: Postgresconnect
    action: rvrsql.execute
    params:
      database_url: {{ database_url }}
      sql_statement: {{ sql_query }}

Execute Parameter Options

database_url and sql_statement are the mandatory parameter options which needs to be provided to execute action.

database_url parameter option

database_url: details regarding database connection goes here. Typical form of a database URL is:

dialect+driver://username:password@host:port/database

Breakdown of database_url syntax

Dialect names include identifying name of sql style database mysql,postgresql,oracle,sqlite. Driver is the DBAPI used to connect to database. For instance, pg8000 for postgresql and pymysql for mysql database. It will use username and password as the credentials to interact with given database serving on host and port

Take a look at the sqlalchemy documentation to learn more about the options available for dialects and drivers.

Note: If working with drivers other than pg8000 and pymysql, make sure to install them in your work environment as they are not included in the package.

sql_statement parameter in execute

Raw sql statements which needs to be executed on the database has to be provided in this parameter, this field cannot be None.

Output of the execute step

Output of the step is a sqlalchemy CursorResult object, see CursorResult for methods supported and ways to extract data.

Notes about CursorResult object

As mentioned above, output of execute step is a CursorResult object and whether the object has any data or not depends on the statement that is executed in the step.For instance, INSERT, UPDATE and DELETE statements does not return any rows but gives us information such as rowcount which states the number of rows impacted by the statement. See below example dataflow:

name: InsertRecord
notes: Insert record in given table
variables:
  database_url: "postgresql+pg8000://testuser:password@localhost:5432/test_db"
  insert_statement: "INSERT INTO users VALUES ('first', 'last', 'Male', 'some@email.com')"
steps:
  - name: ConnectandInsertRecord
    action: rvrsql.execute
    params:
      database_url: {{ database_url }}
      sql_statement: {{ insert_statement }}
      cursor_method: rowcount
  - name: GetInsertedrowcount
    notes: console logs number of rows inserted as part of previous step sql execution
    action: stdout.output
    params: {{ steps.ConnectandInsertRecord.data }}

Where as SELECT statement returns data, example dataflow:

name: Select
notes: Simple select of records from a table
variables:
  database_url: postgresql+pg8000://testuser:password@localhost:5432/test_db
  select_statement: "SELECT * FROM users"
steps:
  - name: ConnectandSelectEntries
    action: rvrsql.execute
    params:
      database_url: {{ database_url }}
      sql_statement: {{ select_statement }}
      cursor_method: all
  - name: GetAllRowsSelected
    action: stdout.output
    params: {{ steps.ConnectandSelectEntries.data }}

NOTE

Data contained in Cursorresult object after execution of sql statement depends on the query executed and also on the dialect used. For instance pg8000 dialect when used for insert with list of rows passed as second argument to the execute will provide a list with primary key values that were inserted, but this might not be applicable to other backends/dialects.

Passwords and Environment Variables

It is recommended, database passwords should be set as environment variables and these variables has to be retrieved in dataflow