rvrsql
Access data of sql style databases
To facilitate communication with databases rvrsql
which wraps sqlalchemy
is introduced in rvrdata
as an adapter.
Examples of databases supported include: MySQL
, PostgreSQL
, SQLite
.
How it works
execute
is the only action present in rvrsql
adapter which handles database connection and execution of sql statements on the
connected database.
Example dataflow to connect to postgresql database:
name: Postgresdataflow
notes: Connect to postgres database with pg8000 driver
variables:
database_url: "postgresql+pg8000://user:password@host:port/database_name"
sql_query: "raw sql query goes here"
steps:
- name: Postgresconnect
action: rvrsql.execute
params:
database_url: {{ database_url }}
sql_statement: {{ sql_query }}
Execute Parameter Options
database_url
and sql_statement
are the mandatory parameter options which needs to be provided to execute action.
database_url parameter option
database_url:
details regarding database connection goes here.
Typical form of a database URL is:
Breakdown of database_url syntax
Dialect names include identifying name of sql style database mysql
,postgresql
,oracle
,sqlite
.
Driver is the DBAPI used to connect to database. For instance, pg8000
for postgresql and pymysql
for mysql database.
It will use username
and password
as the credentials to interact with given database
serving on host
and port
Take a look at the sqlalchemy documentation to learn more about the options available for dialects and drivers.
Note: If working with drivers other than pg8000
and pymysql
, make sure to install them in your work environment as they are not included in the package.
sql_statement parameter in execute
Raw sql statements which needs to be executed on the database has to be provided in this parameter, this field cannot be None.
Output of the execute step
Output of the step is a sqlalchemy CursorResult object, see CursorResult for methods supported and ways to extract data.
Notes about CursorResult object
As mentioned above, output of execute step is a CursorResult object and whether the object has any data or not depends on the statement that is executed in the step.For instance, INSERT
, UPDATE
and DELETE
statements does not return any rows but gives us information such as rowcount
which states the number of rows impacted by the statement. See below example dataflow:
name: InsertRecord
notes: Insert record in given table
variables:
database_url: "postgresql+pg8000://testuser:password@localhost:5432/test_db"
insert_statement: "INSERT INTO users VALUES ('first', 'last', 'Male', 'some@email.com')"
steps:
- name: ConnectandInsertRecord
action: rvrsql.execute
params:
database_url: {{ database_url }}
sql_statement: {{ insert_statement }}
cursor_method: rowcount
- name: GetInsertedrowcount
notes: console logs number of rows inserted as part of previous step sql execution
action: stdout.output
params: {{ steps.ConnectandInsertRecord.data }}
Where as SELECT
statement returns data, example dataflow:
name: Select
notes: Simple select of records from a table
variables:
database_url: postgresql+pg8000://testuser:password@localhost:5432/test_db
select_statement: "SELECT * FROM users"
steps:
- name: ConnectandSelectEntries
action: rvrsql.execute
params:
database_url: {{ database_url }}
sql_statement: {{ select_statement }}
cursor_method: all
- name: GetAllRowsSelected
action: stdout.output
params: {{ steps.ConnectandSelectEntries.data }}
NOTE
Data contained in Cursorresult
object after execution of sql statement depends on the query executed and also on the dialect used. For instance pg8000
dialect when used for insert
with list of rows
passed as second argument to the execute
will provide a list with primary key values that were inserted, but this might not be applicable to other backends/dialects.
Passwords and Environment Variables
It is recommended, database passwords should be set as environment variables and these variables has to be retrieved in dataflow