Julia Programming - Databases

Following are the four mechanisms for interfacing with a particular database system −

First method of accessing a database is by using the set of routines in an API (Application Program Interface). In this method, the DBMS will be bundled as a set of query and maintenance utilities. These utilities will communicate with the running database through a shared library which further will be exposed to the user as a set of routines in an API.
Second method is via an intermediate abstract layer. This abstract layer will communicate with the database API via a driver. Some example of such drivers are ODBC, JDBC, and Database Interface (DBI).
Third approach is to use Python module for a specific database system. PyCall package will be used to call routines in the Python module. It will also handle the interchange of datatypes between Python and Julia.
The fourth method is sending messages to the database. RESTful is the most common messaging protocol.

Julia Database APIs

Julia provides several APIs to communicate with various database providers.

MySQL

MySQL.jl is the package to access MySQL from Julia programming language.

Use the following code to install the master version of MySQL API −

Pkg.clone("https://github.com/JuliaComputing/MySQL.jl")

Example

To access MySQL API, we need to first connect to the MySQL server which can be done with the help of following code −`

using MySQL
con = mysql_connect(HOST, USER, PASSWD, DBNAME)

To work with database, use the following code snippet to create a table −

command = """CREATE TABLE Employee
         (
            ID INT NOT NULL AUTO_INCREMENT,
            Name VARCHAR(255),
            Salary FLOAT,
            JoinDate DATE,
            LastLogin DATETIME,
            LunchTime TIME,
            PRIMARY KEY (ID)
         );"""
response = mysql_query(con, command)
if (response == 0)
   println("Create table succeeded.")
else
   println("Create table failed.")
end

We can use the following command to obtain the SELECT query result as dataframe −

command = """SELECT * FROM Employee;"""
dframe = execute_query(con, command)

We can use the following command to obtain the SELECT query result as Julia Array −

command = """SELECT * FROM Employee;"""
retarr = mysql_execute_query(con, command, opformat=MYSQL_ARRAY)

We can use the following command to obtain the SELECT query result as Julia Array with each row as a tuple −

command = """SELECT * FROM Employee;"""
retarr = mysql_execute_query(con, command, opformat=MYSQL_TUPLES)

We can execute a multi query as follows −

command = """INSERT INTO Employee (Name) VALUES ('');
UPDATE Employee SET LunchTime = '15:00:00' WHERE LENGTH(Name) > 5;"""
data = mysql_execute_query(con, command)

We can get dataframes by using prepared statements as follows −

command = """SELECT * FROM Employee;"""

stmt = mysql_stmt_init(con)

if (stmt == C_NULL)
   error("Error in initialization of statement.")
end

response = mysql_stmt_prepare(stmt, command)
mysql_display_error(con, response != 0,
                  "Error occured while preparing statement for query \"$command\"")
                  
dframe = mysql_stmt_result_to_dataframe(stmt)
mysql_stmt_close(stmt)

Use the following command to close the connection −

mysql_disconnect(con)

JDBC

JDBC.jl is Julia interface to Java database drivers. The package JDBC.jl enables us the use of Java JDBC drivers to access databases from within Julia programming language.

To start working with it, we need to first add the database driver jar file to the classpath and then initialize the JVM as follows −

using JDBC
JavaCall.addClassPath("path of .jar file") # add the path of your .jar file
JDBC.init()

Example

The JDBC API in Julia is similar to Java JDBC driver. To connect with a database, we need to follow similar steps as shown below −

conn = DriverManager.getConnection("jdbc:gl:test/juliatest")
stmt = createStatement(conn)
rs = executeQuery(stmt, "select * from mytable")
   for r in rs
      println(getInt(r, 1), getString(r,"NAME"))
end

If you want to get each row as a Julia tuple, use JDBCRowIterator to iterate over the result set. Note that if the values are declared to be nullable in the database, they will be of nullable in tuples also.

for r in JDBCRowIterator(rs)
   println(r)
end

Updating the table

Use PrepareStatement to do insert and update. It has setter functions defined for different types corresponding to the getter functions −

ppstmt = prepareStatement(conn, "insert into mytable values (?, ?)")
setInt(ppstmt, 1,10)
setString(ppstmt, 2,"TEN")
executeUpdate(ppstmt)

Running stored procedures

Use CallableStatement to run the stored procedure −

cstmt = JDBC.prepareCall(conn, "CALL SYSCS_UTIL.SYSCS_SET_DATABASE_PROPERTY(?, ?)")
setString(cstmt, 1, "gl.locks.deadlockTimeout")
setString(cstmt, 2, "10")
execute(cstmt)

Metadata

In order to get an array of (column_name, column_type) tuples, we need to Pass the JResultSet object from executeQuery to getTableMetaData as follows −

conn = DriverManager.getConnection("jdbc:gl:test/juliatest")
stmt = createStatement(conn)
rs = executeQuery(stmt, "select * from mytable")
metadata = getTableMetaData(rs)

Use the following command to close the connection −

close(conn)

Executing a query

For executing a query, we need a cursor first. Once obtained a cursor you can run execute! command on the cursor as follows −

csr = cursor(conn)
execute!(csr, "insert into ptable (pvalue) values (3.14);")
execute!(csr, "select * from gltable;")

Iterating over the rows

We need to call rows on the cursor to iterate over the rows −

rs = rows(csr)
for row in rs

end

Use the following command to close the cursor call −

close(csr)

ODBC

ODBC.jl is a package which provides us a Julia ODBC API interface. It is implemented by various ODBC driver managers. We can install it as follows −

julia> Pkg.add(“ODBC”)

Installing ODBC Driver

Use the command below to install an ODBC driver −

ODBC.adddriver("name of driver", "full, absolute path to driver shared library"; kw...)

We need to pass −

The name of the driver
The full and absolute path to the driver shared library
And any additional keyword arguments which will be included as KEY=VALUE pairs in the .ini config files.

Enabling Connections

After installing the drivers, we can do the following for enabling connections −

Setup a DSN, via ODBC.adddsn("dsn name", "driver name"; kw...)
Connecting directly by using a full connection string like ODBC.Connection(connection_string)

Executing Queries

Following are two paths to execute queries −

DBInterface.execute(conn, sql, params) − It will directly execute a SQL query and after that will return a Cursor for any resultset.
stmt = DBInterface.prepare(conn, sql); DBInterface.execute(stmt, params) − It will first prepare a SQL statement and then execute. The execution can be done perhaps multiple times with different parameters.

SQLite

SQLlite is a fast, flexible delimited file reader and writer for Julia programming language. This package is registered in METADATA.jl hence can be installed by using the following command −

julia> Pkg.add("SQLite")

We will discuss two important and useful functions used in SQLite along with the example −

SQLite.DB(file::AbstractString) − This function requires the file string argument as the name of a pre-defined SQLite database to be opened. If the file does not exit, it will create a database.

Example

julia> using SQLite

julia> db = SQLite.DB("Chinook_Sqlite.sqlite")

Here we are using a sample database ‘Chinook’ available for SQLite, SQL Server, MySQL, etc.

SQLite.query(db::SQLite.DB, sql::String, values=[]) − This function returns the result, if any, after executing the prepared sql statement in the context of db.

Example

julia> SQLite.query(db, "SELECT * FROM Genre WHERE regexp('e[trs]', Name)")
6x2 ResultSet
| Row | "GenreId" |     "Name"           |
|-----|-----------|----------------------|
|   1 |     3     |     "Metal"          |
|   2 |     4     | "Alternative & Punk" |
|   3 |     6     |     "Blues"          |
|   4 |     13    |  "Heavy Metal"       |
|   5 |     23    |  "Alternative"       |
|   6 |     25    |     "Opera"          |

PostgreSQL

PostgreSQL.jl is the PostgreSQL DBI driver. It is an interface to PostgreSQL from Julia programming language. It obeys the DBI.jl protocol for working and uses the C PostgreeSQL API (libpq).

Let’s understand its usage with the help of following code −

using DBI
using PostgreSQL

conn = connect(Postgres, "localhost", "username", "password", "dbname", 5432)

stmt = prepare(conn, "SELECT 1::bigint, 2.0::double precision, 'foo'::character varying, " *
                     "'foo'::character(10);")
result = execute(stmt)
for row in result

end

finish(stmt)

disconnect(conn)

To use PostgreSQL we need to fulfill the following binary requirements −

DBI.jl
DataFrames.jl >= v0.5.7
DataArrays.jl >= v0.1.2
libpq shared library (comes with a standard PostgreSQL client installation)
julia 0.3 or higher

Hive

Hive.jl is a client for distributed SQL engine. It provides a HiveServer2, for example: Hive, Spark, SQL, Impala.

Connection

To connect to the server, we need to create an instance of the HiveSession as follows −

session = HiveSession()

It can also be connected by specifying the hostname and the port number as follows −

session = HiveSession(“localhost”,10000)

The default implementation as above will authenticates with the same user-id as that of the shell. We can override it as follows −

session = HiveSession("localhost", 10000, HiveAuthSASLPlain("uid", "pwd", "zid"))

Executing the queries

We can execute DML, DDL, SET, etc., statements as we can see in the example below −

crs = execute(session, "select * from mytable where formid < 1001";
            async=true, config=Dict())
while !isready(crs)
   println("waiting...")
   sleep(10)
end
crs = result(crs)

Other Packages

DBAPI is a new database interface proposal, inspired by Python’s DB API 2.0, that defies an abstract interface for database drivers in Julia. This module contains the following −

Abstract types
Abstract required functions which throw a NotImplementedError by default
Abstract optional functions which throw a NotSupportedError by default

To use this API, the database drivers must import this module, subtype its types, and create methods for its functions.

DBPrf is a Julia database which is maintained by JuliaDB. You see its usage below −

The user can provide input in two ways −

Command-Line mode

$ julia DBPerf.jl <Database_Driver_1.jl> <Database_Driver_2.jl> ....... <Database_Driver_N.jl> <DBMS>

Here, Database_Driver.jl can be of the following types −

ODBC.jl
JDBC.jl
PostgreSQL.jl
MySQL.jl
Mongo.jl
SQLite.jl

DBMS filed is applicable only if we are using JDBC.jl.

The database can be either Oracle or MySQL.

Example

DBPerf.jl ODBC.jl JDBC.jl MySql

Executing from Julia Prompt

julia> include("DBPerf.jl")
julia> DBPerf(<Database_Driver_1.jl>, <Database_Driver_2.jl>, ....... <Database_Driver_N.jl>, <DBMS>)

Example

DBPerf(“ODBC.jl”, “JDBC.jl”, “MySql”)