Relational database introduction with Python and SQLite
I mentioned in a previous post that I have an Excel spreadsheet containing a bunch of information about the development C code I'm working on. It is a large table showing which variables are input and output from which functions. The variable names are in the first column and the function names are in the last several columns of the first row. I use "i" or "o" to denote if a variable is an input to or an output from a particular function. I also have a few columns for the variable type and description. A simplified example is shown in the table below.
ORIGINAL TABLEname | module | type | desc | ExtSource | DoThis | CalcThis | CalcThat | ModifyStuff | ExtSink |
foo | ModuleExt | double | Description of foo | o | i | i | |||
bar | ModuleExt | double | Description of bar | o | i | i | |||
knark | Module1 | int | Description of knark | io | i | ||||
wert | Module1 | double | Description of wert | o | i | i | |||
jib | Module1 | double | Description of jib | o | i | ||||
laz | Module2 | double | Description of laz | o | o | i | |||
kew | Module2 | double | Description of kew | o | o | i |
As shown above, the table also includes the variable descriptions, type, and scope. In my real, unsimplified Excel table, I also include function-related information such as the function description, and the function prototype. This additional information makes the table very messy. Try to imagine where you would put the function description and function prototype information in the table above. I came up with a kludge solution, but it involved repeated information, and rows and columns with inconsistent meaning. I was realizing the limitations of a 2-dimensional table.
I had been interested in learning about relational databases, so this seemed like a good opportunity. Relational databases imply the SQL query language-- relational databases that use SQL are about the only kind around. They include: Oracle, Microsoft SQL Server, PostgreSQL, MySQL, and SQLite among others. I chose SQLite for my task because it is lightweight (no separate server), free, popular, and fast for small databases. (See http://www.sqlite.org/different.html for other distinctive features.) Also Python 2.5 now includes an interface module, sqlite3
, as part of the standard distribution. See http://docs.python.org/lib/module-sqlite3.html for the documentation.
Relational model design
I could create a relational database using my original table in the Excel spreadsheet. However, this organization is not in the spirit of relational database design. The [relational] approach is to create many smaller tables and relationships between those tables. For my example, I created a variable table, a function table, and a variable_function table to indicate the relationship between the variable and function tables. Though it looks more complicated than the original single table form, inputting this into a relational database, such as SQLite, will allow for more advanced manipulation of the data. It is also much more [conducive] to maintenance and expansion. For example, now there is a natural place to put the additional function-related information which was so clumsy in the original single table. To do this, I add desc and prototype columns to the function table. If I needed to add more information to about the different modules, I could create a separate module table and include the module_id as a column in the variable table.
VARIABLE TABLEid | name | module | type | desc |
1 | foo | ModuleExt | double | Description of foo |
2 | bar | ModuleExt | double | Description of bar |
3 | knark | Module1 | int | Description of knark |
4 | wert | Module1 | double | Description of wert |
5 | jib | Module1 | double | Description of jib |
6 | laz | Module2 | double | Description of laz |
7 | kew | Module2 | double | Description of kew |
FUNCTION TABLE
id | name |
1 | ExtSource |
2 | DoThis |
3 | CalcThis |
4 | CalcThat |
5 | ModifyStuff |
6 | ExtSink |
VARIABLE_FUNCTION TABLE
variable_id | function_id | type |
1 | 1 | output |
1 | 2 | input |
1 | 4 | input |
2 | 1 | output |
2 | 2 | input |
2 | 3 | input |
3 | 2 | input-output |
3 | 3 | input |
4 | 2 | output |
4 | 4 | input |
4 | 5 | input |
5 | 3 | output |
5 | 5 | input |
6 | 3 | output |
6 | 5 | output |
6 | 6 | input |
7 | 4 | output |
7 | 5 | output |
7 | 6 | input |
SQLite Implementation (w/ Python)
To implement this database, all I needed was Python 2.5. If you have an older version of Python, you can install the pysqlite module. Here is the Python code to implement the database above.
import sqlite3
# data
VARIABLES = (
(1, 'foo', 'ModuleExt', 'double', 'Description of foo'),
(2, 'bar', 'ModuleExt', 'double', 'Description of bar'),
(3, 'knark', 'Module1', 'int', 'Description of knark'),
(4, 'wert', 'Module1', 'double', 'Description of wert'),
(5, 'jib', 'Module1', 'double', 'Description of jib'),
(6, 'laz', 'Module2', 'double', 'Description of laz'),
(7, 'kew', 'Module2', 'double', 'Description of kew'),
)
FUNCTIONS = (
(1, 'ExtSource'),
(2, 'DoThis'),
(3, 'CalcThis'),
(4, 'CalcThat'),
(5, 'ModifyStuff'),
(6, 'ExtSink'),
)
VAR_FUNC = (
(1, 1, 'output'),
(1, 2, 'input'),
(1, 4, 'input'),
(2, 1, 'output'),
(2, 2, 'input'),
(2, 3, 'input'),
(3, 2, 'input-output'),
(3, 3, 'input'),
(4, 2, 'output'),
(4, 4, 'input'),
(4, 5, 'input'),
(5, 3, 'output'),
(5, 5, 'input'),
(6, 3, 'output'),
(6, 5, 'output'),
(6, 6, 'input'),
(7, 4, 'output'),
(7, 5, 'output'),
(7, 6, 'input'),
)
# get connection and cursor objects
conn = sqlite3.connect('iodatabase')
c = conn.cursor()
# create tables
c.execute('''create table variable (
id integer,
name text,
module text,
type text,
desc text
)''')
c.execute('''create table function (
id integer,
name text
)''')
c.execute('''create table var_func (
variable_id integer,
function_id integer,
type text
)''')
# fill tables with data
for row in VARIABLES:
c.execute('insert into variable values (?,?,?,?,?)', row)
for row in FUNCTIONS:
c.execute('insert into function values (?,?)', row)
for row in VAR_FUNC:
c.execute('insert into var_func values (?,?,?)', row)
Querying the Database
Now that I have created the database, I can get answers to interesting questions, such as What variables are output by CalcThis? Here is the Python/SQLite code to answer this question:
c.execute(''.join([
'SELECT variable.name, variable.module, variable.type, variable.desc ',
'FROM variable, var_func, function ',
'WHERE variable.id=var_func.variable_id ',
'AND function.id=var_func.function_id ',
'AND function.name="CalcThis" ',
'AND var_func.type="output" ',
]))
FORMAT = '%-6s%-10s%-8s%-20s'
print FORMAT % ('name', 'module', 'type', 'desc')
print '-' * 44
for row in c:
print FORMAT % row
name module type desc ---------------------------------------------- jib Module1 double Description of jib laz Module2 double Description of lazI can see this is consistent with my original table. The query works. For such a small example, the original method may seem easier, but as the number of entries grows, the benefit of the relational database grows as well. Here is another example which asks Which functions use the variable wert as an input?
c.execute(''.join([
'SELECT function.name ',
'FROM variable, var_func, function ',
'WHERE variable.id=var_func.variable_id ',
'AND function.id=var_func.function_id ',
'AND variable.name="wert" ',
'AND var_func.type="input" ',
]))
print 'name'
print '------------'
for row in c:
print '%s' % row
name --------- CalcThat ModifyStuffI would like to do even more complicated things like determine the prerequisite input variables across all functions for a given output variable. However, I still need to read more about that, so that will have to wait.
Comments
Spreadsheet? python?
You should take a look at our product:
www.resolversystems.com
jean viry-babel
jean viry-babel:
looks like a cool product-- wish it was my own.
I have a question it might be a lil bit out of topic, but do you have such an example for "playing" with an oracle db?
Thanks, Florin.