Saturday, July 14, 2007

Unit testing Relational Queries (SQL or HQL or JQL or LINQ...)

Hi!
Most applications I have built have something to do with a database (I remember that while I was on college I used to think that was not exiting stuff, I used to dream about doing neural networks stuff, logic programming in Prolog, etc) but then I met WebObjects and its Object Relational Mapper (Enterprise Objects Framework) and I got really excited about object oriented data programming... but I always had a problem... to test my object oriented queries I had to "manually" translate them into SQL and test them against the database, and only after they give me what I thought were correct result I would write them using EOF Qualifiers...
Then I met Hibernate's HQL and I realized it is much more powerful than EOF Qualifiers, but I still had to translate it to SQL to test it, I know I can get the SQL that is generated from the HQL from the debug console, and paste it in my favorite SQL editor... but even then if I found a mistake, a lot of times it was easier to tweak it in SQL and the manually translate it HQL.
Currently, there are some extensions for Eclipse (Hibernate Tools) that make this more "direct" but, what if I don't like (or don't want, or can't) use Eclipse... it would be great if someone could, for example, make a plugin for SquirrelSQL, but until then... what options do I have?

Then I learned about unit testing... and the answer came to my mind immediately: I just had to write a unit test for each of my queries. That worked fine... in the beginning... until I started having queries that returned thousands (or millions) of objects, and it wasn't such a good idea to output them to the debug console... and I had another problem... how should I write the "asserts" of query?... and how can I do it so that it doesn't make my test so slow that it becomes unusable? (I can, of course, check the results just by viewing them, but my brain is not that good to say if those 10,000 row really match with the idea I had when I wrote that HQL)

So, I started to look "what do I do" to check if an SQL query is correct, lets say for example, that I write this:

select count(*) from Address,Employee where Address.Id= Employee.AddressId and Employee.Id = 3

(Translated to English: How many address the Employee with Id = 3 has?)

Now... how do I test that? well I could add an assert after getting the result in java (or c#) like this:

assert(count>0)

But the, what happens if someone deletes the row with Id = 3 from the table Employee? That means my test will fail... or what what if someone deletes all the addresses from employee? and what if I want to test that if there are no addresses for an employee, the answer should be zero...

That is a lot of work just to test if that simple query is right... and I think that work could be done automatically:

Take a look at the SQL, it could be decomposed into:

select count(*) from (select * from Address,Employee where Address.Id= Employee.AddressId and Employee.Id = 3) as EmployeeAddresses

And then we could say, lets automatically check for the case when the resulting set is empty, and for the case when the result is not empty, to check for a case when the result is empty, we need and Employee with Addresses, so we generate:

Select * from Employee where exists(select * from Address where Address.EmployeeId = Employee.Id)

And take the Id first employee we get... and that should give us a non empty set if used in the original sql sentence that we are trying to test... after that, we automatically generate:

Select * from Employee where not exists(select * from Address where Address.EmployeeId = Employee.Id)

And take the Id first employee we get... and that should give us an empty set if used in the original sql sentence that we are trying to test...

I call this queries "inverses" of the original one, it like when one is testing a multiplication, to see if 2 x 3 = 6, just do: 6/3 = 2 and 6/2 = 3, if 2, and 3 match the operands of the multiplication, you multiplication is right. The same thing goes for SQL, one just has to find the way to "invert" it, if I could automate this inversion, the automatically generated queries would help me by telling me things that might not be immediately obvious to me when I look at the original query, and that would help me check if my original query is right.... it would me some kind of "invariants" that would help me to better understand my querying... or maybe I could even write the invariants first, and then create a query and see if it matches my invariants...

Mmmm.... maybe using a select there is another way to "invert" a query to test if it is right, using the actual inverse operation of selecting... that is "inserting", could I derive from:

select count(*) from Address,Employee where Address.Id= Employee.AddressId and Employee.Id = 3

Something like (In pseudocode):

Insert Employee;
Store Employee.Id
Run select count(*) from Address,Employee where Address.Id= Employee.AddressId and Employee.Id = EmployeeId
Assert("The answer should be zero")
Insert Address related to Employee
Run select count(*) from Address,Employee where Address.Id= Employee.AddressId and Employee.Id = EmployeeId
Assert("The answer should be one")

This has the advantage that I don't need a database with data already on it, but it has the disadvantage that takes lot of time to write an unit test like this in java, because to insert an employee, it might be necessary to:


  • Avoid breaking validation rules no related to this particular test, for example, an Employee must be related to a Department, but if the Department table is empty, then I should create a Department or I will not be able to insert an employee.
  • Avoid conflicts with validation rules directly related to this particular test, for example, what if I have an Hibernate interceptor that won't let me insert an address without 1 or more Addresses
The main problem here, I believe, is that f I insert a row leaving a not null column empty most databases won't wait until I try to commit the transaction to say "integrity violation" and rollback my changes... therefore it is impossible to write partial data just for the test that I have in front of me, but... could I automatically generate consistent inserts using as a source just the integrity rules at the database level... and the select that I want to test?
I think it can be done... the question is..
What is the algorithm to generate the inserts needed to satisfy an SQL select statement?

No comments: