Monday, June 11, 2007

SOA: Transactional Boundaries, The Paper / Computer Document Impedance Mismatch

Here I am... again, facing the exact same problem... this is becoming repetitive... I have to build an OLTP system... what is an OLTP system?

Online Transaction Processing (or OLTP) is a class of programs that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing.

And right there, in the Wikipedia article about OLTP, it is possible to read about the Paper/Computer Impedance Mismatch (and here I was, feeling like I had discovered something): "The term Online Transaction Processing is somewhat ambiguous: some understand 'transaction' as a reference to computer or database transactions, while others (such as the Transaction Processing Performance Council) define it in terms of business or commercial transactions."

Well, IMO, that is just one part (a very important part) of the P/C impedance mismatch... when we talk about a transaction-oriented computer system, what are we talking about? About computer SQL X/Open DTP transactions? Or about business transactions? Some people believe they are the same thing... Some other people have never realized the difference between paper and computer documents, and they ask: why was I able to do X on paper, when now the computer says it is invalid? To explain it... let's go back to the origins, before computer documents...

Let's say you have to register a fine against someone (someone broke the law, and now they have to pay some money as punishment). So you start writing down the document: "a fine of 8 gold coins for..." Suddenly you feel an urge to go to the latrine... you stand up and run... you have left the paper document incomplete: it doesn't say what offence originated it, it doesn't say who has to pay the 8 gold coins, and it doesn't say by what authority that person is obliged to pay... but there is nothing to worry about, it is already in ink on the paper. It doesn't matter if you have a really bad digestion problem and can't continue for 2 months: when you return to the paper sheet, it will still say "a fine of 8 gold coins for...". That is persistence... exactly the same persistence you would be after if you were saving the fine in a modern AJAX-based system. But if you were using that modern system, when you returned after 2 months you would find that "your session has timed out" and that you have lost the amount you had written in the "amount" field (or maybe someone else had to use the computer and closed your session, or needed the power outlet and unplugged your computer...).
So, with paper, you have to "actively" want to revert persistence (by destroying the piece of paper), but with electronic persistence, losing information is a lot easier: just close the current window without hitting save... and it is lost... and it can get lost even "after" hitting save... because now we have created a new enemy for persistence: "data integrity"... we have more problems than when we only had to fight to get stuff written down on paper...
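One way to picture the two meanings of "transaction" in the fine example: a minimal sketch, assuming Python's built-in `sqlite3` module and a hypothetical `fine_draft` table with nullable columns. Each field is committed in its own short database transaction, while the business transaction (registering one fine) stays open, as an incomplete row, for as long as it takes. The helper `save_field` is hypothetical, not part of any framework.

```python
import sqlite3

# In-memory database; the fine_draft table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fine_draft ("
    " id INTEGER PRIMARY KEY,"
    " amount INTEGER,"  # nullable: the draft is allowed to be incomplete
    " reason TEXT,"
    " payer TEXT)"
)

EDITABLE = ("amount", "reason", "payer")

def save_field(conn, draft_id, column, value):
    """Persist a single field in its own short database transaction."""
    if column not in EDITABLE:  # whitelist, because the column name is formatted into the SQL
        raise ValueError(f"unknown column: {column}")
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(f"UPDATE fine_draft SET {column} = ? WHERE id = ?", (value, draft_id))

# Day 1: the business transaction starts; only the amount is known.
with conn:
    draft_id = conn.execute("INSERT INTO fine_draft (amount) VALUES (?)", (8,)).lastrowid

# Two months later: the rest arrives, each piece committed separately.
save_field(conn, draft_id, "reason", "selling a controlled substance")
save_field(conn, draft_id, "payer", "Jane Doe")

row = conn.execute(
    "SELECT amount, reason, payer FROM fine_draft WHERE id = ?", (draft_id,)
).fetchone()
print(row)  # (8, 'selling a controlled substance', 'Jane Doe')
```

Three database commits, one business transaction: the database never holds anything "uncommitted", yet the fine is not finished until its last field arrives. Deciding where one kind of boundary ends and the other begins is the design question this sketch only hints at.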

Why do I see integrity as an enemy of persistence? It is pretty simple, really... with the new computer-based OLTP systems, it is impossible to save "data without integrity" (corrupted data?), so unless you know all the facts precisely, you can't write anything down. It doesn't matter that you know the fine is about selling a controlled substance, that it was issued by the DEA, and that it was signed by "John Smith": if you don't know who has to pay it (Jane Doe, for example), every time you hit the "save" button the system is going to answer, in a very nice and polite way (if well designed), "I cannot save that fine, because the 'First Name' and 'Last Name' of the person that has to pay it are mandatory fields". Well, you say, just let me register the information I have "now", and after I get the missing data I will come back and add it... but the system won't hear your begging, and it will not save your data until it has "integrity"... Before you had to use this new software system, you were able to write this information down on a piece of paper, and it didn't matter whether you had the full name of the person or not. Those people in the IT department are going to hear from you, and they will have to change that stubborn attitude...
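The "stubborn" behaviour described above is what a plain `NOT NULL` constraint produces. A minimal sketch, again assuming Python's `sqlite3` module; the `fine` table, its columns and the `try_save` helper are hypothetical illustrations, not the system the post is about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema mirroring the "stubborn" system: every field is NOT NULL.
conn.execute(
    "CREATE TABLE fine ("
    " id INTEGER PRIMARY KEY,"
    " reason TEXT NOT NULL,"
    " authority TEXT NOT NULL,"
    " signed_by TEXT NOT NULL,"
    " payer_first_name TEXT NOT NULL,"
    " payer_last_name TEXT NOT NULL)"
)

def try_save(conn, **values):
    """Insert a fine; return None on success or the database's error message."""
    # Column names come from the caller's keyword arguments, not from user input.
    columns = ", ".join(values)
    placeholders = ", ".join("?" for _ in values)
    try:
        with conn:  # the failed INSERT is rolled back, so nothing partial is stored
            conn.execute(f"INSERT INTO fine ({columns}) VALUES ({placeholders})",
                         tuple(values.values()))
    except sqlite3.IntegrityError as exc:
        return str(exc)
    return None

# Everything is known except who has to pay.
error = try_save(conn, reason="selling a controlled substance",
                 authority="DEA", signed_by="John Smith")
print(error)  # prints the IntegrityError message raised by SQLite
```

Note that the rejection is all-or-nothing: the three facts that *were* known are not kept anywhere, which is exactly the loss the paper sheet never suffered.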

Well, it turns out that sometimes integrity is needed... what can you do with a fine without a name? Suppose you have to manage another 100,000 similar documents, that you don't know the name of the person that has to pay for half of them, that on some other thousands you don't know how much has to be paid... or why... and that on others you don't know any of this (you just know someone has to pay something because of some unknown law, and that by adding up all the fines in February you will collect 50,000 dollars, because the federal government told you so). Now you are in trouble: you have to start defining the minimal information needed to describe a fine... you have to draw a line between "useful information with integrity" and "vague corrupted stuff", or you will start losing track of what is happening with every document (what makes a document a document? how much can you alter it before it becomes another document?).
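Drawing that line can be made explicit with two thresholds instead of one. A minimal sketch, assuming fines are plain dictionaries; the field lists `REQUIRED_FOR_COMPLETE` and `REQUIRED_FOR_DRAFT` are hypothetical (in particular, treating the issuing authority as the bare minimum is only an illustrative assumption, since where to draw the line is a business decision).

```python
# Hypothetical thresholds: which fields are needed is a business decision, not a technical one.
REQUIRED_FOR_COMPLETE = ("amount", "reason", "authority", "payer")
REQUIRED_FOR_DRAFT = ("authority",)  # assumption: at minimum, who issued the fine

def is_known(fine: dict, field: str) -> bool:
    """A field counts as known if it is present and not empty."""
    return fine.get(field) not in (None, "")

def classify(fine: dict) -> str:
    """Return 'complete', 'draft' or 'rejected' for a fine stored as a dict."""
    if all(is_known(fine, f) for f in REQUIRED_FOR_COMPLETE):
        return "complete"
    if all(is_known(fine, f) for f in REQUIRED_FOR_DRAFT):
        return "draft"  # incomplete, but enough to track as a document in progress
    return "rejected"  # too vague to be tracked as a document at all

print(classify({"amount": 8, "reason": "selling a controlled substance",
                "authority": "DEA", "payer": "Jane Doe"}))  # complete
print(classify({"authority": "DEA", "reason": "selling a controlled substance"}))  # draft
print(classify({"amount": 8}))  # rejected
```

The middle tier is what the paper sheet gave for free: a place where an incomplete fine can live without being mistaken for a finished one.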

Suddenly you realize you can ask the computer to classify the documents and create two lists (that is one of the things computers do really well with structured information), so you ask the computer to create a list of "complete documents with integrity" and another of "incomplete documents", and you say "problem solved: I have 100,000 documents, and of those, 45,000 have integrity (full documents); all the others are work in progress". But after a few days you realize you now have only 44,985 full documents... someone has been erasing data from the documents because he got a bribe... and unless you have a backup from the previous week, you can't know which documents were corrupted. So it turns out that an already "integral" document can go back to being "corrupted" really easily... and, for the documents that were incomplete from the beginning, you can't even tell whether the original information really was incomplete... or whether the person entering it into the system is doing so in an incomplete form... intentionally...
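The two lists, and the reason last week's backup matters, can be sketched in a few lines. This assumes documents are held as dictionaries keyed by a document id, with a hypothetical completeness rule `REQUIRED`; `partition` and `regressions` are illustrative helpers, not an existing API.

```python
from typing import Dict, List, Set, Tuple

REQUIRED = ("amount", "reason", "authority", "payer")  # hypothetical completeness rule

def is_complete(fine: dict) -> bool:
    return all(fine.get(f) not in (None, "") for f in REQUIRED)

def partition(fines: Dict[int, dict]) -> Tuple[List[int], List[int]]:
    """Split document ids into (complete, incomplete) lists."""
    complete = [doc_id for doc_id, fine in fines.items() if is_complete(fine)]
    incomplete = [doc_id for doc_id, fine in fines.items() if not is_complete(fine)]
    return complete, incomplete

def regressions(last_week: Dict[int, dict], today: Dict[int, dict]) -> Set[int]:
    """Ids complete in the old snapshot but not complete (or missing) today."""
    old_complete = {doc_id for doc_id, fine in last_week.items() if is_complete(fine)}
    new_complete = {doc_id for doc_id, fine in today.items() if is_complete(fine)}
    return old_complete - new_complete

last_week = {
    1: {"amount": 8, "reason": "x", "authority": "DEA", "payer": "Jane Doe"},
    2: {"authority": "DEA"},
}
today = {
    1: {"amount": 8, "reason": "x", "authority": "DEA", "payer": ""},  # payer erased
    2: {"authority": "DEA"},
}
print(partition(today))               # ([], [1, 2])
print(regressions(last_week, today))  # {1}
```

Looking at `today` alone, documents 1 and 2 are indistinguishable "work in progress"; only the comparison against the older snapshot reveals that document 1 lost data it once had.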