Sunday, December 6, 2009

SQL Portablility

As the blog indicates, not all topics will be MDM-centric. I will occasionally nerd-out on the most mundane topics as hands-on technical topics as SQL development (one day I'll profess my newfound love for SQL XML functions).  Which leads me to the current subject on hand - SQL Portability.  First we'll discuss the what, the why, and the how.

First what is SQL portability?  It's basically the ability to write a query that will run independently of the underlying RDBMS, provided the same underlying data structures and data exist in the database.  For instance, let's use a simple example.  Say we have two tables in an Oracle 11G database defined as:

DROP TABLE TBL_CHILD CASCADE CONSTRAINTS;
DROP TABLE TBL_PARENT CASCADE CONSTRAINTS;
CREATE TABLE TBL_PARENT
  (
    PK    NUMBER(15,0) NOT NULL,
    DESC1 VARCHAR2(30),
    DESC2 VARCHAR2(30),
    CONSTRAINT TBL_PARENT_PK PRIMARY KEY (PK)
  ) ;
INSERT INTO TBL_PARENT
  (PK,DESC1,DESC2
  ) VALUES
  (1,'FOO',NULL
  );
INSERT INTO TBL_PARENT
  (PK,DESC1,DESC2
  ) VALUES
  (2,NULL,'BAR'
  );
CREATE TABLE ESPS_S2X.TBL_CHILD
  (
    PK    NUMBER NOT NULL ENABLE,
    FK    NUMBER NOT NULL ENABLE,
    DESC1 VARCHAR2(30 BYTE),
    DESC2 VARCHAR2(30 BYTE),
    CONSTRAINT TBL_CHILD_PK PRIMARY KEY (PK),
    CONSTRAINT TBL_CHILD_TBL_PARENT_FK1 FOREIGN KEY (FK) REFERENCES ESPS_S2X.TBL_PARENT (PK) ON
  DELETE SET NULL
  ) ;
INSERT INTO TBL_CHILD
  (PK,FK,DESC1,DESC2
  ) VALUES
  (1,1,NULL,'BAR'
  );
INSERT INTO TBL_CHILD
  (PK,FK,DESC1,DESC2
  ) VALUES
  (2,2,'FOO',NULL
  );
Let's say we run the following two queries, each do the same thing:

SELECT NVL(DESC1, DESC2)  FROM tbl_parent;

SELECT COALESCE(DESC1, DESC2)  FROM tbl_parent;

Similarly,  the following queries that join the two tables produce the same result.

SELECT tbl_parent.pk,
  tbl_parent.DESC1,
  tbl_child.DESC2
FROM tbl_parent,
  tbl_child
WHERE tbl_parent.pk = tbl_child.fk;

SELECT tbl_parent.pk,
  tbl_parent.DESC1,
  tbl_child.DESC2
FROM tbl_parent
INNER JOIN tbl_child
ON tbl_parent.pk = tbl_child.fk;

However, the queries in blue will port to a DB2 database whereas the other queries would have to be rewritten to something that resembles the queries in blue in order to execute.

Secondly why would we establish SQL portability as a foundational non-functional requirement for any query that executes on a routine basis?  As long as the right result is achieved and the target response time is delivered, why is SQL portability a good thing?  I can think of 3 reasons:
  1. While rare, in some cases the RDBMS platform is changed for an application.  I worked for a company that developed an entire application on Oracle 9i only to have a strategic shift to IBM HW and SW.  This change (while necessary) invalidated the persistence layer for the entire application since the join paths were created using the Oracle-specific join syntax.  Not to mention all the canned reporting and QA queries that had to be rewritten using Ansi-approved SQL.  From that day forward, I made it a point to use ansi-approved syntax in case such a change were to happen.
  2. With OSS and SaaS, companies are looking to minimize the cost of software licenses.  Taking TCO out of the equation, the raw cost of licensing Oracle dwarfs the cost of licensing MS SQL Server or DB2.  Thus IT organizations, particularly those supporting large enterprises, will sometimes look to SQL portability as a potential exit strategy in case a software vendor like Oracle will not provide more beneficial licensing terms.  If the threat of running an application with minor refactoring at the query level is attainable on a competitior's RDBMS, this should grab the vendor's attention.
  3. As a developer, the less you depend on vendor-specific syntax, the more you train yourself for alternative RDBMS platforms.  Thus at a micro-level, you are prepared to develop queries on multiple platforms without actually having developed on the alternative platforms. 
Finally, the following guidelines can help ensure that SQL is portable from one RDBMS to another.  Using Oracle as the primary platform and DB2 as the alternative, the following standards can be enforced (manually through code reviews, perhaps automatically through code inspection tools).

  • Follow ansi-standards such as SQL-92, or ISO/IEC 9075-14:2008, the standard for defining standards for SQL/XML
  • Always use standard join syntax, and avoid vendor specific syntax such as Oracle's (+).  This would be impossible if you are supporting a version of Oracle the predates Oracle 9i.
  • Adopt standard functions.  For instance, coalesce perfoms the same function as Oracle's nvl and Oracle supports both.  However, DB2 recognizes only coalesce.  As a result, the standard should be to use coalesce.  Another example is to use CASE statements instead of using the DECODE function.
  • Perform sanity checks to ensure your elaborate or complex SQL will run on multiple platforms.  At first this may seem infrastructure-intensive and drain a DBA's valuable time.  However, with the advent of express editions of most major RDBMS, we can easily install, configure, and deploy the tables and data from one local RDBMS to another.  By running the SQL to understand what if any impact may occur in the event the RDMS sands shift beneath your feet, you'll have an understanding for the impact (if any) of such a change to your code and can communicate this to those making the decision.

Like most guidelines, there are exceptions.  SQL functions tend to be vendor-specific.  For instance Date functions and analytic functions are difficult to port.  Oracle hints are well, Oracle hints.  More advanced features like using Oracle's Total Recall "AS OF" clauses to query point in time records use vendor specific SQL clauses.    Thus there are limits to achieving 100% SQL portability.  However, when it comes to logic and overall query paths, following the simple guidelines above can provide the long-term flexibility of allowing your application to run independent of the RDBMS.