Mix and Match Databases: Dealing with Data Types

Moving between databases is hard enough; try using multiple databases in the same application and you might start thinking you’ve gone insane. Different application demands for accessibility, redundancy, backwards compatibility, or interoperability make this a real possibility in the modern data center. One of the biggest challenges of running a heterogeneous database environment is dealing with a world of data type differences. There are two main ways to work through this situation:

  1. Using a subset of data types.
  2. Creating custom data type mappings.

To make comparisons easier, I’m going to focus on SQL Server, PostgreSQL, and Azure Table Services.

Using a Subset of Data Types

The ANSI standard defines a number of data types that should be supported by database vendors but, as with all standards, there’s no guarantee that vendors will support all data types or even support them equally. The SQL standard defines the following data types: bigint, bit, bit varying, boolean, character (char), character varying (varchar), date, double precision, integer, interval, numeric, decimal, real, smallint, time (with or without time zone), timestamp (with or without time zone), and xml (1).

As an example of the differences between the ANSI standard and vendor implementations, the standard defines TIMESTAMP as a date and time with an optional time zone, whereas SQL Server defines TIMESTAMP as an arbitrary, automatically incrementing, unique binary number that has nothing to do with dates or times.

Taking a look around, it’s easy to see that there are major differences between databases. An easy way to resolve this problem is to use only a small subset of the available data types. This choice seems especially attractive when we’re working with a language that doesn’t support rich data types: some languages only support a limited number of data types (C provides characters, numeric types, arrays, and custom structs), while more advanced languages provide rich type systems.

Comparing our database solutions, Azure Table Services supports a deliberately constrained set of data types in its data model. While rich type systems are valuable, the Table Services data model provides everything needed to build complex data structures. The simple data model also makes it easy to expose Azure Table Services data as ATOM feeds that can be consumed by other applications. By opting for simplicity, Table Services makes it possible to communicate with a wide variety of technologies, regardless of platform.
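
To make that constraint concrete, here’s a minimal sketch in Python of what working against a constrained set of data types looks like. The property types in the comments are the ones Table Services exposes; the validate_entity helper and the sample entity are hypothetical, not part of any Azure SDK.

    import datetime
    import uuid

    # Python types that map cleanly onto Table Services property types.
    SUPPORTED_TYPES = (
        bytes,              # Edm.Binary
        bool,               # Edm.Boolean
        datetime.datetime,  # Edm.DateTime
        float,              # Edm.Double
        uuid.UUID,          # Edm.Guid
        int,                # Edm.Int32 / Edm.Int64
        str,                # Edm.String
    )

    def validate_entity(entity):
        """Reject any property whose type falls outside the supported set."""
        for name, value in entity.items():
            if not isinstance(value, SUPPORTED_TYPES):
                raise TypeError(f"{name} is a {type(value).__name__}; "
                                "flatten it to one of the supported types first")
        return entity

    order = validate_entity({
        "PartitionKey": "orders",
        "RowKey": "2011-0001",
        "PlacedAt": datetime.datetime.now(datetime.timezone.utc),
        "Total": 41.95,
        "Shipped": False,
    })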

The downside of restricting an application to a limited set of data types is that it may become very difficult to store certain data in the database without resorting to writing custom serialization mechanisms. Custom serialization mechanisms make it impossible for users to reliably report on our data without intimate knowledge of how the data has been stored.
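
As a hedged illustration of that trap, the Python sketch below flattens a rich value into a string the way a custom serializer would. The column contents and JSON layout are made up for the example; the point is that a reporting tool sees only an opaque string unless it knows the encoding.

    import ipaddress
    import json

    client_network = ipaddress.ip_network("192.168.10.0/24")

    # What ends up in a plain string column once we serialize it ourselves.
    stored_value = json.dumps({"kind": "inet", "value": str(client_network)})

    # A reporting tool sees only this text; without knowing the JSON layout
    # it can't filter on the netmask or group by subnet.
    print(stored_value)  # {"kind": "inet", "value": "192.168.10.0/24"}

    # Reconstituting the value requires the same intimate knowledge in reverse.
    decoded = json.loads(stored_value)
    assert ipaddress.ip_network(decoded["value"]) == client_network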

Compare the supported Azure Table Services data types with SQL Server 2008 R2’s data types and PostgreSQL’s data types. There’s some overlap, but not a lot. Limiting your application to a subset of data types is really nothing more than limiting it to a subset of data that it can accurately store, model, and manipulate. Everything else has to be shoehorned in through custom serialization or left out entirely.
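
One rough way to see how small that overlap is: intersect the type names from each platform. The lists below are partial and purely illustrative (and synonyms like int/integer or guid/uniqueidentifier/uuid aren’t reconciled), but even so the common ground is thin.

    azure_table_services = {"binary", "boolean", "datetime", "double",
                            "guid", "int32", "int64", "string"}

    sql_server = {"bigint", "int", "smallint", "bit", "decimal", "float",
                  "datetime", "date", "time", "varchar", "nvarchar",
                  "varbinary", "uniqueidentifier", "xml", "geography"}

    postgresql = {"bigint", "integer", "smallint", "boolean", "numeric",
                  "double precision", "timestamp", "date", "time", "varchar",
                  "text", "bytea", "uuid", "xml", "inet", "interval"}

    # Literal name matches only -- the overlap shrinks further once you start
    # comparing ranges, precision, and storage behavior.
    print(sorted(sql_server & postgresql))
    # ['bigint', 'date', 'smallint', 'time', 'varchar', 'xml']
    print(sorted(azure_table_services & sql_server))
    # ['datetime']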

Custom Data Type Mappings

Let’s assume we have an application that is built using PostgreSQL as the primary OLTP back end. We can expose a lot of our functionality through our cloud services as simple integers and strings, but there are some things that aren’t guaranteed to work well when we move across different OLTP platforms. We can’t always map data types – how does inet map to SQL Server or Azure Table Services? There’s no immediately apparent way to map the inet data type to any other data type.

Clearly, custom data type mappings are not for the faint of heart. Decisions have to be made about gracefully degrading data types between databases so they can be safely reported on and reconstituted in the future. Depending on the application, inet could be stored as Edm.String in Azure Table Services or as a VARCHAR(16) in SQL Server (which only works if we’re ignoring IPv6 addresses and the netmask).
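
Here’s a minimal sketch of one possible degradation policy for inet, assuming the value arrives from PostgreSQL in its text form (for example '10.1.2.3/32'). The function names and the max_length knob are hypothetical choices; nothing here comes from a driver or SDK.

    import ipaddress

    def degrade_inet(value, max_length=None):
        """Flatten an inet value to a plain string for a platform without it."""
        iface = ipaddress.ip_interface(value)  # accepts address or address/mask
        text = str(iface)
        if max_length is not None and len(text) > max_length:
            # VARCHAR(16)-style targets: keep only the bare address and accept
            # that the netmask (and any IPv6 value) is lost.
            text = str(iface.ip)
            if len(text) > max_length:
                raise ValueError(f"{value} cannot fit in {max_length} characters")
        return text

    def reconstitute_inet(text):
        """Parse the stored string back into something inet-like."""
        return ipaddress.ip_interface(text)

    print(degrade_inet("192.168.100.242/24"))                 # 192.168.100.242/24
    print(degrade_inet("192.168.100.242/24", max_length=16))  # 192.168.100.242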

If this sounds like a recipe for confusion and disaster, you might be on to something. Using custom data type mappings across different databases can create confusion and requires custom documentation, but there is hope.

Applications using the database only need to know about the data types that are in the database. Reporting databases can be designed to work with business users’ reporting tools. As long as the data type mappings do not change, it’s easy enough to keep the reporting databases up to date through automated data movement scripts.
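
A sketch of what such a data movement script might look like in Python: a fixed mapping table drives how each PostgreSQL column is rewritten on its way to the reporting database. The column names and target types here are invented for the example; the mapping itself is a policy decision, not something any tool dictates.

    import datetime
    import uuid

    TYPE_MAPPING = {
        # postgres type: (reporting database type, converter applied per value)
        "inet":     ("varchar(45)", str),                          # room for IPv6 text
        "uuid":     ("uniqueidentifier", str),
        "interval": ("bigint", lambda v: int(v.total_seconds())),  # whole seconds
        "boolean":  ("bit", int),
    }

    def convert_row(row, column_types):
        """Apply the agreed-upon conversions to one row before loading it."""
        converted = {}
        for column, value in row.items():
            pg_type = column_types[column]
            if pg_type in TYPE_MAPPING and value is not None:
                _, converter = TYPE_MAPPING[pg_type]
                value = converter(value)
            converted[column] = value
        return converted

    # Example usage with a pretend row pulled from an OLTP table.
    row = {"session_id": uuid.uuid4(),
           "client_addr": "10.0.0.8/32",
           "duration": datetime.timedelta(minutes=5),
           "is_active": True}
    types = {"session_id": "uuid", "client_addr": "inet",
             "duration": "interval", "is_active": "boolean"}
    print(convert_row(row, types))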

What Can You Do?

There’s a lot to keep in mind when you’re planning to deploy an application across multiple databases. Understanding how different databases handle different data types can ease the pain of querying data in multiple databases. There’s no reason to limit your application to one database; just be aware that there are differences between platforms that need to be taken into account.

Further Reading

Google has created its own cross-application, cross-platform data serialization layer called Protocol Buffers. If you’re looking at rolling your own translation layer, Protocol Buffers may fit your needs.
