Getting BLOBs out of the Database

Raise your hand if you’re storing BLOBs in the database.

I get to say that a lot during our training. Every time I say it, hands go up all over the room. Some go up faster than others, but eventually nearly every hand is up.

It’s a design that happens far more often than it should, but it does happen.

Why Store BLOBs in the Database?

People put binary data in the database because they need the data to be point-in-time consistent with the rest of the database. Saving space in the database doesn’t help if you can’t recover the file to a specific moment in time.

Think about this scenario:

  1. 8:00AM – Brent saves a new contract.
  2. 9:00AM – Kendra modifies the contract and saves it.
  3. 9:30AM – Jeremiah modifies the contract and saves it.

If the contract is stored inside the database, we can recover to any point in time and get the appropriate version of the document. It may not be the most current version of the contract, but it’s the version that’s consistent with the rest of the database.
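To make that concrete, here’s a minimal sketch of the recovery in SQL Server; the database name, backup paths, and timestamps are hypothetical.

```sql
-- Restore to 9:05AM: we get Kendra's 9:00AM version of the contract,
-- consistent with every other table as of that same moment.
RESTORE DATABASE ContractsDB
    FROM DISK = N'D:\Backups\ContractsDB_Full.bak'
    WITH NORECOVERY, REPLACE;

RESTORE LOG ContractsDB
    FROM DISK = N'D:\Backups\ContractsDB_Log.trn'
    WITH STOPAT = N'2015-03-16 09:05:00', RECOVERY;
```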

Why Not Use the Filesystem?

File systems are great. They do an excellent job of storing files and organizing them into folders. What file systems don’t do a great job of is staying point-in-time consistent with a relational database: there’s no transaction log to help us roll back writes that are in flight.

It’s a lot of work to get a full database backup and a file system backup to be remotely close to the same point in time. Restoring a database to a point in time is easy. Restoring a file system to a point in time is close to impossible.

Why Not Use an Appliance?

There’s a third option available – some kind of appliance that sits between the database and the file system. The appliance should manage file metadata and provide all access to the files in the file system.

Commercial databases ship with features that sound similar. SQL Server has a FILESTREAM data type, and Oracle has both BFILE and ORD data types. Both of these let the database interact with files in the file system. But they still have a problem – you’re stuck managing the data through the database. Let’s be clear: this is not an appliance.
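For reference, here’s a minimal sketch of the SQL Server flavor; the table and column names are made up, and the database needs a FILESTREAM filegroup before this will run.

```sql
-- FILESTREAM requires a ROWGUIDCOL uniqueidentifier column. The documents
-- live as files in an NTFS filegroup, but all access still goes through
-- the database engine, which is exactly the limitation described above.
CREATE TABLE dbo.Contracts
(
    ContractId UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    FileName   NVARCHAR(260)    NOT NULL,
    Document   VARBINARY(MAX)   FILESTREAM NULL
);
```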

Content Addressable Storage (CAS) is a mature technology. The idea behind CAS is that a hardware device handles the metadata for a given incarnation of a file. A developer sends a file into the CAS appliance, and the CAS appliance returns a pointer to the file. Whenever the file changes, a new copy is created and a new handle is returned to the developer. Files can’t be modified, so anything stored in the database can only point to the right version of the file.

We can combine this with a database pretty easily. Instead of storing a file path in the database, we store the handle that we get back from the CAS system.
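Here’s a minimal sketch of that schema, with hypothetical names; the handle format varies by CAS vendor, so treat the column type as an assumption.

```sql
-- Instead of a mutable file path, store the immutable handle the CAS
-- returned. Every new version of a document gets a new row and handle.
CREATE TABLE dbo.ContractVersions
(
    ContractId INT          NOT NULL,
    SavedAt    DATETIME2(0) NOT NULL,
    CasHandle  VARCHAR(128) NOT NULL,  -- opaque token returned by the CAS
    CONSTRAINT PK_ContractVersions PRIMARY KEY (ContractId, SavedAt)
);
```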

How Does CAS Solve the Problem?

The main reason people store BLOBs in the database is so they can get BLOBs that are consistent with the database at a point in time. By using a storage device that cannot be modified (the CAS), we can make sure that the location we’ve stored in the database is always the right location – there’s no way to tamper with the files we’re storing, so whatever gets stored in the database is correct.

There’s overhead to this approach – old data may never get cleared out. Typically, though, CAS systems store data on large, slow disks; there’s little need for the kind of throughput we demand from relational database storage. Do those cat pictures really need to be stored on RAID 10 SSDs? Moving BLOB storage outside of the relational database frees up resources for serving queries, and picking the right way to store your BLOB data will make it easier to scale your system.

Kendra says: Finding the right storage for large objects is a huge architectural decision that impacts performance and availability. Choose wisely!

Brent says: Want your SQL Server’s DBCCs and backups to run faster? This can help a lot.

Doug says: “It’s a lot of work to get a full database backup and a file system backup to be remotely close to the same point in time.” -> This is a major drawback that’s easily overlooked. Make sure everyone’s okay with that possibility when choosing the file system for BLOB data.

34 Comments

  • You are talking about Content Addressable Storage (CAS) devices as if they were auto-magically-always-on-always-available-never-breaking-never-wrong.

    They are not.

    The solution is not “The solution”. It is trading some specific db problems for another, potentially much greater and wider set of problems, imho.

    Slawek

    • That’s true – I’m moving a problem from one area to another. But I want to manage BLOB storage differently than the way I manage SQL Server data. I may even have different SLAs or backup retention for the different types of data. Sure, I could do that in SQL Server, but it increases complexity.

      • By adding CAS to the solution you have two different systems to manage, very, very likely using different sets of tools. Different permissions, different backup/recovery strategies. Different access APIs, different libraries providing those APIs, which need to be maintained on every machine making use of them. Perhaps a different programming paradigm, different transaction handling, different and certainly more complicated error handling. I’ve probably missed quite a few other things which need to be handled differently in a heterogeneous environment.

        Wouldn’t that be a much greater increase in **the whole system’s** complexity compared to just moving some tables with BLOB data to a separate filegroup stored in files on slow disks? And/or moving the columns with BLOBs to a new set of tables (again kept on different storage) to keep “the real data” nice, slim, and BLOB-free?

        I am not saying that CAS devices are a total waste of time – after all, they exist because lots of people make very good use of them, and they provide good value for users. I’d just be very careful with general advice like “move all the BLOBs out of your db to a CAS and you’ll be all happy”. It depends – as usual. CAS might solve some typical db problems, but it is likely to add even more problems in different areas.

        • Ivan Arzhentinski
          March 23, 2015 2:20 am

          Where’s the Like button here? 😉

        • Kendra Little
          March 26, 2015 1:05 pm

          I agree that any system can have problems. But having met many people who can no longer back up their databases fast enough, I have to say the complexity is often worth it.

          • Karl Maynard
            March 27, 2015 8:59 am

            As an alternative, and an over-simplified solution, create a second database to store just the blob data in it… make sure you zip it if possible..

            ..I found that by doing this and having a small script to pick up just the new and modified records
            (which in my case I copy to a secondary database server at a different location), I was able to remove the need for any lengthy backups!

          • Kendra Little
            March 27, 2015 9:11 am

            Do you mean you don’t need to back up the blob data at all?

          • We have used a CAS system in the way Jeremiah describes in the post.
            It’s a huge project, with hundreds of millions of PDF documents stored in an EMC Centera.
            The result is awesome. We combine several documents (usually 50) into the same binary before sending it to the CAS. Then we store in the database the metadata required to recover each document: the beginning position and length of each document within its binary.
            By applying binary compression before storing it, we have achieved a 94% size reduction (adjacent PDFs are similar).

            The EMC CAS is absolutely auto-managed. We have two of them, each one in a different datacenter, in a replicated state.

            Each CAS has several nodes, and there is redundancy within the nodes of the same CAS. There’s no backup of the CAS content.

            The performance and usability are awesome.

            We are using a CLR stored procedure in the SQL Server metadata database to recover the documents, so the applications are totally unaware of the CAS’s existence. They are calling a simple stored procedure in the database, as usual.

          • raulggonzalez
            March 31, 2015 4:10 am

            @Jorge

            But you still need to have a backup strategy for the CAS, please see
            http://www.emc-centera.com/backup-emc-centera/

            Having data redundancy is not a replacement for having backups.

            Just as SQL HA features are not a replacement for SQL backups either.

            Are you sure nobody there cares about those hundreds of millions of docs being backed up?

          • Hola Raul.

            We discussed the need for a CAS backup with the vendor (EMC), and they were the ones who convinced us that there was no need to back up the Centera.

            It’s worth saying that this content is important for our organization, but not critical.

            The singular way the content is hashed, permanently checked, and distributed between the different nodes of the CAS makes corruption unlikely (I hope… :-)).

          • raulggonzalez
            March 31, 2015 4:55 am

            @Jorge, if that’s what they told you, probably they know best…

            I’ve got no experience using Centera, but we do use Documentum from EMC; that’s based on SQL Server as its repository, though, so it’s a completely different story.

            Saludos

  • I recently had a project turned over to me and am trying to decide on the next step to deal with BLOBs, as this has never been an issue for me previously. I’ve always used the filesystem to store objects and never had any issues. Apparently this was an issue with this new customer, as edits and loss occurred regularly due to a number of reasons (poor development being the primary cause), so they switched to dumping everything in the database with no real plan and no baselines established.

    The problem is that they’ve stored it with the production data (everything is in one database), and it has caused the database to grow at a pretty rapid pace, to the point where full db backups can no longer be retrieved because of off-site hosting for government systems (too big to put on a disk, USB not allowed, transfer not allowed) … so no direct access, and CAS is not an option. I wish the previous team had taken Kendra’s advice in choosing wisely, but alas, they did not, and now performance and availability are quickly degrading.

    Admittedly, this is a new issue/experience for me. Any recommendations if CAS is not an option, the customer refuses to go back to the filesystem, and standing up a new instance for a document-oriented db is also not an option?

    • Consider storing the documents in a zip format. It will increase the processing needed to unpack and repack documents for editing, and will require more code, but may significantly reduce the needed DB storage.

    • Tajinder Singh
      March 23, 2015 9:45 am

      Hi Steve, I experienced a similar issue with one of our clinical databases. We were storing files and images in the database, it grew very rapidly, and management became very difficult. So we came up with a new design: one central database for all the metadata, plus one database per month containing the BLOB tables and some related metadata tables. We create a new database each month to store that month’s blob data (you can create all of the databases for the whole year at once), and it has become easy to manage. Files are stored in the month database based on their timestamp, and we modified all the stored procedures and apps for the new architecture so the front-end apps insert and read files from the month databases. If a customer wants to read a particular file, the apps are smart enough to find out which month database it’s located in, and they retrieve it very fast.
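      To illustrate, a minimal sketch of that routing; the database naming convention, table, and column names here are hypothetical:

      ```sql
      -- Derive the month database's name from the file's timestamp,
      -- then query that database's blob table dynamically.
      DECLARE @SavedAt DATETIME2 = '2015-02-14';
      DECLARE @DbName SYSNAME = N'Blobs_' + FORMAT(@SavedAt, N'yyyyMM'); -- Blobs_201502
      DECLARE @sql NVARCHAR(MAX) =
          N'SELECT Content FROM ' + QUOTENAME(@DbName)
          + N'.dbo.Files WHERE FileId = @FileId;';
      EXEC sys.sp_executesql @sql, N'@FileId BIGINT', @FileId = 42;
      ```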

  • raulggonzalez
    March 19, 2015 3:37 pm

    Great post – just a couple of comments on things that might not be clear enough.

    Jeremiah said “They do an excellent job of storing files and organizing them into folders” – well, FILESTREAM itself does not do that. For that purpose you have to create FileTables, which are built on top of FILESTREAM. Good to know beforehand, to avoid surprises.
    https://msdn.microsoft.com/en-us/library/ff929144.aspx

    And Brent said “Want your SQL Server’s DBCCs and backups to run faster? This can help a lot.”
    I’ll refer just to FILESTREAM. DBCC faster? Oh yeah, quite a lot – DBCC only validates the existence of the file, so the performance boost is awesome compared to validating the content of every page.

    Backups? I wouldn’t think so – and correct me if I’m wrong – but the performance you get when writing one huge file is not the same as what you get backing up thousands of small files (BOL recommends FILESTREAM for files bigger than 1 MB on average).
    https://msdn.microsoft.com/en-us/library/gg471497.aspx#whentouse

  • Oh, Jeremiah, I so agree with you when you advise to get the BLOBs out of SQL Server’s way! I recall one gig I had a few years ago, in which BLOB data was stored in the database, and the database was being merge replicated across the WAN.

    Uh-huh.

    They never could understand why replication took so long.

    🙂

  • Great article, Jeremiah. We have a 90GB database with only 500MB of ‘data’ – the rest is photos stored as BLOBs! I’ve been thinking of moving to FILESTREAM, but until recently didn’t have any other appropriate place to store the files with regard to backup, point-in-time recovery, etc.

    So now the choice is FILESTREAM or CAS. Do you have any recommendations or further reading for CAS architecture? I’m a bit confused as to whether CAS is another box, an app, or an architectural style… A quick google isn’t helping! :S

    Keep up the great posts 🙂

  • Really interesting 🙂 In the most recent application we did, we opted to use file system storage with a pointer record in the database. It doesn’t give us point-in-time recovery, but in our case that’s OK, as we are only storing images that are never updated or changed after the initial store.

    Our DB is around 350GB of data with ~3TB of images. We opted to keep our core\essential data in the database, whilst the items that we could afford to lose if we had to (but we do still back up!) live on the file system.

    The other side of it is that our DB is on high speed disks, whilst our file hosting is on lower speed SAS drives.

    Given the size of the data, cost vs benefit was very important for this project.

    • This is the method I’ve wanted to implement on our systems, Matt, as we too have lower-priority images that remain unmodified.
      I’d be interested to know if you employ any archiving, though, and if so, how you do it?
      Thanks, Dawn.

  • Totally agree about keeping BLOBs out of the database.

    We store our BLOBs in Amazon S3, with a local copy on the file server for 30 days. That allows us to failover to a backup data center by only worrying about the 200 GB database, and leaving the 10 TB of BLOBs for Amazon to deal with.

    I’d never heard of CAS, but the key is to make the file storage immutable. Whenever we update a BLOB, it generates a new GUID, which is the pointer in the database and also the file name in storage.

    • You should use a SHA-512 checksum of the content for the filename and pointer instead of a GUID.

      • I do the same as Ted, with the S3 + . Using a checksum is brilliant. Thanks!

      • Can you give a little more detail on why a SHA-512 checksum instead of a GUID? The only reason I can see is that it might be useful for deduplication purposes.

        • That’s pretty much it. A GUID is an arbitrary identifier; the SHA checksum is based on the content. SHA-512 has a really low likelihood of collisions, so it’s very effective, if a bit big. For BLOBs, the size of the identifier is insignificant compared to the content itself.
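          For illustration, a minimal sketch of the idea with a made-up value (note that before SQL Server 2016, HASHBYTES caps its input at 8,000 bytes, so large files are usually hashed in the application instead):

          ```sql
          -- Content-addressed key: identical bytes always hash to the same key,
          -- which enables deduplication and makes the stored pointer tamper-evident.
          DECLARE @doc VARBINARY(MAX) = CONVERT(VARBINARY(MAX), N'contract body ...');
          SELECT HASHBYTES('SHA2_512', @doc) AS ContentKey; -- 64-byte key
          ```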

        • It can also detect corruption of the file later on.

          • Good point – didn’t think of that because our system uses other means of ensuring data integrity in the external blobs.

  • Hi,

    I usually store BLOBs in a separate table (key + BLOB) in a BLOB filegroup (or tablespace).
    This usually makes software development easier, and the tablespace can be created where it’s appropriate. Faster devices for important BLOBs. Slower devices for cat pictures…
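    A minimal sketch of that layout, with hypothetical database, filegroup, and path names:

    ```sql
    -- Put the BLOB table on its own filegroup, backed by files on
    -- slower/cheaper disks, keeping the primary filegroup lean and fast.
    ALTER DATABASE AppDb ADD FILEGROUP BlobData;
    ALTER DATABASE AppDb ADD FILE
        (NAME = N'BlobData1', FILENAME = N'E:\SlowDisks\BlobData1.ndf')
        TO FILEGROUP BlobData;

    CREATE TABLE dbo.DocumentBlobs
    (
        DocumentId BIGINT         NOT NULL PRIMARY KEY,
        Content    VARBINARY(MAX) NOT NULL
    ) ON BlobData TEXTIMAGE_ON BlobData;
    ```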

  • I’ve been storing files on S3 with pointers in my DB for a few years now. It’s been extremely successful; however, I’ve always wondered about people that use BLOB storage in the database. What exactly is the benefit?

  • We have an ERP product that stores BLOB data (mostly PDFs) in a table with a clustered BIGINT PK. All of the referencing and indexing data is stored in different structures.

    In order to improve performance of the SQL DB itself (~3TB), I’ve clustered the BLOB table into a separate filegroup, and that FG has a couple of files assigned to it. We’ve chosen to store the data (which is pretty much WORM in nature) on LUNs which are filled with SATA drives. Performance is good.

    This works. We could retrofit CAS or the filesystem into the app at some expense, but then backup and restore become a headache, because you have to synchronize disparate storage methods. We get through a DBCC CHECKDB in about 85 minutes, so that’s not a problem (and we run these weekly).

    Sometimes the best place for the BLOB is the database. Just my $0.02.

    • I’m not sure I follow – how would using CAS require that you synchronize the two methods? With CAS you get one pointer per version of the document. If the doc changes, you get a new pointer. Sure, the CAS has to be backed up, but that’s Somebody Else’s Problem™.

      • I apologize that my comment could certainly have been clearer. One area where this causes an issue is in a lower-environment refresh. Understood that the backup of the CAS in the production environment is someone else’s responsibility.

        We can’t point a DEV/QA/System-Test environment at the Prod CAS, or else we can’t test anything that writes to it. (:-D) So we have to be able to synchronize them closely.

        Do I make sense?

        • Makes sense.

          When we had a CAS and needed access for development, we just bought a second CAS – much like we bought a separate set of hardware for development purposes. It was important enough to the business to have that functionality, and not have it in the database, so we made sure the functionality was available.

  • There is another point Jeremiah hasn’t talked about.
    When you store your BLOBs in the database, SQL Server uses the buffer cache as an intermediate layer when reading and writing the BLOBs.
    This can significantly reduce the free buffer pool available for OLTP transactions if there is a lot of BLOB traffic in the database.
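    A quick way to see how much of the buffer pool each database is holding; a sketch using a standard DMV (requires VIEW SERVER STATE permission):

    ```sql
    -- Each buffer is an 8KB page; a BLOB-heavy database near the top of
    -- this list is squeezing OLTP pages out of memory.
    SELECT DB_NAME(database_id) AS DatabaseName,
           COUNT(*) * 8 / 1024  AS BufferPoolMB
    FROM sys.dm_os_buffer_descriptors
    GROUP BY database_id
    ORDER BY BufferPoolMB DESC;
    ```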

