SAN Snapshot Smackdown

On Twitter, several SAN pros were saying their snapshots can pull off some nifty stuff.  Well, it’s time to find out just how true that is.

Let’s say we have a data warehouse with:

  • One 2TB database, using SQL Server partitioning – needs 8 data files and 1 log file
  • One 200GB database for reporting – has 1 data file and 1 log file
  • 5 smaller 20GB configuration databases – each with 1 data file and 1 log file
  • TempDB – eight data files and one log file

My business needs for backups are:

  • Back up any of the databases individually, at any time, with snapshots
  • Back up the 5 smaller 20GB databases with transaction log point-in-time recovery abilities
  • Sync any of the databases over the WAN to a second SAN
  • Be able to restore any of the databases individually to the snapshot, and be able to restore any of the smaller databases to a point in time

The question is:

  • How many LUNs do I need to accomplish that goal?
  • Is the backup process completely automated without scripting?  (Meaning, if the customer has to roll their own scripts, is there any examples you can point to?)
  • What makes/models of the SAN support it?

If your SAN can do it, I’d love to hear about it.  You can email me directly, or post the responses here in comments.

Previous Post
SQL MCM: The Exams
Next Post
I wrote another book. Already.

36 Comments. Leave new

  • This should be interesting… 🙂

    Reply
  • Mark Arnold
    April 6, 2010 3:29 pm

    Dude, I’m an Exchange MVP so perhaps I know a little more about VSS than you do, since SQL has only just come to the VSS party.

    If you want to take VSS backups you need the databases on different volumes then you can back them up individually. That’s VSS rules.

    That said you can simply make it look as if your databases are in the same place by clever use of that oh-so new technology: “mount points”

    How to do this from one easy clean console?
    Can Microsoft Data Protection Manager do it? NO.
    Can EMC, 3PAR, IBM, HP, Compellent do it for you? NO.
    Can lightspeed do it for you? NO.

    Is SnapManager 5 for SQL Server the tool for you? In the words of the immortal Hong Kong Fuey….. “Could be….”

    You should have put this into the SQL community. My good old friend and SQL MVP Gianluca Hotz would have sent you straight to my door.

    Reply
    • Hi Mark,

      Can you just help me with an issue.We have both Netapp snap manager and Microsoft DPM running on a Database.There seems to be some problem with the DPM backup since we have installed Snap manager.

      Is this an known issue or an error in configuration from our side ?

      Reply
      • Mark – most likely both systems are taking full backups of the same database, and that’s generally a bad idea. You’ll need to configure one of the backup programs to do copy-only backups.

        Reply
  • I’ve used a few of the snap products with SQL, and designing your SAN infrastructure for it is crucial. Generally you want to put one DB per, and when you set it up, make sure data and log are sync’d so they snap at the same time. It’s not just VSS – it’s been that way for a LONG time.

    The SAN should be VDI and/or VSS aware (hopefully VDI).

    So you’re looking at a lot of LUNs.

    All of the SAN-based stuff has no DBA involvement in my experience. EMC stuff can be automated (and can even do t-log backups, too), but it’s something you need to know what you’re doing to set up right.

    I’ve used snaps and clones successfully over the years for various things, but you need to design for it before prod, generally not after.

    Reply
  • Response from EMC.

    I collected some inputs from some of the engineers in our Proven Solutions team:

    ————————————————
    How many LUNs do I need to accomplish that goal?
    ————————————————

    Consider you are already using a partitioned database and considering you have put this in DW terms as opposed to dedicated OLTP SQL environment, the assumption here is that you are partitioning more for aged data over performance, i.e. concatenated, horizontal partitioning.

    With this in mind, the first thing you should consider is minimizing the cost of underlying storage of the colder data and maximizing the performance of the hotter data (latter partition data files) by using EMC FAST (Fully Automated Storage Tiering). Why keep older, colder data on EFD or Fibre Channel when it could sit well on SATA disk.

    Using Microsoft SQL Server and of course SQL CAT best practices, each data and log file should be allocated a dedicated volume (LUN). This also allows our replication management product Replication Manager to perform partial file-group recovery. Sounds complex, right? Trust me, it’s not, check out this demo:

    http://www.youtube.com/watch?v=YEkFUQRGvII

    Now, in terms of smaller, less I/O intensive databases which workloads tend to be similar and also all five are required to be logically consistency in order for the top-level application to function… then it is entirely acceptable, given the right consideration to consolidate these database data files and logs into the same volumes, so I will provide some considerations below

    I do not know about your TempDB workloads and the real reasons behind so many data files, is it purely following best practice (files to cores)? is it needed for raw disk I/O performance? is it needed to reduce SGAM/GAM waits/latches, etc…..but I will take a stab at somewhere in the middle and allocate two data files per LUN…

    Why am I telling you all this?

    At EMC we don’t want to guess, we don’t want to assume, we attempt to work out what the best fit is for the business and application requirement…

    The answer is based on the limited info… we had
    2.0 TB DB — 9 LUNS (8 data, 1 Log)
    200 GB DB — 2 LUNs (1 data, 1 log)
    5x 20GB config — 10 LUNs (5 data, 5 log)
    5x 20GB config(Consolidated) — 2 LUNs (1 data, 1 log)
    TempDB — 5 LUNs (4 data, 1 log)

    Totals: 26 LUNs or 18 LUNs (consolidated)

    ————————————————
    Is the backup process completely automated without scripting?
    ————————————————

    Yes, please check the demo provided above.

    EMC Replication Manager is the management application which makes it quick, automated and simple in order to achieve your data copies.

    EMC Networker, along with the Data Domain appliance will enable fast backup with de-duplication rates up to 10x

    We can perform the backup with minimal interruption to performance. We use in-array horsepower to create the data copy (clone).

    With EMC RM, we use SQL VDI in order to take application-consistent snapshots. Even with say 30, 30 SQL databases, the VDI session would typically last about a minute or so on the host.
    This is the ONLY host impact, which in terms of CPU, memory, disk and network is negligible.

    Want to perform that DBCC checkDB without affecting production, run it on another host with the same set of data instead… This scales very well.

    With EMC Networker, we can use VDI or SQL VSS in order to perform the same thing.

    You can backup at a database or instance level with both products. Scheduling natively integrated.

    More importantly, recovery…

    Let’s say you have data corruption, or perhaps lost a log file… Your last t-log backup was 13:00:00…
    You know you lost the log file at 15:33:20, because the database crashed, users called and the dbmail informed you – so you want to rewind back in time to 15:33:19 before the log file was lost….
    You quickly want to be sure that the 15:33:19 copy of data is good on another host before applying it back to production…

    If your database is protected by EMC RecoverPoint… you are always protected.

    EMC RecoverPoint enables continuous data protection, both synchronously and asynchronously both locally and remotely. RecoverPoint amongst other things has in-built WAN compression, sounds IP and FC WAN links, and enables that rewind in time…to the block write.. like Tivo for SQL. Many SQL folks who see RecoverPoint demos are amazed at the granularity and rollback capabilities.

    ————————————————
    What makes/models of the SAN support it?
    ————————————————
    Replication Manager supports EMC arrays (EMC CLARiiON, Celerra, Symmetrix V-MAX)

    Both RecoverPoint and Networker support both EMC and non EMC arrays (iSCSI and FC).

    Reply
    • That goes along with my experience with EMC storage and snaps/clones.

      Reply
    • Brian – thanks, that’s really interesting stuff! Can you elaborate more on the part about “so you want to rewind back in time to 15:33:19 before the log file was lost.” I watched the demo and I didn’t see anything about transaction log backup integration, so I’m curious to see if there’s something showing how that works. Thanks!

      Reply
      • (Disclaimer EMC employee)

        Hi Brent, some customers decide to snapshot and achieve quick recoveries – and some decide to continuously protect their databases and log files.

        EMC’s RecoverPoint allows for continuous local or remote protection using low-impact write-splitting technology that is built into some of our arrays (or can be host-based or SAN-switch based).

        Every write going to the database volume gets replicated over to the RecoverPoint (RP) journal. Just like the tlog is a journal of all transactions, RP is a journal of all the write IO’s. The journal is kept on a separate set of disks from production (again – local or remote or both) and allows for a really quick and simple recovery back to the production host.

        So it’s not really “backup” – it’s continuous protection in this case.

        So, transaction log backups are pretty much unnecessary when doing CDP/CRR.

        Here’s a video that explains it better than I can:
        http://www.youtube.com/watch?v=8KcJGJG3rjU
        (10 minutes)

        And here’s a customer story of someone who used RP for SQL:
        http://www.emc.com/B@/customer-profiles/h5751-nowcom-cp.pdf (PDF)

        Hope this helps clarify my earlier statements and please let me know if you have additional questions

        Reply
    • Brian,

      Cool that you also mentioned compression. Great!

      Mike

      Reply
  • SnapManager for SQL Server (SMSQL) allows the creation of fast backups using Snapshots. The configuration of such a SQL Server instance as you are describing has 9 LUNS, 1 LUN per database, with logs in a Separate LUN. I like using mount points within a directory structure that for each individual database. Looking like the following
    –> SQLData
    –>DWDB
    –>ReportingDB
    –>Tempdb
    –>Userdb1
    –>Userdb2
    –>Userdb3
    –>Userdb4
    –>Userdb5
    –>Logs
    Then the backup job that is created by SMSQL is a SQL Agent job that can be customized for individual database backups or for the entire SQL Server instance. The SQL Agent jobs can be placed into SQL Maintenance jobs to be grouped with other work or left alone to work at the time or threshold scheduled. The backup invokes SQLVDI.dll to take the snapshot and then also manage the transaction log to meet the requirements of point in time recovery of your database.

    The best part of the NetApp solution is it works the same across all our storage devices. From the DBA or backup admin perspective it is the same tool, same interface, same procedures that can be deployed across the entire environment so that you do not have use a different tool for each different NetApp SAN used.

    Reply
    • John – just a few questions to make it clear.

      When you say “9 LUNS, 1 LUN per database, with logs in a Separate LUN” do you mean all of the logs on a single LUN? If so, are you able to do an individual database backup for just one database, and how does that work?

      Can you elaborate more on what you mean by “also manage the transaction log to meet the requirements of point in time recovery of your database”? Are you saying you also do transaction log backups in conjunction with the snapshot, and if so, where do those log backups go?

      Thanks!

      Reply
  • Brian briefly posed a question about your choice to have 8 data files allocated to your temp DB. Like him, I assumed that this was based on the widely accepted best practice of files to cores. Recently though, I had someone point me to this msdn article, basically debunking that practice as a myth.

    http://blogs.msdn.com/psssql/archive/2007/02/21/sql-server-urban-legends-discussed.aspx

    Brent, I would love to hear you weigh in on this. Does splitting your databases (and tempdb) files up one per core actually provide a performance benefit or not? Sorry to wander off topic, but I saw the shot and I took it.

    Reply
    • Bob – yes, I’m waiting to let the vendors finish posting before I go back and add notes. I’m currently working with 2 vendors to get their answers straight – they answered via email first, so we’re going through a conversation there. I’ll have an update here soon, hopefully!

      Reply
  • Discalimer – I work as a consultant and reseller of many of the vendors, and have used them to provide solutions which essentially deliver what has been asked for.

    Yawn Dude, where have you been the last five years?
    Both EMC and NetApp have been able to do this with improving degrees of automation for some time (as Microsoft integrates VSS capabilities to use storage snapshots).
    1. How many Luns? – if you use separate luns for each item then you get maximum granularity of recovery – use less (consolidate) and you lose granularity. a few years ago it was a problem to carve many luns out of physical storage – that is long gone – so how many are you capable of managing before you get in a muddle – the storage can provide more.
    2. Automated – sort of – it is often a gui with input – ie expect an intelligent input of options as after all recovery should be overseen – but you can automate it to follow any predefined path if you want it to go ahead and not disturb your sleep.
    3.In NetApp’s or EMC’s case – just use latest version of SnapManager or Replication Manager which means all FAS and V-Series or all Clariion.

    The comment on Mark Arnold’s blog by you – about snapping logs – On NetApp you can pretty much snap any and everything – does SnapManager do it automatically and for all setups – no because you may not want it to. But you can recover to BOTH a snapshot point (ie when the backup was taken) or roll the logs to bring you up to date – it is a simple gui tick to choose.
    Now what was all the hard stuff you have spent all those years learning to do again? Better go train up on SnapManager and Replication Manager.

    Reply
    • Hi, Adrian. It’s interesting how I get different stories from different vendor employees. I look forward to being able to match up your stories with those from the vendors. Thanks!

      Reply
  • Regardless of how anyone wants to message it, and there are many ways. Regardless of what focus a particular SE has, and they have many. Regardless of whatever spin you think you’re putting on any response, one thing is clear. Snapshots have been “the future” for the past three years or so and you’ve missed the bus a bit. What were you doing as a “SAN administrator”? What controllers were you working on? Were you one of these SQL DBA’s who did big old SQL dumps and wasn’t happy unless he had a big fat pile of data in his hands that he could do something with? I meet those guys every single day of the week and it’s so frustrating. The storage guys hate them because they bloat everything. The network guys hate them because they choke the network and inopportune times of the day. The internal customers tear their hair out because it takes them days to get (say) a dev database up based on the production data. It does on and on and on.

    Reply
    • Mark – I’ve been trying my best to be tolerant of your approach here on my blog. I appreciate your enthusiasm about snapshots, but since you asked, I’ll name names.

      I got my start working with IBM DS4000 series controllers, which didn’t support snapshots of multiple LUNs simultaneously.

      We “upgraded” to the IBM N-series, which at the time, also did not support snapshots of a single database spread across multiple LUNs simultaneously. We had a data warehouse which used partitioning to segregate load to different sets of arrays depending on who was doing the querying.

      We used development & QA versions of the database on different SAN controllers from another vendor, plus controllers in another datacenter for DRP. NetApp’s technology did not support us easily getting the snapshots off to a different SAN without attaching separate tape drives to both the NetApp and the other SAN controllers, and ferrying tapes between them. I’m sure you’ll agree that’s not a convenient solution.

      This is why I’ve been so interested in the NetApp’s progress towards better SAN snapshots, and why I’m so interested in hearing your colleagues explain their emails – particularly the one where a NetApp employee said you can’t snapshot log drives. I look forward to the clarification.

      Last but not least – you’ll catch more flies with honey than vinegar. Your approach is not doing anything to make me recommend NetApp, and in fact, quite the opposite. The guys from EMC have been polite and friendly, and you’ve been nothing short of an asshole in both email and blog comments.

      Reply
      • Brent,
        Thank You! I have been a high level systems engineer for years but in the finance arena where I have spent a long time now SAN is not a primary technology in my daily world. Well until now that is. My previous daily experience with a SAN and LUNS etc goes back to HP StorageWorks 1000 series so as you can see it’s been quite some time. I am currently in the process of deciding between EMC , Compellent , HP, NETAPP etc. and found your column in my search. Well as I am wont to do I tend to just gorge myself on information and sort it out later. I would just like to say thanks for having an extremely informative column and BRAVO! on calling out Mark. Clearly you cut your teeth on SAN in the “old school” which is why you now can be considered an SME and up and comers who can navigate a GUI such as Mark have forgotten to respect their “elders” technologically speaking that is. In reading your column I am absorbing lots of great information but there are gaps that make me want to reach out and slap a certain person for his approach / attitude. It is distracting to the flow of good information you are putting forth. So now that I have said my piece. Thanks very much for the AWESOME column and for keeping it real instead of just being a brand shill. Eric.

        Reply
  • Natively generated application consistent snapshots on DS4000? Not so much. That is not something I would base my SAN experience on. By contrast, IBM N Series is NetApp FAS by another name and it did (does) support that which you claim it does not. You were probably using an older version of the rebranded SnapManager, which, as you’d expect with a vendor such as IBM, has to go through various qualification processes before they are ready to run with it. NetApp ourselves are in the same boat with our SnapManager for SharePoint which is a modified version of the Avepoint product. We’re generally a revision point or so behind as we qualify and do various engineering steps to it. I don’t know why you put “upgraded” in quotation marks there. I don’t quite see the context.

    Taking your comments about the invonvenience of getting snapshots off. No vendor supports replicating their own snaps onto someone else’s storage. But wait. Here’s something for you. NetApp has FlexClone which allows you to rapidly mount a clone of the volume containing the LUN and copy the database or any other data from the snapshot and out to somewhere else. EMC has similar technology. It doesn’t matter where. You don’t need a tape on the NetApp controller or anywhere else. Simply put, your N Series from IBM did not have up to date host-side software leveraging it and did not have the necessary licenses to enable you to do your job in the best manner possible. The fault there lies with NetApp and IBM who did not fully convey the features available. Either that or the fault lies with you for not communicating your requirements fully and with whoever did your storage design for not looking at the entire environment and designing the approporiate solution. Rest on this, the controllers could and can do all you wanted to at the time.

    I do not want you to recommend NetApp or anything else without understanding what it can, and more importantly cannot do. Right now you’re just not qualified to recommend FAS or Clariion to anyone, not until you’re up to date with what the respective management software products can do for a customer in a range of scenarios. NetApp and EMC will both agree that our respective backup products are good, solid, feature rich and reliable. We both sell thousands of licenses into organizations large and small. I can point to people with a small 5TB of data and to people with multi PB environments using SMSQL, SME, SMO, SMSAP, SMMOSS and even “SMNotes”.

    The next thing is using controllers and disk from other vendors. Depending on a qualification matrix we regularly place our controllers in front of other people’s equipment (EMC, HDS, IBM and others) so that the other vendor’s equipment is subordinate to FAS. All the NetApp “goodness” is then transposed onto the other storage, with the exception of the actual disk RAID. That’s how you’d use someone else’s disk and our technology – including snapshot and replication solutions.

    Reply
    • Is there any particular reason you’re being such a combatative dick in your responses?

      Reply
    • Mark – I’d like to ask you about a couple of lines:

      “No vendor supports replicating their own snaps onto someone else’s storage.”

      Have you used storage virtualization technology? It does in fact do what you’re describing here. You can move storage snapshots from one SAN to another. IBM’s SVC has a solution for this, and EMC’s may as well, although I haven’t used it.

      “You were probably using an older version of the rebranded SnapManager, which, as you’d expect with a vendor such as IBM, has to go through various qualification processes before they are ready to run with it.”

      Indeed, I stopped being a SAN admin about two years ago. This is one of the reasons why I’m a little – okay, more than a little – offended by your replies. You didn’t hesitate to blame me without asking questions. This was a disservice to NetApp, their products, and their employees.

      “Right now you’re just not qualified to recommend FAS or Clariion to anyone, not until you’re up to date with what the respective management software products can do…”

      Agreed, and that’s why I was asking for the public’s help with this post. Your replies helped me a great deal in determining whether I should recommend NetApp. Thanks for your help.

      Reply
  • It is not my intention and I apologise if you see it that way. Perhaps leave my combatativeness to one side and look just at the meat of what’s been said. Conversations such as this always get passionate and don’t necessarily come across like they would in a proper conversation.

    Reply
    • Mark, I hate to break this to you, but for the rest of us, no, conversations like this don’t always get passionate. You could check out EMC’s responses for an example of a helpful, non-combative response.

      Reply
  • Philip Barry
    April 12, 2010 1:00 pm

    Hi Brent,

    I’m only a DBA, but we use NetApp storage. When it comes to transaction log backups they are not snapshots, but stream based. I’m not sure of they are native SQL Server or their own technology (they use a different file extension to the default TRN), but they get written to a SnapInfo directory which is also where the metadata for the snapshots is written along with and sys DB backups (which are also stream based and not snapshots). 

    It’s also worth putting down that any user DB’s that share a LUN with a system DB will no longer snapshot, but will revert to stream based backups filling up the SnapInfo LUN/directory. 

    I must say this is only based on my usage, and I hope someone other than that Mark will comment from NetApp!

    Regards,
    Phil

    Reply
  • Brent,
    In this I am going to be NetApp specific – EMC also has much of this capability but is is difficult to be precise and cover two vendors in the same breath.
    NetApp uses aggregates to contain up to 16TB chunks of storage. Best practice is not to place the logs and database ofan application on the same aggregate (eggs, basket – get it?). You can and should put many LUNS (NetApp will often call them volumes, but a volume may contain one or more LUNS- it does not matter) in the same aggregate.
    So you could end up with two aggregates full of lots of DB and log LUNS. NetApp manages snaps at the volume level, so for each volume (with one or more LUNS) there is one set of snapshots. These can either be automatically generated on a time series, manually, or by the application aware software such as snapmanager. what is important is that if you gather luns into the same volume it automatically forms a consistency group – so YES you can snap multiple luns at the same time.
    Furthermore – if you use an an application aware product such as snapmanager – it puts SQL in a state where it can issue multiple snap commands to luns in separate volumes which will still form a consistency state – if you were not correctly advised on this about N-Series or NetApp that is very sad as it has been possible for many years.I remember doing it on SQL on CIFS in 2000.
    Similarly we often add in further time automated snaps on the log files for exchange – so that we can kill all logs after a particulr time – say 11:45 when we beleive somtething bad arrived. That was not part of snapmanager, but perfectly easy on the NetApp using standard procedures. (|I doubt you would want to do that to nost SQL logs as it could normally damage your audit integrity – but there are places you may want to.
    You seem to want to do replication of snaps – not sure I understand your actual requirement so I might miss it – you can replicate any lun or volume from NetApp to other storage uing external software or HW replication products, you use snapmirror to other NetApp, V-series or N-series. if you just want logs use SQL log shipping.
    As suggested elsewhere – why ship to another SAN- unless it is offsite – just use clones to access the data directly in place.
    I hope this helps – I am deliberately being general so you understand the principles – sometimes getting into the details too soon with vendor specialists will cause the problems about describing processes which seem to be in conflict – but are often specific answers to particular use cases. Even best practices are often conflicting depending on your requirements – so it is more important to spell out what exactly you are trying to achieve, rather than demand a single uniform answer.

    Reply
    • Adriaan – thanks for the reply. I’m not surprised about the consistency state stuff – I’ve been in communication with a few NetApp employees over email, and one of them flat out told me that NetApp can’t take snaps of a log file. I can see why customers have a hard time keeping it straight if the employees do too!

      Reply
  • John Parker of NetApp worked with me over email and produced the following:

    SnapManager for SQL provides options for doing both a Snapshot of the entire database, both log and data. This gives you recovery to the point of the snapshot. But, if you need a more granular backup to a specific point in time the need to have the transaction logs. SnapManager for SQL manages all the specific pieces of the backup to track the LSN’s so that when you need to recovery to a time that is in between the Snapshot of the data you can make this happen. The SnapInfo directory can be on the same LUN as the logs or in a separate LUN. Being a DBA I like to have the SnapInfo in a separate LUN just for recovery purposes because no matter how redundant a system is backups are for the times when the redundancy fails.

    This is how the backup works, according to the SMSQL Installation and Administration Guide:
    SnapManager performs the following tasks when creating a backup:
    1. Checks the SnapManager license
    2. Renames the most recent SnapInfo directory (if necessary)
    3. Renames the most recent Snapshot copy (if necessary)
    4. Creates a new directory in the SnapInfo directory for this backup
    a. During the backup process, SnapManager collects backup metadata that is automatically archived to the SnapInfo directory.
    5. Creates a backup set of the LUN’s containing the database files
    6. Backs up transaction logs (if specified) – this is a copy of the transaction log to the SnapInfo directory.
    7. Creates a Snapshot copy of the LUN that contains the SnapInfo directory
    8. Verifies the databases in the backup set (if specified)
    9. Deletes the oldest backup sets (if specified)
    10. Deletes the oldest Snapshot copy of the LUN that contains the SnapInfo Directory

    Summarizing the use of Snapshots, NetApp will take a Snapshot of the data and logs together to give you a recovery point at the time of the Snapshot, or you can take a Snapshot of the data along with a backup of the transaction log to get point in time recovery.

    Reply
  • Brent,

    so a couple of comments on the post you did with John Parker –
    If you put snapinfo in a different LUN but the same aggregate you are not getting any more protection, but you are forcing wasteful IO to copy all the log data through your server – I prefer to leave it in the same LUN. I often see people doing tons of work daily which slows performance and needs management to protect against an incredibly rare problem, which is better adressed by a different approach. If you need protection then archive/backup your logs elsewhere regularly or use some form of log shipping – not sure if latest SQL allows multiple copies of logs same as Oracle – that would of course be best.
    Also replication of the storage to a DR site would give full protection – look at using semisync replication for the logs and you could relistically get a SLA with RTO of minutes and RPO of near zero.

    If you go look at what is available when you do a recovery you will have both snapshots of the DB and logs, as well as a current log file – it is your choice when you recover whether to roll the current log or simply recover to the point in time snap.
    On a replicated DR environment this is possible at either site – so if your server was the problem then you go through exactly the same steps to come up in DR.

    Reply
    • Adriaan – that’s interesting. I wish I had the time to coordinate the recommendations between you and the other NetApp employees, but I’ve already spent hours on this. Thanks for the contributions though!

      Reply
  • Great comments!
    To Mark’s defense:
    He’s a Brit and has the usual dry wit. Mark’s experience is phenomenal – I’ve known him for years and he’s a true value. Don’t discount his comments, weed and feed the brashness, there’s a ton of FUD out there as well as many vendors are at end of fiscal (people are exhausted).

    Brief Qualification:
    I’ll qualify by saying that I’ve architected/designed all types of solutions from messaging, database, geo-apps, etc – with all major storage vendors. Due to this, I’m also a business continuity expert. All leading storage vendors have compelling technologies and feature sets. Concerning NetApp – for years I couldn’t understand how I’d want or get a DBA to allow the sharing of spindles for their DBs. I mean, why would anyone want to add complexity to determining I/O loads for a mission critical DB? To put it simply, when X hits the fan, sharing anything seemed insane. I wanted to find out. I applied and was offered a position with NetApp. Why not accept, after 3 years, I was still intrigued – I kept track of their growth and noticed more and more customers going NetApp.
    NOTE: I’m no longer a NetApp employee – my current company is a NetApp Star Partner as well as EMC’s Partner of the Year. We also do HP and a ton of work with other non-storage vendors.
    The Scenario with SQL on NetApp:
    NOTE: The requirements lead me to more questions however I’ll try to keep it simple.

    Thoughts/Assumptions/Caveats:
    1. 2TB DBs partitioned – my thoughts (outside native 2TB Windows limit) is that this may or may not be a warehouse using sliding windows. Synchronization is the real meat of the question – do all 8 partitions require 100% sync? Let’s go with ‘no’ based on the backup requirements given.
    2. Other User DBs – no problem. TempDBs – 8 files to support the 2TB partitioned DB – makes sense.
    3. No need to address 35 DB snap limit per volume on MS SQL (referencing 2005).
    4. No I/O requirements stated. The scenario presented is related to SAN carving and backups only. If I/O loads were to be introduced, this may affect any of the below.
    5. User DBs are in SQL Full Recovery Mode
    6. SQL Single Instance
    7. FC or iSCSI – either will suffice
    8. Single Host (as assumption states single instance)

    NetApp Preliminary Info:
    NetApp Aggregates are 16TB max with Data ONTAP 7.x (there are caveats if using Thin Provisioning).
    NetApp Snaps at the volume level NetApp Volumes can become NAS NFS or CIFS mounts/shares or they can host SAN FC and/or iSCSI LUNs.
    NetApp SnapDrive (SD) is required to connect Windows hosts to NetApp storage
    NetApp SnapManager for SQL (SMSQL) has all DBA options for backup/restore to include Full DB/T-Log, T-Log only, Copy, more and more and more. You name it – it can do the core DBA backup functions (and more).
    Netapp SnapManager for SQL requires at least 2 LUNs, one for DBs (typically User DBs) and another for SnapInfo.
    SnapInfo LUN is where the backup sets and T-Logs are copied prior to truncation. This gives the granularity to ‘up to the minute’ restores, pending T-Log activity and truncation schedules.
    SnapInfo LUN cannot contain any databases and it can be used to host the backup data for an entire instance or it can have separate directories created for specific DBs (or a mix). This allows for different granularities if needed.
    SnapMirror (SM) is one of NetApp’s replication solutions and replicates from the volume level. Can be sync, a-sync and semi-sync. READ THIS – SM can easily reverse replicate ONLY THE DELTAs. Any vendor can fail over – NetApp fails over and back very, very easily. I did an Exchange 2010 solution and failed over/back several times within one hour!
    NOTE: NetApp SnapManager Products leverage SnapDrive for their intended uses – this is required for consistent backups/restores. In the case of MOSS/SharePoint – SD, SMSQL and SnapManager for SharePoint Server are required
    HUGE: Read and follow all guides for best practices and strategies.

    Answers with NetApp Technologies (or thereabouts):
    • Overly Simplistic Approach –
    1. All DBs/Logs could be hosted on a single aggregate (aggr). (Assuming default of 16 drives (can go to 26) – that’s some I/O under the hood!)
    2. Out of the aggr, separate volumes (vols) would be created for each backup requirement presented
    3. LUNs carved from the Vols
    4. DBs placed appropriately
    5. Remote storage – 2nd aggr required on remote site with same storage space
    ? How I’d Carve each Site:
    ? 1 Vol/LUN for Sys DBs (if these will be hosted on the SAN)
    ? 1 Vol/LUN for Temp DB(s)
    ? 1 Vol/LUN for all User DBs
    ? 1 Vol/LUN for Logs
    ? 1 Vol for SnapInfo LUN/Directory
    ? Backup any DB individually
    ? SMSQL – Options
    ? It can snap all DBs on the volume consistently (select all DBs)
    ? It can snap a single or ‘hen pecked’ # of DBs in a volume – all non-selected DBs are still snapped but not consistent for restore.
    NOTE: SMSQL creates the needed SQL script to do the snaps – DBAs will see in SQL Manager. I recommend DBA create a job or two of their liking, then go into SQL Manager to modify and schedule through SQL Agent. The alternative is Windows Task scheduler…
    ? Why use a single volume for user DBs (simple response) – if all DBs combined are for Development or all have low I/O in Production, why not back them all up consistently at the same time. Less management. Restores can be by individual database from a volume.
    ? Why not do this (simple response) – loss of redundancy as all are hosted on the same aggr
    ? Back up the 5 smaller 20GB with Point in Time – already possible per the previous step.
    ? Sync all DBs across the WAN
    ? SMSQL can trigger SM replications
    ? Authorize heads to talk to each other for replication
    ? Map volumes for source/destination
    ? Use SMSQL to trigger the replication (create/modify jobs appropriately)
    ? Restore DBs individually per snap shot or with point in time – this would be doing either point in time (Full DB backup/snap time) or up to the minute (previous DB snap time and any/all logs since). Answer – the carving above already takes care of this. The only difference would be if one chooses not to set a SQL DB to be in Full Recovery mode (but that was covered as an assumption).
    • Real World Approach
    1. I hate to say it but here’s where we’re at:
    ? All backup/snap scenarios are already covered, regardless of the carving.
    NOTE: The only caveats I’ve seen is if a set of DBs or Solutions needing DB synchronization is required (thinking of some SAP and Siebel scenarios). If sync between DBs is required, more discussion/ thought is needed to determine best approach.
    ? The key changes would be to get some redundancy in the mix or address I/O
    ? Separate aggrs for DBs, Logs and SnapInfo (lots of redundancy)
    ? Split aggrs between multiple storage heads
    ? For I/O considerations, possibly , possibly for TempDB (thinking BizTalk)
    ? Go really, really crazy (would need much more disk) and create separate aggrs for every volume based on I/O (4 disks per aggr required) then create only 1 LUN per volume.
    ? Naturally, more instances would make sense (reporting server)
    ? FYI: Using NetApp Flex clone, you could create a near zero size duplicate of any of the above DBs and use for the reporting server. If needed to not be on the prod spindles, the snaps can be replicated to another volume (same storage head or different) prior to the FlexClone(s).
    ? Also, FlexClones can be ‘Split’, while servicing user requests to become a full copy of its parent.
    • Other
    1. NetApp can maintain 254 (or 253) snaps per volume – that’s a TON of backups that are online and at the DBA’s disposal. No more SQL dumps needed.
    ? Run through the typical DBA backup schedule throughout the day then divide into 254 to get how many backups can remain online. Snaps are scheduled like normal backup solutions therefore you could have 15 minute t-logs with 2-4 hour full DB Snaps for the current day, only the 8PM Full from the previous 7 days, perhaps more.
    ? It’s a good way of thinking about how the current Operations/backup processes will change as the technology simply rocks.
    ? BTW: I don’t believe anyone mentioned, a NetApp Snap is near-instantaneous (micro-seconds), regardless of the individual DB size.
    Answers:
    • LUNs needed – above answers my thoughts for simplicity however one needs to address the desired redundancy and I/O requirements for their comfort level. NetApp uses RAID DP – dual parity with aggrs up to 26 disks – that’s a lot of I/O for a single set of disks that can be carved to one’s heart’s delight (NAS/SAN).
    • Backup process – automated however one can do what they like. CLI commands are available, script examples are available, heck SMSQL will create the SQL statements for you and you can view in SQL Manager then modify per your needs. Admin guide provides info needed.
    • Makes/models – all NetApp makes and models with current software versions. NetApp is a Unified Storage platform – if one buys say a 3140 today, they want to upgrade to a 6080 next year, simply swap the heads.

    NOTE: There are many ways to ‘skin the cat’ – wanted to give a bit of meat but not go into the weeds too far.

    Thanks,
    ~Chris

    Reply
  • Disclaimer of Dumas – I used MS Word prior and did a cut/paste – please forgive.

    Reply
  • Nalin Uduwawala
    May 1, 2019 3:27 am

    Hi Brent,

    I know this is an old post but I am commenting now as we are in the hunt for new servers / architecture etc etc. If you are looking for a SAN the go to http://www.reduxio.com. They have a silver bullet. Unfortunately they are stopping production of the hardware and switching to Software defined Storage with the same technology.
    We have one of their boxes and it’s absolutely superb. Snapshots by the second. dial back along the time line , make a clone and you have recovered a back up. It’s called backdating.

    I can’t find a box that does it without a big storage penalty anywhere on the market. The box also does in line dedup and storage tiering ( yes — all in the same box).

    If you know of anything similar please let me know.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.