Oracle Backup Basics for SQL Server DBAs [Video]

To get ready for Tuesday’s webcast, here’s what you have to do:

  1. Watch the video below, but watch it today (or over the long weekend). There will be no live presentation this week and we won’t be rehashing all of the material in the video.
  2. Write down your questions or comments. (You don’t have to do this, but it’ll make it more fun.)
  3. Attend the live webcast on Tuesday at the usual time (11:30AM Central). Register here.
  4. During the first 10 minutes of the webcast, we’ll give away a prize. The catch is that you have to be there to win.

The live discussion of the video and Q&A won’t be recorded and published, and you also need to be present to win the prize. See you on Tuesday!

Monitoring Oracle with Statspack

At some point, you’re going to need to know what’s wrong with your Oracle instance. While there are a lot of monitoring tools around, there’s always some reason why third party monitoring tools can’t be installed. Oracle has shipped with something called Statspack that provides DBAs with some ability to monitor their Oracle instance.

Statspack: It’s like an eye exam for Oracle

What Is Oracle Statspack?

Statspack is a set of tools for collecting performance data that Oracle began shipping with Oracle 8i. This isn’t a full monitoring framework, but it helps DBAs isolate poor performance within a time window. Once installed, Statspack can collect snapshots of Oracle performance. This will run on all editions of Oracle – there’s no requirement for Enterprise Edition or any Performance Pack.

Statspack does not set up any kind of regular schedule when it’s first configured. It’s up to you, the DBA, to figure out how often you need to be running Statspack. Since data has to be collected and then written somewhere, make sure you aren’t collecting data too frequently – you will be adding some load to the server.

Do I Need Special Access to Install Statspack?

Depending on how you look at it, either no special permissions are needed to install Statspack or else very high privileges are needed. Basically, you need to be able to connect to Oracle with sysdba privileges. Any DBA responsible for an Oracle instance should be able to install Statspack. The only thing that might cause an issue is if OS-level access is needed for scheduling data collection.

Since Statspack was originally designed for Oracle 8i, there are some changes that need to be made if you are deploying on Oracle 12c. Take a look at the comments on Statspack Examples for help getting Statspack installed on Oracle 12c.

What Kind of Data Does Statspack Collect?

Statspack can collect a lot of information about Oracle. Users can define just how much data they want to collect. The documentation goes to great lengths to remind DBAs that collecting too much data can slow down the database server.

Statspack collects data based on several configurable SQL thresholds. You can see the thresholds in the perfstat.stats$statspack_parameter table. When a query passes at least one of these thresholds, performance data will be collected.
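
If you’re curious what the current thresholds (and snapshot level) are on your instance, a quick look at that table will tell you. This assumes Statspack was installed under the default perfstat schema:

SELECT *
FROM   perfstat.stats$statspack_parameter;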

Multiple levels of data can be collected. Oracle defines five levels of performance data collection – 0, 5, 6, 7, 10.

  • Level 0 Basic performance statistics about locks, waits, buffer pool information, and general background information.
  • Level 5 All of Level 0 plus SQL statement level details like number of executions, reads, number of parses (compiles in SQL Server speak), and memory usage.
  • Level 6 Everything from Level 5 plus execution plans.
  • Level 7 Disk metrics for particular segments that cross a threshold.
  • Level 10 COLLECT ALL THE THINGS! Plus collect information about latching. Typically you shouldn’t be doing this unless someone at Oracle has suggested it. Or you really know what you’re doing.

This data gets stored in the Statspack tables whenever a snapshot is collected. Over time, these tables will grow so make sure that there’s enough space allocated for their tablespace or else purge out older data using the statspack.purge() function.

How Do I Use Statspack?

To collect data, either use the DBMS_JOB or Oracle Scheduler interface (depending on Oracle version) or use an operating system native task scheduler.
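
Taking a one-off snapshot is a single procedure call, and Oracle ships a helper script that sets up hourly snapshots through DBMS_JOB. A minimal sketch (the snapshot level here is just an example):

-- take a snapshot right now at level 7
EXEC statspack.snap(i_snap_level => 7);

-- or schedule hourly snapshots using DBMS_JOB
@?/rdbms/admin/spauto.sql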

Once you have at least two snapshots you can report on the collected data by running $ORACLE_HOME/rdbms/admin/spreport.sql and supplying a start and end snapshot. Statspack is going to churn for a while and spit back a bunch of information. Since Statspack reports can be many thousands of lines long, spreport.sql will write to a file.

As you look through the file, you’ll find information about I/O, locking, waits, slowest queries running (but not which users/sessions are slow), and potentially a lot more, depending on how much information you’re collecting.

For the uninitiated, Oracle ships with a bunch of scripts installed in the server’s file system. These scripts can be invoked from inside your favorite SQL tool.

You thought this would be simple?

Limitations of Oracle Statspack

This isn’t a silver bullet, or even a bronze bullet. But it is a bullet for shooting trouble.

Statspack isn’t an automatic process. More sophisticated tools use an agent process to automatically start collecting data once they’re installed. Statspack is not that sophisticated. It requires manual configuration – a DBA needs to set up a schedule for Statspack collection and Statspack purging.

While Statspack reports on an entire server, things get a bit weird when you start bringing Oracle RAC and Oracle 12c Multitenant into the mix. With RAC, Statspack is only reporting on a single node of the cluster – to get full cluster statistics, you should look at other tooling. Statspack can also potentially cause problems on RAC that can lead to cluster instability. With Multitenant functionality, Statspack will report on the server as a whole, but you’ll have to alter the installation scripts to take full advantage of Statspack.

Another limitation of Statspack is the granularity of the data. Performance data is collected at various DBA-specified levels and at a DBA-specified interval – the DBA needs to have good knowledge of how load may vary across a day and schedule Statspack collection appropriately. Statspack metrics can also be skewed – long running events will be reported as occurring in the Statspack interval where the SQL finally finishes. If you are collecting data every 5 minutes and an I/O intensive task runs for thirty minutes, it may look like there’s a significant I/O load in a single 5 minute period.

It may require a practiced eye to correctly interpret the Statspack reports and avoid falsely attributing heavy load to a small time window.

Finally, these metrics can’t be tied back to a single session. It’s possible to see which piece of SQL is causing problems. Frequently that can be enough, but it may still be difficult to determine if it’s a problem on the whole or a problem with a single user’s session. Other tools, such as ASH and AWR, can be used to provide finer grained monitoring, depending on the licensing level of Oracle.

Summarizing Statspack

Oracle Statspack can provide good enough performance metrics for many common DBA tasks. By interpreting Statspack reports, a DBA can discover any number of things about the Oracle system they’re in charge of without having to use third party tooling or purchase additional features and options. This can be especially important for those with Oracle Standard Edition systems.

For more information, check out the ORA FAQ article about Statspack and Jonathan Lewis’s collection of Statspack examples.

Generating Identities

The only thing you ever need to use for database identity is an IDENTITY, right? Well, maybe. There are a lot of different options and they all have different pros and cons.

IDENTITY columns

The default way to identify objects in SQL Server is to use an INT or BIGINT column marked as an IDENTITY. This guarantees relatively sequential numbers, barring restarts and failed inserts. Using identity columns puts the responsibility for creating and maintaining object identity in the database.

SQL Server will cache IDENTITY values and generate a new batch of identity values whenever it runs out. Because identity values are cached in memory, using identity values can lead to jumps in the sequence after SQL Server is restarted. Since identities are cached in memory in large batches, they make it possible to rapidly insert data – as long as disks are fast enough.
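
As a quick refresher, here’s what a plain identity column looks like (table and column names are purely illustrative):

CREATE TABLE dbo.Orders (
    OrderId      BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    OrderDate    DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    CustomerName NVARCHAR(100) NOT NULL
);

-- SQL Server generates OrderId on insert
INSERT INTO dbo.Orders (CustomerName) VALUES (N'Margot');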

Sequences

Sometimes the application needs more control over identity. SQL Server 2012 added sequences. A sequence, unlike an identity value, is a separate object in the database. Both application and database code can read from the sequence – multiple tables can share a sequence for an identity column or separate sequences can be created for each table.

Developers using a sequence can use the CACHE value to cache a specific number of sequence values in memory. Or, if the application should have minimal gaps in the sequence, the NOCACHE clause should be used.
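
A rough sketch of a shared sequence (the names and cache size are illustrative):

CREATE SEQUENCE dbo.OrderIdSequence
    AS BIGINT
    START WITH 1
    INCREMENT BY 1
    CACHE 1000;  -- use NOCACHE if minimal gaps matter more than insert speed

-- used from a default constraint...
CREATE TABLE dbo.Invoices (
    InvoiceId    BIGINT NOT NULL
                 DEFAULT (NEXT VALUE FOR dbo.OrderIdSequence) PRIMARY KEY,
    InvoiceTotal MONEY NOT NULL
);

-- ...or fetched directly by application or database code
SELECT NEXT VALUE FOR dbo.OrderIdSequence;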

The Problem with Sequential Identities

Both IDENTITY and SEQUENCE values keep identity generation squarely in the database and, by using integral values, they keep the value narrow.

You can run into problems with sequential inserts on very busy systems – this can lead to latch contention on the trailing pages of the clustered index. This issue can be resolved by spreading inserts across the table using a GUID or some other semi-random clustering key. Admittedly, most systems are never going to run into this problem.

GUIDs for Object Identity

Some developers use GUIDs as a way of managing object identity. Although database administrators balk at this, there are good reasons to use GUIDs for object identity.

GUIDs let the application generate object identity. By moving object identity out to the application layer, users can do work in memory and avoid multiple round trips to the database until they’re ready to save the entire set of data. This technique gives tremendous flexibility to application developers and users.

There’s one other thing that a well designed application gets from this technique – independence from the database. An application that generates its own identity values doesn’t need the database to be online 24/7; as long as some other system is available to accept writes in lieu of the database, the application can still function.

Using GUIDs for object identity does have some issues. For starters, GUIDs are much wider than other integral data types – 16 bytes vs 4 bytes (INT) or 8 bytes (BIGINT). This is a non-issue for a single row or even for a small database, but at significant scale this can add a lot of data to the database. The other issue is that many techniques for generating sequential GUIDs in the application (see NHibernate’s GuidCombGenerator) can still run into GUID collisions.
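
For comparison, here’s what database-generated GUID identity looks like in SQL Server; NEWSEQUENTIALID() only works as a column default, which is part of why applications that generate their own identity values lean on techniques like the GUID comb instead (names are illustrative):

CREATE TABLE dbo.Documents (
    DocumentId UNIQUEIDENTIFIER NOT NULL
               DEFAULT NEWSEQUENTIALID() PRIMARY KEY CLUSTERED,
    Title      NVARCHAR(200) NOT NULL
);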

Integral Generators

What if you could get the best of both worlds? Applications generating unique identities that are also sequential?

The point of identity generation is to abstract away some portion of identity from data attributes and provide an independent surrogate value. GUIDs can provide this, but they aren’t the perfect solution. Identity generators like flake or rustflakes promise roughly sequential identity values that are generated in the application layer and are unique across multiple processes or servers.

The problem with an external identity generator is that it is an extra piece of code that developers need to manage. External dependencies carry some risk, but these are relatively safe items that require very little effort to implement and maintain.

The Solution

There’s no right solution, there’s only a solution that works for you. You may even use each solution at different points in the lifecycle of the same product. It’s important, though, for developers and DBAs to be aware of how identity is currently being handled, the issues that can arise from the current solution, and ideas of how to handle it going forward.

Why Archive Data?

The data story so far

Meet Margot. Margot is an application developer who works for a small company. Margot’s application collects and generates a lot of data from users including their interactions with the site, emails and texts that they send, and user submitted forms. Data is never deleted from the database, but only a few administrative users need to query historical data.

The database has grown considerably because of this historical data – the production database is around 90GB but only 12GB or so is actively queried. The remaining data is a record of user activity, emails, text messages, and previous versions of user data.

Margot is faced with an important decision – How should she deal with this increase in data? Data can’t be deleted, there isn’t budget to upgrade to SQL Server Enterprise Edition and use table partitioning, and there’s a push to move to a cloud service to eliminate some operational difficulties.

Using Partitioned Views to Archive Data

A Partitioned View

One option that Margot has read about is “partitioned views” – this is a method where data is split into two or more tables with a view over the top. The view is used to provide easy access to all of the information in the database. Storing data across many tables means DBAs can store data in many different ways – e.g. compressed tables or filegroups and tiered storage.
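
A partitioned view is nothing exotic; it’s a plain view that unions the active and archive tables back together. A minimal sketch (table names and columns are assumptions), keeping in mind that for SQL Server to prune the right table at query time, each underlying table also needs a CHECK constraint on the partitioning column:

-- current and archived rows live in separate tables, possibly on
-- different filegroups or with different compression settings
CREATE VIEW dbo.UserActivity
AS
SELECT ActivityId, UserId, ActivityDate, Payload
FROM   dbo.UserActivity_Current
UNION ALL
SELECT ActivityId, UserId, ActivityDate, Payload
FROM   dbo.UserActivity_Archive;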

There’s a downside to this approach – all of the data is still in one database. Any HA solutions applied to the live portion of the data set will have to be applied to the entire data set. This could lead to a significant cost increase in a hosted/cloud scenario.

Archiving Data with a Historical Database

Archive this!

The second thing that sprang to mind was creating a separate archival database. Old data is copied into the archival database by scheduled jobs. When users need to run historical reports, the queries hit the archival database. When users need to run current reports, queries are directed to the current application database.

Margot immediately noticed one problem – what happens when a user needs to query a combination of historical and current data? She’s not sure if the users are willing to accept limited reporting functionality.

Archiving Data with Separate Data Stores

One active database. One archival octopus.

A third option that Margot considered was creating a separate database for the data that needed to be kept forever. Current data would be written to both the live database and the historical database. Any data that didn’t ever need to be in the current database (email or SMS history) would only be written to the historical database.

Although this made some aspects of querying more complex (how would row-level security from the primary database be applied to the historical database?), Margot is confident that this solves the majority of the problems they were facing.

This solution would require application changes to make querying work, but Margot and her team thought it was the most flexible solution for their current efforts: both databases can be managed and tuned separately, plus the primary database remains small.

Other Ideas?

Not every database needs to scale in the same way. What ideas do you have to solve this problem?

What’s New in Oracle 12.1.0.2?

Oracle 12.1.0.2 was recently released with some fanfare. One of the most talked about features is Oracle Database In-Memory, but there’s more to this release than just columnar storage.

Big Investment in Data Warehousing

Almost all of the new features reflect an increased focus on data warehousing. This is similar to recent SQL Server releases and, indeed, most mainstream database releases. Different vendors are betting on BI and Oracle’s most recent release is no exception.

Full Database Caching

Caching data in memory can avoid costly disk-based I/O. By default, databases will cache frequently accessed data. There’s a setting in Oracle called the small table threshold – scans of tables larger than _small_table_threshold will use a portion of memory that should age out faster. This idea (in SQL Server it’s called page disfavoring) exists to prevent large table scans from clearing out database server memory.

What if you want those tables to stick around in memory? Modern servers can hold huge amounts of memory and with Oracle RAC, you can scale your buffer pool across multiple servers.

The full database caching feature changes this behavior. Once full database caching is enabled, Oracle will cache all data in memory as the data is read.

It’s important to note the guidance around this new feature – it should only be used if the logical size of the database is no bigger than 80% of available memory. If you’re using Oracle RAC, DBAs need to be doubly careful to make sure that the workload is effectively partitioned across all RAC nodes. Read more about the features and gotchas in the Oracle Database Performance Tuning Guide.
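
Turning the feature on is a single command, although the database needs to be mounted rather than open when you run it. A sketch based on the 12.1.0.2 documentation (the v$database column name is taken from the same guide):

-- run while the database is mounted but not open
ALTER DATABASE FORCE FULL DATABASE CACHING;

-- check whether the mode is enabled
SELECT force_full_db_caching FROM v$database;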

Who needs disk when you have memory?

Big Table Caching

Speaking of larger tables, the new patch includes a feature called automatic big table caching. If you have more data than you have memory, this feature seems a lot safer than full database caching.

Rather than attempt to cache all data in memory, big table caching sets aside a portion of the buffer pool for caching data. The DBA sets up a caching target (DB_BIG_TABLE_CACHE_PERCENT_TARGET) and up to that percentage of memory will be used for caching really big tables.

Like the full database caching feature, this works with single instances of Oracle or Oracle RAC. There are, of course, considerations for using the feature. The considerations are documented in the VLDB and Partitioning Guide.
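
Enabling big table caching is just an initialization parameter. The 40% target below is purely an example value, and the v$ view name is an assumption worth verifying against the guide:

-- reserve up to 40% of the buffer cache for big table scans
ALTER SYSTEM SET DB_BIG_TABLE_CACHE_PERCENT_TARGET = '40';

-- see what the big table cache is up to
SELECT * FROM v$bt_scan_cache;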

In-Memory Column Store

I’ve already covered Oracle’s In-Memory Column Store. There are some additional features that have been announced that make the In-Memory Column Store feature a bit more interesting.

A new data flow operator has been added to Oracle: the VECTOR GROUP BY. This new operator enables efficient querying for In-Memory Column Store tables. Although there’s no new SQL syntax for invoking the VECTOR GROUP BY, it’s possible to use query hints to control optimizer behavior. Six new hints have been added to push the optimizer in the right direction.

Attribute Clustering

In the Oracle world, tables are almost always heaps. Clustering is done only rarely and in special circumstances.

Attribute clustering lets the data warehouse designer specify a table order based on one or more columns of the table. So far, this sounds like something that you could already do. It is also possible to use join attribute clustering to cluster on columns connected via a foreign key relationship. The data warehouse designer, or even the DBA, can modify table structure to cluster a fact table based on dimension hierarchy.

An additional feature called interleaved ordering uses some pretty advanced storage mechanisms to get better I/O out of an attribute clustered table. Interleaved clustering makes it more efficient to perform queries where you may not know the search predicates, or where search predicates may vary over multiple columns (sounds like a dimensional model to me).

There are a few upsides to attribute clustered tables – they require fewer (or no) indexes, they may require less I/O, and they improve data compression. As always, consult the Attribute Clustering documentation for all the details and fine print.
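
The clustering clause hangs off CREATE TABLE (or ALTER TABLE). A rough sketch with made-up names; check the attribute clustering documentation for the exact join and column-group syntax before copying this anywhere:

CREATE TABLE sales_history (
    sale_id    NUMBER,
    product_id NUMBER,
    channel_id NUMBER,
    amount     NUMBER
)
CLUSTERING BY LINEAR ORDER (product_id, channel_id);
-- BY INTERLEAVED ORDER and join attribute clustering against a
-- dimension table are variations on the same clause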

A map of all the people who need data.

Zone Maps

Zone maps are new to Oracle 12.1.0.2 – they provide an additional level of metadata about the data stored in a table. Instead of operating at the row level, like an index, a zone map operates at the level of a range of blocks. In a way, an Oracle zone map is a lot like a SQL Server Columnstore dictionary – it allows I/O pruning to happen at a much coarser granularity than the row level.

When zone maps are combined with features like attribute clustering or partitioning, the database can take advantage of metadata at many levels within the zone map and perform incredibly efficient I/O pruning. Data can be pruned at the zone, partition, or subpartition level, and when attribute clustering is in use the pruning can happen at the dimension level too.

Zone maps are going to have the most benefit for Oracle databases with a heavy data warehousing workload. Big scans with common predicates, especially with low cardinality predicates, will use zone maps to eliminate a lot of I/O. Users taking advantage of attribute clustering and partitioning will also see improvements from this feature. You can read more in the Zone maps section of the data warehousing guide.
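
Creating a standalone zone map is a one-liner; the table and column names here are illustrative, and zone maps can also be created automatically alongside attribute clustering:

CREATE MATERIALIZED ZONEMAP sales_zmap
ON sales (product_id, channel_id);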

More than Data Warehousing

This is a wealth of features. Seriously, that’s a crazy investment in data warehousing for any release of a product, much less a point release.

Every Developer’s Dream: JSON Support

Everybody hates XML. It’s cumbersome to query and people love making fun of it. Many databases have been adding support for JSON and Oracle is no exception.

Oracle’s JSON implementation supports search indexes for general querying (give me the third element of the array in property “sandwiches”). Multiple index types exist for the JSON data type, giving developers fine grained control over how they interact with the data stored in the database. Oracle full text search also supports full-text queries using a new json_textcontains operator. If you want to jump down the JSON rabbit hole, make sure you check out the JSON documentation and pay particular attention to the sections about indexing – there are a number of examples of what will and won’t work.
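
JSON lands in an ordinary VARCHAR2 or CLOB column guarded by an IS JSON check constraint, and functions like json_value pull values back out. A small sketch (the table, column, and path are made up):

CREATE TABLE orders (
    order_id NUMBER PRIMARY KEY,
    doc      CLOB CONSTRAINT orders_doc_is_json CHECK (doc IS JSON)
);

-- third element of the "sandwiches" array (JSON array positions start at 0)
SELECT json_value(doc, '$.sandwiches[2]')
FROM   orders;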

Astute readers will notice that with the addition of JSON to Oracle 12c, SQL Server is now the only major database platform that doesn’t support JSON as a native datatype.

Cross Container Querying

Oracle 12c brought multi-tenant databases, sometimes referred to as contained databases or containers. If you wanted to query across databases, you’d typically have to go through some shenanigans. The CONTAINERS clause makes it easier for developers to write queries that can go across all containers, or even some containers.

Basically, instead of querying directly from one table, a developer would put the name of the table in a CONTAINERS function:

SELECT CON_ID, e.*
FROM   CONTAINERS(employees) e ;

And if you wanted to query only specific containers:

SELECT CON_ID, e.*
FROM   CONTAINERS(employees) e
WHERE  CON_ID IN (1,2,3,4) ;

While not useful for all systems, this does allow DBAs to perform administrative or back office queries across multiple database containers. The gotcha? All databases need to have the same schema.

Summary

Oracle 12.1.0.2 has a number of new features, some of which I didn’t mention since they relate entirely to managing the contained database features. If you want to learn more, check out the new feature guide - it’s full of links to the Oracle documentation on each new feature.

Getting an Oracle Execution Plan

Execution plans are fantastic – they make it easier to visualize how a query is running and find tuning opportunities. SQL Server DBAs starting with Oracle may be a bit frustrated by the lack of a clear way to get to execution plans. This covers three ways to get to execution plans in Oracle – estimated plans, actual plans from an active session, and actual plans from the plan cache.

Getting Estimated Oracle Execution Plans

One way, likely more familiar to MySQL and PostgreSQL DBAs, is to use the EXPLAIN PLAN FOR command to get an execution plan out of Oracle. To do this, a DBA would simply add the command EXPLAIN PLAN FOR to the beginning of a SQL statement:

EXPLAIN PLAN FOR
SELECT o.order_id, v.product_name
FROM   oe.orders o,
       (  SELECT order_id, product_name
          FROM   oe.order_items o, oe.product_information p
          WHERE  p.product_id = o.product_id
                 AND list_price < 50
                 AND min_price < 40  ) v
WHERE  o.order_id = v.order_id;

Oracle will immediately finish and return Explained. SQL Server DBAs will likely exclaim “WTF, Oracle?” and stomp off in frustration in search of a GUI that will let them connect to Oracle and show them a graphical execution plan.

Oracle gives DBAs the ability to save off plans into tables, and that’s exactly what the EXPLAIN PLAN FOR syntax does – it saves the plan into a table named plan_table. You can view the execution plan using the following command:

SELECT plan_table_output
FROM   table(dbms_xplan.display());

This produces lovely text output:


Plan hash value: 1906736282

---------------------------------------------------------------------------------------------
| Id  | Operation             | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                     |     1 |    40 |     6   (0)| 00:00:01 |
|   1 |  NESTED LOOPS         |                     |     1 |    40 |     6   (0)| 00:00:01 |
|   2 |   MERGE JOIN CARTESIAN|                     |     4 |   128 |     6   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL  | PRODUCT_INFORMATION |     1 |    28 |     5   (0)| 00:00:01 |
|   4 |    BUFFER SORT        |                     |   105 |   420 |     1   (0)| 00:00:01 |
|   5 |     INDEX FULL SCAN   | ORDER_PK            |   105 |   420 |     1   (0)| 00:00:01 |
|*  6 |   INDEX UNIQUE SCAN   | ORDER_ITEMS_UK      |     1 |     8 |     0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("MIN_PRICE"<40 AND "LIST_PRICE"<50)
   6 - access("O"."ORDER_ID"="ORDER_ID" AND "P"."PRODUCT_ID"="O"."PRODUCT_ID")

19 rows selected.

There are three parts here - the plan hash value, the row source tree, and predicate information. Plans can be read similarly to SQL Server execution plans; just start at the deepest level and work your way out. In this case, start at Id 6 and work your way up to 3, then 2, then 1, then 0. An asterisk next to the Id means that there’s more detailed information about that step in the Predicate Information section.

You can get more detailed output from Oracle by supplying parameters to the DBMS_XPLAN.DISPLAY() function:

select plan_table_output
from table(dbms_xplan.display('plan_table', null, 'all'));

More documentation on DBMS_XPLAN can be found in the Oracle documentation.

Getting Actual Execution Plans From Oracle

Run the query first:

SELECT o.order_id, v.product_name
FROM   oe.orders o,
       (  SELECT order_id, product_name
          FROM   oe.order_items o, oe.product_information p
          WHERE  p.product_id = o.product_id
          AND    list_price < 50
          AND    min_price < 40  ) v
WHERE  o.order_id = v.order_id;

And then follow up by executing:

select * from table(dbms_xplan.display_cursor());

You should see the following execution plan:


SQL_ID  gpbdw897v4mbg, child number 0
-------------------------------------
SELECT o.order_id, v.product_name FROM   oe.orders o,        (  SELECT
order_id, product_name           FROM   oe.order_items o,
oe.product_information p           WHERE  p.product_id = o.product_id
        AND    list_price < 50           AND    min_price < 40  ) v
WHERE  o.order_id = v.order_id

Plan hash value: 1906736282

---------------------------------------------------------------------------------------------
| Id  | Operation             | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                     |       |       |     6 (100)|          |
|   1 |  NESTED LOOPS         |                     |     1 |    40 |     6   (0)| 00:00:01 |
|   2 |   MERGE JOIN CARTESIAN|                     |     4 |   128 |     6   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL  | PRODUCT_INFORMATION |     1 |    28 |     5   (0)| 00:00:01 |
|   4 |    BUFFER SORT        |                     |   105 |   420 |     1   (0)| 00:00:01 |
|   5 |     INDEX FULL SCAN   | ORDER_PK            |   105 |   420 |     1   (0)| 00:00:01 |
|*  6 |   INDEX UNIQUE SCAN   | ORDER_ITEMS_UK      |     1 |     8 |     0   (0)|          |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(("MIN_PRICE"<40 AND "LIST_PRICE"<50))
   6 - access("O"."ORDER_ID"="ORDER_ID" AND "P"."PRODUCT_ID"="O"."PRODUCT_ID")


28 rows selected.

Hey, that looks exactly the same! The default parameters for dbms_xplan.display_cursor won’t show us all the juicy details that we want. Adding in FORMAT => '+ALLSTATS' will do exactly what we want. Before you go running that execution plan query again, you have two choices: you can either execute the full select query again, or you can copy down the SQL_ID and use it in the dbms_xplan.display_cursor function like this:

select *
from table(dbms_xplan.display_cursor(sql_id => 'gpbdw897v4mbg',
                                     format => '+ALLSTATS'));

The output gets a bit more detailed and ends up looking like this:


SQL_ID  5p20zwvbgb93j, child number 0
-------------------------------------
    SELECT o.order_id, v.product_name     FROM   oe.orders o,
 (  SELECT order_id, product_name               FROM   oe.order_items
o, oe.product_information p               WHERE  p.product_id =
o.product_id               AND    list_price < 50               AND
min_price < 40  ) v     WHERE  o.order_id = v.order_id

Plan hash value: 1906736282

----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name                | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                     |      1 |        |    269 |00:00:00.08 |    1337 |       |       |          |
|   1 |  NESTED LOOPS         |                     |      1 |      1 |    269 |00:00:00.08 |    1337 |       |       |          |
|   2 |   MERGE JOIN CARTESIAN|                     |      1 |      4 |   9135 |00:00:00.03 |      34 |       |       |          |
|*  3 |    TABLE ACCESS FULL  | PRODUCT_INFORMATION |      1 |      1 |     87 |00:00:00.01 |      33 |       |       |          |
|   4 |    BUFFER SORT        |                     |     87 |    105 |   9135 |00:00:00.01 |       1 |  4096 |  4096 |     1/0/0|
|   5 |     INDEX FULL SCAN   | ORDER_PK            |      1 |    105 |    105 |00:00:00.01 |       1 |       |       |          |
|*  6 |   INDEX UNIQUE SCAN   | ORDER_ITEMS_UK      |   9135 |      1 |    269 |00:00:00.02 |    1303 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(("MIN_PRICE"<40 AND "LIST_PRICE"<50))
   6 - access("O"."ORDER_ID"="ORDER_ID" AND "P"."PRODUCT_ID"="O"."PRODUCT_ID")


28 rows selected.

Now we can see the estimated (E-Rows) and actual (A-Rows) rows returned from each operation as well as timing and memory information. Parallel queries will return even more information (take a look at the example DBMS_XPLAN commands and output to see what that would look like).

Viewing The Oracle Execution Plan Cache

You might hear Oracle people talk about this as the library cache or the cursor cache. In the SQL Server world, we’d call it the execution plan cache. Whatever you want to call it, it’s contained in a system view: V$SQL_PLAN.

We could find our query in V$SQL_PLAN and get the execution plan doing something like this:

select plan_table_output
from   v$sql s,
       table(dbms_xplan.display_cursor(s.sql_id, s.child_number,
'basic')) t
where  s.sql_text like 'SELECT o.order_id, v.product_name%';

This query will pull back basic information about the different statements that have been executed that match our search condition. That output looks different because I used the basic parameter for plan display instead of the default of typical, but here’s an example:


EXPLAINED SQL STATEMENT:
------------------------
SELECT o.order_id, v.product_name FROM   oe.orders o,        (  SELECT
order_id, product_name       FROM   oe.order_items o,
oe.product_information p       WHERE  p.product_id = o.product_id
    AND    list_price < 50       AND    min_price < 40  ) v
WHERE  o.order_id = v.order_id

Plan hash value: 1906736282

-----------------------------------------------------
| Id  | Operation             | Name                |
-----------------------------------------------------
|   0 | SELECT STATEMENT      |                     |
|   1 |  NESTED LOOPS         |                     |
|   2 |   MERGE JOIN CARTESIAN|                     |
|   3 |    TABLE ACCESS FULL  | PRODUCT_INFORMATION |
|   4 |    BUFFER SORT        |                     |
|   5 |     INDEX FULL SCAN   | ORDER_PK            |
|   6 |   INDEX UNIQUE SCAN   | ORDER_ITEMS_UK      |
-----------------------------------------------------

If you want to get the full plan for any of these statements, you can either include the sql_id in the query to find all plans for a statement, or you can use the plan hash value to retrieve it from the v$sql_plan view and use it in the dbms_xplan.display_cursor function.

In the query to retrieve matching plans, our query uses Oracle's v$sql system view to find matching cursors (everything's a cursor, yay) in memory. Once we have a list of matching cursors in memory, dbms_xplan.display_cursor is used to pull back information about the actual plan from the cursor cache.

Wrapping Up

There you have it - three ways to view execution plans in Oracle:

  1. Using EXPLAIN PLAN FOR to get an estimated execution plan.
  2. Using dbms_xplan.display_cursor() after running a query to get the actual execution plan.
  3. Querying v$sql_plan to find plans in the plan cache.

SQL Server experts and novices alike can get the kind of query details from Oracle that they’re used to seeing from SQL Server.

Backing Up an Oracle Database

One of a DBA’s most important tasks is taking backups of databases. As SQL Server DBAs, we’re used to using the same tools to backup the database that we use for the rest of our job. With Oracle, there are two main ways to back up a database: user managed backups and Oracle Recovery Manager (RMAN).

User Managed Backups of Oracle

User managed backups leave the database out of the backup equation. It’s up to the DBA and system administrator to figure out the best way to accomplish backup and recovery.

In the distant past, user managed backups required that the Oracle database be taken offline in order for the backup to occur. The database has to be in a consistent state; since the OS doesn’t know how Oracle is writing data, it can’t be trusted to copy files correctly.

Obviously, it’s not possible to take most databases out of production to take a backup. Oracle has a feature called a “hot backup”. In a hot backup, databases or tablespaces can be placed into backup mode. Once in backup mode a DBA or system administrator can use OS/storage tools to take a backup of the Oracle database. Writes can continue during this time, but Oracle changes its behavior to make it possible to get a consistent snapshot of the data.
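
The hot backup dance itself is just a pair of ALTER commands, with the actual file copy left to whatever OS or storage tooling you have (the tablespace name is illustrative):

-- put the whole database into backup mode
ALTER DATABASE BEGIN BACKUP;

-- ...copy the data files with OS or storage tools here...

-- and take it back out
ALTER DATABASE END BACKUP;

-- or work one tablespace at a time
ALTER TABLESPACE users BEGIN BACKUP;
ALTER TABLESPACE users END BACKUP;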

As the name implies, user managed backups place the entire burden of managing database backups on the DBA. While this is more primitive than what most SQL Server DBAs are used to, the fundamental principle is the same – it’s up to the DBA to figure out when and how to backup the database.

Good backups are like good plumbing – you don’t miss them until they’re broken.

Backing Up an Oracle Database With Recovery Manager

RMAN is the answer to your anguished cries of “There has to be a better way to back up a database!”

RMAN combines many aspects of automated user managed backups, but provides additional functionality and fail safe mechanisms. The overall upside is that RMAN ships with Oracle – there’s limited need to write custom software; instead, DBAs configure RMAN to work in ways that make sense for the environment.

Effectively, here’s what happens: the DBA sets up RMAN with space to store backups, the fast recovery area in RMAN terms. The fast recovery area is where RMAN will store data file backups as well as archived redo log files. The idea behind the fast recovery area is that RMAN itself will manage backup retention based on policies that a DBA sets up. By having a fast recovery area present, it’s also possible to rapidly handle online restore – a DBA just has to tell RMAN a point in time for the restore and the software does the rest.

RMAN is a highly configurable tool – it supports many different types of backups (including the hot backups mentioned earlier), full backups similar to SQL Server full backups (data + some log), incremental backups, and even collapsing incremental backups so that a DBA has to restore fewer files. It’s relatively simple to start out with RMAN, but it provides a wealth of functionality that lets a DBA create the backup and recovery strategy the business needs.
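
A taste of what day-to-day RMAN looks like; a minimal sketch, with the retention window entirely dependent on your own RPO/RTO requirements:

RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 7 DAYS;
RMAN> BACKUP DATABASE PLUS ARCHIVELOG;
RMAN> DELETE OBSOLETE;
RMAN> LIST BACKUP SUMMARY;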

Not everything is automatic: DBAs need to carefully analyze backup space requirements. RMAN handles backups, but if the fast recovery area fills up, the database will refuse writes until space is freed up.

By leveraging RMAN, DBAs can focus less on the technical details of backups and instead get to managing the business requirements of RPO and RTO while leaving tedious tasks to software.

How Should a SQL Server DBA Approach This?

From a SQL Server DBA’s perspective, RMAN is closer to a third party solution like Ola Hallengren’s maintenance scripts combined with a third party tool like Dell Litespeed or Red Gate SQL Backup. It handles many aspects of database backups, provides centralized reporting, and automates most tasks around backup and recovery. RMAN will seem more familiar to a SQL Server DBA, but some shops will still use user managed backups.

To get started learning more, check out the Oracle Database Backup and Recovery User’s Guide. It contains a number of detailed examples, syntax explanations, and considerations about how to best approach backing up an Oracle database.

Oracle Terminology for the SQL Server DBA

The translation team hard at work.

At some point you’re going to be confronted with an Oracle installation or even just an Oracle DBA. Communicating with a DBA who works on a different product can be difficult – it’s like speaking US English and having a conversation with a native English speaker from Scotland. The words are the same, but they have different meanings.

While this is by no means an exhaustive list, it will help SQL Server DBAs have a meaningful conversation with their Oracle colleagues.

Database
Oracle refers to the database as the data files on a disk that store data.

Database instance
The set of memory structures and system processes that manage database files. Basically, the instance is executables and memory. Oracle has some different terms to separate out plan cache, buffer pool, and other concepts. But at a high level, executables and memory make a database instance.

So far things seem the same. Up until Oracle 12c, though, these two concepts were close to one and the same – one instance of Oracle housed one database (things like Oracle RAC not included). One thing to take note of – Oracle on Windows runs within one process, just like SQL Server. On Linux, however, there will be multiple Oracle processes, each with a clearly defined purpose.

Tablespace
A tablespace is roughly analogous to a filegroup. You can create tables and indexes inside a tablespace. Like a filegroup, you can take tablespace backups separate from the rest of the database.

Unlike SQL Server, each tablespace can have many different options – some tablespaces can be logged while others are not. During tablespace creation, DBAs can manage a variety of features of each tablespace including a separate undo tablespace (see below), per user disk quotas, logging, or even on-disk block size (this can be helpful when dealing with LOB data).

In short, Oracle DBAs can customize database behavior at the tablespace level as well as at the database level. This can be useful for controlling archive data performance, blobs, or managing other aspects of storage.
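
A sketch of what that looks like in practice (names, sizes, and the datafile path are illustrative):

-- a tablespace tuned for archive data
CREATE TABLESPACE archive_data
    DATAFILE '/u02/oradata/orcl/archive_data01.dbf' SIZE 10G
    NOLOGGING;

-- put a table in it
CREATE TABLE email_history (
    email_id NUMBER,
    body     CLOB
)
TABLESPACE archive_data;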

This tablespace is vast and untouched.

Default tablespace
Every user is created with a default tablespace. The default tablespace defines where that user’s tables and indexes will be created unless a different location is specified. This is like setting up a default filegroup, but it can be set per user instead of per database, and it provides finer grained control. A default tablespace is not like a default schema in SQL Server – users can create objects with different schemas inside their default tablespace. This isn’t related to object ownership like schemas in SQL Server, it’s related to object placement on disk.

Temporary tablespace
You know how SQL Server has one tempdb? Within Oracle, database administrators can specify a different temporary work space on a user by user basis. Fast OLTP workloads can have access to SSD temporary tablespace while data warehouse queries and ETL jobs can have their own temporary tablespace that uses rotational disks. Heck, you could even allocate PCI-Express storage for executives’ temporary tablespace if they needed lightning fast joins or just wanted to foot the bill for PCI-Express storage.
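
A sketch of per-user temp assignment (file path and names are made up):

-- a dedicated temporary tablespace on fast storage
CREATE TEMPORARY TABLESPACE temp_ssd
    TEMPFILE '/u02/oradata/orcl/temp_ssd01.dbf' SIZE 20G;

-- point a specific user at it
ALTER USER etl_user TEMPORARY TABLESPACE temp_ssd;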

Undo tablespace
Oracle uses MVCC by default (in SQL Server you’d call it READ COMMITTED SNAPSHOT ISOLATION). Row versions have to be stored somewhere, but there’s no tempdb. The undo tablespace is used to track changes that have been made and to put the database back into a consistent state if a transaction is rolled back. Although it is possible to create multiple undo tablespaces, only one undo tablespace will be used by any single Oracle instance at a time.

If only one tablespace can be active per Oracle instance, why have multiple undo tablespaces? Oracle RAC can contain multiple Oracle instances reading the same database. Each of the Oracle RAC instances can have a separate undo tablespace. If this sounds confusing, don’t worry – Oracle RAC is complex and deserves a separate blog post.

Rollback
Once upon a time, Oracle DBAs had to configure the undo tablespace by hand. This was called the rollback segment. Poorly configured rollback segments led to “snapshot too old” errors and grumpy DBAs. If you ever encounter Oracle using a rollback segment, kindly ask the DBA why they aren’t using automatic rollback management (undo tablespaces).

Redo log files
It’s a transaction log file! A key Oracle difference is that everything gets logged, even the undo information. Redo log files are used just like SQL Server transaction log files.

Like SQL Server’s transaction log, Oracle can have multiple redo log files. These log files are written to in a circular fashion – the log files are written to in order and, when all log files are full, Oracle will circle around to the beginning again.

Unlike SQL Server, having multiple redo log files is the preferred way to manage Oracle logging. By default, there are two groups of redo log files, but this can and should be configured, based on RPO/RTO needs.

Archived redo log files
These are redo log files that have been backed up. There are a number of ways to have Oracle automatically manage creating backups of redo log files that vary from manual to completely automated. If the disks storing these files fill up, Oracle will not be able to write to the data files – active redo log files can’t be archived any more. To ensure safety, writes are stopped.

Temporary tables
Oracle temporary tables are similar to SQL Server’s with one major exception – they’re statically defined. Even though an Oracle temp table definition will stick around until dropped, the data only persists for the duration of a session (or transaction if the table is configured that way).

The data inside a temporary table exists only for the current session – you can’t view data in another session’s temp table. The upside is that temp table metadata is always available for other users to query.
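
A sketch of an Oracle global temporary table; ON COMMIT PRESERVE ROWS keeps the rows for the whole session, while ON COMMIT DELETE ROWS scopes them to the transaction:

CREATE GLOBAL TEMPORARY TABLE staging_orders (
    order_id NUMBER,
    status   VARCHAR2(20)
)
ON COMMIT PRESERVE ROWS;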

Backups
Oracle backups are very different from SQL Server backups – they’re both more simple and more complex than SQL Server at the same time. Many Oracle shops use a tool called Oracle Recovery Manager (RMAN) to handle database and redo log backups, archival, and even the expiration and deletion of backup files.

In-Memory Analytics: What’s Different Between SQL Server and Oracle

Pick a memory, any memory.

Both Microsoft and Oracle have released what they call “in-memory analytics” features. Microsoft started two years ago with SQL Server 2012 and Oracle is following suit next month (July 2014) with the Oracle In-Memory Analytics feature. Most of us won’t pick our database platform based on just one feature, but it’s important to know the difference between these two so you can have a better discussion around the technical choices that you’re making.

SQL Server Columnstore Indexes

When they first arrived, SQL Server columnstore indexes were touted as memory optimized. By using a columnar data structure and columnar compression techniques, SQL Server columnstore indexes are able to get tremendous improvements in performance.

In effect, SQL Server columnstore indexes are “in-memory” because of the effects of columnar compression. The compression means that the index size is significantly smaller. It’s likely that the whole index can be cached in memory and will stay in memory because the index is smaller.

By thinking smarter, Microsoft created a technology that allowed BI professionals to achieve significant speeds on data warehouses. There was one problem: the SQL Server 2012 columnstore was read-only.

SQL Server 2014 columnstore fixed the read-only problem by introducing a writable clustered columnstore. When the index is created as a clustered columnstore index (it’s the only index on the table), SQL Server will use a writable deltastore to hold data. This advantage over SQL Server 2012 has prompted many teams to migrate to SQL Server 2014.
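
Creating the writable clustered columnstore in SQL Server 2014 is a single statement (the table name is illustrative):

-- converts the rowstore table into a clustered columnstore
CREATE CLUSTERED COLUMNSTORE INDEX ccsi_FactSales
    ON dbo.FactSales;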

The downside of both SQL Server columnstore implementations is that they’re designed for pure data warehouse workloads. If you wanted to combine an OLTP workload with a data warehouse workload on the same system, you’d have some work ahead of you to build something that would work with both regular indexes and columnstore indexes. At its core, SQL Server columnstore technology is a data warehouse technology.

Oracle Database In-Memory

Oracle Database In-Memory takes a different approach from SQL Server. The Oracle feature is designed to be a transparent drop-in. Rather than require a separate physical structure on disk, a DBA will activate the in-memory feature for a given table and wait as Oracle does the rest.
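
Based on the syntax Oracle has published ahead of the release, activation should look roughly like this (the table name, memory size, and priority are assumptions):

-- carve out memory for the in-memory column store (requires a restart)
ALTER SYSTEM SET INMEMORY_SIZE = 20G SCOPE=SPFILE;

-- mark a table for population into the column store
ALTER TABLE sales INMEMORY PRIORITY HIGH;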

Oracle does this by creating a dual-format data structure. As data is read and written to tables using the in-memory feature, the database builds columnar indexes in-memory and on-the-fly. Yes, this means that you’ll need more memory in your database servers. Many Oracle customers use Oracle RAC for scale-out performance improvements. Dedicating one RAC node to reporting isn’t unheard of and can be used to alleviate the pressure caused by holding a second, albeit compressed, copy of a table or index in memory.

Once the row-level data has been merged into the columnar index, Oracle uses compression to further improve performance. Just like other columnar database engines, compression vastly improves the performance of queries.

Oracle’s offering doesn’t persist the in-memory analytic indexes to disk – they’re purely in-memory indexes. This avoids the overhead of logging and persisting them to disk. The double write hit is avoided. However, there’s one problem with this technology – restarts. Whenever the database engine is restarted, the in-memory analytic indexes will have to be rebuilt from disk.

Technical details about the Oracle offering are sparse – we can expect to learn more after Oracle has released the feature to the public.

Comparing In-Memory Analytics

While both companies have produced an in-memory analytic feature of one form or another, direct comparisons are difficult. Microsoft SQL Server columnstore indexes are designed to improve pure data warehouse implementations. Oracle Database In-Memory, on the other hand, is intended to help DBAs get the most out of the hardware they already have. Many business users want to report on live data and this feature makes live reporting a possibility.

Sizing SQL Server for AWS

Let’s skip the shenanigans and get right down to it – you’re responsible for SQL Server and someone in your company wants to move things into Amazon Web Services (AWS). You’ve got SQL Server setup covered thanks to our helpful SQL Server setup checklist and you’re confident you’ve got hardware under control, but things are different up in the clouds.

The question on your mind is “What’s the best way to size a SQL Server in the cloud?”

The Basics: Sizing the Server

Amazon gives you a set list of server sizes to choose from. It’s up to you to pick the right server for the job. The available choices can be overwhelming at first glance, but it gets simpler when you consider different classes of SQL Server and map them to different families of AWS instances.

By the way: we call servers instances in the cloud. Yes, it’s confusing. No, we can’t change it.

Instances are grouped into instance types – you can think of an instance type as a general category of performance. This is going to be your starting point for deploying SQL Server. Don’t worry, though, you can usually change the instance type after you’ve configured SQL Server.

Possible SQL Server instance types:

  • M3 General Purpose – These are basic servers with up to 8 cores and 30GB of memory. M3 instances won’t win any awards for speed, but they are cheap.
  • R3 Memory Optimized – These instances ship with up to 32 cores (2 sockets) and 244GB of memory. Two local SSDs are thrown in – seems like a good place for TempDB to me!
  • I2 IO Optimized – Sounds great at first, right? All the memory of the R3 with 6.4TB of local SSD. At only $7 an hour, it’s a steal!

So, which one do you need? Remember that you don’t want to just choose the biggest and baddest instance right off the bat – there’s a noticeable cost associated with any of these instances.

We’ll get back to picking an instance type after covering a few other details.

Support Level

In many organizations, DBAs are lucky to have tiered support. Mission critical applications, like POS systems or credit card processing, get put in the highest tier of support. Internal applications may end up in a second tier. And, finally, legacy applications or applications that are not critical in any way to the business end up in the third tier.

We can’t ask AWS to give us faster disks, SAN replication or some other magical enterprise HA feature. But we can use this support information to figure out if this SQL Server requires the attention of a full time DBA or if it can be offloaded to something like Amazon RDS. Making this decision is outside the scope of this article, but knowing that there is an option is enough.

  • What’s the support level required for this application?
  • Does the application require regular manual tuning?
  • What’s the support SLA?
  • Is there an incident response plan?
    • Has anyone ever reviewed the incident response plan?
    • Have you ever had to follow it? That is: does it actually work?

Networking

I bet you don’t even think about networking.

The first thing you need to know is: by default, AWS instances use a single network card (NIC) for all traffic. Storage and communication use the same pipe. If you want to use database mirroring and rebuild your indexes, you may be in for a nasty surprise when mirroring drops in the middle of a rebuild.

Second: not all instance types are networked equally. Look in the Networking Performance column to get a vague idea of what’s going on. “Moderate” equates to 500 Mbps, “High” is 1000 Mbps, and “Low”/”Very Low” shouldn’t be entering the equation for a SQL Server.

Watch the storage and network throughput of your SQL Server to verify how much total throughput you need.

For SQL Server it’s better to guarantee a certain level of both network and disk performance. AWS offers what are called EBS-Optimized Instances. These EBS-Optimized Instances use a separate network to communicate with storage in AWS. Traffic on the storage network won’t cause problems for traffic on the regular network.

Use EBS-Optimized Instances for your most demanding applications.

Other applications may not require as much throughput, but you should adjust maintenance to prevent the shared network from becoming a bottleneck.

Network considerations are:

  • What is typical network throughput for the application like?
  • Do you need separate storage and networking pathways?

Storage

I’m going to repeat myself: Use EBS-Optimized Instances for your most demanding applications. You can change this after the instance is up and running but it will require a restart. Plan carefully.

For I/O intensive workloads, AWS offers Provisioned IOPS (PIOPS). The PIOPS option lets you decide on a guaranteed number of IOPS (+/- 10%) available to the system using that disk. PIOPS disks have an increased cost associated with them – they’re 2.5 times more expensive per GB than regular storage and you pay $0.10 / IOP / month.

Stripe drives with caution. In a conventional data center, it’s a best practice to use RAID to configure large volumes and to stripe data across many disks for additional performance. Although RAID striping is possible in AWS, keep in mind that any RAID volume is only as fast as the slowest disks in the RAID volume. Since all permanent storage in AWS is network attached (and subject to unpredictable latency), think very carefully before setting up RAID volumes.

Storage is somewhat limited in AWS, too. Each AWS disk (EBS volume) can be up to 1TB in size. Up to either 16 or 25 disks can be attached to an instance (this depends on some driver limitations). At a maximum, 25TB of data can be stored on an AWS instance using EBS volumes. I bet you were worried for a minute, right? Anyone wanting to store more than 25TB of data will need to scale out across multiple servers.

The storage considerations are:

  • How much storage do you need for data, logs, and backups?
  • How fast does that storage connectivity need to be?
  • How much are you going to access that storage in any given second?
  • Do you need to guarantee the accesses per second?
  • What is the rate of data growth?
  • What is the rate of data retention?
  • What is the total volume of data being stored?

Picking an AWS Instance Type for SQL Server

So how should you go about picking an AWS instance type for SQL Server?

Take a look at your memory requirements for SQL Server and compare them to the instance types on offer in AWS. This may exclude several instance types immediately.

After answering the storage questions, determine if you need provisioned IOPS, dedicated storage connectivity, or even 10Gb networking. This further narrows down the choices you have for instances.

Once you’ve reached this point, you should have several choices of instance size available.

Put It All Together:

  • Network
    • What is typical network throughput for the application like?
    • Do you need separate storage and networking pathways?
    • Are features in place that require additional network throughput?
      (This might be AlwaysOn Availability Groups, database mirroring,
      or log shipping.)
  • Storage
    • How much storage do you need for data, logs, and backups?
    • How fast does that storage connectivity need to be?
    • How much are you going to access that storage in any given second?
    • Do you need to guarantee the accesses per second (IOPS)?
    • What is the total volume of data being stored?
  • Memory
    • What are the memory requirements of the existing SQL Server?
    • What is the use case for this SQL Server? OLTP? DSS?
    • What other services will run on this system?
    • Do you need Enterprise Edition?
  • CPU
    • What is the current CPU count?
    • What is the current CPU speed?
    • Do you require any Enterprise Edition features?
  • Support
    • What is the support agreement for this SQL Server?
    • What’s the support level required for this application?
    • Does the application require regular manual tuning?
    • What’s the support SLA?
    • Is there an incident response plan?
      • Has anyone ever reviewed the incident response plan?
      • Have you ever had to follow it? That is: does it actually work?
  • Instance Size: If you still have choices left to make between one of several instance sizes, ask a few more questions:
    • What is our current rate of activity growth? (e.g. as data volume doubles, will user activity double?)
    • How much will this instance cost over 1 year? 3 years?
    • Is a reserved instance a possibility (taking into account your growth rate)?

When you put it all together, sizing a SQL Server in AWS is a lot like sizing an on-premises SQL Server with several constraints around sizing and capacity.

Bonus questions:

  • Performance Profile
    • Is there a performance SLA?
    • What are the top wait types of an existing SQL Server?
    • What aspects of the application can be changed?
  • Doing Less Work

Ignorable AWS Instance Types

Instance types we can completely ignore:

  • G2 GPU – This is for people who do math on graphics cards.
  • C3 Compute Optimized – Instances with a lot of CPU, some local SSD scratch space, and not a lot of memory. Great for HPC, not great for SQL Server.
  • HS1 High Storage Density – These monsters can have up to 24 2TB local drives delivering 2.6 gigabytes per second of disk throughput. At first glance, they look like a dream. However, the data stored on the local disks will be erased if you stop and then start your AWS instance. You could try to use AlwaysOn Availability Groups, but a total AWS Availability Zone or Regional outage could lead to massive data loss. These are primarily intended for clustered computing like Hadoop or distributed data warehouses (GreenPlum).