Let’s assume you want to get started with Oracle. Maybe your employer is switching to Oracle, maybe you just want a career change. Where do you go to get started?
Getting the Database
You can get a hold of the Oracle database in two main ways – a VM or installing it yourself. Using a VM is definitely the easiest way to get started. Oracle has provided an Oracle VM VirtualBox image that you can install. If you’re not familiar with VirtualBox, that’s okay; Oracle has set up instructions that will get you up and running quickly.
What if you want to install Oracle yourself?
You can get started with Oracle Express Edition. Hit that link and scroll all the way to the bottom. You can download Oracle Express Edition 11g Release 2. 11gR2 is the previous release of Oracle, but it’s good for learning basic Oracle concepts and you’ll find a lot of people are happily running Oracle 11gR2 in production.
If you want to be on the latest and greatest version of Oracle, you’ll need to download a full edition of Oracle. Even though there’s no Developer Edition of Oracle, there are five editions available to choose from. Personal Edition contains most of the features of Oracle Enterprise Edition and can be purchased from the Oracle store. If you want practice with complex DBA tasks, you’ll want to use Enterprise Edition. Otherwise, Personal Edition is the right choice.
You can also download and install the binaries directly from the Oracle database download page and run a full copy of Oracle while you evaluate the software. To the best of my knowledge, it’s only servers that are part of the development-production cycle that need to be fully licensed.
If you’re even lazier, you can spin up an instance of Oracle in one of many different clouds. Both Microsoft Azure and Amazon Web Services have a variety of different Oracle database configurations available for you to choose from.
Some people are self-directed, others prefer guided learning. I find that I’m in the second camp until I develop some skills. If you need to get started quickly, guided labs are a great way to ramp up your skills.
Oracle has created a huge amount of content about the Oracle database. The Oracle Documentation Library is the Oracle equivalent of TechNet. In addition to product documentation, ODL contains several courses – the 2 Day DBA is a good place to get started. From there you can head off into various tuning or development courses or even explore on your own.
It’s easy to get started with Oracle. You can use the pre-built VirtualBox VM, install Express Edition or the full binaries yourself, or spin up an instance in the cloud.
Once you’re set up, training is available through the 2 Day DBA course, and there’s a wealth of information in the Oracle Documentation Library. A summary of training options is also available through the Oracle Learning Library.
To get ready for Tuesday’s webcast, here’s what you have to do:
- Watch the video below, but watch it today (or over the long weekend). There will be no live presentation this week and we won’t be rehashing all of the material in the video.
- Write down your questions or comments. (You don’t have to do this, but it’ll make it more fun.)
- Attend the live webcast on Tuesday at the usual time (11:30AM Central). Register here.
- During the first 10 minutes of the webcast, we’ll give away a prize. The catch is that you have to be there to win.
The live discussion of the video and Q&A won’t be recorded and published, and you also need to be present to win the prize. See you on Tuesday!
At some point, you’re going to need to know what’s wrong with your Oracle instance. While there are a lot of monitoring tools around, there’s always some reason why third party monitoring tools can’t be installed. Oracle has shipped with something called Statspack that provides DBAs with some ability to monitor their Oracle instance.
What Is Oracle Statspack?
Statspack is a set of tools for collecting performance data that Oracle began shipping with Oracle 8i. This isn’t a full monitoring framework, but it helps DBAs isolate poor performance within a time window. Once installed, Statspack can collect snapshots of Oracle performance. This will run on all editions of Oracle – there’s no requirement for Enterprise Edition or any Performance Pack.
Statspack does not set up any kind of regular schedule when it’s first configured. It’s up to you, the DBA, to figure out how often you need to be running Statspack. Since data has to be collected and then written somewhere, make sure you aren’t collecting data too frequently – you will be adding some load to the server.
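As a rough sketch, a snapshot can be taken ad hoc with the statspack.snap procedure, and a schedule can be set up through DBMS_JOB (Oracle also ships spauto.sql, which creates an hourly job); the 15-minute interval below is just an illustrative choice:

```sql
-- Take a single Statspack snapshot on demand (run as the PERFSTAT user):
EXECUTE statspack.snap;

-- Roll your own schedule by submitting a job that snaps every 15 minutes:
VARIABLE jobno NUMBER;
BEGIN
  DBMS_JOB.SUBMIT(:jobno,
                  'statspack.snap;',
                  SYSDATE,
                  'SYSDATE + (15/1440)');  -- 15 minutes, expressed in days
  COMMIT;
END;
/
```

Remember that every snapshot adds load, so pick an interval that balances visibility against overhead.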
Do I Need Special Access to Install Statspack?
Depending on how you look at it, either no special permissions are needed to install Statspack or else very high privileges are needed. Basically, you need to be able to connect to Oracle with sysdba privileges. Any responsible Oracle DBA should be able to install Statspack. The only thing that might cause some issues is if OS-level access is needed for scheduling data collection.
Since Statspack was originally designed for Oracle 8i, there are some changes that need to be made if you are deploying on Oracle 12c. Take a look at the comments on Statspack Examples for help getting Statspack installed on Oracle 12c.
What Kind of Data Does Statspack Collect?
Statspack can collect a lot of information about Oracle. Users can define just how much data they want to collect. The documentation goes to great length to remind DBAs that collecting too much data can slow down the database server.
Statspack collects data based on several configurable SQL thresholds. You can see the thresholds in the perfstat.stats$statspack_parameter table. When a query passes at least one of these thresholds, performance data will be collected.
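For example, the current collection level and thresholds can be inspected with a query along these lines (the column names follow the Statspack parameter table, but verify them against your version’s spdoc.txt):

```sql
-- Inspect the current Statspack collection settings:
SELECT snap_level,
       executions_th,    -- executions threshold
       disk_reads_th,    -- physical reads threshold
       buffer_gets_th    -- logical reads threshold
FROM   perfstat.stats$statspack_parameter;
```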
Multiple levels of data can be collected. Oracle defines five levels of performance data collection – 0, 5, 6, 7, 10.
- Level 0 Basic performance statistics about locks, waits, buffer pool information, and general background information.
- Level 5 All of Level 0 plus SQL statement level details like number of executions, reads, number of parses (compiles in SQL Server speak), and memory usage.
- Level 6 Everything from Level 5 plus execution plans.
- Level 7 Disk metrics for particular segments that cross a threshold.
- Level 10 COLLECT ALL THE THINGS! Plus collect information about latching. Typically you shouldn’t be doing this unless someone at Oracle has suggested it. Or you really know what you’re doing.
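The collection level can be set for a single snapshot, or saved as the new default, via parameters to statspack.snap. A minimal sketch:

```sql
-- Take one snapshot at level 7 and persist 7 as the new default level:
EXECUTE statspack.snap(i_snap_level => 7, i_modify_parameter => 'true');
```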
This data gets stored in the Statspack tables whenever a snapshot is collected. Over time, these tables will grow, so make sure that there’s enough space allocated for their tablespace, or else purge out older data using the supplied sppurge.sql script.
How Do I Use Statspack?
To collect data, either use the DBMS_JOB or Oracle Scheduler interface (depending on Oracle version) or use an operating system native task scheduler.
Once you have at least two snapshots, you can report on the collected data by running $ORACLE_HOME/rdbms/admin/spreport.sql and supplying a start and end snapshot. Statspack is going to churn for a while and spit back a bunch of information. Since Statspack reports can be many thousands of lines long, spreport.sql will write to a file.
As you look through the file, you’ll find information about I/O, locking, waits, slowest queries running (but not which users/sessions are slow), and potentially a lot more, depending on how much information you’re collecting.
For the uninitiated, Oracle ships with a bunch of scripts installed in the server’s file system. These scripts can be invoked from inside your favorite SQL tool.
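For instance, the spreport.sql script mentioned above can be invoked from SQL*Plus, where the ? shorthand expands to $ORACLE_HOME:

```sql
-- Run the Statspack report; the script prompts for a begin snapshot,
-- an end snapshot, and an output file name:
@?/rdbms/admin/spreport.sql
```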
Limitations of Oracle Statspack
This isn’t a silver bullet, or even a bronze bullet. But it is a bullet for shooting trouble.
Statspack isn’t an automatic process. More sophisticated tools use an agent process to automatically start collecting data once they’re installed. Statspack is not that sophisticated. It requires manual configuration – a DBA needs to set up a schedule for Statspack collection and Statspack purging.
While Statspack reports on an entire server, things get a bit weird when you start bringing Oracle RAC and Oracle 12c Multitenant into the mix. With RAC, Statspack is only reporting on a single node of the cluster – to get full cluster statistics, you should look at other tooling. Statspack can also potentially cause problems on RAC that can lead to cluster instability. With Multitenant functionality, Statspack will report on the server as a whole, but you’ll have to alter the installation scripts to take full advantage of Statspack.
Another limitation of Statspack is the granularity of the data. Performance data is collected at various DBA-specified levels and at a DBA-specified interval – the DBA needs to have good knowledge of how load may vary across a day and schedule Statspack collection appropriately. Statspack metrics can also be skewed – long running events will be reported as occurring in the Statspack interval where the SQL finally finishes. If you are collecting data every 5 minutes and an I/O intensive task runs for thirty minutes, it may look like there’s a significant I/O load in a single 5 minute period.
It may require a practiced eye to correctly interpret the Statspack reports and avoid falsely attributing heavy load to a small time window.
Finally, these metrics can’t be tied back to a single session. It’s possible to see which piece of SQL is causing problems. Frequently that can be enough, but it may still be difficult to determine if it’s a problem on the whole or a problem with a single user’s session. Other tools, such as ASH and AWR can be used to provide finer grained monitoring, depending on the licensing level of Oracle.
Oracle Statspack can provide good enough performance metrics for many common DBA tasks. By interpreting Statspack reports, a DBA can discover any number of things about the Oracle system they’re in charge of without having to use third party tooling or purchase additional features and options. This can be especially important for those with Oracle Standard Edition systems.
The only thing you ever need to use for database identity is an IDENTITY, right? Well, maybe. There are a lot of different options and they all have different pros and cons.
The default way to identify objects in SQL Server is to use a BIGINT column marked as an IDENTITY. This guarantees relatively sequential numbers, barring restarts and failed inserts. Using identity columns puts the responsibility for creating and maintaining object identity in the database.
SQL Server will cache IDENTITY values and generate a new batch of identity values whenever it runs out. Because identity values are cached in memory, using identity values can lead to jumps in the sequence after SQL Server is restarted. Since identities are cached in memory in large batches, they make it possible to rapidly insert data – as long as disks are fast enough.
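A minimal sketch of an identity column (the table and column names here are invented for illustration):

```sql
-- The database owns identity: OrderID is assigned at insert time.
CREATE TABLE dbo.Orders (
    OrderID   BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    OrderDate DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);

-- The application never supplies OrderID:
INSERT INTO dbo.Orders (OrderDate) VALUES (SYSUTCDATETIME());
```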
Sometimes the application needs more control over identity. SQL Server 2012 added sequences. A sequence, unlike an identity value, is a separate object in the database. Both application and database code can read from the sequence – multiple tables can share a sequence for an identity column or separate sequences can be created for each table.
Developers using a sequence can use the CACHE value to cache a specific number of sequence values in memory. Or, if the application should have minimal gaps in the sequence, the NOCACHE clause should be used.
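A sketch of a shared sequence (SQL Server 2012 and later; names invented for illustration):

```sql
-- CACHE 100 trades a risk of gaps after a restart for faster inserts;
-- swap in NOCACHE to minimize gaps at the cost of extra writes.
CREATE SEQUENCE dbo.OrderNumbers AS BIGINT
    START WITH 1
    INCREMENT BY 1
    CACHE 100;

-- Either application code or database code can draw the next value:
SELECT NEXT VALUE FOR dbo.OrderNumbers;
```

Because the sequence is a standalone object, multiple tables can share it, or each table can have its own.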
The Problem with Sequential Identities
SEQUENCE values keep identity generation squarely in the database and, by using integral values, they keep the value narrow.
You can run into problems with sequential inserts on very busy systems – this can lead to latch contention on the trailing pages of the clustered index. This issue can be resolved by spreading inserts across the table using a GUID or some other semi-random clustering key. Admittedly, most systems are never going to run into this problem.
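A sketch of a semi-random clustering key using a GUID default (hypothetical table; NEWID() produces random values that spread inserts across the index, while NEWSEQUENTIALID() would restore mostly sequential inserts):

```sql
-- Random GUIDs spread inserts across the clustered index,
-- relieving trailing-page latch contention on hot tables.
CREATE TABLE dbo.Events (
    EventID UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID()
        PRIMARY KEY CLUSTERED,
    Payload NVARCHAR(MAX) NULL
);
```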
GUIDs for Object Identity
Some developers use GUIDs as a way of managing object identity. Although database administrators balk at this, there are good reasons to use GUIDs for object identity.
GUIDs let the application generate object identity. By moving object identity out to the application layer, users can do work in memory and avoid multiple round trips to the database until they’re ready to save the entire set of data. This technique gives tremendous flexibility to application developers and users.
There’s one other thing that a well designed application gets from this technique – independence from the database. An application that generates its own identity values doesn’t need the database to be online 24/7; as long as some other system is available to accept writes in lieu of the database, the application can still function.
Using GUIDs for object identity does have some issues. For starters, GUIDs are much wider than other integral data types – 16 bytes vs 4 bytes (INT) or 8 bytes (BIGINT). This is a non-issue for a single row or even for a small database, but at significant scale this can add a lot of data to the database. The other issue is that many techniques for generating sequential GUIDs in the application (see NHibernate’s GuidCombGenerator) can still run into GUID collisions.
What if you could get the best of both worlds? Applications generating unique identities that are also sequential?
The point of identity generation is to abstract away some portion of identity from data attributes and provide an independent surrogate value. GUIDs can provide this, but they aren’t the perfect solution. Identity generators like flake or rustflakes promise roughly sequential identity values that are generated in the application layer and are unique across multiple processes or servers.
The problem with an external identity generator is that it is an extra piece of code that developers need to manage. External dependencies carry some risk, but these are relatively safe items that require very little effort to implement and maintain.
There’s no right solution, there’s only a solution that works for you. You may even use each solution at different points in the lifecycle of the same product. It’s important, though, for developers and DBAs to be aware of how identity is currently being handled, the issues that can arise from the current solution, and ideas of how to handle it going forward.
Meet Margot. Margot is an application developer who works for a small company. Margot’s application collects and generates a lot of data from users including their interactions with the site, emails and texts that they send, and user submitted forms. Data is never deleted from the database, but only a few administrative users need to query historical data.
The database has grown considerably because of this historical data – the production database is around 90GB but only 12GB or so is actively queried. The remaining data is a record of user activity, emails, text messages, and previous versions of user data.
Margot is faced with an important decision – How should she deal with this increase in data? Data can’t be deleted, there isn’t budget to upgrade to SQL Server Enterprise Edition and use table partitioning, and there’s a push to move to a cloud service to eliminate some operational difficulties.
Using Partitioned Views to Archive Data
One option that Margot has read about is “partitioned views” – this is a method where data is split into two or more tables with a view over the top. The view is used to provide easy access to all of the information in the database. Storing data across many tables means DBAs can store data in many different ways – e.g. compressed tables or filegroups and tiered storage.
There’s a downside to this approach – all of the data is still in one database. Any HA solutions applied to the live portion of the data set will have to be applied to the entire data set. This could lead to a significant cost increase in a hosted/cloud scenario.
Archiving Data with a Historical Database
The second thing that sprang to mind was creating a separate archival database. Old data is copied into the archival database by scheduled jobs. When users need to run historical reports, the queries hit the archival database. When users need to run current reports, queries are directed to the current application database.
Margot immediately noticed one problem – what happens when a user needs to query a combination of historical and current data? She’s not sure if the users are willing to accept limited reporting functionality.
Archiving Data with Separate Data Stores
A third option that Margot considered was creating a separate database for the data that needed to be kept forever. Current data would be written to both the live database and the historical database. Any data that didn’t ever need to be in the current database (email or SMS history) would only be written to the historical database.
Although this made some aspects of querying more complex – how could row-level security from the primary database be applied to the historical database – Margot is confident that this solves the majority of problems that they were facing.
This solution would require application changes to make querying work, but Margot and her team thought it was the most flexible solution for their current efforts: both databases can be managed and tuned separately, plus the primary database remains small.
Not every database needs to scale in the same way. What ideas do you have to solve this problem?
Big Investment in Data Warehousing
Almost all of the new features reflect an increased focus on data warehousing. This is similar to recent SQL Server releases and, indeed, most mainstream database releases. Different vendors are betting on BI and Oracle’s most recent release is no exception.
Full Database Caching
Caching data in memory can avoid costly disk-based I/O. By default, databases will cache frequently accessed data. There’s a setting in Oracle called the small table threshold – scans of tables larger than the _small_table_threshold will use a portion of memory that should age out faster. This idea (in SQL Server it’s called page disfavoring) exists to prevent large table scans from clearing out database server memory.
What if you want those tables to stick around in memory? Modern servers can hold huge amounts of memory and with Oracle RAC, you can scale your buffer pool across multiple servers.
The full database caching feature changes this behavior. Once full database caching is enabled, Oracle will cache all data in memory as the data is read.
It’s important to note the guidance around this new feature – it should only be used if the logical size of the database is no bigger than 80% of available memory. If you’re using Oracle RAC, DBAs need to be doubly careful to make sure that the workload is effectively partitioned across all RAC nodes. Read more about the features and gotchas in the Oracle Database Performance Tuning Guide.
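Assuming the sizing guidance is met, the feature is switched on with a single statement (the database needs to be in the mounted state, not open, when you run it):

```sql
-- Cache the entire database in the buffer cache as blocks are read:
ALTER DATABASE FORCE FULL DATABASE CACHING;

-- Revert to the default caching behavior:
ALTER DATABASE NO FORCE FULL DATABASE CACHING;
```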
Big Table Caching
Speaking of larger tables, the new patch includes a feature called automatic big table caching. If you have more data than you have memory, this feature seems a lot safer than full database caching.
Rather than attempt to cache all data in memory, big table caching sets aside a portion of the buffer pool for caching data. The DBA sets up a caching target (DB_BIG_TABLE_CACHE_PERCENT_TARGET) and up to that percentage of memory will be used for caching really big tables.
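As a sketch, the target is an instance-level parameter, and a V$ view exposes what the big table cache is doing (the 40% figure below is just an example, and the view columns should be checked against your release):

```sql
-- Reserve up to 40% of the buffer cache for scans of big tables:
ALTER SYSTEM SET DB_BIG_TABLE_CACHE_PERCENT_TARGET = 40;

-- See how much of the big table cache is allocated and what's in it:
SELECT bt_cache_target, object_count, memory_buf_alloc
FROM   v$bt_scan_cache;
```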
Like the full database caching feature, this works with single instances of Oracle or Oracle RAC. There are, of course, considerations for using the feature. The considerations are documented in the VLDB and Partitioning Guide.
In-Memory Column Store
I’ve already covered Oracle’s In-Memory Column Store. There are some additional features that have been announced that make the In-Memory Column Store feature a bit more interesting.
A new data flow operator has been added to Oracle: the VECTOR GROUP BY. This new operator enables efficient querying for In-Memory Column Store tables. Although no new SQL operators have been added to Oracle for the VECTOR GROUP BY, it’s possible to use query hints to control optimizer behavior. Six new hints have been added to push the optimizer in the right direction.
In the Oracle world, tables are almost always heaps. Clustering is done only rarely and in special circumstances.
Attribute clustering lets the data warehouse designer specify a table order based on one or more columns of the table. So far, this sounds like something that you could already do. It is also possible to use join attribute clustering to cluster on columns connected via a foreign key relationship. The data warehouse designer, or even the DBA, can modify table structure to cluster a fact table based on dimension hierarchy.
An additional feature called interleaved ordering uses some pretty advanced storage mechanisms to get better I/O out of an attribute clustered table. Interleaved clustering makes it more efficient to perform queries where you may not know the search predicates, or where search predicates may vary over multiple columns (sounds like a dimensional model to me).
There are a few upsides to attribute clustered tables – they require fewer (or no) indexes, they may require less I/O, and they improve data compression. As always, consult the Attribute Clustering documentation for all the details and fine print.
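A minimal example of the syntax, using an invented star schema (BY INTERLEAVED ORDER would select the interleaved variant instead):

```sql
-- Fact table physically ordered by its dimension keys at load time:
CREATE TABLE sales (
    sale_id     NUMBER,
    customer_id NUMBER,
    product_id  NUMBER,
    amount      NUMBER
)
CLUSTERING BY LINEAR ORDER (customer_id, product_id);
```

Join attribute clustering uses a similar CLUSTERING clause with a JOIN against the dimension tables, letting the fact table be ordered by dimension attributes rather than its own keys.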
Zone maps are new to Oracle 12.1.0.2 – they provide an additional level of metadata about the data stored in a table. Instead of operating at the row level, like an index, a zone map operates at the level of a range of blocks. In a way, an Oracle zone map is a lot like a SQL Server Columnstore dictionary – it allows I/O pruning to happen at a much coarser granularity than the row level.
When zone maps are combined with features like attribute clustering or partitioning, the database can take advantage of metadata at many levels within the zone map and perform incredibly efficient I/O pruning. Data can be pruned at the zone, partition, or subpartition level, and when attribute clustering is in use the pruning can happen at the dimension level too.
Zone maps are going to have the most benefit for Oracle databases with a heavy data warehousing workload. Big scans with common predicates, especially with low cardinality predicates, will use zone maps to eliminate a lot of I/O. Users taking advantage of attribute clustering and partitioning will also see improvements from this feature. You can read more in the Zone maps section of the data warehousing guide.
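A basic zone map is created as its own object over the columns commonly used in predicates (hypothetical schema):

```sql
-- Track min/max values of these columns per zone of blocks,
-- so scans can skip zones that can't match the predicate:
CREATE MATERIALIZED ZONEMAP sales_zmap
ON sales (customer_id, product_id);
```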
More than Data Warehousing
This is a wealth of features. Seriously, that’s a crazy investment in data warehousing for any release of a product, much less a point release.
Every Developer’s Dream: JSON Support
Everybody hates XML. It’s cumbersome to query and people love making fun of it. Many databases have been adding support for JSON and Oracle is no exception.
Oracle’s JSON implementation supports search indexes for general querying (give me the third element of the array in property “sandwiches”). Multiple index types exist for the JSON data type, giving developers fine grained control over how they interact with the data stored in the database. Oracle full text search also supports full-text queries using a new json_textcontains operator. If you want to jump down the JSON rabbit hole, make sure you check out the JSON documentation and pay particular attention to the sections about indexing – there are a number of examples of what will and won’t work.
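As a sketch, JSON lives in an ordinary column constrained with IS JSON and is queried with functions like json_value (the table and document shape here are invented):

```sql
-- A column constrained to hold only valid JSON documents:
CREATE TABLE user_prefs (
    user_id NUMBER PRIMARY KEY,
    doc     VARCHAR2(4000) CHECK (doc IS JSON)
);

-- Pull the third element of the "sandwiches" array
-- (JSON path array positions are zero-based):
SELECT json_value(doc, '$.sandwiches[2]')
FROM   user_prefs;
```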
Astute readers will notice that with the addition of JSON to Oracle 12c, SQL Server is now the only major database platform that doesn’t support JSON as a native datatype.
Cross Container Querying
Oracle 12c brought multi-tenant databases, sometimes referred to as contained databases or containers. If you wanted to query across databases, you’d typically have to go through some shenanigans. The
CONTAINERS clause makes it easier for developers to write queries that can go across all containers, or even some containers.
Basically, instead of querying directly from one table, a developer wraps the table name in the CONTAINERS clause:
SELECT CON_NAME, * FROM CONTAINERS(employees) ;
And if you wanted to query only specific containers:
SELECT CON_NAME, * FROM CONTAINERS(employees) WHERE CON_ID IN (1,2,3,4) ;
While not useful for all systems, this does allow DBAs to perform administrative or back office queries across multiple database containers. The gotcha? All databases need to have the same schema.
Oracle 12.1.0.2 has a number of new features, some of which I didn’t mention since they relate entirely to managing the contained database features. If you want to learn more, check out the new feature guide – it’s full of links to the Oracle documentation on each new feature.
Execution plans are fantastic – they make it easier to visualize how a query is running and find tuning opportunities. SQL Server DBAs starting with Oracle may be a bit frustrated by the lack of a clear way to get to execution plans. This covers three ways to get to execution plans in Oracle – estimated plans, actual plans from an active session, and actual plans from the plan cache.
Getting Estimated Oracle Execution Plans
One way, likely more familiar to MySQL and PostgreSQL DBAs, is to use the EXPLAIN PLAN FOR command to get an execution plan out of Oracle. To do this, a DBA would simply add EXPLAIN PLAN FOR to the beginning of a SQL statement:
EXPLAIN PLAN FOR SELECT o.order_id, v.product_name FROM oe.orders o, ( SELECT order_id, product_name FROM oe.order_items o, oe.product_information p WHERE p.product_id = o.product_id AND list_price < 50 AND min_price < 40 ) v WHERE o.order_id = v.order_id;
Oracle will immediately finish and return Explained. SQL Server DBAs will likely exclaim “WTF, Oracle?” and stomp off in frustration in search of a GUI that will let them connect to Oracle and show them a graphical execution plan.
Oracle gives DBAs the ability to save off plans into tables, and that’s exactly what the EXPLAIN PLAN FOR syntax does – it saves the plan into a table named plan_table. You can view the execution plan using the following command:
SELECT plan_table_output FROM table(dbms_xplan.display());
This produces lovely text output:
Plan hash value: 1906736282 --------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 40 | 6 (0)| 00:00:01 | | 1 | NESTED LOOPS | | 1 | 40 | 6 (0)| 00:00:01 | | 2 | MERGE JOIN CARTESIAN| | 4 | 128 | 6 (0)| 00:00:01 | |* 3 | TABLE ACCESS FULL | PRODUCT_INFORMATION | 1 | 28 | 5 (0)| 00:00:01 | | 4 | BUFFER SORT | | 105 | 420 | 1 (0)| 00:00:01 | | 5 | INDEX FULL SCAN | ORDER_PK | 105 | 420 | 1 (0)| 00:00:01 | |* 6 | INDEX UNIQUE SCAN | ORDER_ITEMS_UK | 1 | 8 | 0 (0)| 00:00:01 | --------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("MIN_PRICE"<40 AND "LIST_PRICE"<50) 6 - access("O"."ORDER_ID"="ORDER_ID" AND "P"."PRODUCT_ID"="O"."PRODUCT_ID") 19 rows selected.
There are three parts here – the plan hash value, the row source tree, and predicate information. Plans can be read similarly to SQL Server execution plans; just start at the deepest level and work your way out. In this case, start at Id 6 and work your way up to 3, then 2, then 1, then 0. An asterisk next to the Id means that there’s more detailed information about that step in the Predicate Information section.
You can get more detailed output from Oracle by supplying parameters to dbms_xplan.display:
select plan_table_output from table(dbms_xplan.display('plan_table', null, 'all'));
More documentation on DBMS_XPLAN can be found in the Oracle documentation.
Getting Actual Execution Plans From Oracle
Run the query first:
SELECT o.order_id, v.product_name FROM oe.orders o, ( SELECT order_id, product_name FROM oe.order_items o, oe.product_information p WHERE p.product_id = o.product_id AND list_price < 50 AND min_price < 40 ) v WHERE o.order_id = v.order_id;
And then follow up by executing:
select * from table(dbms_xplan.display_cursor());
You should see the following execution plan:
SQL_ID gpbdw897v4mbg, child number 0 ------------------------------------- SELECT o.order_id, v.product_name FROM oe.orders o, ( SELECT order_id, product_name FROM oe.order_items o, oe.product_information p WHERE p.product_id = o.product_id AND list_price < 50 AND min_price < 40 ) v WHERE o.order_id = v.order_id Plan hash value: 1906736282 --------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 6 (100)| | | 1 | NESTED LOOPS | | 1 | 40 | 6 (0)| 00:00:01 | | 2 | MERGE JOIN CARTESIAN| | 4 | 128 | 6 (0)| 00:00:01 | |* 3 | TABLE ACCESS FULL | PRODUCT_INFORMATION | 1 | 28 | 5 (0)| 00:00:01 | | 4 | BUFFER SORT | | 105 | 420 | 1 (0)| 00:00:01 | | 5 | INDEX FULL SCAN | ORDER_PK | 105 | 420 | 1 (0)| 00:00:01 | |* 6 | INDEX UNIQUE SCAN | ORDER_ITEMS_UK | 1 | 8 | 0 (0)| | --------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter(("MIN_PRICE"<40 AND "LIST_PRICE"<50)) 6 - access("O"."ORDER_ID"="ORDER_ID" AND "P"."PRODUCT_ID"="O"."PRODUCT_ID") 28 rows selected.
Hey, that looks exactly the same! The default parameters for dbms_xplan.display_cursor won’t show us all the juicy details that we want. Adding in FORMAT => '+ALLSTATS' will do exactly what we want. Before you go running that execution plan query again, you have two choices: you can either execute the full select query again, or you can copy down the SQL_ID and use it in the dbms_xplan.display_cursor function like this:
select * from table(dbms_xplan.display_cursor(sql_id => 'gpbdw897v4mbg', format => '+ALLSTATS'));
The output gets a bit more detailed and ends up looking like this:
SQL_ID 5p20zwvbgb93j, child number 0 ------------------------------------- SELECT o.order_id, v.product_name FROM oe.orders o, ( SELECT order_id, product_name FROM oe.order_items o, oe.product_information p WHERE p.product_id = o.product_id AND list_price < 50 AND min_price < 40 ) v WHERE o.order_id = v.order_id Plan hash value: 1906736282 ---------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | O/1/M | ---------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 269 |00:00:00.08 | 1337 | | | | | 1 | NESTED LOOPS | | 1 | 1 | 269 |00:00:00.08 | 1337 | | | | | 2 | MERGE JOIN CARTESIAN| | 1 | 4 | 9135 |00:00:00.03 | 34 | | | | |* 3 | TABLE ACCESS FULL | PRODUCT_INFORMATION | 1 | 1 | 87 |00:00:00.01 | 33 | | | | | 4 | BUFFER SORT | | 87 | 105 | 9135 |00:00:00.01 | 1 | 4096 | 4096 | 1/0/0| | 5 | INDEX FULL SCAN | ORDER_PK | 1 | 105 | 105 |00:00:00.01 | 1 | | | | |* 6 | INDEX UNIQUE SCAN | ORDER_ITEMS_UK | 9135 | 1 | 269 |00:00:00.02 | 1303 | | | | ---------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter(("MIN_PRICE"<40 AND "LIST_PRICE"<50)) 6 - access("O"."ORDER_ID"="ORDER_ID" AND "P"."PRODUCT_ID"="O"."PRODUCT_ID") 28 rows selected.
Now we can see the estimated (E-Rows) and actual (A-Rows) rows returned from each operation as well as timing and memory information. Parallel queries will return even more information – take a look at the example DBMS_XPLAN commands and output to see what that would look like.
Viewing The Oracle Execution Plan Cache
You might hear Oracle people talk about this as the library cache or the cursor cache. In the SQL Server world, we'd call it the execution plan cache. Whatever you want to call it, its contents are exposed through system views. We can find our query in V$SQL and get the execution plan by doing something like this:
select plan_table_output from v$sql s, table(dbms_xplan.display_cursor(s.sql_id, s.child_number, 'basic')) t where s.sql_text like 'SELECT o.order_id, v.product_name%';
This query will pull back basic information about the different statements that have been executed that match our search condition. That output looks different because I used the basic parameter for plan display instead of the default of typical, but here's an example:
EXPLAINED SQL STATEMENT:
------------------------
SELECT o.order_id, v.product_name FROM oe.orders o, ( SELECT order_id,
product_name FROM oe.order_items o, oe.product_information p WHERE
p.product_id = o.product_id AND list_price < 50 AND min_price < 40 ) v
WHERE o.order_id = v.order_id

Plan hash value: 1906736282

-----------------------------------------------------
| Id  | Operation             | Name                |
-----------------------------------------------------
|   0 | SELECT STATEMENT      |                     |
|   1 |  NESTED LOOPS         |                     |
|   2 |   MERGE JOIN CARTESIAN|                     |
|   3 |    TABLE ACCESS FULL  | PRODUCT_INFORMATION |
|   4 |    BUFFER SORT        |                     |
|   5 |     INDEX FULL SCAN   | ORDER_PK            |
|   6 |   INDEX UNIQUE SCAN   | ORDER_ITEMS_UK      |
-----------------------------------------------------
If you want to get the full plan for any of these statements, you can either include the sql_id in the query to find all plans for a statement, or use the plan hash value to retrieve a specific plan. The query above uses Oracle's v$sql system view to find matching cursors (everything's a cursor, yay) in memory. Once we have a list of matching cursors in memory, dbms_xplan.display_cursor is used to pull back information about the actual plan from the cursor cache.
There you have it - three ways to view execution plans in Oracle:
EXPLAIN PLAN FOR to get an estimated execution plan.
dbms_xplan.display_cursor() after running a query to get the actual execution plan.
v$sql_plan to find plans in the plan cache.
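As a refresher, the first approach looks like this, using the same OE sample schema query from the plans above (EXPLAIN PLAN writes to the default PLAN_TABLE, which dbms_xplan.display then reads):

```sql
-- Generate an estimated plan; nothing is executed
EXPLAIN PLAN FOR
SELECT o.order_id, v.product_name
FROM   oe.orders o,
       ( SELECT order_id, product_name
         FROM   oe.order_items o, oe.product_information p
         WHERE  p.product_id = o.product_id
         AND    list_price < 50
         AND    min_price < 40 ) v
WHERE  o.order_id = v.order_id;

-- Display the plan that EXPLAIN PLAN just wrote to PLAN_TABLE
SELECT * FROM table(dbms_xplan.display);
```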
SQL Server experts and novices alike can get the kind of query details from Oracle that they’re used to seeing from SQL Server.
One of a DBA’s most important tasks is taking backups of databases. As SQL Server DBAs, we’re used to using the same tools to back up the database that we use for the rest of our job. With Oracle, there are two main ways to back up a database: user managed backups and Oracle Recovery Manager (RMAN).
User Managed Backups of Oracle
User managed backups leave the database out of the backup equation. It’s up to the DBA and system administrator to figure out the best way to accomplish backup and recovery.
In the distant past, user managed backups required that the Oracle database be taken offline for the backup to occur. The database files have to be in a consistent state; since the OS doesn’t know how Oracle writes data, it can’t be trusted to copy files correctly.
Obviously, it’s not possible to take most databases out of production to take a backup, so Oracle has a feature called a “hot backup”. In a hot backup, databases or tablespaces can be placed into backup mode. Once in backup mode, a DBA or system administrator can use OS/storage tools to take a backup of the Oracle database. Writes can continue during this time, but Oracle changes its behavior to make it possible to get a consistent snapshot of the data.
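A hot backup sketch might look like this (the tablespace name is hypothetical, and the copy step in the middle depends entirely on your OS and storage tooling):

```sql
-- Put the tablespace into backup mode; writes continue, but Oracle logs
-- extra information so the copied files can later be made consistent
ALTER TABLESPACE users BEGIN BACKUP;

-- ... copy the tablespace's data files with OS/storage tools here ...

-- Take the tablespace back out of backup mode
ALTER TABLESPACE users END BACKUP;
```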
As the name implies, user managed backups place the entire burden of managing database backups on the DBA. While this is more primitive than what most SQL Server DBAs are used to, the fundamental principle is the same – it’s up to the DBA to figure out when and how to backup the database.
Backing Up an Oracle Database With Recovery Manager
RMAN is the answer to your anguished cries of “There has to be a better way to back up a database!”
RMAN combines many aspects of automated user managed backups, but provides additional functionality and fail-safe mechanisms. The overall upside is that RMAN ships with Oracle – there’s limited need to write custom software; instead, DBAs configure RMAN to work in ways that make sense for the environment.
Effectively, here’s what happens: the DBA sets up RMAN with space to store backups, the fast recovery area in RMAN terms. The fast recovery area is where RMAN will store data file backups as well as archived redo log files. The idea behind the fast recovery area is that RMAN itself will manage backup retention based on policies that a DBA sets up. By having a fast recovery area present, it’s also possible to rapidly handle online restore – a DBA just has to tell RMAN a point in time for the restore and the software does the rest.
RMAN is a highly configurable tool – it supports many different types of backups (including the hot backups mentioned earlier), full backups similar to SQL Server full backups (data + some log), incremental backups, and even collapsing incremental backups so that a DBA has to restore fewer files. It’s relatively simple to start out with RMAN, but it provides a wealth of functionality that lets a DBA create the backup and recovery strategy the business needs.
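A minimal RMAN session might look something like this (the retention window and backup choices are illustrative, not recommendations):

```sql
# Keep enough backups to restore to any point in the last 7 days
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 7 DAYS;

# Full backup of the database plus the archived redo logs
BACKUP DATABASE PLUS ARCHIVELOG;

# Later: back up only blocks changed since the last level 0/1 backup
BACKUP INCREMENTAL LEVEL 1 DATABASE;

# Let RMAN delete backups that fall outside the retention policy
DELETE OBSOLETE;
```

These commands are typed at the `RMAN>` prompt after connecting with `rman target /`.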
Not everything is automatic: DBAs need to carefully analyze backup space requirements. RMAN handles backups, but if the fast recovery area fills up, the database will refuse writes until space is freed up.
By leveraging RMAN, DBAs can focus less on the technical details of backups and instead get to managing the business requirements of RPO and RTO while leaving tedious tasks to software.
How Should a SQL Server DBA Approach This?
From a SQL Server DBA’s perspective, RMAN is closer to a third-party solution: think Ola Hallengren’s maintenance scripts combined with a tool like Dell LiteSpeed or Red Gate SQL Backup. It handles many aspects of database backups, provides centralized reporting, and automates most tasks around backup and recovery. RMAN will seem more familiar to a SQL Server DBA, but some shops will still use user managed backups.
To get started learning more, check out the Oracle Database Backup and Recovery User’s Guide. It contains a number of detailed examples, syntax explanations, and considerations about how to best approach backing up an Oracle database.
At some point you’re going to be confronted with an Oracle installation or even just an Oracle DBA. Communicating with a DBA who works on a different product can be difficult; it’s like speaking US English and having a conversation with a native English speaker from Scotland. The words are the same, but they have different meanings.
While this is by no means an exhaustive list, it will help SQL Server DBAs have a meaningful conversation with their Oracle colleagues.
In Oracle, the term database refers to the data files on disk that store data.
The instance is the set of memory structures and system processes that manage the database files. Basically, the instance is executables and memory. Oracle has some different terms to separate out the plan cache, the buffer pool, and other concepts, but at a high level, executables and memory make a database instance.
So far things seem the same. Up until Oracle 12c, though, these two concepts were close to one and the same – one instance of Oracle housed one database (things like Oracle RAC excluded). One thing to take note of: Oracle on Windows runs within one process, just like SQL Server. On Linux, however, there will be multiple Oracle processes, each with a clearly defined purpose.
A tablespace is roughly analogous to a filegroup. You can create tables and indexes inside a tablespace. Like a filegroup, you can take tablespace backups separate from the rest of the database.
Unlike SQL Server, each tablespace can have many different options – some tablespaces can be logged while others are not. During tablespace creation, DBAs can manage a variety of features of each tablespace including a separate undo tablespace (see below), per user disk quotas, logging, or even on-disk block size (this can be helpful when dealing with LOB data).
In short, Oracle DBAs can customize database behavior at the tablespace level as well as at the database level. This can be useful for controlling archive data performance, blobs, or managing other aspects of storage.
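For example, per-tablespace options like logging and block size are set at creation time. A sketch (names, paths, and sizes are made up, and a non-default block size requires a matching db_32k_cache_size to be configured first):

```sql
-- A tablespace with a non-default 32K block size, handy for LOB-heavy data
CREATE TABLESPACE lob_data
  DATAFILE '/u01/oradata/orcl/lob_data01.dbf' SIZE 1G
  BLOCKSIZE 32K;

-- A tablespace that skips redo logging for objects that allow it
CREATE TABLESPACE staging
  DATAFILE '/u01/oradata/orcl/staging01.dbf' SIZE 512M
  NOLOGGING;
```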
Every user is created with a default tablespace. The default tablespace defines where that user’s tables and indexes will be created unless a different location is specified. This is like setting up a default filegroup, but it can be set per user instead of per database, and it provides finer grained control. A default tablespace is not like a default schema in SQL Server – users can create objects with different schemas inside their default tablespace. This isn’t related to object ownership like schemas in SQL Server, it’s related to object placement on disk.
You know how SQL Server has one tempdb? Within Oracle, database administrators can specify a different temporary work space on a user by user basis. Fast OLTP workloads can have access to SSD temporary tablespace while data warehouse queries and ETL jobs can have their own temporary tablespace that uses rotational disks. Heck, you could even allocate PCI-Express storage for executives’ temporary tablespace if they needed lightning fast joins or just wanted to foot the bill for PCI-Express storage.
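Both the default and temporary tablespaces are assigned when the user is created (the user, password, and tablespace names here are hypothetical):

```sql
CREATE USER etl_user IDENTIFIED BY some_password
  DEFAULT TABLESPACE warehouse_data   -- where etl_user's objects land by default
  TEMPORARY TABLESPACE dw_temp        -- sort/hash work area for this user
  QUOTA 10G ON warehouse_data;        -- per-user disk quota on the tablespace
```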
Oracle uses MVCC by default (in SQL Server you’d call it READ COMMITTED SNAPSHOT ISOLATION). Row versions have to be stored somewhere, but there’s no tempdb. The undo tablespace is used to track changes that have been made and to put the database back into a consistent state if a transaction is rolled back. Although it is possible to create multiple undo tablespaces, only one undo tablespace will be used by any single Oracle instance at a time.
If only one undo tablespace can be active per Oracle instance, why have multiple undo tablespaces? Oracle RAC can contain multiple Oracle instances reading the same database, and each of the RAC instances can have a separate undo tablespace. If this sounds confusing, don’t worry – Oracle RAC is complex and deserves a separate blog post.
Once upon a time, Oracle DBAs had to configure the undo tablespace by hand. This was called the rollback segment. Poorly configured rollback segments led to “snapshot too old” errors and grumpy DBAs. If you ever encounter Oracle using a rollback segment, kindly ask the DBA why they aren’t using automatic rollback management (undo tablespaces).
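With automatic undo management, creating and switching undo tablespaces takes a couple of statements (the tablespace name and file path are made up):

```sql
-- Create a second undo tablespace
CREATE UNDO TABLESPACE undotbs2
  DATAFILE '/u01/oradata/orcl/undotbs2_01.dbf' SIZE 2G;

-- Point this instance at the new undo tablespace; only one is active at a time
ALTER SYSTEM SET UNDO_TABLESPACE = undotbs2;
```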
Redo log files
It’s a transaction log file! A key Oracle difference is that everything gets logged, even the undo information. Redo log files are used just like SQL Server transaction log files.
Like SQL Server’s transaction log, Oracle’s redo log can span multiple files. These log files are written to in a circular fashion – they’re written to in order and, when all log files are full, Oracle will circle around to the beginning again.
Unlike SQL Server, having multiple redo log files is the preferred way to manage Oracle logging. By default, there are two groups of redo log files, but this can and should be configured, based on RPO/RTO needs.
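Adding a redo log group is a one-liner (the group number, file paths, and size below are illustrative; the two members are multiplexed copies on separate disks):

```sql
-- Add a third redo log group with two multiplexed members
ALTER DATABASE ADD LOGFILE GROUP 3
  ('/u01/oradata/orcl/redo03a.log',
   '/u02/oradata/orcl/redo03b.log') SIZE 512M;
```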
Archived redo log files
These are redo log files that have been backed up. There are a number of ways to have Oracle create backups of redo log files, ranging from manual to completely automated. If the disks storing these files fill up, Oracle will not be able to write to the data files – active redo log files can’t be archived any more, so to ensure safety, writes are stopped.
Oracle temporary tables are similar to SQL Server’s with one major exception – they’re statically defined. Even though an Oracle temp table definition will stick around until dropped, the data only persists for the duration of a session (or transaction if the table is configured that way).
The data inside a temporary table exists only for the current session – you can’t view data in another session’s temp table. The upside is that temp table metadata is always available for other users to query.
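The static definition looks like this (the table shape is hypothetical):

```sql
-- The definition persists until dropped; each session sees only its own rows
CREATE GLOBAL TEMPORARY TABLE staging_orders (
  order_id   NUMBER,
  order_date DATE
) ON COMMIT PRESERVE ROWS;  -- use ON COMMIT DELETE ROWS for per-transaction data
```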
Oracle backups are very different from SQL Server backups – they’re both simpler and more complex than SQL Server’s at the same time. Many Oracle shops use a tool called Oracle Recovery Manager (RMAN) to handle database and redo log backups, archival, and even the expiration and deletion of backup files.
Both Microsoft and Oracle have released what they call “in-memory analytics” features. Microsoft started two years ago with SQL Server 2012, and Oracle is following suit next month (July 2014) with the Oracle Database In-Memory option. Most of us won’t pick our database platform based on just one feature, but it’s important to know the difference between these two so you can have a better discussion around the technical choices that you’re making.
SQL Server Columnstore Indexes
When they first arrived, SQL Server columnstore indexes were touted as memory optimized. By using a columnar data structure and columnar compression techniques, SQL Server columnstore indexes are able to get tremendous improvements in performance.
In effect, SQL Server columnstore indexes are “in-memory” because of the effects of columnar compression. The compression means that the index size is significantly smaller. It’s likely that the whole index can be cached in memory and will stay in memory because the index is smaller.
By thinking smarter, Microsoft created a technology that allowed BI professionals to achieve significant speeds on data warehouses. There was one problem: the SQL Server 2012 columnstore was read-only.
SQL Server 2014 columnstore fixed the read-only problem by introducing a writable clustered columnstore. When the index is created as a clustered columnstore index (it’s the only index on the table), SQL Server will use a writable deltastore to hold data. This advantage over SQL Server 2012 has prompted many teams to migrate to SQL Server 2014.
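In SQL Server 2014 T-SQL, creating one looks like this (the table name is made up; the clustered columnstore replaces the table’s rowstore, so it must be the only index):

```sql
-- SQL Server 2014: a writable clustered columnstore index
CREATE CLUSTERED COLUMNSTORE INDEX ccsi_FactSales
    ON dbo.FactSales;
```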
The downside of both SQL Server columnstore implementations is that they’re designed for pure data warehouse workloads. If you wanted to combine an OLTP workload with a data warehouse workload on the same system, you’d have some work ahead of you to build something that would work with both regular indexes and columnstore indexes. At its core, SQL Server columnstore technology is a data warehouse technology.
Oracle Database In-Memory
Oracle Database In-Memory takes a different approach from SQL Server’s. The Oracle feature is designed to be a transparent drop-in. Rather than require a separate physical structure on disk, a DBA will activate the in-memory feature for a given table and wait as Oracle does the rest.
Oracle does this by creating a dual-format data structure. As data is read and written to tables using the in-memory feature, the database builds columnar indexes in-memory and on-the-fly. Yes, this means that you’ll need more memory in your database servers. Many Oracle customers use Oracle RAC for scale-out performance improvements. Dedicating one RAC node to reporting isn’t unheard of and can be used to alleviate the pressure caused by holding a second, albeit compressed, copy of a table or index in memory.
Once the row-level data has been merged into the columnar index, Oracle uses compression to further improve performance. Just like other columnar database engines, compression vastly improves the performance of queries.
Oracle’s offering doesn’t persist the in-memory analytic indexes to disk – they’re purely in-memory indexes. This avoids the overhead of logging and persisting them to disk. The double write hit is avoided. However, there’s one problem with this technology – restarts. Whenever the database engine is restarted, the in-memory analytic indexes will have to be rebuilt from disk.
Technical details about the Oracle offering are sparse – we can expect to learn more after Oracle has released the feature to the public.
Comparing In-Memory Analytics
While both companies have produced an in-memory analytic feature of one form or another, direct comparisons are difficult. Microsoft SQL Server columnstore indexes are designed to improve pure data warehouse implementations. Oracle Database In-Memory, on the other hand, is intended to help DBAs get the most out of the hardware they already have. Many business users want to report on live data and this feature makes live reporting a possibility.