If you’re going to be working with Oracle, you need to be able to get a better handle on what’s going on with the Oracle database. Just like other database platforms, Oracle provides a data dictionary to help users interrogate the database system.
Looking at System Objects
Database administrators can view all of the objects in an Oracle system through the
DBA_% prefixed objects.
You can get a list of all available views through the
dba_objects system view:
/* There's a gotcha here: if you installed Oracle as suggested, you'll be using a case sensitive collation. That's not a big deal, just don't forget that while you don't need to capitalize object names in SQL*Plus, you do need to capitalize the names while you're searching. */ SELECT COUNT(DISTINCT object_name) FROM dba_objects WHERE object_name LIKE 'DBA_%';
And the results:
COUNT(*) ---------- 1025
Just over 1000 views, eh? That’s a lot of system views. If you just want to examine a list of tables stored in your Oracle database you can use the
dba_tables view to take a look. Here we’ll look at the
EXAMPLE database schema:
SELECT owner, tablespace_name, table_name FROM dba_tables WHERE tablespace_name = 'EXAMPLE' ORDER BY owner, tablespace_name, table_name ;
The curious can use the
desc command to get a list of all columns available, either in the
dba_tables view, or any of the tables returned by querying
A user shouldn’t have access to the
DBA_ views. Those are system level views and are best left to people with administrative access to a system. If a user shouldn’t have that level of access, what should they have? Certainly they should have access to their own objects.
Users can view their own data with the
USER_ views. There’s a
user_objects table that will show information about all objects visible to the current user. If you just want to see your own tables, you can use the
user_tables view instead:
SELECT table_name, tablespace_name FROM user_tables ;
Of course, users may have access to more than database objects that they own. In these cases, users can use the
ALL_ views to see everything that they have access to:
SELECT COUNT(DISTINCT object_name) FROM all_objects UNION ALL SELECT COUNT(DISTINCT object_name) FROM dba_objects ;
Running this query nets 52,414 rows in
all_objects and 54,325 in
dba_objects. Clearly there are a few things that I don’t have direct access to, and that’s a good thing.
System Status with V$ Views
Oracle’s V$ views record current database activity. They provide insight into current activity and, in some cases, they also provide insight into historical activity. There are a number of dynamic performance views (Oracle’s term for the V$ views) covering everything from waits to sessions to data access patterns and beyond.
As an example, you can view all sessions on an Oracle database using the
SELECT sid, username, machine FROM v$session WHERE username IS NOT NULL ;
Oracle has a wait interface, just like SQL Server. Waits are available at either the system or session level. The
v$system_event view shows wait information for the life of the Oracle process. The
v$session_event view shows total wait time at a session level (what has this process waited on since it started). You can look at currently running (or just finished sessions) using
Using this, we can look into my session on the system with:
SELECT wait_class, event, total_waits, time_waited, average_wait, max_wait, time_waited_micro FROM v$session_event WHERE wait_class <> 'Idle' AND SID = 255 ;
Don’t be afraid to explore on your local installation. There’s no harm in playing around with different Oracle features to determine how they work and what kind of information you can glean from them.
You can also use the GV$ views, thanks to Jeff Smith for pointing out my omission. These are views that are designed for Oracle RAC so you can see the health of every node in the RAC cluster. The upside of this is that you can get a big picture of an entire cluster and then dive into individual nodes using the V$ views on each node. You can even execute queries that use the GV$ views, even if you don’t have RAC, and you’ll be just fine.
A Word of Warning
Be careful with the both the data dictionary and the V$ views – querying certain views may trigger license usage to show up in the
dba_feature_usage_statistics view. Before using features like Active Session History or the Automatic Workload Repository, make sure that you have the proper features licensed for your Oracle database. Using these optional features for your own education is fine.
Tune in here to watch our webcast video for this week! To join our weekly webcast for live Q&A, make sure to watch the video by 12:00 PM EST on Tuesday, October 7! Not only do we answer your questions, we also give away a prize at 12:25 PM EST – don’t miss it!
Have questions? Feel free to leave a comment so we can discuss it on Tuesday!
It’s Hard to Configure
Historically speaking, Oracle was a bit painful to configure. A DBA needed to be able to size internal components like the rollback segment, buffer cache, large object cache, sort area, and a number of other memory structures. This gave Oracle a reputation for being difficult to configure. Rightfully so – compared to SQL Server at the time, Oracle was difficult to configure.
Starting with Oracle 9i, the database included limited automatic memory management features. Instead of having to size many aspects of memory, Oracle DBAs just had to size two. And with the introduction of Oracle 11g, Oracle memory management became a matter of configuring a max memory target.
Tuning is Complicated
Database tuning is hard. Thankfully databases just come with GUI wizards that work every time, right?
Database tuning is difficult in both SQL Server and Oracle. Oracle DBAs have a wealth of system views to choose from when designing performance reports. There are the usual tools to get information about instance-level CPU, disk, and other waits.
On top of the system views, Oracle users who have licensed the Performance Pack have access to the Automatic Workload Repository (AWR). AWR constantly collects information about Oracle performance and allows DBAs to get a fine-grained view of performance at a number of levels. On top of the system views provided by AWR, it’s also possible to generate AWR reports that generate analysis of database performance over a period of time.
The User Interface is Bad
SQL Server DBAs and developers who are used to SQL Server Management Studio are initially horrified when they’re exposed to Oracle’s command line user interface through SQL*Plus or RMAN. Although the command line is a rough introduction to a product, it’s also a rich environment where users can run scripts, prompt for input mid-script, and create full featured applications with little more than PL/SQL. Although the command line tools appear unforgiving, they offer a wealth of information, built-in help, and query editing capabilities that tie into the user’s primary tools.
Users who refuse to get on the command line aren’t left out in the cold. Oracle has a pair of tools – Enterprise Manager and SQL Developer that provide additional tooling for DBAs and developers. Enterprise Manager provides a dashboard for DBAs and system administrators to review server health at many different levels – from the enterprise through to the datacenter and all the way down to a single server. SQL Developer is a development tool with built-in reports; SQL Server professionals will find SQL Developer to be very familiar.
It Doesn’t Run Well on Windows
“Oracle just doesn’t run well on Windows.” I’ve heard this phrase a lot. Oracle runs on Windows and Windows is officially supported by Oracle for production deployments. Anecdotally, there are very few Windows only bugs for the Oracle database proper; most bugs are cross-platform.
However, you will find that almost all Oracle examples assume you’re running Oracle on a Linux or UNIX system. A quick scan of various forums, blogs, and other online resources indicates that maybe 20% of Oracle deployments are on Windows. Don’t let that stop you from learning about Oracle – most functionality can be accessed with only minimal knowledge of the operating system. For everything else, there’s always your favorite search engine.
You Need a Team of DBAs
Everyone knows that a SQL Server DBA can manage far more SQL Servers than an Oracle DBA, right? After all, with all that manual memory management, lack of tuning, and no Windows support, you need a team of talented UNIX system administrators to keep Oracle running well.
While it may have required a village to run an Oracle database in the past, it hasn’t been that way for some time. Recent versions of Oracle have automated many of the involved processes. Other features like RMAN and AWR reports provide time-saving features that make it easier for DBAs to do more work.
What other misconceptions have you heard about Oracle’s place in the world of databases?
Microsoft just announced a new round of D-grade VMs that have 60% faster CPU and local SSD than can go up to 7,000 IOPS in a canned IOmeter test. Before jumping to conclusions or, even worse, picking a cloud provider, it’s best to look at these numbers critically.
The new CPU is being advertised as 60% faster than the previous generation of processors. Clearly this has got to be some next generation hardware, right? Maybe we’ll get access to the new Xeon v3 – it’s not that outlandish of an idea; Amazon Web Services (AWS) had Xeon v2s in their datacenters before the chips were generally available.
Glenn Berry, a consultant who digs into computers for fun, did some initial testing with these new Azure instance types. In his investigations, he saw 2.2GHz E5-2660 chips. These aren’t even the slower end of the new generation of Intel Xeon v2 chips – they’re the previous generation of CPU… from 2012. Azure trades raw power for power efficiency.
If these not-so-fast CPUs are 60% faster, what are your current Azure VMs and SQL Database instances running on? Anecdotal evidence indicates that the current generation of A and P series VMs are running on older AMD Opteron hardware. Older AWS hardware is in the same boat, but it’s slowly being phased out.
Microsoft are reporting performance of up to 7000 IOPS per local Azure SSD but persistent storage is still rotational. During the D Series SSD VMs interview a screenshot of iometer at 7,000 IOPS is shown, but no additional information is provided. Iometer tests typically use a 4k read/write block size for tests, which is a great size for random file access. It’s not awesome for SQL Server, but we can divide that by 16 to get a representative SQL Server number…
437.5 64KB IOPS.
Or so the Azure Product Manager says in the original interview. I don’t believe what I hear, and you shouldn’t either, so I fired up an Azure D14 VM to see for myself. What I saw was pleasantly surprising:
If we dig into the IOPS provided by Crystal Disk Mark, we see a decent looking picture unfold:
----------------------------------------------------------------------- CrystalDiskMark 3.0.3 x64 (C) 2007-2013 hiyohiyo Crystal Dew World : http://crystalmark.info/ ----------------------------------------------------------------------- * MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]
Sequential Read : 705.103 MB/s Sequential Write : 394.053 MB/s Random Read 512KB : 528.562 MB/s Random Write 512KB : 398.193 MB/s Random Read 4KB (QD=1) : 16.156 MB/s [ 3944.4 IOPS] Random Write 4KB (QD=1) : 26.506 MB/s [ 6471.1 IOPS] Random Read 4KB (QD=32) : 151.645 MB/s [ 37022.8 IOPS] Random Write 4KB (QD=32) : 167.086 MB/s [ 40792.5 IOPS] Test : 4000 MB [D: 2.0% (16.2/800.0 GB)] (x5) Date : 2014/09/23 0:24:10 OS : Windows Server 2012 R2 Datacenter (Full installation) [6.3 Build 9600] (x64)
What’s it really mean? It means that the 7,000 IOPS number reported was probably for 4KB random writes. It’s hardly representative of SQL Server workloads, but we also can see what kind of numbers the drives will pull under significant load.
Comparing AWS and Azure Performance
AWS offers an instance called the r3.4xlarge. It comes with 16 cores and 122GB of memory. The AWS instance type is about the same as the D14 (16 cores and 112GB of memory). The D14 is $2.611 / hour. The AWS instance is $1.944 / hour.
All prices include Windows licensing.
So far, the Azure D-grade instance costs 70 cents more per hour for 4.8GHz fewer clock cycles and 10GB less memory. Not to mention the computational differences between the current generation of CPU and what Azure is running.
Surely the SSD must be amazing…
Not so fast. Literally.
Some AWS local SSDs benchmark have reported numbers as high 20,000 16KB IOPS for random write and 30,000 16KB IOPS for sequential read. Sure, the AWS instance only has a 320GB disk, but it’s capable of performing 5,000 64KB IOPS compared to the 440 IOPS (I rounded up to be generous) that Azure supplies.
In my testing, the AWS local SSD beat out the Azure SSD on random I/O by a reasonable margin:
How about those IOPS?
----------------------------------------------------------------------- CrystalDiskMark 3.0.3 x64 (C) 2007-2013 hiyohiyo Crystal Dew World : http://crystalmark.info/ ----------------------------------------------------------------------- * MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s] Sequential Read : 404.856 MB/s Sequential Write : 350.255 MB/s Random Read 512KB : 348.770 MB/s Random Write 512KB : 349.176 MB/s Random Read 4KB (QD=1) : 21.337 MB/s [ 5209.3 IOPS] Random Write 4KB (QD=1) : 38.448 MB/s [ 9386.7 IOPS] Random Read 4KB (QD=32) : 261.320 MB/s [ 63798.8 IOPS] Random Write 4KB (QD=32) : 237.201 MB/s [ 57910.4 IOPS] Test : 4000 MB [Z: 0.0% (0.1/300.0 GB)] (x5) Date : 2014/09/23 1:05:22 OS : Windows Server 2012 R2 Server Standard (full installation) [6.3 Build 9600] (x64)
So… First – Azure offers really good local SSD performance if you decide to purchase the entire instance. Using a D14 instance type is a reasonable expectation for customers deploying SQL Server – SQL Server is a power hungry monster and it deserves to be fed.
Despite their truth, the Azure numbers aren’t all they’re cracked up to be. Here’s how it breaks down:
Cost: 34% more expensive
Sequential Reads: 74% faster
Sequential Writes: 12.5% faster
Random Reads: 42% slower/fewer IOPS
Random Writes: 30% slower/fewer IOPS
Azure has a history of mediocre performance, but it’s well-documented mediocre performance. Azure persistent storage currently maxes out at 500 no-unit-given IOPS per disk (compared to AWS’s 4,000 256KB IOPS for EBS volumes), but these limits are well-documented.
The Bottom Line
Not all clouds are created equal and 60% more doesn’t mean that it’s any better than it was before. It’s up to you, dear reader, to determine what 60% faster means and how that applies to your environment. For companies dipping their toes in the cloud waters, be very wary with the new improved Azure performance. You may find that you’re deploying far more VMs than you thought, just to handle the same workload.
Let’s assume you want to get started with Oracle. Maybe your employer is switching to Oracle, maybe you just want a career change. Where do you go to get started?
Getting the Database
You can get a hold of the Oracle database in two main ways – a VM or installing it yourself. Using a VM is definitely the easiest way to get started. Oracle have provided a Oracle VM VirtualBox image that you can install. If you’re not familiar with VirtualBox, that’s okay; Oracle has set up instructions that will get you up and running quickly.
What if you want to install Oracle yourself?
You can get started with Oracle Express Edition. Hit that link and scroll all the way to the bottom. You can download Oracle Express Edition 11g Release 2. 11gR2 is the previous release of Oracle but it’s good for learning basic Oracle concepts and you’ll find a lot people are happily running Oracle 11gR2 in production.
If you want to be on the latest and greatest version of Oracle, you’ll need to download a full edition of Oracle. Even though there’s no Developer Edition of Oracle, there are five editions available to choose from. Personal Edition contains most of the features of Oracle Enterprise Edition and can be purchased from the Oracle store. If you want practice with complex DBA tasks, you’ll want to use Enterprise Edition. Otherwise, Personal Edition is the right choice.
You can also download and install the binaries directly from the Oracle database download page and run a full copy of Oracle while you evaluate the software. To the best of my knowledge, it’s only servers that are part of the development-production cycle that need to be fully licensed.
If you’re even lazier, you can spin up an instance of Oracle in one of many different clouds. Both Microsoft Azure and Amazon Web Services have a variety of different Oracle database configurations available for you to choose from.
Some people are self-directed, others prefer guided learning. I find that I’m in the second camp until I develop some skills. If you need to get started quickly, guided labs are a great way to ramp up your skills.
Oracle has created a huge amount of content about the Oracle database. The Oracle Documentation Library is the Oracle equivalent of TechNet. In addition to product documentation, ODL contains several courses – the 2 Day DBA is a good place to get started. From there you can head off into various tuning or development courses or even explore on your own.
It’s easy to get started with Oracle. You can either:
Once you’re set up, training is available through the Two Day DBA course, but there’s a wealth of information in the Oracle Documentation Library. A summary of training options is also available through the Oracle Learning Library.
To get ready for Tuesday’s webcast, here’s what you have to do:
- Watch the video below, but watch it today (or over the long weekend). There will be no live presentation this week and we won’t be rehashing all of the material in the video.
- Write down your questions or comments. (You don’t have to do this, but it’ll make it more fun.)
- Attend the live webcast on Tuesday at the usual time (11:30AM Central). Register here.
- During the first 10 minutes of the webcast, we’ll give away a prize. The catch is that you have to be there to win.
The live discussion of the video and Q&A won’t be recorded and published, and you also need to be present to win the prize. See you on Tuesday!
At some point, you’re going to need to know what’s wrong with your Oracle instance. While there are a lot of monitoring tools around, there’s always some reason why third party monitoring tools can’t be installed. Oracle has shipped with something called Statspack that provides DBAs with some ability to monitor their Oracle instance.
What Is Oracle Statspack?
Statspack is a set of tools for collecting performance data that Oracle began shipping with Oracle 8i. This isn’t a full monitoring framework, but it helps DBAs isolate poor performance within a time window. Once installed, Statspack can collect snapshots of Oracle performance. This will run on all editions of Oracle – there’s no requirement for Enterprise Edition or any Performance Pack.
Statspack does not set up any kind of regular schedule when it’s first configured. It’s up to you, the DBA, to figure out how often you need to be running Statspack. Since data has to be collected and then written somewhere, make sure you aren’t collecting data too frequently – you will be adding some load to the server.
Do I Need Special Access to Install Statspack?
Depending on how you look at it, either no special permissions are needed to install Statspack or else very high privileges are needed. Basically, you need to able to connect to Oracle with
sysdba privileges. Any Oracle DBA responsible should be able to install Statspack. The only thing that might cause some issue is if OS level access is needed for scheduling data collection.
Since Statspack was originally designed for Oracle 8i, there are some changes that need to be made if you are deploying on Oracle 12c. Take a look at the comments on Statspack Examples for help getting Statspack installed on Oracle 12c.
What Kind of Data Does Statspack Collect?
Statspack can collect a lot of information about Oracle. Users can define just how much data they want to collect. The documentation goes to great length to remind DBAs that collecting too much data can slow down the database server.
Statspack collects data based on several configurable SQL thresholds. You can see the thresholds in the
perfstat.stats$statspack_parameter table. When a query passes at least one of these thresholds, performance data will be collected.
Multiple levels of data can be collected. Oracle defines five levels of performance data collection – 0, 5, 6, 7, 10.
- Level 0 Basic performance statistics about locks, waits, buffer pool information, and general background information.
- Level 5 All of Level 0 plus SQL statement level details like number of executions, reads, number of parses (compiles in SQL Server speak), and memory usage.
- Level 6 Everything from Level 5 plus execution plans.
- Level 7 Disk metrics for particular segments that cross a threshold.
- Level 10 COLLECT ALL THE THINGS! Plus collect information about latching. Typically you shouldn’t be doing this unless someone at Oracle has suggested it. Or youreally know what you’re doing.
This data gets stored in the Statspack tables whenever a snapshot is collected. Over time, these tables will grow so make sure that there’s enough space allocated for their tablespace or else purge out older data using the
How Do I Use Statspack?
To collect data, either use the
DBMS_JOB or Oracle Scheduler interface (depending on Oracle version) or use an operating system native task scheduler.
Once you have at least two snapshots you can report on the collected data by running
$ORACLE_HOME/rdbms/admin/spreport.sql and supplying a start and end snapshot. Statspack is going to churn for a while and spit back a bunch of information. Since Statspack reports can be many thousands of lines long,
spreport.sql will write to a file.
As you look through the file, you’ll find information about I/O, locking, waits, slowest queries running (but not which users/sessions are slow), and potentially a lot more, depending on how much information you’re collecting.
For the uninitiated, Oracle ships with a bunch of scripts installed in the server’s file system. These scripts can be invoked from inside your favorite SQL tool.
Limitations of Oracle Statspack
This isn’t a silver bullet, or even a bronze bullet. But it is a bullet for shooting trouble.
Statspack isn’t an automatic process. More sophisticated tools use an agent process to automatically start collecting data once they’re installed. Statspack is not that sophisticated. It requires manual configuration – a DBA needs to set up a schedule for Statspack collection and Statspack purging.
While Statspack reports on an entire server, things get a bit weird when you start bringing Oracle RAC and Oracle 12c Multitenant into the mix. With RAC, Statspack is only reporting on a single node of the cluster – to get full cluster statistics, you should look at other tooling. Statspack can also potentially cause problems on RAC that can lead to cluster instability. With Multitenant functionality, Statspack will report on the server as a whole, but you’ll have to alter the installation scripts to take full advantage of Statspack.
Another limitation of Statspack is the granularity of the data. Performance data is collected at various DBA-specified levels and at a DBA-specified interval – the DBA needs to have good knowledge of how load may vary across a day and schedule Statspack collection appropriately. Statspack metrics can also be skewed – long running events will be reported as occurring in the Statspack interval where the SQL finally finishes. If you are collecting data every 5 minutes and an I/O intensive task runs for thirty minutes, it may look like there’s a significant I/O load in a single 5 minute period.
It may require a practiced eye to correctly interpret the Statspack reports and avoid falsely attributing heavy load to a small time window.
Finally, these metrics can’t be tied back to a single session. It’s possible to see which piece of SQL is causing problems. Frequently that can be enough, but it may still be difficult to determine if it’s a problem on the whole or a problem with a single user’s session. Other tools, such as ASH and AWR can be used to provide finer grained monitoring, depending on the licensing level of Oracle.
Oracle Statspack can provide good enough performance metrics for many common DBA tasks. By interpreting Statspack reports, a DBA can discover any number of things about the Oracle system they’re in charge of without having to use third party tooling or purchase additional features and options. This can be especially important for those with Oracle Standard Edition systems.
The only thing you ever need to use for database identity is an
IDENTITY, right? Well, maybe. There are a lot of different options and they all have different pros and cons.
The default way to identify objects in SQL Server is to use an
BIGINT column marked as an
IDENTITY. This guarantees relatively sequential numbers, barring restarts and failed inserts. Using identity columns put the responsibility for creating and maintaining object identity in the database.
SQL Server will cache
IDENTITY values and generate a new batch of identity values whenever it runs out. Because identity values are cached in memory, using identity values can lead to jumps in the sequence after SQL Server is restarted. Since identities are cached in memory in large batches, they make it possible to rapidly insert data – as long as disks are fast enough.
Sometimes the application needs more control over identity. SQL Server 2012 added sequences. A sequence, unlike an identity value, is a separate object in the database. Both application and database code can read from the sequence – multiple tables can share a sequence for an identity column or separate sequences can be created for each table.
Developers using a sequence can use the
CACHE value to cache a specific number of sequence values in memory. Or, if the application should have minimal gaps in the sequence, the
NOCACHE clause should be used.
The Problem with Sequential Identities
SEQUENCE values keep identity generation squarely in the database and, by using integral values, they keep the value narrow.
You can run into problems with sequential inserts on very busy systems – this can lead to latch contention on the trailing pages of the clustered index. This issue can be resolve by spreading inserts across the table by using a
GUID or some other semi-random clustering key. Admittedly, most systems are never going to run into this problem.
GUIDs for Object Identity
Some developers use GUIDs as a way of managing object identity. Although database administrators balk at this, there are good reasons to use GUIDs for object identity.
GUIDs let the application generate object identity. By moving object identity out to the application layer, users can do work in memory and avoid multiple round trips to the database until they’re ready to save the entire set of data. This technique gives tremendous flexibility to application developers and users.
There’s one other thing that a well designed application gets from this technique – independence from the database. An application that generates its own identity values doesn’t need the database to be online 24/7; as long as some other system is available to accept writes in lie of the database, the application still function.
Using GUIDs for object identity does have some issues. For starters, GUIDs are much wider than other integral data types – 16 bytes vs 4 bytes (
INT) or 8 bytes (
BIGINT). This is a non-issue for a single row or even for a small database, but at significant scale this can add a lot of data to the database. The other issue is that many techniques for generating sequential GUIDs in the application (see NHibernate’s GuidCombGenerator) can still run into GUID collisions.
What if you could get the best of both worlds? Applications generating unique identities that are also sequential?
The point of identity generation is to abstract away some portion identity from data attributes and provide an independent surrogate value. GUIDs can provide this, but they aren’t the perfect solution. Identity generators like flake or rustflakes promise roughly sequential identity values that are generated in the application layer and are unique across multiple processes or servers.
The problem with an external identity generator is that it is an extra piece of code that developers need to manage. External dependencies carry some risk, but these are relatively safe items that require very little effort implement and maintain.
There’s no right solution, there’s only a solution that works for you. You may even use each solution at different points in the lifecycle of the same product. It’s important, though, for developers and DBAs to be aware of how identity is currently being handled, the issues that can arise from the current solution, and ideas of how to handle it going forward.
Meet Margot. Margot is an application developer who works for a small company. Margot’s application collects and generates a lot of data from users including their interactions with the site, emails and texts that they send, and user submitted forms. Data is never deleted from the database, but only a few administrative users need to query historical data.
The database has grown considerably because of this historical data – the production database is around 90GB but only 12GB or so is actively queried. The remaining data is a record of user activity, emails, text messages, and previous versions of user data.
Margot is faced with an important decision – How should she deal with this increase in data? Data can’t be deleted, there isn’t budget to upgrade to SQL Server Enterprise Edition and use table partitioning, and there’s a push to move to a cloud service to eliminate some operational difficulties.
Using Partitioned Views to Archive Data
One option that Margot has read about is “partitioned views” – this is a method where data is split into two or more tables with a view over the top. The view is used to provide easy access to all of the information in the database. Storing data across many tables means DBAs can store data in many different ways – e.g. compressed tables or filegroups and tiered storage.
There’s a downside to this approach – all of the data is still in one database. Any HA solutions applied to the live portion of the data set will have to be applied to the entire data set. This could lead to a significant cost increase in a hosted/cloud scenario.
Archiving Data with a Historical Database
The second thing that sprang to mind was creating a separate archival database. Old data is copied into the archival database by scheduled jobs. When users need to run historical reports, the queries hit the archival database. When users need to run current reports, queries are directed to the current application database.
Margot immediately noticed one problem – what happens when a user needs to query a combination of historical and current data? She’s not sure if the users are willing to accept limited reporting functionality.
Archiving Data with Separate Data Stores
A third option that Margot considered was creating a separate database for the data that needed to be kept forever. Current data would be written to both the live database and the historical database. Any data that didn’t need to be ever be in the current database (email or SMS history) would only be written to the historical database.
Although this made some aspects of querying more complex – how could row-level security from the primary database be applied to the historical database – Margot is confident that this solves the majority of problems that they were facing.
This solution would require application changes to make querying work, but Margot and her team thought it was the most flexible solution for their current efforts: both databases can be managed and tuned separately, plus the primary database remains small.
Not every database needs to scale in the same way. What ideas do you have to solve this problem?
Big Investment in Data Warehousing
Almost all of the new features reflect an increased focus on data warehousing. This is similar to recent SQL Server releases and, indeed, most mainstream database releases. Different vendors are betting on BI and Oracle’s most recent release is no exception.
Full Database Caching
Caching data in memory can avoid costly disk-based I/O. By default, databases will cache frequently accessed data. There’s a setting in Oracle called the small table threshold – scans of tables larger than the
_small_table_threshold will use a portion of memory that should age out faster. This idea (in SQL Server it’s called page disfavoring) exists to prevent large table scans from clearing out database server memory.
What if you want those tables to stick around in memory? Modern servers can hold huge amounts of memory and with Oracle RAC, you can scale your buffer pool across multiple servers.
The full database caching feature changes this behavior. Once full database caching is enabled, Oracle will cache all data in memory as the data is read.
It’s important to note the guidance around this new feature – it should only be used if the logical size of the database is no bigger than 80% of available memory. If you’re using Oracle RAC, DBAs need to be doubly careful to make sure that the workload is effectively partitioned across all RAC nodes. Read more about the features and gotchas in the Oracle Database Performance Tuning Guide.
Big Table Caching
Speaking of larger tables, the new patch includes a feature called automatic big table caching. If you have more data than you have memory, this feature seems a lot safer than full database caching.
Rather than attempt to cache all data in memory, big table caching sets aside a portion of the buffer pool for caching data. The DBA sets up a caching target (
DB_BIG_TABLE_CACHE_PERCENT_TARGET) and up to that percentage of memory will be used for caching really big tables.
Like the full database caching feature, this works with single instances of Oracle or Oracle RAC. There are, of course, considerations for using the feature. The considerations are documented in the VLDB and Partitioning Guide.
In-Memory Column Store.
I’ve already covered Oracle’s In-Memory Column Store. There are some additional features that have been announced that make the In-Memory Column Store feature a bit more interesting.
A new data flow operator has been added to Oracle: the
VECTOR GROUP BY. This new operator enables efficient querying for In-Memory Column Store tables. Although no new SQL operators have been added to Oracle for the
VECTOR GROUP BY, it’s possible to use query hints to control optimizer behavior. Six new hints have been added to push the optimizer in the right direction.
In the Oracle world, tables are almost always heaps. Clustering is done only rarely and in special circumstances.
Attribute clustering lets the data warehouse designer specify a table order based on one or more columns of the table. So far, this sounds like something that you could already do. It is also possible to use join attribute clustering to cluster on columns connected via a foreign key relationship. The data warehouse designer, or even the DBA, can modify table structure to cluster a fact table based on dimension hierarchy.
An additional feature called interleaved ordering uses some pretty advanced storage mechanisms to get better I/O out of an attribute clustered table. Interleaved clustering makes it more efficient to perform queries where you may not know the search predicates, or where search predicates may vary over multiple columns (sounds like a dimensional model to me).
There are a few upsides to attribute clustered tables – they require fewer (or no) indexes, they may require less I/O, and they improve data compression. As always, consult the Attribute Clustering documentation for all the details and fine print.
Zone maps are new to Oracle 18.104.22.168 – they provide an additional level of metadata about the data stored in a table. Instead of operating at the row level, like an index, a zone map operates at the level of a range of blocks. In a way, an Oracle zone map is a lot like a SQL Server Columnstore dictionary – it allows I/O pruning to happen at a much coarser granularity than the row level.
When zone maps are combined with features like attribute clustering or partitioning, the database can take advantage of metadata at many levels within the zone map and perform incredibly efficient I/O pruning. Data can be pruned at the zone, partition, or subpartition level, and when attribute clustering is in use the pruning can happen at the dimension level too.
Zone maps are going to have the most benefit for Oracle databases with a heavy data warehousing workload. Big scans with common predicates, especially with low cardinality predicates, will use zone maps to eliminate a lot of I/O. Users taking advantage of attribute clustering and partitioning will also see improvements from this feature. You can read more in the Zone maps section of the data warehousing guide.
More than Data Warehousing
This is a wealth of features. Seriously, that’s a crazy investment in data warehousing for any release of a product, much less a point release.
Every Developer’s Dream: JSON Support
Everybody hates XML. It’s cumbersome to query and people love making fun of it. Many databases have been adding support for JSON and Oracle is no exception.
Oracle’s JSON implementation supports search indexes for general querying (give me the third element of the array in property “sandwiches”). Multiple index types exist for the JSON data type, giving developers fine grained control over how they interact with the data stored in the database. Oracle full text search also supports full-text queries using a new
json_textcontains operator. If you want to jump down the JSON rabbit hole, make sure you check out the JSON documentation and pay particular attention to the sections about indexing – there are a number of examples of what will and won’t work.
Astute readers will notice that with the addition of JSON to Oracle 12c, SQL Server is now the only major database platform that doesn’t support JSON as a native datatype.
Cross Container Querying
Oracle 12c brought multi-tenant databases, sometimes referred to as contained databases or containers. If you wanted to query across databases, you’d typically have to go through some shenanigans. The
CONTAINERS clause makes it easier for developers to write queries that can go across all containers, or even some containers.
Basically, instead of querying directly from one table, a developer would put the name of the table in a
SELECT CON_NAME, * FROM CONTAINERS(employees) ;
And if you wanted to query only specific containers:
SELECT CON_NAME, * FROM CONTAINERS(employees) WHERE CON_ID IN (1,2,3,4) ;
While not useful for all systems, this does allow DBAs to perform administrative or back office queries across multiple database containers. The gotcha? All databases need to have the same schema.
Oracle 22.214.171.124 has a number of new features, some of which I didn’t mention since they relate entirely to managing the contained database features. If you want to learn more, check out the new feature guide – it’s full of links to the Oracle documentation on each new feature.