Common table expressions are a feature of SQL that lets a developer create a query that can be referenced multiple times. This feature gives developers another tool to add flexibility or just to simplify code.
Why Common Table Expression?
Why is it called a Common Table Expression (CTE)? Because that’s what the documentation calls it!
No, seriously, that’s it. Different databases have different terms for this feature. That’s what we call it in SQL Server.
What does a CTE do?
A CTE effectively creates a temporary view that a developer can reference multiple times in the underlying query. You can think of the CTE as if it were an inline view.
Here’s a slight re-write of a StackExchange query to find interesting and unanswered questions.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
WITH unanswered AS ( SELECT p.id FROM dbo.Posts p WHERE (SELECT COUNT(*) FROM dbo.Posts a WHERE a.ParentId = p.Id AND a.Score > 0) = 0 AND CommunityOwnedDate IS NULL AND ClosedDate IS NULL AND ParentId IS NULL AND AcceptedAnswerId IS NULL ) SELECT TOP 2000 unanswered.Id AS [Post Link] , (10.0 + us.Reputation / 200.0 + p.Score * 100) AS Weight FROM unanswered JOIN dbo.Posts p ON unanswered.Id = p.Id JOIN PostTags pt ON pt.PostId = unanswered.Id JOIN Users u ON u.Id = p.OwnerUserId GROUP BY unanswered.Id, u.Reputation, p.Score ORDER BY Weight DESC ; |
The CTE, lines 1 – 12, effectively creates a temporary view that we can use throughout the rest of the query. You can also think of it in the same way that you’d think of a derived table (a join to a subquery).
CTEs make it possible to simplify code, express complex ideas more concisely, or just write code quickly without having to worry as much about structure. They’re a great feature of the SQL language that many people overlook.
39 Comments. Leave new
If a complex sql script has a cte that it calls multiple times, do the queries inside the cte get rerun each time? or does SQL run the cte once and reuse those results each time the cte is called?
Great question – yes, the CTE will be evaluated every time you reference it.
In Oracle you may use Materialize hint that as far as I know creates a temp table, so the CTE would be evaluated just once.
Oracle does get all the sweet hints. SQL Server is a bit less ambitious about CTE materialization, I should’ve been more clear with my original comment.
Thanks, Vercelli!
Good thing to know when writing complex SQL. Could make or break your performance
It can, but I like CTEs as an early “rough draft” of query structure. I can mess around with that virtual table all I want and then move it out to a different structure once I get the query shape down.
Love them, especially recursive CTE
Nice explanation! This is the first time I’ve heard it called a temporary view and that just made it so much more understandable to me. And as for it being evaluated every time….so is a view.
From the example given of a CTE I do not see why I would use that instead of a select into a temp table; especially if it gets evaluated each time it is reference.
I write code for understandability first and I tune for performance when performance becomes an issue. If the CTE is evaluated every time it’s referenced, in this particular case, it’s only evaluated once. Which could give the same performance as a temp table. Which gives me a net gain of… typing more? The satisfaction of using the octothorpe?
If you are proficient in using Temp tables, are you saying that there is no logical scenario where using a CTE would be a better bet?
I never thought I would see a trend, fad or stylistic preference moving away from use of the Hash Tag. 🙂
I’m saying “why complicate your code as the first step?”
Write for humans, not the query optimizer.
Affirmative. Will write for the humans.
I tend to use them for multi-stage table transformations, where each intermediate version of a table is only needed by the next stage of the transformation and a temporary table would be a) overly verbose and b) needlessly written to disk. Let me know if you want an example.
If you’re writing a … shudder … view then you won’t be able to use a temp table (obviously), so a CTE would be the only option. But the lack of materialisation (in MS SQL anyway) is disappointing from a performance point of view.
Bruce – The reason why you’d want to us a CTE instead of selecting in to a temp table is because the CTE will not write to TempDB. TempDB on some servers may be very slow. As a general rule, writes to disk are slow.
Reference – http://stevestedman.com/2012/11/temp-table-vs-table-variable-vs-cte-and-the-use-of-tempdb/
In fact, in certain situations a CTE may be even better/faster than a permanent temp table (i.e. a real table, not a #table or @TableVariable) that you temporarily reuse multiple times.
Jason – see the other comment on this post about how TempDB doesn’t always write to disk anymore, depending on your version and patch level.
Bruce – #Temp tables are *always* written to disk, which on every SQL server that does *not* use a RAM Disk for TempDB is very slow compared to just staying in memory in the first place. The result set of a CTE tends to stay in memory, very similar to a derived table.
See here – http://stevestedman.com/2012/11/temp-table-vs-table-variable-vs-cte-and-the-use-of-tempdb/
Jason – to clarify, temp tables aren’t always written to disk anymore starting with recent SQL 2012 CUs and SQL 2014:
https://www.brentozar.com/archive/2014/04/memory-ssd-tempdb-temp-table-sql-2014/
I loved this article, I wonder if I can use this kind of tables replacing a common SQL selector with pagination, if that is possilble? My webapplication have very complex selects in some pages and those SQLs are killing the performance (sometimes…several times!!!), if yes please let me know it =D that would be awesome to apply!!!
Totally, you can add a ROW_NUMBER() to a CTE and use that for pagination.
I don’t think it’s a good strategy for this blog to repeat content that is documented 100 times on the web. This is not a value add.
Thanks for your time and consideration.
Thanks for the article 🙂 I didn’t use CTE’s previously until another developer introduced me to them. I now use a combination of CTE’s and has tables, depending on what we are coding. If we need to re-use the query (usually in this case the result set will be quite large), it becomes a hash table, but if we are just doing a once off with a smaller set, then a CTE is defiantly easier to read.
Cheers.
I disagree that this type of explanation is common on the internet. When this feature was introduced, it seems like everyone was hot to demonstrate a recursive query. I’ve been a DBA for 10 years and never had the occassion to write a recursive query to show the hierarchy view of the EMP table. To me, the CTE is so much more valuable as an aid to clearly written SQL– (especially when compared to a derived table). The concept of defining that set of rows first, and then querying against that set, is so much more intuitive.
Great article!!!
I like CTEs, and use it quite often too. I like how it allows to have multiple steps (temp views) in one query.
for example:
;WITH first_script
(
— script to get unanswered
)
, second_script
(
— script to get answered
)
SELECT columns FROM
first_script f
INNER JOIN
second_script s ON f.column = s.column;
That’s awesome! I didn’t know you could do that.
As an old-timey programmer, I like starting out with CTEs — to me they act like subroutines.
It’s a blooming pain troubleshooting chained cte’s
I often use CTEs when writing reports that contain a mix of aggregate and “detail” information. For example, a report that lists employees and the number of hours they have worked over a given time period. This seems cleaner to me than the alternatives: grouping by every column you need in your report (e.g. GROUP BY EmployeeID, LastName, FirstName … ) or using an aggregate function on every column that you’re not grouping by.
Here’s a simplified example (dates are passed to the sp as parameters):
;WITH emphours AS (
SELECT
P.EmployeeID,
SUM(P.HoursWorked) AS TotalHours
FROM PayrollHours AS P
GROUP BY P.EmployeeID
WHERE P.WorkDate BETWEEN @DateStart AND @DateEnd
)
SELECT
E.EmployeeID,
E.FirstName,
E.LastName,
E.Department,
H.TotalHours
FROM Employee AS E
INNER JOIN emphours AS H
ON H.EmployeeID = E.EmployeeID
You can also use window functions for this purpose. E.g. your example:
SELECT
E.EmployeeID,
E.FirstName,
E.LastName,
E.Department,
SUM(P.HoursWorked) OVER ( Partition by e.employeeID) as TotalHours
FROM Employee AS E
INNER JOIN PayrollHours AS P
ON P.EmployeeID = E.EmployeeID
WHERE P.WorkDate BETWEEN @DateStart AND @DateEnd
Yes, I love using CTE queries to create multi-step processes. I also use them for MERGE statements between a staging table and primary table. They are very quick and easy to use while also allowing you to JOIN each view within a CTE onto each other. For example, do a WITH USERS where I select a set of users and then create another view within the same CTE query called SALES, which joins USERS to select another table and then create a third view for PRODUCTS. Once I have all 3 views, I can union them all together into a final query or create a fourth view and keep on going. 🙂
I work for an organisation that uses CTEs heavily in our reporting queries (simply because we always have done without issues), but now a lot of our customers are moving to SQL Server 2014 and 2016 are we seeing massive performance drops with using CTEs. We think the problem is twofold; 1) CTE’s are not indexable and 2) the query optimiser in later versions of SQL Server cannot count the number of rows in a CTE so we’re getting inefficient execution plans and reports that run very poorly.
CTEs have been my bread and butter for years but for larger data-sets, we’re finding that temp tables and carefully applied indexes are helping us out a lot.
Its interesting to see that you use CTEs in development with half an eye on re-writing if performance becomes an issue. Its like designing a house using Lego and then re-building it with proper bricks later, but we’ve just been living in Lego houses all this time!
Nick – errr, not exactly. There’s a lot of what you’re saying in there that isn’t really true, but it’s beyond what I can cover fast in a blog post comment.
You might start by looking at the changes in the new 2014 Cardinality Estimator. Try setting your database compat levels back to 2012, and you’ll get different execution plans – those might be better suited to the kinds of queries you’re running. Then, you can up your game over time and gradually make those queries better for the new CE.
Thanks for your reply Brent.
You said that a lot of what I say isn’t true – well I said that CTE’s can’t be indexed and I’m fairly confident that this is the case. I then said that “the query optimiser in later versions of SQL Server cannot count the number of rows in a CTE” so I’m guessing this is incorrect?
I’m not a DBA but I have been tasked with solving this problem, and this is the conclusion I was leaning towards from studying query execution plans.
Setting compatibility level down to 2008 or 2012 does work in terms of improving performance but customers quite reasonably don’t want to do this. They say “You told us to upgrade to the latest version, so we may as well not have bothered?” and we need to respect that and find another way. I did find this though:
OPTION (USE HINT ( ‘FORCE_LEGACY_CARDINALITY_ESTIMATION’ ))
Sadly I’m remotely accessing these systems and can’t get connected today so I can’t try this out, but do you think it might help to stick one of these hints at the end of every CTE?
Nick – yeah, doing this kind of personal training is a little beyond what I can do fast in a blog post comment, so head on over to https://dba.stackexchange.com and break your questions down individually. Be as specific as you can when you do it. Hope that helps! Brent
We just upgraded our database and SQL Server from 2014 to 2016. We )have a report that uses CTE (which I’m not that familiar with) but before we upgraded the report ran in under 1 minute. now it never ends. Is this because of the upgrade? is there anything I can do to fix it besides removing the CTE?
For questions, hit a QA site like https://DBA.stackexchange.com.
Thank you.