It’s been a while that I haven’t blogged, and my queue grows inversely proportional to the amount of time I have available for blogging, so let me do a short series of easy stuff, simply to take it off the list.
This is not about new features, crazy new tips and tricks or anything of the sort. It’s just a couple short lessons on performance and how to reduce your carbon footprint and make the planet last longer.
It’s about how to properly ask the database: are there any records there?
Out in the wild, you can find a bajillion instances of something like this:
IF SomeTable.COUNT > 0 THEN
Or a variation of it, that looks like this:
IF SomeTable.COUNT = 0 THEN
It amazes me again and again how often people just routinely write this. The logic is seemingly simple: if there are (no) records in the table then do something.
You see, the problem is that when you ask for COUNT, that’s exactly what you get. To give you the count, SQL Server has to go down into the database and perform the actual counting of rows, typically by doing either a table scan or an index scan.
A scan involves at minimum physically accessing each record in its storage (but not necessarily reading its content). You can imagine that it can take a long time to process it.
This is what COUNT looks like to SQL:
SELECT COUNT(*) FROM …
How fast SQL Server will be at processing COUNT depends on a number of variables. Number of records, presence (or absence) of a matching index, index distribution, index fragmentation, logical order fragmentation, page density fragmentation, are a few of variables that can influence how fast SQL Server can physically perform the operation on the data storage level. Then there are hardware constraints, such as memory available to SQL Server or amount of storage memory, that can influence whether the data will be actually read from the disk or from memory. Then there are behavioral factors, like how busy the table is, and how much concurrency there is on it.
All in all, whenever you ask for COUNT, you make SQL Server (and quite possibly your users, too) hate you. I’ve personally seen instances where COUNT ran for over a minute.
Now, you may instinctively think that you can improve the speed of COUNT if you apply some filters. This is only true if you have an efficient index (which probably comes with performance penalties elsewhere). However, if you apply filters and SQL Server cannot (or decides not to) use an index, not only SQL Server will have to access each row, it will also have to retrieve it in its entirety. This can slow everything down by an order of magnitude. Bummer.
So, all this crazy work, just to tell you if there are *any* records in a table.
Logically speaking, to know if there are any elements of a set, you don’t necessarily need to count all its elements and then compare the result to zero. Why not simply trying to retrieve whichever one element from the set and see if you got anything or nothing.
And that’s precisely what ISEMPTY function does: it accesses the first matching row only. This is what ISEMPTY looks like on SQL Server:
SELECT TOP 1 NULL FROM …
Now, SQL Server will still have to do a table or an index scan, and depending on your filters it may need to retrieve the row from the database to perform the filter matching, but once it successfully retrieves (and matches) one row, it terminates immediately without going to further rows.
This may still be slow, though. Depending on same factors COUNT depends on, you may be extremely unlucky that your one matching row is the very last row read from the database. However, this is extremely unlikely, and if this systematically happens, you can solve it with an index.
In any case, ISEMPTY is as efficient as it gets and you should always use ISEMPTY instead of COUNT when the only thing you need to know is if there are any records in the table, and you don’t really care about how many of them there are exactly.