If you haven’t read my yesterday’s post, then do that first.
Good. At this point you are extremely smart because you know that you should never use COUNT = 0 but should use ISEMPTY instead.
However, what if:
IF SomeTable.COUNT = 1 THEN
Well, if this is what you need, then that’s what you need. And that’s exactly what we needed in the team a few days ago.
Let’s get this clear: I am not making all this fuss about COUNT just for sports. COUNT can get slow. It can get really, really slow. It can get minutes-per-call slow.
And when it gets slow, and you still don’t need to know exactly how many there are, but only care that there is exactly 1 record, then you do have a problem.
In case your SomeTable from the example above is a massive performance hog, then, in theory, you can make it faster by doing this:
IF SomeTable.FIND(‘-‘) AND (SomeTable.NEXT = 0) THEN
If anything does, this surely qualifies as a C/AL wat situation. If it looks crazy, that’s because it is. So lets take a look how it does what it does:
- It attempts to find records in the database. If this succeeds (returns TRUE), we are now positioned at the first record in the recordset.
- Then it attempts to move to the next record. If there is exactly one record in the table, NEXT returns 0, so this part also returns TRUE.
To gain (still theoretical) performance benefits here, it is crucially important to use FIND(‘-‘) and not FINDSET or FINDFIRST instead. Why is that?
FINDSET retrieves all records from a table and returns all of them to the NST. No matter what, this will do a full table or index scan and retrieve each row from the data file. It places no limiting clauses in the query, so it really retrieves all rows that are there. If your table has fifty million rows, all of them will be read from the database and streamed over to NST.
Regardless of what you do after issuing a FINDSET to SQL, this will be the slowest of all possible operations you can do with your database. If you set off to solve a slow COUNT, then FINDSET is definitely as wrong an answer as you can possibly imagine.
So, what then?
If FINDSET selects all of them, and this is clearly not what you need, then FINDFIRST should be good, because it puts a TOP 1 clause, literally limiting the retrieval to one row only. in the COUNT vs. ISEMPTY story, TOP 1 saves the day. Wouldn’t it be faster to simply write:
IF SomeTable.FINDFIRST AND (SomeTable.NEXT = 0) THEN
No it wouldn’t, and this is why.
FINDFIRST does indeed add the TOP 1 clause to your query, and it will indeed retrieve and return one row only. But it’s not about what FINDFIRST does here, it’s about what NEXT does. NEXT does not merely retrieve one more row from the database. It iterates through the recordset by moving the current position marker in the recordset.
NEXT does its work on NST. When you NEXT through a recordset, NST does not involve SQL Server. However, NST will involve SQL Server if you issue a NEXT on a recordset that NST is not sure if there are more records. FINDFIRST is one of those situations where NST does not know that. So, when you NEXT after a FINDFIRST (or FINDLAST for that matter) NST will actually issue an entire new SELECT statement in SQL Server. This new SELECT statement will not be as efficient as you’d expect it to be, because it will actually have to filter out the row you have already retrieved, so depending on complexity of your primary key and kinds of filters you set before calling FINDFIRST, NEXT can cause unnecessary mess deep down in SQL.
You never do IF SomeTable.FINDFIRST THEN [REPEAT UNTIL ] NEXT. It’s bound to cause performance issues.
Many NAV developers (still) believe that FINDFIRST is a one-on-one replacement for FIND(‘-‘) but it never was, and it never will be. FINDFIRST means “find one any only one and give me that one and don’t worry about any more rows in there because I don’t really give a flying rat’s bottom about them”.
So, if this is what FINDFIRST does, what does FIND(‘-‘) do, then?
Okay, I admit, this one makes no sense. Honestly. FIND(‘-‘) is a wat in its own right. FIND(‘-‘) issues a SELECT statement which looks almost exactly as FINDSET’s one, with one important distinction: it places a TOP n clause, where this n depends on factors known only to NST. It starts off at 50, and NST may decide to use different numbers depending on its own usage statistics per table.
So, FIND(‘-‘) will make SQL Server select top 50 rows from a table, and stream those 50 rows to NST. Then, any NEXT you call on this 50-row recordset will happen entirely on NST level. However, when you NEXT beyond the last row, NST will issue another SELECT statement which will select the remainder of the table.
Why does NST do this? Because it wants to leave you, the developer, in control of database performance.
The rule of thumb is:
- If you know you will loop through all records of a table (or within a filter) then use FINDSET. It will only ever result in one single SELECT statement.
- If there is a decision within your REPEAT..UNTIL NEXT loop that may exit the loop based on a condition before the entire recordset is enumerated, then use FIND(‘-‘) because if there are chances that you exit inside the first batch of 50 rows, your database tier will be less busy and everything will be far more efficient.
So, if you have a large table, and do a COUNT, you inevitable cause at minimum a (full) index scan (it may be a partial index scan if you have a matching index and have applied a filter). A FIND(‘-‘) AND (NEXT = 0) will, at the very worst cause a partial table scan of top 50 (matching) rows.
And the larger the table, the more pronounced this distinction becomes. Index scans are not slow operations, so COUNT merely accesses indexes to locate rows, and this is in fact extremely fast. Certainly faster than possible row retrievals that FIND(‘-‘) will definitely have to do. It takes far more time to locate top 50 rows the retrieve them, than it does to index-scan a much larger number of rows. But the key word here is “much larger”. Because no matter how much faster scanning an individual row is than retrieving an individual row, FIND(‘-‘) will do only 50 retrievals, while COUNT will go on as long as there are rows.
Theory vs. Practice
Okay, this all looks fine in theory, but how does it fare in practice? Well, I decided to find out for myself.
First thing I did was a mistake. I took a Cronus database and compared behavior over the G/L Entry table. COUNT was consistently faster than FIND(‘-‘) AND (NEXT = 0). And it was roughly twice as fast.
However, when I posted about 40.000 G/L Entries, doing a COUNT over the table was suddenly clearly slower, and consistently slower than FIND(‘-‘) AND (NEXT = 0). After I posted about 500.000 G/L Entries, COUNT was darned slow, whereas FIND(‘-‘) AND (NEXT = 0) was as fast as it ever was.
The thing is, COUNT Is going to be slower, and this increase in slowness (or decrease of speed) will be roughly linear. On the other end, FIND(‘-‘) AND (NEXT = 0) will have roughly the same performance at any row count.
Of course, it depends a great deal on the primary key structure, presence or absence of a necessary index, and many other factors. But all things being equal, given a large enough number of rows, COUNT will be noticeably slower.
Yes, there are better ways, they only require some more work. To get the best performance, simply use a query object.
Whenever you need to look for a specific count of rows on a table (be it 1, or another exact number) you can write a query over that table, and have it return the first primary key field. Then, you set the TOP clause on the resulting SQL SELECT statement through the TOPNUMBEROFROWS function. If you are looking for COUNT = 1, you use TOPNUMBEROFROWS(2). Then, you count how many rows you retrieved.
Using this approach you can approach the blazing fast performance of ISEMPTY, at the expense of having to create one object per table of interest. This won’t be that many query objects, because you very rarely need COUNT = 1, and when you do, it may be that the table in question isn’t too busy or too populated, so COUNT would be faster anyway no matter what. Large, busy tables are when you need to start worrying about performance of COUNT, and only these tables will require you to write the COUNT-replacement queries.
Another solution you may come up with, that a friend of mine suggested was to do a FINDFIRST, then a FINDLAST and then compare if both returned the same record. It may work or not work, and in most likelihood it will be slower than other options because:
- It has minimum of two SQL Server roundtrips
- It requires writing C/AL logic to determine row equality
Now, in theory this could be the fastest way, if the conditions are right. If your network is fast and there are no bottlenecks there, and if the primary key structure is simple, then an extra SQL Server roundtrip may be a fair price to pay for additional performance gain. However, in most likelihood, the overall performance benefit will be very slim.
Whoa, you stayed with me all the way through this beating-a-dead-horse exercise? Good, because I am not done with it. There will be one more post about improving COUNT speed in general.