The “IF COUNT = 1” Conundrum

Post published:March 8, 2017
Post category:Performance
Post comments:19 Comments
Reading time:10 mins read

If you haven’t read my yesterday’s post, then do that first.

Good. At this point you are extremely smart because you know that you should never use COUNT = 0 but should use ISEMPTY instead.

However, what if:

IF SomeTable.COUNT = 1 THEN
DoSomething();

Well, if this is what you need, then that’s what you need. And that’s exactly what we needed in the team a few days ago.

Let’s get this clear: I am not making all this fuss about COUNT just for sports. COUNT can get slow. It can get really, really slow. It can get minutes-per-call slow.

And when it gets slow, and you still don’t need to know exactly how many there are, but only care that there is exactly 1 record, then you do have a problem.

In case your SomeTable from the example above is a massive performance hog, then, in theory, you can make it faster by doing this:

IF SomeTable.FIND(‘-‘) AND (SomeTable.NEXT = 0) THEN
DoSomething();

If anything does, this surely qualifies as a C/AL wat situation. If it looks crazy, that’s because it is. So lets take a look how it does what it does:

It attempts to find records in the database. If this succeeds (returns TRUE), we are now positioned at the first record in the recordset.
Then it attempts to move to the next record. If there is exactly one record in the table, NEXT returns 0, so this part also returns TRUE.

To gain (still theoretical) performance benefits here, it is crucially important to use FIND(‘-‘) and not FINDSET or FINDFIRST instead. Why is that?

FINDSET?

FINDSET retrieves all records from a table and returns all of them to the NST. No matter what, this will do a full table or index scan and retrieve each row from the data file. It places no limiting clauses in the query, so it really retrieves all rows that are there. If your table has fifty million rows, all of them will be read from the database and streamed over to NST.

Regardless of what you do after issuing a FINDSET to SQL, this will be the slowest of all possible operations you can do with your database. If you set off to solve a slow COUNT, then FINDSET is definitely as wrong an answer as you can possibly imagine.

So, what then?

FINDFIRST?

If FINDSET selects all of them, and this is clearly not what you need, then FINDFIRST should be good, because it puts a TOP 1 clause, literally limiting the retrieval to one row only. in the COUNT vs. ISEMPTY story, TOP 1 saves the day. Wouldn’t it be faster to simply write:

IF SomeTable.FINDFIRST AND (SomeTable.NEXT = 0) THEN
DoSomething();

No it wouldn’t, and this is why.

FINDFIRST does indeed add the TOP 1 clause to your query, and it will indeed retrieve and return one row only. But it’s not about what FINDFIRST does here, it’s about what NEXT does. NEXT does not merely retrieve one more row from the database. It iterates through the recordset by moving the current position marker in the recordset.

NEXT does its work on NST. When you NEXT through a recordset, NST does not involve SQL Server. However, NST will involve SQL Server if you issue a NEXT on a recordset that NST is not sure if there are more records. FINDFIRST is one of those situations where NST does not know that. So, when you NEXT after a FINDFIRST (or FINDLAST for that matter) NST will actually issue an entire new SELECT statement in SQL Server. This new SELECT statement will not be as efficient as you’d expect it to be, because it will actually have to filter out the row you have already retrieved, so depending on complexity of your primary key and kinds of filters you set before calling FINDFIRST, NEXT can cause unnecessary mess deep down in SQL.

You never do IF SomeTable.FINDFIRST THEN [REPEAT UNTIL ] NEXT. It’s bound to cause performance issues.

Many NAV developers (still) believe that FINDFIRST is a one-on-one replacement for FIND(‘-‘) but it never was, and it never will be. FINDFIRST means “find one any only one and give me that one and don’t worry about any more rows in there because I don’t really give a flying rat’s bottom about them”.

So, if this is what FINDFIRST does, what does FIND(‘-‘) do, then?

FIND(‘-‘)

Okay, I admit, this one makes no sense. Honestly. FIND(‘-‘) is a wat in its own right. FIND(‘-‘) issues a SELECT statement which looks almost exactly as FINDSET’s one, with one important distinction: it places a TOP n clause, where this n depends on factors known only to NST. It starts off at 50, and NST may decide to use different numbers depending on its own usage statistics per table.

So, FIND(‘-‘) will make SQL Server select top 50 rows from a table, and stream those 50 rows to NST. Then, any NEXT you call on this 50-row recordset will happen entirely on NST level. However, when you NEXT beyond the last row, NST will issue another SELECT statement which will select the remainder of the table.

Why does NST do this? Because it wants to leave you, the developer, in control of database performance.

The rule of thumb is:

If you know you will loop through all records of a table (or within a filter) then use FINDSET. It will only ever result in one single SELECT statement.
If there is a decision within your REPEAT..UNTIL NEXT loop that may exit the loop based on a condition before the entire recordset is enumerated, then use FIND(‘-‘) because if there are chances that you exit inside the first batch of 50 rows, your database tier will be less busy and everything will be far more efficient.

So, if you have a large table, and do a COUNT, you inevitable cause at minimum a (full) index scan (it may be a partial index scan if you have a matching index and have applied a filter). A FIND(‘-‘) AND (NEXT = 0) will, at the very worst cause a partial table scan of top 50 (matching) rows.

And the larger the table, the more pronounced this distinction becomes. Index scans are not slow operations, so COUNT merely accesses indexes to locate rows, and this is in fact extremely fast. Certainly faster than possible row retrievals that FIND(‘-‘) will definitely have to do. It takes far more time to locate top 50 rows the retrieve them, than it does to index-scan a much larger number of rows. But the key word here is “much larger”. Because no matter how much faster scanning an individual row is than retrieving an individual row, FIND(‘-‘) will do only 50 retrievals, while COUNT will go on as long as there are rows.

Theory vs. Practice

Okay, this all looks fine in theory, but how does it fare in practice? Well, I decided to find out for myself.

First thing I did was a mistake. I took a Cronus database and compared behavior over the G/L Entry table. COUNT was consistently faster than FIND(‘-‘) AND (NEXT = 0). And it was roughly twice as fast.

However, when I posted about 40.000 G/L Entries, doing a COUNT over the table was suddenly clearly slower, and consistently slower than FIND(‘-‘) AND (NEXT = 0). After I posted about 500.000 G/L Entries, COUNT was darned slow, whereas FIND(‘-‘) AND (NEXT = 0) was as fast as it ever was.

The thing is, COUNT Is going to be slower, and this increase in slowness (or decrease of speed) will be roughly linear. On the other end, FIND(‘-‘) AND (NEXT = 0) will have roughly the same performance at any row count.

Of course, it depends a great deal on the primary key structure, presence or absence of a necessary index, and many other factors. But all things being equal, given a large enough number of rows, COUNT will be noticeably slower.

Better ways?

Yes, there are better ways, they only require some more work. To get the best performance, simply use a query object.

Whenever you need to look for a specific count of rows on a table (be it 1, or another exact number) you can write a query over that table, and have it return the first primary key field. Then, you set the TOP clause on the resulting SQL SELECT statement through the TOPNUMBEROFROWS function. If you are looking for COUNT = 1, you use TOPNUMBEROFROWS(2). Then, you count how many rows you retrieved.

Using this approach you can approach the blazing fast performance of ISEMPTY, at the expense of having to create one object per table of interest. This won’t be that many query objects, because you very rarely need COUNT = 1, and when you do, it may be that the table in question isn’t too busy or too populated, so COUNT would be faster anyway no matter what. Large, busy tables are when you need to start worrying about performance of COUNT, and only these tables will require you to write the COUNT-replacement queries.

Another solution you may come up with, that a friend of mine suggested was to do a FINDFIRST, then a FINDLAST and then compare if both returned the same record. It may work or not work, and in most likelihood it will be slower than other options because:

It has minimum of two SQL Server roundtrips
It requires writing C/AL logic to determine row equality

Now, in theory this could be the fastest way, if the conditions are right. If your network is fast and there are no bottlenecks there, and if the primary key structure is simple, then an extra SQL Server roundtrip may be a fair price to pay for additional performance gain. However, in most likelihood, the overall performance benefit will be very slim.

Whoa, you stayed with me all the way through this beating-a-dead-horse exercise? Good, because I am not done with it. There will be one more post about improving COUNT speed in general.

Tags: COUNT, FIND, FINDFIRST, FINDSET, Performance, SQL Server

Vjeko

Vjeko has been writing code for living since 1995, and he has shared his knowledge and experience in presentations, articles, blogs, and elsewhere since 2002. Hopelessly curious, passionate about technology, avid language learner no matter human or computer.

This Post Has 19 Comments

Andri Wianto March 8, 2017 Reply

Nicely written! Thank you.

Loading...
Peter Tijsma March 8, 2017 Reply

Always Great reading material. Loving it!

Loading...
simon March 8, 2017 Reply

Thanks, but I think you mixed up FINDSET and FIND(‘-‘). As far as I know we got FINDSET in version 4 to avoid full table scans. Or was that changed somehow later on?

Loading...
1. Vjeko March 9, 2017 Reply
  
  Nope, I didn’t mix them. Plus, it’s really easy to check yourself if you use SQL Profiler, it tells you the only truth you need to know. However, I see where you come from: a long time ago FINDSET and FIND(‘-‘) used to be what you say here. However, in 2015 (or 2013 R2, or 2013, can’t remember, and don’t really care now that we are in 2017) this was switched (for some reason or other).
  
  Loading...
lvanvugt March 8, 2017 Reply

Thanx, Vjeko, for the clear story and worthwhile reading.

Loading...
Pingback: When you just must COUNT, no matter what - Vjeko.com
Pingback: When you just must COUNT, no matter what - Microsoft Dynamics NAV Community
Simon Fischer March 9, 2017 Reply

Hi Vjeko,

first: thank you for you posts 🙂 Since I was slightly confused after reading your post (because with NAV 2009 I learned, that FIND(‘-‘) will be faster/more efficient if you read more than the preset number of records specified in on the NST) I’ll want to add a snippet of the documentation. The documentation (2009) says:

“If the number of records that is read falls within this [preset] range, then all records are read with optimized performance. If there are more records to read than the preset number, then Microsoft Dynamics NAV must establish new commands to SQL Server to continue reading records. Microsoft Dynamics NAV reads all records successfully, but the additional commands are expensive for SQL Server to execute. The FIND(‘-‘) operation is more efficient than the FINDSET operation when there are more records to read than the preset number.”

That is contrary to what you are proposing. So I checked the documentation for 2016 and it says:

“Record.FIND(‘-‘) and Record.FIND(‘+’) are optimized for reading primarily from a single table when the application might not read all records

Record.FINDSET(ForUpdate, UpdateKey) is optimized for reading the complete set of records in the specified filter and range.”

Maybe more people learned that at 2009 times…

Best Regards,
Simon

Loading...
1. Vjeko March 9, 2017 Reply
  
  Hi Simon. Microsoft doesn’t always keep documentation up to date, and that’s why sometimes you read blogs 🙂 At some point (between 2013 and 2015, I can’t remember exactly) FIND(‘-‘) and FINDSET were switched. FINDSET allows you optimization and control over locking, whereas FIND(‘-‘) allows you to optimize performance when not having to read all records. Anyway, please don’t do IF FINDSET AND (NEXT = 0) because you possibly make it incredibly slow 🙂
  
  Loading...
  1. Simon Fischer March 9, 2017 Reply
    
    Hi Vjeko, actually it seems that MS has kept it up to date (at least at MSDN) but it also seems I just didn’t read it 😉 And they switched it basically between 2009 and 2013 (also regarding MSDN). I just wanted to add that in my previous comment, so that other people (that might be on the 2009 knowledge base) know it (like the post of the other Simon before me).
    
    P.S.: Hope to see you again at Tech Days 🙂
    
    Loading...
    1. simon March 9, 2017 Reply
      
      You are right, it changed in NAV 2009. This is something I would not expect to change ever. MS even changed the code in NAV because of that.
      Us Simons need to read more carefully :).
      
      Thank you for clearing it out.
      
      Simon
      
      Loading...
      1. Vjeko March 9, 2017
        
        No worries 🙂 Things sometimes change where you wouldn’t expect them to change.
        
        Loading...
Pingback: When you just must COUNT, no matter what - Vjeko.com - Dynamics NAV Users - DUG
Pingback: More about ISEMPTY - Vjeko.com - Dynamics NAV Users - DUG
Raik Zobel June 8, 2017 Reply

Hello Vjeko,

could this be another alternative:

IF NOT SomeTable.ISEMPTY THEN BEGIN
SomeTable.FINDFIRST;
SomeTablePrimeryKey := SomeTable.”No.”;
SomeTable.FINDLAST;
IF SomeTablePrimeryKey = SomeTable.”No.” THEN
DoSomething();
END;

This would do 2 select statements, but streams only 2 rows.
Or is this the same performance as a FINDFIRST AND NEXT command?

Loading...
1. Vjeko June 8, 2017 Reply
  
  Hm… well, at least it seems to achieve the goal, and it doesn’t seem to have any obvious performance downsides, but it’s a great deal more code in it. I don’t know how well it would fare against my proposal, maybe you can measure it and share the results…
  
  Loading...
  1. Raik Zobel June 9, 2017 Reply
    
    Then maybe with a little function. Here I use RecordRefs to make this more generic:
    
    IsUnique function:
    —————————–
    IsUnique(VAR RecAsVariant : Variant;TableView : Text) : Boolean
    RecRef1.GETTABLE(RecAsVariant);
    RecRef1.SETVIEW(TableView);
    RecRef2 := RecRef1.DUPLICATE;
    RecRef1.FINDFIRST;
    RecRef2.FINDLAST;
    EXIT(FORMAT(RecRef1) = FORMAT(RecRef2));
    ——————————-
    
    Call:
    
    IF NOT SomeTable.ISEMPTY THEN
    IF IsUnique(SomeTable,SomeTable.GETVIEW) THEN
    DoSomething();
    
    I will try to compare the performance later.
    
    Loading...
    1. Vjeko June 9, 2017 Reply
      
      Look, I am pretty sure you could come up with a hundred different ways to do this thing and they would all work. I just find it too much work to add a function for something that already works quite well already. But anyway, thanks for contributing!
      
      Loading...
  2. Raik Zobel June 9, 2017 Reply
    
    I do some correction to my previous post. I decided to put the FINDFIRST in the Call. So the 1st Record is useable in the DoSomething() function without a 2nd findfirst.
    
    It is also very difficult to compare different programming styles, because of already cached data. Anyway, the use of “FIND(‘-‘) AND (NEXT = 0)” is double times faster on already cached data.
    
    Here again the corrected function, just for reference that the use of this might be slower:
    
    IsUnique function:
    —————————–
    IsUnique(VAR RecAsVariant : Variant;TableView : Text) : Boolean
    RecRef1.GETTABLE(RecAsVariant);
    RecRef1.SETVIEW(TableView);
    RecRef2 := RecRef1.DUPLICATE;
    RecRef2.FINDLAST;
    EXIT(FORMAT(RecRef1) = FORMAT(RecRef2));
    ——————————-
    
    Call:
    
    IF NOT SomeTable.ISEMPTY THEN
    IF SomeTable.FINDFIRST AND IsUnique(SomeTable,SomeTable.GETVIEW) THEN
    DoSomething();
    
    Loading...