If you are using AL Object ID Ninja already then you may want to read this post. If not, then first install AL Object ID Ninja, then read this post 😀
Since its launch three weeks ago, this extension has exploded far beyond my expectations. As of this morning, Ninja has been installed 5.916 times, there are 2.047 Business Central apps that use Ninja to assign object IDs, and there are currently around 13.37 million blob read and write operations per week performed on my Azure Storage and around 6.75 million Azure Function calls per day. These numbers nearly doubled in the past ten days, and while there is a definite cap to how far this can grow, I expect all of this to grow at a steady rate for the foreseeable future.
Which brings me to the important part. AL Object ID Ninja is free, and it will stay free. Right now there is absolutely no fear of it even remotely approaching the limit I’ve set (at the moment, Ninja is costing on average €1 per day against the €125 monthly allowance that comes with my Visual Studio subscription), so there is a lot of room to grow. However, since Ninja has already grown beyond my wildest dreams, and since these numbers keep rising, if nothing changes it could hit a €3-per-day threshold in a couple of months, and that would make me pretty uncomfortable.
That said, I have been working on some important back-end improvements to keep costs (much) lower while providing even more functionality. The reason I went there wasn’t cost at all – it was actually some new functionality I wanted to add that I realized would drive costs up a bit, so I had to do something.
All of this is to finally make an important announcement: at some point during the weekend, there will be a major upgrade to the AL Object ID Ninja back end, and over the course of the next week there will be two new versions of the AL Object ID Ninja extension. The old extension will not work with the new back end, and the new extension will not work with the old back end.
That’s it. If you care about nitty-gritty details, then read on.
What’s the problem, anyway?
The biggest part of the Azure infrastructure cost is blob storage. In short, the bulk of this cost comes not from the amount of data stored or transferred, but from the number of blob operations performed. That price is currently €0,0051 per 10.000 read operations and €0,0591 per 10.000 write operations, and while this looks like nothing, check the numbers above. At the moment, it’s handling close to 2 million operations per day, and you can do the math. It’s free for you, but it’s not exactly free for me, and for me to keep it free for you, I must keep those numbers at a maximum of €3 per day.
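As a rough sanity check, you can do that math yourself (this assumes the bulk of those ~2 million daily operations are reads; write operations would push the figure somewhat higher):

```typescript
// Rough daily cost estimate from the figures above, assuming almost all
// of the ~2 million daily blob operations are reads.
const READ_PRICE_PER_10K = 0.0051; // € per 10.000 read operations
const OPS_PER_DAY = 2_000_000;

const dailyCost = (OPS_PER_DAY / 10_000) * READ_PRICE_PER_10K;
console.log(dailyCost.toFixed(2)); // "1.02" – in line with the ~€1/day figure above
```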
Now, when you look at the structure of all these operations and why my blob storage is so active, it’s the getLog API call, placed every 30 seconds by every running instance of Ninja out there. At any given point in time there are between 1.000 and 3.000 running instances – according to the numbers, at least. Not every getLog call ends up reading a blob, but roughly one in 20 calls will cause at least one blob file to be read. This function polls the back end for the purpose of showing notifications. I have investigated a series of ways to do push notifications, but of all the Azure services, the way I am currently handling it is the cheapest. Anything else would cost me substantially more, so to keep Ninja free, I have stopped looking elsewhere.
My ultimate cost-reduction lever is this polling interval. Right now it’s at a quite unnecessary 30 seconds. Even if I set it to 2 minutes, or even 3, it would be as close to real time as most users need, so I have a lot of room to offset rising consumption figures by extending the polling interval. Easy-peasy.
But I have a series of fantastic features I’d like to include in Ninja, and all of them would raise the number of calls placed to the back end tremendously; at some point, the polling interval would be the least of my problems. I have to do something much deeper with Ninja’s storage access practices.
How does Ninja store your data in the back end?
Every app has its own virtual directory in a blob container. So, if your app SHA is 3e214b, then there will be a virtual directory named 3e214b. Inside that directory, there will be these files:
- _ranges.json file that contains the ranges configured in your app.json. This file is always there and is read every time Ninja places a getNext call to the back end.
- _authorization.json file that contains your authorization key. This file is there if you authorized the app, and most people did authorize their apps. It used to be read on every call, but it’s now cached for 10 minutes, so for each app it’s read at least once per 10 minutes; if authorization changes, it will be read more often. Also, since there can be multiple instances of the Azure function app running at a time, these numbers multiply per instance: every instance will read it at least once every ten minutes, and there are on average 4 instances running.
- A file per object type. For example, codeunit.json, page.json, tableextension.json… you get the gist. These files are read as needed. For example, when you assign an object ID to a codeunit, codeunit.json will be read at least twice, and written once.
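The layout above can be sketched as a few path-building helpers (the file names come from the list above; the helper functions themselves are hypothetical, not Ninja’s actual code):

```typescript
// Hypothetical sketch of the "v1" per-app blob layout described above.
// The app's SHA acts as a virtual directory name inside the container.

type ObjectType = "codeunit" | "page" | "table" | "tableextension"; // etc.

// Ranges file, read on every getNext call
function rangesBlobPath(appSha: string): string {
  return `${appSha}/_ranges.json`;
}

// Authorization key file, cached for 10 minutes per instance
function authorizationBlobPath(appSha: string): string {
  return `${appSha}/_authorization.json`;
}

// One consumption file per object type, e.g. 3e214b/codeunit.json
function consumptionBlobPath(appSha: string, type: ObjectType): string {
  return `${appSha}/${type}.json`;
}

console.log(consumptionBlobPath("3e214b", "codeunit")); // "3e214b/codeunit.json"
```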
What is the problem with this way of storing data?
The biggest problem here is the distribution. Getting a new ID will read _authorization.json at least once (if it isn’t cached yet), read and write _ranges.json twice, read a specific object json twice, and write to it once. Those double reads happen because every object ID assignment is a two-step operation (the first one feeds IntelliSense, the second one commits the number to the back-end storage). The obvious problem is not that I read twice from every file – it’s that there are too many files involved.
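That two-step flow can be sketched roughly like this (an in-memory simulation assuming a simple “highest consumed ID + 1” strategy; the names and shapes are mine, not Ninja’s actual API):

```typescript
// Rough in-memory simulation of the two-step getNext flow described above.
// Step 1 proposes an ID for IntelliSense; step 2 commits it to storage.

const consumed = new Map<string, number[]>(); // object type -> consumed IDs
const range = { from: 50100, to: 50149 };     // from app.json idRanges

// Step 1: compute the next free ID without writing anything (feeds IntelliSense)
function proposeNextId(type: string): number {
  const ids = consumed.get(type) ?? [];
  const highest = ids.length ? Math.max(...ids) : range.from - 1;
  return highest + 1;
}

// Step 2: commit the ID, i.e. write it back to the consumption "blob"
function commitId(type: string, id: number): void {
  const ids = consumed.get(type) ?? [];
  ids.push(id);
  consumed.set(type, ids);
}

const id = proposeNextId("codeunit"); // 50100 on a fresh app
commitId("codeunit", id);
console.log(proposeNextId("codeunit")); // 50101
```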
Authorization is also a big problem for me. Currently there is a documented hole in it that has to do with caching and multiple running instances of the back end. If you authorize an app on one instance but end up getting numbers from another instance (and it’s Azure’s load balancer that decides which instance your request goes to), then you could – for 10 minutes – place unauthorized calls. Not a big deal, really, but still.
Another problem is the efficiency of getLog polling. It’s currently feeding information about assigned IDs from memory, not from a physical file. For example, when you assign a new object ID, the back end will keep it in memory, and when another user asks for the log, it will be fed from memory. This reduces the total number of blob reads tremendously, but costs a lot in functionality. If I assign a new object ID and the request is handled by instance #1, and you request the log from instance #2, you won’t get a notification about me assigning a number. Eventually you’ll see this notification, statistically speaking – you’re placing 2 getLogs per minute, so sooner or later instance #1 will respond and you’ll see it – but this is far from optimal.
So, in short, I have three problems to solve:
- Reduce the number of blobs read from and written to for getNext calls. Right now it’s at 4-6 read operations and 2 write operations across three different blobs.
- Close the authorization hole and make sure no unauthorized calls are placed to the back end regardless of which instance handled authorization request and whether it was in the past 10 minutes or not.
- Make polling more meaningful by reducing the polling frequency while making it more accurate. Two calls per minute with an average of 25% accuracy amount to the same as one call every two minutes with 100% accuracy.
How do I intend to solve this?
Here’s the deal. For me to solve all those problems in one go, I have decided not to spread your data across a number of files, but to have it all in one single file. So, instead of you having 3e214b/_ranges.json, 3e214b/_authorization.json, 3e214b/codeunit.json etc., you would only have 3e214b.json that contains everything.
One of the most important reasons why I originally wanted to spread the data across multiple blobs is concurrency. If you and I assign a new object ID at the same time, and all object IDs are in the same blob, then both of us have to write to that blob regardless of which object type we needed. With consumption stored per object type, if you assign a new table ID and I assign a new codeunit ID, there is no collision.
Collisions are not a problem, mind you! Ninja has an optimistic concurrency mechanism in place and all object IDs are guaranteed to be unique; that’s not the problem here. The problem is that if – by some remote accident – you and I really write to the same blob (say, 3e214b/codeunit.json) at the exact same microsecond, one of us would have to repeat the process because the other one would have written the blob in the meanwhile. This means that one of us would have to repeat a blob read and a blob write – two extra operations – and it’s blob operations that cost money, not the number of blob files stored. Azure storage does not really care whether there are 1.000.000 blobs of 5 bytes each or 1 blob of 5.000.000 bytes. But reading 1.000.000 blobs costs vastly more than reading 1 blob, even though the exact same data is read in both cases.
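Azure Blob Storage supports conditional writes via ETags, which is the usual building block for this kind of optimistic concurrency. A collision and retry of the kind described above might look like this (an in-memory stand-in for the blob, not the real storage SDK, and not Ninja’s actual code):

```typescript
// In-memory stand-in for an Azure blob with an ETag, illustrating the
// optimistic-concurrency collision described above. The loser of the race
// must repeat one read and one write – two extra (billed) operations.

let blob = { etag: 1, ids: [] as number[] };

function readBlob() {
  return { etag: blob.etag, ids: [...blob.ids] };
}

// Conditional write (If-Match semantics): succeeds only if nobody
// has written the blob since we read it.
function writeBlobIfMatch(etag: number, ids: number[]): boolean {
  if (etag !== blob.etag) return false;
  blob = { etag: etag + 1, ids };
  return true;
}

// Two writers read the same state "at the exact same microsecond"...
const you = readBlob();
const me = readBlob();

// ...you commit first, so my conditional write is rejected:
writeBlobIfMatch(you.etag, [...you.ids, 50100]); // true
const mine = writeBlobIfMatch(me.etag, [...me.ids, 50101]); // false

// I have to re-read and retry – one extra read and one extra write:
const retry = readBlob();
writeBlobIfMatch(retry.etag, [...retry.ids, 50101]); // true
console.log(blob.ids); // [50100, 50101]
```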
Now, how often do people really assign new numbers? Not that often, as you may imagine. Even the busiest apps see only a dozen or so assignments per day. While I was writing this post, up to this point, there were a grand total of 7 object IDs assigned from around 1.250 active instances of Ninja. What this tells me is that even if I put all apps in the same blob, the chances of a write collision (which would trigger a repeated write attempt) are ridiculously negligible.
But I am not putting all apps in the same blob. That would be insane, because it’s not just the number of write operations that matters – it’s also their duration. It takes far less time to read (or write) a blob of 5 bytes than a blob of 5.000.000 bytes. So I have to keep it in balance.
I’ve decided that the balance is an app. Every app gets its own blob file, and then everything about your app can be read in one go for every call you place. Authorization? Right in there. Ranges? Yep. Consumptions? You bet.
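Under that scheme, a single 3e214b.json could hold everything. The shape below is my illustrative guess at what such a consolidated blob might contain, based on the files listed earlier – the actual “v2” format may well differ:

```typescript
// Hypothetical shape of the consolidated per-app blob (3e214b.json).
// Field names are illustrative; the actual "v2" format may differ.

interface AppBlob {
  authorization?: { key: string };                 // was _authorization.json
  ranges: { from: number; to: number }[];          // was _ranges.json
  consumption: { [objectType: string]: number[] }; // was codeunit.json, page.json, ...
  log: { user: string; type: string; id: number; at: string }[]; // assignment log
}

const example: AppBlob = {
  authorization: { key: "<key>" },
  ranges: [{ from: 50100, to: 50149 }],
  consumption: { codeunit: [50100, 50101], page: [50100] },
  log: [{ user: "vjeko", type: "codeunit", id: 50101, at: "2021-10-01T08:00:00Z" }],
};
```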
This gives me another opportunity: since all app information is already in a single place, I can also keep the log (when a new object ID was assigned and by whom) in that file – it no longer needs to be in memory. Also, I can feed all of this information back to the caller on every call. So even if you place a getNext call to obtain the next object ID, you receive all other information about the actual state of your app at the same time. This means I can really reduce the number of getLog calls, and I don’t even have to place them every two minutes. All I need to do is place a pre-emptive getLog only if two minutes have passed since whatever last call was sent to the back end, and I’d still have all the info I need.
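That pre-emptive polling rule can be as simple as this (a sketch; the two-minute threshold is the one mentioned above, and the function names are mine):

```typescript
// Sketch of the pre-emptive polling rule: only place a getLog call if no
// other back-end call has been made in the last two minutes (since every
// call already returns the full app state, including the log).

const POLL_INTERVAL_MS = 2 * 60 * 1000;
let lastBackEndCall = 0;

// Called whenever ANY back-end call (getNext, getLog, ...) completes
function recordBackEndCall(now: number): void {
  lastBackEndCall = now;
}

// Decides whether a pre-emptive getLog is needed at all
function shouldPollLog(now: number): boolean {
  return now - lastBackEndCall >= POLL_INTERVAL_MS;
}

recordBackEndCall(0);
console.log(shouldPollLog(60_000));  // false: a call happened 1 minute ago
console.log(shouldPollLog(120_000)); // true: two minutes have passed
```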
And this is what this announcement is about.
So, what’s going to happen?
First, a new version of the back end will be deployed. It will read from and write to the single per-app blob. But here’s the catch: that file does not exist yet, and the relevant data (object ID consumptions) is spread around a number of files. The “v2” version of the API will work with this single per-app blob, while the “v1” version cannot read from it. This means that either every developer in the same repository uses “v2”, or none of them does. There is no mixing and matching!
In order to avoid each and every one of you having to repeat object ID syncing for each and every one of your apps, I will do an automatic migration. Whenever the first developer on any given app gets the updated version of Ninja and places the first call to the back end, it will be a “v2” call, and it will see that the data is still in “v1” format. It will then upgrade the back-end storage to “v2”. Anyone still using “v1” for the same app will get an error telling them that they must upgrade their AL Object ID Ninja extension.
This is going to be a two-step upgrade process.
First, there will be a minor update of the AL Object ID Ninja extension today, configured to fail on “v1” calls if there is “v2” information in the back end. There will be a week-long grace period for this update to propagate. Then, over the weekend of October 9-10, there will be a major version update of AL Object ID Ninja (v2.0.0), configured to call “v2” endpoints. This means that from October 9, people will gradually be forced by AL Object ID Ninja to update their extensions, if the update was not configured to happen automatically.
Keep in mind – most people do have automatic updates of extensions configured. I don’t see why you wouldn’t want that. If you have configured extensions not to update automatically (and I know around 2% of all Ninja users are still using some very old version of Ninja), then please configure automatic updates.
What if not everyone on my team updates?
Yeah, well, then too bad, I must say. Once your app is migrated to “v2”, the “v1” endpoints will stop assigning numbers and will fail on every call. You will be able to see that in Ninja’s output. If you updated to the minor release that’s happening today, you’ll see an error message directly telling you what’s happening and why. But if you never update to it, then the only way to see why Ninja is not working is by looking at its output channel in Visual Studio Code.
I believe that for the great majority of users (98% of them, to be precise) this transition will be smooth and painless, and will at most require a single restart of Visual Studio Code at some point on October 4.
That’s it. I’ll also link this blog post from Ninja’s “Learn more” messages that will start appearing during the transition period for every user whose back end gets upgraded to “v2” while there are still developers on their team using “v1” endpoints.
If you have any questions or concerns, please let me know!