
Context Engineering – The Thing Almost Nobody Is Actually Talking About

Today I want to talk about something that barely anyone is talking about. Context engineering.

We hear about prompts constantly. We hear about the latest model releases, the agentic frameworks, the AI-powered IDEs, the MCP servers (luckily we don’t hear about vibe coding all that much anymore). We hear about a lot of things. You name it – we hear about it.

But context engineering? Not so much.

And I find that strange, because if there is one lesson I have taken away from spending the past nine months writing code almost exclusively with AI agents (for the last six, you can drop the "almost"), it is this: context matters more than your prompt. Significantly more. Whenever my agents produced results I could never have matched myself – faster, more consistent, undeniably better – I could trace it back to one thing every single time. Precise context.

That word, precise, is doing a lot of work in that sentence. Remember it.

The thing everyone argues about

The loudest conversation in AI tooling right now is which LLM is best. And yes, it matters. Of course it matters. But let me give you an analogy.

An LLM is the engine in your agentic car. The engine matters – you would not win a Formula 1 race with a lawnmower engine, and you would not get competitive code out of a genuinely weak model either. But whether a car gets from point A to point B, or how fast it gets there when speed matters, depends on a lot more than just the engine. The tires. The fuel. The driver. The track itself. Almost all modern LLMs can write decent code. Some do much better, some much worse – but here is the thing: which model you pick is probably the least decisive factor in your overall results.

The second big conversation is about toolchains. Which MCPs, which plugins, which last-week's-hottest-idea you picked from the vastness of GitHub. And these matter too. They change how you work. They expand what's possible. But better tools can never make up for missing skills. Neither can faster ones.

(Agents can do an awful lot of effort-doubling these days!)

There is one skill in particular that almost nobody is talking about at the right level of abstraction. Context engineering.

What context engineering actually is

Let me be precise, because the word “context” is getting stretched in too many directions right now.

Andrej Karpathy defined it as “the delicate art and science of filling the context window with just the right information for the next step.”

Anthropic calls it “curating what will go into the limited context window from that constantly evolving universe of possible information” – and frames it as the natural progression beyond prompt engineering, because agents operating over multiple turns need strategies for managing the entire context state: system instructions, tools, MCP, external data, message history, and everything else that accumulates as the agent runs.

When I say context, I mean all of that. Everything that lands inside the model’s context window at inference time. Not just your system prompt. Not just your message. Everything the agent sees when it decides what to do next.

Prompt engineering, as we have been practicing it, is really just a subset of this. You write a careful instruction string, you tweak the wording, you add a few examples. That helps. But you are optimizing one piece of a much larger puzzle.

And here is the part that most people miss: in a multi-step agent loop, your prompt stays static. But the context keeps changing. Every tool call adds output. Every step adds tokens. Every retrieved document shapes what the model can attend to. The agent is not reading your prompt over and over again. It is reading the accumulated state of everything that has happened so far. That accumulated state is your real prompt.
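A minimal sketch makes that loop concrete. Nothing here is a real API – `call_model` and `run_tool` are hypothetical placeholders – but the shape is the point: the prompt is appended once, while the context the model actually reads grows at every step.

```python
# Minimal sketch of a multi-step agent loop. The prompt enters the
# context once; every tool call appends more tokens. The accumulated
# list, not the original prompt, is what the model reads each turn.

def call_model(context):
    # Placeholder: a real implementation would send `context` to an LLM.
    return {"tool": "read_file", "args": "SalesPost.al"}

def run_tool(action):
    # Placeholder: execute the tool call and return its output.
    return f"<output of {action['tool']}({action['args']})>"

def agent_loop(prompt, steps=3):
    context = [{"role": "user", "content": prompt}]   # static prompt
    for _ in range(steps):
        action = call_model(context)       # model attends to *all* of context
        observation = run_tool(action)
        # Each step appends two messages; the state keeps growing.
        context.append({"role": "assistant", "content": str(action)})
        context.append({"role": "tool", "content": observation})
    return context

history = agent_loop("Refactor the posting routine", steps=3)
print(len(history))   # → 7: one prompt message plus two per step
```

After only three steps, the original prompt is one message out of seven – and in a real session, a vanishingly small fraction of the tokens.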

This is also why context has a very real enemy: context rot. The longer an agent runs, the more tokens pile up. And models do not treat all tokens equally – when relevant information is buried deep in a bloated context window, performance degrades. The model loses sight of what matters. More tokens are not more; they are less. Precise, targeted, well-maintained context produces better results than an ever-growing dump of everything the agent has touched. The skill is knowing what to put in, what to leave out, and how to keep it clean as the agent runs.
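One common defense against that rot can be sketched in a few lines: keep the system instructions and the most recent turns verbatim, and collapse everything older into a short summary. The `compact` helper below is an illustrative stand-in – a real implementation would produce the summary with a model call, not a bracketed placeholder string.

```python
# Sketch of context compaction: preserve the system message and the
# last `keep_recent` turns, collapse the middle into one summary entry.
# The bracketed string stands in for a real model-generated summary.

def compact(messages, keep_recent=4):
    if len(messages) <= keep_recent + 1:
        return messages                       # nothing worth collapsing yet
    system = messages[0]
    old, recent = messages[1:-keep_recent], messages[-keep_recent:]
    summary = {"role": "system",
               "content": f"[summary of {len(old)} earlier messages]"}
    return [system, summary] + recent
```

An eleven-message history compacts down to six entries: the system message, one summary, and the four most recent turns – the agent keeps what it needs to act without dragging the whole transcript along.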

Where LLMs are blind – and what you have to do about it

There is something fundamental about how language models learn that matters enormously here.

Models are trained on enormous amounts of text. Code, documentation, Stack Overflow threads, GitHub repositories, papers, blog posts – an almost incomprehensible breadth of human-generated content. But they have not seen everything. What they missed during training, they genuinely do not know. And the only way to compensate for that gap is context.

With TypeScript, this dynamic is almost invisible. Nearly every TypeScript library is open source. Models were trained on the source code, the documentation, the GitHub issues, the blog posts explaining how and when and why to use nearly any package you come across. The context is already in the model – implicit, acquired during training. That is why my TypeScript development these days runs mostly unattended. My agentic setup with curated context documents and clear test-first instructions reliably produces accurate code without me hovering over it.

And this brings us to the elephant in the room: AL.

Most of you reading this write AL, not TypeScript. I have one thing to say to you: AL is not the elephant. Not at all. It’s in the room, alright.

AL is not the exception. AL is the proof.

I have talked to a lot of people in the BC community about AI agents and AL development. And I keep hearing the same thing: “our situation is different.” And it is. The models were not trained on the Base App, the System App, or the countless first-party and third-party dependency apps we all work with. Their implicit context – the stuff that just works out of the box in TypeScript – is simply not there for us.

But here is the thing. The models are not blank. They have seen ERP theory. They have read accounting textbooks, SAP documentation, general business process descriptions, open-source ERP systems, academic papers about ledger entries and posting routines. They have read Business Central documentation, too. They have a vague, generic ERP (and BC) mental model – assembled from everything adjacent to BC that they have ever seen. And that is arguably more dangerous than knowing nothing.

Ask a model to work with the VAT Entry table without giving it proper context, and it will not just shrug. It will confidently infer nonsense. It will reason by analogy from SAP, from F&O, from whatever open-source accounting system it once read about. It will apply posting logic that does not exist in BC, reference fields that do not exist on that table, and describe workflows that BC has never heard of. Not because the model is bad – but because it is filling a context gap with the closest thing it has. That closest thing is almost right. And almost right, in code, is often worse than wrong.

This is not an argument against using AI agents for AL development. This is the most powerful possible argument for context engineering.

When it comes to Business Central and AL, the models do not have the implicit, learned context that training would have given them. So we have to supply it explicitly. We have to curate it. We have to give them precisely what they need to work with – and only that. When you do that, the results change dramatically. Because the model was never the problem. The missing context was.

Plan mode is not magic. It is just self-guided context engineering.

I have been using plan mode in Cursor and Claude Code extensively. And I want to demystify something about it, because I think a lot of people treat it as some kind of intelligent oracle. It is not.

Plan mode is an agent sifting through your code trying to find the context it needs. That is all. It reads files, follows references, assembles a picture of what is relevant to the task. It is context engineering – except the agent is doing it autonomously, without your guidance. When it works well, it is because the codebase is structured well enough that the agent can find what it needs. When it fails – and it fails a lot in AL – it is usually because the codebase made that impossible.
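The retrieval step behind that sifting can be sketched crudely: rank files by how often they mention the symbols relevant to the task. Real tools also follow references, imports, and call sites; the `rank_files` helper and the toy repo map below are hypothetical illustration, not any actual tool's API.

```python
# A crude stand-in for plan mode's retrieval step: score each file by
# how many times it mentions the task's symbols, keep the non-zero hits,
# and return them in descending order of relevance.

def rank_files(task_symbols, files):
    """files maps path -> source text; returns paths by descending score."""
    scores = {path: sum(src.count(sym) for sym in task_symbols)
              for path, src in files.items()}
    return sorted((p for p, s in scores.items() if s > 0),
                  key=lambda p: -scores[p])

repo = {
    "SalesPost.al": "VAT Entry ... posting ... VAT Entry",
    "Setup.al": "company setup fields",
    "VATLedger.al": "VAT Entry",
}
print(rank_files(["VAT Entry", "posting"], repo))
# → ['SalesPost.al', 'VATLedger.al']
```

When logic is scattered across god codeunits, no amount of cleverness in this step helps – the relevant symbols show up everywhere and nowhere, and the agent drowns in candidates.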

And this is where we need to have an uncomfortable conversation about our codebases.

Most models today have context windows of around 128K tokens. Claude Opus 4.6 in Claude Code just defaulted to 1 million tokens this morning (I have no idea yet what it will do with that, but I am genuinely curious).

And now this: the Sales-Post codeunit in the standard Base App consumes roughly 158,000 tokens on its own. One codeunit. That is already beyond what most models can hold in full. Let the agent try to understand the full picture of sales posting – all the related codeunits, all the tables involved – and the context window is not just full. It is overwhelmed. The agent cannot form a real big picture, and without one it cannot even know which small pictures to start focusing on.
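The arithmetic is easy to check with a back-of-the-envelope estimate. The roughly four-characters-per-token ratio below is a common heuristic for English-like source text, not a real tokenizer, and both helper functions are mine, not any library's API:

```python
# Rough token budgeting with the ~4 characters per token heuristic.
# This is an estimate only; real counts vary by model and tokenizer.

def estimate_tokens(text: str) -> int:
    return len(text) // 4

def fits(texts, window=128_000, reserve=20_000):
    # Leave headroom for system prompt, instructions, and the reply.
    return sum(estimate_tokens(t) for t in texts) <= window - reserve

codeunit = "x" * 632_000        # ~158K tokens of source, like Sales-Post
print(fits([codeunit]))         # → False: one codeunit overflows the window
```

At four characters per token, a 158,000-token codeunit is around 630,000 characters of source – past a 128K window before the agent has read a single related table or dependency.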

I want to say this as plainly as I can: when an agent produces bad results on something related to a business process in BC, the failure is not the model’s. The failure is ours. We built those 158,000-token codeunits. We scattered logic across forests of god codeunits. We normalized the kind of structural complexity that makes self-guided context engineering essentially impossible.

The agents are not failing us. We are failing the agents.

Conclusion

If there is one thing I want you to take away from this: context is your responsibility. The model is not going to excavate it from your god codeunits. You have to curate it, structure it, and put it precisely where the agent can use it.

That is context engineering. And it is the most under-discussed skill in our entire community right now. And I would argue it is the most important.

What has your experience been? Are you curating context intentionally, or are you relying on plan mode and hoping for the best? Let me know.
