Statement on the need to rethink AI

A few weeks ago our infrastructure bill told us something our roadmap had not. The automatic AI actions that hold our ecosystem together, unifying duplicate patient records and stitching fragmented events into a single care journey, were costing far more than the value of any individual operation could justify. Each unification pass ran against large models that had to be queried in full to recover facts that, in principle, take up a few bytes. This was not a billing anomaly, but a symptom.

We traced it to how these models store what they know: as the weights of a large neural network. Those weights are learned once and then frozen. The approach works, and it has produced the capability everyone is now building on, but the moment you need to maintain that knowledge rather than simply use it, the cracks appear.

Four of them, specifically:

Opacity. A judgment like "this record and that record describe the same person" is not stored anywhere you can point to. It is spread across millions of parameters, and the model cannot tell you why it decided what it did.
No clean edits. Correcting one stale fact means retraining or fine-tuning, with no guarantee the fix stays where you put it.
Forgetting. Teach the model something new and it quietly degrades on things it already knew. The field calls this catastrophic forgetting, and the name is not an exaggeration.
Cost. Training and everyday inference both burn energy and capital, and the bill grows with scale rather than with usefulness.

Here is the part worth sitting with. The human brain delivers comparable, and in many respects broader, competence on roughly twenty watts. It manages this with strong functional modularity and mostly local learning, where a connection changes based on the two neurons it links rather than a recalculation across the whole system. We are not claiming to copy the brain. We are pointing out that the dense weight matrix is one option rather than a law of nature, and that something far cheaper is demonstrably possible.

So we began building toward a different foundation. We have set out the technical argument in a white paper, Beyond the Weight Matrix, and the short version runs as follows.

Instead of one large frozen network, knowledge lives in many small compartments. Each compartment owns a single fragment of what the system knows, and each can be read or replaced on its own without disturbing the rest. Inside a compartment, a fragment is written as an explicit equation rather than a buried pattern of weights. That difference matters more than it sounds: unlike a weight, an equation can be read by a person and rewritten in place.
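To make the compartment idea concrete, here is a minimal sketch in Python. It is an illustration under our own assumptions, not the design described in the white paper: the names `Compartment` and `KnowledgeStore`, and the two example equations, are hypothetical. It shows only the property claimed above, that one fragment can be read as text and replaced without touching any other.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Compartment:
    """One fragment of knowledge, held as a readable expression (hypothetical)."""
    name: str
    equation: str                   # human-readable form of the fragment
    evaluate: Callable[..., float]  # executable form of the same equation


class KnowledgeStore:
    """Holds compartments by name; each can be read or replaced independently."""

    def __init__(self) -> None:
        self._compartments: Dict[str, Compartment] = {}

    def put(self, compartment: Compartment) -> None:
        # Replacing one compartment leaves every other entry untouched.
        self._compartments[compartment.name] = compartment

    def read(self, name: str) -> str:
        return self._compartments[name].equation

    def apply(self, name: str, **inputs: float) -> float:
        return self._compartments[name].evaluate(**inputs)


store = KnowledgeStore()
store.put(Compartment("bmi", "bmi = kg / m**2", lambda kg, m: kg / m**2))
store.put(Compartment("adult_age", "adult = age >= 18", lambda age: float(age >= 18)))

before = store.read("bmi")
# Edit a single fragment in place (the new threshold is purely illustrative).
store.put(Compartment("adult_age", "adult = age >= 21", lambda age: float(age >= 21)))
```

After the edit, `store.read("bmi")` still returns the same text as `before`, while `adult_age` now evaluates against the new threshold.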

Updates behave differently too. New information does not silently nudge a stored value. It proposes a change, and the change is admitted only after it clears a verification gate that checks it against related compartments and, where relevant, against outside evidence. If the incoming information fails the check, the system can refuse it. Every change that does get in carries a record of why it was made.
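The update path described above can be sketched in a few lines. This is a simplified illustration, not our implementation: `Proposal`, `GatedStore`, and the single consistency rule are hypothetical names chosen for the example, and a real gate would also consult outside evidence. The sketch shows a proposed change being refused when it contradicts a related value, and an accepted change being logged with its reason.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Proposal:
    key: str
    value: float
    reason: str


# A check takes the current values and a proposal and returns (ok, message).
Check = Callable[[Dict[str, float], Proposal], Tuple[bool, str]]


@dataclass
class GatedStore:
    values: Dict[str, float] = field(default_factory=dict)
    checks: List[Check] = field(default_factory=list)
    log: List[Tuple[str, float, str]] = field(default_factory=list)

    def propose(self, proposal: Proposal) -> bool:
        # Admit the change only if every check passes; otherwise refuse it.
        for check in self.checks:
            ok, _message = check(self.values, proposal)
            if not ok:
                return False
        self.values[proposal.key] = proposal.value
        # Every admitted change carries a record of why it was made.
        self.log.append((proposal.key, proposal.value, proposal.reason))
        return True


def discharge_after_admission(
    values: Dict[str, float], proposal: Proposal
) -> Tuple[bool, str]:
    # Hypothetical consistency rule between two related compartments.
    if proposal.key == "discharge_day" and proposal.value < values.get("admission_day", 0):
        return False, "discharge precedes admission"
    return True, "ok"


store = GatedStore(values={"admission_day": 10.0}, checks=[discharge_after_admission])
rejected = store.propose(Proposal("discharge_day", 7.0, "imported from legacy feed"))
accepted = store.propose(Proposal("discharge_day", 12.0, "confirmed by ward record"))
```

The first proposal fails the check and leaves the store unchanged. The second is admitted, and its reason is the only entry in `store.log`.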

The harder problem is fluency. Most of what a competent language user knows is not a tidy fact. We prefer "strong tea" to "powerful tea," yet we prefer "powerful engine" to "strong engine," and no rule of grammar explains the difference. This graded, context-dependent knowledge is the larger share of competence, and it resists being written down. Our approach keeps it explicit regardless: each preference is stored with its strength in a named, editable slot, so the magnitude an ordinary model would bury becomes something an auditor can read and a maintainer can change.
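As a rough picture of what a named, editable slot could look like, the sketch below stores each preference as a number keyed by the pair of words it describes. The strengths are invented for illustration, and the flat dictionary and the `preferred` helper are assumptions of the example rather than the representation we use.

```python
from typing import Dict, Tuple

# (modifier, noun) -> strength in [0, 1]; the numbers are invented for illustration.
preferences: Dict[Tuple[str, str], float] = {
    ("strong", "tea"): 0.9,
    ("powerful", "tea"): 0.1,
    ("powerful", "engine"): 0.9,
    ("strong", "engine"): 0.3,
}


def preferred(noun: str, candidates: Tuple[str, ...]) -> str:
    """Return the candidate modifier with the highest stored strength for noun."""
    return max(candidates, key=lambda m: preferences.get((m, noun), 0.0))


tea_choice = preferred("tea", ("strong", "powerful"))
engine_choice = preferred("engine", ("strong", "powerful"))

# A maintainer can read one slot and change it without touching the others.
preferences[("strong", "engine")] = 0.95
engine_after_edit = preferred("engine", ("strong", "powerful"))
```

With the invented values, the lookup picks "strong" for tea and "powerful" for engine. Editing the single slot for ("strong", "engine") flips the second choice and leaves the tea entries as they were.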

Why does this matter most in healthcare? Because qualities that are merely convenient elsewhere become requirements here.

A record-unification decision that touches a patient's care has to be inspectable, not a black box.
A fact that turns out to be wrong has to be correctable in minutes, not at the next retraining cycle.
Any clinical or administrative inference should be able to state what it rested on.

Regulated, audit-heavy work is exactly where editable and verifiable knowledge earns its keep, and it tends to involve a high share of crisp, checkable facts. That is why we are starting here rather than with open-ended language.

We owe you the honest part as well. This is a direction, not a finished product, and it has a genuine open problem at its core. Because graded preferences depend on context, and context can reverse them, the system cannot store one rule for each pair of expressions. In the worst case it would need a rule for every context, and the number of contexts in natural language is large enough to rebuild the very weight matrix we set out to avoid, only spelled out at length. The work that turns this proposal into a running system is, in large part, the work of compressing that growth: factoring shared structure and organising contexts into hierarchies so the representation expands slowly. Whether that can be done while keeping the editability that motivated the effort in the first place is the question the whole architecture stands or falls on.
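One way to picture "organising contexts into hierarchies" is inheritance with local overrides: a value is stored only where a context departs from its parent, and every other context falls back up the tree. The sketch below is a toy under that assumption. The context names, the expressions, and the strengths are all invented, and nothing here shows that the approach scales to natural language, which is precisely the open question.

```python
from typing import Dict, Optional, Tuple

# Hypothetical context hierarchy: child -> parent (None marks the root).
parent: Dict[str, Optional[str]] = {
    "general": None,
    "clinical": "general",
    "clinical/cardiology": "clinical",
    "clinical/oncology": "clinical",
}

# Strengths are stored only where a context differs from its ancestors.
# All values are invented for illustration.
overrides: Dict[Tuple[str, str], float] = {
    ("general", "strong tea"): 0.9,
    ("general", "positive result"): 0.8,
    ("clinical", "positive result"): 0.2,  # a context reversing its parent
}


def strength(context: str, expression: str) -> Optional[float]:
    """Walk from the context up to the root and return the first stored value."""
    node: Optional[str] = context
    while node is not None:
        if (node, expression) in overrides:
            return overrides[(node, expression)]
        node = parent[node]
    return None


cardiology_value = strength("clinical/cardiology", "positive result")
tea_value = strength("clinical/oncology", "strong tea")
stored_rules = len(overrides)   # rules actually written down
flat_rules = len(parent) * 2    # one rule per context per expression
```

In this toy, three stored rules answer for four contexts and two expressions, where a flat table would need eight. Each override also remains a single readable entry, which is the editability property the compression has to preserve.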

This is why we are publishing rather than keeping it in-house. No single company is going to crack the compression problem alone, and the field has spent years refining one paradigm while under-investing in the alternatives. The energy cost of frozen weight matrices is turning into an industry-level liability, not a line item on one balance sheet. Healthcare in particular cannot afford models it has no way to audit or correct, and no one organisation holds enough of the data, the methods, or the clinical context to get there by itself.

So we are asking the research community to point some of its attention this way:

Shared benchmarks for how cleanly a model can be edited and how well it holds up once it has been.
Open methods for compressing context-conditioned knowledge. This is the bottleneck, and it belongs to everyone working in the space.
Common interfaces between compartments, so a result produced in one lab can be reused elsewhere instead of rebuilt from scratch.

We will keep publishing what we learn, including the parts that fail. If you work on neuro-symbolic systems, model editing, efficient inference, or clinical knowledge representation, we would value the chance to compare notes. The white paper is the place to start.
