The DAL D Ceiling on LLMs Is Not a Bug. It Is the Right Architecture for a Technology We Cannot Yet Audit.
EASA's Proposed Issue 03 of its AI Concept Paper caps general-purpose LLMs at DAL D, the lowest safety-relevant assurance tier, for safety-critical aviation applications. The cap is commercially uncomfortable but technically honest, and founders who engage with the consultation before 12 August 2026 can still shape how it hardens into binding rules.
EASA released Proposed Issue 03 of its Concept Paper on Artificial Intelligence on 3 June 2026, opening a ten-week consultation period [1]. The most commercially uncomfortable sentence in the document is this: large off-the-shelf models, explicitly including general-purpose LLMs, are capped at DAL D for initial airworthiness and AL 5 for ATM/ANS applications [2]. Those are the lowest safety-relevant tiers on their respective scales. For founders betting on foundation-model architectures for safety-critical cockpit or ATC applications, the table that draws that line is the most consequential boundary in the document. The cap is correct. This article explains why, engages honestly with the strongest objection, and sets out what to do before the deadline.
What the Document Actually Says
This is the final Concept Paper deliverable foreseen under the EASA Artificial Intelligence Roadmap 2.0 [2]. The document uses development assurance level (DAL) for initial and continuing airworthiness and air operations, and software assurance level (SWAL, AL) for air traffic management. Table 2 in the document highlights the limitations currently observed based on the available means of compliance and the state of the art of technology [3].
A note on scale direction: each assurance scale runs from the most demanding level at the top to the least demanding at the bottom. IDAL A and B, along with SWAL 1 or 2 and AL 1 or 2, are the most demanding levels and are currently excluded, while DAL D / SWAL 4 / AL 5 represent the floor, the only tier currently accepted for large off-the-shelf models such as general-purpose LLMs [3]. AL 5 is the least stringent assurance level that still applies to software whose failure has a safety effect; AL 1 is the most stringent. For founders: your LLM cannot be used in any application where the consequence of failure rises above that floor.
The engineering reason for the ceiling is stated precisely. AI assurance differs from traditional development assurance because the system's intended function is driven by data, scenarios, or knowledge, making full requirements-design-implementation traceability infeasible [3]. The document draws a clear distinction between AI/ML models of reasonable size, for which a design process is fully managed, and large off-the-shelf models like general-purpose LLMs, for which the number of parameters and the nature of the training data render the control of the design process intractable [3].
The word intractable is doing significant work here. EASA is not saying LLMs perform poorly. It is saying they cannot be subjected to the design-process control that higher assurance levels require.
Understanding and mitigating emergent behaviours is identified as a new challenge [3]. Emergent behaviour is not a defect you can trace to a line of code or a mislabelled datum. It arises from interactions among billions of parameters trained on corpora too large to trace item by item. The W-shape process adapts the classical V-shape development assurance process to the realities of AI development, but that adaptation is premised on gaining controlled confidence in the data, scenarios, and knowledge that drive the system's intended behaviour [3]. With a general-purpose LLM, that confidence path is broken before you even start.
The Counterargument Deserves a Real Hearing
The industry objection, and it is gaining genuine traction, runs as follows: EASA itself conditionally endorses a strategy of "performance evaluation and safe incorporation" for large off-the-shelf models. If you cannot audit the model's internals, you instead audit its outputs exhaustively, bound its operational domain tightly, and wrap it in architectural safeguards. The objection says that combination, rigorous black-box testing plus safe incorporation plus a tightly defined operational design domain, can produce sufficient safety evidence even at higher criticalities, and that refusing to accept this will simply push LLM-based cockpit and ATC tools to less rigorous jurisdictions.
The regulatory-arbitrage version of this argument is the hardest to dismiss, and a development from 24 June 2026 sharpens it considerably. On that date, OpenAI and Broadcom unveiled Jalapeño, OpenAI's first custom inference processor [6]. OpenAI handles the chip architecture, with Broadcom contributing silicon manufacturing and networking technology [7]. The companies are aiming for initial deployment of Jalapeño chips by the end of 2026, with small prototype runs before scaling [7]. The direct relevance to EASA's cap is this: as LLM inference costs fall, more vendors will have commercially viable LLM products to position into aviation workflows, and every one of them will face the question of which jurisdiction offers the fastest path to deployment. If EASA-zone applicants face a hard wall at DAL D while vendors operating under less mature frameworks can make a performance-evaluation argument and get a signature, capital and product development will route around the EU. Falling inference costs are not a distant concern; they are already compressing the cost of deployment and accelerating the commercial pressure to find a permissive regulatory home.
Why the Cap Is Still the Honest Position
I support applied AI in aerospace. Precisely because of that, I think EASA's cap is correct, for three reasons.
First, performance evaluation without white-box traceability cannot scale to catastrophic-failure-condition territory. A correct and complete definition of the operational design domain is a prerequisite to adequate dataset quality in the AI assurance process; however, an exhaustive exploration of all possible operating conditions is acknowledged to be intractable for high-dimensional systems [3]. You can test ten thousand scenarios and miss the ten-thousand-and-first, which the model reaches through an emergent reasoning path that did not exist in any of your test cases. The operational design domain (ODD) gives you a proxy for traceability, but it cannot close that gap at DAL A or B criticalities, where a single failure mode can contribute to fatalities.
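The intractability argument can be made concrete with back-of-envelope arithmetic. This is an illustrative sketch, not an EASA method: the ODD shape (10 parameters, 10 bins each), the 10,000-scenario test budget, and the 1e-9 per-hour target are assumptions chosen for the example, and the statistical bound assumes independent, identically distributed trials, which real simulation and flight-test campaigns do not strictly satisfy.

```python
import math


# Assumption for illustration: the ODD is described by `num_params`
# independent parameters (e.g. visibility, crosswind, runway state),
# each discretised into `bins_per_param` bins.
def odd_cells(num_params: int, bins_per_param: int) -> int:
    """Number of cells in a fully discretised ODD grid (grows as n**k)."""
    return bins_per_param ** num_params


def grid_coverage(num_tests: int, num_params: int, bins_per_param: int) -> float:
    """Optimistic upper bound on the fraction of ODD cells touched, assuming
    every test scenario lands in a distinct cell."""
    return min(1.0, num_tests / odd_cells(num_params, bins_per_param))


def zero_failure_trials(max_failure_rate: float, confidence: float = 0.95) -> float:
    """Failure-free independent trials (e.g. flight hours) needed to claim,
    at the given confidence, that the per-trial failure probability is below
    `max_failure_rate`. Solves (1 - p)**n <= 1 - confidence for n."""
    return math.log(1.0 - confidence) / math.log1p(-max_failure_rate)


if __name__ == "__main__":
    # Ten thousand scenarios against a 10-parameter, 10-bin ODD.
    print(f"ODD cells: {odd_cells(10, 10):.1e}")
    print(f"Best-case coverage: {grid_coverage(10_000, 10, 10):.1e}")
    # Assumed target: 1e-9 per flight hour, the order of magnitude commonly
    # associated with catastrophic failure conditions in large aeroplanes.
    print(f"Failure-free hours needed: {zero_failure_trials(1e-9):.2e}")
```

Under these assumptions the script prints about 1.0e+10 cells, a best-case coverage of 1.0e-06, and roughly 3.00e+09 failure-free hours. Ten thousand scenarios touch at most a millionth of the grid, and a purely statistical, black-box claim at the 1e-9 level would need billions of hours of clean operation. That is why assurance at the highest levels leans on process and traceability evidence rather than output testing alone.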
Second, the performance-evaluation-plus-safe-incorporation strategy works best when you control what changes. A frozen, purpose-built model of manageable size, say a convolutional network trained on a bounded dataset for a specific perception task, can be wrapped in architectural guards and tested to higher assurance levels, as EASA allows for supervised learning [3]. A general-purpose LLM is not frozen in the relevant sense: its emergent properties are a function of scale and corpus diversity that no applicant controls or can retrospectively audit. "Safe incorporation" assumes a stable artefact. LLMs at the scale that makes them commercially interesting are not that.
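To see what "wrapping in architectural guards" means in practice, here is a minimal sketch of one common safe-incorporation shape: a runtime monitor that trusts the model only when the input is inside the declared ODD and the output is inside a plausibility envelope, and otherwise hands over to a conventional fallback. The class, field names, and the toy crosswind example are hypothetical and are not taken from EASA's paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Bounds = Dict[str, Tuple[float, float]]


@dataclass
class GuardedConstituent:
    """Hypothetical safe-incorporation wrapper around an opaque model."""
    model: Callable[[Dict[str, float]], float]     # the learned component
    fallback: Callable[[Dict[str, float]], float]  # conventional, separately assured function
    odd_bounds: Bounds                             # declared ODD, per input parameter
    output_bounds: Tuple[float, float]             # plausibility envelope for the output

    def in_odd(self, x: Dict[str, float]) -> bool:
        # Every declared ODD parameter must be present and within its range.
        return all(
            name in x and lo <= x[name] <= hi
            for name, (lo, hi) in self.odd_bounds.items()
        )

    def __call__(self, x: Dict[str, float]) -> Tuple[float, str]:
        if not self.in_odd(x):
            return self.fallback(x), "fallback: outside ODD"
        y = self.model(x)
        lo, hi = self.output_bounds
        if not lo <= y <= hi:
            return self.fallback(x), "fallback: implausible output"
        return y, "model"


if __name__ == "__main__":
    guard = GuardedConstituent(
        model=lambda x: 2.0 * x["crosswind_kt"],  # stand-in for the ML model
        fallback=lambda x: 0.0,                   # stand-in conservative function
        odd_bounds={"crosswind_kt": (0.0, 25.0)},
        output_bounds=(0.0, 40.0),
    )
    print(guard({"crosswind_kt": 10.0}))  # (20.0, 'model')
    print(guard({"crosswind_kt": 30.0}))  # (0.0, 'fallback: outside ODD')
    print(guard({"crosswind_kt": 22.0}))  # (0.0, 'fallback: implausible output')
```

The guard catches out-of-domain inputs and out-of-envelope outputs, but it cannot catch an output that is wrong yet plausible. Closing that residual gap depends on the wrapped artefact being stable and well characterised inside its envelope, which is precisely the property the paragraph above says a general-purpose LLM lacks.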
Third, the regulatory-arbitrage risk is real but the response is not to lower the bar. The right answer to regulatory arbitrage in safety-critical aviation is interoperability agreements between jurisdictions, not a race to the bottom on assurance. EASA is moving faster and more comprehensively than any other aviation regulator on AI, and because the EU AI Act is already law, EASA's framework is positioned to become a global reference for AI assurance in aviation [4]. The correct move for founders is to engage the consultation, shape the framework, and build toward it.
The document also introduces a net safety benefit concept that is underused by applicants. Several AI applications received by EASA come with a potential for net safety benefit; historically, safety assessments have focused on risks associated with malfunctioning systems or equipment, and generally no credit has been provided for the operational safety benefits that the installation of such systems can deliver [3]. The document introduces credits for systems that can demonstrate those operational safety benefits, which may support a reduction of the required assurance level for qualifying applications. Founders with a credible safety-benefit case should develop and document it now, because it is one of the few available levers within the current ceiling.
Where the Genuinely Good News Lives
Building on Issue 02, which explored Level 1 AI and Level 2 AI, Proposed Issue 03 completes the technical scope foreseen in the EASA AI Roadmap 2.0 [2]. It further broadens the framework by addressing additional AI techniques including reinforcement learning and symbolic AI, and explores Level 3 AI applications corresponding to advanced automation, opening the way to novel types of operations in which the human end user may be either remotely present or not present during the operation [2]. That matters enormously for founders because it means the regulatory surface area for applied, non-LLM AI has expanded substantially.
For many of the genuinely hard aerospace problems (predictive maintenance, anomaly detection, trajectory optimisation, image-based defect detection), purpose-built supervised or logic-and-knowledge-based models are more appropriate than a foundation model anyway. The same inference-cost trends that make LLMs commercially attractive are also lowering the cost of running purpose-built models, so the economic argument for reaching for a general-purpose foundation model instead of training something fit for purpose is weaker than it looks.
A word on where NPA 2025-07 fits in this picture. NPA 2025-07, published 10 November 2025, proposes a new set of detailed specifications on AI trustworthiness for the safe use of AI in aviation in response to the EU AI Act, Chapter III, Section 2 [5]. The original comment deadline was 10 February 2026, extended to 10 March 2026 [5]. That publication is the first step of Rulemaking Task 0742, to be followed by a second NPA in 2026 to deploy this generic framework to the regulations of the relevant aviation domains [5]. That second NPA will be the instrument that translates Issue 03's guidance into binding certification requirements across aircraft, ATM, and drones. Founders should track it closely, because it will determine exactly where and how the DAL D ceiling becomes a hard legal constraint rather than guidance.
For Founders
If you are building an early-stage aerospace or defence venture in Europe and your architecture depends on a general-purpose LLM in any safety-critical loop, you now have a regulatory fact to plan around: DAL D is the ceiling, and that ceiling excludes you from the applications that attract the highest contract value and the most defensible competitive moats. That leaves two paths forward.
The first is deliberate architectural separation: put the LLM in ground-based, non-safety-critical workflows (documentation generation, maintenance query answering, training support), where DAL D is entirely workable, and build the safety-critical constituent with a purpose-built supervised or logic-and-knowledge-based model that can reach higher assurance levels. This is not a compromise. It is an architecture that will survive scrutiny and investor due diligence.
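The separation rule is simple enough to encode as a design-time check over an architecture's constituent list, for example as a CI step run against the system's assurance allocation. This is a minimal sketch under stated assumptions: the constituent names, the technique labels, and the allocations are hypothetical, the ceiling constant simply restates the DAL D cap for general-purpose LLMs, and the ordering treats DAL A as most stringent, mirroring the scale direction described earlier.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class DAL(IntEnum):
    """Development assurance levels, ordered so a larger value is more
    stringent (A is the most demanding, D the lowest safety-relevant level)."""
    D = 1
    C = 2
    B = 3
    A = 4


# Assumption: the ceiling Proposed Issue 03 places on large off-the-shelf
# models such as general-purpose LLMs.
LLM_CEILING = DAL.D


@dataclass
class Constituent:
    name: str
    technique: str      # hypothetical labels, e.g. "general_purpose_llm", "supervised"
    required_dal: DAL   # level allocated by the system safety assessment


def violations(constituents: List[Constituent]) -> List[str]:
    """Names of LLM-based constituents allocated above the LLM ceiling."""
    return [
        c.name
        for c in constituents
        if c.technique == "general_purpose_llm" and c.required_dal > LLM_CEILING
    ]


if __name__ == "__main__":
    # Hypothetical allocation for illustration only.
    architecture = [
        Constituent("maintenance query assistant", "general_purpose_llm", DAL.D),
        Constituent("runway-incursion detector", "supervised", DAL.C),
        Constituent("cockpit advisory summariser", "general_purpose_llm", DAL.C),
    ]
    print(violations(architecture))  # ['cockpit advisory summariser']
```

Failing the build on a non-empty result keeps the LLM on the ground-side of the boundary by construction, rather than relying on reviewers to spot a drifting allocation late in the programme.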
The second is to engage EASA directly. Stakeholders are invited to provide their comments and to send their feedback to ai@easa.europa.eu no later than 12 August 2026 [1]. If you have a credible engineering argument for why performance evaluation plus safe incorporation can work at higher criticality for a specific, tightly bounded application, make that argument in writing before the deadline. That is how regulatory frameworks get refined.
Do not build a business plan around hoping this cap disappears. Build one that works within it, and you will be structurally ahead of every competitor who waited to find out.
Sources
[1] easa.europa.eu
[2] easa.europa.eu
[3] easa.europa.eu
[5] easa.europa.eu
[6] openai.com
[7] cnbc.com