Explainability as Certification Theatre: The Question EASA's Issue 03 Doesn't Answer
EASA's Proposed Issue 03 AI Concept Paper builds a coherent operational explainability framework, but the empirical evidence base for that framework is thin and largely lab-derived. European aerospace AI founders risk engineering to a provisional specification that will not be tested in approved production systems for close to a decade, consuming scarce budget that would be better spent on AI constituent performance and learning assurance.
Photo: RDNE Stock project / Pexels
On 3 June 2026, EASA released Proposed Issue 03 of its Concept Paper on Artificial Intelligence, open for stakeholder comment until 12 August 2026 [2]. It is the final Concept Paper deliverable foreseen under the EASA AI Roadmap 2.0, and at over 230 pages it is the most comprehensive statement of intent the agency has yet produced on AI in aviation [1]. The explainability framework inside it is carefully constructed, and it deserves credit before we examine where founders risk spending scarce resources in ways that will deliver no measurable safety return.
That tension is where this piece spends its time.
What EASA Actually Built
The human-centred design building block introduces guidance to account for the specific human factors needs linked with the introduction of AI. Among other aspects, AI operational explainability deals with the capability to provide end users with understandable, reliable, and relevant information with the appropriate level of detail and with appropriate timing on how an AI-based system produces its results [1].
The framework splits cleanly into two lanes. Operational explainability faces the end user: pilot, controller, maintenance technician. Development and post-operational explainability lives inside the AI assurance building block, oriented at engineers, certification authorities, and safety investigators. The separation is not arbitrary: the document is explicit that operational explainability depends on the user's level of expertise and the task to achieve [1].
The framework also formally acknowledges cognitive load as a design constraint. A peer analysis of the EASA assurance model notes that the information given to the user must be balanced against the user's cognitive load [4]. And EASA's own definition is careful not to over-promise: explainability does not imply full transparency or complete interpretability of AI models; it aims to provide a context-appropriate level of understanding sufficient for design, operational use, approval, and investigation [1].
The conceptual architecture is coherent. The engineers and human factors researchers behind this document have done their homework.
The Strongest Counterpoint, Stated Honestly
Here is the case for EASA's approach at its strongest. Aviation has a long history of adding interface elements that crews initially resist and then cannot fly without. TCAS resolution advisories were considered intrusive cognitive load when first introduced. They are now part of the operational reflex. The argument is that operational explainability will follow the same trajectory: designed correctly, integrated early, tested iteratively, it will reduce the ambiguity that currently surrounds AI outputs and improve crew decision-making at critical moments.
The EU AI Act reinforces this logic directly. Article 14 of Regulation (EU) 2024/1689, published in the Official Journal on 12 July 2024, requires that natural persons assigned to human oversight of high-risk AI systems be enabled, among other things, to correctly interpret the high-risk AI system's output and to remain aware of the possible tendency of automatically relying or over-relying on the output produced by a high-risk AI system [7]. A pilot or controller who cannot understand why an AI produced a recommendation cannot exercise that oversight in any meaningful sense. On that reading, explainability is not decoration; it is the mechanism by which human authority over the system remains real rather than nominal.
That argument is structurally correct. It is not yet empirically validated for the operational aerospace context.
The Problem: The Evidence Base Is Thin and Largely Lab-Derived
Despite significant theoretical advancements in the broader XAI domain, empirical evidence addressing the specific impact of visual explanations on human-AI interactions in safety-critical environments like ATC remains limited. [4]
That sentence comes from a peer-reviewed study published in January 2026. It is as current as anything in this field, and it states the problem plainly. Empirical evidence regarding the effectiveness of specific visual explanation methods remains scarce within the aviation sector. Given the highly dynamic and cognitively demanding nature of ATC tasks, understanding how visual explanations influence cognitive workload, user acceptance, and intention to use AI-driven tools is paramount [4], but that understanding does not yet exist in production conditions.
The broader XAI research literature is similarly thin on operational ground truth. A 2025 systematic review, following PRISMA 2020 guidelines and covering 35 peer-reviewed studies across cognitive psychology, human factors engineering, and neuroscience, found that XAI and transparency mechanisms are designed to mitigate automation bias, but that overly technical, cognitively demanding, or even simplistic explanations may inadvertently reinforce misplaced trust, especially among less experienced professionals with low AI literacy [11]. A 2025 experimental study on automation transparency, published in the International Journal of Human-Computer Interaction, found that increased automation transparency can lead to increased bias toward agreeing with automated advice [12]. In other words, a poorly calibrated explainability layer does not merely fail to help; it may actively increase the risk it was designed to reduce.
A 2024 peer-reviewed study found that increased automation transparency can improve the accuracy of automation use but can simultaneously lead to increased bias toward agreeing with automated advice, with both effects present in the same experimental population [54]. Carefully balancing these outcomes is not a means of compliance. It is a research programme that the industry has not yet completed.
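To make the trade-off concrete, here is a minimal Python sketch of how a team might score its own user study so that accuracy and agreement bias are reported separately. It uses standard signal detection measures (sensitivity d' and criterion c), which are one common choice in human factors work and not necessarily the method used in the studies cited above. The `Condition` class, the `summarise` function, and all trial counts are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass(frozen=True)
class Condition:
    """Trial counts for one experimental condition (hypothetical numbers)."""
    agree_when_correct: int   # user followed advice that was correct
    reject_when_correct: int  # user overrode advice that was correct
    agree_when_wrong: int     # user followed advice that was wrong
    reject_when_wrong: int    # user overrode advice that was wrong


def summarise(c: Condition) -> dict[str, float]:
    """Report accuracy of automation use separately from agreement bias."""
    n_correct = c.agree_when_correct + c.reject_when_correct
    n_wrong = c.agree_when_wrong + c.reject_when_wrong
    hit_rate = c.agree_when_correct / n_correct
    false_alarm_rate = c.agree_when_wrong / n_wrong
    # inv_cdf raises StatisticsError for rates of exactly 0 or 1, so a real
    # study would need a correction for extreme rates before this step.
    z = NormalDist().inv_cdf
    return {
        # Share of trials where the user's final decision was right.
        "accuracy": (c.agree_when_correct + c.reject_when_wrong)
        / (n_correct + n_wrong),
        # Share of wrong advice the user followed anyway.
        "commission_error_rate": false_alarm_rate,
        # How well the user discriminates correct advice from wrong advice.
        "sensitivity_d_prime": z(hit_rate) - z(false_alarm_rate),
        # Negative values indicate a bias toward agreeing with the advice.
        "criterion_c": -0.5 * (z(hit_rate) + z(false_alarm_rate)),
    }


# Illustrative counts only: 100 trials per condition, advice correct in 80.
low_transparency = summarise(Condition(56, 24, 6, 14))
high_transparency = summarise(Condition(72, 8, 7, 13))

for name, result in [("low", low_transparency), ("high", high_transparency)]:
    print(name, {key: round(value, 3) for key, value in result.items()})
```

With these made-up counts, the high-transparency condition shows higher accuracy and also a more negative criterion and a higher commission error rate. A single headline accuracy figure would hide the second effect, which is why the two are worth reporting separately.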
The automation bias literature reinforces this from a different angle. Parasuraman and Manzey's foundational 2010 review in Human Factors, covering empirical studies of complacency and bias in human interaction with automated and decision support systems, found that automation complacency occurs under conditions of multiple-task load, when manual tasks compete with the automated task for the operator's attention, and that automation complacency is found in both naive and expert participants and cannot be overcome with simple practice [13]. If your operational explainability layer is not grounded in validated human factors work, it may instil misplaced confidence in users rather than genuine calibrated trust. You have then built something that looks good at a certification review and actively degrades safety in practice. That is the compliance theatre scenario this piece is about.
The Timeline Problem: Engineering to a Specification That Has Not Been Tested
There is a further structural problem for founders. EASA's own document describes its Level 2 AI deployment trajectory, covering progressively more automated solutions to assist in extended minimal crew operations and single-pilot operations in large commercial air transport, and in ATM for conflict detection and resolution, as currently foreseen to happen around 2035, qualifying immediately that this is not a formal target but rather a prediction based on current prognostics [7]. That qualification matters: by EASA's own account, the date is a prediction rather than a commitment.
The first expected approvals at Level 2 and beyond, and the validated means of compliance that accompany them, sit close to a decade away. Founders building right now are being asked to engineer operational explainability infrastructure against objectives that will not be tested in approved production systems for close to that long.
RMT.0742 will facilitate the integration of the anticipated guidance from the AI Concept Paper into a comprehensive framework of generic rules and acceptable means of compliance [8]. Until those AMC exist, the Issue 03 document itself states that it provides a set of actionable objectives but does not constitute at this stage definitive or detailed guidance [1]. A second NPA in 2026 will deploy this generic framework to the regulations of the relevant aviation domains [9]. The requirements will sharpen further, in directions that are not yet fully determined.
An early-stage venture that interprets EASA's current operational explainability objectives as a finished specification and engineers to them at depth is making a bet: that the eventual means of compliance will align with what the team built today. That is not an obviously bad bet, but it is made without knowing the odds, and it consumes engineering weeks that could go toward AI constituent performance, ODD characterisation, or the learning assurance processes where the certification path is considerably clearer.
Why the Automation Bias Risk Operates in Both Directions
Automation bias, the tendency for humans to favour suggestions from automated decision-making systems and to ignore contradictory information even when it is correct, affects both operators in the field and engineering teams during development [35]. The International AI Safety Report 2026, a structured expert review led by Yoshua Bengio with over 90 co-authors and commissioned by the nations attending the AI Safety Summit at Bletchley, synthesises the current scientific evidence on AI risks and consistently emphasises the gap between designed safety properties and observed behaviour in deployment conditions [15]. Automation bias, first studied in the aviation domain, has since been observed across a wide range of decision contexts [41], and the mechanism is consistent: high system reliability leads to disengagement from active monitoring, increasing the probability that a user will act on incorrect automated output without independent verification.
This risk operates in both directions. It applies to operators using your AI system, and it applies to your own team when evaluating whether an explainability layer is actually working or simply feels persuasive during a certification review. An explainability layer that satisfies an evaluator but has never been validated with real end users in operational conditions is a compliance artefact.
The Honest Assessment
Development-side explainability directly aids model debugging, bias identification, and the safety investigation of incidents involving AI outputs. EASA's careful bifurcation between development and operational explainability is conceptually the correct architecture, and the document is right to make it a first-class building block.
Here is where the line falls: building heavy operational explainability infrastructure now, against a specification that remains provisional and lab evidence that does not yet transfer to production aerospace environments, misallocates the scarce engineering budget of an early-stage company. The framework will evolve. The means of compliance will crystallise via RMT.0742. Until those deliverables are available, the Issue 03 guidelines can be used as a reference for preparing the approval basis of applications introducing AI technology, not as a certification target to engineer against immediately [1].
The consultation period closes on 12 August 2026. That is the moment to shape the outcome, not to over-engineer against a draft.
For Founders: What to Do This Quarter
If you are building a safety-related AI application in Europe, here is what the Issue 03 explainability situation means for your engineering priorities right now.
Do the development-side explainability work thoroughly. The development and post-ops explainability requirements embedded in learning assurance are more stable, directly serve your own debugging and model improvement cycles, and will be mandatory regardless of how the operational layer evolves. This is applied engineering value, not checkbox work.
Treat operational explainability as a human factors design question, not a software feature. If you are not doing structured user research with actual pilots, controllers, or maintenance technicians, you are guessing. Spend the next sprint on a structured task analysis with real end users before writing a single line of XAI interface code. The human-centred design objectives in the Issue 03 document are your starting reference for what a rigorous process looks like [1]. Then validate against your actual end users, not against the document.
Submit comments to the EASA consultation before 12 August 2026. This is the highest-leverage action available to a European aerospace AI founder right now. The operational explainability objectives in Issue 03 are explicitly provisional [1]. If the lab-derived evidence base is inadequate for setting binding production requirements, say so, precisely, with engineering rigour. That is how the means of compliance get calibrated to operational reality rather than regulatory assumption.
Do not confuse depth of documentation with depth of safety value. Explainability documentation that satisfies a certification reviewer but has never been validated with real end users in operational conditions is a compliance artefact. Build for the human, then document for the regulator, not the other way around.
The regulator is doing serious work here. The honest response from founders is not to rubber-stamp it but to engage with it at the level of engineering seriousness it deserves, and to resist the temptation to over-engineer a moving target at the expense of the core AI system quality that will determine whether your application ever reaches certification in the first place.
Sources
[1] easa.europa.eu
[2] easa.europa.eu
[3] easa.europa.eu
[5] arxiv.org
[6] arxiv.org
[9] easa.europa.eu
[10] icao.int
[11] link.springer.com
[12] tandfonline.com
[13] pmc.ncbi.nlm.nih.gov
[14] researchgate.net
[15] arxiv.org
[16] en.wikipedia.org
[17] arxiv.org
[18] ncbi.nlm.nih.gov