Exception Management
Specification
Implement a procedure for the management of exceptions, including emergencies, in the change and configuration process. Align the procedure with the requirements of GRC-04: Policy Exception Process.
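As an illustration only, the exception procedure can be sketched in Python as a small state machine. It encodes two rules that the auditing guidelines for this control rely on. First, standard exceptions are approved before implementation, while emergencies may be implemented first and approved retroactively. Second, every exception is time-limited. The states, field names, and the 72-hour retroactive-approval window are hypothetical assumptions. They are not values defined by this control or by GRC-04.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class State(Enum):
    REQUESTED = "requested"
    APPROVED = "approved"
    IMPLEMENTED = "implemented"
    CLOSED = "closed"


# Hypothetical policy value: how long an emergency change may stay in place
# before its retroactive approval must be recorded.
RETROACTIVE_APPROVAL_WINDOW = timedelta(hours=72)


@dataclass
class ExceptionRequest:
    """A hypothetical change/configuration exception record."""
    exception_id: str
    justification: str
    emergency: bool
    expires_at: datetime  # every exception is temporary
    state: State = State.REQUESTED
    approved_by: Optional[str] = None
    implemented_at: Optional[datetime] = None
    history: list = field(default_factory=list)

    def approve(self, approver: str, now: datetime) -> None:
        if self.state == State.CLOSED:
            raise ValueError("cannot approve a closed exception")
        self.approved_by = approver
        if self.state == State.REQUESTED:
            self.state = State.APPROVED
        # An already-implemented emergency keeps its state; this is the retroactive approval.
        self.history.append((now, "approved", approver))

    def implement(self, now: datetime) -> None:
        # Standard exceptions need prior approval; emergencies may proceed first.
        if self.state == State.REQUESTED and not self.emergency:
            raise ValueError("non-emergency exception requires prior approval")
        if self.state not in (State.REQUESTED, State.APPROVED):
            raise ValueError(f"cannot implement from state {self.state}")
        self.state = State.IMPLEMENTED
        self.implemented_at = now
        self.history.append((now, "implemented", None))

    def close(self, now: datetime) -> None:
        if self.state != State.IMPLEMENTED:
            raise ValueError("only implemented exceptions can be closed")
        if self.approved_by is None:
            raise ValueError("approval (retroactive for emergencies) is required before closure")
        self.state = State.CLOSED
        self.history.append((now, "closed", None))

    def findings(self, now: datetime) -> list[str]:
        """Return policy findings for this exception as of `now`."""
        issues = []
        if self.state != State.CLOSED and now > self.expires_at:
            issues.append("expired but not closed")
        if (self.emergency and self.approved_by is None
                and self.implemented_at is not None
                and now - self.implemented_at > RETROACTIVE_APPROVAL_WINDOW):
            issues.append("retroactive approval overdue")
        return issues
```

Consider an emergency firmware change implemented at time `t0` that still has no recorded approval 80 hours later. Calling `findings` reports `retroactive approval overdue`. Recording the approval clears that finding and allows the exception to be closed.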
Threat coverage
Architectural relevance
Lifecycle
Data collection, Data curation, Data storage, Resource provisioning, Team and expertise
Design, Training, Guardrails
Evaluation, Validation/Red Teaming, Re-evaluation
Orchestration, AI Services supply chain, AI applications
Operations, Maintenance, Continuous monitoring, Continuous improvement
Archiving, Data deletion, Model disposal
Ownership / SSRM
PI
Shared across the supply chain
Shared control ownership refers to responsibilities and activities related to LLM security that are distributed across multiple stakeholders within the AI supply chain, including the Cloud Service Provider (CSP), Model Provider (MP), Orchestrated Service Provider (OSP), Application Provider (AP), and Customer (AIC). These controls require coordinated actions, communication, and governance across all involved parties to ensure their effectiveness.
Model
Owned by the Model Provider (MP)
The model provider (MP) designs, develops, and implements the control as part of their services or products to mitigate security, privacy, or compliance risks associated with the Large Language Model (LLM). Model Providers are entities that develop, train, and distribute foundational and fine-tuned AI models for various applications. They create the underlying AI capabilities that other actors build upon. Model Providers are responsible for model architecture, training methodologies, performance characteristics, and documentation of capabilities and limitations. They operate at the foundation layer of the AI stack and may provide direct API access to their models. Examples: OpenAI (GPT, DALL-E, Whisper), Anthropic (Claude), Google (Gemini), Meta (Llama), as well as the provider of any customized model.
Orchestrated
Shared Model Provider-Orchestrated Service Provider (Shared MP-OSP)
The MP and OSP are jointly responsible and accountable for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer.
Application
Shared Orchestrated Service Provider-Application Provider (Shared OSP-AP)
The OSP and AP are jointly responsible and accountable for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer.
Implementation guidelines
Auditing guidelines
1. Inquiry with Control Owners
1.1 Understand Infrastructure Exception Handling Practices: Interview the infrastructure operations leaders, hardware engineers, and data center managers responsible for exception handling. Review documented exception policies covering emergency hardware/firmware updates and infrastructure maintenance; expedited capacity expansion and resource reallocation (e.g., GPU/TPU); disruption response (e.g., network, storage, power, environmental); and post-incident review and documentation requirements. Verify that exception criteria are clearly defined for emergencies requiring immediate changes (e.g., failures, vulnerabilities), expedited patches or reallocations due to performance constraints, and the authorization levels required based on severity and customer impact.
1.2 Review Exception Process Documentation: Examine procedures and artifacts that detail exception request templates and approval workflows, risk assessment steps for infrastructure-related exceptions, temporary approval and escalation pathways, required documentation and post-change validation, and exception tracking and status monitoring.
1.3 Assess Emergency Response Protocols: Evaluate documented procedures for handling critical infrastructure events: hardware failures and firmware vulnerabilities; storage/data integrity issues or cache corruption; network fabric disruptions or latency spikes; power, cooling, or physical plant failures; and emergency resource quota adjustments.
1.4 Evaluate Governance and Oversight Structures: Confirm the existence of designated approval authorities and escalation paths, on-call emergency response teams for each infrastructure domain, exception review boards and governance charters, executive oversight and alignment with GRC-04, and integration with enterprise risk and incident management.
2. Define and Verify Population of Exception Records
2.1 Complete Exception Inventory: Obtain a full inventory of exception records, including emergency hardware/firmware updates, capacity expansion approvals, resource reallocations (e.g., accelerator pooling), and unplanned maintenance and retroactive exceptions.
2.2 Cross-Verify for Completeness: Confirm that the population is complete and accurate by cross-referencing it against monitoring alerts and change tickets, incident and escalation records, service status reports and customer impact notifications, post-incident reviews, and risk registers.
3. Exception Sample Selection and Testing
3.1 Select Representative Exceptions: Choose samples that vary by type (e.g., hardware update, network fix, quota increase), affected infrastructure (compute, storage, network), customer impact (high, medium, low), approval level and timeframe, and justification category (performance, failure, security).
3.2 Evaluate the Lifecycle of Each Exception: Review each sampled exception in the following categories.
Justification: a clear rationale and urgency are documented and supported by evidence from monitoring or capacity thresholds; risks were assessed and alternatives considered; and the exception fits within the defined exception criteria.
Approval: the exception was approved by the appropriate authority (or retroactively, for emergencies), and any conditions or time limits were documented and followed.
Implementation: the change is verified through logs or infrastructure management tools and confined to the approved scope and components; monitoring and mitigation were applied for the duration of the exception; and stakeholder communication (e.g., customer alerts) is documented.
Closure and Follow-up: the exception was closed and, where applicable, rolled back in a timely manner; validation tests were conducted; lessons learned were captured and documented; and the affected components were reintegrated into standard processes.
4. Exception Tracking, Governance, and Continuous Improvement
4.1 Assess Tracking and Oversight: Verify centralized tracking of infrastructure exceptions, expiration tracking for temporary approvals, governance reporting and executive visibility, trend analysis and identification of recurring issues, and integration with customer impact and risk reporting.
4.2 Evaluate Improvement Mechanisms: Assess the maturity of the CSP’s improvement processes, including regular reviews of exception patterns, incident-driven process refinements, reductions in emergency change frequency, improved calibration of emergency response, updates to exception criteria as operations evolve, and infrastructure architecture adaptations that minimize exceptions.
From CCM:
1. Verify that the organization establishes and documents mandatory configuration settings for information technology products employed within the information system, as determined by the adoption of the latest suitable security configuration baselines.
2. Confirm that the process identifies, documents, and approves exceptions from the established mandatory configuration settings for individual components based on explicit operational requirements.
3. Determine that the organization monitors and controls changes to the configuration settings in accordance with organizational policy and procedures.
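Several of the population and testing steps above reduce to simple data comparisons. The following Python sketch is hedged and tool-agnostic. It assumes hypothetical record layouts exported from a change-management system and a configuration inventory: dictionaries with keys such as `ticket_id`, `is_exception`, `type`, `customer_impact`, `component`, `setting`, `approved`, and `expires`. It illustrates three checks:

* the completeness cross-check in 2.2;
* a stratified sample for 3.1;
* the CCM check that each deviation from the baseline is covered by an approved, unexpired exception.

```python
import random
from collections import defaultdict
from datetime import date


def missing_from_inventory(change_tickets, exception_inventory):
    """Step 2.2: exception/emergency change tickets that have no matching exception record."""
    recorded = {rec["ticket_id"] for rec in exception_inventory}
    return sorted(t["ticket_id"] for t in change_tickets
                  if t["is_exception"] and t["ticket_id"] not in recorded)


def stratified_sample(exceptions, keys=("type", "customer_impact"), per_stratum=1, seed=0):
    """Step 3.1: pick up to `per_stratum` exceptions from each combination of `keys`."""
    strata = defaultdict(list)
    for rec in exceptions:
        strata[tuple(rec[k] for k in keys)].append(rec)
    rng = random.Random(seed)  # fixed seed so the selection can be reproduced
    sample = []
    for stratum in sorted(strata):
        group = strata[stratum]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample


def unapproved_deviations(baseline, observed, exceptions, today: date):
    """CCM step 2: settings that differ from the baseline without an approved, unexpired exception.

    baseline: {setting: expected_value}
    observed: {component: {setting: actual_value}}
    """
    covered = {(e["component"], e["setting"]) for e in exceptions
               if e["approved"] and e["expires"] >= today}
    findings = []
    for component, settings in observed.items():
        for setting, value in settings.items():
            expected = baseline.get(setting)
            if expected is not None and value != expected and (component, setting) not in covered:
                findings.append((component, setting, value, expected))
    return findings
```

The example stratifies on only two keys to stay short. The other dimensions listed in 3.1 (affected infrastructure, approval level, justification category) can be added to `keys`. Sample selection remains a matter of auditor judgment rather than a purely mechanical step.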
Standards mappings
42001: Clause 6.3 Planning of changes
42001: Clause 8.1 Operational planning and control
42001: Clause 10.2 Nonconformity and corrective action
Addendum
N/A
No Mapping
Addendum
The EU AI Act does not cover the CCC-08 topic, "Implement a procedure for the management of exceptions, including emergencies, in the change and configuration process. Align the procedure with the requirements of GRC-04: Policy Exception Process," for any of the AI structures defined within the EU AI Act.
GV-1.3-007 GV-6.2-003 GV-2.3-001 MG-2.4-002 MG-4.3-001 GV-1.5-002 GV-2.1-002 GV-6.2-006
Addendum
NIST AI 600-1 does not cover "emergency change management process requirements."
DEV-03 SIM-01 DEV-08 SP-03
Addendum
N/A
AI-CAIQ questions (1)
Is a procedure implemented (aligning with the requirements of GRC-04: Policy Exception Process) for the management of exceptions, including emergencies, in the change and configuration process?