Failures and Anomalies Reporting
Specification
Define, implement and evaluate processes, procedures and technical measures for the reporting of anomalies and failures of the monitoring system and provide immediate notification to the accountable party.
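One way to read "failures of the monitoring system" is that the monitoring components themselves must be watched, since a silent log collector produces no alerts of its own. The sketch below illustrates a simple heartbeat watchdog under that reading. It is not part of the control text: the class name HeartbeatWatchdog, the max_silence_s tolerance, the notify hook, and the owner mapping are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HeartbeatWatchdog:
    """Flags monitoring components whose heartbeat is overdue (hypothetical helper)."""

    max_silence_s: float                 # assumed tolerance before a component counts as failed
    notify: Callable[[str, str], None]   # assumed hook: (accountable_party, message)
    owners: Dict[str, str]               # component name -> accountable party
    last_seen: Dict[str, float] = field(default_factory=dict)

    def beat(self, component: str, now: float) -> None:
        """Record that a monitoring component reported in at time `now`."""
        self.last_seen[component] = now

    def check(self, now: float) -> List[str]:
        """Notify the owner of every component that is silent or never reported."""
        failed = []
        for component, owner in self.owners.items():
            seen = self.last_seen.get(component)
            if seen is None or now - seen > self.max_silence_s:
                failed.append(component)
                self.notify(owner, f"monitoring component '{component}' is silent")
        return failed
```

In practice, check would be called on a schedule by a process that runs independently of the monitoring pipeline it watches, so that a single outage cannot silence both.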
Threat coverage
Architectural relevance
Lifecycle
Data collection, Data curation, Data storage, Resource provisioning, Team and expertise
Design, Training, Guardrails, Supply Chain
Evaluation, Validation/Red Teaming, Re-evaluation
AI Services supply chain, AI applications
Operations, Maintenance, Continuous monitoring, Continuous improvement
Archiving, Data deletion, Model disposal
Ownership / SSRM
PI
Shared Cloud Service Provider-Model Provider (Shared CSP-MP)
The CSP and MP are jointly responsible and accountable for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer.
Model
Owned by the Model Provider (MP)
The model provider (MP) designs, develops, and implements the control as part of their services or products to mitigate security, privacy, or compliance risks associated with the Large Language Model (LLM). Model Providers are entities that develop, train, and distribute foundational and fine-tuned AI models for various applications. They create the underlying AI capabilities that other actors build upon. Model Providers are responsible for model architecture, training methodologies, performance characteristics, and documentation of capabilities and limitations. They operate at the foundation layer of the AI stack and may provide direct API access to their models. Examples: OpenAI (GPT, DALL-E, Whisper), Anthropic (Claude), Google (Gemini), Meta (Llama), as well as any customized model.
Orchestrated
Shared Model Provider-Orchestrated Service Provider (Shared MP-OSP)
The MP and OSP are jointly responsible and accountable for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer.
Application
Shared Orchestrated Service Provider-Application Provider (Shared OSP-AP)
The OSP and AP are jointly responsible and accountable for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer.
Implementation guidelines
Auditing guidelines
1. Inquiring with Control Owners
1.1 Conduct interviews with personnel responsible for defining, implementing, and evaluating processes for reporting cloud infrastructure monitoring system anomalies and failures to understand their procedures for detecting, classifying, and immediately notifying accountable parties of infrastructure issues affecting customer workloads and data protection. Verify their understanding of technical measures for cloud infrastructure anomaly detection, notification workflows for different infrastructure failure types, and evaluation processes that ensure cloud monitoring system reliability and timely escalation to responsible stakeholders including cloud operations teams and customer security contacts.
2. Inspecting Records and Documents
2.1 Verify cloud infrastructure systems are configured to detect logging anomalies such as dropped compute events, storage access failures, or customer workload data format corruption affecting service availability and customer isolation.
2.2 Check processes are in place for classifying cloud infrastructure failure severity and identifying responsible owners including cloud operations teams, customer success managers, and security incident response staff.
2.3 Validate cloud infrastructure failures trigger alert workflows in ticketing or incident response platforms with appropriate escalation to cloud operations and customer security teams.
2.4 Ensure fallback mechanisms exist when primary cloud infrastructure logging systems fail, including backup resource monitoring and customer workload tracking capabilities.
2.5 Confirm logs of cloud infrastructure failure events are themselves collected and analyzed to understand impact on customer services and infrastructure reliability.
2.6 Check that post-incident reviews incorporate root cause analysis for cloud infrastructure failures with focus on customer impact and service level agreement compliance.
2.7 Verify metrics are defined to track detection and resolution of cloud infrastructure anomalies including customer service impact and infrastructure availability measures.
2.8 Examine immediate notification procedures for cloud infrastructure monitoring failures to ensure accountable parties including cloud operations teams and customer security contacts receive timely alerts.
2.9 Review evaluation processes for assessing the effectiveness of cloud infrastructure anomaly reporting and failure notification procedures in maintaining customer trust and service reliability.
2.10 Validate that technical measures include automated escalation mechanisms for cloud infrastructure monitoring failures when initial notifications are not acknowledged by responsible cloud operations teams.
2.11 Confirm logging infrastructure includes built-in anomaly detection for write failures, latency, or integrity.
2.12 Verify system health metrics feed into anomaly classification engines.
2.13 Check for anomalies in cross-tenant logs, such as unauthorized metadata modifications.
2.14 Validate regulatory reporting mechanisms for significant failure events.
2.15 Ensure anomaly dashboards support both internal use and tenant visibility.
2.16 Confirm documented workflows route failures to engineering and trust teams for triage.
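As an illustration of what an auditor might look for under items 2.2, 2.10, and 2.11, the sketch below classifies a logging-pipeline anomaly by severity and escalates a notification along a chain of accountable parties when it is not acknowledged in time. It is a minimal example under stated assumptions, not a prescribed implementation: the classify thresholds, the Escalator class, the ack_timeout_s parameter, and the party names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Alert:
    failure: str
    severity: str
    chain: List[str]        # accountable parties in escalation order (assumed)
    sent_at: float          # time of the most recent notification
    level: int = 0          # index into `chain` of the party last notified
    acknowledged: bool = False


def classify(write_failures: int, latency_ms: float) -> Optional[str]:
    """Toy severity rule; both thresholds are illustrative assumptions."""
    if write_failures > 0:
        return "critical"
    if latency_ms > 5000:
        return "warning"
    return None  # nothing to report


class Escalator:
    """Re-notifies the next party in the chain when an alert stays unacknowledged."""

    def __init__(self, ack_timeout_s: float, notify: Callable[[str, Alert], None]):
        self.ack_timeout_s = ack_timeout_s
        self.notify = notify          # assumed hook into a paging or ticketing system
        self.alerts: List[Alert] = []

    def open(self, failure: str, severity: str, chain: List[str], now: float) -> Alert:
        """Create an alert and immediately notify the first accountable party."""
        alert = Alert(failure, severity, chain, sent_at=now)
        self.alerts.append(alert)
        self.notify(chain[0], alert)
        return alert

    def tick(self, now: float) -> None:
        """Escalate every unacknowledged alert whose timeout has elapsed."""
        for alert in self.alerts:
            if alert.acknowledged:
                continue
            overdue = now - alert.sent_at >= self.ack_timeout_s
            if overdue and alert.level + 1 < len(alert.chain):
                alert.level += 1
                alert.sent_at = now
                self.notify(alert.chain[alert.level], alert)
```

A caller would open an alert whenever classify returns a severity, mark it acknowledged when a responder confirms receipt, and invoke tick periodically. Evidence of this kind of behavior (timeouts, escalation order, acknowledgement records) is what the inspection steps above ask the auditor to verify.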
Standards mappings
ISO 42001 B.8.4, ISO 42001 B.8.5, ISO 27001 9.1, ISO 27001 9.2, ISO 27001 A.5.27
Addendum
N/A
No Mapping
Addendum
No mention is made in the EU AI Act.
GV-4.3-002
Addendum
N/A
C4 SR-06, C5 OPS-17
Addendum
N/A
AI-CAIQ questions (2)
Are processes and technical measures for reporting monitoring system anomalies and failures defined, implemented, and evaluated?
Are accountable parties immediately notified about anomalies and failures?