LOG-13 Cloud & AI Related

Failures and Anomalies Reporting

Specification

Define, implement and evaluate processes, procedures and technical measures for the reporting of anomalies and failures of the monitoring system and provide immediate notification to the accountable party.

Threat coverage

Model manipulation

Data poisoning

Sensitive data disclosure

Model theft

Model/Service Failure

Insecure supply chain

Insecure apps/plugins

Denial of Service

Loss of governance

Architectural relevance

Physical infrastructure

Network

Compute

Storage

Application

Data

Lifecycle

Preparation

Data collection, Data curation, Data storage, Resource provisioning, Team and expertise

Development

Design, Training, Guardrails, Supply Chain

Evaluation

Evaluation, Validation/Red Teaming, Re-evaluation

Deployment

AI Services supply chain, AI applications

Delivery

Operations, Maintenance, Continuous monitoring, Continuous improvement

Retirement

Archiving, Data deletion, Model disposal

Ownership / SSRM

Shared Cloud Service Provider-Model Provider (Shared CSP-MP)

The CSP and MP are jointly responsible and accountable for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer.

Model

Owned by the Model Provider (MP)

The Model Provider (MP) designs, develops, and implements the control as part of their services or products to mitigate security, privacy, or compliance risks associated with the Large Language Model (LLM). Model Providers are entities that develop, train, and distribute foundational and fine-tuned AI models for various applications. They create the underlying AI capabilities that other actors build upon. Model Providers are responsible for model architecture, training methodologies, performance characteristics, and documentation of capabilities and limitations. They operate at the foundation layer of the AI stack and may provide direct API access to their models. Examples: OpenAI (GPT, DALL-E, Whisper), Anthropic (Claude), Google (Gemini), Meta (Llama), as well as any customized model.

Orchestrated

Shared Model Provider-Orchestrated Service Provider (Shared MP-OSP)

The MP and OSP are jointly responsible and accountable for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer.

Application

Shared Orchestrated Service Provider-Application Provider (Shared OSP-AP)

The OSP and AP are jointly responsible and accountable for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer.

Implementation guidelines

[All Actors]
1. Define AI-Specific Anomaly Criteria and Thresholds: Collaborate to set clear criteria for AI anomalies (e.g., performance drops, data drift, out-of-distribution inputs). Configure anomaly thresholds aligned with risk appetite, business requirements, and changing data conditions.

2. Adaptive Thresholding and Diagnostics: Employ adaptive thresholds to account for evolving data distributions and model behaviors. Integrate diagnostic tools that automatically analyze logs, performance metrics, and metadata to identify likely causes of detected anomalies.

3. Continuous Evaluation and Joint Reviews: Conduct regular reviews of model performance and system health with relevant stakeholders. Align on remediation strategies for anomalies driven by consumer-specific data or business logic changes. Update anomaly detection thresholds, response procedures, or backup mechanisms as AI use cases evolve.

4. Oversight of Critical Operations: Monitor high-impact AI functions (e.g., financial approvals, healthcare diagnostics) with real-time alerts and immediate escalation paths. Ensure the system can roll back to stable model versions or activate backup infrastructure to maintain continuity when high-severity anomalies occur.
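
The adaptive thresholding described in guideline 2 can be illustrated with a minimal sketch. It assumes a single numeric health metric (for example, inference latency in milliseconds) and uses an exponentially weighted mean and variance with a k-sigma band. The class name `AdaptiveThreshold` and the parameters `alpha`, `k`, and `warmup` are hypothetical choices for illustration, not values prescribed by this control.

```python
import math
from dataclasses import dataclass


@dataclass
class AdaptiveThreshold:
    """Exponentially weighted baseline with a k-sigma anomaly band (illustrative only)."""
    alpha: float = 0.1   # smoothing factor; assumed value
    k: float = 3.0       # band width in standard deviations; assumed value
    warmup: int = 5      # observations absorbed before any flagging
    mean: float = 0.0
    var: float = 0.0
    count: int = 0

    def observe(self, value: float) -> bool:
        """Return True if `value` falls outside the current band, then update the baseline."""
        if self.count == 0:
            # First observation seeds the baseline.
            self.mean = value
            self.count = 1
            return False
        deviation = value - self.mean
        std = math.sqrt(self.var)
        anomalous = self.count >= self.warmup and abs(deviation) > self.k * std
        if not anomalous:
            # Fold only normal points into the baseline so outliers do not widen the band.
            self.mean += self.alpha * deviation
            self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        self.count += 1
        return anomalous


if __name__ == "__main__":
    detector = AdaptiveThreshold()
    for latency_ms in [100, 102, 98, 101, 99, 100, 101, 99, 500, 100]:
        if detector.observe(latency_ms):
            print(f"anomaly: latency {latency_ms} ms")
```

Because flagged points are excluded from the baseline, this sketch will not absorb a genuine, sustained shift in the metric. A production detector would need a policy for that case, such as re-baselining after a joint review as described in guideline 3.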

Auditing guidelines

1. Inquiring with Control Owners

1.1 Interview the personnel responsible for defining, implementing, and evaluating the processes for reporting anomalies and failures of the cloud infrastructure monitoring system. Establish how they detect and classify infrastructure issues affecting customer workloads and data protection, and how they immediately notify accountable parties. Verify their understanding of the technical measures for cloud infrastructure anomaly detection, the notification workflows for different infrastructure failure types, and the evaluation processes that ensure monitoring system reliability and timely escalation to responsible stakeholders, including cloud operations teams and customer security contacts.

2. Inspecting Records and Documents

2.1 Verify cloud infrastructure systems are configured to detect logging anomalies such as dropped compute events, storage access failures, or customer workload data format corruption affecting service availability and customer isolation.

2.2 Check processes are in place for classifying cloud infrastructure failure severity and identifying responsible owners including cloud operations teams, customer success managers, and security incident response staff.

2.3 Validate cloud infrastructure failures trigger alert workflows in ticketing or incident response platforms with appropriate escalation to cloud operations and customer security teams.

2.4 Ensure fallback mechanisms exist when primary cloud infrastructure logging systems fail, including backup resource monitoring and customer workload tracking capabilities.

2.5 Confirm logs of cloud infrastructure failure events are themselves collected and analyzed to understand impact on customer services and infrastructure reliability.

2.6 Check that post-incident reviews incorporate root cause analysis for cloud infrastructure failures with focus on customer impact and service level agreement compliance.

2.7 Verify metrics are defined to track detection and resolution of cloud infrastructure anomalies including customer service impact and infrastructure availability measures.

2.8 Examine immediate notification procedures for cloud infrastructure monitoring failures to ensure accountable parties including cloud operations teams and customer security contacts receive timely alerts.

2.9 Review evaluation processes for assessing the effectiveness of cloud infrastructure anomaly reporting and failure notification procedures in maintaining customer trust and service reliability.

2.10 Validate that technical measures include automated escalation mechanisms for cloud infrastructure monitoring failures when initial notifications are not acknowledged by responsible cloud operations teams.

2.11 Confirm logging infrastructure includes built-in anomaly detection for write failures, abnormal latency, or integrity violations.

2.12 Verify system health metrics feed into anomaly classification engines.

2.13 Check for anomalies in cross-tenant logs, such as unauthorized metadata modifications.

2.14 Validate regulatory reporting mechanisms for significant failure events.

2.15 Ensure anomaly dashboards support both internal use and tenant visibility.

2.16 Confirm documented workflows route failures to engineering and trust teams for triage.
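
As an illustration of the automated escalation that item 2.10 asks auditors to validate, the following minimal sketch notifies the next contact in an ordered chain each time an acknowledgement window elapses without an acknowledgement. The names `EscalationPolicy`, `Alert`, and `escalate`, the contact labels, and the 300-second window are all assumptions for illustration. A real deployment would call its own ticketing or paging platform inside `notify`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class EscalationPolicy:
    contacts: List[str]            # ordered escalation chain (hypothetical labels)
    ack_timeout_s: float = 300.0   # assumed acknowledgement window per tier


@dataclass
class Alert:
    summary: str
    raised_at: float                       # seconds, same clock as `now` below
    acked_at: Optional[float] = None       # set when someone acknowledges
    notified: List[str] = field(default_factory=list)


def escalate(alert: Alert, policy: EscalationPolicy, now: float,
             notify: Callable[[str, str], None]) -> None:
    """Notify every tier whose window has elapsed while the alert is unacknowledged."""
    if alert.acked_at is not None:
        return  # acknowledged alerts are never escalated further
    elapsed = now - alert.raised_at
    # Tier 1 is due immediately; each further tier is due after another timeout.
    tiers_due = min(len(policy.contacts), int(elapsed // policy.ack_timeout_s) + 1)
    for contact in policy.contacts[len(alert.notified):tiers_due]:
        notify(contact, alert.summary)
        alert.notified.append(contact)


if __name__ == "__main__":
    policy = EscalationPolicy(["on-call", "ops-lead", "customer-security-contact"])
    alert = Alert("primary log pipeline write failures", raised_at=0.0)
    for t in (0.0, 100.0, 300.0, 1000.0):
        escalate(alert, policy, now=t,
                 notify=lambda who, msg: print(f"t={t:>6}: notify {who}: {msg}"))
```

Calling `escalate` periodically from a scheduler keeps the function stateless apart from the `Alert` record. That record also serves as evidence of who was notified and in what order, which is the kind of trail items 2.5 and 2.8 ask auditors to inspect.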

Standards mappings

ISO 42001: No Gap

ISO 42001 B.8.4
ISO 42001 B.8.5
ISO 27001 9.1
ISO 27001 9.2
ISO 27001 A.5.27

Addendum

N/A

EU AI Act: Full Gap

No Mapping

Addendum

No mention is made in the EU AI Act.

NIST AI 600-1: No Gap

GV-4.3-002

Addendum

N/A

BSI AIC4: No Gap

C4 SR-06
C5 OPS-17

Addendum

N/A

AI-CAIQ questions (2)

LOG-13.1

Are processes and technical measures for reporting monitoring system anomalies and failures defined, implemented, and evaluated?

LOG-13.2

Are accountable parties immediately notified about anomalies and failures?