Detection of Baseline Deviation
Specification
Implement detection measures with proactive notification in case of changes deviating from the established baseline.
Threat coverage
Architectural relevance
Lifecycle
Data collection, Data curation, Data storage
Design, Training, Guardrails
Evaluation, Validation/Red Teaming, Re-evaluation
Orchestration, AI Services supply chain, AI applications
Operations, Maintenance, Continuous monitoring, Continuous improvement
Archiving, Data deletion, Model disposal
Ownership / SSRM
PI
Shared across the supply chain
Shared control ownership refers to responsibilities and activities related to LLM security that are distributed across multiple stakeholders within the AI supply chain, including the Cloud Service Provider (CSP), Model Provider (MP), Orchestrated Service Provider (OSP), Application Provider (AP), and Customer (AIC). These controls require coordinated actions, communication, and governance across all involved parties to ensure their effectiveness.
Model
Owned by the Model Provider (MP)
The Model Provider (MP) designs, develops, and implements the control as part of their services or products to mitigate security, privacy, or compliance risks associated with the Large Language Model (LLM). Model Providers are entities that develop, train, and distribute foundational and fine-tuned AI models for various applications. They create the underlying AI capabilities that other actors build upon. Model Providers are responsible for model architecture, training methodologies, performance characteristics, and documentation of capabilities and limitations. They operate at the foundation layer of the AI stack and may provide direct API access to their models. Examples: OpenAI (GPT, DALL-E, Whisper), Anthropic (Claude), Google (Gemini), Meta (Llama), as well as any customized model.
Orchestrated
Shared Model Provider-Orchestrated Service Provider (Shared MP-OSP)
The MP and OSP are jointly responsible and accountable for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer.
Application
Owned by the Application Provider (AP)
The Application Provider (AP) is responsible for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer. The AP is responsible and accountable for the implementation of the control within its own infrastructure/environment. If the control has downstream implications for users/customers, the AP is responsible for enabling the customer and/or upstream partner in the implementation/configuration of the control within their risk management approach. The AP is accountable for carrying out due diligence on its upstream providers (e.g., MPs, Orchestrated Services) to verify that they implement the control as it relates to the service/product developed and offered by the AP. These providers build and offer end-user applications that leverage generative AI models for specific tasks such as content creation, chatbots, code generation, and enterprise automation. These applications are often delivered as software-as-a-service (SaaS) solutions. These providers focus on user interfaces, application logic, domain-specific functionality, and overall user experience rather than underlying model development. Examples: OpenAI (GPTs, Assistants), Zapier, CustomGPT, Microsoft Copilot (integrated into Office products), Jasper (AI-driven content generation), Notion AI (AI-enhanced productivity tools), Adobe Firefly (AI-generated media), and AI-powered customer service solutions like Amazon Rufus, as well as any organization that develops its AI-based applications internally.
Implementation guidelines
Auditing guidelines
1. Inquiry with Control Owners

1.1 Interview monitoring and operations personnel responsible for detecting changes to AI infrastructure. Obtain and review the organization's monitoring strategies, alert thresholds, and notification workflows for AI accelerator (GPU/TPU) performance characteristics, distributed computing environment configurations, high-performance storage system metrics, specialized networking fabric performance, hardware driver and firmware versions, and resource allocation and scheduling policies. Verify the existence of documented detection mechanisms for hardware performance degradation patterns, driver compatibility or stability issues, storage throughput and latency deviations, network fabric performance regression, resource contention and scheduling anomalies, and infrastructure capacity constraints. Identify the monitoring tools used for hardware telemetry collection and analysis, accelerator-specific performance profiling, storage I/O pattern monitoring, network packet flow analysis, resource utilization heat mapping, and distributed system synchronization monitoring.

1.2 Review Notification and Response Procedures: Examine documentation describing notification pathways when infrastructure issues are detected. Understand escalation procedures based on customer impact and resource criticality. Verify integration between detection systems and infrastructure engineering teams. Assess emergency response capabilities for high-severity infrastructure incidents impacting multiple customers. Review response playbooks for different types of infrastructure-related issues: accelerator hardware performance degradation, storage system throughput or latency issues, network fabric congestion or packet loss, resource scheduler inefficiencies, driver or firmware compatibility problems, and distributed system synchronization failures.

2. Obtaining and Verifying the Population of Records

2.1 Define the complete population of monitoring records by inventorying monitoring systems for AI infrastructure, including hardware telemetry collection platforms, accelerator (GPU/TPU) monitoring systems, storage performance tracking tools, network fabric monitoring infrastructure, resource manager logging and metrics, virtualization and container monitoring, driver and firmware version tracking, and capacity planning and forecasting systems.

2.2 Verify completeness of the population by cross-referencing monitoring coverage against the inventory of AI infrastructure components. Verify that monitoring covers all regions, availability zones, and deployment models. A sketch of the baseline-deviation detection logic these systems are expected to implement follows below.
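The following is a minimal Python sketch of the rolling-baseline detection logic referenced above. The BaselineDeviationDetector class, its window/threshold parameters, and the notify callback are illustrative assumptions, not the API of any particular monitoring product.

    # Rolling statistical baseline per metric; alerts only on sustained deviation.
    from collections import deque
    from statistics import mean, stdev

    class BaselineDeviationDetector:
        def __init__(self, window=288, z_threshold=3.0, persistence=3):
            self.window = window            # samples in the baseline (e.g., 24h at 5-min intervals)
            self.z_threshold = z_threshold  # deviation size, in standard deviations
            self.persistence = persistence  # consecutive deviating samples before alerting
            self.history = {}               # metric name -> deque of recent samples
            self.breaches = {}              # metric name -> consecutive breach count

        def observe(self, metric, value, notify):
            hist = self.history.setdefault(metric, deque(maxlen=self.window))
            if len(hist) >= 30:             # require a minimum baseline before judging
                mu, sigma = mean(hist), stdev(hist)
                z = abs(value - mu) / sigma if sigma > 0 else 0.0
                if z >= self.z_threshold:
                    self.breaches[metric] = self.breaches.get(metric, 0) + 1
                    if self.breaches[metric] >= self.persistence:
                        notify(f"{metric} deviates from baseline: value={value:.2f}, "
                               f"mean={mu:.2f}, z={z:.1f}")  # proactive notification hook
                else:
                    self.breaches[metric] = 0
            # Note: deviating samples also enter the window here; production systems
            # often exclude or down-weight them to avoid baseline contamination.
            hist.append(value)

Usage would look like detector.observe("gpu0.mem_bandwidth_gbps", 1450.0, notify=send_page), where send_page is whatever notification integration the organization actually uses. The persistence requirement implements the "graduated alerting based on deviation persistence" property checked in step 3.2.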
3. Inspection of Evidence

3.1 Monitoring System Verification: Verify that monitoring systems are configured to detect deviations in the following categories:
- AI accelerator performance: computational throughput (FLOPS, operations/second), memory bandwidth and utilization, power consumption and thermal characteristics, training/inference benchmark performance, error rates and correction events, and utilization efficiency across workloads.
- Storage system characteristics: I/O operations per second (IOPS), throughput (GB/s) for sequential access, latency distributions for different operation types, queue depths and blocking operations, cache hit rates and effectiveness, and storage capacity utilization trends.
- Network fabric performance: bandwidth utilization for collective operations, latency profiles for inter-node communication, packet loss rates and retransmissions, congestion events and back-pressure signals, quality-of-service enforcement effectiveness, and network topology efficiency.
- Resource management effectiveness: allocation efficiency and resource fragmentation, scheduling fairness across customers, queue wait times for resource types, preemption rates and impact, resource affinity and locality effectiveness, and quota enforcement accuracy.

3.2 Alert Configuration Assessment: Examine alert configurations to verify tiered thresholds based on resource type and cost, graduated alerting based on deviation persistence, different sensitivity for different customer tiers, correlation between related infrastructure metrics, seasonality- and workload-aware baselines, forecasting-based proactive notifications, and hardware-generation-specific threshold adjustments; a minimal sketch of tiered, persistence-aware thresholds appears after this section.

3.3 Sample-Based Testing of Detection Capabilities: Select a representative sample of infrastructure components and perform controlled tests: induce synthetic accelerator load patterns, create storage I/O contention scenarios, simulate network congestion conditions, generate resource allocation imbalances, inject driver or firmware compatibility issues, and test distributed system synchronization edge cases. Verify that monitoring systems accurately detect the simulated issues, generate appropriate alerts with correct severity, include sufficient diagnostic context, trigger within expected timeframes, follow defined notification workflows, and properly identify fault domains and impact scope.

3.4 Alert Notification Workflow Verification: Trace the notification path for different types of infrastructure issues: initial detection and enrichment with telemetry, routing to appropriate infrastructure teams, escalation for customer-impacting issues, hardware vendor coordination workflows, customer notification processes, maintenance scheduling integration, and cross-region incident coordination.

3.5 Response Effectiveness Evaluation: Review historical infrastructure incidents to evaluate time to detect performance deviations, quality of diagnostic information, response time to critical infrastructure issues, effectiveness of remediation actions, customer impact minimization, vendor coordination effectiveness, and thoroughness of root cause analysis.

3.6 Automated Remediation Assessment: Verify implementation of automated remediation for common issues: accelerator thermal throttling management, storage path failover mechanisms, network route optimization and reconfiguration, resource rebalancing and workload migration, driver rollback capabilities, self-healing distributed system recovery, and preemptive resource capacity expansion.

3.7 Integration with Capacity Planning: Assess how detection systems feed into capacity management: early-warning indicators for capacity constraints, trend analysis for resource utilization, predictive analytics for hardware procurement, seasonal demand pattern recognition, capacity risk assessment automation, hardware lifecycle and refresh monitoring, and geographic expansion trigger indicators.
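To make step 3.2 concrete, here is a small Python sketch of tiered, persistence-aware alert thresholds. The tier names, metrics, threshold values, and AlertRule structure are assumptions for illustration, not values prescribed by any standard.

    from dataclasses import dataclass

    @dataclass
    class AlertRule:
        metric: str
        warn_pct: float       # fractional deviation from baseline that raises a warning
        critical_pct: float   # fractional deviation that raises a critical alert
        min_persistence: int  # consecutive intervals the deviation must persist

    # Tighter thresholds and faster escalation for higher customer tiers.
    RULES = {
        "premium":  [AlertRule("accelerator.throughput", 0.05, 0.15, 2),
                     AlertRule("storage.read_latency_ms", 0.10, 0.25, 3)],
        "standard": [AlertRule("accelerator.throughput", 0.10, 0.25, 4),
                     AlertRule("storage.read_latency_ms", 0.20, 0.40, 6)],
    }

    def severity(rule: AlertRule, baseline: float, observed: float, persisted: int) -> str:
        """Classify a deviation against tiered thresholds, suppressing short blips."""
        deviation = abs(observed - baseline) / baseline
        if persisted < rule.min_persistence:
            return "none"     # graduated alerting: transient deviations are suppressed
        if deviation >= rule.critical_pct:
            return "critical"
        if deviation >= rule.warn_pct:
            return "warning"
        return "none"

An auditor performing step 3.3 can exercise such logic directly: inject a synthetic 20% throughput drop for three consecutive intervals against a premium-tier rule and confirm a "critical" alert is produced within the expected timeframe.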
Standards mappings
42001: Clause 9.1 - Monitoring, including changes deviating from the established baseline
42001: Clause 7.4 - Notification of deviating changes
42001: A.6.2.6 - AI system operation and monitoring
42001: B.7.1.1 - Monitoring and review of AI system behavior
27001: A.8.32 - Change management
27001: A.8.9 - Establish and maintain baseline configuration
Addendum
These mappings require: establishing an explicit baseline; monitoring against that baseline; proactive notification when deviations occur; and periodic validation of these mechanisms. A minimal sketch combining the four elements follows.
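A minimal Python sketch of these four requirements, assuming a set of monitored configuration files and a hypothetical notify() hook; all file and function names are illustrative.

    import hashlib, json, pathlib

    BASELINE_FILE = pathlib.Path("baseline.json")

    def fingerprint(paths):
        """Hash each monitored file; together the hashes form the explicit baseline."""
        return {p: hashlib.sha256(pathlib.Path(p).read_bytes()).hexdigest() for p in paths}

    def establish_baseline(paths):
        BASELINE_FILE.write_text(json.dumps(fingerprint(paths), indent=2))

    def check_for_drift(paths, notify):
        """Monitor against the baseline, with proactive notification on deviation."""
        baseline = json.loads(BASELINE_FILE.read_text())
        for path, digest in fingerprint(paths).items():
            if baseline.get(path) != digest:
                notify(f"Baseline deviation detected in {path}")

    def validate_detection(paths, notify):
        """Periodic validation: tamper with a canary file and confirm an alert fires."""
        canary = pathlib.Path(paths[0])
        original = canary.read_bytes()
        canary.write_bytes(original + b"\n# canary change")
        fired = []
        check_for_drift(paths, notify=lambda msg: fired.append(msg))
        canary.write_bytes(original)  # restore the canary file
        if not fired:
            notify("Detection mechanism failed periodic validation")

The same pattern generalizes beyond configuration files to any baselined artifact (model versions, infrastructure-as-code state, scheduler policies), which is how the Clause 9.1 and A.8.9 mappings above connect.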
No Mapping
Addendum
The EU AI Act does not cover the CCC-07 topic, "Implement detection measures with proactive notification in case of changes deviating from the established baseline," for any of the AI structures it defines.
MS-5.1-005, MS-2.7-007, MG-4.1-006, MG-4.1-002, MS-2.6-005, MS-2.7-009, GV-6.2-004, GV-1.5-002, MS-1.1-002, MS-1.3-002
Addendum
NIST AI 600-1 does not make clear that "detection measures with proactive notification must be in place" is the actual requirement.
OPS-23 (Additional Criteria), DEV-07, DEV-08
Addendum
N/A
AI-CAIQ questions (1)
Are detection measures with proactive notification implemented in case of changes deviating from the established baseline?