CCC · Change Control and Configuration Management

CCC-09 · Cloud & AI Related

Change Restoration

Specification

Define and implement a process to proactively roll back changes to a previous known good state in case of errors or security concerns.

Threat coverage

Model manipulation

Data poisoning

Sensitive data disclosure

Model theft

Model/Service Failure

Insecure supply chain

Insecure apps/plugins

Denial of Service

Loss of governance

Architectural relevance

Physical infrastructure

Network

Compute

Storage

Application

Data

Lifecycle

Preparation

Data curation, Data storage, Resource provisioning

Development

Training, Guardrails

Evaluation

Re-evaluation

Deployment

Orchestration, AI Services supply chain, AI applications

Delivery

Maintenance, Continuous improvement

Retirement

Archiving, Data deletion, Model disposal

Ownership / SSRM

Shared across the supply chain

Shared control ownership refers to responsibilities and activities related to LLM security that are distributed across multiple stakeholders within the AI supply chain, including the Cloud Service Provider (CSP), Model Provider (MP), Orchestrated Service Provider (OSP), Application Provider (AP), and Customer (AIC). These controls require coordinated actions, communication, and governance across all involved parties to ensure their effectiveness.

Model

Owned by the Model Provider (MP)

The Model Provider (MP) designs, develops, and implements the control as part of their services or products to mitigate security, privacy, or compliance risks associated with the Large Language Model (LLM). Model Providers are entities that develop, train, and distribute foundational and fine-tuned AI models for various applications. They create the underlying AI capabilities that other actors build upon. Model Providers are responsible for model architecture, training methodologies, performance characteristics, and documentation of capabilities and limitations. They operate at the foundation layer of the AI stack and may provide direct API access to their models. Examples: OpenAI (GPT, DALL-E, Whisper), Anthropic (Claude), Google (Gemini), Meta (Llama), as well as any customized model.

Orchestrated

Shared Model Provider-Orchestrated Service Provider (Shared MP-OSP)

The MP and OSP are jointly responsible and accountable for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer.

Application

Shared Orchestrated Service Provider-Application Provider (Shared OSP-AP)

The OSP and AP are jointly responsible and accountable for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer.

Implementation guidelines

[All Actors]
1. Define rollback procedures for critical systems, configurations, models, and services, ensuring that fallback mechanisms are in place in case of failed or unauthorized changes.

2. Establish rollback baselines by capturing pre-change states (e.g., system configurations, model weights, orchestration templates, and environment variables) as part of every approved change request.

3. Store rollback baselines in version-controlled repositories or configuration management tools with traceable metadata (who, what, when).

4. Implement automated or semi-automated rollback mechanisms that allow rapid restoration to the last known good state in the event of an error, misconfiguration, or breach.

5. Define a formal rollback process, including initiation criteria, required approvals, execution steps, and post-rollback validation (e.g., functional and security checks).

6. Document rollback events, including cause, execution steps, impact, and lessons learned, to improve future change resilience and readiness.
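
Guidelines 2 to 4 above can be illustrated with a minimal sketch. It assumes the pre-change state fits in a plain dictionary and uses a hypothetical in-memory `BaselineStore`; in practice the store would sit on a version-controlled repository or configuration management tool. The class names, field names, and the change ID `CHG-001` are illustrative assumptions, not part of the control.

```python
import copy
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Baseline:
    """A pre-change snapshot with traceable metadata (who, what, when)."""
    change_id: str    # what: the approved change request this snapshot belongs to
    captured_by: str  # who
    captured_at: str  # when (UTC, ISO 8601)
    state: dict
    digest: str       # SHA-256 of the canonical state, checked before restore


def _digest(state: dict) -> str:
    # Canonical JSON so the same state always hashes to the same value.
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


class BaselineStore:
    """Hypothetical in-memory store, standing in for a versioned repository."""

    def __init__(self) -> None:
        self._history: list[Baseline] = []

    def capture(self, change_id: str, who: str, state: dict) -> Baseline:
        baseline = Baseline(
            change_id=change_id,
            captured_by=who,
            captured_at=datetime.now(timezone.utc).isoformat(),
            state=copy.deepcopy(state),  # later edits must not alter the snapshot
            digest=_digest(state),
        )
        self._history.append(baseline)
        return baseline

    def rollback(self, live: dict) -> Baseline:
        """Restore `live` in place to the most recently captured baseline."""
        if not self._history:
            raise LookupError("no baseline captured")
        baseline = self._history[-1]
        if _digest(baseline.state) != baseline.digest:
            raise ValueError("baseline failed integrity check")
        live.clear()
        live.update(copy.deepcopy(baseline.state))
        return baseline


# Usage: capture before the change, apply the change, then roll back.
live_config = {"model_version": "1.4.0", "max_tokens": 1024}
store = BaselineStore()
store.capture("CHG-001", "alice", live_config)
live_config["model_version"] = "1.5.0"  # the change under control
restored_from = store.rollback(live_config)
```

Verifying the digest before restoring is a design choice in this sketch: it prevents a tampered or corrupted snapshot from being treated as a known good state.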

Auditing guidelines

1. Inquiry with Control Owners

1.1 Interview Infrastructure Operations Teams and Review Rollback Policies: Interview the infrastructure operations leaders, hardware engineers, and data center managers responsible for AI infrastructure change management and rollback processes. Obtain organizational rollback policies and procedures, including criteria for initiating infrastructure rollbacks, the rollback decision authority matrix for different infrastructure components, emergency rollback procedures for critical hardware/firmware issues, planned rollback testing requirements for infrastructure changes, post-rollback validation protocols for AI workload performance, and rollback process documentation requirements for customer communication. Verify that documented criteria exist for hardware/firmware issues requiring immediate rollback, performance degradation thresholds warranting rollback, security vulnerabilities requiring immediate remediation, resource availability issues necessitating configuration rollback, and customer impact thresholds triggering intervention.

1.2 Review Process Documentation and State Management: For Rollback Process Documentation, examine documentation describing rollback planning requirements for all infrastructure changes, technical rollback mechanisms for different infrastructure components including accelerator (GPU/TPU) driver and firmware rollback, compute cluster configuration restoration, storage system configuration rollback, network fabric and interconnect parameter rollback, resource scheduler configuration restoration, and virtualization platform version rollback, along with multi-tenant impact management during rollbacks, communication protocols for AI workload customers during rollbacks, service-level agreement considerations during infrastructure rollbacks, and verification requirements after infrastructure rollback. For Known Good State Management, review procedures for establishing and validating known good states including definition of "known good state" for AI infrastructure components, infrastructure performance benchmarking requirements, configuration snapshot and backup procedures, hardware-software compatibility validation procedures, performance characteristic documentation for stable configurations, version tagging for firmware, drivers, and configuration artifacts, and AI workload validation tests for baseline configurations.

1.3 Evaluate Deployment Architecture: Assess how the deployment architecture supports rollback capabilities through infrastructure-as-code implementation and versioning, configuration management database (CMDB) capabilities, hardware firmware/driver rollback mechanisms, hypervisor and container platform version management, network configuration version control, resource management policy versioning, and automated infrastructure deployment pipeline rollback capabilities.

2. Inspection of Evidence

2.1 Rollback Strategy Documentation Review: Verify comprehensive rollback strategy documentation, including: Component-Specific Rollback Approaches (accelerator hardware driver/firmware rollback, compute cluster configuration restoration, storage system parameter and firmware rollback, network fabric configuration rollback, resource scheduler policy restoration, virtualization/container platform version rollback, infrastructure monitoring system rollback); Rollback Decision Process (performance degradation thresholds triggering rollback, security vulnerability severity assessment methodology, customer workload impact evaluation process, decision authority and escalation protocols for different components, multi-tenant impact consideration in decision-making); Rollback Execution Process (step-by-step rollback procedures for each infrastructure component, required validation steps during rollback execution, customer workload handling during transitions, dependency management across infrastructure layers, order of operations for complex multi-component rollbacks, monitoring requirements during transition states); Post-Rollback Activities (infrastructure performance validation procedures, customer workload validation requirements, notification procedures for affected customers, root cause analysis requirements, documentation and knowledge capture, long-term remediation planning).

2.2 Tools and Technical Implementation Assessment: Evaluate tools and technical implementations supporting rollback, including: infrastructure-as-code version control, configuration management database implementation, hardware management interfaces and rollback capabilities, firmware/driver repository management, infrastructure monitoring during transitions, automated configuration deployment and rollback, testing frameworks for infrastructure validation, and resource scheduling and workload migration tools.

2.3 Sample-Based Testing of Rollback Capabilities: Select a representative sample of infrastructure components and verify: Rollback Planning (documentation of rollback plans for recent infrastructure changes, identification of known good configuration states, customer workload impact analysis for potential rollbacks, testing protocols for validating rolled back configurations, time and resource estimates for rollback execution); Rollback Testing (evidence of regular rollback capability testing, performance benchmarking following test rollbacks, simulation exercises for critical infrastructure components, customer workload validation during test rollbacks, measurement of infrastructure restoration times); Known Good State Verification (infrastructure performance validation procedures, configuration validation against baselines, hardware-software compatibility verification, documentation of acceptable performance parameters, preservation of configuration artifacts for known good states).

2.4 Previous Rollback Execution Review: For a sample of previously executed rollbacks, verify: Rollback Trigger Assessment (clear documentation of infrastructure issues triggering rollback, alignment with defined performance or security criteria, customer impact assessment documentation, appropriate authority involvement in decision); Rollback Execution Documentation (component-specific rollback execution records, configuration management and version control evidence, issues encountered during transition, timing of infrastructure restoration, communication to affected customers); Post-Rollback Activities (infrastructure performance verification results, customer workload validation outcomes, impact assessment on AI workloads, root cause identification for original issue, preventative measures implementation).

2.5 Automated Monitoring and Rollback Integration: Assess the integration between monitoring systems and rollback processes: automated detection of infrastructure performance degradation, hardware failure and anomaly detection capabilities, resource utilization monitoring and threshold alerting, automated rollback triggers for critical infrastructure issues, progressive configuration deployment with automatic reversion, and continuous performance monitoring during transition periods.

2.6 Customer Communication Procedures: Evaluate procedures for customer communication during rollbacks: proactive notification protocols based on service tier, status update frequency during rollback operations, expected impact and timeline communications, customer-specific workload handling guidance, post-rollback verification communication, and root cause explanation and remediation planning.
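
The "progressive configuration deployment with automatic reversion" named in 2.5 can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the metric names (`error_rate`, `p95_latency_ms`), the threshold values, and the callbacks `apply_stage`, `read_metrics`, and `revert` are all hypothetical stand-ins for an organization's own deployment and monitoring tooling.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Thresholds:
    """Hypothetical rollback criteria; real values come from documented policy."""
    max_error_rate: float       # fraction of failed requests
    max_p95_latency_ms: float   # 95th percentile latency in milliseconds


def breaches(metrics: dict, limits: Thresholds) -> list:
    """Return the names of the metrics that exceed their thresholds."""
    reasons = []
    if metrics["error_rate"] > limits.max_error_rate:
        reasons.append("error_rate")
    if metrics["p95_latency_ms"] > limits.max_p95_latency_ms:
        reasons.append("p95_latency_ms")
    return reasons


def progressive_deploy(
    stages: Sequence[str],
    apply_stage: Callable[[str], None],
    read_metrics: Callable[[str], dict],
    revert: Callable[[], None],
    limits: Thresholds,
) -> dict:
    """Roll out stage by stage and revert everything on the first breach."""
    last_stage = None
    for stage in stages:
        apply_stage(stage)
        last_stage = stage
        reasons = breaches(read_metrics(stage), limits)
        if reasons:
            revert()
            return {"status": "rolled_back", "stage": stage, "reasons": reasons}
    return {"status": "completed", "stage": last_stage, "reasons": []}


# Usage with in-memory stand-ins for the deployment and monitoring systems.
applied = []
observed = {
    "canary": {"error_rate": 0.01, "p95_latency_ms": 180.0},
    "half": {"error_rate": 0.09, "p95_latency_ms": 210.0},
    "full": {"error_rate": 0.01, "p95_latency_ms": 190.0},
}
result = progressive_deploy(
    stages=["canary", "half", "full"],
    apply_stage=applied.append,
    read_metrics=observed.__getitem__,
    revert=applied.clear,
    limits=Thresholds(max_error_rate=0.05, max_p95_latency_ms=500.0),
)
```

The returned dictionary records the stage and the breached metrics, which is the kind of evidence an auditor would look for when assessing whether automated triggers align with the defined criteria.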

3. Evaluation and Reporting

3.1 Rollback Capability Effectiveness Assessment: Evaluate how well rollback processes: meet defined recovery time objectives for different service tiers, successfully restore infrastructure performance to baseline levels, maintain hardware-software compatibility, minimize customer workload disruption, cover all AI infrastructure components comprehensively, balance automated detection and human judgment, and scale across deployment environments and availability zones. 

3.2 Known Good State Management Assessment: Assess the effectiveness of known good state management: clarity of infrastructure performance baseline definition, comprehensive validation of configuration changes before promotion, preservation of configuration artifacts and snapshots, accessibility of configuration backups during incidents, and frequency of validation testing for known good configurations.

3.3 Rollback Process Documentation Quality: Evaluate the quality of rollback process documentation: clarity of component-specific rollback procedures, technical details for different hardware and software combinations, customer impact considerations across service tiers, accessibility to operations and incident response teams, alignment with actual infrastructure architecture, and regular updates following infrastructure changes.

3.4 Continuous Improvement Mechanisms: Evaluate processes for improving rollback capabilities: regular review of infrastructure recovery metrics, incorporation of lessons learned from performance incidents, technical capability enhancement for faster restoration, process refinement based on customer feedback, architectural improvements to simplify rollback procedures, and evolution of infrastructure validation methods.
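
As one way to evaluate the first criterion in 3.1 (meeting defined recovery time objectives per service tier), the sketch below compares rollback execution records against per-tier objectives. The record fields, tier names, and objective values are assumptions made for illustration; actual objectives come from the organization's own service-level commitments.

```python
from datetime import datetime, timedelta


def rto_findings(records: list, rto_by_tier: dict) -> list:
    """Return one finding per rollback that took longer than its tier's RTO."""
    findings = []
    for record in records:
        duration = record["restored_at"] - record["started_at"]
        objective = rto_by_tier[record["tier"]]
        if duration > objective:
            findings.append({
                "rollback_id": record["rollback_id"],
                "tier": record["tier"],
                "overrun_minutes": (duration - objective).total_seconds() / 60,
            })
    return findings


# Hypothetical objectives and rollback records, for illustration only.
RTO_BY_TIER = {"premium": timedelta(minutes=15), "standard": timedelta(hours=1)}
start = datetime(2025, 1, 10, 9, 0)
records = [
    {"rollback_id": "RB-1", "tier": "premium",
     "started_at": start, "restored_at": start + timedelta(minutes=10)},
    {"rollback_id": "RB-2", "tier": "premium",
     "started_at": start, "restored_at": start + timedelta(minutes=40)},
    {"rollback_id": "RB-3", "tier": "standard",
     "started_at": start, "restored_at": start + timedelta(minutes=45)},
]
findings = rto_findings(records, RTO_BY_TIER)
# Only RB-2 exceeds its objective (40 minutes against a 15-minute RTO).
```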

From CCM:
1. Examine policy and/or procedures related to change management and determine if rollback procedures are defined and implemented, including procedures and responsibilities for aborting and recovering from unsuccessful changes and unforeseen events.
2. Examine relevant documentation, observe relevant processes, and/or interview the control owner(s) and/or relevant stakeholders, as needed, to ensure that rollback procedures are defined and implemented, and determine if the control requirements stipulated in the policy have been implemented. Select a sample of changes and examine the change management record to confirm that each change was assessed and included appropriate fallback procedures in the event of a failed change.
3. Examine measure(s) that evaluate(s) the organization's compliance with the change management policy and determine if these measures are implemented according to policy control requirements.
4. Obtain and examine supporting documentation maintained as evidence of these metrics, measures, tests, or audits to determine if the office or individual responsible reviews the information and, if issues were identified, they were investigated and corrected.
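
The sampling step in item 2 above might be supported by a small script such as the following. It is a sketch only: the record fields `change_id` and `fallback_procedure` are hypothetical names for whatever the change management system exports, and a fixed seed is used so that the sample selection can be reproduced in the audit workpapers.

```python
import random


def sample_changes(records: list, size: int, seed: int) -> list:
    """Draw a reproducible random sample of change records."""
    rng = random.Random(seed)  # fixed seed so the selection can be re-derived
    return rng.sample(records, min(size, len(records)))


def missing_fallback(sample: list) -> list:
    """Return IDs of sampled changes with no documented fallback procedure."""
    return sorted(
        record["change_id"]
        for record in sample
        if not (record.get("fallback_procedure") or "").strip()
    )


# Hypothetical export from a change management system.
change_records = [
    {"change_id": "CHG-101", "fallback_procedure": "Redeploy tagged release"},
    {"change_id": "CHG-102", "fallback_procedure": ""},
    {"change_id": "CHG-103", "fallback_procedure": "Restore config snapshot"},
    {"change_id": "CHG-104"},
]
sample = sample_changes(change_records, size=4, seed=7)
exceptions = missing_fallback(sample)
```

A record with an empty or absent fallback field is flagged as an exception for follow-up; whether a documented procedure is appropriate still requires the auditor's judgment.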

Standards mappings

ISO 42001 · No Gap

42001: Clause 6.3 Planning of changes
42001: Clause 8.1 Operational planning and control
42001: Clause 10.2 Nonconformity and corrective action

Addendum

N/A

EU AI Act · Full Gap

No Mapping

Addendum

The EU AI Act does not cover the CCC-09 topic, "Define and implement a process to proactively roll back changes to a previous known good state in case of errors or security concerns," for any of the AI structures defined within the EU AI Act.

NIST AI 600-1 · No Gap

GV-6.2-006

Addendum

N/A

BSI AIC4 · No Gap

DEV-08

Addendum

N/A

AI-CAIQ questions (1)

CCC-09.1

Is a process defined and implemented to proactively roll back changes to a previous known good state in case of errors or security concerns?