BCR · Business Continuity Management and Operational Resilience

BCR-06 · Cloud & AI Related

Business Continuity Exercises

Specification

Follow a structured approach to evaluate the effectiveness of the business continuity and operational resilience plans at planned intervals or upon significant changes.

Threat coverage

Model manipulation

Data poisoning

Sensitive data disclosure

Model theft

Model/Service Failure

Insecure supply chain

Insecure apps/plugins

Denial of Service

Loss of governance

Architectural relevance

Physical infrastructure

Network

Compute

Storage

Application

Data

Lifecycle

Preparation

Data storage, Resource provisioning, Team and expertise

Development

Not applicable

Evaluation

Not applicable

Deployment

AI applications, Orchestration

Delivery

Operations, Maintenance, Continuous monitoring

Retirement

Archiving, Data deletion, Model disposal

Ownership / SSRM

Shared across the supply chain

Shared control ownership refers to responsibilities and activities related to LLM security that are distributed across multiple stakeholders within the AI supply chain, including the Cloud Service Provider (CSP), Model Provider (MP), Orchestrated Service Provider (OSP), Application Provider (AP), and Customer (AIC). These controls require coordinated actions, communication, and governance across all involved parties to ensure their effectiveness.

Model

Owned by the Model Provider (MP)

The Model Provider (MP) designs, develops, and implements the control as part of its services or products to mitigate security, privacy, or compliance risks associated with the Large Language Model (LLM). Model Providers are entities that develop, train, and distribute foundational and fine-tuned AI models for various applications. They create the underlying AI capabilities that other actors build upon. Model Providers are responsible for model architecture, training methodologies, performance characteristics, and documentation of capabilities and limitations. They operate at the foundation layer of the AI stack and may provide direct API access to their models. Examples: OpenAI (GPT, DALL-E, Whisper), Anthropic (Claude), Google (Gemini), Meta (Llama), as well as any customized model.

Orchestrated

Shared Model Provider-Orchestrated Service Provider (Shared MP-OSP)

The MP and OSP are jointly responsible and accountable for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer.

Application

Owned by the Application Provider (AP)

The Application Provider (AP) is responsible for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer. The AP is responsible and accountable for the implementation of the control within its own infrastructure/environment. If the control has downstream implications for users/customers, the AP is responsible for enabling the customer and/or upstream partner to implement and configure the control within their risk management approach. The AP is accountable for carrying out due diligence on its upstream providers (e.g., MPs, Orchestrated Services) to verify that they implement the control as it relates to the service/product developed and offered by the AP. Application Providers build and offer end-user applications that leverage generative AI models for specific tasks such as content creation, chatbots, code generation, and enterprise automation. These applications are often delivered as software-as-a-service (SaaS) solutions. Application Providers focus on user interfaces, application logic, domain-specific functionality, and overall user experience rather than underlying model development. Examples: OpenAI (GPTs, Assistants), Zapier, CustomGPT, Microsoft Copilot (integrated into Office products), Jasper (AI-driven content generation), Notion AI (AI-enhanced productivity tools), Adobe Firefly (AI-generated media), and AI-powered customer service solutions like Amazon Rufus, as well as any organization that develops its AI-based application internally.

Implementation guidelines

[All actors]
1. Preparation Steps:
a. Define clear learning objectives for each activity and involve representatives from all interconnected services.
b. Create realistic scenarios based on genuine vulnerabilities.
c. Establish safe environments that won't impact production systems.

2. Exercise Activities: 
a. Tabletop Discussions: Team conversations walking through potential scenarios.
b. Functional Drills: Testing specific capabilities like model rollbacks or alternative workflows.
c. Simulated Incidents: Comprehensive scenarios involving multiple teams and systems.
d. Surprise Challenges: Unannounced tests of readiness with realistic constraints.

3. Post-Exercise:
a. Conduct immediate debrief sessions while experiences are fresh.
b. Document specific improvement opportunities and assign action items with clear ownership.
c. Share relevant insights across organizational boundaries and update continuity plans based on lessons learned.

4. Combined Scenarios:
a. Run an annual end-to-end simulation involving all ecosystem participants.
b. Regularly validate shared alert and escalation mechanisms.
c. Create a common evaluation framework for cross-entity exercises.
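
The post-exercise step of assigning action items with clear ownership can be tracked in a simple record. The following is a minimal sketch, not a prescribed format: the class names, field names, and the `items_missing_ownership` helper are hypothetical, and the rule that "clear ownership" means a named owner plus a due date is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ActionItem:
    """One improvement opportunity identified during the debrief."""
    description: str
    owner: Optional[str] = None   # team or person accountable (assumed field)
    due: Optional[date] = None    # target completion date (assumed field)


@dataclass
class ExerciseRecord:
    """Summary of a single continuity exercise."""
    name: str
    kind: str                     # e.g. "tabletop", "functional", "simulated", "surprise"
    held_on: date
    objectives: list[str] = field(default_factory=list)
    action_items: list[ActionItem] = field(default_factory=list)


def items_missing_ownership(record: ExerciseRecord) -> list[ActionItem]:
    """Return action items that lack an owner or a due date."""
    return [a for a in record.action_items if not a.owner or a.due is None]


if __name__ == "__main__":
    record = ExerciseRecord(
        name="Model rollback drill",
        kind="functional",
        held_on=date(2024, 5, 2),
        objectives=["Roll back to the previous model version"],
        action_items=[
            ActionItem("Automate rollback step", owner="ml-platform", due=date(2024, 6, 1)),
            ActionItem("Update escalation contacts"),  # no owner yet
        ],
    )
    for item in items_missing_ownership(record):
        print(f"Needs owner/due date: {item.description}")
```

A record like this could also be shared across entities after a combined scenario, provided all participants agree on the fields.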

Auditing guidelines

1. Examine the test plans for business continuity and operational resilience, including their intended outputs.

2. Examine the schedules of such tests and their periodicity.

3. Evaluate whether the plans are tested upon significant changes or at least annually.

4. Verify that the exercise scenarios include various infrastructure failure modes, including power outages, hardware failures, network disruptions, and regional disasters that affect AI processing capabilities.

5. Review exercise results and documentation to confirm that critical AI infrastructure components (compute, networking, storage) are included in the scope and that recovery time objectives (RTOs) and recovery point objectives (RPOs) were measured against established targets.

6. Assess documentation of lessons learned from exercises and verify that identified deficiencies in infrastructure resilience were documented in a corrective action plan with clear ownership and timelines.

7. Examine evidence that infrastructure redundancy mechanisms (e.g., failover systems, load balancing, backup power) were tested explicitly during exercises.

8. Verify that the appropriate management responsible for infrastructure operations reviewed and approved the exercise planning, execution, and results.
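
Two of the checks above lend themselves to a small script: whether plans were exercised at least annually or after a significant change (guideline 3), and whether measured recovery times met their targets (guideline 5). The sketch below is illustrative only. The function names, the 365-day reading of "annually", and the use of minutes for RTO/RPO values are assumptions, not part of the control.

```python
from datetime import date, timedelta

# Assumption: "at least annually" is interpreted as no more than 365 days.
MAX_INTERVAL = timedelta(days=365)


def exercise_overdue(last_exercise: date, today: date,
                     significant_changes: tuple = ()) -> bool:
    """True if the plans should have been exercised again by `today`.

    An exercise is considered due when more than MAX_INTERVAL has passed,
    or when a significant change occurred after the last exercise.
    """
    if today - last_exercise > MAX_INTERVAL:
        return True
    return any(change > last_exercise for change in significant_changes)


def recovery_objectives_met(measured: dict, targets: dict) -> dict:
    """Compare measured recovery values (minutes) against targets.

    A target with no measurement counts as not met, since the guideline
    expects RTOs and RPOs to be measured during the exercise.
    """
    return {
        name: name in measured and measured[name] <= limit
        for name, limit in targets.items()
    }


if __name__ == "__main__":
    print(exercise_overdue(date(2024, 1, 10), date(2024, 3, 1),
                           significant_changes=(date(2024, 2, 1),)))
    print(recovery_objectives_met({"rto": 45, "rpo": 20},
                                  {"rto": 60, "rpo": 15}))
```

What counts as a "significant change" is left to the organization; here it is simply a list of dates supplied by the auditor.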

Standards mappings

ISO 42001 · Partial Gap

ISO 27001 A.5.30 (ICT readiness for business continuity)

Addendum

No control in ISO 42001 covers the AICM BCR-06 topic of exercising a business continuity plan (BCP). However, ISO 27001 A.5.30 (ICT readiness for business continuity) provides coverage, as it requires a BCP to be tested based on the organization's risk appetite.

EU AI Act · Partial Gap

Article 16
Article 17(1)(d)

Addendum

Develop business continuity and operational resilience plans, conduct regular and event-driven exercises, include these in the QMS and technical documentation, and track lessons learned and updates.

NIST AI 600-1 · Partial Gap

MS-2.7-007
MP-5.1-005
GV-6.2-003

Addendum

Include an action that explicitly mandates that business continuity and disaster recovery plans be exercised at least annually (or upon significant changes) and that the results be used for continual improvement.

BSI AIC4 · No Gap

BCM-04

Addendum

N/A

AI-CAIQ questions (1)

BCR-06.1

Is a structured approach followed to evaluate the effectiveness of the business continuity and operational resilience plans at planned intervals or upon significant changes?