AICM Atlas · CSA AI Controls Matrix
AIS · Application & Interface Security
AIS-14 · AI-Specific

AI Cache Protection

Specification

Implement security measures to protect caches in GenAI systems and services.

Threat coverage

Model manipulation
Data poisoning
Sensitive data disclosure
Model theft
Model/Service Failure
Insecure supply chain
Insecure apps/plugins
Denial of Service
Loss of governance

Architectural relevance

Physical infrastructure
Network
Compute
Storage
Application
Data

Lifecycle

Preparation

Not applicable

Development

Design, Guardrails

Evaluation

Validation/Red Teaming

Deployment

Orchestration, AI applications

Delivery

Operations, Maintenance

Retirement

Data deletion

Ownership / SSRM

PI

Shared Cloud Service Provider-Model Provider (Shared CSP-MP)

The CSP and MP are jointly responsible and accountable for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer.

Model

Owned by the Model Provider (MP)

The model provider (MP) designs, develops, and implements the control as part of their services or products to mitigate security, privacy, or compliance risks associated with the Large Language Model (LLM). Model Providers are entities that develop, train, and distribute foundational and fine-tuned AI models for various applications. They create the underlying AI capabilities that other actors build upon. Model Providers are responsible for model architecture, training methodologies, performance characteristics, and documentation of capabilities and limitations. They operate at the foundation layer of the AI stack and may provide direct API access to their models. Examples: OpenAI (GPT, DALL-E, Whisper), Anthropic (Claude), Google (Gemini), Meta (Llama), as well as any customized model.

Orchestrated

Shared Orchestrated Service Provider-Application Provider (Shared OSP-AP)

The OSP and AP are jointly responsible and accountable for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer.

Application

Owned by the Application Provider (AP)

The Application Provider (AP) is responsible for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer. The AP is responsible and accountable for the implementation of the control within its own infrastructure/environment. If the control has downstream implications for users/customers, the AP is responsible for enabling the customer and/or upstream partner in the implementation/configuration of the control within their risk management approach. The AP is accountable for carrying out due diligence on its upstream providers (e.g., MPs, Orchestrated Services) to verify that they implement the control as it relates to the services/products developed and offered by the AP. These providers build and offer end-user applications that leverage generative AI models for specific tasks such as content creation, chatbots, code generation, and enterprise automation. These applications are often delivered as software-as-a-service (SaaS) solutions. These providers focus on user interfaces, application logic, domain-specific functionality, and overall user experience rather than underlying model development. Examples: OpenAI (GPTs, Assistants), Zapier, CustomGPT, Microsoft Copilot (integrated into Office products), Jasper (AI-driven content generation), Notion AI (AI-enhanced productivity tools), Adobe Firefly (AI-generated media), and AI-powered customer service solutions like Amazon Rufus, as well as any organization that develops its AI-based application internally.

Implementation guidelines

[Applicable to all providers (CSP, MP, OSP, AP) excluding AIC unless otherwise specified]
1. Establish provider-specific policy scopes for securing cache systems in LLMs. Ensure policies address encryption of cached data in line with CEK (Cryptography, Encryption & Key Management) controls, enforcement of strict access controls, and mechanisms to prevent retention of sensitive information in fast memory (e.g., clearing caches after use) across all LLM-related operations.
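The "clearing caches after use" mechanism in guideline 1 can be sketched in code. The class and method names below are illustrative, not prescribed by the control; the point is that cached values are held in mutable buffers so they can be overwritten (not merely dereferenced) when an entry expires or the cache is cleared.

```python
import time


class EphemeralCache:
    """Minimal sketch of a cache that does not retain sensitive data:
    entries carry a TTL and are zeroized (overwritten in place) on
    expiry or explicit clearing."""

    def __init__(self, ttl_seconds: float):
        self._ttl = ttl_seconds
        self._store = {}  # key -> (expiry_time, bytearray)

    def put(self, key: str, value: bytes) -> None:
        self._store[key] = (time.monotonic() + self._ttl, bytearray(value))

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expiry, buf = entry
        if time.monotonic() >= expiry:
            self._scrub(key)
            return None
        return bytes(buf)

    def _scrub(self, key: str) -> None:
        # Overwrite the buffer before dropping the reference so the
        # plaintext does not linger in freed memory.
        _, buf = self._store.pop(key)
        for i in range(len(buf)):
            buf[i] = 0

    def clear(self) -> None:
        for key in list(self._store):
            self._scrub(key)
```

A real implementation would additionally have to account for copies made by the runtime and by lower layers of the stack, which is why the control also calls for hardware- and infrastructure-level measures.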

2. Define roles for overseeing cache security measures, involving cross-functional teams (e.g., security engineers, AI developers, compliance). Set approval workflows with senior management and security leads to ensure alignment with organizational data protection standards and risk management goals.

3. Create structured documentation standards for cache security policies, including procedures for encryption (e.g., AES-256), access control implementation (e.g., RBAC), and data retention mitigation (e.g., automatic cache expiry). Use templates to document configurations, access logs, and compliance checks.
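The RBAC element of guideline 3 can be illustrated with a small enforcement sketch. The role names, permission strings, and decorator are all hypothetical; they only show the shape of an access-control check guarding cache operations.

```python
from functools import wraps

# Hypothetical role-to-permission mapping: only roles explicitly
# granted a permission may perform the corresponding cache operation.
ROLE_PERMISSIONS = {
    "cache-admin": {"cache:read", "cache:purge"},
    "inference-service": {"cache:read"},
}


class AccessDenied(PermissionError):
    pass


def require_permission(permission: str):
    """Decorator enforcing a simple RBAC check before a cache operation."""
    def decorator(func):
        @wraps(func)
        def wrapper(role: str, *args, **kwargs):
            if permission not in ROLE_PERMISSIONS.get(role, set()):
                raise AccessDenied(f"{role} lacks {permission}")
            return func(role, *args, **kwargs)
        return wrapper
    return decorator


@require_permission("cache:purge")
def purge_cache(role: str) -> str:
    # Placeholder for the actual purge; returns a status for illustration.
    return "purged"
```

In production the role lookup would come from the organization's identity provider rather than an in-process dictionary, but the deny-by-default pattern is the same.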

4. Implement a review process for cache security policies. Conduct reviews at least annually or after significant changes (e.g., new caching mechanisms, updated LLM workloads, identified vulnerabilities). Align with standards like NIST 800-53 (SC-28 Data at Rest), OWASP Data Protection Guidelines, and AI-specific frameworks like NIST AI RMF, where applicable.

5. Define requirements for communicating cache security policies: distribute formal documentation, mandate training for teams managing LLMs (e.g., developers, operators), and run awareness campaigns on data protection risks. Ensure accessibility via internal portals and comprehension across teams.

6. Set policies for quality assurance in cache security, including requirements for encryption validation (e.g., key management audits), access control testing (e.g., penetration tests), and retention mitigation checks (e.g., memory forensics). Require audit logs for cache access and clearing events to detect unauthorized access or poisoning attempts.
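The audit-log requirement in guideline 6 might look like the following sketch, which emits one structured record per cache access or clearing event. The field names are illustrative; any schema works as long as it captures who did what to which cache entry and whether it was allowed.

```python
import json
import logging
import sys

# One logger dedicated to cache audit events, writing JSON lines.
logger = logging.getLogger("cache-audit")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)


def audit_cache_event(event: str, principal: str, cache_key: str, outcome: str) -> str:
    """Record a cache access/clearing event as a structured JSON line."""
    record = json.dumps({
        "event": event,          # e.g. "read", "write", "clear"
        "principal": principal,  # who performed the action
        "cache_key": cache_key,
        "outcome": outcome,      # "allowed" / "denied"
    }, sort_keys=True)
    logger.info(record)
    return record
```

Shipping these records to an append-only store off the host makes them usable as evidence during the reviews described under "Auditing guidelines" below.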

Auditing guidelines

Focus: The Cloud Service Provider/AI Processing Infrastructure Provider has implemented effective security measures to protect caches in their AI-optimized infrastructure, ensuring both performance optimization and protection of customer workloads, training data, and model artifacts.

1. Inquiry with Control Owners

1.1 Interview Infrastructure and Security Leadership: Interview cloud architects, hardware engineers, and security specialists responsible for AI infrastructure deployment and cache implementation. Obtain and review the organization's caching strategy and security policies covering GPU/TPU memory caches, distributed training caches, model weight storage systems, high-performance storage caches, and hardware-accelerated data pipelines. Verify that documented security requirements exist for multi-tenant cache isolation, hardware-level cache security, accelerator memory protection, shared storage cache isolation, and secure resource allocation for high-performance caches.

1.2 Review Caching Implementation Details: Examine documentation describing the technical implementation of caching within the AI infrastructure, including GPU/TPU memory management, high-bandwidth memory (HBM) allocation, NVMe and persistent memory caches, distributed file system caches, accelerator on-chip caches, and network fabric cache systems. Assess how customer workload isolation is maintained in shared accelerator environments, how training data caches are protected, and how specialized AI hardware caches are secured to prevent data leakage between tenants.

1.3 Assess Hardware-Level Cache Security: Review mechanisms implementing security for hardware-level caches used in AI computing, including GPU/TPU memory isolation, accelerator cache partitioning, NVMe controller security features, CPU cache isolation for AI workloads, and hardware-assisted memory protection. Evaluate how shared accelerator memory is protected between customer workloads, how persistent cache systems maintain isolation, and how hardware-level cache coherence mechanisms preserve security boundaries.

1.4 Evaluate Infrastructure Cache Management: Review procedures for AI infrastructure cache lifecycle management, including cache clearing between customer workloads, secure reallocation of accelerator memory, monitoring of cache utilization patterns, and isolation verification during hardware maintenance. Assess monitoring systems for detecting cache-based side-channels, memory residency attacks, accelerator resource contention, and cache poisoning attempts within shared AI infrastructure.

2. Obtaining and Verifying the Population of Records

2.1 Define the Complete Population of Cache Components: Obtain a comprehensive inventory of caching mechanisms within the AI infrastructure, including GPU/TPU memory caches, distributed training communication caches, hardware accelerator on-chip caches, high-performance storage caches, interconnect and network fabric caches, compute node memory caches, and model weight distribution systems. Include specialized hardware caches, driver-level caching mechanisms, and firmware cache implementations used throughout the AI computing infrastructure.

2.2 Verify Population Completeness: Cross-reference the cache inventory against infrastructure architecture documentation, hardware specifications, driver documentation, firmware configurations, and performance optimization strategies. Ensure the inventory aligns with available AI accelerator types, storage architectures, network fabrics, and compute resource specifications to confirm completeness of cache component identification across the cloud provider's AI infrastructure.

2.3 Categorize Cache Components by Risk Level: Segment the cache component population based on shared usage between customers, hardware proximity to customer workloads, persistence characteristics, memory technology used, performance criticality, data sensitivity exposure, and potential impact if compromised. This risk-based categorization should guide the depth and frequency of security assessment for each infrastructure caching component.
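The risk-based categorization in step 2.3 can be made concrete with a small scoring sketch. The factor names, weights, and thresholds below are illustrative assumptions, not values prescribed by the control; an auditor would calibrate them to the provider's environment.

```python
# Hypothetical weights for a few of the risk factors named in step 2.3.
RISK_WEIGHTS = {
    "shared_between_tenants": 3,
    "holds_customer_data": 3,
    "persistent": 2,
    "performance_critical": 1,
}


def risk_score(component: dict) -> int:
    """Sum the weights of the risk factors present on a cache component."""
    return sum(w for factor, w in RISK_WEIGHTS.items() if component.get(factor))


def categorize(component: dict) -> str:
    """Map a score to a tier that drives assessment depth and frequency."""
    score = risk_score(component)
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"
```

A shared GPU memory cache holding customer data would land in the "high" tier and warrant the deepest and most frequent assessment, while a single-tenant performance cache would land in "low".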

3. Inspection of Evidence

3.1 Cache Implementation Security Review: Select a representative sample of AI infrastructure caching mechanisms based on risk levels and verify the implementation of security controls, tenant isolation measures, and access restrictions. Examine cache configurations in hardware accelerators, storage systems, and network fabrics. Verify memory isolation for GPU/TPU workloads, implementation of cache partitioning between tenants, and access control enforcement for persistent cache resources.

3.2 Multi-Tenant Isolation Assessment: Review evidence of tenant isolation measures for infrastructure caches, including hardware virtualization boundaries, memory address translation controls, hypervisor-enforced isolation, physically separated cache resources where applicable, and cache flushing between tenant allocations. Evaluate how accelerator memory is protected from unauthorized access across tenant boundaries, how storage caches maintain isolation, and how hardware-level caches prevent information leakage in multi-tenant environments.

3.3 Cache Clearing and Resource Reallocation: Verify implementation of cache clearing mechanisms triggered by resource reallocation, tenant transitions, hardware maintenance events, and security incidents. Assess memory scrubbing procedures for GPU/TPU resources, cache flushing protocols for persistent storage, and verification procedures to confirm complete removal of customer data from cache systems before reallocation to different tenants.
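The scrub-then-verify procedure in step 3.3 is performed at the driver/firmware level on real infrastructure; the following user-space sketch only illustrates the verification idea an auditor would look for evidence of. The function name is an assumption.

```python
import secrets


def scrub_and_verify(buf: bytearray) -> bool:
    """Overwrite a reallocatable buffer with a random pass, then zeros,
    and verify the result before the memory is handed to another tenant.

    Illustrative only: real memory scrubbing happens below the language
    runtime (driver, firmware, or hardware), where copies and caches
    can actually be reached.
    """
    buf[:] = secrets.token_bytes(len(buf))  # random overwrite pass
    for i in range(len(buf)):
        buf[i] = 0                          # then zeroize
    return all(b == 0 for b in buf)         # verification step
```

The verification return value corresponds to the "verification procedures to confirm complete removal of customer data" that step 3.3 asks the auditor to assess.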

3.4 Hardware Accelerator Cache Controls: Assess controls for hardware accelerator cache security including memory context isolation, secure virtualization of accelerator caches, resource partitioning techniques, driver-level security controls, and firmware-enforced boundaries. Evaluate hardware-specific security features such as Multi-Instance GPU (MIG) isolation, TPU slice protection, and dedicated hardware resources for sensitive workloads.

3.5 Protection Against Infrastructure Cache Attacks: Examine the implementation of protections against cache timing side-channels, accelerator memory residency attacks, cache coherence exploits, and other hardware-level cache vulnerabilities. Assess hypervisor controls for cache isolation, memory page coloring techniques, cache flushing frequency calibration, and other mitigations for hardware cache vulnerabilities in shared AI infrastructure.

3.6 Storage Cache Security: Review protections for high-performance storage cache systems, including secure cache allocation in distributed file systems, NVMe cache isolation, persistent memory protection, and secure block cache management. Assess how cached storage blocks are protected between customer workloads, how distributed cache coherence protocols maintain security boundaries, and how caching policies prevent unauthorized data access.

3.7 Network Fabric and Interconnect Caches: Verify the security of network fabric caches and interconnect buffers, including isolation between tenant traffic flows, secure RDMA caching mechanisms, network buffer protection, and high-speed interconnect security features. Evaluate how network-level caches maintain separation between customer data paths while enabling high-throughput, low-latency communication for distributed AI workloads.

4. Evaluation and Reporting

4.1 Cache Security Effectiveness Assessment: Evaluate how well cache security implementations protect customer workloads and data while maintaining infrastructure performance. Assess the balance between caching for compute efficiency and appropriate security controls based on hardware sharing models. Evaluate the effectiveness of defenses against unauthorized access, data leakage between tenants, and cache-targeted attacks across the AI computing infrastructure.

4.2 Hardware Isolation Strategy Assessment: Assess the effectiveness of cache isolation strategies based on infrastructure architecture, hardware capabilities, virtualization technologies, and performance requirements for AI workloads. Evaluate whether security controls provide appropriate isolation given hardware constraints and whether defense-in-depth is implemented for the most sensitive shared cache resources.

4.3 Documentation and Process Adequacy: Evaluate the quality of cache-related security documentation, including clarity of hardware caching architecture, completeness of security controls, cache clearing procedures, and incident response workflows. Assess whether documentation is maintained as new accelerator hardware is deployed and as caching strategies evolve to support emerging AI computing requirements.

4.4 Continuous Improvement Mechanisms: Evaluate processes for improving cache security through regular security testing, incorporation of lessons learned from incidents, adaptation to new hardware technologies, security architecture reviews, and vulnerability management. Assess whether the organization demonstrates a commitment to continuously enhancing cache protection as new AI accelerator technologies are deployed and as understanding of hardware-level vulnerabilities evolves.

Standards mappings

ISO 42001 · Partial Gap
42001: 5.2 - AI policy
42001: B.2.2 - AI policy
42001: B.2.3 - Alignment with other organizational policies
42001: A.6.2 - AI system life cycle
27001: A.8.10 - Information deletion
27001: A.8.12 - Data leakage prevention
27001: A.8.24 - Use of cryptography
27001: A.8.26 - Application security requirements
27001: A.8.27 - Secure system architecture and engineering principles
Addendum

Introduce explicit technical protections for inference-time memory and cache data.

EU AI Act · Full Gap
No Mapping
Addendum

Requirements for protecting runtime memory and caching mechanisms used in GenAI systems. This includes safeguarding prompts, tokens, inference outputs, and contextual memory from unauthorized access or leakage. Such provisions should apply regardless of risk classification, be based on a data sensitivity assessment, and be integrated into both documentation and lifecycle management.

NIST AI 600-1 · No Gap
MS-2.7-001
Addendum

N/A

BSI AIC4 · No Gap
C4 DM-02
C5 COS-01
Addendum

N/A

AI-CAIQ questions (1)

AIS-14.1

Are security measures implemented to protect cache systems in GenAI systems and services?