AICM Atlas · CSA AI Controls Matrix
TVM · Threat & Vulnerability Management
TVM-11 · AI-Specific

Guardrails

Specification

Define and implement processes, procedures and technical measures to apply guardrails to the AI system. Continuously evaluate guardrails for changes in regulatory requirements and risk scenarios.

Threat coverage

Model manipulation
Data poisoning
Sensitive data disclosure
Model theft
Model/Service Failure
Insecure supply chain
Insecure apps/plugins
Denial of Service
Loss of governance

Architectural relevance

Physical infrastructure
Network
Compute
Storage
Application
Data

Lifecycle

Preparation

Data storage

Development

Guardrails

Evaluation

Validation/Red Teaming, Re-evaluation

Deployment

Orchestration, AI Services supply chain, AI applications

Delivery

Operations, Maintenance, Continuous monitoring, Continuous improvement

Retirement

Not applicable

Ownership / SSRM

PI (Physical Infrastructure)

Shared Cloud Service Provider-Model Provider (Shared CSP-MP)

The CSP and MP are jointly responsible and accountable for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer.

Model

Owned by the Model Provider (MP)

The model provider (MP) designs, develops, and implements the control as part of their services or products to mitigate security, privacy, or compliance risks associated with the Large Language Model (LLM). Model Providers are entities that develop, train, and distribute foundational and fine-tuned AI models for various applications. They create the underlying AI capabilities that other actors build upon. Model Providers are responsible for model architecture, training methodologies, performance characteristics, and documentation of capabilities and limitations. They operate at the foundation layer of the AI stack and may provide direct API access to their models. Examples: OpenAI (GPT, DALL-E, Whisper), Anthropic (Claude), Google (Gemini), Meta (Llama), as well as any customized model.

Orchestrated

Shared Model Provider-Orchestrated Service Provider (Shared MP-OSP)

The MP and OSP are jointly responsible and accountable for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer.

Application

Owned by the Application Provider (AP)

The Application Provider (AP) is responsible for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer. The AP is responsible and accountable for the implementation of the control within its own infrastructure/environment. If the control has downstream implications for users/customers, the AP is responsible for enabling the customer and/or upstream partner in the implementation/configuration of the control within their risk management approach. The AP is accountable for carrying out due diligence on its upstream providers (e.g., MPs, Orchestrated Services) to verify that they implement the control as it relates to the service/product developed and offered by the AP. These providers build and offer end-user applications that leverage generative AI models for specific tasks such as content creation, chatbots, code generation, and enterprise automation. These applications are often delivered as software-as-a-service (SaaS) solutions. These providers focus on user interfaces, application logic, domain-specific functionality, and overall user experience rather than underlying model development. Examples: OpenAI (GPTs, Assistants), Zapier, CustomGPT, Microsoft Copilot (integrated into Office products), Jasper (AI-driven content generation), Notion AI (AI-enhanced productivity tools), Adobe Firefly (AI-generated media), and AI-powered customer service solutions like Amazon Rufus, as well as any organization that develops its AI-based application internally.

Implementation guidelines

[All Actors]
1. Identify critical guardrail requirements by reviewing applicable regulatory standards (e.g., GDPR) and internal policies, and by assessing the potential impact on safety, privacy, fairness, and transparency.

2. Implement Technical Safeguards (a code sketch combining these measures follows this item):
a. Develop Input and Output Validation Mechanisms:
- Ensure the AI model processes only authorized inputs and generates outputs within pre-defined acceptable parameters.
- Apply filters to flag or reject anomalous inputs or outputs, particularly in cases where outputs could be harmful.
- Leverage canary tokens to detect potential prompt leakage attacks and block responses containing unauthorized system instructions.
b. Set Thresholds for Model Confidence:
- Establish a threshold confidence level below which the AI system must defer to human intervention or request additional validation.
- Log instances when outputs fall below the threshold for future analysis and system improvement.
c. Integrate Risk-Based Guardrails:
- Apply risk-scoring methods to outputs, adjusting system responses based on the risk level (e.g., higher scrutiny for high-impact decisions).
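
The sketch below illustrates one way to combine 2a-2c in code. The pattern list, canary token handling, confidence threshold, and risk-scoring function are illustrative assumptions, not prescribed by this control; a production system would substitute policy-driven filters, trained classifiers, and real alerting hooks.

```python
# Minimal guardrail-pipeline sketch (illustrative only).
import logging
import re
import secrets

logger = logging.getLogger("guardrails")

# Hypothetical values -- a real deployment derives these from policy.
CANARY_TOKEN = secrets.token_hex(8)        # embedded in the system prompt
CONFIDENCE_THRESHOLD = 0.75                # below this, defer to a human (2b)
BLOCKED_INPUT_PATTERNS = [
    re.compile(r"ignore (all|previous) instructions", re.I),
]

def validate_input(prompt: str) -> bool:
    """Reject inputs matching known prompt-injection patterns (2a)."""
    return not any(p.search(prompt) for p in BLOCKED_INPUT_PATTERNS)

def risk_score(output: str) -> float:
    """Toy risk score (2c); real systems would use a classifier or policy engine."""
    return 1.0 if "DROP TABLE" in output else 0.1

def apply_guardrails(prompt: str, output: str, confidence: float) -> str:
    if not validate_input(prompt):
        logger.warning("Input rejected by guardrail filter")
        return "[BLOCKED: input failed validation]"
    if CANARY_TOKEN in output:             # canary leak => system prompt exposed
        logger.critical("Canary token in output; blocking response")
        return "[BLOCKED: potential prompt leakage]"
    if confidence < CONFIDENCE_THRESHOLD:  # log for analysis, defer to human
        logger.info("Low confidence (%.2f); deferring to human review", confidence)
        return "[DEFERRED: pending human validation]"
    if risk_score(output) > 0.8:           # higher scrutiny for high-risk outputs
        logger.warning("High risk score; escalating for review")
        return "[ESCALATED: held for high-impact review]"
    return output
```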

3. Develop Fail-Safe Mechanisms (see the sketch after this item):
a. Implement Emergency Stop Mechanisms:
- Include a shutdown feature that can be triggered automatically in response to detected anomalies, or manually by a human supervisor.
- Ensure that shutdowns do not lead to data loss, and create recovery checkpoints where possible.
b. Set Up Recovery Protocols:
- Establish recovery protocols that enable safe system restart following a shutdown, ensuring that unresolved issues are addressed before the system is reactivated.
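
One possible shape for 3a and 3b, assuming file-based checkpoints; the class, file layout, and restart condition are illustrative only:

```python
import json
import time
from pathlib import Path

class EmergencyStop:
    """Hypothetical fail-safe: halts the system and checkpoints state (3a),
    and refuses to restart until the triggering issue is resolved (3b)."""

    def __init__(self, checkpoint_dir: str = "checkpoints"):
        self.dir = Path(checkpoint_dir)
        self.dir.mkdir(exist_ok=True)
        self.stopped = False

    def trigger(self, reason: str, state: dict) -> None:
        # Persist a recovery checkpoint before halting, so no data is lost.
        path = self.dir / f"checkpoint-{int(time.time())}.json"
        path.write_text(json.dumps({"reason": reason, "state": state}))
        self.stopped = True    # downstream components must honor this flag

    def restart(self, issue_resolved: bool) -> dict:
        # Safe-restart protocol: stay offline until the issue is addressed.
        if not issue_resolved:
            raise RuntimeError("Unresolved issue; system must remain stopped")
        latest = max(self.dir.glob("checkpoint-*.json"))  # most recent checkpoint
        self.stopped = False
        return json.loads(latest.read_text())["state"]
```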

4. Establish Governance and Monitoring Procedures (see the sketch after this item):
a. Assign a Guardrail Review Team:
- Designate a team responsible for reviewing guardrail effectiveness, responding to flagged issues, and recommending guardrail adjustments based on system performance and risk assessment.
- Ensure the team includes personnel capable of intervening and making decisions in case of system failures or unexpected outputs.
b. Continuously Monitor System Performance and Risk Indicators:
- Implement continuous monitoring tools to track system performance, accuracy, and risk indicators in real time.
- Set up automated alerts to notify the guardrail review team of anomalies or potential risks.
- Include human-in-the-loop (HITL) mechanisms for reviewing flagged anomalies or when the system reaches critical decision thresholds.
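
A minimal monitoring sketch for 4b, assuming a rolling error-rate window; the threshold, window size, and notify() stub are placeholders for real telemetry and paging integrations:

```python
import statistics
from collections import deque

ERROR_RATE_ALERT = 0.05            # alert if >5% of recent outputs are flagged

recent_flags = deque(maxlen=1000)  # rolling window of pass/fail indicators
hitl_queue: list[dict] = []        # items awaiting human-in-the-loop review

def notify(team: str, message: str) -> None:
    print(f"[ALERT -> {team}] {message}")   # stand-in for a real pager/webhook

def record_output(flagged: bool, payload: dict) -> None:
    recent_flags.append(1 if flagged else 0)
    if flagged:
        hitl_queue.append(payload)          # route anomaly to human reviewers
    if len(recent_flags) == recent_flags.maxlen:
        rate = statistics.mean(recent_flags)
        if rate > ERROR_RATE_ALERT:         # automated alert to the review team
            notify("guardrail-review-team",
                   f"Flag rate {rate:.1%} exceeds {ERROR_RATE_ALERT:.0%} threshold")
```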

5. Define Operational Protocols for Guardrail Adjustments (see the sketch after this item):
a. Regularly Review and Update Guardrails:
- Conduct regular reviews of guardrails, considering changes in regulatory requirements, new risk scenarios, and feedback from system monitoring.
- Ensure that human oversight is involved when adjustments to guardrails are necessary, particularly when dealing with high-impact or high-risk decisions.
b. Document Adjustments and Interventions:
- Maintain a log of all guardrail adjustments and system interventions, including the rationale for changes and the outcome of each intervention.
- Conduct periodic audits to evaluate the effectiveness of guardrails and identify opportunities for improvement.
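
For 5b, an append-only adjustment log can capture the rationale and outcome of each change; the CSV schema and field names below are illustrative assumptions to be adapted to the organization's audit requirements:

```python
import csv
import datetime
from pathlib import Path

LOG_FILE = Path("guardrail_adjustments.csv")
FIELDS = ["timestamp", "guardrail", "change", "rationale", "approved_by", "outcome"]

def log_adjustment(guardrail: str, change: str, rationale: str,
                   approved_by: str, outcome: str = "pending") -> None:
    """Append one record per guardrail change so audits can trace every adjustment."""
    write_header = not LOG_FILE.exists()
    with LOG_FILE.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "guardrail": guardrail,
            "change": change,
            "rationale": rationale,       # why the change was made
            "approved_by": approved_by,   # human oversight for high-impact changes
            "outcome": outcome,           # updated after the intervention is assessed
        })
```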

Auditing guidelines

1. Verify that the Cloud Service Provider (CSP) has defined processes, procedures, and technical measures to apply guardrails to the AI system. Ensure that the processes are documented in detail, covering scope, objectives, and roles and responsibilities.

2. Examine whether the above-mentioned processes, procedures, and technical measures are compliant with relevant regulatory requirements and industry best practices.

3. Confirm that the above-mentioned processes, procedures, and technical measures are concretely and appropriately implemented.

4. Inspect whether the above-mentioned processes, procedures, and technical measures are monitored against sets of efficacy and efficiency metrics / indicators.

5. Inspect whether the above-mentioned processes, procedures, and technical measures are periodically reviewed and updated by responsible parties.

Standards mappings

ISO 42001 (No Gap)
42001: A.6.1 Risk assessment for AI systems
42001: A.6.2.6 AI system operation and monitoring
42001: A.6.3.2 Planning of AI-specific controls
42001: A.7.4 Quality of data for AI systems
42001: A.8.3 AI system impact assessment
27001: 5.7 Threat intelligence
27001: 8.8 Management of technical vulnerabilities
27001: 9.1 Monitoring, measurement, analysis and evaluation
27002: 8.12 Secure software development
Addendum

N/A

EU AI Act (No Gap)
Article 9 (5) (a)
Article 15 (1)
Article 15 (2)
Addendum

N/A

NIST AI 600-1 (No Gap)
MS-2.5-006
Addendum

N/A

BSI AIC4 (No Gap)
C4 SR-02
C4 SR-05
C4 SR-06
Addendum

N/A

AI-CAIQ questions (2)

TVM-11.1

Are processes, procedures, and technical measures to apply guardrails to the AI system defined and implemented?

TVM-11.2

Are guardrails continuously evaluated for changes in regulatory requirements and risk scenarios?