Data Provenance and Transparency
Specification
Define, implement, and evaluate processes, procedures, and technical measures to: 1) document and trace data sources, and 2) make the data sources available according to legal and regulatory requirements.
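Documenting and tracing a data source can be implemented as a structured provenance record captured at ingestion time. The sketch below is a minimal, hypothetical illustration in Python; field names such as `source_uri` and `content_sha256` are assumptions for illustration, not a prescribed schema.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    """Minimal provenance entry for one data source."""
    source_uri: str      # where the data came from
    data_type: str       # e.g., "text corpus", "telemetry logs"
    data_format: str     # e.g., "jsonl", "parquet"
    collected_at: str    # ISO-8601 UTC timestamp of ingestion
    content_sha256: str  # hash of the raw payload, for later integrity checks

def record_source(uri: str, data_type: str, data_format: str,
                  payload: bytes) -> ProvenanceRecord:
    """Create a provenance record, hashing the payload so any later
    modification of the stored data can be detected."""
    return ProvenanceRecord(
        source_uri=uri,
        data_type=data_type,
        data_format=data_format,
        collected_at=datetime.now(timezone.utc).isoformat(),
        content_sha256=hashlib.sha256(payload).hexdigest(),
    )

record = record_source("s3://corpus/news.jsonl", "text corpus",
                       "jsonl", b"example payload")
print(json.dumps(asdict(record), indent=2))
```

Serializing each record (e.g., to an append-only log) gives auditors a traceable inventory of sources; the content hash supports requirement 2 by letting the organization demonstrate that the data made available is the data that was originally collected.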
Threat coverage
Architectural relevance
Lifecycle
Data collection, Data curation, Data storage
Design
Validation/Red Teaming
Orchestration
Operations, Continuous improvement
Archiving, Data deletion
Ownership / SSRM
PI
Shared across the supply chain
Shared control ownership refers to responsibilities and activities related to LLM security that are distributed across multiple stakeholders within the AI supply chain, including the Cloud Service Provider (CSP), Model Provider (MP), Orchestrated Service Provider (OSP), Application Provider (AP), and Customer (AIC). These controls require coordinated actions, communication, and governance across all involved parties to ensure their effectiveness.
Model
Shared Cloud Service Provider-Model Provider (Shared CSP-MP)
The CSP and MP are jointly responsible and accountable for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer.
Orchestrated
Shared Model Provider-Orchestrated Service Provider (Shared MP-OSP)
The MP and OSP are jointly responsible and accountable for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer.
Application
Shared Application Provider-AI Customer (Shared AP-AIC)
The AP and AIC both share responsibility and accountability for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they offer and consume.
Implementation guidelines
Auditing guidelines
1. Verify that all infrastructure-level data sources (e.g., logs, metrics, model checkpoints) are documented with source, type, and format.
2. Verify that lineage information is maintained for operational data, showing data flow from ingestion to storage or processing.
3. Verify that dictionaries or schemas exist for metadata and logs captured during AI processing.
4. Verify that provenance tracking includes infrastructure actions such as resource provisioning, access history, and pipeline changes.
5. Verify that automated systems monitor changes to infrastructure datasets and logs (e.g., audit logs, object storage access).
6. Verify that system-level controls ensure the integrity of data and metadata in transit and at rest.
7. Verify that operational processes handle data volume growth, privacy-sensitive logs, and infrastructure-specific complexities.
8. Verify that infrastructure-level data practices comply with laws and cloud service obligations (e.g., data residency, retention).
9. Verify that encryption and granular access controls are in place for all customer and operational data.
10. Verify that data cleanup and deletion are performed according to retention schedules.
11. Verify that infrastructure personnel are trained in secure data lifecycle handling and monitoring practices.
12. Verify that metadata about AI data processing and resource usage can be produced for audit or forensic purposes.
13. Verify that versioning is applied to deployment templates, pipelines, and logs associated with infrastructure-level AI workloads.
14. Verify that disclosure protocols are defined for infrastructure data and comply with legal and contractual frameworks.
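Item 6 (integrity of data and metadata) can be spot-checked by recomputing content hashes against previously recorded values. Below is a hedged Python sketch, assuming a simple manifest that maps artifact names to SHA-256 digests; the manifest format and store layout are illustrative assumptions, not a required design.

```python
import hashlib

def sha256_bytes(data: bytes) -> str:
    """SHA-256 digest of a byte payload, hex-encoded."""
    return hashlib.sha256(data).hexdigest()

def verify_manifest(manifest: dict[str, str], store: dict[str, bytes]) -> list[str]:
    """Return the names of artifacts whose current content no longer
    matches the hash recorded in the manifest (missing artifacts also fail)."""
    return [
        name
        for name, expected in manifest.items()
        if sha256_bytes(store.get(name, b"")) != expected
    ]

# Build a manifest from a known-good state, then simulate tampering.
store = {"train.jsonl": b"row1\nrow2\n", "eval.jsonl": b"rowA\n"}
manifest = {name: sha256_bytes(data) for name, data in store.items()}
store["eval.jsonl"] = b"tampered\n"  # unauthorized modification
print(verify_manifest(manifest, store))  # → ['eval.jsonl']
```

In practice the same check would run against object storage or a dataset registry on a schedule, with failures feeding the monitoring systems described in item 5.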
Standards mappings
42001: A.2.3 Alignment with other organizational policies
42001: A.4.3 Data Resources
42001: A.5.4 Assessing AI system impact on individuals or groups of individuals
42001: A.7.5 Data Provenance
42001: A.8.2 System documentation and information for users
42001: A.9.3 Objectives for responsible use of AI system
27001: A.5.9 Inventory of Information and Other Associated Assets
27001: A.5.12 Classification of Information
27001: A.5.13 Labelling of Information
27001: A.5.14 Information Transfer
27001: A.5.19 Information Security in Supplier Relationships
27001: A.5.28 Collection of Evidence
27001: A.5.15 Access Control
27001: A.8.11 Data masking
27001: A.8.12 Data leakage prevention
27001: A.8.15 Logging
27001: A.8.16 Monitoring Activities
27002: 5.9 Inventory of Information and Other Associated Assets
27002: 5.12 Classification of Information
27002: 5.13 Labelling of Information
27002: 5.14 Information Transfer
27002: 5.19 Information Security in Supplier Relationships
27002: 5.28 Collection of Evidence
27002: 5.15 Access Control
27002: 8.11 Data masking
27002: 8.12 Data leakage prevention
27002: 8.15 Logging
27002: 8.16 Monitoring Activities
Addendum
N/A
Article 10 (2)
Article 11 (1)
Addendum
N/A
GV-1.1-001
GV-1.6-003
MP-2.2-001
MG-2.2-002
MG-3.2-003
Addendum
NIST AI 600-1 covers only training data for item 1 of DSP-20 (documenting and tracing data sources).
BC-05
DM-03
DQ-02
Addendum
N/A
AI-CAIQ questions (1)
Are processes, procedures, and technical measures defined, implemented, and evaluated to: 1) document and trace data sources, and 2) make the data sources available according to legal and regulatory requirements?