Data Provenance and Transparency
Specification
Define, implement, and evaluate processes, procedures, and technical measures to: 1) document and trace data sources, and 2) make the data sources available according to legal and regulatory requirements.
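Documenting and tracing a data source can be implemented as a structured provenance record captured at ingestion time. The sketch below is a minimal, hypothetical illustration in Python; field names such as `source_uri` and `content_sha256` are assumptions for illustration, not a prescribed schema.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    """Minimal provenance entry for one data source."""
    source_uri: str      # where the data came from
    data_type: str       # e.g., "text corpus", "telemetry logs"
    data_format: str     # e.g., "jsonl", "parquet"
    collected_at: str    # ISO-8601 UTC timestamp of ingestion
    content_sha256: str  # hash of the raw payload, for later integrity checks

def record_source(uri: str, data_type: str, data_format: str,
                  payload: bytes) -> ProvenanceRecord:
    """Create a provenance record, hashing the payload so any later
    modification of the stored data can be detected."""
    return ProvenanceRecord(
        source_uri=uri,
        data_type=data_type,
        data_format=data_format,
        collected_at=datetime.now(timezone.utc).isoformat(),
        content_sha256=hashlib.sha256(payload).hexdigest(),
    )

record = record_source("s3://corpus/news.jsonl", "text corpus",
                       "jsonl", b"example payload")
print(json.dumps(asdict(record), indent=2))
```

Serializing each record (e.g., to an append-only log) gives auditors a traceable inventory of sources; the content hash supports requirement 2 by letting the organization demonstrate that the data made available is the data that was originally collected.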
Threat coverage
Architectural relevance
Lifecycle
Data collection, Data curation, Data storage
Design
Validation/Red Teaming
Orchestration
Operations, Continuous improvement
Archiving, Data deletion
Ownership / SSRM
PI
Shared across the supply chain
Shared control ownership refers to responsibilities and activities related to LLM security that are distributed across multiple stakeholders within the AI supply chain, including the Cloud Service Provider (CSP), Model Provider (MP), Orchestrated Service Provider (OSP), Application Provider (AP), and Customer (AIC). These controls require coordinated actions, communication, and governance across all involved parties to ensure their effectiveness.
Model
Shared Cloud Service Provider-Model Provider (Shared CSP-MP)
The CSP and MP are jointly responsible and accountable for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer.
Orchestrated
Shared Model Provider-Orchestrated Service Provider (Shared MP-OSP)
The MP and OSP are jointly responsible and accountable for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they develop and offer.
Application
Shared Application Provider-AI Customer (Shared AP-AIC)
The AP and AIC both share responsibility and accountability for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with Large Language Model (LLM)/GenAI technologies in the context of the services or products they offer and consume.
Implementation guidelines
Auditing guidelines
1. Verify that all infrastructure-level data sources (e.g., logs, metrics, model checkpoints) are documented with source, type, and format.
2. Verify that lineage information is maintained for operational data, showing data flow from ingestion to storage or processing.
3. Verify that dictionaries or schemas exist for metadata and logs captured during AI processing.
4. Verify that provenance tracking includes infrastructure actions such as resource provisioning, access history, and pipeline changes.
5. Verify that automated systems monitor changes to infrastructure datasets and logs (e.g., audit logs, object storage access).
6. Verify that system-level controls ensure the integrity of data and metadata in transit and at rest.
7. Verify that operational processes handle data volume growth, privacy-sensitive logs, and infrastructure-specific complexities.
8. Verify that infrastructure-level data practices comply with laws and cloud service obligations (e.g., data residency, retention).
9. Verify that encryption and granular access controls are in place for all customer and operational data.
10. Verify that data cleanup and deletion are performed according to retention schedules.
11. Verify that infrastructure personnel are trained in secure data lifecycle handling and monitoring practices.
12. Verify that metadata about AI data processing and resource usage can be produced for audit or forensic purposes.
13. Verify that versioning is applied to deployment templates, pipelines, and logs associated with infrastructure-level AI workloads.
14. Verify that disclosure protocols are defined for infrastructure data and comply with legal and contractual frameworks.
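Item 6 (integrity of data and metadata) can be spot-checked by recomputing content hashes against previously recorded values. Below is a hedged Python sketch, assuming a simple manifest that maps artifact names to SHA-256 digests; the manifest format and store layout are illustrative assumptions, not a required design.

```python
import hashlib

def sha256_bytes(data: bytes) -> str:
    """SHA-256 digest of a byte payload, hex-encoded."""
    return hashlib.sha256(data).hexdigest()

def verify_manifest(manifest: dict[str, str], store: dict[str, bytes]) -> list[str]:
    """Return the names of artifacts whose current content no longer
    matches the hash recorded in the manifest (missing artifacts also fail)."""
    return [
        name
        for name, expected in manifest.items()
        if sha256_bytes(store.get(name, b"")) != expected
    ]

# Build a manifest from a known-good state, then simulate tampering.
store = {"train.jsonl": b"row1\nrow2\n", "eval.jsonl": b"rowA\n"}
manifest = {name: sha256_bytes(data) for name, data in store.items()}
store["eval.jsonl"] = b"tampered\n"  # unauthorized modification
print(verify_manifest(manifest, store))  # → ['eval.jsonl']
```

In practice the same check would run against object storage or a dataset registry on a schedule, with failures feeding the monitoring systems described in item 5.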
Standards mappings
42001: A.2.3 Alignment with other organizational policies
42001: A.4.3 Data Resources
42001: A.5.4 Assessing AI system impact on individuals or groups of individuals
42001: A.7.5 Data Provenance
42001: A.8.2 System documentation and information for users
42001: A.9.3 Objectives for responsible use of AI system
27001: A.5.9 Inventory of Information and Other Associated Assets
27001: A.5.12 Classification of Information
27001: A.5.13 Labelling of Information
27001: A.5.14 Information Transfer
27001: A.5.19 Information Security in Supplier Relationships
27001: A.5.28 Collection of Evidence
27001: A.5.15 Access Control
27001: A.8.11 Data masking
27001: A.8.12 Data leakage prevention
27001: A.8.15 Logging
27001: A.8.16 Monitoring Activities
27002: 5.9 Inventory of Information and Other Associated Assets
27002: 5.12 Classification of Information
27002: 5.13 Labelling of Information
27002: 5.14 Information Transfer
27002: 5.19 Information Security in Supplier Relationships
27002: 5.28 Collection of Evidence
27002: 5.15 Access Control
27002: 8.11 Data masking
27002: 8.12 Data leakage prevention
27002: 8.15 Logging
27002: 8.16 Monitoring Activities
Addendum
N/A
Article 10 (2)
Article 11 (1)
Addendum
N/A
GV-1.1-001
GV-1.6-003
MP-2.2-001
MG-2.2-002
MG-3.2-003
Addendum
NIST AI 600-1 covers only training data for item 1 of DSP-20 (documenting and tracing data sources).
BC-05
DM-03
DQ-02
Addendum
N/A
AI-CAIQ questions (1)
Are processes, procedures, and technical measures defined, implemented, and evaluated to: 1) document and trace data sources, and 2) make the data sources available according to legal and regulatory requirements?