DSP · Data Security and Privacy Lifecycle Management

DSP-23 · AI-Specific

Data Integrity Check

Specification

Regularly validate the consistency and conformity of training, fine-tuning or augmentation data. Implement dataset versioning to ensure traceability and enforce restrictions to prevent unauthorized changes.

Threat coverage

Model manipulation

Data poisoning

Sensitive data disclosure

Model theft

Model/Service Failure

Insecure supply chain

Insecure apps/plugins

Denial of Service

Loss of governance

Architectural relevance

Physical infrastructure

Network

Compute

Storage

Application

Data

Lifecycle

Preparation

Data curation, Data storage

Development

Training, Guardrails

Evaluation

Validation/Red Teaming

Deployment

Orchestration, AI Services supply chain

Delivery

Operations, Continuous monitoring

Retirement

Data deletion, Archiving

Ownership / SSRM

Owned by the Customer (AIC)

The Customer (AIC) is responsible and accountable for the design, development, implementation, and enforcement of the control to mitigate security, privacy, or compliance risks associated with the Large Language Model (LLM)/GenAI technologies, services, or products they consume.

Model

Owned by the Customer (AIC)

Orchestrated

Owned by the Customer (AIC)

Application

Owned by the Customer (AIC)

Implementation guidelines

[All Actors]
1. Data Consistency and Conformity Checks
a. Implement a process to validate datasets for schema consistency, format conformity, and label accuracy.
b. Define data quality metrics (missing values, duplication rate, label distribution) and monitor deviations over time.

2. Dataset Versioning
a. Maintain change logs that record each change together with its rationale and approver history.
b. Link each dataset version to the corresponding model version and training configuration.

3. Access Control
a. Implement role-based access controls so that access is limited to authorized personnel.
b. Implement approval workflows for any modification to AI datasets.

4. Monitoring
a. Conduct periodic reviews to detect drift, corruption, or misuse of AI datasets.
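The consistency checks in item 1 and the periodic drift reviews in item 4 can be approximated with a small script. The following is a minimal Python sketch, not a prescribed tool: it assumes each record is a dict with hypothetical `text` and `label` fields, and the `SCHEMA`, the 0.05 tolerance, and the use of total variation distance as the label-drift measure are illustrative choices rather than requirements of this control.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical schema for illustration: field name -> expected Python type.
SCHEMA = {"text": str, "label": str}


@dataclass
class QualityReport:
    schema_errors: int        # rows with wrong columns, plus cells with the wrong type
    missing_rate: float       # share of required cells that are None or ""
    duplicate_rate: float     # share of rows that exactly repeat an earlier row
    label_distribution: dict  # label -> relative frequency


def quality_report(rows, schema=SCHEMA, label_field="label"):
    """Compute the item 1b metrics for a list of dict records."""
    n = len(rows)
    schema_errors = 0
    missing_cells = 0
    for row in rows:
        if set(row) != set(schema):
            schema_errors += 1  # missing or unexpected columns
        for field, expected_type in schema.items():
            value = row.get(field)
            if value is None or value == "":
                missing_cells += 1
            elif not isinstance(value, expected_type):
                schema_errors += 1  # format/type nonconformity
    # Canonical, hashable form of each row so exact duplicates can be counted.
    keys = [tuple(sorted((k, repr(v)) for k, v in row.items())) for row in rows]
    duplicate_rate = (n - len(set(keys))) / n if n else 0.0
    counts = Counter(row.get(label_field) for row in rows)
    return QualityReport(
        schema_errors=schema_errors,
        missing_rate=missing_cells / (n * len(schema)) if n else 0.0,
        duplicate_rate=duplicate_rate,
        label_distribution={label: c / n for label, c in counts.items()},
    )


def deviations(current, baseline, tolerance=0.05):
    """Return human-readable alerts where `current` drifts from `baseline`."""
    alerts = []
    if current.schema_errors:
        alerts.append(f"schema errors: {current.schema_errors}")
    for name in ("missing_rate", "duplicate_rate"):
        cur, base = getattr(current, name), getattr(baseline, name)
        if cur - base > tolerance:
            alerts.append(f"{name} rose from {base:.3f} to {cur:.3f}")
    # Label drift measured as total variation distance between the distributions.
    labels = set(current.label_distribution) | set(baseline.label_distribution)
    tvd = 0.5 * sum(
        abs(current.label_distribution.get(label, 0.0)
            - baseline.label_distribution.get(label, 0.0))
        for label in labels
    )
    if tvd > tolerance:
        alerts.append(f"label distribution shifted (TVD={tvd:.3f})")
    return alerts
```

In practice, the baseline report would be computed once for an approved dataset version and stored with it, so that each periodic review compares new or modified data against that version rather than against an ad hoc reference.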

Auditing guidelines

1. Verify that all data sources handled by the infrastructure services are identified and traceable.

2. Verify that logging systems track all changes or updates to data processed or stored on infrastructure platforms.

3. Verify that automated integrity monitoring tools are implemented at the infrastructure layer to detect anomalies.

4. Verify that infrastructure access controls prevent unauthorized data modifications.

5. Verify that encryption is enforced for sensitive data at rest and in transit within infrastructure systems.

6. Verify that version control tracks changes to datasets and AI models managed by the infrastructure.

7. Verify that infrastructure staff are trained on data integrity best practices and system controls.

8. Verify that documented procedures address data integrity incidents occurring within infrastructure services.
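To support audit items 2, 3, 4 and 6, a dataset version can be pinned by recording a SHA-256 hash of every file together with the linked model version and training configuration (implementation item 2b). The sketch below is one possible approach under stated assumptions, not a mandated mechanism: the function names and manifest fields are hypothetical, and it assumes the manifest itself is stored where only the approval workflow can write to it, since a manifest that can be edited alongside the data proves nothing.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large shards need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(data_dir):
    """Map each file's POSIX-style relative path to its SHA-256 digest."""
    root = Path(data_dir)
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def build_manifest(data_dir, dataset_version, model_version, training_config):
    """Hypothetical version record linking dataset contents to a model and config."""
    return {
        "dataset_version": dataset_version,
        "model_version": model_version,
        "training_config": training_config,
        "files": hash_tree(data_dir),
    }


def verify_manifest(data_dir, manifest):
    """Compare the directory on disk against a previously recorded manifest."""
    recorded, current = manifest["files"], hash_tree(data_dir)
    return {
        "modified": sorted(f for f in recorded.keys() & current.keys()
                           if recorded[f] != current[f]),
        "missing": sorted(recorded.keys() - current.keys()),
        "added": sorted(current.keys() - recorded.keys()),
    }
```

Running `verify_manifest` on a schedule gives a simple automated integrity check; any non-empty list indicates a change that should be traceable to an approved entry in the change log, and otherwise handled under the documented incident procedures.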

Standards mappings

ISO 42001 · No Gap

42001: A.5.2 AI system impact assessment process
42001: A.6.1.3 Processes for responsible design and development of AI systems
42001: 6.3 Planning of changes
42001: A.7.2 Data for development and enhancement of AI system
42001: A.7.3 Acquisition of data
42001: 7.5.3 Control of documented information

Addendum

N/A

EU AI Act · No Gap

Article 10 (2)
Article 11 (1)
Article 15
Recital 67

Addendum

N/A

NIST AI 600-1 · No Gap

GV-6.1-008
MS-2.5-005
MS-2.8-003
MG-4.1-006

Addendum

N/A

BSI AIC4 · No Gap

DQ-03

Addendum

N/A

AI-CAIQ questions (2)

DSP-23.1

Is the consistency and conformity of training, fine-tuning or augmentation data regularly validated?

DSP-23.2

Is dataset versioning implemented to ensure traceability, and are restrictions enforced to prevent unauthorized changes?