How We Protect Privacy

Here’s a practical way to configure raia so customer data stays inside the guardrails of major privacy laws (GDPR/UK GDPR, CCPA/CPRA), and your AI Agents don’t accidentally leak or over-retain personal data.

1) Identity & access (who can see/do what)

  • Enforce SSO + MFA for all raia admins and builders.

  • Turn on agent-level RBAC: for each Agent, grant the minimum roles and restrict which users/teams can invoke it.

  • Create least-privilege data scopes per Agent (e.g., Support Agent → read-only to tickets; Sales Agent → CRM “leads only”). Why: These align with GDPR’s data-minimisation & integrity/confidentiality principles and UK ICO guidance on proportionality and access control. (ICO)
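
To make least-privilege scopes concrete, here is a minimal sketch of a deny-by-default scope table. The agent names, resource keys, and is_allowed helper are illustrative assumptions, not raia's actual RBAC API.

```python
# Hypothetical least-privilege scope table; not raia's real RBAC format.
AGENT_SCOPES = {
    "support-agent": {"tickets": {"read"}},           # read-only tickets
    "sales-agent": {"crm.leads": {"read", "write"}},  # leads only, no full CRM
}

def is_allowed(agent: str, resource: str, action: str) -> bool:
    """Deny by default: an Agent may only touch resources it was granted."""
    return action in AGENT_SCOPES.get(agent, {}).get(resource, set())

assert is_allowed("support-agent", "tickets", "read")
assert not is_allowed("support-agent", "crm.leads", "read")  # out of scope
```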

2) Data ingestion & indexing (what goes into the model context)

  • Field/record filters at the connector: exclude sensitive fields (SSN, bank, health) before ingest.

  • PII redaction pipeline (pre-vectorization): mask emails, phone numbers, and national IDs; keep a reversible token only if the use case absolutely requires it (see the sketch after this list).

  • Purpose tags on data sources (e.g., “Support-answering only”) and bind Agents to allowed purposes.

  • Read-through vs. copy: prefer real-time “read-through” access to systems of record rather than copying full datasets. Why: Enforces purpose limitation & minimisation under GDPR Art. 5. (GDPR, ICO)
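
As referenced in the redaction bullet above, here is a minimal pre-vectorization masking sketch. The regexes are deliberately simple placeholders; a production pipeline should rely on a vetted PII-detection library rather than ad-hoc patterns.

```python
import re

# Placeholder patterns for illustration only; use a vetted PII detector in production.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Mask PII before a document is chunked and vectorized."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Reach Jo at jo@example.com or 555-867-5309."))
# -> Reach Jo at [EMAIL] or [PHONE].
```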

3) Retrieval & prompt guardrails (what the model can pull/use)

  • Retriever allowlists: restrict each Agent to specific indices/collections and tenants/customer IDs (see the sketch after this list).

  • Top-K and domain fences: cap results (e.g., K≤5) and disallow open web unless explicitly approved.

  • Grounding-only mode for regulated Agents: responses must cite retrieved internal docs; otherwise decline.

  • Prompt rules:

    • “Never reveal secrets or raw PII unless the verified user is the data subject and the task requires it.”

    • “If asked for PII, verify identity, check purpose, log the disclosure.” Why: Maps to GDPR transparency & accountability; NIST AI RMF recommends explicit controls around data use. (NIST)
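
A minimal sketch of the retriever fence described in this list: the index names, retriever callable, and error handling are assumptions for illustration, not raia's actual retrieval API.

```python
# Hypothetical retrieval guard; index names and retriever interface are assumed.
ALLOWED_INDICES = {"support-agent": {"kb-support", "kb-faq"}}
MAX_K = 5  # top-K fence

def guarded_retrieve(agent: str, index: str, query: str, k: int, retriever):
    # Allowlist check: the Agent may only search indices it was granted.
    if index not in ALLOWED_INDICES.get(agent, set()):
        raise PermissionError(f"{agent} may not read index {index!r}")
    docs = retriever(index, query, k=min(k, MAX_K))  # cap K regardless of request
    if not docs:
        # Grounding-only mode: no internal citations means decline, not guess.
        return "I can't answer that from approved sources."
    return docs

def fake_retriever(index, query, k):  # stands in for the real vector store
    return []

print(guarded_retrieve("support-agent", "kb-faq", "refund policy", k=20,
                       retriever=fake_retriever))
# -> declines, because nothing was retrieved to ground the answer
```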

4) Tools & actions (what the Agent is allowed to do)

  • Tool allowlist per Agent (e.g., read-only CRM for Support; no email-send tool without human approval); see the sketch after this list.

  • Human-in-the-loop (HITL) for any action that would disclose or move personal data outside the original system.

  • Rate limits & anomaly detection on export/download tools to catch mass exfiltration patterns. Why: ICO’s AI guidance emphasises risk-based controls and proportionate mitigations. (ICO, BDO UK)
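
The sketch below, referenced in the allowlist bullet, gates tool calls per Agent and forces a human approver for disclosure-capable tools. The tool names and policy shape are illustrative assumptions, not raia's actual tool interface.

```python
# Hypothetical per-Agent tool policy; tool names and fields are illustrative.
TOOL_POLICY = {
    "support-agent": {
        "crm.read": {"require_approval": False},
        "email.send": {"require_approval": True},  # HITL before data leaves
    },
}

def invoke_tool(agent: str, tool: str, approved_by: str | None = None) -> None:
    policy = TOOL_POLICY.get(agent, {}).get(tool)
    if policy is None:
        raise PermissionError(f"{tool!r} is not on {agent}'s allowlist")
    if policy["require_approval"] and approved_by is None:
        raise PermissionError(f"{tool!r} requires a human approver")
    print(f"{agent} ran {tool} (approved_by={approved_by})")

invoke_tool("support-agent", "crm.read")                         # allowed
invoke_tool("support-agent", "email.send", approved_by="j.doe")  # HITL satisfied
```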

5) Output filtering (what leaves the Agent)

  • Response DLP: run an outbound PII scanner and mask sensitive values unless policy & identity checks pass (see the sketch after this list).

  • Safety fallbacks: if the answer requires PII and checks fail, return a “can’t share—here’s how to proceed” template. Why: Supports GDPR integrity/confidentiality & CPRA restrictions around sensitive personal info. (ICO, California Privacy Protection Agency)
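
As flagged in the Response DLP bullet, here is a minimal outbound filter. The scanner regex is a placeholder for a real DLP engine, and checks_passed stands in for your actual identity and purpose checks.

```python
import re

# Placeholder scanner (email/SSN shapes); a real deployment would use a DLP engine.
PII_SCAN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+|\b\d{3}-\d{2}-\d{4}\b")

def filter_response(text: str, checks_passed: bool) -> str:
    """Outbound DLP sketch: scan the reply, then release or decline, never leak."""
    if checks_passed:
        return text  # verified data subject + approved purpose
    if PII_SCAN.search(text):
        # PII present and checks failed: fall back to a safe template.
        return ("I can't share that directly. Please verify your identity "
                "and we'll route this through an approved channel.")
    return text

print(filter_response("Her SSN is 123-45-6789.", checks_passed=False))
```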

6) Logging, retention, and deletion (how long you keep it)

  • Log redaction: store prompts/outputs with PII masked; keep an unredacted copy only in your SIEM under strictly limited access.

  • Short retention: set chat/event logs to the minimum you truly need (e.g., 0–30 days) and enable verified deletion workflows.

  • DSR playbooks: for access/correction/deletion requests, tag records by subject ID so you can search & purge across raia and connected systems. Why: GDPR storage-limitation & data-subject rights; CPRA rights to know/correct/delete. (GDPR, ICO, California Privacy Protection Agency)
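
A combined sketch of the retention and DSR bullets above: records are stored masked, tagged by subject ID, swept on a short clock, and purgeable per deletion request. The record shape and helper functions are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

LOG_RETENTION = timedelta(days=30)  # keep the window as short as you can

# Hypothetical masked log record, tagged by data-subject ID for DSR lookups.
logs = [{"subject_id": "u-123", "prompt": "[EMAIL] asked about an invoice",
         "ts": datetime.now(timezone.utc) - timedelta(days=45)}]

def retention_sweep(records: list[dict]) -> list[dict]:
    """Drop anything older than the retention window."""
    cutoff = datetime.now(timezone.utc) - LOG_RETENTION
    return [r for r in records if r["ts"] >= cutoff]

def purge_subject(records: list[dict], subject_id: str) -> list[dict]:
    """DSR deletion: remove every record tied to the requesting subject."""
    return [r for r in records if r["subject_id"] != subject_id]

logs = retention_sweep(logs)         # the 45-day-old record is dropped
logs = purge_subject(logs, "u-123")  # deletion request for subject u-123
```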

7) Cross-border data transfers (EU/UK personal data)

  • If you process EU/UK personal data, pin storage/processing in-region where possible.

  • Where transfers are necessary, use the European Commission’s Standard Contractual Clauses (SCCs) and document supplementary measures. Why: GDPR Chapter V requires appropriate safeguards; SCCs are the primary legal tool endorsed by the Commission/EDPB. (European Commission, European Data Protection Board)
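
One way to make residency auditable is a routing table that refuses any ex-region route lacking a documented transfer mechanism. The region names and fields below are hypothetical, not raia settings.

```python
# Hypothetical residency policy; region names and fields are illustrative.
RESIDENCY = {
    "eu-tenant": {"region": "eu-west", "transfer_mechanism": None},
    "us-analytics-vendor": {"region": "us-east",
                            "transfer_mechanism": "SCCs + supplementary measures"},
}

def check_route(name: str) -> None:
    """EU/UK data leaving the region must carry a documented safeguard."""
    route = RESIDENCY[name]
    if not route["region"].startswith("eu") and not route["transfer_mechanism"]:
        raise ValueError(f"{name}: cross-border transfer lacks documented safeguards")

for name in RESIDENCY:
    check_route(name)  # both routes pass; an undocumented US route would not
```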

8) Vendors & model providers (OpenAI Enterprise, etc.)

  • Use enterprise tiers that do not train on your inputs/outputs by default and allow customer-controlled retention; document these settings in your DPIA.

  • Prefer providers with SOC 2 reports and map their controls to your own. Why: OpenAI Enterprise states it doesn’t train on your business data by default and gives retention controls; SOC 2 demonstrates audited controls over security/confidentiality/privacy. (OpenAI, AICPA & CIMA)

9) Governance & proof (what you show auditors)

  • Maintain a DPIA/LIA (Data Protection Impact Assessment / Legitimate Interests Assessment) per Agent covering: purposes, lawful basis, data categories, retention, vendors, transfers, and mitigations (see the sketch after this list).

  • Keep change-control records + red-team reports for risky Agents; align with the NIST AI RMF generative-AI profile.

  • Map your controls to SOC 2 Trust Services Criteria for ongoing assurance. (NIST, Contentful)
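
To keep the per-Agent DPIA/LIA auditable, the record itself can live in version control. This dataclass sketch mirrors the fields listed above; the names and sample values are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class AgentDPIA:
    """One DPIA/LIA record per Agent; field names mirror the list above."""
    agent: str
    purposes: list[str]
    lawful_basis: str
    data_categories: list[str]
    retention: str
    vendors: list[str]
    transfers: list[str] = field(default_factory=list)
    mitigations: list[str] = field(default_factory=list)

support_dpia = AgentDPIA(
    agent="support-agent",
    purposes=["answer customer support tickets"],
    lawful_basis="legitimate interests (LIA on file)",
    data_categories=["name", "email", "ticket history"],
    retention="chat logs: 30 days",
    vendors=["OpenAI Enterprise"],
    transfers=["none (EU-pinned)"],
    mitigations=["PII redaction", "grounding-only mode", "HITL for email"],
)
```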

