1. Architecture Proposal

Executive summary

This document presents a pragmatic approach to protecting customer Personally Identifiable Information (PII) for a retail/e-commerce business with over one million customer records. It focuses on two goals: preventing data leaks and tracing their source when an incident occurs.

Building a full PII Vault is costly, high-risk, and slow; buying a commercial product brings license costs and vendor lock-in. This hybrid approach instead leverages the database’s built-in security features (encryption, masking, access control) and adds a centralized audit and access-control layer. That layer addresses the core pain point: today nobody knows who read which customer’s data, when, and why.

The problem

Three observed symptoms share one root cause:

Exposed and scattered data: PII (name, phone, email, address) is stored in plaintext and scattered across the CRM, order DB, logs, manual Excel exports, backups, and partner services.
No access log: When a leak occurs, there is no evidence to identify the source, so every investigation starts blind.
No purpose-based access control: Too many people and services can read almost everything; no record of why a record was accessed.

Legal implication: Decree 13/2023/ND-CP on personal data protection requires processing data for the correct purpose, with control and traceability. The lack of logs and access control makes compliance hard to demonstrate during audits.

Hybrid architecture overview

The solution has two complementary pillars.

Pillar A — DB-native + Masking (data layer)

TDE (Transparent Data Encryption): encrypts all data at rest.
Column / Field-level Encryption: encrypts sensitive columns with separately managed keys.
Dynamic Data Masking: masks data by role at query time.
Row-Level Security (RLS): restricts each role to rows in its scope.
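
In production, masking and row-level security would be configured in the database engine itself. The Python sketch below only illustrates the intended behavior: a role sees rows in its own scope, with PII masked unless the role is exempt. The role names, the `region` scoping field, and the masking rules are assumptions made for illustration and do not come from this proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str
    phone: str
    email: str
    region: str  # hypothetical scoping attribute used for row-level security


def mask_phone(phone: str) -> str:
    """Hide everything except the last three characters."""
    return "*" * max(len(phone) - 3, 0) + phone[-3:]


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the domain."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain


# Hypothetical policy: roles that see unmasked PII, and the regions each role may read.
UNMASKED_ROLES = {"dpo"}
ROLE_REGIONS = {"support_north": {"north"}, "dpo": {"north", "south"}}


def query_customers(role: str, rows: list[Customer]) -> list[dict]:
    """Return only rows in the role's scope, masked unless the role is exempt."""
    allowed = ROLE_REGIONS.get(role, set())  # unknown role sees no rows (default-deny)
    unmasked = role in UNMASKED_ROLES
    result = []
    for row in rows:
        if row.region not in allowed:
            continue  # row-level security: out-of-scope rows are never returned
        result.append({
            "customer_id": row.customer_id,
            "name": row.name,
            "phone": row.phone if unmasked else mask_phone(row.phone),
            "email": row.email if unmasked else mask_email(row.email),
        })
    return result
```

With this policy, a `support_north` user sees only northern customers with masked phone and email, while a role that is not listed sees nothing.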

Pillar B — Centralized Audit & Access-Control layer

This pillar is the biggest differentiator. Every query touching PII passes through a central layer that records it, leaving an immutable trail:

Access gateway / Data Access Layer: apps read PII via a shared service instead of querying sensitive tables directly.
Purpose binding: every PII read request must include a reason.
Immutable audit log (append-only, hash-chained): each access records who · what · when · why · result.
RBAC + least-privilege, default-deny: access is refused unless a role explicitly grants it, and sensitive operations require “four-eyes” (two-person) approval.
Real-time anomaly detection & alerting.
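
The components above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not a proposed implementation: the `ALLOWED` policy table, the role and purpose names, the `read_pii` function, and the in-memory `store` are all hypothetical. It shows purpose binding, a default-deny check, and a hash-chained log recording who, what, when, why, and result. Four-eyes approval and anomaly detection are not covered.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical policy: (role, purpose) pairs allowed to read PII. Anything else is denied.
ALLOWED = {("support_agent", "order_support"), ("fraud_analyst", "fraud_review")}

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


class AuditLog:
    """Append-only log in which each entry commits to the previous entry's hash."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    @staticmethod
    def _digest(body: dict) -> str:
        # Canonical JSON so the same body always produces the same hash.
        payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, who: str, what: str, why: str, result: str) -> dict:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        body = {
            "who": who, "what": what, "why": why, "result": result,
            "when": datetime.now(timezone.utc).isoformat(),
            "prev_hash": prev_hash,
        }
        entry = {**body, "hash": self._digest(body)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; an edited entry or one removed mid-log breaks it."""
        prev_hash = GENESIS_HASH
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or self._digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


def read_pii(log: AuditLog, store: dict, user: str, role: str,
             customer_id: int, purpose: str):
    """Gateway entry point: purpose is mandatory, policy is default-deny, every call is logged."""
    what = f"customer:{customer_id}"
    if not purpose or (role, purpose) not in ALLOWED:
        log.append(who=user, what=what, why=purpose, result="denied")
        raise PermissionError(f"{user} ({role}) may not read {what} for purpose {purpose!r}")
    record = store.get(customer_id)
    log.append(who=user, what=what, why=purpose, result="ok" if record else "not_found")
    return record
```

Denied requests are logged as well as successful ones, so probing attempts leave a trail. A hash chain alone does not detect truncation of the most recent entries; in practice the latest hash would also need to be anchored somewhere the log writer cannot modify.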

Why this hybrid approach

Criterion	Build Vault	Buy product	Hybrid
Upfront cost	High	Medium–high	Low
Time to deploy	Slow (12+ months)	Medium	Fast (per quarter)
Crypto error risk	High (self-borne)	Low	Low
Leak prevention & tracing	Yes	Yes	Yes (focus)
Data-in-use protection	Yes (if done right)	Strong	Limited
Vendor lock-in	No	High	Low
Data residency (VN)	Self-controlled	Needs confirmation	Self-controlled

Positioning conclusion: for the top goal of leak prevention and tracing, the hybrid delivers most of the value at the lowest cost and risk, while keeping data fully under domestic control.

Phased roadmap summary

Phase	Focus	Key outcome
P1	Survey & data mapping	PII map, classification matrix
P2	Centralized audit (top priority)	Every PII access leaves an immutable trail
P3	Encryption (TDE + column) & masking	Leaked DB/backup no longer leaks real data
P4	Tighten access control	Least-privilege, default-deny
P5	Monitoring & anomaly detection	Early detection; minutes to investigate
P6	Compliance, audit, operations	Decree 13/2023 compliance evidence