REDACT-PII · BEFORE → AFTER

Customer data redaction · real local pipeline output

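Microsoft Presidio redacts mixed input (TXT / CSV / PDF / images) locally in a single pass: no LLM calls, no remote inference, and no forwarding to a commercial fallback service. Each "before" block below is the real input and the matching "after" block is its redacted copy. Within one batch, each PLACEHOLDER_NNN keeps the same id across files.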
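Keeping a placeholder id stable across files requires state that outlives any single file. A minimal sketch, assuming one in-memory registry per batch keyed by the whitespace-normalized original value (`PlaceholderRegistry` and `placeholder_for` are hypothetical names, not part of Presidio's API):

```python
import itertools


class PlaceholderRegistry:
    """Hypothetical batch-scoped map from an original value to a stable PLACEHOLDER_NNN.

    Create one instance per batch and share it across every file in that batch,
    so a value seen in the CSV and again in the TXT gets the same id.
    """

    def __init__(self) -> None:
        self._ids = {}  # normalized original value -> placeholder string
        self._counter = itertools.count(1)

    def placeholder_for(self, value: str) -> str:
        # Collapse surrounding and repeated whitespace so " Charles Garcia " reuses an id.
        key = " ".join(value.split())
        if key not in self._ids:
            self._ids[key] = f"PLACEHOLDER_{next(self._counter):03d}"
        return self._ids[key]


registry = PlaceholderRegistry()
# Illustrative values from two files processed in the same batch.
csv_ids = [registry.placeholder_for(v) for v in ["Charles Garcia", "cjiang@example.net"]]
txt_ids = [registry.placeholder_for(v) for v in ["陈浩", " Charles Garcia "]]
print(csv_ids)  # ['PLACEHOLDER_001', 'PLACEHOLDER_002']
print(txt_ids)  # ['PLACEHOLDER_003', 'PLACEHOLDER_001']
```

Keying by value alone, rather than by (entity type, value), keeps the id stable even when the same string is tagged PERSON in one file and ORGANIZATION in another; the trade-off is that the placeholder no longer records what kind of entity it replaced. Presidio's anonymizer supports a custom operator, so a registry like this could in principle back one, but that wiring is not shown here.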

Entity counts: PERSON 154 · US_BANK_NUMBER 120 · ORGANIZATION 79 · EMAIL_ADDRESS 78 · DATE_TIME 78 · US_DRIVER_LICENSE 60 · LOCATION 37 · CN_ID_CARD 15 · CN_MOBILE_PHONE 15 · PHONE_NUMBER 5 · URL 2
№ 01 Customer tickets (CSV) · customer-tickets.csv · presidio-structured
before · raw · cells flagged
ticket_id | customer_name | email | phone | address | credit_card | issue_summary | agent_notes
TIX-1000 | Charles Garcia | cjiang@example.net | 18172117550 | 65423 Garcia Light West Melanieview… | 3596001338908389 | billing dispute on last invoice | Spoke with Charles, called from 181…
TIX-1001 | 何淑珍 | michael41@example.net | 14265774428 | 福建省佛山县孝南贺街c座 664139 | 341034131647528 | subscription renewal failed | Spoke with 何淑珍, called from 1426577…
TIX-1002 | Maria Thomas | hernandezernest@example.net | 18422713332 | 6978 Maddox Wall Lake Jenniferside,… | 4884969653287101 | billing dispute on last invoice | Spoke with Maria, called from 18422…
TIX-1003 | 李淑兰 | yangmin@example.com | 18419026831 | 吉林省秀珍市清河兰州路W座 927824 | 3582814893252886 | billing dispute on last invoice | Spoke with 李淑兰, called from 1841902…
TIX-1004 | Jennifer Powers | adrianzimmerman@example.org | 13679883481 | 1834 Lewis Curve Sanchezfort, AS 53… | 378713315098394 | password reset not arriving | Spoke with Jennifer, called from 13…
after · redacted · same schema
ticket_id | customer_name | email | phone | address | credit_card | issue_summary | agent_notes
ORGANIZATION | PERSON | EMAIL_ADDRESS | US_BANK_NUMBER | US_DRIVER_LICENSE | US_BANK_NUMBER | billing dispute on last invoice | PERSON
ORGANIZATION | PERSON | EMAIL_ADDRESS | US_BANK_NUMBER | US_DRIVER_LICENSE | US_BANK_NUMBER | subscription renewal failed | PERSON
ORGANIZATION | PERSON | EMAIL_ADDRESS | US_BANK_NUMBER | US_DRIVER_LICENSE | US_BANK_NUMBER | billing dispute on last invoice | PERSON
ORGANIZATION | PERSON | EMAIL_ADDRESS | US_BANK_NUMBER | US_DRIVER_LICENSE | US_BANK_NUMBER | billing dispute on last invoice | PERSON
ORGANIZATION | PERSON | EMAIL_ADDRESS | US_BANK_NUMBER | US_DRIVER_LICENSE | US_BANK_NUMBER | password reset not arriving | PERSON
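In the redacted CSV the header and column order survive, and each flagged cell is replaced wholesale by its entity label; the ticket_id column, for instance, came back as ORGANIZATION. A minimal per-cell sketch using only the standard library, with two regexes standing in for Presidio's analyzer (`DETECTORS`, `detect`, `redact_csv`, and the `skip_columns` allowlist are all hypothetical):

```python
from __future__ import annotations

import csv
import io
import re
from typing import Callable, Optional

# Stand-in detectors (assumption): these regexes only illustrate the per-cell
# flow; they are not Presidio's recognizers.
DETECTORS: list[tuple[str, re.Pattern[str]]] = [
    ("EMAIL_ADDRESS", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
    ("CN_MOBILE_PHONE", re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")),
]


def detect(cell: str) -> Optional[str]:
    """Return the label of the first pattern found anywhere in the cell, else None."""
    for label, pattern in DETECTORS:
        if pattern.search(cell):
            return label
    return None


def redact_csv(
    src: str,
    skip_columns: set[str],
    detector: Callable[[str], Optional[str]] = detect,
) -> str:
    """Replace every flagged cell with its entity label; header and column order are unchanged."""
    reader = csv.DictReader(io.StringIO(src))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()  # same schema: the header row is copied as-is
    for row in reader:
        redacted = {}
        for column, cell in row.items():
            # Allowlisted columns are never sent to the detector.
            label = None if column in skip_columns else detector(cell)
            redacted[column] = cell if label is None else label
        writer.writerow(redacted)
    return out.getvalue()


sample = (
    "ticket_id,email,phone,issue_summary\n"
    "TIX-1000,cjiang@example.net,18172117550,billing dispute on last invoice\n"
)
print(redact_csv(sample, skip_columns={"ticket_id"}), end="")
```

The `skip_columns` allowlist is one way to keep known-safe identifiers such as ticket_id away from the detector entirely; whether presidio-structured exposes an equivalent setting is not shown in the source.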
№ 02 Image / scan · id-card-mock.png · presidio-image-redactor
before · raw · unmodified upload
id-card-mock.png original
after · redacted · OCR-detected boxes
id-card-mock.png redacted
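Image redaction works on OCR output: the recognized words are joined into a string, the analyzer returns character spans on that string, and each span must be mapped back to pixel boxes to paint over. A sketch of that last mapping step, assuming a hypothetical `Word` record with a left/top/width/height box and words joined by single spaces (the words and coordinates below are invented, not read from id-card-mock.png):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """Hypothetical OCR word: its text plus a pixel bounding box."""
    text: str
    left: int
    top: int
    width: int
    height: int


def join_words(words: list[Word]) -> str:
    """Build the text the analyzer sees: OCR words joined by single spaces."""
    return " ".join(w.text for w in words)


def boxes_to_fill(
    words: list[Word], spans: list[tuple[int, int]]
) -> list[tuple[int, int, int, int]]:
    """Return the box of every word that overlaps any detected (start, end) span."""
    boxes = []
    offset = 0
    for word in words:
        start, end = offset, offset + len(word.text)
        if any(s < end and start < e for s, e in spans):  # half-open interval overlap
            boxes.append((word.left, word.top, word.width, word.height))
        offset = end + 1  # skip the single joining space
    return boxes


words = [
    Word("姓名", 10, 10, 40, 20),
    Word("陈浩", 60, 10, 40, 20),
    Word("身份证", 10, 40, 60, 20),
    Word("440106197104102325", 80, 40, 200, 20),
]
text = join_words(words)
# Spans an analyzer might return for PERSON and CN_ID_CARD (offsets into `text`).
spans = [(text.index("陈浩"), text.index("陈浩") + 2), (text.index("4401"), len(text))]
print(boxes_to_fill(words, spans))  # [(60, 10, 40, 20), (80, 40, 200, 20)]
```

Working at word granularity means a span covering only part of a word still blacks out the whole word, which errs on the side of over-redaction.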
№ 03 PDF document · resume-sample.pdf · analyzer + pypdf rewrite
before · raw · name · email · phone · org
Chris Curtis
nicholsonclinton@example.net · 3838016613 · linkedin.com/in/chris-curtis
90996 Theresa Cape, West Robert, ND 15940
Summary
Senior data engineer with 8 years building privacy-aware pipelines for fintech and healthcare clients. Comfortable owning end-to-end ingestion, from raw ETL to GDPR-compliant data marts.
Experience
Lead Data Engineer · Northwind Health (Boston, MA) · 2022 – Present
Designed claims-data warehouse with column-level PHI masking; led migration of 14 TB Snowflake estate; supervised four engineers.
Senior Engineer · Helio Capital (New York, NY) · 2019 – 2022
Owned customer-360 model, integrating CRM, transactions, and support tickets for Parker, Cabrera and White and three other partners.
Education
M.S. Computer Science, Carnegie Mellon University · 2017
B.S. Statistics, University of Michigan · 2015
References
Margaret Watson (former manager) – elliottchristine@example.com – +1-686-957-9754x32319
Monica Roberts (peer) – wilsonrobert@example.com – (776)901-8955
after · redacted · layout preserved
PERSON
EMAIL_ADDRESS · 3838016613 · URL
DATE_TIME PERSON, LOCATION, ND 15940
Summary
Senior data engineer with DATE_TIME building privacy-aware pipelines for ORGANIZATION and healthcare clients. Comfortable owning end-to-end ingestion, from raw ORGANIZATION to ORGANIZATION-compliant data marts.
Experience
Lead Data Engineer · ORGANIZATION (LOCATION, LOCATION) · DATE_TIME
Designed claims-data warehouse with column-level PHI masking; led migration of 14 ORGANIZATION estate; supervised four engineers.
Senior Engineer · ORGANIZATION (LOCATION, LOCATION) · DATE_TIME
Owned LOCATION model, integrating CRM, transactions, and support tickets for LOCATION, PERSON and PERSON and three other partners.
Education
ORGANIZATION, ORGANIZATION · DATE_TIME
ORGANIZATION, ORGANIZATION · DATE_TIME
References
PERSON (former manager) – EMAIL_ADDRESS – +1-686-957-9754x32319
PERSON (peer) – EMAIL_ADDRESS – (776)901
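The analyzer + anonymizer step reduces to replacing character spans with labels. Two details matter: detected spans can overlap, and replacing from left to right would shift every later offset. A sketch that drops overlapping spans with a simple earliest-start, longest-match rule (a simplification chosen for illustration, not Presidio's own conflict resolution) and then edits right to left; `resolve_overlaps`, `replace_spans`, and `make_span` are hypothetical helpers:

```python
from __future__ import annotations

Span = tuple[int, int, str]  # (start, end, label), end exclusive


def resolve_overlaps(spans: list[Span]) -> list[Span]:
    """Keep non-overlapping spans: earliest start wins, and the longer span wins a tie."""
    kept: list[Span] = []
    for start, end, label in sorted(spans, key=lambda s: (s[0], -(s[1] - s[0]))):
        if kept and start < kept[-1][1]:
            continue  # overlaps the span kept just before it
        kept.append((start, end, label))
    return kept


def replace_spans(text: str, spans: list[Span]) -> str:
    """Swap each span for its label, right to left so earlier offsets stay valid."""
    for start, end, label in reversed(resolve_overlaps(spans)):
        text = text[:start] + label + text[end:]
    return text


def make_span(text: str, value: str, label: str) -> Span:
    """Build a span from the first occurrence of `value` in `text`."""
    start = text.index(value)
    return (start, start + len(value), label)


line = "Lead Data Engineer · Northwind Health (Boston, MA)"
spans = [
    make_span(line, "Northwind Health", "ORGANIZATION"),
    make_span(line, "Northwind", "PERSON"),  # same start, shorter: dropped
    make_span(line, "Boston", "LOCATION"),
    make_span(line, "MA", "LOCATION"),
]
print(replace_spans(line, spans))  # Lead Data Engineer · ORGANIZATION (LOCATION, LOCATION)
```

The output reproduces the `ORGANIZATION (LOCATION, LOCATION)` shape of the redacted résumé line. Writing the result back into the PDF with pypdf is a separate step not sketched here.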
№ 04 TXT / chat transcript · support-chat.txt · analyzer + anonymizer
before · raw · Chinese PII flagged
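Support chat log · 2026-05-30 14:23 – 16:48
Agent seat: A07 · Channel: online ticket

[14:23] Agent: Hello, how may I address you?
[14:24] Customer: Name 陈浩, phone 15273796172, email duna@example.com.
[14:25] Agent: Let me check the last six digits of your ID card so I can find your order.
[14:26] Customer: ID card 440106197104102325, delivery address 西藏自治区上海市海陵萧路H座 133974.
[14:27] Agent: OK, Mr./Ms. 陈浩, your ticket has been logged.

[14:25] Agent: Hello, how may I address you?
[14:26] Customer: Name 叶萍, phone 14371002967, email haomin@example.org.
[14:27] Agent: Let me check the last six digits of your ID card so I can find your order.
[14:28] Customer: ID card 110101196510104807, delivery address 辽宁省宇县长寿余路W座 718501.
[14:29] Agent: OK, Mr./Ms. 叶萍, your ticket has been logged.

[14:27] Agent: Hello, how may I address you?
[14:28] Customer: Name 龙慧, phone 16392810456, email bluo@example.net.
[14:29] Agent: Let me check the last six digits of your ID card so I can find your order.
[14:30] Customer: ID card 320105197410100873, delivery address 青海省娜县淄川海门街Z座 311426.
[14:31] Agent: OK, Mr./Ms. 龙慧, your ticket has been logged.

[14:29] Agent: Hello, how may I address you?
[14:30] Customer: Name 陈宁, phone 13545673925, email baili@example.com.
[14:31] Agent: Let me check the last six digits of your ID card so I can find your order.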
after · redacted · stable placeholders
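Support chat log · DATE_TIME 14:23 – 16:48
Agent seat: PERSON · Channel: online ticket

[DATE_TIME Agent: Hello, how may I address you?
[DATE_TIME Customer: Name PERSON, phone CN_MOBILE_PHONE, email EMAIL_ADDRESS.
[DATE_TIME: Let me check the last six digits of your ID card so I can find your order.
[DATE_TIME Customer: ID card CN_ID_CARD, delivery address LOCATIONLOCATION海陵萧路H座 133974.
[DATE_TIME: OK, Mr./Ms. PERSON, your ticket has been logged.

[DATE_TIME: Hello, how may I address you?
[DATE_TIME Customer: Name LOCATION, phone CN_MOBILE_PHONE, email EMAIL_ADDRESS.
[DATE_TIME: Let me check the last six digits of your ID card so I can find your order.
[DATE_TIME] Customer: ID card CN_ID_CARD, delivery address LOCATIONPERSON县长寿余路W座 718501.
[DATE_TIME: OK, Mr./Ms. PERSON, your ticket has been logged.

[DATE_TIME: Hello, how may I address you?
[DATE_TIME] Customer: Name ORGANIZATION, phone CN_MOBILE_PHONE, email EMAIL_ADDRESS.
[DATE_TIME: Let me check the last six digits of your ID card so I can find your order.
[DATE_TIME Customer: ID card CN_ID_CARD, delivery address LOCATIONLOCATION淄川海门街Z座 311426.
[14:31] Agent: OK, Mr./Ms. ORGANIZATION, your ticket has been logged.

[DATE_TIME: Hello, how may I address you?
[DATE_TIME Customer: Name PERSON, phone CN_MOBILE_PHONE, email EMAIL_ADDRESS.
[14:31] Agent: Let me check the last six digits of your ID card so I can find your order.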
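The transcript's CN_MOBILE_PHONE and CN_ID_CARD labels come from recognizers the source does not show. A hedged sketch of what such recognizers might check, using only `re`: an 11-digit mobile pattern (1 followed by 3 to 9), and an 18-character resident-ID pattern gated by the GB 11643 check character (ISO 7064 MOD 11-2). The patterns, function names, and the decision to require a valid checksum are all assumptions:

```python
from __future__ import annotations

import re

# Hypothetical patterns; digit lookarounds stop matches inside longer numbers.
CN_MOBILE = re.compile(r"(?<![0-9])1[3-9][0-9]{9}(?![0-9])")
CN_ID = re.compile(r"(?<![0-9])[0-9]{17}[0-9Xx](?![0-9Xx])")

# GB 11643 weights for the first 17 digits, and the check character indexed by
# (weighted sum mod 11), per ISO 7064 MOD 11-2.
_WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
_CHECK_CODES = "10X98765432"


def cn_id_checksum_ok(id_number: str) -> bool:
    """True if an 18-character resident ID carries a valid check character."""
    if len(id_number) != 18 or not id_number[:17].isdigit():
        return False
    total = sum(int(d) * w for d, w in zip(id_number[:17], _WEIGHTS))
    return _CHECK_CODES[total % 11] == id_number[17].upper()


def find_cn_pii(text: str) -> list[tuple[str, str]]:
    """Return (label, value) hits; ID-shaped strings must also pass the checksum."""
    hits = [("CN_MOBILE_PHONE", m.group()) for m in CN_MOBILE.finditer(text)]
    hits += [
        ("CN_ID_CARD", m.group())
        for m in CN_ID.finditer(text)
        if cn_id_checksum_ok(m.group())
    ]
    return hits


line = "Customer: Name 陈浩, phone 15273796172, ID card 440106197104102325."
print(find_cn_pii(line))
# [('CN_MOBILE_PHONE', '15273796172'), ('CN_ID_CARD', '440106197104102325')]
```

A checksum gate rejects random 18-digit numbers, but it also lets a mistyped real ID pass through unredacted. For redaction, where a miss usually costs more than an over-redaction, reporting checksum failures as lower-confidence hits instead of dropping them may be the safer choice.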