Amazon Transcribe can now automatically redact personally identifiable data

Attendees at Amazon's annual cloud computing conference walk past the AWS logo
Image Credit: Reuters, Salvador Rodriguez.


Amazon is adding a new privacy-focused feature to its business transcription service, one that automatically redacts personally identifiable information (PII), such as names, social security numbers, and credit card credentials.

Amazon Transcribe is part of Amazon’s AWS cloud unit and was launched in general availability in 2018. An automatic speech recognition (ASR) service, Transcribe enables enterprise customers to convert speech into text, which can help make audio content searchable from a database, for example. Contact centers can also use the tool to mine call data for insights and sentiment analysis. However, privacy issues have cast a spotlight on how technology companies store and manage consumers’ data.

Privacy

Speech-to-text services make it possible to search recordings for keywords and sentiment at a later date, but phone calls often contain significant private data that may be transcribed by Amazon and stored in a searchable database, even if that information is not necessary for analysis. Meanwhile, regulations are springing up around the world to protect consumer data, including the recently implemented California Consumer Privacy Act (CCPA) and Europe’s General Data Protection Regulation (GDPR).

Against this backdrop, Amazon Transcribe will now enable companies to automatically redact personal data, including credit/debit card numbers, expiration dates, CVV codes, PINs, social security numbers, bank account numbers, customer names, email addresses, phone numbers, and postal addresses. It’s worth noting that Google Cloud Platform offers a data loss prevention API that could be used in conjunction with its speech-to-text service to identify and redact sensitive data. But building automated redaction directly into Amazon Transcribe should make the process a lot easier to implement.
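For a sense of what "built in" could look like in practice, here is a minimal sketch of the request parameters for a batch transcription job with redaction switched on. It assumes the boto3 `start_transcription_job` call and its `ContentRedaction` setting; the job name, bucket, and helper function are hypothetical, and the snippet only assembles the parameters rather than contacting AWS.

```python
def build_redaction_request(job_name: str, media_uri: str) -> dict:
    """Assemble keyword arguments for a transcription job with PII redaction.

    Hypothetical helper for illustration; it does not call AWS.
    """
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        # The article says redaction is limited to U.S. English for now.
        "LanguageCode": "en-US",
        "ContentRedaction": {
            "RedactionType": "PII",
            # "redacted" asks for the redacted transcript only.
            "RedactionOutput": "redacted",
        },
    }


# Placeholder job name and S3 location (assumptions, not real resources).
request = build_redaction_request(
    "support-call-001",
    "s3://example-bucket/calls/call-001.wav",
)

# With credentials configured, the job could then be submitted like this:
# import boto3
# boto3.client("transcribe").start_transcription_job(**request)
```

Keeping the parameters in a plain dictionary makes the redaction setting easy to review or log before any audio is sent for processing.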



Companies using Amazon Transcribe can use automatic redaction as they see fit and can choose which PII elements they wish to obfuscate. The transcribed text will then display a [PII] tag in place of the sensitive information, and the corresponding timestamps mean anyone with sufficient system access will still be able to locate the necessary PII in the original audio file. This may also prove useful if a company wants to carry out extra audio processing to fully redact the information in the original recording.
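As a rough illustration of how those timestamps could drive further audio processing, the sketch below collects the time ranges covered by [PII] tags so they could later be muted or bleeped in the recording. The item layout (a list of words with `content`, `start_time`, and `end_time` fields) is a simplified assumption for illustration, not the exact Transcribe output schema, and the function name and `gap` parameter are hypothetical.

```python
from typing import Dict, List, Tuple

PII_TAG = "[PII]"


def pii_time_ranges(
    items: List[Dict[str, str]], gap: float = 0.5
) -> List[Tuple[float, float]]:
    """Return (start, end) ranges in seconds covered by redacted words.

    Consecutive redacted words separated by at most `gap` seconds are
    merged into one range. Items are assumed to be in time order.
    """
    ranges: List[Tuple[float, float]] = []
    for item in items:
        if item["content"] != PII_TAG:
            continue
        start, end = float(item["start_time"]), float(item["end_time"])
        if ranges and start - ranges[-1][1] <= gap:
            # Close enough to the previous redacted word: extend that range.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


# Made-up sample data in the assumed simplified shape.
sample_items = [
    {"content": "my", "start_time": "0.00", "end_time": "0.20"},
    {"content": "card", "start_time": "0.20", "end_time": "0.55"},
    {"content": "is", "start_time": "0.55", "end_time": "0.70"},
    {"content": "[PII]", "start_time": "0.70", "end_time": "1.40"},
    {"content": "[PII]", "start_time": "1.45", "end_time": "2.10"},
    {"content": "thanks", "start_time": "3.00", "end_time": "3.40"},
    {"content": "[PII]", "start_time": "5.00", "end_time": "5.60"},
]

ranges_to_mute = pii_time_ranges(sample_items)
```

In this sample the first two redacted words merge into a single range, while the later one stays separate, so an audio tool would have two segments to silence.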

Amazon Transcribe is available in 31 languages, six of which are supported by real-time transcription, though for now the automated redaction feature is limited to U.S. English. The feature is billed monthly at a rate of $0.00004 per second of content.
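To put the per-second rate in context, the short sketch below converts it into a cost for a given amount of audio. It uses only the $0.00004 figure quoted above; whether that charge comes on top of standard transcription fees is not stated here, so treat the result as the redaction charge alone. The function name is hypothetical.

```python
from decimal import Decimal

# Rate quoted in the article, in U.S. dollars per second of content.
RATE_PER_SECOND = Decimal("0.00004")


def redaction_cost(seconds: int) -> Decimal:
    """Return the redaction charge in dollars for `seconds` of audio."""
    return RATE_PER_SECOND * seconds


one_hour = redaction_cost(60 * 60)
ten_thousand_hours = redaction_cost(10_000 * 60 * 60)
```

At that rate, one hour of audio works out to $0.144, and 10,000 hours of recorded calls to $1,440.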