A Dutch language expert working for Google to train its speech technology leaked private information in a breach of the company's security policies, company officials said. The disclosure came after Belgian broadcaster VRT NWS reported that its reporters listened to more than 1,000 conversations recorded by the search giant's virtual assistant, including some that revealed identifiable information about the users.
"As part of our work to develop speech technology for more languages, we partner with language experts around the world who understand the nuances and accents of a specific language," Google executive David Monsees wrote in a blog post on Thursday. "These language experts review and transcribe a small set of queries to help us better understand those languages."
"We just learned that one of these language reviewers has violated our data security policies by leaking confidential Dutch audio data. Our Security and Privacy Response teams have been activated on this issue, are investigating, and we will take action. We are conducting a full review of our safeguards in this space to prevent misconduct like this from happening again," Monsees wrote.
The admission echoes Amazon's disclosure earlier this year that its workers can listen to and transcribe user conversations directed at Alexa to help make the virtual assistant smarter.
"Throughout the world, and so also in Belgium and the Netherlands, people at Google listen to these audio files to improve Google's search engine," VRT NWS reported.
"VRT NWS was able to listen to more than a thousand recordings. Most of these recordings were made consciously, but Google also listens to conversations that should never have been recorded, some of which contain sensitive information," VRT NWS reported.
Google officials maintained Thursday that they have privacy safeguards in place, adding that only 0.2 percent of audio is reviewed by the company's language experts.
"Audio snippets are not associated with user accounts as part of the review process, and reviewers are directed not to transcribe background conversations or other noises, and only to transcribe snippets that are directed to Google," Monsees wrote.
However, the recordings contained, or were linked to, enough personal information that VRT NWS was able to surprise users by playing them audio of their own voices or those of their family members.
The Flemish broadcaster also reported that Google had recorded people's fights, bedroom experiences, and private work conversations, as well as a "woman who was in definite distress."