Breaking Phone-Call Encryption

A technique for saving bandwidth in Internet phone calls could undermine their security, according to research recently presented at the IEEE Symposium on Security and Privacy. Johns Hopkins University researchers showed that, in encrypted phone calls using a certain combination of technologies, preselected phrases can be spotted about 50 percent of the time on average, and up to 90 percent of the time under optimal conditions.

Voice-over-Internet-protocol (VoIP) phone calls, in which a computer converts a voice signal into data packets and sends them over the Internet, are increasingly popular for personal and business communication. Most VoIP systems don't yet use encryption, says Jason Ostrom, director of the VoIP-exploitation research lab at Sipera Systems, but encryption is absolutely necessary, particularly for business users. In many cases, he says, security measures aren't in place because companies haven't realized how vulnerable VoIP can be. He cites an assessment he performed for a hotel that uses VoIP phones, in which he showed that an attacker could access and record guests' calls using a laptop plugged into a standard wall connection. The Johns Hopkins researchers hope that pointing out possible holes in voice encryption systems will help make those systems secure by the time they become commonplace.

The Johns Hopkins attack takes advantage of a compression technique called variable-bit-rate encoding, which is sometimes used to save bandwidth in VoIP calls, explains Charles Wright, lead author of the paper. (Wright, who recently received his PhD from Johns Hopkins, will join the technical staff at the MIT Lincoln Laboratory in August.) Variable-bit-rate encoding, Wright says, adjusts the size of data packets being sent over the Internet based on how much information they actually contain. For example, when the person on one end of a VoIP call is listening rather than speaking, the packets sent from that person's computer shrink significantly. Also, packets containing certain sounds, such as "s" or "f," can take up less space than those containing more-complex sounds, such as vowels.

Encrypting the packets after they've been compressed scrambles their contents, making them look like gibberish. But it doesn't change their size, and the sizes alone can give information away to potential eavesdroppers.
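To make the leak concrete, here is a minimal sketch, assuming a hypothetical codec that picks one of three frame sizes depending on whether a frame holds silence, a fricative such as "s" or "f," or a vowel. The byte counts in FRAME_BYTES are invented for illustration, and the SHA-256 counter keystream is a toy stand-in for a real length-preserving cipher such as AES in counter mode. The point it shows is the one above: the ciphertext bytes look random, but each encrypted packet is exactly as long as the compressed frame inside it.

```python
import hashlib
import os

# Hypothetical VBR frame sizes in bytes, keyed by the kind of sound in a frame.
# Real codecs choose among their own bit-rate modes; these numbers are illustrative.
FRAME_BYTES = {"silence": 6, "fricative": 15, "vowel": 28}


def vbr_encode(sounds):
    """Return one dummy payload per frame whose length depends on its content."""
    return [os.urandom(FRAME_BYTES[s]) for s in sounds]


def keystream(key, nonce, length):
    """Toy keystream: SHA-256 of key || nonce || counter (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def encrypt_packets(key, payloads):
    """XOR each payload with its own keystream, so ciphertext length == plaintext length."""
    packets = []
    for seq, payload in enumerate(payloads):
        ks = keystream(key, seq.to_bytes(8, "big"), len(payload))
        packets.append(bytes(p ^ k for p, k in zip(payload, ks)))
    return packets


key = os.urandom(32)
sounds = ["silence", "fricative", "vowel", "vowel", "fricative", "silence"]
payloads = vbr_encode(sounds)
ciphertexts = encrypt_packets(key, payloads)

# An eavesdropper sees only ciphertext, but the lengths survive encryption.
observed_sizes = [len(c) for c in ciphertexts]
print(observed_sizes)  # [6, 15, 28, 28, 15, 6]
```

Because XOR with the same keystream undoes itself, calling encrypt_packets again with the same key recovers the payloads; nothing about that round trip hides the sequence of sizes.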

In their tests, the Hopkins researchers simulated the packets that a combination of compression and encryption would produce for particular phrases. A recording of the targeted speaker saying a particular phrase would give eavesdroppers a big advantage, but even without one, they could simulate the phrase using a pronunciation dictionary and a database of sample sounds from multiple speakers. By creating many versions of the sounds in the phrase, the researchers can accommodate different accents and other variations in pronunciation. They then use probabilistic methods to look for likely instances of the phrase in encrypted traffic. Wright says that the method can identify the phrase, on average, about half the time that it occurs, and that about half of the phrases it flags turn out to be exact matches of the desired phrase. In some circumstances, such as when the phrases are longer or when the speakers are particularly well matched to the simulated versions of the phrase, accuracy reached 90 percent. Because eavesdroppers have to know what phrase they're listening for, Wright says, "the threat would be more to technical, professional jargon than to an informal call between friends or family members."
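The researchers' own probabilistic model isn't described here, so the following is only a rough sketch of the general idea, swapping in a much simpler technique: a position-by-position likelihood-ratio test. It assumes that every simulated version of the phrase yields the same number of packets, that packet sizes come from the hypothetical set in SIZES, and that the variants, stream and threshold below are invented. Wright's two figures map onto standard terms: "identifies the phrase about half the time it occurs" is recall, and "about half of the phrases it flags are exact matches" is precision.

```python
import math
from collections import Counter

SIZES = [6, 15, 28]  # hypothetical set of possible VBR packet sizes (bytes)


def build_profile(variants, alpha=1.0):
    """Per-position size probabilities from simulated variants, add-alpha smoothed.

    Assumes every simulated variant of the phrase has the same number of packets.
    """
    profile = []
    for i in range(len(variants[0])):
        counts = Counter(v[i] for v in variants)
        total = len(variants) + alpha * len(SIZES)
        profile.append({s: (counts[s] + alpha) / total for s in SIZES})
    return profile


def background(stream, alpha=1.0):
    """Overall size probabilities in the observed traffic (the 'not the phrase' model)."""
    counts = Counter(stream)
    total = len(stream) + alpha * len(SIZES)
    return {s: (counts[s] + alpha) / total for s in SIZES}


def spot_phrase(stream, profile, bg, threshold):
    """Return start indices where the phrase model beats the background by > threshold."""
    n = len(profile)
    hits = []
    for start in range(len(stream) - n + 1):
        window = stream[start:start + n]
        # Log-likelihood ratio: phrase model vs. background, summed over positions.
        score = sum(math.log(profile[i][s] / bg[s]) for i, s in enumerate(window))
        if score > threshold:
            hits.append(start)
    return hits


def recall_precision(hits, true_starts):
    """Recall: share of real occurrences flagged. Precision: share of flags that are real."""
    correct = len(set(hits) & set(true_starts))
    recall = correct / len(true_starts) if true_starts else 0.0
    precision = correct / len(hits) if hits else 0.0
    return recall, precision


# Hypothetical simulated versions of one phrase (e.g. from different speakers).
variants = [
    [28, 15, 28, 6],
    [28, 15, 28, 15],
    [28, 28, 28, 6],
]
stream = [6, 6, 28, 15, 28, 6, 6, 15, 6, 28, 15, 28, 15, 6]
true_starts = [2, 9]  # where the phrase really occurs (known only in a test)

profile = build_profile(variants)
bg = background(stream)
hits = spot_phrase(stream, profile, bg, threshold=1.0)
print(hits)                                 # [2, 9]
print(recall_precision(hits, true_starts))  # (1.0, 1.0)
```

On this tiny invented stream the detector is perfect. A detector that instead flagged positions 2 and 5 would score recall 0.5 and precision 0.5, roughly the average the researchers report for real speech.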

While 50 percent accuracy may not sound like much, "these are encrypted conversations, so your expectation is not to be able to do this at all," says Fabian Monrose, an associate professor of computer science at Johns Hopkins, who was also involved in the research.

Matt Bishop, a professor of computer science at the University of California, Davis, agrees. "Fifty percent is quite scary," he says, "because what it means is that, in essence, you could potentially understand a fair portion of the conversation. The whole purpose of encryption is to prevent understanding." He adds that the attack is made more realistic by its ability to simulate phrases from standard sample sounds, which would be easier for an attacker to obtain than speech samples from the person he or she wants to spy on.

Sipera Systems' Ostrom says that he found the research particularly interesting "because it shows that you shouldn't feel safe just because you're using a security control. You still have to validate it to ensure that it meets your requirements." He adds, "In VoIP, there's always a fight between quality of service and security." The researchers' attack is a good example, he says, because it explores how an effort to improve quality of service by reducing bandwidth usage can affect efforts to protect calls. However, Ostrom notes that most corporations aren't currently using variable-bit-rate encoding and so aren't at risk for now.

Wright and Monrose say that they see their work as more of a cautionary tale. Monrose says that recently he has been seeing drafts of technical specifications that call for variable-bit-rate encoders. "Our gut reaction was, this has privacy implications that people have not well studied," he says. The researchers say that they hope their work will prevent people from making design decisions in isolation and encourage them to think about solutions that will increase both efficiency and security. "If we start combining tools the way a lot of the specifications are calling for," Monrose says, "then we need to make sure that we do it in the right way."