Boffins convert typing sounds into text with 95% accuracy

Researchers in the UK claim to have translated the sound of laptop keystrokes into their corresponding letters with 95 percent accuracy in some cases.

That 95 percent figure was achieved with nothing but a nearby iPhone. Remote methods are nearly as dangerous: over Zoom, the accuracy of recorded keystrokes only dropped to 93 percent, while Skype calls were still 91.7 percent accurate.

In other words, this is a side-channel attack with considerable accuracy, minimal technical requirements, and a ubiquitous data exfiltration point: microphones, which are everywhere, from our laptops to our wrists to the very rooms we work in.

To make matters worse, the trio said in their paper that they’ve achieved what they claim is an accuracy record for acoustic side-channel attacks (ASCA) without relying on a language model. Instead, they used deep learning and self-attention transformer layers to capture the sounds of typing and translate them into data for exfiltration.

We’ve previously written about people using mics in interesting ways to snoop on folks; for example, experiments involving laser microphones and hard disk drives. In the end, it’s typically easier to get some malware onto a target’s PC and access their data and keystrokes that way without any Bond-esque shenanigans.

Defending against ‘Fully-automated on-site and remote ASCA’

To go from keystroke sounds to actual letters, the eggheads recorded a person typing on a 16-inch 2021 MacBook Pro using a phone placed 17cm away and processed the sounds to get signatures of the keystrokes. Those were then analyzed by a deep learning model, which fed them into convolution and attention networks to guess which particular key, or sequence of keys, was pressed. 
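For a feel of the first step, pulling individual keystrokes out of a recording, here is a minimal Python sketch. It is not the researchers' code: it assumes a plain energy threshold is enough to spot a key press, and the function names, frame size, threshold and synthetic "recording" are all made up for illustration.

```python
import math
import random

def frame_energies(samples, frame_len):
    """Sum of squared amplitudes for each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def find_keystrokes(samples, frame_len=100, threshold=1.0, min_gap_frames=3):
    """Return sample offsets where frame energy first rises above threshold.

    A loud frame within min_gap_frames of the previous loud frame is
    treated as part of the same keystroke. All defaults are illustrative.
    """
    onsets = []
    last_loud = -min_gap_frames
    for idx, energy in enumerate(frame_energies(samples, frame_len)):
        if energy > threshold:
            if idx - last_loud >= min_gap_frames:
                onsets.append(idx * frame_len)
            last_loud = idx
    return onsets

def synthetic_recording(burst_starts, length=5000, burst_len=200, seed=0):
    """Quiet background noise plus a decaying 'click' at each start offset."""
    rng = random.Random(seed)
    samples = [rng.uniform(-0.01, 0.01) for _ in range(length)]
    for start in burst_starts:
        for n in range(burst_len):
            samples[start + n] += math.exp(-n / 50) * math.sin(2 * math.pi * n / 20)
    return samples

# Stand-in for a real recording: three fake key presses.
recording = synthetic_recording([1000, 2500, 4000])
onsets = find_keystrokes(recording)
print(onsets)
```

In a real attack each detected segment would then be turned into a signature and handed to the classifier; that part is well beyond a sketch like this.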

“Both the phone and Zoom recording classifiers achieved state-of-the-art accuracy given minimal training data in a random distribution of classes,” the team said in their paper. To add to security fears, “recording in this manner required no access to the victim’s environment and in this case, did not require any infiltration of the device or connection,” the boffins noted. 

As is often the case with side-channel attacks, mitigation isn’t always easy. Luckily in this case it’s not power usage, CPU frequencies, blinking lights or RAM buses leaking data unavoidably, but a good old-fashioned problem occurring between the computer and chair that can actually be mitigated somewhat easily. 

The simplest protection method, said the researchers, is changing one’s typing style. The researchers note that skilled users able to rely on touch typing are harder to detect accurately, with single-key recognition dropping from 64 to 40 percent at the higher speeds enabled by the technique. 

For those who don’t want to take the time to learn to be a proficient typist, the team recommends a few additional techniques like using randomized passwords with multiple cases. “Multiple methods succeed in recognizing a press of the shift key,” the academics said, but “no paper in the surveyed literature succeeded in recognizing the ‘release-peak’ of the shift key amidst the sound of other keys.” 

In other words, mixing uppercase and lowercase letters continues to be a good habit. The team also said those worried about acoustic side-channel attacks can just use a second authentication factor to prevent someone snooping keystrokes and stealing passwords.
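As a rough illustration of the randomized, mixed-case advice, the sketch below uses Python's standard `secrets` module. The function name, the default length and the choice of letters plus digits are assumptions for the example, not anything prescribed in the paper.

```python
import secrets
import string

def random_mixed_case_password(length=16):
    """Return a random password with at least one upper- and one lowercase letter.

    The alphabet (ASCII letters and digits) and default length are
    illustrative choices only.
    """
    if length < 2:
        raise ValueError("length must be at least 2 to mix cases")
    alphabet = string.ascii_letters + string.digits
    while True:
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        has_lower = any(c.islower() for c in candidate)
        has_upper = any(c.isupper() for c in candidate)
        if has_lower and has_upper:
            return candidate

password = random_mixed_case_password(20)
print(password)
```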

That’s all well and good for passwords, but what about other secret information, like company records or customer info? To address that the researchers suggest playing fake keystroke sounds to mask the real ones. 

Working among the clacking of phantom keyboards would surely annoy everyone, which is why the researchers suggest only adding the sounds to Skype and Zoom transmissions after they’ve been recorded instead of subjecting employees to real-time noisemakers. That, the team found, “appears to have the best performance and least annoyance to the user.”
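The masking idea can be pictured as mixing decoy clicks into audio that has already been captured. The Python sketch below is an assumption-laden illustration rather than the team's method: the function name, the toy "click" waveform and the uniformly random placement are all hypothetical.

```python
import random

def add_fake_keystrokes(samples, click, count, seed=None):
    """Return a copy of samples with `count` decoy clicks mixed in.

    Each click is added at a uniformly random offset. The seed is only
    there to make the example repeatable.
    """
    last_start = len(samples) - len(click)
    if last_start < 0:
        raise ValueError("recording is shorter than the click")
    rng = random.Random(seed)
    mixed = list(samples)  # leave the caller's recording untouched
    for _ in range(count):
        start = rng.randint(0, last_start)
        for offset, value in enumerate(click):
            mixed[start + offset] += value
    return mixed

# Toy data: a silent recording and a three-sample stand-in for a key click.
silence = [0.0] * 1000
toy_click = [1.0, -0.5, 0.25]
masked = add_fake_keystrokes(silence, toy_click, count=5, seed=1)
print(sum(1 for s in masked if s != 0.0), "samples altered")
```

A real implementation would use recorded keystroke sounds and would need to pick timing and volume carefully enough that the decoys cannot simply be filtered out.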

Follow-up research is now under way on three fronts: new sources for recordings, such as smart speakers; better keystroke isolation techniques; and the addition of a language model to make the acoustic snooping even more effective. ®
