Believe it or not, your phone’s tiniest vibrations can reveal your conversations, thanks to AI.
A team of computer science researchers at Penn State has developed a startling new way to eavesdrop on phone calls remotely by decoding subtle vibrations emitted by a cellphone’s earpiece.
Using millimeter-wave radar combined with an AI speech recognition system, their setup can capture and transcribe conversations from up to 10 feet away with about 60% accuracy.
The result raises significant privacy concerns about how such emerging technologies could be misused.
The research builds on a 2022 project in which the team achieved up to 83% accuracy in recognizing 10 predefined words using a similar approach.
The new work extends this capability to continuous speech transcription, though the accuracy is lower due to the complexity of decoding noisy radar data.
“When we talk on a cellphone, we tend to ignore the vibrations that come through the earpiece and cause the whole phone to vibrate,” said first author Suryoday Basak, a doctoral candidate in computer science.
“If we capture these same vibrations using remote radars and bring in machine learning to help us learn what is being said, using context clues, we can determine whole conversations. By understanding what is possible, we can help the public be aware of the potential risks.”
The team used a millimeter-wave radar sensor, the same technology employed in self-driving cars, motion detectors, and 5G wireless networks, to measure the tiny surface vibrations generated by speech played through a phone earpiece.
To interpret this noisy and low-quality data, they adapted Whisper, an open-source AI speech recognition model developed for clean audio, using a machine learning technique called low-rank adaptation (LoRA).
This method allowed them to retrain just 1% of Whisper’s parameters specifically for radar data, improving transcription results without rebuilding the entire model from scratch.
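For readers curious how low-rank adaptation keeps the retrained share so small, here is a minimal sketch in plain Python. It is not the researchers' code: the class name, the layer sizes, the rank, and the scaling factor are all illustrative assumptions. The idea it shows is the general one behind the technique: the original weight matrix stays frozen, and only two small matrices, A and B, are trained, with their product added on top of the original output.

```python
import random


def lora_fraction(d_out, d_in, rank):
    """Share of parameters that are trainable when a d_out x d_in layer gets a rank-r adapter."""
    frozen = d_out * d_in              # original weights, left untouched
    trainable = rank * (d_in + d_out)  # the two small adapter matrices A and B
    return trainable / (frozen + trainable)


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Hypothetical linear layer with a low-rank adapter (illustrative only)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weights
        # A starts small and random, B starts at zero, so the adapter
        # initially leaves the pretrained layer's output unchanged.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)               # frozen path: W x
        update = matvec(self.B, matvec(self.A, x))  # adapter path: B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]


if __name__ == "__main__":
    # Assumed sizes: one 1024 x 1024 layer with a rank-8 adapter.
    print(f"trainable share: {lora_fraction(1024, 1024, 8):.2%}")
```

With these assumed sizes, the adapter accounts for a small single-digit percentage of the layer's parameters, the same order of magnitude as the 1% figure the team reports. The actual share depends on which layers are adapted and the rank chosen, neither of which is stated here.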
Radar tech breakthrough
The experimental setup involved positioning the radar sensor about three meters (10 feet) away from the phone to capture the minute vibrations.
The data was then fed into the customized AI model, which produced transcriptions with around 60% accuracy over a vocabulary of up to 10,000 words.
While this is far from perfect, the researchers noted that even partial keyword matches could have serious security implications.
“The result was transcriptions of conversations, with an expectation of some errors, which was a marked improvement from our 2022 version, which outputs only a few words,” said co-author Mahanth Gowda, associate professor of computer science and engineering.
“But even picking up partial matches for speech, such as keywords, are useful in a security context.”
The team compared their approach to lip reading, which typically captures only 30% to 40% of spoken words but can still help people infer conversations when combined with context.
Similarly, the radar-AI system’s output, though imperfect, can reveal sensitive information when supplemented with prior knowledge or manual correction.
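To make the point about partial matches concrete, the sketch below scores a made-up noisy transcript against what was actually said. The article does not specify how the team computed accuracy, so the metric here (one minus the word-level edit distance divided by the number of spoken words) is an assumption, as are the example sentences, the keyword list, and the function names.

```python
def word_errors(reference, hypothesis):
    """Word-level edit distance (substitutions, insertions, deletions) and reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word dropped by the transcriber
                            curr[j - 1] + 1,      # extra word inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1], len(ref)


def word_accuracy(reference, hypothesis):
    """Assumed metric: 1 - (edit distance / number of reference words), floored at 0."""
    errors, n_words = word_errors(reference, hypothesis)
    return max(0.0, 1.0 - errors / n_words) if n_words else 0.0


def keyword_recall(reference, hypothesis, keywords):
    """Fraction of the spoken keywords that survive in the noisy transcript."""
    spoken = {k.lower() for k in keywords} & set(reference.lower().split())
    if not spoken:
        return 0.0
    heard = set(hypothesis.lower().split())
    return len(spoken & heard) / len(spoken)


if __name__ == "__main__":
    said = "please wire the payment to the new account on friday"
    transcribed = "please why the payment to a new count friday"
    print(f"word accuracy:  {word_accuracy(said, transcribed):.0%}")
    print(f"keyword recall: {keyword_recall(said, transcribed, ['payment', 'account', 'friday']):.0%}")
```

In this invented example, four of the ten spoken words are wrong or missing, yet two of the three sensitive keywords come through intact, which is the kind of partial match the researchers warn about.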
Privacy risks amplified
Basak emphasized the potential privacy risks posed by this emerging technology.
“Similar to how lip readers can use limited information to interpret conversations, the output of our model combined with contextual information can allow us to infer parts of a phone conversation from a few meters away,” he said.
“The goal of our work was to explore whether these tools could potentially be used by bad actors to eavesdrop on phone conversations from a distance. Our findings suggest that this is technically feasible under certain conditions, and we hope this raises public awareness so people can be more mindful during sensitive calls.”
The U.S. National Science Foundation supported the research, and the team stressed that their experiments are intended to highlight possible vulnerabilities before malicious actors exploit them.
They envision future efforts to develop protective measures to secure personal conversations from this kind of remote surveillance.