This article was originally published on Automotive News.
Mack Compton owns two vehicles and doesn’t use the voice assistant in either of them.
The 73-year-old retired mail carrier, who now shines shoes in Midtown Manhattan, said the knobs and buttons in his 2013 Chevrolet Sonic and his 2022 Nissan Kicks are more straightforward.
“They are much easier to deal with,” he said. “I’m just old school.”
Consumers complain of unresponsiveness, awkward pauses, misunderstandings and false activations when talking to their vehicles. Rigidity, lag and clunkiness continue to undermine voice assistants, which are marketed as a safe way to navigate layered menus.
Voice recognition, including voice assistants, has been a top 10 problem in the J.D. Power U.S. Initial Quality Study for more than a decade, said Kristin Kolodge, vice president of auto benchmarking and mobility development at J.D. Power.
But the industry may be starting to solve some of its voice woes.
Artificial intelligence enables systems to parse natural speech instead of requiring specific commands. Some automakers are also moving speech understanding for common functions, such as navigation and streaming, from the cloud onto the vehicle itself, which makes responses faster.
“I believe that next generation will actually close many of those consumer expectation gaps significantly,” said Dave Tsai, chief technology officer at Toyota Connected North America. “We have hit a lot of those marks to be more close to human-based conversations in terms of latency as well as natural language.”
Voice assistants will access advanced vehicle capabilities
Vehicles used to be the domain of knobs and buttons for relatively few functions.
Then “you have an explosion of entertainment options, ADAS functionality that starts, much more complicated climate capabilities and many other assistants that are happening across the vehicle,” but “you cannot have 100 buttons,” said Alex Koster, managing director and senior partner at Boston Consulting Group.
Later, automakers relied on touchscreens and joysticks, which enabled a flexible array of capabilities accessible through a central infotainment display.
Select your streaming app of choice through the infotainment system. Search a vast media library, select a podcast and navigate through episodes. Adjust playback speed and tune an equalizer for more bass or treble.
But how to navigate this buffet of features while cruising along in a moving vehicle?
Touchscreens are “limited because you cannot really go into multidimensional menus in a safe way while you’re driving,” said Koster.
Voice is the answer for many automakers. But improving it is easier said than done.
Voice assistants have been limited by the information that can be stored on board vehicles
The first step in parsing and responding to speech is hearing it. The vehicle is a noisy environment, and sometimes there are multiple voices in the cabin.
Most vehicles have only one microphone, said Dani Cherkassky, CEO of Kardome, a voice interaction technology company. The microphone, which costs about $5 per vehicle, cannot resolve 3D space and flattens the audio.
“Which portion of the speech [do you] attribute to which seat?” he said. “The entire thing of understanding the acoustical scene is lost.”
It is neither practical nor desirable to have the microphone listening whenever a person is in the vehicle. Privacy concerns abound, and large language models require a lot of power.
Parsing every bit of speech to deduce when a command is being made would be prohibitively expensive and energy-intensive.
“In the same way that we’re hearing about all these data centers that AI uses up an enormous amount of electricity in, say, the city of Phoenix, it does the same thing” in a vehicle, said Sean Tucker, lead editor at Kelley Blue Book.
Hence the "wake word," or "trigger word," which tells the system that a command is forthcoming. Calibrating the wake word is its own challenge.
“If you overtune it, then it won’t pick up the intentional” calls, said Michael Zagorsek, COO at SoundHound AI, a voice platform company. “If it’s too broad, it’ll pick up these false positives.”
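The tuning tradeoff Zagorsek describes can be sketched as a detection threshold on a confidence score. Everything below is illustrative, not any vendor's actual API: raise the threshold and the system misses real activations; lower it and it fires on background speech.

```python
# Hypothetical wake-word detector. The confidence scores are simulated;
# no real acoustic model or vendor API is used here.

def is_wake_word(confidence: float, threshold: float) -> bool:
    """Fire only when the acoustic model's confidence clears the threshold."""
    return confidence >= threshold

# Simulated scores: genuine wake words tend to score high,
# similar-sounding cabin speech lands in the middle.
real_utterances = [0.92, 0.88, 0.71]   # driver actually said the wake word
background_noise = [0.65, 0.40, 0.15]  # radio chatter, passenger talk

def evaluate(threshold: float):
    missed = sum(not is_wake_word(c, threshold) for c in real_utterances)
    false_positives = sum(is_wake_word(c, threshold) for c in background_noise)
    return missed, false_positives

print(evaluate(0.85))  # overtuned: misses one real activation, no false positives
print(evaluate(0.50))  # too broad: catches every activation, plus one false positive
```

Real systems tune this threshold against recorded cabin audio; the principle is the same as in this toy version.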
Historically, drivers could issue commands from a fixed library cached in the vehicle, and that library has become increasingly comprehensive.
For more advanced commands, AI enables parsing complex speech in the cloud. This is how some automakers can turn “it’s too cold in here” into an adjustment of the vehicle climate.
However, accessing the cloud requires connectivity. Responses can be slow, and drivers can be met with confusing silence, unsure if the command was heard.
Vehicles also must have the compute power to maintain consistent connectivity, receive high-bandwidth responses and process a lot of complex information.
“Fragmented, underpowered hardware within the vehicle limits their ability to integrate robust voice interaction, and limited and poor connectivity makes interaction with the cloud spotty and inconsistent as well,” said Ken Johnston, vice president of AI and analytics at Envorso, a digital transformation consultancy.
Companies are offering more limited, fine-tuned AI models on board vehicles
Cloud-based large language models enable more complex conversations and smarter voice assistants. But they introduce latency problems and require connectivity and significant processing power.
Onboard libraries of commands are more efficient but create a limited, stiff and inflexible conversation.
Companies are turning to a third option: smaller models that can be run in the vehicle, also called “the edge.” These offer the natural conversational elements of large language models without the technical limits of constant cloud access.
“There’s the massive, large compute models that can answer everything,” said Zagorsek, but “you can build narrower, more fine-tuned ones, which we’re doing, that ultimately can live in an entirely embedded environment.”
For its next generation of multimedia, Toyota will push AI onto the vehicle for common tasks, such as using the phone, playing media and navigating to specific destinations. Then, the company will rely on models in the cloud for more advanced commands, such as looking up information.
“There will be many things that will be running on board the vehicle for latency purposes,” said Toyota’s Tsai. “In a connected world where knowledge is actually required as well,” we’ll “still need to make connectivity calls.”
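The split Tsai describes can be sketched as a simple intent router: common tasks resolve on the vehicle (the "edge"), while knowledge queries fall back to the cloud. The intent names below are hypothetical assumptions, not Toyota's implementation.

```python
# Illustrative hybrid edge/cloud router. Intent names are made up
# for the example; this is not any automaker's actual architecture.

ON_BOARD_INTENTS = {"phone_call", "play_media", "navigate"}

def route(intent: str) -> str:
    """Choose where to handle an already-parsed intent."""
    if intent in ON_BOARD_INTENTS:
        return "edge"   # low latency, works without connectivity
    return "cloud"      # needs a connection, but can look up world knowledge

print(route("navigate"))        # handled on board for latency
print(route("lookup_weather"))  # requires a connectivity call
```

The design choice is the one the article describes: keep latency-sensitive, high-frequency tasks local, and reserve cloud round trips for requests that genuinely need outside information.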
The race is on.
Big tech companies, auto-specific audio partners and automakers themselves are forging ahead with the voice assistants of the future, whose seamless conversations will feel like chatting with a friend.
“I think those that will fail to implement a seamless and welcoming voice experience in their cars will eventually sell less cars,” said Cherkassky.