How artificial intelligence gave a paralyzed woman her voice back
Pat Bennett’s prescription is a bit more complicated than “Take a couple of aspirins and call me in the morning.” But a quartet of baby-aspirin-sized sensors implanted in her brain are aimed at addressing a condition that’s frustrated her and others: the loss of the ability to speak intelligibly. The devices transmit signals from a couple of speech-related regions in Bennett’s brain to state-of-the-art software that decodes her brain activity and converts it to text displayed on a computer screen.
Bennett, now 68, is a former human resources director and onetime equestrian who jogged daily. In 2012, she was diagnosed with amyotrophic lateral sclerosis, a progressive neurodegenerative disease that attacks neurons controlling movement, causing physical weakness and eventual paralysis.
“When you think of ALS, you think of arm and leg impact,” Bennett wrote in an interview conducted by email. “But in a group of ALS patients, it begins with speech difficulties. I am unable to speak.”
Usually, ALS first manifests at the body’s periphery—arms and legs, hands and fingers. For Bennett, the deterioration began not in her spinal cord, as is typical, but in her brain stem. She can still move around, dress herself and use her fingers to type, albeit with increasing difficulty. But she can no longer use the muscles of her lips, tongue, larynx and jaws to enunciate clearly the phonemes—or units of sound, such as “sh”—that are the building blocks of speech.
Although Bennett’s brain can still formulate directions for generating those phonemes, her muscles can’t carry out the commands.
Rather than train the AI to recognize whole words, the researchers created a system that decodes words from phonemes. These are the sub-units of speech that form spoken words in the same way that letters form written words. “Hello,” for example, contains four phonemes: “HH,” “AH,” “L” and “OW.”
Using this approach, the computer needed to learn only 39 phonemes to decipher any word in English. This both enhanced the system’s accuracy and made it three times faster.
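To make the phoneme idea concrete, here is a minimal sketch of a pronunciation lookup. The tiny lexicon and the function names are hypothetical and purely illustrative; only the breakdown of "hello" comes from the article, and the other entries are assumed ARPAbet-style spellings.

```python
# Hypothetical toy lexicon: word -> ARPAbet-style phoneme sequence.
# Only the "hello" entry is taken from the article; the rest are assumptions.
LEXICON = {
    "hello": ("HH", "AH", "L", "OW"),
    "low": ("L", "OW"),
    "hull": ("HH", "AH", "L"),
}


def phonemes_for(word):
    """Return the phoneme sequence for a word in the toy lexicon."""
    return LEXICON[word.lower()]


def inventory(lexicon):
    """Return the sorted set of distinct phonemes used by all words."""
    return sorted({p for seq in lexicon.values() for p in seq})


print(phonemes_for("Hello"))
print(inventory(LEXICON))
```

In this toy example three words are covered by just four distinct phonemes, which is the same economy, on a much smaller scale, that lets 39 phonemes cover English.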
On March 29, 2022, a Stanford Medicine neurosurgeon placed two tiny sensors apiece in two separate regions—both implicated in speech production—along the surface of Bennett’s brain. The sensors are components of an intracortical brain-computer interface, or iBCI. Combined with state-of-the-art decoding software, they’re designed to translate the brain activity accompanying attempts at speech into words on a screen.
About a month after the surgery, a team of Stanford scientists began twice-weekly research sessions to train the software that was interpreting her speech. After four months, Bennett’s attempted utterances were being converted into words on a computer screen at 62 words per minute—more than three times as fast as the previous record for BCI-assisted communication.
“These initial results have proven the concept, and eventually technology will catch up to make it easily accessible to people who cannot speak,” Bennett wrote. “For those who are nonverbal, this means they can stay connected to the bigger world, perhaps continue to work, maintain friends and family relationships.”
Approaching the speed of speech
Bennett’s pace begins to approach the roughly 160-word-per-minute rate of natural conversation among English speakers, said Jaimie Henderson, MD, the neurosurgeon who performed the operation.
“We’ve shown you can decode intended speech by recording activity from a very small area on the brain’s surface,” Henderson said.
Henderson, the John and Jean Blume-Robert and Ruth Halperin Professor in the department of neurosurgery, is the co-senior author of a paper describing the results, published Aug. 23 in Nature.
His co-senior author, Krishna Shenoy, Ph.D., professor of electrical engineering and of bioengineering, died before the study was published.
Frank Willett, Ph.D., a Howard Hughes Medical Institute staff scientist affiliated with the Neural Prosthetics Translational Lab, which Henderson and Shenoy co-founded in 2009, shares lead authorship of the study with graduate students Erin Kunz and Chaofei Fan.
In 2021, Henderson, Shenoy and Willett were co-authors of a study published in Nature describing their success in converting a paralyzed person’s imagined handwriting into text on a screen using an iBCI, attaining a speed of 90 characters, or 18 words, per minute—a world record until now for an iBCI-related methodology.
In 2021, Bennett learned about Henderson and Shenoy’s work. She got in touch with Henderson and volunteered to participate in the clinical trial.
How it works
The sensors Henderson implanted in Bennett’s cerebral cortex, the brain’s outermost layer, are square arrays of tiny silicon electrodes. Each array contains 64 electrodes, arranged in an 8-by-8 grid and spaced apart from one another by about half the thickness of a credit card. The electrodes penetrate the cerebral cortex to a depth roughly equal to that of two stacked quarters.
The implanted arrays are attached to fine gold wires that exit through pedestals screwed to the skull, which are then hooked up by cable to a computer.
An artificial-intelligence algorithm receives and decodes electronic information emanating from Bennett’s brain, eventually teaching itself to distinguish the distinct brain activity associated with her attempts to formulate each of the 39 phonemes that compose spoken English.
It feeds its best guess concerning the sequence of Bennett’s attempted phonemes into a so-called language model, essentially a sophisticated autocorrect system, which converts the streams of phonemes into the sequence of words they represent.
“This system is trained to know what words should come before other ones, and which phonemes make what words,” Willett explained. “If some phonemes were wrongly interpreted, it can still take a good guess.”
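The "autocorrect" behavior Willett describes can be sketched in a few lines. This is not the study's language model, only a stand-in under stated assumptions: a hypothetical three-word lexicon, made-up word-pair probabilities, and a plain edit distance between the decoded phonemes and each candidate word.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Hypothetical lexicon and made-up word-pair probabilities (assumptions).
LEXICON = {
    "hello": ("HH", "AH", "L", "OW"),
    "yellow": ("Y", "EH", "L", "OW"),
    "halo": ("HH", "EY", "L", "OW"),
}
BIGRAM = {("say", "hello"): 0.6, ("say", "yellow"): 0.1, ("say", "halo"): 0.05}


def best_word(decoded, previous_word):
    """Pick the word closest to the decoded phonemes; break ties by context."""
    def score(word):
        dist = edit_distance(decoded, LEXICON[word])
        prior = BIGRAM.get((previous_word, word), 0.01)
        return (dist, -prior)  # fewer phoneme errors first, then likelier word
    return min(LEXICON, key=score)


# One phoneme is wrong ("EH" instead of "AH"); all three words are one edit
# away, so the preceding word "say" decides the outcome.
print(best_word(("HH", "EH", "L", "OW"), "say"))
```

The design choice here is to let phoneme evidence dominate and use word context only to break ties; a real system would weigh the two probabilistically rather than in strict order.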
Practice makes perfect
To teach the algorithm to recognize which brain-activity patterns were associated with which phonemes, Bennett engaged in about 25 training sessions, each lasting about four hours, during which she attempted to repeat sentences chosen randomly from a large data set consisting of samples of conversations among people talking on the phone.
An example: “It’s only been that way in the last five years.” Another: “I left right in the middle of it.”
As she tried to recite each sentence, Bennett’s brain activity, translated by the decoder into a phoneme stream and then assembled into words by the autocorrect system, would be displayed on the screen below the original. Then a new sentence would appear on the screen.
Bennett repeated 260 to 480 sentences per training session. The entire system kept improving as it became familiar with Bennett’s brain activity during her speech attempts.
The iBCI’s intended-speech translation ability was tested on different sentences from those used in the training sessions. When the sentences and the word-assembling language model were restricted to a 50-word vocabulary (in which case the sentences used were drawn from a special list), the translation system’s error rate was 9.1%.
When the vocabulary was expanded to 125,000 words (large enough to compose almost anything you’d want to say) the error rate rose to 23.8%—far from perfect, but a giant step from the prior state of the art.
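For readers curious how such a figure is computed, the sketch below assumes the reported error rate is a word error rate, the usual metric for speech decoding: the number of word substitutions, deletions and insertions needed to turn the decoded sentence into the intended one, divided by the number of intended words. The function name and the example hypotheses are illustrative, not taken from the study.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference sentence must contain at least one word")
    # Standard dynamic-programming edit distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)


intended = "I left right in the middle of it"
print(word_error_rate(intended, "I left right in the middle it"))    # one word dropped
print(word_error_rate(intended, "I let right in the riddle of it"))  # two words wrong
```

On this scale, an error rate of 23.8% corresponds to roughly one wrong word in every four.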
“This is a scientific proof of concept, not an actual device people can use in everyday life,” Willett said. “But it’s a big advance toward restoring rapid communication to people with paralysis who can’t speak.”
“Imagine,” Bennett wrote, “how different conducting everyday activities like shopping, attending appointments, ordering food, going into a bank, talking on a phone, expressing love or appreciation—even arguing—will be when nonverbal people can communicate their thoughts in real time.”
The device described in this study is licensed for investigative use only and is not commercially available. The study, a registered clinical trial, took place under the aegis of BrainGate, a multi-institution consortium dedicated to advancing the use of BCIs in prosthetic applications, led by study co-author Leigh Hochberg, MD, Ph.D., a neurologist and researcher affiliated with Massachusetts General Hospital, Brown University and the VA Providence (Rhode Island) Healthcare System.
More information:
Edward Chang et al., A high-performance neuroprosthesis for speech decoding and avatar control, Nature (2023). DOI: 10.1038/s41586-023-06443-4 www.nature.com/articles/s41586-023-06443-4
Francis Willett et al., A high-performance speech neuroprosthesis, Nature (2023). DOI: 10.1038/s41586-023-06377-x www.nature.com/articles/s41586-023-06377-x
Journal information:
Nature