“Won’t this be helpful?” the speech therapist asked as she handed me an iPad with preprogrammed phrases such as “I want a sandwich,” “Turn on the TV,” and “Turn off the lights, please.” Thankfully, my voice was still strong and clear enough to say all those phrases, and I couldn’t imagine searching through all the options to find my particular request in the moment. The whole process seemed only slightly better than when, as a neurology resident at Strong Memorial Hospital, I gave stroke patients a laminated sheet with similar phrases so they could point to their request. Each was an advance and a welcome aid for those who needed one, but I was grateful that wasn’t me, at least not yet. I hoped that my progressive neuromuscular illness, Primary Lateral Sclerosis (PLS), was advancing slowly enough that newer technology would be available if and when I needed it to speak more clearly.
I hoped for a device I could speak into that would repeat whatever I’d said in a clearer, louder voice, so others could understand me even in a noisy room or in groups where not everyone was watching my mouth to catch every word. Voice banking, a procedure in which one records samples of one’s voice for later use, was available at the Children’s Hospital in Boston, but I hadn’t learned of it while my voice was still intact. Still, I’d be happy with any voice that could say what I wanted in the moment, permitting me to be spontaneous.
Nonprofit Help
Team Gleason, a nonprofit foundation founded by Steve Gleason, a former NFL player who developed ALS, was mentioned on a PLS Facebook group. Team Gleason has paid for various accessibility items for persons with ALS (PALS) and persons with PLS. More recently, Team Gleason began offering voice cloning.
Voice cloning uses AI to produce a computerized voice from samples of one’s voice recorded before ALS or PLS made speech less clear by weakening the muscles used in speaking. The samples are fed into a computer program that produces a voice much like the original. A second program performs text-to-speech, so that typed words can be spoken in the computerized voice. To speak into a phone and have verbal output, the software first converts one’s dysarthric speech to text and then converts that text into the new computerized voice. The resulting voice is quite similar to one’s former voice, including inflection and cadence. The cloned voice doesn’t even sound computerized!
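For technically curious readers, here is a rough sketch in Python of how that two-stage relay could be wired together. It is only an illustration of the idea, not the software I actually used: it assumes OpenAI’s open-source Whisper model for the speech-to-text stage and Eleven Labs’ public text-to-speech API for the cloned-voice stage, and the voice ID, file names, and API key are placeholders.

```python
# Sketch of the two-stage relay: dysarthric speech -> text -> cloned voice.
# Assumes the openai-whisper and requests packages (Whisper also needs
# ffmpeg installed); VOICE_ID, the API key, and file names are placeholders.
import os
import requests
import whisper

VOICE_ID = "your-cloned-voice-id"        # placeholder, not a real ID
API_KEY = os.environ["ELEVEN_API_KEY"]   # set in your environment

# Stage 1: speech-to-text on a recording of the speaker's current voice.
stt_model = whisper.load_model("base")
text = stt_model.transcribe("my_dysarthric_clip.wav")["text"]
print("Recognized:", text)  # the "typos" one would correct before speaking

# Stage 2: text-to-speech in the cloned voice via the Eleven Labs API.
resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY},
    json={"text": text, "model_id": "eleven_multilingual_v2"},
)
resp.raise_for_status()
with open("spoken_in_my_voice.mp3", "wb") as f:
    f.write(resp.content)  # play this file to "speak" in the cloned voice
```

The accuracy tradeoff in the story lives in the first stage: the better the speech recognition, the fewer “typos” there are to fix before the cloned voice speaks.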
I was intrigued and knew I had a brief sample of my healthy voice from a documentary about military TBI (traumatic brain injury), in which I spoke about the capabilities of the human brain, especially its abilities to self-reflect and contemplate the future, known as executive functions.
Team Gleason was very professional and welcoming, and referred me to their contractor, Bridging Voices. Bridging Voices assigned me a Speech and Language Pathologist (SLP), who worked with Eleven Labs to create my voice clone. Eleven Labs apparently has many contracts with the film industry, including translating movies into other languages. They kindly provide the medical service of making voice clones for individuals with dysarthria, or disordered speech, for free.
The SLP I worked with, Alice, was very encouraging and friendly. She found testimony I’d given on Capitol Hill years before, now online through C-SPAN, and said that recording was a better length than the brief segment in my 5-minute documentary on YouTube. She submitted both to Eleven Labs, and we waited.
The Clone Was Ready
Soon, the voice clone was ready, and Alice directed me to pick one of the available apps that would use it to “speak” words for me. I chose Speech Assistant because it had good reviews and was less expensive ($30) than another option. Alice walked me through the steps to load my voice clone into the Speech Assistant app, and we were ready! She showed me how to type in a phrase I wanted to “speak,” which was good to know, but I wanted to speak to the app in my dysarthric voice and have it repeat my words in my cloned voice, easily understood by all. Alice agreed that approach was more useful and showed me how to speak into the app on my phone. The only glitch was that the app understood my current voice only about 85% of the time, so I needed to correct the “typos” before hitting the speak button. That slowed my ability to speak with my cloned voice, but the app could learn and increase its accuracy as I practiced.
I realized that what I really wanted was to give a speech, rather than use my clone in conversation. People understood me well when we were face-to-face, especially if background noise was minimal. I even dared to make doctors’ appointments on the phone, and I succeeded unless the connection was poor or English wasn’t the first language of the person on the line. However, if I tried to read a paragraph or speak for a long time, my speech became less clear and harder to understand. Worse, I could hear myself becoming less clear and grew nervous, which made my speech even less understandable.
Around that time, my book coach nominated me for an award in an international competition, Women Changing the World, and I entered. I was a finalist in my category, and each finalist needed a 2-minute acceptance speech ready in case she won. Suddenly, I needed Speech Assistant to deliver that brief speech if my name was called.
Alice showed me how to convert my Word document into a text document and then load it into the Speech Assistant app. The app divided my 2-minute talk into smaller segments, each loaded onto a different button, and I had to hit the buttons in sequence to get through the brief remarks.
It was a good proof of principle, if a bit clunky to move through the segments. Still, it would work. I didn’t win the contest, but I was grateful to have been a finalist.
A few months later, I was invited to speak to the ALS Association Support Group DC/MD/VA about living with a motor neuron disease. This time I wanted the voice clone and app to work smoothly, without the 18-minute talk being divided into segments. Fortuitously, our older son was visiting, and he suggested trying the Eleven Labs website. We did, and it worked beautifully.
The procedure is simply copying the text of the Word document and pasting it into the Text-to-Speech page of the website. There are two options depending on the length of the document, and each is straightforward. Soon, the document is converted to an MP3 file saved in the Downloads folder. It is so simple that any subsequent revisions are easy to regenerate.
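For those who would rather script the conversion than use the website, the same idea can be sketched in a few lines of Python. This is a sketch under assumptions, not the procedure I followed: it uses the python-docx package to pull the text out of the Word file and the same Eleven Labs text-to-speech endpoint as above; the voice ID and file names are placeholders, and a very long talk may need to be split to fit the API’s per-request character limit.

```python
# Sketch: turn a Word document of remarks into an MP3 in the cloned voice.
# Assumes the python-docx and requests packages; VOICE_ID, the API key,
# and file names are placeholders, not values from the story.
import os
import requests
from docx import Document

VOICE_ID = "your-cloned-voice-id"        # placeholder
API_KEY = os.environ["ELEVEN_API_KEY"]   # set in your environment

# Gather the talk's text from the .docx file, paragraph by paragraph.
talk = "\n".join(p.text for p in Document("my_talk.docx").paragraphs)

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY},
    json={"text": talk, "model_id": "eleven_multilingual_v2"},
)
resp.raise_for_status()

with open("my_talk.mp3", "wb") as f:     # analogous to the Downloads file
    f.write(resp.content)
```

Rerunning a script like this after each revision is what makes later edits to the talk so painless: change the document, regenerate the MP3.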
For the ALS Support Group, I joined their Zoom room and opened the MP3 file on my computer; out came the voice clone, seamlessly delivering my remarks. Because my own voice had rested during the talk, I could easily answer questions afterward in my current voice, and I came to appreciate how viable it is to give a talk this way.
The Eleven Labs output is even more true to life and smooth in its delivery than the Speech Assistant output, although both products are continually being refined. When my son first played the Eleven Labs output, I was stunned at how similar my cloned voice was to my former speech. My husband had tears in his eyes, transported back to the time before PLS. It was a very poignant moment.
Thank you, Team Gleason, Bridging Voices, and Eleven Labs!