Special Showcase Edition April 2013
Speech Recognition: A Work in Progress
By Selena Chavis
For The Record
Vol. 25 No. 7 P. 10
While largely reliable, the technology is not infallible.
While controversy remains within the health care community regarding the best approach to using speech recognition technology, there are some points that most industry professionals agree on. Few would dispute that the vendor community has made tremendous strides to advance applications to better meet the needs of the health care industry, and innovation continues as developers look for ways to make the technology more intelligent and accurate.
Most industry professionals also acknowledge the technology’s potential to align with federal initiatives to streamline documentation practices and support improved information sharing in real time.
Alongside these positives, vendors and health care providers alike are quick to point out an important absolute: Speech recognition will never be infallible. “Even if you use the latest and greatest technology, you will still see errors,” says Juergen Fritsch, PhD, MSc, chief scientist at speech recognition provider M*Modal. “It is very important to realize that there will always be errors.”
Errors Still Exist
A recent in-house investigation at Ireland’s Cork University Hospital revealed that even after three years of using front-end speech recognition within the facility’s radiology department, reports still contained obvious errors. Installed in 2008, the system has radiologists perform their own editing and final proofing. The initial return on investment was promising: the facility realized immediate improvements in turnaround times, with reports completed in three to four hours rather than the several days required when they were sent to transcription.
The efficient turnaround times were a big plus to patient care, but Maria Twomey, MD, a specialist registrar in radiology, along with her oncologist colleagues, identified a need for ongoing error review to support process improvement. Following a random sample review of 350 reports from June 2008 through December 2011, researchers found that 12% of reports contained errors, and 3% contained critical errors that could adversely impact patient care.
At Ohio-based UC Health system, Sherry Doggett, director of corporate transcription services, was asked to start a quality assurance (QA) program for physician documentation following the implementation of speech recognition technology in the emergency department. Much like the experience of Cork University Hospital, UC Health used data from the QA process to identify areas where speech recognition operations could be bolstered.
As the most effective use of speech recognition technology continues to evolve, many industry professionals believe that QA will be a critical component. Specifically, this kind of follow-up will be needed to pinpoint where errors originate, whether with the technology itself or with poor dictation practices.
The Cork University Hospital study found that the speech recognition technology had difficulty recognizing certain words spoken with Irish accents. When physicians used an American pronunciation for these problem words, the error rate dropped.
Other problems were related to the omission of key words such as “yes” and “no” that can alter the meaning of a sentence. Measurements also were problematic, as words such as “centimeter” were sometimes replaced with “millimeter.” In addition, there were grammatical errors that caused comprehension issues with reports.
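To illustrate the kind of automated check a QA program might layer on top of such findings, consider the following minimal sketch. It is not drawn from the Cork study or any vendor product; the patterns and the function name flag_sentences_for_review are hypothetical. It simply flags sentences containing measurements or short negation/affirmation words so a human reviewer knows where to look first.

```python
import re

# Hypothetical QA pass: flag sentences containing the error-prone elements
# described above (measurement units and short negation/affirmation words).
UNIT_PATTERN = re.compile(
    r"\b\d+(?:\.\d+)?\s*(?:millimeter|centimeter|mm|cm)s?\b", re.IGNORECASE
)
KEYWORD_PATTERN = re.compile(r"\b(?:yes|no|not|without)\b", re.IGNORECASE)

def flag_sentences_for_review(report_text):
    """Return the sentences a human reviewer should proofread first."""
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", report_text):
        if UNIT_PATTERN.search(sentence) or KEYWORD_PATTERN.search(sentence):
            flagged.append(sentence)
    return flagged

report = "There is a 3 cm nodule in the left lobe. No acute findings."
for sentence in flag_sentences_for_review(report):
    print("REVIEW:", sentence)
```

A check like this only narrows the field; the second set of eyes remains the safeguard.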
According to Rich Micheil, voice recognition manager with Emdat, there are several poor dictation practices that can cause such errors. “If a doctor is not speaking in proper sentence structure, the technology will have a hard time discerning where a sentence begins and ends,” he explains. “Consistency is key for punctuation and speech understanding.”
Fritsch points to other issues, such as mumbling or speaking too fast, practices that can cause short words such as “no” to be missed if not clearly pronounced.
Technology continues to advance to address problematic areas, such as accents, while proactively identifying potential issues. Even with these advancements, more health care organizations are recognizing the need to get a second set of eyes on documentation created by front-end speech recognition. “We have seen more emphasis on quality in the last couple of months than I have seen in a long time,” says Karen Fox-Acosta, CMT, AHDI-F, president of the Association for Healthcare Documentation Integrity (AHDI). “When physicians are doing editing on the front end, the system will only work as well as the time the physician takes with it.”
Front End vs. Back End
Currently, there are two approaches—front end and back end—to incorporating speech recognition into provider workflows. Industry professionals agree that front-end speech practices have the greatest potential to align with current federal initiatives pushing for greater efficiency and more real-time information sharing of patient data.
Front-end speech eliminates the need for time-consuming transcription services because doctors dictate directly into the EMR. “The beauty is it’s available in real time, and the doctor does his or her own editing,” says Keith Belton, senior director of product marketing at Nuance.
Back-end speech recognition is far more transparent to the physician, appearing much like a traditional dictation/transcription workflow. Physicians typically dial into the system from a wall phone or a mobile device and dictate into the speech recognition application. A document is produced and later edited by a medical transcriptionist (MT) for accuracy.
While front-end speech offers the most potential to meet industry needs moving forward, it’s generally assumed that back-end processes lend themselves to greater accuracy. “The back-end speech process, in my experience, seems to work better because you have the MT involved,” Micheil says. “The experience behind that MT will give you a better product in the long run.”
While not comparing front-end with back-end speech practices, a 2009 study conducted by the AHDI concluded that the accuracy of medical records improves when MTs verify information dictated by physicians. The study found that error rates were 22% for traditional dictation practices and 52% for dictation with speech recognition translation.
While errors in both instances can be corrected with proper editing techniques, the question that many professionals have is whether physicians have the time or are willing to take the necessary steps to make corrections in a front-end speech setting. “What can happen when physicians are in that front-end situation is that there is never a second look,” Fox-Acosta says. “That second look is critical.”
A review of data gathered in the Cork University Hospital study revealed that two-thirds of the reports containing errors were finalized at the end of the day, when radiologists were fatigued and rushing to complete work, lending further weight to the idea that another set of eyes can be pivotal.
Emdat CEO Randy Olver says speech recognition technology should make the doctor as efficient as possible. “We at Emdat have been an advocate of having an MT see the finished product,” he says, pointing out that health care organizations need to be realistic when trying to implement speech recognition into workflows. “We want to make sure we are setting proper expectations. Clients need to be aware that speech recognition is not for everybody.”
Better Equipment, Profiling, and Other Advances
Fox-Acosta acknowledges that speech recognition equipment and software have made significant strides in recent years. “Speech recognition platforms have gotten better, and the microphone technology is better,” she says. “Having quality equipment is a big plus.”
To overcome issues related to accents and personal speech preferences, Belton says software applications have become smarter and more interpretive in nature. A system of user profiling allows speech recognition software to essentially learn the “style guide” and voice of a particular health care professional.
For example, if a physician typically provides prescription information at the end of dictation but the health care organization prefers it at the beginning of the document, changes can be entered into the user profile and completed automatically. The system learns to adapt to individual users based on the habits or preferences fed back into a preestablished profile. Advanced features such as vocabularies sorted by physician specialty and regional accent wizards also can be built into a profile.
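As a conceptual sketch of what such a profile might hold (vendor schemas are proprietary and not described here; the class DictationProfile and all field and method names below are hypothetical), a user profile could bundle formatting preferences, a specialty vocabulary, and an accent setting, with corrections fed back over time:

```python
from dataclasses import dataclass, field

# Hypothetical profile structure; field names are illustrative, not any
# vendor's actual schema.
@dataclass
class DictationProfile:
    user_id: str
    specialty: str                     # seeds a specialty vocabulary
    accent: str = "default"            # selects an accent-tuned acoustic model
    # Document "style guide": the order sections should appear in, regardless
    # of the order in which the physician dictates them.
    section_order: tuple = ("medications", "history", "findings", "impression")
    learned_terms: set = field(default_factory=set)

    def learn_correction(self, recognized: str, corrected: str) -> None:
        """Feed an edit back into the profile so the engine adapts."""
        if corrected != recognized:
            self.learned_terms.add(corrected)

profile = DictationProfile(user_id="dr_oconnell", specialty="cardiology",
                           accent="irish_english")
profile.learn_correction(recognized="millimeter", corrected="centimeter")
```

The learn_correction method gestures at the feedback loop described next: whoever reviews the document, MT or physician, is effectively the one training the profile.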
Fritsch says the market is moving more toward a “speech understanding” model that provides a foundation to identify obvious or nonsensical errors. “We have been working to expand beyond the speech recognition paradigm,” he explains, adding that tailoring technology to focus specifically on health care keeps the technology from making “stupid” mistakes. “When they are talking about chest pain, it’s very unlikely that anything about the foot should show up.”
According to Fritsch, a cardiologist’s user profile might start with a broad cardiology platform then expand as the speech engine begins to learn the dictator’s characteristics. It’s important to note, however, that user profiles are only as effective as the information entered into them, and considerations must be made for the differences between back-end and front-end workflows. In the back-end speech world, MTs make sure proper edits are fed into the user profile system to ensure the highest level of accuracy for individual dictators. With front-end speech workflows, the responsibility falls on the physician.
Doggett also points out that MTs help identify when a user profile is ready for the real world. While most vendors suggest 60 to 100 minutes of voice to prime the speech engine for a particular user, she says it depends on the individual. “That’s a ballpark. Some may take 180 minutes of voice,” she says, pointing out that even when the time is extended, some professionals simply won’t qualify for speech recognition. “You have to take a realistic approach before a [user profile] is amenable to be released into the speech recognition platform. You are not gaining anything if you turn a physician on too soon.”
Regardless of whether back-end or front-end processes are used, once a profile is released into the regular system, consistency is the key to success, Micheil says. “The speech engine will always produce some kind of text,” he notes while pointing out that the user profile data need to match what the user actually does.
Checks and Balances
Alongside QA programs, Belton says training and regular feedback are critical to ensuring speech recognition practices produce accurate results. “Some of it is what we [the vendor] do in our training, some of it needs to be what happens in the department,” he says, pointing to workflow processes that include regular chart reviews. “A fundamental tenet is to provide training not just on [the technology] but also on the EMR you work in.”
At a recent deployment at a large California health system, Belton says the release of the speech recognition technology coincided with the implementation of the Epic EMR, with training for both conducted simultaneously. “This way, a trainer is able to customize the workflow for a physician,” he explains. “You would not dream of giving [a particular physician] a login and password for Epic without eight to 10 hours of training. The same is true for any technology.”
Fritsch notes that with physicians, training may only go so far. In light of regular interruptions and busy patient care schedules, best practices may not receive the attention they deserve. Acknowledging that the success of front-end processes depends on physicians, who tend not to be as careful with editing as MTs, M*Modal has taken steps to provide cues where errors may exist. Essentially, the technology highlights certain areas of the document to draw attention to potential problems.
“We took the other approach: Let them dictate the way they are used to and highlight questionable areas,” Fritsch says. For example, if a dictator talks continuously about the left foot and then mentions the right foot, a red flag would be raised.
The technology also spotlights the relevance of key components such as measurements. “Any measurement would be highlighted to make sure someone checks it,” Fritsch says, pointing out that proper training regarding best practices will always lay a more effective foundation for accuracy. “Those dictators who take time to think about what they say before dictating and speak clearly and consistently … those are the ones who are most successful.”
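As a rough illustration of this kind of cueing (M*Modal’s actual method is not public, and the function review_cues and its patterns below are simplified assumptions), a reviewer-facing pass might flag every measurement plus any body part dictated on both the left and the right side:

```python
import re

# Hypothetical cues, simplified: real "speech understanding" systems use far
# richer clinical models than these regular expressions.
MEASUREMENT = re.compile(r"\b\d+(?:\.\d+)?\s*(?:mm|cm|millimeters?|centimeters?)\b",
                         re.IGNORECASE)
LATERALITY = re.compile(r"\b(left|right)\s+(foot|hand|arm|leg|lung|kidney)\b",
                        re.IGNORECASE)

def review_cues(text):
    """Highlight every measurement, plus any body part dictated on both sides."""
    cues = [f"CHECK MEASUREMENT: {m.group(0)}" for m in MEASUREMENT.finditer(text)]
    sides = {}
    for m in LATERALITY.finditer(text):
        sides.setdefault(m.group(2).lower(), set()).add(m.group(1).lower())
    for part, found in sides.items():
        if found == {"left", "right"}:
            cues.append(f"CHECK LATERALITY: both left and right {part} mentioned")
    return cues

text = "A 4 mm ulcer on the left foot. The right foot wound is healing well."
for cue in review_cues(text):
    print(cue)
```

The design choice mirrors Fritsch’s point: rather than forcing physicians to change how they dictate, the system lets them speak naturally and directs their limited editing time to the spots most likely to hide a consequential error.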
— Selena Chavis is a Florida-based freelance journalist whose writing appears regularly in various trade and consumer publications covering everything from corporate and managerial topics to health care and travel.