April 25, 2005
Case for Speech Recognition
A sturdy hardware platform, a comprehensive pilot program, and professional training can help ensure a tidy return on investment.
Traditionally viewed as simply a means of dictating text into a personal computer, today’s speech-recognition software can play a far more significant role in the healthcare environment. In addition to pure dictation, speech-recognition software can be used to manage e-mail, streamline repetitive tasks on the PC, reduce transcription and charting costs, speed up information turnaround, and protect employees from repetitive stress injuries (RSIs).
The software can be integrated with most electronic medical record (EMR) applications to make those programs more effective and easier to use. Rapid advances in hardware and improvements in the technology itself have increased its utility, accuracy, speed, and ease of use, bringing the cost of ownership within reach of medical offices and clinics of any size, departments within healthcare organizations, and even entire hospitals. When properly implemented, speech-recognition software can increase productivity for every employee who works with a computer.
Like any technology, the deployment of a speech-recognition program should be carefully planned so as to achieve the full benefit of the software and maximize the return on investment. This article provides an overview of the basics of how the software works, what medical offices or departments can do with speech recognition, examples of savings, and recommendations for implementation.
Why Use Speech-Recognition Software?
How Does Speech-Recognition Software Work?
The software enables users to input text and data into virtually any Microsoft Windows-based application by voice, as well as to navigate the computer desktop with little or no use of their hands. Users speak naturally into a noise-canceling microphone connected to the computer.(3) The software “recognizes” the spoken words, converts them into text, and displays them on the screen for review.
Most speech-recognition programs also allow users to speak a standard command that prompts the computer to perform an action. For example, the user says, “Start WordPerfect.” The more advanced speech-recognition programs also enable users to create customized commands (macros), such as “Send an e-mail to Doug Z,” which will open an e-mail addressed to Doug Z.
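To make the split between dictation and commands concrete, here is a minimal Python sketch that assumes a hypothetical command table: recognized text matching a command pattern becomes an action, and anything else is inserted as dictation. The function name `interpret` and the action strings are illustrative only and do not reflect any product's API.

```python
import re

# Hypothetical command table: each entry pairs a spoken-phrase pattern with a
# function that builds an action string. Real products define their own
# command grammars; these two mirror the examples in the text.
COMMANDS = [
    (re.compile(r"^start (?P<app>.+)$", re.IGNORECASE),
     lambda m: f"launch:{m['app']}"),
    (re.compile(r"^send an e-mail to (?P<name>.+)$", re.IGNORECASE),
     lambda m: f"new_email:to={m['name']}"),
]


def interpret(utterance: str) -> str:
    """Return an action for a recognized command; otherwise treat the
    utterance as dictation to insert at the cursor."""
    text = utterance.strip()
    for pattern, build_action in COMMANDS:
        match = pattern.match(text)
        if match:
            return build_action(match)
    return f"insert_text:{text}"


print(interpret("Start WordPerfect"))              # launch:WordPerfect
print(interpret("Send an e-mail to Doug Z"))        # new_email:to=Doug Z
print(interpret("Patient reports mild fatigue."))  # insert_text:Patient reports mild fatigue.
```

A real product must also decide when a phrase such as “start” is meant as dictation rather than a command; this sketch ignores that problem.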
Configuring the software during set-up is referred to as “enrollment.” After installing the program, each user must read aloud from a choice of prepared texts for approximately five minutes. Based on the dictation the application captures, the software analyzes how the user pronounces each word and stores the data to prepare a unique user profile for that individual.
As an individual uses the software and corrects recognition errors, the software becomes increasingly accurate by learning his or her particular speaking style. Most medical speech-recognition programs enable users to add new words or customize the vocabulary for their particular practice or specialty, and using a specialty vocabulary can improve accuracy even further. Some programs include a medical vocabulary (covering diseases, medications, procedures, and acronyms in addition to the standard business vocabulary) and can automatically recognize and format prescriptions and patient encounters. For certain programs, specialty medical vocabularies can also be created in-house or purchased from third-party sources.
How Is Speech Recognition Used to Replace Traditional Transcription?
Speech Recognition Uses
Dictation is the most versatile and widespread use of speech-recognition software. Some individuals can’t or prefer not to type because they are untrained as typists, have a disability, or wish to prevent RSIs. Many practices have reduced their support staff and require physicians to generate their own records. Even doctors who typically dictate documents for others to transcribe may use speech recognition occasionally, such as when they need to produce a document on the spot or after hours, or when they are responding to e-mail.
Doctors who wish to maintain their traditional workflow can dictate into a handheld recorder (4) or save their recorded dictation (5) with their documents for someone else to transcribe or correct at a later time. This can substantially reduce turnaround time compared with traditional transcription. If transcription is handled in-house, speech recognition frees support staff for more productive tasks; if correction of the recognized drafts is outsourced, it can significantly reduce an organization’s overhead costs.
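One way to picture saving the recording with the document (note 5) is to keep, for each recognized word, the time span of audio it came from, so an editor can replay exactly the stretch being corrected. The sketch below assumes that design; the class and function names and the timings are hypothetical, and actual products store this information in their own formats.

```python
from dataclasses import dataclass


@dataclass
class RecognizedWord:
    """One recognized word and the slice of the saved recording it came from."""
    text: str
    start_s: float  # offset into the saved audio file, in seconds
    end_s: float


def transcript_text(words: list[RecognizedWord]) -> str:
    """The draft text the editor sees: words joined by single spaces."""
    return " ".join(w.text for w in words)


def audio_span_at(words: list[RecognizedWord], char_index: int) -> tuple[float, float]:
    """Return the (start, end) audio offsets to replay for the word at a
    character position in the draft, e.g. where the editor's cursor sits."""
    if not 0 <= char_index < len(transcript_text(words)):
        raise IndexError("character index is outside the transcript")
    position = 0
    for word in words:
        end = position + len(word.text)
        # The space after a word is treated as part of that word.
        if char_index <= end:
            return (word.start_s, word.end_s)
        position = end + 1
    raise IndexError("character index is outside the transcript")


# Hypothetical timings for a short dictated phrase.
dictation = [
    RecognizedWord("Patient", 0.00, 0.42),
    RecognizedWord("denies", 0.42, 0.80),
    RecognizedWord("chest", 0.80, 1.10),
    RecognizedWord("pain", 1.10, 1.45),
]
draft = transcript_text(dictation)
print(audio_span_at(dictation, draft.index("chest")))  # (0.8, 1.1)
```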
Navigate the Windows Desktop
Create, Manage, and Send e-Mail
Mastering the Mundane
Create a Paperless Office
Increase Productivity Outside
Work on the Web by Voice
RSIs, which are often incurred by employees who work at computers, are the most common type of musculoskeletal disorder (MSD). RSIs occur when muscles or tendons are repeatedly overused or forced into an unnatural position. Keyboarding, clicking, and maneuvering the mouse strain and damage muscles and tendons in the fingers, hands, wrists, and arms.
The widespread use of computers in the workplace has contributed to the ubiquity of RSI pain and discomfort. The Occupational Safety and Health Administration (OSHA) has identified repetition, such as using a keyboard and/or mouse steadily for more than four hours a day, as a risk factor for an RSI or MSD. “Intensive computer use accounts for a significant number of MSDs each year, and occupational computer use is growing,” according to OSHA reports.(7)
While most RSI sufferers are able to find appropriate treatment and return to their positions, some, particularly workers with severe MSDs, become permanently disabled and can never again use their hands to operate a computer or return to their jobs.
Speech-recognition software can minimize or eliminate the repetitive keyboarding and mouse movements that strain and damage muscles, tendons, and nerves. By giving employees with intensive computer use access to speech-recognition software, you can prevent injuries before they occur or help injured employees return to work sooner, reducing workers’ compensation, medical, and replacement labor costs. A recent study on RSIs in the workplace puts the average cost of this type of injury at $20,000 per affected employee.
Assisting with ADA Compliance
Because speech-recognition software can help employers hire and retain qualified workers with RSIs and other disabilities, the technology can play an important role in an employer’s Americans with Disabilities Act (ADA) compliance strategy.(8)
Return on Investment
Typically, a single doctor or nurse who uses an outside transcription service spends between $10,000 and $30,000 per year on transcription, depending on the individual’s workload. For example, a private-practice doctor in San Diego replaced outsourced transcription with a speech-recognition solution and saved more than $10,000 per year in transcription fees. In addition, he now has time to see more patients each day because he completes each patient’s paperwork during the visit.
The savings potential in larger organizations can be tremendous. A large medical group in Seattle saved $90,000 the first year it deployed speech recognition and $240,000 the next year as it rolled out the solution to all its doctors and eliminated the need for an in-house transcription staff.
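The savings arithmetic can be sketched as a simple payback calculation. The Seattle figures and the $10,000 annual-savings figure come from the examples above; the up-front cost per doctor is a HYPOTHETICAL placeholder to be replaced with actual quotes for software, hardware, and training.

```python
def payback_months(upfront_cost: float, annual_savings: float) -> float:
    """Months until cumulative savings cover the up-front cost,
    assuming savings accrue evenly through the year."""
    if annual_savings <= 0:
        raise ValueError("annual savings must be positive")
    return upfront_cost * 12 / annual_savings


# Figures from the Seattle example: first-year plus second-year savings.
seattle_two_year_savings = 90_000 + 240_000
print(seattle_two_year_savings)  # 330000

# HYPOTHETICAL up-front cost per doctor; the annual savings use the low end
# of the $10,000-$30,000 transcription range cited above.
print(payback_months(upfront_cost=2_500, annual_savings=10_000))  # 3.0
```

Because the calculation is linear, tripling the annual savings (the high end of the range) cuts the payback period to a third.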
Basics of Implementing a Speech-Recognition
Although speech-recognition programs will automatically adjust to the processor and memory of your computer to provide the best combination of accuracy and speed possible, most users will be happier with systems that exceed the software manufacturer’s minimum requirements. Speech-recognition software is processor-intensive, and in general, the faster the processor, the better the performance. Users who wish to have multiple applications running at the same time will also benefit from having more RAM on their system than the minimum.
A computer’s sound card is another factor that can affect performance. Speech-recognition programs require a sound card that accurately processes the electrical signal the microphone produces when you speak into it. Static or electrical interference will make it difficult or impossible to achieve good recognition accuracy. For this reason, speech programs require a high-quality 16-bit sound card. Check with the software manufacturer to verify which sound cards are certified to work with the program.
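For context on the 16-bit requirement: a 16-bit card stores each audio sample as one of 2^16 amplitude levels, which sets a theoretical dynamic range of roughly 96 dB. Static and electrical noise picked up inside the computer reduce the usable range below this ceiling.

```latex
2^{16} = 65{,}536 \text{ levels}, \qquad
20\log_{10}\!\left(2^{16}\right) = 320\log_{10} 2 \approx 96.3\ \text{dB}
```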
The software performance can also be affected by the quality of the microphone. Speech recognition requires a high-quality, high-level speech signal. Noise-canceling microphones help block out high ambient noise levels. Most speech-recognition programs are sold with a high-quality, noise-canceling headset microphone that is specifically tuned to the software. Users who do not like wearing a headset may prefer an array microphone; others may opt for a wireless headset. Combined dictation/telephone headsets are also available. Most laptop users achieve high performance with a regular headset microphone, but users who are unable to achieve satisfactory sound quality from their laptop’s built-in sound hardware may wish to use a USB (universal serial bus) microphone that processes their voice signal before sending it to the computer. Check with the software manufacturer to verify which microphones are certified to work with its program.
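One way to quantify “a high-quality, high-level speech signal” is the signal-to-noise ratio (SNR): the level of the user’s speech compared with the background noise, in decibels. This sketch assumes audio samples already scaled to the range -1.0 to 1.0; the sample values are made up, and the SNR a given product needs is product-specific, so no threshold is implied.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a block of audio samples."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech: Sequence[float], background: Sequence[float]) -> float:
    """Signal-to-noise ratio in decibels: speech level versus room noise."""
    noise_level = rms(background)
    if noise_level == 0:
        return math.inf  # perfectly silent background
    return 20 * math.log10(rms(speech) / noise_level)


# Made-up samples: speech 100 times louder (in RMS) than the room noise.
speech = [0.5, -0.5] * 100
room = [0.005, -0.005] * 100
print(round(snr_db(speech, room), 1))  # 40.0
```

A noise-canceling microphone improves this ratio by reducing the background term rather than by making the speaker louder.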
User Expectations and Training
Although users can begin dictating and using the software after completing their initial five-minute enrollment session, most people increase their productivity when they receive training. Training speeds the learning curve, instills confidence in users, reduces support costs, promotes the success of a pilot program, and maximizes return on investment.
Customization may be as simple as creating a macro that inserts your name and title at the end of a letter when you say “my signature,” or as complex as a macro that executes a series of keystrokes and mouse clicks in response to a single spoken command. Macro creation tools are typically included in high-end speech-recognition software. Although simple macros are easy for users to create, in most cases firms will achieve better results if an IT (information technology) staff member or a speech-recognition consultant works with each user to analyze his or her workflow and customize the program accordingly.
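The two kinds of macros described above, text insertion and a recorded sequence of steps, can be modeled as simple data. This is a hypothetical sketch: the `Macro` class, the action strings, and the signature text are illustrations, not the macro format of any real speech-recognition product.

```python
from dataclasses import dataclass, field


@dataclass
class Macro:
    """A hypothetical spoken macro: either fixed text or a list of steps."""
    phrase: str
    text: str = ""
    steps: list[str] = field(default_factory=list)

    def expand(self) -> list[str]:
        # A text macro becomes a single "type" action; a step macro
        # replays its recorded keystroke and mouse steps in order.
        if self.text:
            return [f"type:{self.text}"]
        return list(self.steps)


MACROS = {
    m.phrase: m
    for m in [
        Macro("my signature", text="Jane Doe, MD, Internal Medicine"),
        Macro("file this note", steps=["key:ctrl+s", "click:Chart folder", "key:enter"]),
    ]
}


def run_macro(spoken: str) -> list[str]:
    """Look up a spoken phrase (case-insensitively) and expand it; unknown
    phrases produce no actions."""
    macro = MACROS.get(spoken.strip().lower())
    return macro.expand() if macro else []


print(run_macro("My signature"))  # ['type:Jane Doe, MD, Internal Medicine']
```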
Creating a custom vocabulary including patient, staff, and other physicians’ names will increase accuracy. Many speech-recognition programs permit custom vocabularies and macros to be exported and shared by multiple users, which decreases the time and cost associated with customization. Individual users can increase their accuracy by running a feature contained in most speech programs that analyzes the user’s written documents to learn their writing style and the words they use most often.
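The document-analysis feature can be approximated, in simplified form, by counting the words in a user’s existing documents and listing the frequent ones the recognizer’s vocabulary lacks, as candidates to add. This sketch covers only that out-of-vocabulary step, not learning writing style; the vocabulary, notes, and function name are hypothetical.

```python
import re
from collections import Counter


def vocabulary_candidates(documents, vocabulary, top_n=10):
    """Rank words that appear in a user's documents but not in the
    recognizer's vocabulary, most frequent first."""
    known = {w.lower() for w in vocabulary}
    counts = Counter()
    for doc in documents:
        # Words start with a letter and may contain apostrophes or hyphens.
        for word in re.findall(r"[A-Za-z][A-Za-z'-]*", doc):
            if word.lower() not in known:
                counts[word] += 1  # keep original case, e.g. for names
    return counts.most_common(top_n)


# Hypothetical vocabulary and notes.
vocab = {"the", "patient", "was", "started", "on", "for", "reflux", "dr"}
notes = [
    "The patient was started on omeprazole for reflux.",
    "Dr. Nguyen started omeprazole.",
]
print(vocabulary_candidates(notes, vocab))  # [('omeprazole', 2), ('Nguyen', 1)]
```

Because counts keep the original capitalization, a word typed both capitalized and lowercase is counted as two entries; that keeps proper names distinct at the cost of some duplication.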
Conducting a Pilot
For best results, select four to eight computer-savvy employees who want to use speech recognition and are likely to have time to use the software daily during the pilot period. A typical pilot, from initial assessment through final evaluation, lasts one to three months. Before the pilot begins, someone from IT or the training department, the vendor, a consultant, or a value-added reseller (VAR) should sit down with each participant to analyze his or her daily routine so that custom vocabularies and macros can be developed to enhance productivity. After the software has been customized for each participant’s needs, group or one-on-one training should be provided.
— Matt Revis is the senior product marketing manager for dictation products at ScanSoft. He has an MBA from Columbia Business School and has been working in speech technology marketing for five years.
2. How do speech-recognition software programs understand speech?
Speech-recognition software programs are based on statistical probability. The software analyzes an incoming stream of sounds and interprets those sounds as commands or dictation. This process of interpretation is called speech recognition, and its success is measured by recognition accuracy: the percentage of correct interpretations.
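The article does not specify how accuracy is counted. A common convention is word accuracy: align the recognized text against a correct reference transcript, count substituted (S), deleted (D), and inserted (I) words, and compute 1 - (S + D + I) / N, where N is the number of reference words. A minimal sketch of that convention, using a standard edit-distance alignment:

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - (substitutions + deletions + insertions) / N,
    where N is the number of words in the reference transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # dist[i][j]: minimum edits turning ref[:i] into hyp[:j] (Levenshtein).
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return 1 - dist[len(ref)][len(hyp)] / len(ref)


# One substitution ("chest" -> "chess") and one deletion ("today").
acc = word_accuracy("patient denies chest pain today", "patient denies chess pain")
print(f"{acc:.0%}")  # 60%
```

Many inserted words can push this measure below zero, which is one reason vendors’ accuracy figures are worth comparing only when the counting method is known.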
The software relies on three sources of information to achieve high recognition accuracy:
• Acoustic model — a mathematical model of the sound patterns used by the speaker’s language.
• Vocabulary — a list of words the program can recognize. Each word in the vocabulary has a text representation and pronunciation.
• Language model — statistical information associated with a vocabulary that describes the likelihood of words and sequences of words occurring in the user’s speech.
When you create and train a user profile, you start with a standard set of models and then customize them for the way you speak (acoustic model) and the way you use words (vocabulary and associated language model). The software employs your customized user files to determine the words you spoke.
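The three sources of information are commonly combined with Bayes’ rule: the recognizer picks the word sequence W that maximizes P(A|W) P(W), where A is the audio, the acoustic model supplies P(A|W), and the vocabulary and language model supply P(W). The toy sketch below shows that combination with a bigram language model; the corpus and the acoustic scores are HYPOTHETICAL, and real decoders typically also weight the language-model score and search far larger sets of candidates.

```python
import math
from collections import Counter


def train_bigram_model(sentences):
    """Count word and word-pair frequencies in sample text."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()  # <s> marks sentence start
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def lm_log_prob(sentence, unigrams, bigrams):
    """log P(W) under a bigram model with add-one (Laplace) smoothing."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split()
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
        for prev, cur in zip(tokens, tokens[1:])
    )


def decode(candidates, unigrams, bigrams):
    """Pick the text W that maximizes log P(A|W) + log P(W)."""
    return max(
        candidates,
        key=lambda c: c[1] + lm_log_prob(c[0], unigrams, bigrams),
    )[0]


# Tiny stand-in for the text a language model is trained on.
corpus = ["the patient has a rash", "the patient is stable", "patient has a fever"]
# HYPOTHETICAL acoustic log-likelihoods, log P(A|W), for two sound-alike
# transcriptions; the acoustics alone slightly favor the wrong one.
candidates = [("the patient has a rash", -12.0), ("the patience has a rash", -11.8)]

unigrams, bigrams = train_bigram_model(corpus)
best = decode(candidates, unigrams, bigrams)
print(best)  # the patient has a rash
```

The language model tips the decision because “the patient has” appears in the training text while “the patience has” does not, which is the same effect a specialty medical vocabulary has on real dictation.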
3. The quality and type of noise-canceling microphone is a critical success factor in implementing speech recognition.
4. The handheld recorder is typically a digital recorder. Not all recorders work with all speech-recognition software programs. Check with the software manufacturer to confirm whether a recorder is approved for use with their product.
5. Some speech-recognition programs enable users to save their recorded dictation with their text file so they or a third party can correct or edit the file while listening to or periodically checking the original dictation. Check with the software manufacturer to confirm whether this feature is available.
6. OSHA Fact Sheet. Ergonomics By the Numbers.
7. OSHA Ergonomics Program. Federal Register. 2000;65(220):68343.
8. The information contained in this article does not constitute legal advice. If you have any questions regarding the Americans with Disabilities Act or any other law, you should contact a qualified attorney.