International Engineering Consortium
Web ProForums
The Benefits of a Conversational Voice User Interface in a Voice Portal

7. Important Issues to Consider When Selecting a Conversational VUI
When considering a conversational voice user interface, it is important to select one that meets the following requirements:
  1. Accepts Natural Speech
    A good conversational VUI should be based on a natural model of speech. Phrases and commands must be intuitive, and users should be able to speak them without unnatural pauses between words. In addition, users should not need to spend any time training the system to understand their unique vocal patterns; they should simply start talking.

    It is important to understand that there is a limit to what a conversational VUI can understand. Generally, a conversational VUI will accept multiple ways to say the same thing; for example, "Call John Smith at work" or "Dial 555-1212." However, the system might not understand, "Please phone John Smith at his place of employment." Fortunately, most people do not speak this way. Nevertheless, a conversational VUI system should be designed to accommodate the unique speech patterns in a given culture.

  2. Responds Intelligently
    Conversational VUI systems should be designed to respond appropriately to any user request. For example, if a user called someone who did not answer the phone, a conversational VUI system might say, "There was no answer at the work number for John Smith. Should I call him at home?" The conversational VUI should act as an advisor, recommending next steps as needed.
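
    The advisor behavior described above can be sketched as a lookup of the numbers that have not yet been tried. This is a minimal illustration, not a description of any particular product: the `CONTACTS` table, the home number, the `suggest_next_step` function, and the exact wording of the prompt are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical contact record: location label -> phone number.
# The home number is invented for this example.
CONTACTS = {
    "John Smith": {"work": "555-1212", "home": "555-3434"},
}


def suggest_next_step(name: str, tried: list) -> Optional[str]:
    """Return a follow-up prompt after an unanswered call.

    `tried` lists the location labels already attempted, most recent last.
    Returns None when there is nothing left to suggest.
    """
    numbers = CONTACTS.get(name, {})
    remaining = [label for label in numbers if label not in tried]
    if not tried or not remaining:
        return None
    last = tried[-1]
    return (
        f"There was no answer at the {last} number for {name}. "
        f"Should I try the {remaining[0]} number?"
    )


print(suggest_next_step("John Smith", ["work"]))
```

    Once every known number has been tried, the function returns None, so the caller can fall back to a different suggestion such as leaving a message.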

  3. Provides Invisible Help
    A conversational VUI-based system should transparently guide the user toward a preferred speech pattern and remind him or her of the current location within the application. For example, if a user says, "Dial John Smith at work," the system might echo, "Calling John Smith at work." Because the user hears "calling" rather than "dialing," he or she learns, without being told, that "call" is the preferred command. Experience has shown that after hearing this a few times, the user will change the way he or she interacts with the system.
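
    One way to picture this echo is a table that maps every accepted verb to the wording of the preferred one. The sketch below is only illustrative; the `ECHO_WORD` table, the `echo` function, and the fallback message are hypothetical names and wording chosen for the example.

```python
# Hypothetical table: accepted verb -> word used when echoing the command.
# Both "call" and "dial" are echoed with the preferred verb, "call".
ECHO_WORD = {
    "call": "Calling",
    "dial": "Calling",
}


def echo(utterance: str) -> str:
    """Echo a recognized command back to the user with the preferred verb."""
    verb, _, rest = utterance.strip().partition(" ")
    spoken = ECHO_WORD.get(verb.lower())
    if spoken is None or not rest:
        return "Sorry, I did not understand."
    return f"{spoken} {rest}."


print(echo("Dial John Smith at work"))  # Calling John Smith at work.
```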

  4. Recognizes Appropriate Grammar
    Given the limits of today's speech-recognition technology, it is important to limit the grammar and commands of a conversational VUI system to those that are common. For example, the words call, dial, contact, get, ring, phone, and buzz share a similar meaning. However, adding all of those words to the VUI's list of acceptable grammar would reduce the accuracy of the system and diminish the user experience. Instead, the designer should determine which words are most commonly used for a given command and accept only those. In this example, call and dial might be the most commonly used words in the United States and would be included in the acceptable grammar list. The actual preferred word choices should be the result of human-factor analysis, a process in which studies are conducted to determine the most appropriate word choices for any given culture or group of people.
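
    A deliberately small grammar of this kind might look like the following sketch. It works on text that has already been recognized, so it stands in for the grammar of a real speech recognizer rather than reproducing one. The verb list follows the call/dial example above; the `LOCATIONS` list, the `parse` function, and the shape of its result are assumptions made for illustration.

```python
import re
from typing import Optional

# Assumed result of the word-choice study: only the two most common verbs.
ACCEPTED_VERBS = ("call", "dial")
# Hypothetical set of location qualifiers.
LOCATIONS = ("work", "home")

PATTERN = re.compile(
    rf"^(?P<verb>{'|'.join(ACCEPTED_VERBS)})\s+(?P<target>.+?)"
    rf"(?:\s+at\s+(?P<location>{'|'.join(LOCATIONS)}))?$",
    re.IGNORECASE,
)


def parse(utterance: str) -> Optional[dict]:
    """Parse an utterance against the restricted grammar.

    Returns a dict describing the call request, or None when the
    utterance falls outside the accepted grammar.
    """
    match = PATTERN.match(utterance.strip())
    if match is None:
        return None
    location = match.group("location")
    return {
        "action": "call",
        "target": match.group("target"),
        "location": location.lower() if location else None,
    }


print(parse("Call John Smith at work"))
print(parse("Please phone John Smith at his place of employment"))  # None
```

    Keeping the accepted verbs in one short tuple makes the trade-off explicit: each added synonym widens what users may say but also widens what the recognizer can confuse.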

  5. Context-Dependent Commands
    Commands must be understood in their spoken context. For example, the word "next" may have a different meaning when a user is listening to voice-mail messages than when reading e-mails. Although from a functional perspective the command "next" does the same thing in both contexts, from a practical perspective it would be inappropriate to read an e-mail when a user who is listening to voice-mail messages says "next."
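
    The idea can be sketched by having the system track which list the user is currently in, so that the same spoken command advances only that list. The `MessageSession` class, its method names, and the spoken responses below are hypothetical and serve only to illustrate the point.

```python
class MessageSession:
    """Tracks the user's current context so "next" advances the right list."""

    def __init__(self, voicemails, emails):
        self.items = {"voicemail": list(voicemails), "email": list(emails)}
        self.position = {"voicemail": 0, "email": 0}
        self.context = "voicemail"

    def switch_to(self, context):
        """Change the active context, for example when the user opens e-mail."""
        if context not in self.items:
            raise ValueError(f"unknown context: {context}")
        self.context = context

    def handle(self, command):
        """Interpret a command relative to the current context."""
        if command != "next":
            return "Sorry, I did not understand."
        items = self.items[self.context]
        index = self.position[self.context] + 1
        if index >= len(items):
            return f"No more {self.context} messages."
        self.position[self.context] = index
        return items[index]


session = MessageSession(["voice-mail 1", "voice-mail 2"], ["e-mail 1", "e-mail 2"])
print(session.handle("next"))  # voice-mail 2
session.switch_to("email")
print(session.handle("next"))  # e-mail 2
```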

  6. Natural Commands via Intelligent Prompts
    The voice prompts spoken by the conversational VUI should lead the user toward a desired response, without the user needing to remember cryptic commands. Consider the following examples:

    Option A
    System: What do you want to do?
    User: Talk to John Smith

    Option B
    System: What shall I do for you?
    User: Call John Smith

    In option A, the user speaks about what he or she wants to do, effectively saying, "I want to talk to John Smith." In option B, the user says what he or she wants the system to do, effectively saying, "I want YOU to call John Smith for me." The first response is a desire; the second is a command. Option B is the better choice for letting users know they are in control, because the system puts itself into an assistant role by asking how it can help and eliciting commands from the user. A conversational VUI should be designed to elicit a response that is natural, appropriate, and within the list of understood words. The prompts should be appropriate for the given market, language, and culture targeted by the conversational VUI.

  7. Acoustically Appropriate Commands
    Commands must also be acoustically distinguishable. Generally, consonants are much more difficult for speech recognizers to distinguish than vowel patterns. For example, the words gnat and mat are much harder to properly distinguish than pot and pat. An intelligent conversational VUI system will use commands that consider the importance of acoustics. Putting this into practice, the commands "replay" and "erase" are more effective than "repeat" and "delete."

  8. Provides Intelligent Feedback
    An intelligent conversational VUI system will help to increase interpretation quality by providing feedback to the user. For example, if a user says, "Call John Smith," and the system thinks the user said, "Delete John Smith," a system without any feedback would perform the unintended function without the user's knowledge. On the other hand, feedback is not always required. For example, if the "Call John Smith" command was misunderstood as "Tell me the time," and the system responded with "The time is now 4:30 pm," the risk of something bad occurring is relatively low. A proper feedback system will therefore report what it is doing and, depending on the risk involved, give the user an option to cancel. For example, "Deleting John Smith. To cancel, say 'cancel'" would be appropriate feedback. It would be inappropriate for the system to respond, "Now telling the time, to cancel, say 'cancel.' It is now 4:30 pm."

  9. User Cancellation Control
    A good conversational VUI will provide three levels of cancellation: feedback, confirmation, and undo. Certain low-risk commands, such as asking the time, will not offer any cancellation control; the risk is so minimal that adding a cancellation control would actually diminish the user experience. At the feedback level, the conversational VUI echoes back what it is doing, allowing the user to say "cancel" and abort the action. At the confirmation level, the system asks for user confirmation before higher-risk actions, such as deleting information. The undo level is a failsafe that allows a user to reverse any command just executed.
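
    The cancellation levels can be sketched as a per-command risk setting plus an undo stack. Everything named below is an assumption made for illustration: the `Level` enum, the risk assigned to each command, the prompt wording, and the `ContactBook` class. A real system would assign risk levels from its own analysis.

```python
from enum import Enum


class Level(Enum):
    NONE = "none"                  # low risk: act without any prompt
    FEEDBACK = "feedback"          # announce the action and allow "cancel"
    CONFIRMATION = "confirmation"  # ask before acting


# Hypothetical commands: name -> (level, announcing form, asking form).
COMMANDS = {
    "tell_time": (Level.NONE, "", ""),
    "call": (Level.FEEDBACK, "Calling", "Call"),
    "delete": (Level.CONFIRMATION, "Deleting", "Delete"),
}


def prompt_for(command, target=""):
    """Return the spoken prompt for a command, or None if none is needed."""
    level, announcing, asking = COMMANDS[command]
    if level is Level.FEEDBACK:
        return f"{announcing} {target}. To cancel, say 'cancel'."
    if level is Level.CONFIRMATION:
        return f"{asking} {target}? Say 'yes' to confirm."
    return None


class ContactBook:
    """Contact list whose deletions can be reversed (the undo level)."""

    def __init__(self, contacts):
        self.contacts = dict(contacts)
        self._undo_stack = []  # (name, number) pairs, most recent last

    def delete(self, name):
        number = self.contacts.pop(name)
        self._undo_stack.append((name, number))

    def undo(self):
        """Reverse the most recent deletion; return False if there is none."""
        if not self._undo_stack:
            return False
        name, number = self._undo_stack.pop()
        self.contacts[name] = number
        return True


print(prompt_for("call", "John Smith"))
print(prompt_for("delete", "John Smith"))
```

    In this sketch only deletions are undoable; extending undo to other commands would mean recording, for each executed command, whatever is needed to reverse it.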
