KVoiceControl - User's Guide


Introduction

KVoiceControl is a tool that gives you voice control over your unix commands. It uses a template-matching-based, speaker-dependent, isolated word recognition system with non-linear time normalization (DTW, dynamic time warping).

Note that "isolated word" does not necessarily mean one single word. It just describes a class of recognition systems in which each utterance is recognized as a whole, one at a time (as opposed to connected-word or continuous speech recognition).

Consider the following example:

KVoiceControl knows five utterances (each connected to an appropriate command ...)

So the recognition vocabulary consists of five "words" (= utterances) that can only be recognized one at a time, i.e.
you cannot connect the words to build "sentences" like "<how are you today?> please launch <netscape communicator> and <connect to internet>"!

Important Note:

As mentioned above, you do not need to use one single word as an utterance - in fact, it is strongly recommended to use longer phrases! This is due to the base concept of this recognition system: template matching. If you used short commands like <xterm> and <xedit>, confusability would increase significantly and recognition accuracy would drop rapidly!
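
To get a feeling for why longer utterances are less confusable, here is a toy sketch of the DTW matching step. This is an illustration only - the feature values, names and distance measure are made up and are not KVoiceControl's actual code:

  # Toy DTW (dynamic time warping): alignment cost between two sequences
  # of feature vectors. Everything here is illustrative.
  def dtw_distance(a, b):
      INF = float("inf")
      n, m = len(a), len(b)
      # cost[i][j] = best alignment cost of a[:i] against b[:j]
      cost = [[INF] * (m + 1) for _ in range(n + 1)]
      cost[0][0] = 0.0
      for i in range(1, n + 1):
          for j in range(1, m + 1):
              d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
              cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                   cost[i][j - 1],      # stretch b
                                   cost[i - 1][j - 1])  # diagonal step
      return cost[n][m]

  # Two made-up "utterances": the longer and more distinct the templates,
  # the further apart their DTW distances - short, similar commands like
  # <xterm> and <xedit> end up close together and get confused easily.
  utterance  = [(0.1,), (0.9,), (0.8,), (0.2,)]
  template_a = [(0.1,), (0.8,), (0.9,), (0.8,), (0.2,)]
  template_b = [(0.9,), (0.1,), (0.1,), (0.9,)]
  print(dtw_distance(utterance, template_a))  # small: similar shape
  print(dtw_distance(utterance, template_b))  # larger: different shape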


Basics

KVoiceControl uses a speaker model that contains sample utterances for the recognition process.
These speaker models can be loaded and saved, so one can create different speaker models for different speakers, or even have several models for the same speaker.

A speaker model contains references, each consisting of the following elements (*):

All the references are listed in the ListBox of KVoiceControl's main GUI.


Edit References

The buttons to the right of the ListBox can be used to edit the references. Within the reference editor you can adjust the elements listed at (*) above.
Text contains the name of the reference; Command contains the command(-sequence) to be executed.

Note: There are several special commands that can be used to control KVoiceControl itself:

A command can consist of a sequence of commands. Use a semicolon to separate them!
Example: Say we have a program tell that takes an ASCII string and hands it to a speech synthesis program. Then we can have a sequence like:
tell "Yes Master, of course I can start netscape for you";netscape

The commands openFile=, appendFile=, openDir= and appendDir= can be used to create a hierarchical structure of utterances (a simple grammar, if you want).
For instance, a tree structure like [[tell-me [time | date]] | sound [cd-player [stop | start | play | eject]]], and so forth.
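
For illustration only (the file names, paths and commands below are made up, and it is assumed here that the file name simply follows the "=" sign), such a hierarchy could be built from several speaker models:

  main speaker model:
    "tell me"    ->  openFile=/home/joe/tellme.model
    "sound"      ->  openFile=/home/joe/sound.model

  sound.model:
    "cd player"  ->  openFile=/home/joe/cdplayer.model

  cdplayer.model:
    "play"       ->  <command that starts your cd player>
    "stop"       ->  <command that stops your cd player>
    "eject"      ->  eject /dev/cdrom

Each line stands for one reference: the text on the left is what you say, the command on the right is what gets executed.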

The ListBox below contains the sample utterances for this reference.
You should enter between two and five (or even more) sample utterances per reference in order to ensure good recognition performance! (The more samples, the better the recognition - but also the more computing power is needed!)

You can use the button Enable Autorecording ... to switch to automatic speech signal detection mode, i.e. in this mode KVoiceControl automatically detects signals coming from the soundcard (the button's background color is red while KVoiceControl is listening). Thus your sample utterance is recorded automagically - just speak!
BTW: Pre- and postfetch sound buffers ensure that the beginning and end of the signal are not cut off, so automatic recording works fine!
Recorded utterances are always replayed to your soundcard, so you can check whether the recording was OK! After recording (and preprocessing), the utterance is added to the listbox using the current system date and time as the entry name. Broken utterances can be deleted using the Delete button.
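
Roughly, the automatic detection can be pictured like this (a sketch only - the thresholds, buffer sizes and names are assumptions, not KVoiceControl's actual code):

  # Sketch of the pre-/postfetch idea behind automatic recording.
  from collections import deque

  PREFETCH_BLOCKS  = 5      # audio kept from before the start level was reached
  POSTFETCH_BLOCKS = 5      # trailing audio kept after the level drops again
  START_LEVEL, STOP_LEVEL = 0.2, 0.1

  def autorecord(blocks):
      """blocks: iterable of (level, samples) pairs coming from the soundcard."""
      prefetch = deque(maxlen=PREFETCH_BLOCKS)
      recording, utterance, quiet = False, [], 0
      for level, samples in blocks:
          if not recording:
              prefetch.append(samples)
              if level >= START_LEVEL:          # speech detected: start recording
                  recording = True
                  utterance = list(prefetch)    # the prefetch keeps the word onset
          else:
              utterance.append(samples)
              quiet = quiet + 1 if level < STOP_LEVEL else 0
              if quiet >= POSTFETCH_BLOCKS:     # enough trailing silence: done
                  return utterance
      return utterance if recording else None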

After an utterance has been recorded, preprocessed and added to the utterance list, KVoiceControl does not automatically switch back to signal detection mode. You have to use the Enable Autorecording ... button explicitly to do so!

If you have entered automatic detection mode but decide not to record a speech token, simply use the same button (now labelled Disable Autorecording ...) to switch off automatic detection mode.


Recognition Mode

Select Detect Voice from the Options menu to let KVoiceControl enter "action mode". With Detect Voice selected, KVoiceControl again automatically detects sound signals, records what you say, "pattern matches" this utterance against all references and executes the command of the "best fitting" reference.
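
Conceptually, this matching step is a nearest-template search over all sample utterances of all references (a sketch only - the function and variable names are made up, and the actual acceptance test is KVoiceControl's, not shown here):

  # Pick the command of the reference whose sample utterance matches best.
  def recognize(utterance, references, dtw_distance):
      """references: list of (command, [sample utterances]) pairs."""
      best_cmd, best_dist = None, float("inf")
      for command, samples in references:
          for sample in samples:
              d = dtw_distance(utterance, sample)
              if d < best_dist:
                  best_cmd, best_dist = command, d
      return best_cmd, best_dist   # the command is only run if the match is accepted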
An utterance is accepted if:

Options

You can adjust the following options within KVoiceControl:

Train References From File

To train several references conveniently, select Train From File ... from the Options menu.
In the following file dialog you can choose a .txt file that must contain one entry per line, in the form:
<Reference Name>TAB<Associated Unix Command>
(see commands.txt for an example)
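A tiny example of such a file (the entries are made up; the whitespace between the two columns stands for a single TAB character):

  start netscape          netscape
  open a terminal         xterm
  show me a clock         xclock
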
After this file selection a Reference Trainer dialog pops up. The use of this dialog should be clear ...
Remember: sample utterance recording is done automatically here, too!


Calibrate Microphone

KVoiceControl uses a calibration dialog to adjust your microphone's levels (start level and stop level).
For this purpose choose Calibrate Micro ... from the Options menu.
You are then asked to start a mixer program (like kmix) that is needed to adjust the soundcard's microphone input level.
The next dialog then shows the current level value coming from MIC IN. You must adjust the mic level in the mixer so that this value
stays stable at zero!
Pushing OK leads you to the calibration of the start recording level. You are asked to talk into your microphone for a few seconds. KVoiceControl then extracts a proper recording level automatically.
If the level values seem plausible, calibration is done. Otherwise KVoiceControl restarts the calibration process.
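
One plausible way to picture the automatic level extraction (the fractions below are pure assumptions, not KVoiceControl's actual rule):

  # Derive start/stop recording levels from a few seconds of speech.
  def extract_levels(levels):
      """levels: per-block input levels measured while the user talks."""
      peak = max(levels)
      start_level = 0.5 * peak   # must be clearly exceeded before recording starts
      stop_level  = 0.1 * peak   # staying below this for a while ends the utterance
      return start_level, stop_level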


Panel Docking

KVoiceControl docks onto the panel, showing two LED lamps. The functionality is as follows:
The upper (yellow) LED is on when KVoiceControl is in voice autodetection mode. When you start speaking, and as long as you speak, this LED
blinks. After the utterance is finished, the LED switches off and the lower LED blinks red - meaning: recognition in progress. If the recognition is successful, this LED switches to green; otherwise it switches off. After recognition is done, KVoiceControl automatically switches back to voice autodetection mode.
 

Mouse control on panel:



Last changed: 29. Jan 1998
Daniel Kiecza