KVoiceControl - User's Guide
Introduction
KVoiceControl is a tool that gives you voice control over your Unix commands.
It uses a template-matching based, speaker-dependent, isolated-word
recognition system with non-linear time normalization (DTW, dynamic time warping).
Note that "isolated word" does not necessarily mean one single
word. It just names one class of recognition systems among:
- isolated word
- connected words
- continuous speech
- spontaneous speech
Consider the following example:
KVoiceControl knows five utterances (each connected to an appropriate
command ...):
- <connect to internet>
- <netscape communicator>
- <one xterm please>
- <launch emacs NOT VI!>
- <how are you today?>
So the recognition vocabulary consists of five "words" (= utterances) that
can only be recognized one at a time, i.e. you cannot connect the words
to build "sentences" like "<how are you today?> please launch
<netscape communicator> and <connect to internet>"!
Important Note:
As mentioned above, an utterance does not need to be one single word,
and it is strongly recommended to use longer sequences! This
is due to the base concept of this recognition system: template matching.
If you used short commands like, say, <xterm> and <xedit>,
confusability would increase significantly, and therefore recognition
accuracy would drop rapidly!
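The template-matching core can be sketched as a minimal dynamic time
warping (DTW) distance. This is an illustration only: KVoiceControl's real
matcher compares frames of acoustic features, and all numbers here are
invented toy values.

```python
# Minimal DTW distance between two sequences.  Illustrative sketch only:
# a real recognizer compares vectors of acoustic features per frame,
# not single numbers.

def dtw_distance(a, b):
    """Cost of the best non-linear time alignment of sequence a onto b."""
    INF = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])              # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]

# A slower utterance of the "same word" still matches well, because DTW
# warps the time axis; a differently shaped "word" does not.
template = [1, 3, 5, 5, 3, 1]
slow     = [1, 1, 3, 3, 5, 5, 5, 3, 3, 1]
other    = [5, 1, 5, 1, 5, 1]
print(dtw_distance(template, slow))   # small: same shape, different tempo
print(dtw_distance(template, other))  # larger: different "word"
```

This also shows why very short, similar-sounding references confuse the
matcher: the shorter the templates, the smaller the distances that
separate them.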
Basics
KVoiceControl uses a speakermodel that contains sample utterances
for the recognition process.
Speakermodels can be loaded and saved, so one can create different
speakermodels for different speakers, or even several models for
the same speaker.
A speakermodel contains references that consist of the following elements
(*):
- the reference's name (normally you would enter here what is being said)
- the command to execute (when KVoiceControl has recognized this reference)
- sample utterances for this reference
All references are listed within the ListBox of KVoiceControl's main
GUI.
Edit References
The buttons to the right of the ListBox can be used to edit the references:
- New: creates a new reference (untitled) and adds it to the ListBox
- Delete: deletes the selected reference from the list
- Edit ...: invokes the reference editor dialog
Within the reference editor one can adjust the elements listed at (*).
Text contains the name of the reference, Command contains
the command (or command sequence) to be executed.
Note: There are several special commands that can be used to
control KVoiceControl itself:
- detectModeOff
switches off recognition mode
- openFile=<filename.spk>
replaces the currently loaded speakermodel with the speakermodel stored in
filename.spk
- appendFile=<filename.spk>
adds the speakermodel stored in filename.spk to the currently loaded
speakermodel
- openDir=<dirname>
replaces the currently loaded speakermodel with the speakermodel stored in
<dirname>/index.spk
- appendDir=<dirname>
adds the speakermodel stored in <dirname>/index.spk to the currently
loaded speakermodel
- @<command>
prefixing a command with an '@' tells KVoiceControl to execute the command
while recognition mode is switched off; afterwards recognition mode is
switched back on. That way you can call speech synthesis software
(speech feedback) on non-duplex sound cards (or non-duplex sound drivers!).
A command can consist of a sequence of commands. Use a semicolon
to separate the commands!
Example: Say we have a program tell that takes an ASCII string
and hands it to a speech synthesis program. Then we can have a sequence
like:
tell "Yes Master, of course I can start netscape for you";netscape
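The guide does not say how the command string is executed; presumably it
is handed to a shell, which is what gives the semicolon its meaning. A
sketch of that behavior in Python (the echoed strings are invented
placeholders for tell and netscape):

```python
import subprocess

# Hand a semicolon-separated command sequence to the shell, the way a
# launcher like KVoiceControl presumably does.  shell=True makes the
# shell split and run the two commands in order.
command = 'echo "Yes Master"; echo "starting netscape"'
result = subprocess.run(command, shell=True, capture_output=True, text=True)
print(result.stdout)
```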
The commands openFile=, appendFile=, openDir= and appendDir=
can be used to create a hierarchical structure of utterances (a simple
grammar, if you want), for instance a tree structure like
[[tell-me [time | date]] | [sound [cd-player [stop | start | play | eject]]]]
and so forth.
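As a hypothetical sketch of that tree (directory layout and the leaf
commands are invented; only the openDir= mechanics come from this guide),
each "menu" reference simply loads the next speakermodel:

```
~/spk/index.spk                  <tell-me>   ->  openDir=~/spk/tell-me
                                 <sound>     ->  openDir=~/spk/sound
~/spk/tell-me/index.spk          <time>      ->  tell "`date +%H:%M`"
                                 <date>      ->  tell "`date +%d.%m.%Y`"
~/spk/sound/index.spk            <cd-player> ->  openDir=~/spk/sound/cd-player
~/spk/sound/cd-player/index.spk  <stop> <start> <play> <eject> ...
```

Each submodel would typically also contain a reference (e.g. <back>)
whose command is openDir= on the parent directory, so you can climb back
up the tree by voice.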
The ListBox below contains the sample utterances for this reference.
You should enter between two and five (or even more) sample utterances
per reference in order to ensure good recognition performance! (The
more the better, but the more machine power is needed!)
You can use the button Enable Autorecording ... to switch to
auto speech signal detection mode, i.e. in this mode KVoiceControl automatically
detects signals coming from the soundcard (the button's background color
is red while kvoicecontrol is listening). Thus your sample utterance
is being recorded automagically! -> Just speak!
BTW: A pre- and a postfetch sound buffer ensure that the beginning and
end of the signal are not cut off, so automatic recording works fine!
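The prefetch idea can be sketched as a small ring buffer; the frame
shapes, buffer length and threshold below are invented for illustration,
not KVoiceControl's actual values:

```python
from collections import deque

# Keep the last few audio frames in a ring buffer while idle.  When the
# signal level crosses the recording threshold, the buffered frames are
# prepended to the recording, so the utterance's onset is not clipped.
PREFETCH_FRAMES = 4   # invented; in KVoiceControl 1 frame = 0.125 s

def record(frames, threshold):
    prefetch = deque(maxlen=PREFETCH_FRAMES)
    recording = None
    for frame in frames:
        level = max(frame)                 # toy "signal level" of a frame
        if recording is None:
            if level >= threshold:
                # trigger: start recording, including the buffered onset
                recording = list(prefetch) + [frame]
            else:
                prefetch.append(frame)     # stay idle, remember the frame
        else:
            recording.append(frame)
    return recording

# Quiet frames, then a loud one: the quiet frames just before the
# trigger survive in the recording instead of being lost.
print(record([[1], [2], [3], [50], [40]], threshold=10))
```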
Recorded utterances are always replayed to your soundcard,
so you can check whether the recording was OK! After recording (and
preprocessing) the utterance is added to the listbox using the current
system date and time as the entry name. Broken utterances can be deleted
using the Delete button.
After one utterance has been recorded, preprocessed and added to the
utterance list, KVoiceControl does not automatically switch to signal
detection mode. You have to use the Enable Autorecording ... button
explicitly to do so!
If you entered automatic detection mode but decide not to enter a speech
token simply use the same button (now labelled Disable Autorecording
...) to switch off the automatic detection mode.
Recognition Mode
Select Detect Voice from the Options menu to let KVoiceControl
enter "action mode". Having Detect Voice selected, KVoiceControl again
automatically detects sound signals, records what you say, "pattern matches"
this utterance to all references and executes the command of the "best
fitting" reference.
An utterance is accepted if:
- its score is below a specified threshold, and
- the first and second best scores belong to the same reference, or
the distance between the first and second best score is higher than a
given threshold.
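The acceptance rules above can be sketched in Python (score = a
DTW-style distance, lower is better; the threshold values are the ones
suggested in the Options section, and the reference names are invented):

```python
def accept(scored, rejection_threshold, min_distance):
    """scored: list of (score, reference_name); lower score = better match.

    Returns the recognized reference name, or None (rejected).
    Mirrors the two rules above: the best score must be below the
    rejection threshold, and the winner must either agree with the
    runner-up on the reference, or beat it by more than min_distance.
    """
    ranked = sorted(scored)
    best_score, best_ref = ranked[0]
    if best_score > rejection_threshold:
        return None                      # rule 1: too far from everything
    if len(ranked) == 1:
        return best_ref
    second_score, second_ref = ranked[1]
    if second_ref == best_ref or second_score - best_score > min_distance:
        return best_ref                  # rule 2: unambiguous winner
    return None

# Two sample utterances of "xterm" take the top spots -> accepted:
print(accept([(8.0, "xterm"), (9.5, "xterm"), (14.0, "emacs")], 15.0, 3.5))
```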
Options
You can adjust the following options within KVoiceControl:
- Recording Threshold
This is the minimum integer value that triggers the automatic sound
recording process. A value around 10 works fine for me. (This value can
now be adjusted automatically using the Calibration functionality!)
- Accepted Silence Frames
Here one can specify how many silence frames (1 frame = 0.125 sec) shall
be accepted during recording without stopping it. This is needed to be
able to use multi-word utterances that contain silence frames.
My system accepts 4 frames.
- Adjustment Window Width
This value is used within the pattern-matching process. Roughly
speaking, the bigger the value, the better the recognition, but the more
calculation power is needed. For more details: daniel@kiecza.de
(I use a value around 70.)
- Rejection Threshold
The score of the best matching reference utterance must not be bigger
than this threshold; otherwise nothing is recognized. (15.0 suits me.)
- Minimum Distance Between Different Hypos
A reference is accepted as the recognition result when the two best
scored utterances belong to that reference (and are below the rejection
threshold), or when only the best utterance belongs to that reference,
but not the second best, and the score distance between these two
utterances is bigger than the value specified here. (I use 3.5 here.)
Train References From File
To train several references comfortably select Train From File ...
from the Options menu.
In the following file dialog you can choose a .txt file that has to
contain one reference per line in the format:
<Reference Name>TAB<Associated Unix Command>
(see commands.txt for an example)
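A minimal parser sketch for this line format (the sample contents are
invented, not taken from commands.txt):

```python
def parse_training_file(text):
    """Parse '<Reference Name>TAB<Associated Unix Command>' lines."""
    refs = []
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip blank lines
        name, command = line.split("\t", 1)   # split on the first TAB only
        refs.append((name, command))
    return refs

sample = "one xterm please\txterm\nlaunch emacs\temacs\n"
print(parse_training_file(sample))
```

Splitting on the first TAB only means the command itself may contain
TABs, while the reference name may not.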
After this file selection a Reference Trainer dialog pops up. The use
of this dialog should be clear ...
Remember: sample utterance recording is done automatically here, too!
Calibrate Microphone
KVoiceControl uses a calibration dialog to adjust your microphone's levels
(start level and stop level).
For this purpose choose Calibrate Micro ... from the Options
menu.
You are then asked to start a mixer program (like kmix), which is needed
to adjust the soundcard's microphone input level.
The next dialog then shows the actual level value coming from
MIC IN. You must adjust the mic level in the mixer so that this value
is stable at zero!
Pushing OK leads you to the calibration of the start recording level.
You are asked to talk into your microphone for some seconds. KVoiceControl
then extracts a proper recording level automatically.
If the level values seem plausible, calibration is done. Otherwise
KVoiceControl restarts the calibration process.
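How the level is extracted is not documented here; a plausible sketch
(the midpoint heuristic and all numbers are assumptions, not
KVoiceControl's actual algorithm):

```python
def calibrate(silence_levels, speech_levels):
    """Pick a start-recording level between the noise floor and speech.

    Assumption for illustration: take the midpoint between the loudest
    silence sample and the quietest speech sample.  Returns None when
    the two ranges overlap, i.e. the values are implausible and
    calibration should be restarted.
    """
    noise_ceiling = max(silence_levels)
    speech_floor = min(speech_levels)
    if speech_floor <= noise_ceiling:
        return None                      # implausible: restart calibration
    return (noise_ceiling + speech_floor) / 2

# Quiet room (levels near 0) vs. a few seconds of talking:
print(calibrate([0, 0, 1], [30, 45, 60]))   # midpoint between 1 and 30
```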
Panel Docking
KVoiceControl docks onto the panel, showing two LED lamps. The functionality
is as follows:
The upper (yellow) LED is on when KVoiceControl is in voice
autodetection mode. When you start speaking, and as long as you speak,
this LED blinks. After the utterance is finished, the LED switches off
and the lower LED blinks red, meaning: recognition in progress. If the
recognition is successful, this LED switches to green; otherwise it
deactivates. After recognition is done, KVoiceControl automatically
switches back to voice autodetection mode.
Mouse control on panel:
- left click: toggle the state of the main window between hidden and
on-screen
- right click: toggle voice autodetection mode
Last changed: 29. Jan 1998
Daniel Kiecza