Monday, April 28, 2008

Georgia Tech Gesture Toolkit: Supporting Experiments in Gesture Recognition

[Summary]


This paper introduces an HMM-based gesture recognition library, the Georgia Tech Gesture Toolkit (GT2K), which leverages Cambridge University's speech recognition toolkit, HTK. It abstracts the lower-level details of the HMM process so that users can focus on high-level gesture recognition concepts. GT2K provides tools for preparation, training, validation, and recognition, as well as tools that let novice users automatically generate models with different topologies.
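To give a feel for the kind of low-level HMM detail a toolkit like this hides, here is a minimal sketch of scoring an observation sequence against several small HMMs and picking the most likely one. It is not GT2K or HTK code: the discrete-symbol models, the gesture names "swipe" and "circle", and the functions `forward_log_likelihood` and `classify` are all made up for illustration, and HTK's own models and decoder differ in detail.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM (scaled forward algorithm).

    obs   : list of symbol indices
    start : start[i]    = P(first state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][s]  = P(symbol s | state i)
    """
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n = len(start)
    # Forward variables for the first observation.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then emit obs[t].
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        # Rescale to avoid underflow and accumulate the log of the scale factors.
        log_like += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_like


def classify(obs, models):
    """Pick the model name whose HMM gives obs the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Hypothetical two-state left-to-right models over symbols {0, 1}.
LEFT_TO_RIGHT = [[0.7, 0.3], [0.0, 1.0]]
MODELS = {
    # Tends to emit 0s first, then 1s.
    "swipe": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.1, 0.9]]),
    # Tends to emit 1s first, then 0s.
    "circle": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.9, 0.1]]),
}

if __name__ == "__main__":
    print(classify([0, 0, 1, 1], MODELS))
    print(classify([1, 1, 0, 0], MODELS))
```

Even this toy version has to deal with numerical scaling and impossible sequences, which is exactly the sort of bookkeeping one would rather leave to a library.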


All data fed into the toolkit must be annotated by the user, and each gesture is modeled with a separate HMM. In addition, GT2K accepts a rule-based or stochastic grammar so that it can exploit knowledge about the structure of the data. It provides two training/validation techniques: cross-validation and leave-one-out validation.
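The difference between the two validation schemes is easiest to see in terms of which samples end up in the training and test sets. The sketch below uses the generic textbook k-fold form and treats leave-one-out as the special case with one sample per fold. How GT2K actually partitions the data is not covered in this summary, so the function names `k_fold_splits` and `leave_one_out_splits` and the contiguous-fold layout are assumptions of mine.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


def leave_one_out_splits(n_samples):
    """Leave-one-out is k-fold with k equal to the number of samples."""
    return k_fold_splits(n_samples, n_samples)


if __name__ == "__main__":
    for train, test in k_fold_splits(6, 3):
        print("train", train, "test", test)
    for train, test in leave_one_out_splits(4):
        print("train", train, "test", test)
```

In practice the sample order would normally be shuffled before splitting. Leave-one-out trains as many models as there are samples, so it is mainly attractive when the data set is small.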


The paper also presents four applications of GT2K. The first, "Gesture Panel", provides gesture recognition in automobiles so that users can control a radio. It employs a black-and-white camera and a grid of 72 infrared lights, and it has a recognition accuracy of 99.20%. The second, "Prescott", performs blink pattern recognition. It aims to use a "blinkprint" to identify people in a restricted area. The third, "TeleSign", is a sign language recognition system for mobile environments. It achieved an accuracy of 90.48%. The fourth application recognizes human activities in a workshop, such as sawing, hammering, and drilling. It achieved an accuracy of 93.33%.


[Discussion]


I welcome this kind of library since it hides the low-level details of HMM design, which makes it easy to build prototype systems. I also like the idea of specifying grammars to describe the structure of the data, which helps in recognizing data streams. However, since it is built on HMMs, GT2K may suffer from the same problems as HMMs do. For example, it may not be able to handle data sets with a large number of categories, and it may have low accuracy on unsegmented data.
