Ben's blog
Summary
Rubine introduces a new single-stroke gesture recognition algorithm based on statistical pattern recognition, along with a toolkit called GRANDMA for adding gesture recognition to an interactive application. The work is introduced by example through GDP, a gesture-based drawing program built with GRANDMA. GRANDMA uses a structure similar to the Model/View/Controller methodology, where the controllers handle input gestures and the views are view classes representing visual objects on the screen that support gesture interaction. GRANDMA includes a gesture designer, which lets a developer design new gestures by example and assign them to view classes.
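To make that structure concrete, here is a small, purely hypothetical Python sketch of how a GRANDMA-style toolkit might tie gesture classes to view classes. The names (GestureHandler, RectView, add_gesture, dispatch) are invented for illustration and are not GRANDMA's actual Objective-C API.

```python
# Hypothetical sketch of a GRANDMA-style association between gestures and views.
# Class and method names are invented for illustration, not GRANDMA's real API.

class GestureHandler:
    """Controller-like object mapping recognized gesture classes to callbacks."""
    def __init__(self):
        self.bindings = {}                     # gesture class name -> callback

    def add_gesture(self, name, callback):
        self.bindings[name] = callback

    def dispatch(self, name, view, stroke):
        if name in self.bindings:
            self.bindings[name](view, stroke)

class RectView:
    """A view class whose on-screen objects accept gesture interaction."""
    handler = GestureHandler()                 # shared handler for the class

    def delete(self, stroke):
        print("deleting rectangle")

# The gesture designer's role, in effect: bind a trained gesture class
# (say, an 'X' drawn over a rectangle, meaning delete) to a view-class method.
RectView.handler.add_gesture("delete", lambda view, stroke: view.delete(stroke))
RectView.handler.dispatch("delete", RectView(), stroke=[])
```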
The gesture recognition algorithm breaks a single-stroke gesture down into 13 features. These include the sine and cosine of the initial angle, the length and angle of the bounding-box diagonal, the distance between the first and last point, the sine and cosine of the angle between the first and last point, the total gesture length, and the total angle traversed. The algorithm then classifies the gesture by computing, for each recognizable gesture class, a linear evaluation: a per-class constant plus a weighted sum of the feature values. The class with the maximum evaluation is the recognized gesture.
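A minimal Python sketch of this step, assuming the stroke is a list of (x, y) tuples with at least three points. It computes only the features named above, not the paper's full set of 13 (which also includes duration, maximum speed, and angle-magnitude features), and classify implements the linear evaluation v_c = w_c0 + Σ_i w_ci·f_i.

```python
import math

def features(points):
    # Several of Rubine's stroke features for a list of (x, y) points (sketch only).
    (x0, y0), (x2, y2) = points[0], points[2]
    # Sine and cosine of the initial angle (measured to the third point).
    d0 = math.hypot(x2 - x0, y2 - y0) or 1.0
    f1, f2 = (x2 - x0) / d0, (y2 - y0) / d0
    # Length and angle of the bounding-box diagonal.
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    dx, dy = max(xs) - min(xs), max(ys) - min(ys)
    f3, f4 = math.hypot(dx, dy), math.atan2(dy, dx)
    # Distance between first and last point, and sine/cosine of its angle.
    xn, yn = points[-1]
    dxe, dye = xn - x0, yn - y0
    f5 = math.hypot(dxe, dye)
    denom = f5 or 1.0
    f6, f7 = dxe / denom, dye / denom
    # Total gesture length and total (signed) angle traversed.
    f8 = f9 = 0.0
    prev_angle = None
    for (xa, ya), (xb, yb) in zip(points, points[1:]):
        f8 += math.hypot(xb - xa, yb - ya)
        angle = math.atan2(yb - ya, xb - xa)
        if prev_angle is not None:
            turn = angle - prev_angle
            f9 += math.atan2(math.sin(turn), math.cos(turn))  # wrap to (-pi, pi]
        prev_angle = angle
    return [f1, f2, f3, f4, f5, f6, f7, f8, f9]

def classify(feature_vector, weights):
    # weights maps class name -> (w0, [w1, ..., wn]); the recognized gesture is
    # the class with the maximum evaluation v_c = w_c0 + sum_i w_ci * f_i.
    scores = {c: w0 + sum(wi * fi for wi, fi in zip(w, feature_vector))
              for c, (w0, w) in weights.items()}
    return max(scores, key=scores.get), scores
```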
To determine the weights for each class, a classic linear discriminator is trained on example gestures. The weights are computed from an inverted estimate of the common (pooled) covariance matrix of the features together with the mean feature vector of each gesture class.
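A sketch of that training step, assuming the examples have already been reduced to feature vectors. It follows the standard linear-discriminant recipe the summary describes: weights from the class means and the (pseudo-)inverse of the pooled covariance estimate, plus a per-class constant term.

```python
import numpy as np

def train(examples):
    # examples: dict mapping gesture class -> array of shape (n_examples, n_features)
    # of feature vectors. Returns per-class weights (w0, w) for the linear classifier.
    classes = list(examples)
    means = {c: examples[c].mean(axis=0) for c in classes}
    n_features = means[classes[0]].shape[0]
    scatter = np.zeros((n_features, n_features))
    total = 0
    for c in classes:
        diffs = examples[c] - means[c]
        scatter += diffs.T @ diffs
        total += len(examples[c])
    common_cov = scatter / (total - len(classes))   # pooled ("common") covariance
    inv_cov = np.linalg.pinv(common_cov)            # pseudo-inverse guards against singularity
    weights = {}
    for c in classes:
        w = inv_cov @ means[c]           # weights from inverted covariance and class mean
        w0 = -0.5 * float(w @ means[c])  # constant term
        weights[c] = (w0, w.tolist())
    return weights
```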
To deal with ambiguous gestures, the author estimates the probability that the gesture was classified correctly; if that probability falls below 0.95, the gesture is rejected.
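The probability estimate amounts to a softmax over the linear evaluations of the competing classes. A sketch of the rejection test, assuming the scores come from a classify() step like the one above:

```python
import math

def accept(scores, threshold=0.95):
    # scores: dict mapping class -> linear evaluation v_c.
    # Estimated probability the top class is correct: 1 / sum_j exp(v_j - v_best).
    # The gesture is rejected when that estimate falls below the threshold.
    best = max(scores, key=scores.get)
    v_best = scores[best]
    p = 1.0 / sum(math.exp(v - v_best) for v in scores.values())
    return (best, p) if p >= threshold else (None, p)
```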
Discussion
This work presents a new algorithm for recognizing gestures by example and a toolkit that makes it easier to add gesture recognition to other applications. Prior gesture recognition systems used hand-coded recognizers. With Rubine's algorithm, new gestures are recognized simply by providing a variety of examples of how each gesture can be drawn; no hand-coding is needed.
One of my biggest criticisms of this work is the rejection approach. The author suggests that rejection need not occur when an application supports quick undo. As a user, I would hate having to undo every time the recognizer failed. I would much rather be asked to input the gesture again than have the system do something I did not want. Having to go execute an undo also takes me away from my current task, disrupting my flow.
If working further on this algorithm, I would aim toward recognizing gestures of more than one stroke. The algorithm already exploits the geometry of a single stroke; why not try to do the same for multiple strokes? Difficult, definitely, but potentially worth the effort.

4 comments:
I think one of the reasons why undo is a good option is this: if we compare two systems using the same classifying algorithm, one with undo and one with reject, the number of times a user has to undo is less than the number of times a gesture will be discarded by the other system. Consider this: if a gesture g is classified as C1 with probability 0.90 and C2 with probability 0.10, and the user draws the same g 10 times, the rejection method will reject it 10 out of 10 times, or, as you suggested, will ask the user to pick among the probable classes 10 out of 10 times. With the undo option, there is a good chance the user might have to press undo just once.
An unobtrusive way of presenting the alternative classes after going with the most probable one, along with undo, might suffice as well.
The author avoids the complexity of multiple strokes so that recognition can start immediately once a stroke is complete. With multi-stroke gestures, the problem of deciding when to recognize the gesture arises. But yes, definitely worth a thought.
I understand your point, Akshay, that relaxing the rejection threshold when undo is available allows more gestures to be recognized, but I believe this is ignoring the problem. It abuses the undo/redo system, whose purpose is to undo or redo user actions, not mistakenly recognized gestures.
The problem should be addressed, either by asking the user to disambiguate or by some other solution. I just think the author shouldn't have proposed this as a solution when it's really a hack to get better recognition rates at the expense of the user. Sorry, being a human-centered computing person, these types of things rub me the wrong way.