Thursday, October 23, 2008

Ink Features for Diagram Recognition

Rachel Patel, Beryl Plimmer, John Grundy, and Ross Ihaka

Summary

This paper presents an algorithm for deciding whether a stroke in a sketch is text or shape. The authors begin with a set of 46 candidate features. For each feature, they use R's rpart function to find the dividing value that best separates text from shape, judged by error rate. They then select the most significant features: those whose rpart value produced the fewest misclassifications. A binary classification tree is then built in which each node uses the rpart value for one feature to decide which branch to follow. The tree starts with the most significant feature and moves down to the least significant.
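To make the per-feature step concrete, here is a minimal Python sketch of the idea as I understand it. It is not rpart and not the authors' code: it simply tries every midpoint between neighbouring feature values, keeps the threshold with the fewest misclassifications, and then ranks features by that error count. The feature names and values (stroke_length, curvature) are hypothetical, made up only for illustration.

```python
def best_split(values, labels):
    """Find the threshold on one feature that misclassifies the fewest strokes.

    Returns (threshold, label_below, errors), or None if the feature has
    only one distinct value and so cannot be split.
    """
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = None
    for threshold in candidates:
        # Try both orientations: text below the threshold, or shape below it.
        for below in ("text", "shape"):
            above = "shape" if below == "text" else "text"
            errors = sum(
                (below if v < threshold else above) != y
                for v, y in zip(values, labels)
            )
            if best is None or errors < best[2]:
                best = (threshold, below, errors)
    return best


def rank_features(table, labels):
    """Order features from fewest to most misclassifications at their best split."""
    splits = {}
    for name, values in table.items():
        split = best_split(values, labels)
        if split is not None:
            splits[name] = split
    return sorted(splits.items(), key=lambda item: item[1][2])


# Hypothetical toy data: six strokes, two made-up features.
labels = ["text", "text", "text", "shape", "shape", "shape"]
table = {
    "curvature": [0.9, 0.2, 0.8, 0.3, 0.7, 0.1],
    "stroke_length": [10, 12, 15, 40, 55, 60],
}
ranked = rank_features(table, labels)
for name, (threshold, below, errors) in ranked:
    print(name, threshold, below, errors)
```

In this toy data stroke_length separates the two classes cleanly while curvature cannot, so stroke_length would sit at the root of the tree and curvature below it.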

The authors compared their divider with one from Microsoft and one from InkKit. On the training data used to calculate the rpart values, their approach significantly outperformed the others on shape recognition (around 10% misclassification). On a new data set, their approach did not perform as well. The Microsoft divider always had the lowest error rate for text and the highest error rate for shape.
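Reporting error separately for text and for shape matters here, because a divider can look excellent on one class simply by favouring it. The sketch below is my own illustration with made-up labels, not data or code from the paper: it computes the misclassification rate per class and shows the extreme case of a divider that calls every stroke text.

```python
def error_rates(truth, predicted):
    """Return the misclassification rate for each class in the true labels."""
    rates = {}
    for cls in sorted(set(truth)):
        total = sum(1 for t in truth if t == cls)
        wrong = sum(
            1 for t, p in zip(truth, predicted) if t == cls and p != cls
        )
        rates[cls] = wrong / total
    return rates


# Hypothetical labels: two text strokes and two shape strokes.
truth = ["text", "text", "shape", "shape"]

# An extreme divider that labels everything as text.
always_text = ["text"] * len(truth)
biased = error_rates(truth, always_text)

# A divider that gets one shape stroke wrong.
mixed = error_rates(truth, ["text", "text", "shape", "text"])
```

The always-text divider has no text errors and gets every shape wrong, which is the same trade-off pattern, taken to its limit, as a divider with the lowest text error and the highest shape error.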

Discussion

This work is interesting. Finding significant features for differentiating text from shape is an important research area in sketch recognition. The problem I see with their approach (and it shows in their results) is that it only finds features that are significant for a particular training set. Sketching applies to so many domains, and what can be sketched and how it is sketched vary without limit, so assembling a large and varied enough training set for this approach seems intractable.
