Thursday, October 23, 2008

Distinguishing Text from Graphics in On-line Handwritten Ink

Christopher M. Bishop, Markus Svensén, and Geoffrey E. Hinton

Summary

Bishop et al. present an algorithm for determining whether a stroke in a sketch is text or graphics. The algorithm uses 9 features based on the stroke itself, a total least squares (TLS) fit to the stroke, and fragments of the stroke delimited by local maxima in curvature. A multilayer perceptron (MLP) is trained to classify on these features: given a stroke's 9-element feature vector, the MLP returns the probability that the stroke is text. The spatial and temporal context of successive strokes is used to further aid classification. A hidden Markov model (HMM) combines the probabilities from the feature-based approach with those from the context approach. A further variant extends the HMM approach by using the gap between strokes as an additional characteristic for classification.
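To make the HMM step more concrete, here is a minimal sketch of how per-stroke probabilities might be combined with temporal context using a two-state HMM and Viterbi decoding. This is my own illustration, not the paper's implementation: the function name `viterbi_text_graphics`, the `p_stay` transition value, and the choice to treat the classifier's output directly as an emission score are all assumptions.

```python
import math


def viterbi_text_graphics(p_text, p_stay=0.8, prior_text=0.5):
    """Label a stroke sequence as 'text' or 'graphics' with a two-state HMM.

    p_text:     per-stroke probability of being text (e.g. from a classifier).
    p_stay:     ASSUMED probability that consecutive strokes share a label;
                must lie strictly between 0 and 1.
    prior_text: ASSUMED prior probability that the first stroke is text.
    """
    if not p_text:
        return []

    eps = 1e-12  # guards against log(0) when a probability is exactly 0 or 1
    labels = ("text", "graphics")
    log_stay = math.log(p_stay)
    log_switch = math.log(1.0 - p_stay)

    def emit(p):
        # Assumption: use the classifier output directly as the emission score.
        return (math.log(max(p, eps)), math.log(max(1.0 - p, eps)))

    first = emit(p_text[0])
    score = [math.log(prior_text) + first[0],
             math.log(1.0 - prior_text) + first[1]]
    back = []  # back[t][s] = best previous state when stroke t+1 is in state s

    for p in p_text[1:]:
        e = emit(p)
        new_score, pointers = [], []
        for s in (0, 1):
            cands = [score[prev] + (log_stay if prev == s else log_switch)
                     for prev in (0, 1)]
            best_prev = 0 if cands[0] >= cands[1] else 1
            new_score.append(cands[best_prev] + e[s])
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [labels[s] for s in path]
```

With these assumed settings, a single borderline stroke inside a run of confident text strokes (for example probabilities 0.9, 0.8, 0.45, 0.9) is labeled text along with its neighbors, whereas thresholding each stroke independently at 0.5 would label it graphics.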

The evaluation found that the non-gap HMM performed better than the gap HMM. The feature-based approach performed best on text but worst on graphics. All approaches struggled with graphics.

Discussion

This work is interesting in that it ties feature-based classification to context-based classification. While the results obtained are not great, they do show some improvement over the feature-based approach alone when dealing with shapes. It seems that text has distinctive features, while shapes have distinctive contexts. Perhaps investigating other forms of context (e.g. the size of nearby strokes relative to the size of the entire sketch) could help classify shapes better.
