Thursday, October 23, 2008

Distinguishing Text from Graphics in On-line Handwritten Ink

Christopher M. Bishop, Markus Svensén, and Geoffrey E. Hinton

Summary

Bishop et al. present an algorithm for determining whether a stroke in a sketch is text or graphics. The algorithm uses 9 features based on the stroke itself, a total least squares (TLS) fit for the stroke, and fragments of the stroke defined by local maxima in curvature. To classify on these features, a multilayer perceptron (MLP) is trained. Given a stroke's feature vector, made up of the 9 features, the MLP returns the probability that the stroke is text. The spatial and temporal context of successive strokes is used to further improve classification. A Hidden Markov Model (HMM) combines the probabilities from the feature-based approach with the probabilities from the context approach. An additional approach extends the HMM by using the gap between strokes as a characteristic for classification.
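To make the combination step concrete for myself, here is a minimal sketch of how per-stroke text probabilities could be smoothed with a two-state HMM using forward-backward. This is my own illustration, not the authors' implementation: the function name smooth_with_hmm, the stay probability, the uniform prior, and the use of the MLP output as a stand-in for the emission likelihood are all assumptions.

```python
def smooth_with_hmm(p_text, stay=0.8):
    """Smooth per-stroke P(text) values with a two-state HMM.

    States: 0 = text, 1 = graphics. `stay` is an assumed probability that
    consecutive strokes share a class. The classifier output is used
    directly as the emission term, which is a simplification.
    """
    n = len(p_text)
    if n == 0:
        return []
    emit = [(p, 1.0 - p) for p in p_text]
    trans = [[stay, 1.0 - stay], [1.0 - stay, stay]]

    # Forward pass, normalized at each step to avoid underflow.
    fwd = []
    prev = [0.5, 0.5]  # assumed uniform prior over the first stroke
    for t in range(n):
        if t == 0:
            cur = [prev[s] * emit[0][s] for s in (0, 1)]
        else:
            cur = [emit[t][s] * sum(prev[r] * trans[r][s] for r in (0, 1))
                   for s in (0, 1)]
        z = sum(cur)
        cur = [c / z for c in cur]
        fwd.append(cur)
        prev = cur

    # Backward pass, also normalized.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        cur = [sum(trans[s][r] * emit[t + 1][r] * bwd[t + 1][r] for r in (0, 1))
               for s in (0, 1)]
        z = sum(cur)
        bwd[t] = [c / z for c in cur]

    # Combine both passes into a smoothed P(text) per stroke.
    smoothed = []
    for t in range(n):
        g = [fwd[t][s] * bwd[t][s] for s in (0, 1)]
        smoothed.append(g[0] / (g[0] + g[1]))
    return smoothed
```

With stay set to 0.5 the context carries no information and the input probabilities come back unchanged. With a higher value, a doubtful stroke sitting between two confident text strokes is pulled toward text.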

The evaluation found that the non-gap HMM performed better than the gap HMM. The feature-based approach performed best on text, but worst on graphics. All approaches struggled with graphics.

Discussion

This work is interesting in that it ties feature-based classification with context-based classification. While the results obtained are not great, they do show some improvement over the feature-based approach alone when dealing with shape. It seems that text has distinctive features, while shape has distinctive contexts. Perhaps investigating other forms of context (e.g. the size of nearby strokes relative to the size of the entire sketch) can help better classify shapes.

Ink Features for Diagram Recognition

Rachel Patel, Beryl Plimmer, John Grundy, and Ross Ihaka

Summary

Presented in this paper is an algorithm for determining whether a stroke in a sketch is text or shape. The authors begin with a set of 46 candidate features. Using an rpart function to find, for each feature, the dividing value that best separates text from shape based on error rates, the authors selected the most significant features by choosing those whose rpart value produced the fewest misclassifications. A binary classification tree is then used in which, at each node, the rpart value for a feature decides which branch to follow. The tree starts with the most significant feature and moves down to the least significant.
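As a rough illustration of the feature-ranking idea, the sketch below finds a single dividing value per feature by counting misclassifications and then ranks the features by that count. It is a hand-rolled stand-in, not rpart itself, and the names best_split and rank_features as well as the "text"/"shape" labels are my own assumptions.

```python
def best_split(values, labels):
    """Find the threshold on one feature with the fewest misclassifications.

    Returns (errors, threshold, text_below), or None when the feature has
    only one distinct value. `text_below` says whether values under the
    threshold are predicted as text.
    """
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]
    best = None
    for thr in candidates:
        for text_below in (True, False):
            errors = sum(
                ((v < thr) == text_below) != (lab == "text")
                for v, lab in zip(values, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, thr, text_below)
    return best


def rank_features(table, labels):
    """Rank features (name -> list of values) by their best-split error count."""
    ranked = []
    for name, values in table.items():
        split = best_split(values, labels)
        if split is not None:
            ranked.append((split[0], name, split[1], split[2]))
    ranked.sort()
    return ranked
```

The ranked list gives the order in which features would be tested going down the tree, most significant first.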

The authors compared their divider with one from Microsoft and one from InkKit. On the training data they used to calculate the rpart functions, their approach significantly outperformed the others on shape recognition (around 10% misclassification). When using a new data set, their approach did not perform as well. The Microsoft divider always had the lowest error rate for text, and the highest error rate for shape.

Discussion

This work is interesting. Trying to find significant features to differentiate text from shape is an important research area in sketch recognition. The problem I see with their approach (and this can be seen in their results) is that it will only find significant features for a training set. Sketching is applicable in so many domains and what can be sketched and how it is sketched are infinite. Finding a large and varied enough training set for this approach seems intractable.

Tuesday, October 14, 2008

MathBrush: A Case Study for Pen-based Interactive Mathematics

George Labahn, Edward Lank, Mirette Marzouk, Andrea Bunt, Scott MacLean, & David Tausky

Summary

Presented in this paper is an evaluation of MathBrush, a pen-math system with tight integration with a computer algebra system (CAS). MathBrush allows users to draw mathematical equations using a pen. Then, using sketch recognition the system attempts to recognize what the user has drawn and convert that to a computer-rendered and -processable representation. A contextual interactive editing affordance allows users to correct errors in recognition. A pop-up menu allows them to execute commands on the recognized expressions.

The evaluation consisted of think-alouds and semi-structured interviews. Participants were asked to enter and manipulate several mathematical equations using MathBrush. The study showed that participants had no problems entering equations. Several issues arose from recognition, including problems with participants leaving erroneous ink. Participants were able to correct recognition problems, but not without some difficulty related to having multiple representations of the same equation. The contextual pop-up menu allowed participants to easily access mathematical commands, and proved to be one of the stronger aspects of the system. The authors noted that participants adjusted their drawing style to deal with recognition issues.

Discussion

The authors conducted a case study of a pen-math system called MathBrush, and found that participants were able to "effectively" use the system with a few problems with recognition and interaction. The pop-up menu and CAS portion of the system seemed to function the best.

I am not convinced that the participants were able to "effectively" use the system. The difficulties in recognition of sketches and understanding the affordances for correction and editing seem to hinder "effective" use. No qualitative results were presented about the participants' experiences using MathBrush. I think that is one missing piece in this qualitative evaluation.

The user's adaptation to the system's shortcomings is an interesting research topic. I would be curious how easily users are able to adapt and what effects it has on cognition. I would research this further.

Renegade Gaming: Practices Surrounding Social Use of the Nintendo DS Handheld Gaming System

Christine Szentgyorgyi, Michael Terry, & Edward Lank

Summary

The authors present a qualitative study of the social uses of the Nintendo DS. The handheld gaming system supports ad-hoc wireless networking for multiplayer gaming.

In the first part of the study, 9 participants were interviewed. The interviews were semi-structured with a number of research questions related to how and why people engage in multiplayer gaming using the Nintendo DS. The interviews revealed which places and times are acceptable and not acceptable for gaming with the DS. The concept of renegade gaming arose from the interviews. Renegade gaming consists of playing multiplayer games in "a subcontext within a larger host context." Renegade gaming was found to be acceptable as long as it does not physically disturb others and is socially acceptable in the given host context. The interviews also pinpointed issues with the DS and being able to start pick-up games with strangers.

In the second part of the study, 3 gaming events were observed. The events ranged in size and the types of gaming. One event was DS only. Another had both DS and console gaming. The last was a competitive gaming event with consoles only. The authors noted that large screen displays used with console gaming events allowed non-participants to observe the gaming, whereas with the DS, observers cannot easily watch the gaming action.

The authors proposed a number of design implications to address the problems of multiplayer gaming on the Nintendo DS.

Discussion

The authors have evaluated the social uses of the Nintendo DS gaming system. The study is only the start of a larger understanding of gaming practices with mobile handheld systems. Understanding the social contexts for this type of gaming can help game and gaming system designers to create gaming experiences that better facilitate these social contexts.

I find the use of a basketball pick-up game as a comparison to a Nintendo DS multiplayer pick-up game interesting. The comparison does an excellent job of explaining the problems associated with starting a pick-up game with strangers on the Nintendo DS.

Future work in this research would be to continue the evaluation with a greater number of participants to obtain a wider and more accurate depiction of the gaming community. As well, quantitative results can never hurt. However, obtaining quantitative results for this context seems non-trivial and an interesting area of study.

Wednesday, October 8, 2008

Sketch-Based Educational Games: "Drawing" Kids Away from Traditional Interfaces

Brandon Paulson, Brian Eoff, Aaron Wolin, Joshua Johnston, and Tracy Hammond

Summary

The authors present a number of sketch-based educational games. The goal is to improve children's ability to learn by providing them with kinesthetic and tactile learning environments. The games are:
  • APPLES - an animated planetary physics simulation that lets children experiment with how gravity, motion, and collisions affect planets.
  • Simon Says "Sketch!" - a memory game for examining how stimulus change recall is affected if participants are able to specify game piece positions.
  • Go (Sketch-a) Fish - a sketch-based memory card game.
  • Sketch-based Geography Tools - initial setup is to help children learn each U.S. state by having them find and mark states on a map. Future expansion of this system is possible.
  • Learn Your Shapes! - a tool to teach children shapes by allowing them to sketch them.
  • Sentence Diagramming - a tool to help children in understanding sentence structure by providing feedback when diagramming the parts of a sentence.
A preliminary evaluation has been done using graduate students. A formal evaluation for the work is pending IRB approval.

Discussion

There are obvious advantages to helping children learn through kinesthetic and tactile methods. It will be interesting to see the final results of the evaluation following IRB approval. While I've never had to get IRB approval for participants under the age of 18, I know it is not something they give out easily.

Recognizing Free-form Hand-sketched Constraint Network Diagrams By Combining Geometry and Context

Tracy Hammond and Barry O'Sullivan

Summary

Presented in this paper is a sketch recognition system for diagramming constraint networks. The system uses the LADDER sketch language along with GUILD to generate a user interface from LADDER descriptions. The geometric constraints used in LADDER to define the recognized shapes in the system are described. As well as geometric constraints, contextual rules are used to help disambiguate similar shapes. An informal evaluation was done using graduate students familiar with sketch recognition to show that the system is able to recognize the desired shapes.

Discussion

This work introduces sketch recognition to a new domain, constraint networks. Future work for this research includes a formal evaluation not only of the effectiveness of the system, but also of how sketching constraint networks on a computer is an improvement over existing methods for creating constraint networks.

Constraint networks contain both shapes and variable letters. The authors make an interesting point of using a handwriting recognizer for the letters. Finding a method to integrate a handwriting recognizer with the geometric recognizer seems a plausible and worthwhile approach to addressing recognition issues that arise when dealing with shape versus text.

Sunday, October 5, 2008

Ambiguous Intentions: a Paper-like Interface for Creative Design

Mark D. Gross and Ellen Yi-Luen Do

Comments

Daniel's blog

Summary

The authors present Electronic Cocktail Napkin, a sketch recognition system that focuses on diagramming. The system is designed to support abstraction, ambiguity, and imprecision, all qualities involved in free-hand sketching. In support of abstraction, the system allows the user to specify configurations which are groups of primitive elements that have unique relationships with each other for a specific domain.

In recognition, the system first tries to recognize low-level glyphs. These glyphs are recognized using a number of features (pen path, aspect ratio and bounding box size, stroke and corner count). The features are compared against templates for each recognizable glyph.
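A toy version of this kind of template comparison might look like the following. The summary only names the features, so everything about how they are encoded and scored here is my assumption: the Glyph fields, the aspect-ratio tolerance, and the minimum score are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    path: tuple    # hypothetical encoding: sequence of grid cells the pen visits
    aspect: float  # bounding-box width divided by height
    strokes: int   # number of strokes
    corners: int   # number of corners


def match_score(drawn, template, aspect_tol=0.5):
    """Count how many of the four features agree with the template."""
    score = 0
    score += drawn.path == template.path
    score += abs(drawn.aspect - template.aspect) <= aspect_tol
    score += drawn.strokes == template.strokes
    score += drawn.corners == template.corners
    return score


def recognize(drawn, templates, min_score=3):
    """Return the best-matching template names.

    Several names mean the glyph is still ambiguous; an empty list means
    nothing matched well enough.
    """
    scored = [(match_score(drawn, t), name) for name, t in templates.items()]
    best = max((s for s, _ in scored), default=0)
    if best < min_score:
        return []
    return sorted(name for s, name in scored if s == best)
```

Returning every tied candidate rather than forcing a single answer fits the system's goal of keeping a glyph ambiguous until more information is available.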

After recognizing low-level glyphs the system tries to recognize configurations by checking for patterns of primitive glyphs that match one of the user-defined configurations. As well, some ambiguous low-level glyphs that were not recognized are now recognized through association with configurations.

The system supports contextual recognition by identifying glyphs that are unique to a specific context. When a user draws one of these glyphs the system is able to determine the domain context for the sketch.

The authors conducted a number of evaluations to see if pursuing sketch recognition was useful in the scope of diagramming. Once they determined it was and began implementing the Cocktail Napkin, they ran user studies to help focus the work during development.

Discussion

I like their use of abstraction, ambiguity, and imprecision. It shows a real evaluation of sketching before jumping into recognition approaches. It looks beneficial to allow things to remain ambiguous until contextual information can assist. However, wrong contextual information can cause problems. Some amount of certainty is needed before determining context, as is a method for the user to correct mistaken context. I would go in line with the authors' proposed future work to extend this approach to support free-form sketching. Perhaps a solution is combining this technique for diagramming with one better suited for free-form sketching, rather than trying to implement free-form using this approach.

LADDER, a sketching language for user interface developers

Tracy Hammond and Randall Davis

Comments

Nabeel's blog

Summary

The authors present a sketching language for describing recognizable shapes and sketches to be used by persons not in the sketch recognition field. The goal is to allow people such as user interface designers and developers and experts in a specific domain to easily build systems that use sketch recognition interfaces without having to build a complex sketch recognition system from the ground up.

The described language, LADDER, defines recognized shapes using five parts: components, geometric constraints, aliases, editing behaviors, and display methods. Components are the elements that make up a shape. Constraints define the relationships between the components. Aliases specify names for components that allow for easier readability and construction in the language. Editing behaviors describe how a shape can be modified. Display methods specify how a shape is visually shown. Shapes can be defined in a hierarchy. As well, abstract shapes can be defined to describe general features shared by multiple shapes to prevent redundant declarations.

Constraints for a shape can be either rotationally invariant or tied to a specific orientation. A shape can be labeled as isRotatable when its constraints are described with respect to an orientation but any orientation is acceptable.

Editing behaviors are broken down into actions (how a shape is modified) and triggers (how this action is executed). Display methods allow the shape to be drawn in a number of different ways including unchanged or cleaned-up.

The recognition system first tries to recognize primitive shapes. If only one possible primitive shape can be assigned to a drawn shape, it is passed on to the next stage; otherwise, all potential classifications for the drawn shape are passed on. The next stage recognizes domain-specific shapes using a rule-based system. Each recognizable shape has a rule. The domain recognizer searches for every possible combination of primitive shapes that can satisfy a rule.
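The exhaustive search over primitives can be pictured with a small brute-force sketch. This is only my illustration of the idea, not LADDER's generated code: the dictionary layout of a primitive, the find_matches function, and the example constraint are all hypothetical.

```python
from itertools import permutations


def find_matches(primitives, component_types, constraint):
    """Try every ordered combination of primitives against one rule.

    Each primitive carries a set of candidate types, so an ambiguous
    primitive can fill more than one role. `constraint` is a predicate
    over the chosen components and is only called once the types fit.
    """
    matches = []
    for combo in permutations(primitives, len(component_types)):
        types_fit = all(t in p["types"] for p, t in zip(combo, component_types))
        if types_fit and constraint(*combo):
            matches.append(combo)
    return matches


def line_starts_in_circle(line, circle):
    """Example geometric constraint: the line's first endpoint lies in the circle."""
    dx = line["x1"] - circle["cx"]
    dy = line["y1"] - circle["cy"]
    return dx * dx + dy * dy <= circle["r"] ** 2
```

The cost grows quickly with the number of primitives on the page, which is presumably why a real recognizer would need to prune this search.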

Code for domain shape recognizers, display methods, and editing behaviors is generated in a translation stage run prior to run-time.

Discussion

The authors have contributed a sketching language for non-experts to build systems that require sketch recognition. This allows experts of a domain to design sketching systems applicable to their work. This minimizes the workload on the sketch recognition community to learn domain-specific knowledge for building domain-specific sketch recognition systems.

One thing I'm curious about is whether the language is extensible. By extensible, I mean users can define constraints, actions, triggers, etc. beyond the predefined ones. This seems particularly applicable with constraints where they involve mathematical equations. The equation for the constraint could be stated in the language, and no new programming would be involved for the end user (just more use of the language).

Backpropagation Applied to Handwritten Zip Code Recognition

Y. LeCun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, L.D. Jackel

Comments

Yuxiang's blog

Summary

The authors describe an approach to recognizing handwritten zip codes using neural networks. Their approach uses a three layer neural network trained using backpropagation. The network takes a 16 x 16 normalized image of a single digit as input and outputs 10 units representing the 10 different digits. Feature detection on the input is done through weight sharing, which reduces the number of free parameters in the network and can express information about the geometry and topology of the task. Training and testing data for the neural network were provided by the U.S. Postal Service.
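To convince myself of why weight sharing cuts down the free parameters, I wrote the small sketch below. It slides one shared kernel over an image to produce a feature map and compares parameter counts with and without sharing. The kernel size, the lack of an activation function, and the function names are my own choices, not the architecture from the paper.

```python
def feature_map(image, kernel, bias=0.0):
    """Apply one shared square kernel at every position of a 2D image."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            total = bias
            for i in range(k):
                for j in range(k):
                    total += kernel[i][j] * image[r + i][c + j]
            row.append(total)
        out.append(row)
    return out


def parameter_counts(image_size, k):
    """Free parameters for one feature map: shared versus per-unit weights."""
    units = (image_size - k + 1) ** 2
    shared = k * k + 1            # one kernel plus one bias for the whole map
    unshared = units * (k * k + 1)  # every unit owns its kernel and bias
    return shared, unshared
```

For a 16 x 16 input and a 5 x 5 kernel, parameter_counts(16, 5) gives 26 shared parameters against 3744 unshared ones. Because the same detector is applied everywhere, it responds to a feature regardless of where it appears in the image.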

Discussion

My understanding of neural networks is limited to the small amount I remember from undergrad artificial intelligence. However, the task they have in mind with the large dataset available to them seems well suited for using neural networks. This points out one of the drawbacks of using neural networks in sketch recognition in that extensive training is needed. For handwriting recognition, where available training data is enormous, neural networks seem applicable, but perhaps not for free-form sketching.

What!?! No Rubine Features?: Using Geometric-based Features to Produce Normalized Confidence Values for Sketch Recognition

Brandon Paulson, Pankaj Rajan, Pedro Davalos, Ricardo Gutierrez-Osuna, and Tracy Hammond

Comments

Daniel's blog

Summary

The authors sought to investigate which gesture- and geometric-based features are best for recognizing sketches. The geometric features examined came from PaleoSketch, while the gesture features came from Rubine. A quadratic classifier was used to classify strokes based on the features.

The evaluation used 1800 examples, of which 900 were used for training and subset selection and the other 900 for testing. A greedy sequential forward selection technique was used to determine feature subsets. The optimal feature subset contained 15 features, only one of which came from Rubine. The optimal feature subset achieved accuracies similar to, but slightly lower than, the original PaleoSketch.
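Greedy sequential forward selection is easy to sketch in a few lines. The version below is generic: the score callback stands in for whatever evaluation is used (presumably classifier accuracy on the selection data in this paper), and the function name and stopping rule are my assumptions.

```python
def sequential_forward_selection(features, score, max_features=None):
    """Greedily grow a feature subset, adding the best single feature each round.

    `score` takes a list of feature names and returns a number where higher
    is better. Selection stops when no remaining feature improves the score.
    """
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no candidate helps, so stop here
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score
```

Being greedy, the search never revisits an earlier choice, so the subset it finds is not guaranteed to be the best possible one.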

Discussion

This work is beneficial to research investigating hybrid approaches for recognizing sketches using both geometric- and gesture-based features. One possible issue with this work is that the training and test examples looked better suited to geometric-based approaches than gesture-based ones. Adding more gesture-based examples, such as alphabet characters, to the set of classes may improve the performance of the Rubine features, or may not. An investigation of this would be one direction of future work.

Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches

Michael Oltmans

Summary

Oltmans presents a vision-based approach to sketch recognition. Oltmans' approach uses visual parts of a shape to classify. Using a polar grid structure resembling a bullseye, feature vectors are calculated for parts of a stroke based on the ink density for each cell in the bullseye. A codebook is created which contains a standard vocabulary of parts. Input parts are compared with codebook parts, and a match vector is calculated based on the differences. The match vectors for a training set are used to train a classifier, which can then classify an input stroke based on the match vectors of its respective parts.
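The bullseye feature can be approximated as a polar histogram of ink points around a center. The sketch below is my own simplified take: the ring radii, the number of wedges, and the normalization are assumptions, and the real descriptor may differ in its details.

```python
import math


def bullseye_histogram(points, center, radii=(10.0, 20.0, 40.0), wedges=8):
    """Ink density per cell of a polar grid centered on `center`.

    Cells are indexed ring by ring, then by angular wedge. Points outside
    the outermost ring are ignored. The result is normalized to sum to 1
    when at least one point falls inside the grid.
    """
    cx, cy = center
    counts = [0] * (len(radii) * wedges)
    for x, y in points:
        dx, dy = x - cx, y - cy
        dist = math.hypot(dx, dy)
        ring = next((i for i, r in enumerate(radii) if dist <= r), None)
        if ring is None:
            continue
        angle = math.atan2(dy, dx) % (2 * math.pi)
        wedge = min(int(angle / (2 * math.pi) * wedges), wedges - 1)
        counts[ring * wedges + wedge] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]
```

Comparing such a histogram against each codebook entry, for example with a simple distance measure, would give one entry of the match vector.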

Discussion

This is an interesting vision-based sketch recognition algorithm. It is simple in design, and has potential in both on-line and off-line recognition of sketches. One of the biggest advantages I see with vision-based techniques is that over-tracing is not an issue. However, I can see an advantage in being able to recognize when a sketch is over-traced. It emphasizes a statement of importance from the user. Perhaps this is useful from a contextual standpoint, and can help in recognizing a sketch when the over-traced part is only a part of the whole sketch.

Constellation Models for Sketch Recognition

D. Sharon and M. van de Panne

Summary

The authors introduce a vision-based technique for sketch recognition that is based on individual features such as the shape and size of a stroke and pair-wise features such as distances to other parts of a known sketch. The model determines a four-element feature vector for individual parts and a four-element feature vector for paired parts. The matching of a label to a stroke is given a likelihood value based on an energy function derived from the differences in individual and pair-wise feature vectors between the label and the stroke. The authors run multiple passes on a branch-and-bound search tree to find the maximum likelihood value to label a stroke.

The authors evaluated the speed of their approach with and without multiple passes on different classes.

Discussion

I like this approach because it gives a large amount of freedom in how a sketch can be drawn. One downside of this approach is that while sketches can be drawn differently, they still require the same number of strokes. This seems like a huge limitation if one of your biggest benefits is freedom of form. My future work would be directed towards alleviating this issue.