Thursday, December 11, 2008

Multimodal Collaborative Handwriting Training for Visually-Impaired People

Beryl Plimmer, Andrew Crossan, Stephen A. Brewster, and Rachel Blagojevic

Summary

The authors present McSig, a learning system for teaching visually-impaired students handwriting and how to sign their names. The system consists of a force-feedback device that constrains the movement of a pen while learning. The force feedback is gradually reduced as a student becomes more familiar with how a shape is drawn. A teacher is able to convey how something is drawn by drawing it on a separate screen, and the force-feedback device replicates the drawing movement. The authors conducted an evaluation with 8 visually-impaired students over the age of 10 who were still in school. The evaluation highlighted design issues and provided an indication that this system could help the visually impaired learn handwriting.

Discussion

The work is interesting. It's always great to see work applying technology to make lives easier for those in need, even when they are only a small subset of the population. The force-feedback echoing when the teacher draws a shape seems like a helpful idea for teaching. I wonder if a device such as this could be used to improve the motor skills of those who have injured their hands and need physical therapy.

Wednesday, December 10, 2008

Sketch Recognition User Interfaces: Guidelines for Design and Development

Christine Alvarado

Summary

The author introduces SkRUIs, sketch recognition user interfaces, as a new type of interface not addressed in previous literature. Prior work has focused on HCI for pen-based input or on sketch recognition, but not on the combination of the two. An example application for drawing diagrams for a PowerPoint presentation is presented. The author states that traditional HCI evaluation techniques are not entirely suited to SkRUIs, and that these techniques require modification to support SkRUIs.

Discussion

While it's nice to see research being done in this area, I'm not convinced by the work. Why can't PowerPoint support the incorporation of sketch recognition for diagramming? Why must it be done in a separate application? Doing beautification on window switches doesn't seem like the best idea. What if I'm drawing, but I get an instant message in the middle and decide to check it? I may not be ready for beautification to occur. A SkRUI is still a GUI, just more specific. The interaction fundamentals of GUI design still apply.

Fluid Sketches: Continuous Recognition and Morphing of Simple Hand-Drawn Shapes

James Arvo and Kevin Novins

Summary

The authors introduce a new form of visual feedback for sketch recognition. Feedback is provided as shapes are drawn: the approach beautifies portions of the current stroke to reflect the recognition system's understanding of what the user is drawing. The approach works for two shapes, circles and squares.

Discussion

The approach is novel and interesting. There are two main problems I find with this work. One, it only works with two shapes. Two, they don't evaluate the effects of the feedback on human attention. Does it disrupt drawing? Do users find themselves waiting for feedback before drawing too far into a stroke?

Sunday, November 9, 2008

Interactive Learning of Structural Shape Descriptions from Automatically Generated Near-miss Examples

Tracy Hammond and Randall Davis

Summary

Hammond and Davis present an approach to fixing over- and under-constrained shape definitions in a shape description language. The authors developed the approach for the LADDER shape language. Their approach requires a positive hand-drawn example and a shape description that properly recognizes the provided example. Over-constrained descriptions are checked by sequentially negating each constraint and constructing a near-miss shape that tests that constraint. Under-constrained descriptions are checked similarly, except that negated constraints are added instead of checking existing constraints.
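
The negate-and-test loop lends itself to a compact sketch. The toy version below is hypothetical: `make_near_miss`, the constraint list, the "fixers" that break one constraint at a time, and the oracle are all stand-ins for LADDER's actual machinery, and a "shape" is reduced to a dict of measurements.

```python
# Toy sketch (not LADDER's real API): constraints are predicates over a
# drawn example; a near-miss example violates exactly one constraint.

def make_near_miss(base_example, negate_index, fixers):
    """Copy the positive example, then apply the 'fixer' that breaks
    exactly the constraint under test."""
    example = dict(base_example)
    fixers[negate_index](example)
    return example

def find_unnecessary_constraints(base_example, constraints, fixers, oracle):
    """Sequentially negate each constraint via a near-miss example and ask
    the oracle (the user) whether the result is still a valid shape.
    Constraints whose negation the oracle accepts are over-constraining."""
    unnecessary = []
    for i in range(len(constraints)):
        near_miss = make_near_miss(base_example, i, fixers)
        if oracle(near_miss):
            unnecessary.append(i)
    return unnecessary

# Example: an "arrow" description that wrongly requires a horizontal shaft.
arrow = {"shaft_angle": 0.0, "head_touches_shaft": True}
constraints = [
    lambda e: e["shaft_angle"] == 0.0,      # over-constraint
    lambda e: e["head_touches_shaft"],      # genuinely needed
]
fixers = [
    lambda e: e.update(shaft_angle=45.0),
    lambda e: e.update(head_touches_shaft=False),
]
oracle = lambda e: e["head_touches_shaft"]  # the user accepts rotated arrows

unnecessary = find_unnecessary_constraints(arrow, constraints, fixers, oracle)
```

Running this on the toy arrow flags only the horizontal-shaft constraint as unnecessary, since the oracle (playing the user) accepts rotated arrows.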

Discussion

Having used LADDER, the ability to uncover under- and over-constrained descriptions is incredibly valuable. It's very easy in LADDER to under-constrain a description.

What Are Intelligence? And Why?

Randall Davis

Summary

Randall Davis presents "definitions" for intelligence and why intelligence exists. The definitions of intelligence come from the five areas of study looked at by artificial intelligence researchers: mathematics (logic), psychology, biology, statistics, and economics. Logicists view intelligence through formal calculations of logical rules. Psychologists view intelligence as human behavior. In biology, intelligence is viewed as a response-stimuli behavior based on the physiological architecture. Statistics provides a probability theory approach, and economics presents an approach based on utility theory.

Davis proceeds to explain possible reasons why intelligence exists through an exploration of how the human mind evolved. Fossil records show that the encephalization quotient (the ratio of brain size to body size) of human ancestors began to increase over four million years ago. However, early man did not begin developing tools and language skills until 300,000 years ago. He points out a number of theories for why this may be. He also notes that evolution is more of a random search than a goal-oriented process, and that the products of evolution are often messy and multifaceted.

A number of examples of animal intelligence are presented. These examples serve to point out differences in intelligence, while also showing similarities between animal intelligence and human intelligence. The goal is to find ways to uncover aspects of human intelligence by investigating simpler forms of intelligence in animals.

He concludes the paper with an exploration of the idea that we think by "reliving." He explains evidence for how we create concrete visual ideas in our mind. In order to answer questions about what would happen, we picture in our mind visually how something would play out.

Discussion

I enjoyed this reading. It contains a lot of interesting information about areas that I know only a small amount about. I found it very captivating to look at how different areas look at human intelligence, and to theorize how it came about through evolution-based analysis. My only questions are:
  • How can we apply these different views of intelligence?
  • How can we integrate the views?
  • Are certain AI tasks better suited to specific models of intelligence? In other words, is there one view, or combination of views, that would work best to address a specific focus topic within artificial intelligence?
The view of human intelligence formed from a messy layering of evolutionary forces is a nice take that makes sense to me. The complexity of human intelligence comes not only from how advanced it is, but also how complicated and inefficiently designed it is. It makes me think that looking at simpler animal intelligence is a good idea for building a basis for looking at human intelligence.

Magic Paper: Sketch-Understanding Research

Randall Davis

Summary

This paper presents an overview of the field of sketch understanding and, more specifically, a sketch understanding system called Magic Paper. It points out reasons for developing systems that understand freehand sketches. A number of problems with sketch understanding are described. Solutions from other areas, such as speech recognition, are pointed out, along with explanations of why they are not suitable for sketch recognition. The basics of sketch understanding are presented, including sketch representation, finding primitives, and recognizing shapes. A number of sketch-enabled interfaces are described in which sketch understanding is connected to a back-end system such as RationalRose or ChemDraw. Techniques for automated learning of new sketch domains, and the difficulties associated with this task, are presented.

Discussion

This paper presents a nice overview of the issues of sketch understanding and why solving them is so challenging. The goal of this work is to create "magic paper," which affords the same natural and easy interaction as paper but is capable of understanding what is drawn on it. It would seem advances in both hardware technology and software algorithms are still needed to achieve this ambitious goal, but several big steps toward it have already been taken. The concept of true "magic paper" seems to be the killer app for sketch understanding. Only when "magic paper" is better than, or at least comparable to, real paper will sketch understanding find itself deeply seated in the daily lives of humans.

Perceptually Supported Image Editing of Text and Graphics

Eric Saund, David Fleet, Daniel Larner, and James Mahoney

Summary

Presented in this paper is an image editing program called ScanScribe. ScanScribe provides special functionality for selecting and structuring groups. Grouping is represented by a lattice structure in which an image object can belong to more than one group. ScanScribe has an image analysis technique for separating foreground and background, and it uses automatic structure recognition to group elements of an image. The group recognition is based on Gestalt laws of human visual perception. No formal evaluation was conducted, but a number of users reported that the system was easy to learn to use.

Discussion

Our ability as humans to easily differentiate text from shapes and to form groupings of these different objects seems a valuable place to begin investigating methods to employ with machines. I really like the idea of using laws of perception as a basis for building mathematical calculations of similarity. However, the human eye and mind do not function the same way as a computer; therefore, these laws of perception may not translate easily to machines. As well, other factors, such as domain and contextual knowledge, play a role in our ability to differentiate shapes from text.

Grouping Text Lines in Freeform Handwritten Notes

Ming Ye, Herry Sutanto, Sashi Raghupathy, Chengyang Li, and Michael Shilman

Summary

The authors present an approach for grouping text lines in handwritten notes containing both text and shapes. The approach uses a cost function based on the likelihood that a set of strokes forms a line of text and on configuration consistency. Likelihood is based on measures from a fitted line: the linear regression error and the maximum inter-stroke distances projected onto the fitted line and its orthogonal. Configuration consistency looks to form groups of text with similar spatial orientation. This is done by computing a neighborhood graph and grouping connected nodes of the graph whose connecting edge's length is below a threshold value. A gradient-descent local optimization method minimizes the cost function.
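
A simplified sketch of the fitted-line measures behind the likelihood term (my own illustration, not the authors' exact formulation) might look like:

```python
import math

def fit_line_stats(points):
    """Total-least-squares line fit through 2-D stroke points. Returns two
    simplified linearity measures: the RMS perpendicular (regression)
    error, and the maximum gap between consecutive points when projected
    onto the fitted line."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for (x, y) in points)
    syy = sum((y - my) ** 2 for _, y in points)
    # Principal direction of the 2x2 covariance matrix (TLS fit).
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # Project each point onto the line and onto its orthogonal.
    proj = sorted((x - mx) * ux + (y - my) * uy for (x, y) in points)
    perp = [-(x - mx) * uy + (y - my) * ux for (x, y) in points]
    rms_error = math.sqrt(sum(p * p for p in perp) / n)
    max_gap = max((b - a for a, b in zip(proj, proj[1:])), default=0.0)
    return rms_error, max_gap
```

A low RMS error and small maximum gap together suggest the strokes plausibly form one text line, which is the intuition the cost function builds on.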

An evaluation was done using data from 600 Windows Journal pages from tens of real TabletPC users. The system was evaluated using the recall metric (the number of correct lines divided by the number of labelled lines in each page), and achieved an accuracy of 0.93.

Discussion

This work achieves high accuracy for grouping lines of text in freehand notes. The approach is quite different from many of the other methods we have looked at, but it is also quite focused in terms of the recognition goal. The authors are looking to find lines of text in freehand notes, which already contain large amounts of text (often arranged in single or multiple columns). This method would not work for state labels in finite state machines, or for text not arranged in a line, such as text formed around an arc.

Sketch Recognition for Computer-Aided Design

Christopher F. Herot

Summary

The author presents a sketch recognition system for use in computer-aided design that attempts to infer user intention from information about how the user sketches strokes. The speed of a stroke is used to infer whether the stroke is a line, corner, or curve. Over-traced lines are replaced with a thicker line to show emphasis from the user. Using speed, line length, and the density of lines around a point, lines that are meant to be connected but were not drawn as such are made connected (latched).
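
Speed-based inference of this sort can be sketched simply: a pen slowdown relative to the stroke's mean speed marks a corner candidate. The fraction threshold below is an illustrative choice, not Herot's actual parameter:

```python
import math

def slow_points(points, times, frac=0.5):
    """Return indices of sample points where the instantaneous pen speed
    falls below `frac` times the stroke's mean speed -- corner candidates
    in the spirit of speed-based inference (a hedged sketch, not Herot's
    exact algorithm)."""
    speeds = [0.0]  # no speed defined at the first sample
    for (x0, y0), (x1, y1), t0, t1 in zip(points, points[1:], times, times[1:]):
        d = math.hypot(x1 - x0, y1 - y0)
        speeds.append(d / (t1 - t0))
    mean = sum(speeds[1:]) / (len(speeds) - 1)
    return [i for i, s in enumerate(speeds[1:], start=1) if s < frac * mean]
```

A stroke drawn fast-slow-fast yields a single slow sample in the middle, which a recognizer would treat as a corner between two line segments.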

Herot concludes the paper with a section detailing how the user needs to be involved in the machine's inference of intention.

Discussion

What is presented in this paper is very similar to the previous Herot reading, but what makes this paper interesting to me is the final section. In the previous paper, he only brushed on the concepts that he goes into in much more detail here. He mentions the idea of coordinating "two concurrent processes," which is essentially the concept of mixed initiative. His thoughts and ideas on this correlate closely with work that has come to light over the past 10 years: particularly, the idea that the machine forms a model of the user and adjusts this model based on user interaction, while at the same time the user forms a model of the system and needs methods for providing feedback that help the system mimic the user's perceived model.

Thursday, October 23, 2008

Distinguishing Text from Graphics in On-line Handwritten Ink

Christopher M. Bishop, Markus Svensén, and Geoffrey E. Hinton

Summary

Bishop et al. present an algorithm for determining whether a stroke in a sketch is text or graphics. The algorithm uses 9 features based on the stroke itself, a total least squares (TLS) fit of the stroke, and fragments of the stroke defined by local maxima in curvature. To classify on these features, a multilayer perceptron (MLP) is trained. Given a stroke's feature vector, made up of the 9 features, the MLP returns the probability that the stroke is text. The spatial and temporal context of successive strokes is used to further aid classification: a Hidden Markov Model combines the probabilities of the feature-based approach with the probabilities of the context approach. An additional approach augments the HMM with the gap between strokes as a classification feature.
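
The combination of per-stroke MLP probabilities with temporal context can be illustrated with a tiny two-state Viterbi decoder. This is a hedged sketch: the transition probability `p_stay` and the emission model are my assumptions, not the paper's learned values.

```python
import math

def viterbi_labels(p_text, p_stay=0.8):
    """Most likely label sequence for a two-state HMM (0 = graphics,
    1 = text). `p_text[i]` is a classifier's probability that stroke i is
    text; `p_stay` is the assumed probability that consecutive strokes
    share a label (the temporal-context term)."""
    emit = [(1 - p, p) for p in p_text]            # (P(graphics), P(text))
    log = lambda x: math.log(max(x, 1e-12))
    trans = [[p_stay, 1 - p_stay], [1 - p_stay, p_stay]]
    scores = [log(emit[0][s]) for s in (0, 1)]     # best log-prob ending in s
    back = []
    for e in emit[1:]:
        prev, new = [], []
        for s in (0, 1):
            r = max((0, 1), key=lambda r: scores[r] + log(trans[r][s]))
            prev.append(r)
            new.append(scores[r] + log(trans[r][s]) + log(e[s]))
        back.append(prev)
        scores = new
    # Backtrack from the best final state.
    state = max((0, 1), key=lambda s: scores[s])
    path = [state]
    for prev in reversed(back):
        state = prev[state]
        path.append(state)
    return path[::-1]
```

With `p_text = [0.9, 0.9, 0.4, 0.9]`, the weakly classified third stroke is smoothed to "text" by its neighbors, which is exactly the benefit that combining feature probabilities with context brings.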

The evaluation found that the non-gap HMM performed better than the gap HMM. The feature-based approach performed best on text, but worst on graphics. All approaches struggled with graphics.

Discussion

This work is interesting in that it ties feature-based classification together with context-based classification. While the results obtained are not great, they do show some improvement over the feature-based approach alone when dealing with shapes. It seems that text has distinctive features, while shapes have distinctive contexts. Perhaps investigating other forms of context (e.g., the size of nearby strokes relative to the size of the entire sketch) could help better classify shapes.

Ink Features for Diagram Recognition

Rachel Patel, Beryl Plimmer, John Grundy, and Ross Ihaka

Summary

Presented in this paper is an algorithm for determining whether a stroke in a sketch is text or shape. The authors begin with a set of 46 candidate features for discerning whether strokes are text or shapes. Using an rpart function to find, for each feature, the dividing value for determining text versus shape based on error rates, the authors selected the most significant features by choosing those whose rpart value had the fewest misclassifications. A binary classification tree is then used in which, at each node, the rpart value for a feature is used to decide which branch to follow. The tree starts with the most significant feature and moves down to the least significant.
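
The per-feature split that rpart computes can be approximated with a brute-force threshold search (an illustrative stand-in for the actual rpart function, using 0/1 class labels for shape/text):

```python
def best_split(values, labels):
    """For one feature, find the threshold that minimizes
    misclassifications when everything at or below the threshold is
    predicted as one class and everything above as the other.
    Returns (misclassifications, threshold, class_below)."""
    best = (len(labels) + 1, None, None)
    for t in sorted(set(values)):
        for below in (0, 1):  # which class is predicted below the threshold
            errors = sum(1 for v, y in zip(values, labels)
                         if (below if v <= t else 1 - below) != y)
            if errors < best[0]:
                best = (errors, t, below)
    return best
```

Ranking features by this misclassification count is exactly how the authors pick the most significant features for the top of the classification tree.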

The authors compared their divider with one from Microsoft and one from InkKit. On the training data they used to calculate the rpart functions, their approach significantly outperformed the others on shape recognition (around 10% misclassification). When using a new data set, their approach did not perform as well. The Microsoft divider always had the lowest error rate for text, and the highest error rate for shape.

Discussion

This work is interesting. Trying to find significant features to differentiate text from shape is an important research area in sketch recognition. The problem I see with their approach (and this can be seen in their results) is that it will only find significant features for a training set. Sketching is applicable in so many domains and what can be sketched and how it is sketched are infinite. Finding a large and varied enough training set for this approach seems intractable.

Tuesday, October 14, 2008

MathBrush: A Case Study for Pen-based Interactive Mathematics

George Labahn, Edward Lank, Mirette Marzouk, Andrea Bunt, Scott MacLean, & David Tausky

Summary

Presented in this paper is an evaluation of MathBrush, a pen-math system with tight integration with a computer algebra system (CAS). MathBrush allows users to draw mathematical equations using a pen. Then, using sketch recognition the system attempts to recognize what the user has drawn and convert that to a computer-rendered and -processable representation. A contextual interactive editing affordance allows users to correct errors in recognition. A pop-up menu allows them to execute commands on the recognized expressions.

The evaluation consisted of think-alouds and semi-structured interviews. Participants were asked to enter and manipulate several mathematical equations using MathBrush. The study showed that participants had no problems entering equations. Several issues arose from recognition, including problems with participants leaving erroneous ink. Participants were able to correct recognition problems, but not without some difficulty related to having multiple representations of the same equation. The contextual pop-up menu allowed participants to easily access mathematical commands and proved to be one of the stronger aspects of the system. The authors noted that participants adjusted their drawing style to deal with recognition issues.

Discussion

The authors conducted a case study of a pen-math system called MathBrush and found that participants were able to "effectively" use the system, albeit with a few problems in recognition and interaction. The pop-up menu and CAS portion of the system seemed to function the best.

I am not convinced that the participants were able to "effectively" use the system. The difficulties in recognizing sketches and in understanding the affordances for correction and editing seem to hinder "effective" use. No results about the participants' experiences using MathBrush were presented; I think that is one missing piece in this qualitative evaluation.

Users' adaptation to a system's shortcomings is an interesting research topic. I would be curious how easily users are able to adapt, and what effects that adaptation has on cognition. This would be worth researching further.

Renegade Gaming: Practices Surrounding Social Use of the Nintendo DS Handheld Gaming System

Christine Szentgyorgyi, Michael Terry, & Edward Lank

Summary

The authors present a qualitative study of the social uses of the Nintendo DS. The handheld gaming system supports ad-hoc wireless networking for multiplayer gaming.

In the first part of the study, 9 participants were interviewed. The interviews were semi-structured, with a number of research questions related to how and why people engage in multiplayer gaming using the Nintendo DS. The interviews revealed what places and times are, and are not, acceptable for gaming with the DS. The concept of renegade gaming arose from the interviews. Renegade gaming consists of playing multiplayer games in "a subcontext within a larger host context." Renegade gaming was found to be acceptable as long as it didn't physically disturb others and it was socially acceptable in the given host context. The interviews also pinpointed issues with the DS and being able to start pick-up games with strangers.

In the second part of the study, 3 gaming events were observed. The events ranged in size and in the types of gaming. One event was DS only. Another had both DS and console gaming. The last was a competitive gaming event with consoles only. The authors noted that the large-screen displays used at console gaming events allowed non-participants to observe the gaming, whereas with the DS, observers cannot easily watch the gaming action.

The authors proposed a number of design implications to address the problems of multiplayer gaming on the Nintendo DS.

Discussion

The authors have evaluated the social uses of the Nintendo DS gaming system. The study is only the start of a larger understanding of gaming practices with mobile handheld systems. Understanding the social contexts for this type of gaming can help game and gaming system designers create gaming experiences that better facilitate these social contexts.

The use of a basketball pick-up game as a comparison to a Nintendo DS multiplayer pick-up game is interesting to me. The comparison does an excellent job of explaining the problems associated with starting a pick-up game with strangers on the Nintendo DS.

Future work in this research would be to continue the evaluation with a greater number of participants to obtain a wider and more accurate depiction of the gaming community. As well, quantitative results can never hurt. However, obtaining quantitative results in this context seems non-trivial and is an interesting area of study in itself.

Wednesday, October 8, 2008

Sketch-Based Educational Games: "Drawing" Kids Away from Traditional Interfaces

Brandon Paulson, Brian Eoff, Aaron Wolin, Joshua Johnston, and Tracy Hammond

Summary

The authors present a number of sketch-based education games. The goal is to improve children's ability to learn by providing them with kinesthetic and tactile learning environments. The games are:
  • APPLES - an animated planetary physics simulation that lets children experiment with how gravity, motion, and collisions affect planets.
  • Simon Says "Sketch!" - a memory game for examining how recall of stimulus changes is affected when participants are able to specify game piece positions.
  • Go (Sketch-a) Fish - a sketch-based memory card game.
  • Sketch-based Geography Tools - initial setup is to help children learn each U.S. state by having them find and mark states on a map. Future expansion of this system is possible.
  • Learn Your Shapes! - a tool to teach children shapes by allowing them to sketch them.
  • Sentence Diagramming - a tool to help children in understanding sentence structure by providing feedback when diagramming the parts of a sentence.
A preliminary evaluation has been done using graduate students. A formal evaluation for the work is pending IRB approval.

Discussion

There are obvious advantages to helping children learn through kinesthetic and tactile methods. It will be interesting to see the final results of the evaluation once IRB approval is granted; while I've never had to get IRB approval for participants under the age of 18, I know it is not something given out easily.

Recognizing Free-form Hand-sketched Constraint Network Diagrams By Combining Geometry and Context

Tracy Hammond and Barry O'Sullivan

Summary

Presented in this paper is a sketch recognition system for diagramming constraint networks. The system uses the LADDER sketch language along with GUILD to generate a user interface from LADDER descriptions. The geometric constraints used in LADDER to define the recognized shapes in the system are described. As well as geometric constraints, contextual rules are used to help disambiguate similar shapes. An informal evaluation was done using graduate students familiar with sketch recognition to show that the system is able to recognize the desired shapes.

Discussion

This work introduces sketch recognition to a new domain, constraint networks. Future work for this research includes a formal evaluation not only of the effectiveness of the system, but also an analysis of how sketching constraint networks on a computer improves over existing methods for creating constraint networks.

Constraint networks contain both shapes and variable letters. The authors make an interesting point of using a handwriting recognizer for the letters. Finding a method to integrate a handwriting recognizer with the geometric recognizer seems a plausible and worthwhile approach to addressing the recognition issues that arise when dealing with shapes versus text.

Sunday, October 5, 2008

Ambiguous Intentions: a Paper-like Interface for Creative Design

Mark D. Gross and Ellen Yi-Luen Do

Comments

Daniel's blog

Summary

The authors present the Electronic Cocktail Napkin, a sketch recognition system that focuses on diagramming. The system is designed to support abstraction, ambiguity, and imprecision, all qualities involved in freehand sketching. In support of abstraction, the system allows the user to specify configurations: groups of primitive elements that have unique relationships with each other in a specific domain.

In recognition, the system first tries to recognize low-level glyphs. These glyphs are recognized using a number of features (pen path, aspect ratio, bounding box size, and stroke and corner counts). The features are compared against templates for each recognizable glyph.
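
Template matching over such features might be sketched as follows. The feature names and templates here are hypothetical; the key property carried over from the paper is that ambiguity is preserved by returning every matching template rather than forcing a single answer:

```python
def match_glyph(features, templates):
    """Compare a glyph's feature dict against per-template predicates and
    return every template name the glyph is consistent with. An empty or
    multi-element result leaves the glyph ambiguous for later,
    context-based resolution."""
    matches = []
    for name, template in templates.items():
        if all(template[key](features[key]) for key in template):
            matches.append(name)
    return matches

# Hypothetical templates: predicates over stroke count, corner count,
# and bounding-box aspect ratio.
templates = {
    "circle": {"strokes": lambda n: n == 1, "corners": lambda n: n == 0,
               "aspect": lambda a: 0.8 <= a <= 1.25},
    "square": {"strokes": lambda n: n == 1, "corners": lambda n: n == 4,
               "aspect": lambda a: 0.8 <= a <= 1.25},
}
```

A glyph matching no template, or more than one, simply stays ambiguous, which is what allows configurations and context to resolve it later.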

After recognizing low-level glyphs, the system tries to recognize configurations by checking for patterns of primitive glyphs that match one of the user-defined configurations. As well, some ambiguous low-level glyphs that were not recognized earlier become recognized through association with configurations.

The system supports contextual recognition by identifying glyphs that are unique to a specific context. When a user draws one of these glyphs the system is able to determine the domain context for the sketch.

The authors conducted a number of evaluations to see if pursuing sketch recognition was useful in the scope of diagramming. Once they determined it was and began implementing the Cocktail Napkin, they ran user studies to help focus the work during development.

Discussion

I like their use of abstraction, ambiguity, and imprecision. It shows a real evaluation of sketching before jumping into recognition approaches. It seems beneficial to allow things to remain ambiguous until contextual information can assist. However, wrong contextual information can cause problems: a degree of certainty is needed before determining context, along with a method for the user to correct mistaken context. I agree with the authors' proposed future work to extend this approach to support free-form sketching. Perhaps a solution is combining this technique for diagramming with one better suited to free-form sketching, rather than trying to implement free-form sketching with this approach.

LADDER, a sketching language for user interface developers

Tracy Hammond and Randall Davis

Comments

Nabeel's blog

Summary

The authors present a sketching language for describing recognizable shapes and sketches, intended for use by people outside the sketch recognition field. The goal is to allow people such as user interface designers, developers, and experts in a specific domain to easily build systems that use sketch recognition interfaces without having to build a complex sketch recognition system from the ground up.

The described language, LADDER, defines recognized shapes using five parts: components, geometric constraints, aliases, editing behaviors, and display methods. Components are the elements that make up a shape. Constraints define the relationships between the components. Aliases specify names for components that allow for easier readability and construction in the language. Editing behaviors describe how a shape can be modified. Display methods specify how a shape is visually shown. Shapes can be defined in a hierarchy. As well, abstract shapes can be defined to describe general features shared by multiple shapes to prevent redundant declarations.

Constraints for a shape can be specified either as rotationally invariant or with respect to a specific orientation. A shape can be labeled isRotatable when its constraints are described with respect to orientation but any orientation is acceptable.

Editing behaviors are broken down into actions (how a shape is modified) and triggers (how this action is executed). Display methods allow the shape to be drawn in a number of different ways including unchanged or cleaned-up.

The recognition system first tries to recognize primitive shapes. If only one possible primitive classification can be assigned to a drawn shape, it is passed on to the next stage; otherwise, all potential classifications for the drawn shape are passed on. The next stage recognizes domain-specific shapes using a rule-based system. Each recognizable shape has a rule, and the domain recognizer searches every possible combination of primitive shapes that can satisfy a rule.
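
The exhaustive rule check can be sketched as a search over assignments of primitives to a shape's components. This is a toy stand-in for LADDER's actual engine, with a heavily simplified hypothetical "arrow" rule:

```python
from itertools import permutations

def recognize_domain_shape(primitives, shape_rule):
    """Try every ordered assignment of recognized primitives to the
    shape's components and return the first assignment that satisfies
    all of the rule's constraints, or None if no assignment does."""
    n = shape_rule["arity"]
    for combo in permutations(primitives, n):
        if all(constraint(*combo) for constraint in shape_rule["constraints"]):
            return combo
    return None

# Hypothetical rule: an "arrow" is a shaft line plus a head line that
# starts where the shaft ends (constraints heavily simplified).
def touches(shaft, head):
    return shaft["end"] == head["start"]

arrow_rule = {"arity": 2, "constraints": [touches]}
prims = [
    {"kind": "line", "start": (0, 0), "end": (5, 0)},
    {"kind": "line", "start": (5, 0), "end": (4, 1)},
]
found = recognize_domain_shape(prims, arrow_rule)
```

The combinatorial `permutations` search makes plain why this stage gets expensive as the number of primitives grows, which is a known cost of rule-based domain recognition.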

Code for domain shape recognizers, display methods, and editing behaviors is generated in a translation stage run prior to run-time.

Discussion

The authors have contributed a sketching language that lets non-experts build systems requiring sketch recognition. This allows domain experts to design sketching systems applicable to their work, and it minimizes the workload on the sketch recognition community, which no longer needs to learn domain-specific knowledge to build domain-specific sketch recognition systems.

One thing I'm curious about is whether the language is extensible. By extensible, I mean users being able to define constraints, actions, triggers, etc. beyond the predefined ones. This seems particularly applicable to constraints that involve mathematical equations: the equation for the constraint could be stated in the language, and no new programming would be involved for the end user (just more use of the language).

Backpropagation Applied to Handwritten Zip Code Recognition

Y. LeCun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, L.D. Jackel

Comments

Yuxiang's blog

Summary

The authors describe an approach to recognizing handwritten zip codes using neural networks. Their approach uses a three-layer neural network trained with backpropagation. The network takes a 16 x 16 normalized image of a single digit as input and has 10 output units representing the 10 different digits. Feature detection on the input is done through weight sharing, which reduces the number of free parameters in the network and can express information about the geometry and topology of the task. Training and testing data for the neural network were provided by the U.S. Postal Service.
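
Weight sharing is the idea behind what we now call a convolutional layer: one small kernel is reused at every image position, so the layer's free parameters are just the kernel entries, independent of image size. A minimal illustration (pure Python, no training):

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D correlation of an image with a single shared kernel.
    The same weights slide over every position, so a 16x16 input and a
    5x5 kernel cost 25 parameters instead of one weight per input pixel
    per unit -- the parameter reduction the authors exploit."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out
```

Because the kernel sees every local patch the same way, it learns a position-independent feature detector (an edge, a stroke end), which matches the geometric structure of digit images.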

Discussion

My understanding of neural networks is limited to the small amount I remember from undergraduate artificial intelligence. However, the task they have in mind, with the large dataset available to them, seems well suited to neural networks. This points out one of the drawbacks of using neural networks in sketch recognition: extensive training is needed. For handwriting recognition, where available training data is enormous, neural networks seem applicable, but perhaps not for free-form sketching.

What!?! No Rubine Features?: Using Geometric-based Features to Produce Normalized Confidence Values for Sketch Recognition

Brandon Paulson, Pankaj Rajan, Pedro Davalos, Ricardo Gutierrez-Osuna, and Tracy Hammond

Comments

Daniel's blog

Summary

The authors sought to investigate which gesture- and geometric-based features are best for recognizing sketches. The geometric features examined came from PaleoSketch, while the gesture features came from Rubine. A quadratic classifier was used to classify strokes based on the features.

The evaluation used 1800 examples, where 900 were for training and subset selection and the other 900 were for testing. A greedy, sequential forward selection technique was used to determine feature subsets. The optimal feature subset contained 15 features, only one of which came from Rubine. The optimal feature subset reported accuracies similar to, but slightly lower than, those of the original PaleoSketch.
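
Greedy sequential forward selection is simple to state in code. In this sketch, `evaluate` stands in for training and scoring the quadratic classifier on a candidate feature subset:

```python
def forward_select(features, evaluate, k):
    """Greedy sequential forward selection: repeatedly add the single
    feature that most improves the evaluation score of the current
    subset, stopping after k features. `evaluate(subset)` should return
    a score to maximize (e.g., held-out classification accuracy)."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: evaluate(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy scoring function: each feature contributes a fixed amount of
# accuracy (purely illustrative; real feature interactions are not additive).
weights = {"a": 3, "b": 2, "c": 1}
score = lambda subset: sum(weights[f] for f in subset)
picked = forward_select(["c", "a", "b"], score, 2)
```

The greediness is also the method's weakness: a feature that only helps in combination with another may never be added, which is worth remembering when only one Rubine feature survives selection.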

Discussion

This work is beneficial to research investigating hybrid approaches that recognize sketches using both geometric- and gesture-based features. One possible issue with this work is that the training and test examples looked more suited to geometric-based approaches than gesture-based ones. Adding more gestural examples, such as alphabet characters, to the set of classes might improve the performance of the Rubine features, or it might not. An investigation of this would be one direction for future work.

Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches

Michael Oltmans

Summary

Oltmans presents a vision-based approach to sketch recognition that classifies shapes by their visual parts. Using a polar grid structure resembling a bullseye, feature vectors are calculated for parts of a stroke based on the ink density in each cell of the bullseye. A codebook is created which contains a standard vocabulary of parts. Input parts are compared with codebook parts, and a match vector is calculated based on the differences. The match vectors for a training set are used to train a classifier, which can then classify an input stroke based on the match vectors of its respective parts.
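To make the bullseye idea concrete, here is a minimal sketch of how such a descriptor might be computed (my reconstruction, with assumed ring and wedge counts, not Oltmans' actual code): ink points around a center are binned by radius ring and angular wedge, and the normalized counts per cell form the feature vector.

```python
import numpy as np

# A minimal sketch of a bullseye-style descriptor (assumed details): bin ink
# points by radius ring and angular wedge around a center, then return the
# normalized cell counts as the part's feature vector.
def bullseye_features(points, center, max_radius, n_rings=4, n_wedges=8):
    points = np.asarray(points, dtype=float)
    dx = points[:, 0] - center[0]
    dy = points[:, 1] - center[1]
    r = np.hypot(dx, dy)
    theta = np.mod(np.arctan2(dy, dx), 2 * np.pi)
    hist = np.zeros((n_rings, n_wedges))
    for ri, ti in zip(r, theta):
        if ri >= max_radius:
            continue  # ink outside the bullseye is ignored
        ring = int(ri / max_radius * n_rings)
        wedge = int(ti / (2 * np.pi) * n_wedges)
        hist[ring, wedge] += 1
    total = hist.sum()
    return (hist / total).ravel() if total else hist.ravel()

# Ink density for a short horizontal stroke segment.
pts = [(x, 0.0) for x in np.linspace(0.1, 0.9, 20)]
vec = bullseye_features(pts, center=(0.0, 0.0), max_radius=1.0)
print(vec.shape, vec.sum())
```

With 4 rings and 8 wedges the descriptor is a 32-element vector that sums to 1, which is the kind of part-level vector that would then be matched against codebook entries.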

Discussion

This is an interesting vision-based sketch recognition algorithm. It is simple in design, and has potential in both on-line and off-line recognition of sketches. One of the biggest advantages I see with vision-based techniques is that over-tracing is not an issue. However, I can see an advantage in being able to recognize when a sketch is over-traced: it emphasizes a statement of importance from the user. Perhaps this is useful from a contextual standpoint, and can help in recognizing a sketch when the over-traced part is only one part of the whole.

Constellation Models for Sketch Recognition

D. Sharon and M. van de Panne

Summary

The authors introduce a vision-based technique for sketch recognition that is based on individual features, such as the shape and size of a stroke, and pair-wise features, such as distances to other parts of a known sketch. The model determines a four-element feature vector for individual parts and a four-element feature vector for paired parts. The matching of a label to a stroke is given a likelihood value based on an energy function derived from the differences in individual and pair-wise feature vectors between the label and the stroke. The authors run multiple passes on a branch-and-bound search tree to find the maximum likelihood labeling for a stroke.

The authors evaluated the speed of their approach, with and without multiple passes, on different classes.

Discussion

I like this approach because it gives a large amount of freedom in how a sketch can be drawn. One downside is that while sketches can be drawn differently, they still require the same number of strokes. This seems like a huge limitation if one of your biggest benefits is freedom of form. My future work would be directed towards alleviating this issue.

Monday, September 29, 2008

GLADDER: Combining Gesture and Geometric Sketch Recognition

Paul Corey and Tracy Hammond

Summary

The authors introduce a sketch recognition algorithm that integrates two recognizers: a modified Rubine recognizer, a feature-based linear classifier, and the LADDER recognizer, a geometric-based hierarchical classifier. The approach first uses the Rubine classifier on a stroke and calculates the Mahalanobis distance to the nearest Rubine class. If this distance is shorter than a threshold, the classification is used; otherwise, the stroke is passed to the LADDER classifier. Evaluation pointed to a slight improvement in the integrated approach compared with each classifier individually.
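The gating step is easy to sketch. Below, a toy Rubine-style classifier accepts its own answer only when the Mahalanobis distance to the best class is under a threshold, and otherwise hands off to a stub standing in for the LADDER recognizer (the class means, covariance, and threshold are all made up for illustration).

```python
import numpy as np

# A hedged sketch of the gating idea: trust the statistical classifier only
# when the Mahalanobis distance to its best class is small; otherwise fall
# back to a geometric recognizer (stubbed here).
def mahalanobis(x, mean, cov_inv):
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

def classify(features, class_means, cov_inv, threshold, geometric_fallback):
    dists = {name: mahalanobis(features, mu, cov_inv)
             for name, mu in class_means.items()}
    best = min(dists, key=dists.get)
    if dists[best] <= threshold:
        return best                      # confident gesture classification
    return geometric_fallback(features)  # hand off to geometric recognizer

means = {"arrow": np.array([1.0, 0.0]), "circle": np.array([0.0, 1.0])}
cov_inv = np.eye(2)                      # assume identity covariance
fallback = lambda f: "ladder:unknown"    # stand-in for the LADDER recognizer

print(classify(np.array([0.9, 0.1]), means, cov_inv, 0.5, fallback))
print(classify(np.array([5.0, 5.0]), means, cov_inv, 0.5, fallback))
```

A feature vector near a class mean is classified directly; one far from every mean is passed to the fallback, which mirrors the accept/hand-off behavior described above.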

Discussion

Integrating two distinct approaches that are each suited to handling different types of sketches and strokes is advantageous, as the results indicated. Future work could include investigating alternate hybrids using different classifiers, as well as determining a method to obtain correlated confidence values between Rubine and LADDER to help better choose which classification to use.

A Domain-Independent System for Sketch Recognition

Bo Yu and Shijie Cai

Summary

The authors present a domain-independent sketch recognition system. The system has two stages: an imprecise stroke approximation stage and a post-processing stage. In the first stage, curvature and direction are calculated for each point. Feature area is used to direct recognition and verify results; the feature area for a stroke is the area between the points on the stroke and a reference object.

The system combines a corner finding approach with a primitive shape approximation approach. First, a check is made to see if the stroke can be approximated by any primitive shape. If it can, the process stops; otherwise, the stroke is divided at its highest curvature point, and the process continues on the two new segments.
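A simplified version of that split-and-test loop might look like this (my reconstruction, not the authors' code; it uses a line fit as the only primitive test and direction change as a curvature proxy).

```python
import numpy as np

# A simplified sketch of the recursive split step: if a segment fits a line
# well, stop; otherwise split at the highest-curvature point and recurse.
def line_fit_error(pts):
    """Max perpendicular distance from the chord through the endpoints."""
    p0, p1 = pts[0], pts[-1]
    chord = p1 - p0
    n = np.linalg.norm(chord)
    if n == 0:
        return 0.0
    d = pts - p0
    return float(np.max(np.abs(chord[0] * d[:, 1] - chord[1] * d[:, 0]) / n))

def direction_change(pts):
    d = np.diff(pts, axis=0)
    ang = np.arctan2(d[:, 1], d[:, 0])
    return np.abs(np.diff(ang))  # curvature proxy at interior points

def split_segments(pts, tol=0.05):
    if len(pts) < 4 or line_fit_error(pts) < tol:
        return [(0, len(pts) - 1)]
    k = int(np.argmax(direction_change(pts))) + 1  # highest-curvature point
    left = split_segments(pts[:k + 1], tol)
    right = split_segments(pts[k:], tol)
    return left + [(a + k, b + k) for a, b in right]

# An "L"-shaped stroke: the corner should be detected near index 10.
stroke = np.array([(x, 0.0) for x in np.linspace(0, 1, 11)] +
                  [(1.0, y) for y in np.linspace(0.1, 1, 10)])
print(split_segments(stroke))
```

On the "L" stroke the loop returns two segments meeting at the corner, which is the behavior the summary describes before feature area validation takes over.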

Line segments are approximated using a least-squares best-fit line, checking whether the deviation between the stroke and the best-fit line is small. If it is, the feature area is used to validate the candidate line. Curve segments are approximated by checking if the direction graph can be approximated as a line. Feature area is once again used to validate candidate curves.

In the post-processing stage, which occurs after the sketch has been completely drawn, false and redundant objects are removed, strokes are beautified, and domain-independent objects are recognized.

The system has a user interface to help users obtain their desired drawings. A modal toolbox exists for the creation, modification, and deletion of primitive shapes, as well as command gestures for copy, delete, undo, and redo.

The authors conducted an evaluation in which the system achieved 98% accuracy for primitive shapes and polylines, and 70% accuracy for hybrid shapes with smooth connections.

Discussion

The incorporation of corner detection with primitive approximation seems beneficial. Corner finding algorithms are still not perfect. Any added assistance to help in the segmentation of strokes is useful. The use of command gestures was nice; however, having to use the modal toolbox to enter modification mode is not so nice. I understand the complexity of distinguishing gesture commands from actual sketching. An interesting piece of future work would be to investigate ways to separate these two. I could see using FlowMenus for this system, instead of a modal toolbox.

Eliminating False Positives During Corner Finding By Merging Similar Segments

Aaron Wolin, Brandon Paulson, and Tracy Hammond

Summary

The authors present MergeCF, a corner finding algorithm that computes an initial set of corners and then removes false positives by merging line segments. The initial corners are determined by calculating the curvature and speed at each point; any point whose curvature is above a specific threshold and whose speed is below another is deemed a corner. Small line segments are then merged with the adjacent line segment that causes the least primitive fit error. The algorithm reported higher accuracy for both correct corners found and "all or nothing" than algorithms by Sezgin and Kim, with a particularly significant increase in "all or nothing" accuracy.
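A rough sketch of the merging idea (my reconstruction, not MergeCF itself): repeatedly take the shortest segment between candidate corners and merge it into whichever neighbor yields the smaller line-fit error, until every segment is long enough.

```python
import numpy as np

# Merge short segments into the neighbor with the smaller line-fit error.
def fit_error(pts, a, b):
    """Max perpendicular distance of pts[a..b] from the chord pts[a]-pts[b]."""
    p0, p1 = pts[a], pts[b]
    chord = p1 - p0
    n = np.linalg.norm(chord)
    if n == 0:
        return 0.0
    d = pts[a:b + 1] - p0
    return float(np.max(np.abs(chord[0] * d[:, 1] - chord[1] * d[:, 0]) / n))

def merge_corners(pts, corners, min_len):
    corners = list(corners)
    while len(corners) > 2:
        lengths = [np.linalg.norm(pts[corners[i + 1]] - pts[corners[i]])
                   for i in range(len(corners) - 1)]
        i = int(np.argmin(lengths))
        if lengths[i] >= min_len:
            break
        # Removing corners[i] or corners[i+1] merges the short segment into
        # its left or right neighbor; keep whichever merge fits a line best.
        candidates = []
        if i > 0:
            candidates.append((fit_error(pts, corners[i - 1], corners[i + 1]), i))
        if i + 1 < len(corners) - 1:
            candidates.append((fit_error(pts, corners[i], corners[i + 2]), i + 1))
        corners.pop(min(candidates)[1])
    return corners

# Straight line with one spurious corner at index 5: it should merge away.
pts = np.array([(x, 0.0) for x in np.linspace(0, 1, 11)])
print(merge_corners(pts, [0, 5, 10], min_len=0.6))
```

On the straight-line example, the false corner at index 5 is merged away, leaving only the endpoints, which is exactly the false-positive removal described above.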

Discussion

This approach of removing false positives is important in obtaining high "all or nothing" accuracies. Wolin et al. have a simple and quick approach. Further work could analyze which false positives are not getting removed, and add functionality to the algorithm to remove them, thereby increasing "all or nothing" accuracy.

A Curvature Estimation for Pen Input Segmentation in Sketch-based Modeling

Dae Hyun Kim and Myoung-Jun Kim

Summary

Kim and Kim present a corner finding algorithm which can be used on-the-fly by using local curvature information. The authors first resample the stroke to have equidistant points, and then compute curvature estimates for each resampled point using differential geometry. The local curvature information used is local convexity and local monotonicity. Points are considered locally convex when the sign of the curvature value across a series of points does not change. Local monotonicity determines whether the curvature across a series of points is continually decreasing. In the proposed algorithm, local monotonicity must meet a threshold requirement before points are explored as possible corners. Local maxima for positive curvature and local minima for negative curvature are used to select corners.

Evaluations were conducted to see how different drawing styles affected the algorithm, how it compared with different approaches (curvature estimation only, local convexity only, local convexity and local monotonicity, and a bending function from Fu et al.), and how the algorithm performs under some special test cases.

Discussion

The approach is interesting and seems to work well for both polylines and arcs. The on-the-fly aspect seems beneficial in some cases, although in most cases a sketch doesn't need to be recognized until it is fully drawn. By not comparing their algorithm to other corner finding approaches, the evaluation is questionable to me. While their algorithm may perform well on their test set, another approach could perform even better. Without this comparison, they can't truly validate this approach as an improvement on, or alternative to, the other corner finding algorithms.

PaleoSketch: Accurate Primitive Sketch Recognition and Beautification

Brandon Paulson and Tracy Hammond

Summary

Paulson and Hammond present PaleoSketch, a sketch recognition approach that is able to recognize and distinguish basic shape primitives within a free-hand sketch. The algorithm begins with a pre-recognition stage where curvature and speed graphs for each stroke are computed. The normalized distance between direction extremes and the direction change ratio are also calculated to help differentiate polyline shapes from curved shapes. A number of tests are run on a stroke to determine what type of primitive the stroke represents: line, polyline, ellipse, circle, arc, curve, spiral, helix, and complex shapes. Without a comparable error metric across the tests, PaleoSketch uses a hierarchy to determine which shape is the best fit. The hierarchy is based on the minimum number of corners (or line segments) a particular shape is expected to have. The authors reported recognition accuracies above 98%.

Discussion

PaleoSketch is accurate and useful for a wide variety of applications. The biggest issue with this research is the use of a hierarchy. Developing a comparable error metric for the various tests seems like a critical piece of future work. Currently, adding a new primitive means restructuring the hierarchy. With an error metric, no hierarchy is needed, and new tests can be added easily as long as the error metric can be computed for them.

Sketch Based Interfaces: Early Processing for Sketch Understanding

Tevfik Metin Sezgin, Thomas Stahovich, and Randall Davis

Summary

Sezgin et al. present a corner finding algorithm based on calculated curvature and speed data of a stroke. The approach involves plotting a curvature graph and a speed graph. An average-based threshold is used on each of the graphs to avoid the problems of a fixed threshold. The algorithm generates a hybrid fit to determine which points of the stroke are corners. This process consists of three steps: computing vertex certainties, generating a set of hybrid fits, and selecting the best fit. The hybrid fit with the fewest vertices and an error below a specific threshold is selected as the best fit.
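The average-based threshold is simple to illustrate (assumed details, curvature only, ignoring the speed graph): candidate corners are the points whose curvature exceeds the stroke's own mean curvature, so the cutoff adapts per stroke instead of relying on a single fixed value.

```python
import numpy as np

# A small illustration of an average-based threshold on the curvature graph.
def curvature(pts):
    d = np.diff(pts, axis=0)
    ang = np.unwrap(np.arctan2(d[:, 1], d[:, 0]))
    curv = np.zeros(len(pts))
    curv[1:-1] = np.abs(np.diff(ang))  # direction change at interior points
    return curv

def candidate_corners(pts):
    curv = curvature(pts)
    threshold = curv.mean()  # average-based, per-stroke threshold
    return [i for i in range(1, len(pts) - 1) if curv[i] > threshold]

# Right-angle "L" stroke: only the true corner should exceed the average.
stroke = np.array([(x, 0.0) for x in np.linspace(0, 1, 6)] +
                  [(1.0, y) for y in np.linspace(0.2, 1, 5)])
print(candidate_corners(stroke))
```

In the full algorithm these candidates would be ranked by certainty and combined with the speed-graph candidates before the hybrid fit selection.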

Discussion

The use of curvature and speed information seems critical to finding corners, especially when extending a polyline approach to handle strokes with curves as well. While Sezgin reports high accuracy for this approach, his accuracy is based solely on whether or not the correct corners were found, and doesn't penalize additional found corners that are not correct. Others have reported that his accuracy decreases greatly when using an "all or nothing" metric. This is not to say that Sezgin's algorithm is ineffective, but that further research needs to be done to address this issue.

Algorithms for the Reduction of the Number of Points Required to Represent a Digitized Line or its Caricature

David H. Douglas and Thomas K. Peucker

Summary

The authors sought a method to reduce the number of points needed to record a digitized stroke, and proposed a corner finding algorithm to address this issue. The Douglas-Peucker algorithm uses a floating point and an anchor point. Initially, the floating point is the last point, and the anchor point is the first point. The points on the stroke between the anchor and floating points are examined to find the point with the greatest perpendicular distance from the line between them. If this distance is less than a threshold value, the span is determined to be a straight line segment. If the distance is greater than the threshold, the span has a curve in it, and the floating point is moved to the point of greatest distance. The algorithm continues recursively until it finds a straight line segment. Once a straight line is found, the floating point becomes the new anchor point and is marked as a corner, and the floating point is reset to the last point. The algorithm repeats until no more corners are found.
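The same splitting idea is often written recursively. Here is a compact version of the standard Douglas-Peucker formulation (not the authors' original code): keep the endpoints, find the farthest interior point from the chord, and recurse only if its distance exceeds the tolerance.

```python
import numpy as np

# The standard recursive Douglas-Peucker simplification.
def perpendicular_distance(p, a, b):
    chord = b - a
    n = np.linalg.norm(chord)
    if n == 0:
        return float(np.linalg.norm(p - a))
    return abs(chord[0] * (p[1] - a[1]) - chord[1] * (p[0] - a[0])) / n

def douglas_peucker(pts, tol):
    if len(pts) < 3:
        return list(pts)
    dists = [perpendicular_distance(p, pts[0], pts[-1]) for p in pts[1:-1]]
    k = int(np.argmax(dists)) + 1
    if dists[k - 1] <= tol:
        return [pts[0], pts[-1]]  # whole span is "straight enough"
    left = douglas_peucker(pts[:k + 1], tol)
    return left[:-1] + douglas_peucker(pts[k:], tol)  # pts[k] kept once

# A noisy "V": only the endpoints and the apex should survive.
stroke = np.array([(0, 0), (1, 1.02), (2, 2), (3, 1), (4, 0.01), (5, -1)],
                  dtype=float)
print(douglas_peucker(stroke, tol=0.1))
```

The surviving interior points are the corners; for the noisy "V" only the apex is kept, while near-collinear points are dropped.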

Discussion

The algorithm is effective for finding corners in polyline strokes. However, it struggles to find corners with highly obtuse angles; these corners appear as straight lines to the algorithm. As well, the algorithm recursively iterates over points in the stroke extra times, which can lead to longer running times. It seems that a sawtooth line with spread-out teeth would take a lengthy amount of time for this algorithm.

Monday, September 15, 2008

ShortStraw: A Simple and Effective Corner Finder for Polylines

Aaron Wolin and Tracy Hammond

Summary

ShortStraw is a simple corner finding algorithm for polylines. The algorithm initially resamples the points in a stroke to be equidistant, using the approach presented by Wobbrock et al. It then uses a bottom-up approach to find initial corner candidates by finding the length of "straws" in the stroke. Straw lengths are calculated at each point as the Euclidean distance between the two points that form a window around that point (±3). Based on the median straw length, any straw shorter than 0.95 * median is considered a corner candidate. The algorithm then uses a top-down approach to remove false positives and find missed corners. This is done by first checking if two consecutive corners contain a line between them. If not, a corner has been missed; the corner threshold is relaxed, and new corners are computed between those two corners. A collinearity check is done on corner triplets, where the middle corner is removed if the three are found to be collinear. The algorithm's accuracy, whether "correct corners found" or "all-or-nothing," was greater than that of Sezgin's and Kim and Kim's.

Discussion

The algorithm is incredibly simple, and elegant in that simplicity. The concept of straws works as an effective metaphor for the algorithm, and its accuracy speaks for itself. However, one could argue that the test symbols did not contain an equal number of acute, right, and obtuse angles, and favored right and acute angles, which favor the algorithm. If either Sezgin or Kim and Kim perform better on obtuse corners, then the accuracies may be closer with a more balanced set. Although, I doubt those algorithms are better with obtuse angles, as they seem to be the more difficult corners to recognize.

The obvious problem with this algorithm, as described by the authors, is the difficulty it has in recognizing obtuse angles. A possible solution could be to have a second, more relaxed threshold for obtuse candidate corners. For each obtuse candidate corner, one of the lines could be extended to create an acute angle opposite the obtuse angle, and a threshold applied to that acute angle. Anything greater than the threshold (say 5 degrees) is considered a corner.

Prototype Pruning by Feature Extraction for Handwritten Mathematical Symbol Recognition

Stephen M. Watt and Xiaofang Xie

Summary

The authors present a mathematical symbol gesture recognizer using prototype pruning by feature extraction. Initially, a preprocessing stage smooths out strokes, resamples, normalizes size, and chops off heads and tails. The features used include a number of geometric features (number of loops, number of intersections, and number of cusps), ink-related features (number of strokes, point density), directional features, and global features. Using these features, a prototype pruning process reduces the number of potential classes for an unrecognized symbol. An elastic matching recognizer is the final step, where an unknown symbol is classified.

The recognizer was evaluated on a test set of 227 symbols. The authors compared their results with those derived by J. Kurtzberg. While the accuracy of the authors' recognizer was 1% less than that of Kurtzberg's, the number of pruned prototypes was significantly greater: the authors' approach pruned 85.8% of prototypes, while Kurtzberg's approach pruned 61.5%.

Discussion

The features selected by the authors are interesting. I can see a uniqueness in symbols coming from the number of loops, intersections, and cusps, especially with letters and numbers. The basis for their selection is what they say humans use in recognizing symbols. They don't present any evidence of this, though. Not that I necessarily disagree with them, but to make this statement without evidence, or at least a reason for the insight, seems bold.

Thursday, September 11, 2008

User Sketches: A Quick, Inexpensive, and Effective way to Elicit More Reflective User Feedback

Maryam Tohidi, William Buxton, Ronald Baecker, and Abigail Sellen

Summary

Usability testing typically involves interacting with potential users of a system through verbal communication and visual observation. The authors of this work noticed that the type of feedback received using standard usability techniques was often more reactive than reflective: participants report more criticisms than actual suggestions and improvements. To resolve this issue, the authors looked to incorporate sketches as a usability feedback mechanism. The authors ran a traditional usability study comparing designs for a thermostat. At the end of the study, participants were asked to draw their own design for a thermostat. The authors noticed that the sketches pointed out the same issues as the more traditional methods, but also introduced more suggestive feedback. Participants' sketches contained new ideas or combined ideas from the designs shown earlier in the study.

Discussion

The authors introduce a new quick and inexpensive way to collect feedback in a usability study. By allowing participants to sketch out ideas, they are better able to think about and convey design suggestions than through verbal and textual means. I think this is a great idea. Communicating issues with visual designs is sometimes better reflected through drawing.

One thing I would suggest in terms of this research is to not only allow users to create their own design, but also to sketch changes to an existing design when using the more traditional usability techniques. Participants in these studies are not experts on UI design; they often may not understand restrictions placed on the design. That is not to say their new designs cannot be of benefit, just that new ideas may not solve existing issues and may not be practical.

This work does not directly deal with sketch recognition; however, using sketch recognition techniques to help evaluate participant sketches seems useful. Allowing the computer to compute quantitative results could further reduce the already small amount of time demanded by this approach.

Sunday, September 7, 2008

Graphical Input through Machine Recognition of Sketches

Christopher F. Herot

Comments

Manoj's blog

Summary

Herot presents an approach to sketch recognition that infers user intent by measuring how quickly a pen is moved and how hard it is pressed. He introduces a system called HUNCH and discusses several sketch processing features within it (e.g. latching and overtracing), including problems faced by those features. HUNCH is able to map 2D sketches into 3D structures. The author points out that human understanding of context helps the user make sense of a drawing, and computers could benefit from having a similar knowledge structure. He details the HUNCH system's approach in which context is specified by the user.

The final designed system is more interactive and incorporates improvements to the features of HUNCH. The system is able to recognize lines and curves on the fly based on the speed of the stroke and its "bentness." In HUNCH, the user specified when the processing features were run; in the new system, they run in the background to avoid disrupting the user's design flow.

Discussion

The author has designed a graphical input system for recognizing sketches that seeks to use both machine processing of sketch features and human understanding of context. I think it's better to ask the user about context than to wrongly determine it. However, persistent querying of the user also seems wrong; an adequate balance is needed. Questions about context should be asked "in-context," that is, appearing at the point of interaction. Forcing the user to look elsewhere on the screen defeats any benefit from asking him/her for feedback. This of course means developing clear, transitory affordances for the user to interact with the system. It sounds as if the author was headed in this direction, but I am uncertain how exactly the user provided contextual information within the presented system.

Wednesday, September 3, 2008

Gestures without Libraries, Toolkits, or Training: A $1 Recognizer for User Interface Prototypes

Jacob O. Wobbrock, Andrew D. Wilson, and Yang Li

Comments

Manoj's blog

Summary

Wobbrock et al. introduce a new light-weight gesture recognizer that doesn't require training or special libraries and toolkits. The algorithm was designed to work on any development platform and be simple enough for UI designers to create gestures to be recognized by it. The algorithm has four steps.
  1. The point path is re-sampled so that points on the path are equidistant from each other. This is needed so that the speed at which a gesture is drawn has no effect on recognition.
  2. Rotate the gesture once based on the indicative angle.
  3. Scale the gesture to a reference square. Then, translate the gesture so its centroid is at the origin.
  4. Find the optimal angle for comparing the gesture with the templates to obtain the best score.
The authors compared their algorithm with both Rubine's and a template matcher based on Dynamic Time Warping (DTW). The results showed that their algorithm performed with better accuracy than Rubine's, and comparably to DTW. Also, the authors noted the $1 recognizer had a greater separation between the 1st and 2nd gesture scores.
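The four steps above can be condensed into a rough sketch (assumed simplifications: a single template, and no golden-section search for the optimal comparison angle in step 4).

```python
import math

N = 32  # number of resampled points

def path_length(pts):
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

def resample(pts, n=N):
    """Step 1: make points equidistant along the path."""
    interval = path_length(pts) / (n - 1)
    pts, out, dist, i = list(pts), [pts[0]], 0.0, 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if dist + d >= interval:
            t = (interval - dist) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # continue measuring from the new point
            dist = 0.0
        else:
            dist += d
        i += 1
    while len(out) < n:
        out.append(pts[-1])
    return out

def centroid(pts):
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

def rotate_to_zero(pts):
    """Step 2: rotate so the indicative angle (centroid to first point) is 0."""
    c = centroid(pts)
    th = -math.atan2(pts[0][1] - c[1], pts[0][0] - c[0])
    return [((p[0]-c[0])*math.cos(th) - (p[1]-c[1])*math.sin(th) + c[0],
             (p[0]-c[0])*math.sin(th) + (p[1]-c[1])*math.cos(th) + c[1])
            for p in pts]

def scale_and_translate(pts, size=250.0):
    """Step 3: scale to a reference square, then move centroid to origin."""
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    w, h = (max(xs) - min(xs)) or 1.0, (max(ys) - min(ys)) or 1.0
    pts = [(p[0] * size / w, p[1] * size / h) for p in pts]
    c = centroid(pts)
    return [(p[0] - c[0], p[1] - c[1]) for p in pts]

def normalize(pts):
    return scale_and_translate(rotate_to_zero(resample(pts)))

def score(candidate, template):
    """Step 4 (simplified): mean point-to-point distance, no angle search."""
    return sum(math.dist(a, b) for a, b in zip(candidate, template)) / N

# The same "V" gesture drawn at a different scale and position should
# normalize to (nearly) the same point set, giving a near-zero score.
v = [(x, abs(x - 5.0)) for x in range(11)]
v_scaled = [(3 * x + 7, 3 * abs(x - 5.0) - 2) for x in range(11)]
template = normalize(v)
print(score(normalize(v_scaled), template))
```

Because every step (resampling, rotation, scaling, translation) removes one source of variation, a scaled and translated copy of a gesture scores almost zero against its own template.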

Discussion

The contribution of this work is a simple gesture recognizer that does not require extra software, training, or expert knowledge in the field of pattern recognition. The authors developed two implementations of the $1 recognizer, one in C# and one in JavaScript. Its ability to be implemented on a variety of platforms, including light-weight ones such as JavaScript and Flash, seems incredibly valuable.

To handle variability in gestures, the authors use an idea of aliasing, so that multiple templates can be assigned to one visual object (e.g. the arrow example in Figure 7 of the paper). The simplicity of the algorithm probably limits them to this approach, but it seems future work could be directed towards addressing this problem better.

An interesting point to me is that in the evaluation the authors asked participants to rate the gestures subjectively. It surprises me that this has not been done in other work we have looked at so far. It seems to me that getting input from actual users about the quality of gestures is an important step in choosing gestures to recognize. Of course, I realize the majority of the work is about the actual recognition of the gestures and not the gestures themselves.

MARQS: Retrieving Sketches Using Domain- and Style-Independent Features Learned From A Single Example Using a Dual-Classifier

Brandon Paulson and Tracy Hammond

Comments

Daniel's blog

Summary

The authors introduce a new multi-stroke sketch recognition algorithm that uses dual-classifiers, and a system called MARQS that uses this algorithm to do search by sketch on a collection of photos and music organized in albums.

The algorithm uses different classifiers based on the number of existing training examples. In the case of only a single example, a simple classifier is used that computes a set of features for a sketch, and compares those features with sketches in a database. A total error is computed, and the sketches with the lowest errors are returned. Each new sketch query is added as a training example. Once multiple training examples exist, a linear classifier is used.

Discussion

The contribution of this work is a new sketch recognition algorithm that requires only a single example to recognize a sketch, but improves its accuracy by adding new sketches drawn by the user as training examples. The algorithm is domain-independent and is not affected by orientation, scale, and other user-specific features.

An issue with this research, noted in the paper, is that as the number of examples increases, overfitting can occur. The authors propose a threshold to stop adding new examples. A potential variation on this idea would be to only add new examples when doing so would improve accuracy: any sketch that doesn't help as an example is thrown away, and old examples are removed when new ones offer improvement. I'm not sure how this would be implemented, but it could be worth future research.

The idea of a sketch query is nice, as sometimes a query cannot be easily defined simply by words. I could see searching something like the U.S. Patent Office's database using this.

Sunday, August 31, 2008

Visual Similarity of Pen Gestures

A. Chris Long, Jr., James A. Landay, Lawrence A. Rowe, and Joseph Michiels

Comments

Nabeel's blog

Summary

The authors propose metrics for evaluating the visual similarity between pen gestures. The limits of human attention and memory make it difficult to recall gestures. Their goal is to develop a systematic method for checking the similarity between gestures, so that gestures can be designed different enough from each other to not confuse the user during recall.

Two experiments were conducted by the authors. For each experiment, a set of gestures was defined. For the first experiment, the authors wanted to derive metrics for evaluating the similarity between gestures, so they generated a set of gestures that covered a wide variety and that reflected different spatial orientations. In the first experiment, participants were shown all possible combinations of creating three different gestures from the set and asked to pick the least similar gesture of the three.

The authors selected several possible features for comparing similarity. They looked at the 11 features used by Rubine, as well as others derived from examination of the data using a technique called multi-dimensional scaling (MDS). Some of the predictors resulting from the first experiment were curviness, total absolute angle, density, cosine of the angle between the first and last points, and aspect.

In the second experiment, the authors wanted to evaluate the predictive power of the metrics derived from the first experiment against new people and gestures, and explore how changing different types of features would affect the results. A new gesture set was derived for the experiment based on these criteria. Some of the resulting predictors for the second experiment were Log(aspect), total absolute angle, and two density metrics.

The authors found that the predictors from experiment 1 were better at predicting than those from experiment 2. Also, the authors noticed in both experiments that participants used different features in determining similarity.

Discussion

This work introduces metrics for evaluating the similarity between pen gestures. Developing systematic ways to analyze the similarity between gestures allows developers to design gestures that are dissimilar for commands that are unrelated, and similar gestures for those commands that are related.

I find two faults with this work. One, as mentioned by the authors, is that they did not let the participants actually draw the gestures; unforeseen changes in perception may arise when participants are engaged in drawing them. The other is that I do not believe the authors adequately examined research in perception related to this topic. Psychologists have studied the perception of similarity since the early 20th century, and spatial proximity plays a role in how we perceive similarity. The snapshots of the study show the three gestures close to each other, which could have adverse effects on their results.

I think this type of research is incredibly valuable to sketch recognition. My future work would be to run more studies with the issues previously discussed addressed. A continued refinement of the metrics is needed, which can only come with further evaluation and resulting insight.

Specifying Gestures by Example

Dean Rubine

Comments

Ben's blog

Summary

Rubine introduces a new single stroke gesture recognition algorithm based on statistical pattern recognition and a toolkit called GRANDMA for adding gestural recognition to an interactive application. The work is introduced by example in a gesture-based drawing program called GDP that uses GRANDMA.

GRANDMA uses a structure similar to the Model/View/Controller methodology, where controllers are the input gestures and views are view classes that represent visual objects on the screen that allow gesture interaction. GRANDMA contains a gesture designer, which allows for a developer to design new gestures and assign them to view classes.

The gesture recognition algorithm breaks a single-stroke gesture down into 13 features. These features include the sine and cosine of the initial angle, the length and angle of the bounding box diagonal, the distance between the first and last points, the sine and cosine of the angle between the first and last points, the total gesture length, and the total angle traversed. The algorithm then classifies the gesture by computing, for each recognizable gesture class, a score equal to the weighted sum of the features, using weights learned for that class. The class with the maximum score is the recognized gesture.
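The classification step has a very simple shape. Here is a toy version using just two of the 13 features, with made-up weights (in GRANDMA the weights come from training the linear discriminator, not from hand-tuning as here).

```python
import math

# A toy Rubine-style linear classifier: each class c has weights
# w_c0..w_cF, its score is w_c0 + sum_i(w_ci * f_i), and the class with
# the maximum score wins. Weights below are illustrative, not trained.
def rubine_features(pts):
    """Two of Rubine's 13 features, enough to demo the classifier shape:
    total gesture length and distance between first and last points."""
    length = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    endpoint_dist = math.dist(pts[0], pts[-1])
    return [length, endpoint_dist]

def classify(features, weights):
    scores = {c: w[0] + sum(wi * fi for wi, fi in zip(w[1:], features))
              for c, w in weights.items()}
    return max(scores, key=scores.get)

weights = {
    "line":   [0.0, 1.0,  2.0],  # rewards endpoints far apart
    "circle": [1.0, 1.0, -2.0],  # rewards a closed path (bias term 1.0)
}
line = [(x, x) for x in range(10)]
circle = [(math.cos(2 * math.pi * i / 20), math.sin(2 * math.pi * i / 20))
          for i in range(21)]
print(classify(rubine_features(line), weights))
print(classify(rubine_features(circle), weights))
```

An open diagonal stroke scores highest for "line" and a closed loop for "circle," showing how per-class weighted sums of the same feature vector separate gestures.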

In order to determine the weights for the features, a classic linear discriminator is used to train the feature set. The weights are based on an inverted estimate of the common covariance matrix.

In order to deal with ambiguous gestures, the author calculates the probability that the gesture was recognized correctly, and if that probability falls below 0.95, the gesture is rejected.

Discussion

This work presents a new algorithm for recognizing gestures by example and a toolkit that allows for easier addition of gesture recognition in other applications. Prior gesture recognition systems used hand-coded recognizers. With Rubine's algorithm, new gestures are recognized by providing it with a variety of examples on how the gesture can be drawn. No hand-coding is needed.

One of my biggest criticisms of this work is the rejection approach. The author says that rejection should not occur when an application supports quick undo. As a user, I would hate having to undo every time the recognizer failed; I would much rather be asked to input the gesture again than have the system do something I did not want. Besides, having to execute an undo pulls me away from my current task, disrupting my flow.

If working further on this algorithm, I would design towards recognizing gestures of more than one stroke. The algorithm already uses the geometry of a single stroke; why not try to do the same for multiple strokes? Difficult, definitely, but potentially worth the effort.

Thursday, August 28, 2008

Introduction to Sketch Recognition

Tracy Hammond and Kenneth Mock

Comments on others

daniel's blog

Summary

The authors present an overview of pen-based interactive systems and applications for these systems. The paper begins with a summary of the technology used in pen-based interaction. Passive digitizers allow the use of any stylus, including one's finger, but suffer from vectoring (unintended triggers when, for example, a palm brushes the digitizer), require touch before recognition, and have lower resolution and accuracy than active digitizers. An active digitizer needs a special stylus, but this eliminates the vectoring and required-touch problems associated with passive digitizers.

An outline of software features across operating systems is described in the paper. Microsoft Windows has the largest feature set.

The authors compare and contrast the use of large screen displays such as SMART Board versus smaller TabletPC displays. Large displays offer more screen real estate and allow displaying information to multiple people without the need for individual displays. The TabletPC allows for greater accuracy and flexibility in movement.

Several applications of sketch recognition are presented. A few of these are:

  • ChemPad: converts sketched chemical diagrams to 3-D models

  • LADDER MechEng: recognizes and simulates hand-drawn mechanical engineering diagrams

  • LADDER FSM: lets users draw finite state machines and run inputs through them

The process of using LADDER and the GUILD system to build a new sketch interface is outlined. The domain specific information is defined using LADDER, and GUILD automatically generates a system for recognizing sketches in that domain.

The paper concludes with two case studies and a future work section. The case studies illuminate the advantages of a TabletPC-based lecture, pointing to higher student involvement and better attention spans.

Discussion

The contribution of this paper (or fragments of a book) is an overview of sketch recognition technologies and their applications. Also, the paper points out the benefits of using sketch recognition technologies in an educational environment.

As this paper is an overview, it's difficult to point out possible faults without significant knowledge of the field. I initially found it hard to read because of its fragmentation, but only until I realized it was not a continuous document.

Future work to pursue would include covering more related hardware (possibly a brief discussion of multi-touch and a comparison), as well as evaluating the systems described to further emphasize the value of sketch recognition systems.

Wednesday, August 27, 2008

Sketchpad: A Man-machine Graphical Communication System

Ivan E. Sutherland

Comments on others

ben's blog

Summary

Sutherland presents a new (in 1963) graphical communication system called Sketchpad that uses a pen interface instead of a keyboard. Using a light pen and a set of push buttons, a person can create drawings on a computer using Sketchpad.

Sketchpad uses a ring structure to store relationships between elements in a drawing. Elements are structured in a hierarchy where ancestors are more generic than their descendants. This allows generic code to be separated from element-specific code, not unlike modern OOP. Sketchpad supports the addition of new element types as well.

Sketchpad supports the display of not only graphical elements, but also abstractions. An example of an abstraction is a constraint block, which is a rule that specifies certain values must be maintained (e.g. making lines parallel). By visualizing these abstractions, Sketchpad allows the user to make changes to them.
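A constraint like "keep these lines parallel" can be thought of as an error function that the system drives toward zero, which is roughly how Sketchpad's relaxation method satisfies constraints. The sketch below is my own minimal illustration of that idea, not Sutherland's actual formulation; the function names and the gradient-descent update are assumptions.

```python
def parallel_error(line_a, line_b):
    """Error of a 'parallel' constraint between two lines, each given as
    a pair of (x, y) endpoints. Zero when the direction vectors are
    parallel; the error is their cross product."""
    (ax1, ay1), (ax2, ay2) = line_a
    (bx1, by1), (bx2, by2) = line_b
    return (ax2 - ax1) * (by2 - by1) - (ay2 - ay1) * (bx2 - bx1)

def relax(line_a, line_b, steps=1000, rate=0.1):
    """Crude relaxation: nudge line_b's second endpoint until the
    constraint error vanishes, mimicking how Sketchpad iteratively
    drives constraint errors toward zero."""
    (ax1, ay1), (ax2, ay2) = line_a
    (bx1, by1), (bx2, by2) = line_b
    for _ in range(steps):
        err = parallel_error(line_a, ((bx1, by1), (bx2, by2)))
        if abs(err) < 1e-9:
            break
        # Gradient of the error with respect to (bx2, by2).
        gx, gy = -(ay2 - ay1), (ax2 - ax1)
        bx2 -= rate * err * gx
        by2 -= rate * err * gy
    return (bx1, by1), (bx2, by2)
```

Displaying the constraint block itself, as Sketchpad does, would amount to drawing a symbol linked to the two lines the error function touches.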

Several atomic operations, controlled by the push buttons, provide for the creation of new drawing elements in the display. One of these operations, the copy function, lets the user create a new instance of an existing element, referred to as a "definition picture." Definition pictures can contain "attachers," which are used to relate the definition picture to other elements. Copied instances are linked to each other, so a change to one affects the others. Using this copy functionality, large patterns can easily be created and modified.
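The linked-copy behavior boils down to instances sharing a reference to their definition picture rather than holding private copies. Here is a tiny sketch of that idea in modern terms; the class names and the offset-only transform are my simplifications, since real Sketchpad instances carry full position, size, and rotation.

```python
class DefinitionPicture:
    """A master drawing. Editing it changes every placed instance,
    echoing how Sketchpad links all copies of a definition picture."""
    def __init__(self, points):
        self.points = list(points)

class Instance:
    def __init__(self, definition, offset):
        self.definition = definition  # shared reference, not a copy
        self.offset = offset

    def render(self):
        # Render the shared definition at this instance's position.
        ox, oy = self.offset
        return [(x + ox, y + oy) for x, y in self.definition.points]

# A triangle definition and two placed instances.
tri = DefinitionPicture([(0, 0), (1, 0), (0, 1)])
a = Instance(tri, (0, 0))
b = Instance(tri, (10, 10))

# Edit the master: both instances reflect the change.
tri.points.append((1, 1))
```

This is what makes large repeated patterns cheap to build and modify: one edit to the definition propagates to every instance.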

Sutherland used Sketchpad for a number of different applications including linkages, bridge structural diagrams, animation, and electrical circuit diagrams.

Discussion

This paper is obviously significant for being the first to use a pen interface, a groundbreaking achievement that sadly was not followed up on until much later. This work combines human drawing with the computer's mathematical computation. Sketchpad allows a person to apply real-world constraints to a design drawing in a way that isn't possible with pen and paper.

I found two faults with this work. The first is using a flick to terminate: a flick being, as described in the paper, a quick movement too fast for the tracking program to follow. Since the system uses the pen-and-paper metaphor, I think termination should be done by moving the pen away from the display, but perhaps hardware limitations prevented that. The second is the use of push buttons. I realize hardware limitations may have necessitated this design decision, but that type of interaction does not match the fluidity I feel when just using pen and paper. It does remind me of some recent work I've seen that uses bimanual pen and direct-touch interaction. Not the same idea, but similar in using both hands with a pen in one.

My one question is, "where's the user study?" Perhaps it is just my HCI background that begs this question. Not that the paper needs a user study; it's innovative, and that's reason enough to write about it. Still, more people might have pursued this approach had some evaluation shown that this work was an improvement over other interfaces.

The future work for me is fixing the problems described and evaluating the system.

9 Questions

andrew at ecologylab dot net
1st year Ph.D.

Why are you taking this class?

I'm very interested in designing systems that use a more natural and fluid interface than the mouse and keyboard. Particularly, I want to design systems that promote creativity and design. It is my belief that sketch recognition can help in the creation of such systems.

What experience do you bring to this class?

My background is in HCI, more specifically Human-centered Computing (HCC). I've done a healthy dabble in the arts and design. I have familiarity with the humanities.

What do you expect to be doing in 10 years?

Hopefully, not sitting on my ass wondering where my life went wrong in the last 10 years. Ha. No, I expect to be doing fun, exciting, and innovative research in either academia or industry. I think academia will give me more freedom, but I would like to spend a little time in industry before that (a cliché answer, meaning I don't know whether I want academia or industry).

What do you think will be the next biggest technological advancement in computer science?

The end of Microsoft, or the multi-touch MacBook. Both I'm excited for. Seriously, I'd like to see something in wearable computing happen. I'd love to be able to place an iPhone in my pocket, and be able to access features of it from my shirt sleeve such as calendar, google maps, music, and so on.

What was your favorite course in undergrad (CS or otherwise)?

There were three I really liked:
  1. Design Communication Foundations: fantastic design course that broadened my design skills.
  2. Structures of Interactive Information: really exposed me to the ways information can be represented.
  3. Photography: b&w film; shooting, developing, and making prints.

If you could be another animal, what would it be and why?

Pterodactylus - I like dinosaurs, and flying seems like fun.

What is your favorite motto or slogan?

"Everyone wants to be Cary Grant. Even I want to be Cary Grant."
- Cary Grant

What is your favorite movie?

It's almost impossible to pick a favorite. It will always depend on what day you ask me. Today, it's Amelie.

Name some interesting fact about yourself.

I played in a metal band in high school. We weren't very good.