Research Spotlight

Voice-Based Social Media for Developing Regions

Social software – email, blogs, wikis, forums, social networks – has revolutionized how people share expertise and collaborate on the web. However, in rural developing regions, many people have neither direct access to Internet-connected PCs nor the literacy skills to interact with textual content. How might we design a communications platform for these communities? In our research, we are designing voice-based applications that let communities in rural India access agricultural advice and share expertise using the mobile phone. The key challenges are contending with limited-capability speech recognition for regional languages, designing for illiterate users, and developing methods to search and filter user-generated audio content. We have deployed one pilot system for farmers in Gujarat, India, to record agricultural questions and get responses from experts and other farmers. Based on the enthusiastic response, the application will be launched later this year to serve over 500,000 farmers across the state.

Total Scene Understanding

Given an image, we propose a hierarchical generative model that classifies the overall scene, recognizes and segments each object component, and annotates the image with a list of tags. To our knowledge, this is the first model that performs all three tasks in one coherent framework. For instance, a scene of a ‘polo game’ consists of several visual objects such as ‘human’, ‘horse’, and ‘grass’. In addition, it can be further annotated with a list of more abstract (e.g. ‘dusk’) or visually less salient (e.g. ‘saddle’) tags. Our generative model jointly explains images through a visual model and a textual model. Visually relevant objects are represented by regions and patches, while visually irrelevant textual annotations are influenced directly by the overall scene class. We propose a fully automatic learning framework that is able to learn robust scene models from noisy web data such as images and user tags from Flickr.com. We demonstrate the effectiveness of our framework by automatically classifying, annotating and segmenting images from eight classes depicting sport scenes. In all three tasks, our model significantly outperforms state-of-the-art algorithms.

PANDA

Panda (for Provenance and Data) is a new project whose goal is to address some limitations in existing provenance systems. This short paper describes our overall plans for Panda, including: a model that fully integrates data-based and process-based provenance; a set of built-in operators for exploiting provenance after it has been captured; an ad-hoc query language over provenance together with data; support for the range from fine-grained to coarse-grained provenance; and solutions to optimization problems involving eager versus lazy evaluation and data caching.

Opportunistic Programming

Who will be writing software in the future and how will they be doing it? As computing becomes increasingly important in people's work and hobbies, a much broader range of people are engaging in programming. Understanding and building tools for professional software developers has a long history, but there has been relatively little research on how to support amateur, opportunistic programmers. Professor Scott R. Klemmer's NSF-funded research group at Stanford University is currently studying this problem. So far, they have done fieldwork with exhibit designers at San Francisco's Exploratorium Museum, and conducted several empirical studies on how these programmers use information resources while building software. Most notably, the Web has revolutionized the way these individuals write software. They build entire applications by iteratively searching for, understanding, and integrating pieces of functionality embodied in 15-line chunks of code! Right now, Professor Klemmer's research group is building a number of tools for amateur programmers that embody and support this reliance on Web resources. The broad goal of this work is to make software development faster, easier, and less error-prone for a much larger population.

Make3d

An artist might spend weeks fretting over questions of depth, scale and perspective in a landscape painting, but once it is done, what's left is a two-dimensional image with a fixed point of view. The Make3d algorithm, developed by Stanford computer scientists, can take any two-dimensional image and create a three-dimensional "fly around" model of its content, giving viewers access to the scene's depth and a range of points of view.

ImageNet

ImageNet is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a "synonym set" or "synset". There are more than 100,000 synsets in WordNet, the majority of which are nouns (80,000+). In ImageNet, we aim to provide on average 1000 images to illustrate each synset. Images of each concept are quality-controlled and human-annotated. Upon its completion, we hope ImageNet will offer tens of millions of cleanly sorted images for most of the concepts in the WordNet hierarchy.
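
As a rough illustration of the organization described above, the following minimal Python sketch models a synset as a node that holds its synonymous words, its images, and its child concepts. The `Synset` class, its field names, and the toy image file names are hypothetical and are not ImageNet's actual data format or API. The last lines simply multiply the stated target of 1000 images per synset by the roughly 80,000 noun synsets to show where a figure in the tens of millions comes from.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """One WordNet-style concept (hypothetical structure, for illustration only)."""
    words: list[str]                                         # synonymous words or phrases
    images: list[str] = field(default_factory=list)          # toy image file names
    children: list["Synset"] = field(default_factory=list)   # more specific concepts

    def count_images(self) -> int:
        """Count images attached to this synset and to every synset below it."""
        return len(self.images) + sum(c.count_images() for c in self.children)


# A toy three-level hierarchy: animal -> dog -> husky.
husky = Synset(["husky"], images=["husky_001.jpg", "husky_002.jpg"])
dog = Synset(["dog", "domestic dog"], images=["dog_001.jpg"], children=[husky])
animal = Synset(["animal", "creature"], children=[dog])

# Back-of-the-envelope scale, using the figures quoted in the text.
TARGET_IMAGES_PER_SYNSET = 1000
NOUN_SYNSETS = 80_000
projected_total = TARGET_IMAGES_PER_SYNSET * NOUN_SYNSETS

print(animal.count_images())
print(f"{projected_total:,}")
```

Counting recursively reflects the idea that an image illustrating a specific concept also illustrates its more general ancestors in the hierarchy.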

d.tools

What kind of tools would you need to make a functional interactive prototype of a media player in 30 minutes? d.tools is a hardware and software system that enables designers to rapidly prototype the atoms (the form) and the bits (the interaction model) of physical user interfaces in concert. d.tools was built to support design thinking rather than implementation tinkering. With d.tools, designers place physical controllers (e.g., buttons, sliders), sensors (e.g., accelerometers), and output devices (e.g., LEDs, LCD screens) directly onto form prototypes, and author their behavior visually in our software workbench.