Projects (not updated frequently; please see publications or the Personal Robotics group page.)

Anticipating Human Activities for Reactive Responses

Anticipating which activities a human will do next (and how) can enable an assistive robot to plan ahead for reactive responses in human environments. We propose a graphical model that captures the rich context of activities and object affordances, and obtain the distribution over a large space of future human activities. Tested on robots performing reactive tasks based on these anticipations.

Robot Perception through Hallucinated Humans

For reasoning about human environments, it is critical that we reason through humans. By observing online 3D data, such as models on Google 3D Warehouse or RGB-D scenes, robots learn how humans use objects and environments. Applied to robotic arrangement of objects and 3D scene labeling.

Deep Learning for Grasping

Being able to grasp and pick up objects is critical for a robot to interact with human environments in useful ways. Although a robot should be able to reason about how to grasp any object, even one it has not seen before, it can be difficult to design good features which allow it to do so. In this work, we use a deep neural network to learn these features instead, both avoiding the need to hand-engineer them, and improving the performance of our grasp detection system.

Personal Robots: Learning Object Placements

Learning algorithms to predict robotic placements, even for objects of types never seen before by the robot. Applied to tasks such as arranging a cluttered room, loading items onto a dish-rack, or putting items in a fridge, etc.

  • Selected Papers: ICRA'12, IJRR'12.
  • Students: Yun Jiang, Marcus Lim, Changxi Zheng.
  • Research/Code/Data: project webpage.
  • Popular Press: Newswise, Zee News, News Tonight, ACM Technews, Communications of the ACM, UPI, NDTV, CBS WBNG Action News.

Learning Human Activities from RGBD Videos

Being able to detect human activities is important for making personal assistant robots useful in performing assistive tasks. Our CAD dataset comprises twelve different activities (composed of several sub-activities) performed by four people in different environments, such as a kitchen, a living room, and an office. Tested on robots reactively responding to the detected activities. (Code + CAD dataset available).

Personal Robots: 3D Scene Understanding

Learning algorithms to understand the 3D structure of the scenes.

Make3D: Single Image Depth Perception

Learning algorithms to predict depth and infer 3-d models, given just a single still image. Applications include creating immersive 3-d experiences from users' photos, improving performance of stereovision, creating large-scale models from a few images, robot navigation, etc. Tens of thousands of users have converted their single photographs into 3D models.

Personal Robots: Learning Robotic Grasps

Learning algorithms to predict robotic grasps, even for objects of types never seen before by the robot. Applied to tasks such as unloading items from a dishwasher, clearing up a cluttered table, opening new doors, etc.

Holistic Scene Understanding: Combining Models as Black-boxes using Cascades (CCM), θ-MRF

Holistic scene understanding requires solving several tasks simultaneously, including object detection, scene categorization, labeling of meaningful regions, and 3-d reconstruction. We develop a learning method that couples these individual sub-tasks for improving performance in each of them.

Visual Navigation: Miniature Aerial Vehicles

Use monocular depth perception and reinforcement learning techniques to drive a small rc-car at high speeds in unstructured environments. Also fly indoor helicopters/quadrotors autonomously using a single onboard camera.

STAIR: Opening New Doors

For robots to be practically deployed in home and office environments, they should be able to manipulate their environment to gain access to new spaces. We present learning algorithms to do so, thus making our robot the first one able to navigate anywhere in a new building by opening doors and elevators, even ones it has never seen before.

Sound Location from Single Microphone

The ability to perform monaural (single-ear) localization is important to many animals; indeed, monaural cues are also the primary method by which humans decide if a sound comes from the front or back, as well as estimate its elevation. In this paper, we propose a machine learning approach to monaural localization, using only a single microphone and an "artificial pinna" (that distorts sound in a direction-dependent way).

STAIR: Optical Proximity Sensors

We propose novel optical proximity sensors for improving grasping. These sensors, mounted on fingertips, allow pre-touch pose estimation, and therefore allow for online grasp adjustments to an initial grasp point without the need for premature object contact or regrasping strategies.


We developed algorithms to automatically modify videos by adding textures in them. Our algorithms perform robust tracking, occlusion inference, and color correction to make the texture look part of the original scene.

Visual Navigation: High speed obstacle avoidance

Use monocular depth perception and reinforcement learning techniques to drive a small rc-car at high speeds in unstructured environments.

Make3D extension: Large Scale Models from Sparse View

Create 3-d models of large environments, given only a small number of (possibly) non-overlapping images. This technique integrates Structure from Motion (SFM) techniques with Make3D's single image depth perception algorithms.

Improving Stereovision using monocular cues

Stereovision is fundamentally limited by the baseline distance between the two cameras; that is, the depth estimates tend to be inaccurate when the distances considered are large. We believe that monocular visual cues give largely orthogonal, and therefore complementary, types of information about depth. We propose a method that combines monocular cues with stereo (triangulation) cues to obtain significantly more accurate depth estimates than is possible with either alone.
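
To make the baseline limitation concrete, here is a minimal sketch based on the textbook rectified-stereo relation Z = f * B / d (depth from focal length, baseline, and disparity). The formula is standard, but the function names and the camera numbers are assumptions chosen for illustration and are not taken from the project. It shows only why stereo alone degrades with distance; it does not implement the monocular-cue method.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Standard rectified-stereo triangulation: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px, baseline_m, depth_m, disparity_error_px):
    """Depth error caused by under-estimating the disparity by a fixed amount."""
    true_disparity = focal_px * baseline_m / depth_m
    noisy_depth = depth_from_disparity(
        focal_px, baseline_m, true_disparity - disparity_error_px
    )
    return noisy_depth - depth_m


if __name__ == "__main__":
    # Hypothetical camera: 700 px focal length, 10 cm baseline,
    # and a 0.5 px matching error at every distance.
    for depth in (2.0, 10.0, 50.0):
        err = depth_error(700.0, 0.10, depth, 0.5)
        print(f"true depth {depth:5.1f} m -> depth error {err:6.2f} m")
```

The same half-pixel matching error costs a few centimetres at short range and many metres at long range, which is the regime where a second, independent source of depth information is most useful.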

6-D wireless sourceless mouse

This device uses accelerometers and gyrometers to estimate its 3-d location and 3-d orientation. This device can be used, for example, to conveniently navigate in a 3-d virtual world.
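
As a rough illustration of how inertial sensors can yield position and orientation, the sketch below performs simple dead reckoning in a plane: the angular rate is integrated once to get heading, and the acceleration is rotated into the world frame and integrated twice to get position. This is an assumed, simplified scheme (2-D, Euler integration, no gravity compensation or drift correction), not the device's actual algorithm; `dead_reckon` and its sample format are hypothetical.

```python
import math


def dead_reckon(samples, dt):
    """Integrate planar IMU samples into a pose (x, y, heading).

    Each sample is (ax_body, ay_body, yaw_rate): body-frame acceleration
    in m/s^2 and angular rate in rad/s, taken every `dt` seconds.
    """
    x = y = vx = vy = heading = 0.0
    for ax_b, ay_b, yaw_rate in samples:
        # Gyro: integrate angular rate once to get orientation.
        heading += yaw_rate * dt
        # Rotate the body-frame acceleration into the world frame.
        c, s = math.cos(heading), math.sin(heading)
        ax_w = c * ax_b - s * ay_b
        ay_w = s * ax_b + c * ay_b
        # Accelerometer: integrate twice (acceleration -> velocity -> position).
        vx += ax_w * dt
        vy += ay_w * dt
        x += vx * dt
        y += vy * dt
    return x, y, heading


if __name__ == "__main__":
    # One second of constant 1 m/s^2 forward acceleration, no rotation.
    pose = dead_reckon([(1.0, 0.0, 0.0)] * 100, dt=0.01)
    print(pose)
```

Because the position comes from a double integration, small sensor biases grow quickly over time, so a practical device would need some form of drift correction.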

Noise tolerant Locally Linear Isomaps

Isomaps (for non-linear dimensionality reduction) suffer from the problem of short-circuiting, which occurs when the neighborhood distance is larger than the distance between the folds in the manifold. We proposed a new variant of the Isomap algorithm based on local linear properties of manifolds to increase its robustness to short-circuiting.
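
The short-circuiting problem itself can be reproduced in a few lines. The sketch below samples a U-shaped curve whose two arms are one unit apart, builds the neighborhood graph that Isomap-style methods use, and measures the graph (geodesic) distance between the two arm tips. The shape, the spacing, and the function names are assumptions made for illustration; the code demonstrates only the failure mode, not the proposed locally linear variant.

```python
import heapq
import math


def u_shaped_points(arm_length=10.0, gap=1.0, step=0.25):
    """Sample a U-shaped curve: down the left arm, across, up the right arm."""
    n_arm = int(round(arm_length / step))
    n_gap = int(round(gap / step))
    left = [(0.0, arm_length - i * step) for i in range(n_arm)]
    bottom = [(i * step, 0.0) for i in range(n_gap)]
    right = [(gap, i * step) for i in range(n_arm + 1)]
    return left + bottom + right


def graph_geodesic(points, radius, source, target):
    """Dijkstra shortest path over the graph linking points within `radius`."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        ux, uy = points[u]
        for v, (qx, qy) in enumerate(points):
            if v == u:
                continue
            w = math.hypot(qx - ux, qy - uy)
            if w <= radius and d + w < best.get(v, math.inf):
                best[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return math.inf


if __name__ == "__main__":
    pts = u_shaped_points()
    tips = (0, len(pts) - 1)  # the two arm tips, one unit apart in the plane
    print("small neighborhood:", graph_geodesic(pts, 0.3, *tips))
    print("large neighborhood:", graph_geodesic(pts, 1.1, *tips))
```

With a neighborhood radius smaller than the gap between the arms, the graph distance follows the curve; once the radius exceeds the gap, a single edge jumps across the fold and the estimated geodesic collapses to the straight-line distance.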

Data-driven Robotics

The question of what data is available to learn from is at the heart of all learning algorithms: often even an inferior learning algorithm will outperform a superior one if it is given more data to learn from. We proposed a novel and practical solution to the dataset collection problem; we first use a green screen to rapidly collect data and then use a probabilistic model to rapidly synthesize a much larger training set. We used this data to build reliable classifiers for our robots.

Expression/Gesture Recognition

Infer facial expressions (e.g., smile, surprise, disgust) given an image of a face. This algorithm builds a sparse geometric model of the face, and uses the parameters of the geometric model as features in a learning algorithm. Reasonably robust to partial occlusions. In a similar project, we use a web camera to track the hand and to infer hand gestures for controlling a simple computer GUI. (No other equipment, such as gloves, was needed.)

Converting insulator polystyrene to moderately conducting polymer

We described a simple, bioinspired approach for the conversion of an insulator, polystyrene, to a moderately conducting polymer by introducing adenine nucleobases.

ELifebelt: Wristworn device to save a person from electric shock

We developed an electronic device that, when worn as a wrist-watch, protects the person from electric shocks. It monitors the skin potentials continuously and trips the power circuit wirelessly to save the person's life.

Other projects: see the publications page for more.