Ashutosh Saxena

Alfred P. Sloan Fellow and Microsoft Faculty Fellow.

Director, RoboBrain project, Cornell University.
Department of Computer Science, Stanford, CA 94305.
asaxena at cs.stanford dot edu

PhD in Machine Learning, Stanford University, with Andrew Y. Ng (advisor), Sebastian Thrun, and Stephen Boyd.



  • Eight Innovators to Watch in 2015, Smithsonian Institution.
  • Co-founder, Cognical Zibby, 2012-14.
  • RSS Early Career Award, 2014.
  • NSF CAREER Award, 2013.
  • Microsoft Faculty Fellow, 2012.
  • Alfred P. Sloan Fellow, 2011.
  • Best Cognitive Robotics paper award at IROS'14.
  • Best student paper award at RSS'13.
  • CUAir at AUVSI competition'12: first prize, mission performance category; second prize, overall.
  • Google Faculty Research Award, 2012.
  • Associate Editor, IEEE Transactions on Robotics.
  • Co-chair, IEEE Technical committee on Robot Learning.
  • National Talent Scholar, India.

Research Interests

Artificial Intelligence: Machine Learning, Robotics, Computer Vision / Perception, Large-scale device networks.

Currently my group is building RoboBrain, a large-scale knowledge engine for robots. RoboBrain crawls the Internet for images, videos and text, and also interacts with robots at several universities in order to learn new concepts about the world. One important goal of the RoboBrain project is integrative learning --- RoboBrain integrates various robotics tasks (perception, planning, manipulation and language), different sensing modalities (RGB, 3D, haptic, sound), and different learning mechanisms (unsupervised, deep, supervised, co-active) into one joint system.
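The integrative-learning idea above can be pictured as one shared knowledge graph: concepts learned from different modalities and tasks live as nodes in a single structure, so a query for one task can reuse knowledge acquired for another. The sketch below is purely illustrative --- the class and method names are hypothetical and not RoboBrain's actual API.

```python
# Hypothetical sketch of a shared multi-modal knowledge graph.
# Nodes record which sensing modalities a concept was learned from;
# edges relate concepts across tasks (e.g., perception <-> manipulation).
class KnowledgeGraph:
    def __init__(self):
        self.nodes = {}     # concept name -> set of modalities seen
        self.edges = set()  # undirected concept-concept relations

    def add_concept(self, name, modality):
        # Learning the same concept from a new modality enriches one node
        # instead of creating a per-modality silo.
        self.nodes.setdefault(name, set()).add(modality)

    def relate(self, a, b):
        self.edges.add(frozenset((a, b)))

    def related(self, name):
        # Concepts reachable in one hop, regardless of originating task.
        return sorted(
            other
            for e in self.edges if name in e
            for other in e if other != name
        )

kg = KnowledgeGraph()
kg.add_concept("cup", "RGB")              # from image crawling
kg.add_concept("cup", "3D")               # from depth data
kg.add_concept("grasp-handle", "haptic")  # from robot interaction
kg.relate("cup", "grasp-handle")          # manipulation links to perception

print(kg.related("cup"))       # -> ['grasp-handle']
print(sorted(kg.nodes["cup"])) # -> ['3D', 'RGB']
```

The point of the sketch is only the design choice: one joint store queried by all tasks, rather than separate models per task or modality.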

My Career Award talk at RSS in July 2014 details the scientific ideas behind this project (see video). For an informal description of RoboBrain, please see this article in Wired magazine.

Robot Learning

3D perception/manipulation. Deep Learning for Robots.


Machine Learning for Smart Car Cabins.


Crowd-sourced Machine Learning for Robot Manipulation.


3D depth estimation from a single still image.

3D Perception/vision

Scene/activity understanding from RGBD data.

Tell Me Dave

Grounding robot language in the environment.

PlanIt: Co-active Learning

Crowd-sourced Feedback for Motion Planning

PhD Students

Yun Jiang

Aug 2014, McMullen Fellow.

Hallucinating Humans: Learning Infinite Latent CRFs for 3D Perception and Mobile Manipulation.

Hema Koppula

Google PhD Fellow, best student paper award.

Modeling humans from multi-modal data, anticipating their future actions, and enabling robots to collaborate with them.

Ian Lenz


Deep Learning for Robotics Perception and Control.

Ashesh Jain


Learning from Weak Human Signals, Brain4Cars

Jaeyong Sung


RoboBarista: Learning Manipulation Skills from Videos.

Chenxia Wu


Unsupervised 3D Scene and Video Understanding.

Congcong Li

2012, with Tsuhan Chen.

Large-scale computer vision with Cascaded Classification Models.

Changxi Zheng

2012, Doug James' student, now assistant professor at Columbia University.


Dipendra K Misra  

Tell Me Dave: Large-scale crowdsourcing for learning robot language.

Ozan Sener 

Large-scale video mining for RoboBrain.

Zhaoyin Jia

2013, with Tsuhan Chen.

3D Scene Understanding with Physics.