i-Sim2Real: Reinforcement Learning of Robotic Policies in Tight Human-Robot Interaction Loops
Saminda Abeyruwan*, Laura Graesser*, David B. D’Ambrosio, Avi Singh, Anish Shankar, Alex Bewley, Pannag R. Sanketi
Conference on Robot Learning (CoRL), 2022.
In this paper, we train robots to play table tennis cooperatively with humans for up to 340-hit rallies (i.e. more than four minutes of uninterrupted play). The core technical problem addressed is the following: how can you accurately simulate human behavior for sim2real transfer of interactive robot policies?
Don't Start From Scratch: Leveraging Prior Data to Automate Robotic Reinforcement Learning
Homer Walke, Jonathan Yang, Albert Yu, Aviral Kumar, Jedrzej Orbik, Avi Singh, Sergey Levine
Conference on Robot Learning (CoRL), 2022.
ARIEL allows robots to learn a new task by combining offline RL pre-training with online RL that uses forward and backward policies to automate resets. We pre-train on dozens of vision-based robotic manipulation tasks and use this pre-trained policy to quickly learn several new tasks, with (mostly) reset-free training!
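The reset-free scheduling idea can be illustrated with a toy sketch. Everything here is hypothetical: `ToyDrawerEnv`, `alternate_forward_backward`, and the scripted policies are stand-ins chosen for illustration, not ARIEL's code, and the offline pre-training and online RL updates are omitted entirely. The sketch only shows that each phase starts wherever the previous phase ended, so the backward policy takes the place of a human reset.

```python
from typing import Callable, List, Tuple

Policy = Callable[[int], int]  # maps a position to an action in {-1, +1}


class ToyDrawerEnv:
    """Hypothetical 1-D stand-in: position 0 = drawer closed, `length` = fully open."""

    def __init__(self, length: int = 3):
        self.length = length
        self.pos = 0

    def step(self, action: int) -> int:
        # Clamp the position to the valid range [0, length].
        self.pos = max(0, min(self.length, self.pos + action))
        return self.pos


def alternate_forward_backward(
    env: ToyDrawerEnv, forward: Policy, backward: Policy, num_phases: int, max_steps: int
) -> List[Tuple[str, bool]]:
    """Alternate forward (open) and backward (close) phases without external resets."""
    log = []
    for phase in range(num_phases):
        is_forward = phase % 2 == 0
        policy = forward if is_forward else backward
        goal = env.length if is_forward else 0
        for _ in range(max_steps):
            if env.pos == goal:
                break
            # In real training, this transition would also be stored for RL updates.
            env.step(policy(env.pos))
        # A failed phase is not reset by a human; the next phase starts from here.
        log.append(("forward" if is_forward else "backward", env.pos == goal))
    return log


env = ToyDrawerEnv(length=3)
log = alternate_forward_backward(env, lambda pos: 1, lambda pos: -1, num_phases=4, max_steps=10)
```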
Parrot: Data-Driven Behavioral Priors for Reinforcement Learning
Avi Singh*, Huihan Liu*, Gaoyue Zhou, Albert Yu, Nicholas Rhinehart, Sergey Levine
Oral Presentation (top 1.8% of all submissions)
International Conference on Learning Representations (ICLR), 2021
Reinforcement learning provides a general framework for flexible decision making and control, but requires extensive data collection for each new task that an agent needs to learn. In this paper, we propose a method for pre-training RL agents using data from a wide range of previously seen tasks, and we show how this pre-training can accelerate learning of new tasks.
COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning
Avi Singh, Albert Yu, Jonathan Yang, Jesse Zhang, Aviral Kumar, Sergey Levine
Conference on Robot Learning (CoRL), 2020
In this paper, we propose an approach to incorporate a large amount of prior data, either from previously solved tasks or from unsupervised or undirected environment interaction, to extend and generalize new skills. Our hardest experimental setting involves composing four vision-based robotic skills in a row: picking, placing, drawer opening, and grasping, where a +1/0 sparse reward is provided only on task completion.
Scalable Multi-Task Imitation Learning with Autonomous Improvement
Avi Singh, Eric Jang, Alexander Irpan, Daniel Kappler, Murtaza Dalal, Sergey Levine, Mohi Khansari, Chelsea Finn
International Conference on Robotics and Automation (ICRA), 2020
In this work, we aim to build an imitation learning system that can continuously improve through autonomous data collection, while simultaneously avoiding the explicit use of RL to maintain the stability, simplicity, and scalability of supervised imitation. We utilize the insight that, in a multi-task setting, a failed attempt at one task might represent a successful attempt at another task. This allows us to leverage the robot's own trials as demonstrations for tasks other than the one that the robot actually attempted.
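The relabeling insight can be sketched as follows, under the assumption that each task has a success check that can be evaluated on an episode's final observation. `Episode`, `relabel_episodes`, and the two grasping tasks are hypothetical names used for illustration, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Episode:
    attempted_task: str       # the task the robot was asked to perform
    observations: List[dict]  # observation dicts; the last entry is the final state


SuccessFn = Callable[[dict], bool]  # hypothetical per-task success check


def relabel_episodes(
    episodes: List[Episode], success_fns: Dict[str, SuccessFn]
) -> Dict[str, List[Episode]]:
    """Map each task to every episode whose final state satisfies that task,
    regardless of which task the episode originally attempted."""
    demos: Dict[str, List[Episode]] = {task: [] for task in success_fns}
    for ep in episodes:
        final_obs = ep.observations[-1]
        for task, is_success in success_fns.items():
            if is_success(final_obs):
                demos[task].append(ep)
    return demos


# Hypothetical example: the "grasp_red" attempt failed, but the robot ended up
# holding the blue object, so that trial becomes a demonstration for "grasp_blue".
success_fns = {
    "grasp_red": lambda obs: obs["held_object"] == "red",
    "grasp_blue": lambda obs: obs["held_object"] == "blue",
}
episodes = [
    Episode("grasp_red", [{"held_object": None}, {"held_object": "blue"}]),
    Episode("grasp_blue", [{"held_object": None}, {"held_object": "blue"}]),
]
demos = relabel_episodes(episodes, success_fns)
```

In a full system, the relabeled episodes would then be fed to a supervised multi-task imitation learner; that step is not shown here.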
The Ingredients of Real World Robotic Reinforcement Learning
Henry Zhu*, Justin Yu*, Abhishek Gupta*, Dhruv Shah, Kristian Hartikainen, Avi Singh, Vikash Kumar, Sergey Levine
International Conference on Learning Representations (ICLR), 2020
The success of reinforcement learning for real-world robotics has, in many cases, been limited to instrumented laboratory scenarios, often requiring arduous human effort and oversight to enable continuous learning. In this work, we discuss the elements that are needed for a robotic learning system that can continually and autonomously improve with data collected in the real world.
End-to-End Robotic Reinforcement Learning without Reward Engineering
Avi Singh, Larry Yang, Kristian Hartikainen, Chelsea Finn, Sergey Levine
Robotics: Science and Systems (RSS), 2019
In this paper, we propose an approach for removing the need for manual engineering of reward specifications by enabling a robot to learn from a modest number of examples of successful outcomes, followed by a small number of actively solicited queries, where the robot shows the user a state and asks for a label to determine whether that state represents successful completion of the task. We evaluate our method on real-world robotic manipulation tasks where the observations consist of images viewed by the robot's camera. In our experiments, our method effectively learns to arrange objects, place books, and drape cloth, directly from images and without any manually specified reward functions, and with only 1-4 hours of interaction with the real world.
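A minimal sketch of the two ingredients described above, assuming a success classifier that outputs a probability for each visited state: the classifier's log-probability used as the reward, and a query rule that shows the user the states the classifier is most confident are successful. The log-probability reward and the top-k selection rule are assumptions for illustration, and `classifier_reward`, `select_queries`, and `add_user_labels` are hypothetical names.

```python
import numpy as np


def classifier_reward(success_probs: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Reward = log p(success | state), clipped to avoid log(0)."""
    return np.log(np.clip(success_probs, eps, 1.0))


def select_queries(success_probs: np.ndarray, num_queries: int) -> np.ndarray:
    """Indices of visited states with the highest predicted success, most confident first."""
    order = np.argsort(-success_probs, kind="stable")
    return order[:num_queries]


def add_user_labels(positives: list, negatives: list, states: list, labels: list) -> None:
    """Append user-labeled states to the classifier's training sets (in place)."""
    for state, is_success in zip(states, labels):
        (positives if is_success else negatives).append(state)


probs = np.array([0.10, 0.90, 0.50, 0.95])  # hypothetical classifier outputs
query_idx = select_queries(probs, 2)        # states 3 and 1 are shown to the user
rewards = classifier_reward(probs)
positives, negatives = [], []
add_user_labels(positives, negatives, ["s3", "s1"], [True, False])
```

After new labels arrive, the classifier would be retrained on the updated positive and negative sets, which in turn changes the reward seen by the RL agent.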
Variational Inverse Control with Events: A General Framework for Data-Driven Reward Definition
Justin Fu*, Avi Singh*, Dibya Ghosh, Larry Yang, Sergey Levine
Neural Information Processing Systems (NeurIPS), 2018
The design of a reward function often poses a major practical challenge to real-world applications of reinforcement learning. Approaches such as inverse reinforcement learning attempt to overcome this challenge, but require expert demonstrations, which can be difficult or expensive to obtain in practice. We propose variational inverse control with events (VICE), which generalizes inverse reinforcement learning methods to cases where full demonstrations are not needed, such as when only samples of desired goal states are available. Our method is grounded in an alternative perspective on control and reinforcement learning, where an agent's goal is to maximize the probability that one or more events will happen at some point in the future, rather than maximizing cumulative rewards. We demonstrate the effectiveness of our methods on continuous control tasks, with a focus on high-dimensional observations like images where rewards are hard or even impossible to specify.
Few-Shot Goal Inference for Visuomotor Learning and Planning
Annie Xie, Avi Singh, Sergey Levine, Chelsea Finn
Conference on Robot Learning (CoRL), 2018
We aim to find a more general and scalable solution for specifying goals for robot learning in unconstrained environments. To that end, we formulate the few-shot objective learning problem, where the goal is to learn a task objective from only a few example images of successful end states for that task. We propose a simple solution to this problem: meta-learn a classifier that can recognize new goals from a few examples. We show how this approach can be used with both model-free reinforcement learning and visual model-based planning, for manipulating ropes from images in simulation and moving objects into user-specified configurations on a real robot.
Divide-and-Conquer Reinforcement Learning
Dibya Ghosh, Avi Singh, Aravind Rajeswaran, Vikash Kumar, Sergey Levine
International Conference on Learning Representations (ICLR), 2018
Standard model-free deep reinforcement learning (RL) algorithms sample a new initial state for each trial, allowing them to optimize policies that can perform well even in highly stochastic environments. However, problems that exhibit considerable initial state variation typically produce high-variance gradient estimates for model-free RL, making direct policy or value function optimization challenging. In this paper, we develop a novel algorithm that instead partitions the initial state space into "slices", and optimizes an ensemble of policies, each on a different slice. The ensemble is gradually unified into a single policy that can succeed on the whole state space. This approach, which we term divide-and-conquer RL, is able to solve complex tasks where conventional deep RL methods are ineffective.
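The slicing step can be sketched with plain k-means on sampled initial states. Using k-means as the partitioning rule is an assumption made here for illustration; the per-slice policy optimization and the gradual unification into a single policy are not shown, and `partition_initial_states` and `assign_slice` are hypothetical names.

```python
import numpy as np


def partition_initial_states(initial_states, num_slices: int, iters: int = 50, seed: int = 0):
    """Cluster sampled initial states into `num_slices` slices with plain k-means.
    Returns (centers, labels), where labels[i] is the slice of initial_states[i]."""
    states = np.asarray(initial_states, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers at distinct randomly chosen states.
    centers = states[rng.choice(len(states), num_slices, replace=False)].copy()
    for _ in range(iters):
        # Distance from every state to every center: shape (N, K).
        dists = np.linalg.norm(states[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(num_slices):
            members = states[labels == k]
            if len(members) > 0:  # keep the old center if a slice becomes empty
                centers[k] = members.mean(axis=0)
    return centers, labels


def assign_slice(state, centers) -> int:
    """Slice (and hence ensemble member) responsible for a new initial state."""
    return int(np.linalg.norm(centers - np.asarray(state, dtype=float), axis=-1).argmin())


states = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
centers, labels = partition_initial_states(states, num_slices=2)
slice_id = assign_slice([9.0, 9.0], centers)
```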
GPLAC: Generalizing Vision-Based Robotic Skills using Weakly Labeled Images
Avi Singh, Larry Yang, Sergey Levine
International Conference on Computer Vision (ICCV), 2017
We tackle the problem of learning robotic sensorimotor control policies that can generalize to visually diverse and unseen environments. Achieving broad generalization typically requires large datasets, which are difficult to obtain for task-specific interactive processes such as reinforcement learning or learning from demonstration. However, much of the visual diversity in the world can be captured through passively collected datasets of images or videos. In our method, which we refer to as GPLAC (Generalized Policy Learning with Attentional Classifier), we use both interaction data and weakly labeled image data to augment the generalization capacity of sensorimotor policies.
Visual Dialog
Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra
Computer Vision and Pattern Recognition (CVPR), 2017 (Spotlight)
We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a question about the image, the agent has to ground the question in the image, infer context from history, and answer the question accurately. Visual Dialog is disentangled enough from a specific downstream task so as to serve as a general test of machine intelligence, while being grounded in vision enough to allow objective evaluation of individual responses and benchmark progress.
Recurrent Neural Networks for Driver Activity Anticipation via Sensory-Fusion Architecture
Ashesh Jain, Avi Singh, Hema S Koppula, Shane Soh, Ashutosh Saxena
International Conference on Robotics and Automation (ICRA), 2016
Anticipating the future actions of a human is a widely studied problem in robotics that requires spatio-temporal reasoning. In this work we propose a deep learning approach for anticipation in sensory-rich robotics applications. We introduce a sensory-fusion architecture which jointly learns to anticipate and fuse information from multiple sensory streams. Our approach shows significant improvement over the state-of-the-art in maneuver anticipation by increasing the precision from 77.4% to 90.5% and recall from 71.2% to 87.4%.
Brain4Cars: Car That Knows Before You Do via Sensory-Fusion Deep Learning Architecture
Ashesh Jain, Hema S Koppula, Shane Soh, Bharad Raghavan, Avi Singh, Ashutosh Saxena
International Journal of Robotics Research (IJRR), 2016
In this work, we propose a vehicular sensor-rich platform and learning algorithms for maneuver anticipation. For this purpose, we equip a car with cameras, Global Positioning System (GPS), and a computing device to capture the driving context from both inside and outside of the car. In order to anticipate maneuvers, we propose a sensory-fusion deep learning architecture which jointly learns to anticipate and fuse multiple sensory streams.