Many-Shot In-Context Learning

Rishabh Agarwal*, Avi Singh*, Lei M. Zhang, Bernd Bohnet, Stephanie Chan, Ankesh Anand, Zaheer Abbas, Azade Nova, John D. Co-Reyes, Eric Chu, Feryal Behbahani, Aleksandra Faust, Hugo Larochelle

arXiv, 2024.

In this paper, we investigate how the in-context learning performance of large language models changes as we move from the few-shot setting (tens of examples) to the many-shot setting (thousands of examples). We also introduce a method for prompting the model with synthetically generated data, which we call Reinforced In-Context Learning, and find that it can often outperform human-written chain-of-thought rationales on reasoning tasks such as MATH, GSM8K, GPQA, and BIG-Bench Hard.
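
As a rough illustration of prompting with synthetically generated data, the sketch below assembles a many-shot prompt from model-written rationales, keeping only those whose final answer matches a known answer. The filtering rule, the `Problem` type, the `sample_rationale` callable, and the prompt layout are all assumptions made for illustration; this is not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Problem:
    question: str
    answer: str  # ground-truth final answer (assumed to be available)


def build_reinforced_prompt(
    problems: List[Problem],
    sample_rationale: Callable[[str], Tuple[str, str]],
    samples_per_problem: int = 4,
) -> str:
    """Format model-written rationales that reach the correct answer as shots.

    `sample_rationale` is a hypothetical stand-in for a model call; it takes a
    question and returns (rationale, predicted_answer).
    """
    shots = []
    for p in problems:
        for _ in range(samples_per_problem):
            rationale, predicted = sample_rationale(p.question)
            # Keep a rationale only if its final answer is correct.
            if predicted.strip() == p.answer.strip():
                shots.append(
                    f"Q: {p.question}\nA: {rationale} The answer is {predicted}."
                )
                break  # at most one rationale per problem
    return "\n\n".join(shots)
```

The returned string would be placed before the test question; problems for which no sampled rationale is correct simply contribute no shot.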

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Avi Singh*, John D. Co-Reyes*, Rishabh Agarwal*, et al. (~40 authors)

Published in TMLR, 2024.

In this paper, we investigate using synthetic data for improving performance on mathematical problem solving (Hendrycks MATH) and competitive coding (APPS). We find that synthetic data can significantly outperform an equivalent amount of human data, and our method can be seen as an instantiation of expectation-maximization applied to the reinforcement learning setting.

i-Sim2Real: Reinforcement Learning of Robotic Policies in Tight Human-Robot Interaction Loops

Saminda Abeyruwan*, Laura Graesser*, David B. D’Ambrosio, Avi Singh, Anish Shankar, Alex Bewley, Pannag R. Sanketi

Oral Presentation

Conference on Robot Learning (CoRL), 2022. 

In this paper, we train robots to play table tennis cooperatively with humans for up to 340-hit rallies (i.e. more than four minutes of uninterrupted play). The core technical problem addressed is the following: how can you accurately simulate human behavior for sim2real transfer of interactive robot policies? 

Don't Start From Scratch: Leveraging Prior Data to Automate Robotic Reinforcement Learning

Homer Walke, Jonathan Yang, Albert Yu, Aviral Kumar, Jedrzej Orbik, Avi Singh, Sergey Levine

Conference on Robot Learning (CoRL), 2022. 

ARIEL allows robots to learn a new task by combining offline RL pre-training with online RL, using forward and backward policies to automate resets. We pre-train on dozens of vision-based robotic manipulation tasks and use the pre-trained policy to quickly learn several new tasks, with (mostly) reset-free training!

Parrot: Data-Driven Behavioral Priors for Reinforcement Learning

Avi Singh*, Huihan Liu*, Gaoyue Zhou, Albert Yu, Nicholas Rhinehart, Sergey Levine

Oral Presentation (top 1.8% of all submissions)

International Conference on Learning Representations (ICLR), 2021

Reinforcement learning provides a general framework for flexible decision making and control, but requires extensive data collection for each new task that an agent needs to learn. In this paper, we propose a method for pre-training RL agents using data from a wide range of previously seen tasks, and we show how this pre-training can accelerate learning of new tasks. 

COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning

Avi Singh, Albert Yu, Jonathan Yang, Jesse Zhang, Aviral Kumar, Sergey Levine

Conference on Robot Learning (CoRL), 2020

In this paper, we propose an approach to incorporate a large amount of prior data, either from previously solved tasks or from unsupervised or undirected environment interaction, to extend and generalize new skills. Our hardest experimental setting involves composing four vision-based robotic skills in a row: picking, placing, drawer opening, and grasping, where a +1/0 sparse reward is provided only on task completion.

Scalable Multi-Task Imitation Learning with Autonomous Improvement

Avi Singh, Eric Jang, Alexander Irpan, Daniel Kappler, Murtaza Dalal, Sergey Levine, Mohi Khansari, Chelsea Finn

International Conference on Robotics and Automation (ICRA), 2020

In this work, we aim to build an imitation learning system that can continuously improve through autonomous data collection, while simultaneously avoiding the explicit use of RL to maintain the stability, simplicity, and scalability of supervised imitation. We utilize the insight that, in a multi-task setting, a failed attempt at one task might represent a successful attempt at another task. This allows us to leverage the robot's own trials as demonstrations for tasks other than the one that the robot actually attempted. 
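
The relabeling insight can be sketched as follows: each trial is checked against the success condition of every task, and it becomes a demonstration for whichever tasks it happens to satisfy. The `Trial` type, the per-task success checks, and the function names are hypothetical and chosen only for illustration; the system described in the paper may detect success differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Trial:
    attempted_task: str
    final_state: dict
    trajectory: list  # e.g. a list of (observation, action) pairs


def relabel_trial(
    trial: Trial, success_checks: Dict[str, Callable[[dict], bool]]
) -> List[str]:
    """Return every task whose (assumed) success check accepts the final state."""
    return [task for task, check in success_checks.items() if check(trial.final_state)]


def collect_demonstrations(
    trials: List[Trial], success_checks: Dict[str, Callable[[dict], bool]]
) -> Dict[str, List[list]]:
    """Group trajectories by the tasks they actually accomplished."""
    demos: Dict[str, List[list]] = {task: [] for task in success_checks}
    for trial in trials:
        for task in relabel_trial(trial, success_checks):
            demos[task].append(trial.trajectory)
    return demos
```

A trial that attempted one task but ended in the success state of another is thereby added to the second task's demonstration set, and supervised imitation can then be run on the grouped data.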

The Ingredients of Real World Robotic Reinforcement Learning

Henry Zhu*, Justin Yu*, Abhishek Gupta*, Dhruv Shah, Kristian Hartikainen, Avi Singh, Vikash Kumar, Sergey Levine

International Conference on Learning Representations (ICLR), 2020

The success of reinforcement learning for real-world robotics has been, in many cases, limited to instrumented laboratory scenarios, often requiring arduous human effort and oversight to enable continuous learning. In this work, we discuss the elements that are needed for a robotic learning system that can continually and autonomously improve with data collected in the real world.

End-to-End Robotic Reinforcement Learning without Reward Engineering

Avi Singh, Larry Yang, Kristian Hartikainen, Chelsea Finn, Sergey Levine

Robotics: Science and Systems (RSS), 2019

In this paper, we propose an approach for removing the need for manual engineering of reward specifications by enabling a robot to learn from a modest number of examples of successful outcomes, followed by a small number of actively solicited queries, where the robot shows the user a state and asks for a label to determine whether that state represents successful completion of the task. We evaluate our method on real-world robotic manipulation tasks where the observations consist of images viewed by the robot's camera. In our experiments, our method effectively learns to arrange objects, place books, and drape cloth, directly from images and without any manually specified reward functions, and with only 1-4 hours of interaction with the real world. 

Variational Inverse Control with Events: A General Framework for Data-Driven Reward Definition

Justin Fu*, Avi Singh*, Dibya Ghosh, Larry Yang, Sergey Levine

Neural Information Processing Systems (NeurIPS), 2018

The design of a reward function often poses a major practical challenge to real-world applications of reinforcement learning. Approaches such as inverse reinforcement learning attempt to overcome this challenge, but require expert demonstrations, which can be difficult or expensive to obtain in practice. We propose variational inverse control with events (VICE), which generalizes inverse reinforcement learning methods to cases where full demonstrations are not needed, such as when only samples of desired goal states are available. Our method is grounded in an alternative perspective on control and reinforcement learning, where an agent's goal is to maximize the probability that one or more events will happen at some point in the future, rather than maximizing cumulative rewards. We demonstrate the effectiveness of our methods on continuous control tasks, with a focus on high-dimensional observations like images where rewards are hard or even impossible to specify. 

Few-Shot Goal Inference for Visuomotor Learning and Planning

Annie Xie, Avi Singh, Sergey Levine, Chelsea Finn

Conference on Robot Learning (CoRL), 2018

We aim to find a more general and scalable solution for specifying goals for robot learning in unconstrained environments. To that end, we formulate the few-shot objective learning problem, where the goal is to learn a task objective from only a few example images of successful end states for that task. We propose a simple solution to this problem: meta-learn a classifier that can recognize new goals from a few examples. We show how this approach can be used with both model-free reinforcement learning and visual model-based planning, for manipulating ropes from images in simulation and moving objects into user-specified configurations on a real robot.

Divide-and-Conquer Reinforcement Learning 

Dibya Ghosh, Avi Singh, Aravind Rajeswaran, Vikash Kumar, Sergey Levine

International Conference on Learning Representations (ICLR), 2018

Standard model-free deep reinforcement learning (RL) algorithms sample a new initial state for each trial, allowing them to optimize policies that can perform well even in highly stochastic environments. However, problems that exhibit considerable initial state variation typically produce high-variance gradient estimates for model-free RL, making direct policy or value function optimization challenging. In this paper, we develop a novel algorithm that instead partitions the initial state space into "slices", and optimizes an ensemble of policies, each on a different slice. The ensemble is gradually unified into a single policy that can succeed on the whole state space. This approach, which we term divide-and-conquer RL, is able to solve complex tasks where conventional deep RL methods are ineffective.
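
To make the partitioning step concrete, the sketch below splits a set of initial states into slices with a plain k-means loop. Using k-means here, the evenly spaced initialisation, and the function name `partition_initial_states` are assumptions made for illustration; training one policy per slice and unifying the ensemble are not shown.

```python
from typing import List, Sequence

Point = Sequence[float]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partition_initial_states(states: List[Point], k: int, iters: int = 20) -> List[int]:
    """Assign each initial state to one of k slices using k-means."""
    if not 0 < k <= len(states):
        raise ValueError("k must be between 1 and the number of states")
    # Deterministic initialisation: take k evenly spaced states as centres.
    step = len(states) // k
    centres = [list(states[i * step]) for i in range(k)]
    labels = [0] * len(states)
    for _ in range(iters):
        # Assignment step: each state joins the slice with the nearest centre.
        labels = [min(range(k), key=lambda j: _sq_dist(s, centres[j])) for s in states]
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [s for s, lab in zip(states, labels) if lab == j]
            if members:  # keep the old centre if a slice is empty
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Each resulting slice would then get its own policy, trained only on episodes that start from states carrying that slice's label.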

GPLAC: Generalizing Vision-Based Robotic Skills using Weakly Labeled Images 

Avi Singh, Larry Yang, Sergey Levine

International Conference on Computer Vision (ICCV), 2017

We tackle the problem of learning robotic sensorimotor control policies that can generalize to visually diverse and unseen environments. Achieving broad generalization typically requires large datasets, which are difficult to obtain for task-specific interactive processes such as reinforcement learning or learning from demonstration. However, much of the visual diversity in the world can be captured through passively collected datasets of images or videos. In our method, which we refer to as GPLAC (Generalized Policy Learning with Attentional Classifier), we use both interaction data and weakly labeled image data to augment the generalization capacity of sensorimotor policies.

Visual Dialog 

Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra

Computer Vision and Pattern Recognition (CVPR), 2017 (Spotlight)

We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a question about the image, the agent has to ground the question in the image, infer context from the history, and answer the question accurately. Visual Dialog is disentangled enough from a specific downstream task to serve as a general test of machine intelligence, while being grounded enough in vision to allow objective evaluation of individual responses and to benchmark progress.

Recurrent Neural Networks for Driver Activity Anticipation via Sensory-Fusion Architecture 

Ashesh Jain, Avi Singh, Hema S Koppula, Shane Soh, Ashutosh Saxena

International Conference on Robotics and Automation (ICRA), 2016

Anticipating the future actions of a human is a widely studied problem in robotics that requires spatio-temporal reasoning. In this work we propose a deep learning approach for anticipation in sensory-rich robotics applications. We introduce a sensory-fusion architecture which jointly learns to anticipate and fuse information from multiple sensory streams. Our approach shows significant improvement over the state-of-the-art in maneuver anticipation by increasing the precision from 77.4% to 90.5% and recall from 71.2% to 87.4%.

Brain4Cars: Car That Knows Before You Do via Sensory-Fusion Deep Learning Architecture 

Ashesh Jain, Hema S Koppula, Shane Soh, Bharad Raghavan, Avi Singh, Ashutosh Saxena

International Journal of Robotics Research (IJRR), 2016

In this work we propose a vehicular sensor-rich platform and learning algorithms for maneuver anticipation. For this purpose we equip a car with cameras, Global Positioning System (GPS), and a computing device to capture the driving context from both inside and outside of the car. In order to anticipate maneuvers, we propose a sensory-fusion deep learning architecture which jointly learns to anticipate and fuse multiple sensory streams.