# Transfer and Multi-task Learning

To quote from the lecture: transfer learning is a machine learning problem that deals with using experience from one set of tasks for faster learning and better performance on a new task. Specific to RL, we can define each task to be an MDP, and train agents that do well on unseen (i.e. never-trained-on) tasks. Transfer learning in RL is still an active research area with many sub-problems that vary in their task setups and problem formulations, so this section provides a broad overview of where they fit into the big picture. Since I've been pretty interested in this area lately, I'll follow up with more detailed reading notes on the recommended papers in this section; see [Reading Notes](/deeprl-notes/transfer-learning-in-rl/paper-reading-notes.md)

### Forward Transfer

The general framework for forward transfer in RL is to train the agent on one source task, and aim for good performance on unseen target tasks. The definition of the source task can be blurry, though: sometimes the agent is trained on samples that do not all come from exactly the same task. For example, if you randomize the training process, the underlying MDP changes, so each time the agent is really trained on a slightly different task. But the general agreement in forward transfer is to put most of the effort into the training process, and hope it will yield good test performance without much interaction with the target tasks.

### **Finetuning**

Finetuning is very popular in supervised deep learning because, as in the example below, many of a deep network's learned/extracted features are meaningful and can be reused to solve different tasks (bird vs. dog? bird vs. cat? etc.)

![](/files/-MTIGm81I9ufoAJQuogm)

But in RL, this idea might not transfer directly: usually when a policy is trained to convergence, it becomes pretty deterministic in its actions, which means that when facing a new task, actions that are good for the new task might not be considered at all, and this hurts the agent's exploration.

1. Finetuning via MaxEnt RL

   One way to deal with this is to pre-train a policy that's random enough (high entropy), so that more actions are considered during the finetuning process. See the paper *Reinforcement Learning with Deep Energy-Based Policies*\
   <img src="/files/-MTF2y_1oBJakGuh7b1U" alt="" data-size="original">
2. Finetuning from transferred visual features

   See paper *DARLA: improving zero-shot transfer in reinforcement learning*
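
To make the high-entropy idea in (1) concrete, here is a minimal toy sketch (not the actual soft Q-learning algorithm from the paper) of an entropy-regularized objective: a bonus term rewards keeping the action distribution spread out, so the pretrained policy stays stochastic enough to explore during finetuning.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def entropy(p):
    # Shannon entropy H(pi) = -sum_a pi(a) log pi(a)
    return -np.sum(p * np.log(p + 1e-12))

def entropy_regularized_objective(p, q_values, alpha=0.1):
    # Expected action value under the policy plus an entropy bonus;
    # a larger alpha keeps the pretrained policy more stochastic,
    # which preserves exploration when finetuning on a new task.
    return np.dot(p, q_values) + alpha * entropy(p)

q = np.array([1.0, 0.9, 0.2])          # toy action values
greedy = softmax(q / 1e-3)             # near-deterministic converged policy
soft = softmax(q / 0.5)                # high-entropy (MaxEnt-style) policy
print(entropy(greedy) < entropy(soft)) # prints True: the soft policy keeps more options open
```

The temperature division is just a quick way to produce policies of different entropy for comparison; the paper instead learns an energy-based policy whose objective includes this entropy term at every step.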

### **Diversify the Source Domain**

With finetuning, the source and target domains are fixed, and we are mainly concerned with how to best apply source skills to the new target. But often we have some knowledge about how the target will differ from the source, so an alternative approach is to design the source/training domain so that the target domain is a natural extension of it, and the agent can do well without even knowing it is facing a new task.

1. Randomize Dynamics: \
   e.g. physical parameters

   When the differences between source and target tasks are mainly in the transition dynamics, such as physical parameters, one thing we can do is "enrich" the agent's training experience with lots of different parameters, so that it will be robust to all possible parameters in the target tasks. More specifically, there are two ways to do this:

   1. train the model to do well on all parameters

      See paper *EPOpt: Learning robust neural network policies*
   2. explicitly train a recurrent model to predict the parameters

      *Preparing for the Unknown: Learning a Universal Policy with Online System Identification*

      *Sim-to-Real Transfer of Robotic Control with Dynamics Randomization*
2. Randomize Observations: \
   (mostly for vision-based RL)

   Sometimes the underlying dynamics remain the same and the source and target domains differ only in observations (I think of this as different O but same S). In this case, data augmentation has proven to work really well, and has the additional benefit of data efficiency

   1. Data augmentation\
      One downside of this, in my opinion, is that the randomness from manually augmented data can be limited: we can't predict all possible variations in the domain, and the model might still overfit to the manually augmented data.
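
The "train the model to do well on all parameters" recipe in (1) above can be sketched as follows. This is only an illustrative skeleton, not EPOpt itself: the parameter names, their ranges, and the `run_episode` stand-in are all hypothetical placeholders for a real simulator and policy update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ranges for the physical parameters we want to be robust to
# (these particular names/ranges are made up for illustration).
PARAM_RANGES = {
    "mass":     (0.5, 2.0),
    "friction": (0.1, 1.0),
}

def sample_dynamics():
    # Draw a fresh set of physical parameters for each episode,
    # so the agent's experience covers many transition dynamics.
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}

def run_episode(policy, params):
    # Placeholder for rolling out `policy` in an env built with
    # `params`; returns a scalar episode return.
    return -abs(params["mass"] - 1.0)  # toy stand-in

def train(policy, n_episodes=100):
    returns = []
    for _ in range(n_episodes):
        params = sample_dynamics()       # new dynamics every episode
        returns.append(run_episode(policy, params))
        # ... policy update on this episode's data would go here ...
    # the policy is optimized for average performance across dynamics
    return float(np.mean(returns))

print(train(policy=None))
```

The recurrent-model variant in (2) would additionally feed recent state-action history into the policy so it can implicitly or explicitly identify which parameters it is currently facing.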

### **Peek into the Target Domain**

The above methods focus on enhancing the source domain to approximately "contain" the target without interacting with it, so they are still considered *zero-shot transfer*. On the other hand, domain adaptation methods allow the agent to take a look at the target domain and adjust itself to adapt to it.

1. Domain Adaptation

   These methods are mostly for vision-based RL, and unsurprisingly, GANs are a common tool here:

   1.a. Adversarial adaptation with a discriminator: align source and target feature distributions. See *Adapting Visuomotor Representations with Weak Pairwise Constraints*.

   1.b. Adversarial adaptation with a generator: transform synthetic simulation images into realistic, real-world-like images before training. See *Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping*.
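
As a rough sketch of the discriminator side of adversarial adaptation (the papers use deep convolutional features; the fixed toy "features" and single linear discriminator here are made-up placeholders): a discriminator is trained to tell source features from target features, and the feature encoder would then be updated in the opposite direction, so the two domains become indistinguishable.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_loss(w, feats_src, feats_tgt):
    # Binary cross-entropy: label source features 1, target features 0.
    p_src = sigmoid(feats_src @ w)
    p_tgt = sigmoid(feats_tgt @ w)
    return -(np.mean(np.log(p_src + 1e-12)) +
             np.mean(np.log(1.0 - p_tgt + 1e-12)))

# Toy "features" from two visually different domains.
feats_src = rng.normal(loc=+1.0, size=(64, 8))
feats_tgt = rng.normal(loc=-1.0, size=(64, 8))

# Train the linear discriminator with plain gradient steps.
w = np.zeros(8)
for _ in range(200):
    p_src = sigmoid(feats_src @ w)
    p_tgt = sigmoid(feats_tgt @ w)
    grad = -(feats_src.T @ (1.0 - p_src) - feats_tgt.T @ p_tgt) / 64.0
    w -= 0.1 * grad

# In the full adversarial setup, the encoder producing the features
# would now be updated to *increase* this loss (fool the discriminator),
# pulling the two domains' feature distributions together.
print(discriminator_loss(w, feats_src, feats_tgt))
```

The generator variant (1.b) plays the same game at the pixel level: the generator maps simulated images toward the real-image distribution until the discriminator can no longer tell them apart.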

### Multi-task Transfer

* **Model-based Methods**
* **Model Distillation**
  * *Contextual Policy* (more in Meta-learning section)
* **Modular Networks**
* [Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping](https://arxiv.org/pdf/1709.07857.pdf)
* [End-to-End Training of Deep Visuomotor Policies](https://arxiv.org/pdf/1504.00702.pdf)

All included screenshots credit to Lecture 16 ([slides](http://rail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-16.pdf))

