# Paper Reading Notes

#### Rajeswaran, et al. (2017). EPOpt: Learning Robust Neural Network Policies Using Model Ensembles [(link)](https://arxiv.org/pdf/1610.01283.pdf)

* Overview

  They propose a method for transferring policies between 2D MuJoCo source and target domains that differ only in physical parameters (e.g. mass, friction). Training alternates between two phases so that the policy becomes robust across all parameters in the source distribution *and* the source distribution is adapted to approximate/contain the parameters of the target domain.
* Algorithm Sketch

  Alternating phases during training:

  1. Sample N = 240 parameter settings (models) from a prior P, and optimize the policy with TRPO to perform well across each model's generated trajectories; to make the policy "robust", the update simply focuses on the poorest-performing trajectories.
  2. Collect a few trajectories from the target domain, and update the prior P so that the source distribution of parameters better approximates the unknown target parameters. I don't fully follow the probability proof here, but the idea is importance sampling: compute the likelihood of the target trajectories given each sampled source parameter, and reweight the prior accordingly.
* Notes

  This method assumes that the only difference between the source and target task domains is the physical parameters, and it works best when the varying parameters are explicitly modeled in the source distribution. Phase 1 alone (i.e. no source-domain adaptation) should be sufficient if the source distribution is "broad" enough that the target falls within it; adapting the source with phase 2 is intuitively expensive, but it works well when the source/target mismatch is still "model-able". Phase 2 also makes this a few-shot transfer method, since it requires gathering trajectories from the target domain.
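The two alternating phases can be sketched roughly as follows. This is a toy sketch under loud assumptions: `rollout_return` stands in for an actual MuJoCo rollout, the Gaussian "trajectory likelihood" in `adapt_prior` is a hypothetical stand-in for the paper's trajectory likelihood, and refitting a Gaussian to weighted samples is just one simple way to realize the prior update, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_models(prior_mean, prior_std, n):
    """Draw n physical-parameter vectors (e.g. mass, friction) from a Gaussian prior P."""
    return rng.normal(prior_mean, prior_std, size=(n, len(prior_mean)))

def rollout_return(params, policy):
    """Toy stand-in for simulating one trajectory under a sampled model:
    return is higher the closer the model is to what the policy expects."""
    return -np.sum((params - policy) ** 2)

def epopt_worst_case_batch(prior_mean, prior_std, policy, n_models=240, epsilon=0.1):
    """Phase 1 sketch: sample models from the prior and keep only the worst
    epsilon-fraction of trajectories, so the policy update (TRPO in the
    paper) focuses on the poorest-performing models."""
    models = sample_models(prior_mean, prior_std, n_models)
    returns = np.array([rollout_return(m, policy) for m in models])
    cutoff = np.quantile(returns, epsilon)      # epsilon-percentile return
    keep = returns <= cutoff
    return models[keep], returns[keep]

def adapt_prior(prior_mean, prior_std, target_obs, n_samples=500, obs_noise_std=0.1):
    """Phase 2 sketch: importance-sampling update of the prior. Each sampled
    parameter is weighted by the likelihood of the target-domain data under
    it (a toy Gaussian likelihood here), and the Gaussian prior is refit to
    the weighted samples."""
    samples = sample_models(prior_mean, prior_std, n_samples)
    log_w = np.array([-0.5 * np.sum((target_obs - s) ** 2) / obs_noise_std ** 2
                      for s in samples])
    w = np.exp(log_w - log_w.max())             # stabilized importance weights
    w /= w.sum()
    new_mean = w @ samples
    new_std = np.sqrt(w @ (samples - new_mean) ** 2)
    return new_mean, new_std

# Phase 1: select the hardest 10% of 240 sampled models for the policy update
hard_models, hard_returns = epopt_worst_case_batch(
    prior_mean=np.array([1.0, 0.5]), prior_std=np.array([0.2, 0.1]),
    policy=np.array([1.0, 0.5]))
print(len(hard_models))  # → 24

# Phase 2: shift the prior toward (hypothetical) target-domain observations
new_mean, new_std = adapt_prior(np.array([1.0, 0.5]), np.array([0.2, 0.1]),
                                target_obs=np.array([1.3, 0.6]))
print(new_mean)  # pulled from [1.0, 0.5] toward [1.3, 0.6]
```

The epsilon-percentile selection is what distinguishes EPOpt from plain domain randomization: averaging over all 240 models would let a few bad models be ignored, while optimizing the worst tail forces robustness.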

