OMSCS CS7642 (Reinforcement Learning) Review and Tips

Valerie Lim
3 min read · Dec 26, 2021

In my first semester, I took Reinforcement Learning (RL) and Machine Learning for Trading (ML4T); you can find my review and tips for ML4T here.

The RL course was a very fruitful one. We learnt how an agent can take the best action while navigating its environment, and implemented seminal algorithms in the multi-agent RL space.

Why did I take the course?

Understanding human behaviour has been an area of interest to me, in particular behaviour with a temporal aspect (such as the consumer decision journey). I’m excited to uncover how to leverage sequential data and RL techniques to understand human behaviour and suggest the next best action, and I believe what I learnt in RL can help me build more accurate people-centric models.

The teaching assistants (TAs) made this course a lot more enjoyable, with their regular office hours and guidance throughout the semester. During the office hours (held twice weekly), the TAs would clarify doubts about the lectures, homeworks, projects and exam.

What’s the course like?

This course is rather heavy in terms of workload. There are 3 graded components: 6 homeworks, 3 projects and 1 final exam. There are about 2 hours of lectures every week.

There is 1 homework assignment due almost every other week, except in weeks when a project is also due. These 6 homeworks constitute 30% of the overall grade. They are designed to hone the basics of RL, such as solving a problem with Policy Iteration, SARSA or Q-learning.
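To give a flavour of these basics, here is a minimal tabular Q-learning sketch. This is my own illustration rather than homework code; the Gym-style reset()/step() interface and all hyperparameters are assumptions.

```python
import numpy as np

def q_learning(env, n_states, n_actions,
               episodes=5000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy behaviour policy.

    Assumes a classic Gym-style environment: reset() returns a state
    index, step(action) returns (next_state, reward, done, info).
    Hyperparameters are illustrative, not the course's settings.
    """
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if np.random.rand() < epsilon:
                action = np.random.randint(n_actions)
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done, _ = env.step(action)
            # off-policy TD target: bootstrap from the greedy next action
            target = reward + gamma * np.max(Q[next_state]) * (not done)
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q
```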

There are also 3 projects that make up 45% of the overall grade. Each project involves writing a report of no more than 5 pages and submitting your source code. In the first project, we replicated Sutton’s seminal TD-lambda paper; in the second, I built a deep RL agent to land a rocket in OpenAI’s LunarLander environment! The last project involved building a multi-agent deep RL agent to beat the baselines provided by the teaching team in the Google Research Football environment! I enjoyed the last 2 projects, as they provided hands-on experience with deep RL, though I found the last project the most rewarding and practical. This is because a successful agent rarely acts in a vacuum: in applications such as flying a drone or negotiating, agents must learn to act optimally not only within their environment, but also in the presence of other adaptive agents. While the last project was the most time-consuming, I’m very glad I dug into this new field of multi-agent deep RL, and into RLlib (an open-source library for RL) along the way.
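To give a taste of the first project, here is a minimal TD(lambda) prediction sketch with accumulating eligibility traces, in the spirit of Sutton (1988). The linear features, episode format and hyperparameters are my own illustrative assumptions, not the paper’s or the project’s setup.

```python
import numpy as np

def td_lambda(episodes, n_features, alpha=0.05, gamma=1.0, lam=0.7):
    """TD(lambda) prediction for a linear value function v(s) = w . x(s).

    `episodes` is assumed to be a list of episodes, each a list of
    (x, reward, x_next, done) tuples where x is a feature vector.
    """
    w = np.zeros(n_features)
    for episode in episodes:
        z = np.zeros(n_features)  # eligibility trace, reset per episode
        for x, reward, x_next, done in episode:
            v = w @ x
            v_next = 0.0 if done else w @ x_next
            delta = reward + gamma * v_next - v  # TD error
            z = gamma * lam * z + x              # accumulate trace
            w += alpha * delta * z               # credit recently visited states
    return w
```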

Lastly, there is a heavily weighted final exam (worth 25% of the overall grade). The exam format was 50 multiple-choice questions (MCQs) testing knowledge from the whole course. I prepared for it by going through the lectures, assigned readings and textbook. Most of the MCQs were very tricky. You get to see your results immediately after submitting your responses.

What did I learn?

The course covered the basics of RL, starting from model-based methods such as Value Iteration and Policy Iteration, through model-free methods such as SARSA, Q-learning (in tabular form) and deep Q-learning, to multi-agent RL (I implemented Counterfactual Multi-Agent Proximal Policy Optimization (COMA PPO) in my third project). I also enjoyed learning to use a new library, RLlib, to implement the COMA PPO algorithm in that project.
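As an example from the model-based side, here is a minimal Value Iteration sketch for a known tabular MDP. The transition-table interface (P[s][a] as a list of (prob, next_state, reward) tuples) is an illustrative assumption, not the course’s assignment format.

```python
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.99, tol=1e-8):
    """Value Iteration on a known MDP given as a transition table P."""
    V = np.zeros(n_states)

    def q_values(s):
        # expected return of each action under the current value estimate
        return [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in range(n_actions)]

    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(q_values(s))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # extract the greedy policy from the converged values
    pi = [int(np.argmax(q_values(s))) for s in range(n_states)]
    return V, pi
```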

There are many opportunities to read, understand and replicate research papers, and finally convey your understanding and analysis in a report.

What’s next?

I’m excited to further explore deep RL, and to learn how it can be applied to understand and predict human behaviour. One of the TAs, Miguel, wrote a great book on deep RL which I’ll find time to dive into. I also hope to find useful use cases for these techniques at work.

Tips

I did some preparation work ahead of the course, such as covering the first 8 lectures of David Silver’s RL course and looking through the textbook to fill gaps in my knowledge.

Grades

I was an average performer on the projects and homeworks. The exam was really tricky, and I was quite shocked when I received my exam results. Thankfully, the curve was skewed in my favour, and I got an A in the end :)

