Debug your life like a DRL agent

Mohamed Abogazia
5 min read · Aug 16, 2020

This is an intersection point I found myself at after reading “The Subtle Art of Not Giving a F***” and “Reinforcement Learning”.

Suffer.

Why do I always suffer? What’s the point? And why always me? What’s wrong with my life?

I’m Arab. These two words have given me more disadvantages than you could possibly imagine. It is not being Arab itself, but being born in a neglected third-world country. Everyone judges me for it, though I didn’t choose it, and though I already face enough problems: no education, no income (valued in American dollars), no nothing.

And I started questioning why my life is so f*ed up. It got me really sad. I don’t want to say depressed, because depression is more serious than that and I don’t want to misuse the word.

After some reading, I came to realize that I’m not the only one who’s suffering. Everyone suffers, and I’ll continue to suffer my whole life whether I like it or not. I can bitch and whine about it, or I can take responsibility and figure out a way of suffering better, of suffering beautifully, as Lex Fridman says.

Imagine you’re training a reinforcement learning agent. Your agent is receiving only negative rewards, then it crashes, then it gets more negative rewards every epoch. You’ve trained it for hours and it’s not improving. You realize something is wrong and you start debugging your agent. What could be wrong?
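Before debugging, it helps to confirm that the agent really is stuck and not just learning slowly. Here is a minimal sketch of one way to check, assuming you log one total reward per epoch. The helper name `is_stagnating`, the `window` size, and the sample numbers are all made up for illustration; they are not from any particular library.

```python
from statistics import mean


def is_stagnating(epoch_rewards, window=5, min_gain=0.0):
    """Hypothetical helper: compare the mean reward of the last `window`
    epochs with the mean of the `window` epochs before them."""
    if len(epoch_rewards) < 2 * window:
        return False  # not enough history to judge yet
    previous = mean(epoch_rewards[-2 * window:-window])
    recent = mean(epoch_rewards[-window:])
    # Stuck if the recent average did not improve by more than min_gain.
    return recent - previous <= min_gain


# Made-up reward logs: one agent going nowhere, one slowly improving.
stuck = [-10, -11, -9, -10, -12, -10, -11, -10, -9, -12]
learning = [-10, -9, -8, -7, -6, -5, -4, -3, -2, -1]

print(is_stagnating(stuck))     # True
print(is_stagnating(learning))  # False
```

Both agents only ever see negative rewards here. The difference is that one of them is suffering less over time and the other is not.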

1- Your value function is misleading or superficial:

Imagine your agent is a racing car and your value function is to be faster than the other cars on the track. At first glance, this seems like a good value function: if you’re faster than everyone else, you’re going to be first! Easy!

With this value function, your agent will rarely get to the halfway point of the track. No matter how much you train it, no matter how much it suffers, it will be a failure.
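To make this concrete, here is a small sketch under my own assumptions. It treats “be faster than the other cars” as a per-step reward (in RL terms this is closer to a reward signal than to a learned value function), and the function name `relative_speed_reward` and the speeds are hypothetical. It shows the same driving getting opposite feedback depending on who else happens to be on the track.

```python
def relative_speed_reward(own_speed, rival_speeds):
    """Hypothetical reward: +1 if the agent is faster than every rival
    at this step, otherwise -1."""
    return 1.0 if all(own_speed > r for r in rival_speeds) else -1.0


own_speed = 180.0  # the agent drives exactly the same in both races

slow_field = [150.0, 160.0, 170.0]  # made-up rivals, all slower
fast_field = [190.0, 200.0, 210.0]  # made-up rivals, all faster

print(relative_speed_reward(own_speed, slow_field))  # 1.0
print(relative_speed_reward(own_speed, fast_field))  # -1.0
```

The agent did nothing different between the two races, yet one run says “good” and the other says “bad”, so the signal tells it very little about its own driving.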

This is you when you think that you’ll be happy when you’re richer than everyone else, or when you have more knowledge than everyone else (the latter was mine for so long).

It’s just stupid because it depends on external events: you can’t control others’ speed. A better one would be “go as fast as possible at every point of the race” on…