Reinforcement Learning: Investigating Gradient Stability in Policy-Based Methods

How does gradient stability differ among REINFORCE, G(PO)MDP, and G(PO)MDP + whitening during policy learning?
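To make the comparison concrete, the three estimators can be viewed as differing only in the scalar that multiplies each step's score function, grad log pi(a_t | s_t). The sketch below is a minimal illustration under stated assumptions, not the implementation used in this project: the function names, the `method` strings, and the choice to whiten within a single episode (rather than across a batch of episodes) are all hypothetical.

```python
import numpy as np


def discounted_rewards_to_go(rewards, gamma):
    """For each step t, return sum_{k >= t} gamma**k * r_k (discount counted from step 0)."""
    rewards = np.asarray(rewards, dtype=float)
    discounted = rewards * gamma ** np.arange(len(rewards))
    # A reversed cumulative sum gives the tail sums of the discounted rewards.
    return np.cumsum(discounted[::-1])[::-1]


def step_weights(rewards, gamma=0.99, method="reinforce", eps=1e-8):
    """Weight applied to grad log pi(a_t | s_t) at every step of one episode.

    Hypothetical helper for illustration only.
    """
    to_go = discounted_rewards_to_go(rewards, gamma)
    if method == "reinforce":
        # Every step is weighted by the full discounted return of the episode.
        return np.full_like(to_go, to_go[0])
    if method == "gpomdp":
        # Rewards earned before step t are dropped; they cannot depend on a_t.
        return to_go
    if method == "gpomdp_whitened":
        # Assumption: whitening is done per episode; batch-level statistics are also common.
        return (to_go - to_go.mean()) / (to_go.std() + eps)
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    episode_rewards = [1.0, 1.0, 1.0]
    for name in ("reinforce", "gpomdp", "gpomdp_whitened"):
        print(name, step_weights(episode_rewards, gamma=0.5, method=name))
```

Because G(PO)MDP discards rewards that precede an action, its weights carry less irrelevant noise than REINFORCE's, and whitening additionally rescales the weights to zero mean and unit variance. Whether those differences translate into measurably more stable gradients is the question posed above.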