Fixed bug in from_checkpoint.py for recurrent PG/PPO models

Maddila Siva Sri Prasanna requested to merge fix-recurrent-replay into dev

The bug stemmed from the call to compute_single_action, which was given empty seq_lens and state parameters. For recurrent PG/PPO models this broke the script, so learned policies could not be simulated with from_checkpoint.py. This merge request fixes that. The QMIX LSTM model does not appear to suffer from the bug, so it is left untouched for now.
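
As a rough illustration of the pattern involved (not the actual change in this merge request), the sketch below shows a rollout loop that seeds the recurrent state once and threads it through every step. StubRecurrentPolicy is a hypothetical stand-in written for this example. Its method names mirror the get_initial_state / compute_single_action pair mentioned above, but the signatures and the toy update rule are assumptions, and seq_lens is omitted on the assumption that a single-step call has an implicit sequence length of 1.

```python
from typing import List, Tuple


class StubRecurrentPolicy:
    """Hypothetical stand-in for a recurrent policy (NOT the real RLlib class)."""

    def get_initial_state(self) -> List[float]:
        # One scalar "hidden unit", initialised to zero.
        return [0.0]

    def compute_single_action(
        self, obs: float, state: List[float]
    ) -> Tuple[int, List[float], dict]:
        # A recurrent model cannot run without its hidden state.
        if not state:
            raise ValueError("recurrent policy called with an empty state")
        # Toy recurrence: decay the old hidden value and add the observation.
        hidden = 0.5 * state[0] + obs
        action = 1 if hidden > 0 else 0
        return action, [hidden], {}


def rollout(policy: StubRecurrentPolicy, observations: List[float]):
    """Replay observations, carrying the recurrent state across steps."""
    state = policy.get_initial_state()  # seed the state once per episode
    actions = []
    for obs in observations:
        # Feed the previous state in and keep the returned state for the next step.
        action, state, _ = policy.compute_single_action(obs, state)
        actions.append(action)
    return actions, state
```

Passing an empty list as the state makes the stub raise immediately, which is the kind of failure described above. Seeding the state from get_initial_state and reusing the returned state on each step avoids it.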