Fixed bug in from_checkpoint.py for recurrent PG/PPO models
The bug came from the call to compute_single_action, which received empty seq_lens and state arguments. For recurrent models this crashed the script, so learned policies could not be simulated with from_checkpoint.py. This is now fixed. The QMIX LSTM model does not appear to be affected, so it is left untouched for now.
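
For illustration only (this is not the actual patch), the failure mode can be sketched as follows: a recurrent policy needs a non-empty state that is carried from one step to the next. `DummyRecurrentPolicy` and `rollout` are hypothetical stand-ins; the method names `get_initial_state` and `compute_single_action` are modelled on RLlib's Policy interface, while the state size and the toy recurrence are assumptions.

```python
from typing import List, Optional, Tuple


class DummyRecurrentPolicy:
    """Hypothetical stand-in for a recurrent policy (not RLlib code)."""

    def get_initial_state(self) -> List[List[float]]:
        # One hidden-state vector of size 2; the size is an arbitrary assumption.
        return [[0.0, 0.0]]

    def compute_single_action(
        self, obs: float, state: Optional[List[List[float]]] = None
    ) -> Tuple[int, List[List[float]], dict]:
        if not state:
            # Mimics the described failure: no state, no recurrent forward pass.
            raise ValueError("recurrent policy called with empty state")
        # Toy recurrence: accumulate the observation into the hidden state.
        new_state = [[h + obs for h in state[0]]]
        action = int(new_state[0][0] > 1.0)
        return action, new_state, {}


def rollout(policy: DummyRecurrentPolicy, observations: List[float]) -> List[int]:
    # Start from the policy's initial state instead of an empty list.
    state = policy.get_initial_state()
    actions = []
    for obs in observations:
        # Feed the returned state back in on the next step.
        action, state, _ = policy.compute_single_action(obs, state=state)
        actions.append(action)
    return actions
```

Calling `compute_single_action` with `state=[]` raises in this sketch, whereas `rollout` succeeds because it threads the state through every step.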