obs spaces simplified to boxes in init and reset, superfluous methods deleted;...
This is a re-implementation of the Anti-poaching game, motivated by the growing number of compatibility layers and hard-to-find bugs. This new version, tentatively numbered 0.3, should be in full agreement with the environment description in the NeurIPS draft.
Notable changes to the environment (and other resulting code) include:
- Poachers are now (definitely) in a special `NULL_POS` state when captured, but are NOT terminated or truncated. Any penalty they receive can now be assigned to them directly, without maintaining a global list of `total_rewards`. This is theoretically sound, and also drastically simplifies dealing with QMIX; a sketch of this bookkeeping follows the list.
- All agents are terminated simultaneously at the end of the game, since this is a finite-horizon game.
- The observation spaces are now `Box`es by default, rather than highly composite spaces. The composite spaces were a major pain point, requiring the `StackerWrapper` and `NonCategoricalFlatten` wrappers, which flattened them into boxes anyway. These two wrappers are now superfluous and have been removed; this shortens my tracebacks and removes unnecessary points of failure. See the `Box` sketch after this list.
- Captured Poachers receive a large penalty for being captured, a penalty $C_{prey}$ for each prey they were carrying when captured, and $C_{trap}$ for each trap. They lose $C_{trap}$ whenever a trap (full or empty) is captured, and gain $R_{trap}$ whenever they get a prey from a trap.
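
Below is a minimal sketch of the new capture bookkeeping, not the actual implementation: `NULL_POS`, the penalty constants, and the `Poacher` fields are hypothetical names standing in for whatever the environment really uses.

```python
from dataclasses import dataclass

NULL_POS = (-1, -1)  # sentinel "captured" position; the agent stays alive
C_CAPTURE, C_PREY, C_TRAP = 10.0, 2.0, 1.0  # illustrative penalty magnitudes

@dataclass
class Poacher:
    name: str
    position: tuple
    n_prey: int = 0
    n_traps: int = 0

def on_capture(poacher: Poacher, rewards: dict) -> None:
    """Charge a captured poacher directly in the per-step rewards dict.

    The poacher is neither terminated nor truncated, so no global
    total_rewards list is needed; value-mixing methods such as QMIX
    just see an ordinary per-agent reward.
    """
    rewards[poacher.name] -= (
        C_CAPTURE                     # flat penalty for being captured
        + C_PREY * poacher.n_prey     # penalty per carried prey
        + C_TRAP * poacher.n_traps    # penalty per trap
    )
    poacher.position = NULL_POS       # park the agent in the NULL_POS state

rewards = {"poacher_0": 0.0}
on_capture(Poacher("poacher_0", (3, 4), n_prey=2, n_traps=1), rewards)
print(rewards)  # {'poacher_0': -15.0}
```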
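
And a minimal sketch of the flat `Box` observation space, assuming a hypothetical per-agent feature length; the real bounds and shape are set in the environment's `__init__` and `reset`.

```python
import numpy as np
from gymnasium.spaces import Box

N_FEATURES = 16  # hypothetical flattened feature length per agent

# One flat Box replaces the old nested composite spaces, which is what
# made the StackerWrapper / NonCategoricalFlatten pair unnecessary.
observation_space = Box(low=-np.inf, high=np.inf,
                        shape=(N_FEATURES,), dtype=np.float32)

obs = observation_space.sample()
assert observation_space.contains(obs)
```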
The code has been tested with the test suite (modified to handle the changed behaviour of the PettingZoo environment) and with the examples, including the `manual_examples` and the `rllib_examples`.
Note: `episode_reward_mean` is now 0 :)