obs spaces simplified to boxes in init and reset, superfluous methods deleted;...

Maddila Siva Sri Prasanna requested to merge reimpl-env into main

This is a re-implementation of the Anti-poaching game, motivated by the growing number of compatibility layers and hard-to-find bugs. This new version, tentatively numbered 0.3, should be in full agreement with the game as described in the NeurIPS draft.

Notable changes to the environment (and other resulting code) include:

  1. Poachers are now placed in a special NULL_POS state when captured, but are NOT terminated or truncated. Any penalty they receive can now be assigned to them directly, without maintaining a global list of total_rewards (see the first sketch after this list). This is theoretically sound, and it also drastically simplifies dealing with QMIX.
  2. All agents are terminated simultaneously at the end of the game, which has a finite horizon.
  3. The observation spaces are now Boxes by default, rather than highly composite spaces. The composite spaces were a major pain point, requiring the StackerWrapper and NonCategoricalFlatten wrappers, which converted them to Boxes anyway. Both wrappers are now superfluous and have been removed (see the second sketch after this list), which shortens tracebacks and removes unnecessary points of failure.
  4. Captured Poachers receive a large penalty for being captured, plus a penalty of C_{prey} for each prey they were carrying and C_{trap} for each trap they had placed. During play, they lose C_{trap} whenever one of their traps (full or empty) is captured, and gain R_{trap} whenever they retrieve a prey from a trap.
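
As a first sketch, here is roughly how points 1, 2 and 4 fit together in a PettingZoo-style step. Only NULL_POS and the C_{prey}/C_{trap} penalties come from this MR; every other name, value and structure below is illustrative, not the actual implementation:

```python
# Hedged sketch of the capture/termination logic described above.
# NULL_POS and the C_/R_ constants are named in this MR; the concrete
# values and helper names here are assumptions for illustration only.
NULL_POS = (-1, -1)          # sentinel position for captured poachers
CAPTURE_PENALTY = 10.0       # the "large penalty" -- value is an assumption
C_PREY, C_TRAP = 1.0, 1.0    # placeholder magnitudes

def on_capture(poacher, rewards):
    """Charge a captured poacher directly and park it at NULL_POS."""
    rewards[poacher.name] -= (
        CAPTURE_PENALTY
        + C_PREY * poacher.n_prey_carried
        + C_TRAP * poacher.n_traps_placed
    )
    poacher.position = NULL_POS  # captured, but NOT terminated or truncated

def terminations_at(timestep, horizon, agents):
    """Finite-horizon game: all agents terminate together at the horizon."""
    done = timestep >= horizon
    return {name: done for name in agents}
```

Because the penalty lands in the per-agent rewards dict at the moment of capture, no global total_rewards bookkeeping is needed, which is what simplifies the QMIX integration.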
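
And a second sketch of what a flat Box observation space might look like; the dimensionality, bounds and feature layout are assumptions, not the environment's actual layout:

```python
import numpy as np
from gymnasium.spaces import Box

# Hypothetical sketch: a single flat Box replacing the old composite space.
OBS_DIM = 16  # e.g. position, carried prey/traps, local grid features

observation_space = Box(
    low=-np.inf, high=np.inf, shape=(OBS_DIM,), dtype=np.float32
)

# Observations are plain float vectors, so no StackerWrapper or
# NonCategoricalFlatten pass is needed before feeding them to a learner.
obs = np.zeros(OBS_DIM, dtype=np.float32)
assert observation_space.contains(obs)
```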

The code has been tested with the test suite (modified to handle the changed behaviour of the PettingZoo environment) and with the examples, including the manual_examples and the rllib_examples.

Note: episode_reward_mean is now 0 :)
