fix demo minigrid #118

conwayz · 2024-10-25T23:15:36Z

Fixes two issues I hit when trying to run these commands from the dev branch:

python demo.py --env minigrid --mode train
python demo.py --env minigrid --mode train --vec multiprocessing

Discrete has shape () and np.prod(()) = 1.0, so we need to cast to int when building the network
Set the number of agents correctly in MultiProcessing
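A minimal sketch of the first issue, using only NumPy. The variable names are illustrative and not taken from the PR diff: the product of an empty shape is a float, so it has to be cast before it is used as a layer size.

```python
import numpy as np

# A Discrete space reports an empty shape.
discrete_shape = ()

# The product over an empty tuple is the float 1.0, not the int 1.
n = np.prod(discrete_shape)
is_float = isinstance(n, np.floating)

# Cast explicitly before using the value as a dimension.
size = int(n)

# Using the float directly as a dimension is rejected by NumPy.
try:
    np.zeros(n)
    float_dim_accepted = True
except TypeError:
    float_dim_accepted = False
```

Network constructors that expect integer sizes can reject the uncast value in the same way.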

However, it does look like the dev branch is slower (35k vs 50k sps)
1.0

dev

For the vector.py change, my understanding is:

MultiProcessing spawns num_workers workers, and each one 'sees' a portion of the buffer.
The outer buffer has dimension (num_workers, agents_per_worker, *obs_shape); concretely, let's say this is (6, 8, 160).
So each worker should get a 'slice' of shape (agents_per_worker, *obs_shape) = (8, 160). Before this change they are only getting a slice of (1, 160) because driver_env.num_agents is 1.
From my understanding, driver_env is like the first env; in the worker process we use Serial, so it's like one of the envs in Serial.
If the slice of observations has shape (1, 160) instead of (8, 160), then in Serial._assign_buffers, when we assign parts of the buffer to the worker, we will end up assigning empty slices and will see errors downstream.
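The slicing problem described above can be sketched with plain NumPy. This is an illustration under stated assumptions, not the actual PufferLib code: `assign_slices` is a hypothetical stand-in for the per-env buffer assignment inside a worker, and each env is assumed to hold one agent.

```python
import numpy as np

num_workers, agents_per_worker, obs_dim = 6, 8, 160
buffer = np.zeros((num_workers, agents_per_worker, obs_dim), dtype=np.float32)


def assign_slices(worker_obs, num_envs, agents_per_env=1):
    """Hypothetical helper: hand each env its rows of the worker's buffer."""
    slices = []
    ptr = 0
    for _ in range(num_envs):
        end = ptr + agents_per_env
        # Slicing past the end of an array silently yields an empty view.
        slices.append(worker_obs[ptr:end])
        ptr = end
    return slices


# Slice sized by agents_per_worker: shape (8, 160).
correct = buffer[0]
# Slice sized by a single env's num_agents: shape (1, 160).
too_small = buffer[0][:1]

good_shapes = [s.shape for s in assign_slices(correct, agents_per_worker)]
bad_shapes = [s.shape for s in assign_slices(too_small, agents_per_worker)]
```

With the full slice every env gets a (1, 160) view. With the undersized slice only the first env does, and the remaining seven get empty (0, 160) views, which is the kind of input that would fail downstream.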

leanke · 2024-10-26T00:13:36Z

@conwayz I can test this in pokemon_red, but my concern is how this affects the envpool.

Some of the environments we run use the envpool to batch out data from x agents (agents_per_worker) at a time as they finish an episode, so essentially the batch dim you would see in the policy is equal to the number of agents in a single worker.

To clarify, using your example above, "(num_workers, agents_per_worker, *obs_shape) concretely let's say this is (6, 8, 160)", you would expect to see (8, 160) if you were to check the shape of the observation from within the policy.

conwayz · 2024-10-26T02:19:10Z

Right, I think this is the case in minigrid, no? We pass in something like buffer[worker_idx], so like you mentioned the shape of the observation would be (8, 160), which is indexed by the number of agents per worker (in this case 8).

conwayz · 2024-10-26T02:21:42Z

I'll try to test a bunch of other envs

conwayz · 2024-10-30T03:56:07Z

@jsuarez5341 @leanke are these changes needed to run demo on dev or was my setup somehow busted?

leanke · 2024-10-30T03:58:12Z

@conwayz let me test minigrid here in a bit, but last I tested it had worked

fix demo minigrid

0e20b37
