2023 Feb 11

Observations

step count on each episode on learning

In this figure we can see that after converging to the lower step count it pops back into a higher step count. There may be an incident where when we consider the replay buffer the actual inputs of the replay buffer gets minimized so that the network has so little inputs to process and the other part where we padded will be bigger than the input.

To remove this the structure of pointer net can be used to reduce this error.

simple explanation in pointer networks

Here you can find a simple explanation on implementing a pointer network.