Machine Learning Idea – History of Gradient Descent as Learning Data for a Deep Update?

Often my training iterations stall: the error or loss stays too high to say the model has converged. I wonder if this is a general problem.

So I will test whether you can take the update history of an optimization run and feed it to another network that guesses the next update.

With this I have:

Update0 Input -> Network -> Update1 target

Update1 Input -> Network -> Update2 target

Update2 Input -> Network -> Update3 target

Update3 Input -> Network -> Update4 target
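The mapping above is just a supervised dataset: each logged update vector is an input and the following update is its target. A minimal sketch in NumPy, with a least-squares linear model standing in for the predictor network (the names `update_history` and `predict_next_update` are my own, and the logged updates here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend we logged parameter updates (delta-theta) from a run.
# Synthetic stand-in: geometrically shrinking steps plus noise.
dim, steps = 8, 50
update_history = np.array(
    [0.9 ** t * 0.1 * rng.normal(size=dim) for t in range(steps)]
)

# Supervised pairs: Update_t (input) -> Update_{t+1} (target).
X = update_history[:-1]
Y = update_history[1:]

# Least-squares fit of a linear predictor W, a stand-in for the
# "Network" above; a real test would use a small MLP or RNN.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def predict_next_update(u):
    """Guess the next update vector from the most recent one."""
    return u @ W

pred = predict_next_update(update_history[-1])
print(pred.shape)
```

A linear map is the simplest possible "network" here; the point is only the data layout, where each row of `X` lines up with the next update in `Y`.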

So when training stalls, I will see if I can replace an update with the vector I get from the network instead. Something like this.
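One way the replacement step could look, sketched on a toy quadratic objective. The stall test (loss improvement below a threshold) and the `predict_next_update` function are my own assumptions, with a simple decay of the last update standing in for the trained network's guess:

```python
import numpy as np

rng = np.random.default_rng(1)

def loss(theta):
    return float(np.sum(theta ** 2))  # toy quadratic objective

def grad(theta):
    return 2.0 * theta

def sgd_update(theta, lr=0.1):
    return -lr * grad(theta)

# Hypothetical predictor: decays the last update, standing in for
# the network trained on the Update_t -> Update_{t+1} pairs.
def predict_next_update(last_update):
    return 0.9 * last_update

theta = rng.normal(size=4)
last_update = sgd_update(theta)
prev_loss = loss(theta)

for step in range(100):
    update = sgd_update(theta)
    candidate = theta + update
    # Stall check: if the ordinary step barely improves the loss,
    # substitute the network's predicted update instead.
    if prev_loss - loss(candidate) < 1e-6:
        update = predict_next_update(last_update)
        candidate = theta + update
    theta = candidate
    prev_loss = loss(theta)
    last_update = update

print(loss(theta))
```

On this toy problem the plain step already converges, so the substitution only fires near the end; the interesting test is whether the learned predictor helps on a real run where the optimizer genuinely stalls.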