This is inspired by spline fitting, where you have a tiny model with a limited number of weights. I think you can have a tiny and a big machine learning model working together.
The tiny model gets a new set of parameter values for each step in the solution, and the big model provides them. It is a bit like spline fitting, where each segment has its own coefficients. This way you have one set of weights for the big model and many different sets of weights for the tiny model.
So the solution will contain something like 1000+ different tiny models. They all share the same structure; only the weights differ. So it's a fitting, not an optimization of a limited number of weights.
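
To make the idea concrete, here is a minimal sketch of how the two models could be wired together. Everything in it is an assumption for illustration: the sizes, the names `big_model` and `tiny_model`, the choice of small MLPs, and feeding the normalized step index into the big model. The big model is left untrained with random weights, so this only shows the data flow, not a fitted solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are assumptions chosen for illustration.
N_STEPS = 1000     # number of steps in the solution
TINY_IN = 2        # tiny model input size
TINY_HIDDEN = 3    # tiny model hidden size
# Tiny model is a 2 -> 3 -> 1 MLP: W1 (2x3), b1 (3), w2 (3), b2 (1).
TINY_PARAMS = TINY_IN * TINY_HIDDEN + TINY_HIDDEN + TINY_HIDDEN + 1
BIG_HIDDEN = 64    # big model hidden size

# The big model has ONE set of weights, shared across all steps.
big = {
    "W1": rng.normal(0.0, 1.0, (1, BIG_HIDDEN)),
    "b1": rng.normal(0.0, 1.0, BIG_HIDDEN),
    "W2": rng.normal(0.0, 0.1, (BIG_HIDDEN, TINY_PARAMS)),
    "b2": np.zeros(TINY_PARAMS),
}

def big_model(step):
    """Map a step index to a flat vector of tiny-model weights."""
    t = np.array([[step / N_STEPS]])          # normalized step, shape (1, 1)
    h = np.tanh(t @ big["W1"] + big["b1"])    # shape (1, BIG_HIDDEN)
    return (h @ big["W2"] + big["b2"]).ravel()

def tiny_model(x, params):
    """Evaluate the tiny network using the weights supplied for this step."""
    n_w1 = TINY_IN * TINY_HIDDEN
    W1 = params[:n_w1].reshape(TINY_IN, TINY_HIDDEN)
    b1 = params[n_w1:n_w1 + TINY_HIDDEN]
    w2 = params[n_w1 + TINY_HIDDEN:n_w1 + 2 * TINY_HIDDEN]
    b2 = params[-1]
    return np.tanh(x @ W1 + b1) @ w2 + b2

x = np.array([0.5, -1.0])                     # same input at every step
# One row of tiny-model weights per step, all produced by the big model.
tiny_weights = np.stack([big_model(s) for s in range(N_STEPS)])
outputs = np.array([tiny_model(x, w) for w in tiny_weights])

print(tiny_weights.shape)   # one weight vector per step
print(outputs.shape)        # one tiny-model output per step
```

In a real setup, training would presumably adjust only the entries of `big`; the 1000 tiny weight sets are never optimized directly, they fall out of the big model the way spline coefficients fall out of a fit.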