By generalization I mean handling more inputs. Then a strategy could be to update the weights so that you get a non random looking image. I think you can do this by applying a noise filter to the matrices.

Then to get a better generalization I wonder if you can apply a Google type super resolution to these weight matrices that now look like images. So with this you have updated the weights based on experiences from many previous images.

If something like this works then you have effectively trained your network with a ?lot of samples you did not have.