The idea is to transform the input into something easier to recognize. For a letter dataset this would mean that a first network produces a cleaner version of the hard-to-recognize letters. That is, a learned transformation that outputs a simpler-to-recognize image.
I think this would mean that for already clean-looking letters the output image would be essentially unchanged.
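One way to get that "unchanged for clean inputs" behavior is a residual formulation: the clean-up network outputs a correction that is added to the input, so a near-zero correction leaves a crisp letter alone. A minimal sketch, where `sharpen` is a hypothetical hand-written stand-in for the learned correction network:

```python
def clean(image, correction):
    """Residual clean-up: output = input + learned correction.

    If the correction network outputs near-zero for an already-clean
    letter, the image passes through essentially unchanged.
    """
    return [p + c for p, c in zip(image, correction(image))]

# Hypothetical correction "network": darkens faint mid-gray strokes.
# A stand-in for a trained model, just to show the residual idea.
def sharpen(image):
    return [0.2 if 0.3 < p < 0.7 else 0.0 for p in image]

crisp = [0.0, 1.0, 1.0, 0.0]   # already clean: correction is all zeros
faint = [0.0, 0.5, 0.5, 0.0]   # fuzzy strokes get darkened
print(clean(crisp, sharpen))   # unchanged
print(clean(faint, sharpen))
```

The residual form is just one option; a plain image-to-image network could also learn the identity for clean inputs, but the residual version gets it for free.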
I wonder if you can take the images with the highest recognition confidence scores and use them as examples for the rewrite network. That is, use the correctly classified, highest-confidence images from the usual (data, label) recognition network.
That way a second network can judge whether a generated image is clean or fake: it competes against the set of most recognizable images. That is, I think I should pool all the highly recognizable images from the training set into one reference set of most recognizable images, in order to extract the "clean" property.
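Building that reference set could look like the sketch below. Here `predict` is a hypothetical stand-in for the already-trained recognizer, assumed to return a `(label, confidence)` pair; the threshold is an arbitrary choice:

```python
def select_cleanest(examples, predict, threshold=0.95):
    """Build the 'most recognizable' reference set: keep training images
    that the recognizer classifies correctly with high confidence.
    """
    reference = []
    for image, label in examples:
        pred, conf = predict(image)
        if pred == label and conf >= threshold:
            reference.append((image, label))
    return reference

# Toy recognizer just for illustration: always guesses "A" and uses the
# maximum pixel value as its "confidence".
def toy_predict(image):
    return ("A", max(image))

data = [([0.99, 0.1], "A"),   # correct and confident: kept
        ([0.5, 0.2], "A"),    # correct but low confidence: dropped
        ([0.97, 0.0], "B")]   # confident but wrong: dropped
print(select_cleanest(data, toy_predict))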
Then I guess I run the procedure in reverse at inference time: first the clean-up network, then the recognition network.
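The inference pipeline itself is just function composition. A sketch with hypothetical stand-ins for the two trained networks:

```python
def recognize_pipeline(image, cleaner, recognizer):
    """Inference order: clean-up network first, then the recognizer
    sees the cleaned image instead of the raw one.
    """
    return recognizer(cleaner(image))

# Hypothetical stand-ins, just to show the composition.
cleaner = lambda img: [round(p) for p in img]          # snap pixels to 0/1
recognizer = lambda img: "A" if sum(img) >= 2 else "B"  # toy classifier

print(recognize_pipeline([0.9, 0.8, 0.1], cleaner, recognizer))  # "A"
```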
One problem could be that if non-letters get into the mix, they would produce the wrong decision.
So another problem is to recognize whether the input is a letter at all: again, some network that tells if the image is a letter or a fake. It almost feels like a kind of biological mimicry, fooling the network into believing you belong to another species. But here the relevant property is different: being edible.
In biological mimicry, I think the mimicking species should be most successful if it can stay relatively close to the target species but not too close, so that it is not eaten yet does not try to reproduce with individuals it cannot reproduce with.
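Going back to the letter-or-fake network: it could act as a gate in front of the recognizer, rejecting non-letters before they ever reach the classification stage. A sketch with hypothetical stand-ins (the "letterness" score and the threshold are made up for illustration):

```python
def gated_recognize(image, is_letter, recognizer, threshold=0.5):
    """A letter/not-letter discriminator gates the recognizer, so
    inputs that do not look like letters are rejected outright.
    """
    if is_letter(image) < threshold:
        return None  # rejected: probably not a letter at all
    return recognizer(image)

# Toy stand-ins: "letterness" is mean ink coverage here.
is_letter = lambda img: sum(img) / len(img)
recognizer = lambda img: "A"

print(gated_recognize([0.0, 0.1, 0.0], is_letter, recognizer))  # None
print(gated_recognize([0.9, 0.8, 0.7], is_letter, recognizer))  # "A"
```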