Dec. 8, 2019, 11:09 a.m.
I had been trying to train a version of VAE-GAN for a few weeks and it wasn't working as well as I had hoped it would. I had added an auxiliary output to the discriminator which was attempting to predict the 40 features of each image provided with the celeb-a dataset as suggested in the VAE-GAN paper and I was scaling that loss to try to bring it in line with the GAN discriminator loss, but I was doing that incorrectly so that loss ended up overwhelming the GAN loss. (I was summing, rather than averaging the losses, and the lambda I was using to scale the loss was appropriate for a mean loss, but with 40 features the auxiliary loss was 40x the GAN loss at base, so I needed to divide the lambda by 40 to get the effect I wanted.)
After having corrected that error I am finally making some progress with these models. Below are sample images from two models I am training. The first outputs images at 160x160, the second at 128x128.
I guess the moral of this story is if something isn't working the way you expect it to, double check your math before you continue training it!
Sept. 22, 2019, 1:34 p.m.
I started working on a variational auto-encoder (VAE) for faces a few months ago. I was easily able to make a non-variational autoencoder to reproduce images that worked incredibly well, but since it was not variational there wasn't much you could do with it other than compress images. I wanted to be able to play with interpolation and such, and for that you need a VAE. So I converted my auto-encoder to a variational one, but the problem was that the resulting images were very blurry and the quality wasn't all that great. So I thought maybe I could attach a GAN to this to make the images look more realistic. And I tried that but unfortunately it didn't work very well, the GAN was trying to produce to generate images of what it though were faces will the autoencoder was trying to reproduce its input, as seen in the images below:
After fighting with this for a few months I decided to try to make sure that the GAN was working properly before I added on the autoencoder, and although I had to fight with the GAN quite a bit and was never able to get it to generate really high quality images, I was sure that it was working properly. So I decided to try to hook it up to the autoencoder again.
Then I discovered this paper Autoencoding beyond pixels using a learned similarity metric, which does the same thing I was trying to do but in a much smarter way. What I had been doing was using the MSE between the input and the generated images for my VAE loss, and training both the encoder and the decoder with the GAN loss. Obviously this did not work.
What they do in the paper is basically separate the encoder and leave the decoder and discriminator as the GAN, which is trained as usual. I had tried to think of ways to train the encoder and decoder separately, but my ideas were much more primitive and didn't work at all. What they do that is train the encoder separately, using the KLD loss and - this is the brilliant part - instead of using MSE between the input and the recreation they use the MSE between a feature map from an intermediate layer of the discriminator for the real and faked images. So rather than trying to produce an exact duplicate of the input, the encoder is trying to produce something that the discriminator thinks is close to the input.
It took me a few hours to rewrite my code to make use of this new loss, and come up with a version that would be able to run without having to keep all of the graphs in memory and be able to train in a reasonable amount of time, and I think everything is finally working. Hopefully this works better than my previous attempts, and next time I will try to remember to review the literature before trying to implement a new idea on my own.
Sept. 21, 2019, 1:18 p.m.
I've now been trying to train my GANs for quite a while and still haven't been too successful, but I have learned some tricks. I found this excellent article a while ago and I didn't really understand it completely at first, but after having tried a lot of its tricks I understand them now. Here are my thoughts and some additional tricks I have used:
Some additional tips on how to construct a GAN:
Aug. 24, 2019, 7:20 a.m.
I had been trying to train my autoencoder with a GAN component on and off for a couple of months and it just didn't seem to be working very well. I thought that maybe the autoencoder and the discriminator errors were somehow cancelling each other out or something. Just for the hell of it I decided to try to use the discriminator to optimize a reconstructed image to look real, just to see what the result would be. Instead of optimizing the weights, I created a Variable of the input and optimized that instead. To my surprise I ended up with weird splotches of primary colors against a white background, it actually made the image look less and less real rather than more. After seeing that I decided that there must be some major problem with my code so I went through it in greater detail.
I decided to train all three networks from scratch (the three being the encoder, the decoder and the discriminator) to see what would happen. I was surprised to see that the generator did not seem to be learning ANYTHING and neither did the discriminator. I found a tutorial on creating a GAN in PyTorch and I went through the training code to see how it differed from mine.
I had written my code to optimize it for speed, training the autoencoder without the GAN already took about 4 hours per epoch on a (free) K80 on Colab so I didn't want to slow that down much more, so I tried to minimize the numebr of times data had to be passed through the networks. The tutorial did not do that. First it ran a batch of real data through the discriminator, computed the gradients but did NOT back propagate them. Then it used the generator to generate a batch of faked data, passed that through the discriminator, computed the gradients, added them to the gradients from the first batch and THEN did the back prop. Then it ran the same batch of faked data through the discriminator again, and used that to update the generator. This was different from my code in several major ways:
After updating my code to bring it more in line with the tutorial both networks began to learn, I think that major change was detaching the reconstructed images before putting them through the discriminator. However I noticed a few strange things regarding the discriminator batches:
I read in a couple of places that using separate batches was a trick to make GANs train better, but no one really had an explanation for why this worked. What I am currently doing it using separate batches most of the time, before every n batches I use a single batch to encourage the discriminator to learn a bit more. I've tested values for n of 8, 16, 32 and 64. Most of those seemed to result in the worst of both worlds, nothing really seemed to improve, but with n = 64 the autoencoder loss is again decreasing, although slowly, and the discriminator accuracy is hovering around 52% rather than the 49-50% it was at using all separate batches.
To me using separate batches doesn't intuitively make sense, I don't see how the network can really learn to differentiate between classes when it only sees one class at a time. Of course the gradients are then added, and the differences should cancel out, with what's left indicating how to differentiate the classes; but to me it seems much more efficient to learn from mixed batches. One would never consider training a network on, say, the CIFAR dataset with each batch consisting exclusively of a single class. Maybe that's the point, to slow down the discriminator's learning enough for the generator to keep up? Anyway I will continue to experiment and see what works and what doesn't work.