July 5, 2018, 10:37 a.m.
I wrote about this paper before, but I am going to again because it has been so enormously useful to me. I am still working on segmentation of mammograms to highlight abnormalities and I recently decided to scrap the approach I had been taking to upsampling the image and start that part from scratch.
When I started I had been using the earliest approach to upsampling, which basically was take my classifier, remove the last fully-connected layer and upsample that back to full resolution with transpose convolutions. This worked well enough, but the network had to upsample images from 2x2x1024 to 640x640x2 and in order to do this I needed to add skip connections from the downsizing section to the upsampling section. This caused problems because the network would add features of the input image to the output, regardless of whether the features were relevant to the label. I tried to get around this by adding bottleneck layers before the skip connection in order to only select the pertinent features, but this greatly slowed down training and didn't help much and the output ended up with a lot of weird artifacts.
In "Deconvolution and Checkerboard Artifacts", Odena et al. have demonstrated that replacing transpose convolutions with nearest neighbors resizing produces smoother images than using transpose convolutions. I tried replacing a few of my tranpose convolutions with resizes and the results improved.
Then I started reading about dilated convolutions and I started wondering why I was downsizing my input from 640x640 to 5x5 just to have to resize it back up. I removed all the fully-connected layers (which in fact were 1x1 convolutions rather than fully-connected layers) and then replaced the last max pool with a dilated convolution.
I replaced all of the transpose convolutions with resizes, except for the last two layers, as suggested by Odena et al, and the final tranpose convolution has a stride of 1 in order to smooth out artifacts.
In the downsizing section, the current model reduces the input from 640x640x1 to 20x20x512, then it is upsampled by using nearest neighbors resizing followed by plain convolutions to 320x320x32. Finally there is a tranpose convolution with a stride of 2 followed by a transpose convolution with a stride of 1 and then a softmax for the output. As an added bonus, this version of the model trains significantly faster than upsampling with transpose convolutions.
I just started training this model, but I am fairly confident it will perform better than previous upsampling schemes as when I extracted the last downsizing convolutional layer from the model that layer appeared closer to the label (although much smaller) than the final output did. I will update when I have actual results.
Update - After training the model for just one epoch, with the downsizing layer weights initialized from a previous model, the results are already significantly better than under the previous scheme.
June 14, 2018, 9:21 a.m.
If you have ever used deconvolutions to upsample layers of convnets you have probably seen artifacts and possibly checkerboard patterns. This article explains why and gives some useful tips as to how to avoid the problem. I have implemented some of the suggestions and, while it's a bit early to evaluate their efficacity, so far they seem to be helping.
June 13, 2018, 1:53 p.m.
As I continue to work on my mammography project I save a lot of time by re-using weights from models I have already trained rather than training every iteration of every model from scratch, which would be very time consuming. However a drawback to this method is that if I add a new layer or change a layer when I continue training the model the layers which have not changed are prone to overfit as they have been trained for substantially longer than the new layers.
I tried only training certain variables, but when the checkpoint is saved only the trained variables are included in it, which means that the checkpoint can not be restored as it is missing many variables. This could be overcome by restoring certain variables from one checkpoint and others from a different checkpoint, but that is overly complicated and not very convenient.
Earlier today, I had added another deconvolution layer to my model. When I trained just that layer the accuracy of the model went very high very quickly, much more quickly than training all of the layers. But then I couldn't continue training all of the layers because the checkpoint only contained the layer trained. I don't have the time to retrain the entire monstrosity from scratch, so I found an ugly hack that allows me to train mostly the layers I want to train while saving all of the weights in the checkpoint.
I create two training ops - one for all variables (train_op_1) and one for the variables I want to train (train_op_2). I run train_op_2 most of the time. But right before I save the checkpoint I do one iteration of train_op_1 which updates all layers, so all variables are saved in the checkpoint. It's not pretty, but it works and best of all, the code doesn't have to be changed depending on what I want to train. I specify whether I want to train all vars or just the subset as a command line arg and if I want to train all vars, then set train_op_2 = train_op_1.
I just ran a few quick tests with no issues, hopefully this will continue to work.
June 12, 2018, 1:34 p.m.