June 6, 2018, 3:07 p.m.
Even though I only have 1/5 of the images uploaded so far, I decided to do some tests to see if this method would work. It does, but it took quite a bit of tweaking to get performance to reasonable levels.
At first I just plugged the new dataset into the old graph, and this worked but was incredibly slow with the GPU sitting idle most of the time. I tried quite a few different methods to speed the pre-processing bottleneck up, but the solution was simpler than I had thought it would be.
The biggest factor was increasing number of threads in the tf.train.batch from the default of 1. This one change made a huge difference, cutting the training time down to about 1/4 of what it had been.
I also experimented with moving some pre-processing operations around, including resizing the images individually when loading them and after being batched. This had negligible effects, but resizing them individually was slightly faster than doing it as a batch. In general I found that the more pre-processing operations I moved to the queue (and the CPU) the better the performance.
This version still trains at about 1/2 the speed the tfrecords version did, which is a big difference, but the size of the training set has increased by orders of magnitude so I guess I can live with it.
The code is available on my GitHub.
June 6, 2018, 1:32 p.m.
I am continuing to work with the CBIS-DDSM datasets and recently decided to take a new direction with the training data. Previously I had been locally segmenting the raw scans into images of varying sizes and writing those images to tfrecords to use as training data. I started by classifying the images by pathology with categorical labels, and while I got decent results using this approach, the models performed terribly on images from different datasets and on full-size images. I suspected the model was using features of the images that were not related to the actual ROIs to make its predictions, such as the amount of contrast or presence of extremely high pixel values.
To address this I started using the masks as labels and training the model to do segmentation of the images into normal and ROI. This had the added advantage of allowing me to exclude images from the DDSM dataset and only use CBIS-DDSM images which eliminated the features I believed the previous models had been relying on, as the DDSM and CBIS-DDSM datasets had substantially different variances, mins, maxes and means. The disadvantage of this approach was that the dataset was double the size due to the fact that the labels are now the same size as the images.
I started with a dataset of 320x320 images, however models trained on this dataset often had trouble with images which had bright patches running of the edge of the image and images with high contrast, misclassifying the bright patches as positive. To attempt to address this I started training the model on 320x320 images, and then switched to another dataset of 640x640 images after training through 50 or so epochs.
The dataset of 640x640 images only had 13,000 training examples in it, about 1/3 the number of examples in the 320x320 dataset, but was still larger due to the fact that each example and label is four times the size of the 320x320 images. I considered making another dataset with either more or larger images, but saw that this process could continue indefinitely as I had to keep creating new datasets of larger and larger size.
Instead I decided to create one new dataset which could be used indefinitely, for all purposes. To do this I loaded each image in the CBIS-DDSM dataset into Python. While the JPEGS are RGB, the images are grayscale so I only kept one channel of each image. I Some images have multiple masks, and rather than have multiple versions of each image with different masks, which could confuse the model, I combined all masks for each image into one mask, and then added that as the second channel of each image. In order to be able to save the array as an image I added a third channel of all 0s. Each new images was then saved as a PNG.
The resulting dataset is about 12GB, about four times the size of the largest tfrecords dataset, but the entirety of the CBIS-DDSM dataset (minus a few images which had masks of incorrect sizes and were discarded) is now represented. Now, in my model, I load each full image and then take a random crop of it and use that as training data. Since the mask is part of the image I can use TensorFlow's random crop function to crop the full image, and then separate the channels into the training example and it's label.
This not only increases the size of the training data set exponentially, but since my model is fully convolutional, I can also easily change the crop size without having to create a new dataset.
The major problem with this approach is that the mean of the labels is very low - around 0.015 - meaning that only 1% of the pixels have a positive label and the rest are negative. The previous dataset had a mean of 0.05. This will be addressed by raising the cross entropy weight from 20 to 75 so that the model doesn't just predict everything as negative. When creating the images I had trimmed as much background as possible from them to avoid having a large amount of training images of pure black, but still the random cropping produces a large number of images with little to no actual content.
At the moment I am uploading the data to S3 which should take another couple days. Once this is done I will attempt to train on this new dataset and see if the empty images cause major problems.
May 28, 2018, 9:38 a.m.
My original work with the DDSM and CBIS-DDSM dataset yielded good accuracy and recall on the test and validation data, but the model didn't perform so well when applied to the MIAS images, which came from a completely different dataset. Additional analysis of the images indicated that the negative images (from the DDSM) and the positive images (from the CBIS-DDSM) were different in some subtle but important ways:
We had become concerned about point 2 when we discovered that increasing the contrast of any image made it more likely to be predicted as positive and discovered point 1 while investigating this further. When applying our fully convolutional model trained on the combined data to complete scans, rather than the 299x299 images we had trained on, we noticed that more than 50% of the sections of a positive image were predicted as positive, even if the ROI was, in fact, only present in one section. This indicates that the model was using some feature of the images other than the ROI in its prediction.
When starting this project, we had initially planned to segment the CBIS-DDSM images and use images which did not contain an ROI as negative images, but we were not certain that there were not differences in the tissue of positive and negative scans which might make this approach not generalize to completely negative scans. When we realized that the scans had been pre-processed differently we attempted to adjust the negative images in such a way as to make them more similar to the positive images but were unable to do so without knowledge of how they had been processed.
Our solution to both of these issues was to train the model to do the segmentation of the scans rather than simple classification, using the masks as the labels. This approach had several advantages:
We recreated the model to do semantic segmentation by removing the last "fully connected" layer (which were implemented as a 1x1 convolution) and the logits layer and upsampling the results with transpose convolutions. In order for the upsampling to work properly we needed to have the size of the images be a multiple of 2 so that the dimension reduction could be properly undone, so we used images of size 320x320.
We were able to get fairly good results training on this data with a pixel level accuracy of about 90% and a pixel recall of 70%. The image level accuracy and recall were 70% and 87%, respectively. While these results were respectable, we noticed certain patterns of incorrect predictions. Images which were mostly dark, with patches that were much brighter, tended to have the bright patches predicted positive regardless of the actual label. This pattern was mostly observed when the bright patch ran off the edge of the image.
We know that the context of an ROI is important in detecting and diagnosing it, and we suspected that in the absence of context the model was predicting any patch substantially brighter than it's surroundings to be positive. While for cancer detection, it is better to make a false positive than a false negative we thought that this pattern might become problematic when applying the model to images larger than those it was trained on. To address this issue we decided to create a dataset of larger images and continue training our model on those.
We created a dataset of 640x640 images and adjusted our existing model to take those as input. As the model is fully convolutional we can restore the model trained on 320x320 images and continue training it on the larger images with no problems, which we are currently in the process of doing. If the results of this are promising we may create another dataset of even larger images are fine-tune this model on those images until we have a model which takes complete images as input.
May 23, 2018, 10:02 a.m.
For a course I was taking at EPFL I was working on classifying images from the DDSM dataset with ConvNets. I had some success, although not as much as I would have liked, and I posted an edited version of my report on Medium.
Although the course is over I am still working on this project, attempting to fix some of the issues that came up during the first stage.