April 16, 2018, 9:35 a.m.
While we tend to think of ourselves as being at the pinnacle of evolution, in reality humans are barely a step up from monkeys. Our only real differentiation from them is that we have language, which allows us to communicate knowledge to others and to preserve that knowledge through time. Language gives us a huge advantage, allowing us to progressively accumulate new knowledge by building on previous discoveries, but in the end we are just animals who have evolved to survive, like all other animals. We are not adapted to having civilizations and technology; we evolved to find food and procreate, and the results of this can be seen all over - from how tech companies use simple tricks like noises, bright colors and intermittent rewards to keep us hooked, to how food companies load their food with salt, fat and sugar to keep us eating unhealthy food, to the cognitive biases and heuristics we use to make decisions under uncertainty. The point of all of this is that humans evolved to find food and avoid predators, and our brains are incredibly ill-suited to processing the large amounts of data required to evaluate the kinds of issues we face every day in today's complex world.
Computers, on the other hand, are designed for processing large amounts of data - they can do this very efficiently if programmed correctly. However, they lack our creativity - the ability to combine seemingly unrelated ideas into new ideas, and to come up with novel solutions to problems. Machine learning combines our creativity with the ability of computers to handle large amounts of data, specifically the ability to find patterns in data.
In 2015 a journalist ran a study on chocolate, the result of which was that chocolate helps you lose weight. The study was commissioned as an example of "junk science" and had only 15 participants, with 18 measurements for each participant. The author said “here’s a dirty little science secret: If you measure a large number of things about a small number of people, you are almost guaranteed to get a ‘statistically significant’ result.” Unfortunately much of science is conducted like this - the authors start a study to prove a hypothesis, and they may ignore some results which contradict the hypothesis if other results confirm it. No one likes to be wrong, and if the results are a bit ambiguous you can just cherry-pick the numbers you like. In Darrell Huff's 1954 book "How to Lie with Statistics" he says "if you torture the data long enough it will confess to anything" and this is in fact the case.
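The "dirty little science secret" can be made concrete with a little arithmetic. As a rough sketch, assume each of the 18 measurements is tested independently at the conventional 0.05 significance level (both assumptions are mine, not details taken from the study). The chance that at least one measurement comes out "significant" by luck alone is then:

```python
def chance_of_false_positive(num_measurements, alpha=0.05):
    """Probability that at least one of several independent tests is
    'significant' purely by chance at significance level alpha.

    Independence and alpha=0.05 are illustrative assumptions.
    """
    return 1.0 - (1.0 - alpha) ** num_measurements


p_any = chance_of_false_positive(18)
print(f"Chance of at least one spurious result: {p_any:.1%}")
```

Under those assumptions the chance is roughly 60%, before any cherry-picking has even started.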
Machine learning is about letting the data speak for itself. With machine learning you set up a system of symbolic equations for transforming data into predictions and then feed the data into that system and see what happens. If you don't like the results you can change the system or the data, but the process is far too complex to be able to cherry-pick the data you like and discard the rest. This combines the strengths of humans with the strengths of computers - the humans use their creativity and domain knowledge to create the system which they hope will find patterns in the data, and the computers run the data through the system. While it is technically possible to do this analysis by hand, it is far too complex to be done in practice without a computer, and the computers can only do what they are told to do - they cannot create novel ideas from nothing.
In my opinion, machine learning is the most important scientific technology in recent history. Just like electricity allowed energy to become uncoupled from the previous sources - fire and animal energy - machine learning uncouples the ability to process data from the constraints of the human brain. Properly used, I think machine learning will be as revolutionary as electricity was.
April 13, 2018, 9:19 a.m.
I am working on classifying mammography scans with a TensorFlow ConvNet. The scans are classified into five classes:
I was unsure of how I wanted to classify the scans so I created the model in such a way that it would work for any combination of classes. I initially started training with binary classification - normal or abnormal, with the goal of then expanding the number of classes once I had a model that made decent predictions on the binary case.
For the binary prediction I used precision, recall and a PR curve as metrics. When I expanded to multiple classes, those metrics no longer worked as they were. As far as precision and recall go, I don't really care what type of abnormal the scan is - I just care that it is abnormal at all. I also wanted the same metrics to compare across all my models, so I had to figure out a way to compute precision and recall for every version of the model.
The solution I came to was to "squash" my multi-class labels and predictions down into binary labels and predictions and feed those into the p/r metrics. I set up the classes so that 0 was always normal, so I can do the squashing as follows:
import tensorflow as tf
# Class 0 is normal, so any class id greater than 0 counts as abnormal
zero = tf.constant(0, dtype=tf.int64)
collapsed_predictions = tf.greater(predictions, zero)
collapsed_labels = tf.greater(y, zero)
Both collapsed_predictions and collapsed_labels will then contain True where the prediction or label is NOT 0 and False where it is. Then I can feed these into my precision and recall metrics:
recall, rec_op = tf.metrics.recall(
    labels=collapsed_labels, predictions=collapsed_predictions)
precision, prec_op = tf.metrics.precision(
    labels=collapsed_labels, predictions=collapsed_predictions)
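To see what the squashing does to the numbers, here is a small plain-Python sketch of the same idea outside of TensorFlow. The helper names and the example labels are made up for illustration; only the convention that class 0 is normal comes from my setup.

```python
def collapse_to_binary(class_ids):
    """Map class ids to booleans: 0 (normal) -> False, anything else -> True."""
    return [class_id > 0 for class_id in class_ids]


def precision_recall(labels, predictions):
    """Compute precision and recall from two equal-length lists of booleans."""
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for lab, pred in pairs if lab and pred)
    false_pos = sum(1 for lab, pred in pairs if not lab and pred)
    false_neg = sum(1 for lab, pred in pairs if lab and not pred)
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Made-up example with five classes, where 0 is normal.
y = [0, 1, 2, 0, 3, 4]
predictions = [0, 2, 0, 1, 3, 4]

precision, recall = precision_recall(
    collapse_to_binary(y), collapse_to_binary(predictions))
print(precision, recall)
```

Note that the second item (label 1, predicted 2) counts as a true positive here: the predicted class is wrong, but the scan was correctly flagged as abnormal, which is the only thing these metrics are meant to measure.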
I also created a pr curve metric to see how the thresholds would affect the predictions. First I convert the logits to probabilities via a softmax and then feed that into a pr_curve_streaming_op as the predictions. In order to make this work with multi-class classification I squash the probabilities down to the probability that the item is NOT normal. Since my labels are created such that normal is always 0, the probability that it is not normal is just 1 - the probability that it is:
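# summary_lib is assumed to be TensorBoard's summary module
from tensorboard import summary as summary_lib
probabilities = tf.nn.softmax(logits, name="probabilities")
# Class 0 is normal, so P(abnormal) = 1 - P(class 0)
_, update_op = summary_lib.pr_curve_streaming_op(
    name='pr_curve',
    predictions=(1 - probabilities[:, 0]),
    labels=collapsed_labels,
    updates_collections=tf.GraphKeys.UPDATE_OPS,
    num_thresholds=20)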
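The same squashing of probabilities can be sketched in plain Python to show where the points on the curve come from. This is only an illustration of the idea, not how TensorBoard computes the curve internally; the function names, the evenly spaced thresholds and the example logits are my own assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def abnormal_probability(logits):
    """P(not normal) = 1 - P(class 0), assuming class 0 is normal."""
    return 1.0 - softmax(logits)[0]


def pr_points(abnormal_probs, is_abnormal, num_thresholds=20):
    """Return (threshold, precision, recall) for evenly spaced thresholds in [0, 1]."""
    points = []
    for i in range(num_thresholds):
        threshold = i / (num_thresholds - 1)
        predicted = [p >= threshold for p in abnormal_probs]
        pairs = list(zip(predicted, is_abnormal))
        tp = sum(1 for pred, lab in pairs if pred and lab)
        fp = sum(1 for pred, lab in pairs if pred and not lab)
        fn = sum(1 for pred, lab in pairs if not pred and lab)
        # With no positive predictions, precision is set to 1.0 by convention.
        precision = tp / (tp + fp) if (tp + fp) else 1.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        points.append((threshold, precision, recall))
    return points


# Made-up logits for three scans and three classes (class 0 = normal).
batch_logits = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [1.0, 1.0, 1.0]]
labels = [False, True, True]

probs = [abnormal_probability(row) for row in batch_logits]
curve = pr_points(probs, labels)
```

Raising the threshold trades recall for precision, which is exactly what I want to be able to inspect for the abnormal-versus-normal decision regardless of how many classes the model has.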
April 1, 2018, 10:06 a.m.
I decided to try a Google Cloud GPU instance as well as EC2. Once I had my quotas set properly and was able to start the instance it took me all day to get TensorFlow running with GPU. The instructions Google provides are for CUDA 8.0, and the latest version of TensorFlow requires CUDA 9.0.
To get everything running follow these steps:
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda-9-0
sudo nvidia-smi -pm 1
These are the steps from Google's instructions, with the repo swapped for the CUDA 9.0 one.
Then I had to install cuDNN, which isn't mentioned at all in Google's instructions. I downloaded libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb from the Nvidia cuDNN site and uploaded it to the instance with scp. Then install it with:
sudo dpkg -i libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb
Then you need to export the path with:
echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
echo 'export PATH=$PATH:$CUDA_HOME/bin' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$CUDA_HOME/lib64' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
And finally install TensorFlow:
sudo apt-get install python-dev python-pip libcupti-dev
sudo pip install tensorflow-gpu
I used pip3 and python3, but the rest is the same.
Update: I thought it was working fine but I was still getting errors about locating libcupti.so.9.0. That was fixed by making symlinks as described here.
I ran these commands and now it seems to be working...
# Put symlinks in /usr/local/cuda
sudo mkdir /usr/local/cuda
cd /usr/local/cuda
sudo ln -s /usr/lib/x86_64-linux-gnu/ lib64
sudo ln -s /usr/include/ include
sudo ln -s /usr/bin/ bin
sudo ln -s /usr/lib/x86_64-linux-gnu/ nvvm
sudo mkdir -p extras/CUPTI
# The next two links belong inside extras/CUPTI
cd extras/CUPTI
sudo ln -s /usr/lib/x86_64-linux-gnu/ lib64
sudo ln -s /usr/include/ include
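When chasing "cannot locate library" errors like this one, it can help to check whether the file is actually reachable through LD_LIBRARY_PATH. Below is a minimal sketch of such a check; the function name is hypothetical, and it only looks at LD_LIBRARY_PATH, so it will miss libraries the loader finds through other means such as the ldconfig cache.

```python
import os


def find_on_library_path(filename, library_path=None):
    """Return the full path of the first match for filename on a
    colon-separated search path, or None if it is not found.

    Defaults to the LD_LIBRARY_PATH environment variable.
    """
    if library_path is None:
        library_path = os.environ.get("LD_LIBRARY_PATH", "")
    for directory in library_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing colon
        candidate = os.path.join(directory, filename)
        if os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # libcupti.so.9.0 is the file from the error; libcudnn.so.7 is an
    # example of another library worth checking.
    for name in ("libcupti.so.9.0", "libcudnn.so.7"):
        found = find_on_library_path(name)
        print(name, "->", found or "not found on LD_LIBRARY_PATH")
```

Run it in a fresh shell, after .bashrc has been re-read, so the exported paths are in effect.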
Another Update: TensorFlow requires version 7.0.4 of cuDNN, but I had originally downloaded 7.1.2. The commands above have been updated accordingly.
Final Update: I set up another instance and followed this process, and it almost worked. I needed to export another path, which I have added above. The original export commands were temporary and had to be repeated every time the instance was booted, so I changed them to echo the paths into .bashrc, where they are set automatically.
March 28, 2018, 2:53 p.m.
To resolve the problems I was having yesterday I ended up paying for an Amazon EC2 instance with the Deep Learning Ubuntu AMI. The instance type is p2.xlarge which costs $0.90/hour, but seems to be well worth it so far. In the last ten minutes I've been training a relatively small model on Google Cloud, which has been able to get through 60 steps. In contrast, on the EC2 instance the much larger model, training on the same data, has gone through 375 steps, where each epoch is 687 steps.
I did have some trouble accessing TensorBoard on the EC2 instance, but was able to get it running by following the tutorial. I also got Jupyter Notebook running and accessible from the outside world, again by following the tutorial, although I had to comment out the lines about the SSL certificates in the jupyter conf file in order to be able to connect. I decided to not use Jupyter Notebook, but it's nice to have it as an option.
Since this is just a project I am working on for myself, I'd prefer to not have to pay for the compute, but $0.90 per hour is manageable, and well worth it for the 10x increase in training speed.
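For a rough sense of what that means per epoch, the figures above can be turned into throughput and cost. This is only back-of-the-envelope arithmetic: it assumes the 375 EC2 steps were counted over the same ten minutes as the 60 Google Cloud steps, and it compares raw step counts even though the two models are different sizes.

```python
MINUTES_OBSERVED = 10        # assumed to apply to both step counts
STEPS_PER_EPOCH = 687
EC2_PRICE_PER_HOUR = 0.90    # p2.xlarge, USD


def steps_per_minute(steps, minutes=MINUTES_OBSERVED):
    """Average training throughput over the observation window."""
    return steps / minutes


ec2_rate = steps_per_minute(375)
gcloud_rate = steps_per_minute(60)

# Ratio of raw step counts; the EC2 model is much larger, so the
# like-for-like speedup on the same model would presumably be higher.
raw_speedup = ec2_rate / gcloud_rate

ec2_minutes_per_epoch = STEPS_PER_EPOCH / ec2_rate
ec2_cost_per_epoch = EC2_PRICE_PER_HOUR * ec2_minutes_per_epoch / 60

print(f"raw speedup: {raw_speedup:.2f}x")
print(f"EC2 epoch: {ec2_minutes_per_epoch:.1f} min, ${ec2_cost_per_epoch:.2f}")
```

At that rate an epoch on the EC2 instance takes a little over 18 minutes and costs under 30 cents.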