Aug. 3, 2019, 7:33 a.m.
I am still working on my face autoencoder in my spare time, although I have much less spare time lately. My non-variational autoencoder works great - it can very accurately reconstruct any face in my dataset of 400,000 faces, but it doesn't work at all for interpolation or anything like that. So I have also been trying to train a variational autoencoder, but it has a lot more difficulty learning.
For a face which is roughly centered and looking in the general direction of the camera it can do a somewhat decent job, but if the picture is off in any way - another face off to the side, something blocking the face, the face at a strange angle, etc. - it does a pretty bad job. And since I want to use this for interpolation, training it on these bad faces doesn't really help anything.
One of the biggest datasets I am using is this one from ETHZ. The dataset was created to train a network to predict the age of the person, and while the images are all of good quality it does include many images that have some of the issues I mentioned above, as well as pictures that are not faces at all - like drawings or cartoons. Other datasets I am using consist entirely of properly cropped faces as I described above, but this dataset is almost 200k images, so omitting it completely significantly reduces the size of my training data.
The other day I decided I needed to improve the quality of my training dataset if I ever want to get this variational autoencoder properly trained, and to do that I need to filter out the bad images from the ETHZ IMDB dataset. They had already created the dataset using face detectors, but I want to remove faces that have the kinds of problems described above.
I started trying to curate them manually, but after going through 500 of the 200k images I realized that would not be feasible. It would be easy to train a neural network to classify the faces, but that still requires manually classified training data. So what I did was take another dataset of faces that were all good, add about 700 bad faces from the IMDB dataset, and make a new dataset of about 7,000 images. Then I took a pre-trained discriminator I had previously used as part of a GAN for generating faces and retrained it to classify the faces as good or bad.
I ran this for about 10 epochs, until it was achieving very good accuracy, and then I used it to evaluate the IMDB dataset. Any image it gave less than a 0.03 probability of being good I moved into the bad training dataset, and any image it gave a 0.99 or higher probability of being good I moved into the good training dataset. Then I continued training it, and so on and so on.
This is called weak supervision or semi-supervised learning, and it works a lot better than I thought it would. After training for a few hours, the images which are moved all seem to be correctly classified, and after each iteration the size of the training dataset grows to allow the network to continue learning. Since I only move images which have very high or very low probabilities, the risk of a misclassification should be relatively low, and I expect to be able to completely sort the IMDB dataset by the end of tomorrow, maybe even sooner. What would have taken weeks or longer to do manually has been reduced to days thanks to transfer learning and weak supervision!
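To make the process concrete, here is a rough sketch of one pass of that labeling loop (the classifier wrapper and folder paths are hypothetical placeholders, not my actual code):

import shutil
from pathlib import Path

GOOD_THRESHOLD = 0.99  # probability above which an image is moved to the "good" set
BAD_THRESHOLD = 0.03   # probability below which an image is moved to the "bad" set

def sort_unlabeled_images(predict_good, unlabeled_dir, good_dir, bad_dir):
    # predict_good is a hypothetical helper that returns P(good face) for one image
    for path in Path(unlabeled_dir).glob('*.jpg'):
        p_good = predict_good(path)
        if p_good >= GOOD_THRESHOLD:
            shutil.move(str(path), str(Path(good_dir) / path.name))
        elif p_good <= BAD_THRESHOLD:
            shutil.move(str(path), str(Path(bad_dir) / path.name))
        # anything in between stays unlabeled until a later, better-trained pass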
Labels: coding , data_science , machine_learning , pytorch , autoencoders
May 16, 2019, 1:56 p.m.
There is one thing that absolutely drives me crazy in Python, and that is the fact that you can access a variable that was defined outside of a function from within the function without passing it as an argument. I'm not going to lie, that does come in handy at times - especially when you are working with APIs - but it is still a terrible way to do things.
I like the way scoping is done in C++ - each variable is only valid within the block in which it is declared - but obviously that doesn't work in Python since we don't declare variables. Even so, I think that the only variables available in a function should be the ones created in it or passed to it as arguments. While we can't reassign variables that were declared outside of a function in Python, only access them, if the variable is an object we can modify it using its methods. So if we have a list and we append to it in a function using list.append(), it will actually modify the outer list, which is crazy.
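A minimal example of what I mean:

items = []

def add_item(thing):
    # 'items' was never passed in, but the function can still see and mutate it
    items.append(thing)

add_item('surprise')
print(items)  # ['surprise'] - the outer list was modified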
Being able to access variables that were declared outside of a function makes it easy to write bugs and hard to find them. There have been several times when I have declared a variable outside of a function, passed it into the function under a different name, and then accidentally used the external variable instead of the one passed in. The code will run, but not as expected, and bugs like this are difficult to track down.
This makes me see much of the merit in functional programming languages like Scala.
March 7, 2019, 4:52 p.m.
I've been working with AWS Lambda recently and I am very impressed. Usually if I need to set up a microservice or a recurring task I'll just set something up on one of my virtual servers, so I didn't think Lambda would be all that useful. But it makes it really, really easy to set up little tasks, and it is much cheaper than running a whole virtual server.
You can write tasks in a number of different languages and set up a variety of triggers, ranging from HTTP requests to scheduled tasks; when the Lambda is triggered AWS spins it up, executes it, and then shuts it down. Since it is so ephemeral it is completely stateless, but you can load files from S3 buckets if you need data of any sort. I assume you can also connect to a variety of AWS databases, although I haven't done this yet. If you need additional libraries or packages that are not included by default you can create a layer containing them.
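As a rough illustration, a minimal Python handler looks something like this (the event fields here are just placeholders):

import json

def lambda_handler(event, context):
    # 'event' carries the trigger payload - an HTTP request body, a scheduled-event record, etc.
    name = event.get('name', 'world')
    return {
        'statusCode': 200,
        'body': json.dumps({'message': 'Hello, ' + name})
    }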
Lambda is not going to replace servers for most use cases, but I think serverless technology is going to make quite a dent in the near future.
Jan. 10, 2019, 2:01 p.m.
Usually when you think of a gradient boosted decision tree you think of XGBoost or LightGBM. I'd heard of CatBoost but I'd never tried it and it didn't seem too popular. I was looking at a Kaggle competition which had a lot of categorical data and I had squeezed just about every drop of performance I could out of LGBM so I decided to give CatBoost a try. I was extremely impressed.
Out of the box, with all default parameters, CatBoost scored better than the LGBM I had spent about a week tuning. CatBoost trains significantly slower than LGBM, but it will run on a GPU, and doing so makes it only slightly slower than the LGBM. Unlike XGBoost it can handle categorical data directly, which is nice because in this case we have far too many categories to do one-hot encoding. I've read the documentation several times but I am still unclear on exactly how it encodes the categorical data; whatever it does works very well.
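For reference, getting started is about this simple (the column names and train/validation splits below are placeholders, not the actual competition data):

from catboost import CatBoostClassifier

cat_features = ['country', 'device', 'category']  # hypothetical categorical columns

model = CatBoostClassifier(task_type='GPU')  # otherwise default parameters
model.fit(X_train, y_train,
          cat_features=cat_features,
          eval_set=(X_valid, y_valid))
preds = model.predict_proba(X_valid)[:, 1]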
I am just beginning to try to tune the hyperparameters so it is unclear how much (if any) extra performance I'll be able to squeeze out of it, but I am very, very impressed with CatBoost and I highly recommend it for any datasets which contain categorical data. Thank you Yandex!
Labels: coding , data_science , machine_learning , kaggle , catboost
Oct. 23, 2018, 9:41 a.m.
The other day I was having problems with a CoLab notebook and I was trying to debug it when I noticed that TPU is now an option for runtime type. I found no references to this in the CoLab documentation, but apparently it was quietly introduced only recently. If anyone doesn't know, TPUs are chips designed by Google specifically for matrix multiplications and are supposedly incredibly fast. Last I checked the cost to rent one through GCP was about $6 per hour, so the ability to have access to one for free could be a huge benefit.
As TPUs are specialized chips you can't just run the same code as on a CPU or a GPU. TPUs do not support all TensorFlow operations and you need to create a special optimizer to be able to take advantage of the TPU at all. The model I was working with at the time was created using TensorFlow's Keras API so I decided to try to convert that to be TPU compatible in order to test it.
Normally you would have to use a cross shard optimizer, but there is a shortcut for Keras models:
import os
import tensorflow as tf

TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']

# create and compile the Keras model (keras_model), then convert it:
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    keras_model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))
The TPU_WORKER line finds an available TPU and gets its address. The keras_to_tpu_model call takes your Keras model as input and converts it to a TPU-compatible model. Then you train using tpu_model.fit() instead of keras_model.fit(). This was the easy part.
For this particular model I am using a lot of custom functions for loss and metrics. Many of them turned out not to be compatible with TPUs and had to be rewritten. While this was annoying at the time, it turned out to be worth it regardless of the TPU, because making the functions compatible forced me to optimize them. The specific operations that were not compatible were non-matrix ops - logical operations and boolean masks specifically. Some of the code was downright hideous, and this forced me to sit down, think it through, and rewrite it in a much cleaner manner, vectorizing as much as possible.
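As an example of the kind of rewrite involved, here is a hedged sketch (not my actual loss code) of a masked mean computed without tf.boolean_mask, by multiplying by a float mask instead:

import tensorflow as tf

def masked_mean(values, condition):
    # masked = tf.boolean_mask(values, condition)   # the kind of op that would not run on the TPU
    # return tf.reduce_mean(masked)
    mask = tf.cast(condition, tf.float32)
    return tf.reduce_sum(values * mask) / (tf.reduce_sum(mask) + 1e-8)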
After all that effort, so far my experience with the TPUs hasn't been all that great. I can train my model with a significantly larger batch size - whereas on an Nvidia K80 16 was the maximum batch size, I am currently training with batches of 64 on the TPU and may be able to push that even higher. However, the time per epoch hasn't improved much - about 1750 seconds on the TPU versus 1850 seconds on the K80. I have read that code may need to be altered further to take full advantage of TPUs, and I have not really experimented with the batch size yet to see how that changes the performance.
I suspect that if I did some more research about TPUs and coded the model to be optimized for a TPU from scratch there might be a more noticeable performance gain, but this is based solely on having heard other people talk about how fast they are and not from my experience.
Update - I have realized that the data augmentation is the bottleneck which is limiting the speed of training. I am training with a Keras generator which performs the augmentation on the CPU and if this is removed or reduced the TPUs do, in fact, train significantly faster than a GPU and also yield better results.
Labels: coding , machine_learning , google_cloud
Oct. 4, 2018, 4:41 p.m.
I have previously written about Google CoLab which is a way to access Nvidia K80 GPUs for free, but only for 12 hours at a time. After a few months of using Google Cloud instances with GPUs I have run up a substantial bill and have reverted to using CoLab whenever possible. The main problem with CoLab is that the instance is terminated after 12 hours taking all files with it, so in order to use them you need to save your files somewhere.
Until recently I had been saving my files to Google Drive with this method, but while it is easy to save files to Drive it is much more difficult to read them back. As far as I can tell, in order to do this with the API you need to get the file id from Drive and even then it is not so straightforward to upload the files to CoLab. To deal with this I had been uploading files that needed to be accessed often to an AWS S3 bucket and then downloading them to CoLab with wget, which works fine, but there is a much simpler way to do the same thing by using Google Cloud Storage instead of S3.
First you need to authenticate CoLab to your Google account with:
from google.colab import auth
auth.authenticate_user()
Once this is done you need to set your project and bucket name and then update the gcloud config.
project_id = [project_name]
bucket_name = [bucket_name]
!gcloud config set project {project_id}
After this has been done, files can be quickly uploaded to or downloaded from the bucket with the following commands:
# download
!gsutil cp gs://{bucket_name}/foo.bar ./foo.bar
# upload
!gsutil cp ./foo.bar gs://{bucket_name}/foo.bar
I have actually been adding a line to my training code to upload the weights to GCS, so they are automatically backed up every couple of epochs, which removes the need for me to manually back them up throughout the day.
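Something along these lines, where the weights file name, epoch interval, and training-step helper are just placeholders:

import os

SAVE_EVERY = 2  # back up every couple of epochs

for epoch in range(num_epochs):
    train_one_epoch(model)  # placeholder for the existing training step
    if epoch % SAVE_EVERY == 0:
        model.save_weights('weights.h5')
        os.system('gsutil cp ./weights.h5 gs://{}/weights.h5'.format(bucket_name))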
Labels: coding , python , machine_learning , google , google_cloud
July 30, 2018, 9:37 a.m.
I recently began using the early stopping feature of Light GBM, which allows you to stop training when the validation score doesn't improve for a certain number of rounds. This is especially useful if you are bagging models, as you don't need to watch each one and figure out when training should stop. The way it works is you specify a number of rounds, and if the validation score doesn't improve during that number of rounds the training is stopped and the round with the best validation score is used.
When working with this I noticed that the best validation round is often a very early one, with a good validation score but an incredibly low training score. As an example, here is the output from a model I am currently training; normally the training F1 gets up to the high 0.90s:
Early stopping, best iteration is:
[7] train's macroF1: 0.525992 valid's macroF1: 0.390373
Out of at least 400 rounds of training, the best performance on the validation set was on the 7th, at which time it was performing incredibly poorly on the training data. This indicates overfitting to the validation set, which is just as bad as overfitting to the training set in that the model is not likely to generalize well.
So what to do about this issue? The obvious solution would be to provide a minimum number of rounds and begin to monitor the validation score for early stopping once that number of rounds has passed, but I don't see any way to do this through the LGB API.
I am running this code using sklearn's joblib to do the parallel processing, so I create a list of the estimators to fit and then pass that list to the parallel processing, which calls a function that fits each estimator to the data and returns it. The early stopping is taken care of by LGB, so what I did is, after the estimator is fit, manually get the validation results and the train performance for the best validation round. If the train performance is above a specified threshold I return the estimator as normal. If, however, the train performance is below that threshold I recursively call the function again.
The downside to this is that it is possible to get into an infinite loop, but if the thresholds are properly tuned this should be easily avoidable.
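Here is a rough sketch of that retry logic (the estimator list, data splits, custom macro-F1 eval function, and threshold value are all assumed from my existing code rather than shown here):

from joblib import Parallel, delayed

TRAIN_THRESHOLD = 0.80  # hypothetical minimum train macro F1 at the best validation round

def fit_with_retry(est, X_tr, y_tr, X_val, y_val):
    # early stopping is handled by LightGBM; macro_f1 is the custom eval function,
    # assumed to report its results under the name 'macroF1'
    est.fit(X_tr, y_tr,
            eval_set=[(X_tr, y_tr), (X_val, y_val)],
            eval_names=['train', 'valid'],
            eval_metric=macro_f1,
            early_stopping_rounds=50,
            verbose=False)
    best = est.best_iteration_ - 1
    if est.evals_result_['train']['macroF1'][best] < TRAIN_THRESHOLD:
        # the best round looks like it overfit the validation set - fit again
        return fit_with_retry(est, X_tr, y_tr, X_val, y_val)
    return est

fitted = Parallel(n_jobs=4)(
    delayed(fit_with_retry)(est, X_tr, y_tr, X_val, y_val) for est in estimators)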
Labels: coding , data_science , machine_learning , lightgbm
July 17, 2018, 7:34 a.m.
When I began working on this project my intention was to do multi-class classification of the images. To this end I built my graph with logits and a cross-entropy loss function. I soon realized that the decision to do multi-class classification was quite ambitious, and scaled back to doing binary classification into positive and negative. My goal was to implement the multi-class approach once I had the binary approach working reasonably well, so I left the cross entropy in place.
Over the months I have been working on this I have realized that, for many reasons, the multi-class classification was a bad idea. For an academic project it might have made sense, but for any sort of real world use case it made none. There is really no use to outputting a simple classification for something as important as detecting cancer. A much more useful output is the probabilities that each area of the image contains an abnormality as this could aid a radiologist in diagnosing abnormalities rather than completely replacing her. Yet for some reason I never bothered to change the output or the loss function.
The limiting factor on the size of the model has been the GPU memory of the Google Cloud instance I am training this monstrosity on. So I've been trying to optimize the model to run within the RAM constraints and train in a reasonable amount of time. Mostly this has involved trying to keep the number of parameters to a minimum, but today I was looking at the model and realized that the logits were definitely not helping the situation.
For this problem classification was absolutely the wrong approach. We aren't trying to classify the content of the image; we are trying to detect abnormalities. The negative class was not really a separate class but the absence of any abnormalities, and the graph and the loss function should reflect this. In order to coerce the logits into an output that reflected this reality, I put the logits through a softmax and then discarded the negative probability - as I said, the negative class doesn't really exist. However, the cross entropy function does not know this, and it places as much importance on the imaginary negative class as on the positive class (subject to the cross-entropy weighting, of course). This means the gradients placed equal weight on trying to find imaginary "normal" patterns, even though that information is discarded and never used.
So I reduced the logits layer to one unit, replaced the softmax activation with a sigmoid activation, and replaced the cross entropy with binary cross entropy. The change has been more impactful than I imagined it would be. The model immediately began performing better than the same model with the logits/cross-entropy structure. In hindsight it seems obvious: the model can now focus on detecting abnormalities rather than wasting half of its effort trying to detect normal patterns.
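In TF 1.x terms the change amounts to something like the following sketch (the layer, tensor, and weight names are illustrative, not my actual graph):

import tensorflow as tf

# one output unit instead of two logits
logits = tf.layers.conv2d(features, filters=1, kernel_size=1, name='output_logits')
probabilities = tf.nn.sigmoid(logits)  # probability of abnormality; there is no "normal" class

# weighted binary cross entropy replaces the softmax cross entropy
loss = tf.reduce_mean(
    tf.nn.weighted_cross_entropy_with_logits(targets=tf.cast(labels, tf.float32),
                                             logits=logits,
                                             pos_weight=pos_weight))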
I am not sure why I waited so long to make this change and my best guess is that I was seduced by the undeniable elegance of the cross entropy loss function. For multi-class classification it is truly a thing of beauty, and I may have been blinded by that into attempting to use it in a situation it was not designed for.
Labels: coding , machine_learning , tensorflow , mammography
July 5, 2018, 10:37 a.m.
I wrote about this paper before, but I am going to again because it has been so enormously useful to me. I am still working on segmentation of mammograms to highlight abnormalities and I recently decided to scrap the approach I had been taking to upsampling the image and start that part from scratch.
When I started, I had been using the earliest approach to upsampling, which was basically to take my classifier, remove the last fully-connected layer, and upsample that back to full resolution with transpose convolutions. This worked well enough, but the network had to upsample images from 2x2x1024 to 640x640x2, and in order to do this I needed to add skip connections from the downsizing section to the upsampling section. This caused problems because the network would add features of the input image to the output, regardless of whether those features were relevant to the label. I tried to get around this by adding bottleneck layers before the skip connections to select only the pertinent features, but this greatly slowed down training, didn't help much, and the output ended up with a lot of weird artifacts.
In "Deconvolution and Checkerboard Artifacts", Odena et al. demonstrate that nearest-neighbors resizing followed by a regular convolution produces smoother images than transpose convolutions. I tried replacing a few of my transpose convolutions with resizes and the results improved.
Then I started reading about dilated convolutions and I started wondering why I was downsizing my input from 640x640 to 5x5 just to have to resize it back up. I removed all the fully-connected layers (which in fact were 1x1 convolutions rather than fully-connected layers) and then replaced the last max pool with a dilated convolution.
I replaced all of the transpose convolutions with resizes, except for the last two layers, as suggested by Odena et al., and the final transpose convolution has a stride of 1 in order to smooth out artifacts.
In the downsizing section, the current model reduces the input from 640x640x1 to 20x20x512; it is then upsampled to 320x320x32 using nearest-neighbors resizing followed by plain convolutions. Finally there is a transpose convolution with a stride of 2, followed by a transpose convolution with a stride of 1, and then a softmax for the output. As an added bonus, this version of the model trains significantly faster than upsampling with transpose convolutions.
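A sketch of what one of those resize-then-convolve upsampling blocks looks like in TF 1.x (the names and filter counts are illustrative):

import tensorflow as tf

def upsample_block(x, filters, name):
    # nearest-neighbor resize to double the spatial size, then a plain convolution,
    # roughly following the Odena et al. suggestion
    height, width = x.get_shape().as_list()[1:3]
    x = tf.image.resize_nearest_neighbor(x, [height * 2, width * 2])
    return tf.layers.conv2d(x, filters, kernel_size=3, padding='same',
                            activation=tf.nn.relu, name=name)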
I just started training this model, but I am fairly confident it will perform better than previous upsampling schemes as when I extracted the last downsizing convolutional layer from the model that layer appeared closer to the label (although much smaller) than the final output did. I will update when I have actual results.
Update - After training the model for just one epoch, with the downsizing layer weights initialized from a previous model, the results are already significantly better than under the previous scheme.
Labels: coding , data_science , tensorflow , mammography , convnets , ddsm
June 14, 2018, 9:21 a.m.
If you have ever used deconvolutions to upsample layers of convnets you have probably seen artifacts and possibly checkerboard patterns. This article explains why and gives some useful tips on how to avoid the problem. I have implemented some of the suggestions and, while it's a bit early to evaluate their efficacy, so far they seem to be helping.
Labels: coding , machine_learning
June 12, 2018, 1:34 p.m.
Labels: coding
June 6, 2018, 1:32 p.m.
I am continuing to work with the CBIS-DDSM datasets and recently decided to take a new direction with the training data. Previously I had been locally segmenting the raw scans into images of varying sizes and writing those images to tfrecords to use as training data. I started by classifying the images by pathology with categorical labels, and while I got decent results using this approach, the models performed terribly on images from different datasets and on full-size images. I suspected the model was using features of the images that were not related to the actual ROIs to make its predictions, such as the amount of contrast or presence of extremely high pixel values.
To address this I started using the masks as labels and training the model to do segmentation of the images into normal and ROI. This had the added advantage of allowing me to exclude images from the DDSM dataset and only use CBIS-DDSM images which eliminated the features I believed the previous models had been relying on, as the DDSM and CBIS-DDSM datasets had substantially different variances, mins, maxes and means. The disadvantage of this approach was that the dataset was double the size due to the fact that the labels are now the same size as the images.
I started with a dataset of 320x320 images; however, models trained on this dataset often had trouble with images that had bright patches running off the edge of the image and images with high contrast, misclassifying the bright patches as positive. To attempt to address this I started training the model on the 320x320 images and then switched to another dataset of 640x640 images after training for 50 or so epochs.
The dataset of 640x640 images only had 13,000 training examples in it, about 1/3 the number of examples in the 320x320 dataset, but was still larger due to the fact that each example and label is four times the size of the 320x320 images. I considered making another dataset with either more or larger images, but saw that this process could continue indefinitely as I had to keep creating new datasets of larger and larger size.
Instead I decided to create one new dataset which could be used indefinitely, for all purposes. To do this I loaded each image in the CBIS-DDSM dataset into Python. While the JPEGs are RGB, the images are grayscale, so I only kept one channel of each image. Some images have multiple masks, and rather than keep multiple versions of each image with different masks, which could confuse the model, I combined all the masks for each image into one mask and added that as the second channel of each image. In order to be able to save the array as an image I added a third channel of all 0s. Each new image was then saved as a PNG.
The resulting dataset is about 12GB, roughly four times the size of the largest tfrecords dataset, but the entirety of the CBIS-DDSM dataset (minus a few images which had masks of incorrect sizes and were discarded) is now represented. Now, in my model, I load each full image and then take a random crop of it to use as training data. Since the mask is part of the image I can use TensorFlow's random crop function to crop the full image and then separate the channels into the training example and its label.
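The cropping itself is only a couple of lines; a sketch, where the crop size is whatever the current model expects:

import tensorflow as tf

# image is a [height, width, 3] tensor: channel 0 = scan, channel 1 = combined mask, channel 2 = zeros
crop = tf.random_crop(image, size=[crop_size, crop_size, 3])
example = crop[:, :, 0:1]  # grayscale training image
label = crop[:, :, 1:2]    # segmentation mask used as the label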
This not only increases the effective size of the training set enormously, but since my model is fully convolutional, I can also easily change the crop size without having to create a new dataset.
The major problem with this approach is that the mean of the labels is very low - around 0.015 - meaning that only 1% of the pixels have a positive label and the rest are negative. The previous dataset had a mean of 0.05. This will be addressed by raising the cross entropy weight from 20 to 75 so that the model doesn't just predict everything as negative. When creating the images I had trimmed as much background as possible from them to avoid having a large amount of training images of pure black, but still the random cropping produces a large number of images with little to no actual content.
At the moment I am uploading the data to S3 which should take another couple days. Once this is done I will attempt to train on this new dataset and see if the empty images cause major problems.
Labels: coding , python , machine_learning , mammography
April 1, 2018, 10:06 a.m.
I decided to try a Google Cloud GPU instance as well as EC2. Once I had my quotas set properly and was able to start the instance it took me all day to get TensorFlow running with GPU. The instructions Google provides are for CUDA 8.0, and the latest version of TensorFlow requires CUDA 9.0.
To get everything running follow these steps:
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda-9-0
sudo nvidia-smi -pm 1
These are the steps in the instructions with the proper repo to CUDA 9.0 inserted.
Then I had to install cudnn, which isn't mentioned at all in Google's instructions. I downloaded libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb from the Nvidia cudnn site, and then uploaded it to the instance with scp. Then install it with:
sudo dpkg -i libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb
Then you need to export the path with:
echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
echo 'export PATH=$PATH:$CUDA_HOME/bin' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$CUDA_HOME/lib64' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
And finally install TensorFlow:
sudo apt-get install python-dev python-pip libcupti-dev
sudo pip install tensorflow-gpu
I used pip3 and python3, but the rest is the same.
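To check that TensorFlow actually sees the GPU, something like this should print a GPU device name:

import tensorflow as tf
print(tf.test.gpu_device_name())  # expect something like '/device:GPU:0'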
Update: I thought it was working fine but I was still getting errors about locating libcupti.so.9.0. That was fixed by making symlinks as described here.
I ran these commands and now it seems to be working...
# Put symlinks in /usr/local/cuda
sudo mkdir /usr/local/cuda
cd /usr/local/cuda
sudo ln -s /usr/lib/x86_64-linux-gnu/ lib64
sudo ln -s /usr/include/ include
sudo ln -s /usr/bin/ bin
sudo ln -s /usr/lib/x86_64-linux-gnu/ nvvm
sudo mkdir -p extras/CUPTI
cd extras/CUPTI
sudo ln -s /usr/lib/x86_64-linux-gnu/ lib64
sudo ln -s /usr/include/ include
Another Update: TensorFlow requires version 7.0.4 of the cudnn, I had originally downloaded 7.1.2, the code has been updated accordingly.
Final Update: I set up another instance and followed this process and it almost worked. I needed to export another path which I added here. The commands to export the path were temporary and had to be repeated every time the instance was booted, I changed that to echo the path to .bashrc so it would be automatically set.
Labels: coding , machine_learning , tensorflow , google_cloud
Feb. 25, 2018, 10:59 a.m.
It took me a while to figure out exactly what was going on with the files I was uploading and creating using Google's CoLaboratory. Each user has a VM where their notebooks run, and the VM only runs for 12 hours before it is spun down and recycled, taking with it any files you may have downloaded or created. The second day I used it I was surprised that the files I had spent time downloading, unzipping and importing were no longer there, and I had deleted the code to do all that - so if you are using CoLab, make sure you keep the code to get your data files!
I also tried to have two notebooks running at the same time thinking it would speed up some work I was doing, but it seems as if all of a user's notebooks run in the same VM, so there really is no advantage to having multiple notebooks running.
There is an instruction notebook that explains how to save files to Google Drive, which works very well and is easy to use. To do that run:
from google.colab import auth
from googleapiclient.http import MediaFileUpload
from googleapiclient.discovery import build
auth.authenticate_user()
Then you have to enter a code to authenticate yourself. Then I use this function to save files:
drive_service = build('drive', 'v3')

def save_file_to_drive(name, path):
    file_metadata = {
        'name': name,
        'mimeType': 'application/octet-stream'
    }
    media = MediaFileUpload(path,
                            mimetype='application/octet-stream',
                            resumable=True)
    created = drive_service.files().create(body=file_metadata,
                                           media_body=media,
                                           fields='id').execute()
    print('File ID: {}'.format(created.get('id')))
    return created
The function takes two arguments, the name of the file and the path to it, and writes the file to the root of your Google Drive.
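Calling it is as simple as, for example:

save_file_to_drive('model_weights.h5', './model_weights.h5')  # the file name is just an example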
Note - This post was updated because my original guess as to how the VMs work was completely wrong. The VM instance exists for 12 hours, they are not tied to the runtime.
Labels: coding , machine_learning , tensorflow , google
Feb. 20, 2018, 6:43 p.m.
On my laptop it takes forever to train my TensorFlow models, and I was looking for cheap online services where I could run the code without having much luck finding anything. Google Cloud does give you $300 worth of free processing time, but that's not really free. Then I found Google Colab, a Python notebook based environment where you can run code for free - and it includes GPU support!
It took me a little while to get everything set up, but it was relatively easy and it runs incredibly fast. The tricky part was getting my data into the notebook. While Colab saves the notebooks to your Google Drive, they do not run on your Google Drive so you can't just put the data on the Drive and then access it.
I used wget to download the data from a URL to wherever the notebook is running, then unzipped it with Python and then I was able to read the data, so it wasn't all that complicated. When I tried to follow the instructions on importing data from Google Drive via an API I was unable to get it to work - I kept getting errors about directories and files not existing despite the fact that they showed up when I did !ls.
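The whole data-loading step is just a couple of cells, roughly like this (the URL and file names are placeholders):

!wget https://example.com/training_data.zip

import zipfile
with zipfile.ZipFile('training_data.zip', 'r') as z:
    z.extractall('data')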
They have Tesla K80 GPUs available and the code runs incredibly fast. I'm still training my first model, but it seems like it's going to finish in about 20 minutes whereas it would have taken 3+ hours to train it locally. This difference in speed makes it possible to do things like tune the learning rate and hyperparameters, which are not practical to do locally if it takes hours to train the model.
This is an amazing service from Google and I am already using it heavily, just hours after having discovered it.
Labels: coding , python , machine_learning , google
Feb. 12, 2018, 12:36 p.m.
A few months ago I went to a university to interview for a job working on their website. Up until that day I had been programming in PHP with Laravel and Symfony. These MVC frameworks are object-oriented, and they make PHP seem like a "real" programming language. I love them for that. Everything is nicely organized and segmented and you can do almost all the programming in an object oriented fashion.
The university was using a CMS and they asked me to take a little test by looking at some of the code. The code was written in non-object-oriented PHP, which means that everything was done in the page. If you need data from the database you do the query right there in the page, do whatever manipulation you need, then loop through the results with everything embedded right in the HTML with <?php tags.
I was shocked and dismayed looking at the code. It was about as elegant as the BASIC code I was writing when I was 10 years old, but with HTML mixed in for good measure. It was horrifying to realize that this was the language I was programming in, despite the fact that OOP frameworks make it into something that resembles nicely written code.
The other thing that happened that day was they asked me what my ideal job was. I said I would really like to be involved in some sort of research, but there was no need for PHP in that field. When I got home I thought about it for a few hours, then decided that if that was what I wanted to do I should learn whatever I needed to learn to do it. And that is what I did, and that is why I stopped programming in PHP.
Don't get me wrong - I'm not saying PHP is a horrible language, it can be used very well. But the majority of employers here in Switzerland looking for PHP programmers are not looking for people who do it very well, they are looking for people who have done a three month bootcamp and will work for very low pay. For me, it just wasn't worth the effort to do something I don't really even like that much.
Jan. 26, 2018, 10:36 a.m.
Python code to make sure two data frames have the same columns in the same order. I used this to make sure that two dataframes had the same dummy columns after using pd.get_dummies:
# columns that are in X1 but missing from X2
missing_cols = set(X1.columns) - set(X2.columns)
for c in missing_cols:
    X2[c] = 0
# put X2's columns in the same order as X1's
X2 = X2[X1.columns]
Labels: coding , python , machine_learning
Jan. 23, 2018, 11:25 a.m.
My favorite languages, in order:
Labels: coding , data_science
Jan. 18, 2018, 11:16 a.m.
In 1998, when I graduated from university the internet was in its infancy and the dot com bubble was just getting ramped up. At the time I thought the internet would change the world by making information easily accessible and available, and I was excited to be a part of this new thing that would be revolutionary.
That started to change a few years ago. The focus of internet companies had shifted from providing useful services to collecting as much data as possible on their users to better target advertisements. Rather than providing useful and informative content and services, the emphasis was on keeping users online for as much time as possible. While the negative effects of this business model on users and society are becoming more and more noticeable, notably in the recent US election, the tech companies continue to ignore them. This is no longer the industry that I signed up to work for, and I no longer want to be a part of it.

Anyone who is familiar with the work of Kahneman and Tversky knows that the human brain is very poor at processing and analyzing data. Most of our decisions are made using heuristics, or rules of thumb, that allow us to make quick and easy judgements. These result in cognitive biases - ways in which our brains distort reality for the purpose of making decisions. One of the most famous cognitive biases is "confirmation bias" - the tendency to interpret new information in a way that supports existing beliefs. Kahneman and Tversky conducted experiments on people ranging from college undergraduates to statistics professors, and everyone was subject to these biases - even PhD-level statisticians who should know better. This is why data science is so important.
Our brains are not designed to gather and analyze large amounts of data and we are incredibly bad at doing so. We tend to draw conclusions from small, isolated, but memorable bits of information rather than looking at the overall big picture. One example is how Americans are all very worried about terrorism even though on average only six Americans die per year from foreign terrorism. The media likes to report these stories because they are sensational and memorable, but doing so greatly exaggerates the real risks. There are also numerous medications which are commonly prescribed despite having minimal positive effects, or having no benefits at all.
Data science is a way to draw knowledge from actual observation of the world, rather than from whatever thoughts happen to be strung together in our heads or whatever sound bites on a given subject most easily come to mind. I can come up with whatever theories and ideas I want, but unless they actually reflect the real world it's all meaningless. This is the basis of scientific inquiry, and this is why I am getting out of web development and into data science.
Labels: personal , coding , data_science
Nov. 22, 2017, 3:39 p.m.
Just a little note - if anyone ever has to make a database table of languages, countries, currencies, or anything else that has a unique text code I would recommend that you use the code as the primary key and as the foreign key for related tables.
It will save a lot of database overhead and make the code a lot cleaner if you don't always have to do a table join to get the code from the table.
Labels: coding
Oct. 29, 2017, 4:51 p.m.
I have really been loving Django lately, and I wrote another version of this site in it. That site is skooch.com. The new site uses the same database as this one, so the only real difference is the language they are written in.
I find Python to be a much more opinionated and formal language than PHP, which makes it a steeper learning curve, but it forces you to think things through a bit more. I find the extra effort to be well worth it in the end as far as code quality goes.
As a note the Python code was significantly shorter than the same code in PHP, for whatever that is worth.
Sept. 29, 2017, 12:10 p.m.
For a Laravel application I am working on the need arose to log errors to the database rather than to log files. I also wanted to show a custom error page instead of Laravel's default error page. This is my solution:
In app/Exceptions/Handler.php:
public function report(Exception $exception)
{
    if (!config('app.debug')) {
        if ($this->shouldReport($exception)) {
            $this->logError($exception);
        }
    }
    if (env('APP_ENV') == 'local') {
        parent::report($exception);
    }
}

public function render($request, Exception $exception)
{
    if (!config('app.debug')) {
        if ($this->shouldReport($exception)) {
            return response()->view('errors.500', compact('exception'));
        }
    }
    return parent::render($request, $exception);
}
Function report checks whether APP_ENV is set to local; if so, it reports the error as normal. If it is not local, it calls the logError function, which writes the error to the database.
Function render checks APP_DEBUG; if it is set to true it renders the error to the screen as normal, and if it is set to false it returns a custom error template stored in resources/views/errors/500.blade.php.
Sept. 24, 2017, 5 p.m.
I just added a Captcha to the registration and contact me pages due to large amounts of spam mail and what seem to be fake registrations. Luckily Google's ReCaptcha is easy to use.
To integrate it with Laravel I used greggilbert/recaptcha, which I have used for other projects and which is amazingly easy to use. If anyone needs a Captcha I highly recommend it.
It does get annoying when you are asked to keep clicking on images instead of just typing in numbers, but that only happens if you continually try to submit the same form, which means that it's working. To avoid the endless clicking in development I usually add the rule to the validation conditionally based on the env('APP_ENV') variable so that it is only added in production.
Sept. 6, 2017, 10:44 a.m.
I had been using SQLite with Django for quite some time because I couldn't get mysqlclient to install properly with pip on Windows. SQLite was fine for local development, but before deploying an app I wanted to get MySQL working.
It turns out it was very easy:
pip install mysqlclient==1.3.9
That's all I needed to do! I had tried downloading wheels and all sorts of other stuff, none of which worked, but version 1.3.9 installs fine with no errors on Windows 10.
Sept. 6, 2017, 9:02 a.m.
If you get an error trying to connect with php_imap for Kerberos:
PHP Notice: Unknown: Kerberos error: No credentials cache found (try running kinit) for imap.example.com (errflg=1) in Unknown on line 0
This is the solution:
Pass ['DISABLE_AUTHENTICATOR' => 'GSSAPI'] to imap_open for the last options parameter.
Aug. 9, 2017, 1:21 p.m.
After spending yesterday figuring out how to write custom log files, today I changed my mind and went a different route. The problem with these log files is that they would need to be parsed to generate the reports I need. While this could easily and efficiently be done with something like Python, in order to stick with PHP I decided to take a completely different approach.
I created a statistics table where I can just increment the appropriate column rather than writing a whole line to a text file. My concern was that this would slow down the site substantially, so rather than recording the data while the user is performing the searches, I created a Job and queue it to be processed later. This doesn't help with the server load, but it does take the response time for the user out of the equation.
Aug. 8, 2017, 12:34 p.m.
Laravel comes with Monolog which it uses to log application exceptions and such. The need arose recently to log other information to text files, specifically searches performed by users. I tried logging this to the database but the table grew very quickly and performance suffered as a result.
So I decided to log to text files, which I can then process when reporting is needed. It would be relatively easy to use PHP file operations to open and append to a text file, but I decided to try to stick to using Monolog. Most of the information I was able to find online about how to do this involved overwriting Laravel's Logging Configuration classes to allow different types of data to be written to different log files, but it seemed like an awful lot of code to do something that should be relatively simple. After more searching I found a method that uses 4 lines of code and works perfectly:
$logPath = [path to log file];
$orderLog = new Logger('searches');
$orderLog->pushHandler(new StreamHandler($logPath, Logger::INFO));
$orderLog->info('Search : ', $data);
By including the date in the $logPath as part of the file name I can automatically rotate the logs and write to whatever location I choose to. The $data variable needs to be an array and is encoded to JSON and written to the log. The parameter passed into the new Logger() is the channel written to the log.
The only thing I have not yet been able to customize is the priority - in this case "Info." For my purposes this isn't really necessary and could be omitted, but having it doesn't have any downsides, so I don't know if it's worth the trouble of figuring out how to remove it.
June 27, 2017, 9:21 a.m.
Over the last few weeks I have become enamored of using traits in PHP. Whereas I previously would put functions that need to be reused into either my Models or into helper functions, I have now started to make traits with these functions in them. For my Laravel application I created a directory app/Http/Traits where I keep my traits.
I started doing this when I began to optimize my code, trying to remove redundant code and unnecessary weight from my models. Using PHPStorm's very useful ability to find duplicated code, I searched for blocks of 10 lines or more that were reused and moved those into traits. As I continued I started to realize other benefits of using traits - mainly that they provide a way to simplify things. If the same action is taken in different controllers or different parts of the application, then by using a trait I only have to change the code in one place if I decide to change it, rather than tracking down every place in the code that needs to be changed.
June 1, 2017, 9:34 a.m.
For the project I am working on we want to have multiple SMTP configurations in the database, which can be chosen at runtime. It's very easy to update the mail config using the config() helper, but for some reason that did not change the SMTP settings used to actually send the mail. I did a lot of research and found some answers for older versions of Laravel that did not work with 5.4.
Eventually I was able to find how to accomplish this from this post on Laravel.io. It seems that the mailer instance is created with the app, so updating the config won't change the properties of it which have already been set. To do that you need to use the following code:
extract(config('mail'));
$transport = \Swift_SmtpTransport::newInstance($host, $port);
// set encryption
if (isset($encryption)) $transport->setEncryption($encryption);
// set username and password
if (isset($username))
{
$transport->setUsername($username);
$transport->setPassword($password);
}
// set new swift mailer
Mail::setSwiftMailer(new \Swift_Mailer($transport));
// set from name and address
if (is_array($from) && isset($from['address']))
{
Mail::alwaysFrom($from['address'], $from['name']);
}
If you execute this after you have updated the config it will create a new instance of Swift_Mailer with the updated mail config. Once that is done you can just send the mails and it will use the proper SMTP server.
May 30, 2017, 8:51 a.m.
This morning I woke up to emails saying that our Mailgun account had been disabled due to high volumes of email and high volumes of bounces. The logs indicated that far more emails had been sent than we had visitors. Emails are only sent as the result of a user clicking a link on the site, so I had no idea how this was possible.
After a few hours of investigation, it turns out Googlebot was crawling our site and kept following links and buttons that send emails. To prevent this from happening again I took a couple different precautions:
Hopefully that will prevent this sort of thing from happening again.
Labels: coding
May 15, 2017, 1:09 p.m.
For quite a while I have been struggling to get domain routing working in Laravel. Subdomain routing comes out of the box, and what I read said that adding domain routing should be fairly easy.
The first thing to do is get the full domain passed into the router. By default Laravel only takes what comes before the first ".", so to get the full domain passed in you need to add this to your Providers/RouteServiceProvider.php file:
Route::pattern('domain', '[a-z0-9.\-]+');
parent::boot();
Now you can access the full domain in the routes file, and you can do so by adding a route group:
Route::group(['domain' => '{domain}'], function ($domain) {
I tried to get the routes to take in the domain as a parameter and create themselves dynamically, but that did not work. So I ended up creating the route group. My issue was that some domains just use the normal route, and others were to have their own custom routes. I spent a while trying to get that working before just adding the route group. Inside the route group I check the database to see if this domain uses the normal routes or gets a special route.
The next issue was how to pass the variables from the routes to the controller, when they do not come from the URL. The special routes can be accessed via URL: http://www.maindomain.com/site/1. I wanted to be able to map via http://www.customdomain.com to that URL, but to do so I needed to pass the parameters from the URL into the controller when the parameters do not exist in the URL. That took some more figuring out, but it turns out you can do it like this:
$app = app();
$controller = $app->make('App\Http\Controllers\WhateverController');
return $controller->callAction('show', ['request' => $request, 'id' => $id]);
The controller expects a Request object, and to get that passed in you need to add it to the route explicitly:
Route::get('/', function ($domain, \Illuminate\Http\Request $request) {
With that addition I am able to map the custom domain to a specific controller and pass in variables which are determined in the routes file.
The final code looks like this:
Route::group(['domain' => '{domain}'], function ($domain) {
    Route::get('/', function ($domain, \Illuminate\Http\Request $request) {
        $site = \App\Website::where('host', $domain)->first();
        if ($site) {
            $app = app();
            $controller = $app->make('App\Http\Controllers\DomainController');
            return $controller->callAction('show', ['request' => $request, 'id' => $site->store_id]);
        } else {
            $app = app();
            $controller = $app->make('App\Http\Controllers\HomeController');
            return $controller->callAction('index', []);
        }
    });
});
So for the "/" route it checks whether the domain exists in a table in the database; if so, it calls DomainController with the request and a parameter from the DB, and if not, it calls HomeController.
After a long time spent trying to figure this out it turns out to be a lot simpler than I thought it would be. However now I need to add specific routes to some domains but not others. I don't expect that will be too different than the method I am currently using.
May 7, 2017, 10:40 a.m.
One of my greatest frustrations with Eloquent collections has been that, if I wanted the results in a format where I could access them by a value from the query results, I needed to loop through the collection and build a new collection or array. That is to say, if I want to be able to access the results by, say, primary key, I would need to loop through the Collection returned by the query and create a new object keyed however I wanted.
I just learned that there is a much easier way to do this:
Model::all()->keyBy('whatever');
This will return the collection with "whatever" as the key, which makes life so much easier and code so much cleaner.
April 25, 2017, 4:20 p.m.
I was struggling today trying to get a Google Recaptcha integrated into a site. The space was smaller than the recaptcha size, and I couldn't figure out how to size it down. I found this article that explains very easily how to do it.
<div class="g-recaptcha" data-theme="light" data-sitekey="XXXXXXXXXXXXX" style="transform:scale(0.77);-webkit-transform:scale(0.77);transform-origin:0 0;-webkit-transform-origin:0 0;"></div>
Labels: coding
April 19, 2017, 10:24 a.m.
I've been struggling with how to display images of various sizes at the same size. You can adjust the height and width of the image, or the max-height and max-width, but those will skew the aspect ratio.
To date, I'd been addressing this by putting the image in a sized div, and hiding the image that was outside the div, but this causes issues with positioning - you have to position the image the way you want it in the div - so the image is always cropped on top or on bottom, and not centered.
I just found a simple CSS solution to this:
img {
object-fit: cover;
}
This will center and crop the image to a size specified by the image tag.
April 13, 2017, 3:56 p.m.
I tried a bunch of different things to try to address the issues I was having with using Socialite in a site that has multiple domains. The problem ended up being the session domain. I was able to generate callback URLs for each domain easily, but I couldn't get around the session domain issue.
Rather than spend more time on this I ended up using a little workaround. From my login page if you click on the login with whatever provider button it directs you to one domain, the one that the session domain is set to, and from there the Socialite logins work fine.
It's not ideal, but the worst that will happen is that someone ends up on a different domain than they started on.
April 7, 2017, 4:29 p.m.
I've been struggling with an error with Laravel Socialite for months now. At times the socialite login worked perfectly, but at other times and with certain providers I got an error:
InvalidStateException in AbstractProvider.php line 200
This was not critical functionality so I just kept pushing it off, but finally we have found the solution. It is related to the domain in config/session.php. This value defaults to NULL, and it apparently needs to be set to the domain the site is running on. This site in question runs on many domains, so setting the value to a single domain fixes one domain but leaves all the rest broken.
So for me the issue is not solved, but at least I am not seeing that error anymore.
March 20, 2017, 7:27 a.m.
When a friend first told me about Doctrine many, many years ago I thought it sounded like a terrible idea. I have always had a love affair with SQL, and the thought of having to use objects and functions to interact with the database, instead of writing queries and getting the data you need, seemed overly complicated and didn't seem to add anything. When I first started using frameworks I felt the same way about them - it seemed to add unneeded layers of complexity to incorporate code written by someone else to do something I could very easily do on my own.
It wasn't until I started using Laravel that I came to appreciate frameworks - thanks to the functionality built into the framework I could write in a couple of hours a simple CRUD app that would have taken me days from scratch. And the ORM made database interaction a lot easier, but only within certain parameters. I realized this when I tried to normalize a table in my database by separating out a field into another related table. It became very complicated and convoluted to do simple things like sort the query by a value in the related table. I ended up denormalizing the database and getting rid of the extra table to keep the code clean.
At the time I assumed that when you set up a relationship between models, querying one would make Laravel join the related tables in to get all the needed data in one query. It wasn't until I installed Debugbar that I realized that Laravel would run n+1 queries to get n rows of data with one relationship. This is avoidable using Laravel's "eager loading", which gets the same data in 2 queries. However, to get data from multiple tables in one query - using a JOIN - with WHEREs and ORDER BYs on the joined tables, the Query Builder syntax gets quite ugly.
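To illustrate the difference, with made-up model and relationship names:
// n+1: one query for the posts, plus one more query each time a post's author is accessed
$posts = Post::all();

// eager loading: two queries total, no matter how many posts there are
$posts = Post::with('author')->get();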
In my opinion this is largely because Eloquent is an implementation of the Active Record pattern, which represents one row of one table in the database as an object as far as the code is concerned. In Eloquent, if you query multiple rows you are returned a Collection of these model objects. While Active Record is great for dealing with simple databases where one row contains usable data, if you are dealing with a highly relational database where you need data from multiple tables it doesn't hold up so well.
When I started out programming we were using MS SQL Server and the programmers were not allowed to write any queries - we were to use stored procedures written by the DBA. At the time I didn't understand the reason for this, but now I realize that it allows the database structure to be separate from the code - so that database changes won't result in needing to rewrite large sections of code. This, in my opinion, is the main advantage of using an ORM. So what do you do when you need to write actual SQL queries for your code to work properly and efficiently?
One option I investigated was adding a Repository layer to the code. With the repository pattern the models still handle reading from and writing to the database, but the rest of the code interacts with the repository. For my needs, the repository would basically play the part of the stored procedures - the queries would be written in it and the code would call the repository's methods to get the data it needed. It puts all of the queries in one place, so if the database ever changes, the queries that need to be rewritten are all together. I tried implementing this, and it worked, but it added another level of abstraction and complexity. And the way I implemented it was basically no different from writing the queries directly into the models, which works just as well, but for some reason it bothers me to have complex and bloated models.
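As a rough sketch of the idea (the class, table and column names here are made up), the repository just wraps the query builder calls:
use Illuminate\Support\Facades\DB;

class RecordRepository
{
    // All of the query logic lives here, so a schema change only means
    // updating this class rather than every controller that needs the data.
    public function recentWithLabels($limit = 10)
    {
        return DB::table('records')
            ->join('labels', 'labels.id', '=', 'records.label_id')
            ->select('records.*', 'labels.name as label')
            ->orderBy('records.created_at', 'desc')
            ->limit($limit)
            ->get();
    }
}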
I am still working through this issue and do not have a solution yet. Dealing with this has made me remember the many issues I had with ORMs and frameworks in general back when I first started using them. Using the tools of the ORM the issues can be addressed - but not in a simple, clean and elegant manner. And in my opinion, the main problem is the Active Record pattern itself - it is great if you need to work with a single row in a single table, but if you need to span multiple tables to get your data, it doesn't hold up so well.
March 5, 2017, 8:07 a.m.
Once I had the authorization working and could display the comment form properly the next step was to get the form submitting properly. First of all, to use forms in AMP you need to include the following script:
<script async custom-element="amp-form" src="https://cdn.ampproject.org/v0/amp-form-0.1.js"></script>
AMP forms work mostly the same way as normal HTML5 forms, but there are a few differences. With AMP you either use GET forms, or you use POST forms with action-xhr instead of action. For POST forms you also need to include a target of either _blank or _top. Using action-xhr means the page posts an AJAX request that expects a JSON response instead of reloading the whole page; if you want to reload the whole page you should use method="get".
The response from the request needs to include the same headers as the authorization request, which are detailed in this post. The response doesn't need to contain any specific data, you can put whatever you want in there. You can use the response data to update the page after a successful post, but I have not done that yet.
In this case, the controller action for my new comment just adds the comment and then returns the id and body of the new comment as JSON along with the necessary headers. I would like to display the new comment on the page, but the syntax to display the JSON data is {{ var }}, which is the same syntax Blade uses. I know there is a workaround for this, but I haven't looked for it yet.
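Roughly, the store action looks something like this (the model and field names are assumptions; the headers are the ones described in the linked authorization post):
public function store(Request $request, Blog $blog)
{
    $comment = $blog->comments()->create([
        'user_id' => auth()->id(),
        'body' => $request->input('body'),
    ]);

    return response()
        ->json(['id' => $comment->id, 'body' => $comment->body])
        // plus the AMP CORS headers covered in the authorization post
        ->header('Access-Control-Allow-Origin', $request->header('origin'))
        ->header('AMP-Access-Control-Allow-Source-Origin', $request->input('__amp_source_origin'));
}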
The two big issues I had with this form both concerned how to update the page after a successful post: I wanted to hide the comment form and display a success message. There are AMP components to accomplish both of these tasks.
To hide the form after a successful post you can add the following CSS:
form.amp-form-submit-success > input {
    display: none;
}
And set the form's class to "hide-inputs". When the success response comes back the form's class is updated to include amp-form-submit-success, which cascades down to hide any child inputs. I had my inputs in a panel inside the form and they were not being hidden because they were not direct children of the form; this was fixed by rearranging the elements so that the form sat inside the panel divs and the inputs were direct children of the form. Before I arrived at this solution I first tried to hide the entire form, which worked, but also hid the success message. Since my form field was a textarea I had to add another rule to the CSS for textareas, otherwise identical to the one above.
The next step was displaying a success message, which I did by following the instructions here. The block of sample code from this page is:
<form class="hide-inputs" method="post" action-xhr="https://ampbyexample.com/components/amp-form/submit-form-input-text-xhr" target="_top">
<input type="text" class="data-input" name="name" placeholder="Name..." required>
<input type="submit" value="Subscribe" class="button button-primary">
<div submit-success>
<template type="amp-mustache">
Success! Thanks {{name}} for trying the
<code>amp-form</code> demo! Try to insert the word "error" as a name input in the form to see how
<code>amp-form</code> handles errors.
</template>
</div>
<div submit-error>
<template type="amp-mustache">
Error! Thanks {{name}} for trying the
<code>amp-form</code> demo with an error response.
</template>
</div>
</form>
The div with the submit-success attribute is hidden until the form submission comes back with a success response, at which point it is displayed, and likewise for the submit-error block. To use the amp-mustache templates you need to include the following script tag:
<script async custom-template="amp-mustache" src="https://cdn.ampproject.org/v0/amp-mustache-0.1.js"></script>
In the submit-success section the {{ name }} will substitute in the "name" element from the JSON data returned from the post. In my case I have left this out for now and just display a success message and hide the form.
To see this in action you can look at the AMP version of this blog here. When you submit a comment the form disappears and is replaced by a success message. Ideally the new comment would show up, but I'll get to that at some point in the future.
Update - the Blade syntax to output a literal "{{ }}" for Javascript is "@{{ whatever }}". So I updated my code to actually display the comment after it is posted.
March 4, 2017, 2:04 p.m.
I've been trying to figure out how to make packages for Laravel, and there isn't as much documentation as one would hope. The Laravel docs aren't as helpful as they could be for someone who has never done this before, and most of the info I found on Google was either incomplete or for older versions of Laravel.
I did find a few pages with helpful information on how to do this, this one is the one I followed. It uses this CLI tool, itself a Laravel package, which will allow you to make other Laravel packages. The CLI tool creates the directory structure along with composer.json and boilerplate code that provides a good starting point.
Other tutorials I found helpful include:
I ran into a few problems which took some research to solve, which I thought I'd put here in case anyone else is having the same issues:
Of course I had other issues but those are the ones that took a while to figure out. I hope to finish the package up in the next few days, I'll post updates as they come.
Update - to generate URLs with action() you do in fact use the full path to the controller, and this time it works. Not sure what I did wrong last time, but it is working fine now.
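In other words, something along these lines (the controller path here is just an example, not necessarily the package's real namespace):
action('\Escuccim\LaraBlog\Http\Controllers\BlogController@index');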
March 4, 2017, 2:04 p.m.
I finished working on my Laravel package, which is the blog I use here (and also on my other site). I had this on my GitHub as a Laravel skeleton application, but after a few days of research and coding I now have it as a Laravel package, which can be installed via Composer. I did find a more comprehensive tutorial on writing Laravel packages, but I only just found this today after I had finished my package, so haven't really read through it.
The reason I started working on this package is because I have multiple sites that use the same code and I wanted to consolidate them so I wouldn't have to maintain two separate code bases, but the package is only in English and some of my sites are in French, so I guess my next step is adding translation to the package.
The package is on Packagist and can be installed with composer.
composer require escuccim/larablog
A few things that I struggled with and eventually figured out since my last post on this topic:
$this->publishes([
    __DIR__.'/config/config.php' => config_path('blog.php'),
    __DIR__.'/resources/views' => base_path('resources/views/vendor/escuccim'),
]);
$this->mergeConfigFrom(__DIR__.'/config/config.php', 'blog');
where 'blog' is the key for the config array.
$this->loadMigrationsFrom(__DIR__.'/database/migrations');
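Putting those together, the relevant part of the package's service provider ends up looking roughly like this (the class name and directory layout are assumptions):
use Illuminate\Support\ServiceProvider;

class BlogServiceProvider extends ServiceProvider
{
    public function boot()
    {
        // let the host app publish (and override) the config and views
        $this->publishes([
            __DIR__.'/config/config.php' => config_path('blog.php'),
            __DIR__.'/resources/views' => base_path('resources/views/vendor/escuccim'),
        ]);

        // merge defaults so the package works even if nothing is published
        $this->mergeConfigFrom(__DIR__.'/config/config.php', 'blog');

        // picked up automatically by php artisan migrate
        $this->loadMigrationsFrom(__DIR__.'/database/migrations');
    }
}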
There is still work to be done, but I just marked my GitHub repo with a stable release version, so that's something.
March 4, 2017, 2:04 p.m.
Apparently Google doesn't like it if all of your pages have the same title and meta description tags. So yesterday I decided to write unique titles and description tags for all of my pages. At first I did this by setting two variables - $title and $description - in my controllers and then passing them to the views, where I displayed them in my layout/app.blade.php. Since I have multiple languages in this site I ended up setting them like this:
$title = trans('whatever.pagetitle');
$description = trans('whatever.pagedescription');
This seemed a bit inelegant and I thought I could come up with a better way, which I did this morning. What I did was set up a file in my lang directory I called metadata.php. A sample of this is here. This file contains for each page a key for title and description as follows:
'/home-title' => 'Title',
'/home-description' => 'Description',
By keying the entries on the URI with a suffix for the value I want, I was able to consolidate all of the values into one file for ease of use, and I was also able to write a helper function that gets those values from the translation files and displays them, so the exact same code can run on every single page and return the data I need.
The helper function I used is on my GitHub here, and if it doesn't find data for the page it is looking for it has a default title and description it uses. For pages like blog articles and individual records I use the same title and description, but I still specify $title in the Controller, and if the value exists it is appended to the title in the layout file.
I like this solution because it let me replace the redundant and ugly code in the controllers, where I specified a title and description for each page, with a single function that pulls the data from one location and substitutes a default if the data doesn't exist, instead of failing or doing nothing. The code I used is on my GitHub Gist.
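The real helper is in the Gist, but a minimal sketch of the idea (the function name and defaults are assumptions) looks something like this:
function pageMetadata()
{
    // build the lookup key from the current URI, e.g. '/home-title'
    $uri = '/' . request()->path();

    return [
        'title' => \Lang::has("metadata.{$uri}-title")
            ? trans("metadata.{$uri}-title")
            : 'Default Title',
        'description' => \Lang::has("metadata.{$uri}-description")
            ? trans("metadata.{$uri}-description")
            : 'Default description',
    ];
}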
March 4, 2017, 2:04 p.m.
After having written a few packages I believe I now have it down cold. I started another one just a few hours ago and am already finished with it, and had no problems at all this time. The latest package is escuccim/recordcollection, which contains the code I use for my searchable database of vinyl records. Yesterday I did a package with the code for my online CV in it, so at this point this site is basically just the static pages and four Laravel packages. While it's a bit more complicated to make changes now, because I need to go to the package code, alter it, push it up to git and then run composer update, I think the extra couple of steps are well worth it in terms of maintainability and portability. My packages are largely self-contained, with their own views, controllers, models, etc., so I can just add them to a project and everything will (almost) magically work.
I don't really see much demand for a package like this for other people so I haven't thoroughly tested this one in projects other than my own, so it may not work properly out of the box. If you have a large record collection that you want to store online I would recommend Discogs.com. It has a lot more features and functions than my package does, but I don't have the time to go through each of my 2,000 or so records and add them to my Discogs collection and I've had most of mine in a database for about 15 years, so I'm sticking with my own for now. If I was starting over from scratch I'd probably put it in Discogs and then pull the data from their API to display it here.
My other packages are all tested and working on fresh installs of Laravel, so feel free to use them if you want.
Update - I ended up testing the recordcollection package on an almost fresh install of Laravel and I fixed the issues I found, so it should be in mostly usable shape.
March 4, 2017, 2:04 p.m.
Back when I wrote the code to localize this site I ran into some unexpected behavior that I couldn't figure out. I have two ways to set the language here - first you can do it by subdomain, so the fr subdomain defaults to French. Then, regardless of the subdomain, you can use the drop-down menu in the navbar to set the language, which sets a session variable. To handle the subdomain I have a middleware which runs on every request and sets the locale to the language specified by the subdomain, if a subdomain is used. I was confused by why the subdomain could be overwritten by the session variable set with the drop-down menu, but I ended up leaving it that way because it worked better than the way I had originally envisioned.
This weekend I decided to try to get to the bottom of why the behavior was different than what I would have expected and I discovered something a bit bizarre about Laravel sessions. In the middleware the session is always empty, but I can set a variable and access it from within the middleware. By the time I get to the controller the values put in session in the middleware are gone, and replaced with the values previously set in the session. I haven't looked at Laravel's session code yet, but I assume that however it stores session variables is initialized somewhere between the middleware and the controller. Before I started with Laravel, I used to keep session variables in $_SESSION, so the way it works now is a bit confusing to me.
To explain, I have the following in my middleware, which is registered to run on every request:
public function handle($request, Closure $next)
{
    echo "1: " . session('foo') . "<br>";
    session(['foo' => 'bar']);
    echo "2: " . session('foo') . "<br>";
    return $next($request);
}
When I load any page, it outputs:
1:
2: bar
If I then in a controller execute:
session(['foo' => 'baz']);
And load another page which just contains:
echo "3. " . session('foo');
The output, with the middleware is:
1:
2: bar
3. baz
So, in the middleware you can set and access the session, but what you set doesn't persist past the request, and by the time the controller is executed that session has been replaced with the persistent session from the previous request. I can think of a few ways around this, but it doesn't seem worth the effort involved. For me, the result of this issue is that I have to include a call to a helper function in every single page I want to translate - if I want to keep the drop-down menu to translate. My other option would be to do the localization based on subdomain only and have the drop-down menu link to the same page on a different subdomain instead of setting a variable and reloading the same page, which may in fact be a better solution, but again maybe not worth the effort.
I don't know if anyone else has run into this behavior in Laravel, I also don't know if this behavior is intentional or not, but if you are trying to access or set session variables in a middleware with no luck this is likely the reason.
March 4, 2017, 2:04 p.m.
I figured out how to resolve the session and middleware issues I mentioned in the previous post. I previously had the middleware in the $middleware array in /app/Http/Kernel.php. The Laravel session is started by the StartSession middleware, which is in the $middlewareGroups array under web. I moved my middleware into the web group after StartSession, and now the middleware can access the session. The only difference is that the middleware now only runs on requests that are part of the 'web' group instead of on every request, but that actually makes more sense in this case.
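In app/Http/Kernel.php that placement looks roughly like this (only the relevant entries are shown; SetLanguage is my middleware from the localization posts):
protected $middlewareGroups = [
    'web' => [
        // ...
        \Illuminate\Session\Middleware\StartSession::class,
        // ...
        // my middleware, after StartSession so the session is available
        \App\Http\Middleware\SetLanguage::class,
    ],
];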
After having figured this out the unexpected behavior I was witnessing before makes sense now.
March 4, 2017, 2:04 p.m.
I just upgraded one of my projects to Laravel 5.4 and I immediately had some issues with PHPUnit tests. They changed the testing framework in the new release, and you will need to alter any existing tests that use browser testing accordingly. This is all documented in the upgrade notes under the testing section.
After making the changes listed in the documentation most of my tests ran fine, but I still had one that kept giving me an error I couldn't figure out:
Fatal error: Class 'BrowserKitTest' not found
I copied over the code from tests that were working into the file with the error, and the exact same code that worked in one file gave me this error in a file with a different name, which was very frustrating, to say the least. After a little trial and error I found a fix, which was simply to rename the file. The original name of the file was BlogTest.php, but when I renamed it to CBlogTest.php it worked fine. My guess is that PHPUnit loads the PHP files in the /tests directory in alphabetical order, and it couldn't find the BrowserKitTest class until it had loaded that file. I assume this is because I upgraded from 5.3, and the new BrowserKitTest.php file needs to be added to an autoload file somewhere.
I'll post more thoughts about Laravel 5.4 after I've had some time to mess around with it. So far, other than this one issue, all my code has worked fine after upgrading.
March 4, 2017, 2:04 p.m.
Yesterday, after playing around with Laravel 5.4 for a few days on my dev environment, I upgraded this site. The only issue I had was with the testing, which I addressed in the previous post. The PHPUnit issues were easily resolved by installing laravel/browser-kit-testing and updating my existing tests to reference the new BrowserKitTest.php class instead of the old TestCase.php. The issue of the naming of the test files was in fact due to autoloading, and I resolved it by adding the following to my composer.json under the "autoload-dev" classmap section:
"tests/BrowserKitTest.php"
After upgrading from 5.3 to 5.4 you need to clear your view cache, which you can do with:
php artisan view:clear
And the upgrade guide also suggests clearing the route cache with:
php artisan route:clear
I have never had an issue with the route cache, but I have often had issues with the view cache, so I personally clear my views after most of my updates.
The other thing to be aware of when upgrading to 5.4 is that tinker is no longer part of Laravel, so it needs to be installed separately as laravel/tinker. I believe it is installed by default with a new installation of 5.4, but if you are upgrading you need to do this, especially if you use tinker as frequently as I do.
I haven't used any of the new features of 5.4 yet, from reading the upgrade guide there aren't really many features that jump out at me as something I would make a lot of use of, but I'm sure I will in the future.
March 4, 2017, 2:04 p.m.
I made an update the other day to the LaraBlog package. I added the ability to reply to comments and for users to delete their own comments. At first I had a hard time figuring out how to display the nested comments - theoretically you can have an infinite number of levels of replies, so how do you determine the indent when Bootstrap only has 12 columns?
This was easily resolved by two realizations:
So what I did was update my Blog model so that when getting the comments for a post I only get the original comments, not replies. Then I added a function to my Comments model to get the replies to a specific comment.
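A sketch of what that looks like, assuming the comments table has a nullable parent_id column:
// Blog model: only top-level comments, not replies
public function comments()
{
    return $this->hasMany(Comment::class)->whereNull('parent_id');
}

// Comment model: the replies to this comment
public function replies()
{
    return $this->hasMany(Comment::class, 'parent_id');
}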
I already had my comments display in a separate view which displayed the form to leave a comment and then all of the comments for a specific post. I separated this out into multiple views:
The index view, when including replies, leaves a blank column to the left of the replies. Because the view is included recursively for each level of nested replies, that column sizes itself to the available space, so the indent gets smaller at every level - and there could theoretically be infinitely nested replies without breaking anything.
I had never thought of recursively including views until I came across the idea on Stack Overflow while looking for a way to handle what looked like a nightmare of infinite nested loops; that idea provided a simple and elegant solution to what was shaping up to be a complete mess.
March 4, 2017, 1:03 p.m.
Over the last week I have been messing around with adding Structured Data to my pages so that Google can display Rich Cards. Google hasn't yet indexed my pages with structured data so there's not much I can say about that so far. I have also been playing with Accelerated Mobile Pages (AMP), which are lightweight pages designed specifically for mobile devices.
The two resources for AMP which I've found to be useful are AMP Project and AMP By Example. Unfortunately neither of them goes into a whole lot of detail about how to implement this stuff and I haven't been able to find very good explanations online. However I've been able to solve most of the issues I've encountered by trial and error.
The biggest difference between AMP and normal HTML is that AMP does not allow Javascript, nor does it allow linked CSS. All CSS must be inline and total less than 50kB, and the only Javascript you can use is the special Javascript from the AMP Project. All images must have sizes defined and forms work a bit differently. The reason for this is to avoid any blocking resources that could slow the load of the page. I had been using Bootstrap CSS, which I thought would be compatible since it is responsive and displays great on mobile devices, but it's too big and uses Javascript. I ended up using Bootstrap's Customizer to output only the elements I needed, then minified that and included it in my page. At some point I will clean unused styles out of the CSS to trim it down even more, but by including only the bare minimum I was able to get my CSS to just barely fit under the maximum size.
AMP isn't really all that complicated - it really just restricts what you can use in the page, but there were a couple things that I really struggled with. Those were, in order of difficulty:
I will post more articles on each of these three issues and my solutions to them over the next few days, as I get the kinks worked out.
March 4, 2017, 1:02 p.m.
Many years ago, before cloud servers existed, I worked for an ISP. We were running Linux servers and we had way more problems with them than one would think possible. It seemed like every other month the servers would crash and we would lose most of the data on them. Back then we had a tape backup system, which seemed almost as failure-prone as the servers, so when the crashes happened there was rarely any recoverable data.
As a result of that experience I am meticulous about always having my important data backed up. All my code is either on GitHub or BitBucket, so the only thing that exists only on my servers is my databases. A couple of months ago I decided I needed to have that data backed up regularly, and after considering a few options I decided to back it up to Amazon S3. I wrote a two-piece solution consisting of a shell script that dumps the databases with mysqldump and a Laravel command that uploads the dump file to S3.
Since the PHP part was a command I could just call it from my shell script and schedule that as a cron job. Yesterday I decided to turn the Laravel piece of that into a package called escuccim/s3backup. The code was originally written specifically to upload my DB dumps to S3 and had most parameters hard-coded in, so I added a few options to the command and updated my shell script to pass them in.
The package currently only works for a single file at a time, as that is all I need it to do, but I may add support for directories at some point in the future. The package is available through packagist although to use it you currently need to specify version dev-master.
March 1, 2017, noon
When I worked on this site I implemented a "login with Google" feature using Google's Authentication API, but I did it manually: I used Google's Javascript and wrote a controller to handle the data the API returns. It works, but it's a bit clunky and far from ideal.
Just today I used Laravel's Socialite package for the first time. It can handle OAuth requests for Google, LinkedIn, Twitter, Facebook, GitHub and BitBucket - and it's much, much easier than doing it myself. When I was looking into using OAuth I found a Laravel package to integrate OAuth logins, but it was very complicated to use. It created about a dozen tables and I ended up abandoning it to write my own code for integrating with Google.
With Socialite all you do is put the client IDs and secrets into a config file and add two functions to your LoginController - one to handle the login attempt and one to handle the callback. The login function just redirects the attempt to the appropriate provider:
return Socialite::driver($provider)->redirect();
And the callback function gets the information returned by the provider:
$user = Socialite::driver($provider)->user();
In the callback function I also handle adding the user to my database and logging them in. Next chance I get I'm going to take out all of my Google Javascript code from this site and replace it with Socialite. I couldn't believe how simple it was.
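Put together, the two LoginController methods end up roughly like this (the method names follow Socialite's conventions; the user lookup and creation details are simplified assumptions):
public function redirectToProvider($provider)
{
    return Socialite::driver($provider)->redirect();
}

public function handleProviderCallback($provider)
{
    $socialUser = Socialite::driver($provider)->user();

    // find or create the local user, then log them in
    $user = User::firstOrCreate(
        ['email' => $socialUser->getEmail()],
        ['name' => $socialUser->getName()]
    );
    Auth::login($user);

    return redirect('/');
}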
Feb. 28, 2017, 5:09 p.m.
As a result of figuring out what was going on with the session and the middleware yesterday I was able to rewrite my localization code and greatly simplify the whole project. Previously I had been calling a function to set the language in every single controller action that returned a view; now I was able to eliminate all of that and consolidate everything in the middleware.
I made another package - escuccim/translate - that has two parts:
First is the middleware which does two things:
a. Checks the subdomain to see if the subdomain corresponds to a language. If so it sets the app locale to the appropriate language.
b. Checks to see if there is a session variable with the language in it, if so it sets the app locale accordingly.
The key for me here is that if a locale is specified by both the subdomain and the session, the session takes precedence, thus allowing the user to display the page in whatever language they desire, regardless of the subdomain.
The second component of the package is a route which accepts a locale as a parameter and sets a session variable to that locale, so that the middleware can then access that information.
This package is available on my GitHub and on Packagist. When I am done testing it you can install it via Composer.
I'm glad I took the time to investigate the session/middleware issues, because figuring that out allowed me to replace code that was unnecessary and ugly to look at with a nice, simple, elegant solution.
Labels: coding , laravel , localization
Feb. 28, 2017, 5:05 p.m.
I've been messing around with Redis for a little while now and I'm using it in a couple of places on this site. The first thing I did was start caching some DB queries that get run a lot, like the main blog page, the blog archives menu and the list of recent posts on the home page. Laravel's Cache facade makes caching really easy, and you can switch between the default file cache driver and Redis without changing any of the code. For some pages I use the Laravel Cache facade, for others I use the Redis facade to cache directly to Redis, just for some variety. In general it is probably much better to use the Cache facade than the Redis facade, because then if you want to switch to a different caching mechanism it's one change to the .env file instead of having to rewrite all of the code.
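For the query caching, using the Cache facade is just a matter of wrapping the query; a minimal sketch (the key name and lifetime are arbitrary):
$posts = Cache::remember('recent_posts', 60, function () {
    return Blog::orderBy('created_at', 'desc')->take(5)->get();
});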
I also use Redis to queue some tasks which don't need to be done synchronously, but that's another post.
I never used Memcache because I didn't like the fact that everything is stored only in memory, so if the server goes down you lose all of the data in it. Redis persists data to disk by default, so it provides the speed of keeping data in memory with a very low risk of losing it. In my case, I store the data in the DB and cache it in Redis, but if I were to start from scratch (and had plenty of RAM on my server) I would probably keep a lot more data in Redis.
So, to summarize, I think Redis is awesome and I will definitely make more use of it in my stack in the future.
Labels: coding
Feb. 28, 2017, 5:04 p.m.
In my records table I used to have the full text of the label in each row, but small typos in label names ended up screwing things up. So I separated the labels out into their own table and added a foreign key to link the two tables. In my records admin, instead of a text field for the label I used a drop-down populated from the labels table. But this caused more problems than it solved, because it greatly complicated the code for searching and updating records, and I had to write extra methods to add new labels to the labels table.
So I decided to get rid of the extra table and just put the full text back into the records table. I kept the drop-down menu in the admin section, but populated it with data pulled from the records table to eliminate my original problem. To my Record model I added a public static function that selects the distinct labels, using the label text as both the key and the value for the drop-down menu, and with that I was able to greatly simplify a lot of my code.
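The function on the Record model is essentially this (the method and column names are assumptions):
public static function labelsForDropdown()
{
    // distinct label text, keyed by itself, ready for the select options
    return static::query()
        ->whereNotNull('label')
        ->distinct()
        ->orderBy('label')
        ->pluck('label', 'label');
}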
I just pushed this up a few minutes ago, and so far haven't found any problems, but that doesn't mean they aren't there. If anyone finds any errors in the Record Collection page or the API please let me know.
As a side note, having PHPUnit tests for everything already set up made this a whole lot easier. I didn't have to go through and check every possible combination of things that could be searched, or test adding and editing and all of that, because I already had the tests written. I used to rely on a QA department to regression test everything; PHPUnit makes all of that much easier, and instead of waiting days for QA people to check everything I can do it in under a minute myself.
Feb. 28, 2017, 5:03 p.m.
The first time I ever tried to install an SSL certificate on a web server was probably around 1999. At the time there were only a couple of options - Verisign and Thawte, if I remember correctly - the certificates cost a couple hundred dollars each, and you had to go through a lengthy and complicated approval process that involved compiling a lot of documentation (I remember being asked for a Dun & Bradstreet number, for one thing) and multiple phone calls, and took a couple of weeks to complete. Once the certificate was finally approved and issued, the process of installing it on the server was almost as complicated.
How times have changed. Yesterday I installed a certificate on this server. It was free and took about 15 minutes, most of which was spent trying to find the documentation. At first I was just messing around and decided to install a self-signed certificate, which was quick and easy, but having to click through the page that says "this site is insecure" was nerve-wracking, even knowing that it doesn't really mean anything. A quick Google search turned up Let's Encrypt, which offers free SSL certificates that are recognized by most browsers.
As easy as installing an SSL certificate for Apache is, I then found CertBot which makes it even easier. The main page has instructions for different OSes and servers. For Ubuntu I just installed the certbot package and ran it, it asked me what domains I wanted the certificate to cover and for my email address and then generated it.
I was a bit wary of allowing CertBot to change my Apache config so I just had it generate the certificates and did the config myself. After I had no problems, on my other server I let CertBot do the config as well and had no problems at all. And when it was done SSL just worked, I didn't have to touch the config or even restart Apache, much less provide a DUNS number. I'd like to thank the EFF and Lets Encrypt for CertBot and for making this so easy.
Labels: coding
Feb. 28, 2017, 5:02 p.m.
Since I have multiple sites that use almost the same code I have been trying to consolidate shared code into Laravel packages for ease of maintenance. This weekend I wrote my second package, escuccim/sitemap, which contains my code for generating XML sitemaps for Google. Since I have this site available in more than one language and I use subdomains to set the default language, the hardcoded sitemaps were very messy and confusing. I was able to shrink the code for each sitemap down from hundreds of lines to about 50 by putting the subdomains and their corresponding languages in a DB table and then looping through them to output the URLs and hreflang tags in the sitemap. This time the process of writing the package was quick and easy, using the same method that I struggled with last time.
Once I had that working I went back to my LaraBlog package and added translation functionality to it. I had one big problem that took me a while to figure out: it wasn't loading the translations at all, it was just displaying the key, 'escuccim::blog.key'. I researched this and found no answers, but was able to solve it by changing the namespace (or "hint") to larablog. I am not sure why this worked, but I suspect it may be because I was already using the escuccim namespace for the views and the two conflicted. Anyway, if anyone else is having this issue, try changing the namespace/hint.
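For reference, the registration and usage look like this (the paths are assumptions based on the package layout):
// in the package service provider's boot()
$this->loadTranslationsFrom(__DIR__.'/resources/lang', 'larablog');

// in the views
trans('larablog::blog.key');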
When I had the blog package translating properly I deleted the code this site was using for the blog and the sitemap and replaced it with the new packages. So far everything seems fine, but I will give it a day or two to turn up any issues before I start using the packages in other places.
I have a few other things I want to put into packages, and I just have to say that Composer makes my life so much easier! Instead of having to go through my code line by line to copy changes from one place to another while avoiding any functionality that differs from one project to the next I just update the package and then composer does the rest!
Feb. 28, 2017, 4:47 p.m.
I was just dealing with an issue where I wanted to create routes from the database. The site has pages which are contained in a database table, and I had a route which took the name or id as a parameter and rendered the appropriate page. Of course it doesn't really look nice if you have to go to /pages/about; a more intuitive way would be just /about, so I was trying to figure out how to accomplish that.
I tried getting the pages from the database and creating the routes dynamically, but that wasn't working because the route still needs to pass a parameter to the controller. I could have gotten the URL from within the controller and used that, but I found an easier and cleaner way.
At the very end of the web.php routes file I added:
Route::get('/{slug}', 'PagesController@show');
When Laravel handles a request it goes through the routes file and tries to find a match; when it finds one it stops and executes it. So by having this route at the end of the file it will only match requests that haven't already been matched. For any route that isn't already defined it will call PagesController@show and pass it $slug, which is exactly what the old route did:
Route::get('/pages/{slug}', 'PagesController@show');
Except this route gives me a nice, clean URI instead of a clunky, ugly one.
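The controller side stays the same; for completeness, a sketch assuming a Page model with a slug column:
public function show($slug)
{
    // firstOrFail() returns a 404 for any slug that isn't in the table,
    // which matters now that this route catches everything unmatched
    $page = Page::where('slug', $slug)->firstOrFail();

    return view('pages.show', compact('page'));
}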
Feb. 21, 2017, 8:59 a.m.
It's now been a couple of weeks since I started adding Structured Data to this site and Google has started to index some of the structured data. However, Google Search Console still shows no Rich Cards. I am not sure why, it may be that Googlebot hasn't yet gotten around to that. Or it could be that Google only creates Rich Cards for certain types of Structured Data. The documentation I found was mostly from when Google started to introduce the Rich Cards and it said they would only be generated for specific types of content - Recipes, Movies, Reviews, News Articles and a few other types. I do not know if Google has started to implement Rich Cards for other types of structured data or not. I don't particularly care about having Rich Cards displayed, I mostly just wanted to figure out how to use them.
So, while I still have no Rich Cards having the Structured Data can't hurt, and Google has started to index that and it shows up in the Structured Data report. The AMP pages have also started to be indexed, although I am seeing some inconsistencies in the Search Console Reports. It could just be that Googlebot needs more time to index things. Who knows?
Feb. 14, 2017, 2:34 p.m.
I finally got the AMP forms working as expected. It was a bit tricky to figure out so I will outline the issues I encountered and how I solved them. The situation I was working with was making a comments form for the AMP version of my blog pages.
The first issue I had to deal with was that a user can't leave a comment unless they are logged in. In the rest of the app I use the session to determine if the user is logged in, but AMP has its own protocol for this, which involves making AJAX requests to a page that returns a JSON response indicating whether the user is logged in. In this case, in the controller I simply do an Auth::check() and return a JSON response depending on the result.
The issues arose from the fact that AMP requires specific response headers, which took me a while to figure out how to set properly. I wasn't able to find much documentation on these headers, but I eventually worked out the proper values.
The headers required were:
The latter two headers need to have specific values; although the values ended up being the same in most cases, I set them separately to make sure errors won't occur.
The value for Access-Control-Allow-Origin needs to be the "origin" header made in the request, which I get with:
$request->header('origin')
The value for AMP-Access-Control-Allow-Source-Origin needs to reflect the value passed to the request in the URL, as a parameter named __amp_source_origin.
The authorization page can return a variety of values to indicate whether the user has a subscription, if they can view a specific number of free articles, and what they have access to. But in my case all I need to know is whether they are logged in or not, so I just return the JSON data:
{loggedIn: true}
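Putting the check and the headers together, the authorization action is roughly this (a sketch using the Auth facade; only the two headers named above are shown, and your setup may need others):
public function ampAuth(Request $request)
{
    return response()
        ->json(['loggedIn' => Auth::check()])
        ->header('Access-Control-Allow-Origin', $request->header('origin'))
        ->header('AMP-Access-Control-Allow-Source-Origin', $request->input('__amp_source_origin'));
}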
To enable content being displayed differently based on authorization you need to include the following scripts:
<script async custom-element="amp-access" src="https://cdn.ampproject.org/v0/amp-access-0.1.js"></script>
<script async custom-element="amp-analytics" src="https://cdn.ampproject.org/v0/amp-analytics-0.1.js"></script>
You also need to include the following JSON configuration in a script tag (with id="amp-access" and type="application/json") to tell the amp-access script what to do and where to get the info:
{
    "authorization": "[Auth URI]",
    "noPingback": "true",
    "login": {
        "sign-in": "[Login URI]",
        "sign-out": "[Logout URI]"
    },
    "authorizationFallbackResponse": {
        "error": true,
        "loggedIn": false
    }
}
Where [Auth URI] is the URI described above that returns whether the user is logged in; [Login URI] is the URI that lets the user log in; and [Logout URI] is the URI that lets the user log out. All URIs must either be HTTPS or protocol-relative (//), or AMP will complain about them and won't function properly.
Then the following code is included in the template:
<span amp-access="NOT loggedIn" role="button" tabindex="0" amp-access-hide>
<button on="tap:amp-access.login-sign-in" class="btn btn-xs btn-primary comment-button">Login</button>
Please login to comment<br><BR>
</span>
<span amp-access="loggedIn">
@include('amp._commentForm')
</span>
The amp-access attribute in the span tells the page NOT to display the section if the user is loggedIn - presumably you could vary this to reference other data returned by the auth page. The on attribute of the button tells the page to reference the login:sign-in attribute of the amp-access script when it is tapped, so it will launch the [login URI] when the button is clicked. And finally the amp-access="loggedIn" attribute says that if the user IS logged in the commentForm will be included.
For me the most complicated part was figuring out the response headers required and their values, once I got that figured out the rest worked pretty easily. The next step was getting the actual form to submit and update the page properly. I'll write about that in the next post.
Feb. 12, 2017, 12:08 p.m.
I was having a hard time with the Laravel auth package. With the out-of-the-box Laravel Auth, if you try to access a page you don't have access to, Auth redirects you to the login page and then, after a successful login, redirects you back to the page you were trying to access.
This works fine. But if I went to a page and then clicked on login, it would redirect me back to a page specified in the Auth controller instead of to the page I was on before I clicked login. I searched for a while and found some info, but not much addressing this specific issue.
I finally found this thread on Laracasts which gives a simple and easy solution to the problem.
The solution is to override the login form method in the LoginController.php in the app directory. I added this function:
public function showLoginForm()
{
    if (!session()->has('url.intended')) {
        session()->put('url.intended', url()->previous());
    }
    return view('auth.login');
}
This pushes the previous page onto the session as url.intended, which is the same thing the middleware does. But this does it in all cases, not just when the middleware catches an auth error. After login the Auth controllers now send you back to url.intended instead of to the default page specified in $redirectTo.
Feb. 12, 2017, 12:08 p.m.
I just updated the search of my records here so it loads the results using Ajax instead of refreshing the whole page. Everything seemed to work fine, but then I noticed that it broke the Laravel pagination. I included the pagination in the section of the page reloaded by searches and sorts so it would update appropriately, but then the page links loaded the raw results instead of loading them into the div on the page where they were supposed to go.
This was a bit tricky to solve. What I ended up doing was exporting the pagination views to my resources directory, and then editing it there. To each of the pagination links I added two things:
Then I added a script to the page that triggers when you click an element with the page-link class, gets the page to be loaded from the data-val attribute, and submits that to the script that loads the appropriate page with the appropriate variables. Then I did the same thing for the sort links - instead of having each trigger its own script I made one script triggered by a click on the class and put the data in data-val.
Feb. 12, 2017, 12:06 p.m.
I decided I wanted to backup my database somewhere other than my server. All of my code is in git so the only thing that could be lost in case of server errors is the database. To start I wrote a little shell script to dump the database using mysqldump. I wasn't sure where to put the SQL file to keep it off-server. My first thought was to put it in git, which was easy to do in the script. So I updated the script to add the file to git, commit the changes and push the repo up.
After a bit more thought I decided that might not be the best way to do it. It worked fine, but my usual workflow is I make all changes locally and then push it to git and then pull to production - I don't make any changes on the production server unless absolutely necessary. While adding files to git that don't exist in my dev environment shouldn't really cause any problems, I thought there must be a better way.
So I decided to put the dump file into an Amazon S3 bucket. Laravel can use S3 as a filesystem, as documented here, but I had tried to use this before and not had much luck. I saw that Amazon has a PHP package to interact with S3, the AWS SDK for PHP, so I thought I would try that out. After a little more digging I found that Amazon also has a package specifically for Laravel, which is located here. That turned out to be the winner. As opposed to trying to read pages of documentation for Laravel's filesystem or the Amazon SDK, all I needed was a few lines of code and I was up and running. As a note, this package keeps the Amazon keys and secrets in the .env file, which is a lot better than keeping them in the filesystems.php config file like Laravel does. If you are going to use Laravel's S3 filesystem I suggest you update filesystems.php to pull them from the .env.
Now that I was able to upload files to S3 from a browser, the next step was to create an artisan command that I could add to my shell script. The Laravel documentation for this was clear and easy to follow. The only problem I had was a typo that for some reason didn't throw an error locally, but did on my production server. Other than that this is tested and working.
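The command essentially boils down to a putObject call; here is a sketch using the underlying AWS SDK client directly (the signature, option names and region handling are assumptions, not necessarily what's in the package):
protected $signature = 'backup:s3 {file} {--bucket=}';

public function handle()
{
    // credentials are picked up from the AWS_* variables in .env / the environment
    $client = new \Aws\S3\S3Client([
        'version' => 'latest',
        'region' => env('AWS_REGION', 'eu-west-1'),
    ]);

    $client->putObject([
        'Bucket' => $this->option('bucket'),
        'Key' => basename($this->argument('file')),
        'SourceFile' => $this->argument('file'),
    ]);
}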
I had considered using S3 for this site in the past, but decided not to since I had problems with the Laravel S3 filesystem. Now that I've integrated with S3 so easily I may revisit that decision.
Labels: coding
Feb. 12, 2017, 12:06 p.m.
I initially ran into problems pulling the data from Discogs because I was using a package called Laracurl that provides a Laravel wrapper around cURL, and it didn't provide a header that Discogs needed. So I made a change to the package and got it working. After someone advised me that Guzzle was a better option than cURL I switched to that, and in the process rewrote my matching code. After running the new, improved, streamlined code I have now matched all but 250 of my records to the Discogs data, and 50 of those unmatched records are white labels which may not be matchable. So I am pretty happy with that ratio and will be moving on to my next project now.
Almost all of my records now have a link to discogs and a thumbnail pulled from discogs in the record detail page. If anyone wants to see additional info on the records such as tracklisting, year of release, or anything else, that info will be available on discogs. I didn't see any need to duplicate that data locally.
The code I wrote to automatically match my records to discogs did not match everything 100% accurately, and I tried to review the matches to make sure they were accurate, but I may have missed a couple here and there.
Thanks to Discogs.com for making a great API.
Feb. 12, 2017, 12:01 p.m.
I decided to try to translate this site into French, given that I live in the French-speaking part of Switzerland. Laravel has a lot of great tools for localization built-in, but there were a few things sorely lacking. Laravel, by default, has localization files in /resources/lang/en. Each file is just an array with a key and the translated text as the value. If you want to add a new language you just copy the files over into a new directory, in this case /fr, and translate the text directly in there. In the views instead of typing in the text directly you call trans('file.key') and it pulls the text for 'key' out of the 'file.php' in the appropriate language directory. This couldn't be any easier.
The hard part was when I started trying to figure out how to set the language to be displayed automatically. Laravel pulls this value from config/app.php, and you can change this value easily, but because Laravel is RESTful it has to be done on every request. So I decided to stick the actual language in a session variable and then change the value in the config array if needed.
I tried to make a middleware to do this on each request, but this didn't work because it seemed as if the session either wasn't saving from middleware, or possibly wasn't initialized yet in the middleware. More on this below. So I abandoned the middleware route and added a function in my controller that sets the language thusly:
App::setLocale( session('locale') ? session('locale') : config('app.locale'));
This worked fine; I just need to make sure to call the function every time a page may need to be translated. My next step was to try to add an 'fr.' subdomain that would automatically set the language to French. You can do this in the web.php routes file, but from what I can tell it needs to be done on every single route, which seemed like an awful lot of work for something that should be pretty easy.
So I went back to the middleware and created a middleware called SetLanguage that I added to app\Http\Kernel.php so it runs on every request. The middleware is quite simply this:
$pieces = explode('.', $request->getHost());
if ($pieces[0] == 'fr') {
    session(['locale' => 'fr']);
}
return $next($request);
And this works fine. I think the problems I had with the middleware the first time I tried it were because Laravel has changed how sessions are handled in recent releases, and the Session facade no longer works, or works differently. Instead you now use the session() helper function or call $request->session() to modify the session. I had been trying to use the Session facade.
One thing that seems a bit odd is that since the middleware runs on every request, it should translate every page called from the fr. subdomain to French. In actuality it initially sets the language to French, but if you change the language using the drop-down menu it keeps your selection. This doesn't seem right, but in this case the actual behavior makes more sense than the expected behavior, so I am ignoring this bug.
Once I got this figured out, the translation was a simple matter of replacing text in my views with references to the lang files, which went smoothly, although I did have to spend some time trying to figure out how to translate some technical terms into French, and still am not sure I have them all translated properly.
Labels: coding , laravel , localization
Feb. 12, 2017, 12:01 p.m.
After I had everything written and working I decided to go back and try to figure out why I couldn't run my function to get the language out of session and put it into the app config globally. It didn't make sense that I needed to cut and paste the same function into every single controller. So I tried it again as a helper function and this time it works perfectly. I have no idea why it didn't work before, but it's working now. I took the function out of the controllers and replaced it with a call to the helper, which is much better because I don't need to have the same exact code repeated in 10 different places, although it is a bit frustrating that I don't know why it didn't work at first.
I also added a call to setlocale() in the helper function which allows dates to be localized using strftime() instead of date(). I spent a while trying to get this working - I had to add the locales to the server using:
dpkg-reconfigure locales
And select the locales you want to use and then restart Apache. I wasn't able to get the date localization working on my local dev environment, for which I am using Homestead. I am still not sure why, the main difference between my production server and my dev environment is that the former uses Apache and the latter Nginx, so maybe it has something to do with that. As much as I hate not knowing why things don't work that should work, I'm not going to spend more time trying to figure it out since it is working here.
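On the PHP side, the helper just does something like this (the locale string is an assumption; it has to match one of the locales generated on the server):
setlocale(LC_TIME, 'fr_FR.utf8');
echo strftime('%e %B %Y'); // e.g. "12 février 2017" instead of the English date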
Labels: coding , localization
Oct. 28, 2016, 9:45 a.m.
It was a bit tricky for me to get the archive list on the right here working properly. It was easy to do in normal PHP, but I wanted to stay within a strict MVC model where all the processing is done in the controller and the view just displays the data. I was having a hard time figuring out how to properly group the items without setting variables in the view.
I ended up building an array with three nested levels in the controller, as follows:
I start with my array with the $year as key and an empty array as the value. Then for each month I push on an array with $month as key and an empty array as the value. Then for each post I push on an array with two values - the post $title and $slug. This array is passed into the view.
To display it I just do three nested foreach loops:
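A sketch of the view side in Blade, using the structure described above (the $archives variable name is an assumption):
@foreach ($archives as $year => $months)
    {{ $year }}
    @foreach ($months as $month => $posts)
        {{ $month }}
        @foreach ($posts as $post)
            <a href="/blog/{{ $post['slug'] }}">{{ $post['title'] }}</a>
        @endforeach
    @endforeach
@endforeach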
Simple, clean and easy! Much simpler than the other ways to do this I found online.
Labels: coding