Nn Very Young Models
Note: this post was originally written in June 2016. It is now very outdated. Please see this guide to fine-tuning for an up-to-date alternative, or check out chapter 8 of my book "Deep Learning with Python (2nd edition)".
In this tutorial, we will present a few simple yet effective methods that you can use to build a powerful image classifier, using only very few training examples --just a few hundred or thousand pictures from each class you want to be able to recognize.
In our examples we will use two sets of pictures, which we got from Kaggle: 1000 cats and 1000 dogs (although the original dataset had 12,500 cats and 12,500 dogs, we just took the first 1000 images for each class). We also use 400 additional samples from each class as validation data, to evaluate our models.
That is very few examples to learn from, for a classification problem that is far from simple. So this is a challenging machine learning problem, but it is also a realistic one: in a lot of real-world use cases, even small-scale data collection can be extremely expensive or sometimes near-impossible (e.g. in medical imaging). Being able to make the most out of very little data is a key skill of a competent data scientist.
A message that I hear often is that "deep learning is only relevant when you have a huge amount of data". While not entirely incorrect, this is somewhat misleading. Certainly, deep learning requires the ability to learn features automatically from the data, which is generally only possible when lots of training data is available --especially for problems where the input samples are very high-dimensional, like images. However, convolutional neural networks --a pillar algorithm of deep learning-- are by design one of the best models available for most "perceptual" problems (such as image classification), even with very little data to learn from. Training a convnet from scratch on a small image dataset will still yield reasonable results, without the need for any custom feature engineering. Convnets are just plain good. They are the right tool for the job.
But what's more, deep learning models are by nature highly repurposable: you can take, say, an image classification or speech-to-text model trained on a large-scale dataset then reuse it on a significantly different problem with only minor changes, as we will see in this post. Specifically in the case of computer vision, many pre-trained models (usually trained on the ImageNet dataset) are now publicly available for download and can be used to bootstrap powerful vision models out of very little data.
In our case we will use a very small convnet with few layers and few filters per layer, alongside data augmentation and dropout. Dropout also helps reduce overfitting, by preventing a layer from seeing the exact same pattern twice, thus acting in a way analogous to data augmentation (you could say that both dropout and data augmentation tend to disrupt random correlations occurring in your data).
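To make the analogy concrete, here is a minimal sketch of both ideas in plain Python. It is an illustration only: the function names are hypothetical, an "image" is just a list of pixel rows, and the dropout shown is the common "inverted" formulation (surviving units are rescaled so the expected activation is unchanged).

```python
import random


def random_horizontal_flip(image, rng, prob=0.5):
    """Mirror an image (a list of pixel rows) left-to-right with probability `prob`."""
    if rng.random() < prob:
        return [row[::-1] for row in image]
    return [row[:] for row in image]


def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]


rng = random.Random(42)
image = [[1, 2, 3], [4, 5, 6]]
# prob=1.0 forces the flip so the effect is visible.
flipped = random_horizontal_flip(image, rng, prob=1.0)
# Each call draws a fresh mask, so the next layer rarely sees the same pattern twice.
dropped = dropout([1.0, 1.0, 1.0, 1.0], rate=0.5, rng=rng)
```

Both functions inject randomness at training time only; at evaluation time the image and the activations would be passed through unchanged.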
Our first model is a simple stack of 3 convolution layers, each with a ReLU activation and followed by a max-pooling layer. This is very similar to the architectures that Yann LeCun advocated in the 1990s for image classification (with the exception of ReLU).
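A minimal sketch of such a stack follows. The hyperparameters are assumptions that the text above does not state: 150x150 RGB inputs, 3x3 kernels, 2x2 pooling, 32/32/64 filters, and a small dense head with dropout. The first function only traces tensor shapes and needs nothing beyond the standard library; the second builds the model with Keras and imports TensorFlow lazily, so it needs TensorFlow installed only when called.

```python
# Assumed hyperparameters (not given in the text).
INPUT_SHAPE = (150, 150, 3)
FILTERS = (32, 32, 64)


def feature_map_shape(input_shape=INPUT_SHAPE, filters=FILTERS, kernel=3, pool=2):
    """Trace the tensor shape through the conv/pool stack without Keras."""
    height, width, channels = input_shape
    for n_filters in filters:
        # An unpadded ('valid') convolution shrinks each side by kernel - 1.
        height, width = height - kernel + 1, width - kernel + 1
        # Non-overlapping max-pooling divides each side, rounding down.
        height, width = height // pool, width // pool
        channels = n_filters
    return height, width, channels


def build_small_convnet(input_shape=INPUT_SHAPE, filters=FILTERS):
    """Build the same stack with Keras (requires TensorFlow)."""
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([keras.Input(shape=input_shape)])
    for n_filters in filters:
        model.add(layers.Conv2D(n_filters, (3, 3), activation="relu"))
        model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    # Assumed classifier head: flatten, one hidden layer, dropout, sigmoid output.
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="rmsprop", metrics=["accuracy"])
    return model


final_shape = feature_map_shape()
```

Under these assumptions each block shrinks the feature map (150 to 74 to 36 to 17 pixels per side), so the dense head sees a 17 x 17 x 64 volume.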
Note that the variance of the validation accuracy is fairly high, both because accuracy is a high-variance metric and because we only use 800 validation samples. A good validation strategy in such cases would be to do k-fold cross-validation, but this would require training k models for every evaluation round.
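As a rough illustration of both points, the sketch below computes the binomial standard error of an accuracy estimate and generates k-fold index splits in plain Python. The 0.8 accuracy is a hypothetical value chosen for the arithmetic, not a result from this post, and the helper names are made up for this example. In practice the indices would be shuffled before splitting.

```python
import math


def accuracy_standard_error(accuracy, n_samples):
    """Binomial standard error of an accuracy measured on n_samples examples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Hypothetical accuracy of 0.8 measured on 800 validation samples.
standard_error = accuracy_standard_error(0.8, 800)
folds = list(k_fold_indices(10, 3))
```

With 800 samples and an accuracy around 0.8, the standard error is about 1.4 percentage points, so differences of a point or two between runs are within noise.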
Our strategy will be as follows: we will only instantiate the convolutional part of the model, everything up to the fully-connected layers. We will then run this model on our training and validation data once, recording the output (the "bottleneck features" from the VGG16 model: the last activation maps before the fully-connected layers) in two numpy arrays. Then we will train a small fully-connected model on top of the stored features.
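The extract-once, train-on-cache pattern can be sketched without any deep learning library. Everything below is a stand-in: `frozen_features` is a hypothetical fixed function playing the role of the VGG16 convolutional base, the "images" are synthetic lists of pixel values, plain lists replace the numpy arrays, and the top model is a logistic regression rather than a fully-connected network.

```python
import math
import random


def frozen_features(image):
    """Stand-in for a frozen convolutional base: a fixed, untrained mapping."""
    mean = sum(image) / len(image)
    spread = max(image) - min(image)
    return [mean, spread]


def train_top_classifier(features, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression 'top model' on cached features with SGD."""
    weights, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(w * v for w, v in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * grad * v for w, v in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


def predict(weights, bias, x):
    return int(sum(w * v for w, v in zip(weights, x)) + bias > 0)


rng = random.Random(0)
images, labels = [], []
for i in range(40):
    label = i % 2
    low = 0.6 if label else 0.0  # synthetic data: class 1 images are brighter
    images.append([rng.uniform(low, low + 0.4) for _ in range(16)])
    labels.append(label)

# Step 1: run the expensive frozen base exactly once per image and cache the result.
cached = [frozen_features(img) for img in images]
# Step 2: train only the small classifier; every epoch reuses the cache.
weights, bias = train_top_classifier(cached, labels)
accuracy = sum(predict(weights, bias, x) == y for x, y in zip(cached, labels)) / len(labels)
```

The design point is that the costly forward pass through the frozen base happens once per image, while the cheap classifier can be trained for as many epochs as needed. The trade-off is that cached features cannot be combined with on-the-fly data augmentation.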
To further improve our previous result, we can try to "fine-tune" the last convolutional block of the VGG16 model alongside the top-level classifier. Fine-tuning consists in starting from a trained network, then re-training it on a new dataset using very small weight updates. In our case, this can be done in 3 steps:
Objective: To investigate the relationship between plasma Golgi protein 73 (GP73) levels and the occurrence and development of non-alcoholic fatty liver disease (NAFLD), and to establish a diagnostic model combining GP73 with lipid metabolism indicators in order to clarify its diagnostic efficacy and clinical application value for NAFLD.
Methods: 225 patients with NAFLD [diagnosed by ultrasound, transient elastography (FibroScan502) and, in some patients, liver biopsy] and 108 healthy controls were selected from the Department of Hepatology and Physical Examination Center of Integrated Traditional Chinese and Western Medicine, The Third Hospital of Hebei Medical University. Clinical data, routine peripheral blood and serum biochemical test results were collected. The plasma GP73 level was measured by enzyme-linked immunosorbent assay. SPSS 21.0 statistical software was used for statistical analysis. A binary logistic regression model was used to build the NAFLD diagnostic model. Receiver operating characteristic curves were used to evaluate the diagnostic efficacy of the constructed model.
Results: NAFLD incidence was significantly reduced in the younger age group, mostly in young and middle-aged males. However, NAFLD incidence increased with increasing age in females. The analysis of age composition showed that the average age for NAFLD onset was 20-50 years old, and the incidence rate was as high as 47% among those 30-39 years old, but was significantly lower in those over 60 years old (4.00%). GP73 was an independent risk factor for the occurrence and development of NAFLD. The diagnostic models GBT, GB and GT were established by combining GP73 (G) with body mass index (BMI, B) and serum triglyceride (TG, T), and the areas under the curves of the GBT, GB and GT models were 0.969, 0.937 and 0.909, respectively. The sensitivities were 84.90%, 77.80% and 84.00%, and the specificities were 95.40%, 95.40% and 82.40%, respectively, P
Conclusion: NAFLD is more common in young and middle-aged males, but with advancing age the incidence in female patients gradually increases. Plasma GP73 levels are related to the occurrence and development of NAFLD. The GBT model can be used as a new model for non-invasive diagnosis and as one of the indicators for clinical evaluation of diagnostic efficacy in NAFLD.
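For readers unfamiliar with the metrics quoted above, the following sketch shows how sensitivity, specificity and the area under the ROC curve are computed from model scores. The labels and scores are made-up toy values for illustration only; they are not data from the study, and the function names are hypothetical.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = disease present, 0 = healthy control; scores are model outputs.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1]
sensitivity, specificity = sensitivity_specificity(toy_labels, toy_scores, threshold=0.5)
auc = roc_auc(toy_labels, toy_scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds.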
Models proposed for theoretical calculations and based on wide-range equations of state of matter are analyzed and compared. It is shown that all previously proposed models of nonideal plasma behavior fall short in terms of accuracy for the equation of state in the range above critical.
This page shows the most frequent use-cases when using the library. The models available allow for many different configurations and a great versatility in use-cases. The simplest ones are presented here, showcasing usage for tasks such as image classification, question answering, sequence classification, named entity recognition and others.
These examples leverage auto-models, which are classes that will instantiate a model according to a given checkpoint, automatically selecting the correct model architecture. Please check the AutoModel documentation for more information. Feel free to modify the code to be more specific and adapt it to your specific use-case.
Language modeling is the task of fitting a model to a corpus, which can be domain specific. All popular transformer-based models are trained using a variant of language modeling, e.g. BERT with masked language modeling, GPT-2 with causal language modeling.
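The difference between the two objectives is easiest to see in how training examples are built from a token sequence. The sketch below is a simplified illustration in plain Python, not the library's implementation: the function names are hypothetical, tokens are plain strings, and the masking replaces every selected token with a `[MASK]` placeholder (the full BERT recipe is more elaborate).

```python
import random

MASK = "[MASK]"


def causal_lm_examples(tokens):
    """Causal objective: each prefix is used to predict the next token."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_lm_example(tokens, mask_prob=0.15, rng=random):
    """Masked objective: hide random tokens; targets are the hidden originals."""
    inputs, targets = [], {}
    for position, token in enumerate(tokens):
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets[position] = token
        else:
            inputs.append(token)
    return inputs, targets


tokens = "the cat sat on the mat".split()
causal_pairs = causal_lm_examples(tokens)
masked_inputs, masked_targets = masked_lm_example(tokens, mask_prob=0.5, rng=random.Random(1))
```

A causal model only ever sees context to the left of the token it predicts, while a masked model sees context on both sides of each hidden position.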
Language modeling can be useful outside of pretraining as well, for example to shift the model distribution to be domain-specific: using a language model trained over a very large corpus, and then fine-tuning it to a news dataset or on scientific papers, e.g. LysandreJik/arxiv-nlp.
In text generation (a.k.a. open-ended text generation) the goal is to create a coherent portion of text that is a continuation from the given context. The following example shows how GPT-2 can be used in pipelines to generate text. As a default all models apply Top-K sampling when used in pipelines, as configured in their respective configurations (see gpt-2 config for example).
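To make the Top-K sampling step concrete, here is a minimal sketch in plain Python. It is independent of the library and is not how pipelines implement it internally: the function name and the logits are hypothetical. It keeps the K highest-scoring candidate tokens, renormalizes their probabilities with a softmax, and samples among them.

```python
import math
import random


def top_k_sample(logits, k, rng=random):
    """Sample a token id from the k highest-scoring entries of `logits`."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and len(logits)")
    # Keep the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only (subtract the max for numerical stability).
    peak = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


# Hypothetical next-token scores over a 5-token vocabulary.
logits = [0.1, 2.5, -1.0, 1.7, 0.3]
rng = random.Random(0)
sampled_ids = {top_k_sample(logits, k=2, rng=rng) for _ in range(200)}
```

With `k=1` this reduces to greedy decoding; larger values of `k` trade determinism for diversity while still excluding the unlikely tail of the distribution.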
Text generation is currently possible with GPT-2, OpenAI-GPT, CTRL, XLNet, Transfo-XL and Reformer in PyTorch, and for most models in TensorFlow as well. As can be seen in the example above, XLNet and Transfo-XL often need to be padded to work well. GPT-2 is usually a good choice for open-ended text generation because it was trained on millions of webpages with a causal language modeling objective.