A guide that explains what the documentation doesn’t tell you
Introduction
Keras does a great job of abstracting low-level details of neural network creation so you can focus on getting the job done. But, if you’re reading this, you’ve probably discovered that Keras’ off-the-shelf methods cannot always be used to learn your model’s parameters. Perhaps your model has a gradient that cannot be calculated through the magic of autodiff, or your loss function does not conform to the signature my_loss_fn(y_true, y_pred) mentioned in Keras’ documentation. If you found the online documentation wholly unhelpful, read on! I [hopefully] have all the answers you couldn’t find anywhere else.
Disclaimer
I don’t promise to nail all my explanations; I don’t consider myself a TensorFlow/Keras guru. Everything I’ve written here has been informed by many different pages of TensorFlow/Keras documentation and a little bit of source code inspection. If you have corrections or suggestions for improvements, I encourage you to leave them in the comments for everyone to benefit from! Finally, TensorFlow is a trademark of Google LLC and this article is neither endorsed by nor affiliated with Google LLC in any way.
Learning objectives
After following this guide, you will understand how to make a custom, subclassed Keras Model object that uses custom, subclassed Keras Layers in Python. You will be able to write your own custom loss functions that do not conform to the signature my_loss_fn(y_true, y_pred) described in Keras’ documentation. You will also be able to use custom gradients in combination with the autodifferentiation (autodiff) algorithm to optimize the model’s trainable parameters.
What will this guide demonstrate?
This guide will ultimately demonstrate how you can still use custom losses and custom gradients without necessarily abandoning the convenient keras.Model.fit method for training your neural network.
In this guide we will create a neural network model that has a single dense layer and a logistic regression output layer. Raw data will be input to the dense layer, and outputs from the dense layer will be input to the logistic regression layer for binary classification. This guide will not cover validation or model testing; it will only cover the steps involved in building and fitting the model on a training set.
Let’s get started.
Load (or create) a dataset
We’re going to use the Open Access german_credit_numeric dataset, which can be downloaded with the tensorflow_datasets library. It consists of 1000 examples of 24 features associated with a binary credit risk assessment of “good” (1) or “bad” (0).
import tensorflow as tf
import tensorflow_datasets as tfds
from typing import Optional


@tf.function
def sigmoid(x: tf.Tensor) -> tf.Tensor:
    return 1 / (1 + tf.exp(-x))


if __name__ == "__main__":
    ds = tfds.load("german_credit_numeric", split="train", as_supervised=True)
    ds = ds.shuffle(1000).batch(100).prefetch(tf.data.AUTOTUNE)
Create custom Keras Layers
Before we build the model, we should build its components. Recall that the model will consist of a dense layer which transforms observed features in our dataset to a latent representation that will serve as input to a logistic regression output layer. To demonstrate how you can mix and match custom and prebuilt Keras Layers, we’ll use Keras’ built-in keras.layers.Dense class to construct the model’s first layer, and build our own custom layer class for the logistic regression by subclassing keras.layers.Layer.
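A minimal sketch of what such a custom Logistic layer could look like, assuming a weight matrix named w, a bias named b, and the built-in tf.sigmoid in place of the sigmoid helper defined earlier. The attribute names and initializers are illustrative assumptions, not necessarily the ones used in the original listing.

```python
import tensorflow as tf


class Logistic(tf.keras.layers.Layer):
    """Hypothetical logistic regression layer: sigmoid(inputs @ w + b)."""

    def __init__(self, units: int = 1, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def get_config(self):
        # Return everything needed to recreate the layer.
        config = super().get_config()
        config.update({"units": self.units})
        return config

    def build(self, input_shape):
        # Parameter creation is deferred until the input shape is known.
        self.w = self.add_weight(
            name="w",
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            name="b", shape=(self.units,), initializer="zeros", trainable=True
        )

    def call(self, inputs):
        # Forward pass: probability of the positive class.
        return tf.sigmoid(tf.matmul(inputs, self.w) + self.b)


layer = Logistic(units=1, name="logistic_layer")
probs = layer(tf.zeros((3, 4)))  # build() runs here, on the first call
```

With an all-zero input and a zero-initialized bias, every output is sigmoid(0), which gives a quick sanity check that the forward pass is wired up as intended.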
Let’s break that custom Logistic layer down.
The get_config method
The get_config method is not strictly necessary for the layer to work, but it is necessary if you want to make a so-called “serializable” layer that can be used with the Functional API. You can use the get_config method to get the attributes necessary to recreate the layer. See the documentation.
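To illustrate what get_config buys you, here is a small sketch that round-trips a built-in Dense layer through its configuration dictionary. It uses Dense rather than the custom layer purely to stay self-contained; a custom layer would follow the same pattern as long as its get_config returns every constructor argument.

```python
import tensorflow as tf

original = tf.keras.layers.Dense(units=4, name="dense_layer")

# get_config returns a plain Python dict describing the constructor arguments.
config = original.get_config()

# from_config rebuilds an equivalent (but untrained) layer from that dict.
clone = tf.keras.layers.Dense.from_config(config)
```

Note that the configuration captures how the layer was constructed, not the values of its weights.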
The build method
Like the get_config method, the build method is not required for the layer to work, but it is the recommended way to define the layer’s trainable parameters (i.e. weights, biases). With the build method, parameter creation is deferred until the layer is first called with some input data. The build method gets the shape of the input (which may not be known beforehand) through the input_shape argument, and creates the layer’s parameters according to input_shape. For example, if the inputs argument to the layer’s call method were a tuple of tensors, Keras would assign to input_shape a collection of TensorShape objects (one for each tensor in the tuple). The build method is recommended because it allows weights to be created on the fly even when you don’t know the input shapes in advance.
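The deferred creation described above is easy to observe with a built-in layer. The sketch below assumes a batch of two examples with 24 features each, chosen only to mirror the dataset in this guide.

```python
import tensorflow as tf

dense = tf.keras.layers.Dense(units=4)

# Before the first call the layer owns no parameters at all.
weights_before = len(dense.weights)

# The first call triggers build(), which reads the input shape.
_ = dense(tf.zeros((2, 24)))

# The kernel now has one row per input feature and one column per unit.
kernel_shape = tuple(dense.kernel.shape)
weights_after = len(dense.weights)  # kernel and bias
```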
The call method
This calculates a forward pass of inputs through the layer. Here, it returns the probabilistic output of the logistic regression.
Before we define the model class, let’s instantiate the input layer and the dense layer in the main body of the code:
features_spec, labels_spec = ds.element_spec
del labels_spec # Not used
feature_inputs = tf.keras.Input(type_spec=features_spec, name="feature_inputs")
dense = tf.keras.layers.Dense(units=4, name="dense_layer")
Notice that we created an Input object, feature_inputs, that expects a vector-valued tensor with 24 components (i.e. a training example with 24 features). The dense layer will calculate a latent representation of the input with just four features. Since we are using the latent representation as input to the logistic regression, the Logistic layer will have four trainable weights (one per input feature), as well as a single bias. Also notice that we did not instantiate a Logistic layer. We’ll leave that to the custom Model object.
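If you want to check these shapes without downloading anything, a synthetic stand-in works. The sketch below assumes integer-valued features and integer labels (the dtypes are a guess about the real dataset, not something this guide states) and reuses the same shuffle/batch/prefetch pipeline.

```python
import tensorflow as tf

# Synthetic stand-in for the credit data: 1000 examples, 24 features each.
features = tf.random.uniform((1000, 24), maxval=5, dtype=tf.int32)
labels = tf.random.uniform((1000,), maxval=2, dtype=tf.int64)

toy_ds = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(1000)
    .batch(100)
    .prefetch(tf.data.AUTOTUNE)
)

# The spec describes one batch: an unknown batch size and 24 features.
features_spec, labels_spec = toy_ds.element_spec

# Passing one batch through the dense layer yields four latent features.
dense = tf.keras.layers.Dense(units=4, name="dense_layer")
batch_features, _ = next(iter(toy_ds))
latent = dense(tf.cast(batch_features, tf.float32))
```

The leading None in the feature spec is the batch dimension; the 24 is the number of features per example, and the latent representation has four columns.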
Create the model
This is the crux of the guide. We’re going to create a subclass of keras.Model that has a custom training loop, loss function, and gradients.
The loss function will be the negative log likelihood of a target label given the associated features. The weights and bias that minimize the negative log likelihood are the logistic regression model’s optimal parameters. In other words, we are maximizing the probability that we would observe a target label given the associated features if the target label were modelled by our neural network. This is a common loss function in probabilistic machine learning. In order to maximize the log likelihood of the dataset (equivalently, minimize negative log likelihood) under the neural network model, we need to find the optimal weights and biases. We use some variant of gradient descent to find the optimal values (choose your favourite method). Of course, that requires calculating the gradients of the negative log likelihood loss with respect to the model’s trainable parameters.
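For reference, the negative log likelihood and its gradients can be written out explicitly. The notation is my own: z_i is the latent representation of example i (the output of the dense layer), w and b are the Logistic layer’s weights and bias, y_i is the true label, and N is the number of examples in the batch.

```latex
p_i = \sigma(w^\top z_i + b) = \frac{1}{1 + e^{-(w^\top z_i + b)}}

\mathcal{L}(w, b) = -\sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]

\frac{\partial \mathcal{L}}{\partial w} = \sum_{i=1}^{N} (p_i - y_i)\, z_i, \qquad
\frac{\partial \mathcal{L}}{\partial b} = \sum_{i=1}^{N} (p_i - y_i), \qquad
\frac{\partial \mathcal{L}}{\partial z_i} = (p_i - y_i)\, w
```

If the loss is averaged over the batch rather than summed, each expression is simply divided by N. The last gradient is the one that autodiff needs in order to keep propagating into the dense layer.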
Using negative log likelihood as our custom loss function poses some challenges in Keras. First of all, the negative log likelihood loss doesn’t necessarily conform to the signature my_loss_fn(y_true, y_pred) suggested for custom loss functions by the Keras documentation; in our case it is a function of input features and target labels. Thus, we cannot simply define a custom loss function for the negative log likelihood and pass it as the argument to the loss parameter of Model.compile before we train the model with Model.fit. Secondly, not all likelihood-based loss functions will have gradients that can be “magically” solved by the autodiff algorithm, and in that case you’ll need to tell TensorFlow exactly how to calculate the gradients, which gets a bit complicated when the loss function does not conform to Keras’ suggested signature. Finally, it’s common to have some early layers that learn a latent representation of your training examples’ observed features, and if autodiff can be used to solve those layers’ gradients, we’d like to use it, defining custom gradients only when necessary (it’s a lot of work!).
Let’s define a subclass of keras.Model that overcomes these challenges. First I’ll present the code, and then I’ll explain the parts.
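What follows is a minimal sketch of such a model under several assumptions of mine, not a reproduction of the original listing. The Logistic layer is a compact hypothetical version; the helper method latent is my own; the loss is the batch-mean negative log likelihood written in a softplus form for numerical stability; and, unlike the two-argument logistic_loss(x, y) discussed in this guide, the sketch passes the Logistic layer’s weights and bias into the decorated function as explicit arguments to keep the gradient bookkeeping simple. The toy data at the bottom is synthetic.

```python
import tensorflow as tf


class Logistic(tf.keras.layers.Layer):
    """Hypothetical logistic regression layer: sigmoid(x @ w + b)."""

    def __init__(self, units=1, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(name="w", shape=(input_shape[-1], self.units),
                                 initializer="random_normal", trainable=True)
        self.b = self.add_weight(name="b", shape=(self.units,),
                                 initializer="zeros", trainable=True)
        super().build(input_shape)  # marks the layer as built

    def call(self, inputs):
        return tf.sigmoid(tf.matmul(inputs, self.w) + self.b)


class CustomModel(tf.keras.Model):
    def __init__(self, nn_block=None, **kwargs):
        super().__init__(**kwargs)
        self.nn_block = nn_block  # optional early block of layers
        self.logistic = Logistic(units=1, name="logistic_layer")
        self.loss_tracker = tf.keras.metrics.Mean(name="loss")

    def latent(self, features):
        x = tf.cast(features, tf.float32)
        return x if self.nn_block is None else self.nn_block(x)

    def call(self, inputs, training=None):
        return self.logistic(self.latent(inputs))

    def loss_fn(self, features, labels):
        y = tf.reshape(tf.cast(labels, tf.float32), (-1, 1))
        z = self.latent(features)  # this part is differentiated by autodiff
        if not self.logistic.built:
            self.logistic.build(z.shape)
        w = tf.convert_to_tensor(self.logistic.w)
        b = tf.convert_to_tensor(self.logistic.b)

        @tf.custom_gradient
        def logistic_loss(z, w, b):
            logits = tf.matmul(z, w) + b
            # Mean negative log likelihood in a numerically stable form.
            nll = tf.reduce_mean(tf.nn.softplus(logits) - y * logits)

            def grad(upstream):
                # Hand-written gradients: d(nll)/d(logits) = (p - y) / N.
                n = tf.cast(tf.shape(y)[0], tf.float32)
                err = (tf.sigmoid(logits) - y) / n
                dz = tf.matmul(err, w, transpose_b=True)
                dw = tf.matmul(z, err, transpose_a=True)
                db = tf.reduce_sum(err, axis=0)
                return upstream * dz, upstream * dw, upstream * db

            return nll, grad

        return logistic_loss(z, w, b)

    def train_step(self, data):
        features, labels = data
        with tf.GradientTape() as tape:
            loss = self.loss_fn(features, labels)
        grads = tape.gradient(loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        self.loss_tracker.update_state(loss)
        return {"loss": self.loss_tracker.result()}

    @property
    def metrics(self):
        return [self.loss_tracker]


# Toy stand-in data (not the credit dataset): 200 examples, 24 features.
toy_x = tf.random.normal((200, 24))
toy_y = tf.cast(toy_x[:, 0] > 0, tf.int64)
toy_ds = tf.data.Dataset.from_tensor_slices((toy_x, toy_y)).batch(50)

dense = tf.keras.layers.Dense(units=4, name="dense_layer")
model = CustomModel(nn_block=dense, name="nn_logistic_model")
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1))
history = model.fit(toy_ds, epochs=3, verbose=0)
```

Because the gradient with respect to z is returned by the hand-written grad function, autodiff can continue from there into the dense layer, so all four trainable tensors (dense kernel and bias, logistic weights and bias) receive updates.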
Let’s break the model down.
The __init__ method
The model is designed to accept an early block of layers, nn_block, that learns a latent representation of the raw input data. In this guide nn_block will be an instance of keras.layers.Dense, but it can be any keras.layers.Layer object. It can also be omitted entirely if you just want to perform logistic regression on raw input data.
The Model automatically initializes a Logistic layer that outputs a single value (units=1) that expresses the probability of a label given the input features.
Finally, the average per-sample losses are tracked by loss_tracker.
The loss function
The objective here is to define a loss function that allows us to use autodiff to calculate loss gradients with respect to the Dense layer’s trainable parameters, but use a custom-defined gradient calculation for the Logistic layer’s parameters. To do that, we must isolate the part of the loss calculation that involves parameters for the custom gradient from the part that involves the parameters for autodiff. Isolation is achieved by nesting a function for the custom component inside a broader loss function that provides a complete path from model input to model output. Through this path, all layers’ trainable parameters can be optimized.
Let’s focus on the inner function, logistic_loss(x, y), first. The first argument, x, represents the input tensor to the Logistic layer (i.e. the output from the earlier Dense layer). The second argument, y, represents a training example’s true label. This function isolates the part of the loss calculation involving parameters that we want to learn through a custom gradient. The @tf.custom_gradient decorator signals TensorFlow to use custom-defined formulae instead of autodiff to calculate the loss’ gradients with respect to the trainable parameters in the decorator’s scope. Therefore, it is important that a custom gradient be specified for all trainable parameters in the decorator’s scope. This is why we defined logistic_loss in terms of the inputs to the Logistic layer rather than in terms of inputs to earlier layers: we effectively restrict the scope of the decorator to the weights and biases we wish to learn using custom gradients, and leave the rest of the gradient calculations to autodiff.
The custom gradient is defined inside logistic_loss, as required by the decorator (see the TensorFlow documentation for details).
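To make the decorator’s contract concrete, here is a small standalone sketch, run eagerly and outside any Keras model. The variables w and b are hypothetical stand-ins for the Logistic layer’s parameters. Because the decorated function reads variables, its gradient function accepts a variables keyword and returns two things: gradients for the inputs and gradients for the variables. The sketch then compares the hand-written gradients with what autodiff produces for the same loss.

```python
import tensorflow as tf

# Hypothetical stand-ins for the Logistic layer's parameters.
w = tf.Variable([[0.5], [-0.25]], name="w")
b = tf.Variable([0.1], name="b")


@tf.custom_gradient
def logistic_loss(x, y):
    p = tf.sigmoid(tf.matmul(x, w) + b)
    loss = -tf.reduce_sum(y * tf.math.log(p) + (1 - y) * tf.math.log(1 - p))

    def grad(upstream, variables=None):
        err = p - y
        # Gradients with respect to the function's inputs (x and y).
        grad_x = upstream * tf.matmul(err, w, transpose_b=True)
        grad_y = tf.zeros_like(y)  # labels are treated as constants
        # Gradients for the captured variables, matched by identity so the
        # sketch does not depend on the order of `variables`.
        by_ref = {
            w.ref(): tf.matmul(x, err, transpose_a=True),
            b.ref(): tf.reduce_sum(err, axis=0),
        }
        grad_vars = [upstream * by_ref[v.ref()] for v in variables]
        return (grad_x, grad_y), grad_vars

    return loss, grad


x = tf.constant([[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]])
y = tf.constant([[1.0], [0.0], [1.0]])

# Gradients produced by the hand-written formulae.
with tf.GradientTape() as tape:
    custom_loss = logistic_loss(x, y)
custom_grads = tape.gradient(custom_loss, [w, b])

# The same loss differentiated by autodiff, for comparison.
with tf.GradientTape() as tape:
    p = tf.sigmoid(tf.matmul(x, w) + b)
    plain_loss = -tf.reduce_sum(y * tf.math.log(p) + (1 - y) * tf.math.log(1 - p))
auto_grads = tape.gradient(plain_loss, [w, b])

max_diff = max(
    float(tf.reduce_max(tf.abs(c - a))) for c, a in zip(custom_grads, auto_grads)
)
```

For this loss the two sets of gradients should agree up to floating-point error, which makes the comparison a useful check when you write gradients by hand.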
The outer function, loss_fn, takes raw features and target labels from the training data (or validation/test data) as input. Notice that we did not wrap the outer loss_fn in a tf.custom_gradient decorator. This ensures that autodiff is used to calculate the loss’ gradients with respect to the remaining parameters that are not within the scope of logistic_loss. The outer function returns the negative log likelihood calculated by the inner logistic_loss function.
You may have noticed that only the inputs to the Logistic layer are necessary to calculate the negative log likelihood loss of an input batch. So why go to the trouble of nesting logistic_loss in the outer loss_fn? If you take away the outer function and instead use logistic_loss as the overall loss function, TensorFlow will warn you that the gradients of the loss with respect to the dense layer’s parameters are no longer defined. That’s because we didn’t define them under the @tf.custom_gradient decorator. The logistic layer’s parameters would be trained, but the dense layer’s parameters would not change from their initial values.
The train_step method
This method is used by the Model.fit method to update model parameters and calculate model metrics. It is the component of the custom model object that lets us use the high-level Model.fit method with our custom loss and gradients. The GradientTape context manager tracks all the gradients of the loss_fn, using autodiff where the custom gradient calculation is not used. We access the gradients associated with the trainable parameters with tape.gradient(loss, self.trainable_weights) and tell the optimizer object to use those gradients to optimize the parameters in self.optimizer.apply_gradients(zip(grads, self.trainable_weights)). Finally, the training step updates the loss_tracker with the training batch’s average loss (calculated by loss_fn).
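The tape-then-apply pattern at the heart of train_step can be tried in isolation. The sketch below assumes a single bias-free Dense layer with its kernel initialized to ones and a plain SGD optimizer, chosen so the update is easy to verify by hand.

```python
import tensorflow as tf

layer = tf.keras.layers.Dense(1, use_bias=False, kernel_initializer="ones")
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
x = tf.constant([[1.0, 2.0]])

# Record the forward pass so the tape can differentiate it.
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(layer(x))  # gradient w.r.t. the kernel is x transposed

# Same two calls as in train_step: compute gradients, then apply them.
grads = tape.gradient(loss, layer.trainable_weights)
optimizer.apply_gradients(zip(grads, layer.trainable_weights))

kernel = layer.get_weights()[0]
```

Each kernel entry moves by the learning rate times its gradient, so the kernel ends up at roughly 0.9 and 0.8.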
Train the model
We’re almost ready to train the model. First, let’s write a custom callback that prints the trainable weights after each epoch, just to verify that the parameters are being tuned:
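A minimal sketch of such a callback, assuming it simply loops over the model’s layers and prints whatever get_weights returns for each one. The throwaway one-layer model at the bottom exists only to exercise the callback and is not the model from this guide.

```python
import tensorflow as tf


class ReportWeightsCallback(tf.keras.callbacks.Callback):
    """Print every layer's parameters at the end of each epoch."""

    def on_epoch_end(self, epoch, logs=None):
        for layer in self.model.layers:
            print(f"epoch {epoch + 1}, {layer.name}: {layer.get_weights()}")


# Quick check on a throwaway one-layer model.
toy_model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
toy_model(tf.zeros((1, 3)))  # build the layer so it has weights to report
callback = ReportWeightsCallback()
callback.set_model(toy_model)
callback.on_epoch_end(epoch=0)
```

During training you would not call on_epoch_end yourself; passing the callback to Model.fit is enough.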
The callback will print an array of trainable parameters for each layer in the order they appear in the neural network. The first item in each array will be the layer’s weights, and the second item will be the biases.
Now, let’s go back to the main body of the code and train the model:
model = CustomModel(nn_block=dense, name="nn_logistic_model")
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4))
model.fit(ds, epochs=5, callbacks=[ReportWeightsCallback()])
Now you’re ready to train the whole thing end-to-end! The complete code is included at the end of this guide.
Final thoughts
You may be wondering why we subclassed keras.Model to achieve our goal. It may seem complicated to you. Why not define a loss function with a custom gradient in the Logistic layer, for example, and do away with the custom Model object altogether?
That’s what I tried to do at first, but I realized it wouldn’t work well. By telling the custom Layer to expect both features and target labels as input, one can implement a negative log likelihood loss function with a custom gradient inside the Logistic layer. This works during training when you have access to labels, but it doesn’t work well when you use the model for inference because prediction on unseen data does not involve target labels as input: inputs for inference will not have the same shape as inputs for training.
Using a model solves this problem by allowing the user to create custom training, evaluation, and prediction loops without impacting the shape of the inputs expected by the model’s individual layers. It allows us to access the features and labels necessary to calculate our custom loss during training, and use only the features to make predictions during inference.
The complete code
Sources
Dua D, Graff C. German Credit Data. 2017. UCI Machine Learning Repository. Available from: http://archive.ics.uci.edu/ml.
How to use custom losses with custom gradients in TensorFlow with Keras was originally published in Towards Data Science on Medium.
https://towardsdatascience.com/how-to-use-custom-losses-with-custom-gradients-in-tensorflow-with-keras-e87f19d13bd5?source=rss----7f60cf5620c9---4