Neural networks: representation.
This post aims to discuss what a neural network is and how we represent it in a machine learning model. Subsequent posts will cover more advanced topics such as training and optimizing a model, but I've found it's helpful to first have a solid understanding of what it is we're actually building, and to be comfortable with the matrix representation we'll use.
Prerequisites:
- Read my post on logistic regression.
- Be comfortable multiplying matrices together.
Inspiration
Neural networks are a biologically-inspired class of algorithms that attempt to mimic the functions of neurons in the brain. Each neuron acts as a computational unit, accepting input from the dendrites and outputting signal through the axon terminals. Actions are triggered when a specific combination of neurons is activated.
In essence, the cell acts as a function: we provide input (via the dendrites) and the cell churns out an output (via the axon terminals). The whole idea behind neural networks is finding a way to 1) represent this function and 2) connect neurons together in a useful way.
I found the following two graphics in a lecture on neural networks by Andrea Palazzi; they quite nicely compare biological neurons with our computational model of neurons.
To learn more about how neurons are connected and operate together in the brain, check out this video.
A computational model of a neuron
Have you read my post on logistic regression yet? If not, go do that now; I'll wait.
In logistic regression, we composed a linear model, $z = b + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$, with the sigmoid function, $\sigma(z) = \frac{1}{1 + e^{-z}}$, to build a binary classifier.
Let's try to visualize that.
The first layer contains a node for each value in our input feature vector. These values are scaled by their corresponding weight, $w_i$, and summed together along with a bias term, $b$.
The input nodes in our network visualization are all connected to a single output node, which consists of a linear combination of all of the inputs. Each connection between nodes contains a parameter, $w$, which is what we'll learn during training. Passing this linear combination through the sigmoid function gives us exactly the logistic regression model we started with.
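To make this concrete, here's a minimal NumPy sketch of a single sigmoid neuron. (The function names and numbers here are my own, purely for illustration.)

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real value into (0, 1)."""
    return 1 / (1 + np.exp(-z))

def neuron(x, weights, bias):
    """A single neuron: a linear combination of inputs passed through an activation."""
    z = np.dot(weights, x) + bias  # linear model, z = w · x + b
    return sigmoid(z)              # activation

x = np.array([0.5, 1.2, -0.3])   # input feature vector
w = np.array([0.4, -0.2, 0.1])   # one weight per input
b = 0.1                          # bias term
print(neuron(x, w, b))           # a value in (0, 1), just like logistic regression
```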
Comparison to a perceptron unit
Most tutorials introduce neural networks by way of the perceptron, but I've found it easier to build on something familiar (logistic regression). However, for the sake of completeness, I'll go ahead and introduce the perceptron unit and note its similarities to the network representation of logistic regression.
The perceptron is the simplest neural unit that we can build. It takes a series of inputs, $x_i$, multiplies each by a corresponding weight, $w_i$, and outputs a 1 if the weighted sum of its inputs exceeds some threshold (and a 0 otherwise):

$$\text{output} = \begin{cases} 1 & \text{if } \sum_i w_i x_i > \text{threshold} \\ 0 & \text{otherwise} \end{cases}$$
We can rewrite the perceptron function by moving the threshold to the left side (treating it as a bias term, $b = -\text{threshold}$), and we end up with the same linear model used in logistic regression: the unit outputs a 1 whenever $\sum_i w_i x_i + b > 0$. The weights, $w_i$, and bias, $b$, play the same roles in both models.
At a high level, they're practically identical; the main difference is the activation function. The perceptron applies a hard step function to the linear combination, while logistic regression applies the smooth sigmoid, producing an output anywhere between 0 and 1.
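As a sketch of just how small that difference is, here are both units side by side in NumPy (again, the names and numbers are illustrative):

```python
import numpy as np

def perceptron(x, weights, bias):
    """Perceptron: hard 0/1 output via a step function."""
    z = np.dot(weights, x) + bias
    return 1 if z > 0 else 0

def logistic_unit(x, weights, bias):
    """The same linear model, but with a smooth sigmoid activation."""
    z = np.dot(weights, x) + bias
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, 1.2])
w = np.array([0.4, -0.2])
b = 0.1
print(perceptron(x, w, b))     # 1  (the weighted sum is just above zero)
print(logistic_unit(x, w, b))  # ~0.51  (a soft version of the same decision)
```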
Building a network of neurons
The previous model is only capable of binary classification; however, recall that we can perform multi-class classification by building a collection of logistic regression models. Let's extend our "network" to represent this.
Note: While I didn't explicitly show the activation function here, we still use it on each linear combination of inputs. I mainly just wanted to show the connection between the visual representation and matrix form.
Here, we've built three distinct logistic regression models, each with their own set of parameters. Take a moment to make sure you understand this matrix representation. (This is why matrix multiplication is listed as a prerequisite.) It's rather convenient that we can leverage matrix operations as it allows us to perform these calculations quickly and efficiently.
The above example displays the case for multi-class classification of a single example, but we can also extend our input matrix to classify a collection of examples. This is not simply useful but necessary: our optimization algorithm (covered in a later post) must learn from all of the examples in an efficient manner when finding the best parameters (more commonly referred to as weights in the neural network community).
Again, go through the matrix multiplications to convince yourself of this.
Although I color coded the weights here for clarity, we'll need to develop a more systematic notation. Notice how the first output neuron uses all of the blue weights, the second output neuron uses all of the green weights, and the third output neuron uses all of the orange weights.
Moving forward, we'll describe our weights more succinctly as a vector, $w_j$, which collects all of the weights feeding into the $j$th output neuron (e.g., $w_1$ contains all of the blue weights, $w_2$ the green, and $w_3$ the orange).
Thus, we can define a weight matrix, $W$, whose $j$th row contains $w_j^T$; the matrix product $Wx$ then computes the linear combination for every output neuron at once.
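Here's one way this might look in NumPy: a sketch with made-up shapes and random values. Note that this version stacks examples as rows of the input, so the product is written $XW^T$.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)

n_features, n_classes, n_examples = 4, 3, 5
W = rng.normal(size=(n_classes, n_features))   # row j holds w_j, the weights for output neuron j
b = np.zeros(n_classes)                        # one bias per output neuron
X = rng.normal(size=(n_examples, n_features))  # one example per row

Z = X @ W.T + b      # every linear combination, for every example, in one product
A = sigmoid(Z)       # apply the activation element-wise
print(A.shape)       # (5, 3): one score per class, per example
```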
Hidden layers
Up until now, we've been dealing solely with one-layer networks; we feed information into the input layer and observe the results in the output layer. (The input layer often isn't counted as a layer in the neural network.)
The real power of neural networks emerges as we add additional layers to the network. Any layer that is between the input and output layers is known as a hidden layer. Thus, the following example is a neural network with an input layer, one hidden layer, and an output layer.
I'll use the superscript $(l)$ to denote a layer of the network, and a subscript to denote a specific neuron within that layer.
For example, we can compute the input to the second neuron of the second layer as a linear combination of the previous layer's values, $z_2^{(2)} = w_2^{(2)} \cdot a^{(1)} + b_2^{(2)}$,
and then pass this through our activation function, $g$, to obtain that neuron's activation: $a_2^{(2)} = g\left(z_2^{(2)}\right)$.
Note: Our input vector, $x$, can equivalently be referred to as the activation of the first layer, $a^{(1)}$.
More generally, we can calculate the activation of neuron $i$ in layer $l$ as $a_i^{(l)} = g\left(w_i^{(l)} \cdot a^{(l-1)} + b_i^{(l)}\right)$, where $w_i^{(l)}$ holds the weights connecting each neuron in layer $l-1$ to neuron $i$.
Similarly, we can calculate all of the activations for a given layer $l$ at once by stacking the weight vectors into a matrix, $W^{(l)}$, and writing $a^{(l)} = g\left(W^{(l)} a^{(l-1)} + b^{(l)}\right)$.
In a network, we take the output from one layer and feed it in as the input to our next layer. We can stack as many layers as we want on top of each other. The field of deep learning studies neural network architectures with many hidden layers.
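A minimal sketch of this stacking in NumPy might look like the following (the layer sizes, helper names, and random weights are my own invention):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(x, layers):
    """Feed each layer's output in as the next layer's input.

    `layers` is a list of (W, b) pairs, where W^(l) has shape
    (neurons in layer l, neurons in layer l-1).
    """
    a = x  # the input vector is the activation of the first layer
    for W, b in layers:
        a = sigmoid(W @ a + b)  # a^(l) = g(W^(l) a^(l-1) + b^(l))
    return a

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(5, 3)), np.zeros(5)),  # hidden layer: 3 inputs -> 5 neurons
    (rng.normal(size=(2, 5)), np.zeros(2)),  # output layer: 5 hidden -> 2 outputs
]
x = np.array([0.5, -1.0, 2.0])
print(forward(x, layers))  # the final layer's activations
```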
Matrix representation
Let $n^{(l)}$ denote the number of neurons in layer $l$, and let $m$ denote the number of examples we feed through the network. The weight matrix for layer $l$, $W^{(l)}$, will then have shape $\left(n^{(l)}, n^{(l-1)}\right)$, and the bias, $b^{(l)}$, will be a vector of length $n^{(l)}$.
The activations of a given layer will be a matrix of shape $\left(n^{(l)}, m\right)$, with one column per example. (If you instead stack examples as rows, as in the multi-class example earlier, these shapes are simply transposed.)
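As a quick sanity check on these shapes, here's an illustrative sketch (the sizes are arbitrary):

```python
import numpy as np

n_prev, n_curr, m = 4, 3, 10    # n^(l-1), n^(l), and the number of examples

W = np.ones((n_curr, n_prev))   # W^(l): shape (n^(l), n^(l-1))
b = np.ones((n_curr, 1))        # b^(l): a column, broadcast across all m examples
A_prev = np.ones((n_prev, m))   # a^(l-1): one column per example

Z = W @ A_prev + b              # (n^(l), n^(l-1)) @ (n^(l-1), m) -> (n^(l), m)
assert Z.shape == (n_curr, m)
```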
When I was first learning about neural networks, the trickiest part for me was figuring out what my matrix dimensions needed to be and how to manipulate them into the proper form. I'd recommend doing a couple of practice problems to get more comfortable before we continue on to training a neural network in my next post.
Feeling like you've got a grasp? Check out this neural network cheat sheet of common architectures.