Neural Network Technology
Basic Concepts for Neural Networks
Note: This document is an excerpt from the Neuralyst™ User's Guide, Chapter 3.
Let's start by taking a look at a biological neuron. Figure 1 shows such a neuron.
Figure 1. A Biological Neuron
A neuron operates by receiving signals from other neurons through connections called synapses. When the combination of these signals exceeds a certain threshold, or activation level, the neuron fires; that is, it sends a signal on to the other neurons connected to it. Some signals act as excitations and others as inhibitions to a neuron firing. What we call thinking is believed to be the collective effect of the presence or absence of firings in the pattern of synaptic connections between neurons.
This sounds very simplistic until we recognize that there are approximately one hundred billion (100,000,000,000) neurons in the human brain, each connected to as many as one thousand (1,000) others. The massive number of neurons and the complexity of their interconnections result in a "thinking machine", your brain.
Each neuron has a body, called the soma. The soma is much like the body of any other cell. It contains the cell nucleus, various bio-chemical factories and other components that support ongoing activity.
Surrounding the soma are dendrites. The dendrites are receptors for signals generated by other neurons. These signals may be excitatory or inhibitory. All signals present at the dendrites of a neuron are combined and the result will determine whether or not that neuron will fire.
If a neuron fires, an electrical impulse is generated. This impulse starts at the base, called the hillock, of a long cellular extension, called the axon, and proceeds down the axon to its ends.
The end of the axon is actually split into multiple ends, called the boutons. The boutons are connected to the dendrites of other neurons and the resulting interconnections are the previously discussed synapses. (Actually, the boutons do not touch the dendrites; there is a small gap between them.) If a neuron has fired, the electrical impulse that has been generated stimulates the boutons and results in electrochemical activity which transmits the signal across the synapses to the receiving dendrites.
At rest, the neuron maintains an electrical potential of about 40-60 millivolts. When a neuron fires, an electrical impulse is created which is the result of a change in potential to about 90-100 millivolts. This impulse travels at between 0.5 and 100 meters per second and lasts for about 1 millisecond. Once a neuron fires, it must rest for several milliseconds before it can fire again. In some circumstances, the repetition rate may be as fast as 100 times per second, equivalent to 10 milliseconds per firing.
Compare this to a very fast electronic computer, whose signals travel at about 200,000,000 meters per second (the speed of light in a wire is about 2/3 of that in free space) and whose impulses last for 10 nanoseconds and can repeat in each succeeding 10 nanoseconds, continuously. Taking the fastest figures quoted for the neuron (100 meters per second and 10 milliseconds per firing), electronic computers have at least a 2,000,000 times advantage in signal transmission speed and a 1,000,000 times advantage in signal repetition rate.
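The two ratios quoted above can be checked with a few lines of arithmetic. This is only a sketch using the figures from the text; the variable names are illustrative, and the periods are expressed in whole nanoseconds so that the division is exact.

```python
# Figures quoted in the text; variable names are illustrative only.
neuron_speed_m_per_s = 100             # fastest quoted nerve impulse speed
wire_speed_m_per_s = 200_000_000       # quoted signal speed in a wire

neuron_period_ns = 10_000_000          # 10 milliseconds per firing
computer_period_ns = 10                # 10 nanoseconds per impulse

speed_advantage = wire_speed_m_per_s // neuron_speed_m_per_s
rate_advantage = neuron_period_ns // computer_period_ns

print(f"speed advantage: {speed_advantage:,}x")   # 2,000,000x
print(f"rate advantage:  {rate_advantage:,}x")    # 1,000,000x
```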
It is clear that if signal speed or rate were the sole criteria for processing performance, electronic computers would win hands down. What the human brain lacks in these, it makes up in numbers of elements and interconnection complexity between those elements. This difference in structure manifests itself in at least one important way; the human brain is not as quick as an electronic computer at arithmetic, but it is many times faster and hugely more capable at recognition of patterns and perception of relationships.
The human brain differs in another, extremely important, respect beyond speed; it is capable of "self-programming" or adaptation in response to changing external stimuli. In other words, it can learn. The brain has developed ways for neurons to change their response to new stimulus patterns so that similar events may affect future responses. In particular, the sensitivity to new patterns seems more extensive in proportion to their importance to survival or if they are reinforced by repetition.
Neural networks are models of biological neural structures. The starting point for most neural networks is a model neuron, as in Figure 2. This neuron consists of multiple inputs and a single output. Each input is modified by a weight, which multiplies with the input value. The neuron will combine these weighted inputs and, with reference to a threshold value and activation function, use these to determine its output. This behavior follows closely our understanding of how real neurons work.
Figure 2. A Model Neuron
While there is a fair understanding of how an individual neuron works, there is still a great deal of research and mostly conjecture regarding the way neurons organize themselves and the mechanisms used by arrays of neurons to adapt their behavior to external stimuli. There are a large number of experimental neural network structures currently in use reflecting this state of continuing research.
In our case, we will describe only the structure, mathematics and behavior of one such structure, the backpropagation network. This is the most prevalent and generalized neural network currently in use. If you are interested in finding out more about neural networks in general, or about other types of networks, please refer to the material listed in the bibliography.
To build a backpropagation network, proceed in the following fashion. First, take a number of neurons and array them to form a layer. A layer has all its inputs connected to either a preceding layer or the inputs from the external world, but not both within the same layer. A layer has all its outputs connected to either a succeeding layer or the outputs to the external world, but not both within the same layer.
Next, multiple layers are arrayed one succeeding the other so that there is an input layer, multiple intermediate layers and finally an output layer, as in Figure 3. Intermediate layers, that is, those that have no inputs or outputs to the external world, are called hidden layers. Backpropagation neural networks are usually fully connected. This means that each neuron is connected to every output from the preceding layer (or to one input from the external world if the neuron is in the first layer) and, correspondingly, each neuron has its output connected to every neuron in the succeeding layer.
Figure 3. Backpropagation Network
Generally, the input layer is considered a distributor of the signals from the external world. Hidden layers are considered to be categorizers or feature detectors of such signals. The output layer is considered a collector of the features detected and producer of the response. While this view of the neural network may be helpful in conceptualizing the functions of the layers, you should not take this model too literally as the functions described may not be so specific or localized.
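One way to picture a fully connected network is as one table of weights between each pair of adjacent layers. The sketch below is an illustration only, not Neuralyst's internal representation: the function name `build_network`, the nested-list layout and the random starting range are all assumptions. The input layer is treated purely as a distributor, so it carries no weights of its own.

```python
import random

def build_network(layer_sizes, seed=0):
    """Return (weights, thresholds) for a fully connected layered network.

    weights[l][i][j] is the weight from output i of layer l to neuron j of
    layer l + 1; thresholds[l][j] is the threshold of that neuron.
    The starting range of -0.5 to 0.5 is an arbitrary assumption.
    """
    rng = random.Random(seed)
    weights, thresholds = [], []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights.append([[rng.uniform(-0.5, 0.5) for _ in range(n_out)]
                        for _ in range(n_in)])
        thresholds.append([rng.uniform(-0.5, 0.5) for _ in range(n_out)])
    return weights, thresholds

# 3 external inputs, one hidden layer of 4 neurons, 2 output neurons.
weights, thresholds = build_network([3, 4, 2])
connection_count = sum(len(w) * len(w[0]) for w in weights)
print(connection_count)  # 3*4 + 4*2 = 20
```

Because every neuron connects to every output of the preceding layer, the number of weights grows with the product of adjacent layer sizes.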
With this picture of how a neural network is constructed, we can now proceed to describe the operation of the network in a meaningful fashion.
The output of each neuron is a function of its inputs. In particular, the output of the jth neuron in any layer is described by two sets of equations:
[Eqn 1] $$U_j = \sum_{i} X_i \, w_{ij}$$
and
[Eqn 2] $$Y_j = F_{th}(U_j + t_j)$$
For every neuron, j, in a layer, each of the i inputs, Xi, to that layer is multiplied by a previously established weight, wij. These are all summed together, resulting in the internal value of this operation, Uj. This value is then biased by a previously established threshold value, tj, and sent through an activation function, Fth. This activation function is usually the sigmoid function, which has an input to output mapping as shown in Figure 4. The resulting output, Yj, is an input to the next layer or it is a response of the neural network if it is the last layer. Neuralyst allows other threshold functions to be used in place of the sigmoid described here.
Figure 4. Sigmoid Function
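The text does not spell out the sigmoid itself. Assuming the standard logistic form (other sigmoid-shaped functions exist, and Neuralyst's exact choice is not stated in this excerpt), the function and its derivative can be written as follows. The derivative is worth noting because the product of an output and its complement reappears in the error terms used during training.

```latex
F_{th}(u) = \frac{1}{1 + e^{-u}},
\qquad
\frac{dF_{th}}{du} = F_{th}(u)\,\bigl(1 - F_{th}(u)\bigr) = Y\,(1 - Y)
\quad \text{where } Y = F_{th}(u).
```

The output is always between 0 and 1, equals 0.5 at u = 0, and changes most rapidly there.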
In essence, Equation 1 implements the combination operation of the neuron and Equation 2 implements the firing of the neuron.
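As a minimal sketch of these two steps for a single neuron, assuming the logistic sigmoid as the activation function and assuming that "biased by" means the threshold is added to the internal value (the function names are illustrative):

```python
import math

def sigmoid(u):
    """Logistic sigmoid; assumed here as the activation function."""
    return 1.0 / (1.0 + math.exp(-u))

def neuron_output(inputs, weights, threshold):
    # Equation 1: combine the weighted inputs into the internal value U.
    u = sum(x * w for x, w in zip(inputs, weights))
    # Equation 2: bias by the threshold (assumed additive) and "fire".
    return sigmoid(u + threshold)

# U = 1.0*0.5 + 0.0*(-0.5) = 0.5; the threshold of -0.5 brings it to 0,
# where the sigmoid is exactly at its midpoint.
y = neuron_output([1.0, 0.0], [0.5, -0.5], -0.5)
print(y)  # 0.5
```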
From these equations, a predetermined set of weights, a predetermined set of threshold values and a description of the network structure (that is the number of layers and the number of neurons in each layer), it is possible to compute the response of the neural network to any set of inputs. And this is just how Neuralyst goes about producing the response. But how does it learn?
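Computing the response amounts to repeating the single-neuron calculation layer by layer, feeding each layer's outputs to the next. The sketch below shows one hedged way to do this; it is not Neuralyst's code. The name `network_response`, the `(weights, thresholds)` pairing and the example values are assumptions made for illustration.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def network_response(inputs, layers):
    """Propagate inputs through a list of (weights, thresholds) layers.

    weights[i][j] is the weight from input i of the layer to its neuron j.
    """
    signal = list(inputs)
    for weights, thresholds in layers:
        signal = [
            sigmoid(sum(x * weights[i][j] for i, x in enumerate(signal)) + t_j)
            for j, t_j in enumerate(thresholds)
        ]
    return signal

# Hypothetical network: 2 inputs, 2 hidden neurons, 1 output neuron.
layers = [
    ([[0.5, -0.5], [0.5, -0.5]], [0.0, 0.0]),  # hidden layer
    ([[1.0], [-1.0]], [0.0]),                  # output layer
]
response = network_response([1.0, 1.0], layers)
print(response)
```

The same weights and thresholds always give the same response to the same inputs; nothing in this calculation changes the network.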
Learning in a neural network is called training. Like training in athletics, training in a neural network requires a coach, someone who describes to the neural network what it should have produced as a response. From the difference between the desired response and the actual response, the error is determined and a portion of it is propagated backward through the network. At each neuron in the network, the error is used to adjust the weights and threshold values of the neuron so that, the next time, the error in the network response will be less for the same inputs.
Figure 5. Neuron Weight Adjustment
This corrective procedure is called backpropagation (hence the name of the neural network) and it is applied continuously and repetitively for each set of inputs and corresponding set of outputs produced in response to the inputs. This procedure continues so long as the individual or total errors in the responses exceed a specified level or until there are no measurable errors. At this point, the neural network has learned the training material and you can stop the training process and use the neural network to produce responses to new input data.
[There is some heavier going in the next few paragraphs. Skip ahead if you don't need to understand all the details of neural network learning.]
Backpropagation starts at the output layer with the following equations:
[Eqn 3] $$w_{ij} = w'_{ij} + LR \cdot e_j \cdot X_i$$
and
[Eqn 4] $$e_j = Y_j \, (1 - Y_j) \, (d_j - Y_j)$$
For the ith input of the jth neuron in the output layer, the weight wij is adjusted by adding to the previous weight value, w'ij, a term determined by the product of a learning rate, LR, an error term, ej, and the value of the ith input, Xi. The error term, ej, for the jth neuron is determined by the product of the actual output, Yj, its complement, 1 - Yj, and the difference between the desired output, dj, and the actual output.
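The output-layer adjustment described above can be sketched directly from the prose. The function names and the example numbers are illustrative assumptions.

```python
def output_error_term(actual, desired):
    # Equation 4: output, times its complement, times the output error.
    return actual * (1.0 - actual) * (desired - actual)

def adjust_weights(old_weights, inputs, error_term, learning_rate):
    # Equation 3: add LR * e_j * X_i to each previous weight w'_ij.
    return [w + learning_rate * error_term * x
            for w, x in zip(old_weights, inputs)]

# A neuron that produced 0.5 when 1.0 was desired.
e_j = output_error_term(actual=0.5, desired=1.0)       # 0.5 * 0.5 * 0.5
new_weights = adjust_weights([0.2, 0.4], [1.0, 0.0], e_j, learning_rate=0.5)
print(e_j)           # 0.125
print(new_weights)   # first weight rises; second is unchanged (input was 0)
```

Note that a weight only moves if its input was non-zero: an input that contributed nothing to the output is not blamed for the error.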
Once the error terms are computed and weights are adjusted for the output layer, the values are recorded and the next layer back is adjusted. The same weight adjustment process, determined by Equation 3, is followed, but the error term is generated by a slightly modified version of Equation 4. This modification is:
[Eqn 5] $$e_j = Y_j \, (1 - Y_j) \sum_{k} e_k \, w'_{jk}$$
In this version, the difference between the desired output and the actual output is replaced by the sum of the error terms for each neuron, k, in the layer immediately succeeding the layer being processed (remember, we are going backwards through the layers so these terms have already been computed) times the respective pre-adjustment weights.
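A sketch of the hidden-layer error term follows. The names are illustrative; `outgoing_weights` stands for the pre-adjustment weights from this neuron to each neuron k in the succeeding layer.

```python
def hidden_error_term(actual, next_error_terms, outgoing_weights):
    # Sum of each succeeding neuron's error term times the pre-adjustment
    # weight that connects this neuron to it.
    propagated = sum(e_k * w_jk
                     for e_k, w_jk in zip(next_error_terms, outgoing_weights))
    # Equation 5: the propagated sum replaces (desired - actual) in Equation 4.
    return actual * (1.0 - actual) * propagated

# A hidden neuron with output 0.5 feeding two neurons in the next layer.
e_hidden = hidden_error_term(
    actual=0.5,
    next_error_terms=[0.2, 0.1],
    outgoing_weights=[1.0, -1.0],
)
print(e_hidden)  # approximately 0.025, i.e. 0.25 * (0.2*1.0 + 0.1*(-1.0))
```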
The learning rate, LR, applies a greater or lesser portion of the respective adjustment to the old weight. If the learning rate is set to a large value, then the neural network may learn more quickly, but if there is a large variability in the input set then the network may not learn very well or at all. In real terms, setting the learning rate to a large value is analogous to giving a child a spanking, but that is inappropriate and counter-productive to learning if the offense is so simple as forgetting to tie their shoelaces. Usually, it is better to set the learning rate to a small value and edge it upward if learning seems slow.
In many cases, it is useful to use a revised weight adjustment process. This is described by the equation:
[Eqn 6]
This is similar to Equation 3, with a momentum factor, M, the previous weight, w'ij, and the next to previous weight, w''ij, included in the last term. This extra term allows for momentum in weight adjustment. Momentum basically allows a change to the weights to persist for a number of adjustment cycles. The magnitude of the persistence is controlled by the momentum factor. If the momentum factor is set to 0, then the equation reduces to that of Equation 3. If the momentum factor is increased from 0, then increasingly greater persistence of previous adjustments is allowed in modifying the current adjustment. This can improve the learning rate in some situations, by helping to smooth out unusual conditions in the training set.
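Equation 6 itself is not reproduced in this excerpt, so the following is only a sketch under an assumed form: the Equation 3 adjustment plus M times the previous change in the weight (the previous weight minus the next to previous weight). Neuralyst's actual equation may scale these terms differently. The assumed form does satisfy the property stated above, that a momentum factor of 0 reduces it to Equation 3.

```python
def momentum_update(prev_weight, prev_prev_weight, input_value,
                    error_term, learning_rate, momentum):
    """Assumed form: w = w' + LR * e * X + M * (w' - w'')."""
    basic_adjustment = learning_rate * error_term * input_value  # as in Eqn 3
    previous_change = prev_weight - prev_prev_weight             # w' - w''
    return prev_weight + basic_adjustment + momentum * previous_change

# The weight last moved from 0.4 to 0.5, so the previous change was +0.1.
without_momentum = momentum_update(0.5, 0.4, 1.0, 0.1, 0.5, momentum=0.0)
with_momentum = momentum_update(0.5, 0.4, 1.0, 0.1, 0.5, momentum=0.9)
print(without_momentum)  # approximately 0.55 (same as Equation 3)
print(with_momentum)     # approximately 0.64 (previous change persists)
```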
[Okay, that's the end of the equations. You can relax again.]
As you train the network, the total error, that is the sum of the errors over all the training sets, will become smaller and smaller. Once the network reduces the total error to the limit set, training may stop. You may then apply the network, using the weights and thresholds as trained.
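The whole procedure can be sketched as a small training loop. This is an illustration under several assumptions, not Neuralyst's implementation: a single hidden layer, the logistic sigmoid, total error measured as the sum of squared output errors, thresholds adjusted like weights on a constant input of 1, and a toy training set (logical OR). All names are hypothetical.

```python
import math
import random

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def forward(x, w_hid, t_hid, w_out, t_out):
    """Return (hidden outputs, network outputs) for one set of inputs."""
    hidden = [sigmoid(sum(xi * w_hid[i][j] for i, xi in enumerate(x)) + t_hid[j])
              for j in range(len(t_hid))]
    output = [sigmoid(sum(h * w_out[j][k] for j, h in enumerate(hidden)) + t_out[k])
              for k in range(len(t_out))]
    return hidden, output

def total_error(samples, net):
    """Sum of squared output errors over all training sets (an assumption)."""
    return sum((d - y) ** 2
               for x, desired in samples
               for d, y in zip(desired, forward(x, *net)[1]))

def train(samples, net, learning_rate=0.5, error_limit=0.05, max_epochs=5000):
    w_hid, t_hid, w_out, t_out = net
    for _ in range(max_epochs):
        if total_error(samples, net) <= error_limit:
            break  # the limit set has been reached; training may stop
        for x, desired in samples:
            hidden, output = forward(x, *net)
            # Output-layer error terms (Equation 4).
            e_out = [y * (1 - y) * (d - y) for y, d in zip(output, desired)]
            # Hidden-layer error terms (Equation 5), computed before the
            # output weights are changed so pre-adjustment weights are used.
            e_hid = [h * (1 - h) * sum(e * w_out[j][k] for k, e in enumerate(e_out))
                     for j, h in enumerate(hidden)]
            # Weight adjustments (Equation 3) for both layers.
            for k, e in enumerate(e_out):
                for j, h in enumerate(hidden):
                    w_out[j][k] += learning_rate * e * h
                t_out[k] += learning_rate * e   # threshold: constant input 1
            for j, e in enumerate(e_hid):
                for i, xi in enumerate(x):
                    w_hid[i][j] += learning_rate * e * xi
                t_hid[j] += learning_rate * e
    return total_error(samples, net)

rng = random.Random(1)
net = ([[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)],  # w_hid
       [rng.uniform(-0.5, 0.5) for _ in range(2)],                      # t_hid
       [[rng.uniform(-0.5, 0.5)] for _ in range(2)],                    # w_out
       [rng.uniform(-0.5, 0.5)])                                        # t_out
samples = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [1])]

initial_error = total_error(samples, net)
final_error = train(samples, net)
print(initial_error, final_error)
```

The loop stops either when the total error falls to the limit or when the epoch budget runs out, so a network that cannot learn the material does not train forever.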
It is a good idea to set aside some subset of all the inputs available and reserve them for testing the trained network. By comparing the output of a trained network on these test sets to the outputs you know to be correct, you can gain greater confidence in the validity of the training. If you are satisfied at this point, then the neural network is ready for running.
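Setting aside a test subset can be as simple as shuffling the available input sets and holding back a fraction of them. The sketch below is one hedged way to do it; the function name, the 20 percent default and the fixed seed are arbitrary choices for illustration.

```python
import random

def split_train_test(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and reserve a fraction for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

all_samples = list(range(10))   # stand-ins for real input/output sets
training_set, test_set = split_train_test(all_samples)
print(len(training_set), len(test_set))  # 8 2
```

The test sets must never be used for weight adjustment; otherwise they no longer measure how the network responds to inputs it has not seen.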
Usually, no backpropagation takes place in this running mode as was done in the training mode. This is because there is often no way to be immediately certain of the desired response. If there were, there would be no need for the processing capabilities of the neural network! Instead, as the validity of the neural network outputs or predictions is verified or contradicted over time, you will either be satisfied with the existing performance or determine a need for new training. In this case, the additional input sets collected since the last training session may be used to extend and improve the training data.