
Neuralyst^TM Implementation Details

Last modified August 20, 1996 by Ross Berteig.

Input Scaling
Forward Calculation
Backpropagation Calculation
Output Scaling
References
Release History
Legal Notices

1. Input Scaling

Each input value (from both Input and Target columns) is scaled from the user's coordinate system into an internal coordinate system with values ranging from 0 to 1. The internal coordinates leave headroom for the scale margin and noise as requested by the Neural | Set Network Parameters... and Neural | Set Enhanced Parameters... dialogs.

1.1 Symbolic Operation

User-defined symbols are assigned sequential integer values. In the MIN row, the values begin at 0 for the first symbol. In the MAX row, the values begin at 1 for the first symbol. In the TEST and TRAIN rows, the values begin at 0.5.

That is, if the SYMBOL row contains "A,B,C", then A is translated to 0.5, B to 1.5, and C to 2.5 on all TEST and TRAIN rows. The special treatment of the MIN and MAX rows causes each symbol to be represented as the center point of a range of values.
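The translation described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated here, not the Neuralyst source; the names symbol_to_value and ROW_OFFSETS are hypothetical.

```python
# Hypothetical helper illustrating section 1.1; not taken from the Neuralyst source.
# Offset added to a symbol's position, depending on the kind of row it appears in.
ROW_OFFSETS = {"MIN": 0.0, "MAX": 1.0, "TRAIN": 0.5, "TEST": 0.5}


def symbol_to_value(symbols, name, row_kind):
    """Translate a symbol into its numeric value for the given row kind.

    symbols  -- symbols in the order they appear in the SYMBOL row
    name     -- the symbol to translate
    row_kind -- one of "MIN", "MAX", "TRAIN", "TEST"
    """
    return symbols.index(name) + ROW_OFFSETS[row_kind]


symbols = ["A", "B", "C"]
print([symbol_to_value(symbols, s, "TRAIN") for s in symbols])  # [0.5, 1.5, 2.5]
```

With "A" in the MIN row and "C" in the MAX row, the range runs from 0 to 3, so each symbol sits at the center of a unit-wide interval.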

1.2 Scale Margin

The Scale Margin reserves some headroom in the scaling calculation for inputs which exceed the MIN and MAX row values. It represents the percentage of the range to reserve. The reserved headroom is split between the low and high ends of the range.

For example, if the MIN is 0 and the MAX is 10, then a Scale Margin of 0.1 (10%) causes the scale calculation to permit inputs ranging from -0.5 to 10.5 (5% of the range is added above and below).

1.3 Noise

For some data sets, it is important to inject some noise during training. When noise injection is enabled, the input scaling process must leave room for the noise to be added.

The Noise parameter is specified as ranging from 0 (no noise) to 1 (lots of noise). Internally, this parameter is used to derive a noise scale which represents the headroom reserved for noise addition.

The internal noise scale is calculated as follows:

dNoise = (2 * Noise) / (1 + 2 * Noise);

This results in an internal value dNoise which ranges from 0 to 2/3 as the Noise parameter ranges from 0 to 1.
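As a quick check of the stated range, the formula can be evaluated at the end points. The function name noise_scale is a hypothetical one chosen for this sketch.

```python
def noise_scale(noise):
    """Return dNoise for a Noise parameter in the range 0 to 1 (section 1.3)."""
    return (2 * noise) / (1 + 2 * noise)


for noise in (0.0, 0.5, 1.0):
    print(noise, noise_scale(noise))
```

A Noise of 0 gives 0, a Noise of 0.5 gives 0.5, and a Noise of 1 gives 2/3, so the reserved headroom grows quickly at first and then levels off.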

1.4 Input Scale Calculation

The input scale calculation is shown below as performed for a single input value. In practice, each value found in an Input column must be scaled as described here individually.

Values in Target columns are scaled similarly, with the small difference that the Noise Parameter does not apply to target values.

In the following formulae, Input or Target is the user's input or target value, InMin is the user's MIN row value, InMax is the user's MAX row value, Margin is the Scale Margin Parameter, Noise is the Noise Parameter, and X or T is the final scaled input or target value. In addition, the following intermediate values make the calculation much easier to represent: dNoise is the noise scale value, dMin is the min value, compensated by scale margin, dMax is the max value, compensated by scale margin, and dRange is the internal range value.

For Input Columns:

dNoise = (2 * Noise) / (1 + 2 * Noise);
dMin = InMin - (Margin / 2) * (InMax - InMin);
dMax = InMax + (Margin / 2) * (InMax - InMin);
dRange = dMax - dMin;
X = (1 - dNoise) * (Input - dMin) / dRange + dNoise / 2;

For Target Columns:

dMin = InMin - (Margin / 2) * (InMax - InMin);
dMax = InMax + (Margin / 2) * (InMax - InMin);
dRange = dMax - dMin;
T = (Target - dMin) / dRange;
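The two calculations above translate directly into Python. The sketch below is a transcription of the formulae for a single value; the function names and the default arguments are assumptions made for illustration.

```python
def scale_input(value, in_min, in_max, margin=0.0, noise=0.0):
    """Scale one Input column value into internal coordinates (section 1.4)."""
    d_noise = (2 * noise) / (1 + 2 * noise)
    d_min = in_min - (margin / 2) * (in_max - in_min)
    d_max = in_max + (margin / 2) * (in_max - in_min)
    d_range = d_max - d_min
    return (1 - d_noise) * (value - d_min) / d_range + d_noise / 2


def scale_target(value, in_min, in_max, margin=0.0):
    """Scale one Target column value; the Noise parameter does not apply."""
    d_min = in_min - (margin / 2) * (in_max - in_min)
    d_max = in_max + (margin / 2) * (in_max - in_min)
    d_range = d_max - d_min
    return (value - d_min) / d_range


# MIN 0, MAX 10, Scale Margin 0.1, Noise 1: the mid-range value stays centered.
print(scale_input(5.0, 0.0, 10.0, margin=0.1, noise=1.0))
print(scale_target(5.0, 0.0, 10.0, margin=0.1))
```

Note that with a non-zero Noise the scaled inputs occupy only the interval from dNoise / 2 to 1 - dNoise / 2, which is the room left for noise at either end.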

2. Forward Calculation

For a clearer description of the mathematics, please refer to Chapter 3 of the Neuralyst User's Guide.

2.1 Weights

Each neuron of a layer of the network has a vector of weighting values to be multiplied by the outputs of the previous layer. These weight values are stored in the working area of a Neuralyst spreadsheet with eight values per row, on as many rows as are required. The Neural | Unpack Weights command will unpack that weight array into a representation showing the vector for each neuron individually. Notice that there is one more weight for each neuron than there are neurons in the preceding layer: the last weight for each neuron is the threshold value.

In the formulae that follow, W_i,j represents the weight applied to input i of neuron j and th_j represents the threshold value applied to neuron j.

2.2 Activation Function

Each neuron is implemented as a combination of a dot product of the input and weight vectors and an activation function. The activation function serves to introduce a non-linear response characteristic to the neuron. It also forces the output of the neuron to be restricted to the range 0 to 1.

The usual activation function is the sigmoid, which has the following form:

y = (1.0 / (1.0 + exp(-dScale * x)));

Where x is the input value, dScale is the value of the Gain parameter set in the Neural | Set Enhanced Parameters... dialog box, exp() represents the exponential function, and y is the output value.

The rest of the available activation functions have the following representations:

Augmented Ratio of Squares: tmp = (dScale * x) * (dScale * x); y = tmp / (1.0 + tmp);
Gaussian: tmp = 0.5 * dScale * x; y = exp(-0.5 * tmp * tmp);
Linear: If x < -10 / dScale Then y = 0; If x > 10 / dScale Then y = 1; Otherwise y = 0.5 + x * dScale / 20;
Sigmoid: y = (1.0 / (1.0 + exp(-dScale * x)));
Step: If x < 0 Then y = 0; If x = 0 Then y = 0.5; If x > 0 Then y = 1

In the formulae that follow, Activation() will represent a call to the activation function selected in the Neural | Set Enhanced Parameters... dialog box.
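The activation functions listed above can be written as plain Python functions. This is a sketch transcribed from the formulae, assuming a positive Gain; the function names and the d_scale default of 1.0 are choices made for this example.

```python
import math


def augmented_ratio_of_squares(x, d_scale=1.0):
    tmp = (d_scale * x) * (d_scale * x)
    return tmp / (1.0 + tmp)


def gaussian(x, d_scale=1.0):
    tmp = 0.5 * d_scale * x
    return math.exp(-0.5 * tmp * tmp)


def linear(x, d_scale=1.0):
    # Clipped ramp: 0 below -10 / d_scale, 1 above 10 / d_scale.
    if x < -10 / d_scale:
        return 0.0
    if x > 10 / d_scale:
        return 1.0
    return 0.5 + x * d_scale / 20


def sigmoid(x, d_scale=1.0):
    # Direct transcription; math.exp raises OverflowError for very
    # large negative values of d_scale * x.
    return 1.0 / (1.0 + math.exp(-d_scale * x))


def step(x, d_scale=1.0):
    # d_scale is accepted only so that all functions share one signature.
    if x < 0:
        return 0.0
    if x > 0:
        return 1.0
    return 0.5


print(sigmoid(0.0), gaussian(0.0), linear(0.0), step(0.0))
```

Each of these functions returns a value in the range 0 to 1, as required of a neuron output.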

2.3 Single Neuron Forward

The following formulae are written in terms of the following values:

X_i: the i-th scaled input for the first hidden layer, or the i-th output of the previous layer for all other layers.
W_i,j: weight associated with input i of neuron j.
th_j: threshold value associated with neuron j.
Y_j: output of Neuron j in the current layer.

The formula for calculation of the output value of a single neuron is then:

Y_j = Activation( th_j + Sum over i ( W_i,j * X_i ) )
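A minimal Python sketch of the single-neuron formula follows. The sigmoid is used as the default Activation(), and the function and argument names are hypothetical.

```python
import math


def sigmoid(x, d_scale=1.0):
    return 1.0 / (1.0 + math.exp(-d_scale * x))


def neuron_output(inputs, weights, threshold, activation=sigmoid):
    """Compute Y_j = Activation(th_j + Sum over i (W_i,j * X_i)).

    inputs    -- X_i values feeding this neuron
    weights   -- W_i,j values, one per input
    threshold -- th_j for this neuron
    """
    total = threshold + sum(w * x for w, x in zip(weights, inputs))
    return activation(total)


print(neuron_output([1.0, 0.5], [2.0, -4.0], 0.0))  # weighted sum is 0, so 0.5
```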

2.4 Forward over the whole Network

Beginning with the first neuron of the first hidden layer, the calculation for each neuron is carried out in turn.

The first hidden layer takes the scaled user input values as the outputs of the input layer.

The outputs of the output layer are subsequently scaled and presented to the user as the computed outputs of the network.

The Neural | Run/Predict with Network command performs the complete forward calculation, scales the results, and updates the Output columns for each TEST and TRAIN row of the sheet.
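The layer-by-layer pass can be sketched as a loop over layers. The data layout here is an assumption made for illustration: each layer is a list of neurons, and each neuron is a list of weights whose last element is the threshold, following the convention of section 2.1. It is not the packed layout of the Neuralyst working area.

```python
import math


def sigmoid(x, d_scale=1.0):
    return 1.0 / (1.0 + math.exp(-d_scale * x))


def forward(layers, scaled_inputs, activation=sigmoid):
    """Run the forward calculation and return the output layer's values.

    layers        -- hidden layers followed by the output layer; each neuron
                     is a list of weights with the threshold as last element
    scaled_inputs -- input values already scaled as in section 1
    """
    outputs = list(scaled_inputs)  # treated as the outputs of the input layer
    for layer in layers:
        outputs = [
            activation(neuron[-1] + sum(w * x for w, x in zip(neuron[:-1], outputs)))
            for neuron in layer
        ]
    return outputs


# Two inputs, one hidden layer of two neurons, one output neuron.
layers = [
    [[0.0, 0.0, 0.0], [1.0, -1.0, 0.0]],
    [[2.0, -2.0, 0.0]],
]
print(forward(layers, [0.3, 0.3]))  # [0.5]
```

The returned values are still in internal coordinates and would need the output scaling of section 4 before being shown to the user.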

3. Backpropagation Calculation

Training is performed by propagating errors backwards from the output layer through to the first hidden layer, as modifications to the weights for each layer.

3.1 Output Layer Error

For the output layer, the error signal e_j for the j-th neuron is computed from the scaled target value T_j, and the actual output value Y_j as follows:

e_j = Y_j * (1 - Y_j) * (T_j - Y_j)
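In Python, the output layer error signal is a one-line function. The name output_error is hypothetical.

```python
def output_error(y, t):
    """Error signal e_j for an output neuron with output y and scaled target t."""
    return y * (1 - y) * (t - y)


print(output_error(0.5, 1.0))  # 0.125
```

The factor Y_j * (1 - Y_j) is largest when the output is 0.5 and vanishes as the output approaches 0 or 1, so a saturated neuron produces only a small error signal.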

3.2 Hidden Layer Error

For the hidden layers, the error signal e_j for the j-th neuron is computed from the error signals e_k and the prior weights W'_j,k of the k-th neuron in the immediately succeeding layer and the actual output value Y_j as follows:

e_j = Y_j * (1 - Y_j) * Sum over k (e_k * W'_j,k)
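The hidden layer formula can be sketched the same way. The function and argument names are assumptions; the weights passed in are the prior weights W'_j,k that connect neuron j to each neuron k of the succeeding layer.

```python
def hidden_error(y_j, next_errors, weights_to_next):
    """Error signal e_j for a hidden neuron.

    y_j             -- actual output of hidden neuron j
    next_errors     -- e_k for each neuron k in the succeeding layer
    weights_to_next -- prior weight W'_j,k from neuron j to each neuron k
    """
    back = sum(e_k * w_jk for e_k, w_jk in zip(next_errors, weights_to_next))
    return y_j * (1 - y_j) * back


print(hidden_error(0.5, [0.125, 0.25], [2.0, 1.0]))  # 0.125
```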

3.3 Updating Weights

The error signals are applied to the weight W_i,j for the i-th input to the j-th neuron by adding terms computed from the Learning Rate Parameter LR, the Momentum Parameter M, the error signal e_j, the actual input value to the neuron X_i, the previous weight value W'_i,j, and the weight value before that, W''_i,j. The formula is as follows:

W_i,j = W'_i,j + (1 - M) * LR * e_j * X_i + M * (W'_i,j - W''_i,j)

Note that the Momentum M must never have the value 1: the factor (1 - M) would then remove the error term from the update, and all learning would halt.

In the special case where the Momentum is 0, this formula may be simplified:

W_i,j = W'_i,j + LR * e_j * X_i

In practice, this backpropagation calculation is done for each TRAIN row in the sheet, updating the weights with each row. A single complete pass through all rows is called an epoch. For typical network applications, many hundreds of epochs will be required for complete training.
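The update rule of section 3.3 can be sketched for a single weight as follows. The function name and argument names are hypothetical.

```python
def update_weight(w_prev, w_prev2, error, x, learning_rate, momentum):
    """Return the new weight W_i,j.

    w_prev  -- previous weight value W'_i,j
    w_prev2 -- weight value before that, W''_i,j
    error   -- error signal e_j of the neuron
    x       -- actual input value X_i
    """
    return (
        w_prev
        + (1 - momentum) * learning_rate * error * x
        + momentum * (w_prev - w_prev2)
    )


print(update_weight(1.0, 0.5, 0.125, 1.0, 1.0, 0.0))  # 1.125, Momentum 0
print(update_weight(1.0, 0.5, 0.125, 1.0, 1.0, 0.5))  # 1.3125
```

With a Momentum of 0 the result reduces to the simplified formula, and with a Momentum of 1 the result no longer depends on the error signal at all.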

4. Output Scaling

If the current activation function is anything other than the Hyperbolic Tangent, then the scaled output value Output is computed as follows:

Output = Y * dRange + dMin;

For the case of the Hyperbolic Tangent activation function, the scaled output value Output is computed as follows:

Output = ((Y + 1.0) / 2.0) * dRange + dMin;

The scaled output is then translated into a symbol for those columns which have symbols defined in the corresponding Target column by looking up the symbol which is closest to the scaled output.
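The output scaling and the symbol lookup can be sketched as below. Several points are assumptions made for illustration: dMin and dRange are taken to be the values computed for the corresponding Target column in section 1.4, symbols are taken to sit at 0.5, 1.5, and so on as in section 1.1, and a tie between two symbols is resolved in favor of the earlier one. The function names are hypothetical.

```python
def unscale_output(y, in_min, in_max, margin=0.0, tanh_output=False):
    """Convert a network output y back into the user's coordinate system."""
    d_min = in_min - (margin / 2) * (in_max - in_min)
    d_max = in_max + (margin / 2) * (in_max - in_min)
    d_range = d_max - d_min
    if tanh_output:
        # Hyperbolic Tangent case: map the output onto 0..1 first.
        y = (y + 1.0) / 2.0
    return y * d_range + d_min


def nearest_symbol(symbols, value):
    """Return the symbol whose center (0.5, 1.5, ...) is closest to value."""
    index = min(range(len(symbols)), key=lambda i: abs(i + 0.5 - value))
    return symbols[index]


print(unscale_output(0.5, 0.0, 10.0))           # 5.0
print(nearest_symbol(["A", "B", "C"], 1.4))     # B
```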

5. References

Unfortunately, much of what is documented here is only described in development documentation and the source code itself. However, the Neuralyst User's Guide does have a good overview of the forward and backpropagation calculations in Chapter 3, and of input and output scaling in Chapter 6.

6. Release History

August 1996: Initial Release

7. Legal Notices

The information in this document is subject to change without notice and should not be construed as a commitment by Cheshire Engineering Corporation. Cheshire Engineering Corporation assumes no responsibility for errors that may appear in this document.

The EPIC logo and Neuralyst^TM are trademarks of EPIC Systems Corporation licensed to Cheshire Engineering Corporation.

Cheshire Engineering Corporation
120 West Olive Avenue
Monrovia, California 91016
+1 626 303 1602 Neuralyst Sales
+1 626 303 1602 Customer Service and Support
+1 626 303 1590 FAX
http://www.CheshireEng.com/

EMAIL to Neuralyst@CheshireEng.com.
