Artificial Neural Networks (ANNs) are computational models inspired by one of nature’s most splendid creations – the neuron. Our quest to make machines smarter has converged on the realization that we ought to code the ‘smartness’ into them, quite literally. And what better way than to draw parallels from the source of our own intelligence, our brains?
Contrary to what one might expect, the field of computational neuroscience has existed for a long time, with its origins dating back to 1943, when the foundational research paper ‘A Logical Calculus of the Ideas Immanent in Nervous Activity’, detailing the McCulloch–Pitts neuron model, was published. The field has, however, witnessed rapid advancement in recent times owing to the increased emphasis and focus on Artificial Intelligence and its related domains.
BASICS AND STRUCTURE
ANNs operate by adaptively learning from the data they are provided. In a majority of use cases they are subjected to a supervised learning approach. However, they find application in the unsupervised machine learning domain as well, for example in Self-Organizing Maps. Generally, we divide the available dataset, over which the neural network will operate, into two distinct partitions: the ‘Training Set’ and the ‘Testing Set’. The neural network ‘trains’ on the Training Set, which can be described as follows:

T = {(x1, y1), (x2, y2), …, (xn, yn)}, where yi = f(xi)

Thus, it is a set of observations wherein each observation consists of a datum value (or a group of data values) and a mapped value produced in accordance with some function. An example could be the marks obtained in different subjects by each student of a class in a semester, mapped to their respective percentages.
The ANN learns the implicit mapping of the data nodes to the corresponding outputs by making adjustments to its internal parameters (we’ll learn about them shortly). Once training has concluded, the network operates on the Testing Set and its performance is measured. But the ever-pertinent question remains: what really comprises the internal structure of the network that makes it tick? Let’s get to it then!
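As a minimal illustration (plain Python, independent of any ANN library), splitting a dataset of (input, target) observations into the two partitions might look like this; the marks-to-percentage data, the 80/20 ratio, and the fixed seed are illustrative choices, not prescriptions from the text:

```python
import random

# Toy dataset: each observation maps subject marks to a percentage.
observations = [((78, 85, 90), 84.3), ((60, 55, 70), 61.7),
                ((95, 88, 92), 91.7), ((40, 52, 48), 46.7),
                ((70, 75, 80), 75.0), ((88, 91, 85), 88.0)]

random.seed(42)                # fixed seed so the split is reproducible
random.shuffle(observations)

split = int(0.8 * len(observations))   # 80/20 is a common convention
training_set = observations[:split]    # the network 'trains' on this part
testing_set = observations[split:]     # performance is measured on this part
```

The shuffle before splitting matters: it prevents any ordering in the raw data from biasing which observations the network sees during training.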
Let’s meet our good friend, the NN node, a.k.a. the Artificial Neuron or more famously, the Perceptron.
Every neural node is composed of the following components:
- Set of weights
- Bias value
- Processing function
- Activation function
All input values entering the neuron are multiplied by their respective weight values. The set of weights is crucial, as it imparts adaptability: the ‘learning’ nature of the network can be ascribed to its presence. A bias value is often introduced to let us offset the threshold value of the activation function.
Once we obtain the weighted inputs, they are reduced to a single value by the processing function, such as a simple adder.
The processed value so obtained is then supplied to an activation function: if it surpasses a specified threshold value, the function produces the desired output.
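Putting the four components together, a single neuron’s computation can be sketched in plain Python; the weights, bias, and step threshold below are arbitrary illustrative values:

```python
# A single artificial neuron: weighted sum plus bias, then a step activation.
weights = [0.5, -0.6, 0.3]   # one weight per input (illustrative values)
bias = 0.1                   # offsets the activation threshold

def process(inputs):
    """Processing function: a simple adder over the weighted inputs."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def activate(value, threshold=0.0):
    """Step activation: fire (1) only if the processed value passes the threshold."""
    return 1 if value > threshold else 0

output = activate(process([1.0, 0.2, 0.7]))  # a sample forward pass
```

During training it is these `weights` (and `bias`) values that get adjusted; the processing and activation functions stay fixed.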
Commonly, the Sigmoid function (logistic function) is utilized here, as it allows partial-derivative-based calculations to be made for crucial network refinements such as Back-Propagation.
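The sigmoid and its conveniently simple derivative (the property that Back-Propagation exploits) can be written as:

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    """d(sigmoid)/dz = sigmoid(z) * (1 - sigmoid(z)).
    This closed form, expressible in terms of the function's own output,
    is what makes sigmoid so convenient for back-propagation."""
    s = sigmoid(z)
    return s * (1.0 - s)
```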
Neural Networks are composed of such nodes stacked together in two or more layers. A two-layer network simply consists of an Input and an Output layer. If we have three or more layers, it is called a Multi-Layer Perceptron (MLP) network. All layers other than the Input and Output layers are called Hidden Layers, as their workings and connections are not ‘visible’ and hence cannot be guessed by simply looking at the inputs and outputs. Broadly, networks are either Feed-Forward Networks, which allow data flow in only one direction and contain no directed cycles, or Recurrent Networks, which do contain directed cycles (loops).
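A feed-forward pass through a small 2-3-1 MLP (two inputs, one hidden layer of three sigmoid neurons, one output) can be sketched as follows; the weights are random placeholders, since an untrained network starts out exactly that way:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)
# Weight matrices for a 2-3-1 network, initialized randomly (untrained).
hidden_w = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
hidden_b = [random.uniform(-1, 1) for _ in range(3)]
output_w = [random.uniform(-1, 1) for _ in range(3)]
output_b = random.uniform(-1, 1)

def forward(x):
    """Data flows strictly input -> hidden -> output: no directed cycles."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(hidden_w, hidden_b)]
    return sigmoid(sum(w * h for w, h in zip(output_w, hidden)) + output_b)

y = forward([0.0, 1.0])  # some value in (0, 1); meaningless until trained
```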
Let’s Get Our Hands Dirty: A Python-based DIY example
Now that I’ve satiated your craving for some ANN theory, let’s head out and implement one ourselves!
- Python 2.7: familiarity with the language and its constructs, including object-oriented programming concepts, is required.
- PyBrain v0.3: no prior experience is necessary.
I) The Prep
You must install the Python distribution version 2.7.x, where x refers to the latest release build, which may vary. At the time of writing this post, the build version is 2.7.10.
Python Distribution Download Link
For PyBrain, you must install it via the instructions specified at the following link (varies according to the choice of operating system):
PyBrain Installation Instructions Link
For further documentation and experimentation, you may visit the following URL:
II) Why PyBrain?
Before we head out on our mission, I owe the reader an explanation for the choice of package. Although multiple packages for ANNs exist in Python, PyBrain is particularly easy to code with and lends itself to lucid demonstration. Others may allow advanced operations such as GPU utilization or building more complex networks, but for us, PyBrain will suffice. You’re encouraged to experiment not just with PyBrain but with any other package of your liking.
III) The Objective
Through this example, we’ll build a 3-layer feed-forward ANN to perform a classic non-linear classification task: capturing the functionality of the XOR gate with only the training set as the learning reference, and then producing output over a sequence of test inputs.
IV) PyBrain Concepts
PyBrain treats the different algorithms it uses as Modules. We create Network objects and add Layer objects to them. Subsequently, we establish Connections between the layers.
To create a standard 3-layer MLP network, you can refer to the code available on the following GitHub link:
Alternatively, instead of setting up all the layers manually, we can make use of the buildNetwork() method, which allows us to set up the network conveniently.
V) The Actual Implementation
Thus, we’ll set up a 3-layer MLP network using the buildNetwork() method. The SupervisedDataSet() class helps specify the training data format: the number of input and target output values it can expect.
The training set will consist of 1,000 observations derived from our data model of the XOR function. The values, despite being randomly selected, will of course repeat (since only 4 possible input combinations exist). But training the network requires exposure to as many validated classification observations as possible.
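Generating such a training set in plain Python (independent of PyBrain’s SupervisedDataSet, which would hold the same information) might look like this; the seed is an illustrative choice for reproducibility:

```python
import random

def xor(a, b):
    """The target function the network must learn from examples alone."""
    return a ^ b

random.seed(1)
# 1,000 randomly drawn observations; only 4 distinct input combinations
# exist, so repetition is inevitable -- and useful for training.
training_set = [((a, b), xor(a, b))
                for a, b in (tuple(random.randint(0, 1) for _ in range(2))
                             for _ in range(1000))]
```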
We’ll utilize the Back-Propagation technique, which updates the weights based on an error/cost function that tells how far off the network’s current output is from the actual value. Its implementation involves the calculation of partial derivatives. The BackpropTrainer() method, along with its parameters, will be utilized for this purpose. The learning rate and momentum are parameters that control how quickly and how smoothly we step through and converge to the desired weights during the training phase. We’ll train the network until the best possible fit to the function is obtained (convergence).
For further information about Back-Propagation, please refer to the following blog post:
Back-Propagation Explanation Link
Note that an Epoch here refers to one complete pass over the training set while the neural network is being trained. Let’s look at the final code, which we can run to verify for ourselves the power of neural networks.
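For readers without PyBrain at hand, the same idea can be sketched in pure Python: back-propagation with a sigmoid MLP learning XOR. This is an illustrative stand-in, not the original PyBrain program; momentum is omitted for brevity, and the hidden-layer size, learning rate, epoch count, and seed are all arbitrary choices:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# The four distinct XOR observations (a random 1,000-sample set carries
# the same information, just with repetition).
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

random.seed(42)
H = 4  # hidden-layer size (illustrative choice)
w_h = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b_h = [random.uniform(-1, 1) for _ in range(H)]
w_o = [random.uniform(-1, 1) for _ in range(H)]
b_o = random.uniform(-1, 1)

def forward(x):
    hidden = [sigmoid(w[0] * x[0] + w[1] * x[1] + b)
              for w, b in zip(w_h, b_h)]
    out = sigmoid(sum(w * h for w, h in zip(w_o, hidden)) + b_o)
    return hidden, out

rate = 0.5  # learning rate: controls the step size of each weight update
for epoch in range(10000):       # one epoch = one full pass over the set
    for x, target in samples:
        hidden, out = forward(x)
        # Output-layer error term, using the sigmoid derivative out*(1-out).
        delta_o = (target - out) * out * (1 - out)
        # Hidden-layer error terms, back-propagated through w_o.
        delta_h = [delta_o * w_o[j] * hidden[j] * (1 - hidden[j])
                   for j in range(H)]
        # Gradient-descent weight updates.
        for j in range(H):
            w_o[j] += rate * delta_o * hidden[j]
            w_h[j][0] += rate * delta_h[j] * x[0]
            w_h[j][1] += rate * delta_h[j] * x[1]
            b_h[j] += rate * delta_h[j]
        b_o += rate * delta_o

# Round the real-valued outputs to the nearest integer, as in the article.
results = {x: int(round(forward(x)[1])) for x, _ in samples}
```

After training, `results` should reproduce the XOR truth table, even though the network was never told the rule, only shown examples.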
Note: The output can potentially vary owing to the internal workings of the PyBrain module. You may have to run the program again in such a scenario.
VI) Interpreting the Results
We’ve obtained the right values for the inputs specified, in accordance with the standard XOR implementation. Here, we’ve rounded off the values to the nearest integer, so as to gain clarity about the results and not be intimidated by the real values typically generated. It may not appear a major feat, but consider the fact that the network did not know how the XOR function really operates:
Hence, just by looking at the input/output combinations, it can approximate the function’s behaviour. Think of situations where we do not know what the mapping might be. Isn’t it fascinating that we can approximate the function and understand how the trend is evolving? Certainly a mind-blowing technique for empirical studies and beyond.
ADVANCEMENTS AND APPLICATIONS
ANNs have found widespread acceptance and have heralded a quiet revolution in contemporary times. The field witnessed a resurgence in the 1980s with the introduction of techniques such as Back-Propagation by Rumelhart, Hinton and Williams. Since then, ANNs have transcended the traditional use cases of Pattern Recognition and Linear Classification and are at the forefront of research today.
Google has become one of the most prominent and prolific innovators to use ANNs on a large scale and in varied applications. From handwriting recognition to a smart, automated email responder, Google has successfully transformed the technology into a massively scaled, enterprise-wide architecture. One of the most creative uses has been the ‘Inceptionism’ project (popularly known as Deep Dream), which deployed a trained ANN to produce images purely of its own determination from random noise images.
Another novel application can be witnessed in the research paper by Ralf Der and Georg Martius, in which they used ANNs to model and develop sensorimotor intelligence and autonomous behaviour in robot test cases, possibly explaining the evolution of such mechanisms in nature itself.
It’s imperative for the computer scientists of today to be well-versed in the nuances and techniques of Machine Learning. The realm of data is increasing exponentially: from abundance, we’re transitioning into a state of profusion. In such a scenario, the ability to make sense of, and extricate information from, this heterogeneous mass is a valuable skill to hone. Neural Networks are an indispensable addition to your arsenal, and they are capable of so much more! I hope your curiosity has been piqued through my words, and that you’ll explore this domain further to harness the true potential of ANNs.
REFERENCES

McCulloch, Warren S., and Walter Pitts. “A logical calculus of the ideas immanent in nervous activity.” The Bulletin of Mathematical Biophysics 5.4 (1943): 115-133.
Kohonen, Teuvo. “The self-organizing map.” Neurocomputing 21.1 (1998): 1-6.
Weisstein, Eric W. “Sigmoid Function.” From MathWorld, a Wolfram Web Resource.
Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. “Learning representations by back-propagating errors.” Nature 323 (1986): 533-536.
Der, Ralf, and Georg Martius. “Novel plasticity rule can explain the development of sensorimotor intelligence.” Proceedings of the National Academy of Sciences (2015).