Perceptrons

In [1]:
from IPython.display import Image
from IPython.core.display import HTML 


Among the possible models for an Artificial Neural Network (ANN), the simplest one is called the \textbf{perceptron}. A perceptron is an ANN consisting of a single neuron that receives the input vector $(x_0,\dots,x_n)$ and linearly maps it to a binary output. A perceptron is able to learn a linear map. Below, we give the example of a linear classifier.

In the first step of the computation of the output of a perceptron, we compute the dot product of the weight vector $w = (w_0,\dots,w_n)$ with the input vector $x = (x_0,\dots,x_n)$ to obtain $z$

$$ z = w^T x. $$

Typically $x_0=1$, so that $w_0$ represents the intercept (bias) term. The value $z$ is then passed through an activation function $\phi$. In the case of the perceptron, the activation function is just a step function

$$ \phi(z) = \begin{cases} 0, & \text{if } z<0, \\ 1, & \text{otherwise.} \end{cases} $$
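
As a quick illustration of this forward pass, here is a minimal sketch with made-up weights and input (not the data used later in this notebook), computing $z = w^T x$ and applying the step function:

import numpy as np

# hypothetical weight vector (w_0 is the intercept) and an input with x_0 = 1
w = np.array([-1.0, 0.5, 2.0])
x = np.array([ 1.0, 4.0, 0.5])

z = np.dot(w, x)            # z = w^T x = -1.0 + 2.0 + 1.0 = 2.0
phi = 1 if z >= 0 else 0    # step activation
print(z, phi)               # 2.0 1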

During training, the output is then compared with the correct value stored in the training set and a correction to the entries of the vector $w$ is produced. A graphical representation of this process is shown below. (\textbf{Image credits:} Rajalingappaa Shanmugamani and Rajesh Arumugam, \textit{Hands-On Natural Language Processing with Python}, Packt Publishing, 2018.)

In [2]:
Image(filename = "perceptron001.png", width=600)
Out[2]:

The update of the weights may be done online or in batches. In the code below, the update is done after each output computation on the training set. To update the weights we need a \textbf{loss function}. As usual, we choose the squared error. It goes like this: for each input vector $x^i =(x_0^i,\dots,x_n^i)$ in the training set we have two values. One is $o^i$, the output of the perceptron; the other is $y^i$, the correct value of the output, as stored in the training set. The online version of the loss function is

\begin{equation}\label{eq:perceptron.10} L(w)=\frac{(y^i-o^i)^2}{2} = \frac{(y^i-w^T x^i)^2}{2}. \end{equation}

From (\ref{eq:perceptron.10}) we can deduce the value of each partial derivative

\begin{equation}\label{eq:perceptron.20} \frac{\partial L}{\partial w_j} = -x_j^i(y^i-w^Tx^i) = -\Delta w_j. \end{equation}

If $\eta$ is the learning rate and $\Delta w = (\Delta w_0,\dots,\Delta w_n)$, at each iteration over the training set, we update the entries of the weights vector by

\begin{equation}\label{eq:perceptron.30} w_{new} = w_{old} + \eta \Delta w. \end{equation}
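
Before looking at the full training loop, here is a minimal sketch of a single online update, with hypothetical values for $w$, $x^i$, $y^i$ and $\eta$; as in the training code below, the thresholded output $o^i$ is used in the error term:

import numpy as np

eta = 0.1                               # learning rate (hypothetical)
w   = np.array([0.0, -0.5, 0.5])        # current weights (hypothetical)
x_i = np.array([1.0, 2.0, 1.0])         # one training input, with x_0 = 1
y_i = 1                                 # its correct label

o_i = 1 if np.dot(w, x_i) >= 0 else 0   # forward pass: w^T x = -0.5 < 0, so o = 0
delta_w = (y_i - o_i) * x_i             # correction direction, here just x_i
w = w + eta * delta_w                   # updated weights, approximately [0.1, -0.3, 0.6]
print(w)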

This process is reproduced in the code below.

In [3]:
import numpy as np
import matplotlib.pyplot as plt
In [4]:
plt.rcParams['figure.figsize'] = (15.0, 10.0)

First of all, we create a general linear classifier in $2$D.

In [5]:
def linearclassifier(in_array , bias , slope):
    
    # each row of in_array is a 2D point (x, y)
    out = np.zeros(in_array.shape[0])
    vector_a = np.array([-bias , -slope , 1])
    
    # a point is labeled 1 when y - slope*x - bias > 0,
    # i.e. when it lies above the line y = slope*x + bias
    for i in range(out.size):
        vector_b = np.array([1,in_array[i,0],in_array[i,1]])
        if (np.dot(vector_a , vector_b) > 0):
            out[i] = 1
        else :
            out[i] = 0 
            
    return out

We randomly generate a set of points in the plane and use the linear classifier to produce the classification of each.

In [6]:
npts = 10000
upperlimit = 10
lowerlimit = 0

data = np.random.uniform(lowerlimit , upperlimit , 2*npts).reshape(npts , 2)

bias = 1
slope = 1.4

classificadores = linearclassifier(data , bias , slope).reshape(npts,1)
data = np.append(data,classificadores , axis = 1)

data[0:5,:]
Out[6]:
array([[8.58030332, 6.07880956, 0.        ],
       [9.33337098, 3.30938296, 0.        ],
       [2.60091698, 9.71696539, 1.        ],
       [5.44354036, 4.2388211 , 0.        ],
       [8.45717966, 8.8165607 , 0.        ]])
In [7]:
plt.scatter(data[:,0],data[:,1], s = 9 , c = data[:,2])
plt.show()

This is just the definition of a step function.

In [8]:
def step_activation(weigths , in_put):
    
    # returns 1 when the dot product w^T x is positive, 0 otherwise
    produto = np.dot(weigths , in_put)
    out = 0
    
    if (produto > 0):
        out = 1
        
    return out

Now we train the perceptron in order to fit the values of the vector $w$ from the data generated previously.

In [9]:
# online algorithm of perceptron learning
weigths = np.zeros(3)
dweigths = np.zeros_like(weigths)

nepochs = 2
train_size = data.shape[0]
learning_rate = .01

train_score = np.zeros(nepochs)

for j in range(nepochs):
    # decay the learning rate slightly at each epoch
    learning_rate *= .9999
    for i in range(train_size):
        vector_a = np.array([1,data[i,0],data[i,1]])
        hatout = step_activation(weigths , vector_a)
        
        if(hatout == data[i,2]):
            train_score[j] += 1
        
        # online update: w <- w + eta * (y - o) * x
        diff = learning_rate * (data[i,2] - hatout)
        dweigths = diff * vector_a
        
        weigths += dweigths

    train_score[j] = train_score[j]/train_size

print(weigths/weigths[2])
print(train_score)
[-0.9283906 -1.4031406  1.       ]
[0.968  0.9859]
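
The updates above were done online, one point at a time. As mentioned earlier, the update could instead be done in batch. The sketch below (not part of the original training run) accumulates the corrections over the whole training set and applies them once per epoch, reusing the `data`, `step_activation` and `learning_rate` defined above:

# one full-batch epoch: sum the corrections over the training set
# and apply them to the weights only at the end of the epoch
batch_weigths = np.zeros(3)
accum = np.zeros(3)

for i in range(data.shape[0]):
    vector_a = np.array([1, data[i,0], data[i,1]])
    hatout = step_activation(batch_weigths, vector_a)
    accum += (data[i,2] - hatout) * vector_a

batch_weigths += learning_rate * accum
print(batch_weigths)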

Finally, we generate a testing set and compute the score over this set.

In [10]:
# generate the test data
npts_teste = 500

teste_data = np.random.uniform(lowerlimit , upperlimit , 2 * npts_teste).reshape(npts_teste,2)
teste_classification = linearclassifier(teste_data, bias , slope).reshape(npts_teste,1)

teste_data = np.append(teste_data,teste_classification,axis = 1)
In [11]:
# evaluate the test_data through the perceptron
score = 0

for i in range(npts_teste):
    vector_a = np.array([1,teste_data[i,0],teste_data[i,1]])
    
    hatout = step_activation(weigths , vector_a)
    
    if (hatout == teste_data[i,2]):
        score += 1
    else :
        teste_data[i,2] = 2
    
print('The classification failed in %d points.\n' %(npts_teste - score))
score = score /npts_teste
print(score)
The classification failed in 4 points.

0.992

We scatter the points from the testing set. There will be three colors: one for the correctly classified elements of each of the two classes, and a third for the misclassified elements.

In [12]:
toptouch = (upperlimit - bias)/slope

xx = np.linspace(0,toptouch,2)
yy = (-weigths[0] - weigths[1] * xx)/weigths[2]
plt.scatter(teste_data[:,0],teste_data[:,1],s = 15*(teste_data[:,2]+1) , c=teste_data[:,2])
plt.plot(xx,yy)
plt.show()

Sigmoid Neurons

One of the problems with using the step function as the activation function comes from the fact that it is not differentiable (it is not even continuous at the origin). As a consequence, a small change in the weight vector $w$ may produce a large jump in the output of the perceptron. One way to improve this situation is to replace the step function by some differentiable function. One possible choice is the \textbf{logistic function}

\begin{equation}\label{eq:perceptron.40} f(x) = \frac{1}{1+e^{-x}} = \frac{e^x}{1+e^{x}}, \end{equation}

whose derivative is given by

\begin{equation}\label{eq:perceptron.50} f'(x) = \frac{e^x(1+e^{x})-e^{2x}}{(1+e^{x})^2}=f(x)(1-f(x)). \end{equation}
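
As a quick sanity check of this identity (a small sketch, not part of the original notebook), we can compare $f(x)(1-f(x))$ with a central finite-difference approximation of $f'$ at an arbitrary point:

import numpy as np

def f(x):
    return 1.0/(1.0 + np.exp(-x))

x0, h = 0.7, 1e-6                            # arbitrary point and step size
numeric  = (f(x0 + h) - f(x0 - h))/(2*h)     # finite-difference estimate of f'(x0)
analytic = f(x0)*(1 - f(x0))                 # closed-form derivative
print(numeric, analytic)                     # the two values should agree closely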

The loss function in (\ref{eq:perceptron.10}) will be

\begin{equation}\label{eq:perceptron.60} L(w)=\frac{(y^i-o^i)^2}{2} = \frac{(y^i-f(w^T x^i))^2}{2}. \end{equation}

From (\ref{eq:perceptron.50}), the partial derivatives are in turn given by

\begin{equation}\label{eq:perceptron.70} \frac{\partial L}{\partial w_j} = -x_j^i f(w^Tx^i)(1-f(w^Tx^i)) (y^i-f(w^Tx^i)) = -\Delta w_j. \end{equation}

The code below implements this idea.

In [13]:
def logistic_function (x):
    
    out = 1.0/(1.0 + np.exp(-x))
    return out

def logistic_activation(weigths , in_put):
    
    # threshold the logistic output at 0.5 to obtain a binary class
    resultado = logistic_function(np.dot(weigths , in_put))
    out = 0
    
    if (resultado > 0.5):
        out = 1
        
    return out
In [14]:
# online learning algorithm for the sigmoid neuron
weigths = np.zeros(3)
dweigths = np.zeros_like(weigths)

nepochs = 1
train_size = data.shape[0]
learning_rate = .01

train_score = np.zeros(nepochs)

for j in range(nepochs):
    learning_rate *= .9999
    for i in range(train_size):
        vector_a = np.array([1,data[i,0],data[i,1]])

        hatout = logistic_activation(weigths , vector_a)
        
        if(hatout == data[i,2]):
            train_score[j] += 1

        # gradient factor f(z)(1 - f(z)) from the derivative of the logistic function
        value_a = np.dot(vector_a , weigths)
        value_b = logistic_function(value_a) * (1 - logistic_function(value_a))
        
        diff = learning_rate * (data[i,2] - hatout) * value_b
        dweigths = diff * vector_a
        
        weigths += dweigths

    train_score[j] = train_score[j]/train_size

print(weigths/weigths[2])
print(train_score)
[-0.96896032 -1.44217818  1.        ]
[0.9703]
In [15]:
# evaluate the test_data through the sigmoid neuron
score = 0

for i in range(npts_teste):
    vector_a = np.array([1,teste_data[i,0],teste_data[i,1]])
    
    hatout = logistic_activation(weigths , vector_a)
    
    if (hatout == teste_data[i,2]):
        score += 1
    else :
        teste_data[i,2] = 2
    
print('The classification failed in %d points.\n' %(npts_teste - score))
score = score /npts_teste
print(score)
The classification failed in 7 points.

0.986
In [16]:
toptouch = (upperlimit - bias)/slope

xx = np.linspace(0,toptouch,2)
yy = (-weigths[0] - weigths[1] * xx)/weigths[2]
plt.scatter(teste_data[:,0],teste_data[:,1],s = 15*(teste_data[:,2]+1) , c=teste_data[:,2])
plt.plot(xx,yy)
plt.show()