How Do You Start Machine Learning in Python?
- Python is a popular and powerful interpreted language. Unlike R, it is a general-purpose language and platform that you can use for research and development as well as for data analytics and machine learning.
- There are also many modules and libraries to choose from, providing multiple ways to accomplish each task. Python's readability makes it approachable even for people from non-technical backgrounds who want to learn programming and machine learning.
- The best way to learn machine learning with Python is to get started with small, simple projects. The program below is your step 0 in machine learning.
We Need
- Python version 2.7+ (the code also runs on Python 3)
- Prerequisites: basic Python skills
- Difficulty level: Beginner
Problem Statement:
Your company has asked you to build a machine learning regression model to predict its share value, which follows this equation:
f(x0, z0) = 3*x0 - 4*z0 + 5
where x0 is company investment, z0 is company performance, and 5 is the bias.
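For illustration (these numbers are made up, not real data): with an investment value x0 = 2 and a performance value z0 = 1, the predicted share value would be f(2, 1) = 3*2 - 4*1 + 5 = 7.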
Let's Begin!
Let's import a few basic analytics libraries
In [7]:
'''
numpy : a numerical computation library
matplotlib : a plotting library
mpl_toolkits.mplot3d : adds 3D plotting support to matplotlib
'''
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
We use the numpy library to generate input data
In [8]:
# First, size of the training set we want to generate.
observations = 2000
# We generate them randomly, drawing from a uniform distribution. This method takes 3 arguments (low, high, size).
# The size of x0 and z0 is observations by 1. In this case: 2000 x 1.
x0 = np.random.uniform(low=-10, high=10, size=(observations,1))
z0 = np.random.uniform(low=-10, high=10, size =(observations,1))
# Combine two dimensions of x0 , z0 into one input matrix.
# This is the X matrix from the linear model y = x*w + b.
# column_stack is a Numpy method, which combines two vectors into a matrix.
inputs = np.column_stack((x0,z0))
In [9]:
# display the dimensions of the input matrix
inputs.shape
Out[9]:
(2000, 2)
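If column_stack is new to you, here is a tiny standalone illustration (the arrays a and b below are made-up toy data, not part of the exercise):
In [ ]:
# Toy example of np.column_stack: two vectors become the columns of one matrix.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.column_stack((a, b)))
# [[1 4]
#  [2 5]
#  [3 6]]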
We generate the targets using the numpy library again
In [10]:
# Noise is the part of the data that isn't in our hands. Again, the method takes 3 arguments (low, high, size).
noise = np.random.uniform(-1, 1, (observations,1))
# Produce the targets according to the f(x,z) = 3x - 4z + 5 + noise definition.
# In this way, we are basically saying: the weights should be 3 and -4, while the bias is 5.
targets = 3*x0 - 4*z0 + 5 + noise
In [11]:
# display the dimensions of the targets matrix
targets.shape
Out[11]:
(2000, 1)
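As an optional sanity check (not part of the original notebook), you can confirm that the targets really follow the formula up to the noise term, which lies between -1 and 1:
In [ ]:
# Optional sanity check: every target should equal 3*x0 - 4*z0 + 5 up to the noise.
print(np.allclose(targets, 3*x0 - 4*z0 + 5, atol=1))   # expected: True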
So now we have the data ready!
Plotting the data
In [12]:
'''
To use the 3D plot, the objects should have a certain shape, so we reshape the targets.
The method to use is reshape; it takes as arguments the dimensions we want the object to have.
'''
targets = targets.reshape(observations,)
# Plotting according to the conventional matplotlib.pyplot syntax
# Declare the figure
fig = plt.figure()
# A method allowing us to create the 3D plot
ax = fig.add_subplot(111, projection='3d')
# Plot the data: x0 and z0 on the horizontal axes, targets on the vertical axis.
ax.plot(x0, z0, targets)
# Set labels
ax.set_xlabel('x0')
ax.set_ylabel('z0')
ax.set_zlabel('Our Targets')
# You can fiddle with the azim parameter to plot the data from different angles. Just change the value of azim=100
# to azim = 0 ; azim = 200, or whatever. Check and see what happens.
ax.view_init(azim=100)
# So far we were just describing the plot. This method actually shows the plot.
plt.show()
In [13]:
'''
We reshape the targets back to the shape that they were in before plotting.
This reshaping is a side-effect of the 3D plot. Sorry for that.
'''
targets = targets.reshape(observations,1)
In [14]:
'''
We will initialize the weights and biases randomly in some small initial range.
init_range is the variable that will measure that.
You can play around with the initial range, but we don't really encourage you to do so.
High initial ranges may prevent the machine learning algorithm from learning.
'''
init_range = 0.1
# Weights are of size k x m, where k is the number of input variables and m is the number of output variables
# In our case, the weights matrix is 2x1 since there are 2 inputs (x and z) and one output (y)
weights = np.random.uniform(low=-init_range, high=init_range, size=(2, 1))
# Biases are of size 1 since there is only 1 output. The bias is a scalar.
biases = np.random.uniform(low=-init_range, high=init_range, size=1)
#Print the weights to get a sense of how they were initialized.
print (weights)
print (biases)
[[-0.07174342]
 [-0.03948961]]
[0.04328575]
In [15]:
'''
Set a small learning rate (commonly denoted eta).
0.02 works quite well for our example, but it is highly recommended
that you play around with this value and observe the effect.
'''
learning_rate = 0.02
The real machine learning logic!
In [16]:
'''
We iterate over our training dataset 100 times. That works well with a learning rate of 0.02.
The proper number of iterations is something we will discuss later on, but generally
a lower learning rate needs more iterations, while a higher learning rate needs fewer.
Keep in mind that a high learning rate may cause the loss to diverge to infinity instead of converging to 0.
'''
for i in range(100):
    # This is the linear model: y = xw + b
    outputs = np.dot(inputs, weights) + biases
    # The deltas are the differences between the outputs and the targets.
    # Note that deltas here is a vector of size 2000 x 1.
    deltas = outputs - targets
    # We use the L2-norm loss, divided by 2 (a common convention).
    # Moreover, we further divide it by the number of observations.
    # This is simple rescaling by a constant; it doesn't change the optimization logic,
    # as any function holding the basic property of being lower for better results, and higher for worse results,
    # can be a loss function.
    loss = np.sum(deltas ** 2) / 2 / observations
    # We print the loss at each step so we can observe whether it is decreasing as desired.
    print("Loss :", loss)
    # Another small trick is to scale the deltas the same way as the loss function.
    # In this way our learning rate is independent of the number of samples (observations).
    # Again, this doesn't change anything in principle; it simply makes it easier to pick a single learning rate
    # that can remain the same if we change the number of training samples (observations).
    # You can try solving the problem without rescaling to see how that works for you.
    deltas_scaled = deltas / observations
    # Finally, we apply the gradient descent update rules.
    # The weights are 2x1, the learning rate is a scalar, inputs are 2000x2, and deltas_scaled is 2000x1.
    # We must transpose the inputs so the matrix multiplication is a valid operation.
    weights = weights - learning_rate * np.dot(inputs.T, deltas_scaled)
    biases = biases - learning_rate * np.sum(deltas_scaled)
    # The weights are updated in a linear algebraic way (a matrix minus another matrix).
    # The biases, however, are just a single number here, so we sum the deltas into a scalar.
    # Both lines are consistent with the gradient descent methodology.
Loss : 423.2945063242378
Loss : 60.653335178432854
Loss : 17.27979574342842
Loss : 11.745259809852493
Loss : 10.710976877352401
...
Loss : 0.4126187175395678
Loss : 0.4029465262718161
Loss : 0.3936572958891957
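Printing 100 loss values quickly becomes hard to read. As an optional extension (not part of the original notebook), you could collect the losses in a list and plot the learning curve instead; a minimal sketch, assuming the same inputs, targets and hyperparameters defined above:
In [ ]:
# Optional sketch: re-run training while recording the loss at each iteration,
# then plot the learning curve. Variable names w, b are local to this sketch.
w = np.random.uniform(-init_range, init_range, size=(2, 1))
b = np.random.uniform(-init_range, init_range, size=1)
losses = []
for i in range(100):
    out = np.dot(inputs, w) + b
    d = out - targets
    losses.append(np.sum(d ** 2) / 2 / observations)
    d_scaled = d / observations
    w = w - learning_rate * np.dot(inputs.T, d_scaled)
    b = b - learning_rate * np.sum(d_scaled)
plt.plot(losses)
plt.xlabel('iteration')
plt.ylabel('loss')
plt.show()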
In [17]:
'''
We print the weights and the biases, so we can see whether they have converged to what we wanted.
When we declared the targets following f(x,z) = 3x - 4z + 5, we set the weights to 3 and -4, and the bias to 5.
'''
print (weights, biases)
[[ 3.00189141]
 [-3.99775609]] [4.33826281]
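As an optional cross-check (not part of the original notebook), you could fit the same data with scikit-learn's LinearRegression and compare its coefficients with the ones gradient descent found; a minimal sketch, assuming scikit-learn is installed:
In [ ]:
# Optional cross-check with scikit-learn's closed-form linear regression.
from sklearn.linear_model import LinearRegression
reg = LinearRegression().fit(inputs, targets)
print(reg.coef_, reg.intercept_)   # should be close to [3, -4] and 5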
In [18]:
'''
We plot the outputs against the targets to see whether they have a linear relationship.
This step isn't strictly required; with more complex models, such a direct comparison would not even be possible.
'''
plt.plot(outputs,targets)
plt.xlabel('outputs')
plt.ylabel('targets')
plt.show()
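The closer the plotted points lie to a straight 45-degree line, the better the fit. As an optional quantitative complement (not part of the original notebook), you could also print a simple error metric:
In [ ]:
# Optional check: mean absolute error between the final predictions and the targets.
print(np.mean(np.abs(outputs - targets)))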
PHEW, WE DID IT!!!
Ever tried audio analytics in Python?