Linear regression models the linear relationship between one or more independent variables x and a dependent variable y.
If there is only one independent variable x, it is called simple linear regression.
The simple linear regression formula is y = wx + b.
w is the 'weight'.
b is the 'bias'.
If we find the right values of w and b, we have successfully modeled the relationship between x and y.
There can also be multiple independent variables.
We call that multiple linear regression.
The multiple linear regression formula is y = w1x1 + w2x2 + ... + wnxn + b.
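As a small illustration, both cases can be written as one weighted sum plus a bias. This is only a sketch in plain Python: the function name `predict` and all the numbers are made up for the example and are not from the original notes.

```python
def predict(xs, ws, b):
    """Weighted sum of the inputs plus the bias: w1*x1 + ... + wn*xn + b."""
    if len(xs) != len(ws):
        raise ValueError("xs and ws must have the same length")
    return sum(w * x for w, x in zip(ws, xs)) + b

# Simple linear regression: one independent variable (hypothetical values).
print(predict([3.0], [2.0], 1.0))  # 2*3 + 1 = 7.0

# Multiple linear regression: three independent variables (hypothetical values).
print(predict([1.0, 2.0, 3.0], [0.5, 0.25, 2.0], 1.0))  # 0.5 + 0.5 + 6 + 1 = 8.0
```

With a single input the function reduces to y = wx + b, so simple linear regression is just the one-variable case of the multiple one.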
We set up a hypothesis that assumes a relationship between x and y.
The hypothesis formula for linear regression is H(x) = wx + b.
The goal is to find the w and b that best express the rule underlying the data.
In machine learning, we define a formula that calculates the error between the predicted values and the actual values.
That is called the cost function (also known as the loss function or objective function).
Then, we find the values of w and b that give the least error according to that function.
Therefore, the cost function should not merely express the error; it should also be a formula that is well suited to reducing that error.
For regression, we usually use the Mean Squared Error (MSE).
For n data points, the formula is MSE = (1/n) Σ (y_i - H(x_i))^2, summed over i = 1 to n.
Rewritten as a cost function of w and b, it becomes Cost(w, b) = (1/n) Σ (y_i - (w x_i + b))^2.
The smaller the errors, the lower the MSE.
If we find the w and b that minimize Cost(w, b), we have found the line that best shows the relationship between x and y.
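To make the cost concrete, here is a minimal plain-Python sketch of the MSE cost for a line y = wx + b. The helper name `mse_cost` and the three data points are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def mse_cost(w, b, xs, ys):
    """Mean of the squared differences between targets y and predictions w*x + b."""
    n = len(xs)
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / n

# Hypothetical data that lies exactly on the line y = 2x.
xs = [1, 2, 3]
ys = [2, 4, 6]

cost_good = mse_cost(2.0, 0.0, xs, ys)  # perfect fit, so every error is 0
cost_bad = mse_cost(1.0, 0.0, xs, ys)   # errors are 1, 2, 3, so the cost is (1 + 4 + 9) / 3

print(cost_good)  # 0.0
print(cost_bad)   # 14/3, about 4.67
```

The pair (w, b) that fits the data better gets the lower cost, which is why minimizing the cost leads to the best-fitting line.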
Machine learning and deep learning models (including linear regression) learn by finding the w and b that minimize the cost function.
The algorithm used for that is called an 'optimizer' or 'optimization algorithm'.
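One common optimization algorithm is gradient descent, which repeatedly moves w and b a small step in the direction that lowers the cost. The following is only a sketch in plain Python, assuming an MSE cost; the toy data, the learning rate, the step count, and the function names are all assumptions made for this example.

```python
# Hypothetical toy data generated from y = 2x + 1 (not from the original notes).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

def cost(w, b):
    """Mean squared error of the line y = w*x + b on the toy data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(w=0.0, b=0.0, learning_rate=0.05, steps=2000):
    """Repeatedly step w and b against the gradient of the MSE cost."""
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE cost with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move in the direction that lowers the cost.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w_fit, b_fit = gradient_descent()
print(w_fit, b_fit, cost(w_fit, b_fit))  # w and b should end up close to 2 and 1
```

The learning rate controls the step size: if it is too large the updates can diverge, and if it is too small the search needs many more steps.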
Here is a sample formula: f(w) = 2w^2 + 5.
The following code computes its gradient with respect to w using automatic differentiation.
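import tensorflow as tf

w = tf.Variable(2.)

def f(w):
    y = w**2
    z = 2*y + 5
    return z

# record the computation of z so it can be differentiated with respect to w
with tf.GradientTape() as tape:
    z = f(w)

gradients = tape.gradient(z, [w])
print(gradients)  # dz/dw = 4w, which is 8.0 at w = 2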
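As a sanity check on the automatic gradient, the derivative of 2w^2 + 5 is 4w, which is 8 at w = 2. The sketch below approximates the same value with a central finite difference in plain Python. This is a different technique from what TensorFlow does internally, and the helper name `numerical_gradient` and the step size `h` are assumptions for the example.

```python
def f(w):
    """Same sample formula as in the TensorFlow code: 2*w**2 + 5."""
    return 2 * w**2 + 5

def numerical_gradient(func, w, h=1e-5):
    """Central finite difference: slope between two points close to w."""
    return (func(w + h) - func(w - h)) / (2 * h)

approx = numerical_gradient(f, 2.0)
print(approx)  # should be very close to 4 * 2 = 8
```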
# variables that will be learned
# initialize them with 4 and 1 (arbitrary starting values)
w = tf.Variable(4.0)
b = tf.Variable(1.0)

@tf.function
def hypothesis(x):
    return w*x + b

@tf.function
def mse_loss(y_pred, y):
    return tf.reduce_mean(tf.square(y_pred - y))

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # study time
y = [11, 22, 33, 44, 53, 66, 77, 87, 95]  # grade

# uses the gradient descent algorithm
# learning rate = 0.01
optimizer = tf.optimizers.SGD(0.01)

for i in range(301):
    with tf.GradientTape() as tape:
        y_pred = hypothesis(x)
        cost = mse_loss(y_pred, y)
    # gradients of the cost with respect to w and b
    gradients = tape.gradient(cost, [w, b])
    optimizer.apply_gradients(zip(gradients, [w, b]))
    if i % 10 == 0:
        print("epoch : {:3} | w : {:5.4f} | b : {:5.4f} | cost : {:5.6f}".format(i, w.numpy(), b.numpy(), cost.numpy()))

# check if it works
x_test = [3.5, 5, 5.5, 6]
print(hypothesis(x_test).numpy())
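This is for cases where there are two options to choose between (binary classification).
A linear expression is not suitable for binary classification, because its output is not limited to the range between 0 and 1.
We need a function whose output ranges from 0 to 1 and whose graph has an 'S' shape.
We call it the 'sigmoid function'.
The formula for the sigmoid function is sigmoid(x) = 1 / (1 + e^(-x)).
Here, e is Euler's number.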
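The sigmoid function is short enough to write by hand. The version below is a minimal sketch in plain Python. Splitting on the sign of x is an implementation choice made here to avoid overflow in `math.exp` for large negative inputs; it is not something the original notes prescribe.

```python
import math

def sigmoid(x):
    """1 / (1 + e^(-x)), computed so that math.exp never receives a large positive argument."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x is negative here, so z is between 0 and 1
    return z / (1.0 + z)

for x in (-5, 0, 5):
    print(x, sigmoid(x))  # sigmoid(0) is exactly 0.5
```

Large negative inputs give values close to 0 and large positive inputs give values close to 1, which produces the 'S' shape.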
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import optimizers

x = np.array([-50, -40, -30, -20, -10, -5, 0, 5, 10, 20, 30, 40, 50])
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])  # label is 1 when x >= 10, otherwise 0

model = Sequential()
model.add(Dense(1, input_dim=1, activation='sigmoid'))  # one input x, one output y, with sigmoid as the activation function

sgd = optimizers.SGD(learning_rate=0.01)  # stochastic gradient descent as the optimizer
model.compile(optimizer=sgd, loss='binary_crossentropy', metrics=['binary_accuracy'])  # binary cross-entropy as the loss function

model.fit(x, y, epochs=200)  # trains for 200 epochs
plt.plot(x, model.predict(x), 'b', x, y, 'k.')  # blue line: predicted probabilities, black dots: actual labels
plt.show()
In deep learning, there are usually two or more independent variables. From the perspective of coding the model, this means the input vector has two or more dimensions.
If x1 is the midterm score, x2 is the final exam score, x3 is the extra points, and y is the overall score, the hypothesis is H(X) = w1x1 + w2x2 + w3x3 + b.
If x1 is the sepal length (cm), x2 is the petal length (cm), and y is the species, the hypothesis is H(X) = sigmoid(w1x1 + w2x2 + b).
There may also be cases with more than one dependent variable.
Keras is easy to use, but when we build machine learning models at a lower level with NumPy or TensorFlow, we must understand how the variables are computed as vector and matrix operations.
In other words, we should be able to work out the sizes of the matrices (or tensors) from the data and the number of variables.
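As an illustration of working out those sizes, the sketch below computes a three-variable hypothesis for several samples at once with NumPy. The scores, the weights, and the bias are hypothetical values invented for the example.

```python
import numpy as np

# Hypothetical data: 5 samples (rows) x 3 independent variables (columns).
X = np.array([[70, 85, 11],
              [71, 89, 18],
              [50, 80, 20],
              [99, 20, 10],
              [50, 10, 10]], dtype=float)

w = np.array([0.5, 0.4, 1.0])  # one weight per independent variable (hypothetical)
b = 2.0                        # a single bias shared by all samples (hypothetical)

# (5, 3) matrix times (3,) vector gives one prediction per sample: shape (5,).
y_pred = X @ w + b

print(X.shape, w.shape, y_pred.shape)  # (5, 3) (3,) (5,)
```

The number of columns in X must match the length of w. That is the size constraint that follows from the number of independent variables.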
A vector is 'a quantity with magnitude and direction'.
In Python, it is expressed as a 1-dimensional array or list.
A matrix is 'a 2-dimensional structure with rows and columns'.
In Python, it is expressed as a 2-dimensional array.
If there are three or more dimensions, we call it a tensor.
In Python, it is expressed as an n-dimensional array (n >= 3).
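A short NumPy sketch of the three cases follows. The array contents are arbitrary and only the number of dimensions matters here.

```python
import numpy as np

vector = np.array([1, 2, 3])                # 1-dimensional array
matrix = np.array([[1, 2, 3], [4, 5, 6]])   # 2-dimensional array: 2 rows, 3 columns
tensor = np.zeros((2, 3, 4))                # 3-dimensional array

for name, arr in (("vector", vector), ("matrix", matrix), ("tensor", tensor)):
    # ndim is the number of dimensions, shape is the size along each one
    print(name, arr.ndim, arr.shape)
```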
This is for cases where there are three or more options.
Basically, it is a method that makes the probabilities of all the options sum to 1.
If the number of options (classes) is k, it takes a k-dimensional vector as input and returns the probability of each class.
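The behavior described above matches what is commonly called the softmax function. Below is a minimal plain-Python sketch. Subtracting the maximum score before exponentiating is a standard numerical-stability choice added here, and the input scores are hypothetical.

```python
import math

def softmax(scores):
    """Turn k scores into k probabilities that sum to 1."""
    m = max(scores)                           # shift by the max so exp never overflows
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # hypothetical scores for k = 3 classes
print(probs, sum(probs))          # the largest score gets the largest probability
```

The class with the highest probability is then taken as the prediction.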