Field of study that gives computers the ability to learn without being explicitly programmed. (Arthur Samuel, 1959) Machine learning algorithms: Superv
Linear regression model: fitting a straight line to the data. Training set: the data used to train the model, e.g. a table of house sizes in feet^2 and prices. x =
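A minimal sketch of the straight-line model and of how well a given line fits a training set. The house sizes and prices are made-up illustrative values, and the squared error cost with the 1/(2m) factor is an assumed convention, not something stated in the note above.

```python
def predict(x, w, b):
    """Straight-line model: f(x) = w * x + b."""
    return w * x + b


def squared_error_cost(xs, ys, w, b):
    """Average squared error over the training set, with an assumed 1/(2m) factor."""
    m = len(xs)
    return sum((predict(x, w, b) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Made-up training set: sizes in 1000 feet^2, prices in $1000s.
sizes = [1.0, 2.0, 3.0]
prices = [300.0, 500.0, 700.0]

cost_perfect = squared_error_cost(sizes, prices, w=200.0, b=100.0)  # line passes through every point
cost_off = squared_error_cost(sizes, prices, w=100.0, b=100.0)      # line is too flat
print(cost_perfect, cost_off)
```

A lower cost means the line fits the training set better; the first line passes through all three points, so its cost is zero.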
At a given w and b, look around a full 360 degrees to find the direction of steepest descent. Following the steepest descent step after step leads down into a valley (a local minimum of the cost).
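The descent can be sketched for a one-feature linear model. This assumes the squared error cost with a 1/(2m) factor; the data, the learning rate alpha and the step count are illustrative choices, not values from these notes.

```python
def gradient_step(xs, ys, w, b, alpha):
    """One simultaneous update of w and b along the steepest descent direction."""
    m = len(xs)
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    dj_dw = sum(e * x for e, x in zip(errors, xs)) / m  # partial derivative of J w.r.t. w
    dj_db = sum(errors) / m                             # partial derivative of J w.r.t. b
    return w - alpha * dj_dw, b - alpha * dj_db


# Made-up data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

w, b = 0.0, 0.0
for _ in range(2000):
    w, b = gradient_step(xs, ys, w, b, alpha=0.1)

print(w, b)  # approaches w = 2, b = 1
```

Both parameters are updated from the same set of errors, so the step uses the old w and b throughout (a simultaneous update).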
A vector is just an array. The dot product multiplies two vectors pair by pair and sums the results. Multivariate regression refers to something else (this model is called multiple linear regression). Subscript for the specific feature, supe
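A small sketch of the dot product and of a prediction with several features. The weights, feature values and bias are arbitrary illustrative numbers.

```python
def dot(w, x):
    """Multiply the two vectors pair by pair, then sum the products."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x))


def predict(x, w, b):
    """Model with several features: f(x) = w . x + b."""
    return dot(w, x) + b


w = [0.5, -1.0, 2.0]  # one weight per feature
x = [4.0, 3.0, 1.0]   # one training example with three features
y_hat = predict(x, w, b=10.0)
print(y_hat)  # 0.5*4 - 1*3 + 2*1 + 10 = 11.0
```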
Feature scaling rescales the training set values so that gradient descent converges more easily. One method: dividing the values by the maximum possible value. Mean Normal
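A rough sketch of two rescaling options. Dividing by the maximum follows the note above, except that the maximum observed in the data stands in for the maximum possible value (an assumption). The mean normalization formula, (x - mean) / (max - min), is the usual textbook definition and is also an assumption here; the sizes are made up.

```python
def scale_by_max(values):
    """Divide every value by the largest one, so results land in (0, 1] for positive data."""
    peak = max(values)
    return [v / peak for v in values]


def mean_normalize(values):
    """Center on the mean and divide by the range, so results are centered around 0."""
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    return [(v - mean) / spread for v in values]


sizes = [500.0, 1000.0, 2000.0]  # made-up house sizes in feet^2
scaled = scale_by_max(sizes)
normalized = mean_normalize(sizes)
print(scaled)      # [0.25, 0.5, 1.0]
print(normalized)  # values centered around 0
```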
The sigmoid function addresses the classification problem well. The output of logistic regression is the probability that the class is 1 (positive). Above the
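A minimal sketch of the sigmoid and of turning its output into a class label. The 0.5 threshold and the parameter values are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(x, w, b, threshold=0.5):
    """Return (label, probability); the threshold of 0.5 is an assumed default."""
    p = sigmoid(w * x + b)  # estimated probability that the class is 1
    return (1 if p >= threshold else 0), p


label_low, p_low = predict_class(1.0, w=2.0, b=-6.0)    # z = -4
label_high, p_high = predict_class(5.0, w=2.0, b=-6.0)  # z = +4
print(label_low, p_low, label_high, p_high)
```

For very large negative z, `math.exp(-z)` can overflow; this sketch keeps the inputs small and does not handle that case.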
The squared error cost function for logistic regression is non-convex and has many local minima, which is not ideal for gradient descent. The upper branch log function work
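A sketch of the two-branch log loss for a single example, assuming the usual form: -log(f) when y = 1 and -log(1 - f) when y = 0, with the natural log. Here f is the model's predicted probability, and the probabilities used are illustrative.

```python
import math


def logistic_loss(f, y):
    """Loss for one example; f is the predicted probability of class 1, y is 0 or 1."""
    if y == 1:
        return -math.log(f)       # small when f is close to 1
    return -math.log(1.0 - f)     # small when f is close to 0


confident_right = logistic_loss(0.9, 1)
confident_wrong = logistic_loss(0.1, 1)
print(confident_right, confident_wrong)
```

A confident wrong prediction is penalized much more heavily than a confident right one, which is what pushes the model toward the correct label.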
The gradient descent update for logistic regression has the same form as that for linear regression, although f(x) is now the sigmoid of wx + b rather than wx + b itself. The base of the log could be e, which makes the de
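A sketch of the update for one-feature logistic regression. The only change from the linear case is that the prediction passes through the sigmoid. The data (made-up values with labels 0 and 1), alpha and step count are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_gradient_step(xs, ys, w, b, alpha):
    """Same update form as linear regression, with f(x) = sigmoid(w * x + b)."""
    m = len(xs)
    errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
    dj_dw = sum(e * x for e, x in zip(errors, xs)) / m
    dj_db = sum(errors) / m
    return w - alpha * dj_dw, b - alpha * dj_db


# Made-up one-feature data: small values are class 0, large values are class 1.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = 0.0, 0.0
for _ in range(5000):
    w, b = logistic_gradient_step(xs, ys, w, b, alpha=0.3)

predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
print(predictions)
```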
Underfitting is when the model does not fit the training data well because it has high bias, i.e. a strong preconception about the form the data should take. Example above shows the
The features are combined into factors that determine the output, and these factors are then combined to output a single value. These factors are known
Each neuron is a logistic regression unit. The superscript in square brackets denotes the n-th layer. The input to layer 2 is the output fr
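A plain-Python sketch of a forward pass through two layers, where every unit is a logistic regression unit. The weights are arbitrary illustrative values, and storing one row of weights per unit is a layout chosen for this sketch, not necessarily the convention used elsewhere in these notes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(a_in, W, b):
    """One layer: unit j computes sigmoid(W[j] . a_in + b[j])."""
    return [
        sigmoid(sum(w * a for w, a in zip(w_j, a_in)) + b_j)
        for w_j, b_j in zip(W, b)
    ]


x = [1.0, 2.0]                                # input vector, i.e. a[0]

W1 = [[0.0, 0.0], [1.0, -0.5], [-1.0, 0.5]]   # layer 1: 3 units, 2 weights each
b1 = [0.0, 0.0, 0.0]
W2 = [[1.0, 1.0, 1.0]]                        # layer 2: 1 unit, 3 weights
b2 = [-1.5]

a1 = dense(x, W1, b1)   # output of layer 1
a2 = dense(a1, W2, b2)  # layer 2 takes a1 as its input
print(a1, a2)
```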
Dense creates a layer. rows x columns = 2D array / 2D matrix. 2 columns, one for each feature. 3 units produce a 1 x 3 matrix. Tensor is a data type that represe
AGI refers to making an AI that does anything a human can do. What is the "algorithm" that is used to train our brain from seeing with our eyes to s
matmul performs matrix multiplication (np.matmul in NumPy).
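To show what the multiplication does to the shapes, here is a plain-Python stand-in for matmul. It is a hand-written sketch, not the library routine, and the numbers are illustrative. A 1 x 2 input times a 2 x 3 weight matrix gives a 1 x 3 result, one value per unit.

```python
def matmul(A, B):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [
        [sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


A_in = [[1.0, 2.0]]          # 1 x 2: one example with two features
W = [[1.0, -3.0, 5.0],
     [2.0 * -1.0, 4.0, -6.0]]  # 2 x 3: one column of weights per unit

Z = matmul(A_in, W)          # 1 x 3: one value per unit
print(Z)  # [[-3.0, 5.0, -7.0]]
```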
TensorFlow and Keras
TensorFlow is a machine learning package developed by Google. In 2019, Google integrated Keras into TensorFlow and released Tensor
Binary cross-entropy is another name for the logistic loss function. The name comes from statistics, where this function is called cross-entropy. Binary becaus
Linear activation function - aka no activation function, just a straight line. ReLU stands for Rectified Linear Unit. For hidden layers, we use ReLU ins
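The two activations, written out as a short sketch with illustrative inputs:

```python
def linear(z):
    """Linear activation: the output is z itself, i.e. no activation at all."""
    return z


def relu(z):
    """Rectified Linear Unit: 0 for negative inputs, z otherwise."""
    return max(0.0, z)


for z in (-2.0, 0.0, 3.0):
    print(z, linear(z), relu(z))
```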
Multiclass classification: no longer binary, the output can now be one of more than two categories.
The Adam optimizer speeds up gradient descent. If w_j (or b) keeps moving in the same direction, we increase the learning rate alpha_j. If w_j (or b) keeps o
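A sketch of the standard Adam update rule for a single parameter, to make the adaptive step size concrete. The hyperparameter defaults (beta1 = 0.9, beta2 = 0.999, eps = 1e-8) are the commonly used ones, and the toy cost J(theta) = (theta - 3)^2 is an assumption for illustration; neither comes from these notes.

```python
import math


def adam_minimize(grad, theta, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Minimize a one-parameter function given its gradient, using the Adam update."""
    m = 0.0  # running average of the gradient (tracks the direction of movement)
    v = 0.0  # running average of the squared gradient (tracks its typical size)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        theta -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def grad_j(theta):
    """Gradient of the toy cost J(theta) = (theta - 3)^2."""
    return 2.0 * (theta - 3.0)


theta_one_step = adam_minimize(grad_j, 0.0, steps=1)
theta_final = adam_minimize(grad_j, 0.0)
print(theta_one_step, theta_final)
```

When the gradients keep the same sign, m_hat stays large relative to sqrt(v_hat) and the steps stay close to alpha; when they alternate in sign, m_hat shrinks and the steps get smaller, which matches the intuition in the note above.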
Backpropagation is a procedure for computing the derivatives of the cost function J with respect to the parameters. It works its way backwards from the result,
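A tiny worked sketch of working backwards through a computation, for a single linear unit with cost J = (1/2)(a - y)^2 where a = w*x + b. The numbers are illustrative, and the finite-difference check is added here only to confirm the result.

```python
def forward(w, b, x, y):
    """Forward pass: a = w*x + b, d = a - y, J = d^2 / 2."""
    a = w * x + b
    d = a - y
    return 0.5 * d ** 2


def backward(w, b, x, y):
    """Backward pass: apply the chain rule from J back to w and b."""
    a = w * x + b
    d = a - y
    dj_dd = d            # J = d^2 / 2, so dJ/dd = d
    dj_da = dj_dd * 1.0  # d = a - y, so dd/da = 1
    dj_dw = dj_da * x    # a = w*x + b, so da/dw = x
    dj_db = dj_da * 1.0  # a = w*x + b, so da/db = 1
    return dj_dw, dj_db


w, b, x, y = 2.0, 8.0, -2.0, 2.0
dj_dw, dj_db = backward(w, b, x, y)

# Numerical check: nudge w slightly and see how much J changes.
eps = 1e-6
numeric_dj_dw = (forward(w + eps, b, x, y) - forward(w - eps, b, x, y)) / (2 * eps)
print(dj_dw, dj_db, numeric_dj_dw)
```

Each intermediate derivative is computed once and reused, which is why the backward pass is much cheaper than nudging every parameter separately.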
1. Evaluating a Model
1-1. Motivation
If there is only one feature, it is possible to plot the model and see if the model is overfitting or underfitt
High bias means the model is too simple to capture the complexity of the underlying data. Both J_train() and J_cv() are high. High variance means the
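A rough sketch of reading the two errors together. The `baseline` level of acceptable error and the `gap` tolerance are hypothetical knobs introduced for this illustration; what counts as "high" depends on the problem.

```python
def diagnose(j_train, j_cv, baseline=0.0, gap=0.05):
    """Label a model from its training and cross-validation errors (hypothetical thresholds)."""
    problems = []
    if j_train - baseline > gap:
        problems.append("high bias")      # the model does poorly even on the training set
    if j_cv - j_train > gap:
        problems.append("high variance")  # much worse on unseen data than on training data
    return problems or ["looks fine"]


print(diagnose(0.30, 0.32))  # ['high bias']
print(diagnose(0.01, 0.30))  # ['high variance']
print(diagnose(0.30, 0.60))  # ['high bias', 'high variance']
```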
1. Iterative Loop of ML Development
Choosing an architecture: ML model, what data to use, what hyperparameters, etc.
Training model: Training model w
A decision tree has a root node, decision nodes, and leaf nodes. A training example goes down the tree and is classified as a cat or not a cat. Depending on which
The entropy function (denoted by H(p1), where p1 is the fraction of examples that are cats in this case) can be a measure of impurity in one batch of ex
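A sketch of H(p1), assuming the standard binary entropy with log base 2 and the convention 0 * log(0) = 0. It is 0 for a batch that is all cats or all not cats, and peaks at 1 when the batch is split 50/50.

```python
import math


def entropy(p1):
    """Binary entropy H(p1) = -p1*log2(p1) - (1 - p1)*log2(1 - p1)."""
    if p1 == 0.0 or p1 == 1.0:
        return 0.0  # convention: 0 * log2(0) is treated as 0
    p0 = 1.0 - p1
    return -p1 * math.log2(p1) - p0 * math.log2(p0)


for p1 in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(p1, entropy(p1))
```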
Tree ensembles can make the algorithm more robust and less sensitive. Just having one decision tree makes the algorithm very sensitive to small changes in the data. As s
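One way to see why several trees are less sensitive than one is a majority vote. The three "trees" here are hypothetical one-question stand-ins, and the feature names are invented for this sketch; real trees would be learned from data.

```python
from collections import Counter


def majority_vote(trees, example):
    """Ask every tree for a prediction and return the most common answer."""
    votes = [tree(example) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for trained trees, each asking a single question.
def tree_ears(example):
    return "cat" if example["ears"] == "pointy" else "not cat"


def tree_whiskers(example):
    return "cat" if example["whiskers"] else "not cat"


def tree_face(example):
    return "cat" if example["face"] == "round" else "not cat"


example = {"ears": "pointy", "whiskers": True, "face": "not round"}
prediction = majority_vote([tree_ears, tree_whiskers, tree_face], example)
print(prediction)  # two of the three trees vote "cat"
```

A single tree getting one example wrong no longer decides the outcome, since the other trees can outvote it.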