Predicting target data using observations (given data)

Target variable (response variable): $Y$. Predictors (features, inputs): the input vector $X = (X_1, X_2, \cdots, X_p)$. We write the statistical model as $$Y = f(X) + \epsilon,$$ where $\epsilon$ captures measurement errors and other discrepancies.
With a good $f$, we can make predictions of $Y$ at new points $X = x$.
We can understand which components of $X = (X_1, X_2, \cdots, X_p)$ are important in explaining $Y$.
There could be features such as Seniority, Years of Education, Income, etc.
Depending on the complexity of $f$, we may be able to understand how each component $X_j$ of $X$ affects $Y$.
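As a minimal sketch of this setup (the particular $f$ and noise level below are invented for illustration), we can simulate data from $Y = f(X) + \epsilon$ and, knowing the true $f$, predict $Y$ at new points:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" regression function f -- unknown in practice.
def f(x):
    return np.sin(2 * x) + 0.5 * x

n = 200
x = rng.uniform(0, 5, size=n)        # inputs X
eps = rng.normal(0, 0.3, size=n)     # errors with mean zero
y = f(x) + eps                       # responses: Y = f(X) + eps

# With a good estimate of f we could predict Y at new points:
x_new = np.array([1.0, 2.5, 4.0])
print("f(x_new):", f(x_new))
```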

Is there an ideal $f(X)$?
What is a good value for $f(X)$ at any selected value of $X$, say $X = x$?
There can be many $Y$ values at $X = x$.
A good value is the expected value (average): $f(x) = E(Y \mid X = x)$.
This ideal $f(x) = E(Y \mid X = x)$ is called the regression function.
The regression function is the ideal or optimal predictor of $Y$ with regard to mean-squared prediction error: $f(x) = E(Y \mid X = x)$ minimizes $E[(Y - g(X))^2 \mid X = x]$ over all functions $g$ at all points $X = x$. The remaining error $\epsilon = Y - f(x)$ is the irreducible error: even if we knew $f(x)$, we would still make prediction errors, since at each $X = x$ there is typically a distribution of possible $Y$ values.
Typically we have few (if any) data points with $X = x$ exactly, so we cannot compute $E(Y \mid X = x)$ directly. We relax the definition and let $$\hat{f}(x) = \mathrm{Ave}(Y \mid X \in N(x)),$$ where $N(x)$ is some neighborhood of $x$.
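One simple choice of neighborhood is the $k$ nearest training points. A minimal sketch, reusing the simulated `x`, `y` above (the helper `knn_average` is our own name, not a library function):

```python
import numpy as np

def knn_average(x0, x, y, k=10):
    """Estimate f(x0) = E(Y | X = x0) by averaging the y values of
    the k training points whose x is closest to x0."""
    nearest = np.argsort(np.abs(x - x0))[:k]   # indices of the k nearest x's
    return y[nearest].mean()

# e.g. fhat = knn_average(2.5, x, y, k=15)
```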
Nearest-neighbor averaging can be pretty good for small $p$ (the number of predictors) and a large-ish number of observations. Nearest-neighbor methods can be lousy when $p$ is large. Reason: the curse of dimensionality. Nearest neighbors tend to be far away in high dimensions.
- We need to average over a reasonable fraction of the $y_i$ values to bring the variance down, e.g. 10%.
- A 10% neighborhood in high dimensions need no longer be local, so we lose the spirit of estimating $E(Y \mid X = x)$ by local averaging (see the sketch below).
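A quick numeric check: for points uniform on $[0, 1]^p$, a cubical neighborhood that captures 10% of the data must have side length $0.10^{1/p}$, which approaches the full range of each coordinate as $p$ grows:

```python
# Side length s of a sub-cube of [0, 1]^p containing a fraction r of
# uniformly distributed points must satisfy s**p = r, so s = r**(1/p).
r = 0.10
for p in [1, 2, 5, 10, 50, 100]:
    s = r ** (1 / p)
    print(f"p = {p:3d}: 10% neighborhood has side length {s:.2f}")
# p = 1 gives 0.10 (truly local); p = 100 gives about 0.98, spanning
# nearly the whole range of every coordinate.
```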
The linear model is an important example of a parametric model: $$f_L(X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p.$$ We estimate the $p + 1$ parameters by fitting the model to training data. Although it is almost never correct, a linear model often serves as a good and interpretable approximation to the unknown true function $f(X)$. In the example shown, the linear model gives a reasonable fit, and a quadratic model $f_Q(X) = \beta_0 + \beta_1 X + \beta_2 X^2$ fits slightly better.
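A sketch of both parametric fits with NumPy, reusing the simulated `x`, `y` from above:

```python
import numpy as np

f_lin = np.poly1d(np.polyfit(x, y, deg=1))   # linear:    b0 + b1*x
f_quad = np.poly1d(np.polyfit(x, y, deg=2))  # quadratic: b0 + b1*x + b2*x^2

print("linear prediction at x = 2.5:   ", f_lin(2.5))
print("quadratic prediction at x = 2.5:", f_quad(2.5))
```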
- Simulated example: the red points are simulated values of income from the model $\mathrm{income} = f(\mathrm{education}, \mathrm{seniority}) + \epsilon$, where the true (unknown) $f$ is the blue surface.
- Linear regression model $\hat{f}_L(\mathrm{education}, \mathrm{seniority}) = \hat\beta_0 + \hat\beta_1 \cdot \mathrm{education} + \hat\beta_2 \cdot \mathrm{seniority}$ fit to the simulated data.
- More flexible regression model $\hat{f}(\mathrm{education}, \mathrm{seniority})$ fit to the simulated data. Here we use a technique called a thin-plate spline to fit a flexible surface; we can control the roughness of the fit.
- An even more flexible spline regression model fit to the same data makes no errors on the training points, i.e. it is overfitted (see the sketch below).
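One way to fit such surfaces in Python is SciPy's `RBFInterpolator` with a thin-plate-spline kernel, whose `smoothing` parameter plays the role of the roughness control; the 2-D data below are invented for illustration:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(1)

# Invented 2-D data in the spirit of income ~ f(education, seniority).
X2 = rng.uniform(0, 1, size=(100, 2))
y2 = np.sin(3 * X2[:, 0]) + X2[:, 1] + rng.normal(0, 0.1, size=100)

# smoothing=0 interpolates the training points exactly (overfit);
# larger values give a smoother, more biased surface.
rough = RBFInterpolator(X2, y2, kernel="thin_plate_spline", smoothing=0.0)
smooth = RBFInterpolator(X2, y2, kernel="thin_plate_spline", smoothing=1.0)

x0 = np.array([[0.5, 0.5]])
print("rough fit:", rough(x0), "smooth fit:", smooth(x0))
```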
Some trade-offs:
- Prediction accuracy versus interpretability: linear models are easy to interpret; thin-plate splines are not.
- Good fit versus over-fit or under-fit: how do we know when the fit is just right?
- Parsimony versus black-box: we often prefer a simpler, interpretable model involving fewer variables over a black-box predictor involving them all.
We can assess accuracy by the average squared prediction error. Over the training data this is the training error, which tends to favor overfit models; where possible we should instead compute the test error over fresh test data $\mathrm{Te}$: $$\mathrm{MSE}_{\mathrm{Te}} = \mathrm{Ave}_{i \in \mathrm{Te}}\big[(y_i - \hat{f}(x_i))^2\big].$$
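A sketch comparing the two errors, reusing the simulated `x`, `y` above and using polynomial degree as a stand-in for model flexibility:

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Hold out the last 50 of the 200 simulated observations as test data.
x_tr, y_tr = x[:150], y[:150]
x_te, y_te = x[150:], y[150:]

for deg in [1, 2, 10]:
    fit = np.poly1d(np.polyfit(x_tr, y_tr, deg))
    print(f"degree {deg:2d}: train MSE = {mse(y_tr, fit(x_tr)):.3f}, "
          f"test MSE = {mse(y_te, fit(x_te)):.3f}")
# Training MSE falls steadily with flexibility; test MSE need not.
```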
Suppose we have fit a model $\hat{f}(x)$ to some training data $\mathrm{Tr}$, and let $(x_0, y_0)$ be a test observation drawn from the population. If the true model is $Y = f(X) + \epsilon$ (with $f(x) = E(Y \mid X = x)$), then $$E\big[(y_0 - \hat{f}(x_0))^2\big] = \mathrm{Var}(\hat{f}(x_0)) + [\mathrm{Bias}(\hat{f}(x_0))]^2 + \mathrm{Var}(\epsilon).$$ The expectation averages over the variability of $y_0$ as well as the variability in $\mathrm{Tr}$. Note that $\mathrm{Bias}(\hat{f}(x_0)) = E[\hat{f}(x_0)] - f(x_0)$.
Typically, as the flexibility of $\hat{f}$ increases, its variance increases and its bias decreases. So choosing the flexibility based on average test error amounts to a bias-variance trade-off.
Example: the decomposition can be checked directly in a small simulation, sketched below.

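A minimal sketch, assuming the same hypothetical $f$ and noise level as in the earlier simulation: draw many training sets, refit each time, and estimate the variance and squared bias of $\hat{f}(x_0)$ at a fixed test point.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, x0 = 0.3, 2.5           # noise level and a fixed test point

def f(x):                      # same hypothetical true f as above
    return np.sin(2 * x) + 0.5 * x

for deg in [1, 2, 5]:          # increasing flexibility
    preds = []
    for _ in range(500):       # many independent training sets
        x_tr = rng.uniform(0, 5, size=50)
        y_tr = f(x_tr) + rng.normal(0, sigma, size=50)
        preds.append(np.poly1d(np.polyfit(x_tr, y_tr, deg))(x0))
    preds = np.array(preds)
    var, bias2 = preds.var(), (preds.mean() - f(x0)) ** 2
    print(f"degree {deg}: variance = {var:.4f}, bias^2 = {bias2:.4f}, "
          f"var + bias^2 + Var(eps) = {var + bias2 + sigma**2:.4f}")
```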
In classification the response variable $Y$ is qualitative, e.g. email is one of $\mathcal{C} = (\mathrm{spam}, \mathrm{ham})$. Our goals are to:
- Build a classifier $C(x)$ that assigns a class label from $\mathcal{C}$ to a future unlabeled observation $X$.
- Assess the uncertainty in each classification.
- Understand the roles of the different predictors among $X = (X_1, X_2, \cdots, X_p)$.

Let $p_k(x) = \Pr(Y = k \mid X = x)$, $k = 1, 2, \ldots, K$, denote the conditional class probabilities at $x$. Then the Bayes optimal classifier at $x$ is $$C(x) = j \quad \text{if } p_j(x) = \max\{p_1(x), p_2(x), \ldots, p_K(x)\}.$$
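As a tiny sketch (the probabilities below are invented), the Bayes classifier simply picks the class with the largest conditional probability:

```python
import numpy as np

def bayes_classifier(p):
    """Given conditional class probabilities p[k] = Pr(Y = k | X = x),
    return the index of the most probable class."""
    return int(np.argmax(p))

p_x = [0.2, 0.7, 0.1]          # invented probabilities at some point x
print("Bayes optimal class:", bayes_classifier(p_x))   # -> 1
```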

Nearest-neighbor averaging can be used as before, and it too breaks down as the dimension grows; however, the impact on the classifier $\hat{C}(x)$ is less than on the estimated probabilities $\hat{p}_k(x)$. We typically measure performance with the misclassification error rate: $$\mathrm{Err}_{\mathrm{Te}} = \mathrm{Ave}_{i \in \mathrm{Te}}\, I[y_i \neq \hat{C}(x_i)].$$ The Bayes classifier (using the true $p_k(x)$) has the smallest error in the population. Support-vector machines build structured models for $C(x)$; methods such as logistic regression build structured models for the $p_k(x)$.
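A sketch of nearest-neighbor classification and the test misclassification error rate, on simulated two-class data (scikit-learn assumed available):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)

# Simulated two-class problem: the label depends noisily on feature 1.
n = 400
X = rng.normal(size=(n, 2))
y_cls = (X[:, 0] + rng.normal(0, 1, size=n) > 0).astype(int)

X_tr, y_tr = X[:300], y_cls[:300]
X_te, y_te = X[300:], y_cls[300:]

clf = KNeighborsClassifier(n_neighbors=15).fit(X_tr, y_tr)

# Misclassification error rate: Ave I[y_i != C_hat(x_i)] over test set.
err_te = np.mean(clf.predict(X_te) != y_te)
print(f"test misclassification error: {err_te:.3f}")
```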