There is a close relationship between the concepts of norm and normalization.
In mathematics and physics, a norm is a function that assigns a length or magnitude to a vector. Norms have many important properties, such as being non-negative, homogeneous, and satisfying the triangle inequality. The norm of a vector can be used to measure its magnitude or size, and is often used in optimization, numerical analysis, and other areas of mathematics.
On the other hand, normalization is the process of scaling a vector to have a unit norm, i.e., a norm of 1. This is typically done by dividing each element of the vector by its norm. Normalization is often used in machine learning, data analysis, and other areas where the relative magnitudes of different vectors are important.
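A minimal sketch of both ideas with NumPy, using a made-up 2D vector:

```python
import numpy as np

# A sample vector (hypothetical values).
v = np.array([3.0, 4.0])

# The L2 (Euclidean) norm: sqrt(3^2 + 4^2) = 5.0.
norm = np.linalg.norm(v)

# Normalization: divide each element by the norm,
# scaling the vector to unit norm.
v_unit = v / norm

print(norm)                    # 5.0
print(v_unit)                  # [0.6 0.8]
print(np.linalg.norm(v_unit))  # 1.0
```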
The relationship between the two is direct: normalization divides a vector by its norm, which scales the vector so that its norm equals 1. This is useful because it lets us compare vectors on a relative scale, without being affected by their absolute magnitudes.
For example, in machine learning, it is common to normalize the input data to a model so that each feature has zero mean and unit variance. This is done to ensure that the features are on a similar scale, which can help the model converge more quickly and avoid numerical issues. Similarly, in image processing, it is common to normalize the pixel values of an image to have zero mean and unit variance before applying certain algorithms, such as principal component analysis.
So, in summary, the norm of a vector is a measure of its magnitude or size, while normalization is the process of scaling a vector to have a unit norm. The concept of norm is fundamental to many areas of mathematics and physics, while normalization is commonly used in machine learning, data analysis, and other fields.
When we normalize the input data to a machine learning model, we typically use a process called standardization, which involves subtracting the mean of each feature from the data (so the mean becomes 0) and then dividing by the standard deviation (so the standard deviation becomes 1). This process is also known as z-score normalization.
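A sketch of z-score standardization on a toy data matrix (made-up values), where each column is a feature:

```python
import numpy as np

# Toy data matrix: rows are samples, columns are features (made-up values).
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Standardization (z-score normalization): per feature,
# subtract the mean and divide by the standard deviation.
mean = X.mean(axis=0)
std = X.std(axis=0)
X_std = (X - mean) / std

# After standardization, each column has mean 0 and standard deviation 1.
print(X_std.mean(axis=0))  # ~[0. 0.]
print(X_std.std(axis=0))   # [1. 1.]
```

Note that the two features start on very different scales (ones vs. hundreds) but end up on the same scale after standardization.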
In summary, normalization and standardization are two different techniques for scaling input data to a machine learning model. Normalization typically involves scaling the vector to have unit norm, while standardization involves centering the data around zero by subtracting the mean and dividing by the standard deviation.
So it can be confusing whether data should be normalized or standardized. Personally, for embedding values, normalizing seems like the right choice.
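One reason to prefer normalization for embeddings: after L2-normalizing two embedding vectors, a plain dot product equals their cosine similarity. A sketch with two toy vectors (made-up values):

```python
import numpy as np

# Two toy "embedding" vectors (hypothetical values).
a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, 0.0, 0.0])

def l2_normalize(x):
    # Scale x to unit L2 norm.
    return x / np.linalg.norm(x)

a_n = l2_normalize(a)
b_n = l2_normalize(b)

# On unit vectors, the dot product is the cosine similarity.
cos_sim = a_n @ b_n
print(cos_sim)  # 0.333...
```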
https://www.statology.org/z-score-normalization/
https://math100.tistory.com/40