[AIFFEL] 22.Mar.22, GD_Data_Augmentation(Project)

Deok Jong Moon·2022년 3월 22일

오늘의 학습 리스트

transfer learning 할 때 사용한 코드(별 거는 아니지만, 나중에 찾게 될 듯)

  num_classes = ds_info.features["label"].num_classes
resnet50 = keras.models.Sequential([
    keras.applications.resnet.ResNet50(
        include_top=False,
        weights='imagenet',
        input_shape=(224,224,3),
        pooling='avg',
    ),
    keras.layers.Dense(num_classes, activation='softmax')
])
print('=3')

TTA(Test Time Augmentation)는 증강된 이미지를 여러번 보여준 다음 각각의 단계에 대해서 prediction을 평균하고 이 결과를 최종값으로 사용하는 것
Cross Validation
- K-Fold Cross Validation이랑 뭐가 다른가..?
  - 일단 cross-validation은 모델 하나에 대한 평가를 training set에서 나온 여러 개의 validation 셋에 대해 해본다는 의미(그리고 평균 냄)인 것 같음
  - 그런데 K-Fold는 그것을 몇 번 할 것인지를 얘기해주는 것
- Cross Validation을 사용하는 이유는?
  - 데이터가 적은데, 모든 데이터로 training도 해보고, test도 해봐서 더 좋은 예측 결과를 갖고 오도록
  - 단점은 K만큼 running time이 길어진다.
  - 설명 링크 이게 제일 좋은 설명인 것 같다.
- 제일 헷갈렸던 거는 그러면 같은 모델에 계속해서 훈련시키는 건가 했는데, 그건 아님.
tensorflow 계산할 때 자료형이 다르면 안되나보다
- tf.constant(2) - tf.constant(1.0) 하면 오류 남
tf.strided_slice
- 뭔가 tensorflow 만의 슬라이싱 방법을 넣은 자료형(?)인 것 같다.
- tf.data.Dataset 자료형을 변환할 때 배치화를 안한 상태에서 배치화가 선적용 됐어야 하는 함수를 써서 변환하니까 저거 관련한 에러가 떴다.

미니프로젝트(연습)

CutMix
- 나온 배경
  - p 1. "On the other hand, current methods for regional dropout remove informative pixels on training images by overlaying a patch of either black pixels or random noise. Such removal is not desirable because it leads to information loss and inefficiency during training. We therefore propose the CutMix augmentation strategy"
  - p 2. "While regional dropout strategies have shown improvements of classification and localization performances to a certain degree, deleted regions are usually zeroed-out [3, 33] or filled with random noise [51], greatly reducing the proportion of informative pixels on training images. We recognize this as a severe conceptual limitation as CNNs are generally data hungry [27]. How can we maximally utilize the deleted regions, while taking advantage of better generalization and localization using regional dropout?"
- 의의

Data Imbalance 생각하기
- 무턱대고 tfds에서 제공하는 oxford_flowers102 데이터셋을 활용하려 했더니 보니까 총 약 6,000장의 데이터이고, 클래스가 102개란다.
- 그런데 이거를 train, test, val로 정교하게 나눠놔서 최소한 클래스마다 10장 혹은 20장씩은 각각의 세트에 들어갈 수 있게 해놓았단다.
- 생각해보니 이런 걸 무턱대고 shuffle해서 가져오면 Data imbalance가 생길 수 있겠다...(사실 train, val, test 각각 shuffle하는 건 필요한데 이걸 다 합친 후에 shuffle하면 큰일 난다...)
tf.image.random_brightness() 의 max_delta 파라미터의 정도를 파악하기 위해 그와 비슷한 tf.image.adjust_brightness()의 정보를 찾던 중...
- "The value delta is added to all components of the tensor image. image is converted to float and scaled appropriately if it is in fixed-point representation, and delta is converted to the same data type. For regular images, delta should be in the range (-1,1), as it is added to the image in floating point representation, where pixel values are in the (0,1) range."
- delta가 무조건 -1 ~ 1까지의 값을 가져야 하는 게 아니었다.
- 이건 픽셀 값이 0, 1 사이의 값이라고 가정했을 때의 이야기였다...

Deok Jong Moon

'어떻게든 자야겠어'라는 저 아이를 닮고 싶습니다

이전 포스트

[AIFFEL] 22.Mar.21, GD - Data_Augmentation

다음 포스트

[AIFFEL] 22.Mar.22, GD_Data_Augmentation(Project)

오늘의 학습 리스트

미니프로젝트(연습)

[AIFFEL] 22.Mar.21, GD - Data_Augmentation

[딥러닝수학] 확률과통계_2

0개의 댓글

관련 채용 정보