Day 2 of Job Hunting After the DS Bootcamp

JAYLEE · June 24, 2021

1. SQLD Exam Prep

  • IN : col_name IN (a, b) is equivalent to col_name = a OR col_name = b
  • BETWEEN : col_name BETWEEN a AND b is equivalent to col_name >= a AND col_name <= b
  • LIKE : pattern matching; 'b%' matches strings starting with b, 'b_' matches two-character strings starting with b
  • IS NULL : NULL cannot be compared with the "=" operator; you must use IS NULL
* Extracting the row(s) with the maximum value:
SELECT
	col
FROM
	table
WHERE col IN (
		SELECT col
		FROM table
		ORDER BY col DESC
		LIMIT 1)
;
-- equivalently: WHERE col = (SELECT MAX(col) FROM table)

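A quick way to sanity-check this pattern is with Python's built-in sqlite3 module; the table and column names below (scores, score) are made up for illustration.

```python
import sqlite3

# In-memory database with a toy table (names are illustrative)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 10), ("b", 30), ("c", 30), ("d", 20)],
)

# Rows whose score equals the maximum; IN (SELECT MAX(...)) and
# IN (SELECT ... ORDER BY ... DESC LIMIT 1) both return every tied row
rows = conn.execute(
    "SELECT name FROM scores "
    "WHERE score IN (SELECT MAX(score) FROM scores)"
).fetchall()
print(rows)
```

Note that both b and c are returned, since the outer WHERE matches every row equal to the maximum value.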

2. Kernel Study

Kernel Study 2

  • sns.factorplot => a good tool for visualizing categorical data (renamed sns.catplot in seaborn 0.9+)
  • sns.kdeplot, sns.violinplot => useful for comparing continuous features against the target in binary classification
  • Handling null data: drop or impute => decide which with a valid CV system
  • Regular expressions: df.Column.str.extract('([A-Za-z]+)\.') captures the run of letters just before a dot
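The capture group in that pattern can be checked with Python's built-in re module; pandas' .str.extract applies the same pattern element-wise over a column. The sample strings below are illustrative (Titanic-style passenger names).

```python
import re

# '([A-Za-z]+)\.' captures the run of letters immediately before a dot,
# e.g. the honorific in a "Surname, Title. Given" style name
pattern = re.compile(r"([A-Za-z]+)\.")

names = ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"]
titles = [m.group(1) for m in (pattern.search(n) for n in names) if m]
print(titles)  # ['Mr', 'Miss']
```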

3. Implementing a Paper

Transformer (Attention Is All You Need)

Transformer => encoder stack + decoder stack
Each stack is made of layers => the Transformer encoder / decoder block => basis of BERT
Transformer language models => decoder-only (GPT-3, GPT-2 (36 blocks)), encoder-only (BERT (24 blocks))
Training a language model => training examples generated from raw text (e.g. the second law of robotics: "a robot must obey the orders given it by human beings.")
input => "a robot must" => untrained model predicts => "troll" (randomly selected junk) => wrong (should be "obey") => error is calculated => model weights are updated => more likely to say "obey" next time => repeated millions of times
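The error calculation in that loop is, in essence, cross-entropy on the next token. A minimal sketch with a made-up three-word vocabulary and invented scores:

```python
import math

# Toy vocabulary and raw model scores (logits) for the position after
# "a robot must"; the numbers are invented for illustration
vocab = ["obey", "troll", "robot"]
logits = [1.0, 3.0, 0.5]  # the untrained model prefers the junk word "troll"

# Softmax turns scores into probabilities
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

# Cross-entropy loss = -log(probability assigned to the correct token)
loss = -math.log(probs[vocab.index("obey")])
print(round(loss, 3))

# A weight update nudges the model so P("obey") rises and this loss falls
```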

  • Decoder block
    One decoder block => "the Shawshank" (input) => pre-trained model => "Redemption" (output) => the block's feed-forward neural network (a large network that does much of the predicting) => but consider "The chicken didn't cross the road because it" => what does 'it' refer to? => "was covered in grass" (it = road) => resolving such references is the job of self-attention => the first component of the block => it looks across the entire sequence

Tokenization => "The Shaw sh ank" (sub-word pieces) => token ids (integers) => the model also outputs ids => the tokenizer translates them back into text

Token embeddings => each token in the model vocabulary (50,257 entries) => mapped to a numeric vector => training learns these vocabulary embeddings
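The embedding lookup is just row indexing into a vocabulary-sized matrix. A minimal sketch with a made-up tiny vocabulary and 4-dimensional vectors (the real GPT-2 table has 50,257 rows):

```python
import random

random.seed(0)

# Toy embedding table: vocab_size x embedding_dim (sizes are illustrative)
vocab_size, embedding_dim = 10, 4
embeddings = [[random.gauss(0, 1) for _ in range(embedding_dim)]
              for _ in range(vocab_size)]

# Token ids as a tokenizer might produce them (made up here)
token_ids = [3, 7, 3]

# The "embedding layer" looks up one row per id
vectors = [embeddings[i] for i in token_ids]
print(len(vectors), len(vectors[0]))  # 3 4
```

The same id always maps to the same row, so repeated tokens get identical vectors.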

Input => tokenization => embedding layer => numeric vectors => processed in parallel through the decoder blocks => each block refines the representation (increasingly abstract concepts) => the final outputs are hidden states => each hidden state is scored against the vocabulary (50,257 scores) => softmax => probabilities (all positive, summing to 1) => pick the single highest one (greedy decoding)
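The last two steps of that pipeline can be sketched with made-up scores; note that greedy decoding only needs the argmax, which is the same before and after softmax because softmax preserves ordering:

```python
import math

# Invented scores over a tiny stand-in vocabulary (real GPT-2: 50,257 scores)
vocab = ["road", "grass", "it", "chicken"]
scores = [2.0, 0.5, 1.0, 3.5]

# Softmax: all outputs positive, summing to 1
exps = [math.exp(s) for s in scores]
probs = [e / sum(exps) for e in exps]

# Greedy decoding: pick the highest-probability token
greedy = vocab[probs.index(max(probs))]
print(greedy)  # chicken

# Softmax is monotonic, so argmax over raw scores gives the same pick
assert greedy == vocab[scores.index(max(scores))]
```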

Word2Vec

4. Interview Questions

Explain the law of large numbers.

As the number of trials of an event A increases, its statistical (empirical) probability converges to its mathematical (theoretical) probability.
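This convergence is easy to see in a quick simulation; the sketch below flips a fair coin (theoretical probability 0.5) and watches the empirical frequency approach it as the number of trials grows:

```python
import random

random.seed(42)

def empirical_freq(n_flips):
    """Empirical probability of heads over n_flips fair-coin flips."""
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The gap to the theoretical probability 0.5 shrinks as trials grow
for n in (100, 10_000, 1_000_000):
    print(n, round(abs(empirical_freq(n) - 0.5), 4))
```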
