SAS_codes_time series_1

TEMP·2021년 9월 20일

SAS

목록 보기

4/13

http://statwith.com/
설명 잘되어 있는 사이트
https://documentation.sas.com/api/collections/pgmsascdc/9.4_3.5/docsets/grstatproc/content/grstatproc.pdf?locale=en#nameddest=n0yjdd910dh59zn1toodgupaj4v9
공식문서

SAS는 proc 안의 문장을 행별로 따르게 된다.

1. Base Code

data fig1_1;
do t=1 to 200;
z=5000+2*rannor(12);
output;
end; run;
rannor 정규분포로 부터 난수 생성.
https://wikidocs.net/31084
do는 output이 필요하다.
음... 원래 output이 default이지만(거의 모든 proc에서는) 여기에서는 default가 아니라고 생각하면 될듯.

data fig1_1;
set aaa.fig1_1;
date=intnx('month','1jan80'd,_n_-1);
format date monyy.; run;
intnx('month','1jan80'd,_n_-1)
INTNX (interval<multiple><.shift-index>, start-from, increment<,alignment>)
1jan80'd 을 시작으로 month간격으로 자동변수n만큼 증가해서 추가해라.

proc sgplot data=fig1_1;
series x=date y=z;
refline 5000/ axis=y; run;
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/grstatproc/n0yjdd910dh59zn1toodgupaj4v9.htm
reference line을 그리는데 x축과 평행하게 즉, y=5000이 되게끔

data fig1_2;
do t=1 to 200;
x=0.5*t;
z=0.5*t+10*rannor(12);
output;
end; run;
1 ~ 200 까지 위와같은 규칙으로 자료를 생성한다.
output주의 하자.

proc sgplot data=fig1_2;
series x=date y=z/lineattrs=(color=blue);
series x=date y=x/lineattrs=(color=red);
run;
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/grstatproc/n1jkqnmk6y6ms9n1b2vvubyzqd4a.htm
해당선의 atrribute를 지정해준다고 생각하면 될듯.

data fig1_5;
  infile 'D:\UOS_2021_2\time_analysis\SAS_codes\data\koreapass.txt';
  input z @@;
  date=intnx('month','1jan81'd,_n_-1);
  format date monyy.; run;
proc sgplot;
  series x=date y=z; run;

data fig1_6;
  do t=1 to 120;
    a=rannor(4321);
    if t <= 60 then x=0.5*t;
    else x=2*(t-46);
    z=x+a; output;
  end; run;

do 는 python의 iterator이다 끝.

2.2 다항추세모형

말 그대로 추세는 그래프의 전반적인 방향성이다.
그냥 단순회귀분석이라고 생각하면 쉽고 이를 base line으로 하여 시계열에서의 특징이 들어간 계절성등을 추가해주는 작업이 회귀와 시계열의 차이정도라고 보면 될듯한다.
즉 시계열에서의 추세성분은 단순 회귀분석으로 구하면 된다.

2.2.2 추세성분만을 갖는경우 (선형과 2차)

데이터 불러오기

data pop;
infile "D:\UOS_2021_2\time_analysis\SAS_codes\data\population.txt";
input pop @@;
pop=round(pop/10000);
t+1;
year=1959+t;
년도별 자료 이므로 그냥 초기년도 지정 해 주고 너머지는 더한다.
굳이 t 안쓰고 자동변수 사용해도 되는데 그냥 t 사용하자.
이때도 행별 연산이므로 t+1만 해주면 각 행별로 더해지게 된다.
sas data procedure에서는 for문이 default라고 생각하는것이 좋다.

plot

proc sgplot data=pop;
series x=year y=pop/ markers markerattrs=(symbol=asterisk);
xaxis values=(1960 to 1995 by 5);
run;
sas에서의 AAA / BBB 는 보통 AAA를 구할때 option으로 BBB를 사용하라는 의미이다.
따라서 여기에서는 markers를 찍을 것이고 markers의 attribution으로 *를 지정해준다는 의미이다.
다음 x축의 범위를 지정해 준다.

이분산성이 발견되어 ln추가

분산안정화 변환(상수화변환)

data pop;
set pop;
lnpop=log(pop);
t2=t*t;
run;
proc sgplot data=pop;
series x=year y=pop / markers  markerattrs=(symbol=asterisk);
xaxis values=(1960 to 1995 by 5); run;

단순 회귀분석(이것이 추세분석이다.)

proc reg data=pop;
model pop=t / dw;  
output out=out1 p=pred1 r=residual1; run;
결과를 out1이라는 이름으로 저장하고 변수p는 pred1이라는 이름으로 저장하라.
즉 output은 저장해라 이고 이름을 out=out1의 방식으로 지정해줘라 라는 의미이다.
다음 예측값이나 잔차같은 값들도 저장하라는 의미이다.

Residual plot

/*위에서 저장한 residual 시계열plot*/
proc sgplot data=out1;
series x=year  y=residual1/ markers  markerattrs=(symbol=square);
xaxis values=(1960 to 1995 by 5); 
refline  0 / axis=y; run;

단순회귀 추가사항

proc reg data=pop ;
model pop=t/dw partial r p clb cli influence ;
plot npp. *rstudent.;     
ods output outputstatistics=name_2 ; 
run; quit;
model 뒤에 붙으면 print이고 output out에 붙으면 저장인데 아래처럼 ods output을 하면 다 저장. 추가적으로 normal plot

residual plot에서 곡선의 형태를 띄므로 제곱항 추가

/* 변수 추가되면 아래와 같이 추가적으로 그림 그려서 보면 한번에 보임 */
proc reg data=pop;
model pop=t t2/ dw p r;  
plot rstudent. *(predicted.);
plot npp. *rstudent.;
output out=out2 p=pred2 r=residual2; run; quit;

이차항을 추가한 추세모형이다

/*predict plot*/
proc sgplot data=out2;
series x=year y=pop/ markers markerattrs=(symbol=circle);
scatter x=year y=pred2 / markerattrs=(symbol=plus);
xaxis values=(1960 to 1995 by 5); 
yaxis label="pop"; run;

/*residual plot*/
proc sgplot data=out2;
series x=year  y=residual2/ lineattrs=(pattern=1 color=black thickness=1) 
markers  markerattrs=(symbol=star color=black size=5);
xaxis values=(1960 to 1995 by 5); 				 
refline  0 / axis=y;  run;

이분산성 해결 (위에서는 그냥 추가만 해놓은거임.)

proc reg data=pop;
model lnpop=t t2/ dw;
output out=out3 r=residual3; run;

분산안정화 변환을 하기위헤 ln항으로 바꾸어 분석

proc sgplot data=out3;
series x=year y=residual3  / lineattrs=(pattern=1 color=black thickness=1) 
markers  markerattrs=(symbol=diamondfilled color=black size=5);
xaxis values=(1960 to 1995 by 5); 
refline  0 / axis=y; run;
/*분산이 안정화(상수화)되었긴 한데 아직 자기상관성 +로 남아있다.*/

2.2.4 계절성분과 추세성분을 동시에 갖는 경우 (Indicator function)

위에서는 추세성 즉 대체적으로 어떠한 그래프를 따른다 라는 것을 회귀분석으로 분석한 것이다.
하지만 시계열 자료의 특징중 하나인 계절성을 표현하지 못한다.(주기성이라고 이해하면 편함)
따라서 위의 회귀모델로는 표현 할 수 없는 계절성을 추가하는 방법에 대해 다룬다.

Data

data dept;
infile 'D:/UOS_2021_2/time_analysis/SAS_codes/data/depart.txt';
input dept @@;
t+1;
run; proc print; run;

proc sgplot data=dept;
series x=t y=dept/ markers markerattrs=(symbol=asterisk);
xaxis values=(1 to 60 by 3);
run;

계절성; 여기에서는 1-12월을 계절성으로 할 것이므로 12개의 indicator function을 아래와 같이 추가해준다.

/*날짜 형식으로 맞춰주고 indicator func 추가 ln추가*/
/*주기성을 month로 잡을 것이니 mon을 추출 , _n_ 자동변수이고 t라고 해도 같다. t지정했는데 굳이 ?*/
data dept;
set dept;
lndept=log(dept); t+1; 
date=intnx('month','1JAN84'D,_n_-1);  format date Monyy.;
mon=month(date);
if mon=1  then i1=1;  else i1=0;
if mon=2  then i2=1;  else i2=0;
if mon=3  then i3=1;  else i3=0;
if mon=4  then i4=1;  else i4=0;
if mon=5  then i5=1;  else i5=0;
if mon=6  then i6=1;  else i6=0;
if mon=7  then i7=1;  else i7=0;
if mon=8  then i8=1;  else i8=0;
if mon=9  then i9=1;  else i9=0;
if mon=10 then i10=1; else i10=0;
if mon=11 then i11=1; else i11=0;
if mon=12 then i12=1; else i12=0; 
run; proc print; run;

proc sgplot;
series x=date y=dept / markers markerattrs=(symbol=dot); run;
그려보니 이분산성이 발견되어 미리 ln항을 추가해주는 것이 좋다.

미리 분산안정화

proc sgplot;
series x=date y=lndept / markers markerattrs=(symbol=dot); run;

modeling

proc reg;
model lndept=t i1-i12/noint dw;
plot rstudent. *(predicted.);
plot npp. *rstudent.;
output out=deptout r=residual; run; quit;
보면 변수로 시간t와 12개의 지시함수가 들어가 있다.
이렇게 하여 추세 + 계절성의 model을 만들 수 있다.

Residual Plot

proc sgplot data=deptout;
series x=date  y=residual / markers  markerattrs=(symbol=circlefilled);
refline  0 / axis=y; run;
/*자기상관성문제는 아직 남아있다.*/
이렇다고 해서 완벽한것은 아니다.
당영히 완벽할 수는 없지만(overfitting) 잔차그림을 그려보니 어느정도의 연관성이 있는거 처엄 보이고 이는 아래의 자기상관계수이다.
즉 같은 변수의 시점간의 상관성이 있는것이다.
상관서잉 있다는 말은 예측이 가능하다는 것인데 이것이 잔차로 들어가 있으니 model로 옮겨 주어야 한다.
자기상관성을 model로 옮기는 것은 뒤에서 다룬다.

자기상관계수

proc arima data=deptout; identify var=residual; run;
/*Autocorrelations 부분이 k-차 자기 상관 계수 이다.*/
위와같이 자기상관계수만을 계산 할 수 있다.

2.3 비선형추세모형

뭐 그냥 비선형일 뿐이다.

Data

data catv;
infile 'D:/UOS_2021_2/time_analysis/SAS_codes/data/catv.txt';
input catv @@;
t+1; 
year=1969+t;
k=70000000;
lncatv=log(k/catv -1);   비선형의 부분이다.
run; proc print ; run;

proc sgplot;
series x=year y=catv  / markers markerattrs=(symbol=asterisk); 
xaxis values=(1970 to 1995 by 5); run;

proc sgplot;
series x=year y=lncatv  / markers markerattrs=(symbol=asterisk);
xaxis values=(1970 to 1995 by 5); 
yaxis values=(-2 to 3 by 1); run;

modeling

proc reg data=catv ;
model lncatv=year/ dw;
output out=out1 pred=p; run; 
잔차그림에서 자기상관성이 보인다.

data out1; 
set out1;
p1=k/(exp(p)+1);   비선형
residual=catv-p1; run;

proc sgplot;
series x=year y=catv  / lineattrs=(pattern=1 color=black)
markers markerattrs=(symbol=circlefilled color=black)
legendlabel="catv"; 
series x=year y=p1  / lineattrs=(pattern=2 color=blue)
markers  markerattrs=(symbol=starfilled color=blue)
legendlabel="forecast"; 
xaxis values=(1970 to 1995 by 5); run; 

/*잔차그림*/
proc sgplot data=out1;
  series x=year y=residual/ markers markerattrs=(symbol=circlefilled);
  xaxis values=(1970 to 1995 by 5);
  yaxis values=(-4000000 to 3000000 by 1000000);
  refline  0 / axis=y; run;


proc nlin data=catv method=gauss noitprint;
  parms k=70000000 b0=2 b1=0;
  temp=exp(b0+b1*t);
  model catv=k/(1+temp);
  output out=tvout pred=p r=residual; run;

2.4 자기회귀오차모형(추세+계절)

앞에서 말한 자기상관성을 model에 넣어주는 방법이다.

다시 보자.
단순 회귀분석으로 추세성을 찾고 부족하여 indicator variable을 이용하여 추세성 + 계절성의 model을 만들었다.

이후 잔차분석을 해보니 오차의 자기 상관성, 즉 하나의 변수인 오차이지만 시차별로 상관성이 있는것이 발견되어 이를 model로 옮겨주려 한다.

proc autoreg data=dept;
model lndept=t i1-i12/ noint backstep nlag=13  dwprob;
output out=out1 r=residual; run;
위 code가 자기회귀 오차 모형이다.
결과를 보면 두개의 model로 볼 수 있다.
SAS 결과창을 순서대로 보자.

Parameter Estimates : 이건 자기상관 신경 안쓰고 그냥 proc reg한거랑 결과가 같다.(해봄)

Estimates of Autocorrelations : 이는 13시차 보두에 대한 자기상관계수를 구한 것이다.

Backward Elimination of Autoregressive Terms : 이는 자기회귀계수가 유의하지 않음을 순서대로 보여준다.

Estimates of Autoregressive Parameters : 이는 위에서 유의 하지 않은 시차를 제외하고 유의 하다고 판단 된 시차를 선택하여 다시 proc reg를 하여 회귀분석을 한것이다.(이건 안해봤는데 개념상 맞을듯)

'Parameter Estimates' : 가장 위의 ㅠㅛ와 제목을 같지는 이는 자기상관계수를 포함하여 계산한 parameter이다.

proc sgplot data=out1;
series x=date y=residual / markers markerattrs=(symbol=dot);
refline  0 / axis=y; run;

TEMP

이전 포스트

SAS_codes_Regression

다음 포스트

SAS_codes_time series_1

SAS

1. Base Code

2.2 다항추세모형

2.2.2 추세성분만을 갖는경우 (선형과 2차)

데이터 불러오기

plot

이분산성이 발견되어 ln추가

단순 회귀분석(이것이 추세분석이다.)

Residual plot

단순회귀 추가사항

residual plot에서 곡선의 형태를 띄므로 제곱항 추가

이분산성 해결 (위에서는 그냥 추가만 해놓은거임.)

2.2.4 계절성분과 추세성분을 동시에 갖는 경우 (Indicator function)

Data

미리 분산안정화

modeling

Residual Plot

자기상관계수

2.3 비선형추세모형

Data

modeling

2.4 자기회귀오차모형(추세+계절)

SAS_codes_Regression

SAS_codes_time series_2

0개의 댓글

관련 채용 정보