인과 | Causality 와 인과추론 | Causal inference

standing-o·2022년 8월 23일
0
  • 본 포스팅은 인과, 인과추론의 개념과 관련 이론 (Back-door, Do-calculus) 들을 소개하고 있습니다.
  • Keyword : Causality, SCM, Back-door, Do-calculus
  • 👉 Click

인과 | Causality 와 인과추론 | Causal inference

Causality

  • Influence by shich one event, process, state, or object a contributes to the production of another event, process, state, or object where the cause is partly responsible for the effect, and the effect is partly dependent on the cause.
  • Causality in various academic disciplines
    • Physics, chemistry,biology, climate science
    • Psychology, social science, economics
    • Epidemiology, public health
  • Relation to AI, ML, DS
    • AI : a rational agent performing actions to achieve a goal (reinforcement learning)
    • ML : currently focused on learning correlations
    • DS : capture, process, analyze, communicate with data

Structural causal model (SCM)

  • SCM M=<U,V,F,P(U)>M = <U,V,F,P(U)> provides a formal framework.
  • SCM induces observational, interventional, and counterfactual distributions.
  • SCM induces a causal graph gg, which implies conditional independencies testable via d-separation (blockage).
  • The underlying model MM is unknown but the causal graph gg can be given from common sense or domain knowledge.
  • Intervention do(X=x) as a submodel Mx, which induces a manipulated causal graph g_\bar{x}.
  • Causal effect of X=xX=x on Y=yY=y is defined as P(ydo(x))P(y\mid{do(x)}).

Remark

  • Identifiability : causal effect may be computable from existing observational data for some causal graphs.
  • In a Markovian case an singleton X, a causal effect can be easily derivable by canceling output P(xpax)P(x\mid{pa_x})

Back-door Criterion

  • DefinitionBack-door

    • Find a set ZZ s.t. it can sufficiently explain 'confounding' between XX and YY. Then,
    P(ydo(x))=ZP(yx,z)P(z)"P(y|do(x))=\sum_Z{P(y|x,z)P(z)}"
  • DefinitionㅣBack-door criterion

    • A set ZZ satisfies the back-door criterion with respect to a pair of variables X,YX, Y in causal diagram gg if;
      • (i) no node in ZZ is a descendant of XX; and
      • (ii) ZZ blocks every path between X ∈ XX and Y ∈ YY that contains an arrow into X.
  • A back-door adjustment formula is simple and widely used but limited.

Back-door sets as substitutes of the direct parents of X

  • Rain satisfies the back-door criterion relative to Sprinkler ans Wet:
    • (i) Rain is not descendant of Sprinkler, and
    • (ii) Rain blocks the only back-door path from Sprinkler to Wet.
  • Adjusting for the direct parents of Sprinkler, we have:
P(wtdo(sp))=snP(wtsp,sn)P(sn)=...=rnP(wtsp,rn)P(rn)P(\text{wt}|do(\text{sp}))=\sum_\text{sn}P(\text{wt}|\text{sp,sn})P(\text{sn})=...=\sum_\text{rn}P(\text{wt}|\text{sp,rn})P(\text{rn})

Rules of Do-calculus

  • Backdoor criterion results in a very specific form of indentification formula.

  • Do-calculus (Pearl, 1995) provides general machinery to manipulate observational and interventional distributions.

  • TheoremㅣRules of Do-calculus (simplified)

    • Rule 1 : Adding/removing observations
    P(ydo(x),z)=P(ydo(x))if(ZYX)ingXˉP(y|do(x),z)=P(y|do(x))\,\,\,\text{if}\,\,(Z\perp{Y|X})\,\,\text{in}\,\,g_{\bar{X}}
    • Rule 2 : Action/observation exchange
    P(ydo(x),do(z))=P(ydo(x),z)if(ZYX)ingXˉZP(y|do(x),do(z))=P(y|do(x),z)\,\,\,\text{if}\,\,(Z\perp{Y|X})\,\,\text{in}\,\,g_{\bar{X}\underline{Z}}
    • Rule 3 : Adding/removing actions
    P(ydo(x),do(z))=P(ydo(x))if(ZYX)ingXˉZˉP(y|do(x),do(z))=P(y|do(x))\,\,\,\text{if}\,\,(Z\perp{Y|X})\,\,\text{in}\,\,g_{\bar{X}\bar{Z}}
  • Do-calculus is sound and complete but it has no algorithmic insight

  • A graphical condition and an efficient algorithmic procedure have developed for identifiability.

  • Do-calculus is a set of rules to manipulate observational or interventional probabilites. (Do-calculus is complete)


Modern identification tasks

  • Experimental conditions ➔ Generalized identification

    • Combining datasets of different experimental conditions

    • The identifiability of any expression of the form P(ydo(x),z)P(y\mid{do(x), z}) can be determined given any causal graph gg and an arbitrary combination of observational and experimental studies.

    • If the query is identifiable, then its estimand can be derived in polynomial time.

  • Environmental conditions ➔ Transportability

    • Combining datasets from different sources

    • Non-parametric transportability can be determined provided that the problem instance is encoded in selection diagrams.

    • When transportability is feasible, the transport formula can be derived in polynomial time.

    • The causal calculus and the corresponding transportation algorithm are complete.

  • Sampling conditons ➔ Recovering from selection bias

    • Nonparametric recoverability of selection bias from causal and statistical settings can be determined provided that an augmented causal graph is available.
    • When recoverability is feasible, the estimated can be derived in polynomial time.
    • The result is complete for pure recoverability, and sufficient for recoverability with external information.
  • Responding conditons ➔ Recovering from missingness


References

  • 본 포스팅은 LG Aimers 프로그램에 참가하여 학습한 내용을 기반으로 작성된것입니다. (전체내용 X)

LG Aimers 바로가기

[1] LG Aimers AI Essential Course Module 5.인과추론, 서울대학교 이상학 교수 
profile
공부를 마무리 할 때까지 👉👉https://standing-o.github.io/👈👈

0개의 댓글