Causality and Causal Inference

standing-o · August 23, 2022
  • This post introduces the concepts of causality and causal inference, along with related theory (the back-door criterion and do-calculus).
  • Keyword : Causality, SCM, Back-door, Do-calculus
Causality and Causal Inference


  • Causality is the influence by which one event, process, state, or object (a cause) contributes to the production of another event, process, state, or object (an effect), where the cause is partly responsible for the effect, and the effect is partly dependent on the cause.
  • Causality in various academic disciplines
    • Physics, chemistry, biology, climate science
    • Psychology, social science, economics
    • Epidemiology, public health
  • Relation to AI, ML, DS
    • AI : a rational agent performing actions to achieve a goal (reinforcement learning)
    • ML : currently focused on learning correlations
    • DS : capture, process, analyze, communicate with data

Structural causal model (SCM)

  • An SCM $M = \langle U, V, F, P(U) \rangle$ provides a formal framework.
  • An SCM induces observational, interventional, and counterfactual distributions.
  • An SCM induces a causal graph $g$, which implies conditional independencies testable via d-separation (blockage).
  • The underlying model $M$ is unknown, but the causal graph $g$ can be given from common sense or domain knowledge.
  • An intervention $do(X=x)$ is represented as a submodel $M_x$, which induces a manipulated causal graph $g_{\bar{x}}$.
  • The causal effect of $X=x$ on $Y=y$ is defined as $P(y \mid do(x))$.
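To make the difference between observing and intervening concrete, here is a minimal sketch of a toy SCM with a confounder. The variables `Z`, `X`, `Y`, the structural equations, and every coefficient are hypothetical choices for illustration, not part of the original notes. Passing `do_x` replaces the equation for `X`, which corresponds to the submodel $M_x$.

```python
import numpy as np

def sample_scm(n, do_x=None, seed=0):
    """Draw n samples from a toy SCM with V = {Z, X, Y}.

    Structural equations F (all hypothetical):
        Z := 1 with probability 0.5
        X := 1 if U_x < 0.2 + 0.6 * Z else 0
        Y := 1 if U_y < 0.1 + 0.3 * X + 0.5 * Z else 0
    Passing do_x replaces the equation for X with X := do_x (submodel M_x).
    """
    rng = np.random.default_rng(seed)
    # P(U): independent exogenous noise terms
    z = (rng.random(n) < 0.5).astype(int)
    u_x = rng.random(n)
    u_y = rng.random(n)

    if do_x is None:
        x = (u_x < 0.2 + 0.6 * z).astype(int)   # X listens to its parent Z
    else:
        x = np.full(n, do_x, dtype=int)         # arrow Z -> X is cut
    y = (u_y < 0.1 + 0.3 * x + 0.5 * z).astype(int)
    return z, x, y

n = 200_000
z, x, y = sample_scm(n)
p_obs = y[x == 1].mean()            # estimate of P(Y=1 | X=1)
_, _, y_do = sample_scm(n, do_x=1)
p_do = y_do.mean()                  # estimate of P(Y=1 | do(X=1))
print(f"P(Y=1 | X=1)     ~ {p_obs:.3f}")
print(f"P(Y=1 | do(X=1)) ~ {p_do:.3f}")
```

Under these assumed equations the exact values are 0.80 for the conditional and 0.65 for the interventional probability; the gap comes entirely from the confounder `Z`.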


  • Identifiability: a causal effect may be computable from existing observational data for some causal graphs.
  • In the Markovian case with a singleton $X$, the causal effect is easily derived by canceling out the factor $P(x \mid pa_x)$ from the factorization of the joint distribution.
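The cancellation in the Markovian case can be written out as the truncated factorization. This is a sketch in assumed notation: $v$ is an assignment to all variables in $V$ that is consistent with $X = x$, and $pa_i$ denotes the values of the parents of $V_i$ in $g$.

```latex
P(v) = \prod_{i} P(v_i \mid pa_i)
\quad\Longrightarrow\quad
P(v \mid do(x)) = \prod_{i \,:\, V_i \neq X} P(v_i \mid pa_i)
                = \frac{P(v)}{P(x \mid pa_x)}

% Summing out everything except Y gives adjustment for the parents of X:
P(y \mid do(x)) = \sum_{pa_x} P(y \mid x, pa_x)\, P(pa_x)
```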

Back-door Criterion

  • Definition | Back-door

    • Find a set $Z$ that sufficiently explains the confounding between $X$ and $Y$. Then $P(y \mid do(x)) = \sum_{z} P(y \mid x, z)\, P(z)$.
  • Definition | Back-door criterion

    • A set $Z$ satisfies the back-door criterion with respect to a pair of variables $(X, Y)$ in a causal diagram $g$ if:
      • (i) no node in $Z$ is a descendant of $X$; and
      • (ii) $Z$ blocks every path between $X$ and $Y$ that contains an arrow into $X$.
  • The back-door adjustment formula is simple and widely used, but limited: it applies only when an observed set $Z$ satisfying the criterion exists.
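As a hedged illustration, the adjustment $P(y \mid do(x)) = \sum_z P(y \mid x, z)\, P(z)$ can be computed directly from a joint probability table. The function name `backdoor_adjust`, the `joint[z, x, y]` layout, and the numbers below are assumptions made for this sketch. The result is only meaningful when $Z$ really satisfies the back-door criterion.

```python
import numpy as np

def backdoor_adjust(joint):
    """Return P(y | do(x)) as an array indexed [x, y].

    `joint` is indexed as joint[z, x, y]. Assumes Z satisfies the back-door
    criterion relative to (X, Y) and that P(z, x) > 0 for every cell.
    """
    p_z = joint.sum(axis=(1, 2))                 # P(z)
    p_zx = joint.sum(axis=2)                     # P(z, x)
    p_y_given_zx = joint / p_zx[:, :, None]      # P(y | x, z), indexed [z, x, y]
    # sum_z P(y | x, z) P(z)
    return np.einsum("zxy,z->xy", p_y_given_zx, p_z)

# Hypothetical binary model with graph Z -> X, Z -> Y, X -> Y.
p_z = np.array([0.5, 0.5])
p_x1_given_z = np.array([0.2, 0.8])              # P(X=1 | z)
p_y1_given_zx = np.array([[0.1, 0.4],            # P(Y=1 | z=0, x=0), P(Y=1 | z=0, x=1)
                          [0.6, 0.9]])           # P(Y=1 | z=1, x=0), P(Y=1 | z=1, x=1)

joint = np.zeros((2, 2, 2))
for z in range(2):
    for x in range(2):
        p_x = p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]
        for y in range(2):
            p_y = p_y1_given_zx[z, x] if y == 1 else 1 - p_y1_given_zx[z, x]
            joint[z, x, y] = p_z[z] * p_x * p_y

p_do = backdoor_adjust(joint)                                   # P(y | do(x))
p_cond = joint.sum(axis=0) / joint.sum(axis=(0, 2))[:, None]    # naive P(y | x)
print("P(Y=1 | do(X=1)) =", p_do[1, 1])
print("P(Y=1 | X=1)     =", p_cond[1, 1])
```

In this toy model the adjusted value differs from the naive conditional because `Z` influences both `X` and `Y`.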

Back-door sets as substitutes of the direct parents of X

  • Rain satisfies the back-door criterion relative to Sprinkler and Wet:
    • (i) Rain is not a descendant of Sprinkler, and
    • (ii) Rain blocks the only back-door path from Sprinkler to Wet.
  • Adjusting for the direct parents of Sprinkler, we have:
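A small numeric check of this section's claim, under loudly stated assumptions: the graph is taken to be Season → Sprinkler, Season → Rain, Sprinkler → Wet, Rain → Wet (the notes do not spell the graph out), all variables are binary, and every probability below is made up. Under these assumptions, adjusting for Season (the direct parent of Sprinkler) and adjusting for Rain (a back-door set) should give the same interventional probability.

```python
from itertools import product

# Hypothetical CPTs for the assumed graph.
P_SEASON = {0: 0.6, 1: 0.4}                 # P(season)
P_SPR1 = {0: 0.7, 1: 0.1}                   # P(Sprinkler=1 | season)
P_RAIN1 = {0: 0.1, 1: 0.8}                  # P(Rain=1 | season)
P_WET1 = {(0, 0): 0.05, (0, 1): 0.80,       # P(Wet=1 | sprinkler, rain)
          (1, 0): 0.90, (1, 1): 0.99}

def bern(p1, value):
    """Probability of a binary value given P(value = 1)."""
    return p1 if value == 1 else 1.0 - p1

def joint(se, sp, ra, we):
    """Joint probability from the Markov factorization of the assumed graph."""
    return (P_SEASON[se] * bern(P_SPR1[se], sp)
            * bern(P_RAIN1[se], ra) * bern(P_WET1[(sp, ra)], we))

def prob(**fixed):
    """Marginal probability of a partial assignment, by brute-force enumeration."""
    names = ("se", "sp", "ra", "we")
    total = 0.0
    for values in product((0, 1), repeat=4):
        world = dict(zip(names, values))
        if all(world[k] == v for k, v in fixed.items()):
            total += joint(**world)
    return total

def adjust(adj_name, sp):
    """sum_z P(Wet=1 | sp, z) P(z), where z ranges over the variable adj_name."""
    return sum(
        prob(we=1, sp=sp, **{adj_name: z}) / prob(sp=sp, **{adj_name: z})
        * prob(**{adj_name: z})
        for z in (0, 1)
    )

def truth(sp):
    """P(Wet=1 | do(Sprinkler=sp)) via the truncated factorization."""
    return sum(
        P_SEASON[se] * bern(P_RAIN1[se], ra) * P_WET1[(sp, ra)]
        for se, ra in product((0, 1), repeat=2)
    )

via_parent = adjust("se", 1)                 # adjust for Season
via_rain = adjust("ra", 1)                   # adjust for Rain
naive = prob(we=1, sp=1) / prob(sp=1)        # plain conditioning, no adjustment
print(via_parent, via_rain, truth(1), naive)
```

The two adjusted values agree with the ground truth of the toy model, while the unadjusted conditional does not.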

Rules of Do-calculus

  • The back-door criterion results in a very specific form of identification formula.

  • Do-calculus (Pearl, 1995) provides general machinery to manipulate observational and interventional distributions.

  • Theorem | Rules of Do-calculus (simplified)

    • Rule 1 : Adding/removing observations
    • Rule 2 : Action/observation exchange
    • Rule 3 : Adding/removing actions
  • Do-calculus is sound and complete, but it offers no algorithmic insight into which rules to apply or in what order.

  • A graphical condition and an efficient algorithmic procedure have been developed for identifiability.

  • Do-calculus is a set of rules for manipulating observational and interventional probabilities.
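For reference, the three rules named above can be stated as follows. The notation is assumed here rather than taken from the notes: $g_{\overline{X}}$ is the graph $g$ with all arrows into $X$ removed, $g_{\underline{Z}}$ is $g$ with all arrows out of $Z$ removed, and $Z(W)$ is the set of $Z$-nodes that are not ancestors of any $W$-node in $g_{\overline{X}}$.

```latex
% Rule 1: adding/removing observations
P(y \mid do(x), z, w) = P(y \mid do(x), w)
  \quad \text{if } (Y \perp Z \mid X, W) \text{ in } g_{\overline{X}}

% Rule 2: action/observation exchange
P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w)
  \quad \text{if } (Y \perp Z \mid X, W) \text{ in } g_{\overline{X}\,\underline{Z}}

% Rule 3: adding/removing actions
P(y \mid do(x), do(z), w) = P(y \mid do(x), w)
  \quad \text{if } (Y \perp Z \mid X, W) \text{ in } g_{\overline{X}\,\overline{Z(W)}}
```

The back-door adjustment is a special case: under the back-door criterion, Rule 2 lets $do(x)$ be exchanged for conditioning on $x$ once $Z$ is given.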

Modern identification tasks

  • Experimental conditions ➔ Generalized identification

    • Combining datasets of different experimental conditions

    • The identifiability of any expression of the form $P(y \mid do(x), z)$ can be determined given any causal graph $g$ and an arbitrary combination of observational and experimental studies.

    • If the query is identifiable, then its estimand can be derived in polynomial time.

  • Environmental conditions ➔ Transportability

    • Combining datasets from different sources

    • Non-parametric transportability can be determined provided that the problem instance is encoded in selection diagrams.

    • When transportability is feasible, the transport formula can be derived in polynomial time.

    • The causal calculus and the corresponding transportability algorithm are complete.

  • Sampling conditions ➔ Recovering from selection bias

    • Non-parametric recoverability from selection bias in causal and statistical settings can be determined provided that an augmented causal graph is available.
    • When recoverability is feasible, the estimand can be derived in polynomial time.
    • The result is complete for pure recoverability, and sufficient for recoverability with external information.
  • Responding conditions ➔ Recovering from missingness


