In computing self-attention, we follow [45, 1, 29, 30] by including a relative position bias $B \in \mathbb{R}^{M^2 \times M^2}$ to each head in computing similarity: