Computational analysis of 140 years of US political speeches reveals more positive but increasingly polarized framing of immigration

임재석 · May 27, 2024
paper-study

Abstract

  • 200K US congressional speeches + 5K presidential communications related to immigration from 1880 to the present

  • political speech about immigration is much more positive on average than in the past

    • the shift occurred largely between WW2 and the passage of the Immigration and Nationality Act in 1965
    • since the late 1970s, the political parties have become increasingly polarized
  • contextual embeddings of text

    • modern Republicans \rightarrow significantly more likely to use language suggestive of metaphors long associated with immigration (animals, cargo) and frames like "crime" and "legality"
  • the nationality mentioned changes the tone of speeches (e.g., Mexican, Chinese) \rightarrow country of origin is still a major factor in how immigrants are spoken of in Congress

1. Introduction

  • Recently, attitudes toward immigration can seem more negative than ever before, yet antiimmigration rhetoric has recurred throughout US history
    • anti-Chinese fearmongering in 1880s

    • Southern and Eastern European immigrants in 1920s

    • antiimmigration rhetoric of Trump (2017 to 2020)

      \rightarrow "Certain types of immigrants can never truly join American society?"

  • how have attitudes toward immigrants in the US changed over the past century?
    • public opinion polls on immigration began only in the 1960s
    • so the authors turned to the Congressional Record
  • Corpus
    • full corpus of more than 17M congressional speeches from 1880 to present
    • 200K speeches relevant to immigration
    • presidential communications
    • quantitative analysis
  • Related works
    • qualitative approaches and historical archives
    • quantitative work on immigration used migration and census records
    • Rhetorical aspects of immigration debates \rightarrow dehumanizing language (vermin and cargo) with qualitative analysis
    • NLP methods have been applied to immigration coverage in news media and Congress \rightarrow but not over a long time span, and not on a comprehensive corpus with a consistent methodology
  • Methods
    • identify relevant speeches

      • automated text classification based on extensive human annotations
    • curated and applied a set of lexicons for analyzing relevant frames with a semi-automated method

    • neural contextual embedding models to quantify implicit dehumanizing metaphors

  • Brief results and discussion
    • political speeches about immigration today are more positive than in the past

      • the shift occurred between WW2 and the 1965 Immigration and Nationality Act
    • net positive on average since the early 1950s

    • Trump is the first president to express sentiment toward immigration more negative than the average member of his own party

    • two parties have become increasingly polarized over time

      • linear increase in polarization on immigration since the late 1970s
    • today, Democrats are unprecedentedly positive

    • this polarization on immigration predates the generic political polarization observed in Gentzkow et al. by more than a decade

    • nationality of immigrants continues to matter greatly

      • Mexican \rightarrow more negative than European
      • the framing of Mexican immigrants today is similar to the framing of Chinese immigrants during the Chinese exclusion era of the 19th century
      • negative frame "crime", "labor", "legality" + dehumanizing metaphors
    • there remains a strong and growing strain of antiimmigration speech among Republicans

      • expressed opinions toward immigrants still vary greatly by country of origin
      • rhetorical strategies continue to be deployed

2. Results

Tone of Immigration Speeches

  • 17M congressional speeches from 1880 to 2020

  • used human annotations to train ML classifiers that detect immigration-related speech and its accompanying tone (pro, con, neutral)

  • applied the same models to all presidential communications from the American Presidency Project (bottom panel of Fig 1)

  • Fig 1

    • average sentiment is negative throughout the late 19th and early 20th centuries (from the Chinese Exclusion Act (1882) to the strict immigration quotas of the 1920s)

    • the attitude became more positive around the start of WW2

      • rising steadily from 1940 until the end of the Johnson administration (1969)
      • average tone has been pro since the beginning of the Eisenhower (1953)
    • beginning about a decade after 1965, an overall decline in sentiment among Republicans and a rise among Democrats is observed

      • the exception is the early 1990s, a period that coincides with the end of the Cold War and NAFTA
      • Republicans today are about as antiimmigration as members of Congress were in the 1920s
  • Trends for presidential attitudes should be treated more cautiously as there is less text

    • involves a slight domain shift (the model is trained on congressional speeches)

    • found a similar pattern

      • early presidents were more antiimmigration
    • in recent decades, presidents have been uniformly more proimmigration, both Republicans (Ronald Reagan) and Democrats (Jimmy Carter)

      • Trump was a stark exception (the most antiimmigration president over the past 140 years)

  • Fig 2.
    • the tone varies dramatically depending on which groups of immigrants are being discussed

    • the figure compares Mexican, Chinese, and Italian immigrants (see Identifying Groups)

    • Speeches mentioning Chinese immigrants were overwhelmingly negative during Chinese exclusion (1882 to 1943)

      • while the tone toward Italian was slightly more favorable
    • Attitudes toward all groups improved from 1940 to 1970

      • mentioning China and Mexico remained relatively more negative overall
    • since the late 1970s, the gap between Italian and Mexican is as large as the gap in tone that exists between Republicans and Democrats today.

    • this pattern is mirrored in broader regional trends

      • Most European \rightarrow referred to positively on average by the 1960s
      • Asian \rightarrow by the 1980s
      • Caribbean \rightarrow negative on average until the 2000s
      • few countries are mentioned as frequently as those three (Mexico, Italy, China)

Language, Framing, and Dehumanization

  • trained interpretable logistic regression models to approximate the predictions of the contextual embedding models and determined feature importance using Shapley values

  • Table 1
    • antiimmigration terms contain words representing threats (dangerous, cheap), control (permit, violation), and the targets of early antiimmigration legislation (undesirable, Chinese)

      • by midcentury and beyond, other threats appear (subversive, terrorism) along with the themes of legality (aliens, illegal) and crime (criminals, smuggling)
    • proimmigration terms contain the words representing desirable characteristics (industrious), land (property, agriculture), and service (gave, served)

      • by post-WW2 era, humanitarian concerns (discriminatory, migrants) and community (citizens, families, children) appeared
      • this continued into the present (victims, community) along with a celebration of once-vilified communities (Irish, Italian, heritage)
    • Despite the relatively negative tone toward Mexican immigrants in the modern period, "Hispanic" and "Latino" have strong positive associations

      • these terms are more likely to be used by Democrats than by Republicans \rightarrow and Democrats are more proimmigration
      • but "Mexico" and "Mexican" are mentioned with very similar frequency by Democrats and Republicans \rightarrow the tone difference is not simply a matter of which party mentions Mexico
  • To understand the rhetorical divergence between the parties \rightarrow they focused on how immigration is framed \rightarrow built a series of lexicons
    • building on previous work, they developed 14 of these lexicons through a mix of automated and manual curation

  • Fig 3
    • almost no difference in the frames by two parties in the earlier time period

    • today, the two parties make strongly divergent use of the frames

      • Republicans : crime, legality, threats, deficiency, flood/tide \rightarrow commonly heard antiimmigration comments
      • Democrats : family, victims, contributions, culture (positive)
    • these patterns are robust to the exclusion of any individual term as well as to automated lexicon expansion

    • the most salient aspects

      • earlier time period : deficiency, culture, labor
      • today : crime, legality (partly due to frequent mentions of legal and illegal immigrants + other legal terms and crime terms (laws, visas, criminals, terrorism))
    • economy is the least common frame in speeches about immigration \rightarrow the least salient in both eras

  • measured more implicit dehumanizing metaphors
    • only the flood and tide metaphors emerged from the semiautomated frame construction process

    • measure the metaphors based on how probable such terms are as substitutes according to contextual embedding models

      • animals, cargo, disease, flood/tide, machines, and vermin are measured with this method
      • "dumping produces ~~~ " \rightarrow cargo
      • "herding of ~~~" \rightarrow animal
    • Republicans use more dehumanizing metaphors

Differences by Country of Origin

  • Fig 4
    • how Mexicans are framed today vs how Chinese were framed a century earlier
    • the crime, labor, and legality frames are deployed far more heavily for both groups
    • the 4 most positive frames are all used far more in sentences mentioning European groups than in those mentioning the non-European groups (culture, victims, contributions, family)
  • implicit dehumanizing language is slightly but significantly more common for mentions of the non-European group in both cases

3. Discussion

  • Congressional antagonism to immigration started much earlier than the quota period
    • China is mentioned in more than 20% of the speeches from 1880 to 1900
    • negative attitudes toward immigration remained from 1880 to 1940
  • negative tone toward Chinese is consistent with the many pieces of anti-Chinese legislation introduced into Congress
    • 1875 Page Act
    • 1882 Chinese Exclusion Act
    • 1888 Scott Act
    • the negative tone in speeches mentioning Chinese immigrants persisted until the Chinese Exclusion Act was repealed in 1943
    • significantly greater use of implicit dehumanizing language to mention Chinese and emphasis on the threatening aspects
  • The combination of frames (crime, threats) underscores the dual nature of immigrants (threatening vs cheap labor)
    • the same pattern appears for Mexican immigrants today
    • the framing of Europeans was more sympathetic, although still negative until the middle of the 20th century
  • gradual loosening of immigration laws from 1940s
    • this trend is mirrored by congressional tone toward immigration

    • eventually becoming net positive on average in 1950s

    • possibly driven by humanitarian concerns

      • signaling positive attitudes
      • increasing association with the 'victims' frame and decreasing the prominence of 'deficiency' and 'threats'
  • for nearly 30 years after the border reopened in 1965, the positive sentiment didn't fully erode even as immigration from developing countries increased
    • a partisan divide on immigration emerged in the late 1970s, although Republicans showed a neutral or positive stance until the election of Bill Clinton and NAFTA
    • this predates the polarization documented in previous work
  • the results are consistent with the patterns found in previous work on polarization
    • polarization + overall tone toward immigration
    • beyond sentiment, the analysis also covers the framing used in immigration debates
  • understanding the causes of this polarization is beyond the scope of this paper
    • legislators' tone is weakly correlated with public opinion on the issue at the state level
    • no evidence of systematic differences in tone among House members in election vs nonelection years
  • stark differences in framing between European and non-European groups
    • Chinese in the late 19th and early 20th century, Mexican today
    • more implicitly dehumanizing metaphors for non-European
    • also for the explicit frames (crime, labor, legality)
    • the gap in tone between speeches mentioning Mexico and those mentioning European countries is equivalent to the modern gap between Democrats and Republicans
  • modern immigration laws and the rhetoric of 'illegal' immigrants were crafted specifically to target immigration from Mexico
    • made associations with crime, legality and labor
  • Mexican immigrants, like Chinese immigrants, were also targets of early discrimination
    • even though Mexico was exempt from the quota system
    • Although the tone of speeches mentioning Mexican immigrants improved along with that for other nationalities, these gains were largely eroded in the early 1970s \rightarrow a persistent nationality-based gap
  • comparison with public opinion polls (by Gallup)
    • these also show an increase in proimmigrant sentiment from 1965 to the present
    • in 2019, 77% of respondents viewed immigration positively
    • in 2002, it was 52% (after 9/11)
    • when asked whether immigration should be decreased, 65% said yes in the 1990s
    • in 2020, this fell to 28%
  • the analysis of congressional and presidential speeches shows a more complicated picture
    • attitudes among Republicans are as negative as those of members of Congress during the push for restrictive quotas
    • Chinese immigrants are still discussed more negatively than Europeans, even though the overall sentiment is positive today
    • in recent years, COVID-19 has been accompanied by anti-Asian hate crimes and anti-Chinese rhetoric
    • despite proimmigration attitudes among the general population, the tone differences in Congress based on nationality are as strong as those between the parties

4. Materials and Methods

Data

  • 43rd to 111th Congress : digitized copy of the Congressional Record

  • 112th to 116th Congress : congressional-record tool by @unitedstates project

  • data with speaker, party, state and date

  • Procedural speeches were identified and excluded

  • presidential communication : all presidential documents from The American Presidency Project

  • Immigration statistics : Historical Statistics of the United States Millennial Edition Online + census data by the Migration Policy Institute

Classification

  • Princeton University research assistants labeled each speech segment
    • about immigration or not
    • proimmigration / antiimmigration
    • an extensive set of keywords was used to select candidates for annotation
    • 7626 segments annotated (3643 were judged relevant)
    • the judgements were aggregated with a Bayesian item response model (to get a probability distribution over labels for each segment)
  • trained RoBERTa
    • first fine-tuned the pretrained roberta-base on congressional speeches with a self-supervised objective
    • then fine-tuned it to be a classifier using annotated examples
    • ~90% accuracy on relevance and 65% on tone
    • most errors in tone are confusions between neutral and non-neutral
    • models trained on earlier and later parts of the data showed similar aggregate results in the intervening years
  • segment-level predictions are used to predict the tone of whole speeches
    • the same classifier is applied to presidential communications
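The step from segment predictions to speech-level tone, and from there to a yearly tone curve like the one in Fig 1, can be sketched as follows. This is only an illustrative sketch: the notes do not say how segments are combined, so averaging the per-segment probabilities is an assumption, as is summarizing a year by the percentage of pro speeches minus the percentage of anti speeches. The function names `speech_tone` and `net_tone_by_year` are hypothetical.

```python
from collections import defaultdict

LABELS = ("pro", "neutral", "anti")


def speech_tone(segment_probs):
    """Combine segment-level label probabilities into one speech-level label.

    Assumption: probabilities are simply averaged over segments and the
    most probable label wins. The paper may aggregate differently.
    """
    n = len(segment_probs)
    mean = {lab: sum(p[lab] for p in segment_probs) / n for lab in LABELS}
    return max(LABELS, key=lambda lab: mean[lab])


def net_tone_by_year(speeches):
    """speeches: iterable of (year, label) pairs.

    Returns {year: % pro minus % anti}, an assumed summary statistic.
    """
    counts = defaultdict(lambda: {lab: 0 for lab in LABELS})
    for year, label in speeches:
        counts[year][label] += 1
    result = {}
    for year, c in counts.items():
        total = sum(c.values())
        result[year] = 100.0 * (c["pro"] - c["anti"]) / total
    return result


# Toy usage with made-up numbers (not data from the paper)
label = speech_tone([
    {"pro": 0.7, "neutral": 0.2, "anti": 0.1},
    {"pro": 0.4, "neutral": 0.5, "anti": 0.1},
])
tone = net_tone_by_year([(1950, "pro"), (1950, "anti"), (1950, "pro"), (1950, "neutral")])
```

Computing the same statistic separately for each party would give the two curves whose growing gap is described as polarization.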

Identifying Groups

  • the most prominent immigrant nationalities \rightarrow historical data on the countries of origin of the foreign-born US population

  • 45 countries that accounted for at least 1% of the foreign-born population in at least 1 decade

  • manually adjusted the country names and nationality terms
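The selection rule above (at least 1% of the foreign-born population in at least one decade) amounts to a simple filter. A minimal sketch, assuming the input is a nested dictionary of counts per decade and country; the function name `prominent_countries`, the data layout, and the placeholder country names are all hypothetical.

```python
def prominent_countries(foreign_born, threshold=0.01):
    """Return countries whose share of the foreign-born population
    reaches `threshold` in at least one decade.

    foreign_born: {decade: {country: count}} (assumed layout)
    """
    selected = set()
    for counts in foreign_born.values():
        total = sum(counts.values())
        if total == 0:
            continue  # skip decades with no data
        for country, n in counts.items():
            if n / total >= threshold:
                selected.add(country)
    return sorted(selected)


# Toy counts with placeholder names (not real census figures)
toy = {
    1900: {"A": 980, "B": 20, "C": 0},
    1910: {"A": 995, "B": 4, "C": 1},
}
chosen = prominent_countries(toy)
```

Here "B" qualifies because of 1900 alone, while "C" never reaches the threshold.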

Measuring Impact

  • used L1-regularized LR models to fit the predicted tone labels on all congressional segments classified as relevant

    • approximates the influence of individual words
  • words in the vocab : used at least 20 times / excluding numbers, punctuation, stop words / counts were binarized

  • Shapley values computed (reflected in Table 1)
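The feature construction and the attribution step can be sketched in plain Python. Several things here are assumptions rather than details from the paper: the tiny stop word list, counting total occurrences (rather than documents) for the frequency cutoff, and the closed-form attribution `w * (x - mean)`, which is the Shapley value of a feature in a linear model's log-odds only when features are treated as independent. Fitting the L1-regularized model itself is left out, so `weights` stands in for coefficients from an already fitted model.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "to", "in"}  # tiny illustrative list


def tokenize(text):
    """Lowercase, keep alphabetic tokens only (drops numbers and
    punctuation), and remove stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_vocab(segments, min_count=20):
    """Keep words used at least `min_count` times across all segments."""
    counts = Counter(tok for seg in segments for tok in tokenize(seg))
    return sorted(w for w, c in counts.items() if c >= min_count)


def binarize(segment, vocab):
    """1 if the word occurs in the segment at all, else 0."""
    present = set(tokenize(segment))
    return [1 if w in present else 0 for w in vocab]


def linear_shapley(weights, x, background_mean):
    """Per-feature contribution to the linear score (log-odds),
    assuming independent features: w_j * (x_j - E[x_j])."""
    return [w * (xi - m) for w, xi, m in zip(weights, x, background_mean)]


# Toy usage with made-up text and weights
vocab = build_vocab(["The quota of 1924 and the quota.", "Families and children"], min_count=2)
row = binarize("Quota, quota!", ["quota", "visa"])
contrib = linear_shapley([2.0, -1.0], [1, 0], [0.5, 0.25])
```

With binarized features, a word's contribution depends only on whether it appears and on how common it is in the background data, which is what makes a ranked word list like Table 1 easy to read.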

Curating Frames

  • curated lexicons for 14 immigration frames
    • identified terms that occur significantly more often in mentions of immigrants than in mentions of generic people
    • drew on initial exploration, annotators' comments, and prior literature to identify 14 relevant categories
    • the assignment of each term to a frame was decided by majority vote
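The majority-vote step can be illustrated with a short sketch. The input layout (one frame label per annotator, or `None` when an annotator assigns no frame), the strict-majority rule, and the function name `assign_frames` are assumptions made for illustration; the notes only state that votes were aggregated by majority.

```python
from collections import Counter


def assign_frames(votes):
    """votes: {term: [frame label or None, one per annotator]}.

    A term is assigned to a frame only if a strict majority of
    annotators chose that same frame (assumed rule).
    """
    assigned = {}
    for term, labels in votes.items():
        if not labels:
            continue
        frame, n = Counter(labels).most_common(1)[0]
        if frame is not None and n > len(labels) / 2:
            assigned[term] = frame
    return assigned


# Toy votes (illustrative, not the actual annotations)
frames = assign_frames({
    "smuggling": ["crime", "crime", "legality"],
    "wave": ["flood/tide", None, "threats"],
    "visa": [None, None, "legality"],
})
```

Terms without a clear majority ("wave") or mostly judged irrelevant ("visa") are left out of every lexicon in this sketch.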

Identifying Mentions

  • collected direct mentions + group terms + more generic person references with nationality

  • used to measure dehumanizing metaphorical language for each group

  • included slang and derogatory terms to identify groups

Measuring Dehumanization

  • introduce a method that is based purely on context \rightarrow used BERT
    • trained on MLM task

    • fine-tuning to act as a classifier

    • to measure implicit metaphorical language, began with representative terms for each category

      • used static word vectors to find similar terms
      • kept the terms that appear in BERT's vocabulary
  • scoring procedure
    • for each sentence that mentions an immigrant or immigrant group
    • mask the mention with the [MASK] token
    • compute the probability of each candidate term for the [MASK]
    • add up all the probabilities to get an overall score for each category for that sentence
    • report the log ratio of the mean probability for one set of mentions to the mean probability for the other
  • validating procedure
    • collected human judgements on a sample of masked contexts
    • three of the authors independently rated whether a term would be a plausible replacement
    • reasonably strong agreement (Krippendorff's alpha = 0.59)
    • correlated with the log probability by the model (r = 0.73)
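The per-sentence category score and the log ratio between two sets of mentions can be sketched as below. The masked language model is left out: `mask_probs` is a hypothetical dictionary holding the probabilities a model such as BERT would assign to candidate words at the [MASK] position, and the function names are likewise illustrative. The natural logarithm is an assumption, since the notes do not state the base.

```python
import math


def category_score(mask_probs, category_terms):
    """Sum the [MASK] fill-in probabilities of all terms in one
    metaphor category for a single masked sentence."""
    return sum(mask_probs.get(term, 0.0) for term in category_terms)


def log_ratio(scores_a, scores_b):
    """Log of (mean score over mention set A) / (mean score over set B).

    Positive values mean the category is more probable for set A.
    """
    mean_a = sum(scores_a) / len(scores_a)
    mean_b = sum(scores_b) / len(scores_b)
    return math.log(mean_a / mean_b)


# Toy probabilities (made up, not model output)
animal_terms = {"cattle", "herd", "swarm"}
score = category_score({"cattle": 0.02, "herd": 0.01, "people": 0.5}, animal_terms)
ratio = log_ratio([0.04, 0.02], [0.01, 0.02])
```

In this toy case the first set of mentions has twice the mean category probability of the second, so the log ratio equals log 2.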
