[Article Summary] Rashkin et al., 2020, PLOTMACHINES: Outline-Conditioned Generation with Dynamic Plot State Tracking

Minzi · September 26, 2022

Rashkin 2020

@misc{rashkin2020plotmachines,
  title = {PlotMachines: Outline-Conditioned Generation with Dynamic Plot State Tracking},
  author = {Rashkin, Hannah and Celikyilmaz, Asli and Choi, Yejin and Gao, Jianfeng},
  year = {2020},
  url = {https://arxiv.org/abs/2004.14967},
  publisher = {arXiv}
}

https://github.com/hrashkin/plotmachines

Introduction

  • Task: outline-conditioned story generation
    • Outline: a set of phrases describing the key characters and events in a story
    • To generate a coherent narrative consistent with the given outline
  • Challenge: the input provides only the rough elements of the plot
    • How the plot elements will intertwine with each other across different parts of the story is left unspecified
    • This should be determined dynamically based on what has already been composed, while staying consistent with the outline and the overall narrative structure
  • Model: PlotMachines
    • Transforms an outline into a multi-paragraph story
    • Uses dynamic memory blocks
      • To keep track of the implicit plot states (outline + story so far)
    • Informed with high-level narrative structures using discourse labels
      • To learn different styles of writing in different parts of the narrative (beginning, middle, end)

Outline-conditioned generation

  • Studies how story generation models can plan a long, multi-paragraph narrative according to controllable story elements (an outline)
  • To be flexible: outlines are loosely defined as lists of an arbitrary number of unordered, multi-word plot points that guide the story to be generated
    • In this paper, the scope of plot points is limited to events and phrases to be loosely integrated into the output story
      • Because they can be automatically extracted
  • To be natural: stories have appropriate discourse and narrative flow

Dataset

  • Wikiplots corpus
  • WritingPrompts (Fan et al., 2018)
  • NYTimes (Sandhaus, 2008) — for generalization

Outline extraction

  • RAKE algorithm (Rose et al., 2010)
    • To determine key phrases based on word frequency and co-occurrence
    • In this paper, key phrases with overlapping n-grams are filtered out
      • Extracts longer outline points (3 to 8 words) in no particular order (a sketch follows this list)
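
As a concrete illustration, here is a minimal Python sketch of the filtering step, assuming the phrase scoring itself comes from any off-the-shelf RAKE implementation; the trigram-overlap rule and the `max_points` cap are illustrative choices, not details from the paper.

```python
def ngrams(words, n=3):
    """Set of word n-grams in a phrase."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def filter_outline(ranked_phrases, min_len=3, max_len=8, max_points=None):
    """Turn RAKE-ranked phrases (highest score first) into outline points:
    keep 3-8 word phrases and drop any phrase whose trigrams overlap with
    an already selected point, so the outline points stay distinct."""
    outline, seen = [], set()
    for phrase in ranked_phrases:
        words = phrase.lower().split()
        if not (min_len <= len(words) <= max_len):
            continue
        grams = ngrams(words)
        if grams & seen:                      # overlapping n-grams -> skip
            continue
        outline.append(phrase)
        seen |= grams
        if max_points and len(outline) >= max_points:
            break
    return outline
```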

Model

  • PlotMachines
    • End-to-end trainable transformer on top of the GPT model + memory mechanisms + special discourse features
    • Motivated by human writing styles where each paragraph is a distinct section of related sentences
(\mathbb{P}^i, h^i, M^i) = \mathbf{PM}(o, d^i, h^{i-1}, M^{i-1})
  • Time step $i$, a new paragraph $\mathbb{P}^i$, outline representation $o$, discourse representation $d^i$, preceding-context vector $h^{i-1}$, memory $M^{i-1}$

Outline representation

  • The plot outline $o$ as a sequence of tokens
  • _kw_ to delimit each plot point
  • _endkw_ to end the sequence
  • The entire outline is truncated to a maximum of $n$ tokens

Discourse representation

  • Posited: there are stylistic differences between the beginning, middle, and end of a story
  • _i_, _b_, _c_ for the introduction, body, and conclusion paragraphs
    • Appended to the outline representation (the combined conditioning input is sketched below)
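
A small sketch of how the conditioning text might be assembled from the outline and the paragraph-position token; the exact placement of `_kw_` / `_endkw_` here is a reading of the summary above, not a verified detail.

```python
def build_conditioning_text(outline_points, discourse):
    """o followed by d^i: plot points delimited by _kw_, the outline closed
    with _endkw_, then the discourse token (_i_, _b_, or _c_)."""
    outline = " _kw_ ".join(outline_points) + " _endkw_"
    return outline + " " + discourse

# Example (hypothetical plot points):
# build_conditioning_text(["detective finds a hidden letter",
#                          "storm traps the town"], "_i_")
# -> "detective finds a hidden letter _kw_ storm traps the town _endkw_ _i_"
```

In practice this text would be tokenized and the outline part truncated to $n$ tokens before the discourse token is appended.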

Preceding context representation

  • To incorporate previous story context
  • An embedded representation $h^{i-1}$ is added to the model input
    • Computed as the average of the output representations from a frozen (not fine-tuned) GPT over the words of the previous paragraph (see the sketch below)
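
A sketch of this computation using a frozen HuggingFace `GPT2Model`/`GPT2Tokenizer` pair as a stand-in for the paper's GPT encoder (the specific classes are an assumption for illustration).

```python
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
frozen_gpt = GPT2Model.from_pretrained("gpt2").eval()   # not fine-tuned

@torch.no_grad()
def preceding_context(prev_paragraph):
    """h^{i-1}: mean of the frozen GPT's output representations
    over the tokens of the previous paragraph."""
    ids = tokenizer(prev_paragraph, return_tensors="pt").input_ids  # (1, seq_len)
    hidden = frozen_gpt(ids).last_hidden_state                      # (1, seq_len, d)
    return hidden.mean(dim=1).squeeze(0)                            # (d,)
```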

Memory representation

  1. To keep track of the parts of the outline that have been mentioned
  2. To maintain semantic coherence throughout the entire story
  • $K$, a set of vectors to keep track of outline points + $D$, a matrix to store a latent topic distribution of the generated story so far

Notation

\begin{align*}
& M = [K; D] \text{ where} \\
& M: \mathbb{R}^{d \times 2n} \\
& K: \mathbb{R}^{d \times n} \text{, representation of outline points} \\
& D: \mathbb{R}^{d \times n} \text{, representation of the latent document state}
\end{align*}
  • $d$ is the embedding size of the transformer model, $n$ the maximum number of tokens in the outline
  • $K$ initialized with embeddings of the tokens in the outline
  • $D$ initialized randomly
  • $M^i_j$: the $j$-th column of memory at the time step for paragraph $i$ (initialization is sketched below)
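
A sketch of the initialization in PyTorch, assuming the outline-token embeddings are already available as a `(d, num_tokens)` tensor; the zero-padding behavior is a simplification.

```python
import torch

def init_memory(outline_token_embeddings, d, n):
    """M^0 = [K; D] with shape (d, 2n):
    K's columns come from embeddings of the outline tokens (zero-padded to n),
    D's columns are random vectors for the latent document state."""
    K = torch.zeros(d, n)
    cols = min(n, outline_token_embeddings.shape[1])
    K[:, :cols] = outline_token_embeddings[:, :cols]
    D = torch.randn(d, n)
    return torch.cat([K, D], dim=1)   # (d, 2n)
```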

Updating memory

  • Based on the update equations in entity-based models such as Henaff et al. (2017)
  • Gating mechanism $g$
    • Learns to flexibly control the update of each cell in memory (a PyTorch-style sketch follows the equations)
\begin{align*}
& \hat{M}^i_j = \tanh(W_1 M^{i-1}_j + W_2 h^{i-1}) \\
& g^i_j = \sigma(W_3 M^{i-1}_j + W_4 h^{i-1}) \\
& M^i_j = (1 - g^i_j) \odot M^{i-1}_j + g^i_j \odot \hat{M}^i_j \\
& \text{where each } W \text{ is a } \mathbb{R}^{d \times d} \text{ matrix}
\end{align*}
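
The same update written as a PyTorch module — a sketch in which memory cells are stored as rows for convenience and the four projections are plain linear maps; the paper's exact parameterization may differ.

```python
import torch
import torch.nn as nn

class MemoryUpdater(nn.Module):
    """Gated, cell-wise memory update following the equations above."""
    def __init__(self, d):
        super().__init__()
        self.W1, self.W2 = nn.Linear(d, d, bias=False), nn.Linear(d, d, bias=False)
        self.W3, self.W4 = nn.Linear(d, d, bias=False), nn.Linear(d, d, bias=False)

    def forward(self, M_prev, h_prev):
        """M_prev: (2n, d) rows M^{i-1}_j; h_prev: (d,) context vector h^{i-1}."""
        h = h_prev.unsqueeze(0)                              # broadcast over all cells
        M_hat = torch.tanh(self.W1(M_prev) + self.W2(h))     # candidate update
        g = torch.sigmoid(self.W3(M_prev) + self.W4(h))      # per-cell gate g^i_j
        return (1 - g) * M_prev + g * M_hat                  # M^i_j
```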

Transformer blocks with memory

  • Attention within the transformer blocks is split into two parallel modules
    1. One performs the standard GPT self-attention
    2. The other uses the transformer input to attend over the memory vectors
    3. The outputs of the two modules are averaged (see the sketch below)
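
A simplified sketch of the idea with `torch.nn.MultiheadAttention` standing in for the GPT attention layers (single layer, no residuals or layer norm; only meant to show the two paths and the averaging).

```python
import torch
import torch.nn as nn

class MemoryAugmentedAttention(nn.Module):
    """Two parallel attention paths whose outputs are averaged:
    (1) standard self-attention over the input sequence, and
    (2) attention from the input over the memory cells."""
    def __init__(self, d, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.mem_attn = nn.MultiheadAttention(d, n_heads, batch_first=True)

    def forward(self, x, memory, causal_mask=None):
        """x: (batch, seq, d) transformer input; memory: (batch, 2n, d)."""
        a_self, _ = self.self_attn(x, x, x, attn_mask=causal_mask)
        a_mem, _ = self.mem_attn(x, memory, memory)
        return 0.5 * (a_self + a_mem)   # average the two modules
```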

Training and decoding

  • Training: predicts each paragraph (cross-entropy loss)
    • The previous paragraphs’ gold representations are used to update the memory and compute $h^{i-1}$
  • Decoding
    • Starts from the first paragraph
    • Uses its own predictions to compute $h^{i-1}$ and update the memory
    • A 5-paragraph structure is assumed (see the decoding sketch below)
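
Putting the pieces together, a high-level decoding sketch; `model` and its `init_memory`, `generate_paragraph`, `update_memory`, and `d` attributes are hypothetical placeholders for the trained PlotMachines model, and the helpers reuse the sketches above.

```python
import torch

def generate_story(model, outline_points, num_paragraphs=5):
    """Generate a story paragraph by paragraph with a fixed
    introduction / body / conclusion structure (_i_, _b_, ..., _c_)."""
    discourse = ["_i_"] + ["_b_"] * (num_paragraphs - 2) + ["_c_"]
    M = model.init_memory(outline_points)        # M^0 = [K; D]
    h = torch.zeros(model.d)                     # no preceding context yet
    story = []
    for d in discourse:
        conditioning = build_conditioning_text(outline_points, d)
        paragraph = model.generate_paragraph(conditioning, h, M)  # P^i
        h = preceding_context(paragraph)         # own prediction feeds h^{i-1}
        M = model.update_memory(M, h)            # gated update -> M^i
        story.append(paragraph)
    return "\n\n".join(story)
```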

Experiments

Experimental setup

Baselines

  • Fan et al. (2018): Fusion model
  • Yao et al. (2019): Plan-and-Write
  • Zellers et al. (2019): Grover (large-scale)
  • These generate the entire document from the outline at once, unlike PlotMachines (which generates recurrently, paragraph by paragraph)

Ablated PlotMachines models

  • Mem: memory blocks
  • Disc: discourse tokens
  • Base GPT and GPT2 (fine-tuned) — only with outline inputs
  • PM-NoMem-NoDisc — + preceding context representations
  • PM-NoMem — + discourse tokens (still no memory blocks)

Automatic metrics

  • ROUGE + self-BLEU — measure how realistic-looking as well as how diverse the generations are (a self-BLEU sketch follows)
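
For reference, a minimal self-BLEU sketch using NLTK; whitespace tokenization and the BLEU settings here are simplifications.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def self_bleu(generated_texts):
    """Average BLEU of each generated story against all the others;
    lower self-BLEU means more diverse generations."""
    smooth = SmoothingFunction().method1
    tokenized = [text.split() for text in generated_texts]
    scores = []
    for i, hypothesis in enumerate(tokenized):
        references = tokenized[:i] + tokenized[i + 1:]
        scores.append(sentence_bleu(references, hypothesis,
                                    smoothing_function=smooth))
    return sum(scores) / len(scores)
```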

Coverage

  • PM is the best in all three datasets

Ablations

  • GPT-2 < PM-NoMem-NoDisc < PM-NoMem << PM

Diversity

  • PM is the highest on ROUGE and lowest on self-BLEU
    • Self-BLEU scores similar to those of the gold stories → its diversity is similar to that of human writing

Human evaluations

  • Outline utilization, narrative flow, ordering
  1. Small-scale study evaluating full-length stories
    1. PM as the best except for outline utilization (Grover the highest)
  2. Large-scale study evaluating single-paragraph excerpts
    • Outline usage
    1. With random paragraphs from each story
    2. With the paragraph with the most n-gram overlap with the outline (i.e. closest)
    3. PM was found to use outlines more naturally, especially for the “closest” paragraphs
    • Narrative flow
    1. Repetitiveness, natural transition, relevant and on-topic
    2. PM as the best
    • Ordering
    1. Proxy task: humans to decipher the order of the generated paragraphs
      1. It would be easier to decipher the order if the model output is very well-structured
    2. Humans are more accurate with Grover and Fusion

N-gram-based outline usage analysis

  • An outline point counts as used in a paragraph when > 20% of its n-grams also appear in the paragraph (a sketch follows this list)
  • Grover tends to over-repeat outline points (repetitive)
  • Fusion leaves out portions of the outline
  • PM is more inclusive and similar to the gold reference
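
A sketch of this usage check; the n-gram order and exact matching rule are illustrative, while the > 20% threshold follows the analysis above.

```python
def outline_point_used(point, paragraph, n=2, threshold=0.2):
    """True if more than `threshold` of the outline point's n-grams
    also appear in the paragraph."""
    def grams(text):
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    point_grams = grams(point)
    if not point_grams:
        return False
    overlap = point_grams & grams(paragraph)
    return len(overlap) / len(point_grams) > threshold
```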

Qualitative examples

  • Grover often finishes the story in the middle and starts a new story
  • PM adheres more to the beginning-middle-ending structure
    • Often starts by setting the scene
    • Writes conclusions with a definitive closing action
