Jingjing Xu, Xuancheng Ren, Yi Zhang, Qi Zeng, Xiaoyan Cai, and Xu Sun. 2018. A Skeleton-Based Model for Promoting Coherence Among Sentences in Narrative Story Generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4306–4315, Brussels, Belgium. Association for Computational Linguistics.
GitHub - lancopku/Skeleton-Based-Generation-Model: Code for "A Skeleton-Based Model for Promoting Coherence Among Sentences in Narrative Story Generation" (EMNLP 2018)
Introduction
- Most of the SOTA approaches are based on Seq2Seq models
- But it is hard for these models to find semantic dependency among sentences
- The connection is mainly reflected through key phrases
- Skeleton — the phrases to express the key meanings of a sentence
- Other words, such as modifiers, are redundant and make the dependency sparse
- Human writing also starts from a skeleton, which is then reorganized into a fluent sentence
- The skeleton helps machines learn the dependency of sentences
- It avoids the interference of irrelevant information
- Model: a skeleton-based generative module + a skeleton extraction module
- The generative module
- Input-to-skeleton component — associates inputs and skeletons
- Skeleton-to-sentence component — expands a skeleton to a sentence
- The skeleton extraction module
- Generates sentence skeletons
- Automatically explores sentence skeletons (it is hard to establish rules to extract skeletons)
- A reinforcement learning method to build the connection between the skeleton extraction module and the generative module
- For simplicity, many existing story generation models rely on given materials (i.e., they do not generate from scratch)
- But this paper works on the complete story generation task
- Seq2Seq is widely used for the task but is weak on generating inter-related sentences (for coherence)
- Because of this, some models build mid-level sentence semantic representations to simplify the dependency among sentences
- But the unified rules these models use are inflexible and tend to generate over-simplified representations
- So the paper uses a reinforcement learning method to automatically extract sentence skeletons for simplifying the dependency of sentences
Skeleton-based model
- Gθ: a skeleton-based generative module
- Generates a story sentence by sentence
- Input-to-skeleton component — generates a skeleton based on the existing text
- Skeleton-to-sentence component
- Eγ: a skeleton extraction module
- A weakly supervised method for the initial extraction ability
- Training process
- Use extracted skeletons to train the generative module
- The output from the generative module is used to evaluate the extracted skeletons, refining the skeleton extraction module
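The alternating procedure above could be sketched with stub modules (all function names and the toy scoring below are made up for illustration; the real modules are seq2seq LSTM networks):

```python
def extract_skeleton(sentence):
    # stub for the extraction module E_gamma: keep longer words as "key phrases"
    return [w for w in sentence.split() if len(w) > 3]

def generator_loss(skeleton, sentence):
    # stub for the generative module's training loss: fraction of target
    # words the skeleton fails to cover (stand-in for seq2seq cross entropy)
    words = sentence.split()
    return sum(w not in skeleton for w in words) / len(words)

def training_step(stories):
    """One round: extract skeletons, train the generator on them, and
    return per-story rewards (negative loss) to refine the extractor."""
    rewards = []
    for sentence in stories:
        skeleton = extract_skeleton(sentence)      # step 1: extract skeletons
        loss = generator_loss(skeleton, sentence)  # step 2: train the generator
        rewards.append(-loss)                      # step 3: feedback for E_gamma
    return rewards
```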
Skeleton-based generative module
Input-to-skeleton component
- Qα
- Seq2Seq structure
- A hierarchical encoder (Li et al., 2015) and an attention-based decoder (Bahdanau et al., 2014)
- Encoding
- Obtain sentence representations via a word-level LSTM network
- Generate a compressed vector h, which then goes through the decoder to generate a skeleton
$L_\alpha = -\sum_{i=1}^{T} \log P_Q(s_i \mid c, \alpha)$
- input c, skeleton s={s1,…,si,…,sT} (extracted by the skeleton extraction module)
- α: parameters of the input-to-skeleton component
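All three training losses in these notes ($L_\alpha$, $L_\theta$, $L_\gamma$) share the same token-level negative log-likelihood shape; a minimal sketch (the function name and the plain-probability interface are illustrative, not the authors' code):

```python
import math

def nll_loss(token_probs):
    """Token-level negative log-likelihood: L = -sum_i log P(token_i | context).
    token_probs holds the model's probability for each gold token,
    e.g. P_Q(s_i | c, alpha) for the input-to-skeleton component."""
    return -sum(math.log(p) for p in token_probs)
```

E.g. `nll_loss([0.5, 0.25])` equals `log 8 ≈ 2.079`.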
Skeleton-to-sentence component
- Dθ
- Seq2Seq structure
- Both the encoder and the decoder are one-layer LSTM networks with the attention mechanism
$L_\theta = -\sum_{i=1}^{M} \log P_D(y_i \mid s, \theta)$
- Input: skeleton s, target sentence y={y1,…,yi,…,yM}
- θ: parameters of the skeleton-to-sentence component
Skeleton extraction module
- Seq2Seq model with the attention mechanism
- Both the encoder and the decoder are based on LSTM structures
- Initialized with a weakly supervised method
- Skeleton extraction is formulated as a sentence compression problem
- But the style of the sentence compression dataset and that of the narrative story dataset are very different → noise occurs
$L_\gamma = -\sum_{i=1}^{T} \log P_E(s_i \mid x, \gamma)$
- Input: original text x, the compressed version s={s1,…,si,…,sT}
- γ: parameters of the skeleton extraction module
Reinforcement learning method
- To build a connection between the skeleton extraction module and the skeleton-based generative module
- Use policy gradient (Sutton et al., 1999) for training
- Calculate a reward Rc on the feedback of the generative module
- Optimize the parameters through policy gradient
- The expected reward is maximized
- Trains the skeleton extraction module
$\nabla J(\gamma) = \mathbb{E}[R_c \cdot \nabla \log P_E(s \mid x, \gamma)]$
- x: the original sentence
- s: the skeleton generated by a sampling mechanism
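The score-function (REINFORCE) estimate of $\nabla J$ can be sketched for a toy softmax policy; the paper's policy is an LSTM extractor, so this categorical toy only illustrates the $R_c \cdot \nabla \log P$ term:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_grad(logits, sampled, reward):
    """Single-sample estimate of reward * d(log P(sampled))/d(logits).
    For a softmax policy, d log P(a)/d logit_k = 1[k == a] - p_k."""
    probs = softmax(logits)
    return [reward * ((1.0 if k == sampled else 0.0) - p)
            for k, p in enumerate(probs)]
```

With a uniform policy and reward 2, sampling action 0 pushes its logit up and the other down: `reinforce_grad([0.0, 0.0], 0, 2.0)` gives `[1.0, -1.0]`.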
Reward
- Define what makes a good skeleton
- One that contains all the key information and ignores the rest
- Bad skeletons contain too much detailed information or lack necessary information
- Three cases: good skeletons, incomplete skeletons, redundant skeletons
$R_c = K - (R_1 \times R_2)^{\frac{1}{2}}$
- K: the upper bound of the reward
- R1 and R2: cross entropy losses in the input-to-skeleton component and the skeleton-to-sentence component
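A direct transcription of the reward: the geometric mean of the two cross-entropy losses, subtracted from the upper bound (the notes do not fix a value for $K$, so `k=10.0` below is a made-up default):

```python
import math

def skeleton_reward(r1, r2, k=10.0):
    """R_c = K - sqrt(R1 * R2), where r1/r2 are the cross-entropy losses
    of the input-to-skeleton and skeleton-to-sentence components.
    k is a hypothetical upper bound; the paper only calls K an upper bound."""
    return k - math.sqrt(r1 * r2)
```

Lower losses on both components mean a higher reward, so a skeleton is rewarded only when it is both predictable from the input and sufficient to regenerate the sentence.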
Experiment
Dataset
- Visual storytelling dataset (Huang et al., 2016)
- Use only the text data
- Data split into two parts
- First sentence as the input text
- Following sentences as the target text
- Sentence compression dataset (Filippova and Altun, 2013)
- For pre-training the skeleton extraction module
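The input/target split described above, as a tiny helper (hypothetical name, not from the paper's code):

```python
def split_story(sentences):
    """Visual-storytelling split: the first sentence is the input text,
    the remaining sentences form the target text."""
    return sentences[0], " ".join(sentences[1:])
```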
Baselines
- Clark et al., 2018: Entity-Enhanced Seq2Seq Model
- Cao et al., 2018: Dependency-Tree Enhanced Seq2Seq Model
- Martin et al., 2018: Generalized-Template Enhanced Seq2Seq Model
Metrics
Automatic evaluation
- BLEU
- To compare the similarity between the machine output and the human references
- All stop words are removed for more precise results
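The stop-word filtering before scoring can be illustrated with a clipped unigram precision (full BLEU also uses higher-order n-grams and a brevity penalty; `STOP_WORDS` here is a tiny made-up set):

```python
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "was", "and"}  # tiny illustrative set

def unigram_precision(candidate, reference):
    """Clipped unigram precision over non-stop-word tokens."""
    cand = [w for w in candidate.lower().split() if w not in STOP_WORDS]
    ref = Counter(w for w in reference.lower().split() if w not in STOP_WORDS)
    clipped = sum(min(c, ref[w]) for w, c in Counter(cand).items())
    return clipped / len(cand) if cand else 0.0
```

Filtering stop words keeps function words from inflating the overlap: "the cat sat" vs. "a cat sat down" scores 1.0 on content words alone.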
Human evaluation
- 100 items randomly chosen for evaluation
- Fluency (correct in grammar)
- Coherence
Results
- BLEU: best
- Fluency: second to the GE-Seq2Seq model
- Generalized templates help achieve higher fluency but sacrifice expressive power
- In fact, the proposed model produces many more unique phrases than GE-Seq2Seq
- Coherence: best
- GE-Seq2Seq performs worst
Incremental analysis
- The skeleton extraction module is effective in leaving only the essential information
BLEU
- With only the Seq2Seq model → lowest
- + skeleton extraction module → slightly improved
- + reinforcement learning → outperforms Seq2Seq by 40%
Human evaluation
- + skeleton extraction module → lower than Seq2Seq
- The dataset for training the module differs from the narrative story dataset
- + reinforcement learning → outperforms Seq2Seq by 14%
Error analysis
- Dimensions in coherence
- Scene-specific relevance
- Temporal connection
- Non-recurrence
- When the input is short and the model is familiar with it → better coherence
- Extracting the key semantics mitigates the decrease in coherence otherwise