Jingjing Xu, Xuancheng Ren, Yi Zhang, Qi Zeng, Xiaoyan Cai, and Xu Sun. 2018. A Skeleton-Based Model for Promoting Coherence Among Sentences in Narrative Story Generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4306–4315, Brussels, Belgium. Association for Computational Linguistics.
GitHub - lancopku/Skeleton-Based-Generation-Model: Code for "A Skeleton-Based Model for Promoting Coherence Among Sentences in Narrative Story Generation" (EMNLP 2018)
Introduction
- Most of the SOTA approaches are based on Seq2Seq models
- But it is hard for these models to find semantic dependency among sentences
- The connection is mainly reflected through key phrases
- Skeleton — the phrases to express the key meanings of a sentence
- Other words, such as modifiers, are redundant and make the dependency sparse
- Human writing also starts from a skeleton, which is then reorganized into a fluent sentence
- The skeleton helps machines learn the dependency of sentences
- It avoids the interference of irrelevant information
- Model: a skeleton-based generative module + a skeleton extraction module
- The generative module
- Input-to-skeleton component — associates inputs and skeletons
- Skeleton-to-sentence component — expands a skeleton to a sentence
- The skeleton extraction module
- Generates sentence skeletons
- Automatically explores sentence skeletons (it is hard to establish rules to extract skeletons)
- A reinforcement learning method to build the connection between the skeleton extraction module and the generative module
- For simplicity, many existing story generation models rely on given materials (i.e., they do not generate from scratch)
- But this paper works on the complete story generation task
- Seq2Seq is widely used for the task but is weak on generating inter-related sentences (for coherence)
- Because of this, some models build mid-level sentence semantic representations to simplify the dependency among sentences
- But the unified rules these models use are inflexible and tend to generate over-simplified representations
- So the paper uses a reinforcement learning method to automatically extract sentence skeletons for simplifying the dependency of sentences
Skeleton-based model
- Gθ: a skeleton-based generative module
- Generates a story sentence by sentence
- Input-to-skeleton component — generates a skeleton based on the existing text
- Skeleton-to-sentence component
- Eγ: a skeleton extraction module
- A weakly supervised method for the initial extraction ability
- Training process
- Use extracted skeletons to train the generative module
- The output from the generative module is used to evaluate the extracted skeletons, refining the skeleton extraction module
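The alternating procedure above could be sketched with stub modules (all function names and the toy scoring below are made up for illustration; the real modules are seq2seq LSTM networks):

```python
def extract_skeleton(sentence):
    # stub for the extraction module E_gamma: keep longer words as "key phrases"
    return [w for w in sentence.split() if len(w) > 3]

def generator_loss(skeleton, sentence):
    # stub for the generative module's training loss: fraction of target
    # words the skeleton fails to cover (stand-in for seq2seq cross entropy)
    words = sentence.split()
    return sum(w not in skeleton for w in words) / len(words)

def training_step(stories):
    """One round: extract skeletons, train the generator on them, and
    return per-story rewards (negative loss) to refine the extractor."""
    rewards = []
    for sentence in stories:
        skeleton = extract_skeleton(sentence)      # step 1: extract skeletons
        loss = generator_loss(skeleton, sentence)  # step 2: train the generator
        rewards.append(-loss)                      # step 3: feedback for E_gamma
    return rewards
```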
Skeleton-based generative module
Input-to-skeleton component
- Qα
- Seq2Seq structure
- A hierarchical encoder (Li et al., 2015) and an attention-based decoder (Bahdanau et al., 2014)
- Encoding
- Obtain sentence representations via a word-level LSTM network
- Generate a compressed vector h, which then goes through the decoder to generate a skeleton
$L_\alpha = -\sum_{i=1}^{T} \log P_Q(s_i \mid c, \alpha)$
- input c, skeleton s={s1,…,si,…,sT} (extracted by the skeleton extraction module)
- α: parameters of the input-to-skeleton component
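All three training losses in these notes ($L_\alpha$, $L_\theta$, $L_\gamma$) share the same token-level negative log-likelihood shape; a minimal sketch (the function name and the plain-probability interface are illustrative, not the authors' code):

```python
import math

def nll_loss(token_probs):
    """Token-level negative log-likelihood: L = -sum_i log P(token_i | context).
    token_probs holds the model's probability for each gold token,
    e.g. P_Q(s_i | c, alpha) for the input-to-skeleton component."""
    return -sum(math.log(p) for p in token_probs)
```

E.g. `nll_loss([0.5, 0.25])` equals `log 8 ≈ 2.079`.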
Skeleton-to-sentence component
- Dθ
- Seq2Seq structure
- Both the encoder and the decoder are one-layer LSTM networks with the attention mechanism
$L_\theta = -\sum_{i=1}^{M} \log P_D(y_i \mid s, \theta)$
- Input: skeleton s, target sentence y={y1,…,yi,…,yM}
- θ: parameters of the skeleton-to-sentence component
Skeleton extraction module
- Seq2Seq model with the attention mechanism
- Both the encoder and the decoder are based on LSTM structures
- Initialized with a weakly supervised method
- Skeleton extraction is formulated as a sentence compression problem
- But the style of the sentence compression dataset and that of the narrative story dataset are very different → noise occurs
$L_\gamma = -\sum_{i=1}^{T} \log P_E(s_i \mid x, \gamma)$
- Input: original text x, the compressed version s={s1,…,si,…,sT}
- γ: parameters of the skeleton extraction module
Reinforcement learning method
- To build a connection between the skeleton extraction module and the skeleton-based generative module
- Use policy gradient (Sutton et al., 1999) for training
- Calculate a reward Rc on the feedback of the generative module
- Optimize the parameters through policy gradient
- The expected reward is maximized
- Trains the skeleton extraction module
$\nabla J(\gamma) = \mathbb{E}[R_c \cdot \nabla \log P_E(s \mid x, \gamma)]$
- x: the original sentence
- s: the skeleton generated by a sampling mechanism
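The score-function (REINFORCE) estimate of $\nabla J$ can be sketched for a toy softmax policy; the paper's policy is an LSTM extractor, so this categorical toy only illustrates the $R_c \cdot \nabla \log P$ term:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_grad(logits, sampled, reward):
    """Single-sample estimate of reward * d(log P(sampled))/d(logits).
    For a softmax policy, d log P(a)/d logit_k = 1[k == a] - p_k."""
    probs = softmax(logits)
    return [reward * ((1.0 if k == sampled else 0.0) - p)
            for k, p in enumerate(probs)]
```

With a uniform policy and reward 2, sampling action 0 pushes its logit up and the other down: `reinforce_grad([0.0, 0.0], 0, 2.0)` gives `[1.0, -1.0]`.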
Reward
- Define what makes a good skeleton
- One that contains all the key information and ignores the rest
- Bad skeletons contain too much detailed information or lack necessary information
- Three cases: good skeletons, incomplete skeletons, redundant skeletons
$R_c = K - (R_1 \times R_2)^{\frac{1}{2}}$
- K: the upper bound of the reward
- R1 and R2: cross entropy losses in the input-to-skeleton component and the skeleton-to-sentence component
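A direct transcription of the reward: the geometric mean of the two cross-entropy losses, subtracted from the upper bound (the notes do not fix a value for $K$, so `k=10.0` below is a made-up default):

```python
import math

def skeleton_reward(r1, r2, k=10.0):
    """R_c = K - sqrt(R1 * R2), where r1/r2 are the cross-entropy losses
    of the input-to-skeleton and skeleton-to-sentence components.
    k is a hypothetical upper bound; the paper only calls K an upper bound."""
    return k - math.sqrt(r1 * r2)
```

Lower losses on both components mean a higher reward, so a skeleton is rewarded only when it is both predictable from the input and sufficient to regenerate the sentence.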
Experiment
Dataset
- Visual storytelling dataset (Huang et al., 2016)
- Use only the text data
- Data split into two parts
- First sentence as the input text
- Following sentences as the target text
- Sentence compression dataset (Filippova and Altun, 2013)
- For pre-training the skeleton extraction module
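The input/target split described above, as a tiny helper (hypothetical name, not from the paper's code):

```python
def split_story(sentences):
    """Visual-storytelling split: the first sentence is the input text,
    the remaining sentences form the target text."""
    return sentences[0], " ".join(sentences[1:])
```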
Baselines
- Clark et al., 2018: Entity-Enhanced Seq2Seq Model
- Cao et al., 2018: Dependency-Tree Enhanced Seq2Seq Model
- Martin et al., 2018: Generalized-Template Enhanced Seq2Seq Model
Metrics
Automatic evaluation
- BLEU
- To compare the similarity between the machine output and the human references
- All stop words are removed for more precise results
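The stop-word filtering before scoring can be illustrated with a clipped unigram precision (full BLEU also uses higher-order n-grams and a brevity penalty; `STOP_WORDS` here is a tiny made-up set):

```python
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "was", "and"}  # tiny illustrative set

def unigram_precision(candidate, reference):
    """Clipped unigram precision over non-stop-word tokens."""
    cand = [w for w in candidate.lower().split() if w not in STOP_WORDS]
    ref = Counter(w for w in reference.lower().split() if w not in STOP_WORDS)
    clipped = sum(min(c, ref[w]) for w, c in Counter(cand).items())
    return clipped / len(cand) if cand else 0.0
```

Filtering stop words keeps function words from inflating the overlap: "the cat sat" vs. "a cat sat down" scores 1.0 on content words alone.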
Human evaluation
- 100 items randomly chosen for evaluation
- Fluency (correct in grammar)
- Coherence
Results
- BLEU: best
- Fluency: second to the GE-Seq2Seq model
- Generalized templates help achieve higher fluency but sacrifice expressive power
- In fact, the proposed model produces many more unique phrases than GE-Seq2Seq
- Coherence: best
- GE-Seq2Seq performs worst
Incremental analysis
- The skeleton extraction module is effective in leaving only the essential information
BLEU
- With only the Seq2Seq model → lowest
- + skeleton extraction module → slightly improved
- + reinforcement learning → outperforms Seq2Seq by 40%
Human evaluation
- + skeleton extraction module → lower than Seq2Seq
- The dataset for training the module differs from the narrative story dataset
- + reinforcement learning → outperforms Seq2Seq by 14%
Error analysis
- Dimensions in coherence
- Scene-specific relevance
- Temporal connection
- Non-recurrence
- When the input is short and the model is familiar with it → better coherence
- Extracting the key semantics mitigates the decrease in coherence otherwise