neural sequence models have become the standard choice for modeling
time-series data for a wide range of applications such as speech recognition (Graves et al., 2013),machine translation (Bahdanau et al., 2014), conversation modeling (Vinyals & Le, 2015), image and video captioning (Vinyals et al., 2015; Venugopalan et al., 2015), and visual question answering (Antol et al., 2015).
Inference in RNNs.
Lack of Diversity in BS.
Overview and Contributions
Diverse Beam Search(DBS)
요약하자면
DBS is a task agnostic, doubly greedy algorithm that incorporates diversity in beam search with little memory or computational overhead
Diverse M-Best Lists
Diverse Decoding for RNNs.
input_ids (torch.LongTensor of shape (batch_size, sequence_length)) — The sequence used as a prompt for the generation.
beam_scorer (BeamScorer) — An derived instance of BeamScorer that defines how beam hypotheses are constructed, stored and sorted during generation. For more information, the documentation of BeamScorer should be read.
logits_processor (LogitsProcessorList, optional) — An instance of LogitsProcessorList. List of instances of class derived from LogitsProcessor used to modify the prediction scores of the language modeling head applied at each generation step.
stopping_criteria (StoppingCriteriaList, optional) — An instance of StoppingCriteriaList. List of instances of class derived from StoppingCriteria used to tell if the generation loop should stop.
max_length (int, optional, defaults to 20) — DEPRECATED. Use logits_processor or stopping_criteria directly to cap the number of generated tokens. The maximum length of the sequence to be generated.
pad_token_id (int, optional) — The id of the padding token.
eos_token_id (int, optional) — The id of the end-of-sequence token.
output_attentions (bool, optional, defaults to False) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more details.
output_hidden_states (bool, optional, defaults to False) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more details.
output_scores (bool, optional, defaults to False) — Whether or not to return the prediction scores. See scores under returned tensors for more details.
return_dict_in_generate (bool, optional, defaults to False) — Whether or not to return a ModelOutput instead of a plain tuple.
synced_gpus (bool, optional, defaults to False) — Whether to continue running the while loop until max_length (needed for ZeRO stage 3)
group_beam_search는 말그대로 beam search를 그룹별로 나누어서 하는 것