[llama3/llama/tokenizer.py] class ChatFormat

ma-kjh · August 28, 2024

LLM

from typing import List

# Tokenizer, Message, and Dialog are defined earlier in the same file
# (llama3/llama/tokenizer.py).


class ChatFormat:
    def __init__(self, tokenizer: Tokenizer):
        self.tokenizer = tokenizer

    def encode_header(self, message: Message) -> List[int]:
        tokens = []
        tokens.append(self.tokenizer.special_tokens["<|start_header_id|>"])
        # message is a dict; message["role"] holds the speaker, e.g. "user" or "assistant".
        tokens.extend(self.tokenizer.encode(message["role"], bos=False, eos=False))
        tokens.append(self.tokenizer.special_tokens["<|end_header_id|>"])
        tokens.extend(self.tokenizer.encode("\n\n", bos=False, eos=False))
        return tokens

    def encode_message(self, message: Message) -> List[int]:
        tokens = self.encode_header(message)
        tokens.extend(
            self.tokenizer.encode(message["content"].strip(), bos=False, eos=False)
        )
        # <|eot_id|> is "end of turn": it closes this message, not the whole text.
        tokens.append(self.tokenizer.special_tokens["<|eot_id|>"])
        return tokens

    # Dialog is a sequence of Message dicts (Dialog = Sequence[Message]).
    def encode_dialog_prompt(self, dialog: Dialog) -> List[int]:
        tokens = []
        tokens.append(self.tokenizer.special_tokens["<|begin_of_text|>"])
        for message in dialog:
            tokens.extend(self.encode_message(message))
        # Add the start of an assistant message for the model to complete.
        tokens.extend(self.encode_header({"role": "assistant", "content": ""}))
        return tokens
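
To see the layout that encode_dialog_prompt produces without loading the real model files, here is a minimal sketch. ToyTokenizer is a hypothetical stand-in (one token per character, arbitrary special-token IDs), not the real BPE tokenizer, and the free functions only mirror the logic of the ChatFormat methods above.

```python
from typing import Dict, List

SPECIAL_NAMES = [
    "<|begin_of_text|>",
    "<|start_header_id|>",
    "<|end_header_id|>",
    "<|eot_id|>",
]


class ToyTokenizer:
    """Hypothetical stand-in for the real Tokenizer: one token per character."""

    def __init__(self) -> None:
        # Arbitrary IDs above the Unicode range, so they never collide with ord().
        self.special_tokens: Dict[str, int] = {
            name: 2_000_000 + i for i, name in enumerate(SPECIAL_NAMES)
        }

    def encode(self, s: str, *, bos: bool, eos: bool) -> List[int]:
        # bos/eos are accepted only to match the call sites; they are ignored here.
        return [ord(ch) for ch in s]

    def decode(self, tokens: List[int]) -> str:
        names = {v: k for k, v in self.special_tokens.items()}
        return "".join(names[t] if t in names else chr(t) for t in tokens)


def encode_header(tok: ToyTokenizer, role: str) -> List[int]:
    tokens = [tok.special_tokens["<|start_header_id|>"]]
    tokens.extend(tok.encode(role, bos=False, eos=False))
    tokens.append(tok.special_tokens["<|end_header_id|>"])
    tokens.extend(tok.encode("\n\n", bos=False, eos=False))
    return tokens


def encode_dialog_prompt(tok: ToyTokenizer, dialog: List[Dict[str, str]]) -> List[int]:
    tokens = [tok.special_tokens["<|begin_of_text|>"]]
    for message in dialog:
        tokens.extend(encode_header(tok, message["role"]))
        tokens.extend(tok.encode(message["content"].strip(), bos=False, eos=False))
        tokens.append(tok.special_tokens["<|eot_id|>"])
    # Open an assistant header for the model to complete.
    tokens.extend(encode_header(tok, "assistant"))
    return tokens


tok = ToyTokenizer()
dialog = [{"role": "user", "content": "  Hi  "}]
prompt = tok.decode(encode_dialog_prompt(tok, dialog))
print(repr(prompt))
```

Output:

'<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHi<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n'

The content is stripped before encoding, every message ends with <|eot_id|>, and the prompt ends with an open assistant header.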

The list.extend() method in Python is used to extend a list by appending all the elements from another iterable (such as another list, tuple, string, etc.) to the end of the list. It modifies the original list in place and increases its length by the number of elements in the iterable.

Syntax:

list.extend(iterable)
  • list: The list that you want to extend.
  • iterable: Any iterable (e.g., another list, a tuple, a string) whose elements will be added to the end of the list.

How It Works:

The extend() method iterates over the elements in the provided iterable and appends each element to the end of the original list. It is similar to the += operator when used with lists but is often more explicit and readable.
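
To make the comparison with += concrete, here is a small sketch (variable names are illustrative). Both extend() and += change the existing list object, so a second reference to it sees the change. Writing a = a + [...] instead builds a new list and rebinds the name.

```python
# extend() and += both mutate the existing list object.
a = [1, 2]
alias = a          # second reference to the same list
a.extend([3])
a += [4]
print(alias)       # Output: [1, 2, 3, 4]

# a = a + [...] creates a new list; the old one is untouched.
b = [1, 2]
alias_b = b
b = b + [3]
print(alias_b)     # Output: [1, 2]
print(b)           # Output: [1, 2, 3]

# Like extend(), += accepts any iterable, while + requires another list.
c = [1]
c += (2, 3)
print(c)           # Output: [1, 2, 3]
```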

Example:

# Example 1: Extending a list with another list
fruits = ['apple', 'banana', 'cherry']
additional_fruits = ['orange', 'grape']
fruits.extend(additional_fruits)

print(fruits)

Output:

['apple', 'banana', 'cherry', 'orange', 'grape']

In this example, the fruits list is extended by appending the elements of additional_fruits to the end.

# Example 2: Extending a list with a string
letters = ['a', 'b', 'c']
letters.extend('def')

print(letters)

Output:

['a', 'b', 'c', 'd', 'e', 'f']

Here, the letters list is extended by appending each character in the string 'def' to the end.

Key Points:

  • extend() modifies the original list in place; it does not return a new list.
  • The iterable passed to extend() can be any iterable (list, tuple, string, etc.).
  • Unlike append(), which adds its argument as a single element (which could be another list), extend() adds each element from the iterable to the list.
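
The first two points can be checked directly. In this short sketch, extend() returns None, so assigning its result is a common mistake. It also accepts iterables other than lists, such as a generator or a dict (which yields its keys).

```python
numbers = [1, 2]
result = numbers.extend([3])   # extend() returns None
print(result)                  # Output: None
print(numbers)                 # Output: [1, 2, 3]

# Any iterable works: a generator expression, then a dict (iterates over keys).
numbers.extend(n * n for n in range(2, 4))
numbers.extend({"a": 1, "b": 2})
print(numbers)                 # Output: [1, 2, 3, 4, 9, 'a', 'b']
```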

Comparison with append():

  • append() adds its argument as a single element to the end of the list. If you pass a list to append(), the entire list is added as a single element.
  • extend() adds each element of the iterable to the list. If you pass a list to extend(), each element of that list is added to the original list.

Example:

numbers = [1, 2, 3]
numbers.append([4, 5])
print(numbers)  # Output: [1, 2, 3, [4, 5]]

numbers.extend([6, 7])
print(numbers)  # Output: [1, 2, 3, [4, 5], 6, 7]

In the first case, append() adds the entire list [4, 5] as a single element, while extend() adds each element (6 and 7) separately to the list.
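
The same distinction explains the mix of append() and extend() in ChatFormat: a special token is a single int, so it is appended, while encode() returns a list of ints, so it is extended. The IDs in this sketch are placeholders chosen for illustration, not real Llama 3 token IDs.

```python
START_HEADER_ID = 9001        # placeholder for special_tokens["<|start_header_id|>"]
role_tokens = [11, 12, 13]    # placeholder for tokenizer.encode("user", ...)

tokens = []
tokens.append(START_HEADER_ID)   # single int: append
tokens.extend(role_tokens)       # list of ints: extend
print(tokens)                    # Output: [9001, 11, 12, 13]

# Using append() for the encoded list would nest it.
nested = []
nested.append(role_tokens)
print(nested)                    # Output: [[11, 12, 13]]

# Using extend() for a single int fails, because an int is not iterable.
try:
    [].extend(START_HEADER_ID)
    raised = False
except TypeError:
    raised = True
print(raised)                    # Output: True
```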
