🔧 How to Benchmark the 2023 Korean CSAT with LLMs

We've developed experimental code for benchmarking the Language section of the 2023 Korean CSAT. Use it to estimate how your models perform before submitting them officially!


๐Ÿ Quick Start Guide

  1. Install AutoRAG:

    pip install AutoRAG
  2. Set your OpenAI API key:
    Add your OpenAI API key to a .env file as the OPENAI_API_KEY environment variable (it is loaded in the Python sketch after this list).

  3. Convert JSON data into AutoRAG datasets:
    Run the make_autorag_dataset.ipynb notebook to prepare the data.

  4. Edit prompts and models in autorag_config.yaml:
    Customize prompts and add models; see the "How to Modify Prompts and Models?" section below for instructions.

  5. Run the benchmark:
    Execute the script to run the benchmark (a Python alternative is sketched after this list).

    python ./korean_sat_mini_test/autorag_run.py --qa_data_path ./data/autorag/qa_2023.parquet --corpus_data_path ./data/autorag/corpus_2023.parquet
    • To update models or prompts before running, see the "How to Modify Prompts and Models?" section below.
  6. Check the results:
    Results are saved in the autorag_project_dir folder.

  7. View your grade report:
    Open grading_report_card.ipynb to generate and view your performance report. Reports are saved in the data/result/ folder.
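
If you prefer to drive the pipeline from Python instead of the shell, the sketch below ties steps 2, 3, and 5 together. It is a minimal sketch, assuming the python-dotenv and pandas packages and AutoRAG's Evaluator API; the project_dir argument is our assumption based on the folder named in step 6, and the paths mirror the command in step 5.

    import os

    import pandas as pd
    from dotenv import load_dotenv
    from autorag.evaluator import Evaluator

    # Step 2: load OPENAI_API_KEY from .env into the process environment.
    load_dotenv()
    assert os.getenv("OPENAI_API_KEY"), "Set OPENAI_API_KEY in your .env file first"

    # Step 3 sanity check: peek at the converted AutoRAG dataset.
    qa = pd.read_parquet("./data/autorag/qa_2023.parquet")
    print(qa.head())

    # Step 5: run the benchmark with the same data and config as the CLI command.
    evaluator = Evaluator(
        qa_data_path="./data/autorag/qa_2023.parquet",
        corpus_data_path="./data/autorag/corpus_2023.parquet",
        project_dir="./autorag_project_dir",  # assumption: matches the folder in step 6
    )
    evaluator.start_trial("./korean_sat_mini_test/autorag_config.yaml")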


🤷 How to Modify Prompts and Models?

  • Open the autorag_config.yaml file in the korean_sat_mini_test folder.

[Case 1] Modifying the Prompt:

Edit the node_type: prompt_maker section to customize the prompt content.

Example:

    - node_type: prompt_maker
      strategy:
        metrics:
          - metric_name: kice_metric
      modules:
        - module_type: fstring
          prompt:
          - |            
            Answer the given question.
            Read paragraph, and select only one answer between 5 choices.
            
            paragraph :
            {retrieved_contents}
            
            question of problem :
            {query}
            
            Answer : 3
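
For intuition, the fstring module fills the {retrieved_contents} and {query} placeholders with the retrieved passage and the question text. In Python terms it behaves roughly like the illustration below (an illustration only, not AutoRAG's internal code):

    # Illustration only: how an fstring prompt template gets filled.
    # (Template trimmed; see the YAML above for the full default prompt.)
    template = (
        "Answer the given question.\n"
        "Read paragraph, and select only one answer between 5 choices.\n\n"
        "paragraph :\n{retrieved_contents}\n\n"
        "question of problem :\n{query}\n"
    )
    prompt = template.format(
        retrieved_contents="<passage retrieved from corpus_2023.parquet>",
        query="<a 2023 CSAT Language question>",
    )
    print(prompt)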

[Case 2] Adding or Replacing Models:

Modify the node_type: generator section to configure models.

OpenAI Models:

  • Set module_type to openai_llm.
  • Specify the OpenAI models you want to test in the llm field; listing several models benchmarks each of them.

Example:

- node_type: generator
  strategy:
    metrics:
      - metric_name: kice_metric
  modules:
    - module_type: openai_llm
      llm: [gpt-4o-mini, gpt-4o]
      batch: 5
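
For reference, each generation this module performs corresponds to a standard OpenAI Chat Completions call, roughly like the sketch below (a sketch using the openai Python package; AutoRAG itself handles batching, here 5 requests at a time):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    prompt = "..."  # a filled prompt produced by the prompt_maker node
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)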

HuggingFace Models:

  • Set module_type to llama_index_llm.
  • Set llm to huggingfacellm.
  • Specify the HuggingFace model in the model field.

Example:

- node_type: generator
  strategy:
    metrics:
      - metric_name: kice_metric
  modules:
    - module_type: llama_index_llm
      llm: huggingfacellm
      model: HumanF-MarkrAI/Gukbap-Qwen2-7B
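
As far as we understand, llm: huggingfacellm maps to LlamaIndex's HuggingFaceLLM wrapper, so a single generation is roughly equivalent to the sketch below (requires the llama-index-llms-huggingface package; the weights are pulled from the HuggingFace Hub on first use, and a 7B model needs a capable GPU):

    from llama_index.llms.huggingface import HuggingFaceLLM

    # Load the model and tokenizer from the HuggingFace Hub.
    llm = HuggingFaceLLM(
        model_name="HumanF-MarkrAI/Gukbap-Qwen2-7B",
        tokenizer_name="HumanF-MarkrAI/Gukbap-Qwen2-7B",
    )

    prompt = "..."  # a filled prompt produced by the prompt_maker node
    print(llm.complete(prompt).text)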

For more advanced customization, refer to the AutoRAG Documentation.


📒 Notes:

  • The default prompts included in this experiment are minimal and may differ from those used in the official leaderboard benchmark.
    • To enhance performance, customize the prompt in the YAML file as needed.

Now you're ready to explore and evaluate your models on the 2023 Korean CSAT benchmark! 🎯
