We've developed experimental code for benchmarking the 2023 Korean CSAT Language section. Use this to estimate the performance of your desired models before submitting them officially!

1. Install AutoRAG:
   ```shell
   pip install AutoRAG
   ```
2. Set your OpenAI API key:
   Add your OpenAI API key as an environment variable in `.env`.
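A minimal `.env` file in the project root looks like the following sketch (the key value is a placeholder, not a real key):

```
OPENAI_API_KEY=your-api-key-here
```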
3. Convert JSON data into AutoRAG datasets:
   Run the `make_autorag_dataset.ipynb` notebook to prepare the data.
4. Edit prompts and models in `autorag_config.yaml`:
   Customize prompts and add models. Instructions below.
5. Run the benchmark:
   ```shell
   python ./korean_sat_mini_test/autorag_run.py \
     --qa_data_path ./data/autorag/qa_2023.parquet \
     --corpus_data_path ./data/autorag/corpus_2023.parquet
   ```
6. Check the results:
   Results are saved in the `autorag_project_dir` folder.
7. View your grade report:
   Open `grading_report_card.ipynb` to generate and view your performance report. Reports are saved in the `data/result/` folder.

Open the `autorag_config.yaml` file in the `korean_sat_mini_test` folder. Edit the `node_type: prompt_maker` section to customize the prompt content.
Example:
```yaml
- node_type: prompt_maker
  strategy:
    metrics:
      - metric_name: kice_metric
  modules:
    - module_type: fstring
      prompt:
        - |
          Answer the given question.
          Read paragraph, and select only one answer between 5 choices.
          paragraph :
          {retrieved_contents}
          question of problem :
          {query}
          Answer : 3
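Conceptually, the `fstring` module fills the `{retrieved_contents}` and `{query}` placeholders in the template above with the retrieved passages and the question. The sketch below illustrates that substitution in plain Python; the `make_prompt` function and its joining behavior are illustrative assumptions, not AutoRAG's actual internals.

```python
# Illustrative sketch of fstring-style prompt filling.
# The template mirrors the YAML example above; make_prompt is hypothetical.
template = (
    "Answer the given question.\n"
    "Read paragraph, and select only one answer between 5 choices.\n"
    "paragraph :\n{retrieved_contents}\n"
    "question of problem :\n{query}\n"
    "Answer : "
)

def make_prompt(retrieved_contents: list[str], query: str) -> str:
    # Join the retrieved passages, then substitute both placeholders.
    joined = "\n".join(retrieved_contents)
    return template.format(retrieved_contents=joined, query=query)

prompt = make_prompt(["Passage about the text."], "Which statement is correct?")
```

The filled prompt is what the generator node ultimately receives as its input.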
Modify the `node_type: generator` section to configure models.
To use OpenAI models, set `module_type` to `openai_llm` and list the models in the `llm` field. Example:
```yaml
- node_type: generator
  strategy:
    metrics:
      - metric_name: kice_metric
  modules:
    - module_type: openai_llm
      llm: [gpt-4o-mini, gpt-4o]
      batch: 5
```
To use a Hugging Face model, set `module_type` to `llama_index_llm`, set `llm` to `huggingfacellm`, and give the model name in the `model` field. Example:
```yaml
- node_type: generator
  strategy:
    metrics:
      - metric_name: kice_metric
  modules:
    - module_type: llama_index_llm
      llm: huggingfacellm
      model: HumanF-MarkrAI/Gukbap-Qwen2-7B
```
For more advanced customization, refer to the AutoRAG Documentation.
Now you're ready to explore and evaluate your models against the 2023 Korean CSAT benchmark! 🎯