We've developed experimental code for benchmarking the 2023 Korean CSAT Language section. Use it to estimate the performance of the models you want to test before submitting them officially!

1. **Install AutoRAG:**

   ```bash
   pip install AutoRAG
   ```

2. **Set your OpenAI API key:** Add your OpenAI API key as an environment variable in `.env` (see the example after this list).

3. **Convert JSON data into AutoRAG datasets:** Run the `make_autorag_dataset.ipynb` notebook to prepare the data (a launch command follows the list).

4. **Edit prompts and models in `autorag_config.yaml`:** Customize prompts and add models; instructions are in the configuration guide below.

5. **Run the benchmark:** Execute the benchmark script:

   ```bash
   python ./korean_sat_mini_test/autorag_run.py \
     --qa_data_path ./data/autorag/qa_2023.parquet \
     --corpus_data_path ./data/autorag/corpus_2023.parquet
   ```

6. **Check the results:** Results are saved in the `autorag_project_dir` folder.

7. **View your grade report:** Open `grading_report_card.ipynb` to generate and view your performance report. Reports are saved in the `data/result/` folder.
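
For step 2, here is a minimal `.env` sketch, assuming the standard `OPENAI_API_KEY` variable read by the OpenAI SDK (the key value is a placeholder):

```bash
# .env — loaded as environment variables before the benchmark runs
OPENAI_API_KEY=sk-your-key-here
```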
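
For steps 3 and 7, one way to open the notebooks from a terminal, assuming Jupyter is installed:

```bash
# step 3: prepare the AutoRAG datasets
jupyter notebook make_autorag_dataset.ipynb

# step 7: generate and view the grade report
jupyter notebook grading_report_card.ipynb
```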

**Customize prompts.** Open the `autorag_config.yaml` file in the `korean_sat_mini_test` folder and edit the `node_type: prompt_maker` section to customize the prompt content.

Example:

```yaml
- node_type: prompt_maker
  strategy:
    metrics:
      - metric_name: kice_metric
  modules:
    - module_type: fstring
      prompt:
        - |
          Answer the given question.
          Read paragraph, and select only one answer between 5 choices.
          paragraph :
          {retrieved_contents}
          question of problem :
          {query}
          Answer : 3
```
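
Because `prompt` takes a list, you can supply several candidate prompts and let the strategy metrics compare them. A minimal sketch, where the second prompt is a hypothetical variant added here for illustration:

```yaml
- node_type: prompt_maker
  strategy:
    metrics:
      - metric_name: kice_metric
  modules:
    - module_type: fstring
      prompt:
        # original prompt from the example above
        - |
          Answer the given question.
          Read paragraph, and select only one answer between 5 choices.
          paragraph :
          {retrieved_contents}
          question of problem :
          {query}
          Answer : 3
        # hypothetical variant; each candidate prompt is evaluated
        # against kice_metric so the best one can be selected
        - |
          Read the paragraph and answer the question.
          Reply with a single number between 1 and 5.
          paragraph :
          {retrieved_contents}
          question of problem :
          {query}
```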

**Configure models.** Modify the `node_type: generator` section to configure models.

To use OpenAI models, set `module_type` to `openai_llm` and list the models you want to benchmark in the `llm` field. Example:

```yaml
- node_type: generator
  strategy:
    metrics:
      - metric_name: kice_metric
  modules:
    - module_type: openai_llm
      llm: [gpt-4o-mini, gpt-4o]
      batch: 5
```

To use Hugging Face models, set `module_type` to `llama_index_llm`, set `llm` to `huggingfacellm`, and put the model name in the `model` field. Example:

```yaml
- node_type: generator
  strategy:
    metrics:
      - metric_name: kice_metric
  modules:
    - module_type: llama_index_llm
      llm: huggingfacellm
      model: HumanF-MarkrAI/Gukbap-Qwen2-7B
```
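
Both module types can also be listed under a single generator node, so one run compares the OpenAI and Hugging Face models side by side; a sketch combining the two examples above:

```yaml
- node_type: generator
  strategy:
    metrics:
      - metric_name: kice_metric
  modules:
    # each module (and each model listed inside it) is evaluated
    # with kice_metric, so the results can be compared directly
    - module_type: openai_llm
      llm: [gpt-4o-mini, gpt-4o]
      batch: 5
    - module_type: llama_index_llm
      llm: huggingfacellm
      model: HumanF-MarkrAI/Gukbap-Qwen2-7B
```
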
For more advanced customization, refer to the AutoRAG Documentation.

Now you're ready to explore and evaluate your models against the 2023 Korean CSAT benchmark! 🎯