Quick Start
Welcome to the OpenR Manual, which guides you through training large language models (LLMs) to reason effectively. This page provides a quick guide on how to run the OpenR codebase successfully.
Prerequisites
We have tested our code on machines with a minimum of 2 x A800 GPUs, each with 80GB of memory. For optimal performance, we recommend running the project on a machine with at least 80GB of GPU memory.
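Before installing anything, you can check which GPUs are visible on your machine and how much memory each has, for example:
nvidia-smi --query-gpu=name,memory.total --format=csv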
Installation
Create Environment using Conda
- Create and activate a new conda environment
conda create -n open_reasoner python=3.10
conda activate open_reasoner
- Install dependencies
pip install -r requirements.txt
pip3 install "fschat[model_worker,webui]"
pip install -U pydantic
cd envs/MATH/latex2sympy
pip install -e .
cd -
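As an optional sanity check, you can verify that FastChat installed correctly (the printed version will vary with your environment):
python -c "import fastchat; print(fastchat.__version__)"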
Download Base Models
Before running the project, please ensure that all required base models are downloaded. The models used in this project include:
- Qwen2.5-Math-1.5B-Instruct
- Qwen2.5-Math-7B-Instruct
- Qwen2.5-Math-RM-72B
- peiyi9979/mistral-7b-sft
- peiyi9979/math-shepherd-mistral-7b-prm
To download these models, please refer to the Hugging Face model downloading tutorial for step-by-step guidance on downloading models from the Hugging Face Hub.
Ensure that all models are saved in their respective directories, as expected by the project setup, before proceeding.
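For example, one way to fetch a model from the Hub is the huggingface-cli download command; the local directory below is illustrative, so place the files wherever you plan to point $MODEL_BASE (introduced in the next section):
pip install -U "huggingface_hub[cli]"
huggingface-cli download peiyi9979/math-shepherd-mistral-7b-prm --local-dir /path/to/models/math-shepherd-mistral-7b-prm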
Start an LLM Service
Before running inference, please modify the following variables in the reason/llm_service/create_service_math_shepherd.sh script to set the appropriate base models for your usage (an illustrative configuration is shown after this list):
- $MODEL_BASE: Set this to the directory where your models are stored.
- $POLICY_MODEL_NAME: Set this to the name of the policy model you wish to use.
- $VALUE_MODEL_NAME: Set this to the name of the value model you wish to use.
- $NUM_LM_WORKER: Set this to the number of language model (LM) workers to start.
- $NUM_RM_WORKER: Set this to the number of reward model (RM) workers to start.
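For illustration, a configuration consistent with the services shown below might look like this (the path and worker counts are examples, not requirements):
MODEL_BASE=/path/to/models
POLICY_MODEL_NAME=mistral-7b-sft
VALUE_MODEL_NAME=math-shepherd-mistral-7b-prm
NUM_LM_WORKER=1
NUM_RM_WORKER=1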
sh reason/llm_service/create_service_math_shepherd.sh
This script launches two separate LLM services on your hardware, one for generation and one for value inference. After the script runs successfully, you can see the running services as follows:
$ ps -ef | grep openr
xxx 623984 175535 17 18:47 pts/9 00:00:08
/home/yanxue/anaconda3/envs/openr/bin/python3 -m fastchat.serve.controller --port 28777 --host 0.0.0.0
xxx 624073 623967 99 18:47 pts/10 00:04:37
anaconda3/envs/openr/bin/python3 -m reason.llm_service.workers.reward_model_worker --model-path /mnt/nasdata/xxx/llms/huggingface/math-shepherd-mistral-7b-prm --controller-address http://0.0.0.0:28777 --host 0.0.0.0 --port 30011 --worker-address http://0.0.0.0:30011
xxx 624074 623975 99 18:47 pts/11 00:01:44
anaconda3/envs/openr/bin/python3 -m reason.llm_service.workers.vllm_worker --model-path /mnt/nasdata/xxx/llms/huggingface/mistral-7b-sft --controller-address http://0.0.0.0:28777 --host 0.0.0.0 --port 30010 --worker-address http://0.0.0.0:30010 --dtype bfloat16 --swap-space 32
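You can also confirm that both workers have registered with the FastChat controller (assuming the default controller port 28777 used by the script):
curl -X POST http://0.0.0.0:28777/list_models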
Run Inference
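From the repository root, put the project on your Python path and run any of the evaluation scripts. The commented numbers below are sample results; yours will vary with your models and hardware.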
export PYTHONPATH=$(pwd)
sh scripts/eval/cot_greedy.sh
# Method: cot. Average result: ({'majority_vote': 0.734, 'total_completion_tokens': 559.13},)
sh scripts/eval/cot_rerank.sh
# Method: best_of_n. Average result: ({'majority_vote': 0.782,
# 'prm_min_max': 0.772,
# 'prm_min_vote': 0.792,
# 'prm_last_max': 0.776,
# 'prm_last_vote': 0.792,
# 'total_completion_tokens': 4431.268},)
sh scripts/eval/beam_search.sh
# Method: beam_search. Average result: ({'majority_vote': 0.74, 'total_completion_tokens': 2350.492},)
Run Training
Before training, please modify the $dataset_path
, $model_name_or_path
and $prm_name_or_path
in train/mat/scripts/train_llm.sh
.
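For illustration, these variables might be set as follows (all paths are hypothetical; point them at your own dataset and the models downloaded earlier):
dataset_path=/path/to/your/training_data
model_name_or_path=/path/to/models/mistral-7b-sft
prm_name_or_path=/path/to/models/math-shepherd-mistral-7b-prm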
cd train/mat/scripts
bash train_llm.sh
Run PRM Training
cd prm/code
# single GPU
python finetune_qwen_single_gpu.py --model_path $YOUR_MODEL_PATH \
--train_data_path $TRAIN_DATA_PATH \
--test_data_path $TEST_DATA_PATH
# multi GPU
torchrun --nproc_per_node=2 finetune_qwen.py --model_path $YOUR_MODEL_PATH \
--data_path $YOUR_DATA_FOLDER_PATH \
--datasets both