Response Example

Table of contents

  1. Comparing PRMs: Math-psa (Ours) vs. Math-Shepherd
  2. Justifying RL Training
  3. Exploring Test-time Computation

Comparing PRMs: Math-psa (Ours) vs. Math-Shepherd

QA 1 QA 2

Justifying RL Training

QA 3 QA 4

Exploring Test-time Computation

QA 5 QA 6 QA 7