NHacker Next

Offline Reinforcement Learning for LLM Multi-Step Reasoning (arxiv.org)

104 points by belter 2 days ago | 9 comments