Offline Reinforcement Learning for LLM Multi-Step Reasoning
arxiv.org
104 points by belter 2 days ago | 9 comments