0.5B model, 4x3090. If you have 4 GPUs, set `--num_processes=3`: three GPUs run training, and the remaining GPU deploys vLLM as an online inference engine for faster GRPO sampling. Example: 4x4090, 3 epochs, training time ~1h20min ...
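For reference, a 4-GPU launch along these lines might look like the sketch below. The deepspeed config, training script, and recipe paths are illustrative placeholders, not the repo's exact filenames; `accelerate launch --num_processes` is the standard Hugging Face Accelerate flag.

```bash
# Minimal sketch of a 4-GPU GRPO launch, assuming an accelerate-based trainer.
# --num_processes=3 leaves one GPU free, which the trainer is expected to use
# for the vLLM engine that handles online sampling during GRPO.
# File paths below are placeholders for illustration only.
accelerate launch \
    --config_file recipes/zero3.yaml \
    --num_processes=3 \
    src/x_r1/grpo.py \
    --config recipes/grpo_0.5b.yaml
```

The general rule: with N GPUs, set `--num_processes` to N-1 so one GPU stays dedicated to vLLM sampling.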
X-R1 aims to build an easy-to-use, low-cost training framework based on end-to-end reinforcement learning to accelerate the development of Scaling Post-Training. Inspired by DeepSeek-R1 and open-r1, we ...