0.5B model, 4x3090. With 4 GPUs, set --num_processes=3: three GPUs run training while the remaining GPU deploys vLLM as an online inference engine for faster GRPO sampling. Example: 4x4090, 3 epochs, training time ~1h20min ...
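The split above (one GPU for vLLM serving, three for training) might be launched roughly as follows. This is a hedged sketch: the model name and the training script `train_grpo.py` are placeholders, not taken from this document; only `accelerate launch --num_processes` and the vLLM OpenAI-compatible server entrypoint are standard.

```shell
# Sketch only: model name and train_grpo.py are assumed placeholders.
# GPU 0 serves vLLM as the online inference engine for GRPO sampling:
CUDA_VISIBLE_DEVICES=0 python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2.5-0.5B-Instruct --port 8000 &

# The remaining 3 GPUs run GRPO training, hence --num_processes=3:
CUDA_VISIBLE_DEVICES=1,2,3 accelerate launch --num_processes=3 \
    train_grpo.py
```

Keeping sampling on a dedicated vLLM GPU avoids stalling the training processes while rollouts are generated.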