
SLIME training backend setup

This page describes how to train an agent deployed on AgentCore Runtime using the slime training backend. Training is launched through a single public class, SlimeRunner.

For known issues (e.g. the norm-epsilon mismatch on Qwen2.5-32B-Instruct), see the slime troubleshooting page.

Before you start, you need:
  • A GPU cluster with CUDA>=12.9 installed.
  • Python 3.12+ and uv.
  • AWS credentials with permission to invoke an AgentCore Runtime and read/write an S3 bucket.
  • An AgentCore Runtime deployment of your agent; follow the Prepare agent for RL guide. Save the resulting runtime ARN: SlimeRunner requires it as the agent_runtime_arn argument for agent rollouts.
  • An S3 bucket for delivering rollout results, passed to SlimeRunner as the required s3_bucket argument.
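Before launching, you may want to sanity-check these prerequisites locally. The sketch below is a hypothetical preflight helper, not part of the toolkit. It checks the Python version and whether uv is on PATH. It also splits the runtime ARN using the standard AWS ARN layout (arn:partition:service:region:account-id:resource) to confirm the service is bedrock-agentcore. The example ARN is made up, and the helper does not check GPUs, AWS permissions, or the bucket.

```python
import shutil
import sys


def parse_arn(arn: str) -> dict:
    """Split an AWS ARN into its standard components.

    ARNs have the form arn:partition:service:region:account-id:resource.
    The resource part may itself contain ':' or '/', so split at most 5 times.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a well-formed ARN: {arn!r}")
    _, partition, service, region, account, resource = parts
    return {
        "partition": partition,
        "service": service,
        "region": region,
        "account": account,
        "resource": resource,
    }


def preflight(agent_runtime_arn: str) -> list[str]:
    """Return the problems found; an empty list means every check passed.

    Hypothetical helper: the checks mirror the prerequisites in this guide
    and are not part of agentcore_rl_toolkit.
    """
    problems = []
    if sys.version_info < (3, 12):
        problems.append(f"Python 3.12+ required, found {sys.version.split()[0]}")
    if shutil.which("uv") is None:
        problems.append("uv not found on PATH")
    try:
        if parse_arn(agent_runtime_arn)["service"] != "bedrock-agentcore":
            problems.append("agent_runtime_arn is not a bedrock-agentcore ARN")
    except ValueError as err:
        problems.append(str(err))
    return problems


if __name__ == "__main__":
    # Hypothetical ARN for illustration; use the one from your own deployment.
    example = "arn:aws:bedrock-agentcore:us-east-1:123456789012:runtime/my-agent"
    for problem in preflight(example):
        print("preflight:", problem)
```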

Choose one of the two paths below to install slime, then install the toolkit into the same environment.

Using the container image: follow slime’s own installation docs and use the container image (slimerl/slime:latest). Inside the container, slime and Megatron-LM come pre-installed at /root/slime and /root/Megatron-LM. Pass those paths as slime_dir / megatron_dir on SlimeRunner.

Then, inside the container, install the toolkit with the slime-backend extras by running the following from the root of the toolkit repository:

uv pip install -e ".[slime]"

Installing from source: install slime and its heavyweight dependency stack (Megatron-LM, Transformer Engine, Apex, flash-attn, sglang, torch_memory_saver) with the provided script. The script clones slime and Megatron-LM into the current directory and applies slime’s official patches. Run the commands below inside your activated Python environment, from the root of the toolkit repository, with CUDA_HOME pointing at your local CUDA installation.

uv pip install -e ".[slime]"
export CUDA_HOME=/usr/local/cuda-13.0
bash src/agentcore_rl_toolkit/backends/slime/scripts/install_slime.sh cu13

Point slime_dir / megatron_dir on SlimeRunner at the slime and Megatron-LM directories the script cloned.

The training dataset is a JSONL file where each line is one rollout request. Every line has the shape:

{"prompt": "...", "metadata": { /* whatever your agent expects */ }}
  • prompt — top-level string, used by slime for length filtering only.
  • metadata — copied verbatim as the payload dict your @rollout_entrypoint function receives. Put every per-rollout config the agent needs here (user prompt, ground-truth answer, task IDs, repo URIs, etc.).

Example (GSM8K):

{"prompt": "How many ...?", "metadata": {"prompt": "How many ...?", "answer": "42"}}
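Raw GSM8K records have a question and a worked answer whose final number follows a #### marker. The sketch below converts such records into the line shape above, mirroring the example. The question becomes both the top-level prompt and metadata["prompt"], and only the final answer is kept. The question/answer field names are those of the public GSM8K release. The metadata keys should be whatever your agent's @rollout_entrypoint reads, so adjust them to match.

```python
import json


def gsm8k_to_rollout(record: dict) -> dict:
    """Convert one GSM8K record ({"question", "answer"}) into a rollout request."""
    question = record["question"]
    # GSM8K answers end with "#### <final answer>"; keep only the final answer.
    final_answer = record["answer"].split("####")[-1].strip()
    return {"prompt": question, "metadata": {"prompt": question, "answer": final_answer}}


def write_rollout_jsonl(records, path: str) -> None:
    """Write one JSON object per line, as the training dataset expects."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(gsm8k_to_rollout(record)) + "\n")


if __name__ == "__main__":
    example = {"question": "How many ...?", "answer": "Some reasoning.\n#### 42"}
    print(json.dumps(gsm8k_to_rollout(example)))
    # {"prompt": "How many ...?", "metadata": {"prompt": "How many ...?", "answer": "42"}}
```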

SlimeRunner is the single entry point for training. It is a Python class that stops stale processes, starts a Ray head node, submits the slime training job, and streams its output. Its defaults target 8 × H100 GPUs (num_gpus=8, tp_size=2, rollout_gpus_per_engine=2); tune them for your cluster.

from agentcore_rl_toolkit.backends.slime import SlimeRunner
SlimeRunner(
    exp_id="gsm8k-3b-smoke",
    agent_runtime_arn="arn:aws:bedrock-agentcore:...",
    s3_bucket="your-bucket-name",
    model_dir="/path/to/Qwen2.5-3B-Instruct",
    data_path="/path/to/gsm8k_tiny.jsonl",
    model_type="qwen2.5-3B",
).train(num_rollout=1)  # 1 = smoke test; bump to 100 for a real run
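When tuning the cluster shape, it can help to see how the default numbers divide up. The sketch below does that arithmetic. It is illustrative only and rests on two assumptions this guide does not state. First, training uses Megatron's usual split, num_gpus = tp × pp × dp, with pipeline parallelism of 1 unless you say otherwise. Second, rollout engines are colocated on the same num_gpus GPUs as training. The GpuLayout and plan_layout names and the pp_size parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuLayout:
    data_parallel_size: int
    num_rollout_engines: int


def plan_layout(num_gpus: int, tp_size: int, rollout_gpus_per_engine: int,
                pp_size: int = 1) -> GpuLayout:
    """Split num_gpus into training replicas and rollout engines.

    Assumes Megatron-style parallelism (num_gpus = tp * pp * dp) and rollout
    engines colocated on the training GPUs. pp_size is hypothetical here.
    """
    model_parallel = tp_size * pp_size
    if num_gpus % model_parallel:
        raise ValueError(f"num_gpus={num_gpus} not divisible by tp*pp={model_parallel}")
    if num_gpus % rollout_gpus_per_engine:
        raise ValueError(
            f"num_gpus={num_gpus} not divisible by "
            f"rollout_gpus_per_engine={rollout_gpus_per_engine}"
        )
    return GpuLayout(
        data_parallel_size=num_gpus // model_parallel,
        num_rollout_engines=num_gpus // rollout_gpus_per_engine,
    )


print(plan_layout(num_gpus=8, tp_size=2, rollout_gpus_per_engine=2))
# GpuLayout(data_parallel_size=4, num_rollout_engines=4)
```

Under the same assumptions, a 4-GPU machine with tp_size=2 and rollout_gpus_per_engine=2 would give 2 data-parallel replicas and 2 rollout engines.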

Wandb: to log a run, set WANDB_API_KEY and WANDB_ENTITY in your environment and pass wandb_project / wandb_group to the constructor. If these environment variables are unset, wandb logging is skipped entirely.

Config-file workflow: write the constructor kwargs to a YAML file and call SlimeRunner.from_yaml("my_run.yaml") instead of calling the constructor directly.
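Below is a sketch of the config-file workflow. It assumes from_yaml expects a flat mapping whose keys are the same keyword arguments as in the constructor example above, so check the API reference for the exact schema. It also assumes from_yaml returns a SlimeRunner instance. The write_run_config and read_run_config helpers are hypothetical, and the YAML handling uses PyYAML's safe_dump / safe_load.

```python
import yaml  # PyYAML


def write_run_config(path: str, **kwargs) -> None:
    """Dump SlimeRunner constructor kwargs to YAML (flat mapping assumed)."""
    with open(path, "w", encoding="utf-8") as f:
        yaml.safe_dump(kwargs, f, sort_keys=False)


def read_run_config(path: str) -> dict:
    """Load the kwargs back, e.g. to inspect a config before launching."""
    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f)


if __name__ == "__main__":
    write_run_config(
        "my_run.yaml",
        exp_id="gsm8k-3b-smoke",
        agent_runtime_arn="arn:aws:bedrock-agentcore:...",
        s3_bucket="your-bucket-name",
        model_dir="/path/to/Qwen2.5-3B-Instruct",
        data_path="/path/to/gsm8k_tiny.jsonl",
        model_type="qwen2.5-3B",
    )
    # Then, with the toolkit installed (assuming from_yaml returns a SlimeRunner):
    # from agentcore_rl_toolkit.backends.slime import SlimeRunner
    # SlimeRunner.from_yaml("my_run.yaml").train(num_rollout=1)
```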

SlimeRunner exposes the fields most experiments tune as constructor arguments:
  • cluster shape
  • training hyperparameters
  • per-rollout ACR limits
  • extra_flags, which passes additional arguments directly through to slime

See the API reference or help(SlimeRunner) for the full list.