# PPO-Franka-Reach

A Proximal Policy Optimization (PPO) policy trained from scratch in PyTorch on the Isaac-Reach-Franka-v0 task using NVIDIA Isaac Lab with 4096 GPU-parallel environments.

GitHub Repository: [DavidH2802/PPO-from-scratch](https://github.com/DavidH2802/PPO-from-scratch)
## Model Description
The model is a diagonal Gaussian policy (Actor) that controls a 7-DOF Franka Emika robot arm to reach a randomly spawned target position in 3D space. The policy outputs continuous joint-level actions.
### Architecture

- Actor: MLP (obs → 256 → 256 → act_dim) with Tanh activations, orthogonal initialization, and a learnable log-std parameter (sketched below)
- Critic: MLP (obs → 256 → 256 → 1) with Tanh activations and orthogonal initialization (included in the checkpoint but not needed for inference)
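A minimal PyTorch sketch of the actor described above (the class and helper names here are illustrative, not the repository's exact code):

```python
import torch
import torch.nn as nn


def layer_init(layer, std=2**0.5):
    # Orthogonal weight init with zero bias, matching the description above.
    nn.init.orthogonal_(layer.weight, std)
    nn.init.constant_(layer.bias, 0.0)
    return layer


class Actor(nn.Module):
    def __init__(self, obs_dim=32, act_dim=7):
        super().__init__()
        self.net = nn.Sequential(
            layer_init(nn.Linear(obs_dim, 256)), nn.Tanh(),
            layer_init(nn.Linear(256, 256)), nn.Tanh(),
            # Small output-layer init scale is a common PPO convention
            # (an assumption; not confirmed by this model card).
            layer_init(nn.Linear(256, act_dim), std=0.01),
        )
        # State-independent learnable log-std for the diagonal Gaussian.
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        mean = self.net(obs)
        return torch.distributions.Normal(mean, self.log_std.exp())
```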
### Observation and Action Space
- Observations: 32-dimensional vector (joint positions, joint velocities, end-effector position, target position)
- Actions: 7-dimensional continuous (joint position targets)
## Training Details

### Hyperparameters
| Parameter | Value |
|---|---|
| Task | Isaac-Reach-Franka-v0 |
| Parallel Envs | 4096 |
| Learning Rate | 3e-4 |
| Discount (γ) | 0.99 |
| GAE (λ) | 0.95 |
| Clip (ε) | 0.2 |
| Epochs per Update | 4 |
| Minibatch Size | 2048 |
| Horizon | 32 |
| Total Iterations | 500 |
| Total Env Steps | 65.5M |
| Training Time | ~48 minutes |
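For context, the clipped surrogate objective that the Clip (ε) value above feeds into looks roughly like this (a sketch of the standard PPO loss, not necessarily the repository's exact implementation):

```python
import torch


def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio between the current and behavior policies.
    ratio = torch.exp(log_probs - old_log_probs)
    # Standard PPO clipped surrogate; clip_eps matches epsilon = 0.2 above.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Negate because optimizers minimize.
    return -torch.min(unclipped, clipped).mean()
```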
### Hardware
- GPU: NVIDIA RTX 4070 SUPER (12 GB VRAM)
- CPU: Intel Xeon E5-2673 v4
- Cloud: vast.ai
### Training Curves

#### Reward

The agent starts with negative reward (arm far from the target) and converges to a positive reward of roughly 0.03–0.05 as it learns to reach the target.
## Observation Normalization

The checkpoint includes running mean and variance statistics for observation normalization. These must be restored at inference time; without them, the policy receives unnormalized inputs and will not perform correctly.
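A minimal sketch of restoring those statistics, assuming the checkpoint keys listed under Checkpoint Contents and that the statistics are stored as tensors:

```python
import torch

ckpt = torch.load("final_policy.pt", map_location="cpu")
obs_mean = ckpt["obs_rms_mean"]
obs_var = ckpt["obs_rms_var"]


def normalize(obs, eps=1e-8):
    # Apply the training-time running statistics at inference time.
    return (obs - obs_mean) / torch.sqrt(obs_var + eps)
```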
## How to Use

### Download

```python
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="DavidH2802/PPO-from-scratch",
    filename="final_policy.pt",
)
```
### Inference

Clone the full project for the model and environment code:

```bash
git clone https://github.com/DavidH2802/PPO-from-scratch.git
cd PPO-from-scratch
```
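With the repository code on the path, a deterministic rollout step might look like this (a sketch; `model` is a hypothetical module path, and the real entry point lives in the repo):

```python
import torch

from model import Actor  # hypothetical import; see the repository layout

ckpt = torch.load("final_policy.pt", map_location="cpu")
actor = Actor(obs_dim=32, act_dim=7)
actor.load_state_dict(ckpt["actor"])
actor.eval()

obs = torch.zeros(1, 32)  # placeholder; use real, normalized observations
with torch.no_grad():
    dist = actor(obs)
    action = dist.mean  # use the Gaussian mean for deterministic evaluation
```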
### Full Evaluation with Isaac Lab

See the GitHub repository for complete setup instructions, including Isaac Lab installation and the `eval.py` script for recording evaluation videos.
## Checkpoint Contents

The `final_policy.pt` file contains:

| Key | Description |
|---|---|
| `actor` | Actor network state dict |
| `critic` | Critic network state dict |
| `obs_rms_mean` | Running mean for observation normalization |
| `obs_rms_var` | Running variance for observation normalization |
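A quick sanity check after downloading (assuming the keys above):

```python
import torch

ckpt = torch.load("final_policy.pt", map_location="cpu")
# Expect: actor, critic, obs_rms_mean, obs_rms_var
print(list(ckpt.keys()))
```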
## Framework
- Algorithm: PPO (from scratch, no RL library dependencies)
- Deep Learning: PyTorch
- Simulation: NVIDIA Isaac Lab 2.0 / Isaac Sim 4.5
- Environment: Isaac-Reach-Franka-v0
## Citation

```bibtex
@misc{habinski2026ppo,
  author    = {David Habinski},
  title     = {PPO from Scratch in PyTorch with Isaac Lab},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/DavidH2802/PPO-from-scratch}
}
```
## License
MIT