PPO-Franka-Reach

A Proximal Policy Optimization (PPO) policy trained from scratch in PyTorch on the Isaac-Reach-Franka-v0 task using NVIDIA Isaac Lab with 4096 GPU-parallel environments.

GitHub Repository: DavidH2802/PPO-from-scratch

Franka Reach Policy

Model Description

The model is a diagonal Gaussian policy (Actor) that controls a 7-DOF Franka Emika robot arm to reach a randomly spawned target position in 3D space. The policy outputs continuous joint-level actions.

Architecture

  • Actor: MLP (obs → 256 → 256 → act_dim) with Tanh activations, orthogonal initialization, and a learnable log-std parameter
  • Critic: MLP (obs → 256 → 256 → 1) with Tanh activations and orthogonal initialization (included in checkpoint but not needed for inference)
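The architecture above can be sketched in PyTorch. This is an illustrative reconstruction from the description only; attribute names such as `mu` and `log_std` are assumptions, not the checkpoint's actual keys (see the repository for the real classes):

```python
import torch
import torch.nn as nn

def ortho_mlp(sizes):
    """Tanh MLP with orthogonal weight initialization, as described above."""
    layers = []
    for i in range(len(sizes) - 1):
        lin = nn.Linear(sizes[i], sizes[i + 1])
        nn.init.orthogonal_(lin.weight)
        nn.init.zeros_(lin.bias)
        layers.append(lin)
        if i < len(sizes) - 2:  # no activation after the output layer
            layers.append(nn.Tanh())
    return nn.Sequential(*layers)

class Actor(nn.Module):
    """Diagonal Gaussian policy: MLP mean plus a learnable log-std vector."""
    def __init__(self, obs_dim=32, act_dim=7, hidden=256):
        super().__init__()
        self.mu = ortho_mlp([obs_dim, hidden, hidden, act_dim])
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        mean = self.mu(obs)
        std = self.log_std.exp().expand_as(mean)
        return torch.distributions.Normal(mean, std)

# Value head; stored in the checkpoint but unused at inference time.
critic = ortho_mlp([32, 256, 256, 1])
```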

Observation and Action Space

  • Observations: 32-dimensional vector (joint positions, joint velocities, end-effector position, target position)
  • Actions: 7-dimensional continuous (joint position targets)

Training Details

Hyperparameters

| Parameter | Value |
| --- | --- |
| Task | Isaac-Reach-Franka-v0 |
| Parallel Envs | 4096 |
| Learning Rate | 3e-4 |
| Discount (γ) | 0.99 |
| GAE (λ) | 0.95 |
| Clip (ε) | 0.2 |
| Epochs per Update | 4 |
| Minibatch Size | 2048 |
| Horizon | 32 |
| Total Iterations | 500 |
| Total Env Steps | 65.5M |
| Training Time | ~48 minutes |
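For reference, the clip parameter ε = 0.2 from the table enters the standard PPO clipped surrogate objective. A minimal sketch of that loss (this is the generic PPO formulation, not necessarily the repository's exact implementation):

```python
import torch

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    """Negative clipped surrogate objective from PPO.

    ratio = pi_new(a|s) / pi_old(a|s); clipping to [1-eps, 1+eps] bounds
    how far a single update can move the policy.
    """
    ratio = (logp_new - logp_old).exp()
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    return -torch.min(ratio * adv, clipped * adv).mean()
```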

Hardware

  • GPU: NVIDIA RTX 4070 SUPER (12 GB VRAM)
  • CPU: Intel Xeon E5-2673 v4
  • Cloud: vast.ai

Training Curves

Reward

The agent starts with negative reward (the arm is far from the target) and converges to a positive reward of roughly 0.03 to 0.05 as it learns to reach the target.

Observation Normalization

The checkpoint includes running mean and variance statistics for observation normalization. These must be restored at inference time — without them, the policy receives unnormalized inputs and will not perform correctly.
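The normalization step itself is just standardization with the saved running statistics. A minimal sketch (the epsilon and clip values here are common defaults and an assumption, not confirmed by the checkpoint):

```python
import torch

def normalize_obs(obs, mean, var, eps=1e-8, clip=10.0):
    """Standardize observations with saved running stats before the policy
    forward pass, clamping extreme values for numerical stability."""
    return torch.clamp((obs - mean) / torch.sqrt(var + eps), -clip, clip)
```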

How to Use

Download

```python
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="DavidH2802/PPO-from-scratch",
    filename="final_policy.pt",
)
```

Inference

Clone the full project for the model and environment code:

```bash
git clone https://github.com/DavidH2802/PPO-from-scratch.git
cd PPO-from-scratch
```

Full Evaluation with Isaac Lab

See the GitHub repository for complete setup instructions including Isaac Lab installation and the eval.py script for video recording.

Checkpoint Contents

The final_policy.pt file contains:

| Key | Description |
| --- | --- |
| actor | Actor network state dict |
| critic | Critic network state dict |
| obs_rms_mean | Running mean for observation normalization |
| obs_rms_var | Running variance for observation normalization |
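A sketch of restoring these keys end to end. The real `Actor` class lives in the GitHub repository; a stand-in network of the same shape is used here so the flow is self-contained:

```python
import torch
import torch.nn as nn

# Stand-in for the repo's Actor (illustrative only: obs_dim=32, act_dim=7).
actor = nn.Sequential(nn.Linear(32, 256), nn.Tanh(),
                      nn.Linear(256, 256), nn.Tanh(),
                      nn.Linear(256, 7))

# Write a dummy checkpoint with the documented keys, then restore it,
# mirroring what loading final_policy.pt would look like.
torch.save({
    "actor": actor.state_dict(),
    "critic": {},
    "obs_rms_mean": torch.zeros(32),
    "obs_rms_var": torch.ones(32),
}, "/tmp/final_policy.pt")

ckpt = torch.load("/tmp/final_policy.pt")
actor.load_state_dict(ckpt["actor"])
mean, var = ckpt["obs_rms_mean"], ckpt["obs_rms_var"]

with torch.no_grad():
    obs = torch.zeros(1, 32)                       # placeholder observation
    action = actor((obs - mean) / torch.sqrt(var + 1e-8))
```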

Framework

  • Algorithm: PPO (from scratch, no RL library dependencies)
  • Deep Learning: PyTorch
  • Simulation: NVIDIA Isaac Lab 2.0 / Isaac Sim 4.5
  • Environment: Isaac-Reach-Franka-v0

Citation

```bibtex
@misc{habinski2026ppo,
  author = {David Habinski},
  title = {PPO from Scratch in PyTorch with Isaac Lab},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/DavidH2802/PPO-from-scratch}
}
```

License

MIT
