# PPO-Franka-Reach

A Proximal Policy Optimization (PPO) policy trained from scratch in PyTorch on the Isaac-Reach-Franka-v0 task using NVIDIA Isaac Lab with 4096 GPU-parallel environments.

GitHub Repository: [DavidH2802/PPO-from-scratch](https://github.com/DavidH2802/PPO-from-scratch)
## Model Description
The model is a diagonal Gaussian policy (Actor) that controls a 7-DOF Franka Emika robot arm to reach a randomly spawned target position in 3D space. The policy outputs continuous joint-level actions.
### Architecture

- Actor: MLP (obs → 256 → 256 → act_dim) with Tanh activations, orthogonal initialization, and a learnable log-std parameter (sketched below)
- Critic: MLP (obs → 256 → 256 → 1) with Tanh activations and orthogonal initialization (included in the checkpoint but not needed for inference)
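A minimal PyTorch sketch of the actor described above (the class and helper names here are illustrative, not the repository's exact code):

```python
import torch
import torch.nn as nn


def layer_init(layer, std=2**0.5):
    # Orthogonal weight init with zero bias, matching the description above.
    nn.init.orthogonal_(layer.weight, std)
    nn.init.constant_(layer.bias, 0.0)
    return layer


class Actor(nn.Module):
    def __init__(self, obs_dim=32, act_dim=7):
        super().__init__()
        self.net = nn.Sequential(
            layer_init(nn.Linear(obs_dim, 256)), nn.Tanh(),
            layer_init(nn.Linear(256, 256)), nn.Tanh(),
            # Small output-layer init scale is a common PPO convention
            # (an assumption; not confirmed by this model card).
            layer_init(nn.Linear(256, act_dim), std=0.01),
        )
        # State-independent learnable log-std for the diagonal Gaussian.
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        mean = self.net(obs)
        return torch.distributions.Normal(mean, self.log_std.exp())
```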
### Observation and Action Space
- Observations: 32-dimensional vector (joint positions, joint velocities, end-effector position, target position)
- Actions: 7-dimensional continuous (joint position targets)
## Training Details

### Hyperparameters
| Parameter | Value |
|---|---|
| Task | Isaac-Reach-Franka-v0 |
| Parallel Envs | 4096 |
| Learning Rate | 3e-4 |
| Discount (γ) | 0.99 |
| GAE (λ) | 0.95 |
| Clip (ε) | 0.2 |
| Epochs per Update | 4 |
| Minibatch Size | 2048 |
| Horizon | 32 |
| Total Iterations | 500 |
| Total Env Steps | 65.5M |
| Training Time | ~48 minutes |
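For context, the clipped surrogate objective that the Clip (ε) value above feeds into looks roughly like this (a sketch of the standard PPO loss, not necessarily the repository's exact implementation):

```python
import torch


def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio between the current and behavior policies.
    ratio = torch.exp(log_probs - old_log_probs)
    # Standard PPO clipped surrogate; clip_eps matches epsilon = 0.2 above.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Negate because optimizers minimize.
    return -torch.min(unclipped, clipped).mean()
```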
### Hardware
- GPU: NVIDIA RTX 4070 SUPER (12 GB VRAM)
- CPU: Intel Xeon E5-2673 v4
- Cloud: vast.ai
### Training Curves

#### Reward

The agent starts with negative reward (arm far from the target) and converges to a positive reward of roughly 0.03–0.05 as it learns to reach the target.
## Observation Normalization

The checkpoint includes running mean and variance statistics for observation normalization. These must be restored at inference time; without them, the policy receives unnormalized inputs and will not perform correctly.
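A minimal sketch of restoring those statistics, assuming the checkpoint keys listed under Checkpoint Contents and that the statistics are stored as tensors:

```python
import torch

ckpt = torch.load("final_policy.pt", map_location="cpu")
obs_mean = ckpt["obs_rms_mean"]
obs_var = ckpt["obs_rms_var"]


def normalize(obs, eps=1e-8):
    # Apply the training-time running statistics at inference time.
    return (obs - obs_mean) / torch.sqrt(obs_var + eps)
```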
## How to Use

### Download

```python
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="DavidH2802/PPO-from-scratch",
    filename="final_policy.pt",
)
```
### Inference

Clone the full project for the model and environment code:

```bash
git clone https://github.com/DavidH2802/PPO-from-scratch.git
cd PPO-from-scratch
```
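With the repository code on the path, a deterministic rollout step might look like this (a sketch; `model` is a hypothetical module path, and the real entry point lives in the repo):

```python
import torch

from model import Actor  # hypothetical import; see the repository layout

ckpt = torch.load("final_policy.pt", map_location="cpu")
actor = Actor(obs_dim=32, act_dim=7)
actor.load_state_dict(ckpt["actor"])
actor.eval()

obs = torch.zeros(1, 32)  # placeholder; use real, normalized observations
with torch.no_grad():
    dist = actor(obs)
    action = dist.mean  # use the Gaussian mean for deterministic evaluation
```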
### Full Evaluation with Isaac Lab

See the GitHub repository for complete setup instructions, including Isaac Lab installation and the `eval.py` script for recording evaluation videos.
## Checkpoint Contents

The `final_policy.pt` file contains:

| Key | Description |
|---|---|
| `actor` | Actor network state dict |
| `critic` | Critic network state dict |
| `obs_rms_mean` | Running mean for observation normalization |
| `obs_rms_var` | Running variance for observation normalization |
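A quick sanity check after downloading (assuming the keys above):

```python
import torch

ckpt = torch.load("final_policy.pt", map_location="cpu")
# Expect: actor, critic, obs_rms_mean, obs_rms_var
print(list(ckpt.keys()))
```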
## Framework
- Algorithm: PPO (from scratch, no RL library dependencies)
- Deep Learning: PyTorch
- Simulation: NVIDIA Isaac Lab 2.0 / Isaac Sim 4.5
- Environment: Isaac-Reach-Franka-v0
## Citation

```bibtex
@misc{habinski2026ppo,
  author    = {David Habinski},
  title     = {PPO from Scratch in PyTorch with Isaac Lab},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/DavidH2802/PPO-from-scratch}
}
```
## License
MIT