Content Moderation OpenEnv Benchmark

Meta PyTorch OpenEnv Hackathon · Round 1 · March 2026 · 4 difficulty tiers

Results table columns: Model | Provider | Agent | Easy Score | Easy Reward | Medium Score | Medium Reward | Hard Score | Hard Reward | Very Hard Score | Very Hard Reward | Time

Figure: One-Shot Baseline, grader score by difficulty tier (Easy, Medium, Hard, Very Hard)

Figure: Multi-Step Agent, grader score by difficulty tier (Easy, Medium, Hard, Very Hard)

Figure: Cumulative reward, Hard task

Figure: Cumulative reward, Very Hard task

Figure: Total reward across all tasks