Death by video review
Automated evaluation, failure diagnostics, and analytics for robotics teams. Push code, run thousands of simulations, see what broke, and fix it in your existing simulator workflow.
$ quarq eval run --policy policy_v2.pt --n 1000
Running 1000 simulations across 8 workers...
✓ 848 passed
✗ 152 failed — categorized automatically
Failure breakdown:
  42  arm slip on grasp           → wrist_torque < threshold
  67  motor lag timeout           → latency spike @ t=1.2s
  43  camera glare / sensor loss  → depth_conf < 0.4
→ Dashboard: quarq.dev/runs/a3f7c2
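To make the idea of automatic categorization concrete, here is a minimal sketch of rule-based failure labeling over per-episode telemetry. It is not the quarq implementation: the `Episode` fields, the `categorize` function, and the torque and latency thresholds are all assumptions made for illustration. Only the `depth_conf < 0.4` cutoff comes from the sample output above.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical per-episode telemetry; field names are illustrative only.
@dataclass
class Episode:
    success: bool
    min_wrist_torque: float  # lowest wrist torque seen during the grasp
    max_latency_ms: float    # worst control-loop latency in the episode
    min_depth_conf: float    # lowest depth confidence, in [0, 1]


# Assumed thresholds. Only the depth confidence cutoff (0.4) appears in the
# sample output; the other two values are placeholders.
TORQUE_THRESHOLD = 0.5
LATENCY_LIMIT_MS = 50.0
DEPTH_CONF_MIN = 0.4


def categorize(ep: Episode) -> str:
    """Return a failure label for one episode, or 'passed'."""
    if ep.success:
        return "passed"
    # Rules are checked in order, so the first matching cause wins.
    if ep.min_depth_conf < DEPTH_CONF_MIN:
        return "camera glare / sensor loss"
    if ep.max_latency_ms > LATENCY_LIMIT_MS:
        return "motor lag timeout"
    if ep.min_wrist_torque < TORQUE_THRESHOLD:
        return "arm slip on grasp"
    return "uncategorized"


def breakdown(episodes):
    """Count episodes per label."""
    return Counter(categorize(ep) for ep in episodes)


# Four made-up episodes: one pass and one failure of each kind.
episodes = [
    Episode(True, 0.9, 12.0, 0.95),
    Episode(False, 0.2, 15.0, 0.90),
    Episode(False, 0.8, 120.0, 0.85),
    Episode(False, 0.8, 14.0, 0.10),
]
counts = breakdown(episodes)
for label, n in counts.most_common():
    print(f"{n:>4}  {label}")
```

Ordered rules keep each failure in exactly one group, which is what lets the per-category counts sum to the total number of failures.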
Robotics teams run thousands of simulations per day with almost no infrastructure to make sense of what’s happening.
[ 1 ]
150 failures. 150 videos. Engineers spend hours replaying simulations just to figure out why a task failed.
[ 2 ]
Software teams have mature observability tools. Robotics teams still stitch together logs, videos, and custom scripts.
[ 3 ]
Policy updates are validated by manual scripts. Regressions slip through unnoticed until something breaks.
Two tools that work together: local testing for fast iteration and cloud analytics for scale.
[ 1 ]
SDK | Available
Drop into your existing code. Define success criteria, configure failure scenarios, and run evals instantly on your own machine before you push anything.
[ 2 ]
Cloud | Early Access
Every GitHub push triggers automated evaluation. Instead of video files, you get categorized failure groups with one-click replay.
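As a rough picture of what "define success criteria, configure failure scenarios, and run evals" can look like in plain Python, here is a sketch that does not use the quarq SDK. The `success` criterion, the `toy_rollout` stand-in for a simulator, and the `glare_rate` scenario knob are all hypothetical names chosen for this example.

```python
import random
from typing import Callable, Dict

State = Dict[str, float]


def success(final: State) -> bool:
    """Example criterion: object lifted at least 10 cm and still gripped."""
    return final["object_height_m"] >= 0.10 and final["gripper_closed"] == 1.0


def toy_rollout(policy_gain: float, glare: bool, rng: random.Random) -> State:
    """Stand-in for one simulator rollout; returns the final state.

    A real setup would step a simulator with a trained policy. Here the lift
    height is a simple function of a gain, small noise, and a glare penalty.
    """
    noise = rng.uniform(-0.01, 0.01)
    height = 0.12 * policy_gain + noise - (0.08 if glare else 0.0)
    return {"object_height_m": height, "gripper_closed": 1.0}


def run_eval(
    policy_gain: float,
    n: int,
    glare_rate: float,
    criterion: Callable[[State], bool],
    seed: int = 0,
) -> float:
    """Run n rollouts and return the fraction that meet the criterion."""
    rng = random.Random(seed)  # fixed seed keeps local runs repeatable
    passed = 0
    for _ in range(n):
        glare = rng.random() < glare_rate  # failure scenario: camera glare
        if criterion(toy_rollout(policy_gain, glare, rng)):
            passed += 1
    return passed / n


clean = run_eval(1.0, 100, glare_rate=0.0, criterion=success)
harsh = run_eval(1.0, 100, glare_rate=1.0, criterion=success)
print(f"pass rate without glare: {clean:.2f}")
print(f"pass rate with glare:    {harsh:.2f}")
```

Keeping the criterion as an ordinary function separate from the rollout means the same check can be reused across scenarios and policies.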
[ 1 ]
Install the SDK, point it to your policy and simulator, and define success criteria.
[ 2 ]
Every commit triggers a large-scale evaluation run through the GitHub Actions integration.
[ 3 ]
Get grouped failure categories, replayable examples, and trend analysis instead of raw logs and videos.
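One way to think about trend analysis across the steps above is to compare failure shares between two runs and flag the categories that grew. The sketch below assumes each run is summarized as a dictionary of label counts. The `regressions` helper and its 2% threshold are hypothetical, and the baseline numbers are invented for illustration. The candidate numbers reuse the sample run shown earlier on this page.

```python
from typing import Dict, List, Tuple

Counts = Dict[str, int]


def rates(counts: Counts) -> Dict[str, float]:
    """Convert per-label counts into shares of the whole run."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def regressions(
    baseline: Counts, candidate: Counts, min_increase: float = 0.02
) -> List[Tuple[str, float]]:
    """Return failure labels whose share rose by more than min_increase."""
    base, cand = rates(baseline), rates(candidate)
    flagged = []
    for label, rate in cand.items():
        if label == "passed":
            continue  # only failure categories can regress
        delta = rate - base.get(label, 0.0)  # new categories start from zero
        if delta > min_increase:
            flagged.append((label, delta))
    # Largest increase first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Invented baseline run, for illustration only.
baseline = {
    "passed": 900,
    "arm slip on grasp": 40,
    "motor lag timeout": 20,
    "camera glare / sensor loss": 40,
}
# Candidate run: the counts from the sample output on this page.
candidate = {
    "passed": 848,
    "arm slip on grasp": 42,
    "motor lag timeout": 67,
    "camera glare / sensor loss": 43,
}

flagged = regressions(baseline, candidate)
for label, delta in flagged:
    print(f"{label}: +{delta:.1%} of episodes")
```

Comparing shares rather than raw counts keeps the check meaningful when two runs use different numbers of simulations.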
Join the waitlist for the SDK, CI/CD integrations, and diagnostics dashboard.