Stop watching robot failures. Start fixing them.

Automated evaluation, failure diagnostics, and analytics for robotics teams. Push code, run thousands of simulations, see what broke, and fix it in your existing simulator workflow.

quarq eval run - policy_v2.pt
$ quarq eval run --policy policy_v2.pt --n 1000
Running 1000 simulations across 8 workers...

 848 passed
 152 failed — categorized automatically

Failure breakdown:
42 arm slip on grasp → wrist_torque < threshold
67 motor lag timeout → latency spike @ t=1.2s
43 camera glare / sensor loss → depth_conf < 0.4

→ Dashboard: quarq.dev/runs/a3f7c2
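
To make the failure breakdown above concrete, here is a minimal sketch of rule-based failure categorization. It is not Quarq's actual implementation: the `Episode` fields, the `categorize` function, and every threshold except `depth_conf < 0.4` (taken from the sample output) are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical thresholds. Only the depth-confidence value appears in the
# sample output above; the other two are placeholders.
WRIST_TORQUE_MIN = 0.5   # assumed value and units
LATENCY_MAX_S = 0.2      # assumed value
DEPTH_CONF_MIN = 0.4     # from the sample output


@dataclass
class Episode:
    """Per-episode summary signals (hypothetical schema)."""
    success: bool
    min_wrist_torque: float
    max_latency_s: float
    min_depth_conf: float


def categorize(ep: Episode) -> Optional[str]:
    """Return a failure label for a failed episode, or None if it passed."""
    if ep.success:
        return None
    # Rules are checked in order, so the first matching rule wins.
    if ep.min_wrist_torque < WRIST_TORQUE_MIN:
        return "arm slip on grasp"
    if ep.max_latency_s > LATENCY_MAX_S:
        return "motor lag timeout"
    if ep.min_depth_conf < DEPTH_CONF_MIN:
        return "camera glare / sensor loss"
    return "uncategorized"


def failure_breakdown(episodes: Iterable[Episode]) -> Counter:
    """Count failed episodes per category."""
    labels = (categorize(ep) for ep in episodes)
    return Counter(label for label in labels if label is not None)
```

Calling `failure_breakdown(episodes).most_common()` on a list of episodes yields the categories sorted by count, which is the shape of the breakdown shown in the terminal output.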

The status quo is manual, slow, broken.

Robotics teams run thousands of simulations per day with almost no infrastructure to make sense of what’s happening.

[ 1 ]

Death by video review

150 failures. 150 videos. Engineers spend hours replaying simulations just to figure out why a task failed.

[ 2 ]

No observability stack

Software teams have mature observability tools. Robotics teams still stitch together logs, videos, and custom scripts.

[ 3 ]

Manual testing everywhere

Policy updates are validated with hand-run scripts, so regressions slip through unnoticed until something breaks.

Automated evals. Instant diagnostics.

Two tools that work together: local testing for fast iteration and cloud analytics for scale.

[ 1 ]

SDK | Available

Local Testing Toolkit

Drop into your existing code. Define success criteria, configure failure scenarios, and run evals instantly on your own machine before you push anything.

  • [ 1 ] Plug-and-play: works with your existing policy and simulator
  • [ 2 ] Scenarios: glare, motor lag, slippery surfaces, sensor noise
  • [ 3 ] Run locally in seconds, no cloud round-trip needed
  • [ 4 ] Structured output for downstream analytics
Star on GitHub
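
As a rough picture of what a local eval involves (success criteria, failure scenarios, structured output), here is a self-contained sketch. It does not use the Quarq SDK, whose API is not shown on this page: `Scenario`, `toy_policy`, `run_episode`, and `evaluate` are hypothetical names, and the one-dimensional "simulator" is a toy stand-in for a real one.

```python
import json
import random
from dataclasses import dataclass


@dataclass
class Scenario:
    """A perturbation to apply during evaluation (hypothetical schema)."""
    name: str
    sensor_noise: float = 0.0   # std-dev of noise added to each observation
    motor_lag_steps: int = 0    # number of steps each action is delayed


def toy_policy(observation: float) -> float:
    """Stand-in for a real policy: push the position toward the target at 0."""
    return -0.5 * observation


def run_episode(scenario: Scenario, seed: int, steps: int = 50) -> dict:
    """Run one episode and return a structured, JSON-serializable record."""
    rng = random.Random(seed)
    position = 1.0
    pending = [0.0] * scenario.motor_lag_steps  # queue of delayed actions
    for _ in range(steps):
        observed = position + rng.gauss(0.0, scenario.sensor_noise)
        pending.append(toy_policy(observed))
        position += pending.pop(0)  # apply the oldest queued action
    final_error = abs(position)
    # Success criterion: end within 0.05 of the target.
    return {"scenario": scenario.name, "seed": seed,
            "final_error": final_error, "success": final_error < 0.05}


def evaluate(scenarios, episodes_per_scenario: int = 20) -> dict:
    """Run every scenario over several seeds and summarize pass rates."""
    results = [run_episode(s, seed)
               for s in scenarios
               for seed in range(episodes_per_scenario)]
    pass_rate = {}
    for s in scenarios:
        rows = [r for r in results if r["scenario"] == s.name]
        pass_rate[s.name] = sum(r["success"] for r in rows) / len(rows)
    return {"pass_rate": pass_rate, "episodes": results}


if __name__ == "__main__":
    report = evaluate([
        Scenario("nominal"),
        Scenario("sensor_noise", sensor_noise=0.2),
        Scenario("motor_lag", motor_lag_steps=3),
    ])
    print(json.dumps(report["pass_rate"], indent=2))
```

Keeping each episode as a plain dictionary means the same records can be dumped to JSON and fed into whatever analytics run downstream.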

[ 2 ]

Cloud | Early Access

Analytics Dashboard & CI/CD

Every GitHub push triggers automated evaluation. Instead of video files, you get categorized failure groups with one-click replay.

  • [ 1 ] GitHub Actions integration, zero new infra
  • [ 2 ] Auto-categorized failures by root cause
  • [ 3 ] One-click playlist of exact failure moments
  • [ 4 ] Track physical metrics and regressions over time
Early Access →
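
Tracking regressions across pushes can be pictured as a comparison between a baseline run and the run for the latest commit. The sketch below is an assumption about how such a gate might look, not the dashboard's actual logic: `find_regressions`, the `tolerance` parameter, and all pass rates in `main` are illustrative.

```python
def find_regressions(baseline: dict, candidate: dict,
                     tolerance: float = 0.02) -> dict:
    """Return scenarios whose pass rate dropped by more than `tolerance`.

    Both arguments map a scenario name to a pass rate in [0, 1].
    """
    regressions = {}
    for name, base_rate in baseline.items():
        new_rate = candidate.get(name)
        if new_rate is None:
            continue  # scenario absent from the new run; skipped in this sketch
        drop = base_rate - new_rate
        if drop > tolerance:
            regressions[name] = round(drop, 4)
    return regressions


def main() -> int:
    """Return a process exit code: 1 if any regression is found, else 0."""
    # Illustrative numbers only, not real results.
    baseline = {"nominal": 0.90, "glare": 0.80, "motor_lag": 0.75}
    candidate = {"nominal": 0.848, "glare": 0.81, "motor_lag": 0.74}
    regressions = find_regressions(baseline, candidate)
    for name, drop in regressions.items():
        print(f"REGRESSION {name}: pass rate dropped by {drop:.1%}")
    return 1 if regressions else 0
```

In a CI job, calling `sys.exit(main())` would make the step fail whenever a pass rate drops beyond the tolerance, since a nonzero exit code marks a step as failed.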

From code change to root cause in minutes.

[ 1 ]

Connect your simulator

Install the SDK, point it to your policy and simulator, and define success criteria.

[ 2 ]

Run evaluations automatically

Every commit triggers large-scale evaluation through the GitHub Actions integration.

[ 3 ]

Understand failures instantly

Get grouped failure categories, replayable examples, and trend analysis instead of raw logs and videos.

Early access to the evaluation platform built for robotics engineers.

Join the waitlist for the SDK, CI/CD integrations, and diagnostics dashboard.