Quarq/Robotics Evals

Stop watching robot failures. Start fixing them.

Automated evaluation, failure diagnostics, and analytics for robotics teams. Push code, run thousands of simulations, understand what broke — and fix it. Works with your existing simulator workflow.

    $ quarq eval run --policy policy_v2.pt --n 1000
    Running 1000 simulations across 8 workers...

    848 passed
    152 failed (categorized automatically)

    Failure breakdown:
    42 arm slip on grasp → wrist_torque < threshold
    67 motor lag timeout → latency spike @ t=1.2s
    43 camera glare / sensor loss → depth_conf < 0.4

    Dashboard: quarq.dev/runs/a3f7c2

The problem

The status quo is manual, slow, broken.

Robotics teams run thousands of simulations per day with almost no infrastructure to make sense of what’s happening.

PAIN 01

Death by video review

150 failures. 150 videos. Engineers spend hours replaying simulations just to figure out why a task failed.

PAIN 02

No observability stack

Software teams have Datadog. LLM teams have Langfuse. Robotics has yet to get its equivalent.

PAIN 03

Manual testing everywhere

Policy updates are validated by manual scripts. Regressions slip through unnoticed until something breaks.

The solution

Automated evals. Instant diagnostics.

Two tools that work together: local testing for fast iteration and cloud analytics for scale.

01 / SDK (Available)

Local Testing Toolkit

Drop into your existing code. Define success criteria, configure failure scenarios, and run evals instantly on your own machine before you push anything.

  • Plug-and-play: works with your existing policy and simulator
  • Scenarios: glare, motor lag, slippery surfaces, sensor noise
  • Run locally in seconds — no cloud round-trip needed
  • Structured output for downstream analytics
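The local workflow described above has three parts: a success criterion, perturbation scenarios, and structured per-episode results. A self-contained toy can illustrate how they fit together. This is not the Quarq SDK. The `Scenario` fields, the proportional-controller "policy" and the 1-D reach task are hypothetical stand-ins. They only show how scenario perturbations and a success predicate combine into records that downstream analytics can consume.

```python
import json
import random
from collections import deque
from dataclasses import dataclass

# Hypothetical scenario knobs (illustrative names, not the Quarq SDK API).
@dataclass(frozen=True)
class Scenario:
    name: str
    sensor_noise: float = 0.0   # std-dev of Gaussian noise on each position reading
    motor_lag_steps: int = 0    # commands take effect this many steps late
    traction: float = 1.0       # < 1.0 models a slippery surface
    dropout_prob: float = 0.0   # per-step chance the reading is lost (e.g. glare)

SCENARIOS = [
    Scenario("nominal"),
    Scenario("sensor_noise", sensor_noise=0.05),
    Scenario("motor_lag", motor_lag_steps=3),
    Scenario("slippery", traction=0.3),
    Scenario("glare", dropout_prob=0.5),
]

def policy(observed_x: float, target: float = 1.0) -> float:
    """Toy proportional controller standing in for a trained policy."""
    return max(-1.0, min(1.0, 2.0 * (target - observed_x)))

def success(final_x: float, target: float = 1.0, tol: float = 0.05) -> bool:
    """Success criterion: finish within `tol` of the target position."""
    return abs(final_x - target) < tol

def run_episode(scenario: Scenario, seed: int, steps: int = 50, dt: float = 0.1) -> dict:
    rng = random.Random(seed)
    x, last_obs = 0.0, 0.0
    queued = deque([0.0] * scenario.motor_lag_steps)  # delayed commands
    for _ in range(steps):
        if rng.random() >= scenario.dropout_prob:       # reading available this step
            last_obs = x + rng.gauss(0.0, scenario.sensor_noise)
        queued.append(policy(last_obs))                 # otherwise act on a stale reading
        x += queued.popleft() * dt * scenario.traction  # apply the oldest queued command
    return {"scenario": scenario.name, "seed": seed,
            "final_x": round(x, 4), "passed": success(x)}

def run_local_eval(n_per_scenario: int = 20) -> list[dict]:
    return [run_episode(s, seed) for s in SCENARIOS for seed in range(n_per_scenario)]

if __name__ == "__main__":
    results = run_local_eval()
    for s in SCENARIOS:
        rows = [r for r in results if r["scenario"] == s.name]
        print(f"{s.name:>12}: {sum(r['passed'] for r in rows)}/{len(rows)} passed")
    print(json.dumps(results[0]))  # one structured record for downstream analytics
```

With these assumed parameters, the `slippery` scenario cannot close the gap to the target within 50 steps, so every seed fails. A per-scenario summary surfaces that kind of systematic failure at a glance.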
02 / CLOUD (Early access)

Analytics Dashboard & CI/CD

Every GitHub push triggers automated evaluation. Instead of video files, you get categorized failure groups with one-click replay.

  • GitHub Actions integration — zero new infra
  • Auto-categorized failures by root cause
  • One-click playlist of exact failure moments
  • Track physical metrics and regressions over time
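One simple way to approximate categorizing failures by root cause is rule-based triage over per-step telemetry. The sketch below is a hypothetical illustration, not Quarq's categorizer. The field names echo the sample CLI output, but the telemetry schema and the `wrist_torque` and `latency_ms` thresholds are invented. The earliest-trigger heuristic is just one plausible choice. The sketch also records when each rule first fired, which is the timestamp a replay playlist would seek to.

```python
from collections import defaultdict
from typing import Callable, Optional

# Hypothetical telemetry fields and thresholds. The names echo the sample CLI
# output, but the schema and the first two thresholds are invented.
WRIST_TORQUE_MIN = 0.8   # assumed minimum torque for a stable grasp
LATENCY_SPIKE_MS = 50.0  # assumed command-latency spike threshold
DEPTH_CONF_MIN = 0.4     # matches "depth_conf < 0.4" in the sample output

# Rules in priority order; each predicate inspects one telemetry sample.
RULES: list[tuple[str, Callable[[dict], bool]]] = [
    ("arm slip on grasp", lambda s: s["wrist_torque"] < WRIST_TORQUE_MIN),
    ("motor lag timeout", lambda s: s["latency_ms"] > LATENCY_SPIKE_MS),
    ("camera glare / sensor loss", lambda s: s["depth_conf"] < DEPTH_CONF_MIN),
]

def first_trigger(samples: list[dict], pred: Callable[[dict], bool]) -> Optional[float]:
    """Timestamp of the first sample where `pred` fires, or None if it never does."""
    return next((s["t"] for s in samples if pred(s)), None)

def categorize(failures: dict[str, list[dict]]) -> dict[str, list[tuple[str, float]]]:
    """Map category -> [(episode_id, failure_time)] using the earliest-firing rule.

    Episodes where no rule fires go to "uncategorized" with time -1.0 as a sentinel.
    """
    groups: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for ep_id, samples in failures.items():
        hits = []
        for priority, (name, pred) in enumerate(RULES):
            t = first_trigger(samples, pred)
            if t is not None:
                hits.append((t, priority, name))  # equal times: earlier rule wins
        t, _, name = min(hits) if hits else (-1.0, -1, "uncategorized")
        groups[name].append((ep_id, t))
    return dict(groups)

if __name__ == "__main__":
    failed = {
        "ep_017": [{"t": 0.0, "wrist_torque": 1.2, "latency_ms": 10, "depth_conf": 0.9},
                   {"t": 0.5, "wrist_torque": 0.6, "latency_ms": 12, "depth_conf": 0.9}],
        "ep_042": [{"t": 0.0, "wrist_torque": 1.1, "latency_ms": 8, "depth_conf": 0.9},
                   {"t": 1.2, "wrist_torque": 1.1, "latency_ms": 95, "depth_conf": 0.3}],
    }
    print(categorize(failed))
    # {'arm slip on grasp': [('ep_017', 0.5)], 'motor lag timeout': [('ep_042', 1.2)]}
```

Picking the earliest trigger assumes the first anomaly in an episode is usually the cause and later anomalies are consequences. In `ep_042`, the latency spike and the depth-confidence drop happen at the same moment, so rule order decides.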
Compatible with: Isaac Sim, MuJoCo, Gazebo, GitHub Actions, Python SDK
How it works

From code change to root cause in minutes.

01

Connect your simulator

Install the SDK, point it at your policy and simulator, and define success criteria.

02

Run evaluations automatically

Every commit triggers large-scale evaluation through the GitHub Actions integration.
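A common pattern for a CI evaluation step is a gate script. It compares the new run against a stored baseline and exits non-zero on regression, and GitHub Actions marks a step as failed when its command returns a non-zero exit code. The summary schema, file names and 2-percentage-point tolerance below are assumptions for illustration, not Quarq's actual output or configuration.

```python
import json
import sys

# Assumed summary schema (hypothetical):
#   {"passed": int, "failed": int, "failures": {category_name: count}}
MAX_PASS_RATE_DROP = 0.02  # assumed tolerance: 2 percentage points

def pass_rate(summary: dict) -> float:
    total = summary["passed"] + summary["failed"]
    return summary["passed"] / total if total else 0.0

def regressions(baseline: dict, current: dict,
                max_drop: float = MAX_PASS_RATE_DROP) -> list[str]:
    """Return regression messages; an empty list means the gate passes."""
    problems = []
    before, after = pass_rate(baseline), pass_rate(current)
    if before - after > max_drop:
        problems.append(f"pass rate fell from {before:.1%} to {after:.1%}")
    # A failure category that never appeared in the baseline is flagged too.
    known = set(baseline.get("failures", {}))
    for category, count in current.get("failures", {}).items():
        if category not in known and count > 0:
            problems.append(f"new failure category: {category} ({count} episodes)")
    return problems

def main(baseline_path: str, current_path: str) -> int:
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(current_path) as f:
        current = json.load(f)
    problems = regressions(baseline, current)
    for p in problems:
        print(f"REGRESSION: {p}")
    return 1 if problems else 0  # non-zero exit fails the CI step

if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python regression_gate.py baseline.json current.json (hypothetical names)
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Consider a baseline like the sample run above, with 848 of 1000 passing (84.8%). A later run at 820 of 1000 (82.0%) would trip the gate, while one at 840 of 1000 would not.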

03

Understand failures instantly

Get grouped failure categories, replayable examples, and trend analysis instead of raw logs and videos.

Early access

Early access to the evaluation platform built for robotics engineers.

Join the waitlist for the SDK, CI/CD integrations, and diagnostics dashboard.