Now in early access

Stop watching robot
failures. Start fixing them.

Automated evaluation, failure diagnostics, and analytics for robotics teams. Works with your existing simulator workflow.

Get early access →
See how it works
quarq eval run — policy_v2.pt
$ quarq eval run --policy policy_v2.pt --n 1000
Running 1000 simulations across 8 workers...
 
848 passed
152 failed — categorized automatically
 
Failure breakdown:
42 arm slip on grasp → wrist_torque < threshold
67 motor lag timeout → latency spike @ t=1.2s
43 camera glare / sensor loss → depth_conf < 0.4
 
Dashboard: quarq.dev/runs/a3f7c2
// the problem

The status quo
is manual, slow, broken.

Robotics teams run thousands of simulations per day with almost no infrastructure to make sense of what's happening.

PAIN_01

Death by video review

150 failures. 150 videos. Engineers spend hours replaying simulations just to figure out why a task failed.

PAIN_02

No observability stack

Software teams have Datadog. LLM teams have Langfuse. Robotics has yet to have its moment.

PAIN_03

Manual testing everywhere

Policy updates are validated by manual scripts. Regressions slip through unnoticed until something breaks.

// the solution

Automated evals.
Instant diagnostics.

Two tools that work together — local testing for fast iteration, cloud analytics for scale.

01 / SDK

Local Testing Toolkit

Drop into your existing code. Define success criteria, configure failure scenarios, and run evals instantly on your own machine before you push anything.

  • Plug-and-play: works with your existing policy and simulator
  • Define scenarios: glare, motor lag, slippery surfaces, sensor noise
  • Run locally in seconds — no cloud round-trip needed
  • Structured output for downstream analytics
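
As a rough illustration of what defining scenarios and a success criterion could look like, the sketch below uses plain Python with a toy stand-in for the simulator. It does not use the Quarq SDK: `Scenario`, `toy_rollout`, `success_criterion`, and `run_eval` are hypothetical names, and every disturbance value is made up for the example.

```python
import json
import random
from dataclasses import dataclass, asdict


@dataclass
class Scenario:
    """Hypothetical disturbance settings for one evaluation scenario."""
    name: str
    motor_lag_s: float = 0.0   # extra actuation delay, in seconds
    glare_prob: float = 0.0    # chance that the depth sensor is blinded
    friction: float = 1.0      # surface friction multiplier (1.0 = nominal)


@dataclass
class EpisodeResult:
    scenario: str
    seed: int
    success: bool
    final_distance_m: float


def toy_rollout(scenario: Scenario, seed: int) -> float:
    """Stand-in for a real simulator rollout; returns final distance to goal."""
    rng = random.Random(seed)  # seeded so every run is reproducible
    distance = rng.uniform(0.0, 0.03)
    distance += scenario.motor_lag_s * 0.1
    if rng.random() < scenario.glare_prob:
        distance += 0.05
    distance += (1.0 - scenario.friction) * 0.04
    return distance


def success_criterion(final_distance_m: float, tolerance_m: float = 0.05) -> bool:
    """An episode passes if the end effector finishes within tolerance."""
    return final_distance_m <= tolerance_m


def run_eval(scenarios, episodes_per_scenario):
    """Run every scenario for a fixed number of seeded episodes."""
    results = []
    for scenario in scenarios:
        for seed in range(episodes_per_scenario):
            distance = toy_rollout(scenario, seed)
            results.append(
                EpisodeResult(scenario.name, seed,
                              success_criterion(distance), round(distance, 4))
            )
    return results


scenarios = [
    Scenario("nominal"),
    Scenario("glare", glare_prob=0.5),
    Scenario("motor_lag", motor_lag_s=0.3),
    Scenario("slippery", friction=0.2),
]
results = run_eval(scenarios, episodes_per_scenario=25)

# Structured output: one JSON record per episode, ready for downstream analysis.
report = json.dumps([asdict(r) for r in results])
print(f"{sum(r.success for r in results)} of {len(results)} episodes passed")
```

Seeding each episode keeps a failing case reproducible, so the same scenario and seed can be replayed after a policy change.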
02 / CLOUD

Analytics Dashboard & CI/CD

Every GitHub push triggers automated evaluation. Instead of video files, you get categorized failure groups with one-click replay.

  • GitHub Actions integration — zero new infra
  • Auto-categorized failures by root cause
  • One-click playlist of exact failure moments
  • Track physical metrics and regressions over time
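
One simple way to picture root-cause grouping is an ordered set of threshold rules over per-episode telemetry. The sketch below is an assumption about how such rules could look, not Quarq's actual method: the field names and the torque and latency thresholds are hypothetical, and only the 0.4 depth-confidence cutoff and the category labels are taken from the sample output at the top of the page.

```python
from collections import Counter

# Hypothetical thresholds; only DEPTH_CONF_MIN mirrors the sample output.
WRIST_TORQUE_MIN_NM = 0.8
LATENCY_MAX_S = 0.15
DEPTH_CONF_MIN = 0.4


def categorize(episode: dict) -> str:
    """Return a root-cause label for a failed episode; the first matching rule wins."""
    if episode["min_depth_conf"] < DEPTH_CONF_MIN:
        return "camera glare / sensor loss"
    if episode["max_latency_s"] > LATENCY_MAX_S:
        return "motor lag timeout"
    if episode["min_wrist_torque_nm"] < WRIST_TORQUE_MIN_NM:
        return "arm slip on grasp"
    return "uncategorized"


def failure_breakdown(episodes) -> Counter:
    """Count failed episodes per root-cause label; passing episodes are skipped."""
    failed = [e for e in episodes if not e["success"]]
    return Counter(categorize(e) for e in failed)


# Made-up telemetry summaries for four episodes.
episodes = [
    {"success": True,  "min_depth_conf": 0.9, "max_latency_s": 0.02, "min_wrist_torque_nm": 1.5},
    {"success": False, "min_depth_conf": 0.2, "max_latency_s": 0.03, "min_wrist_torque_nm": 1.4},
    {"success": False, "min_depth_conf": 0.8, "max_latency_s": 0.40, "min_wrist_torque_nm": 1.2},
    {"success": False, "min_depth_conf": 0.7, "max_latency_s": 0.05, "min_wrist_torque_nm": 0.3},
]

breakdown = failure_breakdown(episodes)
for label, count in breakdown.most_common():
    print(count, label)
```

Rule order matters in this sketch: an episode that trips several thresholds is counted once, under the first rule it matches.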
Compatible with
Isaac Sim · MuJoCo · Gazebo · GitHub Actions · Python SDK
// how it works

From code change
to root cause in minutes.

01

Connect your simulator

Install the SDK, point it to your policy and simulator, and define success criteria.

02

Run evaluations automatically

Every commit triggers large-scale evaluation through the GitHub Actions integration.

03

Understand failures instantly

Get grouped failure categories, replayable examples, and trend analysis instead of raw logs and videos.
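
To make the per-commit regression check in the steps above concrete, here is a minimal sketch that compares a candidate run against a baseline. The run format, the function names, and the 2-point tolerance are assumptions for illustration. The candidate numbers come from the sample output at the top of the page, and the baseline is invented.

```python
def pass_rate(run: dict) -> float:
    """Fraction of episodes that passed in a run summary."""
    return run["passed"] / run["total"]


def is_regression(baseline: dict, candidate: dict, max_drop: float = 0.02) -> bool:
    """Flag the candidate if its pass rate fell by more than max_drop."""
    return pass_rate(baseline) - pass_rate(candidate) > max_drop


def category_deltas(baseline: dict, candidate: dict) -> dict:
    """Change in failure count per category (positive means more failures)."""
    names = set(baseline["failures"]) | set(candidate["failures"])
    return {
        name: candidate["failures"].get(name, 0) - baseline["failures"].get(name, 0)
        for name in sorted(names)
    }


# Invented baseline run, for illustration only.
baseline = {
    "passed": 900, "total": 1000,
    "failures": {"arm slip on grasp": 40, "motor lag timeout": 30,
                 "camera glare / sensor loss": 30},
}
# Candidate run, using the counts from the sample terminal output.
candidate = {
    "passed": 848, "total": 1000,
    "failures": {"arm slip on grasp": 42, "motor lag timeout": 67,
                 "camera glare / sensor loss": 43},
}

regressed = is_regression(baseline, candidate)
deltas = category_deltas(baseline, candidate)

# A CI job could pass this value to sys.exit() to fail the build.
exit_code = 1 if regressed else 0
print("regression" if regressed else "ok", deltas)
```

Reporting the per-category deltas alongside the overall pass rate points to which failure group grew, not only that the run got worse.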

“Robotics teams spend countless hours reviewing simulation footage to understand failures. Quarq turns every code change into a clear explanation of what broke and why.”

— Quarq Team

Ship robot policies faster.

Join teams already cutting debug time from days to minutes.

No spam. We'll reach out within 48 hours.