OpenSpace: The Self-Evolving AI Agent Engine That Cuts Token Use by 45.9% and Earns 4.2x More

Fact-Checked Deep Dive | 1.5k GitHub Stars | GDPVal Benchmark Verified

When you deploy an AI agent today—whether it's Claude Code, Cursor, OpenClaw, or nanobot—you're deploying a stateless worker. Every task starts from zero. Every mistake gets repeated. Every successful pattern evaporates into the void once the session ends.

OpenSpace changes this. Developed by HKUDS (the Data Intelligence Lab at the University of Hong Kong), OpenSpace is a self-evolving skill engine that transforms AI agents from disposable tools into learning systems that accumulate expertise, share knowledge across agents, and deliver measurable economic returns.

The numbers from the GDPVal benchmark are striking:

Metric	OpenSpace Performance	Baseline (ClawWork)
Value Capture	72.8% ($11,484 / $15,764)	17.4%
Average Quality	70.8% (+30pp)	40.8% (best ClawWork agent)
Token Efficiency	−45.9% (Phase 2 vs Phase 1)	N/A
Income Multiple	4.2x higher earnings	Baseline

These aren't synthetic puzzles. GDPVal evaluates 220 real-world professional tasks across 44 occupations, drawn from the kind of work that actually contributes to GDP: payroll calculators built from union contracts, tax returns assembled from scattered PDFs, legal memoranda on California privacy regulations.

Let's dig into what makes OpenSpace different, verify the claims, and understand when this architecture matters for your agentic workflows.

The Problem: Why Today's AI Agents Never Learn

Current AI agents suffer from three fundamental weaknesses:

❌ Massive Token Waste

Every task requires reasoning from scratch. Need to parse a CSV file? The agent burns tokens rediscovering pandas.read_csv() parameters. Need to generate a PDF report? It relearns reportlab syntax every single time. There's no memory of successful patterns.

❌ Repeated Costly Failures

Agent A spends 2,000 tokens figuring out that a specific API requires pagination. Agent B, working on the same problem five minutes later, burns the same 2,000 tokens making the same mistakes. Knowledge doesn't transfer.

❌ Skills Degrade Silently

You write a skill that calls the Stripe API. Stripe updates their endpoints. Your skill breaks—not with a clear error, but with subtle data corruption. No monitoring, no auto-repair, no version tracking.

OpenSpace's thesis: Skills should be living entities that auto-repair, improve through usage, and share learnings across the entire agent network.

What Is OpenSpace? Three Superpowers for AI Agents

OpenSpace plugs into any agent that supports the SKILL.md format (Claude Code, Codex, OpenClaw, nanobot, Cursor) and adds three core capabilities:

🧬 1. Self-Evolution

Skills that learn and improve automatically through three mechanisms, backed by continuous quality monitoring:

AUTO-FIX: When a skill breaks (API changes, dependency errors), OpenSpace detects the failure and generates a fix. The repaired skill becomes a new version.
AUTO-IMPROVE: Successful execution patterns get captured and optimized. If a skill works but uses 800 tokens, OpenSpace tries to distill it to 400 tokens.
AUTO-LEARN: When an agent completes a novel task successfully, the workflow gets captured as a reusable skill—no manual coding required.
Quality Monitoring: Tracks error rates, execution success, and token consumption across all tasks. Skills with high failure rates get flagged for review.
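The quality-monitoring idea can be shown as a minimal Python sketch: each skill keeps per-execution counters, and a review flag fires on high failure rates. The `SkillStats` class, the `flag_for_review` function, and the 30% / five-run thresholds are hypothetical choices for illustration. They are not OpenSpace's actual data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SkillStats:
    """Running execution statistics for one skill (hypothetical structure)."""
    name: str
    runs: int = 0
    failures: int = 0
    total_tokens: int = 0

    def record(self, success: bool, tokens: int) -> None:
        # Update the counters after every execution of the skill.
        self.runs += 1
        self.total_tokens += tokens
        if not success:
            self.failures += 1

    @property
    def failure_rate(self) -> float:
        return self.failures / self.runs if self.runs else 0.0

    @property
    def avg_tokens(self) -> float:
        return self.total_tokens / self.runs if self.runs else 0.0


def flag_for_review(stats: List[SkillStats], max_failure_rate: float = 0.3,
                    min_runs: int = 5) -> List[str]:
    """Return the names of skills whose failure rate exceeds the threshold.

    Skills with fewer than `min_runs` executions are skipped, so a single
    early failure does not trigger a review.
    """
    return [s.name for s in stats
            if s.runs >= min_runs and s.failure_rate > max_failure_rate]


csv_skill = SkillStats("csv-parser")
pdf_skill = SkillStats("pdf-report")
for ok in [True, True, True, True, False]:
    csv_skill.record(ok, tokens=400)
for ok in [True, False, False, True, False]:
    pdf_skill.record(ok, tokens=1200)
print(flag_for_review([csv_skill, pdf_skill]))  # ['pdf-report']
```

The minimum-run requirement keeps the monitor from reacting to noise. A sensible threshold would depend on how often each skill actually runs.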

🌐 2. Collective Agent Intelligence

Turn individual agents into a shared brain:

Shared Evolution: One agent's improvement becomes every agent's upgrade. If Agent A evolves a skill for parsing complex PDFs, Agent B instantly benefits.
Network Effects: More agents → richer data → faster evolution for everyone.
Access Control: Choose public, private, or team-only access for each skill.
Cloud Community: Browse and download evolved skills at open-space.cloud.
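The public / private / team-only options amount to a visibility check on every download. The sketch below is a hypothetical model of that rule. The `Visibility` enum, the `SharedSkill` fields, and the `can_access` function are assumptions, not the access-control API of open-space.cloud.

```python
from dataclasses import dataclass
from enum import Enum


class Visibility(Enum):
    PUBLIC = "public"
    PRIVATE = "private"
    TEAM = "team"


@dataclass(frozen=True)
class SharedSkill:
    name: str
    owner: str
    team: str
    visibility: Visibility


def can_access(skill: SharedSkill, user: str, user_team: str) -> bool:
    """Decide whether `user`, a member of `user_team`, may download `skill`."""
    if skill.visibility is Visibility.PUBLIC:
        return True
    if skill.visibility is Visibility.TEAM:
        return user_team == skill.team
    # PRIVATE: only the owner can see it.
    return user == skill.owner


catalog = [
    SharedSkill("pdf-parser", "alice", "research", Visibility.PUBLIC),
    SharedSkill("payroll-calc", "alice", "finance", Visibility.TEAM),
    SharedSkill("tax-forms", "alice", "finance", Visibility.PRIVATE),
]
print([s.name for s in catalog if can_access(s, "bob", "finance")])
# ['pdf-parser', 'payroll-calc']
```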

💰 3. Token Efficiency

Stop repeating work. Start reusing solutions:

Cold Start → Warm Rerun: First execution of a task type builds the skill. Subsequent similar tasks reuse the evolved skill, dramatically reducing token consumption.
Small Updates Only: Fix what's broken, don't rebuild everything.
Measured Savings: 45.9% fewer tokens overall across 50 professional tasks in the GDPVal benchmark (Phase 2 vs Phase 1).
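The cold-start / warm-rerun pattern can be modeled as a cache keyed by task type. In this toy Python sketch, the per-task token costs (`COLD_COST`, `WARM_COST`) and the `SkillCache` class are invented for illustration. They do not reproduce OpenSpace's internals or the benchmark figures.

```python
from typing import Dict, List

# Hypothetical per-task token costs, for illustration only.
COLD_COST = 1000  # reasoning a task type from scratch
WARM_COST = 450   # executing an already-stored skill


class SkillCache:
    """Toy model of skill reuse: the first run of a task type builds a skill."""

    def __init__(self) -> None:
        self.skills: Dict[str, str] = {}

    def run(self, task_type: str) -> int:
        """Execute one task and return the tokens it consumed."""
        if task_type in self.skills:
            return WARM_COST
        # Cold start: solve from scratch, then keep the workflow as a skill.
        self.skills[task_type] = f"workflow for {task_type}"
        return COLD_COST


def run_phase(cache: SkillCache, tasks: List[str]) -> int:
    return sum(cache.run(t) for t in tasks)


def reduction_pct(phase1: int, phase2: int) -> float:
    """Token reduction of phase 2 relative to phase 1, in percent."""
    return (phase1 - phase2) / phase1 * 100


cache = SkillCache()
tasks = ["csv-report", "pdf-form", "csv-report", "ffmpeg-clip"]
phase1 = run_phase(cache, tasks)  # cold start: 3 new skills, 1 reuse
phase2 = run_phase(cache, tasks)  # warm rerun: every task reuses a skill
print(phase1, phase2, round(reduction_pct(phase1, phase2), 1))  # 3450 1800 47.8
```

Even within Phase 1, the repeated `csv-report` task is already cheaper. This is the "savings compound" effect on a tiny scale.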

GDPVal Benchmark Results: Real Economic Impact

GDPVal is a benchmark dataset of 220 real-world professional tasks covering 44 occupations, scored by the actual economic value of the work. OpenSpace was tested on 50 of these tasks, spanning 6 task categories, in a two-phase design:

Phase 1 (Cold Start): Execute all 50 tasks sequentially with no prior skills
Phase 2 (Warm Rerun): Re-execute the same 50 tasks with the evolved skill database from Phase 1

Overall Results

Metric	OpenSpace (Qwen 3.5-Plus)	ClawWork Baseline (Same LLM)
Value Captured	$11,484 / $15,764 (72.8%)	~$2,743 (17.4%)
Quality Score	70.8% average	40.8% best agent
Token Reduction	−45.9% (Phase 2 vs Phase 1)	N/A
Income Multiple	4.2x higher	Baseline

Important: Both OpenSpace and the ClawWork baseline used the same backbone LLM (Qwen 3.5-Plus), so the performance gap reflects the agent framework and its skill evolution rather than model capability.
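The headline figures are internally consistent. A few lines of Python confirm this using only the numbers reported above; the baseline dollar amount is derived from its reported 17.4% share.

```python
captured = 11_484        # dollars of task value OpenSpace captured
available = 15_764       # total dollar value available, as reported
baseline_share = 0.174   # ClawWork baseline value capture, as reported

openspace_share = captured / available
baseline_dollars = baseline_share * available
income_multiple = openspace_share / baseline_share

print(f"OpenSpace value capture: {openspace_share:.1%}")  # 72.8%
print(f"Baseline captured: ${baseline_dollars:,.0f}")     # $2,743
print(f"Income multiple: {income_multiple:.1f}x")         # 4.2x
```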

Breakdown by Category

Category	Tasks	Income (Phase 1 → Phase 2)	Token Δ (Phase 2 vs Phase 1)	Why It Matters
Documents & Correspondence	7	71% → 74% (+3.3pp)	−56%	California privacy law memoranda, surveillance reports. The `document-gen-fallback` skill family evolved through 13 versions.
Compliance & Forms	11	51% → 70% (+18.5pp)	−51%	Tax returns from 15 PDFs, pharmacy compliance checklists. PDF skill chain evolves once, all form tasks reuse it.
Media Production	3	53% → 58% (+5.8pp)	−46%	Audio/video via ffmpeg. Evolved skills encode working codec flags, eliminating sandbox trial-and-error.
Engineering	4	70% → 78% (+8.7pp)	−43%	Technical specifications, CAD file processing. Reusable engineering calculation patterns.
Data Analysis	14	68% → 75% (+7pp)	−42%	CSV analysis, statistical reports. Pandas patterns captured and reused.
Research & Writing	11	65% → 72% (+7pp)	−38%	Market research, technical documentation.

Every category improved—no exceptions.

How Self-Evolution Actually Works: FIX, DERIVED, CAPTURED

OpenSpace implements three distinct evolution modes, corresponding to the AUTO-FIX, AUTO-IMPROVE, and AUTO-LEARN capabilities described earlier:

1. FIX Mode (Auto-Repair)

Trigger: Skill execution fails with a specific error type.

Example: A skill calls stripe.Customer.create(), and a (hypothetical) API change makes email a required field.

Execution Error: Missing required field 'email'
→ AUTO-FIX triggered
→ Skill updated: adds email parameter validation
→ New version: v1.1.0 of the same skill (v1.0.0 kept as its parent)

The fixed skill is stored as a new version, preserving the lineage. You can trace exactly when and why a skill evolved.
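The versioning behaviour described above can be sketched with a simple parent-pointer model. Each fix produces a new version object that points at the one it replaced, so the lineage can be walked backwards. The `SkillVersion` structure, the minor-version bump rule, and the skill name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SkillVersion:
    """One version of a skill, linked to the version it replaced."""
    name: str
    version: Tuple[int, int, int]
    body: str
    parent: Optional["SkillVersion"] = None
    reason: str = ""

    @property
    def label(self) -> str:
        return ".".join(str(part) for part in self.version)


def apply_fix(current: SkillVersion, new_body: str, error: str) -> SkillVersion:
    """Store a repaired skill as a new version instead of overwriting it."""
    major, minor, _ = current.version
    return SkillVersion(
        name=current.name,
        version=(major, minor + 1, 0),  # a fix bumps the minor version
        body=new_body,
        parent=current,
        reason=f"FIX: {error}",
    )


def lineage(skill: SkillVersion) -> List[str]:
    """Walk parent links from the newest version back to the original."""
    chain = []
    node: Optional[SkillVersion] = skill
    while node is not None:
        chain.append(f"{node.label} {node.reason}".strip())
        node = node.parent
    return chain


v1 = SkillVersion("stripe-customer-create", (1, 0, 0), "call Customer.create")
v2 = apply_fix(v1, "validate email, then call Customer.create",
               "Missing required field 'email'")
print(lineage(v2))
# ["1.1.0 FIX: Missing required field 'email'", '1.0.0']
```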

2. DERIVED Mode (Optimization)

Trigger: Successful execution with opportunity for improvement.

Example: A skill works but uses 1,200 tokens. OpenSpace analyzes the execution trace and creates a distilled version:

Original: 1,200 tokens, 8 steps
Derived: 650 tokens, 5 steps (same output quality)
→ Skill marked as v2.0 (optimized)
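One plausible acceptance rule for a derived version is "cheaper, and no worse in quality". The sketch below encodes that rule using the numbers from the example above. The `RunResult` fields, the quality score, and the zero-tolerance default are assumptions, since the source does not say how OpenSpace compares the two versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    tokens: int
    steps: int
    quality: float  # evaluator score in [0, 1]; how it is computed is assumed


def accept_derived(original: RunResult, derived: RunResult,
                   tolerance: float = 0.0) -> bool:
    """Keep the derived skill only if it is cheaper without losing quality."""
    return (derived.tokens < original.tokens
            and derived.quality >= original.quality - tolerance)


def token_savings(original: RunResult, derived: RunResult) -> float:
    """Fraction of tokens saved by the derived version."""
    return 1 - derived.tokens / original.tokens


original = RunResult(tokens=1200, steps=8, quality=0.80)
derived = RunResult(tokens=650, steps=5, quality=0.80)
print(accept_derived(original, derived), f"{token_savings(original, derived):.1%}")
# True 45.8%
```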

3. CAPTURED Mode (New Skill Creation)

Trigger: Novel task completed successfully without existing skill.

Example: Agent builds a monitoring dashboard with 20+ panels. The entire workflow gets captured as a reusable skill:

---
name: monitoring-dashboard-builder
description: Creates live monitoring dashboards with 20+ panels
target: docker, prometheus, grafana
---
# Workflow captured from successful execution
1. Scan running containers
2. Extract metrics endpoints
3. Generate Grafana datasource configs
4. Create dashboard JSON with 20 panels
5. Deploy and validate
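Because a captured skill is plain text, reading it back is straightforward. This Python sketch splits the example above into its frontmatter fields and its workflow body. It handles only flat `key: value` frontmatter, and treating `target` as a comma-separated list is an assumption about that field.

```python
from typing import Dict, Tuple

SKILL_MD = """\
---
name: monitoring-dashboard-builder
description: Creates live monitoring dashboards with 20+ panels
target: docker, prometheus, grafana
---
# Workflow captured from successful execution
1. Scan running containers
2. Extract metrics endpoints
"""


def parse_skill_md(text: str) -> Tuple[Dict[str, str], str]:
    """Split a SKILL.md document into (frontmatter fields, markdown body).

    Only flat `key: value` lines are handled; real skill files may need a
    full YAML parser.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text.strip()
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("unterminated frontmatter block")
    meta = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


meta, body = parse_skill_md(SKILL_MD)
print(meta["name"])                                    # monitoring-dashboard-builder
print([t.strip() for t in meta["target"].split(",")])  # ['docker', 'prometheus', 'grafana']
```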

Skill Storage: SQLite + SKILL.md

OpenSpace stores skills in two formats:

SQLite Database: Metadata, execution history, performance metrics, evolution lineage
SKILL.md Files: Human-readable skill definitions with instructions, code snippets, and triggers

You can inspect the database directly:

sqlite3 /path/to/workspace/.openspace/openspace.db \
  "SELECT name, version, origin, execution_count FROM skills ORDER BY execution_count DESC;"
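The same query can be run from Python with the standard-library `sqlite3` module, which is handy for scripting reports. The table layout here is an assumption inferred from the columns in the query above. The real OpenSpace schema may hold more columns, and the rows are made up for the demo.

```python
import sqlite3
from typing import List, Tuple


def top_skills(conn: sqlite3.Connection, limit: int = 10) -> List[Tuple]:
    """Return the most-executed skills, mirroring the query above."""
    return conn.execute(
        "SELECT name, version, origin, execution_count FROM skills "
        "ORDER BY execution_count DESC LIMIT ?",
        (limit,),
    ).fetchall()


# Demo against an in-memory database with an assumed minimal schema and
# made-up rows. For a real workspace, open the file read-only instead:
#   sqlite3.connect("file:/path/to/workspace/.openspace/openspace.db?mode=ro", uri=True)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE skills (name TEXT, version TEXT, origin TEXT, execution_count INTEGER)"
)
conn.executemany(
    "INSERT INTO skills VALUES (?, ?, ?, ?)",
    [
        ("csv-parser", "1.2.0", "CAPTURED", 14),
        ("pdf-report", "2.0.0", "DERIVED", 31),
        ("ffmpeg-encode", "1.1.0", "FIX", 3),
    ],
)
for row in top_skills(conn, limit=2):
    print(row)
# ('pdf-report', '2.0.0', 'DERIVED', 31)
# ('csv-parser', '1.2.0', 'CAPTURED', 14)
```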

Collective Intelligence: One Agent Learns, All Benefit

This is where OpenSpace gets interesting for teams and production systems.

Cloud Community: open-space.cloud

Public Skills: Browse 165+ evolved skills from the GDPVal benchmark
Skill Lineage: See how skills evolved (e.g., document-gen-fallback has 13 versions)
Upload/Download: Share your team's evolved skills or download community skills
Access Control: Mark skills as public, private, or team-only

Real-World Impact

Imagine your team has 10 agents running in production:

Without OpenSpace: Each agent independently discovers (and forgets) solutions. Agent #3 figures out the Stripe API pagination. Agent #7 burns tokens rediscovering it.
With OpenSpace: Agent #3's discovery becomes a skill. Agents #4-#10 instantly benefit. Next week, Agent #7 encounters a new edge case, fixes the skill, and everyone upgrades.

Network Effect Loop: More agents → More executions → More evolution data → Better skills → Lower costs → More agents.
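Mechanically, "one agent learns, all benefit" means every agent reads from and writes to one shared, versioned skill store. This hypothetical in-process `SharedSkillRegistry` illustrates the Agent #3 / Agent #7 scenario. The class, skill name, and skill text are invented; a real setup would persist versions in a database or remote service rather than a Python dict.

```python
from typing import Dict, List, Optional


class SharedSkillRegistry:
    """In-process stand-in for a skill store that many agents share."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = {}

    def publish(self, name: str, body: str) -> int:
        """Append a new version of a skill and return its version number."""
        versions = self._history.setdefault(name, [])
        versions.append(body)
        return len(versions)  # versions are numbered from 1

    def latest(self, name: str) -> Optional[str]:
        versions = self._history.get(name)
        return versions[-1] if versions else None


registry = SharedSkillRegistry()
# Agent #3 discovers that the API paginates and publishes a skill.
registry.publish("api-list-all", "follow pagination cursor until exhausted")
# Agent #7 later hits an edge case and publishes a fix.
registry.publish("api-list-all", "follow pagination cursor; retry on rate limit")
# Any other agent reads the latest version without rediscovering anything.
print(registry.latest("api-list-all"))
```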

Case Study: My Daily Monitor - 20+ Panels, Zero Human Code

The OpenSpace team showcased a personal behavior monitoring system built entirely by an agent:

20+ Live Dashboard Panels: Processes, servers, terminals, news, markets, messages, schedules
60+ Skills Evolved: All created autonomously through OpenSpace execution
Zero Human-Written Code: The agent developed the entire system end-to-end

This isn't a static dashboard. It includes a built-in AI agent that can:

Answer questions about your processes
Provide analysis of system metrics
Execute tasks (restart services, deploy updates, send alerts)

Why This Matters: Traditional agent development requires humans to write skills, test them, deploy them. OpenSpace demonstrates that agents can autonomously develop complex systems, evolving skills as they encounter challenges.

Integration: Plug Into Claude Code, Cursor, OpenClaw

OpenSpace works with any agent that supports the SKILL.md format. Here's how to integrate:

Step 1: Install OpenSpace

git clone https://github.com/HKUDS/OpenSpace.git
cd OpenSpace
pip install -e .

Pro Tip: Skip the 50MB assets/ folder for faster cloning:

git clone --filter=blob:none --sparse https://github.com/HKUDS/OpenSpace.git
cd OpenSpace
git sparse-checkout set --no-cone '/*' '!assets/'
pip install -e .

Step 2: Add to Your Agent's MCP Config

For agents that support MCP (Model Context Protocol):

{
  "mcpServers": {
    "openspace": {
      "command": "openspace-mcp",
      "toolTimeout": 600,
      "env": {
        "OPENSPACE_HOST_SKILL_DIRS": "/path/to/your/agent/skills",
        "OPENSPACE_WORKSPACE": "/path/to/OpenSpace",
        "OPENSPACE_API_KEY": "sk-xxx"
      }
    }
  }
}
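If you manage agent configs programmatically, the entry above can be generated rather than hand-edited. This Python sketch uses a hypothetical `openspace_mcp_entry` helper that reuses the field names from the JSON example. It includes `OPENSPACE_API_KEY` only when a key is supplied, since the cloud features are optional.

```python
import json
from typing import Any, Dict, Optional


def openspace_mcp_entry(skill_dirs: str, workspace: str,
                        api_key: Optional[str] = None) -> Dict[str, Any]:
    """Build an mcpServers block for OpenSpace using the keys shown above."""
    env = {
        "OPENSPACE_HOST_SKILL_DIRS": skill_dirs,
        "OPENSPACE_WORKSPACE": workspace,
    }
    if api_key:  # only needed for the cloud community
        env["OPENSPACE_API_KEY"] = api_key
    return {
        "mcpServers": {
            "openspace": {
                "command": "openspace-mcp",
                "toolTimeout": 600,
                "env": env,
            }
        }
    }


config = openspace_mcp_entry("/path/to/your/agent/skills", "/path/to/OpenSpace")
print(json.dumps(config, indent=2))
```

Merging this dict into an existing config file, rather than overwriting it, is left to the caller.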

Step 3: Copy Core Skills

cp -r OpenSpace/openspace/host_skills/delegate-task/ /path/to/your/agent/skills/
cp -r OpenSpace/openspace/host_skills/skill-discovery/ /path/to/your/agent/skills/

These two skills teach your agent when and how to use OpenSpace—no additional prompting needed.

Step 4: (Optional) Enable Cloud Community

Register at open-space.cloud to get an OPENSPACE_API_KEY, then add it to your config. Without it, all local capabilities work normally.

When to Use OpenSpace (and When Not To)

✅ Use OpenSpace When:

Use Case	Why
High-volume repetitive tasks	Token savings compound quickly (45.9% reduction)
Multi-agent teams	Collective intelligence amplifies value
Long-running production systems	Skills improve over time, costs decrease
Complex workflows with failure modes	AUTO-FIX catches and repairs breaking changes
Cost-sensitive deployments	4.2x income improvement changes unit economics

❌ Skip OpenSpace When:

Use Case	Why
One-off experimental tasks	Overhead outweighs benefits
Simple, stateless queries	No reusable patterns to capture
Tight latency requirements	Skill search adds ~100-300ms overhead
Highly specialized domains	Community skills may not apply

The Bottom Line: Economic Viability for AI Agents

OpenSpace addresses the fundamental economic problem of AI agents: costs scale linearly with task volume, because every task starts from zero.

By treating skills as living entities that auto-repair, improve, and share knowledge, OpenSpace flips this model:

Costs decrease over time as skills evolve and reuse increases
Quality improves as successful patterns get captured and optimized
Failures become less frequent as AUTO-FIX catches breaking changes
Network effects kick in as more agents contribute to the shared skill pool

The Numbers Don't Lie

72.8% value capture on real professional work
45.9% token reduction through skill reuse
4.2x higher earnings with the same backbone LLM
1.5k GitHub stars and growing (not 2.6K as some sources claim—fact-checked)

For teams running AI agents in production, OpenSpace isn't just a nice-to-have. It's the difference between agents that burn money and agents that generate profit.

Ready to try it?

GitHub: HKUDS/OpenSpace
Cloud Community: open-space.cloud
Documentation: See openspace/host_skills/README.md for integration guides

The era of stateless, forgetful AI agents is ending. Welcome to self-evolving systems that learn from every task, share knowledge across the network, and deliver measurable economic returns.

Fact-Checked Sources:

GitHub Repository: https://github.com/HKUDS/OpenSpace (1.5k stars, 168 forks)
GDPVal Benchmark: https://openreview.net/forum?id=hcuEdq6eKD
MarkTechPost Tutorial: https://www.marktechpost.com/2026/03/24/a-coding-implementation-to-design-self-evolving-skill-engine-with-openspace-for-skill-learning-token-efficiency-and-collective-intelligence/
Dev|Journal Analysis: https://earezki.com/ai-news/2026-03-24-a-coding-implementation-to-design-self-evolving-skill-engine-with-openspace-for-skill-learning-token-efficiency-and-collective-intelligence/