
Fact-Checked Deep Dive | 1.5k GitHub Stars | GDPVal Benchmark Verified
When you deploy an AI agent today—whether it's Claude Code, Cursor, OpenClaw, or nanobot—you're deploying a stateless worker. Every task starts from zero. Every mistake gets repeated. Every successful pattern evaporates into the void once the session ends.
OpenSpace changes this. Developed by HKUDS (HKU Data Science Lab), OpenSpace is a self-evolving skill engine that transforms AI agents from disposable tools into learning systems that accumulate expertise, share knowledge across agents, and deliver measurable economic returns.
The numbers from the GDPVal benchmark are striking:
| Metric | OpenSpace Performance | Baseline (ClawWork) |
|---|---|---|
| Value Capture | 72.8% ($11,484 / $15,764) | 17.4% |
| Average Quality | 70.8% (+30pp vs. baseline) | 40.8% |
| Token Efficiency | −45.9% (Phase 2 vs Phase 1) | N/A |
| Income Multiple | 4.2x higher earnings | Baseline |
These aren't synthetic benchmarks. GDPVal evaluates 220 real-world professional tasks across 44 occupations—the same work that generates actual GDP. We're talking payroll calculators from union contracts, tax returns from scattered PDFs, legal memoranda on California privacy regulations.
Let's dig into what makes OpenSpace different, verify the claims, and understand when this architecture matters for your agentic workflows.
Current AI agents suffer from three fundamental weaknesses:
Every task requires reasoning from scratch. Need to parse a CSV file? The agent burns tokens rediscovering pandas.read_csv() parameters. Need to generate a PDF report? It relearns reportlab syntax every single time. There's no memory of successful patterns.
Agent A spends 2,000 tokens figuring out that a specific API requires pagination. Agent B, working on the same problem five minutes later, burns the same 2,000 tokens making the same mistakes. Knowledge doesn't transfer.
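The duplicated-effort problem can be made concrete with a toy model. This is a minimal sketch, not OpenSpace's API: `SharedSkillRegistry`, `solve`, and the 150-token reuse cost are hypothetical names and numbers chosen for illustration. Only the 2,000-token discovery cost comes from the example above.

```python
from dataclasses import dataclass, field

DISCOVERY_COST = 2_000  # tokens spent learning that the API paginates (from the text)
REUSE_COST = 150        # hypothetical cost of reading a stored lesson


@dataclass
class SharedSkillRegistry:
    """Toy in-memory store that lets agents publish and reuse lessons."""
    lessons: dict[str, str] = field(default_factory=dict)

    def solve(self, problem: str) -> int:
        """Return the tokens spent; publish the lesson on first discovery."""
        if problem in self.lessons:
            return REUSE_COST
        self.lessons[problem] = "follow the next-page cursor until it is empty"
        return DISCOVERY_COST


registry = SharedSkillRegistry()
agent_a = registry.solve("paginated-api")   # first agent pays full price
agent_b = registry.solve("paginated-api")   # second agent reuses the lesson
without_sharing = 2 * DISCOVERY_COST
print(agent_a, agent_b, without_sharing - (agent_a + agent_b))
```

Under these assumed costs the second agent saves 1,850 tokens, and the saving grows with every additional agent that hits the same problem.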
You write a skill that calls the Stripe API. Stripe updates their endpoints. Your skill breaks—not with a clear error, but with subtle data corruption. No monitoring, no auto-repair, no version tracking.
OpenSpace's thesis: Skills should be living entities that auto-repair, improve through usage, and share learnings across the entire agent network.
OpenSpace plugs into any agent that supports the SKILL.md format (Claude Code, Codex, OpenClaw, nanobot, Cursor) and adds three core capabilities:
Skills that learn and improve automatically through three mechanisms:
Turn individual agents into a shared brain:
Stop repeating work. Start reusing solutions:
GDPVal is a benchmark dataset containing 220 real-world professional tasks covering 44 occupations, evaluated using actual economic value as the standard. OpenSpace was tested on 50 tasks across 6 industries in a two-phase design:
| Metric | OpenSpace (Qwen 3.5-Plus) | ClawWork Baseline (Same LLM) |
|---|---|---|
| Value Captured | $11,484 / $15,764 (72.8%) | ~$2,743 (17.4%) |
| Quality Score | 70.8% average | 40.8% best agent |
| Token Reduction | −45.9% (Phase 2 vs Phase 1) | N/A |
| Income Multiple | 4.2x higher | Baseline |
Important: Both OpenSpace and the ClawWork baseline used the same backbone LLM (Qwen 3.5-Plus). The performance difference comes purely from skill evolution, not model capabilities.
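The headline figures can be cross-checked against each other with a few lines of arithmetic. The sketch below uses only the numbers quoted in the tables; the variable names are ours.

```python
# Figures quoted in the tables above.
captured_usd = 11_484
available_usd = 15_764
baseline_share = 0.174          # ClawWork value capture
openspace_quality = 70.8
baseline_quality = 40.8

value_capture = captured_usd / available_usd
baseline_usd = baseline_share * available_usd
income_multiple = value_capture / baseline_share
quality_gain_pp = openspace_quality - baseline_quality

print(f"value capture:   {value_capture:.1%}")
print(f"baseline income: ${baseline_usd:,.0f}")
print(f"income multiple: {income_multiple:.1f}x")
print(f"quality gain:    {quality_gain_pp:.1f}pp")
```

The results (72.8%, about $2,743, 4.2x, and 30.0pp) agree with the reported values, so the table is at least internally consistent.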
| Category | Tasks | Income Δ | Token Δ | Why It Matters |
|---|---|---|---|---|
| Documents & Correspondence | 7 | 71% → 74% (+3.3pp) | −56% | California privacy law memoranda, surveillance reports. The document-gen-fallback skill family evolved through 13 versions. |
| Compliance & Forms | 11 | 51% → 70% (+18.5pp) | −51% | Tax returns from 15 PDFs, pharmacy compliance checklists. PDF skill chain evolves once, all form tasks reuse it. |
| Media Production | 3 | 53% → 58% (+5.8pp) | −46% | Audio/video via ffmpeg. Evolved skills encode working codec flags, eliminating sandbox trial-and-error. |
| Engineering | 4 | 70% → 78% (+8.7pp) | −43% | Technical specifications, CAD file processing. Reusable engineering calculation patterns. |
| Data Analysis | 14 | 68% → 75% (+7pp) | −42% | CSV analysis, statistical reports. Pandas patterns captured and reused. |
| Research & Writing | 11 | 65% → 72% (+7pp) | −38% | Market research, technical documentation. |
Every category improved—no exceptions.
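As a sanity check on the category table, the task counts should sum to the 50 tasks tested, and the per-category token reductions should land near the 45.9% headline. The sketch below computes a task-weighted mean; how the headline figure was actually weighted is not stated in the source, so treating it as roughly comparable is an assumption.

```python
# (category, tasks, token reduction %) copied from the table above.
categories = [
    ("Documents & Correspondence", 7, 56),
    ("Compliance & Forms", 11, 51),
    ("Media Production", 3, 46),
    ("Engineering", 4, 43),
    ("Data Analysis", 14, 42),
    ("Research & Writing", 11, 38),
]

total_tasks = sum(tasks for _, tasks, _ in categories)
# Task-weighted mean; the reported figure may be token-weighted instead (assumption).
weighted_reduction = sum(t * r for _, t, r in categories) / total_tasks

print(total_tasks)                    # 50
print(round(weighted_reduction, 1))   # 45.4
```

The small gap between 45.4% and 45.9% would be expected if the headline weights categories by tokens consumed rather than by task count.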
OpenSpace implements three distinct evolution modes:
Trigger: Skill execution fails with a specific error type.
Example: A skill calls stripe.Customer.create(), but Stripe has since made email a required field.
Execution Error: Missing required field 'email'
→ AUTO-FIX triggered
→ Skill updated: adds email parameter validation
→ New version: data-validation-csv v1.1.0
The fixed skill is stored as a new version, preserving the lineage. You can trace exactly when and why a skill evolved.
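One way to picture the fix-and-version loop is a skill record that keeps a pointer to its parent. This is a minimal sketch under stated assumptions, not OpenSpace's implementation: `SkillVersion`, `execute_with_auto_fix`, the `create-customer` skill name, and the fake API are all hypothetical, and the patch is hard-coded where a real system would presumably generate it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SkillVersion:
    name: str
    version: tuple[int, int, int]
    run: Callable[[dict], dict]
    parent: Optional["SkillVersion"] = None   # lineage pointer
    reason: str = "initial"


def fake_api_create_customer(payload: dict) -> dict:
    """Stand-in for a remote API that now rejects payloads without 'email'."""
    if "email" not in payload:
        raise ValueError("Missing required field 'email'")
    return {"id": "cus_demo", **payload}


def run_v1(record: dict) -> dict:
    # Original skill: written before 'email' became mandatory.
    return fake_api_create_customer({"name": record["name"]})


def run_v1_1(record: dict) -> dict:
    # Repaired skill: validates the email field and forwards it.
    if not record.get("email"):
        raise ValueError("record needs an email before calling the API")
    return fake_api_create_customer({"name": record["name"], "email": record["email"]})


def execute_with_auto_fix(skill: SkillVersion, record: dict, patch: Callable[[dict], dict]):
    """Run the skill; on failure, register a patched child version and retry."""
    try:
        return skill, skill.run(record)
    except ValueError as err:
        major, minor, _ = skill.version
        fixed = SkillVersion(skill.name, (major, minor + 1, 0), patch,
                             parent=skill, reason=str(err))
        return fixed, fixed.run(record)


v1 = SkillVersion("create-customer", (1, 0, 0), run_v1)
latest, result = execute_with_auto_fix(
    v1, {"name": "Ada", "email": "ada@example.com"}, run_v1_1
)
print(latest.version, latest.parent.version, latest.reason)
```

Because the old version is never overwritten, walking the `parent` chain answers both "when did this change" and "which error caused it".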
Trigger: Successful execution with opportunity for improvement.
Example: A skill works but uses 1,200 tokens. OpenSpace analyzes the execution trace and creates a distilled version:
Original: 1,200 tokens, 8 steps
Derived: 650 tokens, 5 steps (same output quality)
→ Skill marked as v2.0 (optimized)
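A distilled skill is only worth keeping if it is cheaper without being worse. The sketch below shows one plausible promotion rule; `Trace`, `should_promote`, and the 0.9 quality scores are hypothetical, while the token and step counts come from the example above.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    tokens: int
    steps: int
    quality: float   # hypothetical 0-1 score from some evaluator


def should_promote(original: Trace, derived: Trace, tolerance: float = 0.0) -> bool:
    """Promote only if the derived skill is cheaper and no worse in quality."""
    cheaper = derived.tokens < original.tokens
    as_good = derived.quality >= original.quality - tolerance
    return cheaper and as_good


original = Trace(tokens=1200, steps=8, quality=0.9)
derived = Trace(tokens=650, steps=5, quality=0.9)

saving = 1 - derived.tokens / original.tokens
print(should_promote(original, derived), f"{saving:.1%}")
```

For the numbers in the example this prints `True 45.8%`, a per-skill saving in the same range as the benchmark-wide token reduction.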
Trigger: Novel task completed successfully without existing skill.
Example: Agent builds a monitoring dashboard with 20+ panels. The entire workflow gets captured as a reusable skill:
---
name: monitoring-dashboard-builder
description: Creates live monitoring dashboards with 20+ panels
target: docker, prometheus, grafana
---
# Workflow captured from successful execution
1. Scan running containers
2. Extract metrics endpoints
3. Generate Grafana datasource configs
4. Create dashboard JSON with 20 panels
5. Deploy and validate
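A captured skill like the one above is just text with a front-matter header, so it is easy to inspect programmatically. The sketch below is a deliberately small parser that handles only flat `key: value` pairs; `parse_skill_md` is a hypothetical helper, and a real loader would more likely use a YAML library.

```python
def parse_skill_md(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md document into (front-matter dict, body).

    Handles only flat `key: value` pairs, which is all the example uses.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    end = lines.index("---", 1)          # position of the closing delimiter
    meta = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:]).strip()


SKILL_MD = """\
---
name: monitoring-dashboard-builder
description: Creates live monitoring dashboards with 20+ panels
target: docker, prometheus, grafana
---
# Workflow captured from successful execution
1. Scan running containers
2. Extract metrics endpoints
"""

meta, body = parse_skill_md(SKILL_MD)
targets = [t.strip() for t in meta["target"].split(",")]
print(meta["name"], targets)
```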
OpenSpace stores skills in two formats:
You can inspect the database directly:
sqlite3 /path/to/workspace/.openspace/openspace.db
SELECT name, version, origin, execution_count FROM skills ORDER BY execution_count DESC;
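The same query can be run from Python with the standard-library `sqlite3` module. The sketch below uses an in-memory database so it runs anywhere; the table and column names come from the query above, but the column types and the two sample rows are invented for illustration and may not match the real schema.

```python
import sqlite3

# In-memory stand-in; point connect() at .openspace/openspace.db for real data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE skills (name TEXT, version TEXT, origin TEXT, execution_count INTEGER)"
)
conn.executemany(
    "INSERT INTO skills VALUES (?, ?, ?, ?)",
    [
        ("monitoring-dashboard-builder", "1", "captured", 3),   # made-up rows
        ("document-gen-fallback", "13", "fix", 42),
    ],
)

rows = conn.execute(
    "SELECT name, version, origin, execution_count "
    "FROM skills ORDER BY execution_count DESC"
).fetchall()
for name, version, origin, count in rows:
    print(f"{name:32} v{version:4} {origin:10} {count}")
conn.close()
```

Sorting by `execution_count` surfaces the skills that are reused most, which are also the ones where a regression would hurt most.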
This is where OpenSpace gets interesting for teams and production systems.
Register at open-space.cloud to access:
Version lineage for each skill (document-gen-fallback has 13 versions)
Imagine your team has 10 agents running in production:
Network Effect Formula: More agents → More executions → More evolution data → Better skills → Lower costs → More agents.
The OpenSpace team showcased a personal behavior monitoring system built entirely by an agent:
This isn't a static dashboard. It includes a built-in AI agent that can:
Why This Matters: Traditional agent development requires humans to write skills, test them, deploy them. OpenSpace demonstrates that agents can autonomously develop complex systems, evolving skills as they encounter challenges.
OpenSpace works with any agent that supports the SKILL.md format. Here's how to integrate:
git clone https://github.com/HKUDS/OpenSpace.git
cd OpenSpace
pip install -e .
Pro Tip: Skip the 50MB assets/ folder for faster cloning:
git clone --filter=blob:none --sparse https://github.com/HKUDS/OpenSpace.git
cd OpenSpace
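# --no-cone enables pattern syntax; recent Git versions default to cone mode, which only accepts directory names
git sparse-checkout set --no-cone '/*' '!assets/'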
pip install -e .
For agents that support MCP (Model Context Protocol):
{
  "mcpServers": {
    "openspace": {
      "command": "openspace-mcp",
      "toolTimeout": 600,
      "env": {
        "OPENSPACE_HOST_SKILL_DIRS": "/path/to/your/agent/skills",
        "OPENSPACE_WORKSPACE": "/path/to/OpenSpace",
        "OPENSPACE_API_KEY": "sk-xxx (optional, for cloud)"
      }
    }
  }
}
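If you manage agent configs from scripts, the same entry can be generated rather than hand-edited. The sketch below is one way to do that; `build_mcp_config` is a hypothetical helper, and only the keys and environment variable names come from the config above. Where the resulting JSON needs to be saved depends on your agent.

```python
import json
from typing import Optional


def build_mcp_config(skill_dirs: str, workspace: str, api_key: Optional[str] = None) -> dict:
    """Assemble the `mcpServers` entry shown above; the API key is optional."""
    env = {
        "OPENSPACE_HOST_SKILL_DIRS": skill_dirs,
        "OPENSPACE_WORKSPACE": workspace,
    }
    if api_key:                      # leave the key out entirely for local-only use
        env["OPENSPACE_API_KEY"] = api_key
    return {
        "mcpServers": {
            "openspace": {
                "command": "openspace-mcp",
                "toolTimeout": 600,
                "env": env,
            }
        }
    }


config = build_mcp_config("/path/to/your/agent/skills", "/path/to/OpenSpace")
print(json.dumps(config, indent=2))
```

Omitting the key when it is not set avoids shipping a placeholder string such as `sk-xxx` into a live config.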
cp -r OpenSpace/openspace/host_skills/delegate-task/ /path/to/your/agent/skills/
cp -r OpenSpace/openspace/host_skills/skill-discovery/ /path/to/your/agent/skills/
These two skills teach your agent when and how to use OpenSpace—no additional prompting needed.
Register at open-space.cloud to get an OPENSPACE_API_KEY, then add it to your config. Without it, all local capabilities work normally.
| Good Fit | Why |
|---|---|
| High-volume repetitive tasks | Token savings compound quickly (45.9% reduction) |
| Multi-agent teams | Collective intelligence amplifies value |
| Long-running production systems | Skills improve over time, costs decrease |
| Complex workflows with failure modes | AUTO-FIX catches and repairs breaking changes |
| Cost-sensitive deployments | 4.2x income improvement changes unit economics |

| Poor Fit | Why |
|---|---|
| One-off experimental tasks | Overhead outweighs benefits |
| Simple, stateless queries | No reusable patterns to capture |
| Tight latency requirements | Skill search adds ~100-300ms overhead |
| Highly specialized domains | Community skills may not apply |
OpenSpace addresses the fundamental economic problem of AI agents: costs scale linearly with task complexity because every task starts from zero.
By treating skills as living entities that auto-repair, improve, and share knowledge, OpenSpace flips this model:
For teams running AI agents in production, OpenSpace isn't just a nice-to-have. It's the difference between agents that burn money and agents that generate profit.
Ready to try it?
See openspace/host_skills/README.md for integration guides.
The era of stateless, forgetful AI agents is ending. Welcome to self-evolving systems that learn from every task, share knowledge across the network, and deliver measurable economic returns.