Not a query tool. Not a RAG pipeline. The sandboxed execution environment where agents meet your data and return answers, not rows.
| Customer | Tier | Error Count |
|---|---|---|
| Acme Corp | Enterprise | 14,205 |
| GlobalTech | Enterprise | 8,192 |
Process 10M rows in Python, send the LLM 10 rows.
Most agents fail by swallowing 5,000 raw SQL rows. Strake lets them process data in Python where it lives, sending only the parsed results that matter.
```python
from strake.mcp import run_python

script = """
# 1. Query 10M rows instantly via DataFusion
df = strake.sql("SELECT * FROM user_events")

# 2. Aggregate in Python to prevent context bloat
summary = df.groupby('feature_flag')['latency'].median()

# 3. Print exactly what the LLM needs
print(summary.to_json())
"""

# Runs isolated with OS sandboxing or Firecracker VMs
result = await run_python(script)
print(result)
```
Notebook prototypes are easy, but production is hard. We built Strake to solve the four things that usually break agents.
Every agent execution runs inside strict native OS sandboxes for performance, or ephemeral MicroVMs for hardware-level isolation.
Powered by Apache Arrow & DataFusion. Query Postgres, Snowflake, and APIs simultaneously with Pushdown optimization.
Built for the Model Context Protocol. Your agents immediately discover your entire data catalog and schemas.
Strict read-only enforcement, dynamic Row-Level Security (RLS), and PII masking out of the box.
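To make the governance layer concrete, here is a minimal, self-contained sketch of how row-level security and PII masking can be layered over query results before anything reaches the model. The helper names (`apply_rls`, `mask_pii`) and the tenant/email fields are illustrative assumptions, not Strake's actual API.

```python
import re

# Hypothetical policy helpers -- illustrative only, not Strake's real API.

def apply_rls(rows, tenant_id):
    """Row-level security: keep only rows the requesting tenant may see."""
    return [r for r in rows if r["tenant_id"] == tenant_id]

# Simple email matcher used for masking; real PII detection is broader.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+")

def mask_pii(rows, fields=("email",)):
    """Redact email-like values in the named fields before they leave."""
    masked = []
    for r in rows:
        r = dict(r)  # copy so the source rows stay untouched
        for f in fields:
            if f in r:
                r[f] = EMAIL.sub("***@***", r[f])
        masked.append(r)
    return masked

rows = [
    {"tenant_id": "acme", "email": "ada@acme.example", "errors": 14205},
    {"tenant_id": "globaltech", "email": "bob@gt.example", "errors": 8192},
]
visible = mask_pii(apply_rls(rows, "acme"))
print(visible)  # only Acme's row survives, with the email redacted
```

The point is ordering: policies run inside the trusted boundary, so the agent's sandbox only ever observes already-filtered, already-masked rows.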
Traditional tools copy your data. Strake queries it where it lives.
Built for Engineers Shipping Agents to Production
Stop waiting for data pipelines. Strake lets you query any data source with standard SQL: locally in development, or at scale in production.
From zero to querying PostgreSQL + S3 in 5 minutes. No infrastructure required.
Manage 100 data sources as easily as editing a YAML file. Validate offline. Deploy with confidence.
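The "validate offline" workflow might look like the following minimal check, run before deploy. The field names and known source types here are assumptions for illustration; the real Strake config schema may differ.

```python
# Hypothetical offline validation of a Strake-style sources config.
# Field names and types are illustrative, not the actual schema.
REQUIRED = {"name", "type"}
KNOWN_TYPES = {"postgres", "snowflake", "s3", "http"}

def validate_sources(sources):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    seen = set()
    for i, src in enumerate(sources):
        missing = REQUIRED - src.keys()
        if missing:
            errors.append(f"source #{i}: missing {sorted(missing)}")
            continue
        if src["type"] not in KNOWN_TYPES:
            errors.append(f"{src['name']}: unknown type {src['type']!r}")
        if src["name"] in seen:
            errors.append(f"{src['name']}: duplicate name")
        seen.add(src["name"])
    return errors

config = [
    {"name": "crm", "type": "postgres", "dsn": "postgres://crm-prod/app"},
    {"name": "events", "type": "s3", "bucket": "events-prod"},
    {"name": "crm", "type": "mysql"},  # duplicate name, unknown type
]
for err in validate_sources(config):
    print(err)
```

Because validation needs no live connections, it can run in CI: a broken source definition fails the build instead of failing an agent at runtime.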
10M rows → pandas DataFrame in <1 second. Zero-copy via PyArrow. No serialization overhead.
Stop hard-coding database credentials and building brittle API wrappers. Strake gives agents a governed, sandboxed environment to explore and query your data estate safely.
Secure, observable, and isolated queries.
Brittle, opaque, and insecure connections.
Don't move petabytes of data just to filter it. Strake's optimizer pushes filters directly to the source, executing compute where the data lives. Get sub-second results on massive datasets.
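A toy sketch of what predicate pushdown buys you, under the assumption that plans can be represented as simple dicts (this is not Strake's optimizer, just the idea): the naive plan moves the whole table and filters locally, while the pushed-down plan rewrites the filter into the query the remote source executes.

```python
# Illustrative only: two query plans for the same request.

def naive_plan(table, predicate):
    """Anti-pattern: fetch every row, then filter in the client."""
    return {"remote_sql": f"SELECT * FROM {table}", "local_filter": predicate}

def pushdown_plan(table, predicate):
    """Pushed down: the source only ever returns matching rows."""
    return {
        "remote_sql": f"SELECT * FROM {table} WHERE {predicate}",
        "local_filter": None,
    }

print(naive_plan("user_events", "latency > 500")["remote_sql"])
print(pushdown_plan("user_events", "latency > 500")["remote_sql"])
```

With pushdown, the bytes crossing the network scale with the result size, not the table size, which is where the sub-second numbers on large datasets come from.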
Open-source core for developers. Enterprise governance for platforms.
Prod-ready core without enterprise governance or SSO.
For Production Workloads & Compliance
Everything you need to know about Strake.
Trino is for big company-wide dashboards. Strake is for the agent that needs to look up a customer ID in Postgres and match it against an error log in S3 right now. It's faster, lighter, and provides a hardware-isolated environment so your agents can actually run code safely.
It depends on your security vs. overhead needs. For absolute isolation, we orchestrate ephemeral Firecracker MicroVMs. If you're running internal tools and want sub-second cold starts, we default to strict native OS sandboxing (Landlock/Seatbelt).
Strake automatically maps your federated schema into the sandbox. If a column changes in Snowflake, your Python script in the sandbox sees the updated results immediately without needing to re-register tools or update prompts.
Yes. Strake follows the Model Context Protocol (MCP) standard. Any framework that supports MCP can call the run_python tool to execute analysis across your entire data stack.
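Under MCP, frameworks invoke tools via JSON-RPC 2.0 `tools/call` requests. A minimal sketch of the payload a client would send to invoke run_python follows; the method name and `{name, arguments}` shape come from the MCP specification, while the script content and `script` argument name are illustrative.

```python
import json

# Sketch of the JSON-RPC 2.0 message an MCP client sends to call a tool.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "run_python",
        "arguments": {
            # The argument key and script body are examples, not a contract.
            "script": "df = strake.sql('SELECT 1 AS ok')\nprint(df)"
        },
    },
}
print(json.dumps(request, indent=2))
```

In practice your framework's MCP client builds and sends this message for you; the sketch is only to show that nothing Strake-specific leaks into the wire protocol.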
Strake is built for self-hosting. Sandbox execution runs entirely on your own infrastructure within your VPC boundaries, ensuring your data never leaves your environment.
We whitelist standard data science libraries like pandas, numpy, and pyarrow by default. If you need something specialized, Enterprise teams can supply their own Firecracker rootfs image.