Projects

Real tools solving real problems. Each project starts as an experiment and evolves into something useful.

Live

EvalGate

Stop LLM regressions before they merge

The Problem

AI model regressions slip into production because there's no good way to test LLM outputs in CI.

The Solution

GitHub PR checks that validate your AI features before they merge. Deterministic tests, LLM-as-judge evals, zero infrastructure.

Try it GitHub

In Development

Claude Mirror

A mobile-friendly web interface for Claude Code CLI

The Problem

Claude Code runs in a terminal, which is great for focused work. But sometimes you need to step away—and Claude keeps working, needing permission or asking questions while you're not there.

The Solution

A real-time web interface to your Claude Code session. Watch Claude work, approve commands, answer questions, and send messages—all from your phone while you're away from your desk.

Coming soon

In Development

My Daily News

Personalized AI-generated audio news briefings

The Problem

Keeping up with news across multiple sources is time-consuming, and generic news feeds don't match your specific interests.

The Solution

Select your sources and topics, and receive a custom 15-minute audio briefing each day. Two AI hosts discuss the stories that matter to you.

Coming soon

Exploration

Agentic Application Firewall

Detect prompt injection and malicious data before it reaches your agent

The Problem

AI agents can't tell the difference between instructions and input, which makes them the new attack surface. Every document, email, or user message sent to an agent is a potential threat. Traditional security tools (WAFs, EDR) can't parse semantic attacks hidden in natural language.

The Solution

Agentic Application Firewall safely inspects data and documents the way an agent would, detecting and blocking threats before they reach your agentic application or LLM. Just as a next-gen firewall inspects and detonates executables, we run inputs through a virtualized environment and observe what they try to do. Higher latency, but the highest confidence.

Coming soon