Product case study

One person, a full iOS product — shipped fast, without cutting the corners that matter.

How Shiki went from a thesis to a TestFlight release: a study in the macro and micro decisions that kept a solo build unblocked, and the AI-augmented workflow that made it move like a team.


Shiki is a complete iOS social product: small accountability crews, daily non-negotiables, photo proofs, real-time focus sessions and chat, notifications, onboarding, profiles, widgets. One person built it, from concept to v1.0.2 on TestFlight in about three weeks, across a tight series of releases. The interesting part isn't the code. It's the decisions: what to build, what to refuse, and how to never let any one thing stall everything else.

~3 wks from concept to v1.0.2 on TestFlight
10 days to a live build on a real device
300+ tests green at ship
1 person, design → backend → ship

By the numbers

What three weeks of decisions looks like.

Day 4: a working solo tracker shipped (Milestone 1)
Day 5: a social MVP with live crews (Milestone 2)
Day 10: live on a real device via TestFlight
51 decisions locked into 6 specs before a line of the pre-launch UX overhaul's code
110 migrations · 158 backend-interface methods · 300+ tests, all green
~3.2B tokens across the build, 98% of them cache reads

“Measure twice, cut once.”

The carpenter's rule, applied to software — the specs are the measuring; the ship is the single, confident cut.


Macro decisions

The few bets that kept everything unblocked.

A solo build dies by a thousand stalls. A handful of upfront decisions made sure no single problem could ever halt the whole thing.

Scope

Decide what not to build.

The product thesis set hard nos — no public feed, no leaderboard, no gamification. A clear refusal is a product decision; it kept the surface small enough to actually finish.

Architecture

One swappable seam between the app and its backend.

Every screen talks to data through a single interface with a real implementation and a fake one. That one call meant features could be built, demoed, and tested in isolation — the UI never waited on the backend, and vice versa.
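A minimal Swift sketch of that kind of seam, assuming a synchronous API for brevity (a production app would more likely use `async`). `Backend`, `LiveBackend`, `FakeBackend`, `Crew`, and `CrewListModel` are hypothetical names; the case study doesn't show Shiki's actual interface, which had 158 methods rather than one.

```swift
// Hypothetical domain type, for illustration only.
struct Crew: Equatable {
    let id: String
    let name: String
}

enum BackendError: Error {
    case notImplemented
}

// The seam: screens depend on this protocol, never on a concrete backend.
// The protocol name and method are assumptions, not Shiki's actual interface.
protocol Backend {
    func crews(for userID: String) throws -> [Crew]
}

// Real implementation: would talk to the server. Stubbed here.
struct LiveBackend: Backend {
    func crews(for userID: String) throws -> [Crew] {
        // A real version would make a network request (and would likely be async).
        throw BackendError.notImplemented
    }
}

// Fake implementation: in-memory and deterministic, so no network is needed.
struct FakeBackend: Backend {
    var crewsByUser: [String: [Crew]] = [:]

    func crews(for userID: String) throws -> [Crew] {
        crewsByUser[userID] ?? []
    }
}

// A screen's model works with whichever implementation it is handed.
final class CrewListModel {
    private let backend: any Backend
    private(set) var crews: [Crew] = []

    init(backend: any Backend) {
        self.backend = backend
    }

    func load(userID: String) throws {
        crews = try backend.crews(for: userID)
    }
}
```

Swapping implementations is a one-argument change at the call site (`CrewListModel(backend: FakeBackend(...))` in previews and tests, `LiveBackend()` in the shipping app), which is what lets the UI and the backend move independently.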

Process

Decisions before code.

Every feature ran the same pipeline: brainstorm → a written spec → a plan → execution → recorded outcome. The pre-launch UX overhaul alone locked 51 decisions into 6 specs before a line of its code. Thinking is cheap and reversible; code is expensive — so the thrash happened on paper.
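One way to picture the pipeline is as an ordered, gated sequence: a stage runs only if every stage before it passed, so no code gets written against an unsettled spec. This is a minimal Swift sketch; `Stage`, `StageResult`, and `runPipeline` are hypothetical names, and in practice a "pass" was a human or agent judgment, not a function call.

```swift
// Stages from the pipeline described above; names are illustrative.
enum Stage: CaseIterable {
    case brainstorm, spec, plan, execute, record
}

// Outcome of one stage: either it passed, or it stopped the pipeline with a reason.
enum StageResult {
    case passed
    case failed(reason: String)
}

struct PipelineOutcome {
    var completed: [Stage] = []
    var stoppedAt: Stage? = nil
    var reason: String? = nil
}

// Run stages strictly in order and stop at the first failure, so a later stage
// (for example, `execute`) never runs against an unsettled spec or plan.
func runPipeline(_ run: (Stage) -> StageResult) -> PipelineOutcome {
    var outcome = PipelineOutcome()
    for stage in Stage.allCases {
        switch run(stage) {
        case .passed:
            outcome.completed.append(stage)
        case .failed(let reason):
            outcome.stoppedAt = stage
            outcome.reason = reason
            return outcome
        }
    }
    return outcome
}
```

A spec that fails review stops the run before `execute`, which is the "thrash on paper" idea in code form.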

Parallelism

Make the backend provable on its own.

The database and its access rules carried their own test suites: 110 migrations backed by hundreds of checks, all verifiable from the command line and fully independent of the Mac-only app build. The two halves advanced in parallel; neither was ever a bottleneck for the other.

Sequencing

Ship vertical slices; polish in dedicated passes.

Ship a real first release, then run focused polish phases. "Done and shippable" consistently beat "perfect and pending."


Micro decisions

How blockers actually died.

Mock first

Build the screen before the server exists.

A complete fake backend let entire flows get built and tested with zero network — design and logic decisions made in seconds instead of deploy cycles.
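To make the idea concrete, here is a hedged sketch of one flow tested against an in-memory fake, assuming a hypothetical photo-proof upload step. `ProofUploader`, `FakeProofUploader`, `SubmitProofModel`, and the error message are illustrative, not taken from Shiki.

```swift
// Hypothetical types for illustration; not Shiki's actual API.
struct Proof: Equatable {
    let taskID: String
    let photoName: String
}

enum UploadError: Error {
    case offline
}

protocol ProofUploader {
    func upload(_ proof: Proof) throws
}

// In-memory fake: records what was "uploaded" and can be told to fail,
// so both the happy path and the error path run with zero network.
final class FakeProofUploader: ProofUploader {
    var shouldFail = false
    private(set) var uploaded: [Proof] = []

    func upload(_ proof: Proof) throws {
        if shouldFail { throw UploadError.offline }
        uploaded.append(proof)
    }
}

// What the screen shows after a submit attempt.
enum SubmitState: Equatable {
    case idle
    case submitted
    case failed(message: String)
}

final class SubmitProofModel {
    private let uploader: any ProofUploader
    private(set) var state: SubmitState = .idle

    init(uploader: any ProofUploader) {
        self.uploader = uploader
    }

    func submit(_ proof: Proof) {
        do {
            try uploader.upload(proof)
            state = .submitted
        } catch {
            state = .failed(message: "Couldn't upload. Try again.")
        }
    }
}
```

Because the fake fails on demand, the error state can be designed and checked as quickly as the success state, without waiting on a server or a flaky connection.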

Test the invisible

Test the part that's easy to get subtly wrong.

Access rules — who can see and write what — were tested directly against the database, catching permission and privacy bugs long before they could reach a device.
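The habit that catches these bugs is asserting both sides of every rule: who may see or write something, and who must not. The sketch below models that in Swift under an assumed rule (a proof is readable by its author and their crew, writable only by its author). Shiki's actual rules lived in the database and were tested there, so `Membership`, `ProofRecord`, `canRead`, and `canWrite` are hypothetical stand-ins.

```swift
// Hypothetical in-memory model of crew membership: crewID -> member user IDs.
struct Membership {
    var members: [String: Set<String>]

    func isMember(_ userID: String, of crewID: String) -> Bool {
        members[crewID]?.contains(userID) ?? false
    }
}

struct ProofRecord {
    let authorID: String
    let crewID: String
}

// Assumed rule: a proof is readable by its author and by members of its crew.
func canRead(_ viewerID: String, _ proof: ProofRecord, in membership: Membership) -> Bool {
    viewerID == proof.authorID || membership.isMember(viewerID, of: proof.crewID)
}

// Assumed rule: only the author may write (edit or delete) a proof.
func canWrite(_ viewerID: String, _ proof: ProofRecord) -> Bool {
    viewerID == proof.authorID
}
```

The denied cases (an outsider reading, a crewmate writing) matter most: they are the checks that fail loudly when a rule is accidentally loosened.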

Design loop

Decide layout in a browser, not in a build.

"Which grouping, which spacing, which order" got answered with quick mockups and side-by-side options — picking a direction in minutes instead of building three of them.

Reframe

When something fights you, switch to the boring path.

A real-time transport that proved flaky wasn't debugged forever — it was swapped for the proven pattern already working elsewhere. A clever query that failed silently became two plain ones. Bias to reliable over impressive.

Tight loops

Iterate on the real thing.

Literal, fast feedback — "up three pixels," "fifteen-pixel gap" — on the actual surface beat abstract debate every time. Ship, look, nudge, repeat.

The force multiplier

An AI-augmented workflow — used as leverage, not autocomplete.

The velocity came from treating an AI coding agent like a senior pair: handed the map, the constraints, and a tight directive — then trusted to execute and verify. The structure around it is what made it reliable.

The mental model came from somewhere unexpected — autonomous robots. At CACMS, my research lab at UIUC, I learned dead reckoning: how a robot holds its course by tracking position from a known origin and heading, correcting against periodic fixes, instead of leaning on a constant external signal. Agentic coding is the same problem. Give the agent a known position (the compiled “State of the App”) and a heading (a tight spec), and let it dead-reckon through the work — course-correcting at checkpoints (tests, verification) rather than being steered keystroke by keystroke. That transfer — from autonomous navigation to autonomous building — was the starter the whole workflow grew from.
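For readers who haven't met dead reckoning, here is a minimal 2D sketch: integrate motion from a known position and heading, then nudge the estimate toward a fix when one arrives. The fixed-gain blend in `correct` is a simplifying assumption (real navigation stacks typically weight fixes by their uncertainty, as a Kalman filter does), and all names are illustrative.

```swift
import Foundation

// Minimal 2D dead reckoning: known origin + heading, corrected by periodic fixes.
struct DeadReckoner {
    var x: Double
    var y: Double
    var heading: Double  // radians, counterclockwise from the +x axis

    // Advance along the current heading by `distance` (e.g. speed * elapsed time).
    mutating func advance(by distance: Double) {
        x += distance * cos(heading)
        y += distance * sin(heading)
    }

    mutating func turn(by radians: Double) {
        heading += radians
    }

    // Periodic fix: pull the estimate part of the way toward the observed position.
    // gain = 0 ignores the fix; gain = 1 snaps to it.
    mutating func correct(towardX fixX: Double, y fixY: Double, gain: Double) {
        precondition((0...1).contains(gain), "gain must be in [0, 1]")
        x += gain * (fixX - x)
        y += gain * (fixY - y)
    }
}
```

In the analogy, the starting position and heading play the roles of the “State of the App” and the spec, and each `correct` call is a checkpoint such as a test run.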

Engine & plugins
Claude Code, driven with a structured skill suite — a brainstorm → spec pipeline, plan-writing and plan-execution, test-driven and systematic-debugging modes, parallel sub-agents for fan-out research, a semantic code-search index, framework-specific iOS/SwiftUI skill packs, and a browser “visual companion” for live mockups.
Prompt structure
A single compiled “State of the App” document, loaded every session so context is never re-derived. Code is the ground truth: the document is recompiled from it, never hand-edited. Per-feature specs and plans as durable artifacts. A persistent memory of decisions and gotchas. Directives kept tight and verifiable.
The loop
brainstorm (often visual) → spec → plan → execute → verify (automated tests on both halves) → commit → record the decision. Nothing gets built twice; nothing gets decided in the dark.
Evolved by use
The method tightened by killing what didn't work. A sub-agent once corrupted the build's code-signing config, so sub-agents were retired in favor of fully inline execution; open-ended design questions gave way to a default of one recommendation with a written rationale. Rules earned, not assumed.
Economics
Long, context-rich sessions on the most capable model, made affordable by aggressive prompt caching: across the whole build, ~3.2B tokens — 98% of them cache reads, for roughly $2.2K of total compute. High leverage per dollar, not waste.
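The headline numbers imply a blended cost per token, worked out below using only the figures just given. Each input is approximate, so the result is too, and it doesn't reflect any particular model's price list; the remaining 2% covers everything that wasn't a cache read.

```latex
\begin{aligned}
T_{\text{cache}} &= 0.98 \times 3.2\times10^{9} \approx 3.14\times10^{9}\ \text{tokens} \\
T_{\text{other}} &= 0.02 \times 3.2\times10^{9} = 6.4\times10^{7}\ \text{tokens} \\
\text{blended rate} &= \frac{\$2{,}200}{3.2\times10^{9}\ \text{tokens}} \approx \$0.69\ \text{per million tokens}
\end{aligned}
```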

Standing on two open-source stacks: Superpowers by obra (the Claude Code skill methodology behind the brainstorm → spec → execute loop, parallel sub-agents, and the visual companion) and Obsidian Second Brain by eugeniughelbur (the vault structure that holds the specs, decisions, and the compiled “State of the App”).

What it demonstrates

The skills underneath the shipping.

Product judgment

A sharp sense of what to build — and a sharper one of what to refuse. Scope as a feature.

Systems thinking

Two load-bearing decisions (the seam and the spec pipeline) that paid off across every feature that followed.

Velocity with a floor

Fast, but tested and reversible — shipping without accruing the debt that sinks the next sprint.

Leverage

An AI workflow run as a true force-multiplier: a structured, verifiable pipeline that turns one person into a shipping team.

"The goal was never to build software impressively — it was to make the right calls quickly, kill the blockers, and ship. Then do it again tomorrow."