I wanted a native app for location tracking. GPS and battery management do not work well in a PWA, especially on iOS.

I chose React Native to maximize code reuse.

The plan was to use the existing web app as the spec. Working code should be a perfect spec, right?

The shape that had to be taught

I thought I’d been clear to the agent: components should be thin, with shared TypeScript in a new package. When I looked at the first output, there was still a lot of JavaScript inside the components. I refined the skill to specify a React custom hook for shared code and spelled out the bigger moves that had to be made, like moving an entire directory of hooks into a shared package. The agents seem reluctant to make big sweeping changes, which, in my experience, they’re usually pretty good at if their mission is clear. After enough rounds of that, a shape started showing up that I really liked: component purely display, hook pulled in, light connective tissue between them. Web and native became two thin renderings of the same logic.
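That shape is easy to sketch in plain TypeScript. The names here are hypothetical, not my actual code; the point is that the logic is a pure module both platforms import, with the hook and component as thin wrappers around it:

```typescript
// Hypothetical sketch of the shared shape. The real code wraps this in a
// React custom hook; the rule itself is a plain module in the shared
// package so web and native both import it instead of reimplementing it.

export interface Fix {
  lat: number;
  lon: number;
  timestampMs: number;
}

// Pure connective tissue: decide whether a new GPS fix is worth keeping,
// dropping near-duplicate fixes to save battery and storage.
export function shouldKeepFix(
  last: Fix | undefined,
  next: Fix,
  minIntervalMs = 5_000
): boolean {
  if (!last) return true;
  return next.timestampMs - last.timestampMs >= minIntervalMs;
}

// apps/web and apps/native each add a thin hook (say, useTrackedFixes)
// and a display-only component on top of functions like this one.
```

The display components never see the rule; they only render whatever the hook hands them.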

The spec was not perfect upfront. It emerged from watching the agent.

Models: dollars versus attention

I had burned through my Cursor Ultra plan in a week on a different initiative and was reluctant to sustain that burn rate. So I switched to Composer 2.0, the cheaper model, figuring a well-written spec could compensate. The model would spin for a bit, come back with code, and I would say: you did not do this, this, and this. It would say, oh right, and fix it. Eventually. But the hand-holding never stopped. The cost moved from dollars to attention. I got fed up and went back to the more capable models.

Composer 2 was fine for a marketing site I built the same week. It fell apart on structural work: pulling a custom hook out of a component with a lot of state, getting the boundary right. That felt like a capability problem, not a spec problem.

Even GPT 5.5 was not perfect. It announced, “Oh yes, it’s ready to test,” and then a smoke test revealed major gaps: “Now I have the full picture, I see I didn’t implement XYZ.” It missed details like icons on tabs and deviated from the mobile designs. A few more iterations of pointing out differences and fixing them were required.

Testing followed the architecture

The hook pattern turned out to help testing too. Once logic lives in a hook, it is testable without standing up a native runtime. Unit tests for the business logic, component tests for the React layer, smoke tests on the actual app.
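As a sketch of that unit-test layer (function names are illustrative, not from my codebase): once something like distance math lives in the shared package, it runs under plain Node with no simulator in sight:

```typescript
// Hypothetical shared-package logic, testable without any native runtime.
// Standard haversine great-circle distance between GPS points.

export interface Point {
  lat: number;
  lon: number;
}

const EARTH_RADIUS_M = 6_371_000;

function haversineMeters(a: Point, b: Point): number {
  const rad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = rad(b.lat - a.lat);
  const dLon = rad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(a.lat)) * Math.cos(rad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// Sum the leg distances along a recorded track.
export function totalDistanceMeters(points: Point[]): number {
  let total = 0;
  for (let i = 1; i < points.length; i++) {
    total += haversineMeters(points[i - 1], points[i]);
  }
  return total;
}
```

A test for this is an ordinary Node assertion; only the thin component layer above it ever needs a React renderer or a device.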

What should stay the same, and what should not

Not everything should match between web and native. My admin console stayed web-only. Premium upgrades stayed web-only for now. Location tracking is the opposite: native will eventually do it better.

Other discoveries popped up. The original Google Maps integration was the web JavaScript API, which does not work on native. A small refactor pushed it server-side. The agents handled it.
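In sketch form, “pushed it server-side” means both clients call our own API route, which talks to Google’s REST endpoints with the key kept on the server. This is a hedged illustration: I’m assuming a geocoding call here, and `geocode` and its injectable `fetchFn` are my inventions for the example, not the actual refactor.

```typescript
// Hedged sketch of a server-side Maps helper. fetchFn is injectable so
// the logic is testable without network; the endpoint shown is Google's
// Geocoding REST API, used here as a stand-in for whatever Maps feature
// the real refactor moved.

type FetchLike = (url: string) => Promise<{ json(): Promise<unknown> }>;

export interface LatLng {
  lat: number;
  lng: number;
}

export async function geocode(
  address: string,
  apiKey: string,
  fetchFn: FetchLike
): Promise<LatLng | null> {
  const url =
    "https://maps.googleapis.com/maps/api/geocode/json" +
    "?address=" + encodeURIComponent(address) +
    "&key=" + apiKey; // the key stays server-side, never ships to clients
  const res = await fetchFn(url);
  const body = (await res.json()) as {
    results?: { geometry?: { location?: LatLng } }[];
  };
  return body.results?.[0]?.geometry?.location ?? null;
}
```

Because the native app now hits our endpoint instead of loading the web JavaScript API, the same call works from both platforms.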

Another fly in the ointment was the reality of clunky native emulators. It felt like I spent as much time installing and running the emulators as I did directing the agents. Expo Go was a big help, but there is only so much you can do to improve developer experience when it takes five minutes to launch a simulator.

What I am carrying forward

If I were doing it again: get the component shape right on a few screens first. Don’t go broad until the pattern is working deep. I’d put more effort into specifying the verification (e.g. invoking my UX and QA agents on each component) in the hope of finding fewer problems during my manual smoke tests.

The mobile-parity workflow generates mappings between each web component and its native peer: component names, where shared logic lives, deliberate variations (or none allowed), and tests. That should give the next agent what it needs to follow a change into native when the web component is upgraded.
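A mapping entry might look something like this; the field names are my guess at the shape, and the paths are made up for illustration, not taken from the actual template:

```typescript
// Hypothetical shape of one entry in the parity mapping.
export interface ParityEntry {
  webComponent: string;
  nativeComponent: string;
  sharedHook: string;    // where the shared logic lives
  variations: string[];  // deliberate differences; empty means exact parity
  tests: string[];
}

export const trackerCard: ParityEntry = {
  webComponent: "apps/web/src/TrackerCard.tsx",
  nativeComponent: "apps/native/src/TrackerCard.tsx",
  sharedHook: "packages/shared/src/useTracker.ts",
  variations: ["native uses a bottom sheet instead of a modal"],
  tests: ["packages/shared/src/useTracker.test.ts"],
};
```

An agent picking up a web change can read the entry, find the native peer and the shared hook, and know which differences are intentional rather than drift.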

Any UX change now starts on web. The workflow minimizes the effort to upgrade the native apps.

The mobile web was not a perfect spec. Changes were desirable, and the technical architecture needed more explicit guidance to round it out. Overall, AI drastically reduced the cost; it just required more babysitting than I had hoped, and I expect that to improve with model releases.

For the full checklist, mapping template, and workflow in a tool-agnostic form, see Mobile parity (from web).

Further reading