
Lessons Building with Claude Code

  • Writer: Muxin Li
  • 4 days ago
  • 4 min read

I've been using Claude Code to build an AI tool, and this is what I've learned the hard way.


I was using Visual Studio Code with Claude Code inside the IDE. Some of the UI terms here may not apply to you.


Always:

  • Make small changes and test that they work before moving on.

  • Commit after each task (think of it as auto-saving).


Pick a task you'll complete in this commit and write the commit message/task description ahead of time. Once it's done, commit and sync the changes.


This helps me avoid creating a long string of changes and never committing them. Smaller commits are more useful because they give you an 'undo': if things get hairy, you can just discard the change.
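A minimal sketch of that habit, assuming git is installed and you're inside a repo; the helper name and the example message are made up, not a Claude Code feature:

```python
# commit_task.py -- hypothetical helper for the "one task, one commit" habit.
import subprocess
import sys

def commit_task(message: str) -> None:
    """Stage everything, commit with the message you wrote up front, and sync."""
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)
    subprocess.run(["git", "push"], check=True)  # the "sync changes" step in VS Code terms

if __name__ == "__main__":
    # Write the task description before you start working, e.g.:
    #   python commit_task.py "Add retry logic to the ingest step"
    commit_task(sys.argv[1])
```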



When you've gotten a basic MVP to a point of quality that you're happy with, do these things next:

  • Set up your eval metrics.

  • Plan how you'll test everything.

  • Build out the testing instrumentation.

  • Create a golden dataset that you compare future changes against.

  • Set up benchmarks for your metrics to quantify what 'good' looks like.


Do these things and you'll be way ahead of the curve for everything that comes after.
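To make the eval side concrete, here's a minimal sketch in Python; the file name, the exact-match metric, and the 90% benchmark are placeholders, and run_pipeline stands in for whatever your tool actually does:

```python
# eval.py -- a hypothetical golden-dataset check; names and the threshold are placeholders.
import json

def run_pipeline(input_text: str) -> str:
    """Stand-in for your actual tool; replace with a real call."""
    raise NotImplementedError

def evaluate(golden_path: str = "golden_dataset.json", benchmark: float = 0.90) -> bool:
    """Run every golden case through the pipeline and compare against the benchmark."""
    with open(golden_path) as f:
        golden = json.load(f)  # e.g. [{"input": "...", "expected": "..."}, ...]

    correct = sum(run_pipeline(case["input"]) == case["expected"] for case in golden)
    accuracy = correct / len(golden)
    print(f"accuracy: {accuracy:.1%} (benchmark: {benchmark:.0%})")
    return accuracy >= benchmark
```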


The tricky part is knowing what 'good' looks like. Some of it was instinct: a sense of what an 'acceptable' level of accuracy is. The goal shouldn't be to hit 100% across all metrics (you're not building a model, after all; you're taking an existing one and defining which parameters matter for your product). This is the human part where you develop taste.



Create documentation of your plans and store it as actual files in your repo. If you rely only on Claude Code's own internal notes, you don't have easy access to them. Files in the repo let you pick up later when life happens, and the formatting makes them easier to read.


The AI doesn't always have the best ideas. It has surprised me once in a while, but more than half the time I use it to tell me what's in the codebase and to brainstorm a solution with it. There's a real danger of the AI just agreeing with anything I come up with and not thinking things all the way through.


Reading the documentation and its plans is the hard part. It doesn't always think everything all the way through, and it drifts from my original intent, just like human collaborators do. Keeping a system-level perspective is my job. So is knowing when to ask the AI for the information I need to check for gaps, and that's hard, too.


Create a mermaid flow diagram so you can see everything. I should have done this from the very beginning and updated it as I went along. Sadly, Claude isn't quite as detailed as I'd like it to be, and I want to see EVERYTHING. If I don't direct it to pull things out and put them on the board, it won't do it.
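As an illustration, even a tiny mermaid flow like this (the stage names are made up, not my actual pipeline) forces everything onto the board where you can see it:

```mermaid
flowchart TD
    ingest[Ingest documents] --> chunk[Chunk and embed]
    chunk --> retrieve[Retrieve context]
    retrieve --> generate[Generate answer]
    generate --> evals[Eval metrics vs. golden dataset]
```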



Keep requests to Claude small if you can. Claude will make (confident) mistakes as it makes updates, so you'll either have to break everything down into smaller tasks or debug (or both).


I can trust Claude to make updates to a small piece. What it doesn't do well is tracking down ALL the dependencies related to that change at the system level. If the impact is too many steps away from what it's focusing on, it tends to ignore it. A common problem is testing instrumentation that no longer matches the latest update.


Our definitions of 'done' conflict: Claude thinks it's done when the code compiles. I think it's done when the entire end-to-end pipeline is still in sync. In these cases, I end up digging deeper into the system myself and working out what the right fix is.
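A minimal sketch of what "the pipeline is still in sync" could look like as a test; the my_tool imports and function names are stand-ins for your own project, not my real code:

```python
# test_pipeline_smoke.py -- hypothetical end-to-end check; module and function names are stand-ins.
# "Done" means this passes, not just that the code compiles.
from my_tool import ingest, retrieve, generate  # assumed modules in your own project

def test_end_to_end_smoke():
    docs = ingest("tests/fixtures/sample_docs")                # upstream step
    context = retrieve(docs, query="What changed last week?")  # middle step
    answer = generate(context)                                 # downstream step
    assert answer, "pipeline produced an empty answer"         # the whole chain still hangs together
```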



Claude will sometimes follow its own logic of what it thinks is best and override, or make assumptions about, what you're asking it to do. It could be related to running Opus in Thinking mode; I still have a bad habit of leaving Thinking mode on after planning, or when I ask it to purely execute a task.


Between the models, building with Sonnet gives me more control because it's more likely to ask me questions. With Opus, if I'm not in the right mode, or if I didn't keep it restricted to Planning or Ask before Edits, it's more likely to slip away from me.


But the performance boost from running on Opus is worth it. It's a better thinker overall, which I need to make up for my lack of in-depth software expertise and my inability to read a codebase in seconds.


Still, 90% of the time I'm just debugging and checking its work.



Sometimes I am my own worst enemy. I want to run many things in parallel to get as much done as possible. Maybe that works on a team, where someone else has the depth of knowledge required to keep everything running smoothly, but when you're building by yourself, you ARE the human evaluator checking that everything runs smoothly.


If I have too many irons in the fire, I easily get lost from context switching. Then I do a terrible job debugging because I can't stay deep in one issue until it's resolved. I end up creating multiple piles of problems instead of one pile that I can work through and make progress on, bit by bit.


We're all human - our attention spans can only handle so much interruption. It's why Cal Newport writes books about needing focus and deep work.

