Darwin
Project Overview
Welcome to 2025, where we’re kicking things off with a little new year, new you energy. This week’s project: Darwin—as in natural [task] selection (see what I did there?).
Darwin enters an already overcrowded space of productivity tools, but after digging into the options out there, I realized that what I wanted didn’t quite exist. The goal? A tool that makes sorting through massive to-do lists easier—and dare I say, a little more fun too.
The resulting tool is linked from the button above!
Problem Statement
I wanted to design something simple—something that made sorting my to-do list easier and, if possible, a little fun. Like a lot of people, I come up with tasks far faster than I can complete them. The result? A never-ending migration from one to-do list tool to another, none of which actually help me tame the ever-swelling white whale of unfinished tasks.
I wasn’t looking to reinvent some massive, all-knowing meta-system to catalog every task with perfect metadata and global prioritization (still working on that). Instead, I wanted something I could actually use—something I could open on my phone, glance at quickly, and feel like I was making even a little progress.
That’s where Darwin comes in. Instead of drowning in dozens (or hundreds) of tasks, what if I could just compare two at a time? Which one is more important? Tap, decide, repeat. Simple. Efficient. Less overwhelming.
Objective
Goals for this project were as follows:
Build a usable app that could ingest a to-do list and enable a ranking algorithm to help prioritize tasks.
Get the app out in a functional state, including user sign-in flows, so people could actually try it out.
And—surprising myself here—the most exciting goal of all: this might be the first app I officially launch on the iOS App Store. That realization hit me mid-build, and now I’m really excited to see it through. It isn’t submitted yet. But 🔜.
Build Process
I’m getting better at this!
This time around, I kicked things off with a meta prompt—a structured way of dumping everything I wanted into one big, overarching request. It turns out, just vomiting out all my thoughts into a structured prompt helped keep things organized from the start. It still feels a little weird to use, but I went in feeling more organized, and I’ve added the method to my general workflow.
→ Check out the video from Lawton Solutions here for tips on using his “meta prompt”
From there, I worked closely with Claude, and at this point, I’m confident in saying: Claude is the best AI for coding (for me)—at least for the kind of tools I’ve been building. I went back and forth, answering a ton of different questions, refining ideas, and iterating along the way. By the end of the prompt process, I feel like we landed in a really good place.
Structural considerations
One of the coolest surprises in this build process? Claude now outputs Mermaid diagrams—which, fun fact, is also how I learned what a Mermaid diagram is. When we had built up all the logic and planning for Darwin, Claude generated a visual flowchart of how everything connected. Even though the AI didn’t necessarily need it, having this diagram ended up being incredibly helpful for me when debugging.
I’m a merMAN!
The diagram became my reference for how everything should be connected whenever I discussed the build with the AI.
I found myself constantly referring back to it—like when I’d tell the AI, “The ranking manager is supposed to be connecting back to Firebase to keep task rankings consistent and synced,” and I could actually see where things might be breaking down. It made troubleshooting more intuitive, gave me a clearer sense of the app’s moving parts, and—bonus—as far as I can tell, ChatGPT doesn’t have this feature yet.
UI Implementation
In my last post, I talked about some of the challenges with shadcn/UI, and while I’ve mostly figured those out, there are still a lot of lingering annoyances—especially when it comes to designing in v0.dev and then porting everything over to an IDE to actually work on the code.
There’s just something about this process that always feels trickier than it should. The UI components look fine when they come in, but the code rarely transfers cleanly—the AI agents have a really hard time placing everything just right. Things don’t quite align, spacing gets weird, and I end up spending way too much time debugging jiggling components.
I don’t have great news here yet—I still need to refine this part of my workflow. Right now, UI implementation is taking way longer than it should. But I have learned one useful thing: starting with UI elements upfront makes coding easier.
• It forces me to think structurally about how things should be implemented.
• It gives the AI a clearer picture of how components should connect.
So while I haven’t nailed down the perfect method, I at least know that getting the UI in place first does help in the long run.
A note on CSV uploads
This was my first time building a tool that needed to ingest data as part of its UI, and I’ve got to say—I was not expecting some of the debugging headaches that came with it. I don’t have any screenshots, but I spent way too much time staring at an empty task list, wondering why my imports weren’t working. No errors, no warnings—just…nothing.
Eventually, I realized the culprit: the CSV input couldn’t have empty rows at the bottom of the spreadsheet. This is such a classic operations-person mistake—something I’ve seen happen in a dozen different tools over the years. The funny thing is, I thought of it eventually; the AI never flagged it at all.
So, note to self (or anyone else reading this): When working with data ingestion, always check how your data is structured. If the computer isn’t chewing through it properly, the problem might be something as simple as an extra blank row messing things up.
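For what it’s worth, the fix on the code side is a one-liner. Here’s a minimal sketch of a blank-row-tolerant CSV ingest—the function name and the single-column task format are my illustration, not Darwin’s actual import code:

```typescript
// Sketch: parse pasted CSV text into task names, dropping blank rows.
// Assumes one task per line; parseTasks is a hypothetical name.
export function parseTasks(csvText: string): string[] {
  return csvText
    .split(/\r?\n/)                      // handle Windows and Unix line endings
    .map((line) => line.trim())
    .filter((line) => line.length > 0);  // skip the empty trailing rows that broke my imports
}
```

Filtering after trimming means rows containing only whitespace get dropped too, which covers the “empty-looking spreadsheet row” case.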
Building (and testing) a Ranking Algorithm
To make sure Darwin wasn’t just fun but actually useful, we needed to validate how efficiently it ranked tasks. The goal? Get to an accurate task order with the fewest possible swipes. To test this, we ran a simple experiment: a numbered list from 1 to 20. This gave us a clear way to measure accuracy—if the system was working correctly, it should return the numbers in perfect order with minimal comparisons.
But instead of manually swiping through endless test cases, I had Claude simulate the process. I fed it different list sizes and ranking algorithms, and it generated simulation scripts—then ran them directly in the browser. No need to fire up Python or manually crunch numbers. And the best part? Claude automatically visualized the trade-offs in a beautifully rendered graph, making it way easier to decide on the right approach.
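To give a concrete flavor of what those simulations look like, here’s a minimal sketch: it ranks a shuffled list with binary insertion, treats each comparison as one “swipe,” and counts how many swipes a perfect ordering takes. This is my reconstruction of the idea, not the scripts Claude generated:

```typescript
// Sketch: simulate swipe-based ranking with binary insertion.
// Each comparison stands in for one user swipe.
export function simulateRanking(items: number[]): { ranked: number[]; comparisons: number } {
  let comparisons = 0;
  const ranked: number[] = [];
  for (const item of items) {
    // Binary search for where this task belongs in the ranked-so-far list.
    let lo = 0;
    let hi = ranked.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      comparisons++; // one simulated swipe
      if (ranked[mid] < item) lo = mid + 1;
      else hi = mid;
    }
    ranked.splice(lo, 0, item);
  }
  return { ranked, comparisons };
}
```

Running this over shuffled 1-to-N lists gives you exactly the measurement described above: does the list come back in perfect order, and how many swipes did it cost?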
One metric we tracked was local accuracy: for each pair of consecutive items, is the higher-ranked item actually supposed to sit above the lower-ranked one?
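That metric is cheap to compute. A sketch, using numeric tasks where the number itself is the ground-truth rank (the function name is mine, not Darwin’s internals):

```typescript
// Sketch: local accuracy = fraction of consecutive pairs in the ranked
// list that are in the correct relative order.
export function localAccuracy(ranked: number[]): number {
  if (ranked.length < 2) return 1; // a 0- or 1-item list is trivially in order
  let correct = 0;
  for (let i = 0; i < ranked.length - 1; i++) {
    if (ranked[i] <= ranked[i + 1]) correct++; // this consecutive pair is ordered correctly
  }
  return correct / (ranked.length - 1);
}
```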
The final model takes a dynamic hybrid approach based on list size:
1. Small lists (under 25 items) use a straightforward pairwise ranking for quick sorting.
2. Mid-sized lists leverage a mix of chunked sorting (grouping 2-3 tasks at a time) and confidence scoring to reduce unnecessary comparisons.
3. Large lists introduce an adaptive algorithm that balances ranking efficiency while ensuring newly added tasks find their correct placement quickly.
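In code, the dispatch is just a size check. The 25-item threshold comes from the list above; the mid/large cutoff and the strategy names are placeholders I’ve made up for illustration:

```typescript
// Sketch of the size-based strategy dispatch described above.
// "chunked+confidence" and "adaptive" are placeholder labels, and the
// 100-item cutoff between mid and large lists is an assumed value.
type Strategy = "pairwise" | "chunked+confidence" | "adaptive";

export function pickStrategy(listSize: number): Strategy {
  if (listSize < 25) return "pairwise";             // small lists: straight pairwise ranking
  if (listSize <= 100) return "chunked+confidence"; // mid-sized: chunked sorting + confidence scoring
  return "adaptive";                                // large lists: adaptive placement of new tasks
}
```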
From our testing, the efficiency of the algorithm looks something like this:
• 10 tasks → ~25 comparisons for 80% accuracy
• 20 tasks → ~50 comparisons
• 50 tasks → ~100 comparisons
• 100 tasks → ~300 comparisons
The beauty of this system is that it continuously tracks accuracy levels, prioritizing the most impactful comparisons first. Users will even be able to tweak their desired accuracy threshold—higher accuracy requires more comparisons but results in a more reliable ranking (not quite ready for UI display at the time of this writing).
Seeing the list snap into place after just a handful of swipes? Very satisfying.
🔥Base
I had already started using Google Firebase for a separate side project (more on that later—consider it a little secret for now), so it made sense to use it for Darwin as well. Firebase handles storage for to-do lists, rankings, and all the backend essentials.
Setting it up? Ridiculously easy. Like many Google dev products, it throws a ton of overwhelming info at you at first, but once you get past the dashboard noise, it’s straightforward. Ask GPT how to set it up, and it’ll walk you through the basics: spin up a server, grab API keys, plug them in, and you’re good to go.
What did surprise me was how seamlessly Firebase handled Google Login. Since this was my first app that actually needed user accounts, I figured I’d have to set up some complicated authentication system. Turns out, it’s just a tab in Firebase, right below Firestore. Click a button, generate some snippets, and—boom—that familiar “Sign in with Google” prompt appeared like magic.
Except… it didn’t actually work.
No matter how much I iterated, I kept hitting a hanging authorization error. After fighting with it longer than I should have, I decided to punt—for now, login is just email-based authentication (enter your email, get a login link). Not perfect, but good enough to ship.
Oh, and a quick note: the email you get when signing in? Weird-looking. Apologies in advance. Future me will clean it up, but for now, function > aesthetics.
Mobile vs Desktop Design: Lessons Learned
From the start, I built Darwin with a mobile-first mindset. Even though I originally set it up in WebViews, I went straight to designing it inside a mobile container—because I knew I wanted it to be a proper app. What I didn’t think through was how I’d handle a desktop component (or whether I even needed one).
That meant I still had to make adjustments for desktop vs. mobile, and honestly? I’m sure there’s a better way to do this. I’ll have to think more carefully about it next time. Right now, desktop just renders the mobile version, but a better approach might be to make it exclusively mobile rather than forcing a mobile layout onto desktop.
I should have known better—after years of working in product at Uber, I know how painful it is to design across multiple surfaces. We dealt with endless issues keeping web versions of tools up-to-date, despite the fact that most users were on mobile. That created a ton of tech debt, and yet here I am, repeating the same mistake.
Lesson learned. Mobile-first, for real this time. Looking forward to going all-in on iOS architecture.
AI Hallucinations & Self-Inflicted Debugging Nightmares
One of the things I’ve learned from these projects is that deploying early—even when the code isn’t fully done—can be surprisingly useful. It forces all the configurations to be set up properly from the start, which makes incremental building much smoother.
So, I had already deployed Darwin and decided to check out the default deployed view, just to see how things were shaping up. Most of the core functionality was there, so I moved on to tweaking the UI and spent a frustrating hour and a half making adjustments that… weren’t doing anything. Now — to be clear, that didn’t stop my AI friends from attempting to fix the problems I was seeing, over and over and over. Hallucinations are real.
“I see the issue now”
—There was no issue. 🤦
Turns out, I was staring at the deployed production version—not my local or testing environment. Every single change I made? Completely invisible in the version I was looking at. I wasn’t looking at the live code; I was looking at a snapshot from the last time I committed and deployed.
So yeah, lesson learned: Check the right window before you waste an afternoon debugging changes that don’t exist. Also, I probably need better tooling or safeguards to prevent myself from making this mistake again.
To Demo, or not to Demo
Because Prometheus is all about sharing what I build, I wanted visitors to try it out without signing in—no friction, no commitment, just an easy way to tinker with the tool. So I set up a demo mode, where users could interact with a dummy to-do list and see the ranking system in action. In theory, this should have been easy: all I needed to do was load a set of predefined tasks and let them function like a normal list.
In practice? Not so much.
Since I’m working with AI-generated code, demo mode ended up breaking things in ways I didn’t anticipate. I basically had to rebuild all the functionality I had already debugged for the real app, meaning I hit a lot of the same errors twice. It still took less time than the original build, which is a win, but what I thought would take five minutes ended up eating over an hour—and still isn’t quite as polished as I want it to be. Talk about sunk cost fallacy… I got in too deep and just kept going, reworking everything until demo mode (fingers crossed) works. Honestly, at this point I’m not 100% sure.
The real challenge? Trying to make sure demo mode worked without breaking logic on the production side. This turned into a mini refactor that I didn’t expect—and next time, I need to bake this in from the start instead of treating it as an afterthought.
So yeah. Please just sign up for the real version. It’s cool. I promise.
Results & Lessons Learned
Project Status: Live / Moving to Mobile
The website version of the app is live and works pretty well—particularly if you install it to your home screen! I’ll be investing more time in this one, though, to monetize it and get it into the app store(s).
Lessons Learned
Build Once, Use Twice: The web vs. mobile challenge reminded me of an ethos we employed at Uber: use a framework that deploys seamlessly across multiple environments. Back then, solving this at scale took months (or years) of engineering work. Here? It took a few hours to iterate on, which is a win—but still a lesson I should have learned the first time around.
Don’t Underestimate User Authentication: I assumed adding login functionality would be a simple, bolt-on step. Spoiler: it wasn’t. I initially told the AI not to worry about authentication, thinking we’d just “add a login step later.” But when we finally did, every action—every task move, every update—had to be tied to a specific user, rather than a single test environment. That meant a lot of rewriting. Next time, I need to consider authentication architecture from the start.
The Pixel-Perfect Factor: Once you start thinking about App Store deployment, things start feeling real. I found myself caring a lot more about pixel-perfect UI, polished animations, and smooth interactions—things that don’t really matter in early-stage prototypes but become crucial for actual users. It’s not there yet, but it will be.
Every project in this series has been an evolution, and Darwin was no different (exaggerated wink)… These lessons will definitely shape what I build next.
Costs
No net new costs this week at all! Firebase does have paid service tiers, but they haven’t been necessary.
The Toolbox
Section coming soon! See the main build details for tools used.
#efficiencies
The main item this week was uncovering a little snippet of text that has been super helpful in “unsticking” the AI when it gets into endless feedback loops. I ran into a comment on Reddit that suggested using the following prompt to get around “stuck” AI agents:
Please do this using concise, focused, localized edits. Leverage or modify existing files and classes rather than adding new ones if possible. Make the changes in small chunks if possible. Notify me before making changes if you must add/delete large chunks of code.
When I deployed this method, the AI did a pretty good job of working its way around whatever issue it was stuck on. Not perfect by any means, but helpful!
Next Steps
As I mentioned at the top, Darwin is going to move to an app store near you. Hopefully soon!
There are still things to build, features to refine, and mobile versions to finalize, but I’m planning to launch and monetize this app. I’m working on marketing campaigns to get the word out, and honestly? I’m already using this tool myself—I imagine others will find it just as useful.
Feature Roadmap: A few things are already in the pipeline:
• Confidence Ratings: The backend logic is built to display how confident the algorithm is in the current list ranking.
• Guided Onboarding: A smoother first-time setup that walks users through adding tasks and understanding how the ranking works.
• Swipe Progress Meter: A tracker showing how many more swipes are needed before we reach a reliable task order.
Monetization Strategy: If we’re launching in the App Store, I need to figure out what people will actually pay for. A few ideas:
• AI-Assisted List Writing – Smart suggestions to help users quickly create tasks and categorize them.
• Preference Learning – AI that better understands your swipe inputs, adapting over time to refine rankings.
I have big plans, and I’m excited to see where this goes. If you have ideas for features you’d love to see in Darwin, let me know!
Appendix: How to Use Darwin
How It Works: Using Darwin to Take Control of Your To-Do List
Darwin is designed to make task prioritization simple, fast, and effective. Instead of juggling an ever-growing list, you’ll use a dynamic ranking system that keeps your most important tasks at the top—without the mental overhead of manually sorting everything. Here’s how I think about using it (but of course, I want feedback on how/what to build into it for a full release!).
Step 1: First Run – Set Up Your List
The first time you open Darwin, you’ll initialize your to-do list by:
• Manually adding tasks one by one.
• Bulk importing from a CSV if you already have a running list elsewhere (do this from a desktop browser).
• (Optional) Doing an initial sort/rank to get a rough prioritization.
Once your list is in, Darwin takes over.
Step 2: Daily/Continuous Task Flow
🔹 Adding New Tasks → Inbox
Anytime you think of something new, just add it to your inbox—no need to rank or organize it immediately. The inbox is your low-friction dumping ground for ideas, reminders, and must-dos.
🔹 Sorting & Prioritizing → Rank Tab (Gold Orb in the Middle)
When you have a spare moment or want to actively organize, tap the Rank tab and start swiping:
• Darwin will present two tasks at a time—you just swipe to choose which one is more important.
• The algorithm dynamically sorts your list in real time based on your choices.
• Occasionally, lower-priority tasks will resurface, so they don’t get completely buried.
🔹 Getting Things Done → Prioritized Tab
Once you’ve ranked your tasks, your Prioritized tab holds everything in order of importance.
• Work top to bottom—your most important tasks are always at the top.
• If something doesn’t feel relevant today, move it to the Icebox—a separate space for tasks you don’t want to delete but aren’t actively working on.
Step 3: Maintain the System (Without Extra Work)
• New tasks go to the Inbox.
• When you have time, swipe in the Rank tab.
• Execute from the Prioritized tab.
• Move low-priority items to the Icebox to keep your active list clean.
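For the curious, that loop can be sketched as a tiny state machine—the Task shape and function names here are illustrative, not Darwin’s actual data model:

```typescript
// Sketch: the Inbox → Prioritized → Icebox/Done flow as simple
// bucket transitions. All names here are hypothetical.
type Bucket = "inbox" | "prioritized" | "icebox" | "done";

interface Task {
  title: string;
  bucket: Bucket;
}

// After ranking swipes, a task graduates from the inbox to the ordered list.
export function rank(task: Task): Task {
  return { ...task, bucket: "prioritized" };
}

// Park a task you aren't actively working on, without deleting it.
export function icebox(task: Task): Task {
  return { ...task, bucket: "icebox" };
}

// Checking a task off moves it out of the active list entirely.
export function complete(task: Task): Task {
  return { ...task, bucket: "done" };
}
```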
That’s it. No manually dragging tasks around, no endless resorting—just a simple, natural selection process for your to-dos.
Stick with it, and your to-do list won’t just grow forever—it’ll stay in order. 💪