
Build a Tiny Internal Tool Before You Buy Another App

Leaf Lane Team
A lot of business automation does not start with a platform decision.

It starts with one annoying task that keeps coming back.

Someone exports a CSV from Stripe. Someone else downloads an accounting report. A folder fills up with files that need to be renamed before they can be sent to a client. A manager copies numbers from two systems into a weekly spreadsheet and checks whether anything looks off.

These tasks are often too small to justify a full software project, but too important to keep doing by hand forever. That is where a tiny internal tool can be the right first step.

The goal is not to build a polished app. The goal is to turn one repeated manual workflow into a small, understandable utility that runs locally, produces a useful output, and leaves the human decisions where they belong.

Start With the Repeated Task, Not the Tool

The best candidate is a task with a stable shape.

For example: reconcile two exports, compare a new price list against last month's file, check a folder for missing client documents, rename a batch of PDFs, flag invoices that do not match a purchase order, or create a weekly report from a known spreadsheet.

The task does not need to be glamorous. In fact, the boring tasks are often the best ones. They have clear inputs, clear outputs, and obvious pain when they go wrong.

A useful first prompt is specific about the work:

Build a small local tool for reconciling Stripe payouts against the accounting export in /path/to/accounting. It should accept two CSV files, match transactions by date and amount, flag unmatched rows, and generate a clean reconciliation report. Add a README with usage instructions and include a sample command. Run it against the sample files and show the output summary.

That prompt gives Codex something concrete to inspect and produce. It names the source files, the matching rules, the output, the documentation, and the proof that the tool was tested.

What Codex Needs to Inspect

A tiny internal tool becomes useful when Codex can see the real shape of the work.

That usually means giving it a small set of sample files, a description of the business rule, and permission to inspect the surrounding folder structure. If the task touches a repo, Codex can inspect existing scripts, package files, tests, and README patterns. If the task lives outside a repo, it can still work from sample CSVs, PDFs, spreadsheets, or local folders.

For a reconciliation workflow, the starting inputs might be:

- one Stripe payout export
- one accounting export
- a short note explaining which columns should match
- a few known examples of matched and unmatched transactions
- a preferred output format, such as HTML, CSV, or Markdown

The output should be something a business operator can actually review. A good first version might create a report with matched rows, unmatched Stripe rows, unmatched accounting rows, totals by category, and a short list of records that need human judgment.
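The matching step itself can be small. A minimal sketch using only the standard library, assuming both exports expose `date` and `amount` columns (real exports will need column mapping and amount normalization):

```python
import csv
from collections import Counter

def load_rows(path):
    """Read a CSV export into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def reconcile(stripe_rows, accounting_rows):
    """Match rows by (date, amount); return matched rows and both
    sets of leftovers for the report's exception sections."""
    def key(row):
        return (row["date"], row["amount"])

    # Count accounting rows per key so duplicates match one-for-one.
    acct_keys = Counter(key(r) for r in accounting_rows)
    matched, unmatched_stripe = [], []
    for row in stripe_rows:
        k = key(row)
        if acct_keys[k] > 0:
            acct_keys[k] -= 1
            matched.append(row)
        else:
            unmatched_stripe.append(row)

    # Whatever counts remain are accounting rows with no Stripe match.
    unmatched_acct = []
    for row in accounting_rows:
        k = key(row)
        if acct_keys[k] > 0:
            acct_keys[k] -= 1
            unmatched_acct.append(row)
    return matched, unmatched_stripe, unmatched_acct
```

The three returned lists map directly onto the report sections above: matched rows, unmatched Stripe rows, and unmatched accounting rows.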

The Human Approval Gates Matter

A tiny tool should reduce manual effort without quietly taking over decisions that still need review.

For finance workflows, the tool can match transactions and flag exceptions. It should not move money, edit the accounting system, or mark a reconciliation complete without approval.

For document workflows, it can propose file names and show a before-and-after table. It should not overwrite originals unless the user approves the rename plan or the tool creates a reversible backup.
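The propose-then-apply pattern for renames is easy to keep honest in code: one function computes the plan without touching anything, and a separate function applies it only after approval, backing up originals first. A sketch (the prefix-and-lowercase naming rule is a made-up example standing in for a real business convention):

```python
import os
import shutil

def build_rename_plan(folder, prefix):
    """Propose (old, new) name pairs without modifying any files.

    The naming rule here is a placeholder; a real tool would
    encode the team's actual convention.
    """
    plan = []
    for name in sorted(os.listdir(folder)):
        new_name = f"{prefix}_{name.lower()}"
        if new_name != name:
            plan.append((name, new_name))
    return plan

def apply_plan(folder, plan, backup_dir):
    """Apply an approved plan, copying originals to a backup folder
    first so the operation is reversible."""
    os.makedirs(backup_dir, exist_ok=True)
    for old, new in plan:
        shutil.copy2(os.path.join(folder, old), os.path.join(backup_dir, old))
        os.rename(os.path.join(folder, old), os.path.join(folder, new))
```

Printing the plan as a before-and-after table and waiting for a yes is the approval gate; `apply_plan` never runs on its own.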

For client packet workflows, it can check whether required files exist and assemble a draft folder. It should not send the packet to the client until a person reviews it.

These gates are not bureaucracy. They are what make a small tool trustworthy enough to use. The tool handles the repeatable checks; the person keeps the judgment, accountability, and client-facing decision.

What the First Version Should Include

A strong first version of a tiny internal tool usually includes five pieces.

First, a simple command or local page that runs the workflow. This might be a CLI command, a small script, or a local HTML dashboard.

Second, a README with exact usage instructions. A future user should not have to reconstruct how the tool works from memory.

Third, sample input files or fixtures. These make it possible to test the tool without touching live data.

Fourth, a clear output artifact: a reconciliation report, exception list, renamed-file preview, dashboard, or checklist.

Fifth, visible limitations. If the tool matches by date and amount only, say that. If it cannot handle refunds yet, say that. If it assumes a certain column name, document it.
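For the first piece, a simple command rarely needs more than argparse. A skeleton for a hypothetical reconcile CLI (the flags and defaults are illustrative, not a prescribed interface):

```python
import argparse

def build_parser():
    """Command-line surface for a hypothetical reconcile tool."""
    parser = argparse.ArgumentParser(
        prog="reconcile",
        description="Match a Stripe payout export against an accounting export.",
    )
    parser.add_argument("stripe_csv", help="path to the Stripe payout export")
    parser.add_argument("accounting_csv", help="path to the accounting export")
    parser.add_argument(
        "--out", default="reconciliation.md",
        help="where to write the report (default: reconciliation.md)",
    )
    parser.add_argument(
        "--dry-run", action="store_true",
        help="print the summary without writing the report file",
    )
    return parser
```

A sample command for the README might then be `python reconcile.py stripe.csv books.csv --dry-run`, which doubles as the test run the first version should document.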

This is where Codex can feel less like a chat assistant and more like an operations teammate. It can inspect files, write the utility, add usage notes, run it against samples, and show the output summary.

Why This Is Different From Buying Another App

Buying software can make sense when the workflow is broad, shared across many people, or connected to systems that need long-term support. But small manual tasks often sit between systems. They are too specific for a standard product and too narrow for a custom app proposal.

A tiny internal tool gives the team a way to learn before committing.

If the tool saves time, the business learns which rules matter. If it exposes messy source data, the business learns what needs to be fixed upstream. If the edge cases are too complex, the business learns that before paying for a larger implementation.

The first version is not the final system. It is a practical discovery step with immediate value.

How It Can Become a Skill

If the same tool proves useful more than once, the next step is to package the pattern.

OpenAI's Codex skills documentation describes skills as reusable workflows that can include instructions, resources, and optional scripts so Codex can follow a task-specific process reliably: https://developers.openai.com/codex/skills

For the reconciliation example, a skill could define which folders to inspect, which sample files to request, how to validate column names, what report format to produce, and which actions require approval. The script itself can live inside the skill, while the instructions explain how to run it safely.

That matters because the workflow stops living only in one person's memory. Instead of saying, "Can you do that CSV thing again?" the team can invoke a named workflow with known inputs, checks, and outputs.

How It Can Become an Automation

Some tiny tools should stay manual. Others become valuable when they run on a schedule.

OpenAI's Codex automations documentation explains that recurring tasks can run in the background, report findings to the inbox, and combine with skills for more complex work: https://developers.openai.com/codex/app/automations

That does not mean every script should become an unattended automation. The right automation is conditional.

For example, a weekly reconciliation automation might first check whether both source exports exist, whether they were updated recently, and whether the file format still matches expectations. If those checks fail, it should stop and report the blocker. If they pass, it can generate the report and flag exceptions for review.
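Those preflight checks are straightforward to express with the standard library. A sketch that returns blockers instead of proceeding (the seven-day freshness window is an arbitrary example, not a recommendation):

```python
import csv
import os
import time

def preflight(paths, required_columns, max_age_days=7):
    """Return a list of blockers; an empty list means safe to proceed."""
    blockers = []
    for path in paths:
        if not os.path.exists(path):
            blockers.append(f"missing export: {path}")
            continue
        # Flag exports that have not been refreshed recently.
        age_days = (time.time() - os.path.getmtime(path)) / 86400
        if age_days > max_age_days:
            blockers.append(f"stale export ({age_days:.0f} days old): {path}")
        # Check that the header row still has the expected columns.
        with open(path, newline="") as f:
            header = next(csv.reader(f), [])
        missing = set(required_columns) - set(header)
        if missing:
            blockers.append(f"unexpected format in {path}: missing {sorted(missing)}")
    return blockers
```

If `preflight` returns anything, the automation reports the blockers and stops; only an empty list lets it go on to generate the report.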

The automation should still avoid final actions that require judgment. It can prepare the work. It can highlight problems. It can draft the next step. The person still approves changes, sends client-facing material, or records the final financial decision.

A Practical Decision Rule

Consider a tiny internal tool when a task has all of these traits:

- it repeats often enough to be annoying
- the inputs are reasonably consistent
- the desired output is clear
- mistakes are visible and reviewable
- a person can approve the final action
- the team is not ready to buy or build a larger system

Skip the tool, or narrow the scope, when the rules are still unclear, the data is unreliable, or the workflow requires judgment on almost every row.

The useful middle ground is not "automate everything." It is "turn the repeatable part into a dependable artifact, then review the parts that still need a person."

What to Try This Week

Pick one repeated task that happens at least monthly. Find two or three real sample files. Write down the current manual steps. Then ask Codex to build the smallest local utility that can produce a reviewable output from those samples.

Do not ask for a platform. Do not start with a full app. Ask for the smallest working tool, a README, sample usage, and a test run.

If the tool works once, improve it. If it works repeatedly, package it as a skill. If it becomes routine and low-risk, consider an automation that prepares the report and brings exceptions back to a human.

That path is usually safer, cheaper, and clearer than jumping straight from manual work to a software purchase.