Takao Shibamoto

Software engineer working on large-scale distributed systems, big data, and AI infrastructure. Currently at Microsoft.

Pinned

Data infrastructure is the bottleneck for enterprise AI, not the models

There is a popular narrative right now that AI is bottlenecked by model capabilities — that we need bigger models, smarter agents, better reasoning. After several years of building data infrastructure in logistics and now enterprise AI, I have come to a different view: the real bottleneck is data infrastructure, and most companies will never get the full value of AI until they fix it.

The engineer's exit strategy and interview playbook

A junior engineer reached out to me recently. They were stuck in a job with bad management, mentally fried, and wanted out — but the idea of grinding interviews on top of a draining day job felt impossible. They asked how I balanced shipping work, prepping for interviews, and actually passing the loops. I’ve also sat on the other side of the table as an interviewer, so I have opinions about what works and what gets people rejected.

A simple investing strategy for beginners

My previous post ran the math on why investing matters. My full strategy is what I actually do, but I got feedback that it’s dense for someone just getting started. Fair. This is the simple version — concepts and behavior only, country-agnostic. For tickers and account names, the full post has them.

My investing strategy

My previous post showed the math: saving from a Seattle engineering income and investing at ~7% real returns reaches the rat-race escape zone (FIRE: financial independence, retire early) in about a decade. This is the operating manual that turns that math into a portfolio.

RAG in production is mostly retrieval, not generation

Many teams shipping retrieval-augmented generation today started from the same five-line tutorial: chunk your documents, embed them into a vector store, embed the user’s question, fetch the nearest hits, hand them to an LLM. It works beautifully in a notebook. It also has almost nothing to do with what a real RAG product looks like once it has users.

Why should we invest our money?

One reason: if you can see the world a little more broadly, calculate risk, and plan long-term, you can see that this rat race actually has an exit.

What is data engineering, and why does AI need it?

A friend recently read my last post on why data infrastructure is the real bottleneck for enterprise AI, and asked exactly the right question: what is a data pipeline, and why does AI need one in the first place? If you have used ChatGPT, Gemini, or Copilot but never built a model or a pipeline yourself, that question deserves a proper answer — and I realized I had never written one.

How to Host a Private Jekyll Source on GitHub Pages Without Paying for Pro

GitHub Pages on the Free plan only builds from public repos. To keep your source private, GitHub charges $4/month for Pro. Here’s how to get the same result for $0 by splitting source and built output into two repos.

Designing fairness-aware performance metrics for gig-economy workforces

Gig-economy platforms — Uber, Lyft, DoorDash, Instacart, Amazon DSP, and many smaller ones — share a deep operational problem: they need to evaluate the performance of large numbers of independent workers using objective data, but the data is full of factors the worker cannot control.

When to replace manual data pipelines with automated worker-based systems

A pattern I have seen in nearly every company I have worked at, regardless of size or sophistication: somewhere in the engineering org, there is a critical data pipeline that is “automated” only in the loosest sense. An engineer runs a script. The script does some things. Sometimes it works. Sometimes it produces malformed output and somebody has to spend two hours figuring out what went wrong, manually cleaning up, and re-running.
