Takao Shibamoto

Software engineer working on large-scale distributed systems, big data, and AI infrastructure. Currently at Microsoft.

Pinned

Data infrastructure is the bottleneck for enterprise AI, not the models

2026-05-05

There is a popular narrative right now that AI is bottlenecked by model capabilities — that we need bigger models, smarter agents, better reasoning. After several years of building data infrastructure in logistics and now enterprise AI, I have come to a different view: the real bottleneck is data infrastructure, and most companies will never get the full value of AI until they fix it.

The engineer's exit strategy and interview playbook

2026-06-08

A junior engineer reached out to me recently. They were stuck in a job with bad management, mentally fried, and wanted out — but the idea of grinding interviews on top of a draining day job felt impossible. They asked how I balanced shipping work, prepping for interviews, and actually passing the loops. I’ve also sat on the other side of the table as an interviewer, so I have opinions about what works and what gets people rejected.

A simple investing strategy for beginners

2026-06-04

My previous post ran the math on why investing matters. My full strategy post describes what I actually do, but I got feedback that it’s dense for someone just getting started. Fair. This is the simple version: concepts and behavior only, country-agnostic. The full post has the tickers and account names.

My investing strategy

2026-05-31

My previous post showed the math: a Seattle engineering income at ~7% real returns reaches the rat-race escape zone (FIRE) in about a decade. This is the operating manual that turns that math into a portfolio.

RAG in production is mostly retrieval, not generation

2026-05-25

Many teams shipping retrieval-augmented generation today started from the same five-line tutorial: chunk your documents, embed them into a vector store, embed the user’s question, fetch the nearest hits, hand them to an LLM. It works beautifully in a notebook. It also has almost nothing to do with what a real RAG product looks like once it has users.

Why should we invest our money?

2026-05-18

One reason — if you can see the world a little more broadly, can calculate risk, and can plan long-term, you can see that this rat race actually has an exit.

What is data engineering, and why does AI need it?

2026-05-14

A friend recently read my last post on why data infrastructure is the real bottleneck for enterprise AI, and asked exactly the right question: what is a data pipeline, and why does AI need one in the first place? If you have used ChatGPT, Gemini, or Copilot but never built a model or a pipeline yourself, that question deserves a proper answer — and I realized I had never written one.

How to Host a Private Jekyll Source on GitHub Pages Without Paying for Pro

2026-05-11

GitHub Pages on the Free plan only builds from public repos. Keeping your source private requires Pro, which costs $4/month. Here’s how to get the same result for $0 by splitting source and built output into two repos.

Designing fairness-aware performance metrics for gig-economy workforces

2026-05-02

Gig-economy platforms — Uber, Lyft, DoorDash, Instacart, Amazon DSP, and many smaller ones — share a deep operational problem: they need to evaluate the performance of large numbers of independent workers using objective data, but the data is full of factors the worker cannot control.

When to replace manual data pipelines with automated worker-based systems

2026-04-20

A pattern I have seen in nearly every company I have worked at, regardless of size or sophistication: somewhere in the engineering org, there is a critical data pipeline that is “automated” only in the loosest sense. An engineer runs a script. The script does some things. Sometimes it works. Sometimes it produces malformed output and somebody has to spend two hours figuring out what went wrong, manually cleaning up, and re-running.
