MLOps starter stack for a 5-person data team in 2026

The short answer: in 2026, a five-person data team graduating from notebooks needs five things and not much more — experiment tracking, a model registry, a way to serve models behind an API, a pipeline orchestrator, and basic monitoring. Use managed services for what you can, self-host only the experiment tracker, and skip the feature store and Kubeflow until you have at least 15 people.

This post walks through each component, names specific 2026 tools we’d actually pick, and gives a total monthly cost you can put in a budget: roughly $250–$900/month for a working stack. We also tell you what not to add yet, which is usually more valuable advice than the additions.

What we mean by “5-person data team”

The recommendations assume:

  • 3–5 people total: a mix of data scientists, ML engineers, and maybe one platform-leaning generalist.
  • 2–5 models in some form of production use today.
  • Models retrain on a schedule (daily, weekly, or on-demand) rather than continuously.
  • Mostly tabular / NLP / classical ML — the post applies to LLM-heavy teams too, but their stack picks differ; see the note at the end.
  • The team is fed up with notebooks-as-production, but doesn’t have a dedicated platform engineer.

The five components a starter stack actually needs

In rough order of how soon they hurt if you don’t have them:

  1. Experiment tracking
  2. Model registry
  3. Model serving
  4. Pipeline orchestration
  5. Monitoring

You can argue about the order. You can’t reasonably ship production ML without all five. Most teams we work with have one or two of these built, three of them held together with cron jobs, and one of them “coming next quarter.”

1. Experiment tracking

What it does: records every training run, its hyperparameters, its metrics, and the model artefact, so you can compare experiments and reproduce them six months later.

What we’d pick in 2026: MLflow, self-hosted on a small VM or in a container, with S3 (or equivalent object storage) for artefacts and Postgres for the metadata store. It’s open source, has eight years of community knowledge, integrates with everything, and the self-hosted setup is genuinely simple.

Alternatives:

  • Weights & Biases — better UI, polished product, but the per-seat pricing adds up. Free for small teams; from $200–$400/seat/month above that.
  • Neptune.ai — similar to W&B, friendly free tier, good for teams that don’t want to host anything.
  • Comet — the third major SaaS tracker, equivalent feature set.

Our default recommendation for a 5-person team: self-hosted MLflow. The tracker is something you’ll keep forever, and the lock-in cost of switching SaaS trackers (when their pricing changes) is high. Hosting MLflow is a one-day job and ~$30/month of infrastructure.
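
To make "logging every run" concrete, here is a minimal sketch of a logging helper. The tracking URL and the experiment name `churn-model` are placeholders we made up, not part of any real setup. The `mlflow` import is deferred so the flattening helper works even where MLflow isn't installed.

```python
from typing import Dict

TRACKING_URI = "http://mlflow.internal:5000"  # hypothetical; point at your own server


def flatten_params(params: dict, prefix: str = "") -> Dict[str, str]:
    """Flatten a nested config into dotted keys, since MLflow params are flat key/value pairs."""
    flat: Dict[str, str] = {}
    for key, value in params.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_params(value, prefix=f"{name}."))
        else:
            flat[name] = str(value)
    return flat


def log_training_run(config: dict, metrics: Dict[str, float],
                     experiment: str = "churn-model") -> str:
    """Log one training run (params and metrics) and return its run ID."""
    import mlflow  # deferred import: only needed when actually logging

    mlflow.set_tracking_uri(TRACKING_URI)
    mlflow.set_experiment(experiment)
    with mlflow.start_run() as run:
        mlflow.log_params(flatten_params(config))
        for name, value in metrics.items():
            mlflow.log_metric(name, value)
        return run.info.run_id
```

Calling `log_training_run({"model": {"depth": 6}, "seed": 42}, {"auc": 0.91})` at the end of a training script would be enough to get the run into the tracker.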

2. Model registry

What it does: tracks which model versions exist, which is in production, which is in staging, and provides a clean way to promote between them. Critically: the registry is what your deploy pipeline pulls from. Without it, “which model is in prod?” has no canonical answer.

What we’d pick in 2026: MLflow Model Registry, which comes free with the MLflow you already deployed for experiment tracking. One tool, one set of credentials, one UI.

Alternatives: SageMaker Model Registry, Vertex AI Model Registry, Azure ML Model Registry. All are fine and integrate with their respective cloud’s deployment services. Pick one of these if you’re already heavily invested in one cloud’s ML stack.

What we’d skip: standalone model registry products. Not worth a second tool at this stage.
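
As a rough sketch of "the registry is what your deploy pipeline pulls from": recent MLflow versions let you attach an alias to a model version and load by that alias. The alias name `production` and the model name in the usage note are our assumptions; check which registry features your MLflow version supports.

```python
def model_uri(name: str, alias: str = "production") -> str:
    """Build a registry URI that resolves to whichever version currently holds the alias."""
    return f"models:/{name}@{alias}"


def promote(name: str, version: int, alias: str = "production") -> None:
    """Point the alias at a specific registered version (the 'promotion' step)."""
    from mlflow.tracking import MlflowClient  # deferred import

    MlflowClient().set_registered_model_alias(name, alias, str(version))


def load_production_model(name: str):
    """Load whatever version is currently promoted; the deploy pipeline only calls this."""
    import mlflow.pyfunc  # deferred import

    return mlflow.pyfunc.load_model(model_uri(name))
```

With this shape, `promote("churn-model", 7)` is the only thing that changes what production serves, which gives "which model is in prod?" one canonical answer.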

3. Model serving

This is the area where teams over-build the hardest. The 2026 reality is that for almost every small-team use case, the right serving layer is a containerized FastAPI app behind your existing platform.

What we’d pick in 2026:

  • For most cases: a small FastAPI app that loads the model from MLflow or S3 at startup, deployed as a container on whatever platform your engineering team uses (Kubernetes, ECS, Cloud Run, App Runner). Boring, scalable, debuggable.
  • For batch inference: a job triggered by your orchestrator, writing predictions to a database or warehouse table. No serving layer at all — if downstream consumers read from a table, that’s fine.
  • For high-throughput / GPU inference: consider BentoML for the developer ergonomics, or NVIDIA Triton Inference Server if you need raw throughput. Only do this if you actually have the load.

What we’d skip:

  • Seldon Core and KServe (formerly KFServing, the serving component of Kubeflow). Powerful platforms with significant operational overhead. A 5-person team can’t justify them yet.
  • Custom inference servers. Don’t write your own. FastAPI exists.

The honest fact: we’ve never seen a small team regret picking the boring containerized FastAPI approach. We’ve seen many regret picking Seldon, KServe, or a Kubeflow-based stack at 5 people and then spending the next year operating it instead of training models.
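
For reference, the "boring" serving app can be about this small. This is a sketch under assumptions: the request shape (`rows` as a list of feature dicts), the endpoint paths, and the idea of one numeric prediction per row are ours. Framework imports are deferred into the factory so the prediction wrapper can be unit-tested on its own.

```python
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, float]]


def make_predict_fn(predict_rows: Callable[[Rows], Any],
                    model_version: str) -> Callable[[Rows], Dict[str, Any]]:
    """Wrap a row-level predictor so the HTTP layer stays thin and testable."""
    def predict(rows: Rows) -> Dict[str, Any]:
        preds = predict_rows(rows)
        # Assumes one numeric prediction per input row.
        return {"model_version": model_version,
                "predictions": [float(p) for p in preds]}
    return predict


def create_app(model_uri: str, model_version: str):
    """Build the FastAPI app; the model is loaded once, when the app is created."""
    import mlflow.pyfunc
    import pandas as pd
    from fastapi import FastAPI
    from pydantic import BaseModel

    class PredictRequest(BaseModel):
        rows: List[Dict[str, float]]

    model = mlflow.pyfunc.load_model(model_uri)
    predict = make_predict_fn(lambda rows: model.predict(pd.DataFrame(rows)),
                              model_version)
    app = FastAPI()

    @app.get("/healthz")
    def healthz():
        return {"status": "ok", "model_version": model_version}

    @app.post("/predict")
    def predict_endpoint(req: PredictRequest):
        return predict(req.rows)

    return app
```

In a real service you would call `create_app(...)` once at module level and point your ASGI server at the result. Returning the model version with every response makes the prediction logging described later much easier.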

4. Pipeline orchestration

What it does: runs your data pipelines and training jobs on a schedule, with retries, alerts, and a UI to see what’s running.

What we’d pick in 2026: Prefect. Specifically Prefect’s managed cloud (free tier covers small teams) with self-hosted workers running in your cluster or on a worker VM.

Why not Airflow? Airflow is still the most widely deployed orchestrator, and it’s a perfectly defensible choice if your engineering org already runs it. But for a 5-person team starting fresh in 2026, Prefect’s developer experience is genuinely better, the cloud control plane removes a meaningful chunk of operational work, and its flows-as-plain-Python model (decorated functions rather than a separate DAG definition) fits how data scientists already write code.

Alternatives:

  • Dagster — an excellent third choice. We’d pick it over Airflow for any team without legacy Airflow code.
  • GitHub Actions / GitLab CI — under-rated for genuinely simple cases. If your “pipeline” is “run this Python script once a week,” a scheduled CI job is fine, free, and one less thing to maintain.
  • AWS Step Functions, Azure Data Factory, GCP Cloud Composer. Fine if you’re committed to one cloud and want managed.

What we’d skip: Kubeflow Pipelines. Heavy. Built for organisations with platform teams. Not for 5 people.
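
To show how little ceremony a Prefect flow needs, here is a sketch of a retraining flow. The step functions are placeholders we invented; the flow name and retry settings are illustrative. Prefect is imported inside `build_flow` so the plain functions remain testable without it.

```python
from datetime import date
from typing import Dict


def load_training_data(run_date: date) -> Dict[str, str]:
    """Placeholder: a real version would read that day's partition from storage."""
    return {"partition": f"dt={run_date.isoformat()}"}


def train_model(data: Dict[str, str]) -> Dict[str, str]:
    """Placeholder: a real version would fit the model and log the run."""
    return {"trained_on": data["partition"], "status": "ok"}


def build_flow():
    """Wrap the plain functions as Prefect tasks and compose them into a flow."""
    from prefect import flow, task  # deferred import

    # Retries apply to the flaky step (data loading), not to training itself.
    load = task(retries=2, retry_delay_seconds=60)(load_training_data)
    train = task(train_model)

    @flow(name="weekly-retrain")
    def weekly_retrain(run_date: date):
        data = load(run_date)
        return train(data)

    return weekly_retrain
```

Keeping the steps as ordinary functions means you can run and test them in a notebook first, then attach a schedule through whichever deployment mechanism your Prefect version provides.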

5. Monitoring

The most under-built component on every small team’s stack. There are two things to monitor and they’re both important:

System-level monitoring

Is the serving endpoint up? What’s the latency? Is it erroring? This is just service monitoring — use whatever your engineering team already uses (Datadog, Prometheus + Grafana, CloudWatch, etc.). If they don’t have anything, set up Prometheus + Grafana. Don’t reinvent.

Model-level monitoring

Is the model still doing its job? Has the input data drifted? Has accuracy regressed? This is the part most teams skip and pay for later.

What we’d pick in 2026: for a 5-person team, the minimum viable monitoring is:

  • Log every prediction to a table (model version, input features, prediction, optional ground-truth-later).
  • Run a daily job that computes summary statistics: prediction distribution, feature distribution, accuracy when ground truth is available.
  • Compare each day’s stats to a rolling baseline. Alert to Slack if anything drifts past a threshold.

That’s it. You can build it in Prefect (or wherever your orchestrator lives) in a couple of days. It catches 80% of what fancier tools catch, and it produces logs you can debug.
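
The three steps above can be sketched with nothing but the standard library. Everything specific here is an assumption for illustration: SQLite stands in for your warehouse, the table layout is ours, and the drift test is a simple z-score of today's mean prediction against the baseline days, which is one reasonable choice among several. Posting the alert to Slack is left out.

```python
import sqlite3
import statistics
from typing import List, Optional, Tuple


def init_log(conn: sqlite3.Connection) -> None:
    """Create the prediction log table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS prediction_log (
               day TEXT, model_version TEXT, features TEXT,
               prediction REAL, ground_truth REAL)"""
    )


def log_prediction(conn: sqlite3.Connection, day: str, model_version: str,
                   features_json: str, prediction: float,
                   ground_truth: Optional[float] = None) -> None:
    """Append one prediction; ground truth can be filled in later."""
    conn.execute("INSERT INTO prediction_log VALUES (?, ?, ?, ?, ?)",
                 (day, model_version, features_json, prediction, ground_truth))


def daily_means(conn: sqlite3.Connection) -> List[Tuple[str, float]]:
    """Summary statistic per day: here, just the mean prediction."""
    return conn.execute(
        "SELECT day, AVG(prediction) FROM prediction_log GROUP BY day ORDER BY day"
    ).fetchall()


def drift_zscore(baseline: List[float], today: float) -> float:
    """How many baseline standard deviations today's value sits from the baseline mean."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)  # needs at least two baseline days
    if stdev == 0:
        return 0.0 if today == mean else float("inf")
    return (today - mean) / stdev


def should_alert(baseline: List[float], today: float, threshold: float = 3.0) -> bool:
    """True when today's statistic drifts past the threshold in either direction."""
    return abs(drift_zscore(baseline, today)) > threshold
```

The daily job then reads `daily_means`, takes the last N days as the baseline, and calls `should_alert` on today's value. The same pattern extends to per-feature means or to accuracy once ground truth arrives.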

Off-the-shelf alternatives: Evidently AI (open source, good for drift dashboards), WhyLabs (SaaS, ML observability), Arize AI (SaaS, model monitoring at scale). Worth adopting once you have more than 5–10 models or genuine drift-sensitive use cases.

What we’d explicitly skip at this stage

The list that saves most teams the most time:

  • Feature stores (Feast, Tecton, Hopsworks). Wonderful for teams with many features shared across many models and online serving requirements. Massive overkill for 5 people with 3 models. Build a feature store when you have the second model that wants to reuse the first model’s features — not before.
  • Kubeflow. An impressive platform built for organisations that need full ML platform abstraction across many teams. Not for you yet. You can run the components you need (training, serving) without buying the whole platform.
  • Data versioning (DVC, lakeFS). Useful, but later. For now, version your data by date partition in S3 and log the partition in your MLflow run. That’s 90% of the value.
  • A custom internal ML platform. The most expensive mistake at this stage. You will be tempted to abstract over MLflow + Prefect + FastAPI into “our ML platform.” Don’t. The day you have 15 people, you’ll know what to abstract; today you don’t.
  • Multi-cloud anything. Pick a cloud. Use its primitives. Worry about portability when someone with budget authority is genuinely asking about it.
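
For the data-versioning bullet above, "version by date partition and log the partition" can be as small as the sketch below. The bucket and dataset names in the usage note, the `dt=` prefix convention, and the parameter name `data_partition` are our assumptions.

```python
from datetime import date


def partition_uri(bucket: str, dataset: str, day: date) -> str:
    """Build the object-storage prefix for one day's snapshot of a dataset."""
    return f"s3://{bucket}/{dataset}/dt={day.isoformat()}/"


def log_data_partition(uri: str) -> None:
    """Record which partition a run trained on; call this inside an active MLflow run."""
    import mlflow  # deferred import

    mlflow.log_param("data_partition", uri)
```

For example, `partition_uri("ml-data", "transactions", date(2026, 3, 1))` gives `s3://ml-data/transactions/dt=2026-03-01/`. Once that string is attached to every run, you can trace any model back to the data it saw.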

The full stack and what it costs

Here’s what the recommendation looks like as a concrete stack, with rough monthly costs for a 5-person team in 2026:

  • Experiment tracking & registry: self-hosted MLflow on a small VM ($25) + S3 storage ($10–$50) + Postgres ($25 for a managed small instance) — ~$60–$100/month
  • Model serving: 1–2 FastAPI containers on your existing platform (Kubernetes, Cloud Run, ECS) — ~$50–$200/month depending on load
  • Pipeline orchestration: Prefect Cloud free tier + self-hosted workers on a small VM ($30) — ~$30/month
  • Monitoring: Prometheus + Grafana (free if self-hosted on existing platform) + a tiny SQL warehouse for prediction logs (~$50/month) — ~$50/month
  • Data storage: S3 for features, datasets, model artefacts — ~$50–$200/month for a small team
  • Training compute: highly variable. For most tabular teams, $50–$300/month. If you’re training transformers on GPUs, this dominates everything else.
  • Engineer time to keep it running: roughly 5–10% of one engineer’s time once steady-state.

Total fixed monthly cost (excluding training compute and data storage): the line items above sum to about $190–$380, so budget roughly $250–$500/month to leave headroom. Realistic all-in: $400–$900/month for a 5-person team with a moderate training workload.
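
If you want to check the budget against your own numbers, the arithmetic is easy to reproduce. The ranges below are copied from the line items in this section; the dictionary names are ours. The raw sums come out a little below the rounded totals quoted above, which leaves some headroom.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]  # (low, high) in USD per month

FIXED: Dict[str, Range] = {
    "mlflow (VM + S3 + Postgres)": (60, 100),
    "model serving": (50, 200),
    "orchestration": (30, 30),
    "monitoring": (50, 50),
}
VARIABLE: Dict[str, Range] = {
    "data storage": (50, 200),
    "training compute (tabular)": (50, 300),
}


def total(items: Dict[str, Range]) -> Range:
    """Sum the low ends and the high ends separately."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high


fixed = total(FIXED)
variable = total(VARIABLE)
all_in = (fixed[0] + variable[0], fixed[1] + variable[1])

print(fixed)   # (190, 380)
print(all_in)  # (290, 880)
```

Swap in your own ranges, especially for serving and training compute, since those two lines vary the most between teams.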

Compared to fully-managed alternatives like Databricks or SageMaker Studio, this stack is, in our experience, roughly half the cost and more flexible, at the price of slightly more operational ownership.

How to actually build this in 4 weeks

The order we’d follow:

  1. Week 1: Stand up MLflow. Move your last training run into it. Make the team commit to logging every run from now on.
  2. Week 2: Build one production-serving FastAPI service that pulls a model from MLflow. Deploy it to wherever your engineers deploy services. Stop deploying models any other way.
  3. Week 3: Move one scheduled training job into Prefect. Pick the one you currently run via cron on someone’s laptop.
  4. Week 4: Add prediction logging to your FastAPI serving service. Write the daily drift-check job in Prefect. Wire it to Slack.

By the end of month one, you have a working MLOps stack. It won’t be impressive on a conference slide. It will be reliable, debuggable, and cheap.

Migration path as you grow

At each rough team size, you’ll feel pressure to add something. Here’s when the additions actually start paying for themselves:

  • 10 engineers, ~10 models: add Evidently for proper drift dashboards. Consider Dagster if Prefect is hitting limits.
  • 15 engineers, models sharing features: a feature store starts paying off — Feast self-hosted is the cheapest entry point.
  • 20+ engineers, GPU training: consider Ray or Kubeflow’s training operators for distributed training, with Kubeflow Pipelines to orchestrate them, but get a platform engineer first.
  • 30+ engineers, multiple teams: build the internal platform you were resisting at 5 people. You now know what to abstract.

A note on LLM-heavy teams

If your team is building products around LLMs rather than training models from scratch, the stack shifts:

  • Experiment tracking: still useful, but it tracks prompts, configs, and eval scores rather than model weights. MLflow works; Langfuse or Helicone are LLM-native alternatives.
  • Model registry: less central. Your “model” is often a prompt + a config + an upstream API.
  • Serving: mostly a thin wrapper around OpenAI/Anthropic/your self-hosted inference endpoint. FastAPI is still the right answer.
  • Orchestration: the same orchestrator works for eval runs, batch processing, ingestion.
  • Monitoring: evals replace traditional drift. Run your eval set against production prompts daily.
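
As a sketch of what "evals replace drift" might look like in the daily job: run a fixed eval set through the production prompt and compare the pass rate to a baseline. The substring check, the `call_model` callable, and the tolerance value are all stand-ins we chose for illustration; real evals usually need richer scoring.

```python
from typing import Callable, List, Tuple

EvalCase = Tuple[str, str]  # (input prompt, substring expected in the answer)


def run_evals(call_model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of cases whose output contains the expected substring."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(
        1 for prompt, expected in cases
        if expected.lower() in call_model(prompt).lower()
    )
    return passed / len(cases)


def regression_detected(pass_rate: float, baseline_rate: float,
                        tolerance: float = 0.05) -> bool:
    """True when today's pass rate falls more than `tolerance` below the baseline."""
    return pass_rate < baseline_rate - tolerance
```

Here `call_model` would wrap your production prompt plus the upstream API call, so the eval exercises the same path that users hit.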

For LLM-heavy teams, we’ll write a dedicated post. The fundamentals stay the same; the tools differ.

Closing thought

The five-person MLOps stack in 2026 is unglamorous. It’s MLflow, FastAPI, Prefect, S3, and a SQL warehouse you already had. It costs less than $1k a month. It’s built in four weeks. It will carry a team to roughly 15 engineers without rewriting.

What it isn’t is impressive. There’s no Kubeflow, no feature store, no custom platform. That’s a feature, not a gap. A small team’s leverage comes from shipping models, not from operating ML infrastructure. The minimum viable stack is the one that lets you do the first while doing as little of the second as possible.

Standing up your first MLOps stack?

We do this end-to-end — pick the right components, build them, and hand them over with documentation your team can run. Book a free 20-minute call to talk it through.
