My Homelab Was a Mess. Here's How I Fixed It with Code.

Every homelab story starts the same. You fire up one little app, just to see if you can. Next thing you know, you’ve got a dozen services, a spiderweb of docker-compose.yml files, and you’re basically just hoping nothing breaks. That was me. It worked, but it was fragile. Every update felt like a risk, and the thought of a server failure loomed at the back of my mind.

I needed a better way to manage the complexity. That’s when I turned to Infrastructure as Code with OpenTofu, the open-source fork of Terraform.

From Chaos to Code: Building a Foundation

The goal was to define my entire homelab in code. Well, most of it. I wanted to be able to just nuke the whole thing and bring it back with one command. To get there, I broke the configuration into reusable modules.

I ended up sorting things into layers:

  • globals: This is for the stuff that never changes. User IDs, Cloudflare domain, that kind of thing. Set it and forget it.
  • networking: This layer sets up the Docker networks and reverse proxies that let my services talk to each other and the outside world.
  • services-generic: The shared building blocks. I defined a docker-service module that knows how to run a standard Docker container. It’s the blueprint for almost every service in my lab.
  • services-apps: The individual modules for each application. Each app has its own module, and they all wrap the generic docker-service module (there’s a sketch of this right after the directory tree below).

The structure looks something like this:

homelab/
├── main.tf       # The main file that ties everything together
├── services/     # Where I enable/disable the services I want to run
└── modules/
    ├── 00-globals/
    ├── 01-networking/
    ├── 10-services-generic/
    └── 20-services-apps/
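
As a taste of how the layers fit together, here’s a hedged sketch of an app module delegating to the generic blueprint. The variable names and image are illustrative, not copied from my repo.

// Sketch: modules/20-services-apps/n8n wrapping the generic module.
// name, image, and port are illustrative inputs the blueprint might take.
module "docker_service" {
  source = "../../10-services-generic/docker-service"

  name  = "n8n"
  image = "n8nio/n8n"
  port  = 5678
}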

The ‘Aha!’ Moment: Making Services Talk to the World

This is the part I’m probably most proud of. How do you add a new service and expose it to the internet without manually editing a bunch of Caddy or Cloudflare config files? I wanted it to be automatic.

  1. Service Definitions: Each application module outputs a standard data structure. This is the contract that lets the rest of the system know how to handle it.
    // In modules/20-services-apps/n8n/main.tf
    output "service_definition" {
      value = {
        name        = "n8n"
        port        = 5678
        publish_via = "reverse_proxy" // Needs WebSockets
      }
    }
  2. The Collector: The root main.tf file gathers all these definitions from the enabled services into a single, clean list.
    // In the root main.tf
    locals {
      all_services = flatten([
        for service in module.services : service.service_definition
      ])
    }
  3. The Routers: This unified list is then fed into the networking modules, which use the data to configure themselves. No manual editing needed.
    // In the root main.tf
    module "caddy-proxy" {
      source   = "./modules/01-networking/caddy-proxy"
      services = local.all_services
    }

This design decouples the services from the publishing mechanism. An application doesn’t need to know how it’s being exposed; it just needs to declare its preference.
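
Here’s a hedged sketch of the other side of that contract: how the caddy-proxy module might declare what it expects from each definition. The exact variable shape in my repo may differ, and optional() needs OpenTofu / Terraform 1.3+.

// Sketch: the input contract of a publishing module like caddy-proxy.
// Field names mirror the service_definition output above.
variable "services" {
  type = list(object({
    name        = string
    port        = number
    publish_via = optional(string)
  }))
}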

My Three-Tiered Approach to Publishing

So with that little system, I get this super flexible, three-tier way of handling access:

  1. Cloudflare Tunnel: For most services, I set publish_via = "tunnel". My cloudflared-tunnel module handles the rest, creating a secure, outbound-only connection without opening firewall ports.
  2. Caddy Reverse Proxy: For applications needing special handling, like WebSocket (wss://) support for AFFiNE, I use publish_via = "reverse_proxy". This tells my caddy-proxy module to automatically generate the necessary configuration, including SSL termination and the specific proxy rules.
  3. VPN Access: For private or internal-only services, I simply omit the publish_via flag. By default, they are not exposed to the internet and are only accessible via my internal network or a Tailscale VPN connection.

That one little flag gives me total control, right in the code.
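
To make that concrete, here’s a rough sketch of how a publishing module can filter on the flag; the real logic in my modules may look different.

// Sketch: inside caddy-proxy, keep only the services that opted in.
// try() yields null when a definition doesn't set publish_via.
locals {
  proxied_services = [
    for s in var.services : s if try(s.publish_via, null) == "reverse_proxy"
  ]
}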

A Key Design Choice: .env over .tfvars

This drove me nuts for a bit. A small thing, but it was important. I decided to use a standard .env file for my global settings instead of the usual terraform.tfvars.

Why? It’s a bit technical, but basically variables from .tfvars only exist at the root module; to get them into deeply nested modules, you have to thread them through every intermediate module by hand. I wanted to define PUID, PGID, and TZ once and have them available everywhere.

The germanbrew/dotenv provider was the answer. The globals module just reads my .env file. Now, any other module that needs those values—like my generic service module—can just use them. It’s clean. It’s like dependency injection for your config.
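
For the curious, here’s roughly what the globals module looks like. Treat the data source shape as an assumption from memory; double-check the germanbrew/dotenv docs for the exact names.

// Sketch of modules/00-globals/main.tf. The dotenv data source and its
// entries attribute are assumptions; verify against the provider docs.
terraform {
  required_providers {
    dotenv = {
      source = "germanbrew/dotenv"
    }
  }
}

data "dotenv" "globals" {
  filename = "${path.root}/.env"
}

output "puid" {
  value = data.dotenv.globals.entries["PUID"]
}

output "pgid" {
  value = data.dotenv.globals.entries["PGID"]
}

output "timezone" {
  value = data.dotenv.globals.entries["TZ"]
}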

// In the generic docker-service module, I can now use these globals as defaults
locals {
  default_env_vars = {
    TZ   = module.system_globals.timezone
    PUID = var.puid != null ? var.puid : module.system_globals.puid
    PGID = var.pgid != null ? var.pgid : module.system_globals.pgid
  }
}

The Payoff: A Homelab That Just Works

Look, this took time. For me it was about two weekends. But the peace of mind? Totally worth it. I can spin up a new service in minutes. My entire setup is in Git. If I break something, I can git revert and tofu apply. Server dies? I just grab a new one, clone the repo, run one command and everything comes back. It feels like a superpower, honestly.

High-level architecture of the homelab
graph TD
    subgraph Internet
        direction LR
        User[User/Client]
    end
    subgraph "Publishing Layer"
        direction TB
        cf[Cloudflare Tunnel]
        caddy[Caddy Reverse Proxy]
    end
    subgraph "Docker Network"
        n8n[n8n]
        nocodb[NocoDB]
        redis[Redis]
        private_svc[Private Service]
        n8n -- uses --> redis
        nocodb -- uses --> redis
    end
    subgraph "VPN"
        vpn_user[VPN User]
    end
    User -- HTTPS --> cf
    User -- HTTPS/WSS --> caddy
    cf -- Forwards traffic to --> nocodb
    caddy -- Forwards traffic to --> n8n
    vpn_user -- Tailscale --> private_svc

What’s Next?

Am I done? Haha, no. I thought about going deep into Proxmox and virtualization, but I think I want to do something that feels more like real-world DevOps.

I’m thinking of a multi-region k3s cluster. The plan is to connect my little PC here in Australia with another one in Indonesia and a cheap cloud VPS. That’d give me actual redundancy and let me run services closer to people. For now, though, I need to focus. Mature this setup, get better monitoring, and actually automate my backups. One thing at a time.

Anyway, if you want to see the code and all the gory details, the whole thing is here on my GitHub.

This has been one of the most fun projects I’ve done in a while. It took my spiderweb of compose files and turned it into a proper platform.

Hit me up if you have questions. Happy self-hosting!