I spent two years being the guy who provisions databases. Every Monday morning, same Slack message: “Hey, can I get a Postgres instance for the new service?” I’d open Terraform, copy a module block, change three variables, run the plan, wait for approval, apply. Twenty minutes of my life, gone. Multiply that by four teams and it adds up fast.

Then I set up Crossplane with Compositions, and now developers do it themselves with a single YAML file. Here’s how I got there and what broke along the way.

Why Not Just Terraform?

Terraform works. I’m not here to trash it. But for self-service, it has a fundamental problem: developers need access to the state, the provider credentials, and the CI pipeline that runs terraform apply. That’s a lot of trust surface for someone who just wants a database.

Crossplane flips this. It runs inside your Kubernetes cluster as a set of controllers. Developers create a custom resource, and Crossplane reconciles it into real cloud resources. Same GitOps workflow they already use for their apps.
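Without the Kubernetes plumbing, that reconcile step is a loop. It compares what a custom resource asks for with what the cloud reports, then acts on the difference. Here’s a minimal Python model of one pass. The `reconcile` function and its dict inputs are hypothetical simplifications, not Crossplane’s actual controller code:

```python
from typing import Optional


def reconcile(desired: dict, observed: Optional[dict]) -> tuple[str, dict]:
    """One pass of a simplified, hypothetical reconcile loop.

    desired:  fields from the custom resource (what the developer asked for)
    observed: what the cloud API reports, or None if nothing exists yet
    Returns (action, fields) describing what a controller would do.
    """
    if observed is None:
        # Nothing in the cloud yet: create it with the full desired spec.
        return "create", dict(desired)
    # Compare only the fields the developer set; cloud-only fields such as
    # ARNs or timestamps are ignored in this model.
    drift = {key: value for key, value in desired.items() if observed.get(key) != value}
    if drift:
        # Someone changed the resource out-of-band: push desired values back.
        return "update", drift
    return "noop", {}
```

A controller runs this pass again on every poll. That is why a change made outside Git shows up as drift and gets reverted, the behavior described in Problem 1 below.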

Installing Crossplane

I run it via Helm because the marketplace operator had issues with our OPA policies:

helm repo add crossplane-stable https://charts.crossplane.io/stable
helm repo update

helm install crossplane crossplane-stable/crossplane \
  --namespace crossplane-system \
  --create-namespace \
  --set args='{"--enable-usages"}' \
  --version 1.19.0

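The --enable-usages flag is important. It enables Crossplane’s Usage API, which blocks deleting a resource while another resource still depends on it. The fix for Problem 2 below uses it. Without it, deleting a Composition or CompositeResourceDefinition can orphan cloud resources. Learned that one the expensive way when an intern deleted a CompositeResourceDefinition and we had 14 untracked RDS instances running for a week.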

Setting Up the AWS Provider

apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws-rds
spec:
  package: xpkg.upbound.io/upbound/provider-aws-rds:v1.18.0
  runtimeConfigRef:
    name: irsa-config

I use IRSA (IAM Roles for Service Accounts) instead of static credentials. The runtime config looks like this:

apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: irsa-config
spec:
  deploymentTemplate:
    spec:
      selector: {}
      template:
        spec:
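          # Provider pods run as this ServiceAccount; for IRSA it needs the
          # eks.amazonaws.com/role-arn annotation (see the gotcha below)
          serviceAccountName: crossplane-provider-aws
          containers:
            - name: package-runtime
              args:
                # How often each managed resource is re-checked for drift
                - --poll=1m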

One gotcha: the provider pods need the eks.amazonaws.com/role-arn annotation on their ServiceAccount, not on the Crossplane system SA. I spent an afternoon debugging “AccessDenied” errors because of this.
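Here is why the ServiceAccount matters. With IRSA, the pod’s projected token carries a `sub` claim of the form `system:serviceaccount:<namespace>:<name>`, and the IAM role’s trust policy usually pins that exact value. Below is a small Python helper for building the matching trust-policy condition. It assumes the provider pods run in `crossplane-system` as `crossplane-provider-aws` (from the runtime config above). Any OIDC issuer URL you pass in is a placeholder for your cluster’s own:

```python
def irsa_subject(namespace: str, service_account: str) -> str:
    """Return the `sub` claim value for a pod running as this ServiceAccount."""
    return f"system:serviceaccount:{namespace}:{service_account}"


def trust_condition(oidc_issuer: str, namespace: str, service_account: str) -> dict:
    """Build the StringEquals block for an IRSA role's trust policy.

    The role only works for pods whose token matches both keys, so an
    annotation on the wrong ServiceAccount (or a typo here) ends in AccessDenied.
    """
    # Condition keys use the issuer without the https:// scheme.
    issuer = oidc_issuer.removeprefix("https://")
    return {
        "StringEquals": {
            f"{issuer}:sub": irsa_subject(namespace, service_account),
            f"{issuer}:aud": "sts.amazonaws.com",
        }
    }
```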

The Composition: Wrapping RDS

This is where it gets interesting. A Composition is basically a template that maps a simple developer-facing API to the complex cloud resource underneath.

First, define what developers see with a CompositeResourceDefinition (XRD):

apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xpostgresinstances.database.dedico.hu
spec:
  group: database.dedico.hu
  names:
    kind: XPostgresInstance
    plural: xpostgresinstances
  claimNames:
    kind: PostgresInstance
    plural: postgresinstances
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: string
                  enum: ["small", "medium", "large"]
                  description: "small=db.t3.micro, medium=db.t3.small, large=db.t3.medium"
                teamName:
                  type: string
              required:
                - size
                - teamName

Developers pick a t-shirt size and provide their team name. That’s it. No instance class memorization, no subnet group configs, no parameter groups.
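The Kubernetes API server enforces those rules against the CRD that Crossplane generates from the XRD, so it rejects a bad claim before any controller sees it. Below is a rough Python model of the checks this particular schema declares. The error strings are illustrative, not the API server’s exact messages:

```python
ALLOWED_SIZES = ("small", "medium", "large")  # mirrors the enum in the XRD
REQUIRED_FIELDS = ("size", "teamName")        # mirrors `required` in the XRD


def validate_claim_spec(spec: dict) -> list[str]:
    """Return a list of errors; an empty list means the spec passes."""
    errors = [f"spec.{field}: Required value" for field in REQUIRED_FIELDS if field not in spec]
    size = spec.get("size")
    if size is not None and size not in ALLOWED_SIZES:
        errors.append(f"spec.size: Unsupported value: {size!r}")
    team = spec.get("teamName")
    if team is not None and not isinstance(team, str):
        errors.append("spec.teamName: must be a string")
    return errors
```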

Then the Composition maps those simple inputs to real AWS resources:

apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: postgres-on-aws
  labels:
    provider: aws
spec:
  compositeTypeRef:
    apiVersion: database.dedico.hu/v1alpha1
    kind: XPostgresInstance
  resources:
    - name: rds-instance
      base:
        apiVersion: rds.aws.upbound.io/v1beta2
        kind: Instance
        spec:
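          forProvider:
            region: eu-central-1  # placeholder; the Upbound provider requires a region
            engine: postgres
            engineVersion: "16.4"
            username: dbadmin  # placeholder master username
            # The provider generates the master password and stores it in this
            # Secret; the name is patched in below so each composite gets its own
            autoGeneratePassword: true
            passwordSecretRef:
              namespace: crossplane-system
              key: password
            dbSubnetGroupName: shared-private
            vpcSecurityGroupIds:
              - sg-0abc123def456
            publiclyAccessible: false
            storageEncrypted: true
            autoMinorVersionUpgrade: true
            backupRetentionPeriod: 7
            # AWS refuses to delete the instance while this is true
            deletionProtection: true
      patches:
        - type: FromCompositeFieldPath
          fromFieldPath: spec.size
          toFieldPath: spec.forProvider.instanceClass
          transforms:
            - type: map
              map:
                small: db.t3.micro
                medium: db.t3.small
                large: db.t3.medium
        - type: FromCompositeFieldPath
          fromFieldPath: spec.size
          toFieldPath: spec.forProvider.allocatedStorage
          transforms:
            - type: map
              map:
                # allocatedStorage is a number (GiB), so the values are unquoted
                small: 20
                medium: 50
                large: 100
        - type: FromCompositeFieldPath
          fromFieldPath: spec.teamName
          toFieldPath: spec.forProvider.tags.Team
        - type: FromCompositeFieldPath
          fromFieldPath: metadata.uid
          toFieldPath: spec.forProvider.passwordSecretRef.name
          transforms:
            - type: string
              string:
                type: Format
                fmt: "%s-master-password"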
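Each `map` transform is just a lookup keyed on the claim’s `size`. Below is a Python sketch of what the three size and team patches compute for one composite. `render_rds_fields` is a hypothetical helper for illustration, not part of Crossplane. The storage values are numbers because `allocatedStorage` is a numeric field:

```python
# Lookup tables copied from the Composition's map transforms.
INSTANCE_CLASS = {"small": "db.t3.micro", "medium": "db.t3.small", "large": "db.t3.medium"}
STORAGE_GIB = {"small": 20, "medium": 50, "large": 100}


def render_rds_fields(xr_spec: dict) -> dict:
    """Return the spec.forProvider fields the size/team patches would set."""
    size = xr_spec["size"]
    return {
        "instanceClass": INSTANCE_CLASS[size],
        # Numeric, matching the type of allocatedStorage on the RDS Instance
        "allocatedStorage": STORAGE_GIB[size],
        "tags": {"Team": xr_spec["teamName"]},
    }
```

An unknown size raises `KeyError` here. In Crossplane, an unmatched key makes the patch fail instead, and the XRD’s enum keeps that from happening in the first place.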

What Developers Actually Do

A developer wanting a database creates this in their app’s GitOps repo:

apiVersion: database.dedico.hu/v1alpha1
kind: PostgresInstance
metadata:
  name: user-service-db
  namespace: team-payments
spec:
  size: small
  teamName: payments

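They push it, ArgoCD syncs it, Crossplane picks it up, and 5 minutes later there’s a running RDS instance with its connection details written to a Kubernetes Secret in their namespace. That last part needs connection-secret wiring not shown above: spec.writeConnectionSecretToRef on the claim, plus writeConnectionSecretToRef and connectionDetails in the Composition so the provider’s keys get copied through to the claim’s Secret.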

No Slack message. No ticket. No waiting for me.

The Things That Went Wrong

Problem 1: Drift detection isn’t instant. Providers re-check each managed resource on a poll interval, which we set to 1 minute with --poll=1m in the runtime config. If someone modifies an RDS instance through the AWS console, it can take up to a minute for Crossplane to notice and revert the change. For us that was fine, but if you need tighter drift detection, lower the --poll interval. Just watch the AWS API rate limits.
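The trade-off is simple arithmetic. Here is a back-of-the-envelope sketch. It assumes each managed resource costs one AWS read per poll. Providers may make more than one call per observation, so treat the result as a floor:

```python
def poll_budget(managed_resources: int, poll_seconds: float) -> dict:
    """Estimate drift-detection latency and steady-state read load.

    Assumes exactly one AWS read per managed resource per poll interval.
    """
    if poll_seconds <= 0:
        raise ValueError("poll interval must be positive")
    return {
        # A change made just after a poll is only seen at the next one.
        "worst_case_detection_s": poll_seconds,
        "reads_per_minute": managed_resources * 60 / poll_seconds,
    }
```

Take a hypothetical 40 RDS instances. At --poll=1m that works out to about 40 reads a minute. Dropping to 15 seconds quadruples it to 160.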

Problem 2: Deletion ordering matters. We had a Composition that created both an RDS instance and a security group. When a developer deleted their claim, Crossplane tried to delete the security group before the RDS instance was fully gone. The security group deletion failed because the group was still in use, and the whole thing got stuck in a delete-retry loop. Fix: declare the dependency with a Usage (the API that --enable-usages turns on), so Crossplane refuses to delete the security group until the RDS instance using it is gone.

apiVersion: apiextensions.crossplane.io/v1alpha1
kind: Usage
metadata:
  name: rds-uses-sg
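spec:
  # The protected resource: Crossplane blocks deleting it while the
  # `by` resource below still exists
  of:
    apiVersion: ec2.aws.upbound.io/v1beta1
    kind: SecurityGroup
    resourceRef:
      name: my-sg
  # The resource that depends on it
  by:
    apiVersion: rds.aws.upbound.io/v1beta2
    kind: Instance
    resourceRef:
      name: my-rds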
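Each Usage adds an edge meaning ‘delete the `by` resource first.’ Crossplane does not plan an order up front. It enforces the edge by rejecting the premature delete and retrying. A topological sort still shows the order that falls out. Below is a Python sketch using the standard library’s `graphlib` (Python 3.9+). The names match the example above:

```python
from graphlib import TopologicalSorter


def deletion_order(resources, usages):
    """Return an order that deletes users before the resources they use.

    usages: iterable of (of, by) pairs read as "`by` uses `of`", matching
    the Usage spec. Raises graphlib.CycleError on circular usages.
    """
    graph = {name: set() for name in resources}
    for of, by in usages:
        # `of` may only be deleted once `by` is gone, so `by` comes first.
        graph.setdefault(of, set()).add(by)
    return list(TopologicalSorter(graph).static_order())
```

For the example above, `deletion_order(["my-sg", "my-rds"], [("my-sg", "my-rds")])` returns `["my-rds", "my-sg"]`. A circular set of Usages raises `graphlib.CycleError`. In a real cluster, that would likely show up as resources that never finish deleting.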

Problem 3: Provider version upgrades can break CRDs. When I upgraded provider-aws-rds from v1.14 to v1.16, two fields changed names. All existing managed resources started showing “field not found” errors. Now I always test provider upgrades in a staging cluster first, and I pin exact versions in production.
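Pinning is easy to enforce in CI. Below is a sketch of a check for Provider `spec.package` references that accepts only an exact `vX.Y.Z` tag or an `@sha256:` digest. The regex, and what counts as “pinned,” are assumptions for your own policy, not a Crossplane rule:

```python
import re

# Exact semver tag such as :v1.18.0, or an OCI digest (@sha256: + 64 hex chars).
_PINNED = re.compile(r"(:v\d+\.\d+\.\d+|@sha256:[0-9a-f]{64})$")


def is_pinned(package: str) -> bool:
    """True if a Provider package reference names an exact version or digest."""
    return bool(_PINNED.search(package))
```

Pre-release tags like `v1.18.0-rc.1` fail this check on purpose. Loosen the pattern if you want to allow them.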

Cost Controls

The t-shirt size model is great for guardrails. Nobody can accidentally spin up a db.r6g.4xlarge, because the enum only allows three sizes and each one maps to a fixed instance class. But we also added a Kyverno policy as a second layer:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: limit-postgres-size
spec:
  # Enforce blocks violating requests; Kyverno's default (Audit) only reports them
  validationFailureAction: Enforce
  rules:
    - name: max-size-per-namespace
      match:
        any:
          - resources:
              kinds:
                - PostgresInstance
      validate:
        message: "Non-production namespaces can only use 'small' or 'medium' sizes"
        deny:
          conditions:
            all:
              - key: "{{request.object.spec.size}}"
                operator: Equals
                value: "large"
              - key: "{{request.namespace}}"
                operator: AnyNotIn
                value:
                  - prod-*
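The `deny` block fires only when every condition under `all` is true. Below is a Python model of that logic. It uses `fnmatch` in place of Kyverno’s own wildcard matching. That is close for a simple pattern like `prod-*`, but it is an approximation:

```python
from fnmatch import fnmatchcase


def is_denied(size: str, namespace: str, prod_patterns=("prod-*",)) -> bool:
    """Model of the policy: deny 'large' unless the namespace matches a prod pattern."""
    is_large = size == "large"
    # AnyNotIn with a single scalar key: true when no pattern matches.
    outside_prod = not any(fnmatchcase(namespace, pattern) for pattern in prod_patterns)
    # `all:` means every condition must hold for the request to be denied.
    return is_large and outside_prod
```

By this logic, the example claim in `team-payments` can use small or medium but not large. Note also that a namespace named plain `prod` doesn’t match `prod-*`.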

Was It Worth It?

After three months: I went from handling 15+ infra requests per week to maybe 2 (edge cases where someone needs something outside the standard sizes). Developers are happier because they don’t wait. I’m happier because I can focus on the platform instead of being a human Terraform runner.

The setup took about two weeks of real work. Most of that was getting IRSA right and testing the Compositions against different failure scenarios. If you already run Kubernetes and use GitOps, adding Crossplane is a natural next step.

One piece of advice: start with one resource type. Get the Composition right, document it, let teams use it for a month. Then expand. Trying to build a full self-service catalog on day one is a recipe for half-finished abstractions that nobody trusts.