It started as a seemingly straightforward decision – using Terraform to manage our PostgreSQL database infrastructure. Like many teams, we were drawn to the allure of Infrastructure as Code (IaC).

After all, if we could version control our servers and networks, why not our databases? Little did we know the challenges that lay ahead.

When Simple Changes Have Big Consequences

Our first wake-up call came when we decided to modernize our Terraform code by moving from count to for_each with toset – a common and usually safe refactoring pattern.

Here’s what we changed:

From:

resource "postgresql_role" "external_users" {
  count = length(local.external_users)
  name  = local.external_users[count.index]
  login = true
  roles = ["rds_iam"]
}

To:

resource "postgresql_role" "external_users" {
  for_each = toset(local.external_users)
  name  = each.key
  login = true
  roles = ["rds_iam"]
}

Now, if this were a typical infrastructure resource – like Google Cloud Storage buckets or compute instances – this change would be straightforward. When you make this switch with something like GCP instances, Terraform might recreate the instances, but you’d just get new VMs with the same configuration. No big deal.

But with databases? This same change turned into a full-blown crisis. Because these were database roles with complex permissions and dependencies, the recreation meant we completely lost access to our database. While the syntax change was our mistake, the severity of the consequence reveals a deeper problem: database permissions aren’t just infrastructure configuration – they’re critical stateful components that, when broken, can lead to system-wide outages.

This stands in stark contrast to regular infrastructure. When you make the count-to-toset switch with something like Google Cloud resources, you might get new resource names and need to update some references, but you don’t risk losing access to your entire application. It’s the difference between “oops, need to update some DNS entries” and “nobody can access the database, and we can’t even get in to fix it.”

When Dependencies Attack

But that wasn’t the end of our IaC database adventures. We later encountered an even more perplexing issue with role dependencies.

Here’s what our configuration looked like:

resource "postgresql_database" "db" {
  name = "delete_before_update"
}

resource "postgresql_role" "publication_owner" {
  name = "${postgresql_database.db.name}_publication_owner"
}

resource "postgresql_publication" "pub" {
  database = postgresql_database.db.name
  name = "my_publication"
  owner = postgresql_role.publication_owner.name
}

When we tried to remove a role that owned a publication, we hit a wall. Terraform stubbornly attempted to destroy the role before updating its dependencies, resulting in the dreaded “role cannot be dropped because some objects depend on it” error.
The database knew these dependencies existed, but Terraform’s orchestration was completely blind to these relationships.

The Grant Management Meltdown

The grant management issues were perhaps the most baffling. Consider this configuration:

resource "postgresql_grant" "access" {
  provider    = postgresql.mydb
  database    = var.database
  object_type = "table"
  schema      = var.schema
  privileges  = ["SELECT", "INSERT", "UPDATE", "DELETE"]
  role        = var.role
}

We found ourselves in a situation where Terraform would randomly decide to recreate grants that were already in place. It was like watching a robot repeatedly try to open an already opened door – unnecessary and potentially disruptive.

The MySQL side of things wasn’t any better. During incidents where we had to manually remove grants (sometimes security demands quick action), we’d run into error messages like:

"Error 1141: There is no such grant defined for user 'user' on host 'host'"

This meant we couldn’t even clean up our Terraform state properly. The tool that was supposed to help us maintain order was now preventing us from managing our database effectively.

The Real Lessons

These weren’t just minor inconveniences – they were serious operational issues that affected our database’s reliability and security. The core problem wasn’t with Terraform itself, but with the fundamental mismatch between IaC tools and database management needs.
Databases are living, breathing entities with complex state relationships that don’t fit neatly into the IaC paradigm.

After these experiences, we learned to be much more cautious about what we manage with IaC. While it’s great for provisioning the initial database infrastructure, the actual database objects, permissions, and grants need more specialized tools that understand database relationships and states.

Remember, this isn’t about abandoning Infrastructure as Code – it’s about understanding its limitations. Sometimes, the old ways exist for a reason, and database management is one area where specialized database tools still reign supreme.

Better Ways to Manage Your Database

After experiencing these challenges, we developed a more robust approach to database management.

Here’s what actually works in production:

Separate Your Concerns

The first step is acknowledging that database provisioning and database management are two different things.

Use Terraform only for the initial infrastructure setup:

  • Creating the database instance
  • Setting up networking and security groups
  • Configuring basic instance parameters
  • Managing high-level resources like read replicas

Schema Management

For database schemas and migrations, use dedicated tools that understand database states and dependencies:

  • Flyway: Perfect for teams that need strict version control of their database changes. It keeps track of which migrations have been applied and ensures they run in the correct order.
  • Liquibase: Great for complex environments where you need to maintain multiple database versions or when you need to roll back changes.

Both these tools let you write migrations in pure SQL or their own format, giving you direct control over exactly what changes are being made.

Permission Management

Instead of managing permissions through Terraform, consider:

  • Creating dedicated database roles that group common permissions
  • Using database-native tools for permission management
  • Maintaining a separate repository of SQL scripts for permission changes
  • Implementing a review process for permission changes

Emergency Procedures

Always have a plan for when things go wrong:

  • Keep a set of emergency access credentials secured outside your IaC
  • Document the direct SQL commands needed to fix common permission issues
  • Maintain an up-to-date list of database administrators who can intervene manually when needed

Continuous Integration

For a more robust deployment process:

-- Example migration script structure
-- filename: V1__create_base_roles.sql
CREATE ROLE app_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_reader;

-- filename: V2__create_app_user.sql
CREATE USER app_user WITH PASSWORD 'xxx';
GRANT app_reader TO app_user;

This approach gives you:

  • Version control of your database changes
  • Clear audit trail of who changed what and when
  • Ability to test changes before applying them
  • Easy rollback capability when needed

The key is striking the right balance between automation and control. While Infrastructure as Code is powerful, databases require specialized tools that understand their unique stateful nature and complex dependencies.


Discover more from FastCode

Subscribe to get the latest posts sent to your email.

Leave a Reply

Index