The Friday Cleanup That Broke 40 Pipelines
Last week, a platform team I know shipped what they called a “small cleanup” to their deployment API. They renamed a field from service_id to serviceId, removed an endpoint that was “barely used,” and updated the response format to match their new schema. No version bump. No deprecation notice. Just a Friday afternoon deploy and a Slack message in #platform-updates that nobody read.
Monday morning, 40 CI pipelines failed. Three teams scrambled to update their deployment scripts. A critical security hotfix got blocked because the team couldn’t deploy. The platform team spent the entire week doing emergency migrations instead of planned work. And the trust they’d built over the previous year? Gone.
This happens constantly with internal APIs. There’s a temptation to treat them differently than external ones: “we can just tell people to update,” “everyone’s in the same building,” “we’ll coordinate in Slack.” But internal APIs deserve more versioning discipline than external ones, not less.
Internal customers are captive customers. They can’t switch to a competitor’s platform. This makes breaking their workflows worse, not better - they have no recourse except escalating to leadership or building workarounds that create tech debt.
The same practices that make external APIs predictable - semantic versioning, deprecation policies, migration support - apply to internal platform APIs. The difference is that internal teams have higher expectations because they’re colleagues, and lower patience because they have their own roadmaps that don’t include “emergency migration of the deployment API.”
What Actually Breaks
Will existing client code still work? That’s the question that determines whether a change is breaking. If no, it’s breaking. If yes, you need to dig deeper: is the behavior meaningfully different? Will consumers notice or care?
Some changes are obviously breaking - removing endpoints, changing HTTP methods, adding required fields. These require the full deprecation process, no exceptions. Other changes are obviously safe - adding optional fields, new endpoints, loosening validation. The table below draws the line.
| Change | Breaking? | Why |
|---|---|---|
| Remove endpoint | Yes | Returns 404 to existing callers |
| Change HTTP method | Yes | Returns 405 to existing callers |
| Add required request field | Yes | Existing requests fail validation |
| Change field type | Yes | Deserialization fails |
| Add optional request field | No | Existing requests still valid |
| Add response field | No | Clients should ignore unknown fields |
| Add new endpoint | No | Existing endpoints unchanged |
| Loosen validation | No | Previously valid requests still work |
The tricky cases fall in between - changes that are technically compatible but break consumer assumptions:
- Error code changes (400 → 422): Technically correct, but breaks error handling logic
- Default value changes: Clients that assumed a specific page size or timeout get different behavior without changing a line of their own code
- Semantic changes: Making `count` include deleted items when it previously didn’t - breaking even though the field name stays the same
When you’re unsure whether a change breaks existing clients, treat it as breaking. Nobody complains about unnecessary deprecation warnings. They complain loudly about broken pipelines.
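The mechanical rows of the table above can be checked automatically before a change ships. The following is a minimal sketch, assuming request schemas are described as plain dictionaries; the `Field` class and `breaking_request_changes` function are hypothetical names for illustration, not part of any real tool. It treats a removed or renamed field as breaking (as in the `service_id` to `serviceId` rename from the opening story) and cannot see the semantic or default-value changes listed above, which still need human review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical description of one request field."""
    type: str
    required: bool = False


def breaking_request_changes(old: dict[str, Field], new: dict[str, Field]) -> list[str]:
    """Return the reasons `new` would break callers written against `old`."""
    problems = []
    for name, old_field in old.items():
        new_field = new.get(name)
        if new_field is None:
            # A rename shows up as a removal plus an addition.
            problems.append(f"removed field: {name}")
        elif new_field.type != old_field.type:
            problems.append(f"type changed: {name} ({old_field.type} -> {new_field.type})")
        elif new_field.required and not old_field.required:
            problems.append(f"now required: {name}")
    for name, new_field in new.items():
        # Optional additions are safe; required additions are not.
        if name not in old and new_field.required:
            problems.append(f"new required field: {name}")
    return problems


v1 = {"service_id": Field("string", required=True), "region": Field("string")}
v2 = {
    "serviceId": Field("string", required=True),
    "region": Field("string"),
    "dry_run": Field("boolean"),  # optional, so not reported
}
for reason in breaking_request_changes(v1, v2):
    print(reason)
```

An empty result only means the mechanical checks passed, so under the "when unsure, treat it as breaking" rule it should not be read as proof that a change is safe.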
The Deprecation Process
Once you’ve identified a breaking change, follow a structured process that gives consumers time to adapt. Skipping steps is how you end up with 40 broken pipelines and a week of emergency migrations.
Step 1: Identify affected consumers. Before announcing anything, know who’s using the affected endpoints. API gateway logs tell you which services call which endpoints. Service mesh telemetry shows call patterns. SDK analytics reveal version distribution. Query the last 30 days of traffic - this catches regular callers and weekly batch jobs.
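As a rough illustration of step 1, the sketch below counts callers of one endpoint over a 30-day window. It assumes gateway logs have already been parsed into dictionaries with `caller`, `endpoint`, and `timestamp` keys; those field names, the sample services, and the `affected_consumers` function are assumptions for illustration, since real gateway log formats vary.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def affected_consumers(records, endpoint, now, window_days=30):
    """Return (caller, call_count) pairs for `endpoint` within the window, busiest first."""
    cutoff = now - timedelta(days=window_days)
    calls = Counter()
    for rec in records:
        if rec["endpoint"] == endpoint and rec["timestamp"] >= cutoff:
            calls[rec["caller"]] += 1
    return calls.most_common()


now = datetime(2024, 5, 1, tzinfo=timezone.utc)
sample_logs = [
    {"caller": "ci-runner", "endpoint": "/v1/deploy", "timestamp": now - timedelta(days=3)},
    {"caller": "ci-runner", "endpoint": "/v1/deploy", "timestamp": now - timedelta(days=1)},
    {"caller": "billing-batch", "endpoint": "/v1/deploy", "timestamp": now - timedelta(days=20)},
    {"caller": "old-tool", "endpoint": "/v1/deploy", "timestamp": now - timedelta(days=45)},
    {"caller": "dashboard", "endpoint": "/v1/status", "timestamp": now - timedelta(days=2)},
]
for caller, count in affected_consumers(sample_logs, "/v1/deploy", now):
    print(caller, count)
```

Low call counts deserve as much attention as high ones here: a caller that appears once in the window may be a batch job that nobody remembers owning.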
Step 2: Create a proposal document. Write down what’s changing, why it’s changing, which endpoints are affected, estimated migration effort, and the proposed timeline. This becomes the single source of truth for the deprecation.
Step 3: Open a feedback period. Give affected teams 30 days to review the proposal and raise concerns. Maybe you’re missing a use case. Maybe the timeline is too aggressive for their roadmap. Maybe there’s a simpler migration path you hadn’t considered. Listen to the feedback - internal customers have context you don’t.
Step 4: Notify affected teams directly. Broadcast announcements get ignored. Target the teams you identified in step 1. Send to their Slack channels. Email their tech leads. Make it impossible to miss. The notification should include what’s changing, why, the timeline, the migration guide, and where to ask questions.
Step 5: Schedule the timeline with milestones. Set dates for feedback deadline, new version release, 90-day warning, 30-day warning, and sunset date. Put them in the team calendar. Configure automated reminders. Don’t rely on anyone remembering.
The key insight is that urgency should escalate as the deadline approaches. Early communication is broad and informational - changelog updates, Slack posts, email announcements. As sunset nears, communication becomes targeted and direct.
| Timeline | Channel | Consumer Response |
|---|---|---|
| Announcement | Changelog, Slack, email | Read and acknowledge |
| 90 days out | Slack reminder, dashboard | Create migration ticket |
| 30 days out | Direct email, Slack DM | Begin migration work |
| 7 days out | Personal outreach | Emergency completion |
| Sunset day | Final notice | Migration complete |
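The escalation schedule in the table can be derived from a single sunset date, which makes automated reminders easier to configure. This is a minimal sketch: the function names are hypothetical, the example sunset date is arbitrary, and the thresholds and channel labels simply mirror the table rows.

```python
from datetime import date, timedelta

# (days remaining, channel) pairs taken from the table, most urgent first.
ESCALATION = [
    (7, "Personal outreach"),
    (30, "Direct email, Slack DM"),
    (90, "Slack reminder, dashboard"),
]


def deprecation_milestones(sunset: date) -> dict[str, date]:
    """Compute the reminder dates that lead up to the sunset date."""
    return {
        "90-day warning": sunset - timedelta(days=90),
        "30-day warning": sunset - timedelta(days=30),
        "7-day outreach": sunset - timedelta(days=7),
        "sunset": sunset,
    }


def channel_for(today: date, sunset: date) -> str:
    """Pick the communication channel for the number of days left."""
    days_left = (sunset - today).days
    if days_left <= 0:
        return "Final notice"
    for threshold, channel in ESCALATION:
        if days_left <= threshold:
            return channel
    return "Changelog, Slack, email"


sunset = date(2024, 8, 15)  # example date only
for name, when in deprecation_milestones(sunset).items():
    print(f"{name}: {when.isoformat()}")
print(channel_for(date(2024, 8, 1), sunset))
```

Keeping the thresholds in one list means a team that wants a longer runway changes the data, not the logic.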
Your API should also warn consumers programmatically. HTTP headers provide machine-readable deprecation signals that consumers can build automation around:
HTTP/1.1 200 OK
Deprecation: true
Sunset: Thu, 15 Aug 2024 00:00:00 GMT
Link: </docs/migration/v1-to-v2>; rel="deprecation"
Making Migration Easy
The faster consumers migrate, the sooner you can sunset the old version. Every friction point in migration extends the deprecation timeline.
A good migration guide answers questions in the order developers ask them. Start with the overview: what’s changing, why, and the timeline. Then provide a quick start - the minimal changes for basic migration, ideally with copy-paste code snippets. Include a mapping table showing v1 endpoints/fields alongside their v2 equivalents. End with troubleshooting for common errors.
Tier your support by consumer need. Most migrations should be self-service with docs and validators. For stuck teams, offer office hours and PR reviews. For critical consumers or complex cases, the platform team writes the migration PRs directly.
Automation pays for itself quickly. A codemod that handles 80% of cases automatically and flags the remaining 20% for manual review dramatically reduces the burden on consuming teams. Compatibility adapters that translate v1 requests to v2 internally buy time for slow migrators without extending your maintenance window.
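A compatibility adapter can be very small when the breaking change is a field rename. The sketch below assumes the only difference between v1 and v2 is the `service_id` to `serviceId` rename from the opening story; the `FIELD_RENAMES` table and both function names are hypothetical, and a real adapter would also need to cover removed endpoints and changed response shapes.

```python
# v1 field name -> v2 field name. Extend this as the mapping table in the migration guide grows.
FIELD_RENAMES = {"service_id": "serviceId"}


def adapt_v1_request(payload: dict) -> dict:
    """Translate a v1 request body into the v2 shape without mutating the input."""
    return {FIELD_RENAMES.get(key, key): value for key, value in payload.items()}


def adapt_v2_response(payload: dict) -> dict:
    """Translate a v2 response body back into the shape v1 clients expect."""
    reverse = {v2_name: v1_name for v1_name, v2_name in FIELD_RENAMES.items()}
    return {reverse.get(key, key): value for key, value in payload.items()}


v1_request = {"service_id": "svc-42", "region": "eu-west-1"}
print(adapt_v1_request(v1_request))
print(adapt_v2_response({"serviceId": "svc-42", "status": "deployed"}))
```

Because the same rename table drives both directions, it can double as the source for the v1-to-v2 mapping table in the migration guide.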
The Invisible Success
Good API versioning is invisible. Consumers barely notice migrations because they’re well-communicated, well-supported, and well-timed. The new version ships while the old one carries deprecation warnings months in advance. The migration guide makes the change trivial. By the time sunset arrives, everyone’s already moved on.
Bad API versioning is very visible: broken pipelines, blocked deployments, emergency all-hands, and trust that takes months to rebuild. The difference between them isn’t technical complexity - it’s discipline.
Internal customers deserve predictable, well-communicated changes - arguably more so than external customers, because they can’t switch providers when you break them. When in doubt, treat it as breaking. Give more notice than you think necessary. Make migration easier than seems reasonable. The payoff is trust with internal teams that makes future changes easier, faster adoption of new versions, and platform team time spent on planned work instead of emergency migrations.