The Ops Mindset: Systems That Run Themselves
Part 1 taught you to set up calm. Part 2 taught you to ship with confidence.
Part 3 is about building systems that don't need you to babysit them.
We're going to automate your quality checks, practice what to do when things break, and establish rules for working with AI that keep you in control. This is the stuff that separates "I built a thing" from "I built a thing that keeps working."
You don't need a DevOps title. You just need the willingness to think in systems instead of one-off fixes.
Explain it three ways
We're turning our LEGO city into a real theme park. That means setting up rules so every ride works the same way, practicing what to do if a light goes out, and keeping a clipboard with all the "what happened?" stories. That way, when friends come to visit, everything runs smoothly without us having to fix things every five minutes.
This is the production operations playbook. It covers automation scripts that enforce quality gates, rollback procedures with documented recovery times, AI governance policies that maintain accountability, and analytics loops that catch issues before users report them. The payoff: faster incident response and far fewer "it worked on my machine" deployments.
Remember how we plan the whole weekend before inviting friends over: the food, the playlist, the backup plan if it rains? This is that, but for software. We're writing down the recipes, testing the smoke alarms, practicing what to do if the oven fails, and keeping notes on what worked so next time is even smoother. It's the difference between hosting a party and hosting a party that runs itself.
The incident that changed everything
Let me tell you about a Tuesday afternoon.
14:03 Deployed new feature to production
14:17 Vercel logs show environment variable missing
14:19 Users see blank screen
14:25 Team scrambles, no one knows how to roll back
14:45 Finally rolled back, postmortem scheduled

42 minutes of chaos. Users affected. Stress through the roof.
Now here's the same scenario with ops rituals in place:
13:45 Pre-deploy checklist catches missing env var
13:50 Fix added to .env.example + Vercel dashboard
14:00 Deploy succeeds
14:05 Lighthouse + analytics verified

Same feature. Zero drama.
Ops rituals don't prevent all emergencies. They make emergencies boring.
What we're building today
Part 3: Systems, Automation, and Ops Rituals
┌───────────────────────────────────────────────────┐
│ 1. Automation Ladder  →  Scripts that check       │
│ 2. Rollback Drills    →  Practice recovery        │
│ 3. AI Governance      →  Stay in control          │
│ 4. Analytics Loop     →  Catch issues early       │
│ 5. Retro Ritual       →  Learn from mistakes      │
└───────────────────────────────────────────────────┘

| Section | Time | What you'll learn |
| --- | --- | --- |
| Automation Ladder | 12 min | Scripts that enforce quality |
| Rollback Drills | 10 min | How to recover when things break |
| AI Governance | 10 min | Rules for AI collaboration |
| Analytics Loop | 8 min | Monitoring and accessibility |
| Retro Ritual | 4 min | Learning from incidents |
| Total | 44 min | Complete ops foundation |
Step 1: The Automation Ladder (12 minutes)
You don't have to automate everything at once. Climb the ladder at your own pace.
The four levels
| Level | What it means | Time to set up |
| --- | --- | --- |
| 1. Manual checklist | Paper or Markdown list you run each deploy | 5 min |
| 2. Deploy script | Script that runs checks and stops if something fails | 20 min |
| 3. CI/CD | GitHub Actions runs checks on every pull request | 1-2 hours |
| 4. Zero-touch | Merge to main auto-deploys when all checks pass | 1 day |
Most people stay at Level 1 forever. Level 2 is the sweet spot: you get 80% of the benefit with 20% of the setup.
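When you're ready for Level 3, the shape is smaller than it sounds. Here's a minimal sketch of a GitHub Actions workflow that runs the same checks on every pull request. The file name and the npm script names (`lint`, `type-check`, `test`, `build`) are assumptions; match them to your package.json:

```yaml
# .github/workflows/checks.yml -- sketch, adjust script names to your project
name: quality-checks
on: [pull_request]
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm run lint
      - run: npm run type-check
      - run: npm test
      - run: npm run build
```

Once this passes reliably on pull requests, Level 4 is mostly a Vercel setting: auto-deploy on merge to main.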
Level 2: The deploy script
Create a file called scripts/deploy.ps1 (Windows) or scripts/deploy.sh (Mac/Linux):
Windows (PowerShell):
# scripts/deploy.ps1
# Run all quality checks before deploying
Write-Host "Running lint..." -ForegroundColor Cyan
npm run lint
if ($LASTEXITCODE -ne 0) {
    Write-Host "Lint failed. Fix errors before deploying." -ForegroundColor Red
    exit 1
}

Write-Host "Running type-check..." -ForegroundColor Cyan
npm run type-check
if ($LASTEXITCODE -ne 0) {
    Write-Host "Type-check failed. Fix errors before deploying." -ForegroundColor Red
    exit 1
}

Write-Host "Running tests..." -ForegroundColor Cyan
npm run test
if ($LASTEXITCODE -ne 0) {
    Write-Host "Tests failed. Fix errors before deploying." -ForegroundColor Red
    exit 1
}

Write-Host "Building..." -ForegroundColor Cyan
npm run build
if ($LASTEXITCODE -ne 0) {
    Write-Host "Build failed. Fix errors before deploying." -ForegroundColor Red
    exit 1
}

Write-Host "All checks passed! Deploying to Vercel..." -ForegroundColor Green
vercel --prod --yes

Mac/Linux (Bash):
#!/bin/bash
# scripts/deploy.sh
set -e  # Stop on the first failing command

echo "Running lint..."
npm run lint

echo "Running type-check..."
npm run type-check

echo "Running tests..."
npm run test

echo "Building..."
npm run build

echo "All checks passed! Deploying to Vercel..."
vercel --prod --yes

How to use it
Instead of running vercel --prod directly, run your script:

# Windows
.\scripts\deploy.ps1

# Mac/Linux (make it executable once with: chmod +x scripts/deploy.sh)
./scripts/deploy.sh

If any check fails, the script stops. No more "oops, I forgot to run lint" deployments.
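Optionally, wire the script into package.json so one command works from muscle memory. This is a sketch; on Windows, point the script at the PowerShell file instead:

```json
{
  "scripts": {
    "deploy": "bash scripts/deploy.sh"
  }
}
```

Then `npm run deploy` runs the whole gate.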
ROI calculation
If manual checks take 6 minutes per deploy, and you deploy 20 times per month, that's 120 minutes of manual work. The script takes 20 minutes to set up. It pays for itself in the first month, then saves you 2 hours every month after.
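The arithmetic above, as a runnable sketch (the numbers are the ones from the paragraph, not measurements):

```shell
# Back-of-envelope ROI for automating manual deploy checks
manual_minutes_per_deploy=6
deploys_per_month=20
setup_minutes=20
monthly_cost=$((manual_minutes_per_deploy * deploys_per_month))  # manual work per month
first_month_savings=$((monthly_cost - setup_minutes))            # net savings in month one
echo "Manual checks: ${monthly_cost} min/month; first-month net savings: ${first_month_savings} min"
```

Plug in your own deploy frequency; the break-even point moves, but rarely past the first month.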
Step 2: Rollback Drills (10 minutes)
Things will break. The question isn't if but when, and whether you'll panic or calmly fix it.
The rollback decision matrix
| Scenario | Severity | What to do |
| --- | --- | --- |
| 404 on new page | Medium | Fix forward if quick, rollback if >15 min |
| Missing env var | High | Rollback immediately, add var, redeploy |
| Styling bug | Low | Fix forward, note in devlog |
| Data corruption | Critical | Rollback, open incident, notify stakeholders |
Fire Drill Friday
Once a month, practice a rollback. Here's the drill:
1. Record current deployment

vercel ls
# Note the current production URL

2. Trigger rollback

vercel rollback
# Select the previous deployment

3. Verify site works
- Open the production URL
- Click around, check key features
- Note how long the rollback took
4. Roll forward again
vercel --prod --yes

5. Log the drill
## Rollback Drill - 2025-12-01
- Started: 14:00
- Rollback complete: 14:03 (3 minutes)
- Verification: 14:05
- Roll forward: 14:08
Lessons: Rollback command is fast. Main delay was finding the right deployment ID.
Next time: Bookmark the Vercel dashboard deployments page.

Chaos engineering (lite version)
For extra credit: intentionally break a preview deployment. Remove a required env var, deploy to preview, and confirm that your monitoring catches it before you would have promoted to production. Document what you learned.
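A preflight script can catch the missing-env-var failure mode before any deploy. Here's a minimal, self-contained sketch: it compares the keys listed in an example env file against what's actually set. The file path, variable names, and the simulated `export` are invented for the demo; in a real project you'd read your actual .env.example and drop the simulation:

```shell
# Create a tiny demo env file so the sketch runs anywhere
cat > /tmp/demo.env.example <<'EOF'
# Required variables
DEMO_DATABASE_URL=
DEMO_API_BASE=
EOF

export DEMO_DATABASE_URL="postgres://localhost/dev"  # simulate one var being set

missing=0
while IFS='=' read -r key _; do
  [ -z "$key" ] && continue             # skip blank lines
  case "$key" in \#*) continue ;; esac  # skip comments
  if [ -z "$(printenv "$key")" ]; then
    echo "Missing env var: $key"
    missing=$((missing + 1))
  fi
done < /tmp/demo.env.example
echo "Unset required vars: $missing"
```

Run it as the first step of your deploy script and exit non-zero when anything is missing; the 14:17 incident from the timeline becomes a 13:45 checklist catch.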
Step 3: AI Governance (10 minutes)
AI makes us faster. It can also make us sloppy if we're not careful.
The governance scorecard
Rate yourself honestly:
| Practice | Ask yourself | Target |
| --- | --- | --- |
| Session logging | Do you write down what AI helped with? | Log every major session |
| Secrets in prompts | Ever pasted API keys into Cursor? | Never (use .env.example) |
| Technical debt tracking | Do you note when AI takes shortcuts? | Weekly review |
| Manual override protocol | Know when to ignore AI suggestions? | Documented in AI_PRACTICES.md |
| Refactor impact analysis | Check what AI changes might break? | Review all AI refactors |
The session log template
After any significant AI collaboration, log it:
## AI Session - 2025-12-01 - Deploy Script
**Goal:** Create automated deploy script with quality gates
**Constraints:** Windows PowerShell, must stop on failures
**What AI helped with:**
- Generated initial script structure
- Suggested error handling pattern
- Wrote the colored output messages
**What I changed:**
- Added the build step (AI forgot it)
- Changed exit codes to match our standards
**Decisions made:**
- Kept manual Vercel auth (didn't automate login)
- Added comments explaining each section
**Next time:**
- Ask AI to include coverage threshold flag

The escape hatch flow
Sometimes AI suggestions feel wrong. Here's what to do:
AI suggestion feels off?
    ↓
Stop autocomplete (Cmd+. or Esc)
    ↓
Check AI_PRACTICES.md for guidance
    ↓
Make the change manually
    ↓
Log why you overrode the suggestion
    ↓
Update PROMPT_PLAYBOOK if it's a pattern

Red flag phrases from AI
When you see these, pause and verify manually:
- "I guessed the file path..."
- "This might work..."
- "I can't access that file..."
- "I assumed you wanted..."
These are moments to take over, not moments to trust blindly.
Step 4: Analytics Loop (8 minutes)
You can't fix what you can't see. Set up basic monitoring.
The metrics dashboard
| Metric | Tool | How often to check |
| --- | --- | --- |
| Core Web Vitals | Vercel Analytics | Weekly |
| Lighthouse scores | Chrome DevTools | Every major change |
| Error logs | Vercel Functions logs | Daily (quick scan) |
| User engagement | GA4 or Plausible | Weekly |
Accessibility quick audit
Before any major release, run through this checklist:
- [ ] Keyboard navigation: Can you use the entire site without a mouse?
- [ ] Color contrast: Do text and backgrounds have enough contrast? (Use DevTools)
- [ ] Reduced motion: Do animations respect `prefers-reduced-motion`?
- [ ] Screen reader: Do images have alt text? Do buttons have labels?
The weekly review ritual
Every Friday (or whatever day works), spend 10 minutes:
- Open Vercel Analytics: note any performance drops
- Check error logs: note any new errors
- Run Lighthouse on the homepage: log the scores
- Update `PERFORMANCE_LOG.md` with findings
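To lower the friction of that last step, a tiny helper can append a dated stub for you to fill in. A sketch, assuming the `PERFORMANCE_LOG.md` file from the list above lives in your project root:

```shell
# Append a dated weekly-review stub to the performance log
LOG="PERFORMANCE_LOG.md"
{
  echo ""
  echo "## Weekly Review - $(date +%Y-%m-%d)"
  echo "**Lighthouse:** Mobile __, Desktop __"
  echo "**Errors:**"
  echo "**Analytics:**"
  echo "**Action items:**"
} >> "$LOG"
echo "Stub appended to $LOG"
```

A pre-made stub waiting in the file makes the Friday ritual a fill-in-the-blanks job instead of a blank page.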
## Weekly Review - 2025-12-01
**Lighthouse:** Mobile 94, Desktop 98 (no change)
**Errors:** None new
**Analytics:** 12% traffic increase, bounce rate stable
**Action items:**
- None this week
- Consider image optimization next sprint

Step 5: The Retro Ritual (4 minutes)
When something goes wrong (and it will), capture what you learned.
The retro template
## Incident: [Brief description]
**Date:**
**Duration:**
**Severity:** Low / Medium / High / Critical
### What happened
[Timeline of events]
### Impact
[Who was affected and how]
### Root cause
[Why did this happen]
### Fix applied
[What you did to resolve it]
### Prevention
[What changes prevent this from happening again]
### Rule update
[New checklist item or process change]

Example entry
## Incident: Preview build failure
**Date:** 2025-12-01
**Duration:** 30 minutes
**Severity:** Medium (blocked preview, not production)
### What happened
- 14:00 Pushed feature branch
- 14:05 Vercel build failed
- 14:20 Found missing Tailwind color
- 14:30 Fixed and redeployed
### Impact
Preview blocked for 30 minutes. No user impact.
### Root cause
Used `bg-sage-500` but `sage` wasn't in tailwind.config.ts
### Fix applied
Added sage color palette to Tailwind config
### Prevention
- Add "color token check" to QA template
- Include tailwind.config.ts in work order context
### Rule update
New checklist item: "Verify all color tokens exist in Tailwind config"

Ops Ritual Bingo
Track your progress. How many can you check off this month?
| ☐ Ran rollback drill | ☐ Caught bug in preview | ☐ Updated AI_PRACTICES.md |
| --- | --- | --- |
| ☐ Hit 90+ Lighthouse | ☐ Wrote retro within 24h | ☐ Automated a manual step |
| ☐ Reviewed analytics | ☐ Rehearsed DNS failover | ☐ Pair-reviewed AI refactor |
Bingo = any row, column, or diagonal completed
Your completion checklist
The full Vercel × Cursor Learning Ladder:
Part 1: Calm Rituals
- [ ] Folder structure with docs/, src/, content/
- [ ] Four documentation files created
- [ ] Tooling shakedown habit established
- [ ] Prompt spine template ready
- [ ] First Vercel deploy complete
Part 2: Quality Gates
- [ ] Work order workflow in place
- [ ] Feature shipped with tests
- [ ] Preview deploy reviewed
- [ ] Documentation updated
Part 3: Ops Rituals
- [ ] Deploy script created and tested
- [ ] Rollback drill completed and logged
- [ ] AI governance scorecard filled out
- [ ] Analytics loop established
- [ ] First retro entry written
What's next?
You've built the foundation. The rituals. The systems.
Now it's about repetition. Every feature you ship, run through the workflow. Every incident, write a retro. Every month, practice a rollback.
The goal isn't perfection; it's predictability. When you know what to do, stress goes down. When stress goes down, you build better things.
You've completed the Learning Ladder
From calm setup to confident shipping to operational systems. You now have the same workflow used by professional teams, without needing years of experience to get here.
Interactive lab available
Practice ops rituals hands-on:
- Automation ladder with ROI calculator
- Rollback drill timer and logging
- AI governance scorecard
- Ops Ritual Bingo tracker
<a href="/learn-ai-lab/vercel-cursor/expert" className="text-purple-300 hover:text-purple-100 underline">โ Try the Expert Lab</a>