Kevin Brown

Principal Platform & DevOps Engineer

Summary

Platform engineer with five years of experience in cloud infrastructure, Kubernetes, observability, and developer experience. Specializes in migrating legacy systems to modern cloud-native architectures, building self-service platforms, and implementing security and compliance automation. Track record of measurable outcomes: 60-80% build time reductions, 70%+ MTTR improvements, and successful PCI-DSS, SOC2, and HIPAA certifications.

Past Clients

  • Senior Platform Engineer

    Tegna
    Tysons, VA

    Media and broadcasting company with 120 engineers across 8 product teams, bottlenecked by a two-person platform team processing all AWS infrastructure requests.

    • Built self-service infrastructure provisioning system using Terraform modules, OPA policy-as-code guardrails, and Atlantis PR-based workflows, reducing request lead time from two weeks to under one hour.
    • Designed five core Terraform modules (S3, RDS, Lambda, SQS, ElastiCache) covering 91% of all infrastructure requests, with security best practices (encryption, network isolation, audit logging) baked in.
    • Integrated Conftest with OPA policies enforcing SOC2 requirements in the CI pipeline, achieving zero policy violations while eliminating shadow IT resources discovered during the initial audit.
    • Reduced platform team ticket volume by 75%, enabling the team to shift from repetitive provisioning to higher-value platform work; infrastructure spend dropped 12% despite increased provisioning volume.
  • Developer Experience Lead

    Wolters Kluwer
    Philadelphia, PA

    Global information services company with 200+ engineers across 25 teams, struggling with fragmented documentation, unknown service ownership, and three-week onboarding cycles after years of acquisitions.

    • Led implementation of Spotify's Backstage as the internal developer platform, achieving 100% service catalog adoption across all 340 services within six months without any organizational mandate.
    • Integrated TechDocs for in-repo Markdown documentation rendering, raising documentation coverage from 40% to 95% by eliminating the friction of separate documentation systems.
    • Built software scaffolder templates that reduced new service creation from two days to 30 minutes, with CI/CD, observability, and catalog registration pre-configured.
    • Developed custom Backstage plugins for deployment history, feature flags, incident management, and per-service cost visibility; cut new engineer onboarding from three weeks to four days.
  • Senior Backend Engineer

    Moloco
    Redwood City, CA

    Machine learning adtech startup processing 500M+ daily events across 30+ services, suffering from cascade failures due to synchronous HTTP coupling and 24-hour analytics latency from batch processing.

    • Migrated service communication from synchronous request/response to event-driven architecture using Apache Kafka (AWS MSK), eliminating cascade failures that had averaged 8 incidents per month.
    • Implemented Confluent Schema Registry with compatibility enforcement across 30+ services, preventing breaking changes during producer/consumer schema evolution.
    • Reduced analytics latency from 24 hours to under 5 seconds by replacing nightly batch jobs with real-time stream processing using ksqlDB, enabling real-time customer dashboards.
    • Executed a three-phase migration (billing pilot → analytics events → operational events) with zero downtime, increasing system throughput 3x on the same infrastructure.
  • Senior Platform Engineer

    Netsmart Technologies
    Overland Park, KS

    Healthcare technology company running 60+ microservices (Node.js, Python, Go) with fragmented observability across CloudWatch, Elasticsearch, Jaeger, and X-Ray.

    • Deployed unified observability stack with OpenTelemetry instrumentation and Grafana backends (Prometheus, Loki, Tempo), reducing mean time to detection from 4 hours to 8 minutes and MTTR from 6 hours to 45 minutes.
    • Built PHI redaction pipeline using OpenTelemetry Collector processors, enabling centralized logging while maintaining HIPAA compliance; passed audit with zero logging findings.
    • Saved $150K annually by replacing projected Datadog costs ($180K/yr) with self-hosted Grafana stack ($30K/yr in compute and storage) on existing Kubernetes infrastructure.
    • Automated 85% of HIPAA Security Rule controls using AWS Config rules, Terraform modules with compliant defaults, and CI/CD policy checks; reduced audit evidence collection from two weeks to two hours.
    • Achieved HIPAA certification one month ahead of deadline, directly enabling a $2M ARR enterprise contract with a regional hospital network.
  • Cloud Migration Lead

    Regional LTL freight carrier processing 50,000 daily shipments on a legacy .NET Framework 4.6.2 / AngularJS 1.5 platform running on colocated servers with an expiring colocation contract and no cloud or modern .NET experience within the development team.

    • Led migration and modernization of 38 applications from colocated servers to Azure, porting the .NET Framework monolith to .NET 8 and replacing the AngularJS frontend with Angular 17, completing two weeks ahead of deadline with zero unplanned downtime during 24/7 operations.
    • Ported WCF services to ASP.NET Core minimal APIs using the strangler fig pattern with Azure API Management routing between legacy and modernized backends, cutting average API response times by 60% (320ms to 130ms).
    • Reverse-engineered undocumented carrier EDI integrations (FedEx, UPS, regional carriers) by capturing and analyzing network traffic, then rebuilt the integration layer using Azure API Management with Azure Functions and Azure Private Link.
    • Reduced monthly infrastructure costs by 35% (from $89K to $58K) through Azure Container Apps, Azure SQL Database, and elimination of the colocation lease.
    • Migrated SQL Server databases via Azure Database Migration Service with change data capture, replaced Entity Framework 6 with EF Core 8, and deployed .NET 8 services to Azure Container Apps with KEDA-based auto-scaling.
    • Executed a four-phase approach (non-critical systems → database migration and interim VM deployment → .NET 8 / Angular modernization → final cutover) that built team Azure competency before touching customer-facing systems.
  • Senior Cloud Engineer

    Fortune 100 financial services firm operating 100+ microservices across hybrid on-premises / AWS environment, with PCI-DSS compliance requirements, sub-100ms P99 latency SLAs, and 10M+ daily payment transactions.

    • Migrated 40+ microservices from EC2 to Amazon EKS while maintaining PCI-DSS compliance and sub-100ms latency; deployment frequency increased from bi-weekly to 15+ daily with 82% reduction in mean time to recovery.
    • Implemented zero trust architecture using Istio service mesh, achieving 100% mTLS encryption for all service-to-service traffic and replacing 200+ static firewall rules with SPIFFE identity-based authorization policies.
    • Deployed SPIRE for workload identity attestation with automatic hourly certificate rotation, reducing credential revocation time from hours to seconds; passed SOC2 Type II audit with zero network security findings.
    • Configured ArgoCD for GitOps-based deployments and Karpenter for node provisioning, reducing infrastructure costs by 30% through better bin-packing and automatic scaling; P99 latency improved from 85ms to 72ms.
    • Led a four-phase rollout (permissive mesh → observability via Kiali → audit-mode policies → strict enforcement) that eliminated cascade failures without any production disruptions.
  • DevOps Engineer

    Overstock.com
    Midvale, UT

    E-commerce platform with 15 engineers and eight years of manual AWS console management across 847 resources, suffering from configuration drift, two-week environment provisioning, and untracked incident response processes.

    • Imported 847 AWS resources into Terraform using S3 remote state with DynamoDB locking, establishing version-controlled infrastructure after a prior Terraform attempt had failed due to state file corruption.
    • Set up Atlantis for PR-based infrastructure workflows with automatic `terraform plan` output on pull requests, eliminating direct console changes that had previously caused a production payment system outage.
    • Reduced new environment provisioning from two weeks to two hours; resolved production / staging configuration drift that had been causing unreproducible bugs.
    • Introduced structured incident management framework with severity levels, escalation paths, and an Incident Commander rotation, reducing mean time to resolution by 72% (from 90 to 25 minutes).
    • Created a runbook library for the top 20 recurring incident types, reducing on-call pages by 73% (from 15 to 4 per week) and enabling junior engineers to resolve incidents that previously required senior escalation.
    • Established blameless postmortem process that reduced recurring incidents by 60% and eliminated on-call burden as a factor in engineer attrition.
  • Senior Platform Engineer

    Braze
    New York, NY

    B2B SaaS customer engagement platform with 80 engineers in a TypeScript monorepo experiencing 45-minute CI builds, a 30% flaky-test failure rate, and $180K/month AWS spend growing faster than revenue.

    • Redesigned the monorepo CI/CD pipeline using Turborepo for incremental builds and parallelized test execution, reducing build times from 45 minutes to 8 minutes (82% improvement) and flaky test failures from 30% to under 2%.
    • Increased deployment frequency from twice weekly to multiple times daily by implementing dependency-aware build graphs that only rebuilt and tested affected packages on each pull request.
    • Led AWS cost optimization initiative that reduced monthly spend from $180K to $72K (60% reduction), saving $1.3M annually through instance right-sizing, reserved instance commitments, and unused resource cleanup.
    • Maintained 99.95% uptime and improved P99 latency during optimization by eliminating noisy neighbor effects through right-sized instances; savings extended company runway by eight months, preventing planned engineering layoffs.
  • Software Developer

    Cause of a Kind
    Long Island, NY

    Full-service software development agency building mobile apps, web applications, and headless CMS solutions for clients across multiple industries.

    • Developed 30+ full-stack applications using React, Node.js, TypeScript, React Native, Express, Laravel, and AWS.
    • Designed and wrote 10,000+ unit, integration, and end-to-end tests using the Mocha, Jest, and Playwright test runners.
    • Troubleshot 2,000+ code-related issues and defects.
    • Developed 30+ headless CMS applications using Next.js and Gatsby with providers including Sanity and Contentful.
    • Designed and implemented both REST and GraphQL APIs.
    • Implemented a pre-bid programmatic ad auction system for a publishing platform and built an integration with Facebook using its SDK and API.
  • Infrastructure Engineer

    Alento, Inc.
    Dover, DE

    ERP software company serving art museums, providing inventory management systems delivered as both on-premises installations and managed hosted deployments.

    • Managed a multi-homed, high-availability distributed compute platform and infrastructure (BGP, OSPF, VLAN, Cisco IOS).
    • Installed, configured, tested, and maintained operating systems and virtualization platforms (RHEL, BSD, KVM/QEMU), application software, and system management tools.
    • Supported the CI/CD pipeline and automated Docker container builds and deployments. Authored Jenkins extensions (Java) to support internal workflows.
    • Wrote and maintained custom scripts (Bash, Python) to improve system efficiency and automate software and system administration tasks.
    • Maintained and tuned PostgreSQL database servers through re-indexing, statistics updates, and stored procedure optimization.
  • Systems Analyst

    Eli Lilly
    Indianapolis, IN

    Fortune 100 pharmaceutical company with global R&D and manufacturing operations, requiring business process analysis, systems documentation, and technology procurement support across multiple internal teams.

    • Worked with users to observe business processes, interview staff, and document practices. Researched and analyzed business operations and addressed operational inefficiencies.
    • Defined and analyzed user requirements for technology systems through use cases, mockups, and stakeholder interviews.
    • Prepared bid documents and functional specifications for new application implementations.
    • Led sprint planning and daily scrum meetings.
    • Created and maintained a company-wide inventory of major systems with descriptions, along with a new-employee guide to using them.

Skills

  • Cloud & Infrastructure

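    AWS (EKS, EC2, RDS, S3, Lambda, MSK, DMS, API Gateway, PrivateLink, Config), Azure (Container Apps, SQL Database, API Management, Functions, Private Link, Database Migration Service), Terraform, Atlantis, Karpenter, KEDA, Kubernetes, Docker, Helm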

  • Observability & Reliability

    OpenTelemetry, Grafana (Prometheus, Loki, Tempo), Kiali, CloudWatch, Datadog, PagerDuty, incident management, runbooks, blameless postmortems

  • CI/CD & Developer Experience

    ArgoCD, GitHub Actions, Turborepo, pipeline optimization, incremental builds, parallelized test execution, software scaffolding, monorepo tooling

  • Agentic Workflows & IDP

    Backstage-based internal developer platforms, agentic workflows, LangGraph, Claude Skills, AI-assisted scaffolding and operational automation

  • Security & Compliance

    Istio, mTLS, SPIFFE/SPIRE, OPA/Conftest, PCI-DSS, SOC2 Type II, HIPAA, zero trust architecture

  • Event Streaming

    Apache Kafka (MSK), Confluent Schema Registry, ksqlDB, event-driven architecture

Languages

  • Python
  • Go
  • TypeScript
  • Ruby
  • Lua
  • HCL
  • Rego
  • Bash
  • C++
  • Java

Social Profiles