Staff Engineer · Engineering Leader · Phoenix, AZ

Khilesh Abhishek

17+ years building systems that matter

Staff-caliber engineer designing distributed systems, event-driven architectures, and AI/ML platforms at enterprise scale — billions of transactions, 99.99% availability, and teams of 65+ engineers.

17+ years of experience
65+ engineers led
99.99% availability delivered
6 certifications held
AWS Solutions Architect Professional · GCP Cloud Architect Professional · TOGAF Practitioner Architect · FinOps Certified Practitioner · MIT Applied Data Science · Harvard Leadership Training · Distributed Systems · Event-Driven Architecture · AI/ML Platforms · Payments at Scale · Kafka · Kubernetes · Spark · Java · Kotlin · Python

Engineering impact in numbers

2h → 18min
MTTR reduction on a tier-1 payments application via a layered observability strategy
SRE · OpenTelemetry · Dynatrace
40%
Infrastructure cost reduction through Kafka-based event-driven architecture replacing point-to-point messaging
Kafka · Kubernetes · FinOps
3.4×
Throughput improvement with 62% latency reduction on global payment orchestration — zero API changes to consumers
Async · Event Sourcing
25%
Engineering throughput increase, plus a 30% reduction in PR cycle time, through CI/CD standards adopted org-wide
GitHub Actions · SonarQube

Professional experience

2022 – Present American Express
Staff Engineer / Senior Engineering Manager
📍 Phoenix, AZ
  • Authored end-to-end architecture for a multi-tier AI/ML platform serving corporate card customers, defining service boundaries, data contracts, and failure domains to sustain 99.99% availability under full production load.
  • Drove observability strategy from first principles — toolchain selection (OpenTelemetry → Dynatrace → Splunk), SLI/SLO/SLA contracts, and automated alerting pipelines — reducing MTTR from 2 hours to 18 minutes on a tier-1 application.
  • Designed and enforced CI/CD standards adopted org-wide, increasing engineering throughput by 25% and cutting PR cycle time by 30%.
  • Led incremental modernization of a legacy monolith via the strangler fig pattern, decoupling services iteratively along seam boundaries defined with domain-driven design, with zero customer-facing downtime.
  • Scaled and mentored an organization of 65+ engineers; built technical career frameworks resulting in 4+ internal promotions and measurably improved retention.
2020 – 2022 American Express
Senior Software Engineer / Engineering Manager
📍 Phoenix, AZ
  • Architected an event-driven, Kubernetes-based infrastructure processing hundreds of millions of events daily — eliminating over-provisioning and reducing infrastructure cost by 40%.
  • Made the key architectural decision to adopt Apache Kafka over point-to-point messaging, enabling horizontal scaling and replay capability without redesigning 12 downstream consumers.
  • Built a microservice layer handling 100K+ requests per minute at 99.99%+ availability; circuit-breaker and bulkhead patterns contained a cascading failure event with zero customer impact.
  • Defined code review standards and debugging playbooks that reduced P0/P1 incidents by ~30%.
2017 – 2020 American Express
Software Engineer II / Technical Lead
📍 Phoenix, AZ
  • Redesigned global payment orchestration, replacing a synchronous, tightly coupled service graph with an async, event-sourced model, and achieved a 3.4× throughput improvement and a 62% latency reduction without a full rewrite.
  • Established engineering best practices lifting delivery velocity by ~30% across 8 direct and 20+ matrixed engineers.
  • Led cross-functional technical programs coordinating 5+ product and data teams, with dependency maps and risk registers that prevented three critical-path delays.
2014 – 2017 American Express
Software Engineer I / Technical Lead
📍 Phoenix, AZ
  • Architected a reactive Java-based data pipeline platform enabling end-to-end ingestion visibility — from REST intake through Hive/Spark transformation to SFTP delivery — serving co-brand corporate card programs at global scale.
  • Led development of enterprise digital platforms supporting commercial card systems across global markets with multi-region failover and data residency controls.
2010 – 2014 The Home Depot · Walgreens via TCS
Software Engineer
📍 Atlanta, GA · Chicago, IL
  • Designed a microservices-based fund tracking platform with county-level allocation and real-time budget visibility for Walgreens store managers.
  • Built a data merger tool for a major M&A integration that automated record matching with match quality metrics (false positives, true negatives).
  • Reduced defect leakage to production by ~35% at Home Depot by introducing contract testing and integration test coverage at the service boundary layer.

Architecture decisions

01
Event-Driven Migration
AmEx · 2020 · Messaging Architecture
Evaluated RabbitMQ vs. Kafka vs. AWS SQS/SNS for a high-throughput ingestion system. Selected Kafka for partition-based ordering guarantees, replay capability, and consumer group isolation — avoiding a redesign of 12 downstream consumers. Sequenced rollout by traffic weight with feature flags enabling parallel-run validation.
40% cost reduction · Linear scalability · Zero downstream redesign
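
As a rough illustration of the partition-based ordering and replay properties cited above, the following Python sketch models key-to-partition assignment. It is not Kafka's partitioner (the Java client hashes key bytes with murmur2; CRC32 is a stand-in here), and `NUM_PARTITIONS`, `partition_for`, `append_all`, and the `card-A`/`card-B` keys are hypothetical names chosen for the example.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 6  # hypothetical partition count, not from the original system


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition deterministically (CRC32 stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append_all(events):
    """Append (key, payload) pairs to per-partition logs in arrival order."""
    logs = defaultdict(list)
    for key, payload in events:
        logs[partition_for(key)].append((key, payload))
    return logs


events = [
    ("card-A", "authorized"),
    ("card-B", "authorized"),
    ("card-A", "captured"),
    ("card-A", "settled"),
]
logs = append_all(events)

# Every event for one key lands in one partition, so a consumer reading that
# partition from the start (a "replay") sees the key's events in order.
card_a_log = [p for k, p in logs[partition_for("card-A")] if k == "card-A"]
```

Because the key alone decides the partition, ordering holds per key rather than globally, which is usually the trade-off that makes horizontal scaling possible.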
02
Strangler Fig Modernization
AmEx · 2022 · Legacy Migration
Proposed and owned the migration strategy for a legacy monolith with 8 years of accumulated coupling. Defined seam boundaries using domain-driven design, sequenced rollout by traffic weight, and introduced feature flags to enable parallel-run validation — delivered with zero downtime.
Zero-downtime delivery · 8 years of coupling unwound · DDD seam strategy
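
A minimal sketch of how a strangler-style router might shift traffic by weight behind a flag, assuming a simple path-prefix table. The route prefixes, percentages, and the `bucket` and `route` helpers are hypothetical and only illustrate the idea, not the routing layer actually used.

```python
import zlib


def bucket(request_id: str) -> int:
    """Stable bucket in [0, 100) so a given request id always routes the same way."""
    return zlib.crc32(request_id.encode("utf-8")) % 100


def route(path: str, request_id: str, rollout: dict) -> str:
    """Return 'new-service' or 'monolith' for a request.

    `rollout` maps a path prefix to the percentage of traffic (0-100)
    that the extracted service should receive for that prefix.
    """
    for prefix, percent in rollout.items():
        if path.startswith(prefix):
            return "new-service" if bucket(request_id) < percent else "monolith"
    return "monolith"  # anything not yet carved out stays on the legacy path


# Hypothetical flag table: one seam fully migrated, one at 10% of traffic.
ROLLOUT = {"/statements": 100, "/rewards": 10}

fully_migrated = route("/statements/2024-01", "req-1", ROLLOUT)
untouched = route("/payments/authorize", "req-1", ROLLOUT)
```

Setting a prefix back to 0 acts as the rollback switch, and hashing the request id keeps each request on a consistent side while both paths run in parallel.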
03
Observability-First SRE
AmEx · 2022 · Platform Reliability
Identified that MTTR was driven by alert fatigue and a lack of correlated traces. Proposed a layered observability model (metrics → traces → logs) anchored on OpenTelemetry. Agreed SLOs with product leadership, establishing a shared accountability model across engineering and product.
MTTR: 2 hours → 18 minutes · SLO-driven culture established
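
For context on what an availability SLO implies in practice, here is a small worked calculation in Python. It assumes a 30-day rolling window and the common error-budget framing; neither the window nor the helper names `error_budget_minutes` and `budget_remaining` come from the original work.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of unavailability an SLO permits over the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo)


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative once it is exhausted)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget


# A 99.99% target over 30 days leaves roughly 4.32 minutes of budget.
four_nines_budget = error_budget_minutes(0.9999)
```

Expressing the target as minutes of budget is one way to give engineering and product a shared number to negotiate against.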
04
Async Payment Architecture
AmEx · 2017 · Performance Engineering
Diagnosed a latency bottleneck in a synchronous payment orchestration chain via distributed tracing. Redesigned it as an async, event-sourced model with idempotency guarantees, delivering 3.4× throughput and a 62% latency reduction with no schema or API changes for downstream consumers.
3.4× throughput · 62% latency reduction · No API contract changes
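
The idempotency guarantee mentioned above can be sketched as an event handler that deduplicates on an event id and rebuilds state by replaying an append-only log. This is an in-memory illustration only; `PaymentEvent`, `Ledger`, and their fields are hypothetical names, and a production system would persist the log and the dedup set durably.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PaymentEvent:
    event_id: str      # unique per event; doubles as the idempotency key
    account: str
    amount_cents: int


@dataclass
class Ledger:
    balances: dict = field(default_factory=dict)
    seen: set = field(default_factory=set)
    log: list = field(default_factory=list)  # append-only event log (source of truth)

    def apply(self, event: PaymentEvent) -> bool:
        """Apply an event once; return False if it was a duplicate delivery."""
        if event.event_id in self.seen:
            return False
        self.seen.add(event.event_id)
        self.log.append(event)
        current = self.balances.get(event.account, 0)
        self.balances[event.account] = current + event.amount_cents
        return True

    @classmethod
    def replay(cls, events):
        """Rebuild state from scratch by re-applying the logged events in order."""
        ledger = cls()
        for event in events:
            ledger.apply(event)
        return ledger


ledger = Ledger()
evt = PaymentEvent("evt-1", "acct-1", 500)
first = ledger.apply(evt)
second = ledger.apply(evt)  # redelivery of the same event is ignored
rebuilt = Ledger.replay(ledger.log)
```

With at-least-once delivery, the dedup check is what keeps a retried message from double-posting, and replay gives a way to rebuild or audit state from the log.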

Core competencies

Systems Architecture
Distributed Systems · Microservices · Event-Driven · Kafka · Reactive Pipelines · Fault-Tolerant Design · Strangler Fig · DDD
Languages & Frameworks
Java · Spring Boot · Kotlin · Python · GraphQL · REST APIs · PySpark
Cloud & Platforms
AWS · GCP · Kubernetes · Docker · Hybrid / On-Prem · TOGAF
Data & AI/ML
Kafka · Spark · BigQuery · Cassandra · PostgreSQL · TensorFlow · PyTorch · ML Lifecycle
Observability & SRE
OpenTelemetry · Prometheus · Grafana · Dynatrace · Splunk · SLI/SLO/SLA · MTTR Reduction
Engineering Excellence
GitHub Actions · Jenkins · SonarQube · GitHub Copilot · CI/CD · Code Review Culture · FinOps

Certifications & education

☁️
AWS Certified Solutions Architect – Professional
Amazon Web Services
🌐
GCP Professional Cloud Architect
Google Cloud
🏛️
TOGAF Practitioner Architect
The Open Group
🤖
MIT Applied Data Science Program
Massachusetts Institute of Technology
💰
FinOps Certified Practitioner
FinOps Foundation
🎓
Harvard Leadership Training
Harvard University
MBA, Leadership
Grand Canyon University, Arizona
GPA 3.8 / 4.0
B.Tech, Computer Science
JIIT, India
GPA 6.8 / 9.0
Memory Bloating in RAG-Based Enterprise Applications
Technical Publication · AI/ML Architecture
Using (and Not Using) the Sidecar Design Pattern
Technical Publication · Cloud Architecture
Work Authorization: I hold an approved I-140 and have maintained H-1B status for over 12 years. This makes me eligible for long-term extensions beyond the standard cap-subject limit and exempt from the lottery-based sponsorship process — no gap in authorization, straightforward transfer.

Open to the right opportunity

Staff/Principal Engineer or Director of Engineering roles in fintech, payments, or enterprise AI/ML platforms. I bring architectural depth, a track record of measurable outcomes, and the leadership instinct to build organizations that last.