Principal GPU Performance and Diagnostic Software Architect

AMD

United States

About

WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond.
Together, we advance your career.
The Role
AMD is seeking a senior-level engineer to lead company-level innovation in GPU microarchitecture performance measurement, parallel programming optimization, and advanced software diagnostics. This role centers on deep technical leadership in performance attribution, hardware/software observability, and defect localization across GPU compute stacks. You will define and architect next-generation methodologies for microarchitectural analysis, counter design, instrumentation, and diagnostic tooling that enable precise performance understanding from silicon through the runtime and application layers.
The Person
Are you a hands-on architect in areas like GPU/accelerator or HPC performance engineering, microarchitecture analysis, compilers, runtime systems, and diagnostics?
Key Responsibilities
Microarchitecture Performance Measurement & Attribution
Define AMD’s methodology for cycle-accurate and counter-driven performance attribution across GPU generations.
Architect performance measurement frameworks that correlate workload behavior to microarchitectural structures (CUs/SIMDs, wavefront schedulers, issue pipelines, register files, memory hierarchy, cache systems, fabric/interconnect).
Drive counter architecture definition and validation to ensure observability of pipeline stalls, cache contention, memory divergence, synchronization overhead, and scheduling inefficiencies.
Establish rigorous approaches for bottleneck classification: compute-bound, memory-bound, latency-bound, fabric-bound, and occupancy-limited regimes.
Develop scalable performance modeling techniques linking pre-silicon simulation, emulation, and post-silicon telemetry.
Parallel Programming Performance Optimization
Architect end-to-end performance workflows: microbenchmarks, workload decomposition, instrumentation, trace capture, and guided optimization.
Lead development of profiling and visualization systems exposing pipeline stages, wave occupancy, cache behavior, memory bandwidth utilization, atomic/synchronization costs, and interconnect utilization.
Influence compiler and runtime optimizations including code generation, scheduling, register allocation, vectorization, tiling, kernel fusion, and launch configuration strategies.
Drive auto-tuning and kernel optimization frameworks for AI/HPC workloads (GEMM, convolution, attention, graph workloads) across GPU generations and heterogeneous system configurations.
Ensure strong correlation between synthetic benchmarks, application kernels, and real-world workloads.
Advanced Software Diagnostics & Defect Localization
Architect diagnostic frameworks capable of detecting, isolating, and reproducing defects across silicon, firmware, driver, runtime, and application layers.
Develop static and dynamic analysis tools tailored to GPU execution and memory consistency models.
Lead development of GPU-focused sanitizers, race detectors, memory checkers, hang analysis tools, and fuzzing frameworks.
Build automated triage systems integrating telemetry, crash signatures, counter anomalies, and workload traces to accelerate root cause identification.
Drive methodologies for deterministic reproduction, workload minimization, and differential testing across hardware steppings and driver/compiler revisions.
Collaborate with architecture and validation teams to improve design-for-observability and post-silicon debug capabilities.
Tooling & Observability Architecture
Influence design of profiling and performance counter infrastructure in collaboration with silicon teams.
Guide evolution of ROCm profiling tools, trace systems, and low-level instrumentation interfaces.
Ensure alignment between hardware counters, compiler instrumentation, runtime telemetry, and developer-facing tools.
Establish reproducible measurement standards across lab and production environments.
Preferred Experience
Hands-on experience in GPU/accelerator or HPC performance engineering, microarchitecture analysis, compilers, runtime systems, and diagnostics.
Deep expertise in GPU microarchitecture: SIMD/CU design, wavefront scheduling, issue pipelines, cache hierarchies, shared/local memory, and interconnect fabrics.
Proven experience designing or leveraging hardware performance counters for bottleneck attribution and workload characterization.
Strong background in profiling, trace analysis, and performance modeling (pre- and post-silicon).
Demonstrated experience building diagnostic tooling: sanitizers, race detection, memory analysis, fuzzing, crash triage systems.
Strong Linux systems knowledge including kernel, drivers, and multi-GPU/multi-node environments.
Proficiency in C/C++ and Python; familiarity with LLVM IR, GPU ISA, and compiler backends.
Track record of delivering measurable performance improvements in production silicon and software stacks.
Academic Credentials
Bachelor’s or Master’s degree in electrical or computer engineering, or computer science.
Location
Austin, TX (Flexible/Hybrid)
Visa Sponsorship
This role is not eligible for visa sponsorship.
Benefits
Benefits offered are described in “AMD benefits at a glance.”
Equal Opportunity Statement
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.
AI Screening
AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.
Posting Notice
This posting is for an existing vacancy.

Language skills

English
