This job posting is no longer available

Software Engineering Lakehouse Performance Engineer Professional Austin, US

IBM

Austin, Texas, United States

About

Responsibilities

Industry-standard benchmarks: Run, maintain, and continuously improve reproducible benchmarks across watsonx.data configurations and against competitive offerings.
Customer‑representative workloads: Build and curate workload suites that reflect real customer query mixes, data volumes, concurrency profiles, and freshness requirements, not just synthetic benchmarks.
Reproducibility & rigor: Ensure every published result is reproducible end‑to‑end: controlled environments, pinned versions, locked datasets, documented methodology, variance analysis, and statistically defensible reporting.
Cost‑per‑performance metrics: Operationalize canonical price‑performance KPIs ($/query, $/TB scanned, $/training‑token, queries/sec/$, TCO at workload mix); instrument workloads, collect data, and produce repeatable scorecards.
Telemetry pipeline: Build and maintain the metrics, traces, profiles, GPU/CPU utilization, query plan, and IO telemetry that flow from benchmark runs into the performance data store.
Dashboards & scorecards: Develop dashboards that surface trends, regressions, and competitive position to engineering, leadership, and external audiences.
Regression gates: Operate performance regression gates in CI/CD; triage failures, file and drive issues with engine, storage, and GPU teams, and verify fixes.
Root‑cause analysis: Drill into slow queries and GPU/CPU bottlenecks using profilers (Nsight, perf, async‑profiler, pprof, flamegraphs) and query plan inspection to pinpoint regressions and improvement opportunities.
Performance environment ownership: Own the lifecycle of the dedicated performance environment(s) supporting watsonx.data: GPU and CPU clusters, networking, storage, and the orchestration that schedules workloads onto them.
Test fleet automation: Build and maintain infrastructure‑as‑code (Terraform/Ansible/Helm) for provisioning, configuring, and resetting test environments deterministically across on‑prem hardware and cloud (IBM Cloud and partner clouds).
Benchmark harness platform: Develop and operate the benchmark harness itself: job scheduler, run orchestration, dataset provisioning, result capture, artifact storage, and the API/CLI other teams use to launch runs.
Dataset & result warehouse: Own the curated datasets used for benchmarking and the warehouse of historical results that powers trend analysis, regression detection, and competitive comparisons.
Capacity & utilization: Manage capacity and utilization of the performance lab so concurrent campaigns from different teams run cleanly and without interference.
Self‑service for engineers: Provide engineers across watsonx.data with self‑service paths to run standardized performance experiments against well‑known baselines, lowering the cost of evidence‑based engineering decisions.
Reporting: Produce data, charts, and write‑ups that feed internal quarterly scorecards and external performance whitepapers, blog posts, and analyst briefings.
Reviews: Participate in design reviews and code reviews to flag risks early and propose measurable acceptance criteria.
Documentation: Document workloads, harnesses, lab usage, and results so the next engineer, internal or external, can reproduce what you ran.

Qualifications

Required education: Bachelor's degree.
Professional experience: 8+ years of professional software engineering experience, with at least 2 years focused on performance engineering, benchmarking, or SRE for a data platform, database, or distributed system.
Programming skills: Proficient in at least one of Python, Go, or Java, plus shell scripting and modern automation tooling.
Analytics engine knowledge: Working knowledge of at least one modern analytics engine (Presto/Trino, Spark, DuckDB, ClickHouse, or comparable) and at least one open table format (Iceberg, Delta, or Hudi).
Performance tooling: Hands‑on experience with Linux performance tooling (perf, ftrace, eBPF), profilers (Nsight, async‑profiler, pprof), and query plan analysis.
Infrastructure‑as‑code: Fluency in at least one of Terraform, Ansible, Pulumi, or Helm; comfort writing and maintaining the automation, not just consuming it.

Preferred skills

GPU processing: Hands‑on experience with GPU‑accelerated data processing (RAPIDS/cuDF, Velox/Theseus‑class engines, CUDA) and the GPU memory hierarchy (HBM, NVLink, PCIe trade‑offs).
Publishing performance results: Experience publishing or co‑authoring peer‑reviewed or industry‑recognised performance results (TPC, MLPerf, ClickBench, LST‑Bench, or similar).
Multi‑tenant performance lab: Experience operating a multi‑tenant performance lab or shared test fleet where multiple teams ran experiments concurrently.
Benchmark harness: Experience building bespoke benchmark harnesses or workload generators, including dataset generation at TB+ scale.
AI performance: Familiarity with vector search, retrieval‑augmented generation (RAG), and AI inference/training performance characterisation.
FinOps: Familiarity with FinOps and cloud unit economics: translating raw performance numbers into $/performance and TCO conclusions.
Open‑source contributions: Contributions to relevant open‑source projects (Iceberg, Trino, Spark, Arrow, Velox, RAPIDS, OpenTelemetry, perf‑tooling, etc.).
Performance experiments: Hands‑on experience designing and running performance experiments: controlling for variance, isolating variables, and producing clear, defensible results.
Infrastructure: Experience operating real infrastructure: Linux servers, Kubernetes, container runtimes, networking basics, and object storage.
Observability tooling: Comfort with observability tooling: metrics (Prometheus), tracing/telemetry (OpenTelemetry), and dashboards (Grafana or equivalent).

Benefits

Healthcare benefits including medical, prescription drug, dental, vision, and mental health coverage.
Retirement plans: 401(k) and pension plan.
Paid time off: vacation, sick days, and parental bonding leave.
Training and educational resources on an AI‑driven learning platform.
Employee resource groups, volunteer opportunities, and other company‑wide benefits.

Equal‑Opportunity Employer

IBM is proud to be an equal‑opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, genetics, pregnancy, disability, neurodivergence, age, or other characteristics protected by applicable law. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.

Language skills

English
