Machine Learning Infrastructure Engineer
- San Francisco, California, United States
About
TRM Labs provides blockchain analytics and AI solutions to help law enforcement and national security agencies, financial institutions, and cryptocurrency businesses detect, investigate, and disrupt crypto-related fraud and financial crime. TRM’s blockchain intelligence and AI platforms include solutions to trace the source and destination of funds, identify illicit activity, build cases, and construct an operating picture of threats. TRM is trusted by leading agencies and businesses worldwide who rely on TRM to enable a safer, more secure world for all.
At TRM, we’re on a mission to build a safer financial system for billions of people around the world. Our next-generation platform, which combines threat intelligence with machine learning, enables financial institutions and governments to detect cryptocurrency fraud and financial crime at an unprecedented scale.
As a Senior Software Engineer, ML Infrastructure at TRM Labs, you will collaborate with data scientists, engineers, and product managers to design and operate scalable GPU-backed infrastructure that powers TRM’s AI systems. You will work at the intersection of distributed systems, cloud infrastructure, GPU performance engineering, and applied machine learning — building the foundation that enables high-throughput, production-grade ML workloads.
The impact you’ll have here:
Design and operate GPU cluster infrastructure.
Build and manage GPU-backed environments in cloud settings, including orchestration, autoscaling, resource isolation, and workload management across multiple concurrent models and users.
Optimize high-throughput inference.
Implement and tune serving systems that maximize token throughput, batching efficiency, GPU occupancy, and cost effectiveness across interactive and batch workloads.
Enable distributed inference strategies.
Support and operationalize model parallelism, tensor parallelism, and other distributed serving patterns for large-scale models.
Implement model optimization and compilation workflows.
Integrate and optimize acceleration stacks such as TensorRT, ONNX Runtime, vLLM, FlashAttention, and related tooling to improve performance and reduce inference cost.
Schedule heterogeneous workloads.
Design systems that manage multiple models, multiple users, and mixed workload types across heterogeneous accelerators (e.g., NVIDIA GPUs, Inferentia), ensuring predictable performance under varying demand.
Build observability into ML infrastructure.
Instrument systems to measure GPU load, memory utilization, batching efficiency, queue depth, and token throughput, and use data to continuously improve performance and reliability.
Partner across engineering teams.
Work closely with infrastructure, ML, and product teams to ensure models transition smoothly from experimentation to production-grade, highly available services.
What we’re looking for:
Bachelor’s degree (or equivalent) in Computer Science or related field.
5+ years of experience building and operating distributed systems or infrastructure in production environments.
Experience deploying and operating ML/LLM inference workloads on GPU clusters in cloud environments (AWS and/or GCP).
Deep understanding of high-throughput inference systems, including batching strategies, token throughput optimization, and the trade-offs between latency, throughput, and cost.
Experience with one or more ML serving frameworks such as Triton Inference Server, vLLM, Ray Serve, ONNX Runtime, or HuggingFace Optimum.
Experience optimizing GPU load, memory efficiency, and performance bottlenecks in production systems.
Familiarity with distributed inference strategies including model parallelism and tensor parallelism.
Experience working with Kubernetes or equivalent orchestration systems in cloud environments.
Familiarity with heterogeneous accelerators (e.g., Inferentia) is a plus.
CUDA familiarity and experience debugging GPU-related issues is a plus.
Adaptable. Goals can change fast. You anticipate and react quickly.
Autonomous. You own what you work on. You move fast and get things done.
Excellent communication. You communicate complex ideas effectively to both technical and non-technical audiences, verbally and in writing.
Collaborative. You work effectively in a cross-functional team and with people at all levels in an organization.
Life at TRM
We are building a safer world. That promise shows up in how we work every day.
TRM moves quickly. We are a high-velocity, high-ownership team that expects clarity, follow-through, and impact. People who thrive here are energized by hard problems, experimentation, and continuous feedback. If something takes months elsewhere, it will ship here in days.
Our work sits at the intersection of AI, national security, and fighting financial crime. The problems are complex, the stakes are real, and the environment evolves quickly. The pace and intensity of the work reflect the importance of the mission. As a result, the way we operate requires a high level of ownership, adaptability, collaboration, and creative problem-solving.
At TRM, you should expect:
Priorities and targets to change quickly as we experiment and iterate
Work that often requires operating with a high degree of ambiguity
A high level of personal ownership and accountability
Close collaboration across teams and functions
Frequent, high-touch communication
Creative problem solving and out-of-the-box thinking
A pace that rewards urgency, adaptability, and outcomes
This environment is energizing for people who enjoy building, solving hard problems, and making progress in situations that are not always fully defined. It also requires comfort navigating ambiguity, adjusting course as new information emerges, and maintaining focus and positivity in a fast-moving and intense environment.
We also recognize that this style of operating is not for everyone. If you are primarily optimizing for predictability or a consistently balanced workload, we encourage you to use the interview process to pressure test whether this environment is truly the right fit. We want teammates who thrive here, not just survive here.
At the same time, many people find this work deeply rewarding. If you are excited by meaningful problems, motivated by ambitious goals, and energized by working alongside mission-driven colleagues, there is a good chance you will find TRM to be an exceptional place to grow and contribute.
Learn more: TRM Interviewing at TRM: How We Hire and What Success Looks Like
AI Fluency at TRM
AI fluency is a baseline expectation at TRM.
We believe AI meaningfully changes how top performers operate. We expect every team member to use AI to accelerate and reimagine their craft, not just automate surface tasks.
At TRM, AI fluency means you are among the top 10 percent of operators in your function in how you apply AI to:
Accelerate repeatable workflows
Structure and solve problems
Improve output quality
Increase speed and leverage
You will be evaluated on applied AI fluency during the interview process.
Leadership Principles
We hire and grow against three leadership principles. They’re the standards for how we operate, treat each other, and make decisions.
Impact-Oriented Trailblazer: We put customers first and move with speed, focus, and adaptability. We treat every plan like an experiment – test, ship, measure, and iterate quickly.
Master Craftsperson: We care deeply about our craft. We balance speed with high standards, own outcomes end‑to‑end, and invest in getting better every day.
Inspiring Colleague: We add clarity and energy, not noise. We bring humility, candor, and a one‑team mindset — giving and receiving feedback to make the team stronger.
The impact you will have
This work has real stakes. Depending on your role at TRM, your week might look like:
Driving critical investigations that can’t wait for typical business hours.
Shipping products in days when others would schedule quarters.
Partnering with teams across time zones to deliver insights while the story is still unfolding.
Building new solutions from first principles when the playbook doesn’t yet exist.
Protecting victims and customers by tracing illicit activity and disrupting criminal networks.
Join our Mission
At TRM we care deeply about our craft. We are looking for individuals who want their work to matter, who experiment with speed and rigor, and who take pride in building a safer world for billions of people. If you’re excited by TRM’s mission but don’t check every box, we encourage you to apply. We hire for slope, judgment, and the will to learn fast.
TRM is a Series C company with $220M in total funding, backed by Blockchain Capital, Goldman Sachs, Bessemer, Y Combinator, Thoma Bravo, and others. Headquartered in San Francisco, TRM operates as a distributed-first company with hubs in Los Angeles, San Francisco, New York, Washington D.C., London, and Singapore.
Privacy Policy and Additional Information
By submitting your application, you are agreeing to allow TRM to process your personal information in accordance with the TRM Privacy Policy.
Our typical hiring cycles for specialized roles span 24 to 36 months. Accordingly, we retain your personal information for up to 36 months to evaluate your application and to consider you for current and future employment opportunities, unless you request earlier deletion or a different retention period is required or permitted by law.
To notify TRM Labs that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.
We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this form.
Recruitment agencies
TRM Labs does not accept unsolicited agency resumes. Please do not forward resumes to TRM employees. TRM Labs is not responsible for any fees related to unsolicited resumes and will not pay fees to any third-party agency or company without a signed agreement.
Learn More: Company Values | Interviewing | FAQs
Language skills
- English
This job posting comes from a TieTalent partner platform. Click "Apply Now" to submit your application directly on their website.