Staff Network Engineer, Operations
- San Francisco, California, United States
About
Crusoe Cloud is seeking a Staff Network Operations Engineer to help own production reliability across our global network infrastructure, including edge, backbone, data center fabric, and GPU cluster interconnects. This is a hands‑on production ownership role focused on incident response, root cause analysis, and operational excellence initiatives that keep our hyperscale AI infrastructure running at scale. Your work will directly affect the availability of AI workloads running across thousands of GPUs worldwide.
The ideal candidate is a seasoned network engineer with deep operational experience in large‑scale environments who thrives in high‑pressure situations and takes pride in keeping systems healthy. You'll contribute to defining SLIs and SLOs, improving observability tooling, building automation to reduce toil, and mentoring peers, all while serving as a key escalation point during high‑severity network events.
What You'll Be Working On
Production Reliability: Help own uptime across Crusoe's global edge, backbone, data center, and GPU cluster network, directly supporting AI workloads at scale.
Incident Response: Lead and contribute to end‑to‑end response for high‑severity network events, including mitigation, stakeholder communication, and postmortem documentation.
Root Cause Analysis: Drive RCAs for production incidents, identify systemic issues, and author remediation plans tracked through to closure.
Observability Improvements: Contribute to and improve Crusoe's network monitoring stack using streaming telemetry, SNMP, NetFlow, and tools such as Kentik, Grafana, Prometheus, and ThousandEyes.
Operational Standards: Author and maintain runbooks, escalation playbooks, and SOPs used across the operations team.
Operational Automation: Write Python‑based tooling to reduce toil, automate common remediation workflows, and accelerate mean time to resolution.
SLI/SLO Contribution: Partner with Architecture and SRE teams to define and track network reliability metrics and service level objectives backed by real‑time dashboards.
Mentorship: Provide technical guidance to Senior engineers and contribute to a culture of operational excellence and continuous learning.
What You'll Bring to the Team
8+ years of production network engineering experience with a focus on operations, incident response, and reliability in large‑scale or internet‑scale environments.
Hands‑on experience with observability and monitoring tools including streaming telemetry, SNMP, NetFlow/sFlow, Grafana, Prometheus, and ThousandEyes.
Experience operating RDMA/RoCE lossless fabrics for GPU or HPC workloads, including familiarity with PFC, ECN, and DCQCN tuning.
Expert hands‑on knowledge of BGP, EVPN‑VXLAN, IS‑IS, OSPF, MPLS, QoS, and TCP/IP in production data center environments.
Proficiency with Arista (EOS) and Juniper (Junos) platforms in leaf‑spine Clos architectures across multi‑vendor environments.
Python proficiency for writing auto‑remediation scripts, diagnostic tooling, and operational automation.
Comfort operating large device fleets across multi‑region environments with on‑call responsibility, including experience as an escalation point during critical events.
Bachelor's degree in Computer Science, Electrical Engineering, or a related field, or equivalent practical experience.
Bonus Points
Experience with NVIDIA/Mellanox networking platforms in GPU cluster environments.
Familiarity with Kentik or Arbor for traffic analysis and DDoS visibility.
Experience defining or contributing to SLIs and SLOs in partnership with SRE or product teams.
Exposure to operating 10K+ device fleets across hyperscale or cloud environments.
Background contributing to post‑incident learning programs or operational excellence initiatives org‑wide.
Benefits
Competitive compensation and equity packages
Restricted Stock Units
Paid time off, paid holidays & leave of absence programs
Comprehensive health, dental & vision insurance
Employer contributions to HSA account
Paid parental leave
Paid life insurance, short-term and long-term disability
Professional development & tuition reimbursement
Mental health & wellness support
Commuter benefits (parking & transit)
Cell phone stipend
401(k) retirement plan with company match up to 4% of salary
Volunteer time off
Global travel insurance & emergency assistance
Daily meals allowance
Additional perks & programs specific to location
Compensation Range
Compensation will be paid in the range of $195,000 to $235,000 + Bonus. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data.
Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.
Language skills
- English
This listing comes from a TieTalent partner platform. Click "Apply now" to submit your application directly on their site.