Senior Staff Network Engineer, Operations
- San Francisco, California, United States
About
Crusoe Cloud is seeking a Senior Staff Network Operations Engineer to own production reliability across our global network, including edge, backbone, data center fabric, and GPU cluster interconnects. You will drive incident response, root cause analysis, and the operational excellence initiatives that keep our hyperscale AI infrastructure healthy at scale.
This is a senior production ownership role, not architecture, not pre-sales, not purely automation. You will set operational standards, define SLIs and SLOs, mentor Staff and Senior engineers, and serve as the senior escalation point during high-severity events. This is the role that keeps the network up.
What You'll Be Working On
Own Production Reliability:
Serve as the senior technical owner for uptime of Crusoe's global edge, backbone, data center, and GPU cluster network, directly affecting the availability of AI workloads running on hundreds of thousands of GPUs.
Lead Incident Response:
Own end-to-end response for high-severity network events, including rapid mitigation, stakeholder communication, and postmortem documentation that prevents recurrence.
Drive Root Cause Analysis:
Lead RCAs for production incidents, identify systemic issues, author remediation plans, and track them to closure.
Define SLIs and SLOs:
Partner with Architecture and Site Reliability to define network reliability metrics and service level objectives, backed by real-time dashboards and alerting.
Set Operational Standards:
Author and maintain runbooks, escalation playbooks, and SOPs used by the broader operations team.
Improve Observability:
Drive continuous improvement of Crusoe's network monitoring stack including streaming telemetry, SNMP, NetFlow, and tools such as Kentik, Grafana, Prometheus, and ThousandEyes.
Build Operational Automation:
Write Python-based auto-remediation tooling that reduces toil and accelerates mean time to resolution for known failure modes.
Mentor and Multiply:
Provide technical guidance to Staff and Senior engineers. Drive post-incident learning and build a culture of operational excellence across the team.
What You'll Bring to the Team
12+ years of production network engineering experience with a demonstrated focus on large-scale operations, incident response, and reliability in hyperscale or internet-scale environments.
Observability and Monitoring:
Hands-on experience with streaming telemetry, SNMP, NetFlow, sFlow, and tools such as Kentik, Grafana, Prometheus, ThousandEyes, and Arbor.
GPU Cluster and RDMA Networking:
Hands-on experience operating RDMA/RoCE (v1 and v2) lossless fabrics for GPU and HPC workloads, including PFC, ECN, and DCQCN tuning. Required at this level.
Demonstrated Technical Leadership:
Proven track record owning production reliability at scale, leading RCAs that drove systemic change, and setting operational standards the broader org executes against.
Hyperscale Operational Depth:
Comfort operating 10K+ device fleets across multi-region environments with 24/7 on-call responsibility. You have been the senior escalation point during critical network events.
Protocol Fluency:
Expert hands-on knowledge of BGP, EVPN-VXLAN, IS-IS, OSPF, MPLS, QoS, and TCP/IP across production DC fabric environments at scale.
Hardware Platform Depth:
Expert knowledge of Arista (EOS), Juniper (Junos), and NVIDIA/Mellanox platforms in leaf-spine Clos architectures across multi-vendor environments.
Operational Automation:
Proficiency in Python for auto-remediation scripts, diagnostic tooling, and operational workflows that reduce toil and accelerate incident resolution.
SLI and SLO Ownership:
Experience defining and owning network reliability metrics and service level objectives in partnership with engineering and product leadership.
Education:
Bachelor's degree in Computer Science, Electrical Engineering, or a related field, or equivalent practical experience in hyperscale or internet-scale environments.
Benefits
Competitive compensation
Restricted Stock Units
Paid time off & paid holidays
Comprehensive health, dental & vision insurance
Employer contributions to HSA
Paid parental leave
Paid life insurance, short-term and long-term disability
Professional development & tuition reimbursement
Mental health & wellness support
Commuter benefits (parking & transit)
Cell phone stipend
401(k) Retirement plan with company match up to 4% of salary
Volunteer time off
Compensation
Compensation will be paid in the range of $225,000 - $275,000 + Bonus. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant’s education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.
Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.
Language skills
- English