About
Biohub is the first large‑scale initiative bringing frontier AI models, massive compute, and frontier experimental capabilities under one roof. It focuses on building a general‑purpose system to accelerate scientific discovery by integrating AI models, biological foundation models, and laboratory capabilities with the goal of curing disease.
The Team
Biohub is a 501(c)(3) biomedical research organization combining frontier AI with frontier biology to solve disease. We develop technology that lets scientists worldwide use AI‑powered biology to study cellular systems and understand, diagnose, and correct disease.
The Opportunity
The Data Engineering team owns the strategy, sourcing, and implementation of data that supports AI research and development. Your work will maximize the speed, agility, and capability of biological AI research by connecting public data resources and Biohub’s experimental platforms to AI systems. You will design pipelines that ingest hundreds of terabytes to petabytes of data (genomic, imaging, spatial, temporal, molecular, and metadata) and transform it into AI‑ready datasets for training frontier models.
What You’ll Do
Design and build data pipelines that process genomic and imaging data at petabyte scale
Solve performance and bandwidth challenges with creative engineering
Build agent‑based systems for automated dataset curation, quality control, and workflow generation
Create tooling for data cataloging and registration that makes datasets discoverable and accessible
Collaborate with AI Research teams to translate model requirements into data specifications, and with scientists to integrate public and internal data into large‑scale AI‑ready datasets
Improve pipeline reliability and observability, working toward 99%+ success rates without manual intervention
What You’ll Bring
5+ years of experience building reliable, operable data systems at scale (hundreds of terabytes to petabytes)
Strong software engineering fundamentals
Experience deploying distributed computing frameworks such as Databricks, Spark, or Ray for large‑scale data processing
Experience with cloud infrastructure (AWS preferred) and HPC environments
Comfort with ambiguity; ability to make progress when requirements are evolving
Interest in AI‑native development practices and tooling
Nice to have: background in computational biology, bioinformatics, or life sciences and experience with genomics datasets and formats (FASTQ, BAM, VCF) or imaging formats (OME‑Zarr, HDF5)
Compensation
The Redwood City, CA and New York City, NY base pay range for new hires is $241,000–$301,000 for the Senior Data Engineer role (5+ years of experience required), $270,000–$338,000 for the Staff Data Engineer role (8+ years of experience required), and $323,000–$404,000 for the Senior Staff Data Engineer role (12+ years of experience required). Candidates must have equivalent years of experience to be considered for each level. Leveling is determined during the interview process. New hires are typically hired into the lower portion of the range, enabling employee growth over time.
Better Together
This hybrid position requires you to be onsite at least 60% of the working month (approximately three days a week), with specific in‑office days determined by the team manager. The schedule will be at the hiring manager’s discretion and communicated during the interview process.
Benefits for the Whole You
Generous employer match on employee 401(k) contributions
Paid time off to volunteer at an organization of your choice
Relocation support for employees who need assistance moving
Equal Employment Opportunity
As set forth in Biohub’s Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.
Language Skills
- English
Note for Users
This job posting comes from a TieTalent partner platform. Click "Apply Now" to submit your application directly on their website.