Scientific Lead - Scientific Data Engineer
- San Francisco, California, United States
About
The Opportunity
We are building something unprecedented: an AI foundation that will push the frontier of what is possible today across drug discovery research, from target identification and disease biology through translational science.
AI4D Team
The Applied Intelligence for Discovery (AI4D) team is a newly formed group within Lilly Research Laboratories that operates at the intersection of scientific delivery and core platform development. AI4D’s mission is to connect scientists to petabyte‑scale data through natural language interfaces, automated analysis workflows, and intelligent search, and to convert early deployments into repeatable system standards and evaluation practices that scale across therapeutic areas.
As a Scientific Data Engineer, you will close the gap between how data is stored and how AI systems need to consume it. You will build the semantic layer, data harmonization infrastructure, AI‑ready data products, and lakehouse architecture that bridge the two. You will work at the intersection of the data infrastructure team and the generative AI engineers who build the systems scientists interact with.
Responsibilities
Data Harmonization and Lakehouse Architecture
Design and build the data architecture that transforms raw and processed omics data into harmonized, AI‑consumable layers
Build and optimize ETL/ELT pipelines that produce denormalized views, pre‑computed aggregations, embedding‑ready text representations, and feature stores optimized for AI system consumption
Implement data quality monitoring, automated profiling, and validation checks across harmonization layers
Create versioned, reproducible data snapshots that support model training, evaluation, and audit requirements in a regulated environment
Partner with the teams to extend harmonization patterns as data modalities expand beyond genomics and proteomics into spatial transcriptomics, perturbational data (Perturb‑Seq), single‑cell, and digital pathology
Semantic Layer and Schema Engineering
Design and maintain a semantic layer over Lilly’s multi‑omics databases that enables AI systems
Create comprehensive schema documentation: table descriptions, column‑level annotations, relationship mappings, business logic rules, and domain‑specific constraints (e.g., statistical thresholds, unit conventions, experimental design metadata)
Develop gold‑standard question/SQL pairs for each major database, in collaboration with computational biologists and Generative AI Engineers, to serve as training data, few‑shot examples, and evaluation benchmarks
Build and maintain a data dictionary and ontology mapping layer that translates how scientists think and speak about data (gene names, pathway terms, assay types) into how the data is physically stored
AI‑Ready Data Products
Build and manage vector embedding pipelines for scientific documents, study metadata, and structured data descriptions to power RAG‑based retrieval
Build integration pipelines that connect heterogeneous data sources — omics databases, internal publications, electronic lab notebooks, assay results, and clinical annotations — into a unified, queryable layer
Develop and enforce metadata standards that ensure new data sources are AI‑accessible from the point of ingestion, not retroactively
Design data products that serve multiple consumption patterns: direct SQL access for computational biologists, structured feeds for ML training pipelines, and semantic interfaces for LLM‑powered tools
Qualifications
Bachelor's degree in Computer Science, Data Engineering, Bioinformatics, or a related field and 8 years of data engineering experience, OR Master's degree and 5 years of data engineering experience
Additional Skills/Preferences
PhD in data or a related field
Demonstrated expertise in building data pipelines, ETL/ELT workflows, and data products that serve downstream AI/ML systems
Strong SQL skills and experience with complex relational database schemas (hundreds of tables, multi‑level joins, domain‑specific conventions)
Experience with modern data platform technologies, including at least one of: Databricks, Snowflake, or equivalent lakehouse platforms
Experience with modern data engineering tools: dbt, Spark, Airflow, or similar orchestration and transformation frameworks
Proficiency in Python for data processing, scripting, and pipeline development
Experience with cloud data platforms (AWS preferred: Redshift, Athena, Glue, S3, or similar)
Familiarity with at least one of: vector databases, embedding pipelines, or semantic layer tooling
Strong communication skills — you can work effectively with both engineers who think in schemas and scientists who think in biology
Experience with biomedical or scientific data: omics datasets (RNA‑seq, proteomics, GWAS), clinical data, or laboratory information management systems
Experience in pharmaceutical, biotech, or life sciences environments
Familiarity with biomedical ontologies and controlled vocabularies (Gene Ontology, MeSH, ChEBI, HGNC) and their application to data integration
Experience building data products that serve AI/ML systems — feature stores, training datasets, evaluation benchmarks, or semantic annotations for text‑to‑SQL
Knowledge of data governance practices in regulated industries: data lineage, access controls, versioning, and auditability
Experience with knowledge graph technologies (Neo4j, Amazon Neptune, RDF/SPARQL) or graph‑based data modeling
Deep experience with Databricks ecosystem: Unity Catalog for data governance, Delta Lake for ACID transactions, MLflow integration, and Databricks SQL for analytics workloads
Experience designing data architectures that bridge traditional bioinformatics workflows (Nextflow, R/Bioconductor) with modern lakehouse consumption patterns
Lilly is dedicated to helping individuals with disabilities to actively engage in the workforce, ensuring equal opportunities when vying for positions. If you require accommodation to submit a resume for a position at Lilly, please complete the accommodation request form https://careers.lilly.com/us/en/workplace-accommodation for further assistance. Please note this is for individuals to request an accommodation as part of the application process and any other correspondence will not receive a response.
Lilly is proud to be an EEO Employer and does not discriminate on the basis of age, race, color, religion, gender identity, sex, gender expression, sexual orientation, genetic information, ancestry, national origin, protected veteran status, disability, or any other legally protected status.
Our employee resource groups (ERGs) offer strong support networks for their members and are open to all employees. Our current groups include: Africa, Middle East, Central Asia Network, Black Employees at Lilly, Chinese Culture Network, Japanese International Leadership Network (JILN), Lilly India Network, Organization of Latinx at Lilly (OLA), PRIDE (LGBTQ+ Allies), Veterans Leadership Network (VLN), Women’s Initiative for Leading at Lilly (WILL), enAble (for people with disabilities).
Actual compensation will depend on a candidate’s education, experience, skills, and geographic location. The anticipated wage for this position is $166,500 - $266,200.
Full‑time equivalent employees also will be eligible for a company bonus (depending, in part, on company and individual performance). In addition, Lilly offers a comprehensive benefit program to eligible employees, including eligibility to participate in a company‑sponsored 401(k); pension; vacation benefits; eligibility for medical, dental, vision and prescription drug benefits; flexible benefits (e.g., healthcare and/or dependent day care flexible spending accounts); life insurance and death benefits; certain time off and leave of absence benefits; and well‑being benefits (e.g., employee assistance program, fitness benefits, and employee clubs and activities). Lilly reserves the right to amend, modify, or terminate its compensation and benefit programs in its sole discretion and Lilly’s compensation practices and guidelines will apply regarding the details of any promotion or transfer of Lilly employees.
#WeAreLilly
Language skills
- English
This job posting comes from a TieTalent partner platform. Click "Apply now" to submit your application directly on their site.