Scientific Lead - Scientific Data Engineer
BioSpace, Inc., San Francisco
At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work
- San Francisco, California, United States
About
The Opportunity
We are building something unprecedented: an AI foundation that will push the frontier of what is possible today across drug discovery research, from target identification and disease biology through translational science.
AI4D Team
The Applied Intelligence for Discovery (AI4D) team is a newly formed group within Lilly Research Laboratories that operates at the intersection of scientific delivery and core platform development. AI4D’s mission is to connect scientists to petabyte‑scale data through natural language interfaces, automated analysis workflows, and intelligent search, and to convert early deployments into repeatable system standards and evaluation practices that scale across therapeutic areas.
As a Scientific Data Engineer, you will close the gap between how data is stored and how AI systems need to consume it. You will build the semantic layer, data harmonization infrastructure, AI‑ready data products, and lakehouse architecture that bridge the two. You will work at the intersection of the data infrastructure team and the generative AI engineers who build the systems scientists interact with.
Responsibilities
Data Harmonization and Lakehouse Architecture
Design and build the data architecture that transforms raw and processed omics data into harmonized, AI‑consumable layers
Build and optimize ETL/ELT pipelines that produce denormalized views, pre‑computed aggregations, embedding‑ready text representations, and feature stores optimized for AI system consumption
Implement data quality monitoring, automated profiling, and validation checks across harmonization layers
Create versioned, reproducible data snapshots that support model training, evaluation, and audit requirements in a regulated environment
Partner with the teams to extend harmonization patterns as data modalities expand beyond genomics and proteomics into spatial transcriptomics, perturbational data (Perturb‑Seq), single‑cell, and digital pathology
Semantic Layer and Schema Engineering
Design and maintain a semantic layer over Lilly’s multi‑omics databases that enables AI systems
Create comprehensive schema documentation: table descriptions, column‑level annotations, relationship mappings, business logic rules, and domain‑specific constraints (e.g., statistical thresholds, unit conventions, experimental design metadata)
Develop gold‑standard question/SQL pairs for each major database, in collaboration with computational biologists and Generative AI Engineers, to serve as training data, few‑shot examples, and evaluation benchmarks
Build and maintain a data dictionary and ontology mapping layer that translates how scientists think and speak about data (gene names, pathway terms, assay types) into how the data is physically stored
AI‑Ready Data Products
Build and manage vector embedding pipelines for scientific documents, study metadata, and structured data descriptions to power RAG‑based retrieval
Build integration pipelines that connect heterogeneous data sources — omics databases, internal publications, electronic lab notebooks, assay results, and clinical annotations — into a unified, queryable layer
Develop and enforce metadata standards that ensure new data sources are AI‑accessible from the point of ingestion, not retroactively
Design data products that serve multiple consumption patterns: direct SQL access for computational biologists, structured feeds for ML training pipelines, and semantic interfaces for LLM‑powered tools
Qualifications
Bachelor's degree in Computer Science, Data Engineering, Bioinformatics, or a related field and 8 years of data engineering experience, OR a Master's degree and 5 years of data engineering experience
Additional Skills/Preferences
PhD in data or a related field
Demonstrated expertise in building data pipelines, ETL/ELT workflows, and data products that serve downstream AI/ML systems
Strong SQL skills and experience with complex relational database schemas (hundreds of tables, multi‑level joins, domain‑specific conventions)
Experience with modern data platform technologies, including at least one of: Databricks, Snowflake, or equivalent lakehouse platforms
Experience with modern data engineering tools: dbt, Spark, Airflow, or similar orchestration and transformation frameworks
Proficiency in Python for data processing, scripting, and pipeline development
Experience with cloud data platforms (AWS preferred: Redshift, Athena, Glue, S3, or similar)
Familiarity with at least one of: vector databases, embedding pipelines, or semantic layer tooling
Strong communication skills — you can work effectively with both engineers who think in schemas and scientists who think in biology
Experience with biomedical or scientific data: omics datasets (RNA‑seq, proteomics, GWAS), clinical data, or laboratory information management systems
Experience in pharmaceutical, biotech, or life sciences environments
Familiarity with biomedical ontologies and controlled vocabularies (Gene Ontology, MeSH, ChEBI, HGNC) and their application to data integration
Experience building data products that serve AI/ML systems — feature stores, training datasets, evaluation benchmarks, or semantic annotations for text‑to‑SQL
Knowledge of data governance practices in regulated industries: data lineage, access controls, versioning, and auditability
Experience with knowledge graph technologies (Neo4j, Amazon Neptune, RDF/SPARQL) or graph‑based data modeling
Deep experience with the Databricks ecosystem: Unity Catalog for data governance, Delta Lake for ACID transactions, MLflow integration, and Databricks SQL for analytics workloads
Experience designing data architectures that bridge traditional bioinformatics workflows (Nextflow, R/Bioconductor) with modern lakehouse consumption patterns
Lilly is dedicated to helping individuals with disabilities to actively engage in the workforce, ensuring equal opportunities when vying for positions. If you require accommodation to submit a resume for a position at Lilly, please complete the accommodation request form https://careers.lilly.com/us/en/workplace-accommodation for further assistance. Please note this is for individuals to request an accommodation as part of the application process and any other correspondence will not receive a response.
Lilly is proud to be an EEO Employer and does not discriminate on the basis of age, race, color, religion, gender identity, sex, gender expression, sexual orientation, genetic information, ancestry, national origin, protected veteran status, disability, or any other legally protected status.
Our employee resource groups (ERGs) offer strong support networks for their members and are open to all employees. Our current groups include: Africa, Middle East, Central Asia Network, Black Employees at Lilly, Chinese Culture Network, Japanese International Leadership Network (JILN), Lilly India Network, Organization of Latinx at Lilly (OLA), PRIDE (LGBTQ+ Allies), Veterans Leadership Network (VLN), Women’s Initiative for Leading at Lilly (WILL), enAble (for people with disabilities). Learn more about all of our groups.
Actual compensation will depend on a candidate’s education, experience, skills, and geographic location. The anticipated wage for this position is $166,500 - $266,200.
Full‑time equivalent employees also will be eligible for a company bonus (depending, in part, on company and individual performance). In addition, Lilly offers a comprehensive benefit program to eligible employees, including eligibility to participate in a company‑sponsored 401(k); pension; vacation benefits; eligibility for medical, dental, vision and prescription drug benefits; flexible benefits (e.g., healthcare and/or dependent day care flexible spending accounts); life insurance and death benefits; certain time off and leave of absence benefits; and well‑being benefits (e.g., employee assistance program, fitness benefits, and employee clubs and activities). Lilly reserves the right to amend, modify, or terminate its compensation and benefit programs in its sole discretion and Lilly’s compensation practices and guidelines will apply regarding the details of any promotion or transfer of Lilly employees.
#WeAreLilly
Language skills
- English