Scientific Lead - Scientific Data Engineer
- San Francisco, California, United States
About
The Opportunity
We are building something unprecedented: an AI foundation that will push the frontier of what is possible today across drug discovery research, from target identification and disease biology through translational science.
AI4D Team
The Applied Intelligence for Discovery (AI4D) team is a newly formed group within Lilly Research Laboratories that operates at the intersection of scientific delivery and core platform development. AI4D’s mission is to connect scientists to petabyte‑scale data through natural language interfaces, automated analysis workflows, and intelligent search, and to convert early deployments into repeatable system standards and evaluation practices that scale across therapeutic areas.
As a Scientific Data Engineer, you will close the gap between how data is stored and how AI systems need to consume it. You will build the semantic layer, data harmonization infrastructure, AI‑ready data products, and lakehouse architecture that bridge the two. You will work at the intersection of the data infrastructure team and the generative AI engineers who build the systems scientists interact with.
Responsibilities
Data Harmonization and Lakehouse Architecture
Design and build the data architecture that transforms raw and processed omics data into harmonized, AI‑consumable layers
Build and optimize ETL/ELT pipelines that produce denormalized views, pre‑computed aggregations, embedding‑ready text representations, and feature stores optimized for AI system consumption
Implement data quality monitoring, automated profiling, and validation checks across harmonization layers
Create versioned, reproducible data snapshots that support model training, evaluation, and audit requirements in a regulated environment
Partner with the teams to extend harmonization patterns as data modalities expand beyond genomics and proteomics into spatial transcriptomics, perturbational data (Perturb‑Seq), single‑cell, and digital pathology
Semantic Layer and Schema Engineering
Design and maintain a semantic layer over Lilly’s multi‑omics databases that enables AI systems
Create comprehensive schema documentation: table descriptions, column‑level annotations, relationship mappings, business logic rules, and domain‑specific constraints (e.g., statistical thresholds, unit conventions, experimental design metadata)
Develop gold‑standard question/SQL pairs for each major database, in collaboration with computational biologists and Generative AI Engineers, to serve as training data, few‑shot examples, and evaluation benchmarks
Build and maintain a data dictionary and ontology mapping layer that translates how scientists think and speak about data (gene names, pathway terms, assay types) into how the data is physically stored
AI‑Ready Data Products
Build and manage vector embedding pipelines for scientific documents, study metadata, and structured data descriptions to power RAG‑based retrieval
Build integration pipelines that connect heterogeneous data sources — omics databases, internal publications, electronic lab notebooks, assay results, and clinical annotations — into a unified, queryable layer
Develop and enforce metadata standards that ensure new data sources are AI‑accessible from the point of ingestion, not retroactively
Design data products that serve multiple consumption patterns: direct SQL access for computational biologists, structured feeds for ML training pipelines, and semantic interfaces for LLM‑powered tools
Qualifications
Bachelor's degree in Computer Science, Data Engineering, Bioinformatics, or a related field plus 8 years of data engineering experience, OR a Master's degree and 5 years of data engineering experience
Additional Skills/Preferences
PhD in data or a related field
Demonstrated expertise in building data pipelines, ETL/ELT workflows, and data products that serve downstream AI/ML systems
Strong SQL skills and experience with complex relational database schemas (hundreds of tables, multi‑level joins, domain‑specific conventions)
Experience with modern data platform technologies, including at least one of: Databricks, Snowflake, or equivalent lakehouse platforms
Experience with modern data engineering tools: dbt, Spark, Airflow, or similar orchestration and transformation frameworks
Proficiency in Python for data processing, scripting, and pipeline development
Experience with cloud data platforms (AWS preferred: Redshift, Athena, Glue, S3, or similar)
Familiarity with at least one of: vector databases, embedding pipelines, or semantic layer tooling
Strong communication skills — you can work effectively with both engineers who think in schemas and scientists who think in biology
Experience with biomedical or scientific data: omics datasets (RNA‑seq, proteomics, GWAS), clinical data, or laboratory information management systems
Experience in pharmaceutical, biotech, or life sciences environments
Familiarity with biomedical ontologies and controlled vocabularies (Gene Ontology, MeSH, ChEBI, HGNC) and their application to data integration
Experience building data products that serve AI/ML systems — feature stores, training datasets, evaluation benchmarks, or semantic annotations for text‑to‑SQL
Knowledge of data governance practices in regulated industries: data lineage, access controls, versioning, and auditability
Experience with knowledge graph technologies (Neo4j, Amazon Neptune, RDF/SPARQL) or graph‑based data modeling
Deep experience with the Databricks ecosystem: Unity Catalog for data governance, Delta Lake for ACID transactions, MLflow integration, and Databricks SQL for analytics workloads
Experience designing data architectures that bridge traditional bioinformatics workflows (Nextflow, R/Bioconductor) with modern lakehouse consumption patterns
Lilly is dedicated to helping individuals with disabilities to actively engage in the workforce, ensuring equal opportunities when vying for positions. If you require accommodation to submit a resume for a position at Lilly, please complete the accommodation request form https://careers.lilly.com/us/en/workplace-accommodation for further assistance. Please note this is for individuals to request an accommodation as part of the application process and any other correspondence will not receive a response.
Lilly is proud to be an EEO Employer and does not discriminate on the basis of age, race, color, religion, gender identity, sex, gender expression, sexual orientation, genetic information, ancestry, national origin, protected veteran status, disability, or any other legally protected status.
Our employee resource groups (ERGs) offer strong support networks for their members and are open to all employees. Our current groups include: Africa, Middle East, Central Asia Network, Black Employees at Lilly, Chinese Culture Network, Japanese International Leadership Network (JILN), Lilly India Network, Organization of Latinx at Lilly (OLA), PRIDE (LGBTQ+ Allies), Veterans Leadership Network (VLN), Women’s Initiative for Leading at Lilly (WILL), enAble (for people with disabilities).
Actual compensation will depend on a candidate’s education, experience, skills, and geographic location. The anticipated wage for this position is $166,500 - $266,200.
Full‑time equivalent employees also will be eligible for a company bonus (depending, in part, on company and individual performance). In addition, Lilly offers a comprehensive benefit program to eligible employees, including eligibility to participate in a company‑sponsored 401(k); pension; vacation benefits; eligibility for medical, dental, vision and prescription drug benefits; flexible benefits (e.g., healthcare and/or dependent day care flexible spending accounts); life insurance and death benefits; certain time off and leave of absence benefits; and well‑being benefits (e.g., employee assistance program, fitness benefits, and employee clubs and activities). Lilly reserves the right to amend, modify, or terminate its compensation and benefit programs in its sole discretion and Lilly’s compensation practices and guidelines will apply regarding the details of any promotion or transfer of Lilly employees.
#WeAreLilly
Language skills
- English
This job posting comes from a TieTalent partner platform. Click "Apply Now" to submit your application directly on their website.