Scientific Lead - Scientific Data Engineer
- San Francisco, California, United States
About
Responsibilities
Data Harmonization and Lakehouse Architecture
Design and build the data architecture that transforms raw and processed omics data into harmonized, AI-consumable layers
Build and optimize ETL/ELT pipelines that produce denormalized views, pre-computed aggregations, embedding‑ready text representations, and feature stores optimized for AI system consumption
Implement data quality monitoring, automated profiling, and validation checks across harmonization layers
Create versioned, reproducible data snapshots that support model training, evaluation, and audit requirements in a regulated environment
Partner with the teams to extend harmonization patterns as data modalities expand beyond genomics and proteomics into spatial transcriptomics, perturbational data (Perturb‑Seq), single‑cell, and digital pathology
Semantic Layer and Schema Engineering
Design and maintain a semantic layer over Lilly’s multi‑omics databases that enables AI systems
Create comprehensive schema documentation: table descriptions, column‑level annotations, relationship mappings, business logic rules, and domain‑specific constraints (e.g., statistical thresholds, unit conventions, experimental design metadata)
Develop gold‑standard question/SQL pairs for each major database, in collaboration with computational biologists and Generative AI Engineers, to serve as training data, few‑shot examples, and evaluation benchmarks
Build and maintain a data dictionary and ontology mapping layer that translates how scientists think and speak about data (gene names, pathway terms, assay types) into how the data is physically stored
AI‑Ready Data Products
Build and manage vector embedding pipelines for scientific documents, study metadata, and structured data descriptions to power RAG‑based retrieval
Build integration pipelines that connect heterogeneous data sources — omics databases, internal publications, electronic lab notebooks, assay results, and clinical annotations — into a unified, queryable layer
Develop and enforce metadata standards that ensure new data sources are AI‑accessible from the point of ingestion, not retroactively
Design data products that serve multiple consumption patterns: direct SQL access for computational biologists, structured feeds for ML training pipelines, and semantic interfaces for LLM‑powered tools
Qualifications
Bachelor's degree in Computer Science, Data Engineering, Bioinformatics, or a related field and 8 years of data engineering experience, OR Master's degree and 5 years of data engineering experience
Additional Skills/Preferences
PhD in data or related field
Demonstrated expertise in building data pipelines, ETL/ELT workflows, and data products that serve downstream AI/ML systems
Strong SQL skills and experience with complex relational database schemas (hundreds of tables, multi‑level joins, domain‑specific conventions)
Experience with modern data platform technologies, including at least one of: Databricks, Snowflake, or equivalent lakehouse platforms
Experience with modern data engineering tools: dbt, Spark, Airflow, or similar orchestration and transformation frameworks
Proficiency in Python for data processing, scripting, and pipeline development
Experience with cloud data platforms (AWS preferred: Redshift, Athena, Glue, S3, or similar)
Familiarity with at least one of: vector databases, embedding pipelines, or semantic layer tooling
Strong communication skills — you can work effectively with both engineers who think in schemas and scientists who think in biology
Experience with biomedical or scientific data: omics datasets (RNA‑seq, proteomics, GWAS), clinical data, or laboratory information management systems
Experience in pharmaceutical, biotech, or life sciences environments
Familiarity with biomedical ontologies and controlled vocabularies (Gene Ontology, MeSH, ChEBI, HGNC) and their application to data integration
Experience building data products that serve AI/ML systems — feature stores, training datasets, evaluation benchmarks, or semantic annotations for text‑to‑SQL
Knowledge of data governance practices in regulated industries: data lineage, access controls, versioning, and auditability
Experience with knowledge graph technologies (Neo4j, Amazon Neptune, RDF/SPARQL) or graph‑based data modeling
Deep experience with Databricks ecosystem: Unity Catalog for data governance, Delta Lake for ACID transactions, MLflow integration, and Databricks SQL for analytics workloads
Experience designing data architectures that bridge traditional bioinformatics workflows (Nextflow, R/Bioconductor) with modern lakehouse consumption patterns
Lilly is dedicated to helping individuals with disabilities to actively engage in the workforce, ensuring equal opportunities when vying for positions. If you require accommodation to submit a resume for a position at Lilly, please complete the accommodation request form https://careers.lilly.com/us/en/workplace-accommodation for further assistance. Please note this is for individuals to request an accommodation as part of the application process and any other correspondence will not receive a response.
Lilly is proud to be an EEO Employer and does not discriminate on the basis of age, race, color, religion, gender identity, sex, gender expression, sexual orientation, genetic information, ancestry, national origin, protected veteran status, disability, or any other legally protected status.
Actual compensation will depend on a candidate’s education, experience, skills, and geographic location. The anticipated wage for this position is $166,500 - $266,200. Full‑time equivalent employees also will be eligible for a company bonus (depending, in part, on company and individual performance). In addition, Lilly offers a comprehensive benefit program to eligible employees, including eligibility to participate in a company‑sponsored 401(k); pension; vacation benefits; eligibility for medical, dental, vision and prescription drug benefits; flexible benefits (e.g., healthcare and/or dependent day care flexible spending accounts); life insurance and death benefits; certain time off and leave of absence benefits; and well‑being benefits (e.g., employee assistance program, fitness benefits, and employee clubs and activities). Lilly reserves the right to amend, modify, or terminate its compensation and benefit programs in its sole discretion and Lilly’s compensation practices and guidelines will apply regarding the details of any promotion or transfer of Lilly employees.
Language skills
- English