Data Scientist, AI Data Foundations • remoterocketship • Remote, Oregon, United States
This job listing is no longer available.
About
- Build and maintain vector stores for RAG: Design embedding pipelines, chunking strategies, indexing approaches, and refresh patterns for the vector stores powering retrieval-augmented generation across MeridianLink products.
- Own the feature store: Design, build, and operate feature store assets used for model training and online/offline inference, including feature definitions, freshness SLAs, lineage, point-in-time correctness, and reuse across teams.
- Design graph data structures: Build graph databases that model relationships between applicants, applications, products, lenders, decisions, and outcomes, and make them queryable for both AI use cases and analytical investigations.
- Lead data discovery: Profile our lending, deposit, and behavioral datasets to identify hidden trends, segments, anomalies, and potential model drivers; turn findings into actionable hypotheses for product, risk, and growth teams.
- Engineer for AI consumption: Build the curated, AI-ready datasets that downstream model builders, application engineers, and analysts rely on, with appropriate quality, documentation, and governance baked in.
- Evaluate retrieval and feature quality: Define and run evaluation frameworks for RAG retrieval quality, feature drift, embedding quality, and graph completeness; iterate based on what the metrics tell you.
- Partner with model builders: Work closely with ML engineers and applied scientists to make sure the data structures you build accelerate their work rather than slow it down.
- Champion responsible data use: Partner with governance, security, and compliance to ensure that AI-facing data assets respect data classification, customer consent, and regulatory boundaries from day one.
- Communicate findings: Translate discovery work into clear narratives (write-ups, notebooks, dashboards, and short presentations) that help non-technical stakeholders act on what the data is showing.
Requirements:
- 4–7 years of experience in a data science, ML engineering, or applied data role, with a meaningful portion of that time spent building data assets that other people's models or applications consumed.
- Hands-on experience designing and operating vector stores for RAG or semantic search, including embedding generation, chunking, indexing, and retrieval evaluation.
- Experience building or operating a feature store (e.g., Databricks Feature Store, Feast, or a custom internal platform), including offline training and online serving patterns and point-in-time correctness.
- Experience modeling and building graph data structures using Neo4j, TigerGraph, Azure Cosmos DB Gremlin, or similar graph databases, and writing graph queries to answer real questions.
- Strong proficiency in Python (pandas, NumPy, scikit-learn, PySpark) and SQL; comfortable working day-to-day in Databricks notebooks and jobs.
- Practical experience with embedding models and LLM tooling (e.g., Hugging Face transformers, OpenAI / Azure OpenAI APIs, LangChain, or similar) in a production or near-production context.
- Demonstrated data discovery skills: profiling messy real-world datasets, surfacing non-obvious patterns, validating findings statistically, and explaining them clearly.
- Solid grounding in classical ML concepts (supervised vs. unsupervised learning, train/test discipline, leakage, evaluation metrics), even though you will not own model training day-to-day.
- Strong written and verbal communication skills; able to write up findings for both technical and business audiences.
Benefits:
- Insurance coverage (medical, dental, vision, life, and disability)
- Flexible paid time off
- Paid holidays
- 401(k) plan with company match
- Remote work
Language skills
- English
Notice to users
This job listing was posted by one of our partners.