Principal Data Scientist (Agent Builder) • Elastic • London, England, United Kingdom

About

Requirements
8+ years of applied DS/ML experience, with deep expertise in IR, NLP, ranking, semantic search, RAG, or LLM-powered product experiences
Strong track record defining and leading evaluation for production AI/ML systems, including offline metrics, online experimentation, LLM-as-judge approaches, groundedness, citation quality, and model comparison
Experience influencing product and technical strategy through data, especially in ambiguous or emerging domains where the “right” metric or approach is not obvious at the start
Hands‑on ability with Python, PyTorch/Transformers, Pandas, notebooks, reproducible experiments, versioned datasets, and clean, reviewable code
Strong understanding of retrieval systems, including dense and sparse retrieval, re-ranking, vector search, and query understanding, plus evaluation metrics such as nDCG, MRR, Recall@k, and precision, as well as latency/cost trade‑offs
Experience collaborating closely with engineering teams to move from prototype to production, including telemetry design, dashboards, CI guardrails, and quality regression tracking
Practical Elasticsearch experience, or experience with similar search and distributed data systems. ES|QL familiarity is a plus
Excellent written and verbal communication, with the ability to explain complex scientific and technical trade‑offs to engineering, product, design, and leadership audiences
A collaborative, low‑ego style and a strong ability to mentor, raise standards, and create clarity for others in a distributed team
What the job involves
…their own data in Elasticsearch. We build the core quality layer for RAG, agents and tools, retrieval and citations, streaming, and memory, along with the evaluation signals that turn open‑ended questions into grounded, reliable answers
As a Principal Data Scientist, you will help set the technical direction for how we evaluate, improve, and scale chat quality across Elastic’s agentic platform. You will define the evaluation strategy that guides product decisions, including which models we standardize on, how we route requests across agents, which tools we enable and when, and how we tailor agents to different Elastic use cases in search and beyond. You will work closely with backend engineering, product, UX, and other data scientists to turn ambiguous, cutting‑edge problems into measurable product improvements
You’ll help lead work on frontier problems such as folding RAG and vector search into an agent’s knowledge base, dynamically enriching model context to improve groundedness, shaping reasoning strategies and tool‑selection policies, lighting up agent‑driven visualizations on top of Elasticsearch data, and exploring multimodality where it can create meaningful user value. This is an applied leadership role: you will prototype, evaluate, influence roadmap direction, and help teams ship improvements that customers can feel
Define the evaluation strategy for conversational and agentic search, including offline and online evaluation, golden datasets, rubrics, LLM-as-judge calibration, groundedness and citation checks, and A/B testing
Lead the design of quality metrics and decision frameworks for RAG, agents, tools, model selection, agent routing, prompt behavior, and cost/latency trade‑offs
Build, compare, and guide improvements across retrieval and re‑ranking approaches, including sparse and dense retrieval, vector search, query understanding, semantic rewrites, and context enrichment
Turn experimental results into product and business decisions: which models to use, how to route requests efficiently, which tools should be exposed, and how agents should be customized for different Elastic use cases
Partner with engineering to productionize evaluation pipelines, telemetry, dashboards, CI guardrails, and regression detection for chat quality, helpfulness, faithfulness, latency, and cost
Influence the roadmap by identifying the highest‑leverage quality gaps, proposing practical solutions, and communicating trade‑offs clearly to product, engineering, and leadership
Mentor other data scientists and engineers in experiment design, evaluation methodology, statistical rigor, and practical approaches to improving LLM‑powered systems
Share outcomes through clear docs, notebooks, PRs, dashboards, technical proposals, and cross‑functional reviews

Language skills

English

Notice to users

This listing comes from one of TieTalent's partner platforms. Click "Apply now" to submit your application directly on their site.

Postuler Maintenant