About
- Lead the design, development, and optimization of prompt engineering strategies for LakeFusion's LLM-based entity matching to improve accuracy, reduce bias, and enhance interpretability.
- Drive the continuous improvement of our Retrieval-Augmented Generation (RAG) architecture, refining the interplay between Vector Search candidate generation and LLM evaluation for superior match results.
- Iterate on LakeFusion's entity resolution process, exploring novel approaches to enhance match performance (precision, recall, F1-score) and operational efficiency (speed, flexibility, cost).
- Investigate and implement advanced LLM evaluation strategies, including multi-stage processing with potentially less powerful models to balance performance, cost, and output quality.
- Contribute to the design and development of production-grade, business-user-facing data science tools and workflows that provide transparency and control over AI matching.
- Collaborate closely with product managers, data engineers, and data stewards to translate complex business requirements into robust, scalable AI/ML solutions.
- Monitor and analyze AI model performance using telemetry from AI Gateway Inference Tables and custom logs, identifying opportunities for continuous improvement and drift mitigation.
What We're Looking For
- 5+ years of hands-on experience as an ML Engineer, Data Scientist, or in a similar role, specifically building and deploying machine learning solutions in a production environment.
- Deep expertise in Entity Resolution and Master Data Management (MDM), including the nuances of data matching, deduplication, and survivorship.
- Extensive practical experience with Generative AI (GenAI) concepts, Large Language Models (LLMs), Vector Search, and Retrieval-Augmented Generation (RAG) architectures.
- Strong proficiency in Python and its ecosystem for data science and machine learning (e.g., PyTorch, TensorFlow, scikit-learn).
- Demonstrated ability to deploy, manage, and optimize modern AI/ML models in production, with a focus on latency, throughput, and cost.
- Proven track record of building production-grade data science tools or applications that directly enable business users to interact with and leverage AI/ML insights.
- Solid foundation in machine learning fundamentals, including experience with diverse model types and strong statistical analysis skills.
- Experience working with the Databricks platform (e.g., Delta Lake, MLflow, Databricks SQL Analytics) is highly desirable.
- Excellent problem-solving skills and the ability to debug complex AI systems, with an understanding of the interplay between data, models, and prompts.
- Strong communication skills, with the ability to articulate complex technical concepts to both engineering and non-technical stakeholders.
Nice-to-Have
- Experience with MLOps practices and CI/CD for ML pipelines.
- Knowledge of distributed computing frameworks beyond Databricks.
- Experience with other MDM platforms or enterprise data quality tools.
- Familiarity with cloud platforms (AWS, Azure) for AI/ML deployments.
About LakeFusion
LakeFusion is the modern Master Data Management (MDM) company. Global enterprises across industries ranging from retail to manufacturing and financial services rely on the LakeFusion platform to unify, govern, and deliver trusted data entities such as customers, products, suppliers, and employees. Built natively on the Databricks Lakehouse, LakeFusion creates a single source of truth that powers analytics and AI. LakeFusion enables organizations worldwide to accelerate innovation with trusted and governed data.
Languages
- English
Notice for Users
This job comes from a TieTalent partner platform. Click "Apply Now" to submit your application directly on their site.