This job posting is no longer available
Sr Language Data Scientist Search Specialization
Innodata
- United States
About
As a Senior Language Data Scientist, you lead projects and own processes for optimizing search and retrieval systems by creating, validating and annotating search-specific data for LLM/ML applications. This includes query-document pairs, relevance judgments, query intent labels, search result quality assessments, and multimodal search scenarios (image search, product search, news search). You work across different search domains—from web search to e-commerce to vertical search. You consult and engage with customers to understand their business goals and design processes to meet them. You advise and support business unit heads on engaging with customers to understand the upstream activities that would be performed using Innodata Inc services.
- You can lead long-term projects with high complexity and ambiguity from first discussion with the client to completion
- Design/improve workflows to create data for AI/ML training and evaluation. Includes human annotation and data-collection workflows, as well as synthetic ones
- Design and refine search data annotation frameworks, including relevance judging guidelines that handle nuanced query-document relationships, query ambiguity, and domain-specific search challenges (e.g., freshness for news search, user intent for product search)
- Dive deep into existing workflows and processes to gather data and insights, make recommendations, and drive improvement through innovation and cross-functional collaboration with customers
- Assess and optimize search-specific evaluation approaches, including A/B testing frameworks, ranking metrics, and human evaluation studies for search result quality
- Quantitatively analyze large datasets, perform statistical analysis, calculate metrics, and make recommendations to improve accuracy and performance
- Contribute to establishing best practices and standards for generative AI development with customers and within the organization
- MA in (computational) linguistics, data science, computer science (AI / ML / NLU), quantitative social sciences or a related scientific / quantitative field, PhD strongly preferred
- Ability to collaborate directly with technical stakeholders including senior project managers, data engineers, and research scientists
- Collaborating with cross-functional teams to define AI project requirements and objectives, ensuring alignment with overall business goals
- Design efficient data strategies for complex long-term projects, potentially involving multiple teams and workflows
- Developing clear and concise documentation, including technical specifications, user guides, and presentations, to communicate complex AI concepts to both technical and nontechnical stakeholders
- Search and Language Data Expertise: Extensive experience working with search-specific language data (queries, documents, relevance judgments, intent labels) and designing human evaluation tasks, including multi-phase and complex workflows. You have hands-on experience with query annotation frameworks and understand the semantic relationship between queries and documents.
- Quantitative Analysis Skills: Advanced knowledge of statistics, metrics (e.g. F1 score, inter-rater reliability metrics), and data analysis methods such as sampling.
- Experience with Natural Language Processing (NLP) techniques and tools, such as spaCy, NLTK, or Hugging Face.
- Proficiency in Python to handle / transform large datasets (e.g. pre- and postprocessing data, pandas), perform quantitative analyses, and visualize data (for example matplotlib, seaborn)
- Data processing: Deep understanding of data pipelines to support ML and NLP workflows
- Knowledge of efficient data collection, transformation, and storage
- Knowledge of data structures, algorithms, and data engineering principles
- Excellent problem-solving skills, with the ability to think critically and creatively to develop innovative AI solutions
- Conducting research to stay up-to-date with the latest advancements in generative AI, machine learning, and deep learning techniques
- Knowledge of optimizing existing generative AI models for improved performance, scalability, and efficiency
- Experience developing and maintaining ML/AI pipelines, including data preprocessing, feature extraction, model training, and evaluation
- Model Fine-Tuning: Knowledge of fine-tuning pre-trained models to adapt them to specific tasks and datasets, improving their performance and relevance

Please be aware of recruitment scams involving individuals or organizations falsely claiming to represent employers. Innodata will never ask for payment, banking details, or sensitive personal information during the application process. […] com and consider reporting it to the FTC at ReportFraud.ftc.gov.
Language skills
- English
Note for users
This job posting was published by one of our partners. You can view the original posting here.