This job offer is no longer available

Data Scientist

Bespoke Labs
  • US
    United States

About

Senior Data Scientist: AI Training Data (2-4 Months Contract)

  • Company: BespokeLabs (VC-backed; founded by IIT & Ivy League alumni)
  • Location: Remote
  • Role Type: Contract (2-4 Months)
  • Time Commitment: 40 hrs/week (Full-time availability required)
  • Compensation: Hyper-competitive hourly rate (matching top-tier Senior Data Scientist bands)
  • Experience: 6+ years

About BespokeLabs
BespokeLabs is a premier, VC-backed AI Research lab with an exceptionally talent-dense team of IIT and Ivy League alumni. We don’t just build tooling around AI; we build the massive-scale data systems and reasoning architectures that directly power next-generation models. Our research shapes the frontier of AI: we’ve published breakthroughs like GEPA, driven foundational datasets like OpenThoughts, and shipped state-of-the-art models including Bespoke-MiniCheck and Bespoke-MiniChart. More on our website bespokelabs.ai :)

Role Overview
We are looking for a high-impact Senior Data Scientist for an intensive, 2-month sprint. You will leverage your deep expertise in production-grade machine learning and applied statistics to develop the algorithms and logic that curate and evaluate datasets for advanced AI model training.

This is not a traditional model-building or research role. We need a seasoned practitioner who has already owned the end-to-end DS lifecycle at scale. You will use your intuition for feature engineering, statistical validity, and large-scale data processing to programmatically generate, shape, and validate AI training data.

What You Will Do (The Contract)
  • Algorithm Design: Design and implement custom statistical models and programmatic logic (e.g., anomaly detection, active learning, similarity scoring) to evaluate data quality, complexity, and redundancy at scale.
  • Hands-on At-Scale Coding: Write scalable PySpark and Python (NumPy/Pandas) code to apply these algorithms across massive datasets, translating experimental logic into reliable, large-scale workflows.
  • Metric Formulation: Develop custom quantitative metrics and heuristic benchmarks to rigorously assess the fidelity and suitability of data subsets for specific AI training objectives.
  • Validation & Iteration: Run high-speed validation cycles, analyzing the output of data-curation algorithms to diagnose skew, bias, or noise, and iteratively refining the logic.
  • High-Level Curation: Apply senior-level domain expertise in predictive modeling and feature engineering to ensure the final training inputs meet the strict standards required for state-of-the-art ML systems.

What You Bring to the Table (Your Past Experience)
To be successful in this contract, you must have a track record of:
  • The End-to-End DS Lifecycle: Framing problems, modeling, validation, production, and iteration.
  • Production Ownership: Building and deploying ML and statistical models on large-scale datasets.
  • Large-Scale Data Processing: Working with Apache Spark to develop scalable feature pipelines and offline training workflows.
  • Experimentation: Designing and analyzing rigorous experiments (A/B tests, causal inference).
  • Impact: Translating complex model outputs into clear product and business decisions.

Required Qualifications (Non-Negotiable)
  • Experience: 6+ years as a Data Scientist or Applied Scientist.
  • Production Background: Proven ownership of models running in production environments.
  • Applied Statistics: Strong background in applied statistics and experimentation frameworks.

Core Technical Skills
  • Languages: Python (NumPy, Pandas, Scikit-learn, PyTorch / TensorFlow) and strong SQL.
  • Big Data: Apache Spark (PySpark or Spark SQL) for large-scale data processing.
  • Methodologies: Feature engineering, model evaluation, statistical modeling, and hypothesis testing.

Strong Signals (Highly Valued)
  • Scale: Models trained on TB-scale datasets.
  • Domain Specificity: Experience in high-complexity domains such as recommendations, pricing, fraud / risk, search / ranking, or growth & experimentation.
  • Collaboration: Experience deploying models alongside data engineering pipelines.

Out of Scope (Who Should Not Apply)
  • BI / reporting-only roles
  • SQL-only analysts
  • Research-only ML roles with no production ownership

Languages

  • English