
Data Scientist

Diamondpick
  • United States

About

Data Scientist
Primary focus: model reproduction, feature engineering logic, performance validation, and ensuring alignment with established modeling frameworks.

  • Rebuild and port existing Python-based models into the customer's Databricks platform.
  • Develop, train, and validate predictive models using Python, PySpark, and ML frameworks such as scikit-learn, XGBoost, and Spark MLlib.
  • Develop, validate, and reproduce feature engineering logic, ensuring parity with baseline models.
  • Train, retrain, validate, and benchmark model performance using customer-provided datasets, while maintaining performance parity with reference models.
  • Work with Data Engineers to define feature requirements and ensure datasets support model needs.
  • Perform model diagnostics, bias checks, stability checks, and accuracy assessments.
  • Prepare model documentation, validation summaries, and stakeholder-ready insights.
  • Support scoring pipeline design and ensure reproducibility across Dev / QA / Prod.
  • Collaborate with compliance and platform teams to ensure adherence to governance requirements.
  • Perform model diagnostics, hyperparameter tuning, and stability analysis.
  • Evaluate model performance across population segments and time periods.
  • Work with platform and engineering teams to support scoring pipeline deployment across Dev / QA / Prod.

Qualifications
  • 4-6 years of experience in applied machine learning or data science.
  • Strong hands-on experience with Python and ML libraries such as scikit-learn, XGBoost, LightGBM, CatBoost, or similar.
  • Experience developing ML models in Databricks using Python or PySpark.
  • Strong knowledge of feature engineering, model training workflows, and evaluation techniques.
  • Experience working with large structured datasets (financial or transactional data preferred).
  • Ability to write clear documentation and communicate technical results to non-technical stakeholders.
  • 4+ years of hands-on experience developing, deploying, and maintaining ML models.
  • Advanced proficiency in Python (NumPy, pandas, scikit-learn, PyTorch or TensorFlow).
  • Strong statistical and mathematical foundation, including regression, classification, probability, and optimization.
  • Experience building end-to-end ML pipelines: data ingestion, cleaning, feature engineering, modeling, evaluation, and deployment.
  • Experience working within client environments, including adapting to unfamiliar infrastructure, constraints, and security requirements.
  • Experience with cloud platforms (AWS, Azure, or GCP) and on-prem environments.
  • Advanced SQL ability and experience with big-data tools (Spark, Databricks, Hadoop).

Languages

  • English
Note for users

This job posting comes from a TieTalent partner platform. Click "Apply Now" to submit your application directly on their website.