Data Scientist - Machine Learning Engineering
Kalamata Capital LLC
- United States
About
Perform exploratory data analysis on large, complex datasets using Pandas and PySpark; evaluate data quality and structure.
Model Development:
Create, optimize, and assess supervised and unsupervised machine learning models (e.g., tree-based methods, regression, boosting algorithms).
Pipeline Engineering:
Design and establish dependable, maintainable machine learning pipelines and preprocessing workflows tailored for production settings.
Data Management:
Execute queries and integrate MongoDB datasets; develop efficient schemas and aggregation pipelines supporting analytical and operational tasks.
Visualization:
Generate insightful visualizations with seaborn, plotly, and matplotlib to aid in model diagnostics and business storytelling.
Reproducible Code:
Develop clean, modular, and well-documented Python code (PEP8 compliant) and manage version control through Git.
Model Explainability:
Use model interpretation tools such as SHAP and LIME to assess feature impact and enhance transparency.
Cross-Functional Collaboration:
Collaborate with engineering, analytics, and product teams to transform business needs into actionable model-driven solutions.
Documentation:
Create comprehensive technical documents, reports, and model documentation for internal stakeholders.
Required Skills & Qualifications
Education & Experience:
- M.S. in Computer Science, Machine Learning, Computational Biology, or a related quantitative field along with 3+ years of relevant experience, or a similar combination of education and applied work.
- Solid understanding of Linear Algebra, Probability, and Statistics.
Technical Expertise:
- Advanced proficiency in Pandas and PySpark for data cleaning, reshaping, merging, feature engineering, and workflow optimization.
- Extensive experience with MongoDB, including querying, indexing, and aggregation pipelines.
- Deep understanding of supervised/unsupervised machine learning methodologies and tools (scikit-learn, XGBoost).
- Strong grasp of optimization, regularization, loss functions, and evaluation metrics (AUC, precision, recall, RMSE).
Core Skills:
- Demonstrated experience delivering end-to-end machine learning projects (data ingestion, modeling, evaluation, and optional deployment).
- Capability to write clean, reproducible code and maintain organized notebooks/scripts.
- Excellent communication abilities to translate analyses into business insights.
- Willingness to relocate to the New York metro area.
Preferred (Bonus) Skills
- Familiarity with AWS tools (Glue, S3, DMS).
- Experience with deep learning frameworks (PyTorch, TensorFlow).
- Experience in deploying models using FastAPI, Flask, AWS, or GCP.
- Knowledge of SQL, data warehousing, or data versioning.
- Understanding of software engineering best practices (testing, CI/CD, code review).
- Provide a link to GitHub, GitLab, or a portfolio showcasing analytical/ML code.
Flexible work from home options available.
Language Skills
- English
Note for Users
This job posting comes from a TieTalent partner platform. Click "Apply Now" to submit your application directly on their website.