Data Scientist/Machine Learning Engineer, Sumble Inc. • New York, New York, United States
This job offer is no longer available
About
Our long-term vision is to become the primary destination for accessing high-quality web data. Try the product at sumble.com.
Our Team: We are a team of 15, including 10 engineers with experience at companies such as Google, Meta, Stack Overflow, and Kaggle.
What you’ll do
Finetuning small language models
Improving the quality of existing data using scalable approaches. Examples include making sure URLs are associated with the right company, that we have the correct HQ address, and that parent-subsidiary relationships are mapped correctly, using techniques like LLM validation, SERP, and triangulation across sources.
Adding new signals: this usually involves scrubbing and normalizing new signals and matching them to our existing ontology
Pushing solutions into production environments, which may involve touching data pipelines and/or backend systems
Located within Americas time zones
More about Sumble
Our Tech Stack:
ML/Data: PyTorch, Hugging Face, Gemma models, LoRA, vLLM, SkyPilot, Marimo
Languages & Frameworks: Python, FastAPI, React, TypeScript
Cloud Platform: Google Cloud Platform (GCP)
Databases: PostgreSQL, DuckDB
Infrastructure: Cloud Run
Product/Design: Figma, Vercel v0
Challenges We Tackle:
Transforming noisy datasets into high-quality data products
Running expensive analytics computations efficiently
Managing the complexity of a growing number of data sources, machine learning models, and large data operations
Creating a great product-led growth (PLG) experience with upsell pathways
Benefits:
Medical, dental, and vision (US)
401k (US)
Target 4 weeks PTO
Languages
- English