Senior Data Engineer
Brahma
- New York, New York, United States
About
You’ll take on complex, high-impact data engineering work and run with it end‑to‑end, with minimal hand‑holding. This means you’ll operate with significant autonomy: scoping work, gathering requirements across teams, challenging assumptions when something doesn’t add up, and shipping production‑grade systems.
You’ll work closely with research scientists and the SaaS delivery platform, designing the systems that extract features for multi‑modal model training, automate the end‑to‑end model lifecycle, and ensure our datasets are of the highest fidelity.
If you’re the kind of engineer who thrives with ownership, moves fast with AI‑assisted tooling, and would rather ask forgiveness than wait for permission, this is the role.
What You’ll Do
Design and build scalable data pipelines: Architect and implement robust, automated pipelines for ingesting, processing, and managing non‑tabular datasets (video, audio, etc.).
Own and deliver complex work end‑to‑end: Take advanced pipeline tickets, epics, and infrastructure work from scoping through to production. Proactively gather requirements, flag risks early, and propose solutions.
Feature extraction: Collaborate with research scientists to identify and extract features and annotations that elevate our dataset quality.
Data lake management: Maintain our human‑centric data lake, ensuring high data quality, lineage, and accessibility for the research team.
Infrastructure optimisation: Minimise resource consumption and maximise cost‑effectiveness by leveraging tools like Ray and Kubernetes.
Accelerate with AI‑assisted development: Use agentic coding tools (Claude Code, Cursor, Copilot, etc.) as a core part of your workflow to maintain high velocity across prototyping, pipeline development, and code review. Push the team to adopt what works.
Drive continuous improvement: Proactively identify gaps in tooling, processes, and infrastructure. Build internal tools, propose architectural changes, and bring ideas to the table without being asked.
What You’ll Need
Python expertise: Proficiency in Python, with a focus on writing clean, maintainable, production‑ready code.
Data engineering fundamentals: Proven experience building (multi‑modal) data pipelines and managing large datasets for research and training.
Multi‑modal experience: Previous exposure to, or a deep interest in, processing and analysing non‑tabular data (video, audio, etc.).
Orchestration & Cloud: Experience with tools like Airflow, Dagster, or Spark, plus hands‑on cloud experience (primarily GCP).
Distributed compute: Hands‑on experience with distributed compute frameworks (Ray, Spark, etc.).
Autonomy and ownership: You know how to operate with limited context and ambiguity: gathering requirements across departments, challenging what doesn’t make sense, unblocking yourself, and making decisions that you then review with your manager.
AI‑assisted development: Demonstrated work with agentic coding tools and LLM‑driven development workflows. You’ve gone deep enough to have opinions on what works and what doesn’t.
ML mindset: A data‑centric approach to AI, with an understanding of how data processing directly impacts model performance.
Clear, direct communication: You surface problems and ideas without being asked, challenge requirements constructively across teams, and articulate trade‑offs clearly to both technical and non‑technical stakeholders. Common sense is non‑negotiable.
Proactive drive: A visible track record of bringing ideas – tooling, process improvements, experiments – rather than waiting for them to appear on a backlog.
Nice to Have
Practical knowledge of computer vision, audio processing, or FFmpeg‑based workflows.
Experience working with generative models or fine‑tuning techniques.
Language skills
- English