About
Profitmind is a retail analytics SaaS company. We turn competitive and customer data into agent-driven insights that help retailers make faster, sharper merchandising and pricing decisions. Our platform spans multiple clouds and runs the data pipelines, ML models, and customer-facing applications that deliver those insights at scale.

We are at an inflection point: onboarding enterprise customers at pace, standing up multi-cloud data infrastructure, and building the agent-driven automation behind our insights. The data foundation this role builds is central to that work.

We are hiring a Senior Data Engineer to build and scale the data infrastructure behind our platform. You will operate the batch and streaming pipelines that ingest messy, real-world customer data and turn it into the reliable, validated foundation our ML models and insights depend on.

This is a hands-on, high-ownership role with broad scope. You will own customer onboarding pipelines end to end, work across AWS and Azure, and partner directly with the Head of Engineering and our ML, application, and product teams. We value practicality over polish: engineers who get imperfect data into production rather than only architecting for the ideal case.

What You'll Do
- Own customer onboarding pipelines. Take raw, inconsistent customer data through discovery, profiling, validation, and enrichment into a medallion (bronze/silver/gold) Delta Lake architecture, and get new customers live quickly.
- Build scalable pipelines. Design and implement batch and streaming data pipelines using Python and Spark, processing transaction, inventory, pricing, and competitive data.
- Run multi-cloud infrastructure. Build and maintain cloud-agnostic data infrastructure that runs across AWS and Azure (e.g., EMR Serverless and Synapse) from a single codebase.
- Reconcile messy source data. Map each retailer's idiosyncratic feeds (differing category hierarchies, group/style IDs, planning grains, and pricing/markdown conventions) onto Profitmind's canonical schema.
- Understand what the data feeds. Reason about how ingested data drives our downstream models (demand forecasting, price optimization, and assortment, to name a few) well enough to judge whether incoming data is fit for purpose and to flag gaps before they corrupt insights.
- Orchestrate and deliver. Establish DataOps best practices, CI/CD, and workflow orchestration (Argo Workflows on Kubernetes) as we scale the data platform.
- Own data quality and observability. Implement validation, profiling, and observability (e.g., Honeycomb telemetry) to ensure reliability across our data ecosystem.
- Automate with AI agents. Develop and operate AI agents to automate data operations and accelerate onboarding.
- Own data security and compliance. Maintain data security and privacy discipline as we handle sensitive customer and business data.
- Reduce single points of knowledge. Document what you build and spread ownership. We are deliberately building a well-documented data function, not concentrating critical knowledge in one person.

How We Work - AI-Accelerated Engineering
Our engineers are fluent with AI-assisted engineering. They use agentic coding tools and AI-driven automation, including MCP-based tooling and CLI/workflow automation, daily to debug, build, and operate pipelines faster. The trait we're hiring for is comfort folding new tooling into how you work rather than working around it. With these tools, you deliver at the leverage of several engineers while bringing the seniority to judge when a tool fits and when it doesn't.

What We're Looking For
- Strong data engineering foundation. 5+ years of hands-on experience, with strong proficiency in Python for data processing and pipeline development and production experience with Apache Spark at scale.
- Medallion and Delta Lake depth. Strong working knowledge of the bronze/silver/gold medallion architecture and Delta Lake, including incremental loads, merges, and Delta optimization.
- Multi-cloud experience. Hands-on experience across more than one major cloud (AWS and Azure strongly preferred; GCP a plus), with a track record of building batch pipelines; streaming experience is a plus.
- SQL and data modeling. Deep understanding of SQL, modern data warehouse/lakehouse technologies, ETL/ELT patterns, and workflow orchestration.
- Orchestration and containers. Experience with workflow orchestration on Argo Workflows (or comparable, e.g., Airflow/Dagster) and container orchestration (Docker, Kubernetes).
- AI-assisted engineering. You use agentic coding tools and AI-driven automation (including MCP-based tooling and CLI/workflow automation) daily, and fold new tooling into how you work rather than working around it.
- A practical bias with messy data. Comfort profiling, cleaning, validating, and enriching inconsistent real-world source files into production-grade pipelines, with a strong bias toward practical delivery.
- Curiosity about the retail domain (required). A genuine appetite to learn pricing, merchandising, assortment, and inventory/OTB planning, and to reason about what the data is used for, not just how to move it. Prior retail/merchandising data experience is a strong plus, but the willingness to learn it is the requirement.
- Communication and ownership. Strong communication, the ability to translate business needs into technical solutions, and the self-direction to thrive in a fast-paced startup with changing priorities.

Nice to Have
- Experience in retail tech, e-commerce, or similar enterprise systems.
- Familiarity with retail data concepts: SKU/style hierarchies, AUR, markdown/promotion types, receipts planning, demand forecasting, or product matching.
- Experience with data observability tooling (e.g., Honeycomb).
- Proficiency with Python data handling libraries (pandas, NumPy, polars).
- Understanding of event-driven architectures, medallion architectures, and CDC patterns.
- Expertise with real-time processing frameworks like Kafka, Kinesis, or Flink.
- Previous experience at an early-stage startup or building data infrastructure from scratch.
- Background working with multi-tenant SaaS platforms or enterprise analytics systems.
- Knowledge of DataOps and MLOps best practices.

Location & Reporting
Remote-friendly, with Pittsburgh, PA preferred. This role reports to the Head of Engineering and works across the ML/insights, application, and product teams.

Why This Role
You will own the data foundation a fast-growing platform runs on. This is a role for an engineer who wants to build the pipelines and tooling that get real customers live, not just move files, and who is energized by turning messy data into something the rest of engineering can trust. If you want broad scope, real ownership, and the leverage modern AI tooling gives a senior engineer, this is that role.

Profitmind is an equal opportunity employer. We welcome applicants of all backgrounds.
Language skills
- English
Note for users
This job posting comes from a TieTalent partner platform. Click "Apply Now" to submit your application directly on their website.