Senior Data Engineer
- Remote, Oregon, United States
About
Pyx Health is looking for a talented and motivated Senior Data Engineer to join our team at a high-growth startup. You will play a pivotal role in maintaining and evolving our data infrastructure on Azure, with a core stack of Databricks, Airflow (Astronomer), dbt, and Postgres. You'll own data pipelines end-to-end, from ingestion through a medallion architecture to analytics delivery in Tableau.
ONLY CANDIDATES RESIDING IN THE USA MAY APPLY.
Required Qualifications:
- Minimum 5 years of experience as a Data Engineer.
- Deep expertise in Databricks, including Delta Lake optimization (ZORDER, vacuuming, partitioning); see the sketch after this list.
- Strong Python skills for data engineering workflows.
- Proficiency in Postgres (our primary transactional database).
- Hands-on experience with Airflow, including building and maintaining production-grade DAGs.
- Hands-on experience with dbt for transformation and data modeling.
- Solid understanding of medallion architecture principles.
- Experience with Unity Catalog or comparable data governance tooling.
- Proficiency with GitHub and CI/CD pipelines for data projects.
- Ability to take a technical project from kickoff through completion with minimal oversight.
- Strong root cause analysis skills.
- Communicates effectively with cross-functional teams and stakeholders.
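For the Databricks bullet above, a minimal sketch of routine Delta Lake table maintenance, assuming a hypothetical gold.member_events table and a Databricks runtime where `spark` is predefined:

```python
# Delta Lake maintenance sketch for Databricks; the table and column
# names are hypothetical. On Databricks, `spark` is provided by the
# runtime, so no imports are needed for these SQL calls.

# Compact small files and co-locate rows that share a member_id, so
# selective queries filtering on member_id scan fewer files.
spark.sql("OPTIMIZE gold.member_events ZORDER BY (member_id)")

# Drop data files no longer referenced by the transaction log, keeping
# 7 days (168 hours) of history for time travel.
spark.sql("VACUUM gold.member_events RETAIN 168 HOURS")
```

Partitioning complements Z-ordering: it is typically declared once at table creation (for example, PARTITIONED BY (event_date)) and handles coarse-grained pruning, while Z-ordering clusters data within files.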
Preferred Qualifications:
- Experience with Databricks native extractors and connectors.
- Familiarity with Great Expectations or similar data quality frameworks (a brief sketch follows this list).
- Experience optimizing Databricks costs at scale.
- Background in semantic layer design and BI performance tuning (Tableau preferred).
- Prior experience mentoring or leading data engineers.
- Familiarity with healthcare data regulations (HIPAA).
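For the data quality bullet, a sketch of the style of inline checks Great Expectations supports. This assumes the legacy pandas Dataset API (ge.read_csv); newer releases use a suite- and context-based entry point instead, and the file path and column names here are hypothetical:

```python
import great_expectations as ge

# Load a batch as a GE-wrapped pandas DataFrame (legacy Dataset API;
# path and columns are hypothetical).
df = ge.read_csv("members.csv")

# Each expectation validates immediately and returns a result object
# with success/failure details.
df.expect_column_values_to_not_be_null("member_id")
df.expect_column_values_to_be_unique("member_id")
df.expect_column_values_to_be_between("age", min_value=0, max_value=120)

# Run every registered expectation as a suite.
results = df.validate()
print(results["success"])
```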
Responsibilities:
- Design, build, and maintain batch data pipelines using Airflow (Astronomer) and dbt, ingesting data from Postgres, Salesforce, other business-critical cloud-based SaaS applications, flat files, and other internal transactional tools (see the DAG sketch after this list).
- Develop and optimize data models within a medallion architecture (Bronze/Silver/Gold) on Delta Lake.
- Write production-grade Python for custom extractors, transformations, and pipeline logic.
- Implement and enforce data governance using Unity Catalog across multi-tenant schemas.
- Strengthen CI/CD practices for data: automated testing, environment promotion, and deployment pipelines via GitHub.
- Monitor pipeline health and data quality using Datadog; proactively resolve issues.
- Optimize Databricks compute costs through cluster policies, spot instances, and query tuning.
- Collaborate with analysts to improve semantic layer design and Tableau performance at scale.
- Document pipelines and processes for clarity and maintainability.
- Participate in code reviews and provide technical mentorship to junior team members.
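As referenced in the first responsibility above, a minimal sketch of an Airflow DAG that runs a dbt build after an ingestion step. The DAG id, schedule, paths, and commands are hypothetical; it assumes Airflow 2.4+ and a worker with dbt installed:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical daily pipeline: land raw source data in the Bronze
# layer, then let dbt promote it through Silver and Gold models.
with DAG(
    dag_id="daily_member_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Placeholder ingestion step: extract from Postgres, Salesforce,
    # and flat files into Bronze tables.
    ingest = BashOperator(
        task_id="ingest_sources",
        bash_command="python /opt/pipelines/ingest_sources.py",
    )

    # Build and test dbt models (Silver/Gold) in one pass.
    dbt_build = BashOperator(
        task_id="dbt_build",
        bash_command="dbt build --project-dir /opt/dbt --profiles-dir /opt/dbt",
    )

    # Run ingestion before transformation.
    ingest >> dbt_build
```

On an Astronomer deployment this file would live in the project's dags/ folder; the >> operator expresses the ingest-before-transform dependency.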
Language Skills
- English