This job posting is no longer available
About
Role Summary
The Data Engineer will design and implement scalable, distributed data pipelines and enterprise data lakes using Spark, Databricks, Python, and SQL. The role focuses on building high-performance ETL/ELT pipelines, metadata-driven architectures, and governed analytical data stores that support advanced analytics and machine learning workloads in cloud environments.
Key Responsibilities
Pipeline Design & Optimization
- Design high-performance ETL/ELT pipelines using PySpark and Spark SQL, translating business requirements into optimized data pipelines, with a demonstrated ability to reduce processing latency by up to 50%.
Cloud Data Architecture
- Design and implement data models for the Medallion architecture (Bronze, Silver, Gold) using Delta Lake, enabling scalable and reusable data processing.
Data Ingestion & Orchestration
- Orchestrate data pipelines using Azure Data Factory (ADF) to reliably ingest, transform, and load enterprise datasets.
- Implement data ingestion pipelines, including those connecting on-premises HDFS with Azure Data Factory and Databricks, to create curated Gold-layer datasets supporting Microsoft Fabric analytics.
Data Governance & Security
- Implement centralized data governance using Unity Catalog to manage catalogs, schemas, role-based access control (RBAC), and fine-grained permissions.
Quality Assurance & Cost Management
- Build scenario-based test frameworks in Databricks using PySpark for data validation.
- Optimize storage costs (e.g., a 25% reduction in Azure Storage) by managing the retained history and versions of Delta tables.
Operational Monitoring
- Build an automated email reporting framework for pipeline failure alerts, reducing manual support effort by 40%.
Language skills
- English
Notice to users
This posting was published by one of our partners.