This job offer is no longer available
About
Role Summary
The Data Engineer will design and implement scalable, distributed data pipelines and enterprise data lakes using Spark, Databricks, Python, and SQL. The role focuses on building high-performance ETL/ELT pipelines, metadata-driven architectures, and governed analytical data stores that support advanced analytics and machine learning workloads in cloud environments.
Key Responsibilities
- Pipeline Design & Optimization: Design high-performance ETL/ELT pipelines using PySpark and Spark SQL, translating business requirements into optimized data pipelines, with a demonstrated ability to reduce processing latency by up to 50%.
- Cloud Data Architecture: Design and implement data models for the Medallion architecture (Bronze, Silver, Gold) using Delta Lake, enabling scalable and reusable data processing.
- Data Ingestion & Orchestration: Orchestrate data pipelines using Azure Data Factory (ADF) to reliably ingest, transform, and load enterprise datasets. Implement data ingestion pipelines, including those connecting on-premises HDFS with Azure Data Factory and Databricks, to create curated Gold-layer datasets supporting Microsoft Fabric analytics.
- Data Governance & Security: Implement centralized data governance using Unity Catalog to manage catalogs, schemas, role-based access controls (RBAC), and fine-grained permissions.
- Quality Assurance & Cost Management: Build scenario-based test frameworks in Databricks using PySpark for data validation. Optimize storage costs (e.g., a 25% reduction in Azure Storage) by retaining only the required history and versions of Delta tables.
- Operational Monitoring: Build an automated email reporting framework for pipeline failure alerts, reducing manual support effort by 40%.
Languages
- English