About
Role Summary
The Data Engineer will design and implement scalable, distributed data pipelines and enterprise data lakes using Spark, Databricks, Python, and SQL. The role focuses on building high-performance ETL/ELT pipelines, metadata-driven architectures, and governed analytical data stores that support advanced analytics and machine learning workloads in cloud environments.

Key Responsibilities
- Pipeline Design & Optimization: Design high-performance ETL/ELT pipelines using PySpark and Spark SQL, translating business requirements into optimized data pipelines and demonstrating an ability to reduce processing latency by up to 50%.
- Cloud Data Architecture: Design and implement data models for the Medallion architecture (Bronze, Silver, Gold) using Delta Lake, enabling scalable and reusable data processing.
- Data Ingestion & Orchestration: Orchestrate data pipelines using Azure Data Factory (ADF) to reliably ingest, transform, and load enterprise datasets. Implement data ingestion pipelines, including those connecting on-premises HDFS with Azure Data Factory and Databricks, to create curated Gold-layer datasets supporting Microsoft Fabric analytics.
- Data Governance & Security: Implement centralized data governance using Unity Catalog to manage catalogs, schemas, role-based access control (RBAC), and fine-grained permissions.
- Quality Assurance & Cost Management: Build scenario-based test frameworks in Databricks using PySpark for data validation. Optimize storage costs (e.g., a 25% reduction in Azure Storage) by managing the retained history and versions of Delta tables.
- Operational Monitoring: Build an automated email reporting framework for pipeline failure alerts, reducing manual support effort by 40%.
Languages
- English