This job offer is no longer available
About
Our client seeks a Data Engineer to build a modern data platform using Apache Iceberg on AWS with Spark, Kafka, and Python. The role emphasizes scalable data pipelines, distributed processing, and lakehouse best practices to support reliable analytics.
Due to client requirements, applicants must be willing and able to work on a W2 basis. For our W2 consultants, we offer a strong benefits package that includes medical, dental, and vision coverage, a 401(k) with company matching, and life insurance.
Rate: $54.00 to $64.00/hr, W2
Responsibilities:
- Design and build Apache Iceberg-based data lakes with ACID-compliant, versioned datasets.
- Implement Iceberg table evolution, including schema evolution, partition specifications, and snapshot management.
- Develop best practices for Iceberg governance, metadata compaction, and performance tuning.
- Build scalable batch and streaming pipelines using AWS services such as S3, EMR, Glue, Lambda, and Step Functions.
- Develop ingestion and transformation workflows using Python, Spark, or Flink.
- Implement CDC pipelines using Kafka Connect or equivalent tooling.
- Ensure CI/CD integration with GitHub Actions or similar tooling.
- Design and operate Kafka-based streaming pipelines, including MSK.
- Build Kafka producers and consumers using Python or JVM languages.
- Implement topic partitioning, compaction, schema registry usage, and event versioning patterns.
- Design Iceberg-based data models for analytical and operational use cases.
- Implement automated data quality checks, validation rules, and anomaly detection.
- Build lineage, monitoring, alerting, and pipeline observability.
- Apply AWS security, cost optimization, and data governance best practices.
- Manage IAM, KMS, S3 lifecycle management, networking, and data encryption.
- Operationalize EMR/Glue jobs, containerized workloads, or serverless workloads.
- Partner with analytics, platform, and product teams to deliver high-quality data products.
- Participate in design reviews, architecture discussions, and roadmap planning.
- Mentor junior engineers and contribute to engineering standards.
Experience Requirements:
- 4 to 10+ years of experience in data engineering or similar roles.
- Hands-on experience with Apache Iceberg, including table design, evolution, metadata management, and partitioning.
- Deep experience with the AWS data stack, including S3, EMR, Lambda, Glue, IAM, Step Functions, and CloudWatch.
- Proficiency with distributed engines such as Spark, Flink, or PySpark.
- Fluency in Python for data pipelines, automation, and APIs.
- Expertise building scalable ETL/ELT pipelines and real-time streaming architectures.
- Strong SQL and data modeling expertise.
- Kafka experience, including producers/consumers, schema registry, and partitioning strategies (preferred).
Languages
- English