This job posting is no longer available
About
Location: Dallas, TX / Pittsburgh, PA / Cleveland, OH (Onsite)
Term: C2C/W2 role
Duration: Long Term
Data Engineer with 5+ years of experience in designing, developing, and maintaining scalable data pipelines supporting analytics, reporting, and operational platforms. The ideal candidate will have strong expertise in Spark, PySpark, Airflow, SQL, Data Lakes, and large-scale batch processing environments.
Responsibilities
- Design and build scalable data pipelines aligned with business requirements.
- Process large datasets using batch and near real-time processing frameworks.
- Ensure data quality, consistency, governance, and reliability across systems.
- Support data integration and transformation initiatives for analytics and reporting platforms.
- Maintain metadata repositories, data dictionaries, and technical documentation.
- Participate in data architecture reviews and data model validation activities.
- Support analytics, reporting, and risk management platforms.
- Collaborate with cross-functional teams to align enterprise data solutions with business objectives.
Required Qualifications
- 5+ years of experience in Data Engineering and Big Data processing.
- Strong expertise in:
  - Apache Spark (Spark Core, Spark SQL)
  - PySpark
  - Large-scale batch processing
- Experience working with structured and semi-structured data, complex transformations, and performance tuning.
- Hands-on experience with data ingestion and integration from:
  - Oracle
  - SQL Server
  - Hive
  - HDFS
  - Amazon S3
- Experience building and maintaining curated data models.
- Experience writing data to:
  - Hive Tables
  - Data Lakes (Iceberg)
  - Downstream reporting systems
- Strong SQL and data modeling skills.
- Hands-on experience with Apache Airflow:
  - DAG Development
  - Scheduling
  - Monitoring
  - Workflow Orchestration
- Proficiency in Shell Scripting:
  - Job Automation
  - File Validation
  - Dependency Management
  - Logging & Error Handling
  - Spark Job Execution
  - Data Archival & Purging
- Strong understanding of batch processing and scheduling frameworks.
- Experience migrating job schedules from:
  - CA7
  - Control-M
  - Airflow
- Experience implementing CI/CD for data pipelines.
- Experience ensuring data quality, reliability, governance, and compliance in regulated environments.
- Strong communication and documentation skills.
Language skills
- English
Notice to users
This posting was published by one of our partners.