About
Location Dallas, TX / Pittsburgh, PA / Cleveland, OH (Onsite)
Term C2C/W2 role
Duration Long Term
Job Description Data Engineer with 5+ years of experience in designing, developing, and maintaining scalable data pipelines supporting analytics, reporting, and operational platforms. The ideal candidate will have strong expertise in Spark, PySpark, Airflow, SQL, Data Lakes, and large-scale batch processing environments.
Responsibilities
Design and build scalable data pipelines aligned with business requirements.
Process large datasets using batch and near real-time processing frameworks.
Ensure data quality, consistency, governance, and reliability across systems.
Support data integration and transformation initiatives for analytics and reporting platforms.
Maintain metadata repositories, data dictionaries, and technical documentation.
Participate in data architecture reviews and data model validation activities.
Support analytics, reporting, and risk management platforms.
Collaborate with cross-functional teams to align enterprise data solutions with business objectives.
Required Qualifications
5+ years of experience in Data Engineering and Big Data processing.
Strong expertise in:
Apache Spark (Spark Core, Spark SQL)
PySpark
Large-scale batch processing
Experience working with structured and semi-structured data, complex transformations, and performance tuning.
Hands‑on experience with data ingestion and integration from:
Oracle
SQL Server
Hive
HDFS
Amazon S3
Experience building and maintaining curated data models.
Experience writing data to:
Hive Tables
Data Lakes (Iceberg)
Downstream reporting systems
Strong SQL and data modeling skills.
Hands‑on experience with Apache Airflow:
DAG Development
Scheduling
Monitoring
Workflow Orchestration
Proficiency in Shell Scripting:
Job Automation
File Validation
Dependency Management
Logging & Error Handling
Spark Job Execution
Data Archival & Purging
Strong understanding of batch processing and scheduling frameworks.
Experience migrating job schedules from:
CA7
Control‑M
Airflow
Experience implementing CI/CD for data pipelines.
Experience ensuring data quality, reliability, governance, and compliance in regulated environments.
Strong communication and documentation skills.
Preferred Skills
Banking / Financial Services Domain
Risk Reporting Platforms
Data Governance
Enterprise Data Architecture
Near Real-Time Data Processing
Key Technologies Apache Spark | Spark SQL | PySpark | Apache Airflow | SQL | Hive | HDFS | S3 | Iceberg | Oracle | SQL Server | Shell Scripting | ETL | Data Pipelines | Data Modeling | Control-M | CA7 | CI/CD
Language Skills
- English
Note for Users
This job posting comes from a TieTalent partner platform. Click "Apply Now" to submit your application directly on their website.