JustinBradley

Senior Data Engineer

  • US
    Texas, United States

About

JustinBradley's client, a leading source of mortgage financing, is seeking a Senior Data Engineer to join our team. This role is critical for setting up and managing Change Data Capture (CDC) for multiple types of databases to hydrate a data lake. You will work closely with teams to orchestrate the flow of raw CDC data and perform ETL transformations so that the data ends up in a usable, queryable form for analytics. The ideal candidate will have hands-on experience with Apache Spark for both batch and streaming data processing and will be well-versed in performance tuning and Big Data concepts.

Responsibilities:

  • Set up and manage Change Data Capture (CDC) for various databases to ensure data flows seamlessly into a data lake.
  • Implement ETL transformations using Apache Spark, handling both streaming and batch processing of data.
  • Work with Apache Spark DataFrames, Spark SQL, and Spark Streaming to design and develop robust data pipelines.
  • Orchestrate the transformation of raw CDC data into structured, analytics-ready datasets.
  • Collaborate with cross-functional teams to understand data requirements and ensure data is correctly transformed and made available for downstream analysis.
  • Optimize performance of data pipelines, ensuring efficient data processing and storage.
  • Work with AWS services, including EMR, Glue Data Catalog, Lambda, and S3, to integrate, store, and manage data.
  • Utilize Apache Airflow to orchestrate and automate workflows for data processing.
  • Keep up to date with the latest trends and technologies in Big Data and cloud computing to improve system performance and scalability.

Requirements:

  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent work experience.
  • Strong problem-solving skills with an ability to work independently and as part of a team.
  • Ability to work in an agile environment and handle multiple tasks simultaneously.
  • Excellent communication skills, both written and verbal.
  • Extensive knowledge and experience with S3 and S3 operations (CRUD).
  • Proficiency with EMR and EMR Serverless for large-scale data processing.
  • Experience with Glue Data Catalog for managing metadata.
  • Hands-on experience with Step Functions and Managed Workflows for Apache Airflow (MWAA) for workflow orchestration.
  • Proficiency in Lambda (Python) for serverless applications.
  • Experience with AWS Batch for running batch processing jobs.
  • Familiarity with AWS Deequ for data quality validation (optional, but a plus).
  • Java: Mid to Senior level experience.
  • Python: Mid-level experience (preferably with PySpark).
  • Apache Spark: Experience with DataFrames, Spark SQL, Spark Streaming, and building ETL pipelines.
  • Apache Airflow: Strong experience in using Airflow for orchestration.
  • Experience with Scala.
  • Experience with Apache Hudi.
  • Experience with Apache Griffin.

JustinBradley is an EO employer - Veterans/Disabled and other protected employees.

Ideal skills

  • ETL
  • AWS Lambda
  • Big Data
  • Java
  • Python
  • PySpark
  • Scala
  • Texas, United States

Professional experience

  • Data Engineer
  • Data Infrastructure

Language skills

  • English