About
We are seeking an experienced Data Engineer to join our team. The ideal candidate will have strong expertise in designing and implementing efficient and scalable data pipelines using Apache Spark with Java, integrating data from diverse sources into data lakes (e.g., S3) and data warehouses (e.g., Redshift). This role offers an exciting opportunity to work on enterprise-level data engineering projects.
Key Responsibilities:
- Design and develop efficient, scalable data pipelines using Apache Spark with Java, integrating data from diverse sources into data lakes (e.g., S3) and data warehouses (e.g., Redshift).
- Leverage a range of AWS services for data storage, processing, and analytics, including but not limited to S3, Redshift, Glue, EMR, Lambda, Kinesis, and DynamoDB.
- Optimize Spark applications and data pipelines for performance, cost-efficiency, and reliability, including tuning Spark configurations and utilizing appropriate AWS resources.
- Design and implement data models for structured and unstructured data, and contribute to the overall data architecture strategy within an AWS environment.
- Work closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions that meet business needs.
- Implement monitoring solutions for data pipelines and infrastructure, and troubleshoot issues to ensure data quality and system stability.
- Implement data security measures and adhere to data governance policies within the AWS ecosystem.
- Create and maintain comprehensive documentation for data engineering processes, designs, and deployments.
Required Technical Skills:
- Strong proficiency in Java, with experience in developing Spark applications.
- In-depth knowledge and hands-on experience with Apache Spark, including Spark SQL, DataFrames, and RDDs.
- Extensive experience with AWS services, particularly those related to data engineering (S3, Redshift, Glue, EMR, Kinesis, Lambda).
- Experience with data warehousing concepts and technologies (e.g., Redshift, Snowflake), and with building data lakes on S3.
- Proficiency in SQL and experience with both relational and NoSQL databases.
- Strong understanding and practical experience in designing and implementing ETL/ELT processes.
- Excellent analytical and problem-solving skills, with the ability to troubleshoot complex data issues.
Required Qualifications:
- 8-10 years of experience in data engineering.
- Relevant certification in data engineering or a related field.
- Bachelor's degree in Computer Science or a related field.
Preferred Qualifications:
- Experience with PySpark.
- Strong communication and collaboration skills to work effectively within a team and with various stakeholders.
Languages
- English