AWS Data Engineer - Citizen/GC/GC-EAD
TechAxis
- Irving, Texas, United States
About
Responsibilities
AWS Solution Design & Implementation: Design, develop, and deploy scalable and cost-effective data solutions on AWS, leveraging services such as S3 (for data lakes), EC2, EMR, Glue, Athena, Lambda, Redshift, and Kinesis.
Data Pipeline Development: Build and maintain robust ETL/ELT data pipelines using PySpark for data ingestion, transformation, and loading into various data stores, including those utilizing open table formats like Iceberg.
Big Data Processing: Develop and optimize big data processing jobs using PySpark on AWS EMR or AWS Glue, handling large datasets efficiently and integrating with Iceberg table formats.
Data Warehousing: Design, implement, and manage data warehousing solutions, including schema design, data modeling, and query optimization, with a focus on Hive and modern data lake table formats like Iceberg for historical data and analytical queries.
Cloud Infrastructure & Networking: Implement secure and robust cloud infrastructure components, including VPCs, subnets, routing, and security groups, to ensure proper connectivity and isolation for data solutions.
Containerized Workloads: Design, deploy, and manage containerized data processing applications on Amazon Elastic Kubernetes Service (EKS).
Performance Tuning & Optimization: Optimize AWS resources and big data applications (Spark, Hive, Iceberg) for performance, cost, and efficiency.
Data Governance & Security: Implement best practices for data security, access control, and compliance within AWS, including IAM policies, S3 bucket policies, and encryption.
Monitoring & Troubleshooting: Set up monitoring, alerting, and logging for data pipelines and AWS infrastructure; troubleshoot and resolve issues promptly.
Automation: Develop and maintain automation scripts using Python and shell scripting for infrastructure provisioning, deployment, and operational tasks.
Collaboration: Work closely with data scientists, analysts, and other engineering teams to understand data requirements and deliver reliable data solutions.
Qualifications
AWS Certification: Hold at least one AWS certification (e.g., AWS Certified Solutions Architect – Associate, AWS Certified Data Analytics – Specialty, AWS Certified Developer – Associate).
AWS Services Expertise: Hands-on experience with key AWS services for data processing and storage including:
Networking: VPC, Subnets, Routing, Security Groups
Containerization: EKS
Big Data Processing: Strong proficiency in PySpark for developing complex data transformations and analytics.
Data Lake Table Formats: Practical experience with Apache Iceberg for managing and querying data lakes.
Data Warehousing: In-depth knowledge and practical experience with Apache Hive for data storage, querying, and schema management.
Programming Languages:
Python: Expert-level proficiency in Python for scripting, data manipulation, and AWS automation (Boto3).
Shell Scripting: Proficient in shell scripting for automation and operational tasks.
Database & SQL: Strong SQL skills for data querying and manipulation.
Data Concepts: Solid understanding of ETL/ELT processes, data modeling, distributed computing, and data governance.
Good to Have Skills
Container Orchestration: Experience with Kubernetes for deploying and managing containerized applications.
CI/CD: Experience with CI/CD tools and practices (e.g., AWS CodePipeline, GitHub Actions, GitLab CI) for automating deployment of data solutions.
Orchestration: Experience with workflow orchestration tools like Apache Airflow.
Version Control: Proficient in using Git for source code management.
Other Big Data Technologies: Exposure to other big data technologies like Apache Kafka, Flink, or Presto.
Language skills
- English
Note for users
This job posting comes from a TieTalent partner platform. Click "Apply Now" to submit your application directly on their website.