This job offer is no longer available
Data Engineer
Veriipro
- New York, New York, United States
About
Design, develop, and optimize scalable ETL/ELT pipelines using Databricks and Apache Spark (PySpark/Scala).
Build robust data ingestion workflows from structured and semi-structured sources.
Develop reusable components and frameworks to streamline data processing and integration.
Implement best practices for data quality, validation, and governance.
Collaborate with data architects, analysts, and business stakeholders to understand requirements and translate them into data solutions.
Tune Spark jobs for performance and scalability in cloud-based environments.
Maintain and optimize data lake or lakehouse architectures for high availability, security, and integrity.
Support troubleshooting, debugging, and performance optimization in production workloads.
Migrate on-premises applications to cloud service providers.
Automate deployment and infrastructure using CI/CD pipelines (GitLab) and Terraform.
Ensure adherence to coding standards and perform code reviews for quality and optimal execution.
Partner with business and technical teams to implement best practices and achieve agile delivery goals.
Must-Have Skills & Technical Expertise
Databricks SaaS: Designing and building ETL/ELT pipelines.
Apache Spark: PySpark and Scala programming.
SQL & Python: Strong programming and querying skills.
AWS Services: EC2, EMR, S3, EBS, RDS, DynamoDB, Glue, Identity & Access Management, Networking, Monitoring.
Container Orchestration: Docker and Kubernetes.
CI/CD & Deployment: GitLab pipelines, automation.
Infrastructure as Code: Terraform.
Data Architecture: Lakehouse, Starburst, Trino, federated queries.
ETL / Data Pipelines: Development, optimization, and monitoring.
Languages
- English