Data Engineer

Veriipro

New York, New York, United States

About

Key Responsibilities
Design, develop, and optimize scalable ETL/ELT pipelines using Databricks and Apache Spark (PySpark/Scala).
Build robust data ingestion workflows from structured and semi-structured sources.
Develop reusable components and frameworks to streamline data processing and integration.
Implement best practices for data quality, validation, and governance.
Collaborate with data architects, analysts, and business stakeholders to understand requirements and translate them into data solutions.
Tune Spark jobs for performance and scalability in cloud-based environments.
Maintain and optimize data lake or Lakehouse architecture for high availability, security, and integrity.
Support troubleshooting, debugging, and performance optimization in production workloads.
Migrate on-premises applications to cloud service providers.
Automate deployment and infrastructure using CI/CD pipelines (GitLab) and Terraform.
Ensure adherence to coding standards and perform code reviews for quality and optimal execution.
Partner with business and technical teams to implement best practices and achieve agile delivery goals.
Must-Have Skills & Technical Expertise
Databricks SaaS: Designing and building ETL/ELT pipelines.
Apache Spark: PySpark and Scala programming.
SQL & Python: Strong programming and querying skills.
AWS Services: EC2, EMR, S3, EBS, RDS, DynamoDB, Glue, Identity & Access Management, Networking, Monitoring.
Container Orchestration: Docker and Kubernetes.
CI/CD & Deployment: GitLab pipelines, automation.
Infrastructure as Code: Terraform.
Data Architecture: Lakehouse, Starburst, Trino, federated queries.
ETL / Data Pipelines: Development, optimization, and monitoring.

Languages

English