Data Engineer with Data Science Experience

PeopleNTech LLC

United States


About

Role: Data Engineer with Data Science Experience
Location: Scottsdale, AZ (Onsite)
Rate: $75/hr
Indent: SF_OP_201150-1-1
We are looking for a skilled Data Engineer with strong PySpark experience to work on large-scale data processing and analytics initiatives. The ideal candidate will have hands-on experience with large datasets, complex joins, and performance optimization, along with the ability to apply basic analytical thinking and deliver clear, stakeholder-ready outputs.
Key Responsibilities
Data Engineering & Development
Design, develop, and maintain scalable data pipelines using PySpark.
Write efficient, optimized PySpark code to process and transform large-scale datasets.
Handle joins across multiple large databases, ensuring performance, accuracy, and scalability.
Optimize Spark jobs to minimize runtime, memory usage, and compute cost.
Work with structured and semi-structured data from multiple sources.
Data Preparation & Analysis Support
Build and curate training and analytical datasets by joining and transforming multiple data sources.
Apply basic analytical skills to understand data patterns, anomalies, and business relevance.
Perform data validation and quality checks, including:
Record counts and reconciliation
Duplicate detection
Null and outlier checks
Schema and data-type validation
Ensure datasets are analysis-ready and trustworthy.
Stakeholder Interaction & Reporting
Understand business objectives and translate them into data requirements.
Ask the right questions to determine:
Level of aggregation required
Metrics definitions
Data freshness and accuracy expectations
Preferred output and reporting formats
Present results and insights clearly to stakeholders.
Create reports and summaries in Excel for business users and leadership.
Expected Technical Approach (Problem-Solving Mindset)
Candidates are expected to demonstrate the ability to:
Approach complex data projects methodically, starting with:
Understanding business objectives
Reviewing source data structure and volume
Designing efficient join strategies
Choose the right join types, partitioning strategies, and caching techniques.
Validate data at every stage of the pipeline.
Balance technical accuracy with business usability when presenting results.
Core Skill Sets (Must-Have)
Strong hands-on experience with PySpark
Extensive experience working with large datasets
Proven expertise in joining large databases efficiently
Ability to write high-performance, optimized code
Basic analytical skills to interpret and validate data
Reporting skills using Excel
Good-to-Have Skills
Experience in model development or supporting analytics/modeling teams
SAS experience
Exposure to Cloudera or similar big data platforms
Understanding of data warehousing and analytics workflows
Soft Skills & Competencies
Strong problem-solving and logical thinking


Languages

English
