
This job posting is no longer available.


Data Engineer

Qloo Inc.
  • United States

About

At Qloo, we harness large-scale behavioral and catalog data to power recommendations and insights across entertainment, dining, travel, retail, and more. Our platform is built on a modern AWS data stack and supports analytics, APIs, and machine-learning models used by leading brands. We are looking for an experienced Data Engineer to help evolve and scale this platform.

Role Overview

As a Data Engineer at Qloo, you will design, build, and operate the pipelines that move data from external vendors, internal systems, and public sources into our S3-based data lake and downstream services. You'll work across AWS Glue, EMR (Spark), Athena/Hive, and Airflow (MWAA) to ensure that our data is accurate, well-modeled, and efficiently accessible for analytics, indexing, and machine-learning workloads.

You should be comfortable owning end-to-end data flows, from ingestion and transformation to quality checks, monitoring, and performance tuning.

Responsibilities

  • Design, develop, and maintain batch data pipelines using Python, Spark (EMR), and AWS Glue, loading data from S3, RDS, and external sources into Hive/Athena tables (see the pipeline sketch below).
  • Model datasets in our S3/Hive data lake to support analytics (Hex), API use cases, Elasticsearch indexes, and ML models.
  • Implement and operate workflows in Airflow (MWAA), including dependency management, scheduling, retries, and alerting via Slack (see the DAG sketch below).
  • Build robust data quality and validation checks (schema validation, freshness/volume checks, anomaly detection) and ensure issues are surfaced quickly with monitoring and alerts (see the freshness-check sketch below).
  • Optimize jobs for cost and performance (partitioning, file formats, join strategies, proper use of EMR/Glue resources).
  • Collaborate closely with data scientists, ML engineers, and application engineers to understand data requirements and design schemas and pipelines that serve multiple use cases.
  • Contribute to internal tooling and shared libraries that make working with our data platform faster, safer, and more consistent.
  • Document pipelines, datasets, and best practices so the broader team can easily understand and work with our data.

Qualifications

  • Bachelor's degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience.
  • Experience with Python and distributed data processing using Spark (PySpark) on EMR or a similar environment.
  • Hands-on experience with core AWS data services, ideally including:
      - S3 (data lake, partitioning, lifecycle management)
      - AWS Glue (jobs, crawlers, catalogs)
      - EMR or other managed Spark platforms
      - Athena/Hive and SQL for querying large datasets
      - Relational databases such as RDS (PostgreSQL/MySQL or similar)
  • Experience building and operating workflows in Airflow (MWAA experience is a plus).
  • Strong SQL skills and familiarity with data modeling concepts for analytics and APIs.
  • Solid understanding of data quality practices (testing, validation frameworks, monitoring/observability).
  • Comfortable working in a collaborative environment, managing multiple projects, and owning systems end-to-end.

We Offer

  • Competitive salary and benefits package, including health insurance, retirement plan, and paid time off.
  • The opportunity to shape a modern cloud-based data platform that powers real products and ML experiences.
  • A collaborative, low-ego work environment where your ideas are valued and your contributions are visible.
  • Flexible work arrangements (remote and hybrid options) and a healthy respect for work-life balance.
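As a purely illustrative sketch of the kind of batch pipeline this role owns, the snippet below reads raw vendor JSON from S3 with PySpark and writes partitioned Parquet into a Hive/Glue-cataloged table; the bucket, paths, and lake.vendor_catalog name are assumptions for the sketch, not Qloo's actual setup:

    # Hypothetical PySpark batch job: ingest raw vendor data from S3 and
    # publish it as a partitioned Parquet table in the Hive/Glue catalog.
    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder
        .appName("vendor_catalog_ingest")    # illustrative job name
        .enableHiveSupport()                 # tables land in the Glue/Hive catalog
        .getOrCreate()
    )

    # Read one day of raw landing-zone JSON (path is an assumption).
    raw = spark.read.json("s3://example-landing/vendor_catalog/dt=2024-01-01/")

    # Light cleanup plus a partition column.
    cleaned = (
        raw.withColumn("title", F.trim(F.lower(F.col("title"))))
           .withColumn("dt", F.lit("2024-01-01"))
    )

    # Partitioned Parquet keeps Athena scans cheap; saveAsTable registers the
    # result so it is immediately queryable downstream.
    (
        cleaned.write
        .mode("overwrite")
        .format("parquet")
        .partitionBy("dt")
        .saveAsTable("lake.vendor_catalog")  # hypothetical database.table
    )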
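In the same spirit, a minimal Airflow (MWAA, 2.4+ syntax) DAG showing the retry and Slack-alerting pattern the posting mentions; the DAG id, schedule, and SLACK_WEBHOOK_URL environment variable are assumptions for the sketch:

    # Hypothetical DAG: daily schedule, automatic retries, Slack alert on failure.
    import os
    from datetime import datetime, timedelta

    import requests
    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def notify_slack(context):
        # Post a short failure message to an incoming-webhook URL kept in the
        # environment (assumed to be configured on the MWAA workers).
        task = context["task_instance"]
        requests.post(
            os.environ["SLACK_WEBHOOK_URL"],
            json={"text": f"Airflow task failed: {task.dag_id}.{task.task_id}"},
        )


    def run_ingest():
        # Stand-in for the real work, e.g. submitting a Glue job or EMR step.
        print("ingesting...")


    with DAG(
        dag_id="vendor_catalog_daily",           # illustrative DAG id
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
        default_args={
            "retries": 2,                        # retry transient failures
            "retry_delay": timedelta(minutes=10),
            "on_failure_callback": notify_slack,
        },
    ) as dag:
        PythonOperator(task_id="ingest", python_callable=run_ingest)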

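Finally, a sketch of the freshness/volume checks listed under data quality, here querying Athena through awswrangler; the table, partition, and row threshold are illustrative assumptions:

    # Hypothetical check: fail loudly when today's partition is missing or
    # suspiciously small, so the Airflow task fails and the Slack alert fires.
    import awswrangler as wr

    df = wr.athena.read_sql_query(
        "SELECT COUNT(*) AS n FROM vendor_catalog WHERE dt = '2024-01-01'",
        database="lake",                         # assumed Glue database
    )
    row_count = int(df["n"].iloc[0])

    if row_count < 1000:                         # illustrative volume threshold
        raise ValueError(f"vendor_catalog dt=2024-01-01 too small: {row_count} rows")
    print(f"freshness/volume check passed: {row_count} rows")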
Language skills

  • English

Notice to users

This offer was published by one of our partners. You can view the original posting here.