Senior Data Engineer - Vice President

Citi

Irving, Texas, United States

About

Responsibilities

Design, build, and maintain scalable ETL/ELT pipelines using PySpark, Spark SQL, and Delta Lake on Databricks, ensuring efficient ingestion, transformation, and integration of large‑scale datasets across cloud platforms.
Implement and manage data solutions on cloud platforms (AWS, GCP, Azure), leveraging cloud‑native services for data storage, processing, and analytics.
Work extensively with big data frameworks and platforms such as Databricks, Snowflake, and open table formats like Apache Iceberg to process and analyze petabyte‑scale datasets.
Optimize Spark workloads and Databricks clusters by tuning jobs, managing partitioning strategies, caching, and autoscaling to improve performance, reduce processing time, and control infrastructure costs.
Implement and manage Lakehouse architecture using Delta Lake, enforcing data quality, schema evolution, and governance (e.g., Unity Catalog), while ensuring reliable, secure, and high‑quality data for analytics and downstream applications.
Lead the design and architecture of Starburst‑based data solutions, ensuring scalability, performance, and reliability for enterprise‑level data platforms.
Implement and manage data federation strategies using Starburst connectors to seamlessly integrate and query data across disparate systems (data lakes, RDBMS, NoSQL databases, cloud storage).
Identify and resolve performance bottlenecks in data pipelines and queries, optimizing data storage and processing for cost and efficiency.
Develop and optimize robust data pipelines with a strong focus on data governance, ensuring high data quality, comprehensive data lineage, and efficient, compliant data flow from ingestion to consumption for analytical and operational needs.
Design and implement data models that support business intelligence, analytics, and machine learning use cases, ensuring architecture is robust, scalable, and secure.
Partner with data scientists and AI specialists to support the development and deployment of AI models, contributing to projects involving Retrieval‑Augmented Generation and Agentic AI systems by providing necessary data infrastructure and support.
Operate effectively within an Agile development environment, actively participating in sprint planning, daily stand‑ups, and retrospectives to ensure iterative and timely delivery of project milestones.
Provide technical leadership to steer projects toward success, making critical decisions that align with client interests and organizational goals, while mentoring junior engineers and promoting best practices.
Serve as a key point of contact for stakeholders and clients, effectively communicating project progress, managing expectations, and translating complex business requirements into actionable technical tasks.

Core Data Technologies

Python: Expert‑level proficiency with the Python data ecosystem (Pandas, NumPy, Dask) and production‑grade code for data processing, automation, and API development.
PySpark: Extensive experience with the Spark framework, deep knowledge of the DataFrame API, Spark SQL, and performance‑tuning techniques for distributed data processing.
Databricks: Proven experience developing on the Databricks Lakehouse Platform, including Delta Lake, structured streaming, and Spark job optimization.
Ab Initio: Practical experience with the Ab Initio suite (GDE, Co>Operating System, Conduct>It) designing enterprise‑grade ETL workflows.
Snowflake: Hands‑on experience building and maintaining data warehouses, data modelling, RBAC security, performance tuning, and features such as Snowpipe and Time Travel.
Starburst/Trino: Experience using federated query engines to provide unified access across disparate data sources.
Apache Iceberg: Familiarity with open table formats for managing large analytic datasets.
Major cloud provider: In‑depth, multi‑year experience with at least one of AWS, Google Cloud Platform, or Azure.
Cloud‑native services: Building and managing data pipelines using services such as AWS Glue, Lambda, S3, Redshift; Azure Data Factory, Synapse Analytics; or Google Cloud Composer, Dataflow, BigQuery.
Data lifecycle for ML: Solid understanding of the data lifecycle required for machine learning projects.
AI/ML pipelines: Building pipelines to support AI/ML models, with interest or experience preparing data for advanced AI applications such as vector databases used in Retrieval‑Augmented Generation and Agentic AI systems.
Agile proficiency: Deep familiarity with Agile and Scrum, delivering projects iteratively and adapting to changing requirements.
Leadership & influence: Capability to influence architectural decisions and steer projects toward success aligned with client needs and organizational strategy.
Client engagement: Exceptional communication skills, proven ability to engage clients, articulate complex technical concepts, and build strong stakeholder relationships.

Recommended Qualifications

6‑10 years of hands‑on experience in data engineering, preferably in a large‑scale enterprise or financial services environment.
Demonstrable experience leading project work streams and mentoring junior team members.
Relevant industry certifications (e.g., AWS Certified Big Data, Google Professional Data Engineer, Snowflake SnowPro).
Experience with containerization technologies such as Docker and orchestration tools like Kubernetes.
Deep understanding of data governance, data quality, and data security principles.
Excellent analytical and problem‑solving skills with the ability to work independently or as part of a team.
Experience as an Applications Development Manager or in a senior‑level Applications Development role, with stakeholder and people management experience.
Consistent demonstration of clear written and verbal communication.

Education

Bachelor’s degree, university degree, or equivalent experience.

Benefits

In addition to salary, Citi offers discretionary and formulaic incentive and retention awards.
Employee benefits include medical, dental, and vision coverage; 401(k); life, accident, and disability insurance; and wellness programs.
Paid time off is provided, encompassing planned vacation, unplanned sick leave, and paid holidays.
Salary range: $125,760.00 – $188,640.00. Offers may vary by jurisdiction, job level, and date of hire.

EEO Statement

Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.
If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.
View Citi’s EEO Policy Statement and the Know Your Rights poster.

Language skills

English
