About
You will report to the AI Director and work out of Phoenix, AZ or Charlotte, NC on a hybrid schedule.
Key Responsibilities
Support end‑to‑end data needs for all AI modalities, including classic ML, GenAI/LLMs, and agentic AI systems.
Build robust, scalable data pipelines for structured, semi‑structured, and unstructured data, including text, documents, images, audio, video, and logs.
Develop feature engineering pipelines for classic ML, including feature extraction, transformation, and feature store management.
Build and optimize GenAI and LLM data pipelines, including embedding generation, vectorization, chunking, metadata extraction, and document enrichment for RAG and context retrieval.
Develop data ingestion and orchestration workflows that support agentic AI, including memory stores, event‑driven pipelines, tool‑use data flows, and real‑time retrieval services.
Design and implement advanced data solutions using AWS (S3, Glue, Lambda, EMR, Kinesis), Databricks (Spark, Delta Lake, Vector Search), and Dataiku to enable intelligent systems at scale.
Implement data governance, quality, lineage, monitoring, and observability to support high‑performance, trustworthy AI.
Partner with data scientists, ML engineers, and AI product teams to deliver datasets for model development, fine‑tuning, evaluation, and production inference.
Optimize pipelines for latency, cost, reliability, and throughput, ensuring AI systems—from batch ML to real‑time agents—have the data they need.
Responsibilities
Lead the design, automation, and operation of end‑to‑end MLOps pipelines supporting classic ML, GenAI/LLM systems, and agentic AI workloads across Databricks and Dataiku.
Build, maintain, and optimize training, evaluation, and deployment pipelines, ensuring reliability, reproducibility, and alignment with business objectives.
Collaborate with data scientists, AI software developers, data engineers, and platform engineers to operationalize models, LLMs, RAG workflows, and agentic AI capabilities.
Architect and implement solutions for distributed training, hyperparameter optimization, accelerated inference, and performance‑tuned model serving.
Develop automated testing, validation, governance, and monitoring frameworks for ML/LLM/agentic workflows, including drift detection, model quality, and guardrail coverage.
Own CI/CD pipelines for model assets, prompts, embeddings, vector search updates, and agent tool registries using GitHub Actions and modern ML deployment frameworks.
Manage MLflow experiment tracking, model registry lifecycle, lineage, and promotion flows across multiple environments in Databricks and Dataiku.
Optimize integration between ML frameworks (PyTorch, TensorFlow, scikit‑learn) and cloud‑based compute ecosystems including Spark, Kubernetes, and serverless runtimes.
Ensure production‑grade reliability, scalability, performance, and observability of all deployed AI workloads (classic → GenAI → agentic).
Establish best practices, patterns, reusable templates, and standards for MLOps across the AI delivery lifecycle.
Qualifications
You must have:
Bachelor’s degree in a technical discipline such as science, technology, engineering, or mathematics.
5 or more years of experience in data engineering, distributed data systems, or ML data pipelines.
Strong experience working with Apache Spark, preferably in Databricks.
Proficiency in Python and SQL; experience with distributed computing and big data frameworks.
Hands‑on experience with cloud‑based ETL/ELT pipelines, preferably AWS (S3, Glue, Lambda, EMR, Step Functions, Redshift, Athena).
Experience building data solutions that support multiple AI workloads, including:
ML training and inference data flows
Unstructured data ingestion and transformation
Embedding/vector pipelines for LLMs
Experience working with data modeling, data integration, ETL/ELT frameworks, and reliable production‑grade pipelines.
We Value
Bachelor’s degree in a technical field (CS, Engineering, Math, or related).
Experience supporting AI at scale across classic ML, GenAI/LLM, and agentic AI systems.
Experience with vector databases and semantic search (Databricks Vector Search, Pinecone, FAISS, Milvus, OpenSearch).
Familiarity with LLM and GenAI data preparation, including:
Text processing
Tokenization
Chunking strategies
Prompt/context formatting
Experience with unstructured data technologies (OCR, NLP pipelines, computer vision data processing).
Hands‑on experience with Dataiku for automation, workflow orchestration, and AI project management.
Knowledge of MLOps tooling: MLflow, Delta Lake, experiment tracking, CI/CD for ML.
Understanding of agentic AI system patterns, such as memory architectures, tool APIs, event‑driven workflows, and reasoning chain data requirements.
Strong analytical mindset, attention to detail, and commitment to high data quality.
Ability to thrive in a fast‑paced, evolving AI environment and collaborate across cross‑functional teams.
Benefits of Working for Honeywell
In addition to a competitive salary and cutting‑edge projects, Honeywell employees receive a comprehensive benefits package, including employer‑subsidized medical, dental, vision, and life insurance; short‑term and long‑term disability; 401(k) matching; flexible spending accounts; health savings accounts; employee assistance programs; educational assistance; parental leave; paid time off for vacation, personal business, and sick time; and 12 paid holidays.
US Citizen Requirement
Must be a U.S. citizen due to contractual requirements.
Language Skills
- English
Notice to Users
This listing comes from a TieTalent partner platform. Click “Apply now” to submit your application directly on their site.