
Data Engineer with AI - Remote

Lorven Technologies
  • United States

About

I hope you are doing well.
Please share your updated profile if you are interested in the role below.
Our client seeks a Data Engineer + AI for a 12-month project in Boston, MA. Below is the detailed requirement.

Job Title: Data Engineer + AI
Work Location: Boston, MA
Duration: 12 Months

Job Summary:
We're looking for a Senior Data Engineer to build and scale our lakehouse and AI data pipelines on Databricks. You'll design robust ETL/ELT, enable feature engineering for ML/LLM use cases, and drive best practices for reliability, performance, and cost.
What you'll do
  • Design, build, and maintain batch/streaming pipelines in Python + PySpark on Databricks (Delta Lake, Auto Loader, Structured Streaming); see the sketch after this list.
  • Implement data models (Bronze/Silver/Gold), optimize with partitioning, Z-ORDER, and indexing, and manage reliability (DLT/Jobs, monitoring, alerting).
  • Enable ML/AI: feature engineering, MLflow experiment tracking, model registries, and model/feature serving; support RAG pipelines (embeddings, vector stores).
  • Establish data quality checks (e.g., Great Expectations), lineage, and governance (Unity Catalog, RBAC).
  • Collaborate with Data Science/ML and Product to productionize models and AI workflows; champion CI/CD and IaC.
  • Troubleshoot performance and cost issues; mentor engineers and set coding standards.
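
For illustration, here is a minimal sketch of the kind of Bronze-layer ingestion stream this role involves: Auto Loader (the Databricks-specific cloudFiles source) reading raw files into a Delta table. The bucket paths and the bronze.events table name are hypothetical, not from the posting.

    # Minimal sketch only: paths and table name are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    raw_path = "s3://example-bucket/raw/events/"            # hypothetical landing zone
    checkpoint = "s3://example-bucket/_checkpoints/bronze_events/"

    # Auto Loader (cloudFiles) discovers newly arrived files incrementally.
    bronze = (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", checkpoint + "schema/")
        .load(raw_path)
        .withColumn("_ingested_at", F.current_timestamp())  # ingestion audit column
    )

    # Restartable write into a Bronze Delta table via the stream checkpoint.
    (
        bronze.writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint)
        .trigger(availableNow=True)                          # incremental batch-style run
        .toTable("bronze.events")
    )
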
Must-have qualifications

  • 6-10+ years in data engineering with a track record of production pipelines.
  • Expert in Python and PySpark (UDFs, window functions, Spark SQL, Catalyst basics).
  • Deep hands-on Databricks: Delta Lake, Jobs/Workflows, Structured Streaming, SQL Warehouses; practical tuning and cost optimization.
  • Strong SQL and data modeling (dimensional, medallion, CDC).
  • ML/AI enablement experience: MLflow, feature stores, model deployment/monitoring; familiarity with LLM workflows (embeddings, vectorization, prompt/response logging); see the MLflow sketch after this list.
  • Cloud proficiency on AWS/Azure/GCP (object storage, IAM, networking).
  • CI/CD (GitHub/GitLab/Azure DevOps), testing (pytest), and observability (logs/metrics).
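
As a concrete illustration of the MLflow tracking called for above, here is a minimal sketch of logging parameters, a metric, and a model artifact from one run. The experiment path and the scikit-learn model are placeholders chosen for the example, not details from the posting.

    # Minimal sketch only: experiment path and model are placeholders.
    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    mlflow.set_experiment("/Shared/demo-experiment")   # hypothetical experiment

    X, y = make_classification(n_samples=500, random_state=42)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

    with mlflow.start_run():
        params = {"n_estimators": 100, "max_depth": 5}
        model = RandomForestClassifier(**params).fit(X_tr, y_tr)

        mlflow.log_params(params)                      # hyperparameters for the run
        mlflow.log_metric("accuracy", accuracy_score(y_te, model.predict(X_te)))
        mlflow.sklearn.log_model(model, "model")       # versioned model artifact
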
Nice to have

  • Databricks Delta Live Tables, Unity Catalog automation, Model Serving.
  • Orchestration (Airflow/Databricks Workflows), messaging (Kafka/Kinesis/Event Hubs).
  • Data quality & lineage tools (Great Expectations, OpenLineage).
  • Vector DBs (FAISS, pgvector, Pinecone), RAG frameworks (LangChain/LlamaIndex); see the retrieval sketch after this list.
  • IaC (Terraform), security/compliance (PII handling, data masking).
  • Experience interfacing with BI tools (Power BI, Tableau, Databricks SQL).
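
To make the retrieval step of a RAG pipeline concrete, here is a minimal FAISS sketch: index document embeddings, then fetch the nearest neighbors for a query embedding. The random vectors and the embedding dimension are stand-ins; real pipelines would use embeddings from an actual model.

    # Minimal sketch only: random vectors stand in for real embeddings.
    import faiss
    import numpy as np

    dim = 384                                        # assumed embedding size
    rng = np.random.default_rng(0)
    doc_vecs = rng.random((1000, dim), dtype=np.float32)   # placeholder corpus
    query = rng.random((1, dim), dtype=np.float32)         # placeholder query

    index = faiss.IndexFlatL2(dim)                   # exact L2 nearest-neighbor index
    index.add(doc_vecs)                              # index the document vectors

    distances, ids = index.search(query, 5)          # top-5 nearest chunks
    print(ids[0])                                    # row ids fed to the LLM as context
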

Language skills

  • English

Notice to users

This job offer comes from a TieTalent partner platform. Click "Apply now" to submit your application directly on their site.