
Data Engineer with AI - Remote

Lorven Technologies
  • United States

About

I hope you are doing well.
Please share your updated profile if you are interested in the role below.
Our client seeks a Data Engineer + AI for a 12-month project in Boston, MA. Below is the detailed requirement.

Job Title: Data Engineer + AI
Work Location: Boston, MA
Duration: 12 Months

Job Summary:
We're looking for a Senior Data Engineer to build and scale our lakehouse and AI data pipelines on Databricks. You'll design robust ETL/ELT, enable feature engineering for ML/LLM use cases, and drive best practices for reliability, performance, and cost.
What you'll do
  • Design, build, and maintain batch/streaming pipelines in Python + PySpark on Databricks (Delta Lake, Auto Loader, Structured Streaming); see the sketch after this list.
  • Implement data models (Bronze/Silver/Gold), optimize with partitioning, Z-ORDER, and indexing, and manage reliability (DLT/Jobs, monitoring, alerting).
  • Enable ML/AI: feature engineering, MLflow experiment tracking, model registries, and model/feature serving; support RAG pipelines (embeddings, vector stores).
  • Establish data quality checks (e.g., Great Expectations), lineage, and governance (Unity Catalog, RBAC).
  • Collaborate with Data Science/ML and Product to productionize models and AI workflows; champion CI/CD and IaC.
  • Troubleshoot performance and cost issues; mentor engineers and set coding standards.
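
For illustration, here is a minimal sketch of the kind of Bronze-layer ingestion stream this role involves: Auto Loader (the Databricks-specific cloudFiles source) reading raw files into a Delta table. The bucket paths and the bronze.events table name are hypothetical, not from the posting.

    # Minimal sketch only: paths and table name are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    raw_path = "s3://example-bucket/raw/events/"            # hypothetical landing zone
    checkpoint = "s3://example-bucket/_checkpoints/bronze_events/"

    # Auto Loader (cloudFiles) discovers newly arrived files incrementally.
    bronze = (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", checkpoint + "schema/")
        .load(raw_path)
        .withColumn("_ingested_at", F.current_timestamp())  # ingestion audit column
    )

    # Restartable write into a Bronze Delta table via the stream checkpoint.
    (
        bronze.writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint)
        .trigger(availableNow=True)                          # incremental batch-style run
        .toTable("bronze.events")
    )
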
Must-have qualifications

  • 6-10+ years in data engineering with a track record of production pipelines.
  • Expert in Python and PySpark (UDFs, window functions, Spark SQL, Catalyst basics).
  • Deep hands-on Databricks: Delta Lake, Jobs/Workflows, Structured Streaming, SQL Warehouses; practical tuning and cost optimization.
  • Strong SQL and data modeling (dimensional, medallion, CDC).
  • ML/AI enablement experience: MLflow, feature stores, model deployment/monitoring; familiarity with LLM workflows (embeddings, vectorization, prompt/response logging); see the MLflow sketch after this list.
  • Cloud proficiency on AWS/Azure/GCP (object storage, IAM, networking).
  • CI/CD (GitHub/GitLab/Azure DevOps), testing (pytest), and observability (logs/metrics).
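
As a concrete illustration of the MLflow tracking called for above, here is a minimal sketch of logging parameters, a metric, and a model artifact from one run. The experiment path and the scikit-learn model are placeholders chosen for the example, not details from the posting.

    # Minimal sketch only: experiment path and model are placeholders.
    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    mlflow.set_experiment("/Shared/demo-experiment")   # hypothetical experiment

    X, y = make_classification(n_samples=500, random_state=42)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

    with mlflow.start_run():
        params = {"n_estimators": 100, "max_depth": 5}
        model = RandomForestClassifier(**params).fit(X_tr, y_tr)

        mlflow.log_params(params)                      # hyperparameters for the run
        mlflow.log_metric("accuracy", accuracy_score(y_te, model.predict(X_te)))
        mlflow.sklearn.log_model(model, "model")       # versioned model artifact
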
Nice to have

  • Databricks Delta Live Tables, Unity Catalog automation, Model Serving.
  • Orchestration (Airflow/Databricks Workflows), messaging (Kafka/Kinesis/Event Hubs).
  • Data quality & lineage tools (Great Expectations, OpenLineage).
  • Vector DBs (FAISS, pgvector, Pinecone), RAG frameworks (LangChain/LlamaIndex); see the retrieval sketch after this list.
  • IaC (Terraform), security/compliance (PII handling, data masking).
  • Experience interfacing with BI tools (Power BI, Tableau, Databricks SQL).
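
To make the retrieval step of a RAG pipeline concrete, here is a minimal FAISS sketch: index document embeddings, then fetch the nearest neighbors for a query embedding. The random vectors and the embedding dimension are stand-ins; real pipelines would use embeddings from an actual model.

    # Minimal sketch only: random vectors stand in for real embeddings.
    import faiss
    import numpy as np

    dim = 384                                        # assumed embedding size
    rng = np.random.default_rng(0)
    doc_vecs = rng.random((1000, dim), dtype=np.float32)   # placeholder corpus
    query = rng.random((1, dim), dtype=np.float32)         # placeholder query

    index = faiss.IndexFlatL2(dim)                   # exact L2 nearest-neighbor index
    index.add(doc_vecs)                              # index the document vectors

    distances, ids = index.search(query, 5)          # top-5 nearest chunks
    print(ids[0])                                    # row ids fed to the LLM as context
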

Language skills

  • English

Notice to users

This job offer comes from a TieTalent partner platform. Click "Apply now" to submit your application directly on their site.