Data Engineer AI Systems

MAK Technologies LLC

St Louis, Missouri, United States

About

Job Title: Data Engineer - AI Systems

6 Months

St. Louis, MO (Day 1 onsite role)

Data Engineer – AI Systems (Databricks)

Primary Skills: Data Engineer, Databricks, Python, PySpark, AI/ML

We're building intelligent, Databricks-powered AI systems that structure and activate information from diverse enterprise sources (Confluence, OneDrive, PDFs, and more). As a Data Engineer, you'll design and optimize the data pipelines that transform raw and unstructured content into clean, AI-ready datasets for machine learning and generative AI agents.

You'll collaborate with a cross-functional team of Machine Learning Engineers, Software Developers, and domain experts to create high-quality data foundations that power Databricks-native AI agents and retrieval systems.

Key Responsibilities

Develop Scalable Pipelines: Design, build, and maintain high-performance ETL and ELT workflows using Databricks, PySpark, and Delta Lake.
Data Integration: Build APIs and connectors to ingest data from collaboration platforms such as Confluence, OneDrive, and other enterprise systems.
Unstructured Data Handling: Implement extraction and transformation pipelines for text, PDFs, and scanned documents using Databricks OCR and related tools.
Data Modeling: Design Delta Lake and Unity Catalog data models for both structured and vectorized (embedding-based) data stores.
Data Quality & Observability: Apply validation, version control, and quality checks to ensure pipeline reliability and data accuracy.
Collaboration: Work closely with ML Engineers to prepare datasets for LLM fine-tuning and vector database creation, and with Software Engineers to deliver end-to-end data services.
Performance & Automation: Optimize workflows for scale and automation, leveraging Databricks Jobs, Workflows, and CI/CD best practices.

What You Bring

Experience with data engineering, ETL development, or data pipeline automation.
Proficiency in Python, SQL, and PySpark.
Hands-on experience with Databricks, Spark, and Delta Lake.
Familiarity with data APIs, JSON, and unstructured data processing (OCR, text extraction).
Understanding of data versioning, schema evolution, and data lineage concepts.
Interest in AI/ML data pipelines, vector databases, and intelligent data systems.

Bonus Skills

Experience with vector databases (e.g., Pinecone, Chroma, FAISS) or Databricks' Vector Search.
Exposure to LLM-based architectures, LangChain, or Databricks Mosaic AI.
Knowledge of data governance frameworks, Unity Catalog, or access control best practices.
Familiarity with REST API development or data synchronization services (e.g., Airbyte, Fivetran, custom connectors).

Language Skills

English
