This job posting is no longer available
Data Engineer AI Systems
- St Louis, Missouri, United States
About
Job Title: Data Engineer - AI Systems
Duration: 6 Months
Location: St. Louis, MO (onsite from day 1)
Data Engineer – AI Systems (Databricks)
Primary Skills: Data Engineer, Databricks, Python, PySpark, AI/ML
We're building intelligent, Databricks-powered AI systems that structure and activate information from diverse enterprise sources (Confluence, OneDrive, PDFs, and more). As a Data Engineer, you'll design and optimize the data pipelines that transform raw and unstructured content into clean, AI-ready datasets for machine learning and generative AI agents.
You'll collaborate with a cross-functional team of Machine Learning Engineers, Software Developers, and domain experts to create high-quality data foundations that power Databricks-native AI agents and retrieval systems.
Key Responsibilities
- Develop Scalable Pipelines: Design, build, and maintain high-performance ETL and ELT workflows using Databricks, PySpark, and Delta Lake.
- Data Integration: Build APIs and connectors to ingest data from collaboration platforms such as Confluence, OneDrive, and other enterprise systems.
- Unstructured Data Handling: Implement extraction and transformation pipelines for text, PDFs, and scanned documents using Databricks OCR and related tools.
- Data Modeling: Design Delta Lake and Unity Catalog data models for both structured and vectorized (embedding-based) data stores.
- Data Quality & Observability: Apply validation, version control, and quality checks to ensure pipeline reliability and data accuracy.
- Collaboration: Work closely with ML Engineers to prepare datasets for LLM fine-tuning and vector database creation, and with Software Engineers to deliver end-to-end data services.
- Performance & Automation: Optimize workflows for scale and automation, leveraging Databricks Jobs, Workflows, and CI/CD best practices.
What You Bring
- Experience with data engineering, ETL development, or data pipeline automation.
- Proficiency in Python, SQL, and PySpark.
- Hands-on experience with Databricks, Spark, and Delta Lake.
- Familiarity with data APIs, JSON, and unstructured data processing (OCR, text extraction).
- Understanding of data versioning, schema evolution, and data lineage concepts.
- Interest in AI/ML data pipelines, vector databases, and intelligent data systems.
Bonus Skills
- Experience with vector databases (e.g., Pinecone, Chroma, FAISS) or Databricks' Vector Search.
- Exposure to LLM-based architectures, LangChain, or Databricks Mosaic AI.
- Knowledge of data governance frameworks, Unity Catalog, or access control best practices.
- Familiarity with REST API development or data synchronization services (e.g., Airbyte, Fivetran, custom connectors).
Language Skills
- English