Data Engineer AI Systems
MAK Technologies LLC
St Louis, Missouri, United States

This job posting is no longer available


About

Job Title: Data Engineer - AI Systems

Duration: 6 Months

Location: St. Louis, MO (onsite from day 1)

Data Engineer – AI Systems (Databricks)

Primary Skills:
Data Engineer, Databricks, Python, PySpark, AI/ML

We're building intelligent, Databricks-powered AI systems that structure and activate information from diverse enterprise sources (Confluence, OneDrive, PDFs, and more). As a Data Engineer, you'll design and optimize the data pipelines that transform raw and unstructured content into clean, AI-ready datasets for machine learning and generative AI agents.

You'll collaborate with a cross-functional team of Machine Learning Engineers, Software Developers, and domain experts to create high-quality data foundations that power Databricks-native AI agents and retrieval systems.

Key Responsibilities

  • Develop Scalable Pipelines:
    Design, build, and maintain high-performance ETL and ELT workflows using Databricks, PySpark, and Delta Lake.
  • Data Integration:
    Build APIs and connectors to ingest data from collaboration platforms such as Confluence, OneDrive, and other enterprise systems.
  • Unstructured Data Handling:
    Implement extraction and transformation pipelines for text, PDFs, and scanned documents using Databricks OCR and related tools.
  • Data Modeling:
    Design Delta Lake and Unity Catalog data models for both structured and vectorized (embedding-based) data stores.
  • Data Quality & Observability:
    Apply validation, version control, and quality checks to ensure pipeline reliability and data accuracy.
  • Collaboration:
    Work closely with ML Engineers to prepare datasets for LLM fine-tuning and vector database creation, and with Software Engineers to deliver end-to-end data services.
  • Performance & Automation:
    Optimize workflows for scale and automation, leveraging Databricks Jobs, Workflows, and CI/CD best practices.

What You Bring

  • Experience with data engineering, ETL development, or data pipeline automation.
  • Proficiency in Python, SQL, and PySpark.
  • Hands-on experience with Databricks, Spark, and Delta Lake.
  • Familiarity with data APIs, JSON, and unstructured data processing (OCR, text extraction).
  • Understanding of data versioning, schema evolution, and data lineage concepts.
  • Interest in AI/ML data pipelines, vector databases, and intelligent data systems.

Bonus Skills

  • Experience with vector databases (e.g., Pinecone, Chroma, FAISS) or Databricks' Vector Search.
  • Exposure to LLM-based architectures, LangChain, or Databricks Mosaic AI.
  • Knowledge of data governance frameworks, Unity Catalog, or access control best practices.
  • Familiarity with REST API development or data synchronization services (e.g., Airbyte, Fivetran, custom connectors).


Language Skills

  • English