Senior AI Systems Engineer – Ollama

This job posting is no longer available


FreelanceJobs
  • Canada

About

# Senior AI Systems Engineer – Ollama / Multi-Agent / RAG Architecture (Phased Build)
## Project Overview
I am building a secure, local AI infrastructure for my company using **Ollama** and open-weight models. This system will serve as an internal knowledge engine and structured workflow assistant, with gradual scaling to 5–10 users over time.
This is not a chatbot project.
This is an infrastructure build.
The engagement will be structured in defined phases with clear deliverables. We will begin with architecture and foundation, then move into RAG pipelines, then agent orchestration and optimization.
Environment:
* macOS-based system (Apple Silicon – high RAM configuration)
* Long-term document corpus (15+ years of financial and operational data)
* Focus on privacy-first local deployment
* Potential hybrid cloud later (optional)
---
# Phase 1 – Architecture & Foundation
**Objective:** Establish a stable, optimized local LLM environment.
### Deliverables:
* Ollama configuration and optimization
* Model benchmarking and selection:
  * General reasoning model
  * Coding/tooling model
  * Lightweight fast-response model
* GPU/CPU optimization strategy
* Memory tuning and concurrency planning
* Documentation of architecture decisions
* Recommended WebUI (Open WebUI / AnythingLLM / custom stack)
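
As an illustration of the model benchmarking deliverable, here is a minimal sketch that converts the timing fields of a non-streaming Ollama `/api/generate` response into throughput figures. Ollama reports these durations in nanoseconds. The `summarize_metrics` helper and the sample values are hypothetical and serve only to show the arithmetic; they are not measured results.

```python
from dataclasses import dataclass

NS_PER_SECOND = 1_000_000_000  # Ollama reports durations in nanoseconds


@dataclass
class GenerationMetrics:
    prompt_tokens_per_sec: float
    output_tokens_per_sec: float
    total_seconds: float


def summarize_metrics(response: dict) -> GenerationMetrics:
    """Convert the timing fields of a final /api/generate response into rates."""

    def rate(count_key: str, duration_key: str) -> float:
        duration_ns = response.get(duration_key, 0)
        if duration_ns <= 0:
            return 0.0  # field missing, e.g. prompt served from cache
        return response.get(count_key, 0) / (duration_ns / NS_PER_SECOND)

    return GenerationMetrics(
        prompt_tokens_per_sec=rate("prompt_eval_count", "prompt_eval_duration"),
        output_tokens_per_sec=rate("eval_count", "eval_duration"),
        total_seconds=response.get("total_duration", 0) / NS_PER_SECOND,
    )


# Hypothetical response values, for illustration only.
sample = {
    "total_duration": 5_000_000_000,
    "prompt_eval_count": 200,
    "prompt_eval_duration": 500_000_000,
    "eval_count": 120,
    "eval_duration": 4_000_000_000,
}
metrics = summarize_metrics(sample)
print(metrics)
```

A comparison between candidate models would repeat this over a fixed prompt set and report the distribution of each rate, not a single run.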
### Technical Requirements:
* Experience running models like Llama 3.x, Mixtral, Qwen, DeepSeek, Mistral, etc.
* Understanding of quantization strategies (Q4_K_M, Q8_0, etc.)
* Context window tradeoffs and rope scaling considerations
* macOS performance tuning experience preferred
* Familiarity with model benchmarking methodology (latency, tokens/sec, memory footprint)
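
To make the quantization and memory footprint requirements concrete, the following sketch estimates the memory taken by model weights alone at different quantization levels. The bits-per-weight figures are approximations (Q4_K_M in particular mixes block types, so its average varies by model), and the estimate excludes the KV cache and runtime overhead. The function name and the 8-billion-parameter example are assumptions for illustration.

```python
# Approximate average bits per weight; treat these as rough planning numbers.
APPROX_BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,     # 32 int8 weights plus one fp16 scale per block
    "Q4_K_M": 4.85,  # approximate; mixed block types, varies by model
}

GIB = 2**30


def estimate_weight_gib(param_count: float, quant: str) -> float:
    """Estimate the memory needed for model weights only, in GiB."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    return param_count * bits / 8 / GIB


# Hypothetical 8-billion-parameter model.
for quant in APPROX_BITS_PER_WEIGHT:
    print(f"{quant}: {estimate_weight_gib(8e9, quant):.1f} GiB")
```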
---
# Phase 2 – RAG System Design & Deployment
**Objective:** Build a structured retrieval system for business records.
Corpus includes:
* PDFs (financial records)
* Excel exports
* Outlook email exports
* Meeting transcripts
### Deliverables:
* Document ingestion pipeline
* Chunking strategy (with rationale)
* Metadata schema design
* Embedding model selection
* Vector database setup (Chroma / Qdrant / Weaviate / etc.)
* Query evaluation & hallucination mitigation strategy
* Retrieval performance testing (precision/recall validation approach)
* Documentation
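
One possible shape for the precision/recall validation approach is sketched below. It assumes a hand-labeled set of relevant document IDs per test query and unique IDs in the retrieved list. The function name and the sample IDs are hypothetical.

```python
def precision_recall_at_k(
    retrieved: list[str], relevant: set[str], k: int
) -> tuple[float, float]:
    """Compute precision@k and recall@k for one query.

    retrieved: document IDs in ranked order (assumed unique).
    relevant:  hand-labeled IDs that should be returned for the query.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical labeled example.
retrieved_ids = ["inv-2011", "memo-7", "inv-2019", "contract-3"]
relevant_ids = {"inv-2011", "contract-3", "inv-2005"}
precision, recall = precision_recall_at_k(retrieved_ids, relevant_ids, k=4)
print(f"precision@4={precision:.2f} recall@4={recall:.2f}")
```

Averaging these values over a fixed query set gives a baseline for comparing chunking strategies, embedding models, and re-rankers.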
### Must Demonstrate Experience With:
* LlamaIndex, LangChain, or custom RAG pipelines
* Embedding dimensionality tradeoffs
* Hybrid search (semantic + keyword)
* Re-ranking strategies
* Handling large historical datasets efficiently
* Evaluation methodology (not just "it seems to work")
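
Hybrid search needs a way to merge the semantic and keyword result lists. One common option is reciprocal rank fusion (RRF), sketched below. This is an illustrative choice, not a required method. The constant `k = 60` is the conventional default, and the document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a vector index and a keyword index.
semantic_hits = ["inv-2011", "inv-2019", "memo-7"]
keyword_hits = ["inv-2019", "contract-3", "inv-2011"]
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

RRF uses only ranks, so it avoids having to normalize scores from two retrievers with different scales. A re-ranking model can then be applied to the top of the fused list.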
---
# Phase 3 – Agent Architecture
**Objective:** Build structured AI agents for defined workflows.
Initial Use Cases:
* Vendor history lookup across 15+ years
* Financial pattern queries
* Internal "decision style" assistant (knowledge-based behavioral agent)
* Meeting transcript analysis and summarization
* Multi-step task agents
### Deliverables:
* Agent orchestration framework (LangGraph / CrewAI / custom)
* Tool calling integration
* Structured output schemas
* Guardrails and boundary enforcement
* Logging & observability
* Security boundaries between data domains
Preference for candidates with experience designing:
* Multi-agent systems
* Tool-augmented agents
* Deterministic workflow + LLM hybrid systems
* Memory management strategies (short-term vs long-term memory separation)
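
To illustrate how structured output schemas and guardrails can work together in a deterministic workflow, the sketch below validates a model's JSON reply before any downstream step uses it. The `VendorLookupResult` schema, its field names, and the rule that every answer must cite at least one source document are hypothetical examples, not part of the project specification.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorLookupResult:
    vendor_name: str
    first_year: int
    last_year: int
    source_ids: tuple[str, ...]


def parse_vendor_result(raw: str) -> VendorLookupResult:
    """Parse and validate model output; raise ValueError on any violation."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {"vendor_name", "first_year", "last_year", "source_ids"}
    if set(data) != expected:
        raise ValueError(f"unexpected keys: {sorted(set(data) ^ expected)}")
    if not isinstance(data["vendor_name"], str) or not data["vendor_name"]:
        raise ValueError("vendor_name must be a non-empty string")
    for key in ("first_year", "last_year"):
        # bool is a subclass of int, so exclude it explicitly
        if not isinstance(data[key], int) or isinstance(data[key], bool):
            raise ValueError(f"{key} must be an integer")
    if data["first_year"] > data["last_year"]:
        raise ValueError("first_year must not be after last_year")
    sources = data["source_ids"]
    if not isinstance(sources, list) or not sources:
        raise ValueError("at least one source document must be cited")
    if not all(isinstance(s, str) for s in sources):
        raise ValueError("source_ids must be strings")
    return VendorLookupResult(
        data["vendor_name"], data["first_year"], data["last_year"], tuple(sources)
    )


# Hypothetical model reply.
reply = '{"vendor_name": "Acme Ltd", "first_year": 2009, "last_year": 2023, "source_ids": ["inv-2011"]}'
result = parse_vendor_result(reply)
print(result)
```

Rejecting replies that cite no sources is one simple boundary check. A failed validation can trigger a retry or a fallback instead of passing unverified text downstream.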
---
# Phase 4 – Optimization & Multi-User Scaling
**Objective:** Prepare the system for internal team usage.
### Deliverables:
* Concurrent user planning
* Resource allocation strategy
* Security segmentation
* Backup and recovery strategy
* Optional hybrid architecture evaluation
* Documentation for internal team use
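
For concurrent user planning, a rough per-session memory budget helps. The sketch below estimates key/value cache size for one sequence, assuming a standard transformer with grouped-query attention and an unquantized 16-bit cache. The architecture numbers are illustrative assumptions, and actual usage in a given runtime may differ.

```python
GIB = 2**30


def kv_cache_gib(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_tokens: int,
    bytes_per_element: int = 2,  # 16-bit cache assumed
) -> float:
    """Estimate KV cache memory for one sequence, in GiB."""
    # Factor of 2 covers the separate key and value tensors in every layer.
    total_bytes = (
        2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_element
    )
    return total_bytes / GIB


# Illustrative architecture values, not a specific recommendation.
per_user = kv_cache_gib(n_layers=32, n_kv_heads=8, head_dim=128, context_tokens=8192)
concurrent_users = 5
print(f"per user: {per_user:.2f} GiB, {concurrent_users} users: {per_user * concurrent_users:.2f} GiB")
```

This figure is added on top of the model weights, so the context length offered to each user directly limits how many sessions fit in memory at once.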
---
# Small Paid Technical Test (Required Before Full Engagement)
Before beginning Phase 1, shortlisted candidates will complete a **paid technical test**.
### Test Scope:
* Deploy a small Ollama environment
* Implement a basic RAG pipeline over a provided document set
* Explain chunking strategy and embedding choice
* Provide performance metrics (latency, memory usage)
* Submit brief architecture explanation (1–2 pages)
Time expectation: 4–6 hours
Compensation: Paid fixed amount
Purpose: Validate real-world implementation capability, not theoretical knowledge.
---
# Skills Screening Checklist (Must Address in Proposal)
Please respond clearly and specifically:
1. What local LLM systems have you deployed in production or near-production environments?
2. Which models do you prefer for each of the following, and why?
   * General reasoning
   * Structured extraction
   * Coding
   * Embeddings
3. Explain your chunking strategy for large financial document sets.
4. How do you measure and mitigate hallucination in RAG systems?
5. What vector databases have you used and why choose one over another?
6. Have you optimized Ollama specifically? Provide details.
7. Describe a multi-agent architecture you have built.
8. How would you structure long-term memory for an internal knowledge assistant?
9. What logging/observability stack do you recommend?
10. Provide links or summaries of relevant builds.
Generic or AI-generated answers without depth will not be considered.
---
# Red Flag Filter
Please do NOT apply if:
* Your experience is limited to prompt engineering.
* You have only used hosted APIs (OpenAI/Anthropic) and not local deployments.
* You have not built a working RAG pipeline from scratch.
* You cannot explain embedding tradeoffs or chunking methodology.
* You cannot discuss performance metrics in concrete terms.
* You have not worked with vector databases directly.
* You are unfamiliar with quantization or model memory constraints.
This project requires systems-level thinking and implementation experience.
---
# Engagement Structure
* Phase-based contract
* Fixed deliverables per phase (preferred)
* Paid technical validation before Phase 1
* Remote
* Ongoing advisory role possible
This is a serious AI infrastructure build, not a prompt engineering exercise.
Looking for someone who understands systems architecture, performance tuning, and long-term maintainability.
Contract duration: 3 to 6 months, at 40 hours per week.
Mandatory skills: Python, Artificial Intelligence, Machine Learning, Amazon Web Services, Java, Software Architecture & Design, macOS, Linux

Language skills

  • English
Note for users

This job posting was published by one of our partners. You can view the original posting here.