
This job offer is no longer available


DevOps & AI Infrastructure Engineer

FreelanceJobs
  • Canada

About

We have completed development of a social media chat application with AI-powered features, and we are now preparing for full-scale production deployment.
We are seeking an experienced DevOps & AI Infrastructure Engineer who can take complete ownership of:
  • Production deployment
  • Cloud infrastructure setup
  • Auto-scaling architecture
  • GPU/AI model management
  • Ongoing monitoring & daily operational support
This role requires hands-on execution, guidance, and long-term support to ensure a smooth, stable, and scalable launch.
Scope of Work:
Production Deployment
  • Deploy backend services, APIs, databases, and AI services to production
  • Configure secure cloud infrastructure (AWS / GCP / Azure preferred)
  • Set up CI/CD pipelines for continuous deployment
  • Ensure zero-downtime deployment strategies
Infrastructure & Auto-Scaling
  • Design and implement auto-scaling architecture
  • Configure load balancers
  • Set up container orchestration (Docker + Kubernetes preferred)
  • Optimize performance for large numbers of concurrent users
  • Plan infrastructure for scaling from thousands to millions of users
AI Models & GPU Management
  • Deploy and manage AI models in production
  • Configure and manage GPU instances
  • Optimize inference performance and latency
  • Monitor AI workloads and resource utilization
  • Coordinate with the AI team on updates, fine-tuning, and scaling requirements
Monitoring & Reliability
  • Implement monitoring systems (Prometheus, Grafana, Datadog, etc.)
  • Set up alerting and logging systems
  • Handle incident response and troubleshooting
  • Perform daily health checks of production systems
  • Ensure high availability and uptime (99.9%+ target)
Security & Compliance
  • Configure secure networking (VPC, firewalls, private subnets)
  • Manage SSL, encryption, and secrets handling
  • Implement best practices for infrastructure security
  • Protect AI models and user data
Ongoing Support & Guidance
  • Provide daily operational oversight
  • Proactively detect and fix issues
  • Guide the team in infrastructure decisions
  • Assist with performance optimization
  • Help plan future scaling strategies
This is not a one-time deployment — we need someone who will support us during and after launch.
Required Skills
  • Strong experience with AWS / GCP / Azure
  • Kubernetes & Docker expertise
  • Experience deploying AI/ML models in production
  • GPU infrastructure management (NVIDIA, CUDA environments)
  • CI/CD pipeline setup
  • Infrastructure as Code (Terraform preferred)
  • Experience handling high-traffic applications
  • Strong monitoring & observability experience
  • Database scaling (PostgreSQL / MongoDB / Redis)
Preferred Experience
  • Experience deploying real-time chat applications
  • Experience with AI inference pipelines
  • Experience optimizing model latency and GPU cost
  • Experience with WebSocket-based applications
  • Knowledge of autoscaling GPU clusters
Engagement Model
  • Immediate start
  • Full deployment support required
  • Daily monitoring & long-term collaboration
  • Must be available for real-time communication during the critical deployment phase
Our Expectations
We need someone who:
  • Takes ownership
  • Thinks proactively
  • Solves problems independently
  • Guides us technically
  • Acts like a technical partner, not just a freelancer
Contract duration: 3 to 6 months.
Mandatory skills: Cloud Infrastructure (AWS / GCP / Azure), Kubernetes (K8s) Orchestration, Docker Containerization, GPU Infrastructure & NVIDIA CUDA Management, AI/ML Model Deployment & Optimization, CI/CD Pipeline Implementation, Infrastructure as Code (Terraform), Auto-Scaling & Load Balancing, Monitoring & Observability (Prometheus, Grafana, ELK), Database Scaling & Performance Optimization (PostgreSQL / MongoDB / Redis)

Language skills

  • English
Note for users

This job offer was published by one of our partners. You can view the original posting here.