Backend Engineer
PolarGrid
- Mississauga, Ontario, Canada
About
We're seeking a Backend Engineer to build and scale our edge inference infrastructure. You'll architect distributed compute systems handling GPU-accelerated AI workloads across edge nodes with sub-10ms latency requirements.
Core Responsibilities
Infrastructure Engineering
Design and implement Kubernetes-native distributed compute platforms
Build GPU resource management and allocation systems
Develop edge deployment pipelines with automated testing
Create high-performance inference serving infrastructure
Backend Systems
Architect microservices for distributed model serving
Implement API gateways with OpenAI- and Hugging Face-compatible endpoints (see the sketch after this list)
Build dynamic resource allocation and load balancing
Design multi-backend systems with mutual exclusivity enforcement
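To make the gateway item concrete: a minimal sketch of an OpenAI-compatible chat completions route. The request/response shape follows OpenAI's public /v1/chat/completions schema; the FastAPI app and the echoing placeholder backend are assumptions for illustration, not PolarGrid's actual service.

```python
# Minimal OpenAI-compatible chat completions route (sketch).
# The echo "backend" is a placeholder; a real gateway would route
# the request to a GPU inference backend.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatMessage(BaseModel):
    role: str
    content: str

class ChatCompletionRequest(BaseModel):
    model: str
    messages: list[ChatMessage]
    max_tokens: int = 256

@app.post("/v1/chat/completions")
async def chat_completions(req: ChatCompletionRequest) -> dict:
    reply = f"(echo) {req.messages[-1].content}"
    return {
        "object": "chat.completion",
        "model": req.model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": reply},
            "finish_reason": "stop",
        }],
    }
```

Keeping the wire format OpenAI-compatible means existing client SDKs work against the gateway unchanged; the gateway's job is then routing, auth, and rate limiting rather than protocol design.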
Performance & Optimization
Optimize GPU memory utilization and inference latency
Implement streaming inference with TensorRT acceleration
Build comprehensive monitoring and observability systems (instrumentation sketch after this list)
Design automatic scaling based on workload patterns
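As a concrete slice of the observability work above, a sketch of latency instrumentation using the prometheus_client library. The metric name and the sub-10ms-focused bucket boundaries are illustrative choices, not a prescribed standard.

```python
# Sketch: recording inference latency as a Prometheus histogram.
import time
from prometheus_client import Histogram, start_http_server

INFERENCE_LATENCY = Histogram(
    "inference_latency_seconds",
    "End-to-end inference latency",
    # Buckets chosen around a sub-10ms target (illustrative).
    buckets=(0.001, 0.0025, 0.005, 0.01, 0.025, 0.05, 0.1),
)

def run_inference(payload: bytes) -> bytes:
    with INFERENCE_LATENCY.time():  # observes elapsed time on exit
        time.sleep(0.004)           # stand-in for the real model call
        return payload

if __name__ == "__main__":
    start_http_server(9100)  # serves /metrics for Prometheus to scrape
    while True:
        run_inference(b"tokens")
```

Histograms like this are also what workload-based autoscaling typically keys off: a scaler can watch the p99 of this metric rather than raw CPU or GPU utilization.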
Required Technical Skills
Core Infrastructure
Kubernetes: Production experience with cluster management, resource allocation, networking
Containerization: Docker, container security, multi-stage builds, optimization
Distributed Systems: Service mesh, load balancing, distributed consensus, fault tolerance
Cloud: GitOps, infrastructure as code, AWS, AWS CDK
Backend Development
Languages: TypeScript, Go, Python, or Rust
APIs: RESTful services, gRPC, WebSocket streaming, rate limiting (token-bucket sketch after this list)
Databases: Distributed databases, caching systems, data consistency
Message Queues: Kafka, Redis, SQS, distributed event systems
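Rate limiting in the API item above is most often implemented as a token bucket; below is a minimal single-process sketch (the rate and capacity values are arbitrary). A multi-replica gateway would usually keep the bucket state in Redis instead.

```python
# Sketch: token-bucket rate limiter (single process, not distributed).
import time
from dataclasses import dataclass, field

@dataclass
class TokenBucket:
    rate: float              # tokens refilled per second
    capacity: float          # maximum burst size
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity
        self.updated = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=100.0, capacity=200.0)  # ~100 req/s, bursts to 200
print(bucket.allow())  # True until the bucket is drained
```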
AI Inference Infrastructure
GPU Computing: NVIDIA CUDA, TensorRT, GPU memory management
AI/ML Serving: Triton Inference Server, model optimization, batch processing (example request after this list)
Performance: Latency optimization, throughput tuning, resource profiling
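To make the Triton expectation concrete, a minimal inference call through the official tritonclient HTTP API. The server URL, model name, tensor names, shape, and dtype are placeholders that must match the deployed model's config.

```python
# Sketch: one inference request against a running Triton server.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Input name/shape/dtype must match the model's config.pbtxt (placeholders here).
inp = httpclient.InferInput("INPUT0", [1, 16], "FP32")
inp.set_data_from_numpy(np.random.rand(1, 16).astype(np.float32))
out = httpclient.InferRequestedOutput("OUTPUT0")

result = client.infer(model_name="example_model", inputs=[inp], outputs=[out])
print(result.as_numpy("OUTPUT0"))
```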
Preferred Experience
Infrastructure Platforms
Edge computing deployments
Multi-region distributed systems
Hardware acceleration (GPUs)
Container security (Kata, gVisor)
Monitoring & Operations
Prometheus, Grafana, distributed tracing
SRE practices, incident response
Capacity planning, cost optimization
Automated testing and deployment
What You'll Build
Edge Inference Platform
Multi-tenant GPU inference clusters serving 10,000+ concurrent requests
Geographic distribution meeting sub-10ms latency targets
Automatic model loading and resource optimization
Comprehensive health monitoring and alerting
Backend Architecture
Microservices handling model lifecycle management
API gateway with authentication and rate limiting
Dynamic backend switching (Python/TensorRT-LLM)
Streaming inference with WebSocket support (sketched below)
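A minimal sketch of the WebSocket streaming item above, using FastAPI's WebSocket support; generate_tokens() is a hypothetical stand-in for a real streaming backend such as TensorRT-LLM.

```python
# Sketch: token-by-token streaming over a WebSocket.
import asyncio
from fastapi import FastAPI, WebSocket

app = FastAPI()

async def generate_tokens(prompt: str):
    # Hypothetical stand-in for a streaming inference backend.
    for token in prompt.split():
        await asyncio.sleep(0.005)
        yield token

@app.websocket("/v1/stream")
async def stream(ws: WebSocket) -> None:
    await ws.accept()
    prompt = await ws.receive_text()
    async for token in generate_tokens(prompt):
        await ws.send_text(token)  # push each token as soon as it exists
    await ws.close()
```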
DevOps Infrastructure
Kubernetes operators for inference workload management (sketch after this list)
Automated testing covering performance and reliability
GitOps deployment with rollback capabilities
Cloud and edge resource monitoring and cost optimization
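As a sketch of the operator item above, a create handler written with the kopf framework. The InferenceWorkload custom resource, its group polargrid.example.com, and the spec fields are invented for illustration; a real operator would create Deployments and Services from the spec.

```python
# Sketch: operator handler for a hypothetical InferenceWorkload CRD.
# Run with `kopf run operator.py` after the CRD is applied to the cluster.
import kopf

@kopf.on.create("polargrid.example.com", "v1", "inferenceworkloads")
def on_create(spec, name, namespace, logger, **kwargs):
    gpus = spec.get("gpus", 1)  # hypothetical spec field
    logger.info(f"Scheduling {name} in {namespace} with {gpus} GPU(s)")
    # A real handler would create Deployments/Services via the kubernetes
    # client here; the return value is recorded under the resource's status.
    return {"phase": "Scheduled"}
```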
Language Skills
- English