AI Software Engineer
- Tampa, Florida, United States
About
ABOUT THE ROLE
We are looking for a deeply technical engineer who lives and breathes AI infrastructure - someone who can build, deploy, and scale production LLM systems from bare metal to browser. This is not a prompt-engineering role. We need someone who understands how transformers actually work, can diagnose bottlenecks at the infrastructure level, and builds reliable, observable systems around fundamentally probabilistic models.
You will own the full lifecycle of AI model deployment and play a key role in ensuring seamless CI/CD, infrastructure reliability, security, and performance across our environments.
WHAT YOU'LL DO
- Design and operate high-availability LLM inference clusters using vLLM, SGLang, and NVIDIA Triton
- Build AI-powered tools and customer-facing products with React frontends and Python/FastAPI backends
- Manage Kubernetes clusters (k8s, k3s, RKE2) end-to-end: provisioning, networking, GPU operator configuration, and upgrades
- Establish and maintain CI/CD pipelines for model packaging, container builds, and automated deployments
- Evaluate, fine-tune, and benchmark open-weight models for specific downstream tasks
- Build RAG pipelines and agentic workflows using vector databases and tool-calling frameworks
- Instrument infrastructure with monitoring and observability tooling to surface latency, throughput, and resource metrics
- Deploy and maintain AI systems in compliance-sensitive environments (CMMC, FedRAMP, ITAR)
- Maintain documentation of architectures, configurations, and processes across projects
- Track and manage tasks across concurrent projects using Kanban tools (ClickUp, Jira)
REQUIRED SKILLS
AI / ML & Inference
- SGLang, vLLM, Ollama, OpenWebUI
- NVIDIA Triton Inference Server, NVIDIA NIM, NVIDIA NeMo, TensorRT
- CUDA, cuBLAS, cuDNN, NCCL (multi-GPU)
- Hugging Face Transformers, LangChain, LlamaIndex
- Model quantization: GGUF, AWQ, GPTQ
- Fine-tuning: LoRA / QLoRA
- LLM architecture: transformers, attention mechanisms, KV cache
- RAG pipelines, embeddings, and vector search
- Agent frameworks: function calling, tool use
- RLHF, DPO, and SFT concepts
- Multimodal models (vision + text)
- Model benchmarking: MMLU, HumanEval, MT-Bench
- AI safety, output filtering, and prompt engineering
Linux & Systems
- Linux (Ubuntu / RHEL / SLES), Bash, systemd
- Networking fundamentals: iptables, VLAN, BGP
- SELinux / AppArmor
Languages & Frameworks
- Python, JavaScript / TypeScript, React
- FastAPI / Flask, Node.js
- REST, WebSocket, and SSE APIs
- SQL (PostgreSQL / SQLite), Redis
- Vector databases: Milvus, Qdrant, pgvector
DevOps / CI/CD & Infrastructure
- Kubernetes (k8s), k3s, RKE2, Helm, Kustomize
- Rancher, ArgoCD, Flux CD
- Docker / Podman, container registries
- Ingress-NGINX / Traefik, cert-manager, MetalLB
- GitHub Actions, GitLab CI, Jenkins
- Terraform, Pulumi, Ansible
- Prometheus, Grafana, OpenTelemetry, ELK Stack
- Vault (secrets management)
NICE TO HAVE
- Multi-node tensor parallelism and pipeline parallelism
- Experience deploying AI in air-gapped or classified environments
- Open-source contributions to AI or inference tooling
- Distributed systems background (Raft, consensus, replication)
- Rust or Go for high-performance tooling
- Active DoD security clearance
LANGUAGE SKILLS
- English
This listing comes from a TieTalent partner platform. Click "Apply now" to submit your application directly on their site.