AI QA ENGINEER

Staffworxs Inc

United States

United States

Postuler Maintenant

À propos

At Staffworxs, we don't just connect talent - we power transformation. Headquartered in Frisco, TX, with teams in Bengaluru and Hyderabad, we combine global reach with deep expertise. Our Digital & Data Analytics practice drives growth and innovation for some of the world's top brands, who continue to retain us as their trusted partner. If you're ready to make an impact, you're in the right place.
Job Details:
Title: AI QA ENGINEER Duration: Long term Location: Dallas, TX (Need locals - Weekly 3-4 days Hybrid) - 2 ROUNDS FACE TO FACE Mode of interview: One teams, One face to face (MUST NEED 10 years of exp)
Must have: Agentic QA Engineer - Generative AI & Agentic Systems (Agent, Multi-Agent Testing)
Summary We are seeking a hands-on AI Engineer to design and execute end-to-end testing strategies for
agentic AI solutions , including
multi-agent systems
in production-grade environments. This role partners with the
Agentic Operations Team
to ensure
resiliency, reliability, accuracy, latency, orchestration correctness, and scale . You will establish QA frameworks, build
reusable test artifacts , drive
macro-level validations
across complex workflows, and lead the QA function for Agentic AI from
Dev to Prod .
Key Responsibilities Quality Strategy & Leadership Agentic & Multi-Agent Testing Reliability, Resiliency, and Latency Accuracy & Macro-Level Validations Scale & Orchestration Dev → Prod Readiness Define and own the QA strategy for agentic/multi-agent AI systems across dev, staging, and prod. Mentor a team of QA engineers; establish testing standards, coding guidelines for test harnesses, and review practices. Partner with Agentic Operations, Data Science, MLOps, and Platform teams to embed QA in the SDLC and incident response. Design tests for
agent orchestration ,
tool calling ,
planner-executor loops , and
inter-agent coordination
(e.g., task decomposition, handoff integrity, and convergence to goals). Validate
state management ,
context windows ,
memory/knowledge stores , and
prompt/graph correctness
under varying conditions. Implement
scenario fuzzing
(e.g., adversarial inputs, prompt perturbations, tool latency spikes, degraded APIs). Create
resilience testing
suites: chaos experiments, failover, retries/backoff, circuit-breaking, and degraded mode behavior. Establish
latency SLOs
and measure end-to-end response times across orchestration layers (LLM calls, tool invocations, queues). Ensure
reliability
through soak tests, canary verifications, and automated rollbacks. Define
ground-truth
and
reference pipelines
for task accuracy (exact match, semantic similarity, factuality checks). Build
macro validation
frameworks that validate task outcomes across multi-step agent workflows (e.g., complex data pipelines, content generation + verification agent loops). Instrument
guardrail
validations (toxicity, PII, hallucination, policy compliance). Design
load/stress
tests for multi-agent graphs under scale (concurrency, throughput, queue depth, backpressure). Validate
orchestrator correctness
(DAG execution, retries, branching, timeouts, compensation paths). Engineer
reusable test artifacts
(scenario configs, synthetic datasets, prompt libraries, agent graph fixtures, simulators). Integrate tests into CI/CD (pre-merge gates, nightly, canary) and
production monitoring
with alerting tied to KPIs. Define release criteria and run
operational readiness
(performance, security, compliance, cost/latency budgets). Build post-deployment
validation playbooks
and incident triage runbooks.
Required Qualifications
7+ years in
Software QA/Testing , with 2+ years in
AI/ML or LLM-based systems ; hands-on experience testing
agentic/multi-agent
architectures. Strong programming skills in
Python
or
TypeScript/JavaScript ; experience building test harnesses, simulators, and fixtures. Experience with
LLM evaluation
(exact/soft match, BLEU/ROUGE, BERTScore, semantic similarity via embeddings),
guardrails , and
prompt testing . Expertise in
distributed systems testing
latency profiling, resiliency patterns (circuit breakers, retries),
chaos engineering , and message queues. Familiarity with
orchestration frameworks
(LangChain, LangGraph, LlamaIndex, DSPy, OpenAI Assistants/Actions, Azure OpenAI orchestration, or similar). Proficiency with
CI/CD
(GitHub Actions/Azure DevOps),
observability
(OpenTelemetry, Prometheus/Grafana, Datadog), and
feature flags/canaries . Solid understanding of
privacy/security/compliance
in AI systems (PII handling, content policies, model safety). Excellent communication and leadership skills; proven ability to work cross-functionally with Ops, Data, and Engineering.
Preferred Qualifications
Experience with
multi-agent simulators ,
agent graph testing , and
tooling latency emulation . Knowledge of
MLOps
(model versioning, datasets, evaluation pipelines) and
A/B experimentation
for LLMs. Background in
cloud
(AWS),
serverless ,
containerization , and
event-driven
architectures. Prior ownership of
cost/latency/SLAs
for AI workloads in production.
Staffworxs is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive workplace for all employees, regardless of race, color, religion, gender, sexual orientation, national origin, age, disability, or veteran status.

United States

Compétences linguistiques

English

Avis aux utilisateurs

Cette offre provient d’une plateforme partenaire de TieTalent. Cliquez sur « Postuler maintenant » pour soumettre votre candidature directement sur leur site.

Postuler Maintenant