Staff Software Engineer — Search Platform, API & Infrastructure
Thomson Reuters
- United States
About
About the Role
This is a high‑ownership, high‑leverage position at the intersection of platform engineering, API design, and cloud infrastructure. Staff Engineers on this team own the full lifecycle of what they ship; full‑stack ownership is the baseline, not a bonus. Delivery friction is treated as an engineering problem: the team ships to production constantly, AI‑assisted development is the norm, and removing obstacles to fast, safe delivery is everyone's responsibility. The successful candidate brings enterprise‑grade security instincts, deep AWS expertise, and a product‑mindful approach to developer experience, treating the platform’s API as a product in its own right.
Platform Control‑Plane API
Plan, design, develop, and own the platform’s management API—the self‑service interface through which client teams create and configure search systems, manage ingestion topologies, register reusable components, promote index versions, and monitor system health.
Architect the platform’s multi‑tenant access model: implement strict data isolation between client tenants, integrate with enterprise identity providers, establish role‑based access control across all API endpoints, and define the governance framework that ensures the platform can make credible security commitments to enterprise customers.
Establish API strategy and cross‑system integration patterns: designing versioned, backward‑compatible interfaces with clear contracts, comprehensive documentation, and developer‑experience patterns drawn from best‑in‑class search platform providers, and setting governance standards that the team follows for all future API surface.
Design and expose the API surface required to support the platform’s evaluation and experimentation workflows, including endpoints that enable the search grading tool to consume experiment run outputs, query/result pairs, and relevance judgments, and that allow client teams to configure and trigger A/B search experiments through self‑service interfaces.
Design the configuration data model and persistence layer (DynamoDB and related services) that stores search system definitions, component registry entries, index lifecycle state, and audit logs—applying architectural patterns that scale to the platform’s multi‑tenant and multi‑region ambitions.
Break down complex business requirements into functional and technical requirements with consideration for security, ethical AI implementation, and operational efficiency; contribute to recommendations where technology transformation can spark business growth.
Cloud Infrastructure & DevOps
Own the platform’s AWS infrastructure as code—defining, provisioning, and maintaining ECS services, MSK clusters, OpenSearch/Vespa deployments, DynamoDB tables, networking (VPC, security groups, NAT), and IAM roles using Terraform or AWS CDK—establishing infrastructure governance standards and a cloud strategy for multi‑environment and eventual multi‑region operation.
Design and own the CI/CD pipeline for platform services—establishing DevOps culture and toolchain strategy for the team, with a clear mandate to eliminate delivery friction.
Drive adoption of AI‑assisted development practices across the team’s infrastructure and API work—establishing the tooling, patterns, and norms that enable engineers to leverage AI to move faster while maintaining the quality and reliability bar the platform demands.
Own infrastructure cost management: monitor AWS spend across platform components, evaluate architectural trade‑offs at the system level, and implement an enterprise performance and optimization framework that keeps the platform’s economics sustainable as it scales—including compute cost governance for inference workloads as custom model serving is introduced.
Implement and operate customer‑managed encryption key (CMK) support, applying security strategy, risk assessment frameworks, and security governance to give enterprise clients control over their encryption keys while preserving multi‑tenant reliability.
Reliability Engineering
Define and own platform‑level SLOs covering API availability, query latency, ingestion throughput, and end‑to‑end document freshness—and build the monitoring infrastructure (CloudWatch, distributed tracing, alerting) that makes SLO compliance continuously visible to the team and to client teams.
Design the observability infrastructure for agentic retrieval paths—implement trace‑level instrumentation that captures tool invocation sequences, per‑hop latency, and retrieval inputs, enabling reliable diagnosis of failures and quality regressions in non‑deterministic agent workflows.
Take full operational responsibility for platform API and infrastructure—triage and resolve incidents, write thorough post‑mortems, and drive systematic improvements that prevent recurrence.
Design enterprise performance strategy for the platform’s API layer: load testing, capacity planning, performance profiling, and system‑level optimization—ensuring the platform can handle planned growth in tenants, content volumes, and query traffic.
Embed security architecture throughout the platform’s infrastructure: least‑privilege IAM, secrets management, encryption at rest and in transit, audit logging, and compliance implementation aligned with TR’s enterprise security requirements.
Technical Leadership
Establish architectural principles and cross‑system design patterns for the platform’s control plane and infrastructure—functioning as the technical authority that other engineers and teams turn to for API and infrastructure guidance.
Lead significant projects and business initiatives that span multiple engineers and interact with partner teams; determine work priorities and make adjustments to short‑term priorities while maintaining strategic focus; provide specialist advice to senior management on complex infrastructure and security issues.
Mentor and develop Senior and mid‑level engineers—providing coaching, technical direction, and educational opportunities in cloud infrastructure, platform API design, reliability engineering, and AI‑assisted development practices.
Engage with client teams as a technical partner—understanding their integration experience and pain points, feeding structured requirements back into the platform API roadmap, and proactively reducing time‑to‑value for new platform adopters.
Deliver effective presentations on complex infrastructure and security concepts to technical and non‑technical stakeholders; champion ethical AI practices and responsible technology deployment across the team’s work.
About You
You’re an ideal fit if you have:
Required Experience
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
8+ years of software engineering experience, with demonstrated progression to staff‑level or equivalent technical leadership—including ownership of a functional area and leadership of significant cross‑functional projects.
Deep expertise in cloud‑native platform and infrastructure engineering on AWS: VPC architecture, IAM, ECS, Lambda, DynamoDB, MSK, and related managed services—with hands‑on infrastructure‑as‑code experience (Terraform and/or AWS CDK) and the ability to establish infrastructure governance frameworks.
Production experience with OpenSearch, Vespa, or Elasticsearch at an operational level—cluster sizing, backup and restore, index lifecycle management, and multi‑tenant access controls.
Mastery of Python with strategic awareness of language selection and migration; strong software engineering fundamentals including testing architecture, security architecture, and system design.
Demonstrated enterprise security practice: security strategy, risk assessment frameworks, least‑privilege IAM, secrets management, encryption at rest and in transit, and compliance implementation in production cloud environments.
Track record of establishing API governance frameworks, cross‑system integration patterns, and documentation standards; experience designing multi‑tenant SaaS‑style platform APIs with versioning, access control, and first‑class developer experience.
Demonstrated reliability engineering ownership: SLO definition, observability implementation, on‑call leadership, and a track record of improving platform reliability through data‑driven retrospectives, with the conviction that shipping frequently and operating reliably are complementary.
Comfort and fluency with AI‑assisted development tools; you use them to move faster and produce higher‑quality infrastructure and API code, and you actively help the team do the same.
Preferred Experience
Experience operating Kafka (MSK) or other distributed messaging infrastructure in production—including partition management, consumer group monitoring, and schema registry governance.
Background in Kubernetes or ECS container orchestration—including service mesh, autoscaling, and health check patterns.
Experience building developer‑facing internal platforms where API quality and documentation are treated as first‑class product concerns.
Knowledge of enterprise encryption patterns, including customer‑managed keys (AWS KMS) and their architectural implications for multi‑tenant systems.
Familiarity with distributed tracing infrastructure for non‑deterministic or agentic workflows—where trace design must capture tool call sequences and per‑hop context, not just request/response pairs.
Familiarity with AI service architecture: evaluating AI vendors, cost‑benefit analysis, and integrating AI API services with fallback strategies into production platform infrastructure.
What Success Looks Like
Build a thorough understanding of the platform’s current infrastructure, API surface, and operational posture—including known gaps in reliability, security, and developer experience.
Establish relationships with key client teams to understand their integration experience and pain points with the current platform.
Take on‑call ownership for your functional area and identify and begin delivering the highest‑leverage near‑term improvements to platform API or infrastructure reliability.
In The First Year
Deliver a materially improved self‑service platform API—with strong multi‑tenant isolation, documented governance standards, and measurably better developer experience for client teams.
Establish end‑to‑end SLO coverage across platform services, with automated alerting, clear on‑call runbooks, documented architectural decision records, and a track record of fast, high‑quality incident resolution.
Own and deliver a major infrastructure initiative—CMK support, multi‑environment maturity, agentic observability infrastructure, or a comparable project—from architectural design through production, establishing the principles and patterns that guide the platform’s infrastructure evolution.
Become the recognized technical authority for platform API and infrastructure—shaping team standards, influencing platform architecture, and providing specialist guidance to leadership on complex infrastructure and security challenges.
What’s in it For You
Hybrid Work Model: 2‑3 days a week in the office, with a flexible hybrid working environment.
Flexibility & Work‑Life Balance: policies that support personal and professional responsibilities, including work from anywhere for up to 8 weeks per year.
Career Development and Growth: continuous learning and skill development, with programs that support growth and advancement in an AI‑enabled future.
Industry Competitive Benefits: comprehensive benefit plans—vacation, mental health days, Headspace app, retirement savings, tuition reimbursement, incentive programs, and resources for wellbeing.
Culture: award‑winning reputation for inclusion, belonging, flexibility, and work‑life balance.
Social Impact: two volunteer days off annually and opportunities to support ESG initiatives.
Making a Real‑World Impact: supporting clients in upholding justice, truth, and transparency through trusted information services.
Equal Employment Opportunity Statement
Thomson Reuters is an equal employment opportunity employer. We offer a drug‑free workplace. We comply with all applicable laws and regulations to ensure a fair and inclusive hiring process for all candidates regardless of race, color, sex/gender, sexual orientation, disability, or any other protected classification.
Language skills
- English