Principal Machine Learning Engineer - Production Systems

SoftInWay UK Ltd.

London, England, United Kingdom

About

Overview
SoftInWay UK Ltd. is seeking a highly experienced ML Systems Architect to design and implement a scalable, production-grade architecture for our machine learning solver. This role bridges research prototypes and commercial deployment, ensuring reliability, maintainability, and performance in a mixed technology stack.
Responsibilities
Architect the ML Solver Platform:
  Define modular architecture for data preprocessing, model execution, and post-processing.
  Establish clear API contracts between Python/TensorFlow and C# services.
Productionize ML Workflows:
  Convert research code into robust, testable, and observable services.
  Implement CI/CD pipelines, automated testing, and reproducibility standards.
Integration & Interoperability:
  Design REST/gRPC endpoints for cross-language communication.
  Ensure compatibility with C#/.NET services.
Performance & Scalability:
  Optimize GPU/CPU utilization, batching strategies, and memory management.
  Plan for multi-model and multi-tenant scenarios.
MLOps & Lifecycle Management:
  Implement model versioning, artifact registries, and deployment workflows.
  Set up monitoring, logging, and alerting for solver performance.
Security & Compliance:
  Apply best practices for secrets management, dependency scanning, and secure artifact storage.
Required Skills & Experience
ML Frameworks: Expert in TensorFlow (TF2/Keras); experience with ONNX Runtime for inference.
Programming: Advanced Python for ML; strong understanding of packaging, type checking, and performance profiling.
Architecture: Proven experience designing scalable ML systems for production.
APIs: Proficiency in gRPC/Protobuf and REST for cross-language integration.
MLOps: CI/CD pipelines, containerization (Docker/Kubernetes), model registries, reproducibility.
Performance Optimization: GPU acceleration (CUDA/cuDNN), mixed precision, XLA, profiling.
Observability: Metrics, tracing, structured logging, dashboards.
Security: SBOM, image signing, role-based access, vulnerability scanning.
Preferred Qualifications
Experience with ONNX Runtime Training, PyTorch, or hybrid ML architectures.
Familiarity with distributed training strategies and multi-GPU setups.
Knowledge of feature stores and data validation frameworks.
Exposure to regulated environments and compliance frameworks.
Tools & Technologies
ML: TensorFlow, ONNX Runtime, tf2onnx.
APIs: FastAPI, gRPC.
DevOps: GitLab CI/GitHub Actions, Docker, Kubernetes.
Monitoring: Prometheus, Grafana, OpenTelemetry.
Security: HashiCorp Vault, Sigstore.
Why Join Us?
Work on cutting-edge ML solutions integrated into commercial engineering software.
Define architecture that scales across global deployments.
Collaborate with a team of experts in ML, software engineering, and UI development.
Competitive salary and benefits.
To apply: Send your resume and a brief cover letter to HR@softinway.com

Languages

English
