Infrastructure & Observability Engineer
Drawbridge Digital • United States
This job posting is no longer available.
About
*Location:* Remote – must be located in the greater NJ/NYC area (1–2 days per month on-site in New Jersey)
*About the Role*
We're looking for an experienced infrastructure engineer to design and implement centralized monitoring, logging, and alerting systems across our hybrid environment spanning cloud services and physical colocation facilities. But this isn't just about building dashboards—you'll use the insights you gather to drive real improvements in performance, reliability, and efficiency across our infrastructure. You'll also contribute to strategic planning for future infrastructure and operations initiatives, helping shape the direction of our entire environment.
This role supports a 24-hour production environment, so you'll also be involved in day-to-day operations—helping ensure our systems remain stable, performant, and available around the clock.
*What You'll Do*
* Architect and deploy centralized monitoring and log aggregation solutions across cloud and on-premises infrastructure
* Design and implement alerting systems for critical infrastructure events, ensuring the right people are notified at the right time
* Support day-to-day operations of infrastructure serving a 24/7 production environment, including troubleshooting, maintenance, and capacity management
* Establish observability standards, dashboards, and runbooks to support operations and incident response
* Analyze monitoring data to identify performance bottlenecks, inefficiencies, and opportunities for optimization
* Contribute to long-term infrastructure planning, including capacity forecasting, technology roadmaps, and operational improvements
* Create and maintain technical documentation, including system architecture diagrams, standard operating procedures, and emergency response playbooks
* Partner with infrastructure, operations, and engineering teams to implement improvements based on observability insights
* Drive continuous improvement initiatives that enhance system reliability, reduce costs, and improve performance
* Evaluate and integrate tooling that fits our hybrid environment needs
* Participate in an on-call rotation, including occasional overnight shifts, to respond to critical infrastructure incidents
*What We're Looking For*
* 3+ years of experience in infrastructure, site reliability, or systems engineering roles
* Hands-on experience with monitoring and observability platforms (e.g., Prometheus, Grafana, Datadog, ELK stack, Splunk, or similar)
* Strong understanding of networking fundamentals and server infrastructure in both cloud and physical datacenter environments
* Experience with Linux server administration, Ceph storage clusters, and highly available database clusters
* Experience building alerting frameworks that balance signal quality with noise reduction
* Demonstrated ability to translate monitoring insights into actionable infrastructure improvements
* Strong technical writing skills—you'll be documenting systems, procedures, and emergency protocols
* Strong collaboration and communication skills—you'll be working across teams to drive change
* Comfortable working independently in a remote environment while collaborating effectively with distributed teams
* Must reside in the greater NJ/NYC metropolitan area
* Ability to commute to New Jersey 1–2 days per month for team meetings
* Able to occasionally travel to customer locations to support on-site projects
*Compensation & Benefits*
Full-Time:
* $95,000 – $140,000 annually, based on experience
* Health insurance
* 401(k)
* On-call compensation
*About Us*
We are a veteran-owned company. While all qualified candidates are encouraged to apply, we particularly welcome applications from veterans. Military experience in signal, communications, and IT fields translates well to this role. Backgrounds such as Army 25B (IT Specialist) and other 25-series MOSs, 18E (Special Forces Communications Sergeant), or 26A (Network Development Officer), as well as equivalent roles in other branches, provide strong foundational skills for this position.
We are an equal opportunity employer and consider all qualified applicants without regard to race, color, religion, sex, national origin, age, disability, veteran status, sexual orientation, gender identity, or any other protected characteristic.
Pay: $95,000.00 - $140,000.00 per year
Benefits:
* 401(k)
* 401(k) matching
* Health insurance
* Paid time off
Work Location: Remote
Language skills
- English