About
Are you an experienced SRE or DevOps engineer? Do you want the freedom to work remotely and to grow in the new field of site reliability at an internationally successful software and education company? Well, then take our reliability to the next level as part of our Site Reliability Engineering team.
Please note that this position is initially limited to a 12-month contract.
Your new dream job
Automation and Infrastructure as Code (IaC): You automate repetitive tasks, deployments, and system management to reduce human error and improve efficiency. This might involve creating scripts, building CI/CD pipelines, or automating infrastructure provisioning.
Reliability and Performance Optimization: You continuously improve system uptime by identifying bottlenecks and optimizing system architecture.
Capacity Planning and Scaling: You assess and predict system resource requirements (CPU, memory, storage) to ensure the infrastructure can scale with increasing demand. You implement auto-scaling solutions that handle load spikes without human intervention, so systems remain performant under varying conditions.
System Monitoring and Incident Response: You continuously monitor system performance, uptime, and reliability using tools like Prometheus, Grafana, or Elasticsearch. The goal is to detect and respond to issues before they impact users. You manage and respond to incidents, outages, and failures quickly, aiming to minimize downtime. This includes managing incident documentation, communication, and post-incident analysis.
Incident Postmortems and Continuous Improvement: You conduct root cause analysis (RCA) after incidents to identify what went wrong and how to prevent similar issues in the future. You implement fixes, improvements, and best practices based on learnings from postmortems to increase system reliability and reduce future incidents.
Benefits
Work in partner coworking spaces or in your home office, as long as you can guarantee uninterrupted internet access
Regular further education
The stability of an extremely successful German high-tech company that is funded by its successful product and not by investors
Outcome-focused teams and a culture of direct feedback
Modern equipment: Thinkpad or MacBook
International, collaborative team with strong cohesion
Spectacular team events in various European countries
Autonomy from day one
Contribution to the retirement scheme
Work in your team on a first-name basis, without a dress code, and at eye level
Flexible working hours from Monday to Friday (core working hours from 10 AM to 4 PM)
Requirements
Communication Mastery: You communicate precisely and in a recipient-friendly manner. You defuse potential conflicts with sensitivity and a solution-oriented approach. You always strike the right tone with stakeholders, developers and your team, even under time pressure, and can seamlessly switch between German and English if necessary.
Collaboration Wizardry: You collaborate with developers, stakeholders and operations and get everyone on the same page. You understand the challenges of different teams and find solutions that benefit the entire company.
Automation Sorcery: You promote automation as a way to save time and reduce errors, and implement tools that improve productivity across the team.
Problem-Solving Genius: You dive deep into problems, identify root causes and come up with solutions that prevent future incidents.
Self-organization: You thrive on autonomy and excel at organizing and structuring complex projects while working from home.
Tech stack:
Kubernetes / Container Technology
CI/CD (GitHub Workflows, Helm, Kustomize)
Cloud Services (preferably Google, but others are also okay)
Excellent spelling and grammar in German
PHP language experience would be a plus
Minimum 3 years of experience in IT operations
Ability to take ownership and work independently
Strong planning and prioritization skills
Passion for finding solutions for complex problems
Your typical day at Digistore24
Morning video call to talk to your team about yesterday's progress and today's plans.
You like to work in a structured way and outline your daily routine and daily goals. Like every day, you block out enough time to work on the continuous development of our SRE processes. You are not alone in this, but can count on the support of your team.
Now it's time for the daily call with your team. You report on your priorities and blockers and receive tangible tips on how to solve your challenges.
For the next few hours, you allow yourself the luxury of turning off all messengers in order to develop focused ideas for improvements in auto-scaling, monitoring and alerting. You then test your ideas in practice. You note down what worked so that you can present it to the Head of IT Operations in a one-on-one call.
After your lunch break, a developer needs help with a new CI/CD workflow. You discuss the requirements with them and provide an initial prototype.
You pick up a ticket to review the resource allocation of an application, check its current utilization and adjust the deployment.
You find an endpoint that is not yet included in the monitoring. After creating a ticket for this, you immediately write the code in the Terraform project to add it.
This position is NOT for you if
You do not identify with our values
You have less than 3 years of experience in IT operations
You cannot take ownership and need to discuss every detail with your supervisor or colleagues
You have difficulty planning and prioritizing your tasks
You don't like to find solutions for complex problems
Languages
- English