hackajob is partnering with mthree to fill this position. Create a profile to be automatically considered for this role—and others that match your experience.
Role
As part of the Data Loss Prevention (DLP) cybersecurity team, you will help drive a shift from classical monitoring towards an observability model covering metrics, diagnostic logging, distributed tracing, scalability, and SLO/SLI-based alerting, using the SRE tooling stacks available. The primary focus is on building functional and usable telemetry dashboards for the DLP cybersecurity stack.
This is an opportunity both for those who wish to change career track into cybersecurity and for those who already have a cybersecurity background and wish to build on their skills.
Team
The Data Leakage Prevention Squad comprises approximately 30 engineers, operators, and agile practitioners distributed globally. The squad is responsible for the architecture, build, deployment and operation of the Firm’s global DLP infrastructure. The squad has a strong Agile/DevOps culture and supports the Firm’s cloud-first strategy by focusing on cloud security controls as well as traditional on-prem DLP.
Duties will include, but are not limited to:
· Review, Write, and Optimise PromQL queries for Prometheus.
· Operate, Troubleshoot, and Optimise Prometheus in agent mode.
· Review and Craft Grafana dashboards following best practices, such as the Four Golden Signals or RED methodology.
· Review and Craft Splunk dashboards following best practices, such as the Four Golden Signals or RED methodology.
· Revise alerting to reduce noise and false positives, determining any alerting gaps.
· Revise PagerDuty alerting rules and orchestration to reduce noise and false positives, determining any alerting gaps.
· Collaborate with the DLP squads on enhancing current alerting standards so that they follow SRE best practices.
· Continuously improve our monitoring systems through innovation and practical application.
· Build actionable insights for the DLP squad from telemetry data.
· Be part of a rota for the 24/7 support of DLP products.
Skills
Must Have:
· Critical thinking ability and a proactive approach to identifying and resolving issues.
· A track record of establishing microservice SLOs and managing error budgets.
· Excellent communication and collaboration skills to work effectively with the squads.
· Experienced in the application of SRE principles.
· 3+ years of Prometheus experience, including Prometheus architecture, Prometheus exporters, and PromQL.
· 3+ years of Grafana experience.
· 3+ years of Splunk experience.
· Excellent knowledge of observability, especially metrics and dashboarding.
· Fluent in at least one programming or scripting language.
· Experienced in the use of CI/CD tools (e.g. Bitbucket, Jenkins).
· Experienced in cloud platforms (AWS or similar) and/or a UNIX environment.
Nice to have, experience with:
· Any product that deals with incident, problem and change management.
· Automation.
· Cybersecurity.
· DLP products.
· Working in an Agile environment.
· Working in an operational environment.