Site Reliability Engineer/Senior Consultant Specialist

Pune, Maharashtra, India

Cloud Engineer, Operations Engineer, Site Reliability Engineer, Platform Engineer, DevOps Engineer

HSBC

hackajob is partnering with HSBC to fill this position. Create a profile to be automatically considered for this role—and others that match your experience.

Some careers shine brighter than others.

If you’re looking for a career that will help you stand out, join HSBC and fulfil your potential. Whether you want a career that could take you to the top, or simply take you in an exciting new direction, HSBC offers opportunities, support and rewards that will take you further.

HSBC is one of the largest banking and financial services organisations in the world, with operations in 64 countries and territories. We aim to be where the growth is, enabling businesses to thrive and economies to prosper, and, ultimately, helping people to fulfil their hopes and realise their ambitions.

We are currently seeking an experienced professional to join our team in the role of Senior Consultant Specialist.

In this role, you will:

Ensure the availability and maintainability of our large-scale API and Microservices platform located across three points of presence in HK, UK, and the US.
Continuously improve the reliability, capacity, and performance of our platforms by applying SRE principles and practices to drive scale, enhance observability, reduce toil, measure risk more accurately, and enable business-driven change more safely.
Elevate our expertise and maturity in safely managing our core technology stack, underpinned by AWS, Kubernetes, Kong API Gateway, MuleSoft API, Istio Service Mesh, and a host of supporting services in a hybrid hosting environment (i.e., private/public cloud and on-prem).
Develop best-in-class observability tools and techniques that provide monitoring and alerting capabilities supporting not only incident detection and response, but also capacity management, improved release safety, and greater resource efficiency.
Investigate, triage, and resolve production incidents and use data to articulate impact with relentless attention to the technical signals and underlying root causes that enable remediation and future avoidance/mitigation.
Contribute to the design and engineering of auto-healing and self-healing capabilities for known failure modes across our platforms.
Contribute code to our platform repositories enabling not only our reliability agenda (e.g., monitoring-as-code), but also higher release speed and safety, simpler tenant onboarding, and improved controls.
Author, contribute, and maintain our evolving knowledge base including support and operational runbooks, platform tenant guides, and onboarding and release documentation with an underlying goal of driving as much best practice and self-service as possible.
Participate in a regular SRE on-call rota supporting a 24/7/365 support model across our mission-critical platforms, within a large banking ecosystem of front-end, middleware, and back-end fulfilment systems.

To be successful, you will:

Possess strong fundamentals and evidence-based problem-solving skills, and drive decision-making with a functional, first-principles mindset.
Demonstrate a bias for action and avoid analysis paralysis, maintaining a sense of ownership as you drive actions to completion with high quality and on time.
Be egoless when searching for the best ideas and contribute effectively outside your specialty, approaching problems from the standpoint of the best outcome for the team.
Have strong fundamental knowledge of distributed systems and networking.
Possess programming experience in at least one of the following languages: Python, Java, Go, Ruby, or Bash scripting.
Have the ability to debug and optimise code while automating routine tasks (i.e., toil reduction).
Have a strong background in the setup, use, and optimisation of a variety of observability tools, including Splunk, Datadog, AppDynamics, and CloudWatch.
Understand how to quantify failure and availability prescriptively using SLOs, SLIs, and error budgets.
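
For readers less familiar with the SLO, SLI, and error-budget terms above, the arithmetic can be sketched in a few lines of Python. This is a minimal illustration only, assuming a request-based SLI (good requests divided by total requests); the helper names `error_budget` and `allowed_downtime_minutes` are hypothetical and do not represent an HSBC tool or any vendor API.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudgetReport:
    sli: float              # measured fraction of good events (the SLI)
    allowed_bad: float      # bad events the SLO tolerates in the window (the error budget)
    budget_consumed: float  # fraction of the error budget already spent


def error_budget(good: int, total: int, slo_target: float) -> ErrorBudgetReport:
    """Hypothetical helper: compute the SLI and error-budget usage for one window."""
    if total <= 0 or not 0 <= good <= total:
        raise ValueError("need total > 0 and 0 <= good <= total")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    sli = good / total
    allowed_bad = (1.0 - slo_target) * total  # the error budget, counted in events
    bad = total - good
    return ErrorBudgetReport(sli, allowed_bad, bad / allowed_bad)


def allowed_downtime_minutes(slo_target: float, window_days: int) -> float:
    """Time-based view of the same budget: minutes of full outage the SLO permits."""
    return (1.0 - slo_target) * window_days * 24 * 60


if __name__ == "__main__":
    report = error_budget(good=999_500, total=1_000_000, slo_target=0.999)
    print(f"SLI={report.sli:.4%}, budget consumed={report.budget_consumed:.0%}")
    print(f"{allowed_downtime_minutes(0.999, 30):.1f} minutes of downtime per 30 days")
```

With these sample numbers, 500 failed requests against a budget of 1,000 means half the error budget is spent, and a 99.9% SLO over a 30-day window corresponds to roughly 43.2 minutes of downtime. Counting events rather than minutes is one common choice; a time-based SLI follows the same logic.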

You’ll achieve more when you join HSBC.

www.hsbc.com/careers

HSBC is an equal opportunity employer committed to building a culture where all employees are valued, respected and opinions count. We take pride in providing a workplace that fosters continuous professional development, flexible working, and opportunities to grow within an inclusive and diverse environment. We encourage applications from all suitably qualified persons irrespective of, but not limited to, their gender or genetic information, sexual orientation, ethnicity, religion, social status, medical care leave requirements, political affiliation, people with disabilities, color, national origin, veteran status, etc. We consider all applications based on merit and suitability to the role.

Personal data held by the Bank relating to employment applications will be used in accordance with our Privacy Statement, which is available on our website.

Issued by – HSBC Software Development India