Senior Engineer - SRE
Job Description
Bridge the gap between operations and development teams, aiming to speed up development while improving reliability and quality.
Responsibilities
- Build a Site Reliability Engineering culture across the organization by sharing best practices, approaches, documentation, and code with other engineering teams
- Create software that improves the reliability of systems in production, fixing issues and responding to incidents.
- Apply automation and software to any tasks or parts of the system that are performed manually or would otherwise benefit from it
- Troubleshoot complicated cross-platform issues spanning OS, networking, and databases in a cloud-based SaaS environment; handle live production incidents; debug application and infrastructure issues; and follow and implement SRE best practices
- Monitor application performance, take steps to improve overall application performance and stability, and follow through with implementation
- Ensure SLA/SLO error margins are being adhered to by the teams before releasing new features. Take corrective actions if error margins are out of bounds.
- Conduct system analysis and configuration management, and develop improvements to system software performance, availability, and reliability
- Design, write, ship, and motivate the creation of software and systems to increase observability, product reliability and efficiency
- Work closely with software engineers and testers to ensure the system meets non-functional requirements such as performance, security, and availability
- Document your system knowledge as you acquire it over time, create runbooks, and ensure critical system information is readily available to those who need it
- Maintain and monitor the deployment and orchestration of servers, Docker containers, databases, and general backend infrastructure
- Keep up to date on security and proactively identify, diagnose, and resolve complex security issues
Additional Information
- Bachelor's degree in computer science or another highly technical or scientific discipline
- Overall experience of 7+ years, including 4+ years of experience as an SRE/DevOps Engineer
- Experience working closely with engineering teams to understand their product requirements and how they build, test, and deploy their software applications
- Demonstrable experience in containerization (Docker) and orchestration (Kubernetes)
- Demonstrable experience in CI/CD tools such as Bitbucket, Bamboo, Nexus, and Helm
- Experience with Infrastructure as Code (Terraform, CloudFormation, Ansible)
- Knowledge of and proven hands-on experience with large-scale databases and distributed technologies, such as Kafka and Confluent Platform
- Basic programming and scripting skills (preferably Golang, Bash, shell, etc.)
- Ability to provide advice, best practices and recommendations for the operation and deployment of Microsoft Azure
- Experience in monitoring and analyzing infrastructure performance using standard performance monitoring tools – Nagios, New Relic, Perfmon, PerfView, ProcDump, DebugDiag
- Familiarity with Linux and UNIX systems (e.g. CentOS, Red Hat) and command-line system administration tools such as Bash, Vim, and SSH
- Hands-on experience in configuration management of server farms (using tools such as Puppet, Chef, Ansible, etc.)
- Knowledge of network routing, load balancing, and networking protocols, including a basic knowledge of TCP/IP and an understanding of HTTP and DNS
- Knowledge of SRE and Agile methodologies
Preferred Skills (Good to have)
- Demonstrated understanding of ITIL methodologies, ITIL v3 or v4 certification
- Kubernetes CKA or CKAD certification
Job Details
Employment Types:
Full time
Industry:
Banking / Accounting / Financial Services
Function:
IT
Roles:
Software Engineer / Programmer