Basic Information
Ref Number: Req_00147847
Primary Location: CA - Home Office
Country: Canada
Job Type: Digital Solutions
Work Style: Remote
Description and Requirements
Here’s the impact you’ll make and what we’ll accomplish together
We are looking for a Site Reliability Engineer to join our team and drive the reliability, performance, and scalability of our critical Google Cloud-based products. This role requires a deep understanding of SRE principles and a passion for automation.
Responsibilities:
- Collaborate with development teams to design, deploy, and operate robust Site Reliability Engineering capabilities for Google Cloud products.
- Proactively identify and address performance bottlenecks based on key performance indicators (KPIs).
- Develop comprehensive monitoring solutions that alert on potential issues before they escalate into outages.
- Create and maintain clear, concise, and up-to-date documentation.
- Continuously improve our systems and processes to enhance security, scalability, performance, and resilience.
- Work closely with development teams to build high-quality, efficient, and secure services.
- Interact with vendors and service providers to resolve system-related issues.
- Troubleshoot production issues to pinpoint root causes and implement effective solutions.
- Mentor and guide team members on SRE best practices.
- Participate in on-call rotations to ensure 24/7 system availability.
Qualifications and Skills:
- Canadian residency and the ability to obtain a Government Security Clearance.
- Bachelor's degree in Computer Science or a related field.
- Proven experience as a Senior SRE or in a similar role.
- Proficiency in version control systems, such as Git.
- Expertise in monitoring infrastructure, application uptime, latency, and performance in large-scale distributed systems.
- Strong verbal and written communication skills with the ability to effectively communicate at all levels of the organization.
- Hands-on experience with Terraform, Dynatrace, and PagerDuty.
- A passion for automation and a commitment to eliminating toil.
- Proficiency in Linux command-line and scripting.
- Experience with metrics collection and visualization tools such as Prometheus and Grafana.
- Excellent problem-solving skills and attention to detail.
- Strong teamwork abilities.
- Ability to thrive in a fast-paced, dynamic environment.
Additional Requirements:
- Knowledge of DevOps practices and tools.
- Familiarity with Ping Identity and ForgeRock identity products.
- Understanding of infrastructure components, including Unix servers and F5 load balancers.
Additional Job Description
A highly skilled Site Reliability Engineer is sought to join our team and drive the reliability, performance, and scalability of our critical Google Cloud-based products.
EEO Statement
At TELUS Digital, we enable customer experience innovation through spirited teamwork, agile thinking, and a caring culture that puts customers first. TELUS Digital is the global arm of TELUS Corporation, one of the largest telecommunications service providers in Canada. We deliver contact center and business process outsourcing (BPO) solutions to some of the world's largest corporations in the consumer electronics, finance, telecommunications and utilities sectors. With global call center delivery capabilities, our multi-shore, multi-language programs offer safe, secure infrastructure, value-based pricing, skills-based resources and exceptional customer service - all backed by TELUS, our multi-billion dollar telecommunications parent.
Equal Opportunity Employer
At TELUS Digital, we are proud to be an equal opportunity employer and are committed to creating a diverse and inclusive workplace. All aspects of employment, including the decision to hire and promote, are based on applicants’ qualifications, merits, competence and performance without regard to any characteristic related to diversity.