Who we are:
Geotab is a global leader in IoT and connected transportation and certified "Great Place to Work." We are a company of diverse and talented individuals who work together to help businesses grow and succeed, and increase the safety and sustainability of our communities.
Geotab is advancing security, connecting commercial vehicles to the internet and providing web-based analytics to help customers better manage their fleets. Geotab's open platform and Geotab Marketplace, offering hundreds of third-party solution options, allow both small and large businesses to automate operations by integrating vehicle data with their other data assets. Processing billions of data points a day, Geotab leverages data analytics and machine learning to improve productivity, optimize fleets through the reduction of fuel consumption, enhance driver safety and achieve strong compliance with regulatory changes.
Our team is growing and we're looking for people who follow their passion, think differently and want to make an impact. Ours is a fast-paced, ever-changing environment. Geotabbers accept that challenge and are willing to take on new tasks and activities - ones that may not always be described in the initial job description. Join us for a fulfilling career with opportunities to innovate, great benefits, and our fun and inclusive work culture. Reach your full potential with Geotab. To see what it's like to be a Geotabber, check out our blog and follow us @InsideGeotab on Instagram. Join our talent network to learn more about job opportunities and company news.
Who you are:
We are always looking for amazing talent who can contribute to our growth and deliver results! Geotab is seeking Site Reliability Engineers who, with training, will be able to quickly contribute to the Site Reliability team. If you love technology, are passionate about engineering support, and are keen to join an industry leader - we would love to hear from you!
What you'll do:
As part of the Site Reliability Engineering team, your key responsibility is to ensure the availability, reliability, and performance of Geotab's core products for our customers. This role acts as a primary escalation point, diagnosing and resolving complex application issues that affect the availability and performance of multiple large-scale applications supporting thousands of customers globally. SRE supports production applications and infrastructure, focusing on restoring normal service operations efficiently and contributing to long-term system stability.
How you'll make an impact:
Act as a primary escalation point for critical production application/product issues.
Rapidly troubleshoot complex problems across the application stack, utilizing observability tools to identify root causes.
Coordinate effectively with development, infrastructure, and other technical teams during incidents to implement fixes and restore service swiftly.
Clearly communicate incident status, impact, and resolution steps to internal stakeholders.
Collaborate with team members to improve monitoring tools, dashboards, and alerting mechanisms for proactive detection of issues impacting Critical User Journeys (CUJs) within the application/product and computing architecture. Our complex environment encompasses monolithic applications, microservices, and a vast ecosystem of millions of hardware units.
Monitor application/product and system health proactively using a combination of tools to ensure high availability and adherence to Service Level Objectives (SLOs) / Service Level Agreements (SLAs).
Identify opportunities and implement automation tools/scripts to streamline routine operational tasks, reduce manual effort (toil), and improve response times.
Conduct system tests to validate performance, reliability, and successful remediation of issues.
Recommend design and process enhancements based on operational experience to improve overall application reliability and maintainability.
Participate in post-major-incident reviews (PMIRs) to analyze disruptions, document findings, track corrective actions to prevent recurrence, and identify areas of improvement for incident response processes.
Contribute to building a culture of learning from incidents.
Participate in a 24x7 on-call rotation to provide timely support for critical issues outside of business hours.
What you'll bring to the role:
3-5 years of experience in SRE/DevOps/Tier 3.
Strong troubleshooting skills with a systematic problem-solving approach.
Extensive experience resolving critical incidents in production environments.
Strong proficiency in Linux and operational scripting (Bash, PowerShell, Python).
Experience with database/dataset querying (GoogleSQL, PostgreSQL, BigData), automated configuration management (via tools like Ansible), and GitOps tools (Argo CD).
Experience with data visualization platforms (e.g., Apache Superset/BigQuery...
For full info follow application link.
Geotab provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability or genetics. In addition to federal law requirements, Geotab complies with applicable state and local laws governing nondiscrimination in employment in every location in which the company has facilities. This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation and training.
Geotab expressly prohibits any form of workplace harassment based on race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status. Improper interference with the ability of Geotab's employees to perform their job duties may result in discipline up to and including discharge.