Responsible for working within the Site Reliability Tools (SRT) team to build next-generation monitoring and site analytics capabilities; build and coordinate large-scale observability pipelines for PlayStation; participate in the full life cycle of system design, from concept, architecture, data governance, and application decomposition through deployment and the creation of KPIs for tracking adoption; participate in the design, logic, flowcharting, data ingestion, data governance, visualization development, testing, debugging, documentation, and support of observability tools infrastructure; analyze problems, recommend solutions, and assist in continuous improvement initiatives; develop, implement, and maintain enterprise observability solutions; configure, upgrade, scale out, patch, and tune Splunk Enterprise and CRIBL Stream; support governance of Splunk and CRIBL usage to maintain an efficient platform; collaborate with architects, senior engineers, stakeholders, and leadership to create an observability architecture; apply knowledge of AWS, Azure, GCP, Datadog, Splunk, Grafana, CloudWatch, Terraform, Kubernetes, Docker, CloudFormation, and enterprise observability platforms to perform assigned tasks; conduct frequent capacity and cost reviews of observability tools; build automation and self-service capabilities to improve day-to-day operations; troubleshoot incidents and provide RCAs; and create technical documentation that enables operations teams to support the observability stack.
Location: Troy, Michigan, and multiple undetermined worksites throughout the US
Salary: $131,164 per year (benefits include medical, dental, vision, 401(k), STD/LTD, life insurance, and EAP)
Education: Bachelor's – Computer Science, Computer Engineering, Information Technology, Information Studies, or a related field of study (will accept equivalent foreign degree).
Training: None
Experience: Two (2) years in the position above, as a DevOps Engineer, as a Platform Engineer, as a Reliability Engineer, as a Software (Site Reliability) Engineer, or in a related occupation.