Job Description
Oracle's Industry Engineering Services (IES) Platform Development organization is looking for a seasoned Technical Program Manager 4 with a strong background in cloud-native observability, monitoring, and reliability programs to help deliver scalable, secure, and operable observability solutions across both Oracle Health Federal / Veterans Affairs and Commercial cloud environments. Our primary stack includes Prometheus/Grafana, OpenSearch, and OCI Observability & Management solutions.
The Technical Program Manager will help analyze, define, manage, and execute the work required to drive successful observability and monitoring initiatives across multiple teams and services. This may mean defining telemetry and alerting standards, coordinating delivery across engineering and operations, and ensuring solutions meet operational readiness, security, and compliance expectations, while also enabling consistency and reuse across Federal and Commercial deployments where appropriate.
The successful candidate must have strong program leadership, technical depth in distributed systems operations, excellent judgment, and notable experience collaborating across organizations to improve service health, operational readiness, and incident response maturity.
We are looking for a self-starting technical leader with a proven ability to execute both strategically and tactically, someone excited to take on new programs, drive adoption of standards, and deliver measurable improvements. This position requires a solid DevOps/SRE and cloud infrastructure foundation, comfort driving alignment among teams with competing priorities, and the ability to operate effectively in both cleared federal contexts and high-scale commercial cloud environments.
Key Responsibilities
As a Technical Program Manager, you will engage with stakeholders to understand business and operational needs and work with engineering teams to design and execute effective, efficient, secure, and operable observability solutions.
With your technical background in monitoring and reliability practices, you will help shape how services instrument telemetry, define SLOs/SLIs, operationalize alerting, and improve incident response outcomes through standardized patterns, automation, and enablement, leveraging Prometheus/Grafana, OpenSearch, and OCI Observability capabilities where each fits best.
As a member of the IES Platform Development Observability and Monitoring team, you will establish program governance, drive Agile execution, and refine processes that improve delivery velocity and operational excellence across both Federal and Commercial environments.
What You'll Do
Prioritize deliverables according to business, mission, and operational needs; engage developers and stakeholders in planning, sizing, and sequencing work
Engage with partner teams across Oracle Health Federal / VA and Commercial organizations to understand current and future observability capability requirements (telemetry, dashboards, alerting, on-call, incident response)
Lead programs to standardize and scale monitoring and logging patterns across environments, including:
Metrics collection and alerting with Prometheus; operational dashboards and visualization standards in Grafana
Log aggregation, search, and analytics with OpenSearch (index strategy, retention, access patterns)
Cloud-native observability integrations using OCI Observability & Management services
Drive cross-environment alignment where feasible (common patterns, reusable templates, shared KPIs), while accounting for differences in security controls, tenancy boundaries, and compliance requirements
Ensure programs incorporate security and compliance requirements appropriate for U.S. Federal environments (e.g., least privilege, auditability, retention, access controls, change control)
Monitor program workstreams from initiation through delivery, ensuring required leadership approvals, architecture reviews, and operational readiness criteria are met
Regularly interact across functional areas with senior management to ensure objectives are met, dependencies are managed, risks are visible, and deliveries stay on schedule
Drive adoption through change management: documentation, rollout plans, templates, training/office hours, and measured onboarding of services to standards
Security Requirement
U.S. Federal government security clearance is required (active, or the ability to obtain and maintain one, per role requirements).
Must be a U.S. Person and meet all eligibility requirements for access to U.S. Government controlled information and systems, as applicable.
Minimum Job Qualifications (Oracle Standard)
Education and/or Experience:
- 8+ years of experience as a Technical Program Manager or in a related role leading large technical programs or projects, development/implementation projects, or managing complex cross-functional programs from conception to delivery
OR
- Bachelor's Degree in Computer Science, Information Technology, Software Development, or related field AND 7 years of experience as a Technical Program Manager leading large technical programs or projects, development/implementation projects, or managing complex cross-functional programs from conception to delivery
OR
- Master's Degree in Computer Science, Information Technology, Software Development, or related field AND 5 years of experience as a Technical Program Manager leading large technical programs or projects, development/implementation projects, or managing complex cross-functional programs from conception to delivery
OR
- Doctorate in Computer Science, Information Technology, Software Development, or related field AND 3 years of experience as a Technical Program Manager leading large technical programs or projects, development/implementation projects, or managing complex cross-functional programs from conception to delivery
Technical Qualifications
Demonstrated practical experience with the following technologies and concepts (not necessarily all):
Observability fundamentals: metrics, logs, traces; instrumentation and telemetry pipelines
Monitoring and visualization using Prometheus and Grafana (alert strategy, dashboards, ownership models)
Log aggregation and analysis using OpenSearch (query patterns, retention/rollover concepts, RBAC/access considerations)
Familiarity with OCI Observability & Management solutions and patterns for integrating OCI services and workloads
Distributed systems operations: alerting strategy, dashboards, runbooks, operational readiness reviews
SLO/SLI design, error budgets, and reliability-driven prioritization
Cloud-native ecosystems (containers/Kubernetes, service-to-service communication, CI/CD considerations for operability)
Incident management lifecycle: on-call readiness, incident response practices, and postmortems with actionable follow-through
Experience operating in regulated environments (e.g., FedRAMP, FISMA, NIST 800-53-aligned controls) is a plus
Soft Skill Qualifications
Excellent oral and written communication skills
Experience interacting with both business and engineering stakeholders at all levels, including senior leadership
Strong analytical, planning, and organizational skills with the ability to manage competing demands
Demonstrated leadership and ability to influence without authority across multiple teams
Ability to work independently, defining and managing one's own work while providing transparency and accountability
Effective engagement of individuals and teams located across multiple geographic regions and time zones
What the Perfect Candidate Will Have
In addition to the knowledge, skills, and experience listed above, you will score extra points if you also have:
- 2+ years technical background as a software engineer, SRE, or DevOps en