Job Description
At Oracle Cloud Infrastructure (OCI), we build the future of the cloud for enterprises as a diverse team of fellow creators and inventors. We act with the speed and attitude of a start-up, with the scale and customer focus of the leading enterprise software company in the world.
Oracle Generative AI Service is an exciting team in Oracle Cloud Infrastructure. We are delivering innovative services at the intersection of artificial intelligence and cloud infrastructure. On the Generative AI Service team, you will build and operate massive-scale cloud services leveraging state-of-the-art machine learning technologies. We are committed to providing the best in cloud products to meet the needs of our customers, who are tackling some of the world's most challenging problems.
You will be part of a team of smart, hands-on engineers with the expertise and passion to solve difficult problems in distributed, highly available services and virtualized infrastructure. At every level, our engineers have a significant technical and business impact by designing and building innovative new systems to power our customers' business-critical applications.
As a senior software engineer on the Generative AI Service team, you will lead the effort to build distributed, scalable, high-performance AI model training and serving systems in partnership with our applied scientists and software engineers. You will dive deep into model structure to optimize model performance and scalability. You will build state-of-the-art systems with cutting-edge technologies in this fast-evolving area.
What we offer:
Being part of one of the most visionary and mission-driven organizations in Oracle, cooperating with talented peers with diverse backgrounds worldwide.
High visibility to senior leadership and the opportunity to make a significant impact across organizations.
Opportunity to build state-of-the-art technologies in large language models (LLMs) and generative AI at scale to solve real business problems.
Close partnership with applied scientists and software engineers to deploy solutions into production in various business-critical scenarios.
About You:
You are an experienced machine learning engineer with a proven track record of delivering large-scale, high-performance model serving/training systems in production.
You are obsessed with customers and exceeding their expectations.
You have excellent communication skills and you can clearly explain complex technical concepts.
You are a disciplined engineer who understands the importance of high standards, never satisfied with mediocrity and constantly striving for excellence.
You are passionate about technology and self-motivated to stay up to date with the latest developments in machine learning technologies.
Minimum Qualifications
BS in Computer Science, or equivalent experience.
5+ years of experience shipping scalable, cloud-native distributed systems.
Ability to work in a collaborative, cross-functional team environment.
Proficient in Python and shell scripting tools.
Experience with container orchestration technologies like Kubernetes.
Experience with production operations, including best practices for putting quality code in production and troubleshooting issues when they arise.
Able to effectively communicate technical ideas verbally and in writing (technical proposals, design specs, architecture diagrams and presentations).
Preferred Qualifications
MS in Computer Science.
Production experience with cloud computing.
Experience with large language model (LLM) serving technologies such as DeepSpeed and FasterTransformer.
Experience with popular model training and serving frameworks such as KServe, Kubeflow, and Triton.
Experience with LLM fine-tuning, especially the latest parameter-efficient fine-tuning technologies and multi-task serving technologies.
Experience with deep learning frameworks (such as PyTorch, JAX, or TensorFlow) and deep learning architectures (especially Transformers).
Experience in diagnosing, troubleshooting, and resolving issues in AI model training and serving.
Responsibilities
As a senior software engineer on the Generative AI Service team, you will lead the effort to build distributed, scalable, high-performance AI model training and serving systems in partnership with our applied scientists and software engineers. You will dive deep into model structure to optimize model performance and scalability. You will build state-of-the-art systems with cutting-edge technologies in this fast-evolving area. You will diagnose, troubleshoot, and resolve issues in AI model training and serving. You may also perform other duties as assigned.
Disclaimer:
Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.
Range and benefit information provided in this posting is specific to the United States only.
Hiring Range: from $94,200 to $223,500 per annum. May be eligible for bonus and equity.
Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business.
Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.
Oracle offers a comprehensive benefits package which includes the following:
Medical, dental, and vision insurance, including expert medical opinion
Short-term disability and long-term disability
Life insurance and AD&D
Supplemental life insurance (Employee/Spouse/Child)
Health care and dependent care Flexible Spending Accounts
Pre-tax commuter and parking benefits
401(k) Savings and Investment Plan with company match
Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
11 paid holidays
Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
Paid parental leave
Adoption assistance
Employee Stock Purchase Plan
Financial planning and group legal
Voluntary benefits including auto, homeowner and pet insurance
About Us
An Oracle career can span industries, roles, countries, and cultures, giving you the opportunity to flourish in new roles and innovate while balancing work and life. Oracle has thrived through 40+ years of change by innovating and operating with integrity while delivering for the top companies in almost every industry.
In order to nurture the talent that makes this happen, we are committed to an inclusive culture that celebrates and values diverse insights and perspectives, and to a workforce that inspires thought leadership and innovation.
Oracle offers a highly competitive suite of Employee Benefits designed on the principles of parity, consistency, and affordability. The overall package includes certain core elements such as Medical, Life Insurance, access to Retirement Planning, and much more. We also encourage our employees to engage in the culture of giving back to the communities where we live and do business.
At Oracle, we believe that innovation starts with diversity and inclusion, and that to create the future we need talent from various backgrounds, perspectives, and abilities. We ensure that