Description
We are seeking an experienced AWS Cloud Reliability Engineer to support the design and delivery of a secure, monitored, cloud-hosted production laboratory and cloud-hosted applications. The laboratory will be used by data science teams to create analytics for insights and AI/ML models for deployment in a high-stakes production environment.
The successful candidate will be responsible for designing, architecting, and integrating Generative Artificial Intelligence (GenAI) solutions with various software platforms to ensure high reliability, scalability, and optimal performance. This individual will play a critical role in bridging the gap between complex cloud environments and cutting-edge AI technologies.
Key Responsibilities:
Collaborate with Solution Architects and Data Scientists to design and architect robust cloud solutions that support GenAI applications, ensuring high availability, fault tolerance, and elasticity.
Develop and implement best practices for cloud reliability, including disaster recovery plans, backup solutions, and failover strategies.
Support the integration of GenAI solutions with existing software platforms, ensuring seamless operation and data consistency.
Conduct root cause analysis of complex issues affecting cloud environments and GenAI applications, and collaborate across teams to implement permanent fixes.
Collaborate with cross-functional teams, including development, operations, and product teams, to enhance system reliability and performance.
Provide expert guidance on cloud-native technologies and mentor junior engineers in cloud reliability best practices.
Work closely with vendors and partners to stay up-to-date with the latest cloud technologies and GenAI advancements.
Participate in the full software development lifecycle, from requirements analysis to deployment and post-deployment support.
Create and maintain documentation related to cloud architecture, design decisions, and operational procedures.
Guide data science teams in adhering to Responsible AI Principles via the implementation of the Responsible AI Toolbox.
Qualifications
Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related field, plus 7 years of experience.
Proven experience with cloud services, particularly AWS (S3, Lambda, RDS, SNS, etc.).
Familiarity with machine learning frameworks and libraries.
Deep understanding of data security and privacy, especially in compliance with IRS standards.
Solid understanding of IaC (Infrastructure as Code) using tools like Terraform and CloudFormation.
Experience with CI/CD pipelines and tools for version control, automation, and deployment.
Target salary range: $160,001 - $200,000. The estimate displayed represents the typical salary range for this position based on experience and other factors.
SAIC accepts applications on an ongoing basis and there is no deadline.
Covid Policy: SAIC does not require COVID-19 vaccinations or boosters. Customer site vaccination requirements must be followed when work is performed at a customer site.
REQNUMBER: 2415542
SAIC is a premier technology integrator, solving our nation's most complex modernization and systems engineering challenges across the defense, space, federal civilian, and intelligence markets. Our robust portfolio of offerings includes high-end solutions in systems engineering and integration; enterprise IT, including cloud services; cyber; software; advanced analytics and simulation; and training. We are a team of 23,000 strong, driven by mission, united by purpose, and inspired by opportunity. Headquartered in Reston, Virginia, SAIC has annual revenues of approximately $6.5 billion. For more information, visit saic.com. For information on the benefits SAIC offers, see Working at SAIC. EOE AA M/F/Vet/Disability