Requisition Id 14393
Overview:
The Field Intelligence Operations Division is seeking a Senior High Performance Computing (HPC) Engineer for Classified Computing to lead the design, implementation, and management of HPC systems within a classified environment. We are looking for candidates with extensive experience in HPC architecture, cluster management, and parallel computing, with a proven ability to work within highly secure and regulated environments. This role involves close collaboration with security teams, scientists, and IT leadership to ensure that the HPC infrastructure meets the stringent performance, security, and compliance requirements necessary for classified work.
As part of our team, you will join a dynamic and elite group of professionals specializing in the design, implementation, and management of HPC systems to support cutting-edge computational needs. Our team is highly collaborative, striving to ensure that a gold standard in HPC architecture and operations is understood, implemented, and optimized for performance. You will play a critical role in delivering exceptional service to users and stakeholders by supporting system deployment, configuration, training, and education, ensuring that HPC resources are accessible, secure, and operating at peak efficiency.
Major Duties/Responsibilities:
- HPC System Design and Architecture:
- Lead the design and deployment of HPC systems, ensuring they meet the computational needs and security requirements of a classified environment.
- Create and maintain detailed documentation of HPC architectures, configurations, and operational procedures.
- Cluster Management and Optimization:
- Oversee the installation, configuration, and management of HPC clusters, ensuring optimal performance, scalability, and reliability.
- Implement and manage job scheduling, resource allocation, and load balancing to maximize the efficiency of HPC resources.
- Security and Compliance:
- Ensure all HPC systems comply with security policies and regulatory requirements, implementing necessary controls and conducting regular audits.
- Collaborate with the security team to address vulnerabilities and ensure the protection of sensitive data within the HPC environment.
- Performance Tuning and Troubleshooting:
- Monitor and optimize the performance of HPC systems, identifying and resolving bottlenecks and inefficiencies.
- Identify and resolve complex issues, ensuring minimal downtime and disruption to critical operations.
- Collaboration and Leadership:
- Lead HPC-related projects, from initial planning and design through to implementation and operational support.
- Collaborate with scientists, researchers, and others to ensure that the HPC environment meets their computational needs.
- Mentor and support junior HPC engineers, sharing expertise and best practices.
- Continuous Improvement and Innovation:
- Research and remain informed of the latest advancements in HPC technologies, identifying opportunities for innovation and enhancement of the HPC infrastructure.
- Propose and implement improvements to existing systems and processes to support the evolving needs of the organization.
Basic Qualifications:
- BS in computer science, engineering, or a related field and eight (8) years of relevant experience. An equivalent combination of education and experience may be considered.
- Seven (7) years of experience in HPC engineering, with a focus on cluster management, parallel computing, and performance optimization.
- Demonstrated experience working in classified environments, including a thorough understanding of security policies, compliance frameworks, and associated standard processes (e.g., NIST, DISA STIGs).
- HPC systems architecture experience, including cluster management tools (e.g., SLURM, PBS, Moab).
- Linux system administration skills, with experience in scripting and automation using tools such as Bash, Python, or Ansible.
- Experience with performance monitoring, tuning, and benchmarking tools for HPC environments (e.g., Ganglia, Grafana, or similar).
- Experience with parallel programming frameworks (e.g., MPI, OpenMP, CUDA) and high-performance interconnects (e.g., InfiniBand).
Preferred Qualifications:
- Familiarity with advanced storage solutions and parallel file systems (e.g., Lustre, GPFS, or BeeGFS).
- Professional certifications (e.g., Certified HPC Professional, Linux+, or Security+).
- Excellent leadership and project management abilities.
- Strong problem-solving skills with a proactive approach to identifying and resolving issues.
- Effective communication and collaboration skills, with the ability to work closely with cross-functional teams.
- Ability to manage multiple priorities and work effectively in a fast-paced, high-security environment.
- Proactive mentality with a commitment to continuous learning and improvement in the rapidly evolving HPC field.
Special Requirements:
- Visa sponsorship is not available for this position.
- Physical Requirements: Work may involve various physical requirements and work environments typical of a high performance computing facility / data center.
- Q clearance with SCI: This position requires the ability to obtain and maintain a Sensitive Compartmented Information (SCI) clearance from the Department of Energy. As such, this position is a Workplace Substance Abuse Program (WSAP) testing designated position. WSAP positions require passing a pre-placement drug test and participation in an ongoing random drug testing program. In addition, due to the SCI requirement, you may also be subject to random polygraph testing.
Benefits at ORNL:
ORNL offers competitive pay and benefits programs to attract and retain dedicated people! The laboratory offers many employee benefits, including medical and retirement plans and flexible work hours, to help you and your family live happy and healthy. Employee amenities such as on-site fitness, banking, and cafeteria facilities are also provided for convenience.
Other benefits include the following: Prescription Drug Plan, Dental Plan, Vision Plan, 401(k) Retirement Plan, Contributory Pension Plan, Life Insurance, Disability Benefits, Generous Vacation and Holidays, Parental Leave, Legal Insurance with Identity Theft Protection, Employee Assistance Plan, Flexible Spending Accounts, Health Savings Accounts, Wellness Programs, Educational Assistance, Relocation Assistance, and Employee Discounts.
If you have difficulty using the online application system or need an accommodation to apply due to a disability, please email: ORNLRecruiting@ornl.gov.
This position will remain open for a minimum of 5 days, after which it will close when a qualified candidate is identified and/or hired.
We accept Word (.doc, .docx), Adobe (unsecured .pdf), Rich Text Format (.rtf), and HTML (.htm, .html) up to 5MB in size. Resumes from third party vendors will not be accepted; these resumes will be deleted and the candidates submitted will not be considered for employment.
ORNL is an equal opportunity employer. All qualified applicants, including individuals with disabilities and protected veterans, are encouraged to apply. UT-Battelle is an E-Verify employer.