Linux Systems Engineer
Knoxville, TN (On-Site/Office)
IT - Hardware
Clearance: DOE Q or L
Founded in 1999 in the beautiful Smoky Mountains of East Tennessee, Cadre5 provides innovative technical solutions to customers locally and nationally. Our Cadre5 Lab Partners division has partnered with the Emerging Technologies, AI & Computing group, within the Research Computing Support division of the Information Technology Services Directorate at Oak Ridge National Laboratory (ORNL), to recruit a qualified Linux Systems Engineer who will focus on supporting the technological needs of ORNL researchers.
ORNL delivers scientific discoveries and technical breakthroughs needed to realize solutions in energy and national security, and it provides economic benefit to the nation. This premier research institution, located in Oak Ridge, TN, near Knoxville, addresses national needs through impactful research and world-leading research centers.
This is a full-time, permanent position that requires on-site work.
Why Cadre5?
- Working with highly talented team members
- 3 weeks' vacation
- Excellent medical insurance, up to 100% paid by employer
Overview:
This role advocates for and promotes Linux systems among researchers who process large data sets and/or develop code as part of their projects. This includes ensuring the availability, performance, scalability, and security of production systems. The Emerging Technologies, AI & Computing (ETAC) group frequently uses automation and monitoring solutions to minimize day-to-day maintenance and is always looking for opportunities to optimize system management practices and system performance. As the primary domain expert for these systems, you will work with technical staff to install various scientific toolsets and help tune their performance.
Job Responsibilities:
• Develop and document system and service diagrams, procedures, and software build/install notes.
• Create observability metrics and dashboards and assist in ongoing documentation efforts.
• Install, configure, customize, and maintain Linux software, including building software from source.
• Collaborate with researchers, developers, and other engineers to develop creative solutions and solve complex challenges.
• Create and foster partnerships with research groups at ORNL to encourage outstanding delivery of services.
• Translate project deliverables, milestones, and timelines into predictable team output and tasks.
• Provide consultation on the selection and purchase of hardware and software systems.
• Maintain configuration management using tools such as Git, Jenkins, Ansible, and Puppet.
• Ensure the secure and effective operation of systems through compliance with ORNL procedures and IT Internal Operating Procedures.
• Develop, architect, and engineer systems and related application software solutions, including research projects. This includes:
o Monitoring for system issues
o Guiding project tasks through the engineering processes and ensuring that standard methodologies are implemented continuously and consistently
o Managing backup services
o Troubleshooting and resolving system problems quickly and effectively
o Working with other systems engineers and vendors to resolve hardware and software issues
Basic Qualifications:
- A BS degree in computer science, computer engineering, information technology, information systems, science, engineering, business, or a related discipline and a minimum of five (5) to seven (7) years of aligned professional experience are required for consideration. An equivalent combination of education and experience may also be considered.
o Master's and PhD degree holders in the same fields of study are also encouraged to apply:
- Master's degree holders should have a minimum of four (4) to six (6) years of relevant and aligned experience.
- PhD holders should have up to three (3) years of relevant and aligned experience.
- Strong knowledge of Enterprise Linux distributions and enterprise-class server/storage hardware.
- Experience monitoring and maintaining hardware and software including, but not limited to, InfiniBand, Slurm, Lustre, RDMA, Weka, and related technologies central to this team's work.
- Experience with configuration management and automation tools such as Git, Jenkins, Ansible, and Puppet.
- Experience managing a virtualized environment including tuning and maintenance.
- Strong working knowledge of system design.
- Ability to obtain and maintain a security clearance is required.
- Experience writing scripts in Bash, Python, or similar languages.
- Experience with on-premises cloud-native platforms (OpenStack, VMware, or others).
- Experience with work planning and documentation tools (such as Jira and Confluence).
- The ability to obtain and maintain a Department of Energy "Q" clearance is required. This requires U.S. citizenship.
Preferred Qualifications:
• Experience supporting AI-enabled systems and software for model training.
• Understanding of platforms sufficient to support users with job submission and troubleshooting.
• Excellent interpersonal skills suitable for communicating with customers and management.
• Effective written, presentation, and verbal communication skills.
• Experience with CentOS/RHEL, Ubuntu, and VMware.
• Experience building and running containerized applications, and knowledge of Apptainer, Warewulf, and Fuzzball.
• Experience managing GPU/CUDA clusters for AI/ML and/or image processing.
• Proven ability to work in a dynamic environment and support large data systems.
• Effective documentation skills, including ability to prepare simple documentation web pages.
Benefits
Cadre5 offers excellent pay and benefits, including full medical, dental, and vision coverage, a 401(k) match, 15 days of PTO, and 10 holidays.
Cadre5 is an equal opportunity employer. All qualified applicants, including individuals with disabilities and protected veterans, are encouraged to apply. Cadre5 is an E-Verify Employer.