Today
Top Secret/SCI
Unspecified
Polygraph
IT - Software
Chantilly, VA (On-Site/Office)
Overview
We are seeking an experienced Senior Systems Engineer with an active US Government Top Secret/SCI security clearance with Polygraph to support a small standalone system dedicated to high-performance computing (HPC) and artificial intelligence (AI) workloads. This role demands a blend of operational expertise and strategic technical vision, focusing on the management and optimization of our standalone HPC/AI system. The ideal candidate will manage the technical operation of our infrastructure, develop standardized procedures for hardware, network, and software management across the system, and oversee cluster management with tools such as NVIDIA BCM, including provisioning, optimization, and monitoring of clustered resources for HPC/AI workloads.
What will you do?
This position requires broad expertise in HPC/AI system administration, with a focus on:
- Refining infrastructure management frameworks
- Traditional infrastructure management (hardware, networking, directory services)
- Modern HPC/AI support (Linux/Ubuntu, Proxmox, NVIDIA BCM, WEKA storage)
- Designing scalable, secure, and highly available system architectures
Do you have what it takes?
- Active TS/SCI with Polygraph required.
- Bachelor's degree in Engineering, Computer Science, Software Engineering, or a related field.
- 7+ years' experience in systems engineering or a related field.
Operating Systems & Infrastructure:
- Expert-level Linux systems engineering
- Windows client operating systems deployment/maintenance
- Linux (Ubuntu) server operating systems deployment/maintenance
Hardware & Networking:
- Server hardware
- Network hardware, wiring, and switching configurations
Virtualization & Containerization:
- Virtualization (ideally Proxmox)
- Containerization (ideally Docker/Podman with Ray or Kubernetes)
Management & Orchestration:
- Directory services and PKI infrastructure deployment/maintenance
- Configuration management (ideally Ansible, Puppet, Chef, or DSC)
- Cluster orchestration (ideally NVIDIA Base Command Manager (BCM))
Development Support & Software Management:
- Development support services (GitLab, Jenkins, Nexus)
- Operating system software repository synchronization (Apt, Snap, Yum)
group id: RTL806649