Are you ready to make your mark at the forefront of technological innovation? As an HPC Cluster Engineer, you’ll play a pivotal role in shaping the future of AI, deep learning, and machine learning initiatives. Join us and leverage Nvidia’s cutting-edge GPU technology to drive groundbreaking discoveries and revolutionize industries.
Sustainable Talent is thrilled to partner with Nvidia, a global powerhouse with over 25 years of trailblazing advancements in computer graphics, gaming, and accelerated computing.
This is a full-time W-2 contract position based in Santa Clara, CA, with a hybrid work option. We offer competitive pay based on factors such as experience, education, and location, and provide full benefits, PTO, and an amazing company culture!
Additional locations: Westford, MA; Durham, NC; Austin, TX.
What You’ll Be Doing:
- You’ll lead the charge in optimizing our InfiniBand network and managing Lustre and GPFS storage solutions, ensuring seamless performance for our cutting-edge initiatives.
- Your expertise in the SLURM job scheduler will be instrumental in orchestrating the smooth operation of our clusters, from scheduling tasks to managing resources efficiently.
- As a Linux sysadmin guru, you’ll be responsible for maintaining the stability and security of our systems, leveraging your deep understanding of Linux environments.
- Harnessing the power of Ansible, you’ll automate routine tasks and streamline operations, freeing up time for innovation and optimization.
- Your advanced Python and Bash scripting will drive automation efforts and enable dynamic solutions to complex challenges.
What We Need to See:
- Demonstrated experience with SLURM, coupled with a solid understanding of InfiniBand networks and Lustre/GPFS storage systems, is essential.
- A proven track record in Linux system administration, with a focus on keeping our computing environment robust and secure.
- Proficiency in Ansible is a must-have, enabling you to automate tasks and workflows efficiently.
- Strong scripting abilities in Python and Bash are critical for developing custom solutions and optimizing cluster performance.
Ways to Stand Out From the Crowd:
- Showcase your knowledge of best practices in HPC cluster operations, automation, and upgrades, setting you apart as a seasoned professional in the field.
Sustainable Talent is an M/F+, disabled, and veteran equal employment opportunity and affirmative action employer.