Have you ever thought of automating the end-to-end workflow for setting up a new data center with a single click? Have you ever thought of automating infrastructure setups that generally take months of effort, so that they finish in hours with a single click? Some examples of such setups are:
1. Setting up the DC network in a reliable and reproducible manner.
2. Automatic OS provisioning on blade servers.
3. Secure login mechanisms for human and service accounts.
4. Configuring the DC components using idempotent automation workflows.
5. Setting up a highly available internal private cloud or container orchestration platform, such as Kubernetes, on automatically provisioned infrastructure.
6. A very complex workflow for managing the inventory state life cycle.
To achieve reliable, reproducible, and idempotent automation for infrastructure setup, the NVIDIA DevOps team has been implementing *DC Automation Manager*, a framework built on the CI/CD tools ecosystem.
In this presentation, we will talk about the design and automation used at NVIDIA GPU Cloud to set up a new DC of thousands of GPU and CPU blade servers from scratch, using Jenkins and GitOps for:
1. Streamlining the inventory life cycle.
2. L2/L3 network setups.
3. Secret management to secure human and automated interactions with all data center services.
4. Node provisioning and OS configuration with dynamic inventory capabilities.
5. Setting up container orchestration platforms on bare metal (BM) and cloud.
6. Bridging the gap between application engineering and operations engineering.
Senior Engineering Manager, NVIDIA GPU Cloud DevOps Team.