DevOps Lead - Bare-Metal & GPU Infrastructure (Linux)

Runware
Nigeria
5 days ago
Rest of Nigeria (Nationwide)
Confidential

Job Description/Requirements


Company Description
Runware is the fastest AI-as-a-Service platform for media generation
Runware is an AI-as-a-Service platform that delivers real-time inference at 5-10× lower cost than competitors. Our platform is purpose-built for speed & efficiency: custom GPU design, server setup, and datacenter architecture matched with performance-optimized software and a best-in-class API. Engineering teams who work with Runware save up to 80% on inference, improve response times, and scale instantly across 300K+ AI models, all through a single flexible API. Usage-based pricing and on-demand capacity are already battle-tested by Wix, OpenArt, NightCafe, Freepik, and thousands more. Backed by Insight Partners, a16z Speedrun, Begin Capital, and Zero Prime.
Join Runware to power the AI products that are changing the world
At Runware you'll collaborate with the world's leading AI teams, turning cutting-edge research into breakthrough products for thousands of clients. New models hit the market every week, and our job isn't just to keep pace—it's to stay two steps ahead, delivering unbeatable speed and performance every time.
That takes a special kind of teammate: driven, self-directed, lightning-quick to learn, and rock-solid reliable. If you thrive on building ambitious things with people who work hard, care for one another, and refuse to settle for "good enough," you'll feel right at home.
Resumés matter, but passion, grit, and proof of excellence matter more—whether you honed your skills in a research lab, at work, or taught yourself at 2 a.m. If that sounds like you, let's talk.
About The Role
This is a full-time remote role for a DevOps Lead - Bare-Metal & GPU Infrastructure (Linux). The successful candidate will be responsible for ensuring 99.999% service availability and optimum usage/scale infrastructure ratios while shipping code across hundreds of Linux GPU servers in multiple data-center locations.
Responsibilities

  • Fleet reliability - design and automate HA architectures that tolerate node, rack, or site failure without user impact
  • Ultra-fast delivery - build zero-touch CI/CD pipelines (GitOps, progressive rollout, instant rollback) that push config or container changes globally in under 10 min
  • Bare-metal lifecycle - PXE/Redfish/IPMI bootstrapping, firmware & driver orchestration, per-node GPU tuning, automated de-commissioning
  • Kubernetes on metal - multi-cluster control-plane HA, GPU scheduling, CNI overlay (Cilium/Calico), MetalLB/Ingress <50 ms failover
  • Observability at scale - end-to-end metrics, logs, traces, actionable SLO dashboards, and predictive auto-healing
  • Incident command - primary on-call lead; run blameless post-mortems and automate root-cause fixes
  • Capacity bursts - script server bring-up (Ansible/Terraform/Cluster-API) so 100+ new GPUs go live in minutes
  • Security & compliance - kernel-level hardening, secrets management, GPU multi-tenancy isolation, continuous CVE patching
  • Mentorship - guide a small SRE/DevOps pod, set coding standards, and champion best practices

In Your First 12 Months You Will:

  • Cut average deployment latency to 2 min end-to-end, with one-click rollbacks
  • Maintain 5 min total annual user-visible downtime (five nines) across all sites
  • Automate server bring-up to <10 min from rack power-on to production workload
  • Reduce P1 incidents by 60% through predictive alerting and auto-remediation
  • Deliver fully auditable, Git-centric change pipelines adopted by 100% of engineering

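For context on the five-nines target above, the arithmetic behind the roughly 5 min annual budget can be sketched as follows. This is an illustrative calculation only, not part of Runware's tooling. The function name `downtime_budget_minutes` and the use of an average 365.25-day year are assumptions made for this example.

```python
# Illustrative sketch: convert an availability target into a yearly downtime budget.
# Assumption: an average year of 365.25 days (accounts for leap years).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_budget_minutes(availability_pct: float) -> float:
    """Return the allowed downtime per year, in minutes, for a target such as 99.999."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    # The unavailable fraction of the year, multiplied by the minutes in a year.
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        print(f"{target}% -> {downtime_budget_minutes(target):.2f} min/year")
```

Under these assumptions, 99.999% availability works out to roughly 5.26 minutes of downtime per year, which matches the 5 min figure quoted in the goal.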
Requirements

  • 5+ yrs Linux SRE/DevOps with 100+ bare-metal node fleets; 2+ yrs as technical lead
  • Deep knowledge of NVIDIA/AMD GPU servers, high-speed interconnects (40 GbE+/InfiniBand/RoCE), NVMe/RDMA storage
  • Proven record sustaining 99.999% uptime in latency-sensitive, high-variance demand environments
  • Expert in Kubernetes on bare metal (Cluster-API, Kube-Virt, GPU Operator), advanced CNI, custom schedulers, and etcd care-and-feeding
  • Strong skills in Go or Python, plus Bash; you write the tools you can't find
  • Infrastructure-as-Code mastery (Terraform, Ansible, Packer), GitOps workflows, and container build systems
  • Monitoring/alerting stacks (Grafana), chaos/latency testing, synthetic probes
  • Clear architectural thinking, crisp documentation, and calm communication under pressure

Ready to architect zero-downtime, sub-minute rollouts for thousands of GPUs? Apply and let's run the world's AI together.
Benefits
We're a remote-first collective, meeting in person twice a year to plan, brainstorm, celebrate wins, and enjoy some face-to-face time. We have core hours for cooperative working and calls, but outside of that your calendar is yours. Work the hours that let you perform at your peak while also building a healthy life.
Our release cycles are fast and intense, but they're followed by real downtime. After big pushes we expect the team to unplug, recharge, and come back ready & stronger than ever for the next leap.

  • Generous paid time off - vacation, sick days, public holidays
  • Meaningful stock options - share in the upside you create
  • Remote-first setup - work from home anywhere we can employ you
  • Flexible hours - own your schedule outside core collaboration blocks
  • Family leave - paid maternity, paternity, and caregiver time
  • Company retreats - twice-yearly gatherings in inspiring locations

