
DevOps, Distributed GPU Cloud Engineer

Job Description

What You’ll Do

  • Design, implement, and manage deployment of highly available services
  • Lead the implementation of a CI/CD pipeline for existing repositories
  • Contribute to design, implementation, testing, and documentation of orchestration and other services supporting GPU Cloud
  • Contribute to major cloud features that enable our customers to easily run large-scale AI research projects
  • Develop and maintain tools for automation, deployment, monitoring, and operations of our cloud infrastructure
  • Develop and maintain the infrastructure as code, including the design and implementation of automated infrastructure deployment and configuration management
  • Provide guidance and support to development teams in the implementation of DevOps best practices
  • Participate in building orchestration software within our data centers for launching/terminating VMs, networking, and attaching storage
  • Ensure correctness of software through high quality automated testing and setting standards for other developers
  • Stay current with emerging trends and technologies related to DevOps, cloud computing, and infrastructure automation
  • Linux distributions and package managers
  • Linux user management
  • VM management
  • GitLab administration
  • Network building, configuration and maintenance
  • Monitoring tool setup and maintenance (e.g., Grafana)
  • Automatic backup setup and maintenance

Skills

  • Strong engineering background (EECS preferred; Mathematics, Software Engineering, or Physics)
  • 5+ years of experience in DevOps or related field
  • Experience implementing CI/CD pipelines for existing repositories using tools like GitHub Actions or Buildkite
  • Strong experience with infrastructure as code (IaC) and related tools such as Ansible, CloudFormation or Terraform
  • Proficient in one or more programming languages such as Python, Node.js, or Go
  • Familiarity with cloud computing platforms such as AWS or GCP, or experience deploying a web application on a private cloud platform
  • Experience with containerization technologies such as Docker or Kubernetes

Nice to Have

  • Experience in a leadership role, with the ability to lead and mentor junior team members
  • Experience developing major features from conception to production in a Python web application
  • Experience integrating with storage systems, data center networking, virtualization
  • Experience in the machine learning or computer hardware industry
  • Cloud platform management
  • Shell scripting
  • Container orchestration tool familiarity
  • Knowledge of cluster schedulers
  • Designing cloud deployment strategies for Riva, NeMo LLM, and other LLM services: Helm charts, Kubernetes operators, etc.
  • Slurm, Zabbix, Rocks Cluster, and Proxmox for GPU orchestration and distributed computing solutions
  • Experience with GPU computing systems
  • Experience with NVIDIA (CUDA) drivers
  • TensorFlow or PyTorch experience is advantageous

Job Details

Job Location
Dubai, United Arab Emirates
Company Industry
Telecommunications and Networking; Data Storage; Video Games
Company Type
Employer (Private Sector)
Job Role
Information Technology
Employment Type
Full Time
Monthly Salary
Unspecified
Number of Vacancies
1
