DevOps, Distributed GPU Cloud Engineer

Job Description

What You’ll Do

  • Design, implement, and manage the deployment of highly available services
  • Lead the implementation of a CI/CD pipeline for existing repositories
  • Contribute to the design, implementation, testing, and documentation of orchestration and other services supporting GPU Cloud
  • Contribute to major cloud features that enable our customers to easily run large-scale AI research projects
  • Develop and maintain tools for automation, deployment, monitoring, and operations of our cloud infrastructure
  • Develop and maintain infrastructure as code (IaC), including the design and implementation of automated infrastructure deployment and configuration management
  • Provide guidance and support to development teams in the implementation of DevOps best practices
  • Participate in building orchestration software within our data centers for launching/terminating VMs, networking, and attaching storage
  • Ensure software correctness through high-quality automated testing, and set testing standards for other developers
  • Stay current with emerging trends and technologies related to DevOps, cloud computing, and infrastructure automation
  • Linux distributions and package managers
  • Linux user management
  • VM management
  • GitLab administration
  • Network building, configuration and maintenance
  • Monitoring tool setup and maintenance (e.g., Grafana)
  • Automatic backup setup and maintenance

Skills

  • Strong engineering background in EECS (preferred), Mathematics, Software Engineering, or Physics
  • 5+ years of experience in DevOps or related field
  • Experience implementing CI/CD pipelines for existing repositories using tools such as GitHub Actions or Buildkite
  • Strong experience with infrastructure as code (IaC) and related tools such as Ansible, CloudFormation or Terraform
  • Proficiency in one or more programming languages such as Python, Node.js, or Go
  • Familiarity with cloud computing platforms such as AWS or GCP, or experience deploying a web application on a private cloud platform
  • Experience with containerization technologies such as Docker or Kubernetes

Nice to Have

  • Experience in a leadership role, with the ability to lead and mentor junior team members
  • Experience taking major features from conception to production in a Python web application
  • Experience integrating with storage systems, data center networking, and virtualization
  • Experience in the machine learning or computer hardware industry
  • Cloud platform management
  • Shell scripting
  • Container orchestration tool familiarity
  • Knowledge of cluster schedulers
  • Designing cloud deployment strategies for Riva, NeMo LLM, and other LLM services (Helm charts, Kubernetes operators, etc.)
  • Slurm, Zabbix, Rocks Clusters, or Proxmox for GPU orchestration and distributed, all-in-one computing solutions
  • Experience with GPU computing systems
  • Experience with NVIDIA (CUDA) drivers
  • TensorFlow or PyTorch experience is a plus

Job Details

Job Location: Dubai, United Arab Emirates
Company Sector: Telecommunications & Networking; Data Storage; Video Games
Company Type: Employer (Private Sector)
Job Role: Information Technology
Employment Type: Full-time
Monthly Salary: Not specified
Number of Vacancies: 1
