DL job scheduling using Deep Reinforcement Learning

Navya Alapati

Deep learning is a branch of machine learning that uses neural networks with many layers to extract progressively higher-level features from input data. Deep learning models, also referred to as deep neural networks, can learn from unlabeled data. Whereas classical machine learning algorithms typically fit a parameterized model to hand-engineered features, deep learning learns the feature representations directly from the data. Training deep learning models requires substantial computing power, so jobs are often run on clusters that are shared by multiple users and execute many jobs in parallel. In such a cluster it is important to schedule jobs efficiently and according to priority. Large clusters pose their own challenges: delays in job scheduling lead to long queues and low performance. Improving performance therefore requires efficient allocation and scheduling of jobs, so that job completion times (JCT) are reduced and resource utilization is increased. There are many studies and techniques on resource allocation using deep learning, but the existing literature does not offer a job scheduling approach that allocates resources efficiently while keeping job completion times low. To address this gap, this project proposes using a deep reinforcement learning algorithm, DQN (Deep Q-Network), to allocate resources so that cluster utilization is maximized and job completion times (JCT) are reduced. The proposed model schedules Deep Learning (DL) jobs using a combination of an artificial neural network and reinforcement learning, and it requires no additional information from users. DQN also helps improve the performance of the model.
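The two objectives named above, job completion time and resource utilization, can be made concrete with a small sketch. The `Job` record, its field names, and the GPU-based utilization formula are illustrative assumptions, not definitions taken from the project.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job record; all field names are assumptions.
    submit: float  # time the job entered the queue
    start: float   # time the job began running
    finish: float  # time the job completed
    gpus: int      # number of GPUs held while running


def average_jct(jobs):
    """Mean job completion time, measured from submission to finish."""
    return sum(j.finish - j.submit for j in jobs) / len(jobs)


def utilization(jobs, total_gpus):
    """Fraction of GPU-time in use between first submission and last finish."""
    makespan = max(j.finish for j in jobs) - min(j.submit for j in jobs)
    busy = sum((j.finish - j.start) * j.gpus for j in jobs)
    return busy / (makespan * total_gpus)


# Three jobs run back to back on an assumed 4-GPU cluster.
jobs = [Job(0, 0, 4, 2), Job(1, 4, 6, 4), Job(2, 6, 10, 1)]
print(average_jct(jobs))     # (4 + 5 + 8) / 3
print(utilization(jobs, 4))  # 20 GPU-time units used out of 40 available
```

In this toy schedule, queueing delay inflates the JCT of the second and third jobs while half of the GPU-time sits idle, which is the kind of trade-off a scheduler has to manage.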
The DQN algorithm uses a CNN model that is trained with the rewards received by the reinforcement learning agent and is updated against a target model. It learns by trial and error and stores its experiences. It can assign an individual job to the cluster and can also schedule multiple jobs. Deep reinforcement learning (DRL) enables an agent to learn, from its own experience, the actions that best move it toward the target model. DQN is a reinforcement learning (RL) algorithm that combines a deep neural network with Q-learning to handle complex, high-dimensional environments. The goal of this project is to use deep reinforcement learning to schedule jobs in the cluster efficiently, with maximum utilization of the available resources and minimum job completion times. The algorithm used is DQN, which outperforms traditional resource allocation algorithms.
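The main ingredients of DQN mentioned above (stored experiences, trial-and-error action selection, and a target model) can be sketched in a few lines. This is a minimal illustration in plain Python, not the project's implementation: the names `ReplayBuffer`, `epsilon_greedy`, and `td_target` are hypothetical, and the Q-values are passed in as plain lists where a real agent would obtain them from neural networks.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions (the agent's saved experiences)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest entries are dropped first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch, which breaks up correlated experience.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon):
    """Trial and error: explore with probability epsilon, else act greedily."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_target(reward, next_q_values, done, gamma=0.99):
    """Q-learning target: r + gamma * max_a' Q_target(s', a').

    next_q_values are assumed to come from the target model.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Usage with made-up numbers; in a scheduler, an action might pick which
# queued job to place next and the reward might penalize long completion times.
buffer = ReplayBuffer(capacity=2)
for step in range(3):
    buffer.push(state=step, action=0, reward=-1.0, next_state=step + 1, done=False)
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
target = td_target(1.0, [0.5, 2.0], done=False, gamma=0.9)
```

In a full DQN, the online network would be trained to bring its prediction for the chosen action closer to `td_target`, and the target model's weights would be refreshed from the online network at intervals.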
