Abstract
Introduction
The fast growth of parallel and distributed computing environments, driven by increasing demand for computing power, encourage a variety of distributed computing platforms emerging in academic and industrial communities, such as grid and cloud computing. 1 A challenging issue in distributed systems is task scheduling (TS). TS is a complex combinatorial optimization problem because the search space increases exponentially with the problem size. The traditional or conventional methods for TS need more time to locate the optimum solution. To get a near optimal solution within a finite duration, heuristics are used instead of exact optimization methods. Braun et al. 2 presented a comparative study on 11 heuristics for the TS problem and evaluated them on different types of heterogeneous environments. These heuristics are intended to minimize only a single objective, the make span of the schedule. Izakian et al. 3 compared six efficient heuristics such as min-min, max-min, min-max, LJFR-SJFR, suffrage, and work queue for scheduling meta-tasks on heterogeneous distributed environment. The effectiveness of these heuristics are estimated by make span and flow time.
To improve the quality of solutions, meta-heuristics have been presented for TS problem in distributed systems. The most popular meta-heuristic algorithms in the literature are Genetic Algorithm (GA), 4 Differential Evolution (DE), 5 and Particle Swarm Optimization (PSO). 6
Discrete particle swarm optimization (DPSO) is a recently developed population-based meta-heuristic algorithm proposed by Kang and He. 7 The performance of the DPSO was tested using a single criterion such as makespan. In this algorithm, the particles are encoded in position vector representation. This encoded scheme doesn't convey the flow of execution of tasks in the processors. The flow is not required, if the algorithm optimizes only the makespan. Because, the make span value cannot be changed, if the order of the tasks assigned within a processor varies. The single criterion is not enough to test the performance of the DPSO algorithm. This problem is addressed by Sarathambekai and Umamaheswari. 8 In Sarathambekai and Umamaheswari, 8 the multiple objectives such as makespan, flow time, and reliability cost are used to evaluate the efficiency of the multi-objective discrete particle swarm optimization (MDPSO) algorithm. Here, the particles are represented in permutation-based method because the metric flow time depends on the order of execution of the tasks in the processors. This article presents the MDPSO algorithm for scheduling of independent tasks in distributed systems.
Most of the existing works in the literature are focused on introducing or improving the control parameters and adaptive controlling of parameter settings in PSO. Population initialization can also affect the convergence speed and quality of the final solution. There is only limited research in this field. Kazimipour et al. 9 investigated the effect of population initialization methods for Evolutionary Algorithms (EA) on Large Scale Global Optimization (LSGO) problems. The well-known existing initialization methods are categorized into five major categories such as Stochastic, Deterministic, Two-step, Hybrid, and Application specific methods. Kazimipour et al. 9 suggested that the random initialization is not a proper choice for population initialization even with very large population size and also recommended to use two step methods to improve the performance of the EA. The two-step methods generate an initial population in the first step and then try to improve them using fitness function in the second step.
The Opposition-Based Learning (OBL) initialization is a two-step method proposed by Tizhoosh. 10 This has been included in some machine learning algorithms like opposition-based reinforcement learning, 11 opposition-based DE, 12 and opposition-based GA. 13 Wang and Liu 14 proposed opposition-based PSO. This method employs OBL and dynamic Cauchy-based mutation to avoid premature convergence in classical PSO. Omran 15 proposed two variants such as improved PSO (iPSO) and improved DE (iDE) based on OBL. The iPSO and iDE algorithms replace the least-fit solution with its opposite solution in each iteration. Wang and Liu 14 stated that OBL enhances the performance of PSO and DE without adding any additional parameters. All these OBL-based initialization are for solving continuous domain optimization problems. Ergezer 16 proposed OBL for discrete problems and presented two different methods of OBL such as Open Path Opposition (OPO) and Circular Opposition (CO) in solving discrete and combinatorial optimization problems. The effectiveness of the opposition techniques 16 are simulated on traveling salesman problem using biogeography-based optimization algorithm. The OPO is suitable for combinatorial problems where the final node is not connected to the first node. In CO technique, the final node of a graph is linked to the first node. The proposed research work has taken OPO 16 in swarm (population) initialization of DPSO, because here the tasks are independent; there are no links between the tasks.
The load balancing is the central component of distributed systems. The purpose of load balancing is to improve the performance of a distributed system through an appropriate distribution of the application load. This is one of the important issues when the demand for computing power increases in the distributed computing environments. Load balancing algorithms can be broadly classified into static and dynamic. The main benefit of static load balancing is that all the overhead of the scheduling process is incurred at compile time, resulting in a more efficient execution time environment compared with dynamic scheduling methods. The proposed work uses a static load balancing algorithm to distribute the task among processors prior to the execution of the algorithm in order to avoid the overhead of the scheduling process during the executing time.
The remainder of the article is organized as follows: The next section describes the problem formulation and the following section presents the proposed DPSO. Experimental results are reported in “Simulation results and analysis” section and the final section concludes the article.
Problem formulation
Task model
A Heterogeneous Computing (HC) system consists of a number of heterogeneous Processor Elements (PEs) connected with a mesh topology. Let T = {T1, T2…, Tn} denote the n number of tasks that are independent of each other to be scheduled on m processors P = {P1, P2…, Pm}. Because of the heterogeneous nature of the processors and disparate nature of the tasks, the expected execution times of a task executing on different processors are different. Every task has an Expected Time to Compute (ETC) on a specific processor. The ETC values are assumed to be known in advance. An ETC matrix is
The TS problem is formulated based on the following assumptions:
All tasks are available at zero time. Processors are always available. The execution time of each task on each processor is known and constant. Preemption is not allowed. Each processor can process only one task at a time. A task cannot be processed on more than one processor at a time. Each processor uses the First-Come, First-Served (FCFS) method for performing the received tasks.
Scheduler model
A static scheduler model in distributed systems is shown in Figure 1. The scheduler manages two queues such as Task Queue (TQ) and Processor Queue (PQ). The scheduling algorithm in the central scheduler is started to work with TQ. TQ contains a set of tasks in a particle. The scheduler is responsible for distributing each task in the TQ to the individual PQ based on the workload of the each processor in the distributed systems. Once the scheduler completes to place all the tasks from the TQ to PQ, the processors will start executing the tasks in their own PQ.
Scheduling model for heterogeneous environment.
Proposed DPSO
The previous research work 8 presented DPSO approach for scheduling problem. This DPSO algorithm performs random initialization. Population initialization plays an important role because it can affect the convergence speed and also the quality of the final solution. To enhance the quality of the solution, this article proposed DPSO algorithm namely Intelligent DPSO (IDPSO), which mainly focuses to fine tune the population initialization with the help of OPO technique and Greedy Load balancing algorithm. Also, the algorithm schedules independent tasks in the distributed systems with subject to the load of each processor.
The following subsections describe in detail the steps of IDPSO algorithm.
Swarm initialization
Swarm initialization consists of two parts: Particle initialization and Processor allocation. Number of tasks (n) and population size (N) are required to generate particles. Here, a particle is encoded in permutation-based method. In the permutation vector, the position of a task represents the sequence the task is scheduled and the corresponding value indicates a task number.
Initially, particle initialization generates random particles called original swarm and then the opponent particles of the original swarm are calculated using OPO technique, 16 which is shown in pseudo code 1. Now, both swarms are merged and the best particles according to fitness function are selected to form the initial swarm for the IDPSO.
Random processor allocation is a simple and straightforward technique. However, the problem in this technique is, some processors are busy with processing, while some processors are idle without any processing. To make better utilization of the processors, the IDPSO performs load balancing using Heuristic-based Greedy static Load balancing algorithm (HGL), which is shown in pseudo code 2.
Place Select the least loaded Place Update load of the
The resource utilization
17
is the performance criterion for the scheduler to perform scheduling with balancing the load. The processor's utilization is defined as the percentage of time that processor
The resource utilization
Figure 2 shows an illustrative example for a swarm initialization, which corresponds to the scheduling of five independent tasks that assigns to the three heterogeneous processors using random allocation and allocation with balancing the load using HGL. The processors utilization is 70% in random allocation and 94% in HGL. Hence, the HGL-based allocation utilizes the resources efficiently.
Processor allocation. (a) Random Processor Allocation (b) Processor Allocation with load balancing.
Particle evaluation
The previous research work 8 presented DPSO approach for scheduling problem to minimize make span, flow time, and reliability cost. The make span and flow time values are in incomparable ranges and the flow time has a higher magnitude order over the make span. To address this problem, the proposed IDPSO algorithm is assessed by using the three evaluation criteria such as make span, 8 mean flow time instead of flow time and reliability cost. 8
The value of mean flow time
18
is used to evaluate flow time. Assume k is the total number of tasks assigned to processor
The three objectives are used to evaluate the performance of the scheduler. The weighted single objective function called Adaptive Weighted Sum
19
is used to calculate the fitness value of each particle in the swarm. This can be estimated using equation (5), where
Update the particle's Personal best (Pbest) and Global best (Gbest) position
The Pbest position of each particle and Gbest position of the swarm can be determined based on the fitness value. The Pbest and the Gbest should be determined before updating the position of the particle.
Updating the particle's velocity and position
The particles in IDPSO update their velocity and position using equations (6) and (10), respectively.
8
The inertia weight (
VND heuristic
Local search technique was not included in MDPSO 8 algorithm. Therefore, the algorithm may get stuck in local optima. The IDPSO algorithm applies VND 7 (Variable NeighborhooD) local search heuristic when there is no appreciable improvement in the Gbest particle for five iterations.
Stopping condition
The above iterative processes on a swarm will continue until a pre-defined maximum number of iterations have been reached or no significant improvement in the fitness value for more than 10 iterations.
The average Relative Percentage Deviation (RPD)
7
is also calculated along with makespan, flow time, and reliability cost for comparing the results of the DPSO and IDPSO algorithms. It is calculated using equation (11), where P is the average result of the proposed algorithm and
Simulation results and analysis
The simulation results are attained using a set of benchmark ETC instances 20 for the distributed heterogeneous systems. The algorithms are coded in Java and executed in Net Beans IDE.
Benchmark instances description
The simulation is performed on the benchmark ETC instances, 20 which are categorized in 12 types of ETC's based on three metrics: task heterogeneity, machine heterogeneity, and consistency. 7
All instances consisting of 512 tasks and 16 processors are classified into 12 different types of ETC matrices according to the above three metrics.
The instances are labeled as g_a_bb_cc as follows:
g means gamma distribution used in generating the matrices. a shows the type of consistency, c means consistent, i means inconsistent, and s means semi-consistent. bb indicates the heterogeneity of the tasks, hi means high, and lo means low. cc represents the heterogeneity of the machines, hi means high, and lo means low.
Parameter setup
The following parameters are initialized for simulating the DPSO and IDPSO algorithms.
Number of iteration is set to 1000. The failure rate for each processor is uniformly distributed
21
in the range from 0.95 × 10−6/h to 1.05 × 10−6/h.
Performance comparisons
The improvement of DPSO 7 over the existing algorithms such as PSO, Re-excited PSO, GA, Simulated annealing, and Tabu search is 7.1%, 7.45%, 4.72%, 8.54%, and 3.35% across all ETC instances, respectively. 7 Therefore, the comparison of the IDPSO has been made only with DPSO. 7
The DPSO algorithm with varying population size.
In Table 1, the first column indicates the ETC instance name, the second, third, and fourth columns indicate the makespan, mean flow time, and reliability cost obtained by DPSO with population size 32 (DPSO-32) and DPSO with population size 50 (DPSO-50), respectively. In Tables 1–3, the values in red color indicate an optimal value obtained by the algorithm.
Table 1 shows that the algorithm DPSO-50 gives optimal solutions in majority of the ETC instances compared with DPSO-32. From Table 1, it is inferred that DPSO-50 is able to give better performance in terms of make span by 8.12%, mean flow time by 5.83%, and reliability cost by 4.06% compared with DPSO-32 across all ETC instances, respectively. Figure 3 shows that the DPSO-50 provides minimum fitness value in most of the ETC instances compared with DPSO-32.
Comparison of fitness value of DPSO-32 and DPSO-50.
Comparison of objective values of IDPSO with DPSO-50.
Comparison of objective values of IDPSO (without VND) with AWS and IDPSO with CWS.
The algorithm IDPSO gives optimal solutions in most of the ETC instances compared with DPSO-50, which are presented in Table 2. The results obtained from Table 2, the IDPSO is able to provide better performance in terms of make span by 31.06%, mean flow time by 15.05%, and reliability cost by 23.98% compared with DPSO-50 across all ETC instances, respectively. The RU of DPSO-50 and IDPSO are calculated using equation (2) and the values are plotted in Figure 4.
Comparison of RU of IDPSO with DPSO.
From Figure 4, the RU of IDPSO is in between 0.9 and 1 in most of the ETC instances. The average percentage of RU of IDPSO is 95.7% and DPSO-50 is 84.2% across all ETC instances. The time complexity of the IDPSO is O(α2). O(α2) is the time complexity of performing VND heuristic in the flow of IDPSO, where α is the number of tasks assigned to the heavily loaded processor. This heuristic is required not to trap the algorithm in local optima only if the allocation of tasks to the processors by random. In IDPSO, the scheduler performs load balancing using HGL algorithm. The improvement of inclusion of VND heuristic in IDPSO is only by 0.63% in makespan, 2.05% in mean flow time, and 3.09% in reliability cost across all ETC instances. This improvement is negligible compared with the complexity of the VND heuristic. Figure 5 shows the RU comparison of IDPSO with VND and without VND. There is not much more difference in IDPSO with VND and without VND. Therefore, the proposed research work removes the VND in the flow of IDPSO algorithm. After removing of VND in IDPSO, the time complexity of the IDPSO is O(Nnm), where N is the swarm size, n is the number of tasks, and m is the number of processors.
Comparison of RU of IDPSO-VND and IDPSO-No VND.
The fitness value of IDPSO is calculated by using AWS method. In this method, the weights of the objective values change generation to generation. The particles in different generation may have the same objective values, but different fitness values. Therefore, the algorithm may take more time to converge and may get stuck in local optima. To address this problem, the proposed work uses constant weights instead of adaptive weights in the fitness function. The Constant Weight Sum (CWS) fitness function is defined in equation (12).
The performance criteria makespan and flow time have equal importance in independent TS problem. Hence, the weights of the makespan and flow time are set to 0.4 and 0.4, respectively. If the scheduler schedules dependent tasks, then the reliability cost is an important criterion because it will consider both the link reliability and processor reliability. In the proposed work, the scheduler schedules the independent tasks only. Therefore, the reliability cost is less important criterion compared with makespan and flow time because it considers only the processor reliability. Hence, the weight of the reliability cost is set to 0.2.
Table 3 shows that the IDPSO-CWS provides optimal solutions in most of the ETC instances for makespan and flow time compared with IDPSO with AWS. The IDPSO-CWS gives less optimal values for reliability cost because of setting low weight age value that is 0.2. The improvement of the IDPSO-CWS compared with IDPSO-AWS of make span increased by 0.65%, mean flow time by 0.63%, and reliability cost by 0.147% across all ETC instances, respectively. Figure 6 shows the comparative performance of RU of IDPSO-AWS and IDPSO-CWS. The average percentage of RU of IDPSO-AWS and IDPSO-CWS is 95.5% across all ETC instances.
Comparison of RU of IDPSO-AWS and IDPSO-CWS.
Average RPD value of IDPSO.
From the results obtained in Table 4, the IDPSO algorithm is significantly suitable for consistent ETC instances.
Conclusion
This article presents the problem of scheduling independent tasks to heterogeneous multiprocessor systems using Intelligent DPSO algorithm. The DPSO is a recently developed population-based meta-heuristic technique for discrete optimization problems. The simulation experimental evaluation confirms the efficiency of the incorporation of OPO and HGL techniques into DPSO for scheduling independent tasks in distributed systems to minimize the tri-objectives such as makespan, flow time, and reliability cost. Also, the obtained results presents the IDPSO algorithm is considerably suitable for consistent ETC instances.
