Abstract
In a Chip Multiprocessor (CMP) system, the DRAM memory system is shared by all the cores, and each core runs several threads. Requests from one thread can not only delay requests from other threads but can also destroy their bank-level parallelism. The primary causes of this interference are:
• Bank conflicts
• Bus conflicts
• Row-buffer conflicts
As the gap between memory and processor performance continues to widen, new techniques that improve memory performance are needed. This research describes the design of our high-performance memory scheduler, which implements the parallelism-aware batch-scheduling algorithm (PAR-BS). It takes ideas from existing scheduling heuristics and combines them with new optimizations, yielding a scheduler with a row-based closure policy and smart read-to-write switching heuristics. The performance of the scheduler is measured on three metrics:
• Delay
• Energy-Delay Product (EDP)
• Performance-Fairness Product (PFP)
This research proposes a new approach to scheduler design that ensures fairness and provides good quality of service to the running threads while keeping system throughput in mind. The functioning of our shared DRAM memory scheduler has two parts. First, PAR-BS processes DRAM requests in batches and does not start a new batch until all requests in the current batch have been serviced, so no thread can be starved indefinitely. Second, to optimize system throughput, it applies a parallelism-aware policy within each batch that services a thread's requests in parallel across the DRAM banks, reducing the memory-related stall time the thread experiences. In addition to these existing features, I implemented several enhancements that I believe make the resulting scheduler perform better than existing ones.
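The two-part PAR-BS policy described above can be sketched as a small Python model. It follows the published PAR-BS rules (batch marking with a per-thread, per-bank cap; shortest-job-first thread ranking; and the priority order marked-first, row-hit-first, higher-rank-first, oldest-first), but it is not the scheduler code evaluated in this work and does not model the row-based closure policy or the read-to-write switching heuristics. The `Request` class, the function names, and the marking cap of 5 are hypothetical choices made only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical marking cap: at most this many requests per (thread, bank)
# are marked into one batch. 5 is an illustrative choice, not a tuned value.
MARKING_CAP = 5


@dataclass
class Request:
    thread: int    # issuing hardware thread
    bank: int      # target DRAM bank
    row: int       # target row within that bank
    arrival: int   # arrival time, used for oldest-first tie-breaking
    marked: bool = False  # True while the request belongs to the current batch


def form_batch(queue, cap=MARKING_CAP):
    """Start a new batch: mark the `cap` oldest requests of every (thread, bank) pair."""
    marked_so_far = defaultdict(int)
    for req in sorted(queue, key=lambda r: r.arrival):
        key = (req.thread, req.bank)
        if marked_so_far[key] < cap:
            req.marked = True
            marked_so_far[key] += 1


def batch_done(queue):
    """A new batch may only be formed once no marked request is left."""
    return not any(r.marked for r in queue)


def rank_threads(queue):
    """Shortest-job-first ranking over marked requests.

    The thread whose largest per-bank load is smallest ranks highest; ties are
    broken by the smaller total number of marked requests. Returns
    {thread: rank}, where rank 0 is the highest priority.
    """
    load = defaultdict(lambda: defaultdict(int))
    for req in queue:
        if req.marked:
            load[req.thread][req.bank] += 1
    order = sorted(load, key=lambda t: (max(load[t].values()), sum(load[t].values())))
    return {thread: rank for rank, thread in enumerate(order)}


def pick_next(queue, bank, open_row, ranks):
    """Pick the next request for `bank` using the priority order
    marked-first > row-hit-first > higher-rank-first > oldest-first."""
    candidates = [r for r in queue if r.bank == bank]
    if not candidates:
        return None
    lowest = len(ranks)  # threads with no marked requests sort after ranked ones
    return min(
        candidates,
        key=lambda r: (not r.marked, r.row != open_row, ranks.get(r.thread, lowest), r.arrival),
    )


# Toy queue: thread 0 spreads its requests over two banks,
# while thread 1 piles three requests onto bank 0.
queue = [
    Request(thread=0, bank=0, row=1, arrival=0),
    Request(thread=0, bank=1, row=4, arrival=1),
    Request(thread=1, bank=0, row=2, arrival=2),
    Request(thread=1, bank=0, row=3, arrival=3),
    Request(thread=1, bank=0, row=3, arrival=4),
]
if batch_done(queue):
    form_batch(queue)
ranks = rank_threads(queue)  # thread 0 ranks above thread 1 (smaller max bank load)
chosen = pick_next(queue, bank=0, open_row=3, ranks=ranks)
queue.remove(chosen)         # a serviced request leaves the queue
```

In this toy run the row hit from thread 1 is chosen for bank 0 even though thread 0 has the higher rank, because row-hit-first is checked before rank; rank only decides among requests that are equally marked and equally hit or miss the open row.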
The evaluation of our DRAM memory scheduler is based on comparing its performance with existing schedulers such as First-Ready, First-Come-First-Serve (FR-FCFS) and the Close Page Policy (CPP) scheduler. To do this, all the schedulers are run with the same set of trace files on the USIMM v3 simulator [5]. Trace files are logs produced by a program; most often they are ASCII text files that can be examined in a text editor. In our project we use 10 workloads built from the trace files bundled with the simulator, such as "comm2", "comm1 comm1", "fluid swapt comm2 comm2", and "stream stream stream stream", to measure the performance of our scheduler against the default one. USIMM [5] is a simulation infrastructure that models the memory system and interfaces it with a trace-based processor model and a memory scheduling algorithm.
Our scheduler yields a performance improvement of 5.19% over the baseline FCFS scheduler. In addition, compared to the baseline FCFS scheduler, it improves the energy-delay product (EDP) by 10.00% and the performance-fairness product (PFP) by 10.55%. I have also compared our scheduler with another popular memory scheduling algorithm, the Close Page Policy scheduler. The results from running both schedulers on the USIMM simulator [5] are close, but our scheduler still comes out ahead: it yields a performance improvement of 1.21% over the CPP scheduler and improves EDP and PFP by 2.34% and 1.25%, respectively.
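The percentages above are relative improvements of each metric over a baseline scheduler. The short Python sketch below shows how such an improvement is computed, assuming that lower values are better for each metric (as they are for Delay and EDP; treating PFP the same way is an assumption here) and that EDP is energy multiplied by delay. The function names and the input values are hypothetical and chosen only to illustrate the arithmetic; they are not results from this work, and USIMM's own reporting may aggregate metrics across workloads differently.

```python
def edp(energy: float, delay: float) -> float:
    """Energy-delay product: energy multiplied by execution time (any consistent units)."""
    return energy * delay


def improvement_pct(baseline: float, candidate: float) -> float:
    """Percent improvement of `candidate` over `baseline` for a lower-is-better metric."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - candidate) / baseline * 100.0


# Hypothetical numbers for illustration only (not results from this work).
base_energy, base_delay = 10.0, 100.0
new_energy, new_delay = 9.8, 95.0

delay_gain = improvement_pct(base_delay, new_delay)
edp_gain = improvement_pct(edp(base_energy, base_delay), edp(new_energy, new_delay))
print(f"Delay improvement: {delay_gain:.2f}%")
print(f"EDP improvement:   {edp_gain:.2f}%")
```

Because EDP multiplies the two quantities, a 5% reduction in delay combined with a 2% reduction in energy gives an EDP improvement of 1 - 0.95 × 0.98 = 6.9%, larger than either reduction alone.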