Abstract
Neural networks (NNs) are widely used in applications such as speech recognition, image recognition, and character recognition. However, long training times and the curse of dimensionality limit their usability. Some alternative NN training algorithms are inherently parallel, and GPU computing is well suited to parallel problem domains. This project therefore aims to reduce NN training time by exploiting the parallelism of the training algorithm on a GPU. Particle Swarm Optimization (PSO) is an NN training algorithm that involves minimal dependencies, which makes it suitable for parallelization. In this project, I investigated parallelizing PSO-based NN training on the GPU using the CUDA 7.0 toolkit [1]. The NN used in the study classifies Fisher’s Iris dataset [2], a 3-class classification problem that takes four features of the flowers (sepal length, sepal width, petal length, petal width) as input. I successfully implemented part of the PSO algorithm on the GPU and compared this CUDA implementation with a CPU implementation written in C++. In the current version of the GPU implementation, only a very small part of the computation, the update of every particle’s velocity and position, runs on the GPU, while most of the execution takes place on the CPU. As a result, the GPU implementation is not faster than the CPU implementation, even when the number of particles is increased. In the future, however, the code can be fully parallelized to leverage the full power of the GPU. In addition, I plan to use a different NN structure with larger particles and to build a fully parallel version of it.
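
For orientation, the velocity and position update mentioned above can be sketched on the CPU as follows. This is a minimal C++ sketch of the textbook inertia-weight PSO update, not the project's actual code. The names `Particle`, `PsoParams`, `update_particle`, and `update_swarm` are hypothetical, the coefficient values are illustrative assumptions, and drawing one pair of random factors per particle (rather than per dimension) is also an assumption.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical layout: each particle stores all NN weights as one flat vector.
struct Particle {
    std::vector<double> position;       // current candidate weights
    std::vector<double> velocity;       // current step per weight
    std::vector<double> best_position;  // best weights this particle has seen
};

// Illustrative coefficients (assumed, not taken from the project).
struct PsoParams {
    double inertia = 0.729;
    double cognitive = 1.49445;
    double social = 1.49445;
};

// Textbook update for one particle:
//   v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x)
//   x <- x + v
// r1 and r2 are passed in so the step is deterministic and easy to check.
void update_particle(Particle& p, const std::vector<double>& global_best,
                     const PsoParams& params, double r1, double r2) {
    for (std::size_t d = 0; d < p.position.size(); ++d) {
        p.velocity[d] = params.inertia * p.velocity[d]
                      + params.cognitive * r1 * (p.best_position[d] - p.position[d])
                      + params.social * r2 * (global_best[d] - p.position[d]);
        p.position[d] += p.velocity[d];
    }
}

// Applies the update to every particle, drawing r1 and r2 uniformly from [0, 1).
void update_swarm(std::vector<Particle>& swarm, const std::vector<double>& global_best,
                  const PsoParams& params, std::mt19937& rng) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    for (Particle& p : swarm) {
        const double r1 = uniform(rng);
        const double r2 = uniform(rng);
        update_particle(p, global_best, params, r1, r2);
    }
}
```

Within this step no particle reads another particle's state, only the shared global best, so the loop over particles is the kind of work that could plausibly be assigned to one GPU thread per particle (or per dimension).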