Particle swarm optimization (PSO), like other population-based meta-heuristics, is intrinsically parallel and can be implemented effectively on Graphics Processing Units (GPUs), which are massively parallel processing architectures. In this paper, we discuss possible approaches to parallelizing PSO on graphics hardware within the Compute Unified Device Architecture (CUDA™), nVIDIA™'s GPU programming environment, which supports the company's most recent graphics cards. In particular, we explore and evaluate two different ways of exploiting GPU parallelism. We compare the execution speed of the two parallel algorithms with that of a standard sequential implementation of PSO (SPSO), as well as with recently published results for other parallel implementations, on functions typically used as PSO benchmarks. We carry out an in-depth study of the computational efficiency of our parallel algorithms by assessing their speed-up and scale-up with respect to SPSO. We also report results on the optimization effectiveness of the parallel implementations relative to SPSO, for cases in which the parallel versions differ, possibly significantly, from the sequential version.
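To make the parallelism concrete, here is a minimal sketch of a sequential global-best PSO loop. It assumes the common inertia-weight velocity update with w = 0.729 and c1 = c2 = 1.49445, a synchronous global-best update once per iteration, and the sphere benchmark. These parameters, and the names `pso` and `sphere`, are illustrative assumptions. The sketch is not the SPSO code or the CUDA kernels evaluated in the paper.

```python
import random


def sphere(x):
    # Sphere benchmark: f(x) = sum of x_i^2, global minimum 0 at the origin
    return sum(v * v for v in x)


def pso(f, dim=5, n_particles=20, iters=200, w=0.729, c1=1.49445, c2=1.49445,
        lo=-5.0, hi=5.0, seed=0):
    # Illustrative global-best PSO (assumed parameters, not the paper's setup)
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        # Each particle reads only its own state and gbest, so this loop body
        # is the work a GPU version could distribute across threads
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep positions inside the search box
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
        # Synchronization point: the global best is a reduction over particles
        g = min(range(n_particles), key=lambda i: pbest_val[i])
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g][:], pbest_val[g]

    return gbest, gbest_val


best, val = pso(sphere)
print(val)
```

The per-particle (and per-dimension) updates are independent, which is what makes PSO a natural fit for a GPU. The global-best reduction at the end of each iteration is the point where all particles must synchronize. Updating the global best only once per iteration is itself a design choice. A sequential implementation may update it after every particle instead, which is one way a parallel version can behave differently from a sequential one.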