The Walsh-Hadamard Transform (WHT) plays a major role in many image and video coding algorithms, and its intensive use makes accelerating it a challenge. Existing fast implementations are not efficient across different platforms. Here, a parallel implementation of the WHT is proposed for CPU and GPU platforms using the OpenCL standard. OpenCL achieves portability at the code level, but performance suffers when the same code is used for both CPUs and GPUs. When properly programmed, multicore and manycore platforms such as CPUs and GPUs can yield fast executions and substantial speedups. Therefore, two implementations are presented, one for each platform. Overall, the OpenCL-GPU implementation runs faster than the OpenCL-CPU implementation executed on a multicore multi-CPU system, with speedups ranging from 120.87x to 1016.35x.
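
For orientation, the following is a minimal sequential sketch of the standard fast WHT butterfly scheme, written in Python rather than OpenCL. It is not the implementation proposed here. The function name `fwht`, the unnormalized scaling, and the natural (Hadamard) output ordering are assumptions made for illustration. Within each stage the butterflies touch disjoint pairs of elements, which is the independence a parallel version could exploit.

```python
def fwht(values):
    """Return the unnormalized Walsh-Hadamard transform of `values`.

    Illustrative sketch only: natural (Hadamard) ordering, no 1/sqrt(n)
    scaling, input length assumed to be a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    out = list(values)  # work on a copy so the input is left untouched
    half = 1            # distance between the two inputs of a butterfly
    while half < n:
        # Each stage processes blocks of size 2 * half. The butterflies
        # inside a stage are independent of one another.
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                a, b = out[i], out[i + half]
                out[i], out[i + half] = a + b, a - b
        half *= 2
    return out


if __name__ == "__main__":
    print(fwht([1, 0, 1, 0, 0, 1, 1, 0]))
```

The sketch performs log2(n) stages of n/2 butterflies each. Applying `fwht` twice returns the input scaled by n, which gives a simple self-check.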