GPU – High Performance Computing Applications for Science and Engineering

GPU APPLICATION ANALYSIS AND OPTIMIZATION

Current low-cost GPU cards combine excellent performance with high energy efficiency, offering thousands of computational cores and high memory-access throughput (though, regrettably, not low memory latency). Hence, they are increasingly used by big-data scientific applications to speed up computationally intensive algorithms. GPUs are particularly well suited to problems that can be decomposed so that each thread independently solves a piece of the problem. We therefore look for problems that expose plenty of thread- and memory-level parallelism, which the GPU can use to hide the latencies of computation and memory-access operations. Such problems show clear potential for an efficient GPU implementation.

Below are some of the research topics in which we have proposed new resource-management and algorithmic solutions using GPUs as computational platforms:

Optimization of pedestrian detection algorithms for autonomous driving on GPU platforms

Slanted Stixels: A way to represent steep streets. International Journal of Computer Vision. Vol. 127. Number 11-12. Pages: 1643-1658 (2019). Springer. Daniel Hernandez-Juarez, Lukas Schneider, Pau Cebrian, Antonio Espinosa, David Vazquez, Antonio M Lopez, Uwe Franke, Marc Pollefeys, Juan C Moure

Slanted Stixels: Representing San Francisco’s Steepest Streets. 28th British Machine Vision Conference BMVC, Imperial College London, 4th – 7th September 2017. Daniel Hernandez-Juarez, Lukas Schneider, Antonio Espinosa, David Vazquez, Antonio López, Uwe Franke, Marc Pollefeys and Juan Carlos Moure. BEST INDUSTRIAL PAPER AWARD.

GPU-Accelerated Real-Time Stixel Computation. IEEE Winter Conference on Applications of Computer Vision 2017. Santa Rosa, California. D. Hernandez-Juarez, A. Espinosa, J. C. Moure, D. Vazquez, A. M. Lopez.

Embedded real-time stereo estimation via Semi-Global Matching on the GPU. International Conference on Computational Science. ICCS 2016. D. Hernandez-Juarez, A. Chacon, A. Espinosa, D. Vazquez, J. C. Moure, and A. M. Lopez.

GPU-based pedestrian detection for autonomous driving. International Conference on Computational Science. ICCS 2016. V. Campmany, S. Silva, A. Espinosa, J.C. Moure, D. Vazquez, and A. M. Lopez

GPU-Based Pedestrian Detection for Autonomous Driving. GPU Technology Conference 2016. San Jose, California. Best Poster Award. V. Campmany, S. Silva, A. Espinosa, J.C. Moure, D. Vazquez, and A. M. Lopez

Real-time 3D Reconstruction for Autonomous Driving through Semi-Global Matching. GPU Technology Conference 2016. San Jose, California. D. Hernandez-Juarez, A. Chacon, A. Espinosa, D. Vazquez, J. C. Moure, and A. M. Lopez.

Performance optimization of genomic sequence read-mapping applications on multi-core CPUs and GPUs

From Wet‐Lab to Variations: Concordance and Speed of Bioinformatics Pipelines for Whole Genome and Whole Exome Sequencing. Human Mutation 37 (12), 1263-1271. S. Laurie, M. Fernandez‐Callejo, S. Marco‐Sola, J‐R. Trotta, J. Camps, A. Chacón, A. Espinosa, M. Gut, Ivo Gut, S. Heath, S. Beltran.

GEM3: CPU-GPU Heterogeneous DNA Sequence Alignment for Scalable Read Sizes (video). Alejandro Chacon. GPU Technology Conference 2016. San Jose. California.

Boosting the FM-index on the GPU: effective techniques to mitigate random memory access. IEEE/ACM Transactions on Computational Biology and Bioinformatics. Vol. 12, No. 5, pp. 1048-1059. 2015. A. Chacon, S. M. Sola, A. Espinosa, P. Ribeca, J.C. Moure

Thread-Cooperative, bit-parallel computation of Levenshtein distance on GPU. Proceedings of the 28th ACM International Conference on Supercomputing, pp. 103-112. (ICS 2014) Alejandro Chacón, Juan Carlos Moure, Antonio Espinosa, Santiago Sola, Paolo Ribeca

FM-Index on GPU: a collaborative scheme to reduce memory footprint. IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA 2014), pp. 1-9. 2014. Alejandro Chacon, S. M. Sola, A. Espinosa, P. Ribeca, J.C. Moure

n-step FM-Index for Faster Pattern Matching. Procedia Computer Science, vol. 18, pp. 70-79, (ICCS 2013). Alejandro Chacón, Juan Carlos Moure, Antonio Espinosa, Porfidio Hernández.

Performance engineering of combinatorial optimization algorithms

Coalition structure generation problems: optimization and parallelization of the IDP algorithm in multicore systems. Concurrency and Computation: Practice and Experience. Wiley. Volume 29, Issue 5, 10 March 2017. F. Cruz, A. Espinosa, J. C. Moure, J. Cerquides, J. A. Rodriguez-Aguilar, K. Svensson, Sarvapali D Ramchurn

Paving the way for large-scale combinatorial auctions. International Conference on Autonomous Agents and Multiagent Systems. AAMAS 2015. F. Cruz, JC Moure, A Espinosa, J Cerquides, JA Rodriguez.

Parallelisation and application of AD3 as a method for solving large scale combinatorial auctions. 10th International Federated Conference on Distributed Computing Techniques. DisCoTec 2015. F. Cruz, JC Moure, A Espinosa, J Cerquides, JA Rodriguez.

Coalition Structure Generation Problems: optimization and parallelization of the IDP algorithm. 6th International Workshop on Optimization in Multi-Agent Systems. OPTMAS 2014. F. Cruz, JC Moure, A Espinosa, J Cerquides, JA Rodriguez, S. Ramchurn.

Image coding in GPU architectures

GPU Implementation of Bitplane Coding with Parallel Coefficient Processing of High Performance Image Compression. IEEE Transactions on Parallel and Distributed Systems. Issue 8, Aug. 1, 2017, pp. 2272-2284. P. Enfedaque, F. Auli-Llinas, J. C. Moure.

Bitplane Image Coding with parallel coefficient processing. IEEE Transactions on Image Processing, vol. 25, pp. 209-219. 2016. Pablo Enfedaque, Francesc Auli-Llinas, and Juan C. Moure.

Beyond Standards: A New GPU-Aware Image Coding System (video). Pablo Enfedaque. GPU Technology Conference 2016. San Jose, California.

Implementation of the DWT in a GPU through a Register-based Strategy. IEEE Transactions on Parallel and Distributed Systems, vol. 26, pp. 3394-3406. 2014. Pablo Enfedaque, Francesc Auli-Llinas, and Juan C. Moure.

Strategy of microscopic parallelism for bitplane image coding. Data Compression Conference (DCC 2015). F Auli-Llinas, P Enfedaque, JC Moure, I Blanes, V Sanchez.

Strategies of SIMD Computing for Image Coding in GPU. IEEE 22nd International Conference on High Performance Computing (HiPC 2015). Pablo Enfedaque, Francesc Auli-Llinas, and Juan C. Moure.

Scheduling and resource management of scientific workflows

Scheduling and resource management of scientific workflows in hybrid GPGPU environments. Jordi Delgado. PhD thesis. 2015.

Improving the Execution Performance of FreeSurfer. Neuroinformatics. Vol. 12(3). pp. 413-421. 2014. J Delgado, Juan C Moure, Y Vives-Gilabert, M Delfino, A Espinosa, B Gómez-Ansón

Design, implementation and optimization of cross-correlation in Satellite Navigation Systems applications

GNSS-R Cross Correlation: design, implementation and optimization. Carlos Calvin. Bachelor Thesis, 2014

Constraint Satisfaction Problem (CSP) parallelization using arc consistency

Parallelization of the Constraint Satisfaction Problem using Arc Consistency. Jordi Alcaraz. Bachelor Thesis. 2015. Special Prize for the Best Bachelor Thesis of the Computer Science Degree 2015

—

We are an NVIDIA GPU Research Center. Thanks to NVIDIA for their support of our research.