In this tute we’ll use a technique called blocking to finally fulfill Porky Water’s tall order!
Blocking is a technique where blocks of data are copied from global memory into shared memory, and threads then work on the data there. Shared memory is much faster than global memory, so this greatly reduces the amount of traffic on the global memory bus and lets threads do most of their calculations out of the faster memory.
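To make the idea concrete, here's a rough sketch of what a blocked kernel can look like. It assumes a brute-force nearest-neighbour search over 3D points, which may not match the exact problem in this series, and the names `TILE`, `nearestNeighborBlocked` and `findNearest` are made up for illustration. Treat it as one possible shape for the technique under those assumptions, not the final code.

```cuda
#include <cuda_runtime.h>
#include <cfloat>
#include <cassert>

// Assumed tile size: threads per block, and points held in shared memory at once.
#define TILE 256

// Hypothetical problem: for every point, find the index of its nearest other point.
// Must be launched with blockDim.x == TILE.
__global__ void nearestNeighborBlocked(const float3* points, int* nearest, int n)
{
    __shared__ float3 tile[TILE];  // the "block" of data copied from global memory

    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    // Out-of-range threads keep running so every thread reaches __syncthreads().
    float3 me = (idx < n) ? points[idx] : make_float3(0.0f, 0.0f, 0.0f);
    float best = FLT_MAX;
    int bestIdx = -1;

    for (int base = 0; base < n; base += TILE) {
        // Each thread copies one point of the current tile into shared memory.
        int load = base + threadIdx.x;
        if (load < n) tile[threadIdx.x] = points[load];
        __syncthreads();  // wait until the whole tile has been loaded

        // All comparisons for this tile now read from shared memory only.
        int count = min(TILE, n - base);
        for (int j = 0; j < count; ++j) {
            int other = base + j;
            if (other == idx) continue;  // skip comparing a point with itself
            float dx = me.x - tile[j].x;
            float dy = me.y - tile[j].y;
            float dz = me.z - tile[j].z;
            float d2 = dx * dx + dy * dy + dz * dz;  // squared distance is enough
            if (d2 < best) { best = d2; bestIdx = other; }
        }
        __syncthreads();  // don't overwrite the tile while others still read it
    }

    if (idx < n) nearest[idx] = bestIdx;
}

// Hypothetical host wrapper; CUDA error checking is omitted to keep the sketch short.
void findNearest(const float3* hostPoints, int* hostNearest, int n)
{
    assert(n > 0);
    float3* dPoints = nullptr;
    int* dNearest = nullptr;
    cudaMalloc(&dPoints, n * sizeof(float3));
    cudaMalloc(&dNearest, n * sizeof(int));
    cudaMemcpy(dPoints, hostPoints, n * sizeof(float3), cudaMemcpyHostToDevice);

    int blocks = (n + TILE - 1) / TILE;  // round up so every point gets a thread
    nearestNeighborBlocked<<<blocks, TILE>>>(dPoints, dNearest, n);

    cudaMemcpy(hostNearest, dNearest, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dPoints);
    cudaFree(dNearest);
}
```

In this sketch each thread block reads every point from global memory once per tile, and the `TILE` threads in the block all reuse that copy from shared memory. The two `__syncthreads()` calls matter: the first makes sure the tile is fully loaded before anyone reads it, and the second stops a fast thread from loading the next tile while slower threads are still working on the current one.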
Blocking with shared memory gives us a great speedup here and easily fulfills Porky’s boss’s request for a 10x speedup. There are some small changes that could make the code run a little quicker, but if it had to run much faster, a complete change of algorithm would be far more useful than tweaking this brute-force one.