Browsing by Author "Kenzel, Michael"
Item: CUDA and Applications to Task-based Programming (The Eurographics Association, 2022)
Kerbl, Bernhard; Kenzel, Michael; Winter, Martin; Steinberger, Markus; Hahmann, Stefanie; Patow, Gustavo A.
Since its inception, the CUDA programming model has been continuously evolving. Because the CUDA toolkit aims to consistently expose cutting-edge capabilities for general-purpose compute jobs to its users, the features added in each new version reflect the rapid changes that we observe in GPU architectures. Over the years, the changes in hardware, the growing scope of built-in functions and libraries, as well as advancing C++ standard compliance, have expanded the design choices when coding for CUDA and significantly altered the guidelines for achieving peak performance. In this tutorial, we give a thorough introduction to the CUDA toolkit, demonstrate how a contemporary application can benefit from recently introduced features, and show how they can be applied to task-based GPU scheduling in particular. For instance, we will provide detailed examples of use cases for independent thread scheduling, cooperative groups, and the CUDA standard library, libcu++, which are certain to become an integral part of clean coding for CUDA in the near future.

Item: CUDA and Applications to Task-based Programming (The Eurographics Association, 2021)
Kenzel, Michael; Kerbl, Bernhard; Winter, Martin; Steinberger, Markus; O'Sullivan, Carol and Schmalstieg, Dieter
Since its inception, the CUDA programming model has been continuously evolving. Because the CUDA toolkit aims to consistently expose cutting-edge capabilities for general-purpose compute jobs to its users, the features added in each new version reflect the rapid changes that we observe in GPU architectures. Over the years, the changes in hardware, the growing scope of built-in functions and libraries, as well as advancing C++ standard compliance, have expanded the design choices when coding for CUDA and significantly altered the guidelines for achieving peak performance. In this tutorial, we give a thorough introduction to the CUDA toolkit, demonstrate how a contemporary application can benefit from recently introduced features, and show how they can be applied to task-based GPU scheduling in particular. For instance, we will provide detailed examples of use cases for independent thread scheduling, cooperative groups, and the CUDA standard library, libcu++, which are certain to become an integral part of clean coding for CUDA in the near future.
https://cuda-tutorial.github.io/
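As a rough illustration of two of the features the tutorial abstract names, the sketch below combines a cooperative-groups tile reduction with a grid-wide accumulator from libcu++. It is not taken from the tutorial materials at https://cuda-tutorial.github.io/; the kernel and variable names are our own, and it assumes CUDA 11 or newer.

```cuda
// Minimal sketch (assumed example, not from the tutorial): each 32-thread
// tile reduces its values with cooperative groups, then one thread per tile
// publishes the partial sum through a libcu++ atomic. Requires CUDA 11+.
#include <cstdio>
#include <new>
#include <cooperative_groups.h>
#include <cooperative_groups/reduce.h>
#include <cuda/atomic>

namespace cg = cooperative_groups;

__global__ void sumKernel(const int* data, int n, cuda::std::atomic<int>* total)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int value = (idx < n) ? data[idx] : 0;

    // Partition the block into 32-thread tiles and reduce within each tile.
    cg::thread_block block = cg::this_thread_block();
    cg::thread_block_tile<32> tile = cg::tiled_partition<32>(block);
    int tileSum = cg::reduce(tile, value, cg::plus<int>());

    // One thread per tile adds its partial sum to the global total.
    if (tile.thread_rank() == 0)
        total->fetch_add(tileSum, cuda::std::memory_order_relaxed);
}

int main()
{
    const int n = 1 << 20;
    int* data;
    cudaMallocManaged(&data, n * sizeof(int));
    for (int i = 0; i < n; ++i) data[i] = 1;

    cuda::std::atomic<int>* total;
    cudaMallocManaged(&total, sizeof(*total));
    new (total) cuda::std::atomic<int>(0);  // placement-new on managed memory

    sumKernel<<<(n + 255) / 256, 256>>>(data, n, total);
    cudaDeviceSynchronize();

    printf("sum = %d (expected %d)\n", total->load(), n);
    cudaFree(data);
    cudaFree(total);
    return 0;
}
```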
Item: Hierarchical Bucket Queuing for Fine-Grained Priority Scheduling on the GPU (© 2017 The Eurographics Association and John Wiley & Sons Ltd., 2017)
Kerbl, Bernhard; Kenzel, Michael; Schmalstieg, Dieter; Seidel, Hans-Peter; Steinberger, Markus; Chen, Min and Zhang, Hao (Richard)
While the modern graphics processing unit (GPU) offers massive parallel compute power, the ability to influence the scheduling of these immense resources is severely limited. Therefore, the GPU is widely considered suitable only as an externally controlled co-processor for homogeneous workloads, which greatly restricts the potential applications of GPU computing. To address this issue, we present a new method to achieve fine-grained priority scheduling on the GPU: hierarchical bucket queuing. By carefully distributing the workload among multiple queues and efficiently deciding which queue to draw work from next, we enable a variety of scheduling strategies. These strategies include fair scheduling, earliest-deadline-first scheduling, and user-defined dynamic priority scheduling. In a comparison with a sorting-based approach, we reveal the advantages of hierarchical bucket queuing over previous work. Finally, we demonstrate the benefits of using priority scheduling in real-world applications by the example of path tracing and foveated micropolygon rendering.
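To give a flavor of the idea the abstract describes (distributing work among per-priority queues and always drawing from the most urgent non-empty one), here is a deliberately simplified CUDA sketch. It is not the paper's implementation: real hierarchical bucket queuing must handle concurrent enqueuing and dequeuing, bucket overflow, and the hierarchical organization the title refers to, whereas this sketch fills the buckets in one kernel and drains them in a second, so the two phases never race. All names, capacities, and the two-phase simplification are assumptions made for illustration.

```cuda
// Simplified bucket-queue sketch (assumed, not the paper's method): work
// items land in one of NUM_BUCKETS queues by priority; workers always draw
// from the highest-priority bucket that still has items.
#include <cstdio>

constexpr int NUM_BUCKETS = 4;    // priority levels, 0 = highest
constexpr int CAPACITY    = 256;  // per-bucket capacity (assumed sufficient)

struct Bucket {
    int items[CAPACITY];
    unsigned int head;  // next slot to draw from
    unsigned int tail;  // next free slot
};

__device__ Bucket buckets[NUM_BUCKETS];  // device globals are zero-initialized

// Draw from the highest-priority non-empty bucket; -1 once all are drained.
// Correct here only because no enqueues run concurrently with dequeues.
__device__ int drawNext()
{
    for (int p = 0; p < NUM_BUCKETS; ++p) {
        Bucket& b = buckets[p];
        unsigned int slot = atomicAdd(&b.head, 1u);
        if (slot < b.tail)
            return b.items[slot];
        // Bucket exhausted: head overshoots tail, which is harmless here.
    }
    return -1;
}

// Phase 1: distribute work items into buckets according to their priority.
__global__ void fillBuckets(const int* work, const int* priority, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        Bucket& b = buckets[priority[i]];
        unsigned int slot = atomicAdd(&b.tail, 1u);
        b.items[slot] = work[i];  // assumes CAPACITY is never exceeded
    }
}

// Phase 2: each thread repeatedly draws and records items; because every
// thread prefers lower bucket indices, high-priority items are drawn first.
__global__ void drainBuckets(int* order, int* count)
{
    for (;;) {
        int item = drawNext();
        if (item < 0)
            return;
        order[atomicAdd(count, 1)] = item;
    }
}

int main()
{
    const int n = 64;
    int *work, *priority, *order, *count;
    cudaMallocManaged(&work, n * sizeof(int));
    cudaMallocManaged(&priority, n * sizeof(int));
    cudaMallocManaged(&order, n * sizeof(int));
    cudaMallocManaged(&count, sizeof(int));
    for (int i = 0; i < n; ++i) {
        work[i] = i;
        priority[i] = i % NUM_BUCKETS;  // toy priority assignment
    }
    *count = 0;

    fillBuckets<<<1, n>>>(work, priority, n);
    drainBuckets<<<1, 32>>>(order, count);
    cudaDeviceSynchronize();

    for (int i = 0; i < *count; ++i)
        printf("%d ", order[i]);  // items emerge grouped by bucket priority
    printf("\n");
    return 0;
}
```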