cs_dispatch_queue.h

cs_dispatch_queue.h implements a SYCL-like device task management system based on 3 objects:
- cs_dispatch_queue: The main object to interact with. It allows running tasks on the host and the device with dependency management.
- cs_task / cs_host_task: Allow interacting with tasks launched by cs_dispatch_queue, either by waiting for the end of a task or by extracting an event to synchronize with.
- cs_event: Represents an event triggered at the end of a task for other tasks to synchronize with. It is passed as an input to cs_dispatch_queue methods to express dependencies between tasks.

cs_dispatch_queue is meant to be used in a way similar to sycl::queue. It can be used to declare parallel tasks on a device that depend on (i.e. wait for) each other. It holds a context, accessible as a member named initializer_context, which is used to initialize each task created by the cs_dispatch_queue.
Each task has its own context, initialized from initializer_context, and its own CUDA stream for asynchronous, parallel execution.

Tasks are waited upon at destruction. This behavior ensures that resources held for a task are not destroyed before the task terminates.
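The wait-at-destruction behavior can be illustrated with a minimal host-only sketch. The class `waiting_task` and the function `fill_then_sum` are hypothetical names introduced for illustration, and `std::async` stands in for a device stream; this is not the code_saturne implementation.

```cpp
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for a task handle (not the real cs_task class).
// It owns the asynchronous work and blocks in its destructor until the
// work has finished.
class waiting_task {
public:
  explicit waiting_task(std::function<void()> work)
    : done_(std::async(std::launch::async, std::move(work))) {}

  waiting_task(waiting_task &&) = default;

  // Block until the task has finished (no-op for a moved-from handle).
  void wait() { if (done_.valid()) done_.wait(); }

  // Waiting here guarantees that anything the task refers to is still
  // alive while the task runs.
  ~waiting_task() { wait(); }

private:
  std::future<void> done_;
};

// The task writes into `data`; leaving the inner scope destroys the task
// handle, which waits for completion before `data` is read.
int fill_then_sum(int n)
{
  std::vector<int> data(n, 0);
  {
    waiting_task t([&data] { for (auto &x : data) x = 1; });
  } // ~waiting_task() blocks here until the fill is complete

  int sum = 0;
  for (int x : data) sum += x;
  return sum;
}
```

Because `data` is declared before the task handle, the handle is destroyed first, so the buffer outlives the work that uses it.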
cs_device_queue wraps all of cs_dispatch_queue's algorithms to run them as separate tasks, each represented by a cs_task. An additional single_task method can be used to declare tasks that run on the host and are represented by cs_host_task.
cs_task holds resources related to the execution of a task started with cs_dispatch_queue (i.e. a CUDA stream in device mode).

cs_host_task is a specialization of cs_task that holds additional data necessary for the execution of a host task.

cs_event represents a time point in a task to synchronize with. In device mode, it corresponds to a cudaEvent_t.

Both cs_task and cs_host_task provide an operator to generate a cs_event for synchronization or time measurement.
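The dependency model described above, where a task returns an event and later tasks take events as inputs, can be sketched on the host with standard C++ only. All names here (`mini_queue`, `mini_event`, `submit`, `run_chain`) are hypothetical and chosen for illustration; `std::shared_future<void>` plays the role of an event and does not reflect the actual cs_dispatch_queue signatures.

```cpp
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical event type: fires when the task that produced it finishes.
using mini_event = std::shared_future<void>;

// Hypothetical queue (not the code_saturne API).
class mini_queue {
public:
  // Start `work` asynchronously once every event in `deps` has fired,
  // and return an event that fires when `work` itself has finished.
  mini_event submit(std::vector<mini_event> deps, std::function<void()> work)
  {
    return std::async(std::launch::async,
                      [deps = std::move(deps), work = std::move(work)] {
                        for (const auto &e : deps) e.wait();
                        work();
                      }).share();
  }
};

// Two independent tasks run in parallel; a third depends on both.
int run_chain()
{
  mini_queue q;
  int a = 0, b = 0, c = 0;

  mini_event ea = q.submit({}, [&a] { a = 2; });
  mini_event eb = q.submit({}, [&b] { b = 3; });

  // This task only starts its work after both events have fired.
  mini_event ec = q.submit({ea, eb}, [&a, &b, &c] { c = a * b; });

  ec.wait();
  return c;
}
```

Passing events explicitly keeps the dependency graph visible at each call site, which matches the description above of dependencies being expressed through events rather than through memory handles.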
The system can be used like SYCL (without memory-handle-based dependency management) on top of CUDA: