tf namespace
taskflow namespace
Classes
- class ChromeObserver
- observer interface based on Chrome tracing format
- class CriticalSection
- class to create a critical section that restricts concurrent execution of tasks to a limited number of workers
- class cublasFlowCapturer
- class to construct a cuBLAS task graph
- class cudaFlow
- class for building a CUDA task dependency graph
- class cudaFlowCapturer
- class for building a CUDA task dependency graph through stream capture
- class cudaFlowCapturerBase
- base class to construct a CUDA task graph through stream capture
- class cudaRoundRobinCapturing
- class to capture the described graph into a native cudaGraph using a greedy round-robin algorithm on a fixed number of streams
- class cudaScopedDevice
- RAII-styled device context switch
- class cudaScopedPerThreadEvent
- class that provides RAII-styled guard of event acquisition
- class cudaScopedPerThreadStream
- class that provides RAII-styled guard of stream acquisition
- class cudaSequentialCapturing
- class to capture the described graph into a native cudaGraph using a single stream
- class cudaTask
- handle to a node of the internal CUDA graph
- class Executor
- execution interface for running a taskflow graph
- class FlowBuilder
- building methods of a task dependency graph
- template<typename T> class Future
- class to access the result of task execution
- class ObserverInterface
- The interface class for creating an executor observer.
- class Semaphore
- class to create a semaphore object for building a concurrency constraint
- class Subflow
- class to construct a subflow graph from the execution of a dynamic task
- class Task
- handle to a node in a task dependency graph
- class Taskflow
- main entry to create a task dependency graph
- class TaskView
- class to access task information from the observer interface
- class TFProfObserver
- observer interface based on the built-in taskflow profiler format
- class WorkerView
- class to create an immutable view of a worker in an executor
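A minimal sketch of how the core pieces above fit together (Taskflow, Executor, Task), assuming the usual single include of taskflow/taskflow.hpp:

```cpp
#include <taskflow/taskflow.hpp>

int main() {
  tf::Executor executor;   // runs taskflows on a pool of worker threads
  tf::Taskflow taskflow;   // holds the task dependency graph

  tf::Task A = taskflow.emplace([](){ /* work of A */ });
  tf::Task B = taskflow.emplace([](){ /* work of B */ });
  A.precede(B);            // B runs only after A finishes

  executor.run(taskflow).wait();  // run() returns a tf::Future; wait for completion
}
```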
Enums
- enum class TaskType: int { PLACEHOLDER = 0, CUDAFLOW, STATIC, DYNAMIC, CONDITION, MODULE, ASYNC, UNDEFINED }
- enumeration of all task types
- enum class ObserverType: int { TFPROF = 0, CHROME, UNDEFINED }
- enumeration of all observer types
- enum class cudaTaskType: int { EMPTY = 0, HOST, MEMSET, MEMCPY, KERNEL, SUBFLOW, CAPTURE, UNDEFINED }
- enumeration of all cudaTask types
Typedefs
- using observer_stamp_t = std::chrono::time_point<std::chrono::steady_clock>
- default time point type of observers
- using cudaPerThreadStreamPool = cudaPerThreadDeviceObjectPool<cudaStream_t, cudaStreamCreator, cudaStreamDeleter>
- alias of per-thread stream pool type
- using cudaPerThreadEventPool = cudaPerThreadDeviceObjectPool<cudaEvent_t, cudaEventCreator, cudaEventDeleter>
- alias of per-thread event pool type
Functions
- auto to_string(TaskType type) -> const char*
- convert a task type to a human-readable string
- auto operator<<(std::ostream& os, const Task& task) -> std::ostream&
- overload of ostream inserter operator for Task
- auto to_string(ObserverType type) -> const char*
- convert an observer type to a human-readable string
- auto cuda_get_num_devices() -> size_t
- queries the number of available devices
- auto cuda_get_device() -> int
- gets the current device associated with the caller thread
- void cuda_set_device(int id)
- switches to a given device context
- void cuda_get_device_property(int i, cudaDeviceProp& p)
- obtains the device property
- auto cuda_get_device_property(int i) -> cudaDeviceProp
- obtains the device property
- void cuda_dump_device_property(std::ostream& os, const cudaDeviceProp& p)
- dumps the device property
- auto cuda_get_device_max_threads_per_block(int d) -> size_t
- queries the maximum threads per block on a device
- auto cuda_get_device_max_x_dim_per_block(int d) -> size_t
- queries the maximum x-dimension per block on a device
- auto cuda_get_device_max_y_dim_per_block(int d) -> size_t
- queries the maximum y-dimension per block on a device
- auto cuda_get_device_max_z_dim_per_block(int d) -> size_t
- queries the maximum z-dimension per block on a device
- auto cuda_get_device_max_x_dim_per_grid(int d) -> size_t
- queries the maximum x-dimension per grid on a device
- auto cuda_get_device_max_y_dim_per_grid(int d) -> size_t
- queries the maximum y-dimension per grid on a device
- auto cuda_get_device_max_z_dim_per_grid(int d) -> size_t
- queries the maximum z-dimension per grid on a device
- auto cuda_get_device_max_shm_per_block(int d) -> size_t
- queries the maximum shared memory size in bytes per block on a device
- auto cuda_get_device_warp_size(int d) -> size_t
- queries the warp size on a device
- auto cuda_get_device_compute_capability_major(int d) -> int
- queries the major number of compute capability of a device
- auto cuda_get_device_compute_capability_minor(int d) -> int
- queries the minor number of compute capability of a device
- auto cuda_get_device_unified_addressing(int d) -> bool
- queries if the device supports unified addressing
- auto cuda_get_driver_version() -> int
- queries the latest CUDA version (1000 * major + 10 * minor) supported by the driver
- auto cuda_get_runtime_version() -> int
- queries the CUDA Runtime version (1000 * major + 10 * minor)
- auto cuda_get_free_mem(int d) -> size_t
- queries the free memory (expensive call)
- auto cuda_get_total_mem(int d) -> size_t
- queries the total available memory (expensive call)
- template<typename T> auto cuda_malloc_device(size_t N, int d) -> T*
- allocates memory on the given device for holding N elements of type T
- template<typename T> auto cuda_malloc_device(size_t N) -> T*
- allocates memory on the current device associated with the caller
- template<typename T> auto cuda_malloc_shared(size_t N) -> T*
- allocates shared memory for holding N elements of type T
- template<typename T> void cuda_free(T* ptr, int d)
- frees memory on the GPU device
- template<typename T> void cuda_free(T* ptr)
- frees memory on the GPU device
- void cuda_memcpy_async(cudaStream_t stream, void* dst, const void* src, size_t count)
- copies data between host and device asynchronously through a stream
- void cuda_memset_async(cudaStream_t stream, void* devPtr, int value, size_t count)
- initializes or sets GPU memory to the given value byte by byte
- auto cuda_per_thread_stream_pool() -> cudaPerThreadStreamPool&
- acquires the per-thread cuda stream pool
- auto cuda_per_thread_event_pool() -> cudaPerThreadEventPool&
- acquires the per-thread cuda event pool
- auto to_string(cudaTaskType type) -> const char* constexpr
- convert a cudaTask type to a human-readable string
- auto operator<<(std::ostream& os, const cudaTask& ct) -> std::ostream&
- overload of ostream inserter operator for cudaTask
- auto cuda_default_max_threads_per_block() -> size_t constexpr
- queries the maximum threads allowed per block
- auto cuda_default_threads_per_block(size_t N) -> size_t constexpr
- queries the default number of threads per block in an 1D vector of N elements
- auto version() -> const char* constexpr
- queries the version information in the string format major.minor.patch
Variables
- std::array<TaskType, 7> TASK_TYPES constexpr
- array of all task types (used for iterating task types)
- template<typename C> bool is_static_task_v constexpr
- determines if a callable is a static task
- template<typename C> bool is_dynamic_task_v constexpr
- determines if a callable is a dynamic task
- template<typename C> bool is_condition_task_v constexpr
- determines if a callable is a condition task
- template<typename C> bool is_cudaflow_task_v constexpr
- determines if a callable is a cudaFlow task
Function documentation
template<typename T>
T* tf::cuda_malloc_device(size_t N, int d)
allocates memory on the given device for holding N elements of type T
The function calls cudaMalloc to allocate N*sizeof(T) bytes of memory on the given device d and returns a pointer to the starting address of the device memory.
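A minimal usage sketch, assuming a CUDA-capable device 0 and the Taskflow CUDA headers are included:

```cpp
// Allocate space for 1024 ints on device 0, then release it.
size_t N = 1024;
int* data = tf::cuda_malloc_device<int>(N, 0);  // N*sizeof(int) bytes on device 0
// ... launch kernels that read/write `data` ...
tf::cuda_free(data, 0);                         // free under the same device context
```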
template<typename T>
T* tf::cuda_malloc_device(size_t N)
allocates memory on the current device associated with the caller
The function calls cuda_malloc_device with the current device associated with the caller.
template<typename T>
T* tf::cuda_malloc_shared(size_t N)
allocates shared memory for holding N elements of type T
The function calls cudaMallocManaged to allocate N*sizeof(T) bytes of memory and returns a pointer to the starting address of the shared memory.
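A short sketch of managed (unified) memory, which both host and device code can touch; assumes a CUDA-capable device:

```cpp
// Allocate 256 floats of managed memory.
size_t N = 256;
float* buf = tf::cuda_malloc_shared<float>(N);  // N*sizeof(float) managed bytes
buf[0] = 1.0f;    // the host can access the memory directly
// ... device kernels can also read/write `buf` ...
tf::cuda_free(buf);
```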
template<typename T>
void tf::cuda_free(T* ptr, int d)
frees memory on the GPU device
Template parameters
T | pointer type
Parameters
ptr | device pointer to memory to free
d | device context identifier
This method calls cudaFree to free the memory space pointed to by ptr using the given device context.
template<typename T>
void tf::cuda_free(T* ptr)
frees memory on the GPU device
Template parameters
T | pointer type
Parameters
ptr | device pointer to memory to free
This method calls cudaFree to free the memory space pointed to by ptr using the current device context of the caller.
void tf::cuda_memcpy_async(cudaStream_t stream, void* dst, const void* src, size_t count)
copies data between host and device asynchronously through a stream
Parameters
stream | stream identifier
dst | destination memory address
src | source memory address
count | size in bytes to copy
The method calls cudaMemcpyAsync with the given stream, using cudaMemcpyDefault to infer the memory spaces of the source and destination pointers. The memory areas may not overlap.
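A sketch of an asynchronous host-to-device copy on an explicitly created stream; assumes a CUDA-capable device:

```cpp
// Copy 1024 ints from a host vector to device memory through a stream.
cudaStream_t stream;
cudaStreamCreate(&stream);
std::vector<int> host(1024, 7);
int* dev = tf::cuda_malloc_device<int>(host.size());
tf::cuda_memcpy_async(stream, dev, host.data(), host.size() * sizeof(int));
cudaStreamSynchronize(stream);  // block until the copy has completed
```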
void tf::cuda_memset_async(cudaStream_t stream, void* devPtr, int value, size_t count)
initializes or sets GPU memory to the given value byte by byte
Parameters
stream | stream identifier
devPtr | pointer to GPU memory
value | value to set for each byte of the specified memory
count | size in bytes to set
The method calls cudaMemsetAsync with the given stream to fill the first count bytes of the memory area pointed to by devPtr with the constant byte value value.
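A sketch of zero-filling a device buffer asynchronously; assumes a CUDA-capable device:

```cpp
// Zero a device buffer of 1024 ints through a stream.
cudaStream_t stream;
cudaStreamCreate(&stream);
int* dev = tf::cuda_malloc_device<int>(1024);
tf::cuda_memset_async(stream, dev, 0, 1024 * sizeof(int));  // byte-wise fill with 0
cudaStreamSynchronize(stream);
```

Note that the fill is byte-wise, so only values whose byte pattern repeats (such as 0 or -1) produce a meaningful per-element result for multi-byte types.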
Variable documentation
template<typename C>
bool tf::is_static_task_v constexpr
determines if a callable is a static task
A static task is a callable object constructible from std::function<void()>.
template<typename C>
bool tf::is_dynamic_task_v constexpr
determines if a callable is a dynamic task
A dynamic task is a callable object constructible from std::function<void(Subflow&)>.
template<typename C>
bool tf::is_condition_task_v constexpr
determines if a callable is a condition task
A condition task is a callable object constructible from std::function<int()>.
template<typename C>
bool tf::is_cudaflow_task_v constexpr
determines if a callable is a cudaFlow task
A cudaFlow task is a callable object constructible from std::function<void(tf::cudaFlow&)> or std::function<void(tf::cudaFlowCapturer&)>.