Warp level functions

Reduce

template<typename data_type, typename reduce_policy>
class culib::warp::reduce : public reduce_policy

A reduction uses a binary combining operator to compute a single aggregate from a list of input elements:

\[y = \oplus \left( \cdots \oplus \left( \oplus \left( x_{1}, x_{2} \right), x_{3} \right), \cdots, x_{n} \right)\]

Every thread in the warp uses a thread-local object of this class specialization. Implementation details are encapsulated in reduce_policy. If data_type is in the list of types supported by the warp shuffle instructions, shuffle is used; otherwise shared memory is needed.

Template Parameters
  • data_type: The reduction input/output element type

  • reduce_policy: Type-dependent reduce implementation

Public Functions

template<typename binary_operation = binary_op::sum<data_type>>
__device__ data_type operator()(data_type val, binary_operation binary_op = {})

Performs a warp-wide all-reduce in the calling warp. The output is valid in each lane of the warp. The number of entrant threads must be equal to warpSize.

Return

Warp-wide result of reduction

Template Parameters
  • binary_operation: Binary combining function object type that will be applied in unspecified order. The behaviour is undefined if binary_operation modifies any element.

Parameters
  • val: Thread-local value
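The all-reduce behaviour described above (every lane receives the same result) can be illustrated with a host-side model. This is a minimal sketch of the common butterfly (XOR) exchange pattern, assuming a 32-lane warp; it is not the library's implementation, and `warp_values` and `warp_all_reduce_model` are hypothetical names introduced for illustration only.

```cpp
#include <array>
#include <cstddef>

// Host-side model of a warp: one array slot per lane (32 lanes assumed).
constexpr std::size_t warp_size = 32;

template <typename T>
using warp_values = std::array<T, warp_size>;

// Butterfly (XOR) all-reduce: in each round, lane i combines its value with
// the value held by lane (i ^ offset). After log2(warp_size) rounds every
// lane holds the full reduction.
template <typename T, typename BinaryOp>
warp_values<T> warp_all_reduce_model(warp_values<T> lanes, BinaryOp op)
{
    for (std::size_t offset = warp_size / 2; offset > 0; offset /= 2) {
        // Snapshot the previous round: on a GPU all lanes exchange at once.
        const warp_values<T> previous = lanes;
        for (std::size_t lane = 0; lane < warp_size; ++lane) {
            lanes[lane] = op(previous[lane], previous[lane ^ offset]);
        }
    }
    return lanes;
}
```

The model combines elements in a different order than the left-to-right formula, which is why the operator is documented as being applied in unspecified order: results match the formula only for operators that are associative and commutative. With the values 1 to 32 and a sum operator, every lane of the model ends up holding 528.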

Scan

template<typename data_type, typename scan_policy>
class culib::warp::scan : public scan_policy

Class for a parallel scan within a warp.

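A scan uses a binary combining operator to compute, for every element of an array, the aggregate of the elements up to that position: an inclusive scan includes the element itself, an exclusive scan does not. The number of entrant threads must be equal to warpSize. The default binary combining operator is the sum.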

Template Parameters
  • data_type: Scanned data type

  • scan_policy: Policy for warp threads data exchange

Public Functions

template<typename binary_operation>
__device__ data_type inclusive(data_type val, binary_operation binary_op = {})

\[y_{i} = \bigoplus_{j=0}^{i} x_{j}\]

Return

The value that the calling lane's element of the warp-wide array would hold after the scan.

Parameters
  • val: Thread-local value

Template Parameters
  • binary_operation: Binary combining function object that will be applied in unspecified order. The behaviour is undefined if binary_operation modifies any element.
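To make the inclusive formula concrete, the following host-side sketch models a step-doubling (Kogge-Stone style) scan, the pattern typically used with shuffle-up exchanges. It assumes a 32-lane warp and is not taken from the library; `warp_inclusive_scan_model` is a hypothetical name.

```cpp
#include <array>
#include <cstddef>

// Host-side model of a warp: one array slot per lane (32 lanes assumed).
constexpr std::size_t warp_size = 32;

// Step-doubling inclusive scan: in each round, lane i (for i >= offset)
// combines the value from lane (i - offset) with its own value.
// After log2(warp_size) rounds, lane i holds x_0 (+) ... (+) x_i.
template <typename T, typename BinaryOp>
std::array<T, warp_size> warp_inclusive_scan_model(std::array<T, warp_size> lanes,
                                                   BinaryOp op)
{
    for (std::size_t offset = 1; offset < warp_size; offset *= 2) {
        // Snapshot the previous round: on a GPU all lanes exchange at once.
        const std::array<T, warp_size> previous = lanes;
        for (std::size_t lane = offset; lane < warp_size; ++lane) {
            lanes[lane] = op(previous[lane - offset], previous[lane]);
        }
    }
    return lanes;
}
```

Because the grouping of operands differs from a sequential left-to-right evaluation, the model gives the formula's result only for associative operators.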

template<typename binary_operation = binary_op::sum<data_type>>
__device__ data_type exclusive(data_type val, binary_operation binary_op = {})

\[y_{i} = \bigoplus_{j=0}^{i - 1} x_{j}\]

Usage example:

culib::warp::scan<data_type> scan;
const data_type val = scan.exclusive(thread_local_value);

Return

The value that the calling lane's element of the warp-wide array would hold after the scan.

Parameters
  • val: Thread-local value

Template Parameters
  • binary_operation: Binary combining function object that will be applied in unspecified order. The behaviour is undefined if binary_operation modifies any element.
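The exclusive formula can be checked against a serial host-side reference. This sketch assumes a 32-lane warp and takes an explicit `identity` argument for lane 0, whose sum in the formula is empty; that parameter is an assumption of the sketch (the documented signature has none, and the source does not state what lane 0 receives). `warp_exclusive_scan_model` is a hypothetical name.

```cpp
#include <array>
#include <cstddef>

// Host-side model of a warp: one array slot per lane (32 lanes assumed).
constexpr std::size_t warp_size = 32;

// Serial reference for an exclusive scan: lane i receives the combination of
// x_0 ... x_(i-1); lane 0 receives the identity (assumption of this sketch).
template <typename T, typename BinaryOp>
std::array<T, warp_size> warp_exclusive_scan_model(const std::array<T, warp_size>& lanes,
                                                   T identity, BinaryOp op)
{
    std::array<T, warp_size> result{};
    T running = identity;
    for (std::size_t lane = 0; lane < warp_size; ++lane) {
        result[lane] = running;               // everything strictly before this lane
        running = op(running, lanes[lane]);   // fold this lane in for the next one
    }
    return result;
}
```

For a sum over the values 1 to 32 with identity 0, lane 0 of the model holds 0 and lane 31 holds 496.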

Compact

template<typename data_type>
class culib::warp::compact : public culib::warp::scan<int>

Public Functions

template<typename filter_operation>
__device__ int operator()(data_type val, const filter_operation &filter_op)

Note

Returns -1 for filtered-out elements.

Template Parameters
  • filter_operation: Filter function object type

Parameters
  • val: Thread-local value

  • filter_op: Filter function object applied to val
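Since compact derives from scan<int>, one plausible reading is the classic stream-compaction scheme: an exclusive scan over 0/1 keep flags gives each surviving element its position in the compacted output. The host-side sketch below models that scheme under stated assumptions: a 32-lane warp, a predicate that returns true for elements to keep, and a return value that is the destination index. None of these are confirmed by the source beyond the -1 result for filtered-out elements, and `warp_compact_model` is a hypothetical name.

```cpp
#include <array>
#include <cstddef>

// Host-side model of a warp: one array slot per lane (32 lanes assumed).
constexpr std::size_t warp_size = 32;

// Stream compaction model. Assumption: keep(value) == true means the element
// survives. Each surviving lane receives the number of surviving lanes before
// it (an exclusive scan of the keep flags); filtered-out lanes receive -1.
template <typename T, typename Predicate>
std::array<int, warp_size> warp_compact_model(const std::array<T, warp_size>& lanes,
                                              Predicate keep)
{
    std::array<int, warp_size> destination{};
    int kept_so_far = 0;  // running exclusive sum of the 0/1 keep flags
    for (std::size_t lane = 0; lane < warp_size; ++lane) {
        if (keep(lanes[lane])) {
            destination[lane] = kept_so_far;
            ++kept_so_far;
        } else {
            destination[lane] = -1;
        }
    }
    return destination;
}
```

In this model, a lane with a non-negative result would write its value to that index of the output buffer, producing a dense array of the surviving elements in lane order.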

Utils

template<typename data_type>
__device__ constexpr bool culib::warp::is_shuffle_available()

Checks whether the specified data_type is supported by the warp shuffle functions.
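A compile-time check of this kind can be sketched as a plain type trait. The version below is a host-side illustration only: it assumes "supported" means one of the built-in types for which CUDA provides `__shfl_sync` overloads (int, unsigned int, long, unsigned long, long long, unsigned long long, float, double) and leaves out the half-precision types that require extra CUDA headers. The list actually accepted by the library is not given in the source, and `is_shuffle_type_model` is a hypothetical name.

```cpp
#include <type_traits>

// Hypothetical stand-in for a shuffle-availability check (not the library's code).
// Returns true for the built-in types that CUDA's __shfl_sync overloads accept;
// cv-qualifiers are ignored.
template <typename T>
constexpr bool is_shuffle_type_model()
{
    using U = typename std::remove_cv<T>::type;
    return std::is_same<U, int>::value
        || std::is_same<U, unsigned int>::value
        || std::is_same<U, long>::value
        || std::is_same<U, unsigned long>::value
        || std::is_same<U, long long>::value
        || std::is_same<U, unsigned long long>::value
        || std::is_same<U, float>::value
        || std::is_same<U, double>::value;
}
```

Because the function is constexpr, its result can select an implementation at compile time, for example to choose between a shuffle-based policy and a shared-memory policy.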