Warp level functions
Reduce
template<typename data_type, typename reduce_policy>
class culib::warp::reduce : public reduce_policy
A reduction uses a binary combining operator to compute a single aggregate from a list of input elements:
\[y = \oplus \left( \cdots \oplus \left( \oplus \left( x_{1}, x_{2} \right), x_{3} \right), \cdots, x_{n} \right)\]
Every thread in the warp uses a thread-local object of this class specialization. Implementation details are gathered in reduce_policy. If data_type is in the list of supported types, the shuffle instruction is used; otherwise shared memory is needed.
- Template Parameters
data_type: The reduction input/output element type.
reduce_policy: Type-dependent reduce implementation.
Public Functions
template<typename binary_operation = binary_op::sum<data_type>>
__device__ data_type operator()(data_type val, binary_operation binary_op = {})
Performs a warp-wide all-reduce in the calling warp. The output is valid in each lane of the warp. The number of entrant threads must be equal to warpSize.
- Return
Warp-wide result of reduction
- Template Parameters
binary_operation: Binary combining function object type that will be applied in unspecified order. The behaviour is undefined if binary_operation modifies any element.
- Parameters
val: Thread-local value
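The all-reduce semantics described above (every lane ends up with the same aggregate) can be illustrated with a host-side model. The sketch below is not the library's implementation: `model_warp_size` and `warp_all_reduce_model` are hypothetical names, and the butterfly (XOR) exchange pattern is only one plausible way a shuffle-based policy could combine lanes. Because lanes combine values in different orders, the operator is assumed to be associative and commutative.

```cpp
#include <array>
#include <cstddef>
#include <functional>

// Hypothetical host-side model: one array slot stands in for each warp lane.
constexpr std::size_t model_warp_size = 32;

// Emulates a butterfly (XOR) all-reduce. In each step every lane combines its
// value with the value held by the lane whose index differs in exactly one bit.
// Assumes op is associative and commutative.
template <typename T, typename BinaryOp = std::plus<T>>
std::array<T, model_warp_size> warp_all_reduce_model(
    std::array<T, model_warp_size> lanes, BinaryOp op = {}) {
    for (std::size_t offset = model_warp_size / 2; offset > 0; offset /= 2) {
        // Snapshot of the values before this step, as all lanes exchange at once.
        const std::array<T, model_warp_size> previous = lanes;
        for (std::size_t lane = 0; lane < model_warp_size; ++lane) {
            lanes[lane] = op(previous[lane], previous[lane ^ offset]);
        }
    }
    return lanes;  // every lane now holds the same aggregate
}
```

After five exchange steps (offsets 16, 8, 4, 2, 1) each slot holds the combination of all 32 inputs, which mirrors the statement that the output is valid in each lane.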
Scan
template<typename data_type, typename scan_policy>
class culib::warp::scan : public scan_policy
Class for parallel scan within a warp.
A scan uses a binary combining operator to compute, for every element of an array, the aggregate of that element's prefix. The number of entrant threads must be equal to warpSize. The default binary combining operator is the sum.
- Template Parameters
data_type: Scanned data type.
scan_policy: Policy for data exchange between warp threads.
Public Functions
template<typename binary_operation>
__device__ data_type inclusive(data_type val, binary_operation binary_op = {})
\[y_{i} = \bigoplus_{j=0}^{i} x_{j}\]
- Return
Value that would be in the lane-id element of the warp array after the scan.
- Parameters
val: Warp local value
- Template Parameters
binary_operation: Binary combining function object that will be applied in unspecified order. The behaviour is undefined if binary_operation modifies any element.
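As an illustration of the inclusive formula, the following host-side sketch models the warp as a 32-element array and applies a doubling-offset (Kogge-Stone style) scan. The names `inclusive_model_width` and `inclusive_scan_model` are hypothetical, and the exchange pattern is an assumption about how a scan policy might work, not a description of the library's code.

```cpp
#include <array>
#include <cstddef>
#include <functional>

// Hypothetical host-side model: one array slot stands in for each warp lane.
constexpr std::size_t inclusive_model_width = 32;

// Inclusive scan with doubling offsets: after the step with a given offset,
// each lane holds the combination of up to 2 * offset consecutive inputs
// ending at that lane. Assumes op is associative.
template <typename T, typename BinaryOp = std::plus<T>>
std::array<T, inclusive_model_width> inclusive_scan_model(
    std::array<T, inclusive_model_width> lanes, BinaryOp op = {}) {
    for (std::size_t offset = 1; offset < inclusive_model_width; offset *= 2) {
        // Snapshot of the values before this step, as all lanes exchange at once.
        const std::array<T, inclusive_model_width> previous = lanes;
        for (std::size_t lane = offset; lane < inclusive_model_width; ++lane) {
            lanes[lane] = op(previous[lane - offset], previous[lane]);
        }
    }
    return lanes;  // lanes[i] combines inputs 0..i
}
```

With all inputs equal to 1 and the default sum, lane i receives i + 1, matching y_i as the combination of x_0 through x_i.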
template<typename binary_operation = binary_op::sum<data_type>>
__device__ data_type exclusive(data_type val, binary_operation binary_op = {})
\[y_{i} = \bigoplus_{j=0}^{i - 1} x_{j}\]
Usage example:
    culib::warp::scan<data_type> scan;
    const data_type val = scan.exclusive(thread_local_value);
- Return
Value that would be in the lane-id element of the warp array after the scan.
- Parameters
val: Warp local value
- Template Parameters
binary_operation: Binary combining function object that will be applied in unspecified order. The behaviour is undefined if binary_operation modifies any element.
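The exclusive formula differs from the inclusive one only in that lane i excludes its own input. A serial host-side reference makes this explicit. The names `exclusive_model_width` and `exclusive_scan_model` are hypothetical. The value in lane 0 (an empty combination in the formula) is assumed here to be an identity element supplied by the caller; the documentation above does not say what the library returns in that lane.

```cpp
#include <array>
#include <cstddef>
#include <functional>

// Hypothetical host-side model: one array slot stands in for each warp lane.
constexpr std::size_t exclusive_model_width = 32;

// Serial reference for an exclusive scan: output[i] combines inputs 0..i-1.
// ASSUMPTION: lane 0 receives `identity`, since its prefix is empty.
template <typename T, typename BinaryOp = std::plus<T>>
std::array<T, exclusive_model_width> exclusive_scan_model(
    const std::array<T, exclusive_model_width>& input,
    T identity = T{}, BinaryOp op = {}) {
    std::array<T, exclusive_model_width> output{};
    T running = identity;  // combination of all inputs seen so far
    for (std::size_t lane = 0; lane < exclusive_model_width; ++lane) {
        output[lane] = running;                 // excludes this lane's own input
        running = op(running, input[lane]);     // include it for the next lane
    }
    return output;
}
```

A common use of the exclusive sum is turning per-lane counts into per-lane write offsets, since lane i receives the total of all counts before it.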
Compact
template<typename data_type>
class culib::warp::compact : public culib::warp::scan<int>
Public Functions
template<typename filter_operation>
__device__ int operator()(data_type val, const filter_operation &filter_op)
Note
Return -1 for filtered-out elements.
- Template Parameters
filter_operation:
- Parameters
val:
filter_op:
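The documentation above states only that -1 is returned for filtered-out elements and that the class derives from scan<int>. One plausible reading, sketched below on the host, is that each kept element receives its position in the compacted output, obtained from an exclusive prefix sum over the keep flags. Everything here is an assumption for illustration: the names `compact_model_width` and `compact_indices_model` are hypothetical, and the predicate is assumed to return true for elements that should be kept.

```cpp
#include <array>
#include <cstddef>

// Hypothetical host-side model: one array slot stands in for each warp lane.
constexpr std::size_t compact_model_width = 32;

// For each lane, returns the destination index of its element in the compacted
// output, or -1 if the element is filtered out.
// ASSUMPTION: keep(value) == true means the element survives.
template <typename T, typename Predicate>
std::array<int, compact_model_width> compact_indices_model(
    const std::array<T, compact_model_width>& lanes, Predicate keep) {
    std::array<int, compact_model_width> destination{};
    int kept_so_far = 0;  // exclusive prefix sum of the keep flags
    for (std::size_t lane = 0; lane < compact_model_width; ++lane) {
        if (keep(lanes[lane])) {
            destination[lane] = kept_so_far;  // next free slot in the output
            ++kept_so_far;
        } else {
            destination[lane] = -1;           // filtered-out element
        }
    }
    return destination;
}
```

Under this reading, a kernel could write `val` to `output[index]` whenever the returned index is not -1, producing a densely packed result.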
-
template<typename