mergeWGSL
Overview
A template that merges two sorted arrays into a single sorted array.
This version uses block-level loading (for memory coalescing) and the circular buffers described in "Programming Massively Parallel Processors" by Hwu, Kirk and Hajj.
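The merge chapter of that book partitions the output with a "co-rank" function, so that each block can independently produce a contiguous slice of the output. The following CPU model in TypeScript illustrates that idea. It is a sketch under stated assumptions, not alpenglow's implementation: ties are assumed to be taken from A first, and `coRank`, `mergeByBlocks` and the index-based `aLeqB` predicate are hypothetical names. The predicate takes indices rather than values, loosely mirroring how the `compare`/`lessThanOrEqual` options below are index-based.

```typescript
// Hypothetical CPU model of co-rank output partitioning (PMPP-style); not alpenglow's code.

// Returns i such that the first k outputs of a stable merge (ties taken from A first)
// are exactly A[0..i) and B[0..k-i). aLeqB(i, j) must report whether A[i] <= B[j].
function coRank(k: number, lengthA: number, lengthB: number, aLeqB: (i: number, j: number) => boolean): number {
  let lo = Math.max(0, k - lengthB);
  let hi = Math.min(k, lengthA);
  while (lo < hi) {
    const i = (lo + hi) >>> 1;
    // If A[i] <= B[k-i-1], then A[i] precedes B[k-i-1] in the output, so the split needs more of A.
    if (aLeqB(i, k - i - 1)) lo = i + 1;
    else hi = i;
  }
  return lo;
}

// Each "block" computes its own input ranges from two co-ranks, then merges sequentially.
function mergeByBlocks<T>(a: T[], b: T[], compare: (x: T, y: T) => number, blockOutputSize: number): T[] {
  if (!Number.isInteger(blockOutputSize) || blockOutputSize < 1) throw new Error('blockOutputSize must be a positive integer');
  const total = a.length + b.length;
  const out = new Array<T>(total);
  const aLeqB = (i: number, j: number): boolean => compare(a[i], b[j]) <= 0;
  for (let start = 0; start < total; start += blockOutputSize) {
    const end = Math.min(start + blockOutputSize, total);
    let i = coRank(start, a.length, b.length, aLeqB);
    let j = start - i;
    const iEnd = coRank(end, a.length, b.length, aLeqB);
    const jEnd = end - iEnd;
    // Merge a[i..iEnd) and b[j..jEnd) into out[start..end), taking A first on ties
    for (let k = start; k < end; k++) {
      if (j >= jEnd || (i < iEnd && compare(a[i], b[j]) <= 0)) out[k] = a[i++];
      else out[k] = b[j++];
    }
  }
  return out;
}
```

For example, `mergeByBlocks([1, 3, 5, 7], [2, 3, 4, 8, 9], (x, y) => x - y, 3)` returns `[1, 2, 3, 3, 4, 5, 7, 8, 9]`. Because every block only needs its two co-ranks, blocks need no coordination with one another, which is what makes this partitioning suitable for independent GPU workgroups.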
@author Jonathan Olson <jonathan.olson@colorado.edu>
Type mergeWGSLOptions
- lengthA: WGSLExpressionU32
- lengthB: WGSLExpressionU32
- compare: ( indexA: WGSLExpressionU32, indexB: WGSLExpressionU32 ) => WGSLExpressionI32
  Evaluates to -1, 0, or 1 (as an i32).
- greaterThan?: ( ( indexA: WGSLExpressionU32, indexB: WGSLExpressionU32 ) => WGSLExpressionBool ) | null
  Used (sometimes) instead of compare, if provided.
- lessThanOrEqual?: ( ( indexA: WGSLExpressionU32, indexB: WGSLExpressionU32 ) => WGSLExpressionBool ) | null
- workgroupA: WGSLVariableName
  A var<workgroup> array<T,sharedMemorySize>.
- workgroupB: WGSLVariableName
- loadFromA: ( indexA: WGSLExpressionU32 ) => WGSLExpressionT
- loadFromB: ( indexB: WGSLExpressionU32 ) => WGSLExpressionT
- storeOutput: ( indexOutput: WGSLExpressionU32, value: WGSLExpressionT ) => WGSLStatements
  TODO: We should accept either storeOutput OR setFromA/setFromB. With storeOutput, the output is set from our shared memory; with setFromA/setFromB, it would be set from global memory instead (say, when we're sorting objects that are much larger). Would that ALWAYS have worse memory performance? We're dealing with "global" indices anyway, so it isn't a huge lift. To be precise: if setFromA/setFromB are provided (AND storeOutput is not), we'll use them to move elements directly from global memory to global memory. This WILL require more reads, HOWEVER it will also allow the loadFromX methods to return a much smaller object for use in shared memory. It is unclear how much of a performance win this would be, so I haven't implemented it yet. Proposed signatures: setFromA: ( indexOutput, indexA ) => void; setFromB: ( indexOutput, indexB ) => void.
- blockOutputSize: number
- sharedMemorySize: number
  Should be a divisor of blockOutputSize, and ideally a multiple of workgroupSize.
- atomicConsumed?: boolean
  Controls whether atomics are used to track consumed_a/consumed_b, or whether another co-rank is computed instead.
- & WorkgroupSizable (the options type is intersected with WorkgroupSizable)
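The atomicConsumed option chooses between two ways of learning how many elements of each input a tile used up. The CPU model below of one block's tiled loop with circular buffers, in the spirit of the PMPP tiled merge, shows that both give the same answer: counting the elements taken from A while merging (a stand-in for atomically accumulated counts), or recomputing a co-rank over the buffered tile. This is a hypothetical sketch, not alpenglow's WGSL: `mergeTiled`, `coRank` and `tileSize` (playing the role of sharedMemorySize) are illustrative names, and ties are assumed to be taken from A first.

```typescript
// Hypothetical CPU model of one workgroup's tiled merge loop; not alpenglow's implementation.

// Stable co-rank: how many of the first k outputs come from A, given aLeqB(i, j) = (A[i] <= B[j]).
function coRank(k: number, lengthA: number, lengthB: number, aLeqB: (i: number, j: number) => boolean): number {
  let lo = Math.max(0, k - lengthB);
  let hi = Math.min(k, lengthA);
  while (lo < hi) {
    const i = (lo + hi) >>> 1;
    if (aLeqB(i, k - i - 1)) lo = i + 1;
    else hi = i;
  }
  return lo;
}

function mergeTiled(a: number[], b: number[], tileSize: number, countConsumed: boolean): number[] {
  if (!Number.isInteger(tileSize) || tileSize < 1) throw new Error('tileSize must be a positive integer');
  const total = a.length + b.length;
  // Circular buffers: element g of an input always lives in slot g % tileSize
  const bufA = new Array<number>(tileSize);
  const bufB = new Array<number>(tileSize);
  let loadedA = 0, loadedB = 0;     // elements copied into the buffers so far
  let consumedA = 0, consumedB = 0; // elements already written to the output
  const out: number[] = [];

  while (out.length < total) {
    // Refill only the slots freed by the previous iteration (not the whole tile)
    while (loadedA < a.length && loadedA - consumedA < tileSize) { bufA[loadedA % tileSize] = a[loadedA]; loadedA++; }
    while (loadedB < b.length && loadedB - consumedB < tileSize) { bufB[loadedB % tileSize] = b[loadedB]; loadedB++; }

    const availA = loadedA - consumedA;
    const availB = loadedB - consumedB;
    const getA = (i: number): number => bufA[(consumedA + i) % tileSize];
    const getB = (j: number): number => bufB[(consumedB + j) % tileSize];
    const count = Math.min(tileSize, total - out.length);

    // Produce `count` outputs from the buffered windows (sequential here; a GPU would split this across threads)
    let i = 0;
    let j = 0;
    for (let c = 0; c < count; c++) {
      if (j >= availB || (i < availA && getA(i) <= getB(j))) out.push(getA(i++));
      else out.push(getB(j++));
    }

    // Elements of A consumed this tile: counted while merging, or recomputed as a co-rank of the tile
    const usedA = countConsumed ? i : coRank(count, availA, availB, (x, y) => getA(x) <= getB(y));
    consumedA += usedA;
    consumedB += count - usedA;
  }
  return out;
}
```

In this model each refill copies only the elements consumed in the previous iteration, which is the load saving circular buffers offer over reloading a full tile each time. The trade-off mirrored by atomicConsumed is extra synchronization on shared counters versus an extra co-rank search per tile.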
Source Code
See the source for mergeWGSL.ts in the alpenglow repository.