mergeWGSL

Under Construction

This documentation is auto-generated, and is a work in progress. Please see the source code at https://github.com/phetsims/alpenglow/blob/main/js/webgpu/wgsl/gpu/mergeWGSL.ts for the most up-to-date information.

Overview

A template that merges together two sorted arrays into a single sorted array.

This version uses block-level loading (for memory coalescing) and circular buffers, as described in "Programming Massively Parallel Processors" by Hwu, Kirk, and El Hajj.

@author Jonathan Olson <jonathan.olson@colorado.edu>
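To make the idea of a block-wise merge concrete, here is a minimal CPU-side sketch in TypeScript. It is not the WGSL that this template generates. It follows the textbook co-rank approach, in which each output block independently finds which slices of the two inputs it consumes. The names `coRank` and `blockMerge` are hypothetical, and the sketch assumes that a positive `compare` result means the first argument sorts after the second. The shared-memory circular buffers are left out.

```typescript
// Assumed convention: negative, zero, or positive, like Array.prototype.sort.
type Compare<T> = ( a: T, b: T ) => number;

// Returns i such that the first k merged outputs consist of a[0..i) and b[0..k-i).
function coRank<T>( k: number, a: T[], b: T[], compare: Compare<T> ): number {
  let i = Math.min( k, a.length );
  let j = k - i;
  let iLow = Math.max( 0, k - b.length );
  let jLow = Math.max( 0, k - a.length );

  while ( true ) {
    if ( i > 0 && j < b.length && compare( a[ i - 1 ], b[ j ] ) > 0 ) {
      // Too many elements taken from a: shift the split toward b.
      const delta = ( i - iLow + 1 ) >> 1;
      jLow = j;
      i -= delta;
      j += delta;
    }
    else if ( j > 0 && i < a.length && compare( a[ i ], b[ j - 1 ] ) <= 0 ) {
      // Too many elements taken from b: shift the split toward a.
      const delta = ( j - jLow + 1 ) >> 1;
      iLow = i;
      i += delta;
      j -= delta;
    }
    else {
      return i;
    }
  }
}

// Merges two sorted arrays, one output block at a time. Each block could run
// independently (e.g. one workgroup per block on a GPU).
function blockMerge<T>( a: T[], b: T[], compare: Compare<T>, blockOutputSize: number ): T[] {
  const output: T[] = new Array( a.length + b.length );

  for ( let start = 0; start < output.length; start += blockOutputSize ) {
    const end = Math.min( start + blockOutputSize, output.length );

    // Input ranges consumed by this block.
    let i = coRank( start, a, b, compare );
    let j = start - i;
    const iEnd = coRank( end, a, b, compare );
    const jEnd = end - iEnd;

    // Sequential merge within the block; ties take from a first.
    for ( let k = start; k < end; k++ ) {
      if ( i < iEnd && ( j >= jEnd || compare( a[ i ], b[ j ] ) <= 0 ) ) {
        output[ k ] = a[ i++ ];
      }
      else {
        output[ k ] = b[ j++ ];
      }
    }
  }
  return output;
}
```

For example, `blockMerge( [ 1, 3, 5, 7 ], [ 2, 3, 4, 8, 9 ], ( x, y ) => Math.sign( x - y ), 4 )` processes three output blocks, and the first block consumes two elements from each input.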

Type mergeWGSLOptions

import type { mergeWGSLOptions } from 'scenerystack/alpenglow';

lengthA: WGSLExpressionU32
lengthB: WGSLExpressionU32
compare: ( indexA: WGSLExpressionU32, indexB: WGSLExpressionU32 ) => WGSLExpressionI32
Returns an i32 with value -1, 0, or 1.
greaterThan?: ( ( indexA: WGSLExpressionU32, indexB: WGSLExpressionU32 ) => WGSLExpressionBool ) | null
If provided, used (in some cases) instead of compare.
lessThanOrEqual?: ( ( indexA: WGSLExpressionU32, indexB: WGSLExpressionU32 ) => WGSLExpressionBool ) | null
workgroupA: WGSLVariableName
Name of a var<workgroup> array<T, sharedMemorySize>.
workgroupB: WGSLVariableName
loadFromA: ( indexA: WGSLExpressionU32 ) => WGSLExpressionT
loadFromB: ( indexB: WGSLExpressionU32 ) => WGSLExpressionT
storeOutput: ( indexOutput: WGSLExpressionU32, value: WGSLExpressionT ) => WGSLStatements
TODO: We should provide either storeOutput OR setFromA/setFromB. In one case, we set from our shared memory, but in the other case, it is global memory (say that we're sorting objects that are much larger?). Would that ALWAYS have worse memory performance? We're dealing with "global" indices anyway, so it isn't a huge lift.
TODO: For more clarity, if setFromA/setFromB are provided (AND we don't have storeOutput), we'll use those to directly move things from global memory to global memory. This WILL require more reads; HOWEVER, it will also enable us to have loadFromX methods return a much smaller object used in shared memory. It is unclear how much of a performance win this would be, so I haven't implemented it yet.
TODO: setFromA: ( indexOutput, indexA ) => void
TODO: setFromB: ( indexOutput, indexB ) => void
blockOutputSize: number
sharedMemorySize: number
should be a divisor of blockOutputSize, and ideally a multiple of workgroupSize
atomicConsumed?: boolean
controls whether we use atomics to track consumed_a/consumed_b, OR whether we compute another corank
& WorkgroupSizable
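As a rough illustration of how some of these options might be filled in, the sketch below builds comparison callbacks for two `array<u32>` inputs and checks the size relationship noted for sharedMemorySize. Everything in it is an assumption made for illustration: WGSL fragments are modeled as plain strings (the real `WGSLExpression*` types may differ), `a` and `b` are hypothetical binding names, -1 is assumed to mean that the element of A sorts first, and `checkMergeSizes` is a hypothetical helper that is not part of alpenglow.

```typescript
// Assumption: WGSL fragments are modeled as plain strings in this sketch.
type WGSLExpr = string;

// Hypothetical bindings: `a` and `b` are assumed to be array<u32> inputs.
// Assumed convention: -1 if a[indexA] < b[indexB], 0 if equal, 1 otherwise.
const compare = ( indexA: WGSLExpr, indexB: WGSLExpr ): WGSLExpr =>
  `select( select( 1i, 0i, a[ ${indexA} ] == b[ ${indexB} ] ), -1i, a[ ${indexA} ] < b[ ${indexB} ] )`;

// Optional direct boolean comparisons for the same ordering.
const greaterThan = ( indexA: WGSLExpr, indexB: WGSLExpr ): WGSLExpr =>
  `( a[ ${indexA} ] > b[ ${indexB} ] )`;
const lessThanOrEqual = ( indexA: WGSLExpr, indexB: WGSLExpr ): WGSLExpr =>
  `( a[ ${indexA} ] <= b[ ${indexB} ] )`;

type MergeSizes = {
  workgroupSize: number;
  blockOutputSize: number;
  sharedMemorySize: number;
};

// Hypothetical helper: reports violations of the documented size relationship.
function checkMergeSizes( sizes: MergeSizes ): { errors: string[]; warnings: string[] } {
  const errors: string[] = [];
  const warnings: string[] = [];

  // Documented as "should be a divisor of blockOutputSize".
  if ( sizes.blockOutputSize % sizes.sharedMemorySize !== 0 ) {
    errors.push( 'sharedMemorySize should be a divisor of blockOutputSize' );
  }
  // Documented as "ideally a multiple of workgroupSize", so only a warning.
  if ( sizes.sharedMemorySize % sizes.workgroupSize !== 0 ) {
    warnings.push( 'sharedMemorySize is ideally a multiple of workgroupSize' );
  }
  return { errors, warnings };
}
```

With these assumptions, `checkMergeSizes( { workgroupSize: 64, blockOutputSize: 1024, sharedMemorySize: 512 } )` reports neither errors nor warnings, while a sharedMemorySize of 32 with the same workgroup size produces only a warning.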

Source Code

See the source for mergeWGSL.ts in the alpenglow repository.