
mergeWGSL

Overview

A template that merges two sorted arrays into a single sorted array.

This version uses block-level loading (for memory coalescing) and circular buffers, as described in "Programming Massively Parallel Processors" by Hwu, Kirk and El Hajj.
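
As a point of comparison, the result the template is expected to produce can be described with a plain sequential merge on the CPU. The sketch below is not the shader and not part of alpenglow: `mergeSorted`, `Compare`, and `compareNumbers` are hypothetical names, and the comparator here receives values, whereas the `compare` option documented below receives indices and returns a WGSL expression.

```typescript
// Hypothetical comparator type: returns -1, 0, or 1, loosely mirroring the
// contract of the `compare` option (which works on indices, not values).
type Compare<T> = (a: T, b: T) => -1 | 0 | 1;

// Sequential reference merge of two already-sorted arrays.
function mergeSorted<T>(a: readonly T[], b: readonly T[], compare: Compare<T>): T[] {
  const out: T[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    // On ties, take from `a` first so equal elements keep an a-before-b order.
    if (compare(a[i], b[j]) <= 0) {
      out.push(a[i++]);
    } else {
      out.push(b[j++]);
    }
  }
  // One of the inputs is exhausted; copy whatever remains of the other.
  while (i < a.length) out.push(a[i++]);
  while (j < b.length) out.push(b[j++]);
  return out;
}

const compareNumbers: Compare<number> = (x, y) => (x < y ? -1 : x > y ? 1 : 0);
const merged = mergeSorted([1, 4, 9], [2, 4, 10, 11], compareNumbers);
```

A reference like this is mainly useful for checking GPU output in tests: run the shader, read back the buffer, and compare it element by element against the sequential result.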

@author Jonathan Olson <jonathan.olson@colorado.edu>

Type mergeWGSLOptions

import type { mergeWGSLOptions } from 'scenerystack/alpenglow';
  • lengthA: WGSLExpressionU32
  • lengthB: WGSLExpressionU32
  • compare: ( indexA: WGSLExpressionU32, indexB: WGSLExpressionU32 ) => WGSLExpressionI32
    returns -1, 0, or 1 as an i32
  • greaterThan?: ( ( indexA: WGSLExpressionU32, indexB: WGSLExpressionU32 ) => WGSLExpressionBool ) | null
    used (sometimes) instead of compare if provided
  • lessThanOrEqual?: ( ( indexA: WGSLExpressionU32, indexB: WGSLExpressionU32 ) => WGSLExpressionBool ) | null
  • workgroupA: WGSLVariableName
    var<workgroup> array<T,sharedMemorySize>
  • workgroupB: WGSLVariableName
  • loadFromA: ( indexA: WGSLExpressionU32 ) => WGSLExpressionT
  • loadFromB: ( indexB: WGSLExpressionU32 ) => WGSLExpressionT
  • storeOutput: ( indexOutput: WGSLExpressionU32, value: WGSLExpressionT ) => WGSLStatements
    TODO: We should provide either storeOutput OR setFromA/setFromB. In one case, we set from our shared memory, but in the other case, it is a global memory (say that we're sorting objects that are much larger?). Would that ALWAYS have worse memory performance? I mean, we're dealing with "global" indices anyway, so it isn't a huge lift.
    For more clarity, if setFromA/setFromB are provided (AND we don't have storeOutput), we'll use those to directly move things from global memory to global memory. This WILL require more reads, HOWEVER it will also enable us to have loadFromX methods return a much smaller object used in shared memory. It is unclear how much of a performance win this would be, so I haven't implemented it yet.
    Proposed signatures:
      setFromA: ( indexOutput, indexA ) => void
      setFromB: ( indexOutput, indexB ) => void
  • blockOutputSize: number
  • sharedMemorySize: number
    should be a divisor of blockOutputSize, and ideally a multiple of workgroupSize
  • atomicConsumed?: boolean
    controls whether we use atomics to track consumed_a/consumed_b, OR whether we compute another corank
  • & WorkgroupSizable
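
The "corank" mentioned under atomicConsumed refers to the co-rank search from the parallel merge chapter of the book cited in the overview: for an output position k, it finds how many of the first k merged elements come from A. The following is a minimal CPU-side sketch of that idea in TypeScript, assuming plain number arrays. It is not the WGSL this template generates, and `coRank` and `mergeByBlocks` are hypothetical names. The two comparisons it uses (a strict greater-than and a less-than-or-equal) look like the kind of predicates the greaterThan and lessThanOrEqual options could supply, though that mapping is an assumption.

```typescript
// Co-rank: the number of elements taken from `a` among the first k outputs
// of merging sorted arrays `a` and `b` (ties resolved in favour of `a`).
function coRank(k: number, a: readonly number[], b: readonly number[]): number {
  const m = a.length;
  const n = b.length;
  let i = Math.min(k, m); // candidate count from a
  let j = k - i;          // candidate count from b
  let iLow = Math.max(0, k - n);
  let jLow = Math.max(0, k - m);
  while (true) {
    if (i > 0 && j < n && a[i - 1] > b[j]) {
      // Too many elements taken from a: move the split toward b.
      const delta = (i - iLow + 1) >> 1;
      jLow = j;
      i -= delta;
      j += delta;
    } else if (j > 0 && i < m && b[j - 1] >= a[i]) {
      // Too many elements taken from b: move the split toward a.
      const delta = (j - jLow + 1) >> 1;
      iLow = i;
      i += delta;
      j -= delta;
    } else {
      return i;
    }
  }
}

// Hypothetical driver: each "block" produces blockOutputSize outputs and uses
// co-rank to find which slices of a and b it needs. Blocks are independent,
// which is what lets a GPU assign one workgroup per block.
function mergeByBlocks(a: readonly number[], b: readonly number[], blockOutputSize: number): number[] {
  const total = a.length + b.length;
  const out = new Array<number>(total);
  for (let start = 0; start < total; start += blockOutputSize) {
    const end = Math.min(start + blockOutputSize, total);
    let i = coRank(start, a, b);
    let j = start - i;
    const iEnd = coRank(end, a, b);
    const jEnd = end - iEnd;
    for (let k = start; k < end; k++) {
      if (i < iEnd && (j >= jEnd || a[i] <= b[j])) {
        out[k] = a[i++];
      } else {
        out[k] = b[j++];
      }
    }
  }
  return out;
}

const blockMerged = mergeByBlocks([1, 3, 5, 7, 9], [2, 4, 6, 8], 4);
```

In this sketch every block recomputes the co-rank of its end position. The atomicConsumed description suggests the template can alternatively track how much of each input was consumed with atomics instead of running another search; which is faster presumably depends on the hardware and data.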

Source Code

See the source for mergeWGSL.ts in the alpenglow repository.