Skip to content

bitPackRadixAccessWGSL

Overview

Accesses a count from within a bit-packed histogram. This is used for memory-efficient in-workgroup-memory sorting.

Main documentation for the bit-packing pattern and rationale:

Naively, we can store counts in a vec4 (similar to https://betterprogramming.pub/memory-bandwidth-optimized-parallel-radix-sort-in-metal-for-apple-m1-and-beyond-4f4590cfd5d3). However, we're using a u32 for each count, and it's taking up a lot of workgroup memory to store these counts (16 bytes per thread). If our count only takes N bits, we can store up to floor( 32 / N ) counts in a single u32, and with a vector size M (M=1 is u32, M=2 is vec2u, M=3 is vec3u, M=4 is vec4u), we can thus store up to M * floor( 32 / N ) counts efficiently. NOTE that it might be possible to pack things even tighter, but it would involve a single count being spread across different vector components, so we skip that.

E.g. if we have a simple 2-bit sort with workgroupSize=256 (8 bit counts), we can pack all of these into a single u32, e.g. count0 | ( count1 << 8 ) | ( count2 << 16 ) | ( count3 << 24 )

E.g. if we have a 3-bit sort (8 bins) with a possible count of 1024 (10 bit counts), we'll have: vec3( count0 | ( count1 << 10 ) | ( count2 << 20 ), count3 | ( count4 << 10 ) | ( count5 << 20 ), count6 | ( count7 << 10 ) )

TODO: We could actually have the bitVector workgroup variable be shorter(!) since a lot of the time we don't need TODO: that much memory. ACTUALLY this depends on... the length? Likely unless we know our data is shorter, this is TODO: NOT the case

@author Jonathan Olson <jonathan.olson@colorado.edu>

Type bitPackRadixAccessWGSLOptions

import type { bitPackRadixAccessWGSLOptions } from 'scenerystack/alpenglow';
  • bits: WGSLExpressionU32
    u32 name
  • bitVector: WGSLVariableName
    (u32/vec2u/vec3u/vec4u) name
  • bitsPerInnerPass: number
    TODO: once this is working, form it into a larger object of settings for Radix handling e.g. 2 for a two-bit sort
  • bitVectorSize: number
    (½/¾) for (u32/vec2u/vec3u/vec4u) e.g. 4 for a vec4u
  • maxCount: number
    the maximum count in the histogram

Source Code

See the source for bitPackRadixAccessWGSL.ts in the alpenglow repository.