index > /home/xinniw/Documents/garden/SIMD vectorization.md

Single Instruction Multiple Data

:cc0:

SIMD or "Single Instruction Multiple Data" is an optimization technique that takes advantage of extensions to the instruction set offered by some processors (ex. SSE and AVX on x86, NEON on ARM). These extended instructions let a single instruction operate on several numbers at once, for example adding four pairs of 32-bit floats packed into 128-bit registers.

This has been an important tool in optimizing real-time algorithms for graphics and audio, among other applications.

:TODO: document the things I learn about SIMD here

techniques for utilizing SIMD

use intrinsics

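Compilers provide intrinsics: functions (usually declared in headers such as <immintrin.h> on x86 or <arm_neon.h> on ARM) that map more or less directly onto SIMD instructions and let you manually instruct the compiler to load, shuffle, and store data in vector registers. This lets you force the compiler to use SIMD for particular operations instead of relying on it to spot the pattern itself.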
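Here is a minimal sketch of what intrinsics code looks like, assuming an x86 target with SSE. The `__SSE__` macro is what GCC and Clang define on such targets; anywhere else the code just falls back to the plain loop. The function name `add_buffers` is hypothetical, chosen for illustration.

```cpp
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, ...
#endif

// Hypothetical example: out[i] = a[i] + b[i] for n floats.
void add_buffers(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE__)
    // Main loop: four floats per iteration, held in one 128-bit register.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // load a[i..i+3] (no alignment needed)
        __m128 vb = _mm_loadu_ps(b + i);  // load b[i..i+3]
        __m128 sum = _mm_add_ps(va, vb);  // one instruction, four additions
        _mm_storeu_ps(out + i, sum);      // store the four results back
    }
#endif
    // Scalar tail for leftover elements (the whole loop on non-SSE targets).
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The scalar tail is needed because n is not always a multiple of the register width; handling that remainder by hand is part of the cost of writing intrinsics directly.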

auto-vectorization

Provided that the compiler is set to optimize for speed (e.g. the flag -O3 is set for gcc), the compiler will recognize certain patterns and may introduce SIMD vectorization as it compiles. This typically happens in for loops. The compiler is sensitive to the exact pattern in the code, and even slight deviations may prevent vectorization.
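As a minimal sketch of the kind of loop an auto-vectorizer looks for, here is a hypothetical 16-bit PCM to float conversion: a counted loop with a trip count known before it starts, no early exits, and no dependency between iterations. The function name is made up for illustration, and whether a given compiler actually vectorizes it depends on version and target flags.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example: converts 16-bit PCM samples to floats in [-1, 1).
// To see what GCC did with it, something like
//   g++ -O3 -fopt-info-vec-optimized -c convert.cpp
// prints a note for each loop it vectorized (the wording varies by version).
void pcm16_to_float(const std::int16_t* in, float* out, std::size_t n) {
    constexpr float kScale = 1.0f / 32768.0f;  // loop-invariant constant
    for (std::size_t i = 0; i < n; ++i) {
        // Each iteration reads one input and writes one output, nothing else.
        out[i] = static_cast<float>(in[i]) * kScale;
    }
}
```

Swapping -fopt-info-vec-optimized for -fopt-info-vec-missed asks GCC to report the loops it could not vectorize instead, which is useful when a small change in a loop suddenly makes it slower.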

I have found it useful to explore subsections of my code with tools like Compiler Explorer, looking at the resulting assembly to see what the compiler did. For C and C++, the GCC project has a page documenting which loop patterns its auto-vectorizer can handle.

Here are some of the ingredients of vectorizable patterns that I am aware of so far:

1. Looping over a statically sized container

This doesn't necessarily result in vectorization on its own, but depending on the size of the container it can result in loop unrolling. Unrolled code is often faster than a loop because it skips the counter updates, comparisons, and branches that the loop's control structure adds to the assembly.
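A minimal sketch of a loop over a statically sized container, using std::array so the trip count is a compile-time constant. The four-band gain stage, `kBands`, and `apply_band_gains` are hypothetical names for illustration; whether the optimizer fully unrolls this, and whether it also turns the multiplies into one SIMD multiply, is up to the compiler.

```cpp
#include <array>
#include <cstddef>

// Hypothetical band count; being constexpr, the loop trip count is known
// at compile time, which is what makes complete unrolling possible.
constexpr std::size_t kBands = 4;

// Multiplies each band level by its gain.
std::array<float, kBands> apply_band_gains(const std::array<float, kBands>& bands,
                                           const std::array<float, kBands>& gains) {
    std::array<float, kBands> out{};
    for (std::size_t i = 0; i < kBands; ++i) {
        out[i] = bands[i] * gains[i];  // no counter/branch needed once unrolled
    }
    return out;
}
```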

2. Binary or unary operations with constant values in a loop

If every element of a container is operated on by a unary operation, or by a binary operation whose other operand is constant with respect to the loop, the compiler will likely be able to use SIMD to act on multiple array elements at the same time.
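A small sketch covering both cases in one loop: std::fabs is a unary operation applied to every element, and the multiply is a binary operation whose other operand, `gain`, never changes inside the loop. The rectifier and the function name are hypothetical, picked to give the pattern a DSP flavour.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical full-wave rectifier followed by a gain.
void rectify_and_scale(const float* in, float* out, std::size_t n, float gain) {
    for (std::size_t i = 0; i < n; ++i) {
        // Unary op (fabs) on each element, then a binary op with a
        // loop-invariant operand (gain).
        out[i] = std::fabs(in[i]) * gain;
    }
}
```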

3. Binary or unary operations with other vectors in a loop

If every element of a container is operated on by some equation and the other operands are also vectors of the same size iterated over in step, the compiler is often able to use SIMD to process several elements of the vectors together.
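A minimal sketch with two input vectors walked in lockstep, here as ring modulation (multiplying two signals sample by sample). One thing that can get in the way when the vectors arrive as raw pointers is aliasing: the compiler has to assume `out` might overlap `a` or `b`, so it may add a run-time overlap check or skip vectorization. `__restrict` is a non-standard extension (supported by GCC, Clang, and MSVC) promising they don't overlap; the function name is hypothetical.

```cpp
#include <cstddef>

// Hypothetical ring modulator: out[i] = a[i] * b[i].
// __restrict tells the compiler the three buffers never overlap, so writes
// to out cannot change later reads of a or b.
void ring_modulate(const float* __restrict a, const float* __restrict b,
                   float* __restrict out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] * b[i];  // both operands indexed by the same i
    }
}
```

Note that __restrict is a promise, not a check: calling this with overlapping buffers is undefined behaviour.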

4. No mutating operands in the loop

If the loop mutates state that the next iteration depends on, it cannot be vectorized, because each iteration has to wait for the result of the previous one. For some DSP applications (ex. IIR filters), this feels unavoidable.
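To make the difference concrete, here is a sketch contrasting a one-pole lowpass (an IIR filter, where each output needs the previous output) with a two-tap feedforward FIR filter (which only reads inputs). Both filters and their function names are hypothetical examples; the point is only where the dependency between iterations lives.

```cpp
#include <cstddef>

// One-pole lowpass (IIR): y[n] = y[n-1] + a * (x[n] - y[n-1]).
// `state` carries y[n-1] from one iteration to the next, so iteration i
// cannot start until iteration i-1 is done: a loop-carried dependency.
float one_pole_lowpass(const float* x, float* y, std::size_t n, float a, float state) {
    for (std::size_t i = 0; i < n; ++i) {
        state += a * (x[i] - state);  // needs the previous iteration's result
        y[i] = state;
    }
    return state;  // keep this to continue filtering in the next block
}

// Two-tap FIR (feedforward): y[n] = b0 * x[n] + b1 * x[n-1].
// It reads only inputs, never earlier outputs, so iterations are
// independent and the loop is a candidate for SIMD.
// `prev` is x[-1], the last input sample of the previous block.
void two_tap_fir(const float* x, float* y, std::size_t n, float b0, float b1, float prev) {
    if (n == 0) return;
    y[0] = b0 * x[0] + b1 * prev;
    for (std::size_t i = 1; i < n; ++i) {
        y[i] = b0 * x[i] + b1 * x[i - 1];
    }
}
```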

Opportunities to vectorize DSP code

operations on blocks of samples

If your algorithm is capable of operating on vectors or blocks of samples, the compiler may be able to vectorize operations over the block using SIMD (ex. multiplying 2 blocks together, multiplying a block by a constant, etc.).
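One block-level trick, sketched here under assumptions: a gain ramp (ex. smoothing a volume change) written per sample as `gain += step` makes every sample depend on the one before it. Working on a whole block at once, the gain can instead be computed from the sample index, so each sample is independent. The function name is hypothetical, and the results can differ from the accumulating version in the last bits because of float rounding.

```cpp
// Hypothetical example: applies a linear gain ramp across one block.
// The gain moves from start_gain toward end_gain and would reach end_gain
// at the first sample of the next block.
void apply_gain_ramp(float* block, int n, float start_gain, float end_gain) {
    if (n <= 0) return;
    const float step = (end_gain - start_gain) / static_cast<float>(n);
    for (int i = 0; i < n; ++i) {
        // Gain comes from the index i, not from the previous sample,
        // so no iteration depends on another.
        block[i] *= start_gain + step * static_cast<float>(i);
    }
}
```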

operations over multiple voices

If the algorithm has multiple parallel components with the same structure (ex. a synth voice with a unison section, an oscillator bank, a bank of parallel filters), the compiler may be able to use SIMD to run iterations of the loop in parallel, since each iteration is completely independent of the others.
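This is also one way around the IIR problem: each filter is still recursive in time, but at a given sample the voices don't depend on each other, so the loop can run across voices instead of across time. The sketch below assumes a bank of one-pole lowpass filters; `OnePoleBank` and `kVoices` are hypothetical names. Keeping each field in its own array (structure of arrays) puts neighbouring voices next to each other in memory, which is the layout SIMD loads want.

```cpp
#include <array>
#include <cstddef>

// Hypothetical voice count for illustration.
constexpr std::size_t kVoices = 8;

// A bank of one-pole lowpass filters, one per voice, stored as
// structure of arrays: all coefficients together, all states together.
struct OnePoleBank {
    std::array<float, kVoices> coeff{};  // per-voice smoothing coefficient
    std::array<float, kVoices> state{};  // per-voice previous output y[n-1]

    // Advances every voice by one sample: in[v] -> out[v].
    void process(const std::array<float, kVoices>& in, std::array<float, kVoices>& out) {
        for (std::size_t v = 0; v < kVoices; ++v) {
            // Voice v only touches its own state, so voices are independent.
            state[v] += coeff[v] * (in[v] - state[v]);
            out[v] = state[v];
        }
    }
};
```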

