hush
d1
$ note
((scaleP scalePattern
$ off 4 ((+ 2 ).slow 2)
$ off 1 (inversion.slow 2)
$ off 3 (inversion.slow 3)
$ off 1.5 ((+ 2).rev.slow 2)
$ generateMelodicSeed
))#s "[pe-gtr:10,midi]" #gain 1 #orbit 0 #midichan 1
inversion = (* (-1))
d3
$ note
((scaleP scalePattern
$ (rotR 4)
$ (+ slow 8 "x" <~> ((0.25 ~>) generateMelodicSeed))
-- $ slow 4
$ generateMelodicSeed
))#s "[pe-gtr:8,midi]" #gain 1.2 #orbit 2 #midichan 3
Single Instruction Multiple Data
:cc0:
SIMD or "Single Instruction Multiple Data" is an optimization technique that takes advantage of extensions to the instruction set offered by some processors. These extended instructions allow a single instruction to operate on multiple numbers at once.
This has been an important tool in the optimization of real-time algorithms for graphics and audio, among other applications.
:TODO: document the things I learn about SIMD here
techniques for utilizing SIMD
use intrinsics
There are intrinsics headers and libraries available for manually instructing the compiler to load, shuffle, and store data in SIMD registers. This lets you force the compiler to utilize SIMD for particular operations.
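As a rough sketch of what that looks like in practice (the function name, the use of SSE, and the assumption that the length is a multiple of 4 are my own illustration, not from any particular library):

#include <immintrin.h>

// Multiply two float buffers element-wise, 4 floats per instruction (SSE).
// Assumes n is a multiple of 4; a real version would handle the remainder.
void multiply_buffers(const float* a, const float* b, float* out, int n)
{
    for (int i = 0; i < n; i += 4)
    {
        __m128 va = _mm_loadu_ps(a + i);  // load 4 floats from a
        __m128 vb = _mm_loadu_ps(b + i);  // load 4 floats from b
        __m128 vr = _mm_mul_ps(va, vb);   // multiply all 4 lanes at once
        _mm_storeu_ps(out + i, vr);       // store 4 results
    }
}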
auto-vectorization
Provided that the compiler is set to optimize for speed (e.g. the -O3 flag is set for gcc), it will recognize certain patterns and may introduce SIMD vectorization as it compiles. This typically occurs when compiling for loops. The compiler is sensitive to the exact pattern present in the code, and even slight deviations may prevent vectorization.
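For example, a loop shaped like the one below (the names are mine, just for illustration) is the kind of thing gcc will typically turn into packed SIMD instructions when compiled with something like g++ -O3 -march=native:

// Same operation on every element, constant operand, no cross-iteration dependency.
void apply_gain(float* buffer, int n, float gain)
{
    for (int i = 0; i < n; ++i)
        buffer[i] *= gain;
}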
I have found it useful to explore subsections of my code using tools like Compiler Explorer to look at the resulting assembly and see what the compiler did. For C++ and C, the gcc project has a page on what the compiler is able to vectorize.
Here are some of the components of the patterns I am aware of so far (a combined sketch of these shapes follows the list):
1. Looping over a statically sized container
This doesn't necessarily result in vectorization on its own, but it can result in loop unrolling depending on the size of the container. Unrolled loops are significantly faster than ones that keep their control flow in the assembly.
2. Binary or unary operations with constant values in a loop
If every element of a container is operated on by a binary or unary op, and the other operand is constant with respect to the loop, the compiler will likely be able to use SIMD to act on multiple array elements at the same time.
3. Binary or unary operations with other vectors in a loop
If every element of a container is operated on by some equation, and the other operands are also same-sized vectors iterated over in lockstep, the compiler is often able to leverage SIMD to act on multiple elements of the vectors at once.
4. No mutating operands in the loop
If the loop mutates state that it also depends on for its equation or set of operations, it cannot be vectorized, as each iteration depends on the previous iteration's end state. For some DSP applications (e.g. IIR filters), this feels unavoidable.
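Here is a combined sketch of those shapes (all names and sizes are illustrative, not from a real codebase):

#include <array>
#include <cstddef>

// (1) + (2): statically sized container, constant operand.
// Likely unrolled and/or vectorized at -O3.
void scale_block(std::array<float, 64>& block, float gain)
{
    for (auto& x : block)
        x *= gain;
}

// (3): element-wise combination of two same-sized vectors iterated in lockstep.
void mix(const std::array<float, 64>& a, const std::array<float, 64>& b,
         std::array<float, 64>& out)
{
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = a[i] + b[i];
}

// (4): a one-pole IIR filter. Each output depends on the previous output,
// so the compiler cannot vectorize across samples.
void one_pole(float* buffer, int n, float a, float& state)
{
    for (int i = 0; i < n; ++i)
    {
        state = state + a * (buffer[i] - state);
        buffer[i] = state;
    }
}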
Opportunities to vectorize DSP code
operations on blocks of samples
If your algorithm is capable of operating on vectors or blocks of samples, the compiler may be able to vectorize the operation over the block of samples using SIMD (e.g. multiplying two blocks together, multiplying a block by a constant, etc.).
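A sketch of what I mean (the names are mine): applying an envelope block to an audio block is a loop where every sample is independent, so the compiler can process several samples per instruction.

// Multiply an audio block by an envelope block, element-wise.
void apply_envelope(float* audio, const float* envelope, int blockSize)
{
    for (int i = 0; i < blockSize; ++i)
        audio[i] *= envelope[i];
}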
operations over multiple voices
If the algorithm has multiple parallel components with the same structure (e.g. a synth voice with a unison section, an oscillator bank, a bank of parallel filters), the compiler may be able to use SIMD to run the loop iterations over those components in parallel, since each iteration is completely independent of the others.
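A sketch of that idea (the voice count and layout are assumptions I made up): a bank of one-pole filters where each voice keeps its own state. The filter is recursive in time, so it can't be vectorized across samples, but nothing stops the compiler from vectorizing across voices.

constexpr int numVoices = 8;

struct FilterBank
{
    float state[numVoices] = {};
    float coeff[numVoices] = {};

    // Process one sample for every voice; in and out each hold numVoices samples.
    // No voice reads another voice's state, so the loop iterations are independent.
    void process(const float* in, float* out)
    {
        for (int v = 0; v < numVoices; ++v)
        {
            state[v] = state[v] + coeff[v] * (in[v] - state[v]);
            out[v] = state[v];
        }
    }
};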