Simulating speakers & cabinets – part I

In this section, we’ll talk about ‘impulse responses’ as the hottest shit on the market. But we’ll also get our feet wet, wading through Physical Modelling Land, trying to understand what all this blahblah has got to do with recreating guitar tones. Roll up your sleeves…

We all know how important a good speaker and cabinet are for a guitar or bass setup. If you just took an amp and ran its output through hi-fi speakers or directly into a mixing desk (via attenuation boxes and such), it would sound plain horrible. A handful of manufacturers have ruled the market for guitar & bass speakers for decades, making them part of rock ‘n’ roll history. This is of course directly related to people crafting the ‘right’ enclosures for the ‘right’ speakers, to be played by the ‘right’ people. Some of these great-for-all-time designs came to life through happy accidents, like the famous 4×12″ cabinet. Legend says it was planned as an 8×12″ for Pete Townshend but turned out to be a bit hefty for the crew to handle. And so the 4×12 success story began…

So when it comes to recreating these legendary tones with digital technology, developers mostly do the following to get started:

  1. Feed a bunch of test tones through an actual guitar/bass amp (if you want the amp character involved) or through a PA/hi-fi amp (if you just want the cabinet response). Test tones can be very short transients (‘Dirac’ impulses), sine sweeps or random noise; this varies from one situation to another (a rough sketch of the sweep variant follows after this list).
  2. Measure the frequency response of a popular speaker/enclosure combination using a microphone with a ruler-flat frequency response. This mostly captures the cabinet sound.
  3. Or measure the response using a microphone typically suited to guitar/bass, for instance an SM57, an e609 or an RE20 – whatever you’d prefer for a ‘real’ recording of an artist’s performance.
  4. Perhaps experiment with different miking positions, angles, mic switch options, directional patterns etc.
  5. Apply steps 1 through 4 to the direct (close-miked) position as well as to the recording room.
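
For the curious, here is roughly what the sine-sweep variant of step 1 could look like in practice. This is a minimal NumPy sketch of the well-known exponential-sweep measurement trick, not the procedure any particular manufacturer uses; all names and parameter values are illustrative.

```python
# Hedged sketch: generate an exponential sine sweep plus its inverse filter,
# then recover an impulse response from a recorded cabinet/mic capture by
# convolving it with the inverse filter. Parameter values are illustrative.
import numpy as np
from scipy.signal import fftconvolve

def exp_sweep(f0, f1, duration, fs):
    """Exponential sine sweep from f0 to f1 Hz and its amplitude-corrected inverse."""
    t = np.arange(int(duration * fs)) / fs
    r = np.log(f1 / f0)
    sweep = np.sin(2 * np.pi * f0 * duration / r * (np.exp(t * r / duration) - 1))
    inverse = sweep[::-1] * np.exp(-t * r / duration)   # reversed sweep, -6 dB/oct envelope
    return sweep, inverse

def extract_ir(recorded, inverse, ir_length=8192):
    """De-convolve the recorded response; the linear IR shows up as a peak in the result."""
    full = fftconvolve(recorded, inverse)
    onset = np.argmax(np.abs(full))                     # crude onset detection
    ir = full[onset:onset + ir_length]
    return ir / np.max(np.abs(ir))                      # normalise for convenience

fs = 48000
sweep, inverse = exp_sweep(20.0, 20000.0, 5.0, fs)
# 'recorded' would normally be the microphone capture of the sweep played
# through the amp/cabinet; we substitute the dry sweep here just to run.
recorded = sweep
ir = extract_ir(recorded, inverse)
```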

Once the data is gathered, a DSP developer has to decide what to do with it in order to achieve the same sound (or almost the same).

With analog gear (such as direct boxes, speaker loads and the like), a relatively simple circuit is used which tries to mimic the typical frequency response curve of a cabinet. Due to manufacturing costs and parts size, this is often just a coarse approximation, usually achieved by running the signal through cascaded filter networks (a bit like an EQ). Certainly, the output is not an exact replica, but it mostly ‘sounds alike’ and hopefully at least manages to cut off all the nasty high frequencies that we don’t need for, say, distorted guitars (e.g. everything above 5 kHz). The funny thing is, such analog speaker sims often have a very distinctive sound character of their own, to the point that some of them are still in use and very sought after among a lot of artists.
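
Just to give a feel for how coarse that approach is, here is a toy digital equivalent: a tiny cascade of standard filters that only mimics the broad strokes (low cut, a presence-ish bump, steep roll-off above 5 kHz). All corner frequencies, gains and Q values are made up for illustration and aren’t taken from any real speaker-sim circuit.

```python
# A rough 'analog-style' cabinet approximation: a small cascade of filters.
# Every corner frequency, gain and Q value here is an illustrative guess.
import numpy as np
from scipy import signal

def peaking_eq_sos(f0, gain_db, q, fs):
    """Single peaking-EQ biquad (RBJ cookbook), returned as one SOS row."""
    a = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = [1 + alpha * a, -2 * np.cos(w0), 1 - alpha * a]
    aa = [1 + alpha / a, -2 * np.cos(w0), 1 - alpha / a]
    return np.array([[b[0] / aa[0], b[1] / aa[0], b[2] / aa[0],
                      1.0, aa[1] / aa[0], aa[2] / aa[0]]])

def cabinet_sim_sos(fs):
    """Cascaded filter network: low cut, presence bump, steep top-end roll-off."""
    hp = signal.butter(2, 80, btype='highpass', fs=fs, output='sos')     # tame sub-bass rumble
    presence = peaking_eq_sos(2500, 4.0, 1.0, fs)                        # mild upper-mid emphasis
    lp = signal.butter(4, 5000, btype='lowpass', fs=fs, output='sos')    # cut the fizz above ~5 kHz
    return np.vstack([hp, presence, lp])

def speaker_sim(x, fs=48000):
    """Run an amp/distortion signal through the crude cabinet approximation."""
    return signal.sosfilt(cabinet_sim_sos(fs), x)
```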

In the digital world, you have at least two main technologies at hand to create your ‘tone clone’:

  1. Treat the measurement data as ‘impulse responses’ (IRs) and apply a technique called convolution to the audio (de-convolution is used beforehand to extract the IR from the measurement). Convolution maps the input signal onto the IR, so that the resulting output takes on the tonal shape of the measured data. The process is widely used among guitar-modelling manufacturers. A big advantage is that the technology doesn’t care about the actual data; any speaker/cabinet can be treated this way. Hence, there are hundreds or thousands of impulse response files available worldwide, even for free. Once you’ve gathered or set up an IR library, you’ve got a large variety of cabinet tones at hand. If measured properly, IRs can sound dramatically realistic. However, there’s a problem: while the frequency domain (-> tone) and time domain (-> response over time, like reverb in an enclosure) can be captured pretty well, a typical signal chain associated with guitar or bass is full of non-linearities and feedback paths: when you play a tube amp, the output stage interacts with the output transformer. The transformer itself interacts with the speaker. The speaker, being an inductive load like the transformer, interacts with the enclosure and feeds back upon the output transformer. This in turn acts as a load for the output tubes, which in turn change their power consumption from the supply rails. This again has an effect on the entire amp circuit, feeding back on the whole signal, and so forth… The nonlinearities are everywhere: in the tubes and all the other devices, the transformer and the speaker.
    Cutting this short: convolution can certainly capture the tone, but not necessarily the feel of the thing (a minimal sketch of the convolution step follows after this list).
  2. Do something called physical modelling. This technology isn’t found very often in the wild, but it is quite powerful and versatile. Physical modelling here means that you describe the speaker and enclosure as discrete parts to model, but also concentrate on the interaction between these parts (and even beyond). You’d have to answer questions like: in what way does the speaker transform electrical energy into sound, what does the spectrum look like, how does it change at different levels, how does it behave in an open baffle or change when you mount the speaker into an enclosure, how does the sound break up when you stress the system…? The enclosure itself can be treated as an echoic space, similar to a reverberating room, just smaller in size. One certainly has to take reflections and resonances into account, bass ports/vents, open backs etc. The whole model of such a speaker/enclosure combination is really a mighty thing. One could literally put different speakers into different enclosures, even combinations that might not be available in the real world. So you can be quite inventive. The cool thing about such a model is that, by design, it manages to capture the tiny details that come up in a complex and chaotic system like a guitar & bass signal path, especially the nonlinearities and feedback involved. As pointed out above, these details get lost with common IR technology. However, for practical reasons, mainly computational complexity, one has to make a lot of trade-offs between granularity and sonic outcome. One problem is that a physical model can easily take on a life of its own and end up sounding really personal. But perhaps that isn’t so much of a problem, or… is it even desirable? If people learned to love dead-simple analog RC networks called ‘speaker simulators’ in their own way, chances are some might favour the liveliness and uniqueness of a physical-modelling approach (a toy flavour of such a model is sketched below).
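
To make the contrast concrete, here are two deliberately tiny sketches. First the IR route from point 1: once an impulse response is at hand (for example from the sweep deconvolution sketched earlier, or from a third-party IR file), the cabinet tone is imposed simply by convolving the amp signal with it. Function and variable names are ours, not any particular product’s API.

```python
# Minimal IR-based cabinet sim: convolve the (already distorted) amp signal
# with a cabinet impulse response.
import numpy as np
from scipy.signal import fftconvolve

def apply_cabinet_ir(amp_signal, ir):
    """Impose the measured cabinet tone onto the amp signal via convolution."""
    out = fftconvolve(amp_signal, ir)[:len(amp_signal)]
    peak = np.max(np.abs(out))
    return out / peak if peak > 0 else out              # simple peak normalisation
```

And here is a toy flavour of the physical-modelling idea from point 2: the speaker cone as a mass-spring-damper driven by the voice coil, with a crude cubic spring term standing in for the suspension stiffening at high excursion. Every parameter value is invented for illustration; a real model of the kind described above is far more elaborate (coil inductance, the enclosure’s air spring, reflections and so on are all missing here).

```python
# Toy physical model: cone displacement integrated sample by sample with a
# nonlinear suspension; the output follows cone velocity. Illustrative only.
import numpy as np

def speaker_cone(v_in, fs=48000, mass=0.01, damping=2.0,
                 k_lin=5000.0, k_nl=5e7, bl=5.0):
    """v_in: per-sample driving signal (voltage proxy). Returns cone velocity."""
    x, v = 0.0, 0.0                          # displacement [m], velocity [m/s]
    dt = 1.0 / fs
    out = np.zeros(len(v_in))
    for n, u in enumerate(v_in):
        force = bl * u                       # voice-coil force (coil inductance ignored)
        spring = k_lin * x + k_nl * x ** 3   # suspension stiffens at large excursion
        a = (force - damping * v - spring) / mass
        v += a * dt                          # simple forward-Euler integration
        x += v * dt
        out[n] = v                           # radiated sound roughly follows cone velocity
    return out
```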

It was quite early in the technical design phase of VANDAL that this decision was made: we take the second approach, we definitely want to do physical modelling here.

Easier said than done. It turned out to be a hard road, as we’ll see (and hear). Even though the main DSP model could be set up quite fast, the actual data we fed it with was hit & miss at first.

More on those obstacles in part II…
