Q. How does Mid-Sides encoding/decoding actually work?

I’ve recently bought and started using a Zoom H6 recorder, and agree with pretty much everything Tom Flint had to say in his SOS 2013 review. My recording mic rig uses two cabled mics up close, plus the Zoom’s Mid-Sides (M-S) mic module to capture the room sound, and I found a free M-S decoder plug-in on Zoom’s web site. When installed on the track in your DAW, this enables the relative gain of the Mid and Sides mics to be adjusted to control the stereo width. It’s magic and very powerful!

When I do a Mid-Sides recording, I usually record the two mics to separate mono tracks, then clone the Sides track, flip the polarity, opposite-pan the two Sides tracks, and adjust the Mid-Sides levels to taste. However, the Zoom records both the Mid and Sides signals to just one track, and I don’t understand how that can be done! Zoom’s web site also talks about the Sides part of the mic being ‘bidirectional’, but how can you record a bidirectional signal, combine it with a Mid signal, and put all that information on one stereo track? That’s one reason I called it magic — I’d love to know how it actually works!

SOS Forum post

Technical Editor Hugh Robjohns replies: Unsurprisingly, there’s no magic involved! The ‘single track’ which the Zoom uses to record its Mid-Sides mic is, as you have surmised, a standard stereo (or dual-mono) track. Stereo information always requires two tracks, but the stereo information can be conveyed equally accurately in either the conventional Left-Right format, or the Mid-Sides format, and the M-S and L-R formats are completely interchangeable, without any loss.

The conversion or decoding process requires only a simple ‘sum and difference’ matrix. The Mid-Sides conversion you’re familiar with, using three channels with polarity inversion and panning, is just one simple manifestation of that sum and difference matrix, using the facilities available in any mixer. However, it can also be done using a couple of transformers (which was the way Alan Blumlein and Abbey Road’s EMI REDD and TG consoles did it), or some simple DSP programming, which is the way Zoom’s plug-in does it.

The maths involved in the sum and difference matrix is actually very straightforward. So (where L is the Left channel, R the Right, M the Mid and S the Sides):

L = M+S

R = M-S

M = L+R

S = L-R

An M-S matrix based around audio transformers.

All that’s needed for this matrix is a ‘box’, whether analogue or digital, hardware or virtual, with two outputs: one derived from the summation of both inputs, and the other from the difference between them. Whichever format is fed to the inputs, the alternative format appears at the outputs. So M-S in gives L-R out, and L-R in gives M-S out. Which is very handy!
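To make the ‘box’ idea concrete, here is a minimal Python sketch of a plain sum and difference matrix working on single sample values. The function name `sum_difference` and the sample values are purely illustrative assumptions, not anything taken from Zoom’s plug-in.

```python
def sum_difference(a, b):
    """Plain, unscaled sum and difference matrix (hypothetical helper).

    Returns (a + b, a - b). The same function serves both directions:
    feed it Mid and Sides to get Left and Right, or Left and Right
    to get Mid and Sides.
    """
    return a + b, a - b


# M-S in gives L-R out (illustrative sample values)
left, right = sum_difference(0.5, 0.25)    # mid = 0.5, sides = 0.25
print(left, right)                         # 0.75 0.25

# L-R in gives M-S out, using exactly the same 'box'
mid, sides = sum_difference(0.5, -0.25)    # left = 0.5, right = -0.25
print(mid, sides)                          # 0.25 0.75
```

In a real decoder the same arithmetic would simply be applied to every sample of the two channels in turn.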

If you think about the decoding technique you’ve described using your (virtual) mixer, mixing sums the signals together, so the left mix bus produces the sum of the Mid signal (which is panned centrally to feed both the left and right mix output buses) and the Sides signal from the original Sides channel (Left = M+S).

An M-S matrix based around ICs.

However, to derive the difference signal part of the matrix process we need to employ some algebraic manipulation: M-S is the same as M+(-S), in other words, the Mid plus an inverted version of the Sides. That’s why you switch in a polarity inversion to the clone Sides channel, which is then panned right. That way, the right mix bus sums the Mid signal with the inverted Sides signal: Right = M+(-S).
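The three-channel mixer method can be sketched in Python to show that it produces the same result as the matrix equations. This is only an illustration under simplifying assumptions: each channel is one sample value, a centre pan feeds both buses at unity gain (real mixers apply a pan law), and the name `mixer_decode` is hypothetical.

```python
def mixer_decode(mid, sides):
    """Decode M-S to L-R the 'mixer' way (illustrative sketch).

    Each channel is (signal, gain to left bus, gain to right bus).
    Assumes a centre pan sends unity gain to both buses.
    """
    channels = [
        (mid,    1.0, 1.0),   # Mid, panned centre
        (sides,  1.0, 0.0),   # original Sides, panned hard left
        (-sides, 0.0, 1.0),   # cloned Sides, polarity inverted, panned hard right
    ]
    # A mix bus simply sums whatever is routed to it
    left = sum(signal * to_left for signal, to_left, _ in channels)
    right = sum(signal * to_right for signal, _, to_right in channels)
    return left, right


left, right = mixer_decode(0.5, 0.25)
print(left, right)   # 0.75 0.25, i.e. M+S and M-S
```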

Regarding the ‘bidirectional signal’ reference, this is just another term for the figure-of-eight polar pattern. The figure-of-eight mic picks up sound from both the front and back, and is therefore bidirectional, as opposed to omnidirectional, or unidirectional.

And, finally, there is a small caveat to mention in the context of the point I made earlier about being able to convert freely between the M-S and L-R formats. If the conversion is performed with the simple matrix algebra I described above, the final output of a double conversion will be 6dB louder than the input. This is explained in the table.

How Gain Is Added During L-R to M-S to L-R Conversion

| L-R to M-S | M-S to L-R |
| --- | --- |
| M = L+R | L output = M+S = (L+R)+(L-R) = 2L |
| S = L-R | R output = M-S = (L+R)-(L-R) = 2R |
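The doubling in the table is easy to check numerically. The sketch below, with arbitrary illustrative sample values and a hypothetical `sum_difference` helper, runs a Left-Right pair through the unscaled matrix twice and expresses the resulting gain in decibels.

```python
import math


def sum_difference(a, b):
    """Plain, unscaled sum and difference matrix (hypothetical helper)."""
    return a + b, a - b


left_in, right_in = 0.25, -0.125                   # arbitrary sample values

mid, sides = sum_difference(left_in, right_in)     # L-R to M-S
left_out, right_out = sum_difference(mid, sides)   # M-S back to L-R

# Both outputs are exactly twice the inputs
print(left_out, right_out)                         # 0.5 -0.25

# A doubling of amplitude expressed in decibels
gain_db = 20 * math.log10(left_out / left_in)
print(round(gain_db, 2))                           # 6.02
```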

Thus, multiple conversions (such as are necessary when wanting to adjust the stereo width of a Left-Right signal) can quickly eat into the available headroom. The usual corrective tweak is simply to attenuate the matrix process outputs by 3dB, so that a full L-R / M-S / L-R conversion chain ends up delivering the same levels at the outputs as were fed to the inputs. In other words, a typical, practical sum and difference matrix’s functions are described algebraically as:

Left = (M+S)-3dB

Right = (M-S)-3dB

Mid = (L+R)-3dB

Sides = (L-R)-3dB
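As a rough sketch of the scaled matrix, and of the stereo-width adjustment mentioned above, something like the following Python would do. It assumes the ‘3dB’ is the usual shorthand for a factor of 1/√2 (about 3.01dB), which is what makes two passes come out at exactly unity; the names `matrix`, `adjust_width` and the `width` parameter are hypothetical.

```python
import math

# 1/sqrt(2) is roughly -3.01 dB; two passes give exactly -6.02 dB,
# cancelling the doubling of the plain sum and difference matrix.
SCALE = 1 / math.sqrt(2)


def matrix(a, b):
    """Sum and difference matrix with roughly 3 dB of attenuation."""
    return (a + b) * SCALE, (a - b) * SCALE


def adjust_width(left, right, width):
    """Change stereo width via L-R to M-S to L-R (illustrative sketch).

    width = 1.0 leaves the signal unchanged, 0.0 collapses it to mono,
    and values above 1.0 widen it by raising the Sides level.
    """
    mid, sides = matrix(left, right)      # L-R to M-S
    return matrix(mid, sides * width)     # scale Sides, then M-S to L-R


print(adjust_width(0.5, 0.25, 1.0))   # the inputs again, within rounding error
print(adjust_width(0.5, 0.25, 0.0))   # both channels carry the same mono signal
```

With `width` at 1.0 the full conversion chain delivers the same levels that were fed in, to within floating-point rounding.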

In case it helps you get your head around all this, I’ve included a few diagrams. The first shows how this matrix can be configured using a mixer. The second and third show simple diagrams for transformer-based and IC-based M-S matrices.