Tracker 4.44

8.9: The core of tracker: the resampling engine

Any profiler will tell you that tracker spends most of its time emulating the Amiga hardware in `resample.c' (except on the Amiga, of course). Basically, the machine has to compute the actual stream of bytes to output. There are several things to do:

There is also some provision for oversampling: instead of using one sample value for each sample output, we add several sample values distributed in the right area.
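As a rough illustration of the oversampling idea, here is a minimal sketch in C. It is not the code from `resample.c': the function name, the fixed-point format (FRAC_BITS) and the parameters are all assumptions made for this example. It simply adds `oversample' source values spread evenly over the span that one output sample covers.

```c
#include <stddef.h>

/* Assumed fixed-point precision for sample positions (hypothetical,
 * not necessarily what tracker uses). */
#define FRAC_BITS 12

/* Sum `oversample' source values spread evenly across one output step.
 * pos and step are fixed-point offsets into `sample'.
 * The caller must make sure pos + step stays inside the sample data
 * and that oversample is at least 1. */
long oversampled_value(const signed char *sample, unsigned long pos,
                       unsigned long step, unsigned oversample)
{
    unsigned long substep = step / oversample;
    long sum = 0;
    unsigned i;

    for (i = 0; i < oversample; i++)
        sum += sample[(pos + i * substep) >> FRAC_BITS];
    return sum;
}
```

With oversample equal to 1 this degenerates to picking a single source value, which is the simple replay case. Note that the sum grows with the oversampling factor, so the result has to be scaled back somewhere before output.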

All this can take a lot of time, too much for your CPU maybe. What can be done to alleviate the problem?

As a consequence, new sample styles (like 16 bit samples) should be implemented as new types of commands like DO_NOTHING, PLAY, REPLAY (PLAY16 and REPLAY16 come to mind), even though this will duplicate loads of code.

Recently, the code has changed to allow for more than four channels. This incurs a slight overhead (two more additions), which is actually negligible for oversampled replay, and has been deemed acceptable for simple replay.

The lookup table for converting 0-64 volumes to a linear scale is a cheap way to allow for all sorts of manipulations on the sample volumes at a low cost, and also to use n-bit samples in an almost transparent way, even with n not being an integral multiple of 8.
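To make the idea concrete, here is a small sketch of such a volume table. The names (build_volume_table, scaled_sample, master_percent) and the choice of a 16-bit target width are assumptions for this example only, not tracker's actual layout. The point is that a global gain and the alignment of n-bit samples are both folded into the table once, so the inner loop only does one lookup and one multiplication.

```c
#define MAX_VOLUME 64

/* Fill table[0..64] with linear scale factors.
 * sample_bits:    width of the stored samples, 1..16 (n need not be
 *                 a multiple of 8).
 * master_percent: hypothetical global gain, 0..100.
 * The shift aligns n-bit samples on a common 16-bit scale. */
void build_volume_table(int table[MAX_VOLUME + 1], int sample_bits,
                        int master_percent)
{
    int shift = 16 - sample_bits;
    int v;

    for (v = 0; v <= MAX_VOLUME; v++)
        table[v] = (v << shift) * master_percent / 100;
}

/* Per-sample work in the mixing loop: one lookup, one multiply. */
long scaled_sample(const int table[MAX_VOLUME + 1], int sample, int volume)
{
    return (long)sample * table[volume];
}
```

Changing the master gain or the sample width then only means rebuilding 65 table entries; the mixing loop itself stays untouched.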

If your machine is really slow and uses ulaw, computing a complete lookup table (all 16384 values of it) might speed things up somewhat. Removing the oversampling test altogether might also help. You can then unroll the for(i = 0; i < number; i++) loop and, instead of initialising value[LEFT_SIDE] and value[RIGHT_SIDE] to 0, give them their real initial values directly.
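A complete ulaw table could look like the following sketch. The encoder is the usual G.711-style mu-law conversion of a 16-bit linear value; how tracker actually indexes its 16384 entries is not stated here, so the indexing (top 14 bits of a signed 16-bit value, offset to be non-negative) and all function names are assumptions.

```c
#define ULAW_BIAS  0x84
#define ULAW_CLIP  32635
#define TABLE_SIZE 16384

static unsigned char ulaw_table[TABLE_SIZE];

/* Mu-law encode one 16-bit linear sample (sign, 3-bit exponent,
 * 4-bit mantissa, all bits inverted). */
unsigned char linear_to_ulaw(int sample)
{
    int sign = 0, exponent = 7, mask = 0x4000, mantissa;

    if (sample < 0) {
        sign = 0x80;
        sample = -sample;
    }
    if (sample > ULAW_CLIP)
        sample = ULAW_CLIP;
    sample += ULAW_BIAS;
    /* Find the highest set bit between bit 14 and bit 7. */
    while (exponent > 0 && (sample & mask) == 0) {
        exponent--;
        mask >>= 1;
    }
    mantissa = (sample >> (exponent + 3)) & 0x0F;
    return (unsigned char)~(sign | (exponent << 4) | mantissa);
}

/* Precompute every entry once, at startup. */
void build_ulaw_table(void)
{
    int i;

    for (i = 0; i < TABLE_SIZE; i++)
        ulaw_table[i] = linear_to_ulaw((i - TABLE_SIZE / 2) * 4);
}

/* Inner-loop replacement: one addition, one shift, one lookup.
 * sample16 must lie in -32768..32767. */
unsigned char ulaw_lookup(int sample16)
{
    return ulaw_table[(sample16 + 32768) >> 2];
}
```

The table drops the two low bits of the input, which mu-law's coarse quantisation mostly hides; whether the lookup actually beats the direct computation depends on the machine's cache, so it is worth measuring.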

If all that fails, you can still find a better compiler, check that your audio bandwidth is not the limiting factor, or downgrade the audio output to a lower acceptable frequency (stuttering and outputting the same sample several times is possible). Lastly, you can still go to assembly language code.

An important optimization may exist if your machine uses dynamic libraries and dynamic linking: on some Unixes (Sparcs for instance), some table lookup and dynamic linking occur at runtime, which means that function calls may be slightly slower. In that case, coercing your linker to use static linking may be a good idea. If you can, link only `Arch/machine/audio.c' and `resample.c' statically, since this is where the speed bottleneck occurs. This way, you will get both the advantages of static linking (speed) and dynamic linking (size).

Maybe the specific audio code for your architecture can be improved. Recently, I've added some optimization to `Arch/common.c'. Checking where tracker was spending time, I discovered that almost all the time was spent computing divisions/multiplications, like for the stereo mixing. Instead of computing:

realLeft = left*primary + right*secondary
realRight = right*primary + left*secondary

computing

sum = (left+right) * (primary+secondary)/2
diff = (left-right) * (primary-secondary)/2
realLeft = sum+diff
realRight = sum-diff

saves two multiplications! Just realize that (primary+secondary)/2 and (primary-secondary)/2 don't change from one sample to the next and can be precomputed, which leaves two multiplications per stereo pair instead of four.

Apart from primitive architectures where multiplication and addition costs are the same, this gains a lot. On a Sparc 5, this makes the difference between being able to use `-over 2 -freq 44' and not!
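The stereo mixing rewrite can be checked with a short sketch. The struct and function names below are made up for illustration and are not those of `Arch/common.c'. Note one caveat of the integer version: the halved coefficients are exact only when primary+secondary is even (primary-secondary is then even too).

```c
/* Precomputed once whenever the mixing levels change. */
typedef struct {
    long half_sum;   /* (primary + secondary) / 2 */
    long half_diff;  /* (primary - secondary) / 2 */
} mix_coeffs;

mix_coeffs precompute_mix(long primary, long secondary)
{
    mix_coeffs c;

    c.half_sum = (primary + secondary) / 2;
    c.half_diff = (primary - secondary) / 2;
    return c;
}

/* Straightforward version: four multiplications per stereo pair. */
void mix_naive(long left, long right, long primary, long secondary,
               long *out_left, long *out_right)
{
    *out_left = left * primary + right * secondary;
    *out_right = right * primary + left * secondary;
}

/* Rewritten version: two multiplications, four additions. */
void mix_fast(long left, long right, const mix_coeffs *c,
              long *out_left, long *out_right)
{
    long sum = (left + right) * c->half_sum;
    long diff = (left - right) * c->half_diff;

    *out_left = sum + diff;
    *out_right = sum - diff;
}
```

For example, with primary = 192, secondary = 64, left = 100 and right = -50, both versions give 16000 for the left channel and -3200 for the right one.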

There is a switch in `Arch/common.c' (NEW_OUTPUT_SAMPLES_AWARE) used for compatibility. In older implementations of tracker, the resample code called output_samples(left_value, right_value), where the left and right values were 23-bit signed. The newer call is output_samples(left_value, right_value, width), with width the number of bits used. Newer ports should define NEW_OUTPUT_SAMPLES_AWARE and use the code of `Arch/common.c' whenever possible.

This gains a lot when, for instance, oversample is used, since this keeps the shifting of data left or right to a minimum. Check whether your implementation uses the new form of output_samples. If it does not, it is a good idea to convert it.
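To show why passing the width helps, here is a sketch of a helper that an audio port might use to bring a value of `width' significant bits to its own output resolution in a single step. The helper name and the out_bits parameter are hypothetical; only the idea of a width argument comes from the text above. Multiplication and division by a power of two are used instead of shifts because right-shifting a negative value is implementation-defined in C.

```c
/* Rescale a signed value that uses `width' bits to `out_bits' bits.
 * Hypothetical helper, not part of tracker's interface.
 * Both widths are assumed to lie between 1 and 31. */
long scale_to_output(long value, int width, int out_bits)
{
    if (width > out_bits)
        return value / (1L << (width - out_bits));  /* narrow */
    return value * (1L << (out_bits - width));      /* widen or keep */
}
```

With the old two-argument form the resampler first had to normalise everything to 23 bits, and the port then had to rescale again for its device; knowing the width lets the port do one rescaling only.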

Also, the audio routine should give its output resolution when needed. Right now, tracker doesn't use it, but when I get around to adding 16 bit samples, tracker will routinely convert them down to 8 bit if the audio output is only 8 bits.

As a rule, the Sparc version is the most complete. Try to refer to it in case of doubt.