Here it comes - my implementation of an MP3 decoder on an 8 bit AVR.
My main intention was to check whether there's a lower bitrate
alternative to four bit ADPCM that provides better quality than the
terrible sounding two or three bit ADPCM variants.
And while considering the feasibility I was wondering whether an 8 bit
platform providing 10,000,000 multiply operations per second @ 20 MHz
clock is capable of handling an MP3 frame about every 70 ms in realtime.
Well - it is.
Of course, you won't get a full-blown MP3 decoder running on an 8 bit
AVR supporting every option and bitrate. But it's well-suited to play
back some speech samples or simple melodies. A 64 MBit serial flash
provides enough capacity to store more than half an hour of sound (about
70 minutes at 16 kBit/s).
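To put rough numbers on this, here's a small Python sketch (not part of
the project). It assumes the 8 kHz, 16 kBit/s stream format described
below and 576 samples per MPEG 2.5 Layer 3 frame; the multiply rate is
the figure quoted above for a 20 MHz clock.

```python
# Back-of-the-envelope figures for an 8 kHz, 16 kBit/s MPEG 2.5 Layer 3 stream.
SAMPLE_RATE_HZ = 8000
BITRATE_BPS = 16000
SAMPLES_PER_FRAME = 576            # frame length of MPEG 2/2.5 Layer 3
MULS_PER_SECOND = 10_000_000       # figure quoted in the text (20 MHz clock)
FLASH_BITS = 64 * 1024 * 1024      # 64 MBit serial flash

# Duration of one frame in milliseconds.
frame_ms = 1000 * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ

# Size of one frame in bytes (no padding).
frame_bytes = SAMPLES_PER_FRAME * BITRATE_BPS // (8 * SAMPLE_RATE_HZ)

# Upper bound of multiply operations available per frame.
muls_per_frame = MULS_PER_SECOND * SAMPLES_PER_FRAME // SAMPLE_RATE_HZ

# Play time that fits into the flash, in minutes.
playtime_min = FLASH_BITS / BITRATE_BPS / 60

print(frame_ms, frame_bytes, muls_per_frame, round(playtime_min, 1))
```

So one frame lasts 72 ms, occupies 144 bytes, and leaves a budget of at
most 720,000 multiplies if the CPU did nothing else.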
Unfortunately I didn't have any idea how to keep the RAM requirements
below 4 kBytes, so an ATmega with at least 8 kBytes RAM is needed.
And yes - we all know that there are plenty of ways to do this much
better, quicker, easier and whatever.
No need to mention them here, please.
It was fun for me, and I like doing this in assembly language.
The current implementation is designed to play back a single-channel
16 kBit/s MP3 stream with 8 kHz sampling frequency (the stream has to be
encoded according to MPEG 2.5 Layer 3/LSF).
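A quick way to check whether a file matches this format is to look at
the four header bytes of its first frame. The following Python sketch is
not part of the project; the function names are made up for illustration
and only the fields relevant here are decoded, following the standard
MPEG audio frame header layout.

```python
def describe_header(header: bytes) -> dict:
    """Decode a subset of the fields in a 4-byte MPEG audio frame header."""
    if len(header) < 4 or header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        raise ValueError("no frame sync found")
    return {
        "version": (header[1] >> 3) & 0x03,        # 0 = MPEG 2.5
        "layer": (header[1] >> 1) & 0x03,          # 1 = Layer 3
        "bitrate_index": (header[2] >> 4) & 0x0F,  # 2 = 16 kBit/s (MPEG 2/2.5 L3)
        "rate_index": (header[2] >> 2) & 0x03,     # 2 = 8 kHz (MPEG 2.5)
        "padding": (header[2] >> 1) & 0x01,
        "mode": (header[3] >> 6) & 0x03,           # 3 = single channel
    }


def is_supported(header: bytes) -> bool:
    """True for MPEG 2.5 Layer 3, 16 kBit/s, 8 kHz, single channel."""
    h = describe_header(header)
    return (h["version"] == 0 and h["layer"] == 1
            and h["bitrate_index"] == 2 and h["rate_index"] == 2
            and h["mode"] == 3)


# Example header of a matching frame (no CRC, no padding).
print(is_supported(bytes([0xFF, 0xE3, 0x28, 0xC0])))
```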
As this is a proof of concept there are some prerequisites and
restrictions to respect.
The decoder can only handle specially crafted MP3 files.
It doesn't handle any ID3 tags (neither V1 nor V2.x).
It doesn't support the MP3 bit reservoir, thus requiring a true
forward-sequential format without any back-references.
To focus on the essential parts of the decoding process, only long
blocks are handled so far.
If the MP3 stream contains short/mixed blocks, they will just be skipped.
Fortunately some older versions of the free LAME MP3 encoder are
offering the needed options to encode the MP3 stream omitting the
I recommend using the following parameters for a single-channel source
file with 8 kHz sampling frequency:
lame -b 16 --cbr --noshort --nores -q 0 -t <source file> <output file>
This will produce an MP3 file with 16 kBit/s (-b 16) constant bitrate
(--cbr), no short frames (--noshort), no use of the bit reservoir
(--nores), with the best quality (-q 0) and without the LAME tag
embedded (-t). The output file doesn't contain proprietary data. Every
MP3 decoder should be able to play it back as long as it supports MPEG
2.5.
I've successfully used the LAME version 3.90.3 for Windows (this version
is a little hard to find and therefore has been included in the project
Apparently the LAME encoder always inserts two short blocks at the
beginning of the MP3 stream even if instructed not to.
Although the decoder will skip these frames silently, they're wasting
memory. You can just delete these frames (simply cut the first 288
bytes of the MP3 file, i.e. two frames of 144 bytes each). To avoid
losing the first part of the audio, add 150 ms of silence at the
beginning of the source file (this is recommended anyway even if you
don't touch the short blocks).
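Cutting the two leading frames can be done with any hex editor or with a
few lines of Python. This is only a sketch: the function name is
hypothetical, and it assumes that the file starts directly with the
first frame (no ID3 tag) and that every frame is 144 bytes long.

```python
FRAME_SIZE = 144  # bytes per frame at 16 kBit/s, 8 kHz, no padding


def drop_leading_frames(mp3: bytes, frames: int = 2) -> bytes:
    """Return the stream without its first `frames` frames."""
    cut = frames * FRAME_SIZE
    if len(mp3) < cut:
        raise ValueError("file is shorter than the frames to be removed")
    rest = mp3[cut:]
    # The remaining data should start with a frame sync again.
    if rest and (rest[0] != 0xFF or (rest[1:2] or b"\x00")[0] & 0xE0 != 0xE0):
        raise ValueError("data after the cut is not aligned to a frame")
    return rest


# Usage with hypothetical file names:
#   data = open("speech.mp3", "rb").read()
#   open("speech_cut.mp3", "wb").write(drop_leading_frames(data))
```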
I've used AVR Studio 4 for the project.
The assembly code provides three major options for handling the MP3
input and the decoded output. It has been set up to run on an
ATmega1284P at 16 MHz (16.384 MHz would exactly match the 8 kHz sampling
frequency). Unfortunately the internal 8 MHz clock won't be sufficient,
but you could try setting the OSCCAL register to a higher value to gain
some more processing power (I haven't tried this myself yet).
Before your own MP3 file can be included in the code, it has to be
converted using some bin2inc utility.
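If you don't have such a utility at hand, a minimal stand-in could look
like the following Python sketch. It emits plain .db lines for the AVR
assembler; the exact layout the project's include file expects (label
name, bytes per line) is an assumption here, so compare the result with
the include file that comes with the project. An even number of bytes
per line is used because the assembler pads odd-length .db lines with a
zero byte.

```python
def bin2inc(data: bytes, per_line: int = 16) -> str:
    """Render binary data as AVR assembler .db lines (hypothetical helper)."""
    if per_line <= 0 or per_line % 2:
        raise ValueError("per_line must be a positive even number")
    lines = []
    for offset in range(0, len(data), per_line):
        chunk = data[offset:offset + per_line]
        lines.append("\t.db " + ", ".join(f"0x{b:02X}" for b in chunk))
    return "\n".join(lines) + "\n"


# Usage with hypothetical file names:
#   text = bin2inc(open("speech_cut.mp3", "rb").read())
#   open("speech.inc", "w").write(text)
```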
1. You can just use the AVR Studio simulator to play with the code; no
real AVR hardware is needed.
I've implemented and debugged the decoder this way.
The simulator provides a function to write the output of a parallel port
into a file.
To activate it, use the menu "Debug"->"AVR Simulator Options"->"Stimuli
and logging" and enter an output file name for PORTA. When running the
code, the decoded samples will be written into the output file in a
special format.
To make use of it I've provided two small Gawk scripts for conversion.
Using "gawk -f convout.awk simulator_output_file >samples.txt" will
convert the special format to a text file containing lines with 16 bit
hexadecimal values. You may also use the script to convert other output
data; it was quite helpful for array dumps during troubleshooting.
With "gawk -f mkbin.awk -v BINMODE=3 samples.txt >samples.raw" the
samples will be written as binary PCM data.
I've used Audacity to import the raw data. It should recognize the main
parameters itself (Signed 16 bit PCM, Big Endian, one channel); you just
have to enter the sampling frequency (8000 Hz).
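If Gawk isn't available, the second conversion step can be approximated
in Python. This sketch is not a port of mkbin.awk; it only assumes that
samples.txt contains one 16 bit hexadecimal value per line, as described
above, and writes each value as two bytes in big endian order.

```python
def hex_lines_to_pcm(text: str) -> bytes:
    """Convert lines of 16 bit hex values to big endian raw PCM bytes."""
    out = bytearray()
    for token in text.split():
        value = int(token, 16) & 0xFFFF     # keep the 16 bit pattern as is
        out += value.to_bytes(2, "big")     # high byte first
    return bytes(out)


# Usage with the file names from the text:
#   pcm = hex_lines_to_pcm(open("samples.txt").read())
#   open("samples.raw", "wb").write(pcm)
```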
If you want to go deeper into the code (perhaps to add some of the
missing features), I strongly recommend that you get the source of
libmad/madplay and the Helix decoder, get it to run on your PC and use it
2. Play back an MP3 file included directly in the code.
To use this option, a high-impedance speaker or headphones have to be
connected to pins OC1A and OC1B via a simple RC low pass filter. Try 47
Ohms and 2.2 or 4.7 uF. To suppress DC, connect a 100 uF capacitor in
series. Of course, this is just the simplest design. Higher order low
pass filters will provide better quality. The playback will start after
Keep in mind that using an 8 bit PWM for audio playback only provides a
limited dynamic range, so the use of optimized/normalized full scale
source material is recommended.
3. Play back an MP3 file read from serial SPI flash.
To use this option you need the output circuitry described under
option 2.
The PIND.3 input should be applied with logic H or L for controlling
Additionally a 64 MBit serial flash chip is required; I've used a
Winbond W25Q64BV (it's available in a PDIP package). Connect it to the
controller's SPI interface, using the controller's /SS pin as /CS for
the flash chip. Note that the W25Q64BV supports a supply voltage of 3.6
volts at most.
If you own a programmer for the serial flash you can use it to store an
MP3 file (starting with address 0).
Alternatively a quick & dirty mini terminal has been added to the code
to upload an MP3 file into the serial flash. For this option an RS232
interface has to be attached to USART0.
In any case a two-byte value (low byte first) holding the number of MP3
frames has to be inserted at the beginning of the MP3 file (to calculate
it, just divide the length of the original file in bytes by the frame
size of 144 bytes).
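Adding the frame count by hand is error-prone, so here's a short Python
sketch for it. The function name is hypothetical; the sketch assumes
that the file consists of whole 144 byte frames only and rejects
anything else.

```python
import struct

FRAME_SIZE = 144  # bytes per frame at 16 kBit/s, 8 kHz, no padding


def add_frame_count(mp3: bytes) -> bytes:
    """Prepend the number of frames as a two-byte value, low byte first."""
    frames, remainder = divmod(len(mp3), FRAME_SIZE)
    if remainder:
        raise ValueError("file length is not a multiple of the frame size")
    if frames > 0xFFFF:
        raise ValueError("frame count doesn't fit into two bytes")
    return struct.pack("<H", frames) + mp3


# Usage with hypothetical file names:
#   data = open("speech_cut.mp3", "rb").read()
#   open("speech_flash.bin", "wb").write(add_frame_count(data))
```

For a file with 300 frames (43200 bytes) the two inserted bytes are
0x2C 0x01.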
If you decide to use the built-in upload terminal, connect your PC to
the RS232 interface using 115200 baud, 8N2 and no flow control (I
recommend TeraTerm). Keep PIND.3 at logic H and reset the controller.
The built-in terminal should respond and erase any current flash memory
content. When prompted, start the upload in BINARY mode (simple file
upload, no transmission protocol like X- or Z-Modem). Once the upload
has finished, put a logic L on PIND.3 and reset the controller; the
playback should then start.
There are several additional flags to control whether a particular
function is optimized for speed or for lower code flash memory usage.
They basically switch between inserting fully expanded macros (most of
them multiply operations) and plain subroutine calls. Depending on the
controller's clock frequency you can save approx. half of the code
memory (the size of the tables won't be reduced), and there's still room
to save a few more bytes because, for instance, unrolled loops currently
remain unrolled.
It would be interesting to see whether the decoder part of other speech
compression codecs could also be implemented on an ATmega, perhaps an
even more efficient one such as AMR Wideband (G.722.2), maybe with only
a subset of the codec modes for a start.