Here it comes - my implementation of an MP3 decoder on an 8 bit AVR. My main motivation was to check whether there's a lower-bitrate alternative to four bit ADPCM that provides better quality than the terrible sounding two or three bit ADPCM variants. While considering the feasibility, I wondered whether an 8 bit platform providing 10,000,000 multiply operations per second at a 20 MHz clock is capable of handling an MP3 frame roughly every 70 ms in real time. Well - it is.

Of course, you won't get a full-blown MP3 decoder supporting every option and bitrate running on an 8 bit AVR. But it's well suited to playing back some speech samples or simple melodies, and a 64 MBit serial flash provides enough capacity to store more than half an hour of sound. Unfortunately I didn't have any idea how to keep the RAM requirements below 4 kBytes, so an ATmega with at least 8 kBytes of RAM is needed.

And yes - we all know that there are plenty of ways to do this much better, quicker, easier and whatever. No need to mention it here, please. It was fun for me, and I like working in assembly language.

The current implementation is made to play back a single-channel, 16 kBit/s MP3 stream with 8 kHz sampling frequency (the stream has to be encoded according to MPEG 2.5 Layer 3/LSF). As this is a proof of concept, there are some prerequisites and restrictions to respect. The decoder can only handle specially crafted MP3 files. It doesn't handle any ID3 tags (neither V1 nor V2.x). It doesn't support the MP3 bit reservoir, so it requires a true forward-sequential format without any back-references. To focus on the essential parts of the decoding process, only long blocks are handled so far. If the MP3 stream contains short/mixed blocks, they will just be skipped. Fortunately some older versions of the free LAME MP3 encoder offer the options needed to encode the MP3 stream without the unsupported features.
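For anyone who wants to check the numbers: the following Python sketch (not part of the project) redoes the arithmetic for the stream format described above. It assumes the usual MPEG 2/2.5 Layer 3 figures of 576 samples per frame and a frame length of 72 * bitrate / sampling frequency bytes; all function and constant names are made up for illustration.

```python
# Back-of-the-envelope numbers for a 16 kBit/s, 8 kHz MPEG 2.5 Layer 3 stream.
# Assumptions (not taken from the project source): 576 samples per frame and
# the 72 * bitrate / sample_rate frame length rule for MPEG 2/2.5 Layer 3.

SAMPLE_RATE_HZ = 8_000
BITRATE_BPS = 16_000
SAMPLES_PER_FRAME = 576
CPU_CLOCK_HZ = 20_000_000
FLASH_BITS = 64 * 1024 * 1024  # 64 MBit serial flash


def frame_bytes(bitrate_bps, sample_rate_hz, padding=0):
    """Length of one frame in bytes (padding is 0 or 1)."""
    return 72 * bitrate_bps // sample_rate_hz + padding


def frame_duration_ms(sample_rate_hz):
    """Playing time of one frame in milliseconds."""
    return 1000 * SAMPLES_PER_FRAME / sample_rate_hz


def cycles_per_frame(cpu_clock_hz, sample_rate_hz):
    """CPU clock cycles available to decode one frame in real time."""
    return cpu_clock_hz * SAMPLES_PER_FRAME // sample_rate_hz


def flash_playtime_minutes(flash_bits, bitrate_bps):
    """Minutes of audio that fit into the flash at a constant bitrate."""
    return flash_bits / bitrate_bps / 60


if __name__ == "__main__":
    print("bytes per frame: ", frame_bytes(BITRATE_BPS, SAMPLE_RATE_HZ))
    print("ms per frame:    ", frame_duration_ms(SAMPLE_RATE_HZ))
    print("cycles per frame:", cycles_per_frame(CPU_CLOCK_HZ, SAMPLE_RATE_HZ))
    print("flash minutes:   ", round(flash_playtime_minutes(FLASH_BITS, BITRATE_BPS), 1))
```

With the values from this post it yields 144 bytes per frame, 72 ms per frame, 1,440,000 clock cycles per frame at 20 MHz and close to 70 minutes of audio in a 64 MBit flash.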
I recommend using the following parameters for a single-channel source file with 8 kHz sampling frequency:

lame -b 16 --cbr --noshort --nores -q 0 -t <source file> <output file>

This will produce an MP3 file with 16 kBit/s (-b 16) constant bitrate (--cbr), no short blocks (--noshort), no use of the bit reservoir (--nores), with the best quality (-q 0) and without the LAME tag embedded (-t). The output file doesn't contain any proprietary data; every MP3 decoder should be able to play it back as long as it supports MPEG 2.5. I've successfully used LAME version 3.90.3 for Windows (this version is a little hard to find and has therefore been included in the project folder).

Apparently the LAME encoder always inserts two short blocks at the beginning of the MP3 stream, even if instructed not to. Although the decoder will silently skip these frames, they waste memory. You can just delete them (simply cut the first 288 bytes of the MP3 file, i.e. two frames of 144 bytes). To avoid losing the first part of the audio, add 150 ms of silence at the beginning of the source (this is recommended anyway, even if you don't touch the short blocks).

I've used AVR Studio 4 for the project. The assembly code provides three major options for handling the MP3 input and the decoded output, and has been set up to run on an ATmega1284P at 16 MHz (16.384 MHz would exactly match the 8 kHz sampling frequency). Unfortunately the internal 8 MHz clock won't be sufficient, but you could try setting the OSCCAL register to a higher value to gain some more processing power (I haven't tried this myself yet). Before you include your own MP3 file in the code, it has to be converted using some bin2inc utility.

1. You can just use the AVR Studio simulator to play with the code; there's no need for real AVR hardware. I implemented and debugged the decoder this way. The simulator provides a function to write the output of a parallel port into a file.
To activate it, use the menu "Debug"->"AVR Simulator Options"->"Stimuli and logging" and enter an output file name for PORTA. When running the code, the decoded samples will be written to the output file in a special format. To make use of it, I've provided two small Gawk scripts for conversion. Using "gawk -f convout.awk simulator_output_file >samples.txt" will convert the special format to a text file containing lines with 16 bit hexadecimal values. You may also use the script to convert other output data; it was quite helpful for array dumps during troubleshooting. With "gawk -f mkbin.awk -v BINMODE=3 samples.txt >samples.raw" the samples will be written as binary PCM data. I've used Audacity to import the raw data. It should recognize the main parameters itself (signed 16 bit PCM, big-endian, one channel); you just have to enter the sampling frequency (8000 Hz). If you want to go deeper into the code (perhaps to add some of the missing features), I strongly recommend getting the sources of libmad/madplay and the Helix decoder, getting them to run on your PC and using them for reference.

2. Play back an MP3 file included directly in the code. For this option, a high-impedance speaker or headphones have to be connected to pins OC1A and OC1B via a simple RC low-pass filter. Try 47 Ohms and 2.2 or 4.7 uF. To suppress DC, connect a 100 uF capacitor in series. Of course, this is just the simplest design; higher-order low-pass filters will provide better quality. Playback will start after reset. Keep in mind that 8 bit PWM audio output only provides a limited dynamic range, so the use of optimized/normalized full-scale source material is recommended.

3. Play back an MP3 file read from a serial (SPI) flash. For this option you need the output circuitry described under option 2. The PIND.3 input has to be tied to logic H or L for control purposes.
Additionally, a 64 MBit serial flash chip is required; I've used the Winbond W25Q64BV (it's available in a PDIP package). Connect it to the controller's SPI interface, using the controller's /SS output as /CS on the flash chip. Note that the W25Q64BV only supports a supply voltage of 3.6 volts max. If you own a programmer for the serial flash, you can use it to store an MP3 file (starting at address 0). Alternatively, a quick & dirty mini terminal has been added to the code to upload an MP3 file into the serial flash. For this option, an RS232 interface has to be attached to USART0. In either case, a two-byte value (low byte first) holding the number of MP3 frames has to be inserted at the beginning of the MP3 file (to calculate it, just divide the length of the file by the frame size of 144 bytes).

If you decide to use the built-in upload terminal, connect your PC to the RS232 interface using 115k2 8n2 (115200 baud, 8 data bits, no parity, 2 stop bits) and no flow control (I recommend TeraTerm). Keep PIND.3 at logic H and reset the controller. The built-in terminal should respond and erase any current flash memory content. When prompted, start the upload in BINARY mode (simple file upload, no transmission protocol like X- or Z-Modem). Once the upload has finished, put a logic L on PIND.3 and reset the controller, and playback should start.

There are several additional flags to control whether a particular function is optimized for speed or for lower code flash memory usage. They basically switch between inserting fully expanded macros (most of them multiply operations) and just subroutine calls. Depending on the controller's clock frequency, you can save approx. half of the code memory (the size of the tables won't be reduced), and there's still room to save even a few more bytes because, for instance, currently unrolled loops remain unrolled.
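The frame count header mentioned under option 3 can be prepared with a few lines of script. This is only a sketch in Python, not a tool from the project folder: the function name, the command line handling and the sanity checks are assumptions made for illustration. It optionally cuts leading bytes (for example the 288 bytes of the two short-block frames) and prepends the number of 144-byte frames as a two-byte value, low byte first.

```python
import struct
import sys

FRAME_SIZE = 144  # bytes per frame at 16 kBit/s and 8 kHz, as stated in the post


def make_flash_image(mp3_bytes, skip_bytes=0):
    """Return the frame count (2 bytes, little-endian) followed by the frames.

    skip_bytes=288 drops the two leading short-block frames.
    The length check is an added sanity check, not something the decoder needs.
    """
    payload = mp3_bytes[skip_bytes:]
    if len(payload) % FRAME_SIZE != 0:
        raise ValueError("file length is not a multiple of the frame size")
    frames = len(payload) // FRAME_SIZE
    if frames > 0xFFFF:
        raise ValueError("frame count does not fit into two bytes")
    # "<H" = unsigned 16 bit value, low byte first
    return struct.pack("<H", frames) + payload


# Hypothetical command line use: script.py input.mp3 output.bin
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as src:
        image = make_flash_image(src.read(), skip_bytes=288)
    with open(sys.argv[2], "wb") as dst:
        dst.write(image)
```

The resulting file can then be written to the flash starting at address 0, either with a programmer or through the upload terminal.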
It would be interesting to know whether the decoder part of other voice compression codecs could also be implemented on an ATmega, perhaps an even more efficient one such as Wideband AMR/G.722.2 (maybe only a subset of the codec modes for a start). Any volunteers?