This page contains MatLab functions, m-files, which do Huffman coding and
arithmetic coding of integer (symbol) sequences. Complete coding may be done by
calling an easy to use main program (or main function), where input argument is
the sequences you want to compress and the output is the compressed bit stream,
as a vector of bytes. Decoding is done just by switching the arguments. These
main programs are: the Huffman coder, Huff06, and two versions of the
arithmetic coder, Arith06.m and Arith07.m.
My interest for lossless
coding started when I took a course in Information Theory which included the
principles both for Huffman coding and arithmetic coding. My first projects as
a Ph.D. student was in signal representation and signal compression. Most of
this work was done in MatLab, so I searched for an entropy coder, i.e. lossless
coding of a symbol (integer) sequence, that was easy to use from MatLab. What I
found was mostly programs written in C, so I tried to make a Huffman Coder as a
function (m-file) in MatLab. The function (m-file) was especially made for
compression of quantized coefficient values or for values after run-length
coding. It is assumed that the histogram of the values, i.e. probabilities, is
symmetric about zero (or only non-negative numbers), and that probability
decrease as distance from zero increase. The program works ok also if this it
not the case, but the compression ratio is not as good as it will be if the
sequences are as expected. For example this Huffman coder is not the best to
use if you want to compress a text file, where the symbols are the ASCII-codes
(bytes), or a general file where you just have a sequence of bytes.
An entropy coder usually
only exploits the symbol probabilities independent of previous symbols, this is
optimal for uncorrelated sequences. For signal compression a decorrelation process usually precede the entropy coding,
but often the decorrelation is not perfect. The
Huffman coder was made such that it can exploit some of these remaining
dependencies; this was done by manipulating the input sequence. The Huffman
coder worked quite well, and we wrote an article on it. The article, "Improved
Huffman coding using recursive splitting", norsig99.pdf (172 kB), was
presented at the NORSIG-99 conference 9-11 sep. 1999. At the end of this page
the abstract of the article is included.
Since then I have updated
the program several times, the discovered errors have been corrected, the
interface (arguments passed and returned) were changed. Now almost any kind of
integer sequences can be compressed, but it will still work best if the initial
assumptions are met or nearly met. I have also made an arithmetic coder in MatLab, two versions of this are also available on this
page.
Lossless compression of a
sequence of symbols is an important part of data and signal compression.
Huffman coding is lossless, it is also often used in lossy compression as the final step after decomposition and
quantization of a signal. In signal compression, the decomposition/quantization
part seldom manages to produce a sequence of completely independent symbols.
Here we present a scheme giving better results than straightforward Huffman
coding by utilizing this fact. We split the original symbol sequence into two
sequences in such a way that the symbol statistics are, hopefully, different
for the two sequences. Individual Huffman coding for each of these sequences
will reduce the average bit rate. This split is done recursively for each
sub-sequence until the cost associated with the split is larger than the gain.
Experiments were done on
different signals. They were decomposed with the Discrete Cosine Transform,
quantized, and End Of Block coded, to give the input
symbol sequence. The results using the split scheme was a bit rate reduction of
usually more than 10% compared to straightforward Huffman coding, and 0-15%
better than JPEG-like Huffman coding, best at low bit rates.
This WWW page is maintained by
Karl Skretting at University of Stavanger.
Last
update: Oct. 22. 2010.