Contact:

room: 339
phone: +48 71 3757836
email:
www: http://www.ii.uni.wroc.pl/~prz/

Office hours (room 339):

  • Monday 18-19
  • Wednesday 12-13
Please send me an email before the meeting.

Data Compression (winter 2012/13)

Address:

Institute of Computer Science
University of Wrocław
ul. Joliot-Curie 15
PL-50-383 Wrocław
ANNOUNCEMENTS

4th October, 2012

The first meeting:

The first meeting will take place on Thursday, 11th October, 2012.

3rd January, 2013

The exam:

The exam at the end of semester will take place on Thursday, 24th January, 2013.

Data compression has always been an important part of computer science, and it is becoming ever more important today. Although computers are getting faster and data storage is getting cheaper and more efficient, the growing volume of data creates vast storage and transmission requirements that make at least some measure of compression necessary. In many applications the question is no longer whether to compress data, but which compression method to apply.

Data compression is the process of converting an input data stream (the source stream or the original raw data) into another data stream (the output, the bitstream, or the compressed stream) that has a smaller size. A stream can be a file, a buffer in memory, or individual bits sent on a communications channel.

Compression can be either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy; no information is lost. Lossy compression reduces bits by identifying marginally important information and removing it. The process of reducing the size of a data file is popularly referred to as data compression, although its formal name is source coding (coding done at the source of the data, before it is stored or transmitted).

Compression is useful because it reduces resource usage, such as data storage space or transmission capacity. Because compressed data must be decompressed before it can be used, this extra processing imposes computational and other costs, so compression is far from being a free lunch.

Prerequisites

Students should know:

  • a programming language such as C++ or Java;
  • algorithms and data structures.

Bibliography

Main handbooks (password: handbook):

  • [1] Khalid Sayood: Introduction to Data Compression. Third Edition. Elsevier, 2006.
  • [2] Adam Drozdek: Elements of Data Compression. Brooks/Cole, 2002.
  • [3] Peter Wayner: Data Compression for Real Programmers. Morgan Kaufmann, 1999.

Other books:

  • [4] David Salomon, Giovanni Motta: Handbook of Data Compression. Fifth Edition. Springer, 2010.
  • [5] David Salomon: A Concise Introduction to Data Compression. Springer, 2008.
  • [6] Ida Mengyi Pu: Fundamental Data Compression. Elsevier, 2006.

Links:

Schedule

Meetings:

  • Thursday 12-14, room 237.

Learning

Thursday, 18th October

Read the first section from the handbook [1]:

    1. Introduction

  • 1.1. Compression Techniques
  • 1.2. Modeling and Coding

Questions and exercises:

  • Examples of simple data compression: Morse code and Braille code (find some information about them on the Internet).
  • How does a compression algorithm work?
  • What is the main difference between lossless and lossy compression?
  • When do we need lossless compression, and when can we use lossy compression?
  • How do we calculate the compression ratio? Give an example.
  • What is the rate? How do we calculate the rate?
  • What is the distortion?
  • What is modeling and coding in the context of data compression?
  • What is the residual?
  • Consider the following sequence of numbers: x_n = (3, 2, 3, 3, 4, 3, 5, 5, 4, 5, 6, 5, 7, 6, 7, 7) for n = 0, 1, 2, ...
    If we were to store the binary representation of these numbers, we would need 4 bits per sample. Characterize the structure of the data by a model of the form x̂_n = ⌊(a/b)·n + c⌋ (use 2 bits for each of the coefficients a, b, and c). Calculate the residual, i.e. the difference between the data and the model (each residual value should be 0 or 1). We can obtain compression by storing the parameters of the model and the residual sequence. Perform the encoding. Calculate the compression ratio.
  • Consider the following sequence of numbers: x_n = (24, 21, 20, 22, 25, 23, 26, 27, 29, 28, 25, 24, 21, 22, 19, 18) for n = 0, 1, 2, ...
    If we were to store the binary representation of these numbers, we would need 5 bits per sample. Note that each value is close to the previous one (the difference between consecutive elements ranges from -3 to 3). We can obtain compression by storing the first value and then, in place of each subsequent value, the difference between it and its predecessor. Perform the encoding. Calculate the compression ratio. (A small sketch of this differencing idea is given below the list.)
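
The differencing idea from the last exercise can be checked with a short program. The following C++ sketch is only an illustration (it is not taken from the handbooks); it assumes, as the exercise states, 5 bits per raw sample and 3 bits per stored difference, and prints the residual sequence together with the resulting compression ratio.

    #include <cstdio>
    #include <vector>

    int main() {
        // The sequence from the exercise.
        std::vector<int> x = {24, 21, 20, 22, 25, 23, 26, 27, 29, 28,
                              25, 24, 21, 22, 19, 18};

        // r[0] is the first sample itself; r[n] = x[n] - x[n-1] afterwards.
        std::vector<int> residual;
        residual.push_back(x[0]);
        for (std::size_t n = 1; n < x.size(); ++n)
            residual.push_back(x[n] - x[n - 1]);

        // Assumed bit budgets: 5 bits per raw sample in the original data;
        // 5 bits for the first value plus 3 bits per difference in -3..3.
        int raw_bits        = 5 * (int)x.size();
        int compressed_bits = 5 + 3 * ((int)x.size() - 1);

        printf("residuals:");
        for (int r : residual) printf(" %d", r);
        printf("\nraw: %d bits, compressed: %d bits, ratio: %.2f\n",
               raw_bits, compressed_bits, (double)raw_bits / compressed_bits);

        // Decoding reverses the process: x[0] = r[0], x[n] = x[n-1] + r[n].
        return 0;
    }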

Thursday, 25th October

Read the second section from the handbook [1]:

    2. Mathematical Preliminaries for Lossless Compression

  • 2.2 A Brief Introduction to Information Theory
  • 2.3 Models
  • 2.4 Coding

Questions and exercises:

Thursday, 8th November

Read the second section from the handbook [1]:

    2. Mathematical Preliminaries for Lossless Compression

  • 2.3 Models
  • 2.4 Coding

Questions and exercises:

Tuesday, 13th November

Read the third section from the handbook [1]:

    3. Huffman Coding

  • 3.2 The Huffman Coding Algorithm
  • 3.3 Nonbinary Huffman Codes

Questions and exercises:

Thursday, 22nd November

Read the second section from the handbook [2]:

    2. Shannon-Fano Coding

  • 2.1 Shannon Coding
  • 2.2 Shannon-Fano Coding

Questions and exercises:

Thursday, 29th November

Read the third section from the handbook [1]:

    3. Huffman Coding

  • 3.5 Golomb Codes
  • 3.6 Rice Codes
  • 3.7 Tunstall Codes

Questions and exercises:

First project: Huffman Coding

Implement Huffman coding. Write a program for encoding and decoding text files using Huffman coding. Test your program on the lrrh.txt file.
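
The core of the project is building the Huffman code table from symbol frequencies. The sketch below is only a starting point under my own assumptions (names such as buildHuffman are illustrative, and the frequencies are hard-coded instead of being counted from a file); reading lrrh.txt, writing the packed bitstream, and decoding are left to the project.

    #include <cstdio>
    #include <map>
    #include <queue>
    #include <string>
    #include <vector>

    // One node of the Huffman tree: a leaf stores a symbol, an internal node
    // joins the two subtrees with the smallest total frequency.
    struct Node {
        long freq;
        unsigned char sym;
        Node *left, *right;
    };

    // Orders the priority queue so that the least frequent node is on top.
    struct ByFreq {
        bool operator()(const Node* a, const Node* b) const { return a->freq > b->freq; }
    };

    // Walks the finished tree and records the code word (path of 0s and 1s) of each leaf.
    void collect(const Node* n, const std::string& prefix,
                 std::map<unsigned char, std::string>& codes) {
        if (!n->left && !n->right) { codes[n->sym] = prefix.empty() ? "0" : prefix; return; }
        collect(n->left,  prefix + "0", codes);
        collect(n->right, prefix + "1", codes);
    }

    std::map<unsigned char, std::string> buildHuffman(const std::map<unsigned char, long>& freq) {
        std::priority_queue<Node*, std::vector<Node*>, ByFreq> pq;
        for (const auto& f : freq) pq.push(new Node{f.second, f.first, nullptr, nullptr});
        while (pq.size() > 1) {                 // repeatedly merge the two rarest subtrees
            Node* a = pq.top(); pq.pop();
            Node* b = pq.top(); pq.pop();
            pq.push(new Node{a->freq + b->freq, 0, a, b});
        }
        std::map<unsigned char, std::string> codes;
        collect(pq.top(), "", codes);
        return codes;
    }

    int main() {
        // In the project these frequencies would be counted from lrrh.txt.
        std::map<unsigned char, long> freq = {{'a', 45}, {'b', 13}, {'c', 12},
                                              {'d', 16}, {'e', 9},  {'f', 5}};
        for (const auto& c : buildHuffman(freq))
            printf("%c -> %s\n", c.first, c.second.c_str());
        return 0;
    }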

Thursday, 6th December

Read the fifth section from the handbook [1] and the handbook [2]:

    5. Dictionary Techniques [1]

  • 5.4 Adaptive Dictionary

    5. Dictionary Techniques [2]

  • 5.1 The LZ77 Technique

Questions and exercises:

Thursday, 13th December

Read the fifth section from the handbook [1] and the handbook [2]:

    5. Dictionary Techniques [1]

  • 5.4 Adaptive Dictionary

    5. Dictionary Techniques [2]

  • 5.2 The LZ78 Technique

Questions and exercises:

Thursday, 20th December

Read the fifth section from the handbook [2]:

    5. Dictionary Techniques [2]

  • 5.1.1 The LZSS Technique
  • 5.2.1 The LZW Technique

Questions and exercises:

Second project: Dictionary Coding

Implement the LZSS or LZW coding technique. Write a program for encoding and decoding text files using LZSS or LZW coding. Test your program on the lrrh.txt file.
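
As a starting point, here is a minimal sketch of the LZW encoding loop in C++ under my own assumptions (the function name lzwEncode and the hard-coded input are illustrative, and codes are kept as plain integers rather than packed into bits). Decoding, the LZSS variant, and reading lrrh.txt are left to the project.

    #include <cstdio>
    #include <map>
    #include <string>
    #include <vector>

    // LZW encoder sketch: the dictionary starts with all single bytes; whenever
    // the current phrase w extended by the next symbol c is unknown, the code of
    // w is emitted and w+c is added to the dictionary as a new phrase.
    std::vector<int> lzwEncode(const std::string& input) {
        std::map<std::string, int> dict;
        for (int i = 0; i < 256; ++i) dict[std::string(1, (char)i)] = i;

        std::vector<int> output;
        std::string w;
        for (char c : input) {
            std::string wc = w + c;
            if (dict.count(wc)) {
                w = wc;                          // keep extending the current phrase
            } else {
                output.push_back(dict[w]);       // emit the code of the longest known phrase
                int newCode = (int)dict.size();
                dict[wc] = newCode;              // register the new phrase
                w = std::string(1, c);
            }
        }
        if (!w.empty()) output.push_back(dict[w]);   // flush the final phrase
        return output;
    }

    int main() {
        // In the project the input would be read from lrrh.txt.
        for (int code : lzwEncode("abababababa"))
            printf("%d ", code);
        printf("\n");
        return 0;
    }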

Thursday, 3rd January

Read the sixth section from the handbook [3]:

    5. Grammar Compression [3]

  • 5 Grammar Compression

Questions and exercises:

Thursday, 10th and 17th January

Read the fourth section from the handbook [1] and the handbook [2]:

    4. Arithmetic Coding [1]

  • 4.2 Introduction
  • 4.3 Coding a Sequence
  • 4.4 Generating a Binary Code

    4. Arithmetic Coding [2]

  • 4.1 Implementation of Arithmetic Coding
  • 4.1.1 Integer Implementation

Questions and exercises:
