Burrows-Wheeler with Inversion Coder (BWIC)

Research page on BWIC (formerly BSIC), authored by Prof. Ziya Arnavut.

Note: Previously, BWIC was called BSIC. However, to give more attention to Burrows-Wheeler's excellent work, the name was changed to BWIC.

The Move-to-Front (or Recency Ranking) has been introduced in 1986 by Benetley et. al. (also discovered independently by Elias). The MTF coder has been especially popular since the introduction of the Block-Sorting coder by Burrows and Wheeler in 1994, which opened new frontiers in data compression.

We introduced a new coder, called Inversion Coder (or Inversion Ranks). Inversion coder yields better compression than MTF-based coders on most files and on large text files. Our extensive simulation results have shown that an Inversion-based Coding technique is superior in compressing DNA, Image, Audio and other Signal-based data, in addition to large text files. It simply yields better compression than MTF-based coders. The compression gain with respect to MTF-based coders in terms of weighted average bit per symbol is in the range of 4-12% for the test files utilized.

We have utilized a variety of test files. Some of the test files can be obtained here for testing/research purposes. For some others, we provide links.

As we keep improving our BWIC, we will update this page! At this moment, for research/testing purposes, we are releasing the BWIC's executables which run on a Context-based arithmetic coder. Hence, it requires large memory. With the current binaries you can compress files of any size. The block (window) size for BWT is set to 8 MB. Experimentally, we determined that 8 MB gives us good compression and speed. Hence, for files bigger than 8 MB, compression is achieved by processing the file in chunks of 8 MB. You may run the executables on Linux or Windows OS as follows:

bwic   input-filename output-filename
ubwic  compressed-file output-filename

The people who are involved, contributed or helped in this project are: my friend David Leavitt, and Fredonia Computer Science students Jeffrey Jones, Ryan Smith and Eric Penoyer.

You can obtain the current test version of the BWIC encoder and decoder for LINUX in compressed form by clicking: BWIC and UBWIC. For MS Windows OS (32-bit version), use BWIC and UBWIC. For 64-bit Windows, use BWIC.exe.

Usage examples:

BWIC -e inputfile outfile   (encode)
BWIC -d compressedfile decompressedfile   (decode)

You can also obtain some of the test files used in our work: DNA, Image, Sound. Other test files are from Calgary and Canterbury Corpuses: calgary.tar.gz or Canterbury Corpus.