Burrows-Wheeler with Inversion Coder (BWIC)

Note: Previously, BWIC was called BSIC. However, to give more attention to Burrows-Wheeler's excellent work, I changed the name BSIC to BWIC.

The Move-to-Front (or Recency Ranking) has beeen introduced in 1986 by Benetley et. al. (also, discovered independently by Elias). The MTF coder ha s been especially popular since the introduction of the Block-Sorting coder by Bur rows and Wheeler in 1994, which opened new frontiers in data compression.

We introduced a new coder, called Inversion Coder (or Inversion Ranks). Inversion coder yields better compression than MTF-based coders on most of the f iles and on large text files. Our extensive simulation results have shown that a Inversion-based Coding technique is superior in compressing DNA, Image, Audio and other Signal-based data, in addition to large text files. It simply yields better compression than MTF-based coders. The compression gain with respect to MTF-based coders in terms of weighted average bit per symbol is in the range of 4-12% for the test files utilized.

We have utilized a varity of test files. Some of the test files can be obtained here for testing/research purposes. For s ome others, we provide links.

As we keep improving our BWIC, we will update this page! At this moment, for research/testing purposes, we are releasing the BWIC's executables which runs on a Context-based arithmetic coder. Hence, it requires large memory. With the current binaries you can compress files of any size. The block (window) size for BWT is set to 8-MB. Experimentally, we determined that 8 MB gives us good compression and and speed. Hence, for files bigger than 8-MB, compression is achieved by processing the file in chuncks of 8-MB. You may run the executables on a Linux or Windows OS as follows:

bwic input-filename output-filename

ubwic compressed-file output-filename

The people who are involved, contributed or helped us in this project are:

My friend David Leavitt, and Fredonia Computer Science students Jeffrey Jones, Ryan Smith and Eric Penoyer.

You can obtain the current test version of the BWIC encoder and decoder for LINUX in in compressed form by clicking the link BWIC and UBWIC . To obtain executables for MS Windows OS (32-bit version), click onto the following links BWIC and UBWIC . For windows, encoder and decoder are uncompressed.For 64-bit MS Windows OS, please use BWIC.exe. When using BWIC.exe, to encode a raw file type BWIC -e inputfile outfile and to decode a compressed file with BWIC, type BWIC -d compressedfile decompressedfile .

You can also obtain some of test files used in our work. All the DNA test files are tarred and compressed with gzip. Similarly all the Image and Sound files utilized in our work are tarred and compressed with gzip. Other test files used in our work are from Calgary and Cantarbury Corpuses.

You can acess to the files or web sites of these corpuses by clicking calgary.tar.gz or visiting the link Canterburry Corpus .