SPO600 – Project Stage 1

For this stage of the project, we had to find and identify a CPU-intensive function or method in an open source software project. This took quite some time because there are so many open source software projects out there. I wasn’t sure were to start, so I started looking at compression and decompression software. I found a few interesting one’s such as Zstandard by Facebook or Brotli by Google. However, since these projects were created by such large and well-known companies, I was hesitant in optimizing their code. I decided to look for a smaller company with less developers on the project.

I found a compression tool called Density. It is a free compression library that focuses on “super fast” compression at the best ratio possible. One of the biggest assets of DENSITY is that its work unit is not a byte like other libraries, but a group of 4 bytes.

I installed the program and ran the initial benchmark tests on my laptop. I’m running a Intel Core i7-3612QM CPU @ 2.10 GHZ, 8 GB RAM on 64-bit OS. Density has three different algorithms available; Chameleon, Cheetah and Lion.


Chameleon ( DENSITY_ALGORITHM_CHAMELEON )

Chameleon is a dictionary lookup based compression algorithm. It is designed for absolute speed and usually reaches a 60% compression ratio on compressible data. Decompression is just as fast. This algorithm is a great choice when main concern is speed.

cha


Cheetah ( DENSITY_ALGORITHM_CHEETAH )

Cheetah was developed with inputs from Piotr Tarsa. It is derived from chameleon and uses swapped double dictionary lookups and predictions. It can be extremely good with highly compressible data (ratio reaching 10% or less). On typical compressible data compression ratio is about 50% or less. It is still extremely fast for both compression and decompression and is a great, efficient all-rounder algorithm.

chet


Lion ( DENSITY_ALGORITHM_LION )

Lion is a multiform compression algorithm derived from cheetah. It goes further in the areas of dynamic adaptation and fine-grained analysis. It uses multiple swapped dictionary lookups and predictions, and forms rank entropy coding. Lion provides the best compression ratio of all three algorithms under any circumstance, and is still very fast.

li

Based on the 3 algorithms above, I think the best one for me to try to optimize is the Chameleon algorithm because it solely focuses on speed.

Leave a comment