A curiosity with LZMA data compression
Uncompressed file: 1'029'744 bytes.
Compressed sizes (excluding Zip or 7z archive metadata; the data is not preprocessed):
Bytes | Compressed / Uncompressed ratio | Format | Software |
172'976 | 16.80% | PPMd | 7-Zip 21.02 alpha |
130'280 | 12.65% | BZip2 | Zip 3.0 |
119'327 | 11.59% | BZip2 | 7-Zip 21.02 alpha |
61'584 | 5.98% | LZMA | Zip-Ada v.57 |
50'398 | 4.89% | LZMA2 | 7-Zip 21.02 alpha |
50'396 | 4.89% | LZMA | 7-Zip 21.02 alpha |
42'439 | 4.12% | LZMA | Zip-Ada v.58 (preview) |
41'661 | 4.05% | LZMA | Zip-Ada (current research branch) |
Conclusion: with the same LZMA format, the current research branch of Zip-Ada produces 17.3% fewer bytes on that data than 7-Zip 21.02 alpha (41'661 vs. 50'396 bytes)!
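For readers who want to check the arithmetic, here is a small Python sketch that recomputes the ratio column and the 17.3% figure from the byte counts in the table. The function and variable names are only illustrative; the numbers are those listed above.

```python
# Byte counts taken from the table above.
UNCOMPRESSED = 1_029_744

SIZES = {
    "LZMA, 7-Zip 21.02 alpha": 50_396,
    "LZMA, Zip-Ada (current research branch)": 41_661,
}

def ratio_percent(compressed: int, uncompressed: int = UNCOMPRESSED) -> float:
    """Compressed size as a percentage of the uncompressed size."""
    return 100.0 * compressed / uncompressed

def relative_gain_percent(baseline: int, improved: int) -> float:
    """How many percent fewer bytes 'improved' needs compared to 'baseline'."""
    return 100.0 * (baseline - improved) / baseline

if __name__ == "__main__":
    for name, size in SIZES.items():
        print(f"{name}: {ratio_percent(size):.2f}%")
    gain = relative_gain_percent(
        SIZES["LZMA, 7-Zip 21.02 alpha"],
        SIZES["LZMA, Zip-Ada (current research branch)"],
    )
    print(f"Relative gain: {gain:.1f}%")
```

Note that the gain is relative to the 7-Zip LZMA result, not to the uncompressed size: in terms of the ratio column, the difference is only 4.89% vs. 4.05%.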
The file (zipped to its smallest compressed size, 4.05%) can be downloaded here. It is part of the old Canterbury corpus benchmark file collection (file name: kennedy.xls).
Please don't draw any general conclusion from this: the test data is a single, relatively small and rather special binary file with lots of redundancy.
Still, the result hints that some more juice can be extracted from the LZMA format.
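Why can there be "more juice" at all? The LZMA format only fixes what the decoder does; the encoder is free to choose how it searches for matches and encodes them, so two encoders (or two settings of one encoder) can produce streams of different sizes that the same decoder accepts. The sketch below illustrates that general point with Python's standard `lzma` module. It is not how Zip-Ada or 7-Zip work internally, and `sample_data` is a hypothetical stand-in for a redundant binary file, not kennedy.xls.

```python
import lzma
import struct

def sample_data(rows: int = 20_000) -> bytes:
    # Hypothetical stand-in: fixed-width binary records with repetitive fields.
    return b"".join(
        struct.pack("<IHd", i, i % 7, (i % 100) * 0.25) for i in range(rows)
    )

def lzma_sizes(data: bytes) -> dict:
    """Compress the same data with two encoder settings, same LZMA format."""
    results = {}
    settings = (("fast", 0), ("extreme", 9 | lzma.PRESET_EXTREME))
    for label, preset in settings:
        # FORMAT_ALONE is the legacy .lzma container (LZMA1 plus a small header).
        blob = lzma.compress(data, format=lzma.FORMAT_ALONE, preset=preset)
        # Both streams must decode to the original with the same decoder.
        assert lzma.decompress(blob, format=lzma.FORMAT_ALONE) == data
        results[label] = len(blob)
    return results

if __name__ == "__main__":
    data = sample_data()
    for label, size in lzma_sizes(data).items():
        print(f"{label}: {size} bytes ({100.0 * size / len(data):.2f}%)")
```

The sizes printed here include the small .lzma header, unlike the figures in the table, which exclude archive metadata.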