A curiosity with LZMA data compression

By: [email protected] (GdM)

10 August 2021 at 09:42

Uncompressed file: 1'029'744 bytes.

Compressed size (excluding Zip or 7z archive metadata; data is not preprocessed):

Bytes	Compressed / Uncompressed ratio	Format	Software
172'976	16.80%	PPMd	7-Zip 21.02 alpha
130'280	12.65%	BZip2	Zip 3.0
119'327	11.59%	BZip2	7-Zip 21.02 alpha
61'584	5.98%	LZMA	Zip-Ada v.57
50'398	4.89%	LZMA2	7-Zip 21.02 alpha
50'396	4.89%	LZMA	7-Zip 21.02 alpha
42'439	4.12%	LZMA	Zip-Ada v.58 (preview)
41'661	4.05%	LZMA	Zip-Ada (current research branch)

Conclusion: Zip-Ada (current research branch) produces LZMA output that is 17.3% smaller than that of 7-Zip v.21.02 (41'661 vs. 50'396 bytes)!
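The figures above can be rechecked with a few lines of Python. This is only a sketch of the arithmetic: the sizes are copied from the table, and the function names are chosen here for illustration.

```python
# Sizes in bytes, copied from the table above.
UNCOMPRESSED = 1_029_744
SEVEN_ZIP_LZMA = 50_396      # LZMA, 7-Zip 21.02 alpha
ZIP_ADA_RESEARCH = 41_661    # LZMA, Zip-Ada (current research branch)


def ratio_percent(compressed: int, uncompressed: int = UNCOMPRESSED) -> float:
    """Compressed / uncompressed ratio, in percent."""
    return 100.0 * compressed / uncompressed


def saving_percent(new: int, reference: int) -> float:
    """How much smaller `new` is than `reference`, in percent."""
    return 100.0 * (reference - new) / reference


print(f"7-Zip ratio:   {ratio_percent(SEVEN_ZIP_LZMA):.2f}%")
print(f"Zip-Ada ratio: {ratio_percent(ZIP_ADA_RESEARCH):.2f}%")
print(f"Saving:        {saving_percent(ZIP_ADA_RESEARCH, SEVEN_ZIP_LZMA):.1f}%")
```

Note that the 17.3% is relative to the 7-Zip output size, not a difference between the two ratio columns.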

The file (zipped to its smallest compressed size, 4.05%) can be downloaded here. It is part of the old Canterbury corpus benchmark file collection (file name: kennedy.xls).

Please don't draw any general conclusion: the test data is a relatively small, special binary file with lots of redundancy.
But the result hints that some more juice can be extracted from the LZMA format.

The open-source Zip-Ada project can be found here and here.

Some research with LZMA...

Gautier's blog

By: [email protected] (GdM)

28 November 2020 at 19:54

A rare case where Zip-Ada's LZMA encoder is much better than LZMA SDK's. Rare but still interesting, and with standard LZMA parameters (no specific tuning for that file):

The compressed size with the current revision (rev.#882) of Zip-Ada is slightly worse (42,559 bytes).

The file is part of the classic Canterbury Corpus compression benchmark data set.

Zip-Ada v.57

By: [email protected] (GdM)

3 October 2020 at 20:55

New in v.57 [rev. 799]:

- UnZip: fixed bad decoding case for the Shrink (LZW) format,
  on some data compressed only by PKZIP up to v.1.10,
  release date 1990-03-15.
- Zip.Create: added Zip_Entry_Stream_Type for doing output
  streaming into Zip archives.
- Zip.Compress: Preselection method detects Audacity files (.aup, .au)
  and compresses them better.

***

Zip-Ada is a pure Ada library for dealing with the Zip compressed
archive file format. It supplies:
- compression with the following sub-formats ("methods"):
  Store, Reduce, Shrink (LZW), Deflate and LZMA
- decompression for the following sub-formats ("methods"):
  Store, Reduce, Shrink (LZW), Implode, Deflate, Deflate64,
  BZip2 and LZMA
- encryption and decryption (portable Zip 2.0 encryption scheme)
- unconditional portability - within limits of compiler's provided
  integer types and target architecture capacity
- input archive to decompress can be any kind of indexed data stream
- output archive to build can be any kind of indexed data stream
- input data to compress can be any kind of data stream
- output data to extract can be any kind of data stream
- cross-format compatibility with a wide variety of tools and file formats
  based on the Zip format: 7-Zip, Info-Zip's Zip, WinZip, PKZip,
  Java's JARs, OpenDocument files, MS Office 2007+,
  Google Chrome extensions, Mozilla extensions, E-Pub documents
  and many others
- task safety: this library can be used ad libitum in parallel processing
- endian-neutral I/O

***

Main site & contact info:
http://unzip-ada.sf.net
Project site & subversion repository:
https://sf.net/projects/unzip-ada/
GitHub clone with git repository:
https://github.com/zertovitch/zip-ada

Enjoy!

AZip 2.40 - Windows Explorer context menus

By: [email protected] (GdM)

3 October 2020 at 18:00

New release (2.40) of AZip.

The long-awaited Windows Explorer integration is here:

Context menu for a file

Context menu for a folder

This integration is activated upon installation or on demand via the Manage button:

Configuration

This new version is based on the Zip-Ada library v.57 and includes its recent developments.

Enjoy!

Zip-Ada for Audacity backups

By: [email protected] (GdM)

22 September 2020 at 18:21

Audacity is a free, open-source audio editor, available here.

If you want to back up your Audacity project, you can do it manually with "Save Lossless Copy of Project..." under the name, say, X. This creates X.aup (the project file), a folder X_data and, inside that folder, a file called "Audio Track.wav".

Some drawbacks:

- It is a manual operation.
- It is blocked during playback.
- Envelopes are applied to the "Audio Track.wav" data, so the data is altered and is no longer a true lossless copy of the project. This operation is actually something between a backup and an export of the project to a foreign format.

A solution: Zip-Ada.

The latest commit (rev. 796) adds to the Preselection method a specific configuration for detecting Audacity files, so they are compressed better than with default settings.

Fun detail: in most cases, that configuration compresses better than the best available compression with 7-Zip (v.19.00, "ultra" mode, .7z archive).

The compression process is also around twice as fast as 7-Zip in "ultra" mode. There is no magic here: in the configuration chosen for Zip-Ada, the "LZ" part of the LZMA compression scheme spends less time finding matches.
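The trade-off between match-finding effort and speed can be tried out with any LZMA encoder that exposes its match-finder settings. The sketch below uses Python's standard lzma module (liblzma), not Zip-Ada, and the two parameter sets and the sample data are arbitrary choices made here for illustration; they are not the settings used by Zip-Ada's Preselection method.

```python
import lzma
import time


def compress_with(data: bytes, **lzma2_options) -> tuple[bytes, float]:
    """Compress `data` with a single LZMA2 filter; return (packed bytes, seconds spent)."""
    filters = [{"id": lzma.FILTER_LZMA2, **lzma2_options}]
    start = time.perf_counter()
    packed = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)
    return packed, time.perf_counter() - start


# Hypothetical, highly redundant sample data (a stand-in for a real file).
sample = b"an example record with a repeated structure\n" * 20_000

# Little match-finding effort: hash chains, shallow search, short "nice" length.
quick, quick_secs = compress_with(
    sample, preset=6, mf=lzma.MF_HC4, depth=4, nice_len=16
)

# Much more effort: binary trees, automatic depth, longest "nice" length.
thorough, thorough_secs = compress_with(
    sample, preset=9, mf=lzma.MF_BT4, depth=0, nice_len=273
)

print(f"quick:    {len(quick)} bytes in {quick_secs:.3f} s")
print(f"thorough: {len(thorough)} bytes in {thorough_secs:.3f} s")
```

The sizes and timings depend on the data and the machine, so none are quoted here; the point is only that the match finder is a separate knob from the rest of the LZMA scheme.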

A backup script could look like this (here for Windows' cmd):

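rem --------------------------
rem Nice date YYYY-MM-DD_HH.MM
rem --------------------------
rem Note: the offsets below assume that %date% ends with DD.MM.YYYY
rem (or DD/MM/YYYY); the format depends on the Windows regional settings.

set year=%date:~-4,4%

set month=%date:~-7,2%
if "%month:~0,1%" equ " " set month=0%month:~1,1%

set day=%date:~-10,2%
if "%day:~0,1%" equ " " set day=0%day:~1,1%

rem %time% may pad hours below 10 with a space; replace it with a zero.
set hour=%time:~0,2%
if "%hour:~0,1%" equ " " set hour=0%hour:~1,1%

set min=%time:~3,2%

set nice_date=%year%-%month%-%day%_%hour%.%min%

rem --------------------------

set audacity_project=The Cure - A Forest

rem Archive the project file and its audio blocks under a timestamped name.
zipada -ep2 "%audacity_project%_%nice_date%" "%audacity_project%.aup" "%audacity_project%_data\e08\d08\*.au"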
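For comparison, here is a sketch of the same idea in Python, where the timestamp does not depend on the regional date format. The zipada command line, the -ep2 option and the paths are taken as-is from the script above; the function name backup_command is made up for this example, and the command is only printed, not executed.

```python
from datetime import datetime


def backup_command(project: str, now: datetime) -> list[str]:
    """Build the zipada command line for a timestamped backup of `project`."""
    stamp = now.strftime("%Y-%m-%d_%H.%M")  # same layout as nice_date above
    return [
        "zipada",
        "-ep2",
        f"{project}_{stamp}",               # archive name, as in the cmd script
        f"{project}.aup",                   # project file
        f"{project}_data\\e08\\d08\\*.au",  # wildcard is passed on unexpanded, as cmd does
    ]


if __name__ == "__main__":
    cmd = backup_command("The Cure - A Forest", datetime.now())
    print(cmd)
    # To actually run the backup, pass `cmd` to subprocess.run(cmd, check=True).
```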

AZip in action for duplicating a Thunderbird profile

By: [email protected] (GdM)

18 September 2020 at 07:16

Do you want to copy your Thunderbird profile from machine A to machine B (with all mail accounts, passwords, settings, feeds, newsgroups, ...)? Actually, it is very easy. From the user storage (on Windows: %appdata%, which you can reach by pressing Windows key+R and typing %appdata%), copy the entire Thunderbird folder of machine A to the equivalent location on machine B, and that's it. The new active profile will be selected automatically, since the file profiles.ini is overwritten along the way.

Now, if you want or need to use a cloud drive or a USB stick for the operation, it's better to wrap everything in a Zip file (a single file instead of hundreds) to save time. Plus, you can store the Zip file in case of an emergency (losing data on both A and B machines).

With AZip, it's pretty easy:

- Shut down Thunderbird on both machines.
- On machine A: drag & drop the Thunderbird folder on an empty AZip window.
- Copy or move the Zip file.
- On machine B: extract everything with another drag & drop, from AZip to the Explorer window with the %appdata% path. When asked "Use archive's folder names for output", say "Yes". When asked "Do you want to replace this file ?", say "All".

That's it!
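For readers who would rather script the packing step on machine A, here is a rough equivalent using Python's standard zipfile module instead of AZip. It is only a sketch: the function name zip_folder is invented for this example, LZMA is picked as the Zip method as an assumption, and the archive should be written outside the folder being packed.

```python
import zipfile
from pathlib import Path


def zip_folder(folder: Path, archive: Path) -> int:
    """Store every file under `folder` in `archive`; return the number of files added.

    Entry names start with the folder's own name (e.g. "Thunderbird/..."),
    so that extracting into %appdata% recreates the folder there.
    """
    count = 0
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_LZMA) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(folder.parent).as_posix())
                count += 1
    return count
```

A call could look like zip_folder(Path(appdata) / "Thunderbird", Path("thunderbird_backup.zip")), where appdata is the value of the %appdata% environment variable and the archive name is arbitrary.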

Here are a few screenshots:

Folder tree view

You can squeeze the data to a smaller size with the "Recompress" button (third from the right); most of the time, the LZMA format will be chosen over Deflate.

After recompression

AZip 2.38

By: [email protected] (GdM)

1 August 2020 at 07:21

AZip can now install itself (if requested)!

AZip is a free, open-source Zip Archive Manager.

You can download the Windows version from here: https://azip.sourceforge.io/

Normal view