# "Finite Field Arithmetic." Chapter 21A-Ter: Fix for a False Alarm in Ch.14; "Litmus" Errata.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical "Open Sores" abomination, in that -- rather than trusting the author blindly with their lives -- prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatches and seals to your V-set, and press to ffa_ch21a_ter_ch14_ch20_errata.kv.vpatch.

You should end up with the same directory structure as previously.

As of Chapter 21A-Ter, the versions of Peh and FFA are 250 and 199, respectively.

Now compile Peh:

```
cd ffacalc
gprbuild
```

But do not run it quite yet.

This Chapter concerns fixes for several flaws recently reported by a careful Finnish reader known only as cgra. Thank you, cgra!

Let's begin with his first find: a false alarm bug in Chapter 14B's implementation of Barrett's Modular Reduction. (Note that the proofs given in Ch.14A and Ch.14A-Bis presently stand; the bug exists strictly in the Ada program.)

Recall Step 5 of the algorithm given in Ch.14A :

For each new input X, to compute the reduction R := X mod M:

1. Xs := X >> JM
2. Z  := Xs × BM
3. Zs := Z >> SM
4. Q  := Zs × M
5. R  := X - Q
6. R  := R - M, C := Borrow
7. R  := R + (M × C)
8. R  := R - M, C := Borrow
9. R  := R + (M × C)
10. R  := R - (R × DM)
11. R is now equal to X mod M.
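For readers who wish to poke at the arithmetic outside of Peh, the shape of the computation can be sketched with plain Python integers. This is the classic single-shift textbook Barrett (HAC Alg. 14.42, base 2), not the chapter's exact JM/SM/DM parametrization; `barrett` is a name invented for this sketch:

```python
# Sketch of Barrett's reduction over Python integers. The precomputed
# constant plays the role of the "Barrettoid"; q*m plays the role of Q.
def barrett(m):
    k = m.bit_length()                # so that 2^(k-1) <= m < 2^k
    bm = (1 << (2 * k)) // m          # precomputed constant, floor(2^2k / M)
    def reduce(x):
        assert 0 <= x < (1 << (2 * k))
        q = ((x >> (k - 1)) * bm) >> (k + 1)  # quotient estimate (steps 1-3)
        r = x - q * m                         # R := X - Q  (steps 4-5)
        if r >= m: r -= m                     # first conditional correction (6-7)
        if r >= m: r -= m                     # second conditional correction (8-9)
        return r
    return reduce

red = barrett(0xC5)                   # modulus M = 197
for x in (0, 196, 197, 50000, 65535):
    assert red(x) == x % 0xC5
```

The estimate q is short of the true quotient by at most two, which is why exactly two conditional corrections suffice.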

[Diagram from Ch. 14A: X and Q are each 2·WM wide; in the subtraction at (5), the upper WM − L portion of each is ignored, and only the low WM + L portion takes part, yielding R, likewise WM + L wide.]

... and finally, its implementation in Chapter 14B :

```
   -- Reduce X using the given precomputed Barrettoid.
   procedure FZ_Barrett_Reduce(X          : in     FZ;
                               Bar        : in     Barretoid;
                               XReduced   : in out FZ) is

      ..............................

      -- R is made one Word longer than Modulus (see proof re: why)
      Rl      : constant Indices := Ml + 1;

      ..............................

      -- Barring cosmic ray, no underflow can take place in (4) and (5)
      NoCarry : WZeroOrDie := 0;

   begin

      ..............................

      -- (5) R  := X - Q (we only need Rl-sized segments of X and Q here)
      FZ_Sub(X          => X(1 .. Rl),
             Y          => Q(1 .. Rl),
             Difference => R,
             Underflow  => NoCarry);

      ..............................
```

Even though we had demonstrated that Q ≤ X, the prohibition of a nonzero subtraction borrow in (5) is fallacious.

To illustrate: this Tape, on a 256-bit run of Peh :

`  .1 .FF LS .1 .3 MX # QY`

... will not print the expected answer to the given modular exponentiation, i.e.:

`  0000000000000000000000000000000000000000000000000000000000000002`

... with a Verdict of Yes; but instead will print nothing, and yield a Verdict of EGGOG. Specifically, Peh will halt at (5) via a Constraint_Error (range check failed), when the range of NoCarry's WZeroOrDie type is violated by an assignment of 1.

This is because -- early in this modular exponentiation's sequence of Barrett reductions -- and immediately prior to (5) :

```
X == 0x40000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Q == 0x3FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
```

... but what will be actually computed in (5) is X(1 .. Rl) - Q(1 .. Rl), i.e.:

```
  0x00000000000000000000000000000000000000000000000000000000000000000000000000000000
- 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

= 1 (Underflow == 1)
```

... that is, the borrow bit is legitimately 1, in this and in a number of other readily-constructed cases. The constraints we have demonstrated for X, Q, and R do not imply that a borrow will never occur in the subtraction at (5). Therefore, the intended cosmic ray detector is strictly a source of false alarms, and we will remove it:

```
   -- Reduce X using the given precomputed Barrettoid.
   procedure FZ_Barrett_Reduce(X          : in     FZ;
                               Bar        : in     Barretoid;
                               XReduced   : in out FZ) is

      ..............................

      -- Borrow from Subtraction in (5) is meaningless, and is discarded
      IgnoreC : WBool;
      pragma Unreferenced(IgnoreC);

   begin

      ..............................

      -- (5) R  := X - Q (we only need Rl-sized segments of X and Q here)
      FZ_Sub(X          => X(1 .. Rl),
             Y          => Q(1 .. Rl),
             Difference => R,
             Underflow  => IgnoreC); -- Borrow is discarded

      ..............................
```

... and that's it.
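The legitimacy of the borrow is easy to reproduce with a toy fixed-width subtraction; `fz_sub` below is a stand-in invented for this sketch, mirroring FFA's FZ_Sub, here over 8-bit "words":

```python
# Fixed-width subtraction of the low `words`-word segments of x and y,
# returning (difference, borrow), as FFA's FZ_Sub does.
WORD = 8

def fz_sub(x, y, words):
    mask = (1 << (WORD * words)) - 1
    xs, ys = x & mask, y & mask
    return (xs - ys) & mask, 1 if xs < ys else 0

# X >= Q overall, exactly as the Ch. 14 proofs guarantee...
X, Q = 0x4000, 0x3FFF
# ...yet subtracting only the low one-word segments borrows legitimately:
d, borrow = fz_sub(X, Q, 1)
assert (d, borrow) == (1, 1)   # the difference (1) is correct; borrow is 1
```

The low segments wrap around exactly as in the chapter's 0x4000...0 − 0x3FFF...F example: the result is correct, and the borrow is no cosmic ray.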

Cgra's second find concerned the Ch.20 demo script, Litmus. He had discovered that two mutually-canceling bugs exist in the program. Specifically, in :

```
..................

# Hashed Section Length
get_sig_bytes 2
turd+=$r
hex_to_int
sig_hashed_len=$r

# Hashed Section (typically: timestamp)
get_sig_bytes $sig_hashed_len
turd+=$r
sig_hashed=$r

# Unhashed Section Length
get_sig_bytes 1
hex_to_int
sig_unhashed_len=$r

get_sig_bytes $sig_unhashed_len

..................
..................

# RSA Packet Length (how many bytes to read)
get_sig_bytes 1
hex_to_int
rsa_packet_len=$r

# The RSA Packet itself
get_sig_bytes $rsa_packet_len
rsa_packet=$r

# Digest Prefix (2 bytes)
get_sig_bytes 2
digest_prefix=$r

..................
```

... the Unhashed Section Length is erroneously treated as a 1-byte field, whereas in reality the GPG format gives it 2 bytes. The script only worked (on all inputs tested to date) on account of the superfluous RSA Packet reader -- a leftover from an early version of the demo! In every test case to date, the second byte of the Unhashed Section Length, along with the unhashed section in its entirety (for so long as it does not exceed 255 bytes in length -- which it appears never to do), was consumed by get_sig_bytes $rsa_packet_len.
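The cancellation can be modeled in a few lines of Python over a synthetic byte stream. The `parse_*` helpers are invented for this illustration, but they read the same fields the script does:

```python
# Real layout after the hashed section:
#   [2-byte Unhashed Section Length][Unhashed Section][2-byte Digest Prefix]
unhashed = bytes(range(10))
stream = len(unhashed).to_bytes(2, 'big') + unhashed + b'\xAB\xCD'

def parse_fixed(s):
    n = int.from_bytes(s[0:2], 'big')  # 2-byte length, as the format demands
    pos = 2 + n                        # skip the unhashed section
    return s[pos:pos + 2]              # digest prefix

def parse_buggy(s):
    n = s[0]                           # Bug 1: length taken as 1 byte (reads 0)
    pos = 1 + n
    m = s[pos]                         # Bug 2: leftover "RSA Packet Length" read...
    pos += 1 + m                       # ...accidentally eats the unhashed section
    return s[pos:pos + 2]

# The two bugs cancel exactly while the unhashed section is under 256 bytes:
assert parse_fixed(stream) == parse_buggy(stream) == b'\xAB\xCD'
```

With an unhashed section of 256 bytes or more, the buggy parse would land at the wrong offset; with a shorter one, the two readers always arrive at the same digest prefix.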

I am almost pleased that I had made this mistake; it is in fact a better illustration of programs which operate correctly despite erroneous logic -- as well as the unsuitability of shell script as a language for nontrivial tasks -- than anything I could readily unearth in the open literature.

And the fix is readily apparent :

```
..................

# Hashed Section Length
get_sig_bytes 2
turd+=$r
hex_to_int
sig_hashed_len=$r

# Hashed Section (typically: timestamp)
get_sig_bytes $sig_hashed_len
turd+=$r
sig_hashed=$r

# Unhashed Section Length
get_sig_bytes 2
hex_to_int
sig_unhashed_len=$r

get_sig_bytes $sig_unhashed_len

..................
..................

# Digest Prefix (2 bytes)
get_sig_bytes 2
digest_prefix=$r

..................
```

I also incorporated cgra's earlier suggestion regarding error checking. Thank you again, cgra!

And that's it for Litmus, presently.

~ The next Chapter, 21B, will (yes!) continue the Extended-GCD sequence of Chapter 21A. ~

# "Cryostat" Genesis.

Cryostat is a Fits-in-Head minimal (~700 LOC, including comments) static library for adding safe and reliable persistent storage to Ada data structures. It makes use of memory-mapped disk I/O via the MMap() system call, present in Linux (kernel 2.4 and newer) and all compatible operating systems. This mechanism permits efficient work with persistent data structures substantially larger than a machine's physical RAM.

AdaCore offers their own implementation of MMap() support in GNATColl. However, IMHO their item is an atrocity, in many ways very similar to their GNAT Sockets pile of garbage (the near-unusability of the latter is what prompted me to write Ada-UDP in 2018.) AdaCore's MMap library is not only a behemoth replete with e.g. special cases for MS-Win support, but its use is entirely incompatible with safety-restricted compilation profiles.

Cryostat, on the other hand, does NOT require enabling the use of pointerism, unchecked conversions, the secondary stack, heap allocators, or other bulky and objectionable GNAT features, in the calling program. It does however require finalization to be enabled. This is used to guarantee the safe sync-to-disk and closing of the backing MMap when the data structure it contains goes out of scope.

Let's proceed to building Cryostat and its included demo program.

You will need:

Add the above vpatch and seal to your V-set, and press to cryostat_genesis.kv.vpatch.

Now compile the included CryoDemo:

```
cd demo
gprbuild
```

... this will build both the demo and the library.

But do not run it quite yet.

First, let's see what this demo consists of :

```
with Interfaces;  use Interfaces;
with Ada.Text_IO; use Ada.Text_IO;

with Cryostat;

procedure CryoDemo is

   -- Path on disk for the example Cryostat backing file :
   File_Path : constant String := "cryotest.bin";

   -- Now, let's define an example data structure to place in a Cryostat :

   -- Example payload array's element type: byte.
   type ADatum is mod 2**8;

   -- Let's make it 512MB - far bigger than a typical stack, to demonstrate
   -- that it will in fact reside in the Cryostat, rather than on the stack :
   A_MBytes : constant Unsigned_32 := 512;

   subtype ARange is Unsigned_32 range 0 .. (A_MBytes * 1024 * 1024) - 1;

   -- Complete the definition of the payload data structure :
   type TestArray is array(ARange) of ADatum;

   -- Declare a Cryostat which stores a TestArray :
   package Cryo is new Cryostat(Form     => TestArray,
                                Path     => File_Path,
                                Writable => True,  -- Permit writing
                                Create   => True); -- Create file if not exists

   -- Handy reference to the payload; no pointerisms needed !
   T : TestArray renames Cryo.Item;

   -- T can now be treated as if it lived on the stack :

begin

   Put_Line("T(0)    before :  " & ADatum'Image(T(0)));
   Put_Line("T(Last) before :  " & ADatum'Image(T(T'Last)));

   -- Increment each of the elements of T :
   for i in T'Range loop
      T(i) := T(i) + 1;
   end loop;

   Put_Line("T(0)    after  :  " & ADatum'Image(T(0)));
   Put_Line("T(Last) after  :  " & ADatum'Image(T(T'Last)));

   --  Optional, finalizer always syncs in this example :
   --  Cryo.Sync;

   --  Test of Zap -- uncomment and get zeroized payload every time :
   --  Cryo.Zap;

   Put_Line("OK.");

end CryoDemo;
```

In the demo, we define TestArray -- a data structure consisting of a 512 megabyte array, and invoke Cryostat to create a persistent disk store for it. (When the program is first run, the array -- instantiated as T -- will contain only zeros.) After this, we increment each byte in T, and terminate. When, in the end, T goes out of scope, the finalizer kicks in and properly syncs the payload to disk. Thus, T behaves exactly like a stack-allocated variable, with the exception of the fact that its contents are loaded from disk upon its creation (on the second and subsequent runs of the program) and synced to disk upon its destruction (or if Sync were to be invoked.)

Observe that the calling code is not required to perform any file-related manipulations, or to juggle memory; all of the necessary mechanisms (including error handling) are contained in the Cryostat static library.
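The underlying mechanism can be sketched with Python's mmap module: a toy stand-in for Cryostat, with an invented file name and a 1KB payload in place of CryoDemo's 512MB:

```python
import mmap
import os

PATH = "cryodemo_sketch.bin"   # invented path for this sketch
SIZE = 1024                    # 1KB, rather than CryoDemo's 512MB

fd = os.open(PATH, os.O_RDWR | os.O_CREAT, 0o666)  # cf. strace: open(..., O_RDWR|O_CREAT, 0666)
os.ftruncate(fd, SIZE)                             # cf. ftruncate(3, 536870912)
m = mmap.mmap(fd, SIZE)                            # shared, read/write mapping by default on Unix

snapshot = bytes(m)            # payload as loaded from disk
for i in range(SIZE):          # increment every byte, as CryoDemo does
    m[i] = (m[i] + 1) % 256

m.flush()                      # cf. msync(..., MS_SYNC)
m.close()                      # cf. munmap(...)
os.close(fd)

# The increments persisted; a second run would count up again:
with open(PATH, "rb") as f:
    assert f.read() == bytes((x + 1) % 256 for x in snapshot)
```

As in CryoDemo, the mapped region is used exactly like an ordinary in-memory buffer; the sync-and-close at the end is what Cryostat's finalizer performs automatically.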

When we first execute the demo:

`./bin/cryodemo`

The following output will appear:

```
T(0)    before :   0
T(Last) before :   0
T(0)    after  :   1
T(Last) after  :   1
OK.
```

If we run it again, we will see the following:

```
T(0)    before :   1
T(Last) before :   1
T(0)    after  :   2
T(Last) after  :   2
OK.
```

... and so forth. cryotest.bin, the backing file used by the Cryostat in the demo, will consist of 512 megabytes of byte value N, where N is the number of times the demo has executed. For example, after the first run, a hex dump:

`hexdump -C cryotest.bin`

... will yield:

```
00000000  01 01 01 01 01 01 01 01  01 01 01 01 01 01 01 01  |................|
*
20000000
```

Let's use the traditional strace tool to confirm that the demo behaves as specified:

`strace ./bin/cryodemo`

The following output will appear:

```
execve("./bin/cryodemo", ["./bin/cryodemo"], [/* 84 vars */]) = 0
arch_prctl(ARCH_SET_FS, 0x644798)       = 0
rt_sigprocmask(SIG_UNBLOCK, [RT_1 RT_2], NULL, 8) = 0
rt_sigaction(SIGABRT, {0x41c360, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x42498c}, NULL, 8) = 0
rt_sigaction(SIGFPE, {0x41c360, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x42498c}, NULL, 8) = 0
rt_sigaction(SIGILL, {0x41c360, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x42498c}, NULL, 8) = 0
rt_sigaction(SIGBUS, {0x41c360, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x42498c}, NULL, 8) = 0
sigaltstack({ss_sp=0x644a80, ss_flags=0, ss_size=16384}, NULL) = 0
rt_sigaction(SIGSEGV, {0x41c360, [], SA_RESTORER|SA_STACK|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x42498c}, NULL, 8) = 0
fstat(2, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 5), ...}) = 0
fstat(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 5), ...}) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 5), ...}) = 0
open("cryotest.bin", O_RDWR|O_CREAT, 0666) = 3
ftruncate(3, 536870912)                 = 0
mmap(NULL, 536870912, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x7f3bcc575000
writev(1, [{"", 0}, {"T(0)    before :   0\n", 21}], 2) = 21
writev(1, [{"", 0}, {"T(Last) before :   0\n", 21}], 2) = 21
writev(1, [{"", 0}, {"T(0)    after  :   1\n", 21}], 2) = 21
writev(1, [{"", 0}, {"T(Last) after  :   1\n", 21}], 2) = 21
writev(1, [{"", 0}, {"OK.\n", 4}], 2)   = 4
msync(0x7f3bcc575000, 536870912, MS_SYNC) = 0
munmap(0x7f3bcc575000, 536870912)       = 0
close(3)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++
```

There are a few minor knobs that still ought to be added to Cryostat (See README.TXT) but even as it presently stands, it is already sufficient for basic experimentation with clean and compact databases implemented wholly in Ada.

~ To Be Continued. ~

# "Finite Field Arithmetic." Chapter 21A-Bis: Fix for Lethal Flaw in Ch.15's Greatest Common Divisor.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical "Open Sores" abomination, in that -- rather than trusting the author blindly with their lives -- prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatches and seals to your V-set, and press to ffa_ch21a_bis_fix_ch15_gcd.kv.vpatch.

You should end up with the same directory structure as previously.

As of Chapter 21A-Bis, the versions of Peh and FFA are 250 and 200, respectively.

Now compile Peh:

```
cd ffacalc
gprbuild
```

But do not run it quite yet.

## 1. The Boojum.

This Chapter repairs a lethal bug in Chapter 15's "Constant-Time Iterative GCD" algorithm and its implementation.

While working on Ch. 21B, I found that the algorithm used in Ch. 15 is subtly broken on a very small but readily-constructible subset of its input domain.

My original (unpublished -- it was the nominal answer to Ch. 15 Exercise 2...) proof for this algo was fallacious: the calculation as formulated there is superficially correct, but it is not guaranteed to converge in Bitness(A) + Bitness(B) iterations! It is possible to construct pairs of inputs where we end up with an incorrect GCD result, and we will discuss several examples of these in this Chapter.

Cut corners are to be paid for, and I will admit that I would rather pay for this one now, with a Vpatch, a new proof, and an apology to readers, than later -- when I put FFA to use in anger, and offer BTC bounties for reporting defects.

Several supposedly-attentive FFA students at various times reported that they have eaten Ch. 15; and at least one of last year's readers even signed it. However, no one reported the defect, which I ended up uncovering while testing the final version of the Ch. 21 Extended-GCD against Ch. 15's conventional one. The breakage is triggered by a small subset of the possible input space where one or both input FZs to GCD consist almost entirely of ones (i.e. most of the bits are set.) No such case turned up in Ch. 15's randomly-generated test battery, reinforcing Dijkstra's famous observation that testing can reveal the presence of bugs, but never their absence.

Only proof can be relied on, and the proof had better be correct.

Let's proceed with two concrete examples of inputs which break the Ch. 15 GCD:

Ch. 15 GCD Counter-Example No. 1:

This Tape:

```
  .FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBB
  .FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB
  G
  #
```

... produces 4, when fed into a 256-bit-FZ run of Peh. But the two numbers are in fact co-prime, i.e. their actual GCD is 1.

Ch. 15 GCD Counter-Example No. 2:

And this Tape:

```
  .FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEB
  .FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB
  G
  #
```

... produces 0. Which is not only not the correct answer (again 1; the given numbers are yet again co-prime) but is in fact an impossible output of any GCD invocation apart from GCD(0,0) -- and is true there only by convention.

It is possible to generate further counter-examples, but these are quite enough to demonstrate that the Ch. 15 GCD algorithm does not work as specified.

Now let's review the broken algo:

Chapter 15 Algorithm 3: Broken Constant-Time Iterative GCD.

For FZ integers A and B:

1. Twos := 0
2. Iterate Bitness(A) + Bitness(B) times:
3.    Ae   := 1 - (A AND 1)
4.    Be   := 1 - (B AND 1)
5.    A    := A >> Ae
6.    B    := B >> Be
7.    Twos := Twos + (Ae AND Be)
8.    D    := |A - B|, C ← Borrow
9.    B    := {C=0 → B, C=1 → A}
10.    A    := D
11. A := A << Twos
12. A contains the GCD.
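Transcribed directly into Python over unbounded integers (a model only; `broken_gcd` is a name invented here, and Python naturally gives none of FFA's constant-time properties), the algorithm visibly misfires on the counter-examples:

```python
from math import gcd

def broken_gcd(a, b, w):
    twos = 0
    for _ in range(2 * w):          # Bitness(A) + Bitness(B) iterations
        ae = 1 - (a & 1)            # Ae := 1 - (A AND 1)
        be = 1 - (b & 1)            # Be := 1 - (B AND 1)
        a >>= ae                    # A := A >> Ae
        b >>= be                    # B := B >> Be
        twos += ae & be             # Twos := Twos + (Ae AND Be)
        d = abs(a - b)              # D := |A - B|, C <- Borrow
        b = a if a < b else b       # B := {C=0 -> B, C=1 -> A}
        a = d                       # A := D
    return a << twos                # A := A << Twos

# The 8-bit didactic example worked later in this chapter: yields 2, not 1:
assert broken_gcd(0b11111011, 0b11011011, 8) == 0b10

# The two 256-bit counter-examples (both pairs are in fact co-prime):
a1, b1 = (1 << 256) - 0x45, (1 << 256) - 0x05   # Counter-Example No. 1
a2, b2 = (1 << 256) - 0x15, (1 << 256) - 0x05   # Counter-Example No. 2
assert gcd(a1, b1) == 1 and gcd(a2, b2) == 1
assert broken_gcd(a1, b1, 256) == 4             # the reported wrong answer
assert broken_gcd(a2, b2, 256) == 0             # the impossible output
```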

To those who have read Ch. 21A, the defect may be already apparent.

Specifically, when we introduce the Iterate Bitness(A) + Bitness(B)... condition in place of the original non-constant-time algo's Iterate until B = 0..., steps 8...10 become erroneous, in two respects.

First: in a run which requires exactly Bitness(A) + Bitness(B) - 1 iterations to reach the point where A = B, A will end up equal to zero, while the GCD ends up permanently trapped in B.

The root of the problem is that D := |A - B| becomes the new value of A even when D = 0 (i.e. A and B were equal.) For virtually the whole input space of the algo, this state is temporary: the final GCD will move into A in the very next iteration, and stays there. But if there is no remaining iteration, we end up with the invalid output 0. This is illustrated by Ch. 15 GCD Counter-Example No. 2.

Second: the logic of steps 8...10 permits a condition where one or more rounds of the iteration execute without reducing the bitness of either A or B. Enough of these, and the algorithm in fact will not converge. This is illustrated by Ch. 15 GCD Counter-Example No. 1.

This time, the problem is that the subtractive step is performed without demanding (as we did starting with Algorithm 2 of Ch. 21A) that both A and B be odd. This can result in an iteration where we in fact get no closer to convergence than we were at the preceding one.

To understand exactly how the Ch.15 algo is subtly broken, let's pretend that we had an 8-bit FZ (strictly for didactic purposes; the minimal FZ bitness in FFA is 256); and then construct and illustrate a simple example, where two 8-bit quantities -- which, theoretically, would be expected to converge to their correct GCD (1) in 16 iterations -- fail to do so.

For clarity, this and all subsequent worked examples will show binary representations of A and B.

| i | A_i | B_i | A ← A/2 | B ← B/2 | Twos | A ← abs(A − B) | B ← A | A_i+1 | B_i+1 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 11111011 | 11011011 | | | 0 | 100000 | | 100000 | 11011011 |
| 2 | 100000 | 11011011 | 10000 | | 0 | 11001011 | 10000 | 11001011 | 10000 |
| 3 | 11001011 | 10000 | | 1000 | 0 | 11000011 | | 11000011 | 1000 |
| 4 | 11000011 | 1000 | | 100 | 0 | 10111111 | | 10111111 | 100 |
| 5 | 10111111 | 100 | | 10 | 0 | 10111101 | | 10111101 | 10 |
| 6 | 10111101 | 10 | | 1 | 0 | 10111100 | | 10111100 | 1 |
| 7 | 10111100 | 1 | 1011110 | | 0 | 1011101 | | 1011101 | 1 |
| 8 | 1011101 | 1 | | | 0 | 1011100 | | 1011100 | 1 |
| 9 | 1011100 | 1 | 101110 | | 0 | 101101 | | 101101 | 1 |
| 10 | 101101 | 1 | | | 0 | 101100 | | 101100 | 1 |
| 11 | 101100 | 1 | 10110 | | 0 | 10101 | | 10101 | 1 |
| 12 | 10101 | 1 | | | 0 | 10100 | | 10100 | 1 |
| 13 | 10100 | 1 | 1010 | | 0 | 1001 | | 1001 | 1 |
| 14 | 1001 | 1 | | | 0 | 1000 | | 1000 | 1 |
| 15 | 1000 | 1 | 100 | | 0 | 11 | | 11 | 1 |
| 16 | 11 | 1 | | | 0 | 10 | | 10 | 1 |
| 17 | 10 | 1 | 1 | | 0 | 0 | | 0 | 1 |
| 18 | 0 | 1 | | | 0 | 1 | 0 | 1 | 0 |

GCD (Incorrect): 10

(Iterations 17 and 18 are shown for illustration only: the algorithm performs exactly Bitness(A) + Bitness(B) = 16 iterations, and so terminates with A = 10, B = 1, yielding the incorrect GCD 10.)

Iterations 17 and 18, which the fixed-count loop never executes, would have been necessary for successful convergence; meanwhile, iterations 8, 10, 12, 14 and 16 fail to reduce the bitness of either A or B.

The mine is in the fact that, if, at step 8 of the algo:

8.    D    := |A - B|, C ← Borrow
9.    B    := {C=0 → B, C=1 → A}
10.   A    := D

... it so happens that A is even and B is odd, then D (and consequently the value of A in the next iteration) will be odd. And, if A ≥ B, then B will remain odd in the next iteration; and we will get an iteration cycle where the total bitness of A and B will not decrease, unless A and B happen to be close enough in value for the subtraction in step 8 to decrease it.

Thus we have seen how to construct a condition where carrying out an iteration of Ch. 15's GCD does not reduce the bitness of either A or B.

The interesting bit is that for virtually all possible inputs to GCD, this flaw does not lead to an ultimately incorrect output, because -- given sufficient iterations -- the correct answer is obtained. But for a very small number of possible input pairs, convergence will not be reached inside 2 × FZ_Bitness iterations.

It appears to be virtually impossible to arrive at the fatal condition by chance.

This kind of thing could be an ideal cryptological boobytrap, if GCD were in fact a key element in any known cryptosystem.

But AFAIK, there is no such cryptosystem. On top of this, from a cryptological point of view, the broken GCD "fails safe", i.e. it can be coaxed into reporting two co-prime inputs as being non-coprime, but not the opposite.

And, fortunately (or unfortunately, from the POV of quickly turning up possible defects) FFA does not currently use conventional-GCD inside any other internal algorithm. But let's consider where we have thus far made use of GCD.

Apart from the original Ch.15 tests, the only two places in the series where we have used GCD were the primorial generator demo depicted in Ch. 18C and the prime generator demo which used it.

The primorial generator was unaffected -- it apparently produces correct primorials for all valid FZ Widths from 256 to -- at least -- 16384.

The prime generator was also unaffected. In principle, a defective GCD would result, there, in a... slightly slower prime generator, which would attempt a larger number of doomed Miller-Rabin tests; GCD was used there strictly as a speedup pre-filter for the intake candidates. But there was no detectable decrease in the number of M-R-failed runs after I put the corrected GCD into service.

The Ch. 21A material, interestingly, is also unaffected: the initially non-constant-time algo from the middle of Ch. 15 is given there as a starting point. And that algo (and all of the subsequent extensions offered) was... correct. Only the constant-time version which was used to actually write the GCD routine in Ch. 15, was not...

The bullet, it could be said, went through the hat, but not the head. Nevertheless, it is a serious defect, and will be corrected in this Chapter. And this time, the full proof of convergence will be given.

## 2. The Cure.

Let's rewrite the GCD algo, so that a reduction of the bitness of A, B, or both, is in fact guaranteed at every iteration.

First, we'll write a readability-optimized schematic version (expressed with branching logic) of the correct algorithm :

Chapter 21A-Bis Algorithm 1: Schematic Version of Corrected Constant-Time Iterative GCD :

For FZ integers A and B:

1. Twos := 0
2. Iterate Bitness(A) + Bitness(B) - 1 times:
3.    D := |A - B|
4.    If Odd(A) and Odd(B) :
5.       If A < B :
6.          B := A
7.       A := D
8.    If Even(A) and Even(B) :
9.       Twos := Twos + 1
10.    If Even(A):
11.       A := A / 2
12.    If Even(B):
13.       B := B / 2
14. A := Bitwise-OR(A, B)
15. A := A × 2^Twos
16. A contains the GCD.
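A direct Python transcription (again a model over unbounded integers; the name `gcd_ch21a_bis` is invented here) shows the corrected schematic algorithm giving the right answer on both counter-examples and agreeing with a reference GCD:

```python
from math import gcd
from random import getrandbits

def gcd_ch21a_bis(a, b, w):
    twos = 0                                 # 1. Twos := 0
    for _ in range(2 * w - 1):               # 2. Bitness(A)+Bitness(B)-1 times
        d = abs(a - b)                       # 3. D := |A - B|
        if (a & 1) == 1 and (b & 1) == 1:    # 4. If Odd(A) and Odd(B):
            if a < b:                        # 5.   If A < B:
                b = a                        # 6.     B := A
            a = d                            # 7.   A := D
        if (a & 1) == 0 and (b & 1) == 0:    # 8. If Even(A) and Even(B):
            twos += 1                        # 9.   Twos := Twos + 1
        if (a & 1) == 0:                     # 10.-11. If Even(A): A := A / 2
            a >>= 1
        if (b & 1) == 0:                     # 12.-13. If Even(B): B := B / 2
            b >>= 1
    return (a | b) << twos                   # 14.-15. A := (A OR B) * 2^Twos

# Both Ch. 15 counter-examples now come out to the correct answer, 1 :
assert gcd_ch21a_bis((1 << 256) - 0x45, (1 << 256) - 0x05, 256) == 1
assert gcd_ch21a_bis((1 << 256) - 0x15, (1 << 256) - 0x05, 256) == 1

# ... and the result agrees with a reference GCD on random inputs:
for _ in range(200):
    x, y = getrandbits(64), getrandbits(64)
    assert gcd_ch21a_bis(x, y, 64) == gcd(x, y)
```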

Second, and equivalently, a properly constant-time formulation of the above :

Chapter 21A-Bis Algorithm 2: Corrected Constant-Time Iterative GCD :

For FZ integers A and B:

1. Twos := 0
2. Iterate Bitness(A) + Bitness(B) - 1 times:
3.    OO   := (A AND 1) AND (B AND 1)
4.    D    := |A - B|, C ← Borrow
5.    B    := {(OO AND C)=0 → B, (OO AND C)=1 → A}
6.    A    := {OO=0 → A, OO=1 → D}
7.    Ae   := 1 - (A AND 1)
8.    Be   := 1 - (B AND 1)
9.    A    := A >> Ae
10.    B    := B >> Be
11.    Twos := Twos + (Ae AND Be)
12. A := A OR B
13. A := A << Twos
14. A contains the GCD.
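The constant-time formulation can likewise be modeled in Python (the muxes are written as conditional expressions here, so this is only a model -- in FFA each is a branch-free masked selection -- but the data flow matches Algorithm 2 step for step):

```python
from math import gcd
from random import getrandbits

def gcd_ct(a, b, w):
    twos = 0                            # 1. Twos := 0
    for _ in range(2 * w - 1):          # 2. Bitness(A)+Bitness(B)-1 times
        oo = (a & 1) & (b & 1)          # 3. OO := (A AND 1) AND (B AND 1)
        d = abs(a - b)                  # 4. D := |A - B|, C <- Borrow
        c = 1 if a < b else 0
        b = a if (oo & c) == 1 else b   # 5. B := {(OO AND C)=1 -> A}
        a = d if oo == 1 else a         # 6. A := {OO=1 -> D}
        ae = 1 - (a & 1)                # 7. Ae := 1 - (A AND 1)
        be = 1 - (b & 1)                # 8. Be := 1 - (B AND 1)
        a >>= ae                        # 9. A := A >> Ae
        b >>= be                        # 10. B := B >> Be
        twos += ae & be                 # 11. Twos := Twos + (Ae AND Be)
    return (a | b) << twos              # 12.-13. A := (A OR B) << Twos

# Agreement with the reference GCD, including Ch. 15's Counter-Example No. 1:
assert gcd_ct((1 << 256) - 0x45, (1 << 256) - 0x05, 256) == 1
for _ in range(200):
    x, y = getrandbits(64), getrandbits(64)
    assert gcd_ct(x, y, 64) == gcd(x, y)
```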

Now, let's show, step by step, that Algorithm 1 and Algorithm 2 are arithmetically-equivalent.

| Algorithm 1 Operation | Description | Algorithm 2 Operation |
|---|---|---|
| Twos := 0 | Initialize the common-power-of-two counter to 0. | Twos := 0 |
| Iterate Bitness(A) + Bitness(B) - 1 times: | Begin a loop with exactly Bitness(A) + Bitness(B) - 1 iterations. In FFA, this is definitionally equivalent to 2×FZ_Bitness(N) - 1, where N is any of the inputs A, B or the output GCD. | Iterate Bitness(A) + Bitness(B) - 1 times: |
| D := abs(A - B) | Compute the absolute value of the difference A - B. In the constant-time Algorithm 2, we save the borrow bit to C, which will trigger subsequent operations predicated on the condition A < B. | D := abs(A - B), C ← Borrow |
| If Odd(A) and Odd(B) : | Determine whether both A and B are presently odd. | OO := (A AND 1) AND (B AND 1) |
| If A < B : B := A | Assign A to B if A and B were both odd, and A < B. | B := {(OO AND C)=0 → B, (OO AND C)=1 → A} |
| A := D | Assign D to A if A and B were both odd. | A := {OO=0 → A, OO=1 → D} |
| If Even(A) and Even(B) : Twos := Twos + 1 | If both A and B are presently even, increment the common power-of-two counter. | Ae := 1 - (A AND 1); Be := 1 - (B AND 1); Twos := Twos + (Ae AND Be) |
| If Even(A): A := A / 2 | If A was found to be even, divide it by two. In Algorithm 2, this is accomplished via a Right Shift by 1. | A := A >> Ae |
| If Even(B): B := B / 2 | If B was found to be even, divide it by two. In Algorithm 2, this is accomplished via a Right Shift by 1. | B := B >> Be |
| A := Bitwise-OR(A, B) | Normally, B will contain the intermediate result after convergence. However, when calculating GCD(N, 0), where N > 0, it will be found in A; the other variable will always equal 0. Hence, we obtain the final result by performing a bitwise-OR of A and B, assigned to A. | A := A OR B |
| A := A × 2^Twos | Reintroduce the common-power-of-two factor into the intermediate GCD result. If there was none (i.e. one or both inputs were odd to begin with), this is 2^0 = 1, and has no effect. In Algorithm 2, the multiplication is accomplished via a constant-time Left Shift. | A := A << Twos |
| A contains the GCD. | A now contains the actual GCD of the two input integers. | A contains the GCD. |

If you are entirely satisfied that Algorithm 1 and Algorithm 2 are equivalent in their effects, proceed to the proof below. For clarity, we will base it on Algorithm 1.

First, let's demonstrate that each iteration of the loop preserves the GCD of A, B :

| Algorithm 1 Operation | Identity |
|---|---|
| D := abs(A - B); If Odd(A) and Odd(B) : If A < B : B := A; A := D | A restatement of Euclid's original: GCD(A, B) = GCD(abs(A - B), Min(A, B)). Observe that abs(A - B), the new value of A, is necessarily even, while Min(A, B), the new value of B, remains odd. |
| If Even(A) and Even(B) : Twos := Twos + 1 | A common factor of 2 will be removed (in steps 11 and 13) from A and B, respectively, and the Twos counter is incremented, so we can multiply 2^Twos back in at step 15: GCD(2 × K, 2 × L) = 2 × GCD(K, L) |
| If Even(A): A := A / 2 | A factor of 2 is removed from A: GCD(2 × K, B) = GCD(K, B) |
| If Even(B): B := B / 2 | A factor of 2 is removed from B: GCD(A, 2 × L) = GCD(A, L) |

Now, let's confirm that Algorithm 1 is in fact guaranteed to converge within at most Bitness(A) + Bitness(B) - 1 iterations. Let's exhaustively describe the effects of an iteration across the four possible combinations of parities of A and B. (Steps 8 and 9 do not affect the bitness of A or B and will be omitted.)

A is Odd, B is Odd :

| Algorithm 1 Operation | Effect on Bitness of A | Effect on Bitness of B |
|---|---|---|
| D := abs(A - B); If Odd(A) and Odd(B) : If A < B : B := A; A := D | No guaranteed effect (but may decrease by 1 or more bits). However, A is now necessarily even. | No guaranteed effect (but may decrease by 1 or more bits). B remains odd. |
| If Even(A): A := A / 2 | As A is guaranteed to have become even in step 7: decreases by 1 bit. | None |
| If Even(B): B := B / 2 | None | None |
| Net Effect on Bitnesses: | Decreased by at least 1 bit. May become zero if A and B had been equal. | None; may decrease by 1 or more bits. |

Note that in the convergence case where A=1 B=1, the above parity combination will yield A=0 B=1.

A is Odd, B is Even :

| Algorithm 1 Operation | Effect on Bitness of A | Effect on Bitness of B |
|---|---|---|
| D := abs(A - B); If Odd(A) and Odd(B) : If A < B : B := A; A := D | None | None |
| If Even(A): A := A / 2 | None | None |
| If Even(B): B := B / 2 | None | Decreases by 1 bit, unless B = 0. |
| Net Effect on Bitnesses: | None | Decreased by 1 bit, unless B = 0. |

A is Even, B is Odd :

| Algorithm 1 Operation | Effect on Bitness of A | Effect on Bitness of B |
|---|---|---|
| D := abs(A - B); If Odd(A) and Odd(B) : If A < B : B := A; A := D | None | None |
| If Even(A): A := A / 2 | Decreases by 1 bit, unless A = 0. | None |
| If Even(B): B := B / 2 | None | None |
| Net Effect on Bitnesses: | Decreased by 1 bit, unless A = 0. | None |

A is Even, B is Even :

| Algorithm 1 Operation | Effect on Bitness of A | Effect on Bitness of B |
|---|---|---|
| D := abs(A - B); If Odd(A) and Odd(B) : If A < B : B := A; A := D | None | None |
| If Even(A): A := A / 2 | Decreases by 1 bit, unless A = 0. | None |
| If Even(B): B := B / 2 | None | Decreases by 1 bit, unless B = 0. |
| Net Effect on Bitnesses: | Decreased by 1 bit, unless A = 0. | Decreased by 1 bit, unless B = 0. |

We have shown that each iteration of Algorithm 1 (and, as we have demonstrated their equivalence -- Algorithm 2) is guaranteed to reduce the bitness of A, B, or of both A and B, by at least 1 bit -- supposing we have not yet reached convergence (i.e. when nothing can be reduced because one of A or B is equal to zero, while the other is odd.)

Therefore, to compute the GCD of A, B where each is of bitness W, we in fact need at most (2 × W) - 1 iterations (i.e. to arrive at a GCD of 1, which is of bitness 1.)

Now, let's illustrate two concrete symmetric cases where the maximum permitted number of iterations is in fact required. Each of these pairs of 8-bit inputs demands 15 (i.e. ( 2 × 8 ) - 1) shots for convergence.

GCD(0x80, 0xff) :

| i | A_i | B_i | D = abs(A − B) | A ← A/2 | B ← B/2 | Twos | A_i+1 | B_i+1 |
|---|---|---|---|---|---|---|---|---|
| 1 | 10000000 | 11111111 | | 1000000 | | 0 | 1000000 | 11111111 |
| 2 | 1000000 | 11111111 | | 100000 | | 0 | 100000 | 11111111 |
| 3 | 100000 | 11111111 | | 10000 | | 0 | 10000 | 11111111 |
| 4 | 10000 | 11111111 | | 1000 | | 0 | 1000 | 11111111 |
| 5 | 1000 | 11111111 | | 100 | | 0 | 100 | 11111111 |
| 6 | 100 | 11111111 | | 10 | | 0 | 10 | 11111111 |
| 7 | 10 | 11111111 | | 1 | | 0 | 1 | 11111111 |
| 8 | 1 | 11111111 | 11111110 | 1111111 | | 0 | 1111111 | 1 |
| 9 | 1111111 | 1 | 1111110 | 111111 | | 0 | 111111 | 1 |
| 10 | 111111 | 1 | 111110 | 11111 | | 0 | 11111 | 1 |
| 11 | 11111 | 1 | 11110 | 1111 | | 0 | 1111 | 1 |
| 12 | 1111 | 1 | 1110 | 111 | | 0 | 111 | 1 |
| 13 | 111 | 1 | 110 | 11 | | 0 | 11 | 1 |
| 14 | 11 | 1 | 10 | 1 | | 0 | 1 | 1 |
| 15 | 1 | 1 | 0 | 0 | | 0 | 0 | 1 |

GCD: 1

(At iteration 8, both operands are odd and A < B, so B takes the old value of A, i.e. 1, while A takes D.)

GCD(0xff, 0x80) :

| i | A_i | B_i | D = abs(A − B) | A ← A/2 | B ← B/2 | Twos | A_i+1 | B_i+1 |
|---|---|---|---|---|---|---|---|---|
| 1 | 11111111 | 10000000 | | | 1000000 | 0 | 11111111 | 1000000 |
| 2 | 11111111 | 1000000 | | | 100000 | 0 | 11111111 | 100000 |
| 3 | 11111111 | 100000 | | | 10000 | 0 | 11111111 | 10000 |
| 4 | 11111111 | 10000 | | | 1000 | 0 | 11111111 | 1000 |
| 5 | 11111111 | 1000 | | | 100 | 0 | 11111111 | 100 |
| 6 | 11111111 | 100 | | | 10 | 0 | 11111111 | 10 |
| 7 | 11111111 | 10 | | | 1 | 0 | 11111111 | 1 |
| 8 | 11111111 | 1 | 11111110 | 1111111 | | 0 | 1111111 | 1 |
| 9 | 1111111 | 1 | 1111110 | 111111 | | 0 | 111111 | 1 |
| 10 | 111111 | 1 | 111110 | 11111 | | 0 | 11111 | 1 |
| 11 | 11111 | 1 | 11110 | 1111 | | 0 | 1111 | 1 |
| 12 | 1111 | 1 | 1110 | 111 | | 0 | 111 | 1 |
| 13 | 111 | 1 | 110 | 11 | | 0 | 11 | 1 |
| 14 | 11 | 1 | 10 | 1 | | 0 | 1 | 1 |
| 15 | 1 | 1 | 0 | 0 | | 0 | 0 | 1 |

GCD: 1

(At iteration 8, both operands are odd but A > B, so B keeps its value; A takes D.)

In both of these examples, Bitness(A)+Bitness(B) is reduced by exactly one in each iteration. And we have already demonstrated that it is impossible to construct a case -- aside from the convergence case -- where an iteration will reduce this quantity by 0. So in fact these are instances of the "worst case" number of required iterations. (Recall that we are writing a constant-time algo, and the "worst case" number of iterations will always execute.)

Now, let's illustrate what happens in the "degenerate" cases. Starting with GCD(0,0) :

GCD(0, 0) :

| i | Ai | Bi | A ← \|A - B\| | B ← A | A ← A/2 | B ← B/2 | Twos | Ai+1 | Bi+1 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | | | 0 | 0 | 1 | 0 | 0 |
| 2 | 0 | 0 | | | 0 | 0 | 2 | 0 | 0 |
| 3 | 0 | 0 | | | 0 | 0 | 3 | 0 | 0 |
| ... | 0 | 0 | | | 0 | 0 | ... | 0 | 0 |
| iLast | 0 | 0 | | | 0 | 0 | i | 0 | 0 |

GCD = 0

For illustrative purposes, three iterations are shown. But the end result, without regard to FZ Width, will always be the same: 0. The Twos counter increases with each additional iteration, as both A and B remain even, but this has no effect on the final output.

Now, let's examine the case GCD(0, N) where N > 0 :

GCD(0, 3) :

| i | Ai | Bi | A ← \|A - B\| | B ← A | A ← A/2 | B ← B/2 | Twos | Ai+1 | Bi+1 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 11 | | | 0 | | 0 | 0 | 11 |
| 2 | 0 | 11 | | | 0 | | 0 | 0 | 11 |
| 3 | 0 | 11 | | | 0 | | 0 | 0 | 11 |
| ... | 0 | 11 | | | 0 | | 0 | 0 | 11 |
| iLast | 0 | 11 | | | 0 | | 0 | 0 | 11 |

GCD = 11

It should be clear at this point that once A has become equal to 0, and B is odd, further iterations have no effect.

Lastly, let's illustrate the "interesting", from the POV of the new algo, degenerate case: GCD(N, 0) where N > 0 :

GCD(3, 0) :

| i | Ai | Bi | A ← \|A - B\| | B ← A | A ← A/2 | B ← B/2 | Twos | Ai+1 | Bi+1 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 11 | 0 | | | | 0 | 0 | 11 | 0 |
| 2 | 11 | 0 | | | | 0 | 0 | 11 | 0 |
| 3 | 11 | 0 | | | | 0 | 0 | 11 | 0 |
| ... | 11 | 0 | | | | 0 | 0 | 11 | 0 |
| iLast | 11 | 0 | | | | 0 | 0 | 11 | 0 |

GCD = 11

In this and only this case, the intermediate result will wind up in A rather than in B, since the subtractive step demands an odd A and an odd B, and this never happens in the (N, 0) case.

This is why we need step 14: A := Bitwise-OR(A, B).

Let's conclude with a second arbitrary example of the above :

GCD(0x80, 0x0) :

| i | Ai | Bi | A ← \|A - B\| | B ← A | A ← A/2 | B ← B/2 | Twos | Ai+1 | Bi+1 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 10000000 | 0 | | | 1000000 | 0 | 1 | 1000000 | 0 |
| 2 | 1000000 | 0 | | | 100000 | 0 | 2 | 100000 | 0 |
| 3 | 100000 | 0 | | | 10000 | 0 | 3 | 10000 | 0 |
| 4 | 10000 | 0 | | | 1000 | 0 | 4 | 1000 | 0 |
| 5 | 1000 | 0 | | | 100 | 0 | 5 | 100 | 0 |
| 6 | 100 | 0 | | | 10 | 0 | 6 | 10 | 0 |
| 7 | 10 | 0 | | | 1 | 0 | 7 | 1 | 0 |
| 8 | 1 | 0 | | | | 0 | 7 | 1 | 0 |
| ... | 1 | 0 | | | | 0 | 7 | 1 | 0 |
| iLast | 1 | 0 | | | | 0 | 7 | 1 | 0 |

GCD = 10000000

At this point, there is not much else left to say about the algorithm per se.
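Before turning to the Ada, the whole procedure, degenerate cases included, can be sketched in a few lines of Python (the function name is mine; and unlike the Ada, this sketch runs in variable time, the fixed shot count notwithstanding):

```python
def fz_gcd(x, y, w):
    """Binary GCD per the scheme above: always exactly (2 * w) - 1 shots,
    where w is the bitness of the inputs. Variable-time sketch only."""
    a, b, twos = x, y, 0
    for _ in range(2 * w - 1):
        if (a & 1) and (b & 1):    # A, B both odd: subtractive step
            d = abs(a - b)
            if a < b:
                b = a              # B := A
            a = d                  # A := |A - B|
        ae = (a & 1) == 0
        be = (b & 1) == 0
        if ae:
            a >>= 1                # halve A if it is now even
        if be:
            b >>= 1                # halve B if it is now even
        if ae and be:
            twos += 1              # both were even: record a common factor of 2
    # Step 14: the result sits in B, except in the (N, 0) case, where it
    # sits in A; the other variable is always 0, so OR them, restore Twos:
    return (a | b) << twos

# The two "worst case" inputs converge, as tabulated, within the 15 shots:
assert fz_gcd(0x80, 0xFF, 8) == 1
assert fz_gcd(0xFF, 0x80, 8) == 1
# ... and the degenerate cases behave as illustrated:
assert fz_gcd(0, 0, 8) == 0
assert fz_gcd(0, 3, 8) == 3
assert fz_gcd(3, 0, 8) == 3
assert fz_gcd(0x80, 0, 8) == 0x80
```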

And now, let's see the implementation of the new GCD:

```package body FZ_GCD is

   -- Find Greatest Common Divisor (GCD) of X and Y.
   -- Note that by convention, GCD(0, 0) = 0.
   procedure FZ_Greatest_Common_Divisor(X      : in  FZ;
                                        Y      : in  FZ;
                                        Result : out FZ) is

      -- Widths of X, Y, and Result are equal
      subtype Width is Word_Index range X'Range;

      -- Working buffers for GCD computation, initially equal to the inputs
      A      : FZ(Width) := X;
      B      : FZ(Width) := Y;

      -- Evenness (negation of lowest bit) of A and B respectively
      Ae, Be : WBool;

      -- Common power-of-2 factor: incremented when Ae and Be are both 1
      Twos   : Word := 0;

      -- This flag is set when A and B are BOTH ODD
      OO     : WBool;

      -- |A - B|
      D      : FZ(Width);

      -- This flag is set iff A < B
      A_lt_B : WBool;

   begin

      -- To converge, requires number of shots equal to (2 * FZ_Bitness) - 1:
      for i in 1 .. (2 * FZ_Bitness(X)) - 1 loop

         -- Whether A and B are currently BOTH ODD :
         OO := FZ_OddP(A) and FZ_OddP(B);

         -- D := |A - B|
         FZ_Sub_Abs(X => A, Y => B, Difference => D, Underflow => A_lt_B);

         -- IFF A,B both ODD, and A < B : B' := A ; otherwise no change :
         FZ_Mux(X => B, Y => A, Result => B, Sel => OO and A_lt_B);

         -- IFF A,B both ODD: A' := |A - B| ; otherwise no change :
         FZ_Mux(X => A, Y => D, Result => A, Sel => OO);

         -- If A is now EVEN: A := A >> 1; otherwise no change
         Ae := 1 - FZ_OddP(A);
         FZ_ShiftRight(A, A, WBit_Index(Ae));

         -- If B is now EVEN: B := B >> 1; otherwise no change
         Be := 1 - FZ_OddP(B);
         FZ_ShiftRight(B, B, WBit_Index(Be));

         -- If both A and B were even, increment the common power-of-two
         Twos := Twos + (Ae and Be);

      end loop;

      -- Normally, B will contain the GCD, but in the (N,0) N > 0 case -- A.
      -- The other variable will always equal 0. Hence, take Bitwise-OR(A,B):
      FZ_Or(X => A, Y => B, Result => A);

      -- Reintroduce the common power-of-2 factor stored in 'Twos'
      FZ_Quiet_ShiftLeft(N => A, ShiftedN => A, Count => Indices(Twos));

      -- Output final result -- the GCD.
      Result := A;

   end FZ_Greatest_Common_Divisor;

end FZ_GCD;```

It is very slightly more expensive than the Ch.15 routine: by one MUX (in the loop) and one FZ_Or (outside of the loop.) But this is not a tragedy. Overall it comes out to cost just about the same, after taking into account the reduced (by one) number of iterations.

Unsurprisingly, this corrected variant of GCD passes the Ch.15 test battery; properly swallows both of the Ch.15 counter-examples given earlier; and, of course, the...

Ch. 21A-Bis GCD "Worst Case" Tape for 256-bit FZ :

```  .8000000000000000000000000000000000000000000000000000000000000000
.FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
G
#```

... correctly produces the output 1, converging -- as predicted -- in the 511th and final iteration.

To generate this test for any FZ Width, all you need is:

Ch. 21A-Bis GCD "Worst Case" Tape for arbitrary FZ Width :

```  .1 .0~W.1- LS
.0~
G
#```

~ The next Chapter, 21B, will continue the Extended-GCD sequence of Chapter 21A. ~

# "Finite Field Arithmetic." Chapter 21A: Extended GCD and Modular Multiplicative Inverse. (Part 1 of 3)

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical "Open Sores" abomination, in that -- rather than trusting the author blindly with their lives -- prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

• All of the materials from the Chapters 1 through 15.
• There is no vpatch in Chapter 21A.
• For best results, please read this chapter on a colour display!

ACHTUNG: The subject originally promised for Chapter 21 at the conclusion of Chapter 20 is postponed. That topic will be revisited later!

On account of the substantial heft of Chapter 21, I have cut it into three parts: 21A, 21B, and 21C. You are presently reading 21A, which consists strictly of an introduction to FFA's variant of the Extended GCD algorithm and its proof of correctness.

Recall that in Chapter 18C, we demonstrated the use of FFA and Peh to generate cryptographic primes, of the kind which are used in e.g. the RSA public key cryptosystem.

We also performed GPG-RSA signature verifications using Peh, and learned how to adapt traditional GPG public keys for use with it.

But why have we not yet seen how to generate a new RSA keypair ?

The still-missing mathematical ingredient is an operation known as the modular multiplicative inverse.

This is a process where, provided an integer N and a non-zero modulus M, we find (or fail to find -- its existence depends on the co-primality of the inputs) the integer X where:

NX ≡ 1 (mod M).

... or, in FFAistic terminology, where FFA_FZ_Modular_Multiply(N, X, M) would equal 1.

In RSA, this operation is used when deriving a public key corresponding to a given private key. (It is also used when ensuring that a selected pair of secret primes has certain properties.) The exact details of this process will be the subject of Chapter 22.

Typically, the modular multiplicative inverse is obtained via the Extended GCD algorithm. It differs from the ordinary GCD of Chapter 15 in that, when given inputs X, Y, it returns, in addition to their greatest common divisor, the Bezout coefficients: a pair of integers P, Q which satisfy the equation:

PX + QY = GCD(X,Y)

The modular multiplicative inverse of N modulo M exists strictly when GCD(N, M) = 1.

Therefore, in that case,

NX + MY = GCD(N,M) = 1

... which can be written as:

NX - 1 = (-Y)M

i.e.:

NX ≡ 1 (mod M).

The reader has probably already guessed that the typical heathen formulation of Extended GCD, i.e. one which uses a series of divisions with remainder, is unsuitable as a starting point for writing FFA's Extended GCD -- for the same reason as why the analogous Euclidean GCD was unsuitable (and rejected) for writing the ordinary GCD routine of Chapter 15. Division is the most expensive arithmetical operation, and it would have to be repeated a large number of times, while discarding many of the outputs (a la the old-style non-Barrettian modular exponentiation of Chapter 6.)

Fortunately, it is possible to write a much more efficient constant-spacetime algorithm for Extended GCD, rather closely similar to the one we used for ordinary GCD.

Our algorithm of choice for a properly FFAtronic Extended GCD will be very similar to the Binary Extended GCD given in Menezes, van Oorschot, and Vanstone's Handbook of Applied Cryptography.

However, in addition to constant-spacetime operation, our algorithm will differ from the traditional one by avoiding any internal use of signed arithmetic, and will produce outputs having the form :

PX - QY = GCD(X,Y)

Because the Extended GCD algorithm chosen for FFA is not wholly identical to any previously-published one, on account of having to satisfy the familiar requirements, I decided that a full proof of correctness for it ought to be given prior to publishing Chapter 21's Vpatch.

Let's proceed with deriving this algorithm and its correctness proof.

As a starting point, let's take the traditional non-constant-time Stein's GCD given as Algorithm 2 at the start of Chapter 15:

Iterative Quasi-Stein GCD from Chapter 15.

For integers A ≥ 0 and B ≥ 0:

1. Twos := 0
2. Iterate until B = 0:
3.    Ae, Be := IsEven(A), IsEven(B)
4.    A      := A >> Ae
5.    B      := B >> Be
6.    Twos   := Twos + (Ae AND Be)
7.    Bnext   := Min(A, B)
8.    Anext   := |A - B|
9.    A, B   := Anext, Bnext
10. A := A × 2^Twos
11. A contains the GCD.

Now, suppose that Twos -- the common power-of-two factor of A and B -- were known at the start of the algorithm, and that it were to be divided out of both numbers prior to entering the loop. The rewritten algorithm could look like this:

Algorithm 1: Quasi-Stein GCD with "Twos" Removal.

For integers X ≥ 0 and Y ≥ 0:

1. Twos := Find_Common_Power_Of_Two_Factor(X, Y)
2. X    := X >> Twos
3. Y    := Y >> Twos
4. A    := X
5. B    := Y
6. Iterate until B = 0:
7.    A      := A >> IsEven(A)
8.    B      := B >> IsEven(B)
9.    Bnext   := Min(A, B)
10.    Anext   := |A - B|
11.    A, B   := Anext, Bnext
12. A := A << Twos
13. A contains GCD(X,Y).

Suppose that we have a correct definition of Find_Common_Power_Of_Two_Factor. (For now, let's merely suppose.)

Algorithm 1 will do precisely the same job as Iterative Quasi-Stein GCD. However, it has one new property -- presently useless -- of which we will later take advantage: because 2^Twos (which could be 1, i.e. 2^0, if either input were odd to begin with) was divided out of X and Y in steps 2 and 3 -- upon entering the loop at step 7, at least one of X, Y can now be relied upon to be odd. (If X and Y were both equal to zero at the start, step 7 is never reached, and the algorithm terminates immediately.)
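For concreteness, here is a Python sketch of what Find_Common_Power_Of_Two_Factor must compute (the treatment of the X = Y = 0 pair as having zero common twos is my own convention here, purely to keep the loop finite):

```python
def find_common_power_of_two_factor(x, y):
    """Largest Twos such that 2^Twos divides both x and y. For the
    x = y = 0 pair, every power qualifies; return 0 to keep it finite."""
    twos = 0
    while (x | y) != 0 and ((x | y) & 1) == 0:
        x >>= 1
        y >>= 1
        twos += 1
    return twos

assert find_common_power_of_two_factor(12, 18) == 1   # 12 = 4*3, 18 = 2*9
assert find_common_power_of_two_factor(8, 0) == 3     # 2^3 divides 0, too
assert find_common_power_of_two_factor(7, 4) == 0     # 7 is odd
```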

Now we want to extend Algorithm 1 to carry out Extended GCD.

First, let's introduce two pairs of additional variables. Each pair will represent the coefficients in an equation which states A or B in terms of multiples of X and Y: Ax, Ay for A; and Bx, By for B :

 A = AxX - AyY
 B = ByY - BxX

Elementarily, if we were to give these coefficients the following values:

 Ax := 1
 Ay := 0
 Bx := 0
 By := 1

... both equations will hold true, regardless of the particular values of A and B.

Of course, this alone does not give us anything useful.

However, if we assign these base-case values to the coefficients at the start of the algorithm, and correctly adjust them every time we make a change to A or B, the equations will likewise hold true after the completion of the loop, and, in the end, it will necessarily be true that:

 AxX - AyY = GCD(X,Y)


When A or B is divided by two, we will divide each of the coefficients in the corresponding pair (Ax, Ay or Bx, By respectively) likewise by two. When A and B switch roles, the pairs of coefficients will likewise switch roles. When A is subtracted from B, or vice-versa, the respective pairs will be likewise subtracted from one another, keeping the invariant.

In order to keep the invariant at all times, it will be necessary to apply certain transformations. The mechanics of these transformations will account for most of the moving parts of our Extended GCD algorithm.

To make it clear where we will want to put the new variables and the mechanisms for keeping the invariant, let's rewrite Algorithm 1 in traditional branched-logic form:

Algorithm 2. Quasi-Stein with Branched Logic.

1. Twos := Find_Common_Power_Of_Two_Factor(X, Y)
2. X    := X >> Twos
3. Y    := Y >> Twos
4. A    := X
5. B    := Y
6. Iterate until B = 0:
7.    If IsOdd(A) and IsOdd(B) :
8.       If B < A :
9.          A := A - B
10.       Else :
11.          B := B - A
12.    If IsEven(A) :
13.       A := A >> 1
14.    If IsEven(B) :
15.       B := B >> 1
16. A := A << Twos
17. A contains GCD(X,Y).

Now, let's work out what must be done to the coefficients Ax, Ay and Bx, By when we carry out the operations A - B and B - A (steps 9 and 11, respectively) to keep the invariant :

 A - B = (AxX - AyY) - (ByY - BxX) = (Ax + Bx)X - (Ay + By)Y
 B - A = (ByY - BxX) - (AxX - AyY) = (Ay + By)Y - (Ax + Bx)X

If we write these simplified expressions side-by-side with the original invariant equations :

 A = AxX - AyY        A - B = (Ax + Bx)X - (Ay + By)Y
 B = ByY - BxX        B - A = (Ay + By)Y - (Ax + Bx)X

... it becomes quite obvious. In both the A - B and the B - A cases, we only need to compute the sums Ax + Bx and Ay + By ; the only difference will consist of whether they must be assigned, respectively, to Ax, Ay or Bx, By.

Now, suppose we introduce these new parts into Algorithm 2, and end up with :

Algorithm 3. A naive attempt at Extended GCD.

1. Twos := Find_Common_Power_Of_Two_Factor(X, Y)
2. X    := X >> Twos
3. Y    := Y >> Twos
4. A    := X
5. B    := Y
6. Ax   := 1
7. Ay   := 0
8. Bx   := 0
9. By   := 1
10. Iterate until B = 0:
11.    If IsOdd(A) and IsOdd(B) :
12.       If B < A :
13.          A  := A - B
14.          Ax := Ax + Bx
15.          Ay := Ay + By
16.       Else :
17.          B  := B - A
18.          Bx := Bx + Ax
19.          By := By + Ay
20.    If IsEven(A) :
21.       A  := A  >> 1
22.       Ax := Ax >> 1
23.       Ay := Ay >> 1
24.    If IsEven(B) :
25.       B  := B  >> 1
26.       Bx := Bx >> 1
27.       By := By >> 1
28. A := A << Twos
29. A contains GCD(X,Y).
30. AxX - AyY = GCD(X,Y).

Unfortunately, Algorithm 3 will not work; the equation at step 30 will not hold true. And the attentive reader probably knows why.

For the inattentive: the erroneous logic is in steps 22, 23 and 26, 27.

The problem is that we have no guarantee that Ax, Ay or Bx, By are in fact even at steps 22,23 and 26,27 where they are being divided by two. An entire bit of information "walks away into /dev/null" every time one of these coefficients turns out to have been odd. And then, the invariant no longer holds. Instead of correct coefficients Ax, Ay at step 30, we will end up with rubbish.
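The rot is easy to exhibit. A Python transcription of Algorithm 3 (names are mine), fed e.g. X = 12, Y = 18, yields the correct GCD -- A and B are untouched by the bug -- but rubbish coefficients:

```python
def naive_ext_gcd(x, y):
    """Algorithm 3, verbatim: coefficients halved with no regard to parity."""
    twos = 0
    while (x | y) != 0 and ((x | y) & 1) == 0:
        x >>= 1
        y >>= 1
        twos += 1
    a, b = x, y
    ax, ay, bx, by = 1, 0, 0, 1
    while b != 0:
        if (a & 1) and (b & 1):
            if b < a:
                a, ax, ay = a - b, ax + bx, ay + by
            else:
                b, bx, by = b - a, bx + ax, by + ay
        if (a & 1) == 0:
            a >>= 1
            ax >>= 1               # BUG: Ax may well be odd here...
            ay >>= 1               # ... likewise Ay: a bit walks into /dev/null
        if (b & 1) == 0:
            b >>= 1
            bx >>= 1
            by >>= 1
    return a << twos, ax, ay

g, ax, ay = naive_ext_gcd(12, 18)
assert g == 6                      # the GCD itself is still correct...
assert ax * 12 - ay * 18 != g      # ... but the invariant is in ruins
```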

The pill needed here is known (according to D. Knuth) as Penk's Method. (I have not succeeded in discovering who, when, or where, Penk was. Do you know? Tell me! A reader found the works of Penk.)

This method, as it happens, is not complicated. And it looks like this:

Algorithm 4. Extended GCD with Penk's Method.

1. Twos := Find_Common_Power_Of_Two_Factor(X, Y)
2. X    := X >> Twos
3. Y    := Y >> Twos
4. A    := X
5. B    := Y
6. Ax   := 1
7. Ay   := 0
8. Bx   := 0
9. By   := 1
10. Iterate until B = 0:
11.    If IsOdd(A) and IsOdd(B) :
12.       If B < A :
13.          A  := A - B
14.          Ax := Ax + Bx
15.          Ay := Ay + By
16.       Else :
17.          B  := B - A
18.          Bx := Bx + Ax
19.          By := By + Ay
20.    If IsEven(A) :
21.       A  := A  >> 1
22.       If IsOdd(Ax) or IsOdd(Ay) :
23.          Ax := Ax + Y
24.          Ay := Ay + X
25.       Ax := Ax >> 1
26.       Ay := Ay >> 1
27.    If IsEven(B) :
28.       B  := B  >> 1
29.       If IsOdd(Bx) or IsOdd(By) :
30.          Bx := Bx + Y
31.          By := By + X
32.       Bx := Bx >> 1
33.       By := By >> 1
34. A := A << Twos
35. A contains GCD(X,Y).
36. AxX - AyY = GCD(X,Y).

Of course, for our purposes, it does not suffice to merely know what it looks like; we must also understand precisely why it is guaranteed to work !

At this point, the reader should be satisfied with the logic of the Odd-A-Odd-B case; and therefore knows that the invariant in fact holds when we first reach either of the two places where Penk's Method is applied.

Let's start by proving that what Penk does to Ax, Ay (in steps 22..24) and Bx, By (in steps 29..31) does not itself break the invariant.

Review the invariant again:

 A = AxX - AyY
 B = ByY - BxX

... and see that in the first case,

1.       If IsOdd(Ax) or IsOdd(Ay) :
2.          Ax := Ax + Y
3.          Ay := Ay + X

... the A invariant equation holds :

 (Ax + Y)X - (Ay + X)Y = (AxX + XY) - (AyY + XY) = AxX - AyY = A

And in the second case,

1.       If IsOdd(Bx) or IsOdd(By) :
2.          Bx := Bx + Y
3.          By := By + X

... the B invariant equation holds :

 (By + X)Y - (Bx + Y)X = (ByY + XY) - (BxX + XY) = ByY - BxX = B

The added X and Y terms simply cancel out in both cases.

However, it remains to be proven that Penk's Method actually resolves our problem, rather than merely avoids creating a new one; i.e. that these operations in fact guarantee the evenness of the coefficients prior to their division by two.

Let's demonstrate this for the Even-A case first; later, it will become apparent that the same approach works likewise in the Even-B case.

At the start of step 21, we know that A is even :

1.    If IsEven(A) :
2.       A  := A  >> 1
3.       If IsOdd(Ax) or IsOdd(Ay) :
4.          Ax := Ax + Y
5.          Ay := Ay + X

And, at all times, we know that X and Y cannot be even simultaneously (because we have divided out 2^Twos from each of them.)

On top of all of this: since we graduated from kindergarten, we also know about the following properties of arithmetic operations upon odd and even numbers:

Parity of Addition/Subtraction:

| +/- | Odd | Even |
|---|---|---|
| Odd | Even | Odd |
| Even | Odd | Even |

Parity of Multiplication:

| × | Odd | Even |
|---|---|---|
| Odd | Odd | Even |
| Even | Even | Even |

So, what then do we know about the parities of the working variables at step 21? Let's illustrate all possible combinations of parities; and observe that we only need the A invariant equation for this. As it happens, there are exactly two possible combinations of parities for its terms :

 A = AxX - AyY   (AxX Odd,  AyY Odd)
 A = AxX - AyY   (AxX Even, AyY Even)

Elementarily, given that A is even at this step, both terms of its invariant equation must have the same parity.

Let's produce a table where all possible combinations of parities of the variables having an unknown parity (Ax, Ay) are organized by the known parities of the variables having a known parity (X, Y, A).

We will mark variables which could be either odd or even in a particular combination as "either"; those which are known to be odd, as Odd; those known to be even, as Even; and entries for which neither of the possible parities would make the known parities of X, Y, and A in that combination possible, and which therefore imply a logical impossibility, as Impossible :

| | Odd(X), Odd(Y) | Even(X), Odd(Y) | Odd(X), Even(Y) |
|---|---|---|---|
| AxX, AyY both Odd | A = AxX - AyY : Ax Odd, Ay Odd | Impossible | Impossible |
| AxX, AyY both Even | A = AxX - AyY : Ax Even, Ay Even | A = AxX - AyY : Ax either, Ay Even | A = AxX - AyY : Ax Even, Ay either |

Now, let's remove the two impossible entries from the above table:

| | Odd(X), Odd(Y) | Even(X), Odd(Y) | Odd(X), Even(Y) |
|---|---|---|---|
| AxX, AyY both Odd | A = AxX - AyY : Ax Odd, Ay Odd | A would be Odd! | A would be Odd! |
| AxX, AyY both Even | A = AxX - AyY : Ax Even, Ay Even | A = AxX - AyY : Ax either, Ay Even | A = AxX - AyY : Ax Even, Ay either |

And now suppose that we have reached step 23. Therefore we already know that at least one of Ax and Ay is odd. Let's remove from the table the only entry where this is not true; and, finally, in all of the remaining entries, indicate the deduced parity of the variables whose parity was previously indeterminate :

| | Odd(X), Odd(Y) | Even(X), Odd(Y) | Odd(X), Even(Y) |
|---|---|---|---|
| AxX, AyY both Odd | A = AxX - AyY : Ax Odd, Ay Odd | A would be Odd! | A would be Odd! |
| AxX, AyY both Even | (removed: Ax, Ay are Even) | A = AxX - AyY : Ax Odd, Ay Even | A = AxX - AyY : Ax Even, Ay Odd |

All of the parities are now determined.

We are left with only three possible combinations of parities of the terms of A which could exist at step 23. So let's list what happens to each of them when Penk's Method is applied, i.e. Y is added to Ax and X is added to Ay :

| Possible Parity Combos | Parity of Ax + Y ? | Parity of Ay + X ? |
|---|---|---|
| A = AxX - AyY : Ax Odd, Ay Odd (X Odd, Y Odd) | Ax + Y = Even | Ay + X = Even |
| A = AxX - AyY : Ax Odd, Ay Even (X Even, Y Odd) | Ax + Y = Even | Ay + X = Even |
| A = AxX - AyY : Ax Even, Ay Odd (X Odd, Y Even) | Ax + Y = Even | Ay + X = Even |

In the Even-A case, Penk's Method works, QED. It is difficult to think of how this could be made more obvious. The parities on both sides of the + sign always match, and so we have forced both Ax and Ay to be even, without breaking the invariant equation for A.
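The above case analysis can also be checked mechanically, by brute force over the parities (a Python sketch; 1 stands for Odd, 0 for Even, and, parity-wise, subtraction behaves exactly as addition):

```python
from itertools import product

# Parity of A = Ax*X - Ay*Y is (Ax*X + Ay*Y) mod 2, since, mod 2,
# subtraction is the same as addition.
for x, y, ax, ay in product((0, 1), repeat=4):
    if x == 0 and y == 0:
        continue                 # X, Y simultaneously even: forbidden
    if (ax * x + ay * y) % 2 != 0:
        continue                 # A would be Odd: impossible, A is known Even
    if ax == 0 and ay == 0:
        continue                 # both coefficients already even: Penk idle
    # In every remaining combination, Penk's additions force evenness:
    assert (ax + y) % 2 == 0     # Ax + Y is Even
    assert (ay + x) % 2 == 0     # Ay + X is Even
```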

And now, how about the Even-B case ? Suppose that we have reached step 30. Let's do the same thing as we have done for the Even-A case:

 B = ByY - BxX   (ByY Odd,  BxX Odd)
 B = ByY - BxX   (ByY Even, BxX Even)

... does this remind you of anything you've seen before?

Instead of telling the nearly-same and rather tedious story twice, let's leave step 30 as an...

Chapter 21A, Exercise #1: Prove that Penk's Method is correct in the Even-B case.

Chapter 21A, Exercise #2: Why are simultaneously-even X and Y forbidden in this algorithm?

Now, we are ready to rewrite Algorithm 4, this time grouping any repeating terms together, as follows:

Algorithm 5. Extended GCD with Penk's Method and Unified Terms.

For integers X ≥ 1 and Y ≥ 0:

1. Twos := Find_Common_Power_Of_Two_Factor(X, Y)
2. X    := X >> Twos
3. Y    := Y >> Twos
4. A    := X
5. B    := Y
6. Ax   := 1
7. Ay   := 0
8. Bx   := 0
9. By   := 1
10. Iterate until B = 0:
11.    If IsOdd(A) and IsOdd(B) :
12.       D  := |B - A|
13.       Sx :=  Ax + Bx
14.       Sy :=  Ay + By
15.       If B < A :
16.          A  := D
17.          Ax := Sx
18.          Ay := Sy
19.       Else :
20.          B  := D
21.          Bx := Sx
22.          By := Sy
23.    If IsEven(A) :
24.       A  := A  >> 1
25.       If IsOdd(Ax) or IsOdd(Ay) :
26.          Ax := Ax + Y
27.          Ay := Ay + X
28.       Ax := Ax >> 1
29.       Ay := Ay >> 1
30.    If IsEven(B) :
31.       B  := B  >> 1
32.       If IsOdd(Bx) or IsOdd(By) :
33.          Bx := Bx + Y
34.          By := By + X
35.       Bx := Bx >> 1
36.       By := By >> 1
37. A := A << Twos
38. A contains GCD(X,Y).
39. AxX - AyY = GCD(X,Y).

At this point, we have something which closely resembles commonly-used traditional variants of the Binary Extended GCD algorithm. With the not-unimportant difference that all of the working variables stay positive throughout. This way, we will avoid the need to introduce signed arithmetic into the internals of FFA.
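For readers who wish to play with Algorithm 5 before the FFAtronic rendering arrives, here is a Python transcription (names are mine; Python bignums are, of course, in no way constant-spacetime -- this is strictly an illustration of the arithmetic):

```python
def ext_gcd(X, Y):
    """Algorithm 5: for X >= 1, Y >= 0, returns (G, Ax, Ay) such that
    Ax*X - Ay*Y == G == GCD(X, Y). All intermediates stay non-negative."""
    x, y, twos = X, Y, 0
    while ((x | y) & 1) == 0:             # steps 1..3: divide out 2^Twos
        x >>= 1
        y >>= 1
        twos += 1
    a, b = x, y                           # steps 4..5
    ax, ay, bx, by = 1, 0, 0, 1           # steps 6..9
    while b != 0:                         # step 10
        if (a & 1) and (b & 1):           # steps 11..22: subtractive step
            d, sx, sy = abs(b - a), ax + bx, ay + by
            if b < a:
                a, ax, ay = d, sx, sy
            else:
                b, bx, by = d, sx, sy
        if (a & 1) == 0:                  # steps 23..29: halve A
            a >>= 1
            if (ax & 1) or (ay & 1):      # Penk's Method: force evenness
                ax, ay = ax + y, ay + x
            ax >>= 1
            ay >>= 1
        if (b & 1) == 0:                  # steps 30..36: halve B
            b >>= 1
            if (bx & 1) or (by & 1):      # Penk's Method: force evenness
                bx, by = bx + y, by + x
            bx >>= 1
            by >>= 1
    return a << twos, ax, ay              # steps 37..39

g, ax, ay = ext_gcd(240, 46)
assert g == 2 and ax * 240 - ay * 46 == g
```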

Should we proceed with rewriting Algorithm 5 using constant-time operations? ...or is something important still missing? What is it ?

Chapter 21A, Exercise #3: What, other than the use of FFA-style constant-time arithmetic, is missing in Algorithm 5 ?

~To be continued in Chapter 21B.~

# The Fossil Vault.

I have recently built two new WWW mirrors containing certain publicly-available software:

## 1. Historic Gentoo Distfiles.

I have been using Gentoo for nearly all Linux-related work since 2007. It was "the lesser of evils": to my knowledge, no other Linux variant ever offered a comparable level of rodenticidal control, while at the same time providing an adequate means of automatically cutting through "dependency hell".

This uniqueness made Gentoo a target for concerted attack by the Enemy, on a variety of fronts. Through creeping Poetteringism, the cancerous blight of GCC 5+, and the eventual decay -- fostered by corrupt maintainers -- of the portage system into unusability, it became nearly impossible to set up a hygienic Gentoo system from scratch.

The minute you try emerge --sync, you will be force-fed a boiling ocean of liquid shit under the name of "progress". And the machine will have to be scrubbed and re-OSed, if you wish to continue with civilized life on it.

Eventually I resorted to creating "canned" Gentoos using bitwise copies of old installations. (E.g. this system for ARM-64, and this one for AMD-64, are provided for my ISP service customers.)

One obvious problem with the "canned" approach is the decay of the official Gentoo distfiles mirrors. At one time these operated purely by accretion, i.e. added new packages while preserving the old. At a certain point this changed, and the servants of "progress" began to sabotage the ecosystem by deliberately removing "ungodly" packages. Mirror operators which refused to participate in this "cultural revolution" were delisted and you will not find them via the Gentoo WWW -- even supposing their mirrors are still standing (many have simply given up.)

Until and unless a cultural "reset" takes place, and something like what Gentoo tried to be in 2007 again exists as a living entity, I intend to continue the use of my hand-curated "fossilized" variants. And to make this easier for others, I have put together a public distfiles mirror:

## http://dulap.xyz/gentoo/distfiles

... using the contents of my backup tapes from a multitude of Gentoo systems I have operated.

Gentoo users may add this URL to their GENTOO_MIRRORS list in /etc/portage/make.conf; or download any necessary tarballs into their local /usr/portage/distfiles by hand; this is a matter of taste.

WARNING: this repository is offered with no warranty or endorsement of any kind, and certainly contains software with dangerous flaws. It does not necessarily reflect the current configuration of any of the Gentoo machines I presently use (although packages from both the RK and the "Dulap" variant's default distfiles directories are all present.) I have trimmed some -- but by no means all! -- of the obvious garbage. It goes without saying that I am not responsible for the contents of the tarballs, or even their integrity. Please do not use tarballs for which you do not have an authoritative signature for safety-critical work! From this -- or any other public repository.

Any and all questions regarding the above -- I prefer to answer here.

People from my L1 WoT who wish to contribute specimens to this collection, are invited to contact me.

At this time, the Gentoo collection weighs ~16GB.

## 2. GNAT.

The following mirror now contains a multitude of GNAT and miscellaneous Ada-related packages/dependencies, obtained on April 10, 2020 via JFW's method:

READMEs, as well as MS-Win and Apple binaries have been omitted. Packages with duplicate names are stored in the "dupes" subdirectories (1, 2, 3, 4, 5, 6, 7)

The same warning as given for the Gentoo repository applies to this collection.

At this time, the Ada collection weighs ~17GB. Aside from the binaries removal, this set was not curated in any way.

Readers are encouraged to mirror the mirrors. Anyone who has done this, is encouraged to leave a comment here, with the URL of the new mirror.

# "Finite Field Arithmetic." Chapter 20D: "Litmus" Errata: Support for Nested Clearsigned Texts.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical "Open Sores" abomination, in that -- rather than trusting the author blindly with their lives -- prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatches and seals to your V-set, and press to ffa_ch20d_litmus_nested_fix.kv.vpatch.

As of Chapter 20D, the versions of Peh and FFA are 250 and 253, respectively. FFA and Peh themselves have not changed from Chapter 19.

Compile Peh:

```cd ffacalc
gprbuild```

... and install it to a path visible in your shell (e.g. /usr/bin.)

Litmus as given in the previous chapter would choke (failing safe, i.e. rejecting the signature) when fed "nested" clearsigned texts. This bug has been fixed:

```# 'ASCII-Armoured' PGP signatures have mandatory start and end markers:
START_MARKER="^\-\-\-\-\-BEGIN PGP SIGNATURE\-\-\-\-\-"
END_MARKER="^\-\-\-\-\-END PGP SIGNATURE\-\-\-\-\-"

.......

CLEAR_MARKER="^\-\-\-\-\-BEGIN PGP SIGNED MESSAGE\-\-\-\-\-"

.......```
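The essence of the fix is the `^` anchor: inside a nested clearsigned text, the inner markers appear dash-escaped (prefixed with "- ") per RFC 4880, and an anchored pattern no longer matches them. A Python sketch of the effect (the sample lines are mine; the pattern is simplified, without the escaped hyphens):

```python
import re

# Anchored to the start of the line, as in the fixed Litmus:
START_MARKER = r"^-----BEGIN PGP SIGNATURE-----"

lines = [
    "- -----BEGIN PGP SIGNATURE-----",   # dash-escaped marker of the inner sig
    "-----BEGIN PGP SIGNATURE-----",     # real marker of the outer sig
]

# Only the unescaped, line-initial marker matches the anchored pattern:
matches = [l for l in lines if re.search(START_MARKER, l)]
assert matches == ["-----BEGIN PGP SIGNATURE-----"]
```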

To test, invoke Litmus with two arguments: the public key, followed by the input file. For instance, take this twice-clearsigned text:

```-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

- -----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

This is a test-fire GPG 'clearsigned' document written for
FFA Chapter 20C.
- -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iQEcBAEBCgAGBQJeGimSAAoJELmCKKABq//HfgsH/1bg0zpL1PoApHE5BMNXmJ7U
7vWKQJ3ZCsGvkUSIgIZESTp6EpGDclBQqvBsdBZ9KVnBpGiIbmKbNjDNlZU5YcWt
MCZXkR6Lfd3rYKHcySOEvkoCMQ4ytbEQM4IBE57AqQIpmnI7Mo1033TnDgL2KdXd
PHLtedNyaKwzVJw08BtKjzole130GvElSMmxilrHZ0w+2J4uHRuBELyZbJ55vu+x
Bx+Q5OpMwz0ZG4vjz8ncZE/nCvK/sZv/RXsdNow95fETH5DKC9HfIP279kKlBRg=
=3ozu
- -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iQEcBAEBCgAGBQJeHgg9AAoJELmCKKABq//HHywH/3cfUfV8SH2yuxHNBPmk3Yha
Nw/40YK52/DJ/H+RUY6Zei2X3tJPYe3vDN2P3iRqKBycpUPtNnCDaF3BAvg7cyRQ
PghpykoSeERxIQZaIQWTUyMQRBwoMF90wiQpcgCEXRH3xZo/p0M3M6zPrqKElXf4
essZmM3jsvfU9T8JGho0RPyK1J42fTBCiRb0Y++ZQGWVEwJugtVnQOL76fYmSUpW
vDBKgNJfvGOMNTCTLemqD2nn6DZzLN9TOrRlmLvfr0lbzu9rSGAMtRKqozhZeCXf
LiTWWVZkCUGr/SEk5olhbHQnfoYMH1V071qJnpv1/5QqIiZ1z2KbP65Ba/i3i0A=
=o/9T
-----END PGP SIGNATURE-----```

`./litmus.sh asciilifeform.peh asciilifeform-clearsigned-twice.txt`

... which will yield the output:

`VALID GPG RSA signature from asciilifeform <[email protected]>`

Naturally, no verification is performed on the embedded clearsigned text (just as in traditional GPG; if you want to operate on the embedded text, you must extract it.)

~To be continued!~

# "Finite Field Arithmetic." Chapter 20C: Support for 'Clearsigned' GPG texts in "Litmus."

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical "Open Sores" abomination, in that -- rather than trusting the author blindly with their lives -- prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatches and seals to your V-set, and press to ffa_ch20c_litmus_clearsigned.kv.vpatch.

As of Chapter 20C, the versions of Peh and FFA are 250 and 253, respectively. FFA and Peh themselves have not changed from Chapter 19.

Compile Peh:

```cd ffacalc
gprbuild```

... and install it to a path visible in your shell (e.g. /usr/bin.)

Litmus now supports GPG "clearsigned" texts. Compatibility with the program given in the previous chapter is retained. The "clearsigned" mode of operation is selected simply by invoking the script with two arguments instead of the usual three:

```.......

# Whether we are working on a 'clearsigned text'
CLEARSIGN_MODE=false

# Set up in the selected mode:
case \$ARGCOUNT in
2) # If given two arguments, verify a 'clearsigned' text file:
CLEARSIGN_MODE=true
# The processed payload will end up in a temporary file:
DATAFILE=\$(mktemp) || { echo "Failed to create temp file!" >&2; \
exit \$RET_EGGOG; }
# On exit, if in 'clearsign' mode, remove temporary file with payload:
trap remove_temp_file EXIT
# Expect 'Canonical Text Signature' in GPG sig packet turd
expect_sig_class=1
;;
3) # Verify Detached Signature on given Data File (third argument is path):
# The given Data file to be verified against the Signature
DATAFILE=\$3 # i.e. path given on command line
# Expect 'Detached Binary Signature' in GPG sig packet turd
expect_sig_class=0
;;
*) # If invalid arg count -- print usage and abort:
echo "Usage: \$0 publickey.peh signature.sig datafile"
echo "   or: \$0 publickey.peh clearsigned.txt"
exit \$RET_EGGOG
;;
esac
.......```
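The temporary-file discipline used above (mktemp plus an EXIT trap) can be exercised in isolation. In this sketch, `remove_temp_file` is a minimal stand-in for the helper elided from the excerpt:

```shell
# Sketch: the mktemp + EXIT-trap discipline used for the clearsign payload.
# 'remove_temp_file' here is a stand-in for the (elided) helper in Litmus.
DATAFILE=$(mktemp) || { echo "Failed to create temp file!" >&2; exit 1; }

remove_temp_file() {
rm -f "$DATAFILE"
}

# On any exit -- normal or eggog -- the temporary file is removed:
trap remove_temp_file EXIT

echo "payload" > "$DATAFILE"
cat "$DATAFILE"
```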

RFC4880 (the document GPG nominally conforms to) specifies certain transformations required to obtain the hashable payload from the document (e.g. imposition of MSDOS-style line endings), which are implemented as follows:

```.......

# If we are operating on a 'clearsigned' text file, \$DATAFILE will be
# an empty temporary file, and the payload is to be extracted to it,

.......

if [ \$CLEARSIGN_MODE == true ]
then
# Find position of 'clearsign' payload start marker:
CLEAR_MARKER="\-\-\-\-\-BEGIN PGP SIGNED MESSAGE\-\-\-\-\-"
start_clr=\$(grep -m 1 -n "\$CLEAR_MARKER" \$SIGFILE | cut -d ':' -f1)

if [ "\$start_clr" == "" ]
then
eggog_broken_clearsigned
fi

start_clr=\$((\$start_clr + 1))

# The payload ends with the line preceding the sig start:
end_clr=\$((start_ln - 2))

start_body=\$(tail -n "+\$start_clr" \$SIGFILE | \
grep -v -n -m 1 "^Hash:" | cut -d ':' -f1)

# Skip the above headers and mandatory empty line:
start_clr=\$((\$start_clr + \$start_body))

# If there is no payload, or the markers are misplaced, abort:
if [ \$start_clr -ge \$end_clr ]
then
eggog_broken_clearsigned
fi

# Extract the 'clearsign' payload to the temporary file:
cat \$SIGFILE | sed -n "\$start_clr,\$end_clr p" | \
sed 's/[ \t]*\$//; s/^- //' | \
awk '{printf("%s\r\n",\$0)}' \
> \$DATAFILE

# Remove the trailing CR,LF ending:
truncate -s -2 \$DATAFILE

# After this, proceed exactly like with 'detached' sigs, but
# with the expected 'class' being 1 rather than 0.
fi

.......```
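The canonicalization pipeline above (strip trailing whitespace, undo dash-escaping, impose CR,LF line endings, drop the final line ending) can be tried on a toy payload outside of Litmus; the file paths here are illustrative:

```shell
# Sketch: RFC4880 cleartext canonicalization on a sample payload.
# Mirrors the sed/awk pipeline used in Litmus.
printf 'hello world   \n- - dashed line\nlast\n' > /tmp/payload.txt

# Strip trailing spaces/tabs; undo the '- ' dash-escape prefix;
# impose MSDOS-style CR,LF line endings:
sed 's/[ \t]*$//; s/^- //' /tmp/payload.txt | \
awk '{printf("%s\r\n",$0)}' > /tmp/canonical.txt

# The final CR,LF is not part of the hashed payload -- remove it:
truncate -s -2 /tmp/canonical.txt

# Inspect the result:
hexdump -ve '1/1 "%.2x "' /tmp/canonical.txt
```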

To verify a "clearsigned" message, invoke Litmus with two arguments: the public key, followed by the input file, e.g.:

`./litmus.sh asciilifeform.peh asciilifeform-clearsigned.txt`

... which will yield the output:

`VALID GPG RSA signature from asciilifeform <[email protected]>`

~To be continued!~

# "Finite Field Arithmetic." Chapter 20B: Support for Selected Ancient Hash Algos in "Litmus."

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical "Open Sores" abomination, in that -- rather than trusting the author blindly with their lives -- prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatches and seals to your V-set, and press to ffa_ch20b_litmus_legacy_hashes.kv.vpatch.

As of Chapter 20B, the versions of Peh and FFA are 250 and 253, respectively. FFA and Peh themselves have not changed from Chapter 19.

Compile Peh:

```cd ffacalc
gprbuild```

... and install it to a path visible in your shell (e.g. /usr/bin.)

In the course of experimenting with the subject of the previous Chapter, I found that a number of current and past Vtronicists had misconfigured their GPG, and were emitting suboptimal signatures (i.e. not SHA-512, the strongest of the ancient hashes supported by that utility). And so I added support for these. (Litmus will emit a warning if such a signature is encountered.)

The following routine has been added to Litmus:

```# If Sig was made with an unsupported hash algo:
eggog_unsupported_hash() {
algo=\$1
echo "This sig uses an unsupported Digest Algo: \$1 !" >&2
exit \$RET_EGGOG
}

.......

# Warnings:
achtung() {
echo "WARNING: \$1" >&2
}

.......

# Digest Algo (only certain hash algos are supported)
get_sig_bytes 1
turd+=\$r
hex_to_int
sig_digest_algo=\$r

# If hash algo is supported, get ASN turd and MD_LEN; and if not, eggog:
case \$sig_digest_algo in
1)  ## MD5 -- NOT SUPPORTED ##
eggog_unsupported_hash "MD5"
;;

2)  ## SHA1 ##
achtung "This sig was made with SHA-1, which is cheaply breakable!"
HASHER="shasum -a 1 -b"
ASN="3021300906052b0e03021a05000414"
MD_LEN=20
;;

3)  ## RIPE-MD/160 -- NOT SUPPORTED ##
eggog_unsupported_hash "RIPE-MD/160"
;;

8)  ## SHA256 ##
achtung "This sig was made with SHA-256; GPG supports SHA-512."
HASHER="shasum -a 256 -b"
ASN="3031300d060960864801650304020105000420"
MD_LEN=32
;;

9)  ## SHA384 ##
achtung "This sig was made with SHA-384; GPG supports SHA-512."
HASHER="shasum -a 384 -b"
ASN="3041300d060960864801650304020205000430"
MD_LEN=48
;;

10) ## SHA512 ##
HASHER="shasum -a 512 -b"
ASN="3051300D060960864801650304020305000440"
MD_LEN=64 # 512 / 8 == 64 bytes
;;

11) ## SHA224 ##
achtung "This sig was made with SHA-224; GPG supports SHA-512."
HASHER="shasum -a 224 -b"
ASN="302D300d06096086480165030402040500041C"
MD_LEN=28
;;

*)  ## Unknown Digest Type ##
eggog_unsupported_hash "UNKNOWN (type \$sig_digest_algo)"
;;
esac

# Calculate length (bytes) of the ASN turd for the digest used in the sig:
ASN_LEN=\$((\${#ASN} / 2))

.......```

To test, for instance, verification of SHA-1 signatures (please stop using SHA-1, people! see e.g. here), download this Litmus-converted GPG public key of former contributor Diana Coman, and verify her signature of Chapter 1:

`./litmus.sh diana_coman.peh ffa_ch1_genesis.kv.vpatch.diana_coman.sig ffa_ch1_genesis.kv.vpatch`

... which will yield the output:

```WARNING: This sig was made with SHA-1, which is cheaply breakable!
VALID GPG RSA signature from Diana Coman <[email protected]>```

~To be continued!~

# "Finite Field Arithmetic." Chapter 20: "Litmus", a Peh-Powered Verifier for GPG Signatures.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical "Open Sores" abomination, in that -- rather than trusting the author blindly with their lives -- prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatches and seals to your V-set, and press to ffa_ch20_litmus.kv.vpatch.

As of Chapter 20, the versions of Peh and FFA are 250 and 253, respectively. FFA and Peh themselves have not changed from Chapter 19.

Compile Peh:

```cd ffacalc
gprbuild```

... and install it to a path visible in your shell (e.g. /usr/bin.)

The subject of this chapter is Litmus: a simple and practical demonstration program powered by Peh. It is expected to work on all reasonable Unix-like systems, so long as the required command-line utilities (see EXTERNALS) are present.

Litmus verifies traditional GPG public key signatures (presently, only of the "detached" type, with RSA and SHA512 hash -- the optimal knob settings available in GPG) and is suitable for use in e.g. Vtronics.

The source code of Litmus appears below in its entirety:

```#!/bin/sh

############################################################################
# 'Litmus' Utility. Verifies traditional GPG RSA signatures using Peh.     #
#                                                                          #
# Usage: ./litmus.sh publickey.peh signature.sig datafile                  #
#                                                                          #
# Currently, supports only RSA 'detached' signatures that use SHA512 hash. #
# See instructions re: converting traditional GPG public keys for use with #
# this program.                                                            #
#                                                                          #
# Peh, xxd, hexdump, shasum, and a number of common utils (see EXTERNALS)  #
# must be present on your machine.                                         #
#                                                                          #
# (C) 2020 Stanislav Datskovskiy ( www.loper-os.org )                      #
# http://wot.deedbot.org/17215D118B7239507FAFED98B98228A001ABFFC7.html     #
#                                                                          #
# You do not have, nor can you ever acquire the right to use, copy or      #
# distribute this software ; Should you use this software for any purpose, #
# or copy and distribute it to anyone or in any manner, you are breaking   #
# the laws of whatever soi-disant jurisdiction, and you promise to         #
# continue doing so for the indefinite future. In any case, please         #
# always : read and understand any software ; verify any PGP signatures    #
# that you use - for any purpose.                                          #
############################################################################

# External programs that are required (if not found, will eggog) :
EXTERNALS="peh xxd hexdump base64 shasum cut tr sed wc grep printf"

# Return Codes:

# Signature is VALID for given Sig, Data File, and Public Key:
RET_VALID_SIG=0

# Signature is INVALID:
RET_BAD_SIG=1

# All Other Cases:
RET_EGGOG=-1

# Terminations:

# Success (Valid RSA signature) :
done_sig_valid() {
echo "VALID \$pubkey_algo signature from \$pubkey_owner"
exit \$RET_VALID_SIG
}

# Failure (INVALID RSA signature) :
done_sig_bad() {
echo "Signature is INVALID for this public key and input file!"
exit \$RET_BAD_SIG
}

# Failure in decoding 'GPG ASCII armour' :
eggog_sig_armour() {
echo "\$SIGFILE could not decode as a GPG ASCII-Armoured Signature!" >&2
exit \$RET_EGGOG
}

# Failure from corrupt signature :
eggog_sig_corrupt() {
echo "\$SIGFILE is corrupt!" >&2
exit \$RET_EGGOG
}

# Failure from bad Peh :
eggog_peh() {
echo "EGGOG in executing Peh tape! Please check Public Key." >&2
exit \$RET_EGGOG
}

# Number of Arguments required by this program:
REQD_ARGS=3

# If invalid arg count, print usage and abort:
if [ "\$#" -ne \$REQD_ARGS ]; then
echo "Usage: \$0 publickey.peh signature.sig datafile"
exit \$RET_EGGOG
fi

# We only support SHA512. Parameters for it:
HASHER="shasum -a 512 -b"

# For 'PKCS' encoding, the ASN magic turd corresponding to SHA512:
ASN="3051300D060960864801650304020305000440"
ASN_LEN=\$((\${#ASN} / 2))
MD_LEN=64 # 512 / 8 == 64 bytes

# Minimal Peh Width (used for non-arithmetical ops, e.g. 'Owner')
MIN_PEH_WIDTH=256

# The given public key file (a Peh tape, see docs)
PUBFILE=\$1

# The given Detached GPG Signature file to be verified
SIGFILE=\$2

# The given Data file to be verified against the Signature
DATAFILE=\$3

# Verify that each of the given input files exists:
FILES=(\$PUBFILE \$SIGFILE \$DATAFILE)
for f in \${FILES[@]}; do
if ! [ -f \$f ]; then
echo "\$f does not exist!" >&2
exit \$RET_EGGOG
fi
done

# Calculate length of the pubkey file:
PUBFILE_LEN=\$(wc -c \$PUBFILE | cut -d ' ' -f1)

# Peh's Return Codes
PEH_YES=1
PEH_NO=0
PEH_MU=255
PEH_EGGOG=254

# Execute given Peh tape, with given FFA Width and Height,
# on top of the pubkey tape; returns output in \$peh_res and \$peh_code.
run_peh_tape() {
# The tape itself
tape=\$1

# FFA Width for the tape
peh_width=\$2

# FFA Stack Height for the tape
peh_height=\$3

# Compute the length of the given tape
tape_len=\${#tape}

# Add the length of the Public Key tape to the above
tape_len=\$((\$tape_len + \$PUBFILE_LEN))

# Max Peh Life for all such tapes
peh_life=\$((\$tape_len * 2))

# Execute the tape:
peh_res=\$((cat \$PUBFILE; echo \$tape) |
peh \$peh_width \$peh_height \$tape_len \$peh_life);
peh_code=\$?

# If Peh returned PEH_EGGOG:
if [ \$peh_code -eq \$PEH_EGGOG ]
then
# Abort: likely, coarse error of pilotage in the public key tape.
eggog_peh
fi
}

run_peh_tape "@Algo!QY" \$MIN_PEH_WIDTH 1
pubkey_algo=\$peh_res

run_peh_tape "@Owner!QY" \$MIN_PEH_WIDTH 1
pubkey_owner=\$peh_res

# The only supported algo is GPG RSA:
if [ "\$pubkey_algo" != "GPG RSA" ]
then
echo "This public key specifies algo '\$pubkey_algo';" >&2
echo "The only algo supported is 'GPG RSA' !" >&2
exit \$RET_EGGOG
fi

# Verify that all of the necessary external programs in fact exist:
for i in \$EXTERNALS
do
command -v \$i >/dev/null && continue ||
{ echo "Required program '\$i' was not found!" >&2; exit \$RET_EGGOG; }
done

# 'ASCII-Armoured' PGP signatures have mandatory start and end markers:
START_MARKER="-----BEGIN PGP SIGNATURE-----"
END_MARKER="-----END PGP SIGNATURE-----"

# Determine start and end line positions for payload:
start_ln=\$(grep -m 1 -n "\$START_MARKER" \$SIGFILE | cut -d ':' -f1)
end_ln=\$(grep -m 1 -n "\$END_MARKER" \$SIGFILE | cut -d ':' -f1)

# Both start and end markers must exist :
if [ "\$start_ln" == "" ] || [ "\$end_ln" == "" ]
then
echo "\$SIGFILE does not contain ASCII-armoured PGP Signature!" >&2
exit \$RET_EGGOG
fi

start_ln=\$((\$start_ln + 1))
end_ln=\$((\$end_ln - 1))

# If there is no payload, or the markers are misplaced, abort:
if [ \$start_ln -ge \$end_ln ]
then
eggog_sig_armour
fi

sig_payload=\$(sed -n "\$start_ln,\$end_ln p" < \$SIGFILE |
sed -n "/^Version/!p" | sed -n "/^=/!p" | tr -d " \t\n\r")

# If eggog -- abort:
if [ \$? -ne 0 ]
then
eggog_sig_armour
fi

# Obtain the sig bytes:
sig_bytes=(\$(echo \$sig_payload | base64 -d | hexdump -ve '1/1 "%.2x "'))

# If eggog -- abort:
if [ \$? -ne 0 ]
then
eggog_sig_armour
fi

# Number of bytes in the sig file
sig_len=\${#sig_bytes[@]}

# Test that certain fields in the Sig have their mandatory value
sig_field_mandatory() {
f_name=\$1
f_value=\$2
f_mandate=\$3
if [ "\$f_value" != "\$f_mandate" ]
then
reason="\$f_name must equal \$f_mandate; instead is \$f_value."
echo "\$SIGFILE is UNSUPPORTED : \$reason" >&2
echo "Only RSA and SHA512 hash are supported !" >&2
exit \$RET_EGGOG
fi
}

# Starting Position for get_sig_bytes()
sig_pos=0

# Extract given # of sig bytes from the current sig_pos; advance sig_pos.
get_sig_bytes() {
# Number of bytes requested
count=\$1

# Result: \$count bytes from current \$sig_pos (contiguous hex string)
r=\$(echo \${sig_bytes[@]:\$sig_pos:\$count} | sed "s/ //g" | tr 'a-z' 'A-Z')

sig_pos=\$((\$sig_pos + \$count))

# If more bytes were requested than were available in sig_bytes:
if [ \$sig_pos -gt \$sig_len ]
then
# Abort. The signature was mutilated somehow.
eggog_sig_corrupt
fi
}

# Convert the current sig component to integer
hex_to_int() {
r=\$((16#\$r))
}

# Turd to be composed of certain values from the sig, per RFC4880.
# Final hash will run on the concatenation of DATAFILE and this turd.
turd=""

## Parse all of the necessary fields in the GPG Signature:

# CTB (must equal 0x89)
get_sig_bytes 1
sig_ctb=\$r
sig_field_mandatory "CTB" \$sig_ctb 89

# Length
get_sig_bytes 2
hex_to_int
sig_length=\$r

# Version (only Version 4 -- what GPG 1.4.x outputs -- is supported)
get_sig_bytes 1
turd+=\$r
sig_version=\$r
sig_field_mandatory "Version" \$sig_version 04

# Class (only class 0 is supported)
get_sig_bytes 1
turd+=\$r
sig_class=\$r
sig_field_mandatory "Class" \$sig_class 00

# Public Key Algo (only RSA is supported)
get_sig_bytes 1
turd+=\$r
sig_pk_algo=\$r
sig_field_mandatory "Public Key Algo" \$sig_pk_algo 01

# Digest Algo (only SHA512 is supported)
get_sig_bytes 1
turd+=\$r
sig_digest_algo=\$r
sig_field_mandatory "Digest Algo" \$sig_digest_algo 0A

# Hashed Section Length
get_sig_bytes 2
turd+=\$r
hex_to_int
sig_hashed_len=\$r

# Hashed Section (typically: timestamp)
get_sig_bytes \$sig_hashed_len
turd+=\$r
sig_hashed=\$r

# Unhashed Section Length
get_sig_bytes 1
hex_to_int
sig_unhashed_len=\$r

get_sig_bytes \$sig_unhashed_len

# Compute Byte Length of Hashed Header (for last field)
hashed_header_len=\$((\${#turd} / 2))

# Final section of the hashed turd (not counted in hashed_header_len)
turd+=\$sig_version
turd+="FF"
turd+=\$(printf "%.8x" \$hashed_header_len)

# Compute the hash of data file and the hashed appendix from sig :
hash=\$((cat \$DATAFILE; xxd -r -p <<< \$turd) | \$HASHER | cut -d ' ' -f1)
# Convert to upper case
hash=\$(echo \$hash | tr 'a-z' 'A-Z')

# Parse the RSA Signature portion of the Sig file:

# RSA Packet Length (how many bytes to read)
get_sig_bytes 1
hex_to_int
rsa_packet_len=\$r

# The RSA Packet itself
get_sig_bytes \$rsa_packet_len
rsa_packet=\$r

# Digest Prefix (2 bytes)
get_sig_bytes 2
digest_prefix=\$r

# See whether it matches the first two bytes of the actual computed hash :
computed_prefix=\$(printf "%.4s" \$hash)

if [ "\$digest_prefix" != "\$computed_prefix" ]
then
# It didn't match, so we can return 'bad signature' immediately:
done_sig_bad
fi

# If prefix matched, we will proceed to do the actual RSA operation.

# RSA Bitness given in Sig
get_sig_bytes 2
hex_to_int
rsa_bitness=\$r

# Compute RSA Byteness from the above
rsa_byteness=\$(((\$rsa_bitness + 7) / 8))

# RSA Bitness for use in determining required Peh width:
rsa_width=\$((\$rsa_byteness * 8))

# Only traditional GPG RSA widths are supported:
if [ \$rsa_width != 2048 ] && [ \$rsa_width != 4096 ] && [ \$rsa_width != 8192 ]
then
reason="Only 2048, 4096, and 8192-bit RSA are supported."
echo "\$SIGFILE is UNSUPPORTED : \$reason" >&2
exit \$RET_EGGOG
fi

# RSA Signature per se (final item read from sig file)
get_sig_bytes \$rsa_byteness
rsa_sig=\$r

# Per RFC4880, 'PKCS' encoding of hash is as follows:
# 0 1 [PAD] 0 [ASN] [MD]

# First two bytes of PKCS-encoded hash will always be 00 01 :
pkcs="0001"

# Compute necessary number of padding FF bytes :
pkcs_pad_bytes=\$((\$rsa_byteness - \$MD_LEN - \$ASN_LEN - 3))

for ((x=1; x<=\$pkcs_pad_bytes; x++)); do
pkcs+="FF"
done

# Attach the 00 separator between the padding and the ASN:
pkcs+="00"

# Attach the ASN ('magic' corresponding to the hash algo) :
pkcs+=\$ASN

# Finally, attach the computed (from Data file) hash itself :
pkcs+=\$hash

# Generate a Peh tape which will attempt to verify \$rsa_sig against the pubkey,
# computing the expression \$rsa_sig ^ PUB_E mod PUB_M and comparing to \$pkcs.
# Outputs 'Valid' and returns Yes_Code (1) if and only if signature is valid.
tape=".\$rsa_sig@Public-Op!.\$pkcs={[Valid]QY}{[Invalid]QN}_"

# Execute the tape:
run_peh_tape \$tape \$rsa_width 3

# 'Belt and suspenders' -- test both output and return code:
# If verification succeeded, return code will be 1, and output 'Valid':
if [ \$peh_code -eq \$PEH_YES ] && [ "\$peh_res" == "Valid" ]
then
# Valid RSA signature:
done_sig_valid
else
# Signature was not valid:
done_sig_bad
fi
# The end.```

Public keys for use with Litmus are Peh tapes, and their format is illustrated here. Mine (converted from my GPG public key using, for the time being, PGPDump and a few minutes of elbow grease) appears below:

```(----------------------------------------------------------------------------)
( Public key converted from GPG key 17215D118B7239507FAFED98B98228A001ABFFC7 )
(----------------------------------------------------------------------------)
@RSA-Public-Modulus@
.
CDD49A674BAF76D3B73E25BC6DF66EF3ABEDDCA461D3CCB6416793E3437C7806562694
EC37EABEFF3FBE6D3C726DC68E599EBFE5456EF19813398CD7D548D746A30AA47D4293
968BFBAFCBF65A90DFFC87816FEE2A01E1DC699F4DDABB84965514C0D909D54FDA7062
A2037B50B771C153D5429BA4BA335EAB840F9551E9CD9DF8BB4A6DC3ED1318FF3969F7
9535B91B417D0AF14BE09C78B53AF9C5F494BCF2C60349FFA93C81E817AC682F0055A6
07BB56D6A281C1A04CEFE1
;
(----------------------------------------------------------------------------)
@RSA-Public-Exponent@
.
10001
;
LC
(----------------------------------------------------------------------------)
@Public-Op@
(N is on stack) @RSA-Public-Exponent! @RSA-Public-Modulus! MX
;
(----------------------------------------------------------------------------)
@Algo@[GPG RSA];
(----------------------------------------------------------------------------)
@Owner@[asciilifeform <[email protected]>];
(----------------------------------------------------------------------------)
RC```

Now verify the vpatch of this chapter and its signature using Litmus:

`./litmus.sh asciilifeform.peh ffa_ch20_litmus.kv.vpatch.asciilifeform.sig ffa_ch20_litmus.kv.vpatch`

... which should yield the output:

`VALID GPG RSA signature from asciilifeform <[email protected]>`

If you find that Litmus balks on your GPG signatures, ensure that SHA512 (the longest hash supported by old rotten GPG) is selected in your config (typically ~/.gnupg/gpg.conf) :

```personal-digest-preferences SHA512
cert-digest-algo SHA512```

In Chapter 21, we will discuss the astonishing ugliness of the GPG data format, and construct a means for semi-automatic conversion of traditional GPG public keys into the format used by Litmus.

~To be continued!~

# “Finite Field Arithmetic.” Chapter 19: Peh Tuning and Demo Tapes.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical "Open Sores" abomination, in that -- rather than trusting the author blindly with their lives -- prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatches and seals to your V-set, and press to ffa_ch19_peh_tuning_and_demos.kv.vpatch.

You should end up with the same directory structure as previously.

As of Chapter 19, the versions of Peh and FFA are 250 and 253, respectively.

The demo tapes from Chapter 18C have been added to the Peh v-set and are henceforth available under the subdirectory demos.

Now compile Peh:

```cd ffacalc
gprbuild```

But do not run it quite yet.

First, the mail bag!

Reader PeterL observed that the declare block in:

```   -- Modulus. Permits the asymmetric Dividend and Divisor in FZ_Mod_Exp.
procedure FZ_Mod(Dividend  : in FZ;
Divisor   : in FZ;
Remainder : out FZ) is

-- ......................................
-- ......................................

-- Performs Restoring Division on a given segment of Dividend:Divisor
procedure Slice(Index : Dividend_Index;
Cut   : Divisor_Cuts) is
begin

declare

-- Borrow, from comparator
C   : WBool;

-- Left-Shift Overflow
LsO : WBool;

-- Current cut of Remainder register
Rs  : FZ renames R(1 .. Cut);

-- Current cut of Divisor
Ds  : FZ renames Divisor(1 .. Cut);

-- Current word of Dividend, starting from the highest
W   : Word  := Dividend(Dividend'Last + 1 - Index);

begin

-- For each bit in the current Dividend word:
for b in 1 .. Bitness loop

-- Send top bit of current Dividend word to the bottom of W
W := Rotate_Left(W, 1);

-- Advance Rs, shifting in the current Dividend bit
FZ_ShiftLeft_O_I(N => Rs, ShiftedN => Rs, Count => 1,
OF_In => W and 1,
Overflow => LsO);

-- Subtract Divisor-Cut from R-Cut; Underflow goes into C
FZ_Sub(X => Rs, Y => Ds, Difference => Rs, Underflow => C);

-- If C=1, subtraction underflowed, and we must undo it:
FZ_Add_Gated(X => Rs, Y => Ds, Sum => Rs,
Gate => C and W_Not(LsO));

end loop;

end;

end Slice;

-- ......................................
-- ......................................```

... is redundant. It has been removed as of this Chapter. As this removal has no effect on the built binary, the FFA version number is unchanged.

Now, let's eat the meat of this Chapter.

Before we can proceed with what was promised at the conclusion of Chapter 18, we will offer some simplification of the Peh moving parts, and fix a small but potentially dangerous flaw in the semantics of the Overflow Flag. Let's start with the former.

Recall where in Chapter 18, we have the reader modality controls:

```   -- Start a Peh Machine with the given Dimensions and Tape; return a Verdict.
function Peh_Machine(Dimensions : in Peh_Dimensions;
Tape       : in Peh_Tapes;
RNG        : in RNG_Device) return Peh_Verdicts is

............
............

-- Whether we are currently inside a Proposed Subroutine Name:
SubNameMode   : Boolean      := False;

-- Whether we are currently inside a Proposed Subroutine Body:
SubBodyMode   : Boolean      := False;

............

-- Prefixed Operators
PrevC         : Character    := ' ';
HavePrefix    : Boolean      := False;

............```

As of the present Chapter, these have been replaced with:

```   -- Start a Peh Machine with the given Dimensions and Tape; return a Verdict.
function Peh_Machine(Dimensions : in Peh_Dimensions;
Tape       : in Peh_Tapes;
RNG        : in RNG_Device) return Peh_Verdicts is

............
............

-- The possible Modes of the reader:
type Modes is (Normal, SubName, SubBody, PrefixOp);

Mode          : Modes        := Normal;

............

-- Prefixed Operators
PrevC         : Character    := ' ';

............```

... and consequently, Op is now a much more readable item:

```      -- All Peh Symbols begin their processing here :
procedure Op(C : in Character) is
begin

-- See whether we are inside a 'Block' :

-- ... in a Comment block:
if CommLevel > 0 then
case C is
when ')' =>  -- Drop a nesting level:
CommLevel := CommLevel - 1;
when '(' =>  -- Add a nesting level:
CommLevel := CommLevel + 1;
when others =>
null; -- Other symbols have no effect at all
end case;

-- ... in a Quote block:
elsif QuoteLevel > 0 then
case C is
when ']' =>   -- Drop a nesting level:
QuoteLevel := QuoteLevel - 1;
when '[' =>   -- Add a nesting level:
QuoteLevel := QuoteLevel + 1;
when others =>
null; -- Other symbols have no effect on the level
end case;

-- If we aren't the mode-exiting ']', print current symbol:
if QuoteLevel > 0 then
Write_Char(C);
end if;

--- ... in a ~taken~ Conditional branch:
elsif CondLevel > 0 then
case C is
when '}' =>   -- Drop a nesting level:
CondLevel := CondLevel - 1;

-- If we exited the Conditional as a result,
-- we push a 1 to trigger the possible 'else' clause:
if CondLevel = 0 then
Push;
FFA_WBool_To_FZ(1, Stack(SP));
end if;

when '{' =>   -- Add a nesting level:
CondLevel := CondLevel + 1;

when others =>
null; -- Other symbols have no effect on the level
end case;

else
--- ... we are not inside a 'Block' :

case Mode is

--- ... a character in a proposed Subroutine Name:
when SubName =>
SubName_Symbol(C);

--- ... a character in a proposed Subroutine Body:
when SubBody =>
SubBody_Symbol(C);

--- ... the second character of a Prefixed Op:
when PrefixOp =>
-- Drop prefix-op hammer, until another prefix-op cocks it:
Mode := Normal;

-- Dispatch this op, where prefix is the preceding character
Op_Prefixed(Prefix => PrevC, O => C);

-- This is a Normal Op...
when Normal =>
-- ... so proceed with the normal rules:
Op_Normal(C);

-- Save the current Symbol as a possible prefix:
PrevC := C;

end case;

end if;
end Op;```

SubName_Symbol and SubBody_Symbol do exactly what you would expect from having read Chapter 18, with the obvious changes to account for the new reader-mode state variable. (See also line 1073, regarding the prefix control.)

This is just about all of the treatment the subject deserves. The Peh version has been advanced to 250K, on account of this and the following change.

Now, let's proceed to the matter of the Overflow Flag.

In particular, the treatment of Registers in the Cutout mechanism :

1. All Register-accessing opcodes (including Zaps and Debug Print) which execute inside either the Guarded or the Cutout segments, will make use exclusively of the segregated Cutout Register Set.
2. All Register-accessing opcodes (including Zaps and Debug Print) which execute inside the Public segment, will make use exclusively of the Ordinary Register Set.

Observe that, in Chapter 18, the Overflow Flag -- unlike the registers -- was not included in the set of segregated state which prevents inadvertent flow of information between routines residing in the Cutout Tape segment and those in the rest of a Tape.

This flaw has been corrected as of this Chapter:

```   -- Start a Peh Machine with the given Dimensions and Tape; return a Verdict.
function Peh_Machine(Dimensions : in Peh_Dimensions;
Tape       : in Peh_Tapes;
RNG        : in RNG_Device) return Peh_Verdicts is

............
............

-- Carry/Borrow Flag set by certain arithmetical Ops:
Flag          : WBool        := 0;

-- 'Cutout'-segregated Carry/Borrow Flag:
CO_Flag       : WBool        := 0;

............

-- Zero the currently-active Overflow Flag:
procedure Zap_Flag is
begin
if Use_CO_Registers then
CO_Flag := 0;
else
Flag    := 0;
end if;
end Zap_Flag;

............
............

-- Print a Debug Trace (used in 'QD') :
procedure Print_Trace is
begin

............
............

-- Print active Overflow-Flag, then Ticks and IP:

if Use_CO_Registers then
Write_String("Flag (CO) :" & WBool'Image(CO_Flag));
else
Write_String("Flag      :" & WBool'Image(Flag));
end if;

............
............

----------------
-- Arithmetic --
----------------

-- Subtract
when '-' =>
Want(2);
FFA_FZ_Subtract(X          => Stack(SP - 1),
Y          => Stack(SP),
Difference => Stack(SP - 1),
Underflow  => F);

-- If we are in the Cutout, write the CO_Flag instead of Flag:
if Use_CO_Registers then
CO_Flag := FFA_Word_NZeroP(F);
else
Flag    := FFA_Word_NZeroP(F);
end if;
Drop;

-- Add
when '+' =>
Want(2);
FFA_FZ_Add(X        => Stack(SP - 1),
Y        => Stack(SP),
Sum      => Stack(SP - 1),
Overflow => F);

-- If we are in the Cutout, write the CO_Flag instead of Flag:
if Use_CO_Registers then
CO_Flag := FFA_Word_NZeroP(F);
else
Flag    := FFA_Word_NZeroP(F);
end if;
Drop;

............
............
............

-- Put the Overflow flag on the stack
when 'O' =>
Push;
-- If we are in the Cutout, read CO_Flag instead of Flag:
if Use_CO_Registers then
FFA_WBool_To_FZ(CO_Flag, Stack(SP));
else
FFA_WBool_To_FZ(Flag,    Stack(SP));
end if;

............
............```

The difference in semantics is easily illustrated via the following Peh Tape:

```(----------------------------------------------------------------------------)
(----------------------------------------------------------------------------)
(- Demo Tape for 'Peh'; illustrates change in Flag semantics in Chapter 19. -)
(-                                                                          -)
(- (C) 2019 Stanislav Datskovskiy ( www.loper-os.org )                      -)
(- http://wot.deedbot.org/17215D118B7239507FAFED98B98228A001ABFFC7.html     -)
(-                                                                          -)
(- You do not have, nor can you ever acquire the right to use, copy or      -)
(- distribute this software ; Should you use this software for any purpose, -)
(- or copy and distribute it to anyone or in any manner, you are breaking   -)
(- the laws of whatever soi-disant jurisdiction, and you promise to         -)
(- continue doing so for the indefinite future. In any case, please         -)
(- always : read and understand any software ; verify any PGP signatures    -)
(- that you use - for any purpose.                                          -)
(-                                                                          -)
(----------------------------------------------------------------------------)
(----------------------------------------------------------------------------)

(----------------------------------------------------------------------------)

( Begin the Cutout: )
LC

(----------------------------------------------------------------------------)

( This subroutine causes the Cutout-Active Overflow Flag to be Set : )

@Set-OF-In-Cutout@ ( Regs : none )
.0 .1 - _
;

(----------------------------------------------------------------------------)

( This subroutine causes the Cutout-Active Overflow Flag to be Cleared : )

@Clear-OF-In-Cutout@ ( Regs : none )
ZF
;

(----------------------------------------------------------------------------)

( This subroutine returns the Cutout-Active Overflow Flag : )

@Get-OF-In-Cutout@ ( Regs : none )
O
;

(----------------------------------------------------------------------------)

( Terminate the Cutout : )
RC

(----------------------------------------------------------------------------)

( This subroutine causes the Ordinary Overflow Flag to be Set : )

@Set-OF-Ordinary@ ( Regs : none )
.0 .1 - _
;

(----------------------------------------------------------------------------)

( This subroutine causes the Ordinary Overflow Flag to be Cleared : )

@Clear-OF-Ordinary@ ( Regs : none )
ZF
;

(----------------------------------------------------------------------------)

( This subroutine returns the Ordinary Overflow Flag : )

@Get-OF-Ordinary@ ( Regs : none )
O
;

(----------------------------------------------------------------------------)

( Display both Overflow Flags : )
@Show-Both-OF-Flags@ ( Regs : none )

[Ordinary OF = ]
@Get-OF-Ordinary!
{[1]}{[0]}_

[ ]

@Get-OF-In-Cutout!
[Cutout's OF = ]
{[1]}{[0]}_
[]
;

(----------------------------------------------------------------------------)

(------------------------------ Main Program : ------------------------------)

( Regs: none )

[Setting Ordinary OF:
]
@Set-OF-Ordinary!

@Show-Both-OF-Flags!
[

]

[Setting Cutout's OF:
]
@Set-OF-In-Cutout!

@Show-Both-OF-Flags!
[

]

( Clear the Flags : )

[Clearing Ordinary OF:
]
@Clear-OF-Ordinary!

@Show-Both-OF-Flags!
[

]

[Clearing Cutout's OF:
]
@Clear-OF-In-Cutout!

@Show-Both-OF-Flags!
[

]

( we're done:  )
QY

(--------------------------------~~The End~~---------------------------------)```

This Tape is invoked as, e.g.:

`    cat ch19_flag.peh | ./ffa/ffacalc/bin/peh 256 32 10000 0`

Under Chapter 18's Peh, where there was only a single Overflow Flag register, this Tape would produce the following output:

```Setting Ordinary OF:
Ordinary OF = 1 Cutout's OF = 1

Setting Cutout's OF:
Ordinary OF = 1 Cutout's OF = 1

Clearing Ordinary OF:
Ordinary OF = 0 Cutout's OF = 0

Clearing Cutout's OF:
Ordinary OF = 0 Cutout's OF = 0```

Whereas as of this Chapter, where the two Flags are properly segregated, the output is, unsurprisingly:

```Setting Ordinary OF:
Ordinary OF = 1 Cutout's OF = 0

Setting Cutout's OF:
Ordinary OF = 1 Cutout's OF = 1

Clearing Ordinary OF:
Ordinary OF = 0 Cutout's OF = 1

Clearing Cutout's OF:
Ordinary OF = 0 Cutout's OF = 0```
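The segregation demonstrated above can be modeled in miniature. The following sketch (in Python, not Peh; the class and field names are invented purely for illustration, and Peh's actual machinery lives in the Ada sources) shows two independent Overflow Flag registers stepping through the four transcript states:

```python
# Illustrative model only: two independent Overflow Flag registers,
# one "ordinary" and one belonging to the Cutout.
class OverflowFlags:
    def __init__(self):
        self.ordinary = 0   # Ordinary Overflow Flag
        self.cutout = 0     # Cutout-Active Overflow Flag

    def show(self):
        return f"Ordinary OF = {self.ordinary} Cutout's OF = {self.cutout}"

flags = OverflowFlags()

flags.ordinary = 1          # cf. 'Set-OF-Ordinary'
print(flags.show())         # Ordinary OF = 1 Cutout's OF = 0

flags.cutout = 1            # cf. 'Set-OF-In-Cutout'
print(flags.show())         # Ordinary OF = 1 Cutout's OF = 1

flags.ordinary = 0          # cf. 'Clear-OF-Ordinary'
print(flags.show())         # Ordinary OF = 0 Cutout's OF = 1

flags.cutout = 0            # cf. 'Clear-OF-In-Cutout'
print(flags.show())         # Ordinary OF = 0 Cutout's OF = 0
```

Under Chapter 18's single-register semantics, both fields would be one and the same register, which is exactly why the earlier transcript shows the two columns moving in lockstep.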

In Chapter 20, we will proceed with more interesting refinements to Peh.

~To be continued!~