This article is part of a series of handson tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical "Open Sores" abomination, in that  rather than trusting the author blindly with their lives  prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.
You will need:
 All of the materials from the Chapters 1 through 15.
 There is no vpatch in Chapter 21A.
 For best results, please read this chapter on a colour display!
ACHTUNG: The subject of the Chapter 21 originally promised at the conclusion of Chapter 20 is postponed. That topic will be revisited later!
On account of the substantial heft of Chapter 21, I have cut it into three parts: 21A, 21B, and 21C. You are presently reading 21A, which consists strictly of an introduction to FFA's variant of the Extended GCD algorithm and its proof of correctness.
Recall that in Chapter 18C, we demonstrated the use of FFA and Peh to generate cryptographic primes, of the kind which are used in e.g. the RSA public key cryptosystem.
We also performed GPGRSA signature verifications using Peh, and learned how to adapt traditional GPG public keys for use with it.
But why have we not yet seen how to generate a new RSA keypair ?
The stillmissing mathematical ingredient is an operation known as the modular multiplicative inverse.
This is a process where, provided an integer N and a nonzero modulus M, we find (or fail to find  its existence depends on the coprimality of the inputs) the integer X where:
NX ≡ 1 (mod M).
... or, in FFAistic terminology, where FFA_FZ_Modular_Multiply(N, X, M) would equal 1.
In RSA, this operation is used when deriving a public key corresponding to a given private key. (It is also used when ensuring that a selected pair of secret primes has certain properties.) The exact details of this process will be the subject of Chapter 22.
Typically, the modular multiplicative inverse is obtained via the Extended GCD algorithm. It differs from the ordinary GCD of Chapter 15 in that, when given inputs X, Y, it returns, in addition to their greatest common divisor, the Bezout coefficients: a pair of integers P, Q which satisfy the equation:
PX + QY = GCD(X,Y)
The modular multiplicative inverse of N modulo M exists strictly when GCD(N, M) = 1.
Therefore, in that case,
NX + MY = GCD(N,M) = 1
... which can be written as:
NX  1 = (Y)M
i.e.:
NX ≡ 1 (mod M).
The reader has probably already guessed that the typical heathen formulation of Extended GCD, i.e. one which uses a series of divisions with remainder, is unsuitable as a starting point for writing FFA's Extended GCD  for the same reason as why the analogous Euclidean GCD was unsuitable (and rejected) for writing the ordinary GCD routine of Chapter 15. Division is the most expensive arithmetical operation, and it would have to be repeated a large number of times, while discarding many of the outputs (a la the oldstyle nonBarrettian modular exponentiation of Chapter 6.)
Fortunately, it is possible to write a much more efficient constantspacetime algorithm for Extended GCD, rather closely similar to the one we used for ordinary GCD.
Our algorithm of choice for a properly FFAtronic Extended GCD will be very similar to the Binary Extended GCD pictured in Vanstone's Applied Cryptography.
However, in addition to constantspacetime operation, our algorithm will differ from the traditional one by avoiding any internal use of signed arithmetic, and will produce outputs having the form :
PX  QY = GCD(X,Y)
Because the Extended GCD algorithm chosen for FFA is not wholly identical to any previouslypublished one, on account of having to satisfy the familiar requirements, I decided that a full proof of correctness for it ought to be given prior to publishing Chapter 21's Vpatch.
Let's proceed with deriving this algorithm and its correctness proof.
As a starting point, let's take the traditional nonconstanttime Stein's GCD given as Algorithm 2 at the start of Chapter 15:
Iterative QuasiStein GCD from Chapter 15.
For integers A ≥ 0 and B ≥ 0:

Twos := 0

Iterate until B = 0:

A_{e}, B_{e} := IsEven(A), IsEven(B)

A := A >> A_{e}

B := B >> B_{e}

Twos := Twos + (A_{e} AND B_{e})

B_{next} := Min(A, B)

A_{next} := A  B

A, B := A_{next}, B_{next}

A := A × 2^{Twos}

A contains the GCD.
Now, suppose that Twos  the common poweroftwo factor of A and B  were known at the start of the algorithm, and that it were to be divided out of both numbers prior to entering the loop. The rewritten algorithm could look like this:
Algorithm 1: QuasiStein GCD with "Twos" Removal.
For integers X ≥ 0 and Y ≥ 0:

Twos := Find_Common_Power_Of_Two_Factor(X, Y)

X := X >> Twos

Y := Y >> Twos

A := X

B := Y

Iterate until B = 0:

A := A >> IsEven(A)

B := B >> IsEven(B)

B_{next} := Min(A, B)

A_{next} := A  B

A, B := A_{next}, B_{next}

A := A << Twos

A contains GCD(X,Y).
Suppose that we have a correct definition of Find_Common_Power_Of_Two_Factor. (For now, let's merely suppose.)
Algorithm 1 will do precisely the same job as Iterative QuasiStein GCD. However, it has one new property  presently useless  of which we will later take advantage: because 2^{Twos} (which could be 1, i.e. 2^{0}, if either input were odd to begin with) was divided out of X and Y in steps 2 and 3  upon entering the loop at step 7, at least one of X, Y can be now relied upon to be odd. (If X and Y were both equal to zero at the start, step 7 is never reached, and the algorithm terminates immediately.)
Now we want to extend Algorithm 1 to carry out Extended GCD.
First, let's introduce two pairs of additional variables. Each pair will represent the coefficients in an equation which states A or B in terms of multiples of X and Y: A_{x}, A_{y} for A; and B_{x}, B_{y} for B :
A 
= 
A_{x}X  A_{y}Y 
B 
= 
B_{y}Y  B_{x}X 
Elementarily, if we were to give these coefficients the following values:
A_{x} 
:= 
1 

A_{y} 
:= 
0 
B_{x} 
:= 
0 

B_{y} 
:= 
1 
... both equations will hold true, regardless of the particular values of A and B.
Of course, this alone does not give us anything useful.
However, if we assign these basecase values to the coefficients at the start of the algorithm, and correctly adjust them every time we make a change to A or B, the equations will likewise hold true after the completion of the loop, and, in the end, it will necessarily be true that:
A_{x}X  A_{y}Y 
= 
GCD(X,Y) 
... and, importantly, also true that :
A_{x}X  A_{y}Y 
= 
GCD(X,Y) 
When A or B is divided by two, we will divide each of the coefficients in the corresponding pair (A_{x}, A_{y} or B_{x}, B_{y} respectively) likewise by two. When A and B switch roles, the pairs of coefficients will likewise switch roles. When A is subtracted from B, or viceversa, the respective pairs will be likewise subtracted from one another, keeping the invariant.
In order to keep the invariant at all times, it will be necessary to apply certain transformations. The mechanics of these transformations will account for most of the moving parts of our Extended GCD algorithm.
To make it clear where we will want to put the new variables and the mechanisms for keeping the invariant, let's rewrite Algorithm 1 in traditional branchedlogic form:
Algorithm 2. QuasiStein with Branched Logic.

Twos := Find_Common_Power_Of_Two_Factor(X, Y)

X := X >> Twos

Y := Y >> Twos

A := X

B := Y

Iterate until B = 0:

If IsOdd(A) and IsOdd(B) :

If B < A :

A := A  B

Else :

B := B  A

If IsEven(A) :

A := A >> 1

If IsEven(B) :

B := B >> 1

A := A << Twos

A contains GCD(X,Y).
Now, let's work out what must be done to the coefficients A_{x}, A_{y} and B_{x}, B_{y} when we carry out the operations A  B and B  A (steps 9 and 11, respectively) to keep the invariant :
A  B 
= 
(A_{x}X  A_{y}Y) 
 
(B_{y}Y  B_{x}X) 

= 
(A_{x} + B_{x})X 
 
(A_{y} + B_{y})Y 
B  A 
= 
(B_{y}Y  B_{x}X) 
 
(A_{x}X  A_{y}Y) 

= 
(A_{y} + B_{y})Y 
 
(A_{x} + B_{x})X 
If we write these simplified expressions sidebyside with the original invariant equations :
A 
= 
A_{x}X 
 
A_{y}Y 
A  B 
= 
(A_{x} + B_{x})X 
 
(A_{y} + B_{y})Y 
B 
= 
B_{y}Y 
 
B_{x}X 
B  A 
= 
(A_{y} + B_{y})Y 
 
(A_{x} + B_{x})X 
... it becomes quite obvious. In both the A  B and the B  A cases, we only need to compute the sums A_{x} + B_{x} and A_{y} + B_{y} ; the only difference will consist of whether they must be assigned, respectively, to A_{x}, A_{y} or B_{x}, B_{y}.
Now, suppose we introduce these new parts into Algorithm 2, and end up with :
Algorithm 3. A naive attempt at Extended GCD.

Twos := Find_Common_Power_Of_Two_Factor(X, Y)

X := X >> Twos

Y := Y >> Twos

A := X

B := Y

A_{x} := 1

A_{y} := 0

B_{x} := 0

B_{y} := 1

Iterate until B = 0:

If IsOdd(A) and IsOdd(B) :

If B < A :

A := A  B

A_{x} := A_{x} + B_{x}

A_{y} := A_{y} + B_{y}

Else :

B := B  A

B_{x} := B_{x} + A_{x}

B_{y} := B_{y} + A_{y}

If IsEven(A) :

A := A >> 1

A_{x} := A_{x} >> 1

A_{y} := A_{y} >> 1

If IsEven(B) :

B := B >> 1

B_{x} := B_{x} >> 1

B_{y} := B_{y} >> 1

A := A << Twos

A contains GCD(X,Y).

A_{x}X  A_{y}Y = GCD(X,Y).
Unfortunately, Algorithm 3 will not work; the equation at step 30 will not hold true. And the attentive reader probably knows why.
For the inattentive: the erroneous logic is marked in red.
The problem is that we have no guarantee that A_{x}, A_{y} or B_{x}, B_{y} are in fact even at steps 22,23 and 26,27 where they are being divided by two. An entire bit of information "walks away into /dev/null" every time one of these coefficients turns out to have been odd. And then, the invariant no longer holds. Instead of correct coefficients A_{x}, A_{y} at step 30, we will end up with rubbish.
The pill needed here is known (according to D. Knuth) as Penk's Method. (I have not succeeded in discovering who, when, or where, Penk was. Do you know? Tell me! A reader found the works of Penk.)
This method, as it happens, is not complicated. And it looks like this:
Algorithm 4. Extended GCD with Penk's Method.

Twos := Find_Common_Power_Of_Two_Factor(X, Y)

X := X >> Twos

Y := Y >> Twos

A := X

B := Y

A_{x} := 1

A_{y} := 0

B_{x} := 0

B_{y} := 1

Iterate until B = 0:

If IsOdd(A) and IsOdd(B) :

If B < A :

A := A  B

A_{x} := A_{x} + B_{x}

A_{y} := A_{y} + B_{y}

Else :

B := B  A

B_{x} := B_{x} + A_{x}

B_{y} := B_{y} + A_{y}

If IsEven(A) :

A := A >> 1

If IsOdd(Ax) or IsOdd(Ay) :

Ax := Ax + Y

Ay := Ay + X

A_{x} := A_{x} >> 1

A_{y} := A_{y} >> 1

If IsEven(B) :

B := B >> 1

If IsOdd(Bx) or IsOdd(By) :

Bx := Bx + Y

By := By + X

B_{x} := B_{x} >> 1

B_{y} := B_{y} >> 1

A := A << Twos

A contains GCD(X,Y).

A_{x}X  A_{y}Y = GCD(X,Y).
Of course, for our purposes, it does not suffice to merely know what it looks like; we must also understand precisely why it is guaranteed to work !
At this point, the reader should be satisfied with the logic of the OddAOddB case; and therefore knows that the invariant in fact holds when we first reach either of the two places where Penk's Method is applied.
Let's start by proving that what Penk does to A_{x}, A_{y} (in steps 22..24) and B_{x}, B_{y} (in steps 29..31) does not itself break the invariant.
Review the invariant again:
A 
= 
A_{x}X 
 
A_{y}Y 
B 
= 
B_{y}Y 
 
B_{x}X 
... and see that in the first case,

If IsOdd(Ax) or IsOdd(Ay) :

Ax := Ax + Y

Ay := Ay + X
... the A invariant equation holds :

(A_{x} + Y)X 
 
(A_{y} + X)Y 


= 
(A_{x}X + XY) 
 
(A_{y}Y + XY) 


= 
A_{x}X 
 
A_{y}Y 
= 
A 
And in the second case,

If IsOdd(Bx) or IsOdd(By) :

Bx := Bx + Y

By := By + X
... the B invariant equation holds :

(B_{y} + X)Y 
 
(B_{x} + Y)X 


= 
(B_{y}Y + XY) 
 
(B_{x}X + XY) 


= 
B_{y}Y 
 
B_{x}X 
= 
B 
The added X and Y terms simply cancel out in both cases.
However, it remains to be proven that Penk's Method actually resolves our problem, rather than merely avoids creating a new one; i.e. that these operations in fact guarantee the evenness of the coefficients prior to their division by two.
Let's demonstrate this for the EvenA case first; later, it will become apparent that the same approach works likewise in the EvenB case.
At the start of step 21, we know that A is even :

If IsEven(A) :

A := A >> 1

If IsOdd(Ax) or IsOdd(Ay) :

Ax := Ax + Y

Ay := Ay + X
And, at all times, we know that X and Y cannot be even simultaneously (because we have divided out 2^{Twos} from each of them.)
On top of all of this: since we graduated from kindergarten, we also know about the following properties of arithmetic operations upon odd and even numbers:
Parity of Addition and Subtraction:
+/ 
Odd 
Even 
Odd 
Even 
Odd 
Even 
Odd 
Even 
Parity of Multiplication:
× 
Odd 
Even 
Odd 
Odd 
Even 
Even 
Even 
Even 
So, what then do we know about the parities of the working variables at step 21? Let's illustrate all possible combinations of parities, using the above colour scheme; and observe that we only need the A invariant equation for this. As it happens, there are exactly two possible combinations of parities for its terms :
A 
= 
A_{x}X 
 
A_{y}Y 
A 
= 
A_{x}X 
 
A_{y}Y 
Elementarily, given that A is even at this step, both terms of its invariant equation must have the same parity.
Let's produce a table where all possible combinations of parities of the variables having an unknown parity (A_{x}, A_{y}) are organized by the known parities of the variables having a known parity (X, Y, A).
In yellow, we will mark variables which could be either odd or even in a particular combination; in red, those which are known to be odd; in green, those known to be even; and in grey, those for which neither of the possible parities would make the known parities of X, Y, and A in that combination possible, and therefore imply a logical impossibility :

Odd(X)


Even(X)


Odd(X)


Odd(Y)


Odd(Y)


Even(Y)

A 
= 
A_{x} 
X 
 
A_{y} 
Y 



A_{x} 
X 
 
A_{y} 
Y 



A_{x} 
X 
 
A_{y} 
Y 
A_{x} 
X 
 
A_{y} 
Y 

A_{x} 
X 
 
A_{y} 
Y 

A_{x} 
X 
 
A_{y} 
Y 
Now, let's remove the two impossible entries from the above table:

Odd(X)


Even(X)


Odd(X)


Odd(Y)


Odd(Y)


Even(Y)

A 
= 
A_{x} 
X 
 
A_{y} 
Y 



A would be Odd! 



A would be Odd! 
A_{x} 
X 
 
A_{y} 
Y 

A_{x} 
X 
 
A_{y} 
Y 

A_{x} 
X 
 
A_{y} 
Y 
And now suppose that we have reached step 23. Therefore we already know that at least one of A_{x} and A_{y} is odd. Let's remove from the table the only entry where this is not true; and, finally, in all of the remaining entries, indicate the deduced parity of the variables whose parity was previously indeterminate :

Odd(X)


Even(X)


Odd(X)


Odd(Y)


Odd(Y)


Even(Y)

A 
= 
A_{x} 
X 
 
A_{y} 
Y 



A would be Odd! 



A would be Odd! 
A_{x},A_{y} are Even 

A_{x} 
X 
 
A_{y} 
Y 

A_{x} 
X 
 
A_{y} 
Y 
All of the parities are now determined.
We are left with only three possible combinations of parities of the terms of A which could exist at step 23. So let's list what happens to each of them when Penk's Method is applied, i.e. Y is added to A_{x} and X is added to A_{y} :

Possible Parity Combos


Parity of A_{x} + Y ?


Parity of A_{y} + X ?

A 
= 
A_{x} 
X 
 
A_{y} 
Y 

A_{x} 
+ 
Y 
= 
Even 

A_{y} 
+ 
X 
= 
Even 
A_{x} 
X 
 
A_{y} 
Y 

A_{x} 
+ 
Y 
= 
Even 

A_{y} 
+ 
X 
= 
Even 
A_{x} 
X 
 
A_{y} 
Y 

A_{x} 
+ 
Y 
= 
Even 

A_{y} 
+ 
X 
= 
Even 
In the EvenA case, Penk's Method works, QED. It is difficult to think of how this could be made more obvious. The parities on both sides of the + sign always match, and so we have forced both A_{x} and A_{y} to be even, without breaking the invariant equation for A.
And now, how about the EvenB case ? Suppose that we have reached step 30. Let's do the same thing as we have done for the EvenA case:
B 
= 
B_{y}Y 
 
B_{x}X 
B 
= 
B_{y}Y 
 
B_{x}X 
... does this remind you of anything you've seen before?
Instead of telling the nearlysame and rather tedious story twice, let's leave step 30 as an...
Chapter 21A, Exercise #1: Prove that Penk's Method is correct in the EvenB case.
Chapter 21A, Exercise #2: Why are simultaneouslyeven X and Y forbidden in this algorithm?
Now, we are ready to rewrite Algorithm 4, this time grouping any repeating terms together, as follows:
Algorithm 5. Extended GCD with Penk's Method and Unified Terms.
For integers X ≥ 1 and Y ≥ 0:

Twos := Find_Common_Power_Of_Two_Factor(X, Y)

X := X >> Twos

Y := Y >> Twos

A := X

B := Y

A_{x} := 1

A_{y} := 0

B_{x} := 0

B_{y} := 1

Iterate until B = 0:

If IsOdd(A) and IsOdd(B) :

D := B  A

S_{x} := A_{x} + B_{x}

S_{y} := A_{y} + B_{y}

If B < A :

A := D

A_{x} := S_{x}

A_{y} := S_{y}

Else :

B := D

B_{x} := S_{x}

B_{y} := S_{y}

If IsEven(A) :

A := A >> 1

If IsOdd(Ax) or IsOdd(Ay) :

Ax := Ax + Y

Ay := Ay + X

A_{x} := A_{x} >> 1

A_{y} := A_{y} >> 1

If IsEven(B) :

B := B >> 1

If IsOdd(Bx) or IsOdd(By) :

Bx := Bx + Y

By := By + X

B_{x} := B_{x} >> 1

B_{y} := B_{y} >> 1

A := A << Twos

A contains GCD(X,Y).

A_{x}X  A_{y}Y = GCD(X,Y).
At this point, we have something which closely resembles commonlyused traditional variants of the Binary Extended GCD algorithm. With the notunimportant difference that all of the working variables stay positive throughout. This way, we will avoid the need to introduce signed arithmetic into the internals of FFA.
Should we proceed with rewriting Algorithm 5 using constanttime operations? ...or is something important still missing? What is it ?
Chapter 21A, Exercise #3: What, other than the use of FFAstyle constanttime arithmetic, is missing in Algorithm 5 ?
~To be continued in Chapter 21B.~