Submissions for the Ada, SPARK and embedded Alire crates of the year are open until December 31st. Doesn't have to be a project you started in 2022, any project in the Alire ecosystem can participate.
SPARK/Ada is a language derived from Ada that allows for a formal checking (i.e., mathematically prove the correctness) of the software. Several types of checks can be done: from the absence of runtime exceptions to no use of uninitialized variables, up to the formal proof that a given procedure/function fulfills its contract.
I like formal checking because it can actually prove that your program is correct (or that, at least, some kind of errors are absent), something that is usually not possible with testing. Of course, formal checking cannot applied to every program, but when it is possible it is a very powerful tool.
SPARK/Ada is definitively on my list of stuff that I want to learn.
Recently I had to write a procedure for a binary search in an ordered array. I thought that it could be an interesting exercise to write it in SPARK/Ada in order to have it formally verified. This is a brief summary of this experience, written with a tutorial spirit.
The problem I want to solve is the following
Table: an array of
Floatsorted in strictly increasing order (that is,
Table(n+1) > Table(n)for every
- it is included between the array extremes, that is,
Table(Table'First) ≤ What ≤ Table(Table'Last), but
- it is not necessarily in
Table, that is it is not required that there is
Table(n) = What
- it is included between the array extremes, that is,
- The unique index
- The unique index
Table(n) ≤ What < Table(n+1)
The spec file
First basic draft of specs
Let's first flesh out the spec file (roughly [very roughly] similar to a
*.h file for you C people).
pragma SPARK_Mode (On); package Searching is subtype Element_Type is Float; subtype Index_Type is Natural; type Array_Type is array (Index_Type range <>) of Element_Type; function Find (What : Element_Type; Table : Array_Type) return Index_Type; end Searching;
Entering SPARK Mode
The first line we encounter is
pragma SPARK_Mode (On);
This claims that this package will be compatible with SPARK restrictions and conventions. The lines
subtype Element_Type is Float; subtype Index_Type is Natural;
Index_Type as synominous of
Index_Type, respectively. This is not strictly necessary, but it can make it easier to change the types in a future
function Find (What : Element_Type; Table : Array_Type) return Index_Type;
should be clear enough.
In order for SPARK to be able to proof that our implementation of
Find is correct, we need to describe to SPARK what we expect
Find to do. This is done by means of a contract. Like a normal contract between people, a function contract usually has two parts: preconditions and postconditions.
The idea is that if you (the caller) do your part (i.e. you meet the preconditions when you call me), then I, the function, promise to do my part, that is, that the result will satisfy the post-conditions.
If a contract is given, I can ask SPARK to prove that the post-conditions follow from the pre-conditions. If SPARK succeeds, I know that the code I wrote is correct (in the sense that it respects the contract) without doing any test: I have a mathematical proof for it.
BTW, contracts are very useful even if you are not using SPARK. First, they are a wonderful "bug trap." By using the right options, you can ask the compiler to insert code that checks the pre-conditions when the function is called and the post-conditions when the function ends. If pre/post-conditions are not met, an exception is raised.
Moreover, contracts document in a formal and unambiguous way the behavior of the function and, differently from usual comment-based documentation, cannot go out of sync with the code.
OK, so contracts are cool. How do we write the contract for our function? Well, let's start with the precondition and let's check the specs. It is said that (i)
Table must be sorted and (ii)
What must be between
Table extrema; we just translate that in Ada in this way
function Find (What : Element_Type; Table : Array_Type) return Index_Type with Pre => Is_Sorted (Table) and Table (Table'Last) >= What and What >= Table (Table'First);
Is_Sorted is a function that checks if its argument is sorted and defined as follows
function Is_Sorted (Table : Array_Type) return Boolean is (for all L in Table'Range => (for all M in Table'Range => (if L > M then Table (L) > Table (M)))) with Ghost;
The body of the function (this form is called expression function) is the translation in Ada of the definition of monotonic array.
with Ghostin the definition. This says that
Is_Sortedis a ghost function, that is, a function that can be used only in contracts or assertions; if used in other places ("real code") an error is raised. (I love this kind of protections ♥)
The preconditions look right and they represent the actual specs, but if we try to run SPARK we get an error
medium: array index check might fail
and What >= Table (Table'First);
This means that
Table'First, that is the first index of the array, can be ... outside array bounds? The first index is by definition within the bounds, right?
Well... Actually, SPARK is right. If you ask for a counterexample you get
e.g. when Table'First=1 and Table'Last=0
Now, when in the world can be possible that the last index is smaller than the first one?!? Well, if
Table is empty... Empty arrays in Ada have the last index smaller than the first one. Therefore, trying to access
Table (Table'First) would cause an error.
Well, SPARK, good catch...
Well, I agree (although it could happen because of an error), but SPARK does not know it. To make SPARK happy it suffices to add
Table'Length > 0 to the precondition to get
function Find (What : Element_Type; Table : Array_Type) return Index_Type with Pre => Table'Length > 0 and then (Is_Sorted (Table) and Table (Table'Last) >= What and What >= Table (Table'First));
Now SPARK is happy and even the casual user sees that you cannot call the function with an empty table.
Also for the post-conditions we translate directly the requirement from English to Ada. It is required that
What is between
n is the value returned by the function. In Ada we get
(if Find'Result = Table'Last then Table (Find'Result) = What else Table (Find'Result) <= What and What < Table (Find'Result + 1));
Find'Result is the value returned by
Find and that we consider as special case
Find'Result = Table'Last since in this case there is no "following entry" in
Table. Overall, the function declaration with the full contract is
function Find (What : Element_Type; Table : Array_Type) return Index_Type with Pre => Table'Length > 0 and then (Is_Sorted (Table) and Table (Table'Last) >= What and What >= Table (Table'First)), Post => (if Find'Result = Table'Last then Table (Find'Result) = What else Table (Find'Result) <= What and What < Table (Find'Result + 1));
The body file (implementation)
The algorithm for a binary search is well known and exemplified in the following picture
Basically, we keep two "cursors" in the table:
Top with the condition that it must always be
Table(Bottom) ≤ What < Top(top)
We get the
Middle point between
Bottom and if
Table(Middle) is too large we move
Middle, otherwise we move
Bottom. We iterate this until on of the following two conditions holds
Table(Bottom) = What, that is, we actually found
Top = Bottom+1there are no intermediate entries between
In both cases we return
The actual code
Let's start with a basic skeleton of the procedure.
function Find (What : Element_Type; Table : Array_Type) return Index_Type is Bottom : Index_Type; Top : Index_Type; Middle : Index_Type; begin if Table (Table'Last) = What then return Table'Last; end if; Bottom := Table'First; Top := Table'Last; pragma Assert (Table (Top) > What and What >= Table(Bottom)); while Table (Bottom) < What and Top - Bottom > 1 loop Middle := (Bottom + Top)/2; if Table (Middle) > What then Top := Middle; else Bottom := Middle; end if; end loop; return Bottom; end Find;
It is just a translation of the algorithm informally described above. Note how the special case
Table (Table'Last) = What is handled separately; in this way we know that
Table (Top) > What and What >= Table(Bottom)
holds (see the
Now, in order to make it easier for SPARK to prove the correctness of the code we insert in the code some
pragma Assert claiming properties that hold true in different points of the code.
Automatic theorem proving is not easy and some hint to the prover can help. Moreover, I love spreading generously my code with assertion since they help understanding what the code does and which properties I expect to be true. They are also formidable "bug traps".
The function body with all the assertions looks like
function Find (What : Element_Type; Table : Array_Type) return Index_Type is Bottom : Index_Type; Top : Index_Type; Middle : Index_Type; begin if Table (Table'Last) = What then return Table'Last; end if; pragma Assert (Table (Table'Last) > What); Bottom := Table'First; Top := Table'Last; pragma Assert (Bottom < Top); pragma Assert (Table (Bottom) <= What and What < Table (Top)); while Table (Bottom) < What and Top - Bottom > 1 loop pragma Loop_Invariant (Bottom >= Table'First); pragma Loop_Invariant (Top <= Table'Last); pragma Loop_Invariant (Top > Bottom); pragma Loop_Invariant (Table (Bottom) <= What and What < Table (Top)); pragma Loop_Variant (Decreases => Top - Bottom); Middle := (Bottom + Top)/2; pragma Assert (Bottom < Middle and Middle < Top); if Table (Middle) > What then Top := Middle; else Bottom := Middle; end if; end loop; pragma Assert (Table (Bottom) <= What and Table (Bottom + 1) > What); return Bottom; end Find;
There are just two points worth describing; the first is this sequence of
pragmas inside the loop
pragma Loop_Invariant (Bottom >= Table'First); pragma Loop_Invariant (Top <= Table'Last); pragma Loop_Invariant (Top > Bottom); pragma Loop_Invariant (Table (Bottom) <= What and What < Table (Top)); pragma Loop_Variant (Decreases => Top - Bottom);
Loop_Variant are specific to SPARK. A loop invariant is a condition that is true at every iteration and you can see that this loop invariant is a formalization of what we said before: at every iteration
What is between
Bottom. Loop invariants are important in the proof of correctness of
while loop can potentially go on forever; a technique that allows us to prove that the loop will terminate is to search for some value that always increases (or decreases) at every iteration; if we know a bound for this value (e.g., it is always positive and it always decreases) we can deduce that the loop will terminate. The
Loop_Variant allows us to declare said value. In this case the distance between
Bottom is halved at every iteration, therefore it is a good variant. Since
Top-Bottom cannot be smaller than 2 (see condition in the while), we deduce that the loop will terminate.
A second observation is about seemly innocuous line
Middle := (Top + Bottom) / 2;
Here SPARK complains with
medium: overflow check might fail (e.g. when Bottom = Index_Type'Last-4 and Top = Index_Type'Last-2)
Good catch! Here SPARK is observing that although theoretically
Middle should be in the range of representable integers if
Bottom are, there is a possibility of an overflow while doing the sum. Since on my PC
Index_Type'Last = 2^63-1, it is unlikely that one would work with tables so big, nevertheless...
We have two solutions: (i) allow a smaller range of integers (that is, up to
Index_Type'Last/2) or (ii) compute the mean with function
function Mean (Lo, Hi : Index_Type) return Index_Type is (Lo + (Hi - Lo) / 2) with Pre => Lo < Index_Type'Last and then Hi > Lo + 1, Post => Hi > Mean'Result and Mean'Result > Lo;
that is guaranteed to have no overflow. Note that in the precondition we require that
Hi > Lo + 1, this is guaranteed by the loop condition and it is necessary in order to guarantee that the post conditions hold which in turn guarantees that the loop variant decreases at every iteration.
OK, now if we run SPARK we get
Summary of SPARK analysis ========================= ------------------------------------------------------------------------------------------------------- SPARK Analysis results Total Flow Interval CodePeer Provers Justified Unproved ------------------------------------------------------------------------------------------------------- Data Dependencies . . . . . . . Flow Dependencies . . . . . . . Initialization 3 3 . . . . . Non-Aliasing . . . . . . . Run-time Checks 29 . . . 29 (CVC4) . . Assertions 14 . . . 14 (CVC4) . . Functional Contracts 3 . . . 3 (CVC4) . . LSP Verification . . . . . . . ------------------------------------------------------------------------------------------------------- Total 49 3 (6%) . . 46 (94%) . .
Do you see on the extreme right the column
Unproved with no entries? That is what I want to see... SPARK was able to prove everything, so now you can kick back, relax and enjoy your binary search function with confidence: you made terra bruciata (scorched earth) around it and no bug can survive.
I program in Ada and I like it.
Yes, I know, it sounds like a "coming out"... 😉
In the world of Ada there is a thing that I always wanted to try: it is a reduced subset of Ada called SPARK (nothing to do with Apache) that it allows for formal checking, that is, proving some properties of your code such as: absence of buffer overflow and other runtime errors, no uninitialized variable read, and so on. I like the idea to have my code (or, at least, part of it) armored by a mathematical proof 😊.
Curious about SPARK? Check out this Introduction to SPARK
Few days ago I decided to give it a try using an old function of mine: a Ruby-like
split to split strings at a separator. It is small piece of code that I wrote some time ago and used safely in many programs of mine: the right "guinea pig" for the experiment.
If you are curious, the final SPARK code is on github, be my guest! 😊
The first thing to do was to replace the use of
Vectors (a kind of dynamic array, similar to its
C++ counterpart) with normal arrays. You see, as you can imagine,
Vectors use pointers (access types in Ada jargon) and they are prohibited in SPARK. The fact is that pointers would allow for aliasing of objects and this would make the formal check impossible.
It is not as bad as it sounds: in Ada the necessity of using pointers is much reduced with respect to
C++. The only real need is when you want to dynamically allocate structures.
Therefore, I wrote my own pseudo-
Vector type with the minimal set of needed functionality. The idea was to keep the extracted pieces in a fixed array (allocated to the stack, so no explicit pointers) large enough to keep the pieces (if you
split a string with length
N you get at most
N+1 pieces). Maybe not very memory efficient, but fine for my needs.
The resulting code (slightly pruned with respect to the original, for "didactic" purposes...) follows. It seems long, but it is just that I added few comments to explain the more Ada-esque syntax.
First the spec part (very roughly the equivalent of a
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded; package Token_Lists with SPARK_Mode => On -- This specs is in SPARK is subtype List_Length is Integer range 1 .. Integer'Last; -- "Tagged types" correspond roughly to what other -- languages call "class". Token_List is the desired -- pseudo-vector for holding the pieces. -- -- The funny (<>) means that the type is parameterized, but the -- parameter is not public. This forces you to initialize -- any variable of type Token_List with some kind of constructor. -- -- Finally, "private" means that the internal details are not -- visible and are described later, in the private part. -- type Token_List (<>) is tagged private; function Create (N : List_Length) return Token_List with Post => Create'Result.Capacity = N and Create'Result.Length = 0; -- -- Post is an "attribute" and it specifies a Boolean expression -- that will be true when the function returns (contract). -- -- The post-condition says that the created list will have -- room for N entries and it will be empty -- function Capacity (Item : Token_List) return Positive; function Length (Item : Token_List) return Natural with Post => Length'Result <= Item.Capacity; -- -- Number of the pieces currently in the list. Of course -- it cannot be larger than the capacity, if it happens -- there is something wrong somewhere... --- procedure Append (List : in out Token_List; What : String) with Pre'Class => List.Length < List.Capacity, Post => List.Length = List.Length'Old + 1 and List.Capacity = List.Capacity'Old; -- -- The precondition (attribute "Pre") says that before calling -- the procedure there must be some room in the list; the -- post condition says that after the call there is a new entry, -- but the capacity is unchanged (it is obvious to you, not to -- SPARK). SPARK will analyze the body of Append and it -- will try to prove that the contract is respected, that is, -- that the postcondition follows from the precondition. -- -- If you arm correctly the code that calls Append, SPARK will -- try to prove that the precondition is always verified. If it -- succeeds you know you'll never have an overflow! -- private -- -- Some privacy needed... :-) -- type Token_Array is array (Positive range <>) of Unbounded_String; -- -- The funny "(Positive range <>)" says that the index of -- indexes of a variable of type Token_Array is a range of -- positive integers. -- -- Yes, in Ada the array indexes do not *necessarily* start -- from zero... They can start from whatever you want... ;-D -- type Token_List (Length : List_Length) is tagged record Tokens : Token_Array (1 .. Length) := (others => Null_Unbounded_String); First_Free : Positive := 1; end record with Predicate => Token_List.First_Free <= Integer (Token_List.Length) + 1 and Token_List.Tokens'Length = Token_List.Length; -- -- This is the full definition of Token_List anticipated -- in the public part above. In Ada it is not possible -- to have some field public and some private (like in C++ -- or Ruby). If you put the full definition in the public -- part, everything is public (typically considered bad practice), -- otherwise everything is private. Honestly, I never felt the -- need of the hybrid solution. -- -- "Predicate" is another attribute. It specifies a condition -- that a variable of type Token_List must always satisfy. -- -- If you ask to the compiler to check assertions, the -- compiler will produce code that checks the predicate -- at the end of methods ("primitive procedures" in Ada -- jargon) that modify the variable and if the condition -- is not satisfied, an exception is raised, pointing an -- accusing finger against the culprit. -- -- A powerful bug trap! -- function Create (N : List_Length) return Token_List is (Token_List'(Length => N, Tokens => (others => Null_Unbounded_String), First_Free => 1)); function Capacity (Item : Token_List) return Positive is (Item.Tokens'Last); function Length (Item : Token_List) return Natural is (Item.First_Free - 1); end Token_Lists;
Now the body (more or less the equivalent of a
pragma Ada_2012; package body Token_Lists with SPARK_Mode => On -- This body is in SPARK is ------------ -- Append -- ------------ procedure Append (List : in out Token_List; What : String) is begin -- -- If you look at the definition of Token_List -- you'll see that List.Tokens is an array. -- List.Tokens'Last is the last index of Lists.Tokens -- if List.First_Free > List.Tokens'Last then raise Constraint_Error; end if; List.Tokens (List.First_Free) := To_Unbounded_String (What); List.First_Free := List.First_Free + 1; end Append; end Token_Lists;
Even if you do not know Ada, I think you should be able to understand the code.
As you can see,
Append is very simple and obviously correct, right? I mean, the
if at the beginning stops every tentative of writing beyond the array
Well, SPARK complained saying that it was not able to prove that an overflow would not happen. My first reaction was something like "Are you crazy? Don't you see the check 4 lines above? What is wrong with you?"
Anyway, next to the message error there was a "magnifying glass" icon. I clicked on it and I got, SPARK's courtesy, a counterexample
[Counterexample] List = (Length => ?, First_Free => 2147483647) and List.Tokens'First = 1 and List.Tokens'Last = 2147483647
What the... ? Ah! ... Yeah, OK, you are right...
In case you did not recognize it at once,
2147483647 is 2^31 -1. If I created a
Token_List with 2 Giga entries (doubtful if it would fit on the stack... anyway...) and I filled it completely, then I would actually have an overflow.
Now I had two choices: the first one was to say that I will never use 2 Giga entries and ignore the case; the second one was to make it official that you can have at most 2^31 -2 entries. This was really easy to do: it sufficed to change in the specs file the line
subtype List_Length is Integer range 1 .. Integer'Last;
that defines the type used for the list length to
subtype List_Length is Integer range 1 .. Integer'Last-1;
Notice the -1 at the end. Since this type is used in the constructor
function Create (N : List_Length) return Token_List;
this guarantees that you'll never call
N larger than 2^31 -2 (if you try to do it, an exception is raised). This placated SPARK.
After catching this border-line-condition bug I moved to SPARK-ing the rest of the code, catching another couple of border-line cases.
Overall, I can say that original code was fine since the bugs were triggered by some very extreme conditions; nevertheless it is nice to know that everything is clean now...
The ARA congratulates Dr. Peter Chapin on his receipt of ACM SIGAda’s Robert Dewar Award, which acknowledges outstanding contributions to the Ada community. Dr. Chapin was a major contributor to the Vermont Tech Lunar CubeSat project (cubesatlab.org) whose software was written in SPARK/Ada. The Vermont Tech CubeSat was launched in November 2013 and successfully completed its full two-year mission, the only one out of twelve academic CubeSats to do so. Dr. Chapin attributes the software’s reliability in large part to the SPARK/Ada technology, which was used to prove the absence of run-time errors.
Dr. Chapin is now coordinating the work on CubedOS, a SPARK/Ada implementation of a software framework for small spacecraft, with plans to release the result as an open-source project. Other groups will thus have access to a high-integrity software base for their CubeSats, which currently have a very high failure rate.
Dr. Chapin is the co-author, along with Prof. John McCormick, of “Building High Intergrity Applications with SPARK”, a student-oriented textbook on SPARK 2014.