News from the Ada programming language world

FOSDEM 2022

4 February 2022 at 07:41

FOSDEM 2022 (Free and Open Source Developer European Meeting) is coming (5 & 6 February 2022) with its 8000+ participants and an Ada DevRoom. This year FOSDEM is online only.

The Ada DevRoom has lots of interesting talks, both for experts and for people who know nothing about Ada.

A talk that I find quite intriguing is "Getting Started with AdaWebPack", which introduces an Ada-to-WebAssembly compiler that lets you do web development in Ada, replacing the frail JS.

Also, "Overview of Ada GUI" presents a new approach to GUI programming.

Reasons for loving Ada: Type invariants (because bugs shouldn't sleep...)

19 July 2020 at 14:30

This tutorial talks about a powerful bug trap that was introduced in Ada 2012: the Type_Invariant aspect.

Type invariant? What's that?

The idea of a type invariant is pretty simple. Sometimes data structures are just an aggregation of values, but often they are complex, with an internal structure that needs to be preserved.

A type invariant associated with a given type is a Boolean expression that is true for every valid value of the type. Ada allows a type invariant to be associated with a private type; if instructed to do so (for example, with GNAT's -gnata switch), the compiler will insert code that checks that the invariant remains satisfied and raises an exception if it does not.

An example: a table with two keys

Let's do a simple example. Suppose that

  • We have a set of users
  • Every user has a Serial_ID (a 6-digit number) and an SSN (Social Security Number). We will assume the Italian convention, where the SSN is a 16-character string.
  • Both the Serial_ID and the SSN uniquely identify the user, that is, no two users have the same Serial_ID and no two users have the same SSN.
  • We want to build a data structure User_Directory that allows us to retrieve (efficiently) a user both by Serial_ID and by SSN

A possible solution is shown in the following figure.

Figure: two tables pointing to the same record

The data structure is made of two Ordered_Maps: one map is indexed by the Serial_ID, the other by the SSN, and both maps store pointers (access values in Ada jargon) to a User_Record, a record that contains the user information. This structure clearly introduces some (quite obvious) redundancy:

  • The SSN stored in a User_Record must be equal to the corresponding key in the SSN_Map; the same holds for the Serial_ID
  • The entries of the two maps corresponding to the same user must map to the same record
  • The two maps have the same number of entries.

The spec file

This is an excerpt of the public part of the spec file (roughly the *.h of C)

   subtype ID_Char is Character range '0' .. '9';
   type Serial_ID is array (1 .. 6) of ID_Char;

   type User_SSN is new String (1 .. 16);

   type User_Directory is private;

   procedure Add_User (Dir : in out User_Directory;
                       ID  : in Serial_ID;
                       SSN : in User_SSN)
     with
       Pre  => not Contains (Dir, ID) and not Contains (Dir, SSN),
       Post => Contains (Dir, ID) and Contains (Dir, SSN);


   procedure Delete (Dir : in out User_Directory;
                     ID  : in Serial_ID)
     with
       Pre  => Contains (Dir, ID),
       Post => not Contains (Dir, ID);

Here we define

  • The types Serial_ID and User_SSN as fixed-length character arrays. Note that Serial_ID can contain only digits.
  • User_Directory is the data structure we are defining.
  • Add_User and Delete should be quite obvious. Note the use of preconditions (after Pre =>) and postconditions (after Post =>) that document the behavior of Add_User and Delete.

The actual definition of User_Directory (in the private part) is basically the Ada translation of the scheme shown above:

   type User_Record is
      record
         ID  : Serial_ID;
         SSN : User_SSN;
      end record;

   type User_Record_Pt is access User_Record;

   package ID_To_User_Maps is
     new Ada.Containers.Ordered_Maps (Key_Type     => Serial_ID,
                                      Element_Type => User_Record_Pt);

   package SSN_To_User_Maps is
     new Ada.Containers.Ordered_Maps (Key_Type     => User_SSN,
                                      Element_Type => User_Record_Pt);

   type User_Directory is
      record
         ID_To_User  : ID_To_User_Maps.Map;
         SSN_To_User : SSN_To_User_Maps.Map;
      end record;

We have

  • The record User_Record holding the user information, and its access type User_Record_Pt
  • A package ID_To_User_Maps (obtained by instantiating the standard generic package Ordered_Maps) that implements maps associating a Serial_ID with an access to User_Record
  • A similar package SSN_To_User_Maps
  • Finally, we have the User_Directory that simply contains the two required maps.

The implementation

Let's look at the implementation of the package. Let's start with Add_User.

   procedure Add_User
     (Dir : in out User_Directory;
      ID  : in Serial_ID;
      SSN : in User_SSN)
   is
      New_User : User_Record_Pt;
   begin
      if Dir.ID_To_User.Contains (ID) then
         raise Constraint_Error;
      end if;

      New_User := new User_Record'(ID => ID, SSN => SSN);

      Dir.ID_To_User.Include (Key      => ID,
                              New_Item => New_User);

      Dir.SSN_To_User.Include (Key      => SSN,
                               New_Item => New_User);
   end Add_User;

The implementation of Add_User is quite straightforward: with

New_User  := new User_Record'(ID  => ID, SSN => SSN);

we allocate dynamically a record with the user information and with

   Dir.ID_To_User.Include (Key      => Id,
                           New_Item => New_User);


   Dir.SSN_To_User.Include (Key      => SSN,
                            New_Item => New_User);

we update the two internal tables by associating the given ID and SSN with the address of the newly allocated record.

This is our first attempt at implementing Delete.

   procedure Delete (Dir : in out User_Directory; ID : in Serial_ID) is
      To_Be_Removed : User_Record_Pt := Dir.ID_To_User (ID);
   begin
      Dir.ID_To_User.Delete (ID);

      Free (To_Be_Removed);  -- Free: an instance of Ada.Unchecked_Deallocation
   end Delete;

This implementation has a bug 🐛, as is clear from the following figure, which depicts the behavior of Delete.

Figure: the entry is deleted from only one table

The dashed line means that the memory area of the user record now has gone back to the heap. It is clear that we forgot to remove the entry from Dir.SSN_To_User and now we have a dangling pointer referring to the old user record.

This is a nasty bug.

Image: Gollum holding a scolopendra, saying "it is a nasty bug"
Oh, yes, believe me. I have already seen this movie; actually, I was the main character. What could go wrong? Well, for example, later you might want to update the entries of Dir.SSN_To_User in some way. However, the memory that was assigned to the user record may now belong to some other structure, which will be corrupted by your update.

Depending on your code, the bug can remain dormant for a long time; then, suddenly, one day, if you are lucky, you get a SEGFAULT when you try to use the corrupted structure again. If you are unlucky, you get strange random behavior that can be difficult to reproduce.

Even if you are lucky (so to speak...), you'll get the SEGFAULT in a place totally unrelated to the real bug in Delete. Believe me, finding this bug can take days of stumbling around in the dark. Not even an army of rubber ducks can save you.

Image: an army of rubber ducks

Although this is a specific example, this kind of time bomb 💣 (or Sword of Damocles) behavior is typical of bugs that cause a loss of internal coherence in some complex structure. The bug will not manifest itself when the structure is corrupted, but later as a delayed consequence of the loss of coherence, making bug hunting difficult.

Remember: there is one thing worse than a bug that causes a crash: a bug that lets the program continue as if nothing had happened.

The solution: type invariant

Fear not, my friends and colleagues. Ada comes to the rescue with the Type_Invariant aspect. It suffices to add it to the type definition, as in

   type User_Directory is
      record
         ID_To_User  : ID_To_User_Maps.Map;
         SSN_To_User : SSN_To_User_Maps.Map;
      end record
    with Type_Invariant => Is_Coherent (User_Directory);

where Is_Coherent is a function that checks that the whole structure is coherent, that is, that the two maps have the same number of entries and that the data in the user record match the corresponding key in the tables. The source code of the (fairly long) function is included, for reference, at the end of this post.

Now if we run the following test program

with User_Directories;   use User_Directories;

procedure Main is
   Dir : User_Directory;
begin
   Add_User (Dir => dir,
             ID  => "123456",
             SSN => "ABCDEF64X12Q456R");

   Add_User (Dir => dir,
             ID  => "654321",
             SSN => "XBCDEF64X12Q456R");

   Delete (Dir, Id => "123456");
end Main;

we get the following results

Execution of obj/main terminated by unhandled exception
raised SYSTEM.ASSERTIONS.ASSERT_FAILURE : 
  failed invariant from user_directories.ads:65
Call stack traceback locations:
⋯/s-assert.adb:46
⋯/adainclude/a-coorma.adb:1561
⋯/user_directories.adb:52
⋯/user_directories.adb:59
⋯/main.adb:18
⋯

where, for the sake of readability, I replaced part of the dump text with ⋯. The line

failed invariant from user_directories.ads:65

refers to the line where type User_Directory is declared. In the stack dump, the first two lines refer to run-time library files; we are interested in the third line

⋯/user_directories.adb:52

that refers to the first line of procedure Delete.
Summarizing,

  • Before calling Delete, the user directory Dir was fine
  • After calling Delete, it is not fine anymore. Who are you gonna blame?

Ghostbusters! No, sorry, wrong citation... 😊😛 (this is also a hint to my age... 😊)

Image: a judge finds Delete guilty of data structure damage

After a brief inspection of Delete we discover the problem and we fix it quite easily

   procedure Delete (Dir : in out User_Directory; ID : in Serial_ID) is
      To_Be_Removed : User_Record_Pt := Dir.ID_To_User (Id);
   begin
      Dir.SSN_To_User.Delete (To_Be_Removed.SSN); -- New line
      -- and here???
      Dir.ID_To_User.Delete (Id);

      Free (To_Be_Removed);
   end Delete;

Wait a minute...

Someone might observe that at the line marked -- and here??? the variable Dir is in an incoherent state, since we have updated one table but not the other. The same happens in the Add_User procedure. Wouldn't this raise an exception?

Well, no. The details are a bit complex, but the basic idea is that since Add_User and Delete are declared in the same package as User_Directory, they can operate on the internal structure of the type, and it is acceptable for that structure to be incoherent for a moment during the manipulation. The type invariant is checked when such a procedure returns (on its in out and out parameters of type User_Directory), not after every single statement;
see section 7.3.2 of the Ada Reference Manual if you want every single gory detail...
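To see this boundary rule in isolation, here is a small self-contained sketch that is not taken from the User_Directories package: the Intervals package, its Interval type (invariant Lo <= Hi), Shift_Right and Set_Low are hypothetical names made up for illustration. Shift_Right breaks the invariant for a moment inside the package, which is fine; Set_Low leaves it broken on return, so the check fails. As in the article's record, the invariant is attached to the full type declaration. The sketch assumes GNAT with checks enabled through pragma Assertion_Policy (Check), which has the same effect as -gnata here, and that a failed invariant raises Ada.Assertions.Assertion_Error, as in the trace above.

```ada
pragma Assertion_Policy (Check);  --  enable invariant checks (like -gnata)

with Ada.Text_IO;    use Ada.Text_IO;
with Ada.Assertions;

procedure Invariant_Demo is

   package Intervals is
      --  Hypothetical type: an interval whose bounds must stay ordered
      type Interval is private;

      function Make (Lo, Hi : Integer) return Interval;
      function Low  (I : Interval) return Integer;
      function High (I : Interval) return Integer;

      --  Correct: Lo > Hi holds for a moment inside, but not on return
      procedure Shift_Right (I : in out Interval; By : Natural);

      --  Buggy on purpose: may leave Lo > Hi on return
      procedure Set_Low (I : in out Interval; Lo : Integer);
   private
      type Interval is record
         Lo, Hi : Integer := 0;
      end record
        with Type_Invariant => Interval.Lo <= Interval.Hi;
   end Intervals;

   package body Intervals is
      function Make (Lo, Hi : Integer) return Interval is
        (Interval'(Lo => Lo, Hi => Hi));

      function Low  (I : Interval) return Integer is (I.Lo);
      function High (I : Interval) return Integer is (I.Hi);

      procedure Shift_Right (I : in out Interval; By : Natural) is
      begin
         I.Lo := I.Lo + By;  --  Lo may now exceed Hi: no check here
         I.Hi := I.Hi + By;  --  order restored before returning
      end Shift_Right;

      procedure Set_Low (I : in out Interval; Lo : Integer) is
      begin
         I.Lo := Lo;         --  bug: Lo is never compared with Hi
      end Set_Low;
   end Intervals;

   use Intervals;

   R      : Interval := Make (1, 3);
   Caught : Boolean  := False;
begin
   Shift_Right (R, 10);  --  invariant holds on return: no exception
   Put_Line ("After shift:" & Integer'Image (Low (R))
             & " .." & Integer'Image (High (R)));

   begin
      Set_Low (R, 100);  --  invariant checked on return: it fails
   exception
      when Ada.Assertions.Assertion_Error =>
         Caught := True;
   end;
   Put_Line ("Invariant violation caught: " & Boolean'Image (Caught));
end Invariant_Demo;
```

The intermediate state inside Shift_Right is never observed by a client, which is exactly the freedom Add_User and Delete rely on.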

Appendix

Implementation of Is_Coherent

   function Is_Coherent (Dir : User_Directory) return Boolean is
      use Ada.Containers;
      use ID_To_User_Maps;
      use SSN_To_User_Maps;
   begin
      if Dir.ID_To_User.Length /= Dir.SSN_To_User.Length then
         return False;
      end if;

      for Pos in Dir.ID_To_User.Iterate loop
         declare
            Usr_Entry : constant User_Record_Pt := Element (Pos);
         begin
            -- Check Contains first: indexing a map with a missing key
            -- raises Constraint_Error instead of returning False
            if not Dir.SSN_To_User.Contains (Usr_Entry.SSN)
              or else Usr_Entry /= Dir.SSN_To_User (Usr_Entry.SSN)
            then
               return False;
            end if;

            if Key (Pos) /= Usr_Entry.ID then
               return False;
            end if;
         end;
      end loop;

      return True;
   end Is_Coherent;

Credits

Proving the correctness of a binary search procedure with SPARK/Ada

9 July 2020 at 13:55

Introduction

SPARK/Ada is a language derived from Ada that allows formal verification of software (i.e., a mathematical proof of its correctness). Several kinds of checks can be done: from the absence of run-time exceptions, to the absence of reads of uninitialized variables, up to the formal proof that a given procedure or function fulfills its contract.

I like formal checking because it can actually prove that your program is correct (or, at least, that some kinds of errors are absent), something that is usually not possible with testing. Of course, formal checking cannot be applied to every program, but when it is possible it is a very powerful tool.

SPARK/Ada is definitely on my list of stuff that I want to learn.

Recently I had to write a procedure for a binary search in an ordered array. I thought that it could be an interesting exercise to write it in SPARK/Ada in order to have it formally verified. This is a brief summary of this experience, written with a tutorial spirit.

The requirements

The problem I want to solve is the following

  • INPUTS
    • Table: an array of Float sorted in strictly increasing order (that is, Table(n+1) > Table(n) for every n).
    • What : a Float such that
      • it lies between the extreme values of the array, that is, Table(Table'First) ≤ What ≤ Table(Table'Last), but
      • it is not necessarily in Table, that is, there need not be an n such that Table(n) = What
  • OUTPUTS
    • The unique index n such that Table(n) ≤ What < Table(n+1) (or n = Table'Last when What = Table(Table'Last), since Table(n+1) does not exist in that case)

The solution

The spec file

First basic draft of specs

Let's first flesh out the spec file (roughly [very roughly] similar to a *.h file for you C people).

pragma SPARK_Mode (On);

package Searching is
   subtype Element_Type is Float;
   subtype Index_Type is Natural;

   type Array_Type is 
      array (Index_Type range <>) of Element_Type;

   function Find (What  : Element_Type;
                  Table : Array_Type)
                  return Index_Type;
end Searching;

Entering SPARK Mode

The first line we encounter is

pragma SPARK_Mode (On);

This claims that this package will be compatible with SPARK restrictions and conventions. The lines

   subtype Element_Type is Float;
   subtype Index_Type is Natural;

define Element_Type and Index_Type as synonyms of Float and Natural, respectively. This is not strictly necessary, but it makes it easier to change the types in the future.

Function declaration

 function Find (What  : Element_Type;
                Table : Array_Type)
                return Index_Type;

should be clear enough.

Introducing contracts

In order for SPARK to be able to prove that our implementation of Find is correct, we need to describe to SPARK what we expect Find to do. This is done by means of a contract. Like a normal contract between people, a function contract usually has two parts: preconditions and postconditions.

The idea is that if you (the caller) do your part (i.e. you meet the preconditions when you call me), then I, the function, promise to do my part, that is, that the result will satisfy the post-conditions.

If a contract is given, I can ask SPARK to prove that the post-conditions follow from the pre-conditions. If SPARK succeeds, I know that the code I wrote is correct (in the sense that it respects the contract) without doing any test: I have a mathematical proof for it.

BTW, contracts are very useful even if you are not using SPARK. First, they are a wonderful "bug trap." By using the right options, you can ask the compiler to insert code that checks the pre-conditions when the function is called and the post-conditions when the function ends. If pre/post-conditions are not met, an exception is raised.
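As a minimal sketch of this bug trap in plain Ada (no SPARK involved), assuming GNAT with checks enabled through pragma Assertion_Policy (Check): the hypothetical functions Int_Sqrt and Buggy_Sqrt below are illustrations, not part of the Searching package. Calling Int_Sqrt with a too-large argument breaks its precondition (the caller's fault), while the off-by-one Buggy_Sqrt breaks its postcondition (the function's fault); in both cases Ada.Assertions.Assertion_Error is raised.

```ada
pragma Assertion_Policy (Check);  --  check Pre and Post at run time

with Ada.Text_IO;    use Ada.Text_IO;
with Ada.Assertions;

procedure Contract_Demo is

   --  Hypothetical example: integer square root with a full contract
   function Int_Sqrt (N : Natural) return Natural
     with Pre  => N <= 10_000,
          Post => Int_Sqrt'Result * Int_Sqrt'Result <= N
                  and N < (Int_Sqrt'Result + 1) * (Int_Sqrt'Result + 1);

   --  Same idea, but the body below contains an off-by-one bug
   function Buggy_Sqrt (N : Natural) return Natural
     with Post => Buggy_Sqrt'Result * Buggy_Sqrt'Result <= N;

   function Int_Sqrt (N : Natural) return Natural is
      R : Natural := 0;
   begin
      while (R + 1) * (R + 1) <= N loop
         R := R + 1;
      end loop;
      return R;
   end Int_Sqrt;

   function Buggy_Sqrt (N : Natural) return Natural is
      R : Natural := 0;
   begin
      while R * R <= N loop   --  bug: should test (R + 1) * (R + 1)
         R := R + 1;
      end loop;
      return R;
   end Buggy_Sqrt;

   Pre_Caught, Post_Caught : Boolean := False;
begin
   Put_Line ("Int_Sqrt (50) =" & Natural'Image (Int_Sqrt (50)));

   begin  --  the caller breaks the contract: precondition fails
      Put_Line (Natural'Image (Int_Sqrt (20_000)));
   exception
      when Ada.Assertions.Assertion_Error => Pre_Caught := True;
   end;

   begin  --  the function breaks the contract: postcondition fails
      Put_Line (Natural'Image (Buggy_Sqrt (50)));
   exception
      when Ada.Assertions.Assertion_Error => Post_Caught := True;
   end;

   Put_Line ("Pre violation caught: " & Boolean'Image (Pre_Caught));
   Put_Line ("Post violation caught: " & Boolean'Image (Post_Caught));
end Contract_Demo;
```

Note how the exception points at the guilty party: a failed precondition blames the caller, a failed postcondition blames the function.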

Moreover, contracts document the behavior of the function in a formal and unambiguous way and, unlike the usual comment-based documentation, they cannot go out of sync with the code.

OK, so contracts are cool. How do we write the contract for our function? Well, let's start with the precondition and check the specs. They say that (i) Table must be sorted and (ii) What must lie between the extreme values of Table; we just translate that into Ada in this way

    function Find (What  : Element_Type;
                  Table : Array_Type)
                  return Index_Type
     with
       Pre =>
          Is_Sorted (Table)
          and Table (Table'Last) >= What
          and What >= Table (Table'First);

where Is_Sorted is a function that checks whether its argument is sorted, defined as follows

function Is_Sorted (Table : Array_Type) return Boolean
   is (for all L in Table'Range =>
         (for all M in Table'Range =>
            (if L > M then Table (L) > Table (M))))
   with Ghost;

The body of the function (this form is called an expression function) is the Ada translation of the definition of a strictly increasing array.

Note the with Ghost in the definition. This says that Is_Sorted is a ghost function, that is, a function that can be used only in contracts, assertions and other ghost code; if it is used anywhere else ("real code"), the compiler reports an error. (I love this kind of protection ♥)
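The quadratic definition above mirrors the mathematical one. For strictly increasing order, checking only adjacent entries is equivalent by transitivity; the sketch below compares the two definitions on a few made-up arrays. It is plain Ada, so the Ghost aspect is dropped to allow calling the functions from ordinary code. Is_Sorted_Adjacent and Agree are hypothetical names, not part of the original package, and the adjacent form may be harder for the SPARK prover to exploit, since deriving the pairwise property from it requires an inductive argument.

```ada
pragma Assertion_Policy (Check);

with Ada.Text_IO; use Ada.Text_IO;

procedure Sorted_Demo is
   subtype Element_Type is Float;
   subtype Index_Type is Natural;
   type Array_Type is array (Index_Type range <>) of Element_Type;

   --  Definition used in the article (Ghost aspect omitted for this demo)
   function Is_Sorted (Table : Array_Type) return Boolean is
     (for all L in Table'Range =>
        (for all M in Table'Range =>
           (if L > M then Table (L) > Table (M))));

   --  Hypothetical linear-time variant: compare adjacent entries only
   function Is_Sorted_Adjacent (Table : Array_Type) return Boolean is
     (for all I in Table'First .. Table'Last - 1 =>
        Table (I) < Table (I + 1));

   --  True when both definitions give the same verdict
   function Agree (T : Array_Type) return Boolean is
     (Is_Sorted (T) = Is_Sorted_Adjacent (T));

   Increasing : constant Array_Type := (1.0, 2.5, 3.0, 7.0);
   Repeated   : constant Array_Type := (1.0, 2.5, 2.5, 7.0);
   Swapped    : constant Array_Type := (1.0, 3.0, 2.5, 7.0);
   Empty      : constant Array_Type (1 .. 0) := (others => 0.0);
begin
   Put_Line ("Increasing: " & Boolean'Image (Is_Sorted_Adjacent (Increasing)));
   Put_Line ("Repeated:   " & Boolean'Image (Is_Sorted_Adjacent (Repeated)));
   Put_Line ("Swapped:    " & Boolean'Image (Is_Sorted_Adjacent (Swapped)));
   Put_Line ("Empty:      " & Boolean'Image (Is_Sorted_Adjacent (Empty)));
   pragma Assert (Agree (Increasing) and Agree (Repeated)
                  and Agree (Swapped) and Agree (Empty));
end Sorted_Demo;
```

Repeated values are rejected by both versions, because the requirement asks for strictly increasing order.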

The preconditions look right and they represent the actual specs, but if we try to run SPARK we get an error

medium: array index check might fail

at line

      and What >= Table (Table'First);

This means that Table'First, that is the first index of the array, can be ... outside array bounds? The first index is by definition within the bounds, right?

Image: a perplexed woman
Go home SPARK, you're drunk...

Well... Actually, SPARK is right. If you ask for a counterexample you get

e.g. when Table'First=1 and Table'Last=0

How in the world can the last index be smaller than the first one?!? Well, if Table is empty... Empty arrays in Ada have their last index smaller than their first one. Therefore, trying to access Table (Table'First) would cause an error.
Well, SPARK, good catch...
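The counterexample is easy to reproduce in plain Ada. This small sketch (program and variable names are illustrative) declares an array with bounds 1 .. 0, that is, the Table'First = 1, Table'Last = 0 case reported by SPARK, and shows that Table (Table'First) fails the index check at run time with Constraint_Error. GNAT will probably also warn at compile time that the check is bound to fail.

```ada
pragma Assertion_Policy (Check);

with Ada.Text_IO; use Ada.Text_IO;

procedure Empty_Array_Demo is
   subtype Element_Type is Float;
   subtype Index_Type is Natural;
   type Array_Type is array (Index_Type range <>) of Element_Type;

   --  The counterexample reported by SPARK: First = 1, Last = 0
   Table  : constant Array_Type (1 .. 0) := (others => 0.0);
   Failed : Boolean := False;
   X      : Element_Type := 0.0;
begin
   Put_Line ("First =" & Index_Type'Image (Table'First)
             & ", Last =" & Index_Type'Image (Table'Last)
             & ", Length =" & Natural'Image (Table'Length));
   begin
      X := Table (Table'First);   --  index 1 is not in the range 1 .. 0
      Put_Line ("Unexpected value:" & Element_Type'Image (X));
   exception
      when Constraint_Error =>
         Failed := True;
         Put_Line ("Index check failed: the table is empty");
   end;
end Empty_Array_Demo;
```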

Image: an old man asking why anyone would want to search an empty table

Well, I agree (although it could happen because of an error), but SPARK does not know it. To make SPARK happy it suffices to add Table'Length > 0 to the precondition to get

    function Find (What  : Element_Type;
                  Table : Array_Type)
                  return Index_Type
     with
       Pre => Table'Length > 0
              and then (Is_Sorted (Table)
                        and Table (Table'Last) >= What
                        and What >= Table (Table'First));

Now SPARK is happy and even the casual user sees that you cannot call the function with an empty table.

The postcondition

For the postcondition, too, we translate the requirement directly from English into Ada. It is required that What lies between Table(n) and Table(n+1), where n is the value returned by the function. In Ada we get

         (if Find'Result = Table'Last then
             Table (Find'Result) = What
          else
             Table (Find'Result) <= What 
         and What < Table (Find'Result + 1));

Note that Find'Result is the value returned by Find, and that we treat Find'Result = Table'Last as a special case, since in this case there is no "following entry" in Table. Overall, the function declaration with the full contract is

   function Find (What  : Element_Type;
                  Table : Array_Type)
                  return Index_Type
     with
       Pre => 
          Table'Length > 0
          and then (Is_Sorted (Table)
                    and Table (Table'Last) >= What
                    and What >= Table (Table'First)),
       Post =>
         (if Find'Result = Table'Last then
             Table (Find'Result) = What
          else
             Table (Find'Result) <= What 
         and What < Table (Find'Result + 1));

The body file (implementation)

The algorithm

The algorithm for a binary search is well known and is illustrated in the following figure.

Figure: example of binary search over an array of length 8

Basically, we keep two "cursors" in the table, Bottom and Top, with the condition that it must always be

Table(Bottom) ≤ What < Table(Top)

We compute the Middle point between Bottom and Top: if Table(Middle) is too large we move Top to Middle, otherwise we move Bottom to Middle. We iterate until one of the following two conditions holds

  • Table(Bottom) = What, that is, we actually found What in Table
  • Top = Bottom+1, that is, there are no intermediate entries between Bottom and Top

In both cases we return Bottom.

The actual code

Let's start with a basic skeleton of the function.

   function Find (What : Element_Type; Table : Array_Type) return Index_Type
   is
      Bottom : Index_Type;
      Top    : Index_Type;
      Middle : Index_Type;
   begin
      if Table (Table'Last) = What then
         return Table'Last;
      end if;

      Bottom := Table'First;
      Top    := Table'Last;

      pragma Assert (Table (Top) > What and What >= Table(Bottom));

      while Table (Bottom) < What and Top - Bottom > 1 loop
         Middle := (Bottom + Top)/2;

         if Table (Middle) > What then
            Top := Middle;
         else
            Bottom := Middle;
         end if;
      end loop;

      return Bottom;
   end Find;

It is just a translation of the algorithm informally described above. Note how the special case Table (Table'Last) = What is handled separately; in this way we know that

Table (Top) > What and What >= Table(Bottom)

holds (see the pragma Assert).

Now, in order to make it easier for SPARK to prove the correctness of the code, we insert some pragma Assert statements claiming properties that hold at different points of the code.

Automatic theorem proving is not easy and some hints to the prover can help. Moreover, I love generously sprinkling my code with assertions, since they help in understanding what the code does and which properties I expect to be true. They are also formidable "bug traps".

The function body with all the assertions looks like

   function Find (What : Element_Type; Table : Array_Type) return Index_Type
   is
      Bottom : Index_Type;
      Top    : Index_Type;
      Middle : Index_Type;
   begin
      if Table (Table'Last) = What then
         return Table'Last;
      end if;

      pragma Assert (Table (Table'Last) > What);

      Bottom := Table'First;
      Top    := Table'Last;

      pragma Assert (Bottom < Top);
      pragma Assert (Table (Bottom) <= What and What < Table (Top));

      while Table (Bottom) < What and Top - Bottom > 1 loop

         pragma Loop_Invariant (Bottom >= Table'First);
         pragma Loop_Invariant (Top <= Table'Last);
         pragma Loop_Invariant (Top > Bottom);
         pragma Loop_Invariant (Table (Bottom) <= What and What < Table (Top));
         pragma Loop_Variant (Decreases => Top - Bottom);

         Middle := (Bottom + Top)/2;
         pragma Assert (Bottom < Middle and Middle < Top);

         if Table (Middle) > What then
            Top := Middle;
         else
            Bottom := Middle;
         end if;
      end loop;

      pragma Assert (Table (Bottom) <= What and Table (Bottom + 1) > What);
      return Bottom;
   end Find;

There are just two points worth describing; the first is this sequence of pragmas inside the loop

pragma Loop_Invariant (Bottom >= Table'First);
pragma Loop_Invariant (Top <= Table'Last);
pragma Loop_Invariant (Top > Bottom);
pragma Loop_Invariant (Table (Bottom) <= What 
                       and What < Table (Top));
pragma Loop_Variant (Decreases => Top - Bottom);

The pragmas Loop_Invariant and Loop_Variant are GNAT-specific pragmas designed for SPARK. A loop invariant is a condition that is true at every iteration, and you can see that this loop invariant is a formalization of what we said before: at every iteration What lies between Table (Bottom) and Table (Top). Loop invariants are important in proving the correctness of while loops.

A while loop can potentially go on forever; a technique to prove that a loop terminates is to find some value that strictly increases (or decreases) at every iteration; if this value is also bounded (e.g., it is always positive and it always decreases), we can deduce that the loop terminates. The pragma Loop_Variant allows us to declare such a value. In this case the distance Top - Bottom is roughly halved at every iteration, so it is a good variant: it decreases at every iteration and, because of the invariant Top > Bottom, it can never go below 1, therefore the loop must terminate.
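The termination argument can be made precise with a short calculation. Write d for the gap Top - Bottom at the start of an iteration (so d ≥ 2 by the loop condition); for non-negative integers, (Bottom + Top)/2 equals Bottom + ⌊d/2⌋, and the new gap d' depends on which cursor moves:

```latex
d = \mathit{Top} - \mathit{Bottom} \ge 2, \qquad
\mathit{Middle} = \mathit{Bottom} + \lfloor d/2 \rfloor

d' =
\begin{cases}
  \lfloor d/2 \rfloor & \text{if } \mathit{Top} := \mathit{Middle} \\
  d - \lfloor d/2 \rfloor = \lceil d/2 \rceil & \text{if } \mathit{Bottom} := \mathit{Middle}
\end{cases}
\qquad \Longrightarrow \qquad
1 \le d' \le \lceil d/2 \rceil < d
```

So the variant strictly decreases while staying at least 1, in agreement with the invariant Top > Bottom; since it roughly halves at each step, the loop runs about log2 of the table length times at most.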

The second observation concerns the seemingly innocuous line

 Middle := (Top + Bottom) / 2;

Here SPARK complains with

medium: overflow check might fail 
(e.g. when Bottom = Index_Type'Last-4 and Top = Index_Type'Last-2)

Good catch! Here SPARK is observing that, although Middle is in theory within the range of representable integers whenever Top and Bottom are, the intermediate sum Top + Bottom can overflow. Since Index_Type'Last = Natural'Last (2^31-1 with GNAT, where Integer is 32 bits), it is unlikely that one would work with tables that big, nevertheless...

We have two solutions: (i) restrict the range of indexes (that is, to at most Index_Type'Last/2) or (ii) compute the midpoint with the function

function Mean (Lo, Hi : Index_Type) return Index_Type
is (Lo + (Hi - Lo) / 2)
   with
     Pre  => Lo < Index_Type'Last and then Hi > Lo + 1,
     Post => Hi > Mean'Result and Mean'Result > Lo;

which is guaranteed not to overflow. Note that the precondition requires Hi > Lo + 1; this is guaranteed by the loop condition, and it is necessary for the postcondition to hold, which in turn guarantees that the loop variant decreases at every iteration.
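The overflow that SPARK reports can also be observed at run time. The sketch below is plain Ada: it uses the counterexample values from the SPARK message, Mean has the same formula as above (its contract is omitted), and Naive_Mean is a hypothetical name for the (Top + Bottom) / 2 version. It assumes strict overflow checking; pragma Unsuppress (Overflow_Check) makes sure the check is not suppressed, while a GNAT overflow mode that computes intermediate results in a wider type (such as -gnato3) would likely hide the problem.

```ada
pragma Assertion_Policy (Check);

with Ada.Text_IO; use Ada.Text_IO;

procedure Mean_Demo is
   pragma Unsuppress (Overflow_Check);  --  make sure overflow is detected

   subtype Index_Type is Natural;

   --  Naive midpoint: Lo + Hi is computed first and may exceed Integer'Last
   function Naive_Mean (Lo, Hi : Index_Type) return Index_Type is
     ((Lo + Hi) / 2);

   --  Article's formula: no intermediate value exceeds Hi
   function Mean (Lo, Hi : Index_Type) return Index_Type is
     (Lo + (Hi - Lo) / 2);

   --  The counterexample reported by SPARK
   Bottom     : constant Index_Type := Index_Type'Last - 4;
   Top        : constant Index_Type := Index_Type'Last - 2;
   Overflowed : Boolean := False;
   M          : Index_Type := 0;
begin
   begin
      M := Naive_Mean (Bottom, Top);
      Put_Line ("Naive midpoint:" & Index_Type'Image (M));
   exception
      when Constraint_Error =>
         Overflowed := True;
         Put_Line ("Naive midpoint overflowed (Constraint_Error)");
   end;
   Put_Line ("Safe midpoint:" & Index_Type'Image (Mean (Bottom, Top)));
end Mean_Demo;
```

The safe midpoint is Index_Type'Last - 3, strictly between the two cursors, as required by the postcondition of Mean.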

Finally...

OK, now if we run SPARK we get

Summary of SPARK analysis
=========================

-------------------------------------------------------------------------------------------------------
SPARK Analysis results        Total      Flow   Interval   CodePeer      Provers   Justified   Unproved
-------------------------------------------------------------------------------------------------------
Data Dependencies                 .         .          .          .            .           .          .
Flow Dependencies                 .         .          .          .            .           .          .
Initialization                    3         3          .          .            .           .          .
Non-Aliasing                      .         .          .          .            .           .          .
Run-time Checks                  29         .          .          .    29 (CVC4)           .          .
Assertions                       14         .          .          .    14 (CVC4)           .          .
Functional Contracts              3         .          .          .     3 (CVC4)           .          .
LSP Verification                  .         .          .          .            .           .          .
-------------------------------------------------------------------------------------------------------
Total                            49    3 (6%)          .          .     46 (94%)           .          .

Do you see on the extreme right the column Unproved with no entries? That is what I want to see... SPARK was able to prove everything, so now you can kick back, relax and enjoy your binary search function with confidence: you made terra bruciata (scorched earth) around it and no bug can survive.

Image: a castle with a flag labelled "Find" in the middle of a desert, and a soldier wearing a "SPARK" t-shirt holding a flamethrower
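SPARK's proof is the real guarantee, but the same contracts can also be exercised at run time, which is a cheap sanity check. The following sketch assembles the pieces above into one self-contained program: Find with Middle computed by Mean, as suggested in the text. The final body of the original package is not shown, so this assembly is an assumption. The sketch compares Find with a straightforward linear search, Linear_Find (a hypothetical helper), on a made-up table whose indexes start at 10; with pragma Assertion_Policy (Check), the Pre and Post aspects are also checked on every call.

```ada
pragma Assertion_Policy (Check);  --  check Pre and Post at run time

with Ada.Text_IO; use Ada.Text_IO;

procedure Find_Test is
   subtype Element_Type is Float;
   subtype Index_Type is Natural;
   type Array_Type is array (Index_Type range <>) of Element_Type;

   function Is_Sorted (Table : Array_Type) return Boolean is
     (for all L in Table'Range =>
        (for all M in Table'Range =>
           (if L > M then Table (L) > Table (M))));

   function Mean (Lo, Hi : Index_Type) return Index_Type is
     (Lo + (Hi - Lo) / 2)
     with Pre  => Lo < Index_Type'Last and then Hi > Lo + 1,
          Post => Hi > Mean'Result and Mean'Result > Lo;

   function Find (What : Element_Type; Table : Array_Type) return Index_Type
     with Pre  => Table'Length > 0
                  and then (Is_Sorted (Table)
                            and Table (Table'Last) >= What
                            and What >= Table (Table'First)),
          Post => (if Find'Result = Table'Last then
                     Table (Find'Result) = What
                   else
                     Table (Find'Result) <= What
                     and What < Table (Find'Result + 1));

   function Find (What : Element_Type; Table : Array_Type) return Index_Type is
      Bottom : Index_Type := Table'First;
      Top    : Index_Type := Table'Last;
   begin
      if Table (Table'Last) = What then
         return Table'Last;
      end if;

      while Table (Bottom) < What and Top - Bottom > 1 loop
         declare
            Middle : constant Index_Type := Mean (Bottom, Top);
         begin
            if Table (Middle) > What then
               Top := Middle;
            else
               Bottom := Middle;
            end if;
         end;
      end loop;

      return Bottom;
   end Find;

   --  Reference answer: last index whose entry does not exceed What
   function Linear_Find (What : Element_Type; Table : Array_Type)
                         return Index_Type is
      Result : Index_Type := Table'First;
   begin
      for I in Table'Range loop
         if Table (I) <= What then
            Result := I;
         end if;
      end loop;
      return Result;
   end Linear_Find;

   --  Made-up test data; indexes start at 10 on purpose
   Table : constant Array_Type (10 .. 17) :=
     (1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0);
   Mismatches : Natural := 0;
begin
   for K in 2 .. 68 loop   --  What = 1.0, 1.5, ..., 34.0
      declare
         What : constant Element_Type := Element_Type (K) / 2.0;
      begin
         if Find (What, Table) /= Linear_Find (What, Table) then
            Mismatches := Mismatches + 1;
         end if;
      end;
   end loop;
   Put_Line ("Mismatches:" & Natural'Image (Mismatches));
end Find_Test;
```

A test like this only samples a few inputs; it complements the proof rather than replacing it.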

Watchdoging in Ada

28 February 2020 at 16:46

This project was inspired by an article about how to write a thread watchdog in C. After reading it I thought "this would be a nice Ada project!"

So, here it is. This post is about my experience in writing it. My main motivation was to do an "exercise" in programming, but maybe it can be useful somewhere.

Task watchdog and how I did it

The problem is to monitor different tasks in a multi-task program and raise an alarm if a task stops working. A task proves that it is still alive by calling a specific function I_Am_Alive. If it fails to call it regularly, it is considered dead and an alarm is raised.

Three ingredients are involved:

  1. The watcher itself, that is, the task that regularly checks whether the other tasks are still alive.
  2. A connection to the watchdog, used by the controlled task to tell the watchdog "Hey, I am still alive!"
  3. An alarm handler that does something when the watchdog raises an alarm.
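Before looking at the actual package, here is a minimal sketch of the general idea under my own simplifying assumptions: a single monitored task, a protected object Heartbeat storing the time of the last I_Am_Alive call, and a Watcher task that wakes up every Sampling seconds and raises an alarm (here just a message on standard error) if the silence exceeds Timeout. Heartbeat, Worker, Timeout and Sampling are hypothetical names; this is not the interface of the Watchdogs packages described below.

```ada
pragma Assertion_Policy (Check);

with Ada.Text_IO;  use Ada.Text_IO;
with Ada.Calendar; use Ada.Calendar;

procedure Watchdog_Sketch is

   --  Hypothetical heartbeat shared by one worker and the watcher
   protected type Heartbeat is
      procedure I_Am_Alive;             --  called by the worker
      function Last_Seen return Time;   --  read by the watcher
   private
      Stamp : Time := Clock;
   end Heartbeat;

   protected body Heartbeat is
      procedure I_Am_Alive is
      begin
         Stamp := Clock;
      end I_Am_Alive;

      function Last_Seen return Time is
      begin
         return Stamp;
      end Last_Seen;
   end Heartbeat;

   Beat         : Heartbeat;
   Timeout      : constant Duration := 0.5;  --  max silence allowed (assumed)
   Sampling     : constant Duration := 0.1;  --  watcher period (assumed)
   Alarm_Raised : Boolean := False;          --  read only after the tasks end
begin
   declare
      task Worker;
      task Watcher;

      task body Worker is
      begin
         for Beat_Count in 1 .. 5 loop
            Beat.I_Am_Alive;   --  "Hey, I am still alive!"
            delay 0.05;
         end loop;
         delay 2.0;            --  simulate a stuck task: no more heartbeats
      end Worker;

      task body Watcher is
         Next : Time := Clock;
      begin
         loop
            Next := Next + Sampling;
            delay until Next;
            if Clock - Beat.Last_Seen > Timeout then
               Put_Line (Standard_Error, "ALARM: Worker is unresponsive");
               Alarm_Raised := True;
               exit;
            end if;
         end loop;
      end Watcher;
   begin
      null;  --  the block waits here until both tasks have terminated
   end;
   Put_Line ("Alarm raised: " & Boolean'Image (Alarm_Raised));
end Watchdog_Sketch;
```

Using a protected object for the timestamp keeps the worker's writes and the watcher's reads free of data races; delay until keeps the sampling period from drifting.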

Let's examine the three ingredients

The watcher

Let's check the package interface. The key ingredient is the watcher

package Watchdogs.Connections is

   --
   -- Type representing the watcher, that is, the object that wakes up
   -- and checks if its tasks are still alive.
   --
   type Watcher_Type is private;

   --
   -- Create a new watcher type specifying an alarm handler and a sampling
   -- time, that is, the time interval between successive wake ups.
   --
   function Create (Alarm_Handler : Alarm_Handlers.Alarm_Handler_Access;
                    Sampling      : Duration := 1.0)
                    return Watcher_Type;

   -- Other stuff ...

end Watchdogs.Connections;

Watcher_Type represents the object that does the "dirty work." The task to be controlled will make a connection to it and use it to communicate with the watcher.

A watcher must be created with the function Create that expects as first parameter an access (a pointer in C jargon) to an alarm handler.

The alarm handler

The definition of alarm handler is the following

package Watchdogs.Alarm_Handlers is
   --
   -- Interface for an alarm handler.  Every handler you want to implement
   -- must descend from this and implement Task_Exited and Task_Unresponsive.
   --
   type Alarm_Handler_Interface is limited interface;


   type Alarm_Handler_Access is
     access all Alarm_Handler_Interface'Class;

For those not experienced with Ada: interface means that Alarm_Handler_Interface is an abstract type with no data; you cannot create variables of this type, and it works much like an interface in Java. You need to derive new concrete types from it. limited means that the interface can also be implemented by limited types, that is, types whose values cannot be assigned (more on this below). Finally, Alarm_Handler_Interface'Class is a catch-all type that includes Alarm_Handler_Interface and every type derived from it. In other words, Alarm_Handler_Access can point to values of any type derived from Alarm_Handler_Interface.

The interface Alarm_Handler_Interface requires that any non-abstract descendant implement two procedures: Task_Unresponsive (called when a task does not respond anymore, maybe because it is stuck somewhere) and Task_Exited (called when a task exits)

   --
   -- Called when a task is unresponsive.  It receives the task identification
   -- (name and/or ID) and the latest registered checkpoint
   --
   procedure Task_Unresponsive (Handler    : in out Alarm_Handler_Interface;
                                ID         : Task_Identification.Task_Id;
                                Name       : String;
                                Checkpoint : Checkpoint_Type)
   is abstract
     with Pre'Class =>
       (Name /= "" or ID /= Task_Identification.Null_Task_Id);

   --
   -- Called when a task exits.  It receives the task identification
   -- (name and/or ID)
   --
   procedure Task_Exited (Handler    : in out Alarm_Handler_Interface;
                          ID         : Task_Identification.Task_Id;
                          Name       : String)
   is abstract
     with Pre'Class =>
       (Name /= "" or ID /= Task_Identification.Null_Task_Id);


end Watchdogs.Alarm_Handlers;

Both procedures expect as parameters an identification of the task, namely its Task_ID and/or a name. Either the ID or the name (but not both) can be empty, as stated by the Pre'Class conditions. The procedure Task_Unresponsive also expects a Checkpoint value that can be used to know where the task got stuck (more about this later).

This approach allows users to implement their own alarm handler, which can do anything. For convenience, package Watchdogs.Alarm_Handlers.To_Stderr defines a ready-made (prêt-à-porter) alarm handler that just prints a message to the standard error.
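To make the interface mechanics concrete, the following self-contained sketch re-declares a simplified copy of the interface and implements a hypothetical Stderr_Handler that prints each alarm and counts it. It follows the spirit of To_Stderr but is not its actual code. Checkpoint_Type is assumed here to be a numeric type, since its definition is not shown. The calls go through an Alarm_Handler_Access value, so they dispatch to Stderr_Handler, and the class-wide preconditions are checked when assertions are enabled.

```ada
pragma Assertion_Policy (Check);

with Ada.Text_IO;             use Ada.Text_IO;
with Ada.Task_Identification; use Ada.Task_Identification;

procedure Handler_Demo is

   type Checkpoint_Type is new Natural;  --  hypothetical stand-in

   package Handlers is
      type Alarm_Handler_Interface is limited interface;
      type Alarm_Handler_Access is access all Alarm_Handler_Interface'Class;

      procedure Task_Unresponsive (Handler    : in out Alarm_Handler_Interface;
                                   ID         : Task_Id;
                                   Name       : String;
                                   Checkpoint : Checkpoint_Type) is abstract
        with Pre'Class => Name /= "" or ID /= Null_Task_Id;

      procedure Task_Exited (Handler : in out Alarm_Handler_Interface;
                             ID      : Task_Id;
                             Name    : String) is abstract
        with Pre'Class => Name /= "" or ID /= Null_Task_Id;

      --  Hypothetical concrete handler: prints to stderr, counts alarms
      type Stderr_Handler is limited new Alarm_Handler_Interface with record
         Alarms : Natural := 0;
      end record;

      overriding procedure Task_Unresponsive
        (Handler    : in out Stderr_Handler;
         ID         : Task_Id;
         Name       : String;
         Checkpoint : Checkpoint_Type);

      overriding procedure Task_Exited
        (Handler : in out Stderr_Handler;
         ID      : Task_Id;
         Name    : String);
   end Handlers;

   package body Handlers is
      overriding procedure Task_Unresponsive
        (Handler    : in out Stderr_Handler;
         ID         : Task_Id;
         Name       : String;
         Checkpoint : Checkpoint_Type) is
      begin
         Handler.Alarms := Handler.Alarms + 1;
         Put_Line (Standard_Error, "Task '" & Name & "' " & Image (ID)
                   & " unresponsive, last checkpoint"
                   & Checkpoint_Type'Image (Checkpoint));
      end Task_Unresponsive;

      overriding procedure Task_Exited
        (Handler : in out Stderr_Handler;
         ID      : Task_Id;
         Name    : String) is
      begin
         Handler.Alarms := Handler.Alarms + 1;
         Put_Line (Standard_Error, "Task '" & Name & "' " & Image (ID)
                   & " exited");
      end Task_Exited;
   end Handlers;

   use Handlers;

   Concrete : aliased Stderr_Handler;
   Handler  : constant Alarm_Handler_Access := Concrete'Access;
begin
   --  Dispatching calls through the class-wide access value
   Task_Unresponsive (Handler.all, Current_Task, "worker", 3);
   Task_Exited (Handler.all, Null_Task_Id, "logger");
   Put_Line ("Alarms handled:" & Natural'Image (Concrete.Alarms));
end Handler_Demo;
```

The watcher only needs the class-wide access value; which concrete handler runs is decided at run time by dispatching.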

This is an example (from main.adb) of how to create a watcher

   --
   -- Get a watcher
   --
   Watcher : constant Connections.Watcher_Type :=
               Connections.Create (Alarm_Handler => new To_Stderr.Handler_Type);

The connection

The type for a connection to the watcher is Watchdog_Connection. Its definition is

   --
   -- A watchdog connection allows a task to communicate with the
   -- watcher
   --
   type Watchdog_Connection (<>) is limited private;

If you have no experience with Ada, you may find the syntax above a bit obscure. The (<>) means that Watchdog_Connection has unknown discriminants. Without going into technical details, this prevents the user from declaring a variable of this type without initializing it. The limited part means that you cannot copy a value of type Watchdog_Connection: a value is born, lives and dies in the same variable. This is useful for values that carry an "external connection" that makes no sense to copy.
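Here is a small sketch of how the (<>) plus limited combination behaves. The package Connections, the type Connection and the functions Open and Name_Of are hypothetical and only mimic the shape of Watchdog_Connection; the commented-out declarations are the ones the compiler would reject.

```ada
with Ada.Text_IO;

procedure Limited_Demo is
   pragma Assertion_Policy (Check);

   --  Hypothetical stand-in for Watchdogs.Connections (not the real API)
   package Connections is
      type Connection (<>) is limited private;
      function Open (Task_Name : String) return Connection;
      function Name_Of (C : Connection) return String;
   private
      --  The full view has a discriminant the user cannot see
      type Connection (Name_Length : Natural) is limited record
         Name : String (1 .. Name_Length);
      end record;
   end Connections;

   package body Connections is
      function Open (Task_Name : String) return Connection is
      begin
         --  A limited object is built in place from the aggregate
         return (Name_Length => Task_Name'Length, Name => Task_Name);
      end Open;

      function Name_Of (C : Connection) return String is
      begin
         return C.Name;
      end Name_Of;
   end Connections;

   C : constant Connections.Connection := Connections.Open ("foo");

   --  D : Connections.Connection;      -- illegal: must be initialized
   --  E : Connections.Connection := C; -- illegal: limited, cannot copy
begin
   pragma Assert (Connections.Name_Of (C) = "foo");
   Ada.Text_IO.Put_Line ("Connected as " & Connections.Name_Of (C));
end Limited_Demo;
```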

A connection is created with the function Open


   --
   -- Open a connection with the watcher.  The task needs to introduce itself
   -- with a name or a Task_ID, possibly both.  Those values will be passed
   -- to the Alarm_Handler if the task becomes unresponsive.
   --
   function Open (Watchdog  : Watcher_Type;
                  Task_Name : String := "";
                  ID        : Task_Identification.Task_Id := Task_Identification.Null_Task_Id)
                  return Watchdog_Connection
     with
       Pre => (Task_Name /= "" or ID /= Task_Identification.Null_Task_Id);

The function expects the watcher to connect to and a way to identify the task: a name, the task ID, or both. At least one of the two must be given, as specified by the precondition.

Again, if you have no experience with Ada, you may wonder what the Pre => ... part is. It is a precondition, a condition that must be satisfied when the function is called. It can be seen as part of the documentation, but it has the advantage that the compiler (if instructed to do so) can add code that checks the precondition at run time and raises an exception if it is not satisfied. A powerful bug trap...
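A minimal sketch of a precondition being checked at run time. The function Describe is hypothetical and only mirrors the "name or ID" contract of Open; with assertion checks enabled (here via pragma Assertion_Policy, or with GNAT's -gnata switch) a violating call raises Ada.Assertions.Assertion_Error.

```ada
with Ada.Assertions;
with Ada.Text_IO;

procedure Precondition_Demo is
   pragma Assertion_Policy (Check);  -- same effect as compiling with -gnata

   --  Hypothetical function with the same kind of contract as Open:
   --  at least one of Name and Id must be given
   function Describe (Name : String := ""; Id : Natural := 0) return String
     with Pre => Name /= "" or Id /= 0;

   function Describe (Name : String := ""; Id : Natural := 0) return String is
   begin
      if Name /= "" then
         return Name;
      else
         return "task" & Natural'Image (Id);
      end if;
   end Describe;

   Trapped : Boolean := False;
begin
   Ada.Text_IO.Put_Line (Describe (Name => "foo"));
   Ada.Text_IO.Put_Line (Describe (Id => 42));

   begin
      Ada.Text_IO.Put_Line (Describe);  -- neither Name nor Id: violates Pre
   exception
      when Ada.Assertions.Assertion_Error =>
         Trapped := True;
         Ada.Text_IO.Put_Line ("Precondition violated, bug trapped");
   end;

   pragma Assert (Trapped);
end Precondition_Demo;
```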

When the task ends, the connection is automatically closed (and Task_Exited called) by the finalization of the connection, the Ada equivalent of a destructor.
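The library's sources are not shown here, but the standard Ada mechanism for running code when an object goes away is Ada.Finalization.Limited_Controlled: its Finalize procedure is called automatically when the object's scope is left (for an object declared in a task body, when the task completes). The sketch below assumes that kind of design; Conns, Connection and Finalized_Count are hypothetical names, and the printed message stands in for the call to Task_Exited.

```ada
with Ada.Finalization;
with Ada.Text_IO;

procedure Finalization_Demo is
   pragma Assertion_Policy (Check);

   Finalized_Count : Natural := 0;  -- hypothetical bookkeeping for the demo

   package Conns is
      --  Hypothetical connection type, not the real Watchdog_Connection
      type Connection is new Ada.Finalization.Limited_Controlled with record
         Id : Positive := 1;
      end record;

      overriding procedure Finalize (C : in out Connection);
   end Conns;

   package body Conns is
      overriding procedure Finalize (C : in out Connection) is
      begin
         --  Here a real library would notify the watcher (Task_Exited)
         Finalized_Count := Finalized_Count + 1;
         Ada.Text_IO.Put_Line ("Closing connection" & Positive'Image (C.Id));
      end Finalize;
   end Conns;
begin
   declare
      C : Conns.Connection;  -- "opened" here
   begin
      Ada.Text_IO.Put_Line ("Working with connection" & Positive'Image (C.Id));
   end;                      -- Finalize runs automatically here

   pragma Assert (Finalized_Count = 1);
end Finalization_Demo;
```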

After opening a connection, the task must regularly declare that it is alive; it does so by calling I_Am_Alive:

   type Checkpoint_Type is mod 2 ** 16;
   --
   -- Let the watcher know that we are still alive.  If this function is
   -- called in different points of the task it is possible to distinguish
   -- different calls via the Checkpoint parameter.  The reason for having
   -- it is just to know what is the latest instance of I_Am_Alive
   -- called before the task crash.  Its value will be given to the
   -- alarm handler.
   --
   procedure I_Am_Alive (Connection : in out Watchdog_Connection;
                         Checkpoint : Checkpoint_Type := 0);

The procedure I_Am_Alive accepts an optional Checkpoint parameter (actually a 16-bit unsigned modular integer) that the task can use to distinguish between different calls of I_Am_Alive. If the task gets stuck, the latest Checkpoint is given to the alarm handler together with the task identification (ID or name), making it possible to identify where the task got stuck.
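The sketch below shows the intended use of Checkpoint: each call site passes its own value, so the last one recorded tells where the task was. Here I_Am_Alive is a hypothetical local stand-in that only stores the value (the real one talks to the watcher), and the constants Before_Read and After_Read are illustrative. It also shows that, being a modular type, Checkpoint_Type wraps around instead of overflowing.

```ada
with Ada.Text_IO;

procedure Checkpoint_Demo is
   pragma Assertion_Policy (Check);

   type Checkpoint_Type is mod 2 ** 16;  -- same declaration as in the API

   --  Hypothetical labels, one per I_Am_Alive call site
   Before_Read : constant Checkpoint_Type := 1;
   After_Read  : constant Checkpoint_Type := 2;

   Last_Checkpoint : Checkpoint_Type := 0;

   --  Hypothetical stand-in: only records the latest checkpoint
   procedure I_Am_Alive (Checkpoint : Checkpoint_Type := 0) is
   begin
      Last_Checkpoint := Checkpoint;
   end I_Am_Alive;

   Counter : Checkpoint_Type := Checkpoint_Type'Last;  -- 65535
begin
   I_Am_Alive (Before_Read);
   --  If the task got stuck here, the handler would receive Before_Read
   pragma Assert (Last_Checkpoint = Before_Read);

   I_Am_Alive (After_Read);
   pragma Assert (Last_Checkpoint = After_Read);

   Counter := Counter + 1;  -- modular arithmetic: wraps around to 0
   pragma Assert (Counter = 0);

   Ada.Text_IO.Put_Line
     ("Latest checkpoint:" & Checkpoint_Type'Image (Last_Checkpoint));
end Checkpoint_Demo;
```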

An example

This is a very simple example of how a connection is used, a simplified version of what you find in main.adb:

   task body Foo is

      Connection : Connections.Watchdog_Connection :=
                     Connections.Open (Watchdog  => Watcher,
                                       Task_Name => "my name is foo, task foo",
                                       ID        => Task_Identification.Current_Task);

      Sleep_Time : Duration := 0.1;
   begin
      --
      -- Now we are connected with the watcher that will check that we
      -- call I_Am_Alive regularly
      --
      loop
         --
         -- At every iteration we increase the Sleep_Time so that sooner
         -- or later it will exceed the wake up time of the watcher
         --
         delay Sleep_Time;
         Sleep_Time := Sleep_Time + 0.2;

         --
         -- Tell the watcher we are alive
         --
         Connections.I_Am_Alive (Connection);
      end loop;
   end Foo;

Digging in the internals

The user API is nice and cool, but you want some gory details about the implementation, right? OK, so let's check out the private definition of the watcher in package Watchdogs.Connections:

private
    type Watcher_Type is access Watchers.Watchdog_Core;

Uh?!? That's it? Just an access to a "core type"? That's cheating...

Well, let's check the definition of Watchdog_Core in Watchdogs.Watchers:

private package Watchdogs.Watchers is
   --
   -- Object doing all the work.  This exports an interface similar
   -- to the user visible Watcher_Type.  This object is multitask safe
   -- (with Ada it is just too easy...)
   --
   type Watchdog_Core is limited private;
private
   --
   -- Other stuff... 
   --

   type Watchdog_Core is limited
      record
         Doa_Table : Task_Table_Access;
         Watcher   : Watchdog_Task_Access;
         Handler   : Alarm_Handlers.Alarm_Handler_Access;
      end record;
end Watchdogs.Watchers;

Several comments are in order.

First, do you see the keyword private before package? This means that Watchdogs.Watchers is a private package: it cannot be made visible outside the hierarchy of Watchdogs. In particular, the library user (i.e., the programmer who uses the library) cannot directly access the resources provided by Watchdogs.Watchers, but only through Watchdogs.Connections, which imports Watchdogs.Watchers with

   private with Watchdogs.Watchers;

The keyword private before with says "Listen, I need the resources in Watchdogs.Watchers, but I promise, cross my heart, that I will never ever let the user see them". Indeed, if you check watchdogs-connections.ads you'll see that Watchdogs.Watchers is referred to only in the private part of the package, out of reach of the prying hands of the user...

Second, the definition of Watchdog_Core looks simple, just three fields. The last one is the access to the alarm handler (this is easy); what about the other two? Here is where the hard stuff lies... ;-)

Let's begin with the easy stuff: the field Watcher is a Watchdog_Task_Access, which, as we can guess, is an access to Watchdog_Task. But what is the latter? Well, a task:

   task type Watchdog_Task is
      --
      -- This task is the real watchdog: it wakes up every now and then,
      -- check the task table for dead tasks and, if necessary, call
      -- the alarm handler
      --
      entry Init (Sampling : Duration;
                  Table    : Task_Table_Access;
                  Handler  : Alarm_Handlers.Alarm_Handler_Access);
   end Watchdog_Task;

   type Watchdog_Task_Access is access Watchdog_Task;

Since it is declared as a task type, Watchdog_Task behaves like a type and we can, for example, declare variables of this type (or, as here, allocate objects through an access type). Each such object is a new task that runs in parallel. In Ada, tasks traditionally synchronize by message passing: one task calls an entry of another task and the two meet in a rendezvous. In this case the entry is just used to give the task a few parameters. The task then wakes up every Sampling seconds, checks for unresponsive tasks and, if necessary, calls the handler.
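A self-contained sketch of the same pattern: a task type whose only entry hands over the parameters, followed by a periodic loop. Sampler, Sampler_Access, Rounds and Rounds_Done are hypothetical; unlike the real watchdog, this task stops after a fixed number of rounds so that the program terminates.

```ada
with Ada.Text_IO;

procedure Task_Type_Demo is
   pragma Assertion_Policy (Check);

   Rounds_Done : Natural := 0;  -- written only by the task

   --  Hypothetical, simplified relative of Watchdog_Task
   task type Sampler is
      entry Init (Sampling : Duration; Rounds : Positive);
   end Sampler;

   task body Sampler is
      Period : Duration := 0.0;
      Count  : Positive := 1;
   begin
      --  Rendezvous: wait here until someone calls Init
      accept Init (Sampling : Duration; Rounds : Positive) do
         Period := Sampling;
         Count  := Rounds;
      end Init;

      for Round in 1 .. Count loop
         delay Period;  -- get some sleep, then do the periodic work
         Rounds_Done := Rounds_Done + 1;
         Ada.Text_IO.Put_Line ("Sampling round" & Integer'Image (Round));
      end loop;
   end Sampler;
begin
   declare
      type Sampler_Access is access Sampler;
      --  Allocating the object starts the task
      Watcher : constant Sampler_Access := new Sampler;
   begin
      Watcher.Init (Sampling => 0.05, Rounds => 3);
   end;  -- this block is the task's master: we wait here until it ends

   pragma Assert (Rounds_Done = 3);
end Task_Type_Demo;
```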

OK, cool, and what about Table in the parameter list of Init and the Doa_Table field of Watchdog_Core? Well, here is where most of the complexity is hidden. A Task_Table_Access is an access to a Task_Table, which in turn has the following definition:

   --
   -- The protected object Task_Table is the core data structure.
   -- It keeps which tasks are still alive and which ones did not
   -- confirm that they are alive.
   --
   protected type Task_Table is
      -- Register that the task associated with the connection
      -- just went by the checkpoint
      procedure Mark_Alive (Connection : Connection_ID;
                            Checkpoint : Checkpoint_Type);

      -- Get the set of tasks that did not declare themselves alive
      procedure Get_Dead_Tasks (Set : out Connection_To_Checkpoint_Tables.Map);

      -- Reset the state, setting all tasks as "to be confirmed alive"
      procedure Reset;

      -- Delete a task
      procedure Delete (Connection : Connection_ID);

      -- Allocate a new connection ID to a task
      procedure Get_New_Id (Connection : out Connection_ID;
                            Name       : String;
                            ID         : Task_Identification.Task_Id);

      function ID_Of (Connection : Connection_ID)
                      return Task_Identification.Task_Id;

      function Name_Of (Connection : Connection_ID)
                        return String;
   private
      --
      -- It works in this way: we keep two sets of "tasks:" Alive (the
      -- tasks that declared to be alive) and Dead (the task that still
      -- have to declare to be alive).  At timeout we read the Dead list
      -- and raise a warning for the tasks in the list; then we copy
      -- (with a Reset) Alive to Dead, restarting the iteration
      --
      Alive   : Connection_To_Checkpoint_Tables.Map;
      Dead    : Connection_To_Checkpoint_Tables.Map;
      Next_Id : Connection_ID := Connection_ID'First;

      -- Keep name and ID of the tasks associated with a connection
      Connection_Table : Connection_To_Task_Tables.Map;
   end Task_Table;

A Task_Table is the object that stores the state of the monitored tasks: whether they are dead or alive, and their identifications. It is a protected type, which means that it is accessed according to a reader/writer model: protected functions (readers) can run at the same time, while protected procedures (writers) have exclusive access. The compiler takes care of inserting the required synchronization code.
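The Alive/Dead scheme described in the comments of Task_Table can be sketched with a much smaller protected type. This is an illustration, not the library code: connections are assumed to be numbered 1 .. 4, two Boolean arrays replace the Connection_To_Checkpoint_Tables maps, and Register and Is_Dead are hypothetical operations.

```ada
with Ada.Text_IO;

procedure Task_Table_Demo is
   pragma Assertion_Policy (Check);

   --  Hypothetical simplification: at most 4 connections, Boolean sets
   type Connection_ID is range 1 .. 4;
   type Connection_Set is array (Connection_ID) of Boolean;

   protected type Task_Table is
      procedure Register (Connection : Connection_ID);              -- writer
      procedure Mark_Alive (Connection : Connection_ID);            -- writer
      procedure Reset;                                              -- writer
      function Is_Dead (Connection : Connection_ID) return Boolean; -- reader
   private
      Alive : Connection_Set := (others => False);
      Dead  : Connection_Set := (others => False);
   end Task_Table;

   protected body Task_Table is
      procedure Register (Connection : Connection_ID) is
      begin
         Alive (Connection) := True;  -- a new task gets one period of grace
      end Register;

      procedure Mark_Alive (Connection : Connection_ID) is
      begin
         Alive (Connection) := True;
         Dead (Connection)  := False;
      end Mark_Alive;

      procedure Reset is
      begin
         Dead  := Alive;              -- everybody must prove to be alive
         Alive := (others => False);
      end Reset;

      function Is_Dead (Connection : Connection_ID) return Boolean is
      begin
         return Dead (Connection);
      end Is_Dead;
   end Task_Table;

   Table : Task_Table;
begin
   Table.Register (1);
   Table.Register (2);
   Table.Reset;           -- start of a new sampling period
   Table.Mark_Alive (1);  -- only connection 1 checks in

   pragma Assert (not Table.Is_Dead (1));
   pragma Assert (Table.Is_Dead (2));
   Ada.Text_IO.Put_Line ("Connection 2 is unresponsive");
end Task_Table_Demo;
```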

This object is manipulated mainly by the watcher task, whose body is

   task body Watchdog_Task is
      Task_Table      : Task_Table_Access;
      Alarm_Handler   : Alarm_Handlers.Alarm_Handler_Access;
      Sampling_Period : Duration;

      Dead_Tasks      : Connection_To_Checkpoint_Tables.Map;

      use Connection_To_Checkpoint_Tables;
   begin
      -- Accept calls to the Init entry
      accept Init (Sampling : Duration;
                   Table    : Task_Table_Access;
                   Handler  : Alarm_Handlers.Alarm_Handler_Access)
      do
         Sampling_Period := Sampling;
         Task_Table := Table;
         Alarm_Handler := Handler;
      end Init;

      loop           
         delay Sampling_Period;  -- get some sleep

         --
         -- Extract from the task table the tasks that did not
         -- claim to be alive
         --
         Task_Table.Get_Dead_Tasks (Dead_Tasks);

         --
         -- Iterate over the list of dead tasks
         --
         for Pos in Dead_Tasks.Iterate loop
            declare
               Connection : constant Connection_ID := Key (Pos);
               Checkpoint : constant Checkpoint_Type := Element (Pos);
            begin
               --
               -- Call the alarm handler with the task data
               --
               Alarm_Handler.Task_Unresponsive
                 (ID         => Task_Table.Id_Of (Connection),
                  Name       => Task_Table.Name_Of (Connection),
                  Checkpoint => Checkpoint);

               --
               --  Remove the task from the table
               --
               Task_Table.Delete (Connection);
            end;
         end loop;

         Dead_Tasks.Clear;

         --
         -- All the tasks that were declared alive get marked as
         -- dead.  Let them prove that they are alive! :-)
         --
         Task_Table.Reset;
      end loop;

   end Watchdog_Task;

Conclusion

As I said, I wrote this for the fun of it and, indeed, fun it was (Yoda-style). I hope you found this interesting.

Safer set-uid programs in Ada with the suid-helper library

22 February 2020 at 19:03

Set-uid programs: the good, the bad and the risky

In *nix systems, access control is traditionally done by checking whether the user requesting a given operation has read/write/execute permission on a given file or directory.

In some cases this policy is not fine-grained enough. An example is the common passwd command, which allows users to change their password. Changing the password requires modifying the file /etc/passwd, which stores the passwords (to be honest, this was true on older systems; nowadays things are a bit different, but we stick to this for the sake of the example). This creates a problem with the permissions of /etc/passwd: the file clearly must not be writable by ordinary users, but... how could a user change their password, then?

The solution is to allow the user to modify /etc/passwd in a controlled way, via an executable that changes only that user's password. In other words, the user must temporarily gain root privileges (administrator privileges for you Windows people 😄). This is achieved with set-user-ID (setuid) executables: if an executable is marked as setuid, the process running it gets the privileges of the owner of the executable (usually root). In this way the program can carry out privileged actions on behalf of the user.

The three identities

We need a bit more detail. In current *nix systems a running executable has three identities:

  • The real user ID. This is the user that is running the code.
  • The effective user ID. This is the identity that is used to check the accesses. In a normal program this is equal to the real user ID, but in a setuid program it is initialized to the identity of the owner of the executable.
  • The saved user ID. This is initialized with the effective user ID; therefore, in a normal program it is equal to the real user ID, while in a setuid program it is equal to the ID of the owner.

How do these IDs interact? It is really simple:

The program can change its effective ID to its real or saved ID.

This means that a normal program starts with the effective ID equal to the identity of the user and cannot change it, while a setuid program can set it to the real identity (we will say that it drops the privileges) or to the identity of the owner (we will say that it restores the privileges). The program can also drop the privileges permanently by setting the saved ID to the real ID.
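To see the three identities of a running program, one can call the C library directly from Ada through Interfaces.C. The sketch assumes Linux (or a BSD): getresuid is not part of POSIX, and uid_t is assumed to be an unsigned int. Run normally, it should print the same number three times; if the executable is owned by root, marked setuid (chown root, then chmod u+s) and run by an ordinary user, the effective and saved IDs should become 0.

```ada
with Ada.Text_IO;
with Interfaces.C;

procedure Show_Ids is
   use type Interfaces.C.int;

   --  Assumption: uid_t is an unsigned int (true on common Linux targets)
   subtype Uid_T is Interfaces.C.unsigned;

   --  C: int getresuid(uid_t *ruid, uid_t *euid, uid_t *suid);
   --  Ada "out" scalars with Convention C are passed as pointers
   function Getresuid (Real, Effective, Saved : out Uid_T)
                       return Interfaces.C.int
     with Import, Convention => C, External_Name => "getresuid";

   Real_Id, Effective_Id, Saved_Id : Uid_T;
begin
   if Getresuid (Real_Id, Effective_Id, Saved_Id) /= 0 then
      raise Program_Error with "getresuid failed";
   end if;

   Ada.Text_IO.Put_Line ("Real UID:     " & Uid_T'Image (Real_Id));
   Ada.Text_IO.Put_Line ("Effective UID:" & Uid_T'Image (Effective_Id));
   Ada.Text_IO.Put_Line ("Saved UID:    " & Uid_T'Image (Saved_Id));
end Show_Ids;
```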

Good practices in setuid codes

Of course, a setuid program is a potential security risk because of the privilege escalation provided by the setuid mechanism. Sure, if the program were correctly written, without weaknesses or bugs, it would be safe to have setuid code, but... you know... we are all human beings, and a defensive approach that limits the potential damage is not a bad idea. We will consider two simple good practices:

  1. Drop the privileges as soon as possible, restoring them only when necessary.
  2. Consider the environment variables as tainted, since their values are under the control of the user. For example, the user could set the PATH variable to force the code to execute programs under the user's control. It is better not to use environment variables at all, but if you really need them, replace their values with trusted ones (e.g., write a list of trusted directories into PATH) or, at least, sanitize the values given by the user.

What is setuid-helper and how can it help me?

setuid-helper is a small Ada library that provides a package, Setuid.Helper, designed to help follow the above guidelines for setuid programs.

During the package initialization (elaboration in Ada jargon) the package does three things

  1. It saves the environment variables received from the user
  2. It deletes every environment variable
  3. It drops the privileges

This means that when the first instruction of the main is executed, the program runs with the privileges dropped and the environment empty. The only way to restore the privileges is by using the procedure With_Privileges_Do. This procedure exists in different flavors that differ in how the action to be done is specified.
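Steps 1 and 2 can be sketched with the standard package Ada.Environment_Variables (available since Ada 2005). This is an illustration, not the Setuid.Helper source: the map Tainted is a hypothetical stand-in for wherever the library keeps its copy, and step 3 (dropping privileges) is omitted because it needs OS-specific calls.

```ada
with Ada.Containers.Indefinite_Ordered_Maps;
with Ada.Environment_Variables;
with Ada.Text_IO;

procedure Environment_Snapshot_Demo is
   pragma Assertion_Policy (Check);

   package String_Maps is new Ada.Containers.Indefinite_Ordered_Maps
     (Key_Type => String, Element_Type => String);

   --  Hypothetical read-only copy of what the user passed in
   Tainted : String_Maps.Map;

   procedure Save (Name, Value : String) is
   begin
      Tainted.Include (Name, Value);
   end Save;
begin
   Ada.Environment_Variables.Iterate (Save'Access);  -- 1. save
   Ada.Environment_Variables.Clear;                  -- 2. wipe

   pragma Assert (not Ada.Environment_Variables.Exists ("PATH"));
   Ada.Text_IO.Put_Line
     ("Saved" & Ada.Containers.Count_Type'Image (Tainted.Length) & " variables");

   if Tainted.Contains ("PATH") then
      Ada.Text_IO.Put_Line ("Tainted PATH: " & Tainted.Element ("PATH"));
   end if;
end Environment_Snapshot_Demo;
```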

In the simplest case

procedure With_Privileges_Do (Callback         : access procedure;
                              Drop_Permanently : Boolean := True);

the procedure With_Privileges_Do expects a Callback parameter, an access (a pointer for you C people 😄) to a parameterless procedure. The semantics are really simple: the privileges are restored, Callback is called, and the privileges are dropped again. Unless the second parameter is False, the privileges are dropped permanently and it will not be possible to restore them again. Note that permanent dropping is the default.
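The control flow can be illustrated with a simulation. This is not the library's implementation: a Boolean flag stands in for the real UID switching, and Privileged, Can_Restore and Privileged_Action are hypothetical names. It shows the callback shape (an access procedure parameter plus 'Access at the call site); in this sketch the privileges are also dropped again if the callback raises an exception.

```ada
with Ada.Text_IO;

procedure Privileges_Demo is
   pragma Assertion_Policy (Check);

   --  Hypothetical simulation: Booleans instead of real UID switching
   Privileged  : Boolean := False;
   Can_Restore : Boolean := True;

   procedure With_Privileges_Do (Callback         : access procedure;
                                 Drop_Permanently : Boolean := True)
   is
      procedure Drop is
      begin
         Privileged := False;
         if Drop_Permanently then
            Can_Restore := False;  -- real code would reset the saved UID
         end if;
      end Drop;
   begin
      if not Can_Restore then
         raise Program_Error with "privileges were dropped permanently";
      end if;

      Privileged := True;  -- "restore" privileges
      Callback.all;
      Drop;
   exception
      when others =>
         Drop;             -- never leave privileges on after an error
         raise;
   end With_Privileges_Do;

   procedure Privileged_Action is
   begin
      pragma Assert (Privileged);
      Ada.Text_IO.Put_Line ("Doing privileged work");
   end Privileged_Action;
begin
   With_Privileges_Do (Privileged_Action'Access, Drop_Permanently => False);
   pragma Assert (not Privileged and Can_Restore);

   With_Privileges_Do (Privileged_Action'Access);  -- default: drop for good
   pragma Assert (not Privileged and not Can_Restore);
end Privileges_Demo;
```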

This is a typical usage example where the callback is defined locally inside a declare block

   declare
      procedure Unmount_Callback is
      begin
         Unmount (Mountpoint_Name);
      end Unmount_Callback;
   begin
      With_Privileges_Do (Callback         => Unmount_Callback'Access,
                          Drop_Permanently => True);
   end;

There are two other versions of With_Privileges_Do: one expects a handler object (that can store a status), the other expects a function and it can return a value. See the .ads file and the README file for more details.

Managing the environment

As said before, it is probably better not to use environment variables at all. This is not always possible, and Setuid.Helper provides a way to manage them as safely as possible, keeping two different environments: one normal and one privileged.

More precisely, the package maintains three "environments":

  1. The tainted environment. This is a copy of the original environment. It cannot be changed, but it can be read with the function Tainted_Variable. This makes it possible to import, in a controlled way, the values provided by the user in the environment.
  2. The user environment, used when the privileges are dropped. It is initially empty and it can be written with Set_User_Environment.
  3. The safe environment, used when the privileges are on. It is initially empty and it can be written with Set_Safe_Environment.

Procedure Set_Safe_Environment expects the value to be of type Safe_Value, not String. A string can be converted to a Safe_Value by calling the function Bless.

Just to be clear: the value given to Bless is expected to be checked or sanitized, but nothing prevents the programmer from calling Bless with an unsafe value. The purpose of the separate type is to avoid accidental short-circuits where a value from the original environment (to be read with Tainted_Variable, nomen omen...) is given directly to Set_Safe_Environment.
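The idea behind Safe_Value is a classic Ada trick: a distinct private type, so that a plain String cannot reach Set_Safe_Environment without an explicit Bless. The sketch below imitates the names used in the post, but the bodies are hypothetical (here Set_Safe_Environment simply calls Ada.Environment_Variables.Set), and DEMO_PATH is an illustrative variable name.

```ada
with Ada.Environment_Variables;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Safe_Value_Demo is
   pragma Assertion_Policy (Check);

   --  Hypothetical imitation of the Setuid.Helper names (not its code)
   package Env is
      type Safe_Value is private;

      --  The caller is responsible for having checked Value!
      function Bless (Value : String) return Safe_Value;

      procedure Set_Safe_Environment (Name : String; Value : Safe_Value);
   private
      type Safe_Value is record
         Text : Unbounded_String;
      end record;
   end Env;

   package body Env is
      function Bless (Value : String) return Safe_Value is
      begin
         return (Text => To_Unbounded_String (Value));
      end Bless;

      procedure Set_Safe_Environment (Name : String; Value : Safe_Value) is
      begin
         Ada.Environment_Variables.Set (Name, To_String (Value.Text));
      end Set_Safe_Environment;
   end Env;

   Trusted_Path : constant Env.Safe_Value := Env.Bless ("/usr/bin:/bin");
begin
   Env.Set_Safe_Environment ("DEMO_PATH", Trusted_Path);

   --  Env.Set_Safe_Environment ("DEMO_PATH", "/tmp/evil");
   --  would not compile: a String is not a Safe_Value

   pragma Assert
     (Ada.Environment_Variables.Value ("DEMO_PATH") = "/usr/bin:/bin");
   Ada.Text_IO.Put_Line ("DEMO_PATH set from a blessed value");
end Safe_Value_Demo;
```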

Portability, installation, ...

Just check the beginning of the README file. I work with Linux and in Linux it works; I guess it should work with any modern *nix (if it works for you under another *nix please let me know).

What about Windows? Honestly, I do not know Windows very well, but I am afraid that the idea of the setuid bit is very *nix-ish. Might it be possible to bring the same idea to Windows? Well, let me know, never say never...

Conclusion

setuid-helper is a pretty young library (currently at version 0.1.0, since I am not sure about the stability of its API). Any feedback is welcome.

Ada for Webassembly

4 February 2020 at 20:40

A very, very quick post that I guess can be of interest to many (or at least raise some curiosity): the GNAT-LLVM project is porting the Ada standard library and runtime to WebAssembly, making it possible to do web development with the robustness of Ada.

Personally, I am quite excited by this piece of news, although I guess that it will not be easy to overcome the predominance of Javascript and its environment in this field.


My first experience with SPARK-Ada

25 May 2019 at 20:11

I program in Ada and I like it.

Yes, I know, it sounds like a "coming out"... 😉

In the world of Ada there is a thing that I always wanted to try: a reduced subset of Ada called SPARK (nothing to do with Apache Spark) that allows formal verification, that is, proving some properties of your code such as absence of buffer overflows and other runtime errors, no reads of uninitialized variables, and so on. I like the idea of having my code (or, at least, part of it) armored by a mathematical proof 😊.

Curious about SPARK? Check out this Introduction to SPARK

A few days ago I decided to give it a try using an old function of mine: a Ruby-like split to split strings at a separator. It is a small piece of code that I wrote some time ago and used safely in many programs of mine: the right "guinea pig" for the experiment.

If you are curious, the final SPARK code is on github, be my guest! 😊

The first thing to do was to replace the use of Vectors (a kind of dynamic array, similar to its C++ counterpart) with normal arrays. As you can imagine, Vectors use pointers (access types in Ada jargon), and these are prohibited in SPARK: pointers would allow aliasing of objects, and this would make the formal check impossible.

It is not as bad as it sounds: in Ada the necessity of using pointers is much reduced with respect to C or C++. The only real need is when you want to dynamically allocate structures.

Therefore, I wrote my own pseudo-Vector type with the minimal set of needed functionality. The idea was to keep the extracted pieces in a fixed array (allocated on the stack, so no explicit pointers) large enough to hold all the pieces (if you split a string of length N you get at most N+1 pieces). Maybe not very memory efficient, but fine for my needs.

The resulting code (slightly pruned with respect to the original, for "didactic" purposes...) follows. It seems long, but that is just because I added a few comments to explain the more Ada-esque syntax.

First the spec part (very roughly the equivalent of a *.h file)

with Ada.Strings.Unbounded;     use Ada.Strings.Unbounded;

package Token_Lists 
   with SPARK_Mode => On -- This spec is in SPARK
is

   subtype List_Length is Integer range 1 .. Integer'Last;

   -- "Tagged types" correspond roughly to what other 
   -- languages call "class".  Token_List is the desired
   -- pseudo-vector for holding the pieces.  
   --
   -- The funny (<>) means that the type is parameterized, but the
   -- parameter is not public.  This forces you to initialize 
   -- any variable of type Token_List with some kind of constructor.
   --
   -- Finally, "private" means that the internal details are not
   -- visible and are described later, in the private part. 
   --
   type Token_List (<>) is tagged private;

   function Create (N : List_Length) return Token_List
     with
       Post =>
         Create'Result.Capacity = N and Create'Result.Length = 0;
    --
    -- Post is an "attribute" and it specifies a Boolean expression
    -- that will be true when the function returns (contract). 
    --
    -- The post-condition says that the created list will have 
    -- room for N entries and it will be empty
    --

   function Capacity (Item : Token_List) return Positive;

   function Length (Item : Token_List) return Natural
     with Post => Length'Result <= Item.Capacity;
   --
   -- Number of the pieces currently in the list.  Of course 
   -- it cannot be larger than the capacity, if it happens 
   -- there is something wrong somewhere...
   ---

   procedure Append (List : in out Token_List;
                     What : String)
     with
       Pre'Class =>
         List.Length  < List.Capacity,

       Post =>
         List.Length = List.Length'Old + 1
     and List.Capacity = List.Capacity'Old;
    -- 
    -- The precondition (attribute "Pre") says that before calling
    -- the procedure there must be some room in the list; the
    -- post condition says that after the call there is a new entry,
    -- but the capacity is unchanged (it is obvious to you, not to
    -- SPARK).  SPARK will analyze the body of Append and it 
    -- will try to prove that the contract is respected, that is, 
    -- that the postcondition follows from the precondition.
    --
    -- If you arm correctly the code that calls Append, SPARK will
    -- try to prove that the precondition is always verified. If it  
    -- succeeds you know you'll never have an overflow!
    --

private
   --
   -- Some privacy needed... :-)
   --


   type Token_Array is array (Positive range <>) of Unbounded_String;
   --
   -- The funny "(Positive range <>)" says that the indexes
   -- of a variable of type Token_Array are a range of
   -- positive integers.  
   -- 
   -- Yes, in Ada the array indexes do not *necessarily* start
   -- from zero... They can start from whatever you want... ;-D
   --

   type Token_List (Length : List_Length) is tagged
      record
         Tokens     : Token_Array (1 .. Length) := (others => Null_Unbounded_String);
         First_Free : Positive := 1;
      end record
     with 
       Predicate => 
         Token_List.First_Free <= Integer (Token_List.Length) + 1
     and Token_List.Tokens'Length = Token_List.Length;
    -- 
    -- This is the full definition of Token_List anticipated 
    -- in the public part above.  In Ada it is not possible 
    -- to have some field public and some private (like in C++
    -- or Ruby).  If you put the full definition in the public
    -- part, everything is public (typically considered bad practice),
    -- otherwise everything is private.  Honestly, I never felt the
    -- need of the hybrid solution.
    -- 
    -- "Predicate" is another attribute.  It specifies a condition
    -- that a variable of type Token_List must always satisfy.
    --
    -- If you ask to the compiler to check assertions, the 
    -- compiler will produce code that checks the predicate
    -- at the end of methods ("primitive procedures" in Ada 
    -- jargon) that modify the variable and if the condition 
    -- is not satisfied, an exception is raised, pointing an
    -- accusing finger against the culprit.  
    --
    -- A powerful bug trap!
    --



   function Create (N : List_Length) return Token_List
   is (Token_List'(Length     => N,
                   Tokens     => (others => Null_Unbounded_String),
                   First_Free => 1));

   function Capacity (Item : Token_List) return Positive
   is (Item.Tokens'Last);

   function Length (Item : Token_List) return Natural
   is (Item.First_Free - 1);
end Token_Lists;

Now the body (more or less the equivalent of a *.c)

pragma Ada_2012;
package body Token_Lists 
   with SPARK_Mode => On -- This body is in SPARK
is

   ------------
   -- Append --
   ------------

   procedure Append
     (List : in out Token_List;
      What : String)
   is
   begin
      --
      -- If you look at the definition of Token_List 
      -- you'll see that List.Tokens is an array. 
      -- List.Tokens'Last is the last index of Lists.Tokens
      -- 
      if List.First_Free > List.Tokens'Last then
         raise Constraint_Error;
      end if;

      List.Tokens (List.First_Free) := To_Unbounded_String (What);
      List.First_Free := List.First_Free + 1;
   end Append;

end Token_Lists;

Even if you do not know Ada, I think you should be able to understand the code.

As you can see, Append is very simple and obviously correct, right? I mean, the if at the beginning stops every attempt to write beyond the end of the array List.Tokens, right?

Well, SPARK complained saying that it was not able to prove that an overflow would not happen. My first reaction was something like "Are you crazy? Don't you see the check 4 lines above? What is wrong with you?"

Anyway, next to the error message there was a "magnifying glass" icon. I clicked on it and got, courtesy of SPARK, a counterexample:

 [Counterexample] List = (Length => ?, First_Free => 2147483647) 
 and List.Tokens'First = 1 and List.Tokens'Last = 2147483647

What the... ? Ah! ... Yeah, OK, you are right...

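In case you did not recognize it at once, 2147483647 is 2^31 - 1, that is, Integer'Last for a 32-bit Integer. If I created a Token_List with 2 Giga entries (doubtful it would fit on the stack... anyway...) and filled it completely, the check would let the last write through (First_Free is not greater than List.Tokens'Last), and then the increment List.First_Free + 1 would overflow.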
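The failing step can be reproduced in isolation. The variable below just takes the value from SPARK's counterexample; with GNAT's default run-time checks, the increment raises Constraint_Error instead of misbehaving silently (in C, signed overflow is undefined behavior). The names Overflow_Demo and Overflowed are illustrative.

```ada
with Ada.Text_IO;

procedure Overflow_Demo is
   pragma Assertion_Policy (Check);

   --  The value from SPARK's counterexample: 2**31 - 1 on 32-bit Integer
   First_Free : Positive := Positive'Last;
   Overflowed : Boolean  := False;
begin
   begin
      --  Same statement as at the end of Append
      First_Free := First_Free + 1;
   exception
      when Constraint_Error =>
         Overflowed := True;
   end;

   pragma Assert (Overflowed);
   Ada.Text_IO.Put_Line ("Overflow trapped: " & Boolean'Image (Overflowed));
end Overflow_Demo;
```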

Now I had two choices: the first one was to say that I will never use 2 Giga entries and ignore the case; the second one was to make it official that you can have at most 2^31 - 2 entries. This was really easy to do: it was enough to change, in the spec file, the line

subtype List_Length is Integer range 1 .. Integer'Last;

that defines the type used for the list length to

subtype List_Length is Integer range 1 .. Integer'Last-1;

Notice the -1 at the end. Since this type is used in the constructor

function Create (N : List_Length) return Token_List;

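this guarantees that you'll never call Create with N larger than 2^31 - 2 (if you try, Constraint_Error is raised). Since the capacity is now at most Integer'Last - 1, incrementing First_Free after writing the last slot yields at most Integer'Last and can no longer overflow. This placated SPARK.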
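A quick sketch of the effect of the new subtype. Create here is a hypothetical stand-in that keeps only the parameter subtype of the real constructor: passing Integer'Last is rejected at the call, and the first assertion checks the arithmetic that makes the increment safe (the largest possible First_Free, capacity + 1, is now exactly Integer'Last).

```ada
with Ada.Text_IO;

procedure Subtype_Demo is
   pragma Assertion_Policy (Check);

   subtype List_Length is Integer range 1 .. Integer'Last - 1;

   --  Hypothetical stand-in: only the parameter subtype matters here
   procedure Create (N : List_Length) is
   begin
      Ada.Text_IO.Put_Line ("Created a list of capacity" & Integer'Image (N));
   end Create;

   Requested : Integer := Integer'Last;
   Rejected  : Boolean := False;
begin
   --  Capacity + 1 (the largest First_Free) still fits in an Integer
   pragma Assert (List_Length'Last + 1 = Integer'Last);

   Create (10);

   begin
      Create (Requested);  -- range check fails before Create runs
   exception
      when Constraint_Error =>
         Rejected := True;
   end;

   pragma Assert (Rejected);
end Subtype_Demo;
```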

After catching this borderline-condition bug I moved on to SPARK-ing the rest of the code, catching another couple of borderline cases.

Overall, I can say that the original code was fine, since the bugs were triggered only by some very extreme conditions; nevertheless, it is nice to know that everything is clean now...

Reasons for loving Ada. #1: strong typing

10 December 2017 at 16:41

Yes, yes, I know... I am kind of an oddball for programming in Ada. When the matter comes up in conversations I am usually asked to defend this choice of mine. For future reference (and maybe to help others get to know this nice language) I decided to write a few articles about what I like in Ada.

History and other stuff

Before diving into the details, let me say a few words about the language. Many people have never heard of Ada, and many of those who have hold some prejudices about it.

The language Ada is named after Ada Lovelace, daughter of Lord Byron (yes, that Byron), who is considered the first programmer in history.

Incidentally, did you notice that I write "Ada" and not "ADA"? This is because it is the name of a person, not an acronym. Use "ADA" on a forum/newsgroup and you will be corrected in no time...

Ada was developed in the '80s on the initiative of the US Department of Defense. (I am not replicating here the interesting history, which is available elsewhere.) Since then it has been updated approximately every ten years, giving rise to several releases informally known by their year (Ada 83, Ada 95, Ada 2005 and Ada 2012; the next release will probably be Ada 2020). Ada is a very alive and modern language, with several features not easily found elsewhere: contracts, type invariants, formal checking, private packages, distributed programming, native multitasking and so on...

Ada is quite flexible: it is used to write million-line code bases in avionics, but also to control small devices like STM32 microcontrollers. See, for example, this Segway-like robot built with Lego Mindstorms and controlled by multitasking code (with a 16-bit processor and 64K of RAM!), presented at FOSDEM.

Why do you love Ada?

A simple answer? Reduced debugging time: my personal experience is that Ada code requires maybe 1/10 of the debugging time of C code of similar complexity. The reason is that the Ada compiler is much "stricter" than a C compiler, so that when the program compiles, it has an internal coherence that prevents the existence of many silly bugs (and most bugs are just silly!) that would have survived in C code (functions with no return, a missing break in a switch case, dangling pointers, buffer overflows...). The remaining bugs are easily caught by the many bug traps (I love this term) that the compiler adds to your code. Programming in Ada is like doing pair programming, with the compiler playing the role of the observer.

Please note that I am not claiming that by using Ada your code will be automagically readable, maintainable and bug free. You can write bad code in Ada, you just need to make an effort... :-)
Seriously, it is often said that the quality of the code depends mainly on the skill of the programmer rather than on the tools. Nevertheless, using the right tool will help you achieve the same result with less effort.

OK, OK, I understand... But can you give me an example?

Sure, that is what I am here for. Perhaps the simplest feature of Ada that helps you write robust software is its type system, especially its strong typing. Let me explain with an example: if in C you write

typedef int serial_number;
typedef int port;

you can assign a variable of type port to a variable of type serial_number without any complaint, since both are just int; but if in Ada you write

type Serial_Number is new Integer;
type Port          is new Integer;

you cannot assign a variable of type Port to a variable of type Serial_Number, even though both are implemented as integers "under the hood." This is because they represent two different things, despite sharing the same low-level implementation. It makes sense, doesn't it? Why would you want to use a TCP port as a serial number? Most probably, if you try to do such a thing, there is an error somewhere.

What if you actually need to use a port as a serial number? No problem, you can convert it with Serial := Serial_Number(Client_Port);. Note that this conversion costs nothing, since both are integers; it is just a way of telling the compiler "Listen, this is not an error, I know what I want, please bear with me and assign this value."
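To make the idea concrete, here is a minimal, self-contained sketch (the procedure name Typing_Demo and the value 8080 are just illustrative). The commented-out line is the plain assignment the compiler rejects; the explicit conversion is the accepted way of saying "I mean it":

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Typing_Demo is
   --  Two distinct types that happen to share Integer's representation
   type Serial_Number is new Integer;
   type Port          is new Integer;

   Client_Port : constant Port := 8080;   -- hypothetical value
   Serial      : Serial_Number;
begin
   --  Serial := Client_Port;   -- would not compile: a Port is not a Serial_Number
   Serial := Serial_Number (Client_Port);  -- explicit, deliberate conversion
   Put_Line ("Serial =" & Serial_Number'Image (Serial));
end Typing_Demo;
```

Mixing the two types in an expression, e.g. Serial + Client_Port, is rejected for the same reason.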

Note the use of "camel case with underscores" for names that are not keywords. This style is quite common in modern Ada code. It is not mandatory, of course, but I personally find it quite readable. Note also that Ada is case-insensitive, so you could write, e.g., serial_number. Oh, and note the use of ":=" for assignment.

Actually, the code above is not the best choice. How do you know that an Integer will be large enough to hold a serial number or a port? On current Intel processors (with 32-bit integers) that is a reasonable assumption, but what if you have a small microcontroller with 16-bit integers? The best thing is probably to let the compiler decide by writing

type Serial_Number is range 0 .. 999_999;
type Port          is range 0 .. 2**16-1;

Note the use of '_' as separator in numbers. A simple thing, but really convenient...

Note that we do not tell the compiler which low-level implementation to use; we state which characteristics we need and let the compiler decide how to handle them. On a microcontroller with 16-bit ints, Serial_Number might be implemented as a long int, while Port might be an unsigned int. But who really cares? Let the compiler take care of this boring stuff...
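A small sketch of what the range declarations buy you at run time, assuming range checks are left on (they are on by default in GNAT). The type names are the ones used above; the demo procedure is hypothetical. Stepping past Port'Last raises Constraint_Error at the assignment, instead of silently wrapping around as an unsigned 16-bit integer would in C:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Range_Demo is
   type Serial_Number is range 0 .. 999_999;
   type Port          is range 0 .. 2**16 - 1;

   P : Port := Port'Last;
begin
   --  The compiler chose the representation; we only see the declared ranges
   Put_Line ("Port:" & Port'Image (Port'First) & " .." & Port'Image (Port'Last));
   Put_Line ("Last serial:" & Serial_Number'Image (Serial_Number'Last));
   P := P + 1;   -- 65_536 is not a valid Port: Constraint_Error here
   Put_Line ("not reached:" & Port'Image (P));
exception
   when Constraint_Error =>
      Put_Line ("Constraint_Error: value outside the range of Port");
end Range_Demo;
```

GNAT may also warn at compile time that this particular assignment will raise Constraint_Error, since the value of P is known.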

What if you need Serial_Number to have a specific size, say 24 bits (because, for example, you need to write it in a packet)? Just write

type Serial_Number is range 0 .. 999_999 with Size => 24;
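When those 24 bits really matter, for instance in a packet, the Size clause is typically paired with a record representation clause. The sketch below assumes a hypothetical 32-bit packet made of a 24-bit serial followed by 8 flag bits; the Packet type and its layout are invented for illustration. Serial_Number'Size reports the requested 24 and Packet'Size reports 32:

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;

procedure Size_Demo is
   type Serial_Number is range 0 .. 999_999 with Size => 24;

   --  Hypothetical packet: 24-bit serial followed by 8 flag bits
   type Packet is record
      Serial : Serial_Number;
      Flags  : Interfaces.Unsigned_8;
   end record
     with Size => 32;

   --  Exact bit layout of the packet
   for Packet use record
      Serial at 0 range 0 .. 23;
      Flags  at 0 range 24 .. 31;
   end record;

   P : constant Packet := (Serial => 123_456, Flags => 16#0F#);
begin
   Put_Line ("Serial_Number'Size =" & Integer'Image (Serial_Number'Size));
   Put_Line ("Packet'Size        =" & Integer'Image (Packet'Size));
   Put_Line ("Serial in packet   =" & Serial_Number'Image (P.Serial));
end Size_Demo;
```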

I do not want to dig deeper into the Ada type system, but I cannot resist telling you about two types that are not commonly found elsewhere. If you write type Volt is delta 0.125 range 0.0 .. 255.0;, variables of type Volt will hold a fixed point real in the interval 0 V .. 255 V with a step of 0.125 V. Fixed point numbers are usually implemented as integers and are used, for example, in some DSP applications running on small processors without floating point hardware. Another uncommon type is the decimal fixed point type, defined with something like type Money is delta 0.01 digits 15;. I'll let you discover them yourself. See also the corresponding Reference Manual page.
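A small sketch of both kinds of fixed point type, using the Volt and Money declarations above (the variable names and values are illustrative). Fixed_IO is used for Volt because 'Image prints a fixed point value with only Volt'Aft digits after the point, which for a delta of 0.125 is a single digit:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Fixed_Point_Demo is
   --  Ordinary (binary) fixed point: values are multiples of 0.125 V
   type Volt is delta 0.125 range 0.0 .. 255.0;
   --  Decimal fixed point: 15 decimal digits, 2 of them after the point
   type Money is delta 0.01 digits 15;

   package Volt_IO is new Fixed_IO (Volt);

   Supply : Volt := 3.375;                  -- 27 steps of 0.125
   Price  : constant Money := 0.10;
   Tax    : constant Money := 0.20;
begin
   Supply := Supply + 0.125;                -- exact: both values are multiples of the step
   Volt_IO.Put (Supply, Fore => 1, Aft => 3);   -- prints 3.500
   New_Line;
   Put_Line (Money'Image (Price + Tax));    -- prints 0.30 (after a leading sign space)
end Fixed_Point_Demo;
```

Money'Aft is 2, so Money'Image shows two decimals, and since a decimal fixed point value is stored as an integer count of hundredths, 0.10 + 0.20 is exactly 0.30.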

An example: cleaning user input

Let us work through a toy (but not-so-toy) example that exploits the strict typing of Ada.

Someone could object that there are other ways to handle the problem considered here. Yes, I know, but I just need a simple example to show what you can do with strict typing. I am not claiming that it is the only (or the best) solution (although I think it is quite a good one).

It is well known that any application that handles user input must treat that input with care, since it could open security holes. The following xkcd comic is a classic:


Of course, we can be as careful as we like never to use user input before sanitizing it, but if the application is very large and everything is a String (or char*), something can slip through the cracks... Strict typing can help us here. We just need to define a new type Dirty_String and have every user input function return a Dirty_String rather than a String (which is much easier to check than every single use of the input). The only way to turn a Dirty_String into a normal String will be via a special Sanitize function.

Let's dig into the details. We will define the following package spec

package Dirty_Strings is 
  type Dirty_String(<>) is private;

  function Sanitize(X : Dirty_String) return String;

  function Taint(X : String) return Dirty_String;
private
  type Dirty_String is new String; 

  function Taint(X : String) return Dirty_String
  is (Dirty_String(X));
end Dirty_Strings;

In Ada a package is divided into two parts: its spec, which specifies the "resources" exported by the package, and its body, which contains the actual implementation. The spec is further divided into a public part (visible to the rest of the code) and an optional private part.

This package defines a type Dirty_String. In the public part (before the word private) the type is declared as private, that is, none of your business. Moreover, the package exports two functions that convert from Dirty_String to a normal String and vice versa.

However, in the private part we see that a Dirty_String is just... a String. Putting its definition in the private part prevents short-cut conversions from Dirty_String to String and forces the programmer to go through the function Sanitize which, we guess, will do things like quoting special characters. Conversely, turning a normal String into a Dirty_String is just a type conversion, since there is no need to change it. This allows us to define Taint as an expression function (see also the RM) that the compiler will most probably inline.
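The package body is not shown in the post, so here is a hedged sketch of one possible Sanitize, wrapped together with the spec above in a single procedure (Dirty_Demo) so that it compiles on its own. The escaping rule, doubling every single quote as SQL string literals do, is only an assumption for illustration; a real application would apply whatever escaping the destination of the string needs:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Dirty_Demo is

   package Dirty_Strings is
      type Dirty_String (<>) is private;

      function Sanitize (X : Dirty_String) return String;

      function Taint (X : String) return Dirty_String;
   private
      type Dirty_String is new String;

      function Taint (X : String) return Dirty_String
      is (Dirty_String (X));
   end Dirty_Strings;

   package body Dirty_Strings is
      --  Hypothetical sanitizer: doubles every single quote.
      --  Here the full view is visible, so X can be iterated like a String.
      function Sanitize (X : Dirty_String) return String is
         Result : String (1 .. 2 * X'Length);   -- worst case: every char is a quote
         Last   : Natural := 0;
      begin
         for C of X loop
            Last := Last + 1;
            Result (Last) := C;
            if C = ''' then
               Last := Last + 1;
               Result (Last) := ''';
            end if;
         end loop;
         return Result (1 .. Last);
      end Sanitize;
   end Dirty_Strings;

   use Dirty_Strings;

   Input : constant Dirty_String := Taint ("Robert'); DROP TABLE Students;--");
begin
   --  Put_Line (String (Input));  -- would not compile: Dirty_String is private here
   Put_Line (Sanitize (Input));
end Dirty_Demo;
```

Outside the package, the only road from a Dirty_String back to a String goes through Sanitize.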

Run-time constraints

Let me conclude with a feature of Ada 2012 that I find quite cute (and useful). A few years ago, I wrote a small package to write Matlab files from Ada. One day I discovered that the package was writing files that Matlab could not read. The reason was that the names of the variables in the Matlab file were not valid Matlab names. After correcting the bug, I decided to add a new bug trap to the code. I defined the type to be used for variable names as

--  Is_Alphanumeric and Is_Letter come from Ada.Characters.Handling
type Matlab_Name is new
     String
   with Dynamic_Predicate =>
     (for all I in Matlab_Name'Range =>
        Is_Alphanumeric (Matlab_Name (I)) or Matlab_Name (I) = '_')
     and
       Is_Letter (Matlab_Name (Matlab_Name'First));

If you have a bit of programming experience, you should be able to understand the code above, even if you do not know Ada. As you can see, Matlab_Name is a String (but you cannot mix it with other strings, e.g., a filename!), but it must also satisfy a Dynamic_Predicate (a condition that is checked at run time, if you ask the compiler to do so). The condition can be split into two parts; the first part

  (for all I in Matlab_Name'Range =>
        Is_Alphanumeric (Matlab_Name (I)) or Matlab_Name (I) = '_')

requires every character in the name to be a letter, a digit or an underscore, while the second part

Is_Letter (Matlab_Name (Matlab_Name'First));

requires the first character to be a letter. If checks are enabled and at some point your code generates a bad name, an exception will be raised precisely at the point where the bug happened. (This pinpointing of bugs helps a lot in debugging...)
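Here is a sketch of this bug trap in action; Write_Variable and Try_Write are hypothetical helpers. Two assumptions: pragma Assertion_Policy (Check) turns the predicate checks on (compiling with GNAT's -gnata switch has the same effect), and a 'Length > 0 guard is added in front of the original predicate, because with an empty name Matlab_Name (Matlab_Name'First) would fail an index check and raise Constraint_Error instead of failing the predicate cleanly. A failed predicate check raises Ada.Assertions.Assertion_Error:

```ada
with Ada.Text_IO;             use Ada.Text_IO;
with Ada.Characters.Handling; use Ada.Characters.Handling;
with Ada.Assertions;

procedure Matlab_Name_Demo is
   pragma Assertion_Policy (Check);   -- like compiling with -gnata

   type Matlab_Name is new String
     with Dynamic_Predicate =>
       Matlab_Name'Length > 0            -- extra guard for empty names
       and then
         (for all I in Matlab_Name'Range =>
            Is_Alphanumeric (Matlab_Name (I)) or Matlab_Name (I) = '_')
       and then Is_Letter (Matlab_Name (Matlab_Name'First));

   --  Hypothetical stand-in for the routine that writes a variable header
   procedure Write_Variable (Name : Matlab_Name) is
   begin
      Put_Line ("writing variable " & String (Name));
   end Write_Variable;

   --  Returns False instead of propagating when the predicate fails
   function Try_Write (S : String) return Boolean is
   begin
      Write_Variable (Matlab_Name (S));   -- predicate checked on conversion
      return True;
   exception
      when Ada.Assertions.Assertion_Error =>
         Put_Line ("bad name rejected: """ & S & """");
         return False;
   end Try_Write;
begin
   if Try_Write ("voltage_1")               -- accepted
     and then not Try_Write ("1st_sample")  -- starts with a digit
     and then not Try_Write ("bad-name")    -- '-' is not allowed
     and then not Try_Write ("")            -- caught by the 'Length guard
   then
      Put_Line ("all names handled as expected");
   end if;
end Matlab_Name_Demo;
```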

What about efficiency? Yes, I can hear you, efficiency lovers. Having checks done at run time seems to go against efficiency. Well, it is true that run-time checks have a cost, but in my experience you hardly notice it. Unless you are on a very strict time budget (or unless you have very complex checks), it is usually more convenient to keep them on to catch possible bugs. You should decide to turn checks off only after discovering that you cannot afford them, and then turn off only those in the most computationally intensive parts (possibly after thorough testing).
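If profiling does show that some checks are too expensive, Ada lets you drop them locally rather than globally. A minimal sketch, assuming a hypothetical hot function Sum: pragma Suppress applies only to the declarative region where it appears, so the rest of the program keeps its checks (GNAT's -gnatp switch, by contrast, suppresses checks for the whole compilation). Keep in mind that Suppress only gives the compiler permission to omit the checks, and that in a simple loop like this one the compiler can often prove them unnecessary anyway:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Hot_Path_Demo is
   type Sample_Array is array (Positive range <>) of Float;

   --  Hypothetical inner loop that profiling flagged as too slow
   function Sum (Data : Sample_Array) return Float is
      pragma Suppress (Index_Check);   -- only within Sum
      pragma Suppress (Range_Check);   -- only within Sum
      Total : Float := 0.0;
   begin
      for I in Data'Range loop
         Total := Total + Data (I);
      end loop;
      return Total;
   end Sum;

   Samples : constant Sample_Array := (1.0, 2.0, 3.5);
begin
   --  Code outside Sum still runs with the default checks
   Put_Line (Float'Image (Sum (Samples)));
end Hot_Path_Demo;
```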

An alternative approach is to code in SPARK (a subset of Ada, nothing to do with Apache) in order to check your code formally. I'll let you discover the joy of SPARK... :-)

Which programming language should you learn?

29 December 2016 at 12:56

This question comes up now and then on forums, Facebook, and so on. Since I got tired of always writing the same things, I decided to write my suggestions here once and for all.

As usual with this kind of question, everything depends on your starting point (are you a beginner? an amateur?) and your target (do you want to do web development? Contribute to open source projects? Program small devices like the STM32?). Usually whoever asks this question is a beginner and says nothing about specific goals, so I assume that the objective is to get some general “culture” in this field. The answer that follows is written with this generic objective in mind.

Step 1: learn to program (simplicity is the key)

If you have no experience in programming at all, I suggest starting with something simple, in order to learn how to program before actually learning a language.

Yes, because “learning to program” is not the same as “learning a programming language.” Learning to program means learning how to decompose a problem into smaller problems and specify a procedure to solve them. This skill is independent of the specific language and, in a sense, sits at a higher level than knowing programming languages. You can write programs without even having a PC, with just pen and paper.

Ada Lovelace, daughter of Lord Byron (yes, that Byron!), is considered the first programmer in history because of her notes on the article about the Analytical Engine designed (but never built) by Charles Babbage, a kind of 19th-century computer with registers made of gears and powered by steam… (I suggest reading the Ada Lovelace article, it is very interesting.)

Of course, even to learn the skill of programming you need a language, but I would stick to something simple.

A language that was well suited to beginners, in my opinion, was BASIC. It was fairly popular in the ’80s, when every home computer had some kind of built-in BASIC interpreter. Despite its simplicity, it had all the necessary features and one could write quite complex stuff with it, although the resulting code was not very maintainable. An advantage of BASIC was that it was interpreted: you could write a single command and have it executed immediately, without the need for an IDE.

Since BASIC is much less popular nowadays, I think that C is another good starting point, as it is fairly simple. C is also quite low-level, giving you visibility of machine-level stuff like pointers and the stack; this is useful because it gives you a feel for what happens “under the hood” even with other languages. (I find that this kind of intuition helps me a lot when I learn a new language.) Be sure to learn even the tiny, obscure details of C, such as pointers, structs, unions and functions with a variable argument list (such as printf).

With C you will also learn about macro expansion, a technique used in several places that requires a mindset slightly different from the one used in imperative languages like C. To be honest, I do not love macro expansion, since it is quite error-prone and can make the code less readable, but there are contexts where it is useful.

Second step: best practices & OOP with Ada

Since object-oriented programming (OOP) is widespread and fashionable, many would suggest C++ or Java as a second language. However, I am going to be a heretic and suggest Ada. Besides supporting OOP, it is a modern, very powerful language with a strong emphasis on correctness and maintainability. It can also introduce you to things that are not common in other languages, such as contracts, type invariants and formal checking. Studying it, you will learn some “good practices” in programming that will be useful with other languages too.

Since Ada is not widely used, finding information on the web is not easy. A few useful links: you can get an open-source, gcc-based compiler here: http://libre.adacore.com/. For an introduction to the language see AdaCore University, for an easy reference see the Wikibook, for a mix of resources see the Ada Information Clearinghouse (where you can find the reference manual) and the AdaCore site. Finally, you can meet other Ada-ists on the usenet group comp.lang.ada and in some LinkedIn groups.

Now you have the most important basics

Once you have good experience with C and Ada, learning other languages is just a matter of learning a new syntax and maybe one or two new concepts. I, for example, learned Ruby in a few hours just by following a tutorial. In the end, most current languages are very similar to one another.

Polishing it: Ruby/Python and assembly

Finally, to round everything off, I would suggest, as the icing on the cake, a scripting language like Ruby or Python and maybe some assembly. It is true that assembly is not used much nowadays (although it depends on your context), but some experience with assembly will give you a good understanding of what happens “under the hood.” Moreover, it is so much fun to get so close to the processor that you can almost feel the silicon... :-)

Strange stuff: functional languages, PROLOG, …

If you really want the widest view, you can also try some fancy stuff like functional languages (Haskell, … even LISP) and logic programming languages like PROLOG. These languages follow a “model” that is different from the usual “procedural” one. I have not seen them used much, and I wonder whether they are suited to very large scale, very long-lived software, but it is useful to know that they exist and how they work.

Conclusions and final remarks

I hope you found this useful or that, at least, it gave you some ideas about programming. A final suggestion: in my experience, the best way to learn a new language is to use it. Once you are acquainted with the syntax, choose a project of intermediate difficulty (even a silly one, it does not need to be useful) and do it (and, possibly, have fun…)
