Normal view

There are new articles available, click to refresh the page.

Before yesterdayNews from the Ada programming language world

Riccardo Bernardini
Reasons for loving Ada: Type invariants (because bugs shouldn't sleep...)
19 July 2020 at 14:30

Reasons for loving Ada: Type invariants (because bugs shouldn't sleep...)

Riccardo Bernardini

By: Riccardo Bernardini

19 July 2020 at 14:30

This tutorial talks about a powerful bug trap that was introduced in Ada 2012: the Type_invariant attribute.

Type invariant? What's that?

The idea of type invariant is pretty simple. Sometimes data structure are quite simple, just an aggregation of values, but often they are quite complex with an internal structure that needs to be preserved.

A type invariant associated to a given type is a Boolean expression that is true for every valid value of the type. Ada allows the association of a type invariant to a private type; the compiler — if instructed so — will insert code that checks that the invariant remains satisfied; if not, an exception will be raised.

An example: a table with two keys

Let's do a simple example. Suppose that

We have a set of users
Every user has a Serial_ID (a 6 digit number) and a SSN (Social Security Number). We will assume the Italian convention where the SSN is a 16 character string.
Both the Serial_ID and SSN identify uniquely the user, that is, there are no two users with the same Serial_ID nor two users with the same SSN.
We want to build a data structure User_Directory that will allow us to get (efficiently) an user both by Serial_ID and SSN

A possible solution is shown in the figure

The data structure is made of two Ordered_Maps: one map is indexed by the Serial_ID, the other one by the SSN and both maps store pointers (access values in Ada jargon) to a User_Record, a record that contains the user information. This structure clearly introduces some (quite obvious) redundancy

The SSN stored in a User_Record must be equal to the corresponding key in the SSN_Map, the same for the Serial_ID
The entries of the two maps corresponding to the same user must map to the same record
The two maps have the same number of entries.

The spec file

This is an excerpt of the public part of the spec file (roughly the *.h of C)

   subtype ID_Char is Character range '0' .. '9';
   type Serial_ID is array (1 .. 6) of ID_Char;

   type User_SSN is new String (1 .. 16);

   type User_Directory is private;

   procedure Add_User (Dir : in out User_Directory;
                       ID  : in Serial_ID;
                       SSN : in User_SSN)
     with
       Pre => not Contains (Dir, Id) and not Contains (Dir, Ssn),
     Post =>  Contains (Dir, Id) and Contains (Dir, Ssn);


   procedure Delete (Dir : in out User_Directory;
                     ID  : in Serial_ID)
     with
       Pre => Contains (Dir, Id),
     Post =>  not Contains (Dir, Id);

Here we define

The types Serial_ID and User_SSN as fixed lenght strings. Note that Serial_ID can contain only digits.
User_Directory is the data structure we are defining.
Add_User and Delete should be quite obvious. Note the use of preconditions (after Pre =>) and postconditions (after Post =>) that document the behavior of Add_User and Delete.

The actual definition (in the private part) of User_Directory is basically the translation in Ada of the scheme shown above

 type User_Record is
      record
         ID  : Serial_ID;
         SSN : User_SSN;
      end record;

   type User_Record_Pt is access User_Record;

   package ID_To_User_Maps is
     new Ada.Containers.Ordered_Maps (Key_Type     => Serial_ID,
                                      Element_Type => User_Record_Pt);

   package SSN_To_User_Maps is
     new Ada.Containers.Ordered_Maps (Key_Type     => User_SSN,
                                      Element_Type => User_Record_Pt);

   type User_Directory is
      record
         ID_To_User  : ID_To_User_Maps.Map;
         SSN_To_User : SSN_To_User_Maps.Map;
      end record;

We have

The record User_Record holding the user information and its access User_Record_Pt
A package ID_To_User_Maps (obtained by specializing the generic standard package Ordered_Maps) that implements maps that associate User_ID to access to User_Record
A similar package SSN_To_User_Maps
Finally, we have the User_Directory that simply contains the two required maps.

The implementation

Let's look at the implementation of the package. Let's start with Add_User.

 procedure Add_User
     (Dir : in out User_Directory; 
      ID  : in Serial_ID; 
      SSN : in User_SSN)
   is
      New_User : User_Record_Pt;
   begin
      if Dir.ID_To_User.Contains (ID) then
         raise Constraint_Error;
      end if;

      New_User  := new User_Record'(ID, SSN);

      Dir.ID_To_User.Include (Key      => Id,
                              New_Item => New_User);


      Dir.SSN_To_User.Include (Key      => SSN,
                               New_Item => New_User);
   end Add_User;

The implementation of Add_User is quite straightforward: with

New_User  := new User_Record'(ID  => ID, SSN => SSN);

we allocate dynamically a record with the user information and with

   Dir.ID_To_User.Include (Key      => Id,
                           New_Item => New_User);


   Dir.SSN_To_User.Include (Key      => SSN,
                            New_Item => New_User);

we update the two internal tables by associating the given ID and SSN with the address of the newly allocated record.

This is our first tentative to the implementation of Delete.

procedure Delete (Dir : in out User_Directory; ID : in Serial_ID) is
      To_Be_Removed : User_Record_Pt := Dir.ID_To_User (Id);
   begin
      Dir.ID_To_User.Delete (Id);

      Free (To_Be_Removed);
   end Delete;

This implementation has a bug 🐛, as it is clear from this figure that pictures the behavior of Delete.

The dashed line means that the memory area of the user record now has gone back to the heap. It is clear that we forgot to remove the entry from Dir.SSN_To_User and now we have a dangling pointer referring to the old user record.

This is a nasty bug.

Oh, yes, believe me. I have already seen this movie, actually I was the main character. What could go wrong? Well, for example, later you could want to update the entries of Dir.SSN_To_User in some way. However, the memory that was assigned to the user record now belongs to some other structure that will be corrupted by your update.

Depending on your code, the bug can remain dormant for long time, then, suddenly, one day, if you are lucky, you get a SEGFAULT when you try to use the corrupted structure again. If you are unlucky you'll get funny random behaviors, possibly difficult to replicate.

Even if you are lucky (so to say...) you'll get the SEGFAULT in a place totally unrelated with the real bug in Delete. Believe me, finding this bug can take days of stumbling around in the dark. Not even an army of rubber ducks can save you.

Although this is a specific example, this kind of time bomb 💣 (or Sword of Damocles) behavior is typical of bugs that cause a loss of internal coherence in some complex structure. The bug will not manifest itself when the structure is corrupted, but later as a delayed consequence of the loss of coherence, making bug hunting difficult.

Remember: there is one thing worse than a bug that causes a crash: a bug that causes the program to continue as nothing happened.

The solution: type invariant

Fear not, my friends and colleagues. Ada comes to the rescue with the Type_Invariant. It suffices to add it to the type definition as in

type User_Directory is
      record
         ID_To_User  : ID_To_User_Maps.Map;
         SSN_To_User : SSN_To_User_Maps.Map;
      end record
    with Type_Invariant => Is_Coherent (User_Directory);

where Is_Coherent is a function that checks that the whole structure is coherent, that is, that the two maps have the same number of entries and that the data in the user record match the corresponding key in the tables. The source code of the (fairly long) function is included, for reference, at the end of this post.

Now if we run the following test program

with User_Directories;   use User_Directories;

procedure Main is
   Dir : User_Directory;
begin
   Add_User (Dir => dir,
             ID  => "123456",
             SSN => "ABCDEF64X12Q456R");

   Add_User (Dir => dir,
             ID  => "654321",
             SSN => "XBCDEF64X12Q456R");

   Delete (Dir, Id => "123456");
end Main;

we get the following results

Execution of obj/main terminated by unhandled exception
raised SYSTEM.ASSERTIONS.ASSERT_FAILURE : 
  failed invariant from user_directories.ads:65
Call stack traceback locations:
⋯/s-assert.adb:46
⋯/adainclude/a-coorma.adb:1561
⋯/user_directories.adb:52
⋯/user_directories.adb:59
⋯/main.adb:18
⋯

where, for the sake of readability I replaced part of the dump text with ⋯. The line

failed invariant from user_directories.ads:65

refers to the line where type User_Directory is declared. In the stack dump the first two lines refer to some system files, we are interested in the third line

⋯/user_directories.adb:52

that refers to the first line of procedure Delete.
Summarizing,

Before calling Delete user directory Dir was fine
After calling Delete it is not fine anymore, who are you gonna blame?

Ghostbusters! No, sorry, wrong citation... 😊😛 (this is also a hint to my age... 😊)

After a brief inspection of Delete we discover the problem and we fix it quite easily

procedure Delete (Dir : in out User_Directory; ID : in Serial_ID) is
      To_Be_Removed : User_Record_Pt := Dir.ID_To_User (Id);
   begin
      Dir.SSN_To_User.Delete (To_Be_Removed.SSN); -- New line
      -- and here???
      Dir.ID_To_User.Delete (Id);

      Free (To_Be_Removed);
   end Delete;

Wait a minute...

Someone maybe could observe that at the line -- and here??? the variable Dir is in a non coherent state since we updated one table, but not the other. The same happen in the Add_User procedure. Wouldn't this raise an exception?

Well, no. The details are a bit complex, but the basic idea is that since Add_User and Delete are declared in the same package of User_Directory, they are able to operate on the internal structure of the type and it is acceptable that during this manipulation the internal structure is not coherent for a moment. The type invariant will be checked only when the procedures end;
see the Reference manual if you want every single, gory, detail...

Appendix

Implementation of `Is_Coherent`

function Is_Coherent (Dir : User_Directory) return Boolean is
      use Ada.Containers;
      use ID_To_User_Maps;
      use SSN_To_User_Maps;
   begin
      if Dir.ID_To_User.Length /= Dir.SSN_To_User.Length then
         return False;
      end if;

      for Pos in Dir.ID_To_User.Iterate loop
         declare
            Usr_Entry : constant User_Record_Pt := Element (Pos);
         begin 
            if Usr_Entry /= Dir.SSN_To_User (Usr_Entry.Ssn) then
               return False;
            end if;

            if Key (Pos) /= Usr_Entry.Id then
               return False;
            end if;
         end;         
      end loop;

      return True; 
   end Is_Coherent;

Credits

The image of Gollum is a modified version of the wikipedia image
The scolopendra image is available with license CC BY 4.0 as figure of this paper

Riccardo Bernardini
Reasons for loving Ada. #1: strong typing
10 December 2017 at 16:41

Reasons for loving Ada. #1: strong typing

Riccardo Bernardini

By: Riccardo Bernardini

10 December 2017 at 16:41

Yes, yes, I know... I am kind of an oddball for programming in Ada. When the matter comes out in conversations I am usually asked to defend this choice of mine. For future reference (and maybe to help others knowing this nice language) I decided to write few articles about what I like in Ada.

History and other stuff

Before diving in the details, let me spend a couple of words about the language. Many never heard about Ada and many of those who did have some prejudices about it.

The language Ada is named after Ada Lovelace daughter of Lord Byron (yes, that Byron) and considered the first programmer in history.

Incidentally, did you notice that I write "Ada" and not "ADA"? This is because it is a name of a person, not an acronym. Use "ADA" on an forum/newsgroup and you will be corrected in no time...

Ada was developed in the '80s following an idea of the Department of Defence. (I am not replicating here the interesting history that is available elsewhere.) Since then it was updated every approximately ten years giving rise to several releases, informally known with the year number (Ada 83, Ada 95, Ada 2005 and Ada 2012, next release will probably be Ada 2020). Ada is a language very alive and modern, with several feature not easily found elsewhere: contracts, type invariants, formal checking, private packages, distributed programming, native multitasking and so on...

Ada is quite flexible: it is used to write million-line code in avionics, but also to control small devices like the STM32. See, for example, this segway implemented with a Lego mindstorm controlled by a multitasking code (with a 16 bit processor and 64K of RAM!) presented at FOSDEM.

Why do you love Ada?

A simple answer? Reduced debugging time: my personal experience is that an Ada code will require maybe 1/10 of debugging time if compared with a C code of similar complexity. The reason is that Ada compiler is much "stricter" than C, so that when the program compiles, it has an internal coherence that prevents the existence of many silly bugs (and most bugs are just silly!) that would have survived in a C code (functions with no return, a missing break/case in a switch, dangling pointers, buffer overflows...). The remaining bugs are easily caught by the many bug traps (I love this term) that the compiler adds to your code. Programming in Ada is like doing pair programming, but with the compiler playing the role of the observer.

Please note that I am not claiming that by using Ada your code will be automagically readable, maintainable and bug free. You can write bad code in Ada, you just need to make an effort... :-)
Seriously, it is often said that the quality of the code depends mainly from the skill of the programmer, rather than from the type of tools. Nevertheless, using the right tool will help you to achieve the same result with less effort.

OK, OK, I understand... But can you make me an example?

Sure, I am here for this. Maybe the simplest characteristic of Ada that helps you in writing robust software is the type system, especially its strong typing nature. Let me explain with an example: if in C you write

typedef int serial_number;
typedef int port;

you can assign with no problem a variable of type port to a variable of type serial_port since both are integers; but if you do in Ada

type Serial_Number is new Integer;
type Port          is new Integer;

you cannot assign a variable of type Port to a variable of type Serial_Number although both "under the hood" are implemented as integers. This is because they actually represents two different things despite being implemented in the same way at the lowest level. Well, it makes sense, doesn't it? Why should you want to use a TCP port as a serial number? Most probably, if you try to do such a thing, there is an error somewhere.

What if you actually need to use a port as a serial number? No problem, you can convert it with Serial := Serial_Number(Client_Port);. Note that this conversion has no cost since both are integers; it is just a way to tell to the compiler "Listen, this is not an error, I know what I want, please bear with me and assign this value."

Note the use of "camel case with underscores" for names that are not keywords. This style is quite common in modern Ada code. It is not mandatory, of course, but personally, I find it quite readable. Please also note that Ada is case-insensitive, so that you could write, e.g., serial_number. Oh, yes, and the use of ":=" for assignment.

Actually, the code above is not the best choice. How do you know that an Integer will be large enough to keep a serial number or a port? Well, on current Intel processor (whit 32-bit integers), that is quite reasonable, but what if you have a small micro-controller with 16 bit integers? Well, maybe the best thing is to let the compiler to decide by writing

type Serial_Number is range 0 .. 999_999;
type Port          is range 0 .. 2**16-1;

Note the use of '_' as separator in numbers. A simple thing, but really convenient...

Note how we do not say to the compiler which low-level implementation to use, but which characteristics we need and let the compiler to decide how to handle them. On a microcontroller with 16-bit ints maybe Serial_Number will be implemented as a long int, while Port will be an unsigned int. But who really cares? Let the compiler take care of this boring stuff...

What if you need Serial_Number to have a specific size, say 24 bits (because, for example, you need to write it in a packet)? Just write

type Serial_Number is range 0 .. 999_999 with Size => 24;

I do not want to dig deeper in the Ada type system, but I cannot resist telling you about two types that are not commonly found elsewhere. If you write type Volt is delta 0.125 range 0.0 .. 255.0; the variables of type Volt will hold a fixed point real ranging in the interval 0 V..255 V with step 0.125 V. Fixed point numbers are usually implemented as integers and are used, for example, in some DSP applications running on small processors without floating point maths. Another uncommon type is the decimal fixed point type defined with something like type Money is delta 0.01 digits 15;. I'll let you discover about them. See also the corresponding Reference Manual page.

An example: cleaning user input

Let us make a toy (but not so-toy) example of exploitation of the strict typing of Ada.

Someone could object that there are other ways to handle the problem considered here. Yes, I know, but I just need a simple example to show what you can do with strict typing. I am not claiming that is the only (or best) solution (although, I think it is quite good).

It is well known that in any application using user input care must be taken in using the user input since it could open security holes. The following xkcd comic is a classic

Of course, we can pay all the attention we want to not use user input before sanitizing it, but if the application is very large and everything is a String (or char*), something can slip in the cracks... Type strictness can help us here. We just need to define a new type Dirty_String and have all the user input function return a Dirty_String rather than a string (this is easier to check). The only way to transform a Dirty_String in a normal String will be via a special Sanitize function.

Let's dig into details. We will define the following package specs

package Dirty_Strings is 
  type Dirty_String(<>) is private;

  function Sanitize(X : Dirty_String) return String;

  function Taint(X : String) return Dirty_String;
private
  type Dirty_String is new String; 

  function Taint(X : String) return Dirty_String
  is (Dirty_String(X));
end Dirty_Strings;

In Ada a package is divided into two parts: its specs that specifies the "resources" exported by the package and its body with the actual implementation. The specs are further divided into a public part (visible to the rest of the code) and an optional private part.

This package define a type Dirty_String. In the public part (before the private) the type is defined as private, that is, none of your business. Moreover, the package exports two function that can be used to convert from Dirty_String to normal String and vice-versa.

However, in the private part we see that a Dirty_String is just... a String. Putting its definition in the private part prevents short-cut conversions from Dirty_String to String and forces the programmer to go through the function Sanitize that, we guess, will do stuff like quoting special characters. Instead, the conversion of a normal String to a Dirty_String is just a type conversion since there is no need to change it. This allows us to define it as a expression function (see also the RM) that, most probably, will be "inlined" by the compiler.

Run-time constraints

Let me conclude with a feature of Ada 2012 that I find quit cute (and useful). Few years ago, I wrote a small package to write Matlab files from Ada. One day I discovered that the package was writing files that could not be read from Matlab. The reason was that the names of the variables in the Matlab file were not valid Matlab names. After correcting the bug, I decided to add a new bug trap to the code. I defined the type to be used for variable names as

type Matlab_Name is new
     String
   with Dynamic_Predicate =>
     (for all I in Matlab_Name'Range =>
        Is_Alphanumeric (Matlab_Name (I)) or Matlab_Name (I) = '_')
     and
       Is_Letter (Matlab_Name (Matlab_Name'First));

If you have a bit of programming experience, you should be able to understand the code above, even if you do not know Ada. As you can see Matlab_Name is a String (but you cannot mix it with other strings, e.g., a filename!), but it also must satisfy a Dynamic_Predicate (a condition that is checked at run-time, if you ask the compiler to do so). The condition can be split in two, the first part

  (for all I in Matlab_Name'Range =>
        Is_Alphanumeric (Matlab_Name (I)) or Matlab_Name (I) = '_')

requires that every character in the name must be a letter, a digit or an underscore, while the second part

Is_Letter (Matlab_Name (Matlab_Name'First));

requires that the first character must be a letter. If checks are enabled and at some time your code generates a bad name, an exception will be raised precisely in the point were the bug happened. (This pinpointing of bugs helps a lot in debugging...)

What about efficiency? Yes, I can hear you, efficiency lovers. The idea of having checks done at run-time seems to go against the idea of efficiency. Well, it is true, run-time checks costs in term of efficiency, but in my experience you do not even notice it. Unless you are on a very strict time budget (or unless you have very complex checks) it is usually more convenient to keep them on to catch possible bugs. You should take the decision of turning checks off only after discovering that you cannot afford them and you should turn off only those that are in most computationally intensive parts (possibly after thoroughly testing).

An alternative approach is to code in SPARK (a subset of Ada, nothing to do with Apache) in order to check your code formally. I'll let you discover the joy of SPARK... :-)