❌ About FreshRSS

Normal view

There are new articles available, click to refresh the page.
Before yesterdayBlogs

LEA: first steps as a "smart editor"

2 September 2023 at 19:53

Note for subscribers: if you are interested in my Ada programming articles only, you can use this RSS feed link.


Spot the "tool tip" on the following screenshot...

Click to enlarge

Translation: cross-references and source code navigation in LEA are becoming a reality since this evening!

 


Some Web links:

LEA

Web site: http://l-e-a.sf.net/
Sources, site #1: https://sf.net/p/l-e-a/code/HEAD/tree/
Sources, site #2: https://github.com/zertovitch/lea
Alire Crate: Alire - LEA


Since LEA embeds the HAC compiler, here are a few links about HAC as well:

HAC

Web site: https://hacadacompiler.sourceforge.io/
Sources, site #1: https://sf.net/p/hacadacompiler/code/HEAD/tree/
Sources, site #2: https://github.com/zertovitch/hac
Alire Crate: Alire - HAC

Ada BFD 1.3.0

20 August 2023 at 14:14
[Ada/ada-bfd-1.3.jpg](Ada/ada-bfd-1.3.jpg)
    1. Integration with Alire

For Linux users only, the Ada BFD(https://github.com/stcarrez/ada-bfd) has an associated Alire crate which allows you to use it easily. To get access to the Alire crate, you should add the AWA Alire index(https://github.com/stcarrez/awa-alire-index) in your Alire configuration as follows:

``` alr index add=https://github.com/stcarrez/awa-alire-index.git name awa ```

Then, you can get access to the crate by using

``` alr with bfdada ```

Let's see how to use this library...

    1. Declarations

The Ada BFD(https://github.com/stcarrez/ada-bfd) library provides a set of Ada bindings that give access to the BFD library. A binary file such as an object file, an executable or an archive is represented by the `Bfd.Files.File_Type` limited type. The symbol table is represented by the `Bfd.Symbols.Symbol_Table` limited type. These two types hold internal data used and managed by the BFD library.

```ada with Bfd.Files; with Bfd.Sections; with Bfd.Symbols; ...

 File    : Bfd.Files.File_Type;
 Symbols : Bfd.Symbols.Symbol_Table;

```

    1. Opening the BFD file

The first step is to use the `Open` procedure to read the object or executable file whose path is given as argument. The `File_Type` parameter will be initialized to get access to the binary file information. The `Check_Format` function must then be called to let the BFD library gather the file format information and verify that it is an object file or an executable.

```ada Bfd.Files.Open (File, Path, ""); if Bfd.Files.Check_Format (File, Bfd.Files.OBJECT) then

   ...

end if; ```

The `File_Type` uses finalization so that it will close and reclaim resources automatically.

    1. Loading the symbol table

The symbol table is loaded by using the `Read_Symbols` procedure.

```ada

  Bfd.Symbols.Read_Symbols (File, Symbols);

```

The resources used by the symbol table will be freed when the symbol table instance is finalized.

    1. Find nearest line

Once the symbol table is loaded, we can use the `Find_Nearest_Line` function to find the nearest line of a function knowing some address. This is almost a part of that function that the addr2line (1)(https://www.man7.org/linux/man-pages/man1/addr2line.1.html) command is using.

```ada File_Name, Func_Name : Ada.Strings.Unbounded.Unbounded_String; Text_Section : Bfd.Sections.Section; Line : Natural; Pc : constant Bfd.Vma_Type := ...; ...

  Text_Section := Bfd.Sections.Find_Section (File, ".text");
  Bfd.Symbols.Find_Nearest_Line (File    => File,
                                 Sec     => Text_Section,
                                 Symbols => Symbols,
                                 Addr    => Pc,
                                 Name    => File_Name,
                                 Func    => Func_Name,
                                 Line    => Line);

```

One tricky aspect of using `Find_Nearest_Line` is the fact that the address we are giving must **sometimes** be converted to an offset within the text region. With Address space layout randomization (ASLR)(https://en.wikipedia.org/wiki/Address_space_layout_randomization) a program is mapped at a random address when it executes. Before calling `Find_Nearest_Line`, we must subtract the base address of the memory region. We must now find the virtual address of the start of the text region that is mapped in memory. While the program is running, you can find the base address of the program by looking at the `/proc/self/maps` file. This special file indicates the list of memory regions used by the process with the addresses, flags and other information. Without ASLR, the program is almost always loaded at the `0x00400000` address.

``` 00400000-007f9000 r-xp 00000000 fd:01 12067645 /home/... 009f8000-009fa000 r--p 003f8000 fd:01 12067645 /home/... 009fa000-00a01000 rw-p 003fa000 fd:01 12067645 /home/... ```

But when it is mapped at a random address, we get a different address each time the program is launched:

``` 55d5983d9000-55d598592000 r--p 00000000 fc:02 1573554 /... 55d598592000-55d599376000 r-xp 001b9000 fc:02 1573554 /... 55d599376000-55d5997ed000 r--p 00f9d000 fc:02 1573554 /... 55d5997ee000-55d5998bb000 r--p 01414000 fc:02 1573554 /... 55d5998bb000-55d5998c6000 rw-p 014e1000 fc:02 1573554 /... ```

In that case, the value to use it the first address of first `r--p` region associated with the program (here `0x55d5983d9000`).

Another method to know the virtual base address is to use the dl_iterate_phdr (3)(https://man7.org/linux/man-pages/man3/dl_iterate_phdr.3.html) function and look at the shared objects which are loaded. This function must be executed by the program itself: it gets as parameter a callback function which is called for each loaded shared object and a data parameter that will be passed to the callback.

```

  1. include <dlfcn.h>

static int dl_callback (struct dl_phdr_info* info, size_t size, void* data) {

 /* VM base address is: info->dlpi_addr */
 return 0;

} ...

  dl_iterate_phdr (dl_callback, 0);

```

When the callback is called, you can get the name of the shared object by looking at `info->dlpi_name` and the virtual base address by looking at `info->dlpi_addr`.

Ada BFD(https://github.com/stcarrez/ada-bfd) is a very specific library that is not always easy to use due to the complexity of binary program representation (ELF, DWARF, ...) and program execution. It is however used in very specific contexts such as the Muen Separation Kernel(https://muen.codelabs.ch/) and the Memory Analysis Tool(https://github.com/stcarrez/mat).

HAC for native targets

20 August 2023 at 13:57
Note for subscribers: if you are interested in my Ada programming articles only, you can use this RSS feed link.

HAC (the HAC Ada Compiler) was since the first of its previous lives, from Pascal-S, in 1973, to recent commits, translating high-level language exclusively to the machine code of a fictitious machine (a Virtual Machine, abbreviated VM).

For writing a tiny compiler, a VM is very convenient:

  • You (the compiler creator) can model and adapt it to your needs.
  • You don't depend on hardware.
  • You don't need to rewrite the code generation for each new hardware.
  • You can make the VM running on many hardwares (hence the success of the JVM and .NET around year 2000).


The HAC VM and its users are enjoying all these advantages.

However, there is a flip side:

  • A VM is much slower than a real machine (but till year 2000, it didn't matter; you could say: don't worry, just wait a few months for a more powerful computer).
  • Since year 2000, the speed of microprocessors stopped doubling every six months (hence the declining interest in Virtual Machines). Despite the improvements in cache, RAM, and the multiplication of cores, the per-core performance improvement is slower and slower.
  • Supporting only a VM gives the impression that your compiler can only compile to a VM.


Well, so far, the latter blame was objectively correct regarding HAC until a few weeks ago.
Now it is changing!
We have begun an abstract framework for emitting machine code.
With that framework, HAC can, on demand, emit code for its captive VM or for any implemented machine.
So you read well, it is not a just-in-time compiler translating VM instructions to native ones.
We are implementing a direct Ada-to-native compiler - and cross-compiler, since HAC doesn't know on which machine it is running!

The framework looks like this:

with HAC_Sys.Defs;

package HAC_Sys.Targets is

  type Machine is limited interface;

  type Abstract_Machine_Reference is access Machine'Class;

  --------------------
  --  Informations  --
  --------------------

  function Name (m : Machine) return String is abstract;
  function CPU (m : Machine) return String is abstract;
  function OS (m : Machine) return String is abstract;
  function Null_Terminated_String_Literals (m : Machine) return Boolean is abstract;

  ----------------------------
  --  Machine Instructions  --
  ----------------------------

  procedure Emit_Arithmetic_Binary_Instruction
    (m         : in out Machine;
     operator  :        Defs.Arithmetic_Binary_Operator;
     base_typ  :        Defs.Numeric_Typ) is abstract;

...


The code emission in the compiler is being changed (very slowly, be patient!) for going through this new abstract machine mechanism.
So far we have two implementations:

  • The HAC VM - that way, we ensure HAC-for-HAC-VM works exactly as previously and can pass the test suite.
  • A real target: the AMD64, running under Windows; the machine code is emitted in Assembler form, for the Flat Assembler (FASM).


The development for multiple targets is embryonic so far.
However, we can already compile a "hello world"-style program:

with HAT;

procedure Native is
  use HAT;
  a : Integer;
begin
  --  a := 1;  --  Variables: TBD.
  Put_Line ("Hello ...");
  Put_Line ("... world!");
  Put_Line (12340 + 5);
  Put_Line (12350 - 5);
  Put_Line (2469 * 5);
  Put_Line (61725 / 5);
end Native;


With the command:
hac -tamd64_windows_console_fasm native.adb

HAC produces this:

;  Assembler file for the Flat Assembler - https://flatassembler.net/

format PE64 console
entry _start
include 'include\win64a.inc'

section '.code' code readable executable

_start:
         push                9
         push                1
         push                -1
         push                -1
         pop                 r14
         pop                 r13
         pop                 r12
         pop                 r11
         add                 r12, _hac_strings_pool
         ccall               [printf], r12
         ccall               [printf], _hac_end_of_line
         push                10
         push                11
         push                -1
         push                -1
         pop                 r14
         pop                 r13
         pop                 r12
         pop                 r11
         add                 r12, _hac_strings_pool
         ccall               [printf], r12
         ccall               [printf], _hac_end_of_line
         push                12340
         push                5
         pop                 r11
         pop                 rax
         add                 rax, r11
         push                rax
         push                20
         push                10
         push                -1
         pop                 r14
         pop                 r13
         pop                 r12
         pop                 r11
         ccall               [printf], _hac_decimal_format, r11
         ccall               [printf], _hac_end_of_line
         push                12350
         push                5
         pop                 r11
         pop                 rax
         sub                 rax, r11
         push                rax
         push                20
         push                10
         push                -1
         pop                 r14
         pop                 r13
         pop                 r12
         pop                 r11
         ccall               [printf], _hac_decimal_format, r11
         ccall               [printf], _hac_end_of_line
         push                2469
         push                5
         pop                 r11
         pop                 rax
         imul                rax, r11
         push                rax
         push                20
         push                10
         push                -1
         pop                 r14
         pop                 r13
         pop                 r12
         pop                 r11
         ccall               [printf], _hac_decimal_format, r11
         ccall               [printf], _hac_end_of_line
         push                61725
         push                5
         pop                 r11
         pop                 rax
         xor                 rdx, rdx
         idiv                r11
         push                rax
         push                20
         push                10
         push                -1
         pop                 r14
         pop                 r13
         pop                 r12
         pop                 r11
         ccall               [printf], _hac_decimal_format, r11
         ccall               [printf], _hac_end_of_line
         stdcall             [ExitProcess],0

section '.data' data readable writeable
_hac_end_of_line  db 10, 0
_hac_decimal_format  db "%d", 0
_hac_strings_pool db "XHello ...", \
    0, "... world!", 0

section '.idata' import data readable
library kernel,'kernel32.dll',\
        msvcrt,'msvcrt.dll'
import  kernel,\
        ExitProcess,'ExitProcess'
import  msvcrt,\
        printf,'printf'

As you can see, the assembler code needs badly some simplification, but, anyway: it works.
FASM produces from it a relatively small 2048-byte executable which writes

Hello ...
... world!
12345
12345
12345
12345


The executable is full of zeroes, due to alignments. The non-zero bytes (more or less, the actual machine code and data) take 605 bytes.

Some Web links for HAC:

Main URL: https://hacadacompiler.sourceforge.io/
Sources, site #1: HAC Ada Compiler download | SourceForge.net
Sources, site #2: GitHub - zertovitch/hac: HAC Ada Compiler - a small, quick Ada compiler fully in Ada
Alire Crate: Alire - Hac

Alternative to exceptions when attempting to pop an empty stack

 

A recent question on StackOverflow asked about whether or not a mutex is locked twice by the same thread without unlocking the mutex. See https://stackoverflow.com/questions/76890110/c-the-thread-repeatedly-locks-the-same-mutex-multiple-times?noredirect=1#comment135552146_76890110

The C++ source code presented in the question is:

1 #include <exception>
2 #include <memory>
3 #include <mutex>
4 #include <stack>
5
6 struct empty_stack : std::exception
7 {
8    const char* what() const throw();
9 };
10
11 template<typename T>
12 class threadsafe_stack
13 {
14 private:
15    std::stack<T> data;
16    mutable std::mutex m;
17 public:
18    threadsafe_stack(){}
19    threadsafe_stack(const threadsafe_stack& other) {
20       std::lock_guard<std::mutex> lock(other.m);
21       data = other.data;
22 }
23    threadsafe_stack& operator=(const threadsafe_stack& other) = delete;
24    void push(T new_value)
25    {
26       std::lock_guard<std::mutex> lk(m);
27       data.push(std::move(new_value));
28    }
29    std::shared_ptr<T> pop()
30    {
31       std::lock_guard<std::mutex> lock(m);
32       if (data.empty()) throw empty_stack();
33       std::shared_ptr<T> const res(std::make_shared<T>(data.top()));
34       data.pop();
35       return res;
36    }
37    void pop(T& value)
38    {
39       std::lock_guard<std::mutex> lock(m);
40       if (data.empty()) throw empty_stack();
41       *value = data.top();
42       data.pop();
43    }
44    bool empty() const
45    {
46       std::lock_guard<std::mutex> lock(m);
47       return data.empty();
48    }
49 };
 

One issue with this program design, which has nothing to do with the question asked, is throwing an exception when pop is called on an empty stack. An empty stack is not an exceptional situation. In fact the stack starts off as an empty stack when it is created.

Exception handling should not be used to deal with non-exceptional program state.

The following Ada stack implementation demonstrates how an empty stack should be handled in a thread-safe manner.

The Ada program defines a generic thread package with the same operations shown in the C++ example above.

The Ada program creates a generic stack package and a main procedure. Ada packages are separated into two parts, the package specification and the package body or implementation. The package specification defines the package API.

1 with Ada.Containers.Doubly_Linked_Lists;
2 generic
3    type element_type is private;
4 package generic_stack is
5    package List_Pack is new
6       Ada.Containers.Doubly_Linked_Lists (element_type);
7 use List_Pack;
8
9 protected type Stack is
10    procedure Push (Item : element_type);
11    entry Pop (Item : out element_type);
12    function Is_Empty return Boolean;
13 private
14    Buffer : List := Empty_List;
15 end Stack;
16
17 end generic_stack;
 

The package creates an Ada protected type. A protected type or protected object is a passive unit of concurrency which is implicitly protected from race conditions. There are three kinds of methods available to be defined within a protected type. Protected procedures implicitly acquire an unconditional exclusive read-write lock on the data in the protected type. Protected entries implicitly acquire a conditional exclusive read-write lock on the data in the protected type. A task suspends while the entry boundary condition evaluates to false. The suspended tasks are placed in an entry queue so that they can be activated when the boundary condition evaluates to true. The third kind of method available for a protected object is a function. Protected functions are limited to read-only access to the data in the protected type. Protected functions acquire a shared read lock on the data in the protected object.

The implementation of the package is in a separate file containing the package body.

1  package body generic_stack is
2
3     -----------
4     -- Stack --
5     -----------
6
7     protected body Stack is
8
9        ----------
10       -- Push --
11       ----------
12
13       procedure Push (Item : element_type) is
14       begin
15          Buffer.Append (Item);
16       end Push;
17
18       ---------
19       -- Pop --
20       ---------
21
22       entry Pop (Item : out element_type) when
23                  not Buffer.Is_Empty is
24       begin
25          Item := Buffer.Last_Element;
26          Buffer.Delete_Last;
27       end Pop;
28
29       --------------
30       -- Is_Empty --
31       --------------
32
33       function Is_Empty return Boolean is
34       begin
35          return Buffer.Is_Empty;
36       end Is_Empty;
37
38    end Stack;
39
40 end generic_stack;

 

The push procedure implicitly locks the Buffer when it begins executing and unlocks the Buffer as the procedure completes.

The pop entry only executes when Buffer.Is_Empty evaluates to false. When that condition is met any immediate call, or any task waiting in the entry queue is allowed to execute the entry. The default queuing policy of the entry queue is FIFO and all tasks suspended in the entry queue will be executed before any new call can be executed. The shared read-write lock is applied when execution of the entry starts and is release upon completion of the entry.

The Is_Empty function can only execute when no read-write lock is applied to the Buffer data element. Upon starting execution the function applies a shared read lock, allowing any number of tasks to read simultaneously and preventing any procedure or function to execute until the function completes and releases its shared read lock.

The main procedure, which is the program entry point for this example, follows.

1  with Ada.Text_IO; use Ada.Text_IO;
2  with generic_stack;
3
4  procedure Main is
5     package int_stack is new generic_stack (Integer);
6     use int_stack;
7
8     The_Stack : Stack;
9
10    task producer is
11       entry Start;
12    end producer;
13
14    task body producer is
15    begin
16       accept Start;
17       Put_Line ("Producer started.");
18       for I in 1 .. 20 loop
19          The_Stack.Push (I);
20          Put_Line ("Pushed" & I'Image);
21          delay 0.001;
22       end loop;
23    end producer;
24
25    task consumer is
26       entry Start;
27    end consumer;
28
29    task body consumer is
30       Num : Integer;
31    begin
32       accept Start;
33       Put_Line ("Consumer started.");
34       for I in 1 .. 20 loop
35          The_Stack.Pop (Num);
36          Put_Line ("Popped" & Num'Image);
37       end loop;
38    end consumer;
39
40 begin
41    consumer.Start;
42    delay 0.01;
43    producer.Start;
44 end Main;
 

Line 5 creates an instance of the generic_stack package passing the type Integer as the generic parameter. This results in a stack of integer values.

Line 8 declares an instance of the Stack type named The_Stack.

Line 10 declares the interface for the producer task. This task has one entry named Start.

Line 14 declares the implementation of the producer task.

Line 16 accepts the Start entry. The producer task will suspend until another task calls its Start entry. Task entries implement a Rendezvous logic allowing synchronous coordination of tasks. After accepting Start the producer executes a for loop 20 times, each time pushing the current value of the loop control variable onto the stack, displaying a message to standard output and then delaying (sleeping) for 0.001 seconds.

The consumer task declaration begins at line 25. The consumer also has a Start entry.

The consumer task accepts its start entry at line 32 and then pops 20 values off of the stack. The consumer task has no delay statement. The consumer tasks will then complete its iteration before the producer task pushes another value onto the stack. Each call to the pop entry will therefore encounter an empty stack until the producer pushes another value onto the stack.

Both the producer task and the consumer tasks begin execution immediately, but both tasks suspend until their Start entry is called.

Line 40 begins execution of the main procedure which is also the top-level task for the program. Line 41 calls consumer.Start. Line 42 causes the main procedure to delay for 0.01 seconds before line 43 calls producer.start.

This delay in starting the tasks ensures that the consumer must wait at least 0.01 seconds before the first data will be pushed onto the stack. In other words, the stack will be empty for at least 0.01 seconds.

The output of this program is:

 

Consumer started.
Producer started.
Pushed 1
Popped 1
Pushed 2
Popped 2
Pushed 3
Popped 3
Pushed 4
Popped 4
Pushed 5
Popped 5
Pushed 6
Popped 6
Pushed 7
Popped 7
Pushed 8
Popped 8
Pushed 9
Popped 9
Pushed 10
Popped 10
Pushed 11
Popped 11
Pushed 12
Popped 12
Pushed 13
Popped 13
Pushed 14
Popped 14
Pushed 15
Popped 15
Pushed 16
Popped 16
Pushed 17
Popped 17
Pushed 18
Popped 18
Pushed 19
Popped 19
Pushed 20
Popped 20
 

Clearly the program did not encounter any exceptions, yet the consumer could only pop values after they were pushed onto the stack.

  • 13 August 2023 at 05:16

Variant Records used as a return type from search functions

 When searching for a value in an array a programmer is faced with two possibilities.

1.       The value searched for is found and the index of the value in the array is returned.

2.       The value is not found and no array index is returned.

It is common practice to return a scalar value from functions using the C programming language. This tradition goes back to the earliest days of the C language when unspecified function return types were given the int type by default by the compiler.

The C language does not have a concept of exceptions or exception handlers. Instead, the value returned by a function either indicates success or failure.

It is common for positive return values to indicate success and negative or zero values to indicate failure. This pattern was established by the Unix operating system. C was invented to write the Unix operating system.

A search function is often designed to return the index of the array element containing the value corresponding to the target value for which one is searching. This programming pattern is also commonly found in the C++ language.

The following function prototype is written using C++;

int binary_search(string names[], int len, const string target);

The binary_search function returns an int. The first parameter is an array of string elements. The second parameter is the number of elements in the array of strings. The third element is the string value to search for in the array of strings.

The binary_search function returns the index value of the array element equal to the target string or it returns -1 if no array element corresponds to the target string. This behavior creates an abstraction with a strong vulnerability of confusion. The return value serves two purposes. It identifies whether or not the target value was found in the array and it identifies the index value of the array element equaling the target value.

The complete binary_search function is defined as:

int binary_search(string names[], int len, const string target)
{
  int low = 0;
  int high = len - 1;
  int mid = 0;\

  
string item;
  while (low <= high) {
    mid = (low + high) / 2;
    item = names[mid];
    if (item == target) {
      return mid;
    }
    if (strcmp(target.c_str(), item.c_str()) < 0) {
      high = mid - 1;
    } else
      low = mid + 1;
  }
  return -1;
}

 

Logically whether or not the value was found is separate from the index value of the found element. There should be no index value for a found element when no element is found. The function allows the programmer to make the mistake of trying to reference an array element before the beginning of the array. In C and C++ doing this is undefined behavior, and is always wrong.

Avoiding undefined behavior

The C and C++ languages provide no assistance in the effort to avoid undefined behaviors. The compilers are not required to check for undefined behavior because the programmer is burdened with the responsibility of avoiding undefined behavior.

The Ada programming language provides a useful data structure known as a variant record, which C and C++ programmers may think is similar to a union. C and C++ unions provide different views of a common memory area so that, for instance, a number might be viewed as an int or a float. Ada variant records are different.

Every Ada variant record has a publicly visible discriminant field, even if the type is opaque. That discriminant field controls the contents, not just the view, of the record. The following example is used as a return type for a search function.

   type Result_T (found : Boolean) is record
      case found is
         when True =>
            Index : Natural;
         when False =>
            null;
      end case;
   end record;

The discriminant field in this example is named “found”. Found is a Boolean value. Every instance of Result_T contains a “found” field. The case statement within the record definition declares a second field named Index, which is of type Natural when and only when found is True. When found is False the record type contains no additional field. It only contains the found field.

This data structure separates the responsibility of identifying whether or not the target value was found from identifying the index value of the element equaling the target value.

The complete Ada version of the binary_search function is

   function binary_search
    
(Names : in str_arr; Target : Unbounded_String) return            Result_T is
      low  : Natural := Names'First;
      High : Natural := Names'Last;
      Mid  : Natural;
      Item : Unbounded_String;
   begin
      while low <= High loop
         Mid  := (High - low) / 2 + low;
         Item := Names (Mid);
         if Item = Target then
            return (found => True, Index => Mid);
         end if;
         if Target < Item then
            High := Mid - 1;
         else
            low := Mid + 1;
         end if;
      end loop;
      return (found => False);
   end binary_search;

If the item corresponding to Target is found the return value contains both a found and an Index field.

            return (found => True, Index => Mid);

If no item corresponding to Target is found the return value only contains the found field.

      return (found => False);

Since no Index value exists when found equals False, no undefined behavior can occur. The Ada runtime raises the pre-defined exception CONSTRAINT_ERROR along with the message “discriminant check failed” if the programmer attempts to reference the Index field when found equals False.

The proper way to deal with a variant record such as this returned by a function is:

   Result : Result_T := binary_search (Names,                                                      To_Unbounded_String ("ali2"));
begin
 if Result.found then
      Put_Line ("The given index is " & Result.Index'Image);
      Put_Line ("The found name is " & To_String (Names                                                      (Result.Index)));
 else
    Put_Line ("String not found!");
 end if;

This is similar to the proper way to respond to the return value from the C or C++ function. The difference is that when the return value is not evaluated the C or C++ program will not raise an exception. Instead it will simply exhibit an undefined behavior.
  • 8 August 2023 at 02:00

Array Handling

 

Arrays in both C and Ada are a compound type with the elements arranged sequentially in memory. All the elements in an array are of a single type. For instance an array may contain integer elements or it may contain floating point elements, or it may contain strings of characters, or it may even contain other arrays.

One might therefore assume that arrays in C and Ada are fundamentally the same. This article explores that assumption and finds some clear differences between C arrays and Ada arrays.

Array Characteristic

C language

Ada Language

Array types

C only allows definition of the type of an array element. It does not allow declaration of an array type.

Every Ada array is a member of a type, even if it is an anonymous type.

Index range

Valid C array indices always start at 0. The C language provides no implicit checking for referencing invalid array indices.

Ada array indices may begin at any programmer-chosen scalar value and end at any programmer-chosen scalar value. Ada compilers validate static array index references and by default the Ada runtime checks array index values during program execution.

Array declaration

C arrays are declared by specifying the element type, the array name, and the number of elements in the array.

Ada arrays are declared by specifying the array name, the array index range and the array element type.

Array size

C array sizes can only be calculated using the sizeof operator within the scope in which the array is first declared. Passing an array to a function results in only the address of the first array element being passed to the function. The programmer must pass another parameter to the function indicating the size of the array. The compiler cannot ensure the size parameter is correct.

Ada array attributes are available wherever the array is visible. Ada array attributes include:

  •      ‘Size – The number of bits the array occupies in memory.
  •        ‘Length – The number of elements in the array.
  •         ‘First – The first valid index value for the array.
  •      ‘Last – The last valid index value for the array
  •      ‘Range – The range specified by ‘First .. ‘Last

 

Array slicing

C does not provide a facility for array slicing.

Ada provides the ability to define a slice of an array.

Array assignment

C does not allow the programmer to copy one array to another using a single assignment statement.

Ada allow arrays to be copied using a single assignment statement.

The impact of these differences is demonstrated by reviewing the Merge-Sort algorithm implemented in both languages.

The first example is a C implementation of the Merge-Sort algorithm.

1 /*     
2 * C Program to Perform Merge Sort using Recursion and Functions
3 */
4
5 #include <stdio.h>
6 #include <stdlib.h>
7
8 // merge function
9 void Merge(int arr[], int left, int mid, int right)
10 {
11    int i, j, k;
12    int size1 = mid - left + 1;
13    int size2 = right - mid;
14
15    // created temporary array
16    int Left[size1], Right[size2];
17
18    // copying the data from arr to temporary array
19    for (i = 0; i < size1; i++)
20        Left[i] = arr[left + i];
21
22    for (j = 0; j < size2; j++)
23       Right[j] = arr[mid + 1 + j];
24
25    // merging of the array
26    i = 0; // intital index of first subarray
27    j = 0; // inital index of second subarray
28    k = left; // initial index of parent array
29    while (i < size1 && j < size2)
30    {
31        if (Left[i] <= Right[j])
32        {
33            arr[k] = Left[i];
34            i++;
35        }
36        else
37        {
38            arr[k] = Right[j];
39            j++;
40        }
41        k++;
42    }
43
44    // copying the elements from Left[], if any
45    while (i < size1)
46    {
47        arr[k] = Left[i];
48        i++;
49        k++;
50    }
51
52    // copying the elements from Right[], if any
53    while (j < size2)
54    {
55        arr[k] = Right[j];
56        j++;
57        k++;
58    }
59 }
60
61 //merge sort function
62 void Merge_Sort(int arr[], int left, int right)
63 {
64    if (left < right)
65    {
66
67        int mid = left + (right - left) / 2;
68
69        // recursive calling of merge_sort
70        Merge_Sort(arr, left, mid);
71        Merge_Sort(arr, mid + 1, right);
72
73        Merge(arr, left, mid, right);
74    }
75 }
76
77 // driver code
78 int main()
79 {
80     int size;
81     printf("Enter the size: ");
82     scanf("%d", &size);
83
84     int arr[size];
85     printf("Enter the elements of array: ");
86     for (int i = 0; i < size; i++)
87     {
88         scanf("%d", &arr[i]);
89     }
90
91     Merge_Sort(arr, 0, size - 1);
92
93     printf("The sorted array is: ");
94     for (int i = 0; i < size; i++)
95     {
96         printf("%d ", arr[i]);
97     }
98     printf("\n");
99     return 0;
100 } 

Both the Merge_Sort function and the Merge function take several parameters.

 

62 void Merge_Sort(int arr[], int left, int right)
9 void Merge(int arr[], int left, int mid, int right) 

The first parameter in each function specifies an array of int elements. The remaining parameters specify various int values. These extra parameters are needed in C because all array indices must begin at 0, the size of the array parameter is not passed with the array, and C does not provide array slicing.

The C programmer must provide all this information as additional function parameters.

The C language does not provide assignment of one array to another. The programmer must explicitly create a loop and assign each element one at a time:

18    // copying the data from arr to temporary array
19    for (i = 0; i < size1; i++)
20        Left[i] = arr[left + i];
21
22    for (j = 0; j < size2; j++)
23       Right[j] = arr[mid + 1 + j];
 

Each of these examples is a demonstration of the low-level terseness of C. Each of these examples shows how much more writing must be done in C to achieve what Ada does very simply.

The Ada version of the Merge-Sort algorithm is implemented in a generic Ada package so that the sort procedure declared within the package specification can be used to sort any mutable data type.

The Ada solution is done using three filles.

1 generic
2    type element_type is private;
3    type array_type is array (Integer range <>) of element_type;
4    with function "<" (left, right : element_type) return Boolean is <>;
5 package generic_merge_sort is
6    procedure sort (Item : in out array_type);
7 end generic_merge_sort;

The generic package specification requires three parameters when an instance of the package is declared. The first parameter is the name of the element type for the array which will be sorted. The second parameter is the name of the array type to be sorted. This parameter is restricted to an unconstrained array type indexed by Integer values. Each element of the array type must be the type passed to the first parameter. The third parameter is a possible overloading of the “<” function. This is an optional parameter. If the element_type already has a “<” function defined that function will be used by default.

The procedure sort is defined to take one parameter with mode “in out”, meaning the parameter value(s) will be used and possibly modified within the procedure. All modifications will be visible to the calling scope.

The package body contains the logic used to implement the merge-sort algorithm.

1 package body generic_merge_sort is
2    procedure merge (Item : in out array_type) is
3       mid   : Integer    := Item'First + (Item'Last - Item'First) / 2;
4       Left  : array_type := Item (Item'First .. mid);
5       Right : array_type := Item (mid + 1 .. Item'Last);
6       I     : Integer    := Left'First;
7       J     : Integer    := Right'First;
8       K     : Integer    := Item'First;
9    begin
10      while I <= Left'Last and then J <= Right'Last loop
11         if Left (I) < Right (J) then
12            Item (K) := Left (I);
13            I        := I + 1;
14         else
15            Item (K) := Right (J);
16            J        := J + 1;
17         end if;
18         K := K + 1;
19      end loop;
20      -- copying unused items from Left
21      while I <= Left'Last loop
22         Item (K) := Left (I);
23         I        := I + 1;
24         K        := K + 1;
25      end loop;
26      -- copying unused items from right
27      while J <= Right'Last loop
28         Item (K) := Right (J);
29         J        := J + 1;
30         K        := K + 1;
31      end loop;
32   end merge;
33
34   ----------
35   -- sort --
36   ----------
37
38   procedure sort (Item : in out array_type) is
39      mid : Integer;
40   begin
41      if Item'Length > 1 then
42         mid := Item'First + (Item'Last - Item'First)/2;
43         sort(Item(Item'First .. Mid));
44         sort(Item(Mid + 1 .. Item'Last));
45         merge(Item);
46      end if;
47   end sort;
48
49 end generic_merge_sort;

This example make extensive use of Ada array attributes. The value of mid in both the merge procedure and the sort procedure is calculated using the ‘First and ‘Last array attributes, simplifying the procedure signature for both sort and merge. Both procedures only need an instance or slice of an array passed to them.

In the merge procedure the Left and Right instances of array_type are created and initialized with slices of the array Item passed to the procedure. Ada’s ability to assign one array or slice to another array or slice of the same type eliminates the need to explicitly calculate the sizes needed for the Left and Right arrays. It also eliminates the need to explicitly copy the values iteratively.

4       Left  : array_type := Item (Item'First .. mid);
5       Right : array_type := Item (mid + 1 .. Item'Last);

For reference, here is the corresponding C code:

12    int size1 = mid - left + 1;
13    int size2 = right - mid;
14
15    // created temporary array
16    int Left[size1], Right[size2];
17
18    // copying the data from arr to temporary array
19    for (i = 0; i < size1; i++)
20        Left[i] = arr[left + i];
21
22    for (j = 0; j < size2; j++)
23       Right[j] = arr[mid + 1 + j];

A comparison of the C merge_sort function and the Ada sort procedure provides more insight to the differences between C arrays and Ada arrays.

C merge_sort:

62 void Merge_Sort(int arr[], int left, int right)
63 {
64    if (left < right)
65    {
66
67        int mid = left + (right - left) / 2;
68
69        // recursive calling of merge_sort
70        Merge_Sort(arr, left, mid);
71        Merge_Sort(arr, mid + 1, right);
72
73        Merge(arr, left, mid, right);
74    }
75 }

Ada sort:

38   procedure sort (Item : in out array_type) is
39      mid : Integer;
40   begin
41      if Item'Length > 1 then
42         mid := Item'First + (Item'Last - Item'First)/2;
43         sort(Item(Item'First .. Mid));
44         sort(Item(Mid + 1 .. Item'Last));
45         merge(Item);
46      end if;
47   end sort;

The condition in the “if” statement, while achieving the same result, is quite different. The C function relies upon the correctness of the left and right parameters passed as the second and third arguments to the function. The Ada procedure only relies upon the ‘Length attribute of the array. The Ada syntax is much less error-prone and a more direct statement to a human reader about the intention of the conditional statement. In order to get a complete understanding of the purposes of the C left and right parameters one must read the context of the main procedure, while the meaning of the Ada conditional is completely clear in the local context. Since the sort parameter type is an unconstrained array type every slice of the array is also an instance of the unconstrained array type.

The C Merge_Sort function recursively calls itself passing the beginning of the arr parameter each time but with different values for left and right.

The Ada sort function recursively calls itself passing slices of the parameter Item to each recursive call. In this manner each recursion of Sort only sees the slice passed to it and all the other elements of the array initially passed to the sort procedure from main are not available to the recursive call.

The C Merge procedure passes the full array, but also passes three parameters containing the left, mid and right index positions in the array. The Ada merge procedure only array slice passed to it by the sort procedure.

The main function in C is:

78 int main()
79 {
80     int size;
81     printf("Enter the size: ");
82     scanf("%d", &size);
83
84     int arr[size];
85     printf("Enter the elements of array: ");
86     for (int i = 0; i < size; i++)
87     {
88         scanf("%d", &arr[i]);
89     }
90
91     Merge_Sort(arr, 0, size - 1);
92
93     printf("The sorted array is: ");
94     for (int i = 0; i < size; i++)
95     {
96         printf("%d ", arr[i]);
97     }
98     printf("\n");
99     return 0;
100 }

The main procedure in Ada is:

1 with Ada.Text_IO;         use Ada.Text_IO;
2 with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;
3 with generic_merge_sort;
4
5 procedure Main is
6    type int_arr is array (Integer range <>) of Integer;
7    package int_sort is new generic_merge_sort
8           (element_type => Integer, array_type => int_arr);
9    use int_sort;
10   Num_Elements : Positive;
11 begin
12   Put ("Enter the size of the array: ");
13   Get (Num_Elements);
14   declare
15      the_array : int_arr (1 .. Num_Elements);
16   begin
17      Put_Line ("Enter the array values:");
18      for value of the_array loop
19         Get (value);
20      end loop;
21      sort (the_array);
22      Put_Line ("The sorted array is:");
23      for value of the_array loop
24         Put (value'Image & " ");
25      end loop;
26      New_Line;
27   end;
28 end Main;

Lines 6 through 9 of the Ada program are needed to create an instance of the generic_merge_sort package.

6    type int_arr is array (Integer range <>) of Integer;
7    package int_sort is new generic_merge_sort
8           (element_type => Integer, array_type => int_arr);
9    use int_sort;

An inner block is declared so that an array of the size entered by the user can be created. The rest of the program is performed in this inner block.

14   declare
15      the_array : int_arr (1 .. Num_Elements);
16   begin
17      Put_Line ("Enter the array values:");
18      for value of the_array loop
19         Get (value);
20      end loop;
21      sort (the_array);
22      Put_Line ("The sorted array is:");
23      for value of the_array loop
24         Put (value'Image & " ");
25      end loop;
26      New_Line;
27   end;

The for loop used to read all the values into the array starting at line 18 is an Ada iterator loop. The loop parameter “value” becomes an alias for each array element starting at the first element and ending at the last element. Thus, each value entered by the user is placed directly into the array without explicit index notation.

The array is sorted.

21      sort (the_array);

Finally, the sorted array is output to standard output and the program ends.

Conclusion:

The C and Ada examples implement the same merge-sort algorithm to sort an array of integer values. C arrays are more primitive abstractions than are Ada arrays. The more primitive abstractions require more lines of C source code than does the Ada implementation. The C implementation also requires a more complex parameter list for both its Merge function and its Merge_Sort function. The Ada implementation concentrates on merely sorting an array without additional parameters.

The built-in characteristics of Ada arrays really do simplify the manipulation of arrays, even when those arrays are passed to an Ada function or procedure.

I learned long ago that complexity is the enemy of correctness. In these examples we see that lower complexity also provides fewer opportunities for programmer error. We also see that a programming language with a low level syntax is not necessarily simpler than a programming language with a higher level syntax. Terse is not always simpler. Terse does not always result in less typing for the programmer.

  • 5 August 2023 at 20:36

Your Amazon Affiliate Account has been closed

29 July 2023 at 00:00

Back in 2015, I wanted to build a NAS for backups at home. I found the shopping experience for hard drives on Amazon extremely frustrating. There is no sort option for Price per $ATTRIBUTE. You can only view 10 search results at a time. The first dozen results are sponsored listings. Many products are available through third party sellers on Amazon’s marketplace at prices below what’s shown in the “Buy Box.” but you’ll never see these prices unless you click through to every product.

I decided to write a Python script to find all of the hard drives on Amazon, figure out their capacity, then sort by price per GB. I found the Amazon Product Advertising API, which allows you to query their product catalog and get results back as XML (later JSON). You need an access key to use the API which is only available through the Amazon Affiliates program. So I signed up.

The idea behind the affiliates program is that you’ll generate links to Amazon with a tracking ID and get people to click on them and buy stuff. I think Amazon’s original expectation of this is that bloggers and journalists would review products and refer people to Amazon to buy them. In an ideal world, this is beneficial for everyone; merchants get more sales, Amazon gets more revenue, and affiliates get a few percent commission off the top.

Amazon won’t give new affiliates access to the Product Advertising API until they’ve referred at least three sales. So my Python script turned into a diskprices.com. I built a list of hard drives in a spreadsheet and sorted by price per GB. A few people clicked my links. Armed with an API key, I was able to programmatically generate the listing and keep it up to date. I built spam filters and moderation tools so I could remove fake products and fix errors in the listings where I found them.

More people started visiting my site and I signed up for Amazon’s affiliate program in other countries. Amazon treats each country as a separate business, so you have to do the three initial sales for each region separately.

The site kept growing in both traffic and commission income. I’ve been running diskprices.com for nearly eight years, reviewed and approved by the affiliate program administrators several times.

The problem

Amazon has automated tools that crawl affiliate sites looking for violations of their Operating Agreement. Sometimes, people will find a good deal on diskprices.com and copy the link to a forum or chat room to share with their friends. If one of Amazon’s bots finds a link with my affiliate tag on a site other than diskprices.com, they assume I’m doing something shady and flag my account. This generates a form email.

The warning email

Created at Fri, Jun 30, 2023 at 9:55 AM
From Amazon Associates <[email protected]>
To [email protected]
Subject ACTION REQUIRED – Your Amazon Affiliate Account - diskprices0a-22

Hello diskprices0a-22, Your Affiliate account is at risk of closure. Why? We reviewed your account as part of our ongoing monitoring of the Amazon Affiliate Programme. During our review, we determined that you are not in compliance with the Operating Agreement, found here: https://affiliate-program.amazon.com.au/help/operating/agreement.

While reviewing your account we noticed traffic coming to Amazon from one or more sites that are currently not included in your Associates account. To understand how your Special Links are used by customers to get to the Amazon site, we request you to provide a detailed description of: • A list of the Sites, Social media handles, Mobile applications, link sharing sites (like linktree.com, campsite.bio) on which your Special Links or banner ads are posted, • Methods you are using to get customers to your website/social media page, • Live links to your Sites, (provide links to Amazon ads/special links displayed on your website/webpage) • Any other ways in which your Special Links are used by customers to get to the Amazon site (like advertising services), Please ensure that the list of sites associated with your account is complete, accurate, and up-to-date. Please do refer to the Operating Agreement for more details.

What’s next? Within five business days please correct the violations and notify us when you are in compliance through our customer service contact page: https://affiliate-program.amazon.com.au/home/contact. Please choose the subject ‘Warning/Information Request Response’ from the drop-down menu, and be sure to reference Issue Code 83441-AU in the comments field. If you do not respond within five business days, we may close your Affiliate account and withhold commissions. More information. For more information about what we’re looking for in your response, see our Warning help content, found here: https://affiliate-program.amazon.com.au/help/node/topic/GPPXH5WSX22AUNKR. We look forward to hearing from you soon. Amazon.com.au

The response

Created at Fri, Jun 30, 2023 at 10:18 AM
From Jeremy Grosser <[email protected]>
To Amazon.com.au Affiliate Program (replied via web form)
Subject Re: ACTION REQUIRED – Your Amazon Affiliate Account - diskprices0a-22

Issue Code 83441-AU

• A list of the Sites, Social media handles, Mobile applications, link sharing sites (like linktree.com, campsite.bio) on which your Special Links or banner ads are posted,

I only post links on diskprices.com. If my links have been posted elsewhere, it’s because people have copy/pasted them from diskprices.com. I have no control over this.

• Methods you are using to get customers to your website/social media page,

I do not promote my site, inbound traffic to diskprices.com is organic. Here’s a list of referrers reported by browsers in the last 24 hours for all of diskprices.com, not limited to the amazon.com.au section. For the social media sites (reddit, fark, etc), I did not create those posts and have no relationship with the users that posted them.

Redacted list of referrer URLs, including forums.overclockers.com.au

• Live links to your Sites, (provide links to Amazon ads/special links displayed on your website/webpage)

All links to amazon.com.au originate from https://diskprices.com/?locale=au

• Any other ways in which your Special Links are used by customers to get to the Amazon site (like advertising services),

I do not pay for advertising, search traffic, or SEO services. I just built a website that some people find useful and share with their friends.

The waiting

I’ve had similar exchanges with the affiliate programs for Amazon.com, Amazon.co.uk, Amazon.ca, Amazon.it, Amazon.de, Amazon.fr, Amazon.es, and Amazon.se. They always wait 10 days after sending the warning email to review your response. After 10 days, the flag on your account either goes away with no further communication, or your account is deleted.

I didn’t hear anything after 10 days, so I thought I was safe and had provided sufficient evidence to prove that I’m not violating the Operating Agreement.

One month later…

The bad news

Created at: Sat, Jul 29, 2023 at 5:20 AM
From: Amazon Associates <[email protected]>
To: [email protected]
Subject: Your Amazon Affiliate Account has been closed - diskprices0a-22

Hello diskprices0a-22, Effective today, Amazon is terminating your Affiliate account as well as the Operating Agreement that governs it (link below). Why? We reviewed your account as part of our ongoing monitoring of the Amazon Affiliate Programme. During our review, we determined that you are not in compliance with the Operating Agreement, found here: https://affiliate-program.amazon.com.au/help/operating/agreement. The violations include the following:

You are referring traffic from a site that is not your Site. An example can be found here: https://forums.overclockers.com.au/ While reviewing your account we noticed traffic coming to Amazon from one or more site/social media handle/mobile application that are currently not included in your Associates account. Please ensure that the list of site/social media handle/mobile application associated with your account is complete, accurate, and up-to-date. Please do refer to the Operating Agreement for more details https://affiliate-program.amazon.com.au/help/operating/policies We previously issued you a non-compliance warning regarding your Associates account. You did not respond or come into compliance within the specified time.

What’s next? You must stop using all Amazon Program Content (for example: Images, trademarks and other information in connection with the Associates Program) and promptly remove all links to the Amazon site. Because you are not in compliance with the Operating Agreement, Amazon will not pay you any outstanding commission income. Please be aware that any other related accounts may be closed without payment of any commissions. Amazon reserves all other rights and claims. In limited cases this closure may be appealable. See Appeals help for more information, found here: https://affiliate-program.amazon.com.au/help/node/topic/GACDBRFKVDTXSPTH. Amazon.com.au

The point

Amazon.com.au approved of my account for the last four years and has paid significant commissions. I don’t even have an account on the forum they claim I’m spamming with affiliate links.

diskprices.com no longer serves data from amazon.com.au

Threads of Confusion

Many programming languages support concurrent behavior through the creation of threads and the explicit manipulation of semaphores, locks or mutexes. The C language seems to have initiated this approach to handling concurrency.

C example

The following example of using a mutex with thread comes from C Program to Show Thread Interface and Memory Consistency Errors – GeeksforGeeks

1       // C program to use a mutex to avoid memory consistency

2       // errors

3       #include <pthread.h>

4       #include <stdio.h>

5       #include <stdlib.h>

6

7       // Global variable that will be shared among threads

8       int shared_counter = 0;

9

10       // Mutex to protect the shared counter

11       pthread_mutex_t shared_counter_mutex

12       = PTHREAD_MUTEX_INITIALIZER;

13

14       // Function that will be executed by each thread

15       void* thread_function(void* thread_id)

16       {

17               // Get the thread ID

18               long tid = (long)thread_id;

19

20               // Lock the mutex to protect the shared counter

21               pthread_mutex_lock(&shared_counter_mutex);

22

23               // Increment the shared counter

24               shared_counter++;

25

26               // Print the thread ID and the updated value of the

27               // shared counter

28               printf("Thread %ld: shared_counter = %d\n", tid,

29                             shared_counter);

30

31               // Unlock the mutex

32               pthread_mutex_unlock(&shared_counter_mutex);

33

34               // Return NULL to indicate successful execution of the

35               // thread

36               return NULL;

37   }

38

39       int main(int argc, char* argv[])

40       {

41               // Check if the number of arguments is correct

42               if (argc != 2) {

43                       printf("Usage: %s <number_of_threads>\n", argv[0]);

44                       exit(EXIT_FAILURE);

45       }

46

47               // Get the number of threads to create from the command

48               // line arguments

49               int num_threads = atoi(argv[1]);

50

51               // Create an array of pthread_t structures to store the

52               // thread IDs

53               pthread_t* threads = (pthread_t*)malloc(

54                       num_threads * sizeof(pthread_t));

55

56               // Create the specified number of threads

57               for (int i = 0; i < num_threads; i++) {

58                       int status = pthread_create(

59                               &threads[i], NULL, thread_function, (void*)i);

60                       if (status != 0) {

61                               printf("Error: pthread_create() returned error "

62                                             "code %d\n",

63                                             status);

64                               exit(EXIT_FAILURE);

65           }

66       }

67

68               // Wait for all threads to finish execution

69               for (int i = 0; i < num_threads; i++) {

70                       int status = pthread_join(threads[i], NULL);

71                       if (status != 0) {

72                               printf("Error: pthread_join() returned error "

73                                             "code %d\n",

74                                             status);

75                               exit(EXIT_FAILURE);

76           }

77       }

78

79               // Free the memory allocated for the thread IDs

80               free(threads);

81

82               // Print the final value of the shared counter

83               printf("Final value of shared_counter: %d\n",

84                             shared_counter);

85

86               // Return success

87               return 0;

88   }

The C threading model uses the pthread library to create threads. The behavior of each created thread is controlled by the function passed to the thread as part of the pthread_create function parameter list. First an array of pthread_t structures is created:

53               pthread_t* threads = (pthread_t*)malloc(

54                       num_threads * sizeof(pthread_t));

Next, the Id number of each thread is created as the function named thread_function is passed to each element of the thread as a parameter to the pthread_create function:

57               for (int i = 0; i < num_threads; i++) {

58                       int status = pthread_create(

59                               &threads[i], NULL, thread_function, (void*)i);

60                       if (status != 0) {

61                               printf("Error: pthread_create() returned error "

62                                             "code %d\n",

63                                             status);

64                               exit(EXIT_FAILURE);

65           }

66       }

The third parameter to the pthread_create command is the function passed to the newly created thread. The fourth argument is the parameter passed to the function passed to the thread. Not only is the thread function parameter passed as the fourth argument to the pthread_create function, it is also cast to a pointer to void. This bit of syntax can be confusing because the value being passed is not a pointer, but rather an int.

Let's now look inside the function named thread_function, but before that, let's look at the creation of the pthread_mutex used to control access to the shared counter.

10       // Mutex to protect the shared counter

11       pthread_mutex_t shared_counter_mutex

12       = PTHREAD_MUTEX_INITIALIZER;

Yes, the pthread_mutex_t instance used to control access to the shared counter is itself a shared instance of a type. You might also see that the shared_counter_mutex is only loosely connected to the shared_counter integer variable. That loose connection is a voluntary connection not enforced by any syntax in the C pthreads library.

The thread_function is defined as:

14       // Function that will be executed by each thread

15       void* thread_function(void* thread_id)

16       {

17               // Get the thread ID

18               long tid = (long)thread_id;

19

20               // Lock the mutex to protect the shared counter

21               pthread_mutex_lock(&shared_counter_mutex);

22

23               // Increment the shared counter

24               shared_counter++;

25

26               // Print the thread ID and the updated value of the

27               // shared counter

28               printf("Thread %ld: shared_counter = %d\n", tid,

29                             shared_counter);

30

31               // Unlock the mutex

32               pthread_mutex_unlock(&shared_counter_mutex);

33

34               // Return NULL to indicate successful execution of the

35               // thread

36               return NULL;

37   }

The thread function must explicitly lock the shared_counter mutex, then increment the shared_counter, then output the current value of the shared_counter and finally unlock the shared_counter_mutex.

A major source of confusion

The order of the locking modifying and unlocking the mutex is critical. The shared_counter knows nothing of these locks. The pthread_mutex_t has no syntactical connection to the shared_counter. On the other hand, failing to lock, perform operations and then unlock the mutex will result in semantic failures which prevent the mutex from properly locking the shared_counter and then unlocking the shared counter after the operations are completed. Even worse, there is no syntax in the C pthread library prohibiting a thread from simply accessing the shared_counter while completely ignoring the mutex lock and unlock behaviors.

Further program issues

The remainder of the program calls pthread_join, forcing the main thread to wait until all the pthreads have completed before continuing to execute its sequence of instructions.

68               // Wait for all threads to finish execution

69               for (int i = 0; i < num_threads; i++) {

70                       int status = pthread_join(threads[i], NULL);

71                       if (status != 0) {

72                               printf("Error: pthread_join() returned error "

73                                             "code %d\n",

74                                             status);

75                               exit(EXIT_FAILURE);

76           }

77       }

It is important to understand that the pthread_join function called for each thread can fail and return failure status.

Once the pthreads have completed the program frees all the pointers in the array of pointers to pthread_t that were created to create the array of threads. This is done because it is always correct to free dynamically allocated memory when that memory is no longer needed in the program.

While this action is not particularly confusing, it is yet another detail that should be explicitly implemented.

Ada Example

The Ada programming language has tasking, which is roughly equivalent to threads, built into the core language since the first Ada language standard in 1983. In the 1995 standard protected types were added to the core language, which allow asynchronous communication between tasks. There are no special Ada libraries to implement tasking or protected types.

A protected type or protected object is protected against inappropriate simultaneous access to a shared data structure. A protected type or object is built very much in the philosophy of Object Oriented Programming inasmuch as the protected object has programmer-defined behaviors, however the locking and unlocking of the protected object is performed implicitly by the object itself and not through explicit lock and unlock methods called by the task accessing the protected object.

There are three kinds of protected methods.

  • Protected procedures – Protected procedures control an unconditional exclusive read-write lock on the protected object. The task calling the protected procedure can access the protected object whenever the object is unlocked.

  • Protected entries – Protected entries control a conditional exclusive read-write lock on the protected object. The calling task can only execute the entry when its boundary condition evaluates to True and only then can the task implicitly acquire the exclusive read-write lock.

  • Protected functions – Protected functions control an unconditional shared read lock on the protected object. This allows multiple tasks to simultaneously read from the protected object at the same time, but prevents any task to execute any procedure or entry while tasks are calling protected functions.

The following Ada example creates a protected object implementing the shared counter. The protected object in this example implements one procedure and one function. The source code for this program is:

1       with Ada.Text_IO;         use Ada.Text_IO;

2       with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;

3

4       procedure Main is

5             Num_Tasks : Positive;

6

7             -- protected object shared by all the tasks

8             protected counter is

9                   procedure update (The_Count : out Natural);

10                   function get_value return Natural;

11             private

12                   count : Natural := 0;

13             end counter;

14

15             protected body counter is

16                   procedure update (The_Count : out Natural) is

17                   begin

18                         count     := count + 1;

19                         The_Count := count;

20                   end update;

21                   function get_value return Natural is

22                   begin

23                         return count;

24                   end get_value;

25             end counter;

26

27             -- Define the task type

28             task type thread is

29                   entry set_id (Id : in Positive);

30             end thread;

31

32             task body thread is

33                   Me       : Positive;

34                   My_Count : Natural;

35             begin

36                   -- accept the set_id entry call from the main task

37                   accept set_id (Id : in Positive) do

38                         Me := Id;

39                   end set_id;

40

41                   counter.update (My_Count);

42                   Put_Line ("Task" & Me'Image & ": counter =" & My_Count'Image);

43             end thread;

44

45       begin

46             Put ("Enter the number of tasks to create: ");

47             Get (Num_Tasks);

48

49             -- declare an inner block in which all the tasks will execute

50             -- the inner block will only complete after all tasks have completed

51             declare

52                   -- create an array of thread objects

53                   pool : array (1 .. Num_Tasks) of thread;

54             begin

55                   for I in pool'Range loop

56                         -- set the id number of each thread object

57                         pool (I).set_id (I);

58                   end loop;

59             end;

60

61      -- output the total after all threads have completed

62      Put_Line

63        ("The final value of the shared counter:" &

64         Natural'Image (counter.get_value));

65

66   end Main;

Ada allows functions, procedures, protected objects and task types to be declared within any function, procedure or task type. This means all items declared in this program are completely local to this program and not “static” as are multiple functions created in the same file in C.

Ada variables are declared by stating the name of the variable, followed by a colon, followed by the type of the variable. This is opposite the order of declaring variables in C where the type is listed first followed by the variable name.

5             Num_Tasks : Positive;

The variable name is Num_Tasks. The type is Positive, which is a pre-defined subtype of the Ada Integer type. Integer is equivalent to the C int type. Positive is an Integer with a minimum possible value of 1 and a maximum possible value of Integer'Last, which is the same as MAX_INT in C. This variable is used to contain the number of tasks the user specifies during the execution of the program.

The protected object named counter is declared in two parts. Protected objects are always declared in two parts. The protected specification defines the public view of the protected methods along with a private view of the data elements in the protected object. The protected body contains the definition of the protected methods, and is not visible to any task calling the protected object.

The protected specification for the counter protected object is:

8             protected counter is

9                   procedure update (The_Count : out Natural);

10                   function get_value return Natural;

11             private

12                   count : Natural := 0;

13             end counter;

There are two methods for this protected object. The procedure named update has one parameter which passes a value of type Natural out to the calling task. Natural is a predefined subtype of Integer with a minimum value of 0. The function named get_value returns a value of the subtype Natural.

In the private section of the protected specification one data element is declared. It is a variable named count. The type of the variable is Natural and it is initialized to 0.

The protected body for the counter protected object is:

15             protected body counter is

16                   procedure update (The_Count : out Natural) is

17                   begin

18                         count     := count + 1;

19                         The_Count := count;

20                   end update;

21                   function get_value return Natural is

22                   begin

23                         return count;

24                   end get_value;

25             end counter;

The update procedure increments count and passes its current value out through the parameter named The_Count.

The function get_value simply returns the value of count. The Ada compiler will issue an error message if the programmer attempts to modify the count data member of the protected object within the execution of the get_value function. Functions are read-only methods in a protected object.

The procedure update implicitly implements a read-write lock and the function get_value implicitly implements a read lock.

The task type named thread is defined in two parts. The first part, the task specification, defines that name of the task type and the direct interfaces to the task type. In this case one task entry is defined. That entry is used to set the task ID.

28             task type thread is

29                   entry set_id (Id : in Positive);

30             end thread;

The task entry implements a direct synchronous communication channel to each instance of the thread type. The entry synchronization scheme is called a Rendezvous. The word "rendezvous" is a French word meaning "a meeting at an agreed time and place". A task calling another task's entry is will suspend until the task declaring the entry accepts the entry call. Similarly, a task accepting an entry will suspend until another task calls that entry. Once both the task calling the entry and the task accepting the entry are in this state for an overlapping time period any data specified in the entry specification is passed between the calling task and the called task. Once the entry has completed both tasks continue to execute concurrently.

The behavior of the task type is defined in the task body.

32             task body thread is

33                   Me       : Positive;

34                   My_Count : Natural;

35             begin

36                   -- accept the set_id entry call from the main task

37                   accept set_id (Id : in Positive) do

38                         Me := Id;

39                   end set_id;

40

41                   counter.update (My_Count);

42                   Put_Line ("Task" & Me'Image & ": counter =" & My_Count'Image);

43             end thread;

The task body declared two local variables. Each instance of the task type has unique instances of these two variables.

The first action taken by the thread task is to accept the set_id entry, passing the Id value from a calling task to this task. The accept statement assigns the value of Id to the task's local variable named Me. Task entries implement a Rendezvous behavior. The thread task will wait at the accept statement until its entry is called by another task. Once the value is passed to the thread task both tasks will the resume concurrent behavior.

The thread task calls the protected object's update procedure, using its local variable My_Count to receive the current value of the counter protected object. Finally, the thread task simply outputs the value it received from the counter.update procedure.

The next “begin” begins the execution of the Main task.

45       begin

46             Put ("Enter the number of tasks to create: ");

47             Get (Num_Tasks);

48

49             -- declare an inner block in which all the tasks will execute

50             -- the inner block will only complete after all tasks have completed

51             declare

52                   -- create an array of thread objects

53                   pool : array (1 .. Num_Tasks) of thread;

54             begin

55                   for I in pool'Range loop

56                         -- set the id number of each thread object

57                         pool (I).set_id (I);

58                   end loop;

59             end;

60

61      -- output the total after all threads have completed

62      Put_Line

63        ("The final value of the shared counter:" &

64         Natural'Image (counter.get_value));

65

66   end Main;

The Main task prompts the user for the number of tasks to create and reads the number entered by the user.

An inner block is created starting at the “declare” reserved word. That inner block has a declarative section in which an array of thread tasks, equal in number to the value entered by the user, is created. The tasks in the array named pool begin executing immediately. Their first responsibility is to accept the set_id entry, so each task waits until its set_id entry is called.

The “for” loop iterates through each element in the pool array assigning the element's index value as the ID for the task.

Immediately after accepting its ID number each task executes counter.update and then outputs the value of the counter passed back through the counter.update procedure.

The inner block will only complete when all the tasks created within the block complete. This produces the same effect as the “join” command in the C example.

After the inner block completes the Main task calls the counter.get_value function and displays the final value of the shared counter. Completion of the inner block automatically frees the array of task objects which were created on the stack at the start of the inner block. No explicit loop to free the task elements is needed or even possible.

Conclusion

While the two programs above achieve the same behavior dealing with allowing multiple threads or tasks to update a shared counter, the C solution contains more opportunities for programmer confusion and error. It also requires a lot more coding by the programmer than the Ada solution.

  • 22 July 2023 at 01:49

HAC version 0.26

8 July 2023 at 06:40
Note for subscribers: if you are interested in my Ada programming articles only, you can use this RSS feed link.

Main URL: https://hacadacompiler.sourceforge.io/
Sources, site #1: HAC Ada Compiler download | SourceForge.net
Sources, site #2: GitHub - zertovitch/hac: HAC Ada Compiler - a small, quick Ada compiler fully in Ada
Alire Crate: Alire - Hac

What’s new:

  • You can use a loop’s name for the exit statement: exit Loop_Name.
  • You can exit multiple, nested, loops, by using exit Loop_Name.
  • Ada semantics are better verified:
    • Array indexing (i)(j) (array of array) and (i, j) (multi-dimensional array) are no more treated as equivalent (this feature was a remnant of the Pascal syntax).
    • Separation between Type and Subtype declarations (anonymous types are allowed only in the few cases foreseen by the language).
    • Operators of the HAT package (+, -, & for strings) are visible without prefix only in the scope of a use HAT clause.

Note that correct Ada programs, in relation to the above points, were already accepted and parsed correctly by HAC before that change.

Finally, a bit of cosmetics in the build messages:

C:\Ada\hac\exm>..\hac -v2 -c pkg_demo.adb

*******[ HAC ]*******   HAC is free and open-source. Type "hac" for license.
       [ HAC ]          HAC Ada Compiler version 0.26, 08-Jul-2023
       [ HAC ]          Compiling main: pkg_demo.adb
       [ HAC ]          | Compiling x_pkg_demo_s.ads (specification)
       [ HAC ]           \ Compiling x_pkg_demo_s1.ads (specification)
       [ HAC ]            \ Compiling x_pkg_demo_s11.ads (specification)
       [ HAC ]            /           x_pkg_demo_s11.ads: done.
       [ HAC ]            \ Compiling x_pkg_demo_s12.ads (specification)
       [ HAC ]            /           x_pkg_demo_s12.ads: done.
       [ HAC ]            \ Compiling x_pkg_demo_s13.ads (specification)
       [ HAC ]            /           x_pkg_demo_s13.ads: done.
       [ HAC ]           /           x_pkg_demo_s1.ads: done.
       [ HAC ]           \ Compiling x_pkg_demo_s2.ads (specification)
       [ HAC ]            \ Compiling x_pkg_demo_s21.ads (specification)
       [ HAC ]            /           x_pkg_demo_s21.ads: done.
       [ HAC ]            \ Compiling x_pkg_demo_s22.ads (specification)
       [ HAC ]            /           x_pkg_demo_s22.ads: done.
       [ HAC ]            \ Compiling x_pkg_demo_s23.ads (specification)
       [ HAC ]            /           x_pkg_demo_s23.ads: done.
       [ HAC ]           /           x_pkg_demo_s2.ads: done.
       [ HAC ]           \ Compiling x_pkg_demo_s3.ads (specification)
       [ HAC ]            \ Compiling x_pkg_demo_s31.ads (specification)
       [ HAC ]            /           x_pkg_demo_s31.ads: done.
       [ HAC ]            \ Compiling x_pkg_demo_s32.ads (specification)
       [ HAC ]            /           x_pkg_demo_s32.ads: done.
       [ HAC ]            \ Compiling x_pkg_demo_s33.ads (specification)
       [ HAC ]            /           x_pkg_demo_s33.ads: done.
       [ HAC ]           /           x_pkg_demo_s3.ads: done.
       [ HAC ]          |           x_pkg_demo_s.ads: done.
       [ HAC ]          | Compiling x_pkg_demo_m.ads (specification)
       [ HAC ]          |           x_pkg_demo_m.ads: done.
       [ HAC ]          | Compiling x_pkg_demo_b.ads (specification)
       [ HAC ]          |           x_pkg_demo_b.ads: done.
       [ HAC ]          Compilation of pkg_demo.adb (main) completed
       [ HAC ]          ------  Compilation of eventual with'ed unit's bodies
       [ HAC ]          | Compiling x_pkg_demo_s.adb (body)
       [ HAC ]          |           x_pkg_demo_s.adb: done.
       [ HAC ]          | Compiling x_pkg_demo_s1.adb (body)



Bit Arrays

 

A bit array is an array that compactly stores bits. A bit may represent the values 0 or 1, or it can be viewed as representing the values False or True.

The C language has a somewhat loose interpretation of Boolean values, treating 0 as false and non-zero as true. The Ada programming language defines the Boolean type as the enumeration (False, True). Since that enumeration has only two values the Boolean type has only two position values. False is at position 0 and True is at position 1.

C array indexing is connected to memory addresses. The typical smallest memory addressable value is a byte, which is typically 8 bits. Indexing a bit array, where each bit within a string of bytes has its own index value requires some bit gymnastics.

The following C example is taken from https://www.sanfoundry.com/c-program-implement-bit-array/


Two functions are created.

The function toggle_bit toggles the bit at the index value specified by the function’s second parameter.

The function get_bit returns the numeric value of the bit at the index value specified by the function’s second value.

The output of the program is:

0: 1

1: 0

2: 1

3: 0

4: 1

5: 0

6: 1

7: 0

8: 1

9: 0

10: 1

11: 0

12: 1

13: 0

14: 1

15: 0

16: 1

17: 0

18: 1

19: 0

20: 1

21: 0

22: 1

23: 0

24: 1

25: 0

26: 1

27: 0

28: 1

29: 0

30: 1

31: 0

32: 1

33: 0

34: 1

35: 0

36: 1

37: 0

38: 1

39: 0

40: 1

41: 0

42: 1

43: 0

44: 1

45: 0

46: 1

47: 0

48: 1

49: 0

50: 1

51: 0

52: 1

53: 0

54: 1

55: 0

56: 0

57: 0

 

The Ada language allows the declaration of packed arrays of Boolean to represent each Boolean as a single bit.

Ada allows the programmer to use normal array index notation to access each element of a bit array.

The following Ada implementation of a bit array produces the same behavior as the C program except that the Ada program also displays the number of elements in the array and the number of bits occupied by an instance of the array.

An unconstrained array type named Bit_Array is declared. Each element of Bit_Array is a Boolean. Every instance of Bit_Array is packed, which causes the Boolean values to be represented by single bits.

Similar to the C program, the variable X is declared to be an instance of Bit_Array containing 58 elements. Whereas the C program uses a macro to create the array of bits, the Ada program uses normal Ada array declaration syntax. In C all the elements of the array are initialized to 0. In Ada all elements of the array are initialized to False.

The “while” loop in the Ada program toggles every other element in the array. The element indexed by the value 56 is then toggled again.

The “for” loop prints the Boolean position of each element of the array.

The output of the Ada program is

X contains 58 elements.

X occupies 64 bits

 

 0: 1

 1: 0

 2: 1

 3: 0

 4: 1

 5: 0

 6: 1

 7: 0

 8: 1

 9: 0

 10: 1

 11: 0

 12: 1

 13: 0

 14: 1

 15: 0

 16: 1

 17: 0

 18: 1

 19: 0

 20: 1

 21: 0

 22: 1

 23: 0

 24: 1

 25: 0

 26: 1

 27: 0

 28: 1

 29: 0

 30: 1

 31: 0

 32: 1

 33: 0

 34: 1

 35: 0

 36: 1

 37: 0

 38: 1

 39: 0

 40: 1

 41: 0

 42: 1

 43: 0

 44: 1

 45: 0

 46: 1

 47: 0

 48: 1

 49: 0

 50: 1

 51: 0

 52: 1

 53: 0

 54: 1

 55: 0

 56: 0

 57: 0

Conclusion:

Both C and Ada can implement bit arrays. The Ada implementation is simpler and easier to understand than is the C implementation.


  • 4 July 2023 at 01:33

Comparing Programming Languages Part 1 Scalar Ranges

 Overview

It is often asserted that all general purpose programming languages can solve the same set of problems. This means that no one programming language has functional advantages over any other programming language.

That assertion is only mostly true, which means it is false.

For example, weakly typed languages can perform and make use of implicit type conversions while strongly typed languages cannot. Strongly typed languages must employ explicit conversions to achieve a similar effect.

The purpose of this article is to begin discussing some of the things that are difficult in one commonly used language and relatively easy in another.

Scalar Ranges

Programming languages derived from Pascal syntax allow scalar types and subtypes to be defined by the programmer, while programming languages derived from C syntax do not allow the programmer to define scalar types or subtypes.

In C++, for example, a class must be declared encapsulating the behavior of a scalar type with a programmer specified range of values. Making a subtype of that class then requires the creation of an inherited class expressing the restrictions distinguishing the subtype.

In C++ enums are encapsulated in a class as illustrated by the following Stack Overflow issue:

How can I implicitly convert an enum to its subset and vice versa in C++?

More precisely, the feature I want is like implicitly convert an enum to its subset enum and vice versa.

The code I wish it working:

enum class Human {

    A = 1,

    B = 2,

};

 

enum class Male {    // subset of Human

    A = Human::A,

};

 

enum class Female {    // subset of Human

    B = Human::B,

};

 

 

// some functions can handle all humans

void human_func(Human h) {

    // ...

}

 

// some only take a subset of humans

void male_func(Male m) {

    // ...

}

 

void female_func(Female m) {

    // ...

}

 

 

// and user only uses values of Human as token

constexpr auto SOMEONE = Human::A;

 

int main() {

    human_func(SOMEONE);  // ok

    male_func(SOMEONE);   // also ok, Human::A implicitly converted to Male

    female_func(SOMEONE); // failed, can't convert Human::A to Female.

}

 

But enum cannot do the conversion. Now I have two options:

// 1. static_assert with template parameter

 

template <Human H>

void female_func() {

    static_assert(H == Human::B);

    // ...

}

 

// 2. manually convert it

 

#define _ENUM_TO_ENUM(e1, e2) \

    static_cast<e2>(static_cast<std::underlying_type_t<decltype(e1)>>(e1))

 

void female_func(_ENUM_TO_ENUM(SOMEONE, Female)) {

    // But this way the compiler does not check if the value is valid.

    // I can put anything in.

    // ...

}

 

 

As is shown above, the concept of a scalar range and subrange is complicated by the need in C++ to express such a type as a class.

One answer provided to this question is

enum class Gender {MALE, FEMALE};

 

struct Human

{

    Gender m_gender;

    Human(Gender g) : m_gender{g}

    {}

    virtual ~Human() = default;

};

 

struct Man : public Human

{

    Man() : Human{Gender::MALE}

    {}

};

struct Woman : public Human

{

    Woman() : Human(Gender::FEMALE)

    {}

};

 

void human_func(const Human & h)

{

    //...

}

void man_func(const Man & m)

{

    //...

}

void woman_func(const Woman & w)

{

    //...

}

 

It is clear that this approach may work for an enum with 2 values, but becomes unusable with an enum containing tens or hundreds of values.

The Ada programming language, on the other hand, uses the concept of scalar ranges and subtypes extensively.

The Character type in Ada is an enumeration type with the range of values expressed as nul .. 'ÿ'. The ASCII characters are a subset of the Character type with value in the range of nul .. del. Within the ASCII characters the upper case characters are the range ‘A’ .. ‘Z’ and the lower characters are the range ‘a’ .. ‘z’.

If the programmer wants to pass only upper case characters as a parameter to a procedure the procedure can be defined as

subtype Upper is range (‘A’ .. ‘Z’);

procedure Upper_Action(U : Upper);

This procedure will only accept characters in the range specified by the subtype Upper.

A function that counts all the upper case characters in a string can be defined as

function Count_Uppers (S : in String) return Natural is

   Count : Natural := 0;

begin

   for value of S loop

      if S in Upper then

         Count := Count + 1;

      end if;

    return Count;

end Count_Uppers;

The Ada samples above exhibit the behavior and usage requested by the person using C++ in the Stack Overflow question above.

The Ada program is not encumbered with the heavy syntax and rules associated with C++ classes.

  • 2 June 2023 at 23:23

Recursion – implementing sum

By: spqr
23 May 2023 at 03:06

There are lots of easy to understand recursive algorithms. One of the easiest is is a function sum(x) which sums up all the integers from 1 to x. Here is the function implemented in Ada.

function sum(n: integer) return integer is
begin
   if n = 1 then 
      return 1;
   else 
      return (n + sum(n-1));
   end if;
end sum;

Here the function sum(x) recurses until the value of x becomes 0. At this point the recursion is essentially done, and the calls to sum() backtrack. This is shown in the summary below for sum(5).

sum(5) = (5 + sum(4))                          recursive call sum(4)
       = (5 + (4 + sum(3)))                    recursive call sum(3)
       = (5 + (4 + (3 + sum(2))))              recursive call sum(2)
       = (5 + (4 + (3 + (2 + sum(1)))))        recursive call sum(1), return 1
       = (5 + (4 + (3 + (2 + 1))))             return (2 + 1)
       = (5 + (4 + (3 + 3)))                   return (3 + 3)
       = (5 + (4 + 6))                         return (4 + 6)
       = (5 + 10)                              return (5 + 10)
       = 15

There are four recursive calls to sum() in addition to the original call, which is not considered recursive, because the function may actually terminate, e.g. if sum(1) is invoked. So if the function were to call sum(10000), there would be 9,999 recursive calls. The problem with recursion of course is that many of these simplistic algorithms are just as easy to implement as iterative algorithms.

Here is the same algorithm represented iteratively:

function sumi(n: integer) return integer is
   s : integer;
begin
   s := 0;
   for i in 1..n loop
      s := s + i;
   end loop;
   return s;
end sumi;

Reentrant scanner and parser with Aflex and Ayacc

14 May 2023 at 19:02
[Aflex and Ayacc](Ada/Aflex-Ayacc-code.jpg)
    1. What's new in Aflex 1.6

- Support the flex options `%option output`, `%option nooutput`, `%option yywrap`, `%option noinput`,

 `%option noyywrap`, `%option unput`, `%option nounput`, `%option bufsize=NNN` to better control the
 generated `_IO` package.
- Aflex(https://github.com/Ada-France/aflex) templates provide more control for tuning the code generation and
 they are embedded with [Advanced Resource Embedder](https://gitlab.com/stcarrez/resource-embedder)
- Support to define Ada code block in the scanner that is inserted in the generated scanner - New option -P to generate a private Ada package for DFA and IO - New directive `%option reentrant` and `%yyvar` to generate a recursive scanner - New directive `%yydecl` to allow passing parameters to `YYLex`
 or change the default function name

Example of `%option` directives to tell Aflex(https://github.com/Ada-France/aflex) to avoid generating several function or procedures and customize the buffer size.

```Ada %option nounput %option noinput %option nooutput %option noyywrap %option bufsize=1024 ```

The tool supports some code block injection at various places in the generated scanner. The code block has the following syntax where `<block-name>` is the name of the code block:

```Ada %<block-name> {

 -- Put Ada code here

} ```

The `%yytype` code block can contain type declaration, function and procedure declarations. It is injected within the `YYLex` function in the declaration part. The `%yyinit` code block can contain statements that are executed at beginning of the `YYLex` function. The `%yyaction` code block can contain statements that are executed before running any action. The `%yywrap` code block can contain statements which are executed when the end of current file is reached to start parsing a next input.

    1. What's new in Ayacc 1.4

- Support the Bison `%define variable value` option to configure the parser generator - Support the Bison `%code name { ... }` directive to insert code verbatim into the output parser - Recognize some Bison variables `api.pure`, `api.private`, `parse.error`, `parse.stacksize`,

 `parse.name`, `parse.params`, `parse.yyclearin`, `parse.yyerrok`, `parse.error`
- New option `-S skeleton` to allow using an external skeleton file for the parser generator - Ayacc(https://github.com/Ada-France/ayacc) templates provide more control for tuning the code generation and
 they are embedded with [Advanced Resource Embedder](https://gitlab.com/stcarrez/resource-embedder)
- New option `-P` to generate a private Ada package for the tokens package - Improvement to allow passing parameters to `YYParse` for the grammar rules - New `%lex` directive to control the call of `YYLex` function - Fix #6: ayacc gets stuck creating an infinitely large file after encountering a comment in an action

The generator supports two code block injections, the first one `decl` is injected in the `YYParse` procedure declaration and the `init` is injected as first statements to be executed only once when the procedure is called. The syntax is borrowed from the Bison parser:

```Ada %code decl {

  -- Put Ada declarations

} %code init {

  -- Put Ada statements

} ```

Some other Bison like improvements have been introduced to control the generation of the parser code.

``` %define parse.error true %define parse.stacksize 256 %define parse.yyclearin false %define parse.yyerrok false %define parse.name MyParser ```

    1. How to use

The easiest way to use Ayacc(https://github.com/Ada-France/ayacc) and Aflex(https://github.com/Ada-France/aflex) is to use Alire(https://github.com/alire-project/alire), get the sources, build them and install them. You can do this as follows:

``` alr get aflex cd aflex_1.6.0_b3c21d99 alr build alr install alr get ayacc cd ayacc_1.4.0_c06f997f alr build alr install ```

  • UPDATE*: the `alr install` command is available only with Alire(https://github.com/alire-project/alire) 2.0.

Using these tools is done in two steps:

1. a first step to call `aflex` or `ayacc` command with the scanner file or grammar file, 2. a second step to call `gnatchop` to split the generated file in separate Ada files

For example, with a `calc_lex.l` scanner file, you would use:

``` aflex calc_lex.l gnatchop -w calc_lex.ada ```

And with a `calc.y` grammar file:

``` ayacc calc.y gnatchop -w calc.ada ```

To know more about how to write a scanner file or grammar file, have a look at Aflex 1.5 and Ayacc 1.3.0(https://blog.vacs.fr/vacs/blogs/post.html?post=2021/12/18/Aflex-1.5-and-Ayacc-1.3.0) which explains more into details some of these aspects.

    1. Highlight on reentrancy

By default Aflex(https://github.com/Ada-France/aflex) and Ayacc(https://github.com/Ada-France/ayacc) generate a scanner and a parser which use global variables declared in a generated Ada package. These global variables contain some state about the scanner such as the current file being scanned. The Ayacc(https://github.com/Ada-France/ayacc) parser generates on its side two global variables `YYLVal` and `YYVal`.

Using global variables creates some strong restrictions when using the generated scanner and parser: we can scan and parse only one file at a time. It cannot be used in a multi-thread environment unless the scan and parse is protected from concurrent access. We cannot use easily some grammars that need to recurse and parse another file such as an included file.

      1. Reentrant scanner

The reentrant scanner is created by using the `-R` option or the `%option reentrant` directive. The scanner will then need a specific declaration with a context parameter that will hold the scanner state and variables. The context parameter has its type generated in the `Lexer_IO` package. The `%yydecl` directive in the scanner file must be used to declare the `YYLex` function with its parameters. By default the name of the context variable is `Context` but you can decide to customize and change it to another name by using the `%yyvar` directive.

``` %option reentrant %yyvar Context %yydecl function YYLex (Context : in out Lexer_IO.Context_Type) return Token ```

When the `reentrant` option is activated, Aflex(https://github.com/Ada-France/aflex) will generate a first `Context_Type` limited type in the `Lexer_DFA` package and another one in the `Lexer_IO` package. The generator can probably be improved in the future to provide a single package with a single type declaration. The `Lexer_DFA` package contains the internal data structures for the scanner to maintain its state and the `Lexer_IO` package holds the input file as well as the `YYLVal` and `YYVal` values.

      1. Reentrant parser

On its side, Ayacc(https://github.com/Ada-France/ayacc) uses the `YYLVal` and `YYVal` variables. By default, it generates them in the `_tokens` package that contains the list of parser symbols. It must not generate them and it must now use the scanner `Context_Type` to hold them as well as the scanner internal state. The setup requires several steps:

1. The reentrant parser is activated by using the `%define api.pure`

  directive similar to the [bison %define](https://www.gnu.org/software/bison/manual/html_node/_0025define-Summary.html).

2. The `%lex` directive must be used to define how the `YYLex` function must be called since it now has some

  context parameter.

3. The scanner context variable must be declared somewhere, either as parameter to the `YYParse`

  procedure or as a local variable to `YYParse`.  This is done using the new `%code decl` directive
  and allows to customize the local declaration part of the `YYParse` generated procedure.

4. We must give visibility of the `YYLVal` and `YYVal` values defined in the scanner context variable.

  Again, we can do this within the `%code decl` directive.

A simple reentrant parser could be defined by using:

```Ada %define api.pure true %lex YYLex (Scanner) %code decl {

     Scanner : Lexer_IO.Context_Type;
     YYLVal  : YYSType renames Scanner.YYLVal;
     YYVal   : YYSType renames Scanner.YYVal;

} ```

However, this simple form is not really useful as you may need to open the file and setup the scanner to read from it. It is probably better to pass the scanner context as parameter to the `YYParse` procedure. For this, we can use the `%define parse.params` directive to control the procedure parameters. The reentrant parser is declared as follows:

```Ada %lex YYLex (Scanner) %define api.pure true %define parse.params "Scanner : in out Lexer_IO.Context_Type" %code decl {

     YYLVal : YYSType renames Scanner.YYLVal;
     YYVal  : YYSType renames Scanner.YYVal;

} ```

To use the reentrant parser and scanner, we only need to declare the scanner context, open the file by using the `Lexer_IO.Open_Input` procedure and call the `YYParse` procedure as follows:

```Ada

 Scanner : Lexer_IO.Context_Type;
 ...
   Lexer_IO.Open_Input (Scanner, "file-to-scan");
   YYParse (Scanner);

```

      1. Grammar examples:

To have a more complete example of a reentrant parser, you may have a look at the following files:

New colour theme with LEA: Solarized Light

7 May 2023 at 21:19
Note for subscribers: if you are interested in my Ada programming articles only, you can use this RSS feed link.

Today, a topic that is guaranteed frivolous: a new colour theme for LEA (a Lightweight Editor for Ada).

Actually, it is a serious subject if you spend a large share of your time at programming (for work, for fun, or both...). For the theoretically brighter season (spring and summer), some prefer to give up the trendy Dark Side mode (which not so long ago was considered as horribly old-fashioned):

...and use a dark-font-bright-background scheme.

However, the contrast is a bit violent. A convincing and carefully designed low-contrast theme is called Solarized Light (which can be turned, with an astute swap within the same 8-colours palette for base colours, into a dark-backgrounded Solarized Dark). The theme is described here.

The Solarized Light theme is incorporated since today in LEA.

Enjoy!


LEA, a Lightweight Editor for Ada, aims to provide an easy, script-world-like, "look & feel" for developing Ada projects of any size and level, while enabling access to full-scale development tools like GNAT.

  • Quick start and reactivity
  • Uses the Scintilla editor widget (like Notepad++)
  • Multi-document
  • Multiple undo's & redo's
  • Multi-line edit, rectangular selections
  • Color themes, easy to switch
  • Duplication of lines and selections
  • Syntax highlighting
  • Parenthesis matching
  • Bookmarks
  • Free, Open-Source
  • Programmed in Ada
  • Includes HAC, the HAC Ada Compiler
  • Includes numerous examples of Ada programs, ready to be run
  • Single executable, runs without installation




Comparison of Bit Array Implementations using C and Ada

 

I found the following programming example in Bit Array in C - Sanfoundry

#include <stdio.h>

#define SIZE (58) /* amount of bits */

#define ARRAY_SIZE(x) (x/8+(!!(x%8)))

char get_bit(char *array, int index);

void toggle_bit(char *array, int index);


void toggle_bit(char *array, int index) {

    array[index / 8] ^= 1 << (index % 8);

}

char get_bit(char *array, int index) {

    return 1 & (array[index / 8] >> (index % 8));

}

int main(void) {

    /* initialize empty array with the right size */

    char x[ARRAY_SIZE(SIZE)] = { 0 };

    int i;

    for (i = 0; i < SIZE; i += 2)

        toggle_bit(x, i);

    toggle_bit(x, 56);

    for (i = 0; i < SIZE; i++)

        printf("%d: %d\n", i, get_bit(x, i));

    return 0;

}

 

The program creates a bit array containing 58 elements. The program works as expected and manages to illustrate very arcane features of the C programming language.

I challenge anybody not familiar with C bit shifting to completely understand how the two void functions named get_bit and toggle_bit actually work.

void toggle_bit(char *array, int index) {

    array[index / 8] ^= 1 << (index % 8);

}

char get_bit(char *array, int index) {

    return 1 & (array[index / 8] >> (index % 8));

}

These functions are examples of the “simplicity” of C programming. While the syntax is compact the semantics of these functions are not. Describing what these two functions do would take several paragraphs of text.

 

As an avid Ada programmer I decided to implement a bit array in Ada and perform the same behaviors on the bit array.

The Ada program is slightly longer, sacrificing compactness for clarity.

-- Ada program to implement a bit array

with Ada.Text_IO; use Ada.Text_IO;

procedure Main is

   type bit is range 0 .. 1;

   type index is range 0 .. 57;

   type bit_array is array (index) of bit with

      Component_Size => 1;

   x   : bit_array := (others => 0);

   Idx : index     := 0;

begin

   for I in x'Range loop

      if I mod 2 = 0 then

         toggle_bit (x, I);

      end if;

   end loop;

   toggle_bit (x, 56);


   for I in x'Range loop

      Put_Line (I'Image & ":" & bit'Image (x (I)));

   end loop;

   Put_Line("Size of bit array is" & Integer'Image(X'Size) & " bits.");

   Put_Line("Length of array is" & Integer'Image(X'Length));

end Main;

 

The Ada programming language measures the size of data types in units of bits while the C program must convert bytes to bits in determining its size.

Compare the corresponding Ada and C code sections:

C:

#define SIZE (58) /* amount of bits */

#define ARRAY_SIZE(x) (x/8+(!!(x%8)))

. . .

char x[ARRAY_SIZE(SIZE)] = { 0 };

 

Ada:

   type bit is range 0 .. 1;

   type index is range 0 .. 57;

   type bit_array is array (index) of bit with

      Component_Size => 1;

. . .

x   : bit_array := (others => 0);

 

The Ada code defines an integral type named bit with 0 and 1 as the only valid values. The Ada code then defines an index range to be used in the array type. Knowing the index range prevents buffer overflows in other parts of the program.

The array type bit_array is declared. C provides no means to define array types, only array instances.

In the C example the array named x is defined as an array of type char, but we are not interested in values of type char. We are interested in the bit values within each char element. The size of the char array must be declared to be the number of char elements needed to store 58 bits. Each char element is 8 bits, therefore 8 char elements are needed to store 58 bits.

The corresponding Ada source code is somewhat more readable. A bit_array is an array of 58 elements indexed by the values 0 through 57. The size of each array component is 1, which in Ada terms is 1 bit.

The variable x in the C example is declared to be an array of 8 char data elements. All elements (and all 64 bits) are initialized to 0.

The variable x in the Ada example is declared to be an instance of bit_array, which is declared to be an array of 58 bits. Each element is initialized to 0. Only the 58 bits are initialized to 0.

The C function named toggle_bit is difficult to understand.

void toggle_bit(char *array, int index) {

    array[index / 8] ^= 1 << (index % 8);

}

 

The purpose of this function is to identify the bit specified by the index parameter and toggle the bit. If the bit is 1 then set it to zero. If the bit is 0 then set it to 1.

The Ada procedure named toggle_bit performs the same behavior with a somewhat more verbose, but also clearer syntax.

   procedure toggle_bit (arr : in out bit_array; Idx : index) is

   begin

      if arr (Idx) = 1 then

         arr (Idx) := 0;

      else

         arr (Idx) := 1;

      end if;

   end toggle_bit;

 

Note that the bits in the bit array are indexed with the same syntax used to index any other Ada array. No special syntax is needed. The compiler writes all the low level bit shifting for the programmer, eliminating the need for an explicit get_bit procedure as found in the C example. The Ada version clearly states toggle logic. If the current value of the bit indexed by Idx is 1 then assign 0 to that bit, otherwise assign 1 to that bit.

Also note the obscurity of the C syntax for passing an array to a function. The array is not actually passed to the function. The name of the array is passed as a pointer to the first element of the array, thus the parameter used to “pass” the array is a pointer to char rather than an array. A pointer to char may be a pointer to the first element of an array of char or it may be a pointer to a char which is not a member of an array of char. C requires the programmer to know whether or not the parameter points to an array. The C syntax also does not specify whether or not the actual parameter passed to this function may be modified by the function.

The Ada procedure specifies the parameter arr is an instance of bit_array. Furthermore, the passing mode “in out” specifies that the value of the array is used and modified by the procedure, specifying that the actual parameter passed to this procedure may be modified by the procedure.

Outputs:

The output of the C program is:

0: 1

1: 0

2: 1

3: 0

4: 1

5: 0

6: 1

7: 0

8: 1

9: 0

10: 1

11: 0

12: 1

13: 0

14: 1

15: 0

16: 1

17: 0

18: 1

19: 0

20: 1

21: 0

22: 1

23: 0

24: 1

25: 0

26: 1

27: 0

28: 1

29: 0

30: 1

31: 0

32: 1

33: 0

34: 1

35: 0

36: 1

37: 0

38: 1

39: 0

40: 1

41: 0

42: 1

43: 0

44: 1

45: 0

46: 1

47: 0

48: 1

49: 0

50: 1

51: 0

52: 1

53: 0

54: 1

55: 0

56: 0

57: 0

 

The output of the Ada program is:

 0: 1

 1: 0

 2: 1

 3: 0

 4: 1

 5: 0

 6: 1

 7: 0

 8: 1

 9: 0

 10: 1

 11: 0

 12: 1

 13: 0

 14: 1

 15: 0

 16: 1

 17: 0

 18: 1

 19: 0

 20: 1

 21: 0

 22: 1

 23: 0

 24: 1

 25: 0

 26: 1

 27: 0

 28: 1

 29: 0

 30: 1

 31: 0

 32: 1

 33: 0

 34: 1

 35: 0

 36: 1

 37: 0

 38: 1

 39: 0

 40: 1

 41: 0

 42: 1

 43: 0

 44: 1

 45: 0

 46: 1

 47: 0

 48: 1

 49: 0

 50: 1

 51: 0

 52: 1

 53: 0

 54: 1

 55: 0

 56: 0

 57: 0

Size of bit array is 64 bits.

Length of array is 58

 

  • 22 April 2023 at 19:27

Poor Quality C Programming Examples

 

C is hard enough without low quality programming examples

I recently read through some of the C programming examples from Sanfoundry. Some of the examples are well constructed while others are very poorly constructed.

One of the bad examples I read is found under the heading of Simple C Programs. The example purports to show how to count the number of vowels and consonants in an input sentence.

The source code for the example follows.

/*

 * C program to read a sentence and count the total number of vowels

 * and consonants in the sentence.

 */

#include <stdio.h>

 

void main()

{

    char sentence[80];

    int i, vowels = 0, consonants = 0, special = 0;

 

    printf("Enter a sentence \n");

    gets(sentence);

    for (i = 0; sentence[i] != '\0'; i++)

    {

        if ((sentence[i] == 'a' || sentence[i] == 'e' || sentence[i] ==

        'i' || sentence[i] == 'o' || sentence[i] == 'u') ||

        (sentence[i] == 'A' || sentence[i] == 'E' || sentence[i] ==

        'I' || sentence[i] == 'O' || sentence[i] == 'U'))

        {

            vowels = vowels + 1;

        }

        else

        {

            consonants = consonants + 1;

        }

        if (sentence[i] =='\t' ||sentence[i] =='\0' || sentence[i] ==' ')

        {

            special = special + 1;

        }

    }

    consonants = consonants - special;

    printf("No. of vowels in %s = %d\n", sentence, vowels);

    printf("No. of consonants in %s = %d\n", sentence, consonants);

}

 

This program does not actually identify a sentence. Instead it merely reads a single input line from stdin. The first conditional carefully identifies the values of the vowels and then assumes anything not a vowel is a consonant. A second conditional makes a weak attempt to find “special” characters that are not letters, but this conditional only looks for tabs, spaces and end of string null characters. It completely ignores punctuation and numeric digits. Sentences often contain punctuation and numeric digits, which are neither vowels nor consonants. The error shows in an inflated count of consonants when an input string contains punctuation and/or numeric digits.

The example program is poorly designed and does not fulfill the requirements expressed on the page C Program to Count the Number of Vowels and Consonants in a Sentence - Sanfoundry.

I wrote an Ada program to achieve the same stated goals. This program uses approximately the same number of lines of source code as the C program while avoiding the problem of miscounting consonants.

-- Ada program to read a line of text and count the number of vowels

-- and consonants in the text

 

with Ada.Text_IO; use Ada.Text_IO;

 

procedure Main is

   subtype Letter is Character with

        Static_Predicate => Letter in 'a' .. 'z' | 'A' .. 'Z';

   subtype Vowel is Character with

     Static_Predicate => Vowel in 'a' | 'e' | 'i' | 'o' | 'u' |

       'A' | 'E' | 'I' | 'O' | 'U';

 

   Line       : String (1 .. 80);

   vowels     : Natural := 0;

   consonants : Natural := 0;

   Length     : Natural;

begin

   Put_Line ("Enter a sentence:");

   Get_Line (Item => Line, Last => Length);

   for I in 1 .. Length loop

      if Line (I) in Letter then

         if Line (I) in Vowel then

            vowels := vowels + 1;

         else

            consonants := consonants + 1;

         end if;

      end if;

   end loop;

   Put_Line

     ("Number of vowels in " & Line (1 .. Length) & " is" & vowels'Image);

   Put_Line

     ("Number of consonants in " & Line (1 .. Length) & " is" &

      consonants'Image);

end Main;

 

The Ada program explicitly defines a subtype of Character containing only English letters. It also defines a subtype of Character defining only the vowels defined in the C program.

The first conditional in the Ada program filters out all characters that are not letters. Within that conditional block the current character is tested for being a vowel. If it is not a vowel then it can only be a consonant. Both the vowel count and the consonant count are correct because the program selected only letters and filtered out all non-letter characters.

C language philosophy

The C language expects the programmer to perform all error checking while only providing very primitive means of specifying the correct data conditions. Historically C has concentrated its design on execution speed at the expense of programmer effort.

Ada language philosophy

The Ada language provides syntax tools to define subtypes of a data type based upon either a range of values or set of values. The Ada example above defines two subtypes of the predefined type Character. One subtype defines the set of all lower case and upper case letters. The other subtype defines the set of lower case and upper case letters identified as vowels.

Each of these subtypes is expressed in a very compact manner in a single expression.

These subtypes express exactly what is a letter and what is a vowel.

Conclusion

The C program never specifies what is a letter. The result is a latent error in program logic which most beginning C students would be unlikely to identify. Eliminating this error would require either a laborious list of letters in C or a list of the numeric ranges representing lower case letters and upper case letters. The first option greatly obscures the meaning of the filtering by its complexity. The second option greatly obscures the meaning of the filtering by using the numeric representation of the letters in a compound conditional expression such as

If ((sentence[i] >= 65 && sentence[i] <= 90) || (sentence[i] >= 97 && sentence[i] <= 122))

The Ada equivalent is accomplished by defining the set of values constituting a letter

subtype Letter is Character with

        Static_Predicate => Letter in 'a' .. 'z' | 'A' .. 'Z';

 

Followed by a simple conditional

if Line (I) in Letter then

The Ada “in” operator returns True if Line (I) is a member of the set of values defined for Letter and False if Line (I) is not a member of the set of values define for Letter.

The set of letters is defined as all the characters in the range starting at ‘a’ and ending at ‘z’ and all the characters in the range starting at ‘A’ and ending at ‘Z’. While this set defines the same values as the C conditional example above, it does so in a very clear manner understandable by programmers of all levels of experience, without resorting to a table of values mapping characters to numeric representations.

The Ada program clearly specifies all the values needed to count vowels and consonants, thereby eliminating the latent defect present in the C program.

  • 5 April 2023 at 05:08

Libadalang, Alire, and macOS

Background

This exercise was prompted by the need for Scripted Testing to be supported by – as far as possible – code generation. The need is for the public or interfacing view of a supporting part (domain) of a system to be able to log calls made to it and to provide values on calls to it, using a scripting mechanism.

Read more »
❌
❌