gnatcoll-db includes dborm.py, need to understand routines in python

Newest questions tagged ada - Stack Overflow

By: fdesp87

15 June 2022 at 16:56

I was using gnatcoll-db, but the limitations made me to rewrite dborm.py in Ada. There are two routines in dborm.py (in python) that I don't understand, specifically compute_table_aliases and fields_count_array.

Any help will be welcome. Of course, the modifications can be shared between all of us.

Edited to complete information:

My project fork is here. I don't want to copy here the complete routines from dborm.py as I don't know exactly the license terms of ACS. dborm.py can be downloaded from https://github.com/AdaCore/gnatcoll-db/tree/master/gnatcoll_db2ada.

The routines that I want to translate to Ada are:

compute_table_aliases, lines 2407 to 2448. Really I don't understand the algorithm.
fields_count_array and fields_count, lines 2372 to 2397. For these routines I have a translation to Ada but when testing, the result is OK except in a few cases.

Here is my translation to Ada:

Max_Depth : constant := 3; type Counts_Array is array (0 .. Max_Depth) of Integer;

  function Fields_Count_Array (T         : Table_Description;
                               Follow_LJ : Boolean;
                               DepthMax  : Integer;
                               FKStop    : Field := No_Field)
                            return Counts_Array is
     FK_Stop : Boolean; -- to be reset before each call to fields_count_
     Depth   : Integer := 0;
     Temp    : Counts_Array;

     function Fields_Count (T         : Table_Description;
                            Depth     : Integer;
                            Follow_LJ : Boolean;
                            FKStop    : Field := No_Field) return Integer;
     function Fields_Count (T         : Table_Description;
                            Depth     : Integer;
                            Follow_LJ : Boolean;
                            FKStop    : Field := No_Field)
                         return Integer is
        Result : Integer;
        procedure Process_FK (FK : in out Field);
        procedure Process_FK (FK : in out Field) is
        begin
           if FK = FKStop then
              FK_Stop := True;
              return;
           end if;
           if FK_Stop then
              return;
           end if;
           if Follow_LJ or (not FK.Can_Be_Null) then
              Result := Result +
                Fields_Count (Pointed_Table (FK), Depth - 1, Follow_LJ);
           end if;
        end Process_FK;
     begin
        Result := Num_Fields (T);
        if Depth > 0 then
           For_Each_FK (T, Process_FK'Access);
        end if;
        return Result;
     end Fields_Count;

  begin
     while Depth <= DepthMax loop
        FK_Stop := False;
        Temp (Depth) := Fields_Count (T, Depth, Follow_LJ, FKStop);
        Depth := Depth + 1;
     end loop;
     return Temp;
  end Fields_Count_Array;

Note that all type definitions come from gnatcoll-sql.

I understand that this is difficult to follow, perphaps may be better if I send a report on the modifications and the complete new Ada package replacing dborm.py. How?

Ada, Java and Python database access

blog.vacs.fr - Tag Ada

By: Stephane Carrez

17 November 2018 at 14:02

The database also has a serious impact on such benchmark and I've measured the following three famous databases:

The purpose of the benchmark is to be able to have a simple comparison between these different databases and different programming languages. For this, a very simple database table is created with only two integer columns one of them being the primary key with auto increment. For example the SQLite table is created with the following SQL:

CREATE table test_simple (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  value INTEGER
)

The database table is filled with a simple INSERT statement which is also benchmarked. The goal is not to demonstrate and show the faster insert method, nor the faster query for a given database or language.

Benchmark

The SQL benchmarks are simple and they are implemented in the same way for each language so that we can get a rough comparison between languages for a given database. The SELECT query retrieves all the database table rows but it includes a LIMIT to restrict the number of rows returned. The query is executed with different values for the limit so that a simple graph can be drawn. For each database, the SQL query looks like:

SELECT * FROM test_simple LIMIT 10

The SQL statements are executed 10000 times for SELECT queries, 1000 times for INSERT and 100 times for DROP/CREATE statements.

Each SQL benchmark program generates an XML file that contains the results as well as resource statistics taken from the /proc/self/stat file. An Ada tool is provided to gather the results, prepare the data for plotting and produce an Excel file with the results.

Python code


def execute(self):
  self.sql = "SELECT * FROM test_simple LIMIT " + str(self.expect_count)
  repeat = self.repeat()
  db = self.connection()
  stmt = db.cursor()

  for i in range(0, repeat):
    stmt.execute(self.sql)
    row_count = 0
    for row in stmt:
      row_count = row_count + 1

    if row_count != self.expect_count:
      raise Exception('Invalid result count:' + str(row_count))

    stmt.close()

Java code

public void execute() throws SQLException {
  PreparedStatement stmt
 = mConnection.prepareStatement("SELECT * FROM test_simple LIMIT " + mExpectCount);

  for (int i = 0; i < mRepeat; i++) {
    if (stmt.execute()) {
      ResultSet rs = stmt.getResultSet();
      int count = 0;
      while (rs.next()) {
        count++;
      }
      rs.close();
      if (count != mExpectCount) {
        throw new SQLException("Invalid result count: " + count);
      }
    } else {
      throw new SQLException("No result");
    }
  }
  stmt.close();
}

Ada code

procedure Select_Table_N (Context : in out Context_Type) is
   DB    : constant ADO.Sessions.Master_Session := Context.Get_Session;
   Count : Natural;
   Stmt  : ADO.Statements.Query_Statement
        := DB.Create_Statement ("SELECT * FROM test_simple LIMIT " & Positive'Image (LIMIT));
begin
   for I in 1 .. Context.Repeat loop
      Stmt.Execute;
      Count := 0;
      while Stmt.Has_Elements loop
         Count := Count + 1;
         Stmt.Next;
      end loop;
      if Count /= LIMIT then
         raise Benchmark_Error with "Invalid result count:" & Natural'Image (Count);
      end if;
   end loop;
end Select_Table_N;

The benchmark were executed on an Intel i7-3770S CPU @3.10Ghz with 8-cores running Ubuntu 16.04 64-bits. The following database versions are used:

MariaDB 10.0.36
PostgreSQL 9.5.14

Resource usage comparison

The first point to note is the fact that both Python and Ada require only one thread to run the SQL benchmark. On its side, the Java VM and database drivers need 20 threads to run.

The second point is not surprising: Java needs 1000% more memory than Ada and Python uses 59% more memory than Ada. What is measured is the the VM RSS size which means this is really the memory that is physically mapped at a given time.

The SQLite database requires less resource than others. The result below don't take into account the resource used by the MariaDB and PostgreSQL servers. At that time, the MariaDB server was using 125Mb and the PostgreSQL server was using 31Mb.

Speed comparison

Looking at the CPU time used to run the benchmark, Ada appears as a clear winner. The Java PostgreSQL driver appears to be very slow at connecting and disconnecting to the database, and this is the main reason why it is slower than others.

It is interesting to note however that both Java and Python provide very good performance results with SQLite database when the number of rows returned by the query is less than 100. With more than 500 rows, Ada becomes faster than others.

With a PostgreSQL database, Ada is always faster even with small result sets.

Conclusion and references

SQLite as an embedded database is used on more than 1 billion of devices as it is included in all smartphones (Android, iOS). It provides very good performances for small databases.

With client-server model, MariaDB and PostgreSQL are suffering a little when compared to SQLite.

For bigger databases, Ada provides the best performance and furthermore it appears to be more predictable that other languages (ie, linear curves).

The Excel result file is available in: sql-benchmark-results.xls

Sources of the benchmarks are available in the following GitHub repository:

https://github.com/stcarrez/sql-benchmark

Normal view