Forpedo

17 September 2007

Originally published on macresearch.org, around 2007. Reproduced from the author's archive; some links may no longer resolve.

Advanced Fortran: Polymorphism and Generic Programming

As a regular user of languages like Objective-C, C++, and Python, I have at times become frustrated by the lack of features in Fortran. Although Fortran 90 brought the language into the modern age, adding user-defined types amongst other things, if you are looking for the sort of high-level features you find in just about every other programming language, the only option is to wait for wide support of the new Fortran 2003 standard, which could still be many years away. To bridge the gap, I developed a Python preprocessor called Forpedo, which adds a few advanced features to Fortran 90/95.

Forpedo supports two programming paradigms: generic programming and object-oriented programming. Generic programming is a paradigm in which a single piece of code can be used with multiple data types. For example, code for a linked list could be used to create lists of integers, reals, or even strings. The same source code is used to define each list; the compiler produces different instances of the code, substituting concrete data types (e.g., integer, real) as appropriate to generate a list capable of storing the required data. Without generic programming, a programmer would typically need to make virtually identical copies of the linked list source code for each data type used.

Object-oriented programming is supported through polymorphic types, which are called protocols in Forpedo terminology. A protocol is very similar to a Java interface, for those familiar with that language: it defines a set of procedures that a conforming type must implement. The term ‘protocol’ derives from Objective-C, and has been adopted because ‘interface’ already has a meaning in Fortran.

Generic Programming

To give you a rough idea of how it works, I will present a few simple examples. Here is some generic Forpedo code:

#definetype WorldIdType Int integer 
#definetype WorldIdType Real real 
module HelloWorld<WorldIdType>

    @WorldIdType :: worldId<WorldIdType>

contains 
    subroutine setId<WorldIdType>(id) 
        @WorldIdType :: id 
        worldId<WorldIdType> = id 
    end subroutine

    subroutine print<WorldIdType>() 
        print *,'Hello World "',worldId<WorldIdType>,'"' 
    end subroutine

end module

The definetype preprocessor directive is used to define generic types, and to stipulate the concrete data types that will be substituted in the eventual Fortran code. In the code above, there is one generic type: WorldIdType. In this example, WorldIdType is a placeholder for the data type of the variables (id and worldId) that store a world identifier. The definetype directive takes three arguments: the first is the generic type label; the second is a tag that Forpedo uses to generate unique names; and the third is the Fortran type that will be substituted for the generic placeholder in the output Fortran. There is one definetype directive for each concrete Fortran type to be substituted for a given generic type. In this case, two directives are included for WorldIdType: one that results in code for an integer world identifier, and one for a real world identifier.
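The directive format is simple enough to parse in a few lines. The following Python sketch collects definetype lines into a table keyed by generic type label. The function name parse_definetypes is hypothetical and is not taken from Forpedo's source; treating everything after the tag as the Fortran type is an assumption that lets multi-word types such as double precision survive.

```python
def parse_definetypes(source):
    """Map each generic type label to its list of (tag, fortran_type) pairs.

    Hypothetical helper for illustration, not Forpedo's own code.
    """
    table = {}  # dicts keep insertion order in Python 3.7+
    for line in source.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "#definetype":
            label, tag = fields[1], fields[2]
            # Assumption: all remaining fields form the Fortran type,
            # so "double precision" is kept as one type name.
            fortran_type = " ".join(fields[3:])
            table.setdefault(label, []).append((tag, fortran_type))
    return table


template = """#definetype WorldIdType Int integer
#definetype WorldIdType Real real
module HelloWorld<WorldIdType>"""
print(parse_definetypes(template))
# {'WorldIdType': [('Int', 'integer'), ('Real', 'real')]}
```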

Whenever the generic type is needed in the Forpedo code, the type label is given, prefixed with an @ symbol. For example, the type of the id parameter in the first subroutine is declared like this:

@WorldIdType :: id

When this code is run through Forpedo, the placeholder is replaced with a concrete type. For example, in the case of an integer world identifier, it becomes:

integer :: id

The tag supplied in the definetype directive is used to avoid naming conflicts. This process is known as name mangling, and is usually performed by the compiler. A C++ compiler, for example, would modify the name of a class template in order to generate a unique class name for any given combination of data types.

With Forpedo, the programmer is responsible for determining where naming clashes could occur, and for avoiding them by inserting a tag placeholder. The tag placeholder is the name of the generic type, enclosed in angle brackets. You will typically need to use the tag for any named entity with global scope. A module name is a typical example:

module HelloWorld<WorldIdType>

The tag placeholder <WorldIdType> will be replaced in the generated Fortran code with the tag corresponding to a concrete data type. For example, when WorldIdType is integer, the module name will be HelloWorldInt, because the integer data type corresponds to the Int tag.

The tag placeholder <WorldIdType> has also been used to mangle the names of other globally accessible entities, such as the procedure names, and the name of the variable included in the module data section. All of these placeholders get substituted with the same tag whenever an instance of the generic code is formed.
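The two kinds of substitution, @label for the type and <label> for the tag, can be sketched for a single concrete type as follows. This is an illustrative Python sketch, not Forpedo's implementation; the helper name substitute is hypothetical, and the word-boundary check is an assumption added so that @WorldIdType does not match the start of a longer label.

```python
import re


def substitute(line, label, tag, fortran_type):
    """Instantiate one template line for a single (tag, fortran_type) pair.

    Hypothetical helper: @label becomes the concrete type, <label> the tag.
    """
    # \b stops @WorldIdType from matching inside e.g. @WorldIdTypeX (an assumption).
    type_pattern = re.compile("@" + re.escape(label) + r"\b")
    # A function replacement stops re from interpreting backslashes in the type text.
    line = type_pattern.sub(lambda match: fortran_type, line)
    # Tag placeholders are plain text, so a simple replace is enough.
    return line.replace("<" + label + ">", tag)


print(substitute("@WorldIdType :: worldId<WorldIdType>", "WorldIdType", "Int", "integer"))
# integer :: worldIdInt
```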

Running Forpedo

You can download Forpedo from the Forpedo web page (link no longer available). It requires Python 2.4. Running it is simple enough: you pipe the Forpedo code into standard input, and Fortran 90 comes out on standard output.

forpedo.py < helloworld.f90t > helloworld.f90

In the example above, helloworld.f90 should look like this:

module HelloWorldInt

    integer :: worldIdInt

contains

    subroutine setIdInt(id) 
        integer :: id 
        worldIdInt = id 
    end subroutine

    subroutine printInt() 
        print *,'Hello World "',worldIdInt,'"' 
    end subroutine

end module

module HelloWorldReal

    real :: worldIdReal

contains

    subroutine setIdReal(id) 
        real :: id 
        worldIdReal = id 
    end subroutine

    subroutine printReal() 
        print *,'Hello World "',worldIdReal,'"' 
    end subroutine

end module

This code includes two instances of the generic Forpedo code. The Fortran code in each case is virtually identical; only the generic type placeholders and tags have been replaced to produce compliant Fortran. The potential of generic programming to reduce code duplication should be fairly evident, even from this simple example: there is around half as much Forpedo code as Fortran code. Not only that, but to form another instance of the generic code for a different concrete data type, you only need to add one definetype line to the Forpedo code, which yields 50% more Fortran code.
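The whole expansion, one copy of the template body per definetype line, can be sketched in Python as below. It is a simplification under a stated assumption: the template declares only one generic type label, so each definetype line yields exactly one instance. The name instantiate is hypothetical.

```python
import re


def instantiate(source):
    """Emit one copy of the template body per #definetype line.

    Simplifying assumption: the template uses a single generic type label.
    """
    instances, body = [], []
    for line in source.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "#definetype":
            # (label, tag, fortran_type); directive lines are not copied to the output
            instances.append((fields[1], fields[2], " ".join(fields[3:])))
        else:
            body.append(line)

    output = []
    for label, tag, fortran_type in instances:
        type_pattern = re.compile("@" + re.escape(label) + r"\b")
        for line in body:
            line = type_pattern.sub(lambda match: fortran_type, line)
            output.append(line.replace("<" + label + ">", tag))
    return "\n".join(output)


template = """#definetype WorldIdType Int integer
#definetype WorldIdType Real real
module HelloWorld<WorldIdType>
    @WorldIdType :: worldId<WorldIdType>
end module"""
print(instantiate(template))
```

Prepending a third line such as `#definetype WorldIdType Dbl double precision` to the template adds a third module to the output without any change to the template body.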

To test the code, you can compile the helloworld.f90 file with the following main program:

program HelloWorld 
    use HelloWorldInt 
    use HelloWorldReal 
    call setIdInt(3) 
    call printInt() 
    call setIdReal(3.0) 
    call printReal() 
end program

Running the resulting executable should produce output like the following (the exact spacing of list-directed output varies between compilers):

Hello World " 3 " 
Hello World " 3.000000 "

Run-Time Polymorphism

The protocol directive is used to define a polymorphic type with Forpedo. A protocol defines the subroutines and functions that a type must implement. Here is an example of a protocol declaration:

#protocol AnimalProtocol AnimalProtocolMod

#useblock
use SomeModule
#enduseblock

#method makeSound
type(AnimalProtocol), intent(in) :: self
#endmethod

#method increaseAgeInAnimalYears increase
type(AnimalProtocol), intent(inout) :: self
integer, intent(in)                 :: increase
#endmethod

#funcmethod increaseAgeAndReturnValue increase,returnVar
type(AnimalProtocol), intent(inout) :: self
integer, intent(in)                 :: increase
integer                             :: returnVar
#endmethod

#conformingtype Dog DogMod
#conformingtype Cat CatMod

#endprotocol

This declares a protocol that will be contained in the module AnimalProtocolMod, which will be generated by Forpedo. The Fortran type corresponding to the polymorphic type will be AnimalProtocol.

The method/funcmethod/endmethod directives, which must appear in the protocol block, declare the interfaces of the subroutines and functions that conforming types must implement. In this case, the conforming types must have makeSound and increaseAgeInAnimalYears subroutines, and an increaseAgeAndReturnValue function.

The argument list for each routine is given on the method directive line, after the method name. This list should not include the first argument, which is assumed to be the instance ‘self’ (equivalent to ‘this’ in C++ and Java). Note that the declaration of ‘self’ is included in the method/funcmethod/endmethod block, so that you can assign attributes to it (e.g., intent(in)).
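The implicit self argument can be illustrated with a small Python helper that turns a method line into a Fortran subroutine header. The helper method_header is hypothetical and handles only #method lines; it shows the convention described above rather than reproducing Forpedo's code generator.

```python
def method_header(directive_line):
    """Turn a '#method name arg1,arg2' line into a Fortran subroutine header.

    Hypothetical helper: only #method lines are handled, and 'self' is
    always prepended as the first dummy argument.
    """
    fields = directive_line.split()
    if len(fields) < 2 or fields[0] != "#method":
        raise ValueError("expected '#method name [args]'")
    name = fields[1]
    arg_text = "".join(fields[2:])  # tolerate "a, b" as well as "a,b"
    args = [arg for arg in arg_text.split(",") if arg]
    dummy_args = ", ".join(["self"] + args)
    return f"subroutine {name}({dummy_args})"


print(method_header("#method increaseAgeInAnimalYears increase"))
# subroutine increaseAgeInAnimalYears(self, increase)
```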

The types that conform to the protocol are given explicitly in the protocol block, using the conformingtype directive. This directive takes the name of the Fortran user-defined type that conforms to the protocol, followed by the name of the module that declares the type. Each conforming type must be declared in a separate module.
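Conceptually, the branching generated for each protocol method selects the conforming type and forwards the call. The Python sketch below emits such a select case block for one method. The layout of the protocol type (an integer component typeId and one pointer component per conforming type, named like DogPtr) is an assumption made for illustration; Forpedo's actual generated Fortran may be organized differently.

```python
def dispatch_body(method, args, conforming_types):
    """Emit a Fortran select case block forwarding `method` to the concrete type.

    Hypothetical layout: the protocol type is assumed to hold an integer
    component typeId (1, 2, ... in #conformingtype order) and one pointer
    component per conforming type, e.g. DogPtr. Not Forpedo's actual output.
    """
    lines = ["select case (self%typeId)"]
    for type_id, type_name in enumerate(conforming_types, start=1):
        call_args = ", ".join([f"self%{type_name}Ptr"] + list(args))
        lines.append(f"case ({type_id})")
        lines.append(f"  call {method}({call_args})")
    lines.append("end select")
    return "\n".join(lines)


print(dispatch_body("increaseAgeInAnimalYears", ["increase"], ["Dog", "Cat"]))
```

Each extra entry in conforming_types adds one case branch to every generated block, while code that calls the protocol stays the same.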

The protocol above, having been run through Forpedo, can be used like this:

program Main
  use AnimalProtocolMod
  use DogMod
  use CatMod
  implicit none
  type (Dog), pointer   :: d
  type (Cat), pointer   :: c
  type (AnimalProtocol) :: p

  allocate(d, c)

  ! Make the protocol refer to the Dog
  p = d

  ! Pass the protocol to a subroutine that knows nothing about the concrete type Dog
  call doStuffWithAnimal(p)

  ! Repeat for Cat. The results differ, though the subroutine call is the same.
  p = c
  call doStuffWithAnimal(p)

  deallocate(d, c)

contains

  subroutine doStuffWithAnimal(a)
    ! inout, because increaseAgeInAnimalYears modifies the instance
    type (AnimalProtocol), intent(inout) :: a
    call makeSound(a)
    call increaseAgeInAnimalYears(a, 2)
  end subroutine

end program

Note that the subroutine doStuffWithAnimal is able to call subroutines belonging to Dog and Cat without any direct knowledge of those types. Information about the concrete type is stored by the protocol, and the correct subroutine is invoked via the AnimalProtocol type.

All branching required to select the correct subroutine is encapsulated in the protocol and generated by Forpedo, making the code easier to read and extend. Adding a new conforming type to the protocol requires only a single line, and existing code often needs no changes. For example, adding a type Tiger to the program, and making it conform to the protocol, would not require any changes to doStuffWithAnimal. This is not generally true of traditional procedural programs, where adding a type can require wholesale changes, because the branching blocks that select behavior by type are typically distributed throughout the code base.
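The run-time behavior can be modeled in Python, with the branching made explicit as a lookup table instead of relying on Python's built-in method dispatch. This is only an analogy for what the generated Fortran does, assuming the protocol records the concrete type of the object it refers to; the names MAKE_SOUND, AnimalProtocol, and do_stuff_with_animal are hypothetical.

```python
class Dog:
    pass


class Cat:
    pass


def dog_make_sound(dog):
    return "Woof"


def cat_make_sound(cat):
    return "Meow"


# Hypothetical dispatch table standing in for the generated branching;
# registering a type here is the analogue of a #conformingtype line.
MAKE_SOUND = {Dog: dog_make_sound, Cat: cat_make_sound}


class AnimalProtocol:
    """Wraps a concrete animal; its Python type serves as the stored type record."""

    def __init__(self, animal):
        if type(animal) not in MAKE_SOUND:
            raise TypeError(type(animal).__name__ + " does not conform")
        self.animal = animal


def make_sound(protocol):
    # Select the concrete routine from the stored object's type, then forward.
    return MAKE_SOUND[type(protocol.animal)](protocol.animal)


def do_stuff_with_animal(protocol):
    # Knows nothing about Dog, Cat, or any type added later.
    return make_sound(protocol)


# A new conforming type needs one registration and no change to the code above.
class Tiger:
    pass


MAKE_SOUND[Tiger] = lambda tiger: "Roar"

for animal in (Dog(), Cat(), Tiger()):
    print(do_stuff_with_animal(AnimalProtocol(animal)))
# Woof
# Meow
# Roar
```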