umem

umem provides an abstraction for managing the memory of a variety of storage devices in a unified manner.

libumem

A C library with C and C++ APIs.

libumem implements an abstraction for managing the memory of a variety of storage devices in a unified manner. The core of libumem is implemented in C for maximum portability, and APIs are provided for other programming languages, such as C++, that are often easier to use and provide better resource handling.

In the umem memory management abstraction, a data location is described by a data address in a given device context. At the libumem C level, the data address is given as a uintptr_t value and the device context is represented by a C struct object that holds various memory management methods such as allocation, deallocation, copying, etc. While the implementations of these methods are specific to each storage device, libumem provides a uniform interface for all supported storage devices. In addition, methods are provided for keeping the memory areas of different storage devices in sync.

libumem also provides a C++ API, whose use is highly recommended for its simplicity and robustness in managing memory resources.

libumem public C API

Memory location

Within the libumem C API, the data location address is a uintptr_t value. In the case of host RAM, the address value is equal to the data pointer value. For other storage devices, the address value may have various interpretations that depend on the storage device as well as on the storage device driver library. However, the fundamental assumption about an address value is that incrementing it gives valid addresses covering the whole data content stored in the device.
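For host RAM, the address model above can be illustrated without libumem at all. The following is a minimal sketch; the helper names host_address_of, address_at and host_double_ptr are hypothetical and are not part of the libumem API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host RAM: the address is simply the pointer value. */
static uintptr_t host_address_of(const void *ptr) {
  return (uintptr_t)ptr;
}

/* Address of the i-th item of itemsize bytes, counted from adr.
   This relies on the assumption that address increments stay valid. */
static uintptr_t address_at(uintptr_t adr, size_t i, size_t itemsize) {
  return adr + (uintptr_t)(i * itemsize);
}

/* Host RAM only: turn an address back into a usable pointer. */
static double *host_double_ptr(uintptr_t adr) {
  return (double *)adr;
}
```

For other storage devices the last step is generally not meaningful: the address must be passed back to the methods of the device context instead of being dereferenced.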

Examples

The following program illustrates the usage of libumem as a replacement of stdlib.h malloc/free functionality.

#include "umem.h"

int main()
{
  umemHost host;
  umemHost_ctor(&host);    // construct host RAM context

  // allocate a length 10 array of doubles
  uintptr_t adr = umem_calloc(&host, 10, sizeof(double));

  // application specific code follows, for instance, initialize the array
  // as range(10):
  double * ptr = (double*)adr;
  for(int i=0; i<10; ++i) ptr[i] = (double)i;

  // free the allocated memory area
  umem_free(&host, adr);
  umem_dtor(&host);        // destruct host RAM context
}

The following program illustrates the synchronization of data between host RAM and GPU device memory:

#include "umem.h"

int main()
{
  umemHost host;
  umemCuda cuda;
  umemHost_ctor(&host);       // construct host RAM context
  umemCuda_ctor(&cuda, 0);    // construct GPU device 0 context

  // allocate a length 10 array of doubles in GPU device aligned in
  // 128 byte boundaries
  size_t cuda_alignment = 128;
  uintptr_t cuda_adr = umem_aligned_alloc(&cuda, cuda_alignment, 10*sizeof(double));

  // establish a connection between host and GPU memories.
  // for allocated host buffer, we'll use alignment 64
  size_t host_alignment = 64;
  uintptr_t host_adr = umem_connect(&cuda, cuda_adr,
                                    10*sizeof(double),
                                    &host, host_alignment);

  // application specific code, for instance, initialize the array
  // as range(10):
  double * ptr = (double*)host_adr;
  for(int i=0; i<10; ++i) ptr[i] = (double)i;
  umem_sync_from(&cuda, cuda_adr, &host, host_adr, 10*sizeof(double));
  // now the GPU device memory is initialized as range(10)

  // say, the GPU device changed the allocated data, so we sync the
  // data to host buffer:
  umem_sync_to(&cuda, cuda_adr, &host, host_adr, 10*sizeof(double));

  // disconnect the host and GPU device memories, this also frees host buffer
  umem_disconnect(&cuda, cuda_adr, &host, host_adr, host_alignment);

  // free the allocated memory area in the GPU device
  umem_aligned_free(&cuda, cuda_adr);

  umem_dtor(&cuda);        // destruct GPU device context
  umem_dtor(&host);        // destruct host RAM context
}

Note that the only device-specific lines in the above example are the constructor calls. The code that follows the constructor calls is device independent and would function exactly the same when, say, swapping the host and cuda variables.

Supported storage devices

The libumem C-API provides the following device memory context objects (C struct instances):

  • umemHost - stdlib.h based interface to host RAM,
  • umemFile - stdio.h based interface to files,
  • umemCudaHost - CUDA RT based interface to page-locked host RAM,
  • umemCuda - CUDA RT based interface to GPU device memory,
  • umemCudaManaged - CUDA Unified Memory interface to GPU device memory,
  • umemRMM - RAPIDS Memory Manager based interface to GPU device memory.

Each device memory context has its own specific initializer (a constructor). However, all other memory management methods, such as destructors and copying tools, are universal across all memory storage devices.

umemHost context

The umemHost type defines a host RAM context and it must be initialized using the constructor function umemHost_ctor:

void umemHost_ctor(umemHost * const this);

To destruct the host RAM context object, use the umem_dtor destructor function described below.

umemFile context

The umemFile type defines a file context that must be initialized with the following constructor function:

void umemFile_ctor(umemFile * const ctx, const char * filename, const char * mode);

Here filename is the path name of a file that is opened using the given mode. The mode string must start with one of the following strings: "r", "r+", "w", "w+", "a", "a+". The mode string may also include the character 'b' to indicate binary file content.
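The mode rule above can be checked before a file context is constructed. The following is a sketch only: mode_is_valid is a hypothetical helper, not a libumem function, and it assumes fopen-style modes in which at most one '+' and one 'b' may follow the first character. libumem itself may accept or reject a different set of strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true when mode looks like "r", "r+",
   "w", "w+", "a" or "a+", optionally combined with a single 'b'. */
static bool mode_is_valid(const char *mode) {
  if (mode == NULL) return false;
  /* The first character must be one of r, w, a. */
  if (mode[0] != 'r' && mode[0] != 'w' && mode[0] != 'a') return false;
  /* Assumption: the rest holds at most one '+' and at most one 'b'. */
  int plus = 0, bin = 0;
  for (const char *p = mode + 1; *p != '\0'; ++p) {
    if (*p == '+') ++plus;
    else if (*p == 'b') ++bin;
    else return false;
  }
  return plus <= 1 && bin <= 1;
}
```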

The destructor function umem_dtor closes the file.

umemCudaHost context

The umemCudaHost type defines a CUDA RT based page-locked host memory context that must be initialized with the following constructor function:

void umemCudaHost_ctor(umemCudaHost * const ctx, unsigned int flags);

Here flags is the flags argument passed to the CUDA runtime function cudaHostAlloc: cudaHostAllocDefault, cudaHostAllocPortable, cudaHostAllocMapped, or cudaHostAllocWriteCombined.

Use umem_dtor function to destruct umemCudaHost object.

umemCuda context

The umemCuda type defines a CUDA RT based GPU device memory context that must be initialized with the following constructor function:

void umemCuda_ctor(umemCuda * const ctx, int device);

Here device is the GPU device number. The constructor function will set the corresponding GPU device.

While the destructor function umem_dtor does not call any CUDA API functions, it is still recommended to call it on umemCuda objects that are no longer needed.

umemCudaManaged context

The umemCudaManaged type defines a CUDA Unified Memory based GPU device memory context that must be initialized with the following constructor function:

void umemCudaManaged_ctor(umemCudaManaged * const ctx, unsigned int flags,
                          bool async, uintptr_t stream);

Here flags is the flags option used in the cudaMallocManaged call; a value of 0 corresponds to cudaMemAttachGlobal. If async is true, asynchronous copy methods will be used with the given stream.

Use umem_dtor function to destruct umemCudaManaged object.

umemRMM context

The umemRMM type defines a RAPIDS Memory Manager based GPU device memory context that must be initialized with the following constructor function:

void umemRMM_ctor(umemRMM * const ctx, int device, uintptr_t stream);

Here stream is a CUDA stream handle (0 corresponds to the default stream).

Use umem_dtor function to destruct umemRMM object.

Universal API methods

Destructor

void umem_dtor(void const * ctx);

Destructs the given memory context.

Memory allocation/deallocation

uintptr_t umem_alloc(void const * ctx, size_t nbytes);

Allocates nbytes of memory in the given storage device. The allocated memory is uninitialized.

uintptr_t umem_calloc(void const * ctx, size_t nmemb, size_t size);

Allocates an array of nmemb members, each size bytes long. Returns the starting address of the allocated memory. The allocated memory is zero-initialized.

void umem_free(void const * ctx, uintptr_t adr);

Frees the memory that was allocated with methods umem_alloc or umem_calloc.

uintptr_t umem_aligned_alloc(void const * ctx, size_t alignment, size_t size);

Allocates size bytes (plus some extra) of device memory so that the returned starting address is aligned to the given alignment value.

void umem_aligned_free(void const * ctx, uintptr_t adr);

Frees the memory that was allocated with umem_aligned_alloc.
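To make the "plus some extra" part of aligned allocation concrete, here is a host-only sketch built on malloc. It is one possible scheme and not necessarily the one libumem uses: the sketch_ functions are hypothetical, and storing the original address in the bytes just before the aligned address is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Over-allocate, round the start up to the alignment, and remember the
   original (malloc) address in the slot just before the aligned address. */
static uintptr_t sketch_aligned_alloc(size_t alignment, size_t size) {
  if (alignment == 0) return 0;
  void *origin = malloc(size + alignment + sizeof(uintptr_t));
  if (origin == NULL) return 0;
  uintptr_t start = (uintptr_t)origin + sizeof(uintptr_t);
  uintptr_t rem = start % alignment;
  uintptr_t aligned = (rem == 0) ? start : start + (alignment - rem);
  uintptr_t origin_adr = (uintptr_t)origin;
  memcpy((void *)(aligned - sizeof(uintptr_t)), &origin_adr, sizeof origin_adr);
  return aligned;
}

/* Recover the original address recorded by sketch_aligned_alloc. */
static uintptr_t sketch_aligned_origin(uintptr_t aligned) {
  uintptr_t origin_adr;
  memcpy(&origin_adr, (const void *)(aligned - sizeof(uintptr_t)), sizeof origin_adr);
  return origin_adr;
}

/* Free through the original address, never through the aligned one. */
static void sketch_aligned_free(uintptr_t aligned) {
  if (aligned != 0) free((void *)sketch_aligned_origin(aligned));
}
```

The point of the sketch is that the aligned address must not be handed to the plain deallocator, which is why aligned allocations need their own free function.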

Memory initialization

For initializing device memory with arbitrary data from host RAM, see below how to copy data between devices.

uintptr_t umem_set(void const * ctx, uintptr_t adr, int c, size_t nbytes);

Sets nbytes of device memory with starting address adr to byte value c (the memory area will be filled byte-wise).
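Byte-wise filling has a consequence that is easy to overlook: filling an array of multi-byte items with c does not set each item to c. The host-only sketch below uses memset as a stand-in; host_set is a hypothetical name and not a libumem function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Host-only stand-in for a byte-wise fill: every byte in the range
   [adr, adr + nbytes) is set to the byte value c. */
static void host_set(uintptr_t adr, int c, size_t nbytes) {
  memset((void *)adr, c, nbytes);
}
```

For example, filling a uint32_t array with c equal to 1 leaves each item equal to 0x01010101, not 1.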

Copying data between memory devices

void umem_copy_to(void * const src_ctx, uintptr_t src_adr,
                  void * const dest_ctx, uintptr_t dest_adr,
                  size_t nbytes);

Copies nbytes of source device memory starting at address src_adr to destination device memory starting at address dest_adr. The source and destination memory devices can be different or the same. When the source and destination devices are the same, the copied areas must not overlap; otherwise the result is undefined.
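When both addresses belong to the same device, a caller can guard against the overlap case before copying. A minimal sketch, assuming plain linear addresses that do not wrap around; ranges_overlap is a hypothetical helper and not part of libumem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when [a, a + nbytes) and [b, b + nbytes) share at least one byte.
   Assumes that a + nbytes and b + nbytes do not overflow uintptr_t. */
static bool ranges_overlap(uintptr_t a, uintptr_t b, size_t nbytes) {
  return nbytes > 0 && a < b + nbytes && b < a + nbytes;
}
```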

void umem_copy_from(void * const dest_ctx, uintptr_t dest_adr,
                    void * const src_ctx, uintptr_t src_adr,
                    size_t nbytes);

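The inverse of umem_copy_to: copies nbytes from the source device memory starting at src_adr to the destination device memory starting at dest_adr, with the destination arguments listed first.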

void umem_copy_to_safe(void * const src_ctx, uintptr_t src_adr, size_t src_size,
                       void * const dest_ctx, uintptr_t dest_adr, size_t dest_size,
                       size_t nbytes);
void umem_copy_from_safe(void * const dest_ctx, uintptr_t dest_adr, size_t dest_size,
                         void * const src_ctx, uintptr_t src_adr, size_t src_size,
                         size_t nbytes);

These methods have the same functionality as umem_copy_to and umem_copy_from but also check the bounds of the copied areas. The src_size and dest_size arguments are the widths of the memory areas within which the copying process is expected to stay. Usually the widths correspond to the sizes of the allocated areas, but not necessarily, for instance, when copying subsets of an allocated area.

When the copying process would go out of bounds, e.g. when nbytes exceeds src_size or dest_size, umemIndexError is set as the status value in the problematic device context and the functions return without starting the copying process.
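The bounds rule can be sketched for the host-to-host case as follows. This is an illustration under stated assumptions: host_copy_safe and the copy_result enum are hypothetical, and the sketch reports the failure through a return value, whereas libumem records it in the status of the device context.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { COPY_OK = 0, COPY_INDEX_ERROR } copy_result;

/* Copy nbytes only when the request fits into both areas; otherwise
   return an error without touching the destination. */
static copy_result host_copy_safe(const void *src, size_t src_size,
                                  void *dest, size_t dest_size,
                                  size_t nbytes) {
  if (nbytes > src_size || nbytes > dest_size) return COPY_INDEX_ERROR;
  memcpy(dest, src, nbytes);
  return COPY_OK;
}
```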

Keeping data in sync between memory devices

uintptr_t umem_connect(void * const src_ctx, uintptr_t src_adr,
                       size_t nbytes,
                       void * const dest_ctx, size_t dest_alignment);

Establishes a connection between the two memory devices and returns the paired address in the destination context.

When the memory devices are different, or when the source alignment does not match dest_alignment, nbytes of memory are allocated in the destination context and the paired address is the starting address of the allocated memory. Otherwise src_adr is returned as the paired address.
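The decision described above can be written down as a small predicate. This is a sketch of one reading of the rule: connect_needs_buffer is hypothetical, devices are modeled as plain integer ids, and "alignment does not match" is interpreted as the source address not being a multiple of dest_alignment, which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when a separate destination buffer would have to be allocated,
   false when the source address itself can serve as the paired address. */
static bool connect_needs_buffer(int src_device, uintptr_t src_adr,
                                 int dest_device, size_t dest_alignment) {
  if (src_device != dest_device) return true;
  /* Assumption: a zero alignment means "no alignment requirement". */
  return dest_alignment != 0 && src_adr % dest_alignment != 0;
}
```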

void umem_disconnect(void * const src_ctx, uintptr_t src_adr,
                     void * const dest_ctx, uintptr_t dest_adr,
                     size_t dest_alignment);

Disconnects the two devices that were connected using the umem_connect function, that is, frees the memory that umem_connect may have allocated. The dest_adr must be the paired address previously returned by umem_connect, and the other arguments must be the same as those used in the umem_connect call.

void umem_sync_to(void * const src, uintptr_t src_adr,
                  void * const dest, uintptr_t dest_adr, size_t nbytes);
void umem_sync_from(void * const dest, uintptr_t dest_adr,
                    void * const src, uintptr_t src_adr, size_t nbytes);

Synchronizes the data between the two devices. When the source and destination devices are the same and src_adr == dest_adr, umem_sync_to and umem_sync_from are no-ops.

Note that nbytes must be less than or equal to the nbytes value that was used when calling umem_connect.

void umem_sync_to_safe(void * const src_ctx, uintptr_t src_adr, size_t src_size,
                       void * const dest_ctx, uintptr_t dest_adr, size_t dest_size,
                       size_t nbytes);
void umem_sync_from_safe(void * const dest_ctx, uintptr_t dest_adr, size_t dest_size,
                         void * const src_ctx, uintptr_t src_adr, size_t src_size,
                         size_t nbytes);

These functions have the same functionality as umem_sync_to and umem_sync_from but also check the bounds of the synchronized memory areas. The same rules apply as for umem_copy_to_safe and umem_copy_from_safe, see above.

Status message handling

The success or failure of calling libumem C-API methods described above can be determined by checking the status of memory context objects that participated in the call.

bool umem_is_ok(void * const ctx);

Returns true if the memory context experienced no failures.

umemStatusType umem_get_status(void * const ctx);

Returns the status flag from the memory context object.

const char * umem_get_message(void * const ctx);

Returns the status message from the memory context object. It is an empty string "" when no message has been set (e.g. when umem_is_ok returns true).

void umem_set_status(void * const ctx,
                     umemStatusType type, const char * message);

Sets the status type and status message of the given memory context object. Use this function when you want to propagate the exceptions raised by libumem C-API methods, with extra messages, to a caller function that will handle the exceptions.

Note that umem_set_status overwrites the previously set status type; the status message, however, is appended to the previously set status message. The overwrite of the status type is recorded in the status message as well.

void umem_clear_status(void * const ctx);

Clears memory context object status content: sets the status to “OK” and clears status messages. One should call umem_clear_status after handling any exceptions raised by the libumem C-API methods.
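The set, check and clear cycle described in this section can be modeled with a few lines of plain C. The sketch below is not the libumem implementation: all sk_ names are hypothetical, the message lives in a fixed-size buffer, entries are joined with "; ", and, as a simplification, the overwrite of the status type is not recorded in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef enum { SK_OK = 0, SK_MEMORY_ERROR, SK_INDEX_ERROR } sk_status_type;

typedef struct {
  sk_status_type type;
  char message[256];
} sk_status;

/* Reset to the "OK" state with an empty message. */
static void sk_clear_status(sk_status *s) {
  s->type = SK_OK;
  s->message[0] = '\0';
}

static bool sk_is_ok(const sk_status *s) {
  return s->type == SK_OK;
}

/* Overwrite the type but append the message, truncating the text
   if the buffer is full. */
static void sk_set_status(sk_status *s, sk_status_type type, const char *message) {
  size_t used = strlen(s->message);
  snprintf(s->message + used, sizeof s->message - used, "%s%s",
           used > 0 ? "; " : "", message);
  s->type = type;
}
```

A caller that handles the failure is expected to clear the status afterwards, so that later checks do not report a stale error.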

Utility functions

The following utility functions are used internally in libumem but might be useful for application programs as well.

const char* umem_get_status_name(umemStatusType type);

Returns status type as a string.

inline const char* umem_get_device_name(void * const ctx);

Returns the name of memory context as a string.

bool umem_is_accessible_from(void * const src_ctx, void * const dest_ctx);

Returns true when an address created within the source memory context is accessible from the destination memory context. In general, the accessibility relation is not symmetric.

uintptr_t umem_aligned_origin(void const * ctx, uintptr_t adr);

Returns the original memory address that was obtained when allocating device memory with umem_aligned_alloc.

libumem internal C API

This section is for developers who want to extend libumem with other memory storage devices or want to understand libumem sources.

libumem design

While libumem is implemented in C, it uses an OOP design. This design choice simplifies exposing libumem to other programming languages that support OOP, such as C++ and Python, not to mention the advantages of using OOP to implement an abstract view of a variety of data storage devices in a unified way.

umemVirtual base type

A data storage device is represented as a memory context type that is derived from the umemVirtual type:

typedef struct {
  struct umemVtbl const *vptr;
  umemDeviceType type;
  umemStatus status;
  void* host;
} umemVirtual;

The member vptr is a pointer to a virtual table of methods. This table is filled in with device-specific methods in the constructors of the corresponding derived types:

struct umemVtbl {
  void (*dtor)(umemVirtual * const ctx);
  bool (*is_accessible_from)(umemVirtual * const src_ctx, umemVirtual * const dest_ctx);
  uintptr_t (*alloc)(umemVirtual * const ctx, size_t nbytes);
  uintptr_t (*calloc)(umemVirtual * const ctx, size_t nmemb, size_t size);
  void (*free)(umemVirtual * const ctx, uintptr_t adr);
  uintptr_t (*aligned_alloc)(umemVirtual * const this, size_t alignment, size_t size);
  uintptr_t (*aligned_origin)(umemVirtual * const this, uintptr_t aligned_adr);
  void (*aligned_free)(umemVirtual * const this, uintptr_t aligned_adr);
  void (*set)(umemVirtual * const this, uintptr_t adr, int c, size_t nbytes);
  void (*copy_to)(umemVirtual * const this, uintptr_t src_adr,
                  umemVirtual * const that, uintptr_t dest_adr,
                  size_t nbytes);
  void (*copy_from)(umemVirtual * const this, uintptr_t dest_adr,
                    umemVirtual * const that, uintptr_t src_adr,
                    size_t nbytes);
};

The descriptions of the member methods are as follows:

dtor
A destructor of the memory context. It should clean up any resources that were allocated in the memory context constructor.
is_accessible_from
A predicate function that should return true when an address created by the src_ctx memory context is accessible from the memory context dest_ctx.
alloc, calloc, free
Device memory allocation and deallocation functions. The allocator functions must return the starting address of the allocated memory area. The free function must deallocate the corresponding memory.
aligned_alloc, aligned_free, aligned_origin
Device memory allocation and deallocation functions with a specified alignment. The aligned_origin function returns the original address of the allocated memory. As a rule, the address returned by aligned_alloc points to a memory area that is a subset of the memory area starting at the address returned by aligned_origin.
set
A function that must initialize the memory content with the byte value given in c.
copy_to, copy_from
Functions for copying data from one memory context to another memory context. If the storage device driver does not support copying data to another storage device, one can use host RAM as a buffer. It is assumed that the storage device always supports copying data between the device memory and host RAM.

The member type specifies the memory device type defined in umemDeviceType enum.

The member status holds the status information of given memory context as umemStatus type:

typedef struct {
  umemStatusType type;
  char* message;
} umemStatus;

Finally, the member host holds a pointer to umemHost object that is used to allocate/deallocate intermediate memory buffers that the storage device specific methods might need.
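The virtual table mechanism described in this section can be reduced to a toy model that compiles without libumem. All toy names below are hypothetical; the sketch keeps only two methods and a trivial derived host type in order to show how a universal entry point taking a void pointer dispatches to a device-specific implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct toyVtbl;

/* Base type: holds only the pointer to the table of methods. */
typedef struct {
  struct toyVtbl const *vptr;
} toyVirtual;

struct toyVtbl {
  uintptr_t (*allocate)(toyVirtual * const ctx, size_t nbytes);
  void (*release)(toyVirtual * const ctx, uintptr_t adr);
};

/* Derived type: the base must be the first member so that a pointer
   to toyHost can be used as a pointer to toyVirtual. */
typedef struct {
  toyVirtual super;
  size_t live;  /* number of live allocations, for illustration only */
} toyHost;

static uintptr_t toyHost_allocate_(toyVirtual * const ctx, size_t nbytes) {
  toyHost *self = (toyHost *)ctx;
  void *p = malloc(nbytes);
  if (p != NULL) ++self->live;
  return (uintptr_t)p;
}

static void toyHost_release_(toyVirtual * const ctx, uintptr_t adr) {
  toyHost *self = (toyHost *)ctx;
  if (adr != 0) {
    free((void *)adr);
    --self->live;
  }
}

/* Constructor: the only device-specific call an application makes. */
static void toyHost_ctor(toyHost * const ctx) {
  static struct toyVtbl const vtbl = { &toyHost_allocate_, &toyHost_release_ };
  ctx->super.vptr = &vtbl;
  ctx->live = 0;
}

/* Universal entry points: dispatch through the table of the context. */
static uintptr_t toy_alloc(void * const ctx, size_t nbytes) {
  toyVirtual *base = (toyVirtual *)ctx;
  return base->vptr->allocate(base, nbytes);
}

static void toy_free(void * const ctx, uintptr_t adr) {
  toyVirtual *base = (toyVirtual *)ctx;
  base->vptr->release(base, adr);
}
```

Adding another device in this model means writing a new derived struct, its methods, and a constructor that installs a different table; the universal entry points stay unchanged.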

Adding support for a new data storage device to libumem

In the following, the steps required for adding support for a new data storage device are described. To be specific, let’s assume that we want to add support for a data storage device called “MyMem”.

Defining new type umemMyMem

The template for defining a new memory context type is

typedef struct {
  umemVirtual super;  /// REQUIRED
  umemHost host;      /// REQUIRED
  // Define device specific members:
  ...                 /// OPTIONAL
} umemMyMem;

The umemMyMem must be defined in umem.h.

Adding new device type to umemDeviceType

Add new item umemMyMemDevice to umemDeviceType enum definition in umem.h.

Defining constructor function umemMyMem_ctor

The constructor function of memory context must initialize the virtual table of methods and other members in umemMyMem. The template for the constructor function is

void umemMyMem_ctor(umemMyMem * const ctx,
                    /* device specific parameters: */ ... )
{
  static struct umemVtbl const vtbl = {
    &umemMyMem_dtor_,
    &umemMyMem_is_accessible_from_,
    &umemMyMem_alloc_,
    &umemVirtual_calloc,
    &umemMyMem_free_,
    &umemVirtual_aligned_alloc,
    &umemVirtual_aligned_origin,
    &umemVirtual_aligned_free,
    &umemMyMem_set_,
    &umemMyMem_copy_to_,
    &umemMyMem_copy_from_,
  };
  umemHost_ctor(&ctx->host);                   // REQUIRED
  umemVirtual_ctor(&ctx->super, &ctx->host);   // REQUIRED
  ctx->super.vptr = &vtbl;                     // REQUIRED
  ctx->super.type = umemMyMemDevice;           // REQUIRED
  // Initialize device specific members:
  ...                                          // OPTIONAL
}

The umemMyMem_ctor function must be implemented in umem_mymem.c and exposed as extern function in umem.h:

UMEM_EXTERN void umemMyMem_ctor(umemMyMem * const ctx,
                                /* device specific parameters */ ...
                                );

When initializing the vtbl methods table, one can use the default implementations of methods such as calloc, aligned_alloc, aligned_origin, and aligned_free, which are provided in umem.h and start with the prefix umemVirtual_. However, if the device driver provides the corresponding methods, using them is highly recommended.

One must provide the implementations to the following device specific methods: dtor, alloc, free, copy_to, copy_from, for instance, in umem_mymem.c file.

Adding umem_mymem.c to the CMake configuration

Update c/CMakeLists.txt as follows:

...
option(ENABLE_MYMEM "Enable MyMem memory context" ON)
...
if (ENABLE_MYMEM)
  add_definitions("-DHAVE_MYMEM_CONTEXT")
  set(UMEM_SOURCES ${UMEM_SOURCES} umem_mymem.c)
  set(UMEM_INCLUDE_DIRS ${UMEM_INCLUDE_DIRS} <paths to MyMem include directories>)  # OPTIONAL
  set(UMEM_LIBRARIES ${UMEM_LIBRARIES} <MyMem external libraries>)                  # OPTIONAL
endif(ENABLE_MYMEM)
...

Update doc/libumem/c-api.rst

Add a umemMyMem context section to the “Supported storage devices” section above.

libumem public C++ API

Memory location

Within the libumem C++ API, the data location address is a umem::Address object that also captures the memory context information.

The generated libumem C++ API is available here.

Examples

The following program illustrates the usage of libumem as a replacement of stdlib.h malloc/free functionality.

#include "umem.h"

int main()
{
  umem::Host host;

  {
    // allocate a length 10 array of doubles
    umem::Address adr = host.calloc(10, sizeof(double));

    // application specific code follows, for instance, initialize the array
    // as range(10):
    double * ptr = (double*)adr;
    for(int i=0; i<10; ++i) ptr[i] = (double)i;

    // leaving the scope frees the adr memory
  }

  // leaving the scope destructs host
}

The following program illustrates the synchronization of data between host RAM and GPU device memory:

#include "umem.h"

int main()
{
  umem::Host host;            // construct host RAM context
  umem::Cuda cuda(0);         // construct GPU device 0 context

  {
    // allocate a length 10 array of doubles in GPU device aligned in
    // 128 byte boundaries
    size_t cuda_alignment = 128;
    umem::Address cuda_adr = cuda.aligned_alloc(cuda_alignment, 10*sizeof(double));

    // establish a connection between host and GPU memories.
    // for allocated host buffer, we'll use alignment 64
    size_t host_alignment = 64;
    umem::Address host_adr = cuda_adr.connect(10*sizeof(double), host_alignment);

    // application specific code, for instance, initialize the array
    // as range(10):
    double * ptr = (double*)host_adr;
    for(int i=0; i<10; ++i) ptr[i] = (double)i;
    host_adr.sync(10*sizeof(double));
    // now the GPU device memory is initialized as range(10)

    // say, the GPU device changed the allocated data, so we sync the
    // data to host buffer:
    host_adr.update(10*sizeof(double));

    // leaving the scope frees host_adr and cuda_adr
  }

  // leaving the scope destructs cuda and host
}

Note that the only device-specific lines in the above example are the constructor calls. The code that follows the constructor calls is device independent and would function exactly the same when, say, swapping the host and cuda variables.

Releases

v0.1.0 (January ?th 2019)

First release.