umem¶
umem provides an abstraction for managing the memory of a variety of storage devices in a unified manner.
libumem¶
A C library with C and C++ API.
libumem implements an abstraction for managing the memory of a variety of storage devices in a unified manner. The core of libumem is implemented in C for maximum portability, but APIs are also provided for other programming languages, such as C++, that are often easier to use and provide better resource handling.
According to the umem memory management abstraction, a data location is described by a data address within a given device context. At the libumem C level, the data address is a uintptr_t value, and the device context is represented by a C struct object that holds various memory management methods such as allocation, deallocation, copying, etc. While these methods are specific to each storage device, libumem provides a uniform interface for all supported storage devices. In addition, methods are provided for keeping the memory areas of different storage devices in sync.
libumem also provides a C++ API, whose usage is highly recommended for its simplicity and robustness in managing memory resources.
libumem public C API¶
Memory location¶
Within the libumem C API, the data location address is a uintptr_t
value. In the case of host RAM, the address value is equal to the data
pointer value. For other storage devices, the address value may have
various interpretations that depend on the storage device as well as
on the storage device driver library. However, the fundamental assumption
about an address value is that its increments give valid addresses of the
whole data content stored in the device.
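For host RAM, the relation between an address value and a data pointer can be sketched in plain C without libumem. The helper names below are hypothetical and exist only for illustration; the sketch assumes a mainstream platform with a flat address space, where converting a pointer to uintptr_t and back yields the same pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: view a host pointer as a libumem-style address. */
static uintptr_t host_address_of(const void *ptr)
{
    return (uintptr_t)ptr;
}

/* Hypothetical helper: address of the i-th element of an array whose
   elements are elem_size bytes wide, obtained by incrementing the
   starting address. */
static uintptr_t element_address(uintptr_t start, size_t i, size_t elem_size)
{
    return start + (uintptr_t)(i * elem_size);
}
```

Casting the result of element_address back to a typed pointer gives access to the element, which is the property the "increments give valid addresses" assumption relies on.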
Examples¶
The following program illustrates the usage of libumem as a replacement for the stdlib.h malloc/free functionality:
#include "umem.h"

int main()
{
  umemHost host;
  umemHost_ctor(&host);  // construct host RAM context
  // allocate a zero-initialized array of 10 doubles
  uintptr_t adr = umem_calloc(&host, 10, sizeof(double));
  // application specific code follows, for instance, initialize the array
  // as range(10):
  double * ptr = (double*)adr;
  for(int i=0; i<10; ++i) ptr[i] = (double)i;
  // free the allocated memory area
  umem_free(&host, adr);
  umem_dtor(&host);  // destruct host RAM context
}
The following program illustrates the synchronization of data between host RAM and GPU device memory:
#include "umem.h"

int main()
{
  umemHost host;
  umemCuda cuda;
  umemHost_ctor(&host);     // construct host RAM context
  umemCuda_ctor(&cuda, 0);  // construct GPU device 0 context
  // allocate a length 10 array of doubles in GPU device memory,
  // aligned to a 128 byte boundary
  size_t nbytes = 10*sizeof(double);
  size_t cuda_alignment = 128;
  uintptr_t cuda_adr = umem_aligned_alloc(&cuda, cuda_alignment, nbytes);
  // establish a connection between host and GPU memories;
  // for the allocated host buffer, we'll use alignment 64
  size_t host_alignment = 64;
  uintptr_t host_adr = umem_connect(&cuda, cuda_adr, nbytes,
                                    &host, host_alignment);
  // application specific code, for instance, initialize the array
  // as range(10):
  double * ptr = (double*)host_adr;
  for(int i=0; i<10; ++i) ptr[i] = (double)i;
  // the last argument of the sync functions is a byte count
  umem_sync_from(&cuda, cuda_adr, &host, host_adr, nbytes);
  // now the GPU device memory is initialized as range(10)
  // say, the GPU device changed the allocated data, so we sync the
  // data to the host buffer:
  umem_sync_to(&cuda, cuda_adr, &host, host_adr, nbytes);
  // disconnect the host and GPU device memories, this also frees the host buffer
  umem_disconnect(&cuda, cuda_adr, &host, host_adr, host_alignment);
  // free the allocated memory area in the GPU device
  umem_aligned_free(&cuda, cuda_adr);
  umem_dtor(&cuda);  // destruct GPU device context
  umem_dtor(&host);  // destruct host RAM context
}
Note that the only device-specific lines in the above example are the
constructor calls. The code that follows the constructor calls is
device independent and would function exactly the same when, say,
swapping the host and cuda variables.
Supported storage devices¶
The libumem C-API provides the following device memory context objects (C struct instances):
umemHost - stdlib.h based interface to host RAM,
umemFile - stdio.h based interface to files,
umemCudaHost - CUDA RT based interface to page-locked host RAM,
umemCuda - CUDA RT based interface to GPU device memory,
umemCudaManaged - CUDA Unified Memory interface to GPU device memory,
umemRMM - RAPIDS Memory Manager based interface to GPU device memory.
Each device memory context has a specific initializer (a constructor). However, all other memory management methods, such as destructors and copying tools, are universal across all memory storage devices.
umemHost context¶
The umemHost type defines a host RAM context that must be initialized using the constructor function umemHost_ctor:
void umemHost_ctor(umemHost * const this);
To destruct the host RAM context object, use the umem_dtor destructor function. See below.
umemFile context¶
The umemFile type defines a file context that must be initialized with the following constructor function:
void umemFile_ctor(umemFile * const ctx, const char * filename, const char * mode);
Here filename is the path name of a file that is opened using the given mode. The mode string must start with one of the following strings: "r", "r+", "w", "w+", "a", "a+". The mode string may also include the character 'b' to indicate binary file content.
The destructor function umem_dtor closes the file.
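The rule for the mode string can be sketched as a small validator. This is an illustration only, not libumem code: the name mode_is_valid is hypothetical, and the sketch is deliberately stricter than the text above in that it rejects any character other than a single '+' and a single 'b' after the leading letter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libumem: accepts "r", "r+", "w", "w+",
   "a", "a+", each optionally combined with one 'b' (e.g. "rb", "w+b"). */
static bool mode_is_valid(const char *mode)
{
    if (mode == NULL || mode[0] == '\0' || strchr("rwa", mode[0]) == NULL)
        return false;
    bool seen_plus = false;
    bool seen_b = false;
    for (const char *p = mode + 1; *p != '\0'; ++p) {
        if (*p == '+' && !seen_plus)
            seen_plus = true;
        else if (*p == 'b' && !seen_b)
            seen_b = true;
        else
            return false;  /* unknown or repeated character */
    }
    return true;
}
```

A caller could run such a check before passing the string on to umemFile_ctor, so that a malformed mode is reported before any file is opened.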
umemCudaHost context¶
The umemCudaHost type defines a CUDA RT based page-locked host memory context that must be initialized with the following constructor function:
void umemCudaHost_ctor(umemCudaHost * const ctx, unsigned int flags);
Here flags is the flags argument to the cudaHostAlloc function: cudaHostAllocDefault, cudaHostAllocPortable, cudaHostAllocMapped, or cudaHostAllocWriteCombined.
Use the umem_dtor function to destruct a umemCudaHost object.
umemCuda context¶
The umemCuda type defines a CUDA RT based GPU device memory context that must be initialized with the following constructor function:
void umemCuda_ctor(umemCuda * const ctx, int device);
Here device is the GPU device number. The constructor function will set the corresponding GPU device.
While the destructor function umem_dtor does not call any CUDA API functions, it is recommended to use it to destruct umemCuda objects once they are no longer needed.
umemCudaManaged context¶
The umemCudaManaged type defines a CUDA Unified Memory based GPU device memory context that must be initialized with the following constructor function:
void umemCudaManaged_ctor(umemCudaManaged * const ctx, unsigned int flags,
                          bool async, uintptr_t stream);
Here flags is the flags option used in the cudaMallocManaged call; a value of 0 corresponds to cudaMemAttachGlobal. If async is true, then asynchronous copy methods will be used with the given stream.
Use the umem_dtor function to destruct a umemCudaManaged object.
umemRMM context¶
The umemRMM type defines a RAPIDS Memory Manager based GPU device memory context that must be initialized with the following constructor function:
void umemRMM_ctor(umemRMM * const ctx, int device, uintptr_t stream);
Here stream is a CUDA stream handle (0 corresponds to the default stream).
Use the umem_dtor function to destruct a umemRMM object.
Universal API methods¶
Memory allocation/deallocation¶
uintptr_t umem_alloc(void const * ctx, size_t nbytes);
Allocates nbytes of memory in the given storage device. The allocated memory is uninitialized.
uintptr_t umem_calloc(void const * ctx, size_t nmemb, size_t size);
Allocates an array of nmemb members, each of size bytes. Returns the starting address of the allocated memory. The allocated memory is zero-initialized.
void umem_free(void const * ctx, uintptr_t adr);
Frees the memory that was allocated with umem_alloc or umem_calloc.
uintptr_t umem_aligned_alloc(void const * ctx, size_t alignment, size_t size);
Allocates size bytes (plus some extra) of device memory so that the returned starting address is aligned to the given alignment value.
uintptr_t umem_aligned_free(void const * ctx, uintptr_t adr);
Frees the memory that was allocated with umem_aligned_alloc.
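One common way to obtain an aligned address from an allocator that gives no alignment guarantee is to over-allocate and round the address up, which is where the "plus some extra" bytes come from. The following host-RAM sketch illustrates the idea under stated assumptions: the sketch_ names are hypothetical, and storing the original address just below the aligned one is only one possible bookkeeping scheme, not necessarily the one libumem uses.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Over-allocate, round up to the alignment, and remember the origin. */
static uintptr_t sketch_aligned_alloc(size_t alignment, size_t size)
{
    if (alignment == 0) return 0;
    /* room for the payload, the worst-case padding and the stored origin */
    void *origin = malloc(size + alignment + sizeof(uintptr_t));
    if (origin == NULL) return 0;
    uintptr_t start = (uintptr_t)origin + sizeof(uintptr_t);
    uintptr_t aligned = start + (alignment - start % alignment) % alignment;
    uintptr_t origin_adr = (uintptr_t)origin;
    /* memcpy avoids a misaligned store when alignment is small */
    memcpy((void *)(aligned - sizeof origin_adr), &origin_adr, sizeof origin_adr);
    return aligned;
}

/* Recover the address that malloc originally returned. */
static uintptr_t sketch_aligned_origin(uintptr_t aligned)
{
    uintptr_t origin_adr;
    memcpy(&origin_adr, (const void *)(aligned - sizeof origin_adr), sizeof origin_adr);
    return origin_adr;
}

/* Free the whole over-allocated area through its origin. */
static void sketch_aligned_free(uintptr_t aligned)
{
    if (aligned != 0) free((void *)sketch_aligned_origin(aligned));
}
```

The sketch also shows why memory obtained from an aligned allocator must be released with the matching aligned free function: the aligned address is generally not the address the underlying allocator handed out.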
Memory initialization¶
For initializing device memory with arbitrary data from host RAM, see below how to copy data between devices.
uintptr_t umem_set(void const * ctx, uintptr_t adr, int c, size_t nbytes);
Sets nbytes of device memory starting at address adr to the byte value c (the memory area is filled byte-wise).
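Byte-wise filling means that multi-byte elements do not end up holding the value c itself. The sketch below uses the standard memset function as a host-RAM analogue; it makes no claim about how libumem implements umem_set, and the name fill_word is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Fill a 32-bit word byte-wise with c, as memset does for host RAM. */
static uint32_t fill_word(int c)
{
    uint32_t word;
    memset(&word, c, sizeof word);
    return word;
}
```

For example, fill_word(1) yields 0x01010101 rather than 1, so byte-wise filling is mostly useful for zeroing memory or for writing byte patterns.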
Copying data between memory devices¶
void umem_copy_to(void * const src_ctx, uintptr_t src_adr,
void * const dest_ctx, uintptr_t dest_adr,
size_t nbytes);
Copies nbytes of source device memory starting at address src_adr to destination device memory starting at address dest_adr. The source and destination memory devices can be different or the same. When the source and destination devices are the same, the copied areas must not overlap; otherwise the result is undefined.
void umem_copy_from(void * const dest_ctx, uintptr_t dest_adr,
void * const src_ctx, uintptr_t src_adr,
size_t nbytes);
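The inverse of umem_copy_to: copies nbytes from src_adr in the source context to dest_adr in the destination context, with the destination context and address given first.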
void umem_copy_to_safe(void * const src_ctx, uintptr_t src_adr, size_t src_size,
void * const dest_ctx, uintptr_t dest_adr, size_t dest_size,
size_t nbytes);
void umem_copy_from_safe(void * const dest_ctx, uintptr_t dest_adr, size_t dest_size,
void * const src_ctx, uintptr_t src_adr, size_t src_size,
size_t nbytes);
These methods have the same functionality as umem_copy_to and umem_copy_from but also check the bounds of the copying areas. The src_size and dest_size are the widths of the memory areas within which the copying process is expected to be carried out. Usually the widths correspond to the sizes of the allocated areas, but not necessarily, for instance, when copying subsets of the allocated area.
When the copying process would go out of bounds, e.g. when nbytes > min(src_size, dest_size), then umemIndexError is set as the status value in the problematic device context and the functions return without starting the copying process.
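The bounds rule can be illustrated with a host-only analogue. The function copy_checked is hypothetical and reports failure through its return value, whereas the libumem functions record umemIndexError in the context status instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only analogue of a bounds-checked copy: returns false
   and copies nothing when nbytes does not fit into either area. */
static bool copy_checked(const void *src, size_t src_size,
                         void *dest, size_t dest_size, size_t nbytes)
{
    if (nbytes > src_size || nbytes > dest_size)
        return false;  /* out of bounds: leave dest untouched */
    memcpy(dest, src, nbytes);
    return true;
}
```

The important property is that a failed check leaves the destination unchanged, which matches the statement above that the copying process is not started.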
Keeping data in sync between memory devices¶
uintptr_t umem_connect(void * const src_ctx, uintptr_t src_adr,
size_t nbytes,
void * const dest_ctx, size_t dest_alignment);
Establishes a connection between the two memory devices and returns the paired address in the destination context.
When the memory devices are different, or when the source alignment does not match dest_alignment, nbytes of memory is allocated in the destination context and the paired address will be the starting address of the allocated memory. Otherwise, src_adr will be returned as the paired address.
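The decision described above can be written as a small predicate. This is a sketch under assumptions, not libumem code: the name connect_needs_buffer is hypothetical, "alignment does not match" is interpreted as src_adr not being a multiple of dest_alignment, and a dest_alignment of 0 is taken to mean that there is no alignment requirement.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predicate: does connecting require a separate buffer in
   the destination context? */
static bool connect_needs_buffer(bool same_device, uintptr_t src_adr,
                                 size_t dest_alignment)
{
    if (!same_device)
        return true;               /* different devices: always allocate */
    if (dest_alignment == 0)
        return false;              /* assumption: no alignment requirement */
    return src_adr % dest_alignment != 0;
}
```

When the predicate is false, the paired address is simply src_adr and syncing becomes a no-op, as described for umem_sync_to and umem_sync_from.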
void umem_disconnect(void * const src_ctx, uintptr_t src_adr,
void * const dest_ctx, uintptr_t dest_adr,
size_t dest_alignment);
Disconnects the two devices that were connected using the umem_connect function, that is, frees the memory that umem_connect may have allocated. The dest_adr must be the paired address returned previously by umem_connect, and the other arguments must be the same as those used to call umem_connect.
void umem_sync_to(void * const src, uintptr_t src_adr,
void * const dest, uintptr_t dest_adr, size_t nbytes);
void umem_sync_from(void * const dest, uintptr_t dest_adr,
void * const src, uintptr_t src_adr, size_t nbytes);
Synchronizes the data between the two devices. When the source and destination devices are the same and src_adr == dest_adr, then umem_sync_to and umem_sync_from are no-ops.
Note that nbytes must be less than or equal to the nbytes value that was used in calling the umem_connect function.
void umem_sync_to_safe(void * const src_ctx, uintptr_t src_adr, size_t src_size,
void * const dest_ctx, uintptr_t dest_adr, size_t dest_size,
size_t nbytes);
void umem_sync_from_safe(void * const dest_ctx, uintptr_t dest_adr, size_t dest_size,
void * const src_ctx, uintptr_t src_adr, size_t src_size,
size_t nbytes);
These functions have the same functionality as umem_sync_to and umem_sync_from but also check the bounds of the synchronized memory areas. The same rules apply as for umem_copy_to_safe and umem_copy_from_safe, see above.
Status message handling¶
The success or failure of calling libumem C-API methods described above can be determined by checking the status of memory context objects that participated in the call.
bool umem_is_ok(void * const ctx);
Returns true if the memory context experienced no failures.
umemStatusType umem_get_status(void * const ctx);
Returns the status flag from the memory context object.
const char * umem_get_message(void * const ctx);
Returns the status message from the memory context object. It is an empty string "" when no message has been set (e.g. when umem_is_ok returns true).
void umem_set_status(void * const ctx,
umemStatusType type, const char * message);
Sets the status type and status message of the given memory context object. Use this function when you want to propagate the exceptions raised by libumem C-API methods, with extra messages, to a caller function that will handle the exceptions.
Note that umem_set_status overwrites the previously set status type; the status message, however, is appended to the previously set status message. The overwrite of the status type is recorded in the status message as well.
void umem_clear_status(void * const ctx);
Clears the status content of the memory context object: sets the status to “OK” and clears the status messages. One should call umem_clear_status after handling any exceptions raised by the libumem C-API methods.
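The "overwrite the type, append the message" behaviour can be sketched with a small stand-alone status object. All sketch names are hypothetical stand-ins rather than libumem types, the fixed-size message buffer is a simplification, and the sketch omits the note about the overwritten status type that libumem records in the message.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for umemStatusType and umemStatus. */
typedef enum { sketchOK = 0, sketchMemoryError, sketchIndexError } sketchStatusType;

typedef struct {
    sketchStatusType type;
    char message[256];  /* fixed-size buffer keeps the sketch simple */
} sketchStatus;

/* Reset to the "OK" state with an empty message. */
static void sketch_clear_status(sketchStatus *s)
{
    s->type = sketchOK;
    s->message[0] = '\0';
}

/* Overwrite the type and append the message on a new line. */
static void sketch_set_status(sketchStatus *s, sketchStatusType type,
                              const char *message)
{
    size_t used = strlen(s->message);
    snprintf(s->message + used, sizeof s->message - used, "%s%s",
             used > 0 ? "\n" : "", message);
    s->type = type;
}
```

Appending rather than replacing lets each caller add its own context while an error travels up the call chain, and the final handler sees the whole history before clearing the status.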
Utility functions¶
The following utility functions are used internally in libumem but might be useful for application programs as well.
const char* umem_get_status_name(umemStatusType type);
Returns the status type as a string.
inline const char* umem_get_device_name(void * const ctx);
Returns the name of memory context as a string.
bool umem_is_accessible_from(void * const src_ctx, void * const dest_ctx);
Returns true when an address created within the source memory context is accessible from the destination memory context. In general, the accessibility relation is not symmetric.
uintptr_t umem_aligned_origin(void const * ctx, uintptr_t adr);
Returns the original memory address that was obtained when allocating device memory with umem_aligned_alloc.
libumem internal C API¶
This section is for developers who want to extend libumem with other memory storage devices or want to understand libumem sources.
libumem design¶
While libumem is implemented in C, it uses an OOP design. This design choice simplifies exposing libumem to other programming languages that support OOP, such as C++, Python, etc., not to mention the advantages of using OOP to implement an abstract view of a variety of data storage devices in a unified way.
umemVirtual base type¶
A data storage device is represented as a memory context type that is derived from the umemVirtual type:
typedef struct {
struct umemVtbl const *vptr;
umemDeviceType type;
umemStatus status;
void* host;
} umemVirtual;
The member vptr is a pointer to the virtual table of methods. This table is filled in with device-specific methods in the constructors of the corresponding derived types:
struct umemVtbl {
void (*dtor)(umemVirtual * const ctx);
bool (*is_accessible_from)(umemVirtual * const src_ctx, umemVirtual * const dest_ctx);
uintptr_t (*alloc)(umemVirtual * const ctx, size_t nbytes);
uintptr_t (*calloc)(umemVirtual * const ctx, size_t nmemb, size_t size);
void (*free)(umemVirtual * const ctx, uintptr_t adr);
uintptr_t (*aligned_alloc)(umemVirtual * const this, size_t alignment, size_t size);
uintptr_t (*aligned_origin)(umemVirtual * const this, uintptr_t aligned_adr);
void (*aligned_free)(umemVirtual * const this, uintptr_t aligned_adr);
void (*set)(umemVirtual * const this, uintptr_t adr, int c, size_t nbytes);
void (*copy_to)(umemVirtual * const this, uintptr_t src_adr,
umemVirtual * const that, uintptr_t dest_adr,
size_t nbytes);
void (*copy_from)(umemVirtual * const this, uintptr_t dest_adr,
umemVirtual * const that, uintptr_t src_adr,
size_t nbytes);
};
The descriptions of the member methods are as follows:
dtor - A destructor of the memory context. It should clean up any resources that are allocated in the memory context constructor.
is_accessible_from - A predicate function that should return true when an address created by the src_ctx memory context is accessible from the memory context dest_ctx.
alloc, calloc, free - Device memory allocation and deallocation functions. The allocator functions must return the starting address of the allocated memory area. The free function must deallocate the corresponding memory.
aligned_alloc, aligned_free, aligned_origin - Device memory allocation and deallocation functions with specified alignment. The aligned_origin function returns the original address of the allocated memory. As a rule, the address returned by aligned_alloc points to a memory area that is a subset of the memory area starting at the address returned by aligned_origin.
set - A function that must initialize memory content with the byte value given in c.
copy_to, copy_from - Functions for copying data from one memory context to another memory context. If the storage device driver does not support copying data to another storage device, one can use host RAM as a buffer. It is assumed that the storage device always supports copying data between the device memory and host RAM.
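The "host RAM as a buffer" strategy mentioned for copy_to and copy_from can be sketched as follows. Everything here is hypothetical: sketchDevice is a byte array standing in for device memory, and the two transfer helpers play the role of a driver that only supports copies between the device and host RAM.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a storage device: a small byte array. */
typedef struct {
    unsigned char storage[64];
} sketchDevice;

/* The only transfers this imaginary driver supports: device <-> host RAM. */
static void sketch_device_to_host(const sketchDevice *dev, size_t offset,
                                  void *host_buf, size_t nbytes)
{
    memcpy(host_buf, dev->storage + offset, nbytes);
}

static void sketch_host_to_device(sketchDevice *dev, size_t offset,
                                  const void *host_buf, size_t nbytes)
{
    memcpy(dev->storage + offset, host_buf, nbytes);
}

/* Device-to-device copy staged through a temporary host RAM buffer.
   Returns 0 on success, -1 on a range error or allocation failure. */
static int sketch_copy_via_host(const sketchDevice *src, size_t src_offset,
                                sketchDevice *dest, size_t dest_offset,
                                size_t nbytes)
{
    if (nbytes == 0) return 0;
    if (src_offset > sizeof src->storage ||
        nbytes > sizeof src->storage - src_offset) return -1;
    if (dest_offset > sizeof dest->storage ||
        nbytes > sizeof dest->storage - dest_offset) return -1;
    void *buf = malloc(nbytes);  /* intermediate host buffer */
    if (buf == NULL) return -1;
    sketch_device_to_host(src, src_offset, buf, nbytes);
    sketch_host_to_device(dest, dest_offset, buf, nbytes);
    free(buf);
    return 0;
}
```

The staged copy costs one extra transfer and a temporary allocation, which is why a direct device-to-device path is preferable whenever the driver offers one.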
The member type specifies the memory device type defined in the umemDeviceType enum.
The member status holds the status information of the given memory context as a umemStatus type:
typedef struct {
umemStatusType type;
char* message;
} umemStatus;
Finally, the member host holds a pointer to a umemHost object that is used to allocate/deallocate intermediate memory buffers that the storage device specific methods might need.
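The OOP-in-C pattern described in this section (a base struct carrying a pointer to a table of methods, derived structs embedding the base as their first member, and universal functions dispatching through the table) can be reduced to a minimal stand-alone sketch. The sketch names are hypothetical and the table is cut down to two methods; it is not the libumem implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct sketchVtbl;  /* table of device methods, defined below */

typedef struct {
    struct sketchVtbl const *vptr;  /* every context starts with this */
} sketchVirtual;

struct sketchVtbl {
    uintptr_t (*alloc)(sketchVirtual * const ctx, size_t nbytes);
    void (*free)(sketchVirtual * const ctx, uintptr_t adr);
};

/* A derived context embeds the base type as its first member. */
typedef struct {
    sketchVirtual super;
    size_t live_allocations;  /* device specific member */
} sketchHost;

static uintptr_t sketchHost_alloc_(sketchVirtual * const ctx, size_t nbytes)
{
    sketchHost * const self = (sketchHost *)ctx;
    void *ptr = malloc(nbytes);
    if (ptr != NULL) self->live_allocations++;
    return (uintptr_t)ptr;
}

static void sketchHost_free_(sketchVirtual * const ctx, uintptr_t adr)
{
    sketchHost * const self = (sketchHost *)ctx;
    if (adr != 0) {
        free((void *)adr);
        self->live_allocations--;
    }
}

/* The constructor installs the device specific method table. */
static void sketchHost_ctor(sketchHost * const ctx)
{
    static struct sketchVtbl const vtbl = { &sketchHost_alloc_, &sketchHost_free_ };
    ctx->super.vptr = &vtbl;
    ctx->live_allocations = 0;
}

/* Universal entry points dispatch through the virtual table. */
static uintptr_t sketch_alloc(void * const ctx, size_t nbytes)
{
    sketchVirtual * const base = (sketchVirtual *)ctx;
    return base->vptr->alloc(base, nbytes);
}

static void sketch_free(void * const ctx, uintptr_t adr)
{
    sketchVirtual * const base = (sketchVirtual *)ctx;
    base->vptr->free(base, adr);
}
```

Because the base struct is the first member of every derived context, a pointer to the derived object can be passed wherever the universal functions expect a context, which is what makes only the constructor call device specific.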
Adding a new data storage device support to libumem¶
In the following, the steps required for adding support for a new data storage device are described. To be specific, let’s assume that we want to add support for a data storage device called “MyMem”.
Defining new type umemMyMem¶
The template for defining a new memory context type is
typedef struct {
umemVirtual super; /// REQUIRED
umemHost host; /// REQUIRED
// Define device specific members:
... /// OPTIONAL
} umemMyMem;
The umemMyMem type must be defined in umem.h.
Adding new device type to umemDeviceType¶
Add a new item umemMyMemDevice to the umemDeviceType enum definition in umem.h.
Defining constructor function umemMyMem_ctor¶
The constructor function of a memory context must initialize the virtual table of methods and the other members of umemMyMem. The template for the constructor function is
void umemMyMem_ctor(umemMyMem * const ctx,
/* device specific parameters: */ ... )
{
static struct umemVtbl const vtbl = {
&umemMyMem_dtor_,
&umemMyMem_is_accessible_from_,
&umemMyMem_alloc_,
&umemVirtual_calloc,
&umemMyMem_free_,
&umemVirtual_aligned_alloc,
&umemVirtual_aligned_origin,
&umemVirtual_aligned_free,
&umemMyMem_set_,
&umemMyMem_copy_to_,
&umemMyMem_copy_from_,
};
umemHost_ctor(&ctx->host); // REQUIRED
umemVirtual_ctor(&ctx->super, &ctx->host); // REQUIRED
ctx->super.vptr = &vtbl; // REQUIRED
ctx->super.type = umemMyMemDevice; // REQUIRED
// Initialize device specific members:
... // OPTIONAL
}
The umemMyMem_ctor function must be implemented in umem_mymem.c and exposed as an extern function in umem.h:
UMEM_EXTERN void umemMyMem_ctor(umemMyMem * const ctx,
/* device specific parameters */ ...
);
When initializing the vtbl methods table, one can use the default implementations for methods like calloc, aligned_alloc, aligned_origin, aligned_free, which are provided in umem.h and start with the prefix umemVirtual_. If the device driver provides the corresponding methods, their usage is highly recommended.
One must provide implementations of the following device-specific methods: dtor, alloc, free, copy_to, copy_from, for instance, in the umem_mymem.c file.
Including umem_mymem.c to CMake configuration¶
Update c/CMakeLists.txt as follows:
...
option(ENABLE_MYMEM "Enable MyMem memory context" ON)
...
if (ENABLE_MYMEM)
add_definitions("-DHAVE_MYMEM_CONTEXT")
set(UMEM_SOURCES ${UMEM_SOURCES} umem_mymem.c)
set(UMEM_INCLUDE_DIRS ${UMEM_INCLUDE_DIRS} <paths to MyMem include directories>) # OPTIONAL
set(UMEM_LIBRARIES ${UMEM_LIBRARIES} <MyMem external libraries>) # OPTIONAL
endif(ENABLE_MYMEM)
...
Update doc/libumem/c-api.rst¶
Add a “umemMyMem context” section to the “Supported storage devices” section above.
libumem public C++ API¶
Memory location¶
Within the libumem C++ API, the data location address is a umem::Address object that also captures the memory context information.
The generated libumem C++ API is available here.
Examples¶
The following program illustrates the usage of libumem as a replacement for the stdlib.h malloc/free functionality:
#include "umem.h"

int main()
{
  umem::Host host;
  {
    // allocate a length 10 array of doubles
    umem::Address adr = host.calloc(sizeof(double), 10);
    // application specific code follows, for instance, initialize the array
    // as range(10):
    double * ptr = (double*)adr;
    for(int i=0; i<10; ++i) ptr[i] = (double)i;
    // leaving the scope frees the adr memory
  }
  // leaving the scope destructs host
}
The following program illustrates the synchronization of data between host RAM and GPU device memory:
#include "umem.h"

int main()
{
  umem::Host host;     // construct host RAM context
  umem::Cuda cuda(0);  // construct GPU device 0 context
  {
    // allocate a length 10 array of doubles in GPU device memory,
    // aligned to a 128 byte boundary
    size_t cuda_alignment = 128;
    umem::Address cuda_adr = cuda.aligned_alloc(cuda_alignment, 10*sizeof(double));
    // establish a connection between host and GPU memories;
    // for the allocated host buffer, we'll use alignment 64
    size_t host_alignment = 64;
    umem::Address host_adr = cuda_adr.connect(10*sizeof(double), host_alignment);
    // application specific code, for instance, initialize the array
    // as range(10):
    double * ptr = (double*)host_adr;
    for(int i=0; i<10; ++i) ptr[i] = (double)i;
    host_adr.sync(10*sizeof(double));
    // now the GPU device memory is initialized as range(10)
    // say, the GPU device changed the allocated data, so we sync the
    // data to the host buffer:
    host_adr.update(10*sizeof(double));
    // leaving the scope frees host_adr and cuda_adr
  }
  // leaving the scope destructs cuda and host
}
Note that the only device-specific lines in the above example are the
constructor calls. The code that follows the constructor calls is
device independent and would function exactly the same when, say,
swapping the host and cuda variables.