malloc or memory pool without alignment

I am doing C code, and I have several (millions) of malloc's each requesting for 20-30 bytes of memory.

As result the overhead of both GNU C Malloc and Jemalloc go to 40-50%. DL Malloc works better, but still ~30% overhead.

Is there a way to do malloc without any alignment / padding? I understand this will be slower and probably on some CPU's C need to "reconstruct" data from different words, but I am ready to trade speed for memory usage.

Instead of malloc I can use memory pool too, as long as it can reuse memory after free().

Answers


malloc() et al. are required by the C standard to provide sufficiently aligned pointers for any data type, so to reduce the allocation overheads, you need to implement your own allocator (or use some existing one).

One possibility would be to have a pool chain for each possible allocation size. Within each pool, you can use a bit map to track which items are allocated and which free. The overhead is just over one bit per item, but you may end up having lots of pool chains; this tends to make free() slow, because it has to search for the correct pool.

A better possibility, I think, is to create a pool chain for small allocations. Within each pool, the small chunks form a linked list, with a single unsigned char keeping track of the length and allocation status. This yields unaligned pointers, but at a overhead of just over one char.

For example:

#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include <errno.h>
#include <stdio.h>

#define SMALL_POOL_SIZE 131072
#define SMALL_LIMIT     ((UCHAR_MAX + 1U) / 2U)

struct small_pool {
    struct small_pool *next;
    unsigned int       size;
    unsigned int       used;
    unsigned char      data[];
};

static struct small_pool *list = NULL;

Within the data[] member, the first char is 1 to SMALL_LIMIT-1 for a free chunk of that size, or SMALL_LIMIT+1 or larger for an used chunk of 1 char or more. Zero and SMALL_LIMIT indicate errors. A space-aggressive allocator could be implemented for example as

void *small_alloc(const size_t size)
{
    struct small_pool *pool;

    if (size < 1 || size >= SMALL_LIMIT) {
        errno = EINVAL;
        return NULL;
    }

    pool = list;

    while (pool != NULL) {

        /* Unused space in the pool? */
        if (pool->used + size < pool->size) {
            unsigned char *const ptr = pool->data + pool->used;

            /* Grab it. */
            pool->used += size + 1U;

            /* Create new slot. */
            (*ptr) = size + SMALL_LIMIT;

            /* Done. */
            return ptr + 1U;
        }

        /* Check the free slots in the pool. */
        {
            unsigned char *const end = pool->data + pool->used;
            unsigned char       *ptr = pool->data;
            unsigned char    big_len = SMALL_LIMIT;
            unsigned char   *big_ptr = NULL;

            while (ptr < end)
                if (*ptr == 0U || *ptr == SMALL_LIMIT) {
                    /* Invalid pointer */
                    errno = EDOM;
                    return NULL;
                } else
                if (*ptr > SMALL_LIMIT) {
                    /* Used slot, skip */
                    ptr += (*ptr) - SMALL_LIMIT + 1U;
                    continue;
                } else {
                if (*ptr < size) {
                    /* Slot is too small, skip it */
                    ptr += (*ptr) + 1U;
                    continue;
                } else
                if (*ptr == size) {
                    /* Perfect slot; grab it. */
                    (*ptr) = size + SMALL_LIMIT;
                    return ptr + 1U;
                } else
                    /* Remember smallest of the large enough slots */
                    if (*ptr < big_len) {
                        big_len = *ptr;
                        big_ptr = ptr;
                    }
                    ptr += (*ptr) + 1U;
                }

            if (big_ptr != NULL) {
                (*big_ptr) = big_len + SMALL_LIMIT;
                return big_ptr + 1;
            }
        }

        /* Check the next pool. */
        pool = pool->next;
    }

    /* Need a new pool. */
    pool = malloc(SMALL_POOL_SIZE);
    if (pool == NULL) {
        errno = ENOMEM;
        return NULL;
    }

    /* Initialize pool; use the initial slot for the new allocation. */
    pool->used = size + 1;
    pool->size = SMALL_POOL_SIZE - sizeof (struct small_pool);
    pool->data[0] = size + SMALL_LIMIT;

    /* Prepend this pool to the pool chain. */
    pool->next = list;
    list = pool;

    /* Return the slot we used. */
    return pool->data + 1;
}

It has a simple strategy: If there is unused trailing space in the pool, use it. Otherwise, scan through the pool to find the unused slots. If there is a perfectly-sized slot, use it; otherwise, use the smallest sufficiently large unused slot.

There are many improvements possible. For example, you could keep full pools in a separate list, to avoid scanning through them. It might also be a good idea to move the pool where a free slot was found in, to the start of the pool list.

The deallocation is more complicated. If we have relatively few deallocations amidst allocations, and don't worry about freeing entire pools, the deallocation could be as simple as

int small_free(void *const item)
{
    if (item == NULL)
        return 0;
    else {
        struct small_pool *pool = list;

        while (pool != NULL && !((unsigned char *)item > pool->data && (unsigned char *)item < pool->data + pool->used))
            pool = pool->next;

        if (pool != NULL) {
            unsigned char *const ptr = (unsigned char *)item - 1;

            if (*ptr > SMALL_LIMIT)
                (*ptr) -= SMALL_LIMIT;

            return 0;
        }

        return ENOENT;    
    }
}

assuming you need the function to return ENOENT in case the allocation is not actually a small one. If it is important to verify the pointer to be deallocated is valid, say for debugging, then

int small_free(void *const item)
{
    if (item == NULL)
        return 0;
    else {
        struct small_pool *pool = list;

        while (pool != NULL && !((unsigned char *)item > pool->data && (unsigned char *)item < pool->data + pool->used))
            pool = pool->next;

        if (pool != NULL) {
            unsigned char *const end = pool->data + pool->used;
            unsigned char       *ptr = pool->data;

            while (ptr < end)
                if (*ptr == 0U || *ptr == SMALL_LIMIT)
                    return EDOM;
                else
                if (*ptr < SMALL_LIMIT) {
                    size_t len = (*ptr) + 1U;

                    /* Coalesce consecutive slots, if possible. */
                    while (len + ptr[len] < SMALL_LIMIT) {
                        (*ptr) = len + ptr[len];
                        len = (*ptr) + 1U;
                    }

                    ptr += len;

                } else {
                    const size_t len = (*ptr) + 1U - SMALL_LIMIT;

                    /* Within the current slot.. probably should just
                     * compare item to ptr+1 instead. */
                    if ((unsigned char *)item > ptr && (unsigned char *)item < ptr + len) {
                        *ptr = len - 1U;
                        return 0;
                    }

                    ptr += len;
                }
        }

        return ENOENT;    
    }
}

Even this latter version does not trim ->used when the last chunk(s) in a pool are freed, nor does it free completely unused pools either. In other words, the above deallocation is only a rough example.

Speed-wise, the above seem to be at least an order of magnitude slower than GLIBC malloc()/free() on my machine. Here is a simple test to check for linear allocation - semirandom deallocation pattern:

/* Make sure this is prime wrt. 727 */
#define POINTERS 1000000

int main(void)
{
    void **array;
    size_t i;

    fprintf(stderr, "Allocating an array of %lu pointers: ", (unsigned long)POINTERS);
    fflush(stderr);
    array = malloc((size_t)POINTERS * sizeof array[0]);
    if (array == NULL) {
        fprintf(stderr, "Failed.\n");
        return EXIT_FAILURE;
    }
    fprintf(stderr, "Done.\n\n");
    fprintf(stderr, "Allocating pointers in varying sizes.. ");
    fflush(stderr);
    for (i = 0; i < POINTERS; i++) {
        const size_t n = 1 + ((i * 727) % (SMALL_LIMIT - 1));
        if (!(array[i] = small_alloc(n))) {
            if (errno == EDOM)
                fprintf(stderr, "Failed at %lu; corrupted list.\n", (unsigned long)i + 1UL);
            else
                fprintf(stderr, "Failed at %lu: %s.\n", (unsigned long)i + 1UL, strerror(errno));
            return EXIT_FAILURE;
        }
    }

    fprintf(stderr, "Done.\n\n");
    fprintf(stderr, "Deallocating pointers in a mixed order.. ");
    fflush(stderr);

    for (i = 0; i < POINTERS; i++) {
        const size_t p = (i * 727) % POINTERS;
        if (small_free(array[p])) {
            if (errno == EDOM)
                fprintf(stderr, "Failed at %lu: corrupted list.\n", (unsigned long)i + 1UL);
            else
                fprintf(stderr, "Failed at %lu: %s.\n", (unsigned long)i + 1UL, strerror(errno));
            return EXIT_FAILURE;
        }
    }

    fprintf(stderr, "Done.\n\n");
    fprintf(stderr, "Deallocating the pointer array.. ");
    fflush(stderr);

    free(array);

    fprintf(stderr, "Done.\n\n");
    fflush(stderr);

    return EXIT_SUCCESS;
}

The true strength of a pool-based allocator, in my opinion, is that you can release entire pools at once. With that in mind, perhaps your workload could use a specialized allocator when building your structures, with a compactor phase (capable of adjusting pointers) run at least at the end, and possibly during the construction too if a sufficient number of deletions have been done. That approach would allow you to release the temporarily needed memory back to the OS. (Without compaction, most pools are likely to have at least one allocation left, making it impossible to release it.)

I don't think this is a good answer at all, but a better answer would require more details, specifically about the data structures stored, and the allocation/deallocation pattern that occurs in practice. In the absence of those, I hope this gives at least some ideas on how to proceed.


Need Your Help

Sending output to logfile or STDOUT

perl stdout logfile

I am trying to come up with some logic to give my script an option of where to send the output.

Adding version control to an existing project

svn version-control

I am working on a project that has grown to a decent size, and I am the only developer. We currently don't use any version control, but I definitely need to start.

About UNIX Resources Network

Original, collect and organize Developers related documents, information and materials, contains jQuery, Html, CSS, MySQL, .NET, ASP.NET, SQL, objective-c, iPhone, Ruby on Rails, C, SQL Server, Ruby, Arrays, Regex, ASP.NET MVC, WPF, XML, Ajax, DataBase, and so on.