Why are function pointers and data pointers incompatible in C/C++?

I have read that converting a function pointer to a data pointer and vice versa works on most platforms but is not guaranteed to work. Why is this the case? Shouldn't both be simply addresses into main memory and therefore be compatible?


An architecture doesn't have to store code and data in the same memory. With a Harvard architecture, code and data are stored in completely different memory. Most architectures are von Neumann architectures with code and data in the same memory, but C avoids tying itself to particular kinds of architecture wherever possible.

Some computers have (had) separate address spaces for code and data. On such hardware it just doesn't work.

The language is designed not only for current desktop applications, but to allow it to be implemented on a large set of hardware.

It seems like the C language committee never intended void* to be a pointer to function; they just wanted a generic pointer to objects.

The C99 Rationale says, under the heading "Pointers":

C has now been implemented on a wide range of architectures. While some of these architectures feature uniform pointers which are the size of some integer type, maximally portable code cannot assume any necessary correspondence between different pointer types and the integer types. On some implementations, pointers can even be wider than any integer type.

The use of void* (“pointer to void”) as a generic object pointer type is an invention of the C89 Committee. Adoption of this type was stimulated by the desire to specify function prototype arguments that either quietly convert arbitrary pointers (as in fread) or complain if the argument type does not exactly match (as in strcmp). Nothing is said about pointers to functions, which may be incommensurate with object pointers and/or integers.

Note the sentence "Nothing is said about pointers to functions" at the end of the last paragraph. Function pointers might be different from other pointers, and the committee is aware of that.
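Since void* is only a generic object pointer, the closest standard counterpart for functions is another function pointer type: C guarantees that a function pointer converted to a different function pointer type and back compares equal to the original. A minimal sketch of that round trip follows; the names generic_fn, erase_fn_type and call_int_fn are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical name for a "generic" function pointer type. */
typedef void (*generic_fn)(void);

int add_one(int x) { return x + 1; }

/* Store an int(int) function in the generic function pointer type. */
generic_fn erase_fn_type(int (*f)(int)) {
    return (generic_fn)f;
}

/* Convert back to the original type before calling. Calling through
   generic_fn directly would be undefined, because the types differ. */
int call_int_fn(generic_fn g, int arg) {
    assert(g != 0);
    int (*f)(int) = (int (*)(int))g;
    return f(arg);
}
```

The caller has to remember the real type of the stored function; the generic type only carries the address.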

For those who remember MS-DOS, Windows 3.1 and older, the answer is quite easy. All of these used to support several different memory models, with varying combinations of characteristics for code and data pointers.

So for instance for the Compact model (small code, large data):

sizeof(void *) > sizeof(void(*)())

and conversely in the Medium model (large code, small data):

sizeof(void *) < sizeof(void(*)())

In this case you didn't have separate storage for code and data but still couldn't convert between the two pointers (short of using non-standard __near and __far modifiers).

Additionally, even if the pointers are the same size, there's no guarantee that they point to the same thing. In the DOS Small memory model, both code and data used near pointers, but they pointed into different segments. So converting a function pointer to a data pointer wouldn't give you a pointer that had any relationship to the function at all, and hence there was no use for such a conversion.
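To make the segment problem concrete, here is a small model of real-mode segmented addressing written in portable C. It does not use actual near or far pointers; far_ptr, to_near and to_far are hypothetical names, and the segment values used with them are made up. The only hardware fact assumed is that a real-mode physical address is segment * 16 + offset.

```c
#include <stdint.h>

/* Real-mode 8086 addressing: physical = segment * 16 + offset. */
uint32_t physical_address(uint16_t segment, uint16_t offset) {
    return ((uint32_t)segment << 4) + offset;
}

/* Hypothetical model of a far pointer: segment plus offset. */
typedef struct {
    uint16_t segment;
    uint16_t offset;
} far_ptr;

/* A near pointer keeps only the offset; the segment is dropped. */
uint16_t to_near(far_ptr p) {
    return p.offset;
}

/* Widening a near pointer has to assume some segment, for example the
   data segment, which need not be the one the value came from. */
far_ptr to_far(uint16_t offset, uint16_t assumed_segment) {
    far_ptr p = { assumed_segment, offset };
    return p;
}
```

Narrowing a code address to a near value and widening it again against a different segment yields the same offset but a different physical address, which is why the converted pointer has no relationship to the function.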

Pointers to void are supposed to be able to accommodate a pointer to any kind of data, but not necessarily a pointer to a function. Some systems have different requirements for pointers to functions than for pointers to data (e.g., there are DSPs with different addressing for data vs. code; medium model on MS-DOS used 32-bit pointers for code but only 16-bit pointers for data).

In addition to what is already said here, it is interesting to look at POSIX dlsym():

The ISO C standard does not require that pointers to functions can be cast back and forth to pointers to data. Indeed, the ISO C standard does not require that an object of type void * can hold a pointer to a function. Implementations supporting the XSI extension, however, do require that an object of type void * can hold a pointer to a function. The result of converting a pointer to a function into a pointer to another data type (except void *) is still undefined, however. Note that compilers conforming to the ISO C standard are required to generate a warning if a conversion from a void * pointer to a function pointer is attempted as in:

 fptr = (int (*)(int))dlsym(handle, "my_function");

Due to the problem noted here, a future version may either add a new function to return function pointers, or the current interface may be deprecated in favor of two new functions: one that returns data pointers and the other that returns function pointers.

Depending on the target architecture, code and data may be stored in fundamentally incompatible, physically distinct areas of memory.

C++11 has a solution to the long-standing mismatch between C/C++ and POSIX with regard to dlsym(). One can use reinterpret_cast to convert a function pointer to/from a data pointer so long as the implementation supports this feature.

From the standard, 5.2.10 para. 8, "converting a function pointer to an object pointer type or vice versa is conditionally-supported." 1.3.5 defines "conditionally-supported" as a "program construct that an implementation is not required to support".

Undefined doesn't necessarily mean not allowed; it can mean that the compiler implementor has more freedom to do it how they want.

For instance, it may not be possible on some architectures. Leaving it undefined allows them to still have a conforming C implementation even if you can't do this.

They can be different types with different space requirements. Assigning to one can irreversibly slice the value of the pointer so that assigning back results in something different.

I believe they can be different types because the standard doesn't want to rule out implementations that save space when the extra width isn't needed, or where a larger pointer would force the CPU to do extra work to use it.

Another solution:

Assuming POSIX guarantees function and data pointers to have the same size and representation (I can't find the text for this, but the example OP cited suggests they at least intended to make this requirement), the following should work:

double (*cosine)(double);
void *handle, *tmp;
handle = dlopen("libm.so", RTLD_LAZY);
tmp = dlsym(handle, "cos");
/* Copy the pointer's bytes instead of casting void * to a function pointer. */
memcpy(&cosine, &tmp, sizeof cosine);

This avoids violating the aliasing rules by going through the char [] representation, which is allowed to alias all types.
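The memcpy idea can be wrapped in a pair of helpers with a compile-time size check, so that a platform where the two pointer types differ in size fails to build instead of silently truncating. This is only a sketch: unary_fn, fn_from_object_ptr, object_ptr_from_fn and halve are hypothetical names, and the code still assumes (as above) that the two pointer kinds share a representation, which ISO C does not promise.

```c
#include <string.h>

/* Hypothetical name for the function type being looked up. */
typedef double (*unary_fn)(double);

/* Compile-time guard: the array size is -1 (an error) if sizes differ. */
typedef char fn_and_object_ptr_same_size[
    sizeof(unary_fn) == sizeof(void *) ? 1 : -1];

/* Reinterpret the bytes of an object pointer as a function pointer. */
unary_fn fn_from_object_ptr(void *p) {
    unary_fn f;
    memcpy(&f, &p, sizeof f);
    return f;
}

/* The reverse direction, copying the bytes back. */
void *object_ptr_from_fn(unary_fn f) {
    void *p;
    memcpy(&p, &f, sizeof p);
    return p;
}

/* Small function used to exercise the round trip. */
double halve(double x) { return x / 2.0; }
```

With these helpers, the lookup above would read cosine = fn_from_object_ptr(dlsym(handle, "cos")), keeping the representation assumption in one place.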

Yet another approach:

union {
    double (*fptr)(double);
    void *dptr;
} u;
u.dptr = dlsym(handle, "cos");
cosine = u.fptr;

But I would recommend the memcpy approach if you want absolutely 100% correct C.

The only truly portable solution is not to use dlsym for functions, and instead use dlsym to obtain a pointer to data that contains function pointers. For example, in your library:

struct module foo_module = {
    .create = create_func,
    .destroy = destroy_func,
    .write = write_func,
    /* ... */
};

and then in your application:

struct module *foo = dlsym(handle, "foo_module");
/* ... */

Incidentally, this is good design practice anyway, and makes it easy to support both dynamic loading via dlopen and static linking all modules on systems that don't support dynamic linking, or where the user/system integrator does not want to use dynamic linking.
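A fuller sketch of the table-of-function-pointers pattern is shown here. The layout of struct module and the signatures of its members are assumptions, since the original only names create, destroy and write; lookup_module is a hypothetical stand-in for dlsym(handle, "foo_module") so the example runs without a shared library.

```c
#include <stddef.h>
#include <string.h>

/* Assumed interface: the member signatures are invented for this sketch. */
struct module {
    int    (*create)(void);
    void   (*destroy)(int id);
    size_t (*write)(int id, const char *text);
};

static int live_objects = 0;

static int create_func(void) { return ++live_objects; }
static void destroy_func(int id) { (void)id; --live_objects; }
static size_t write_func(int id, const char *text) { (void)id; return strlen(text); }

/* The one data symbol the library exports. */
struct module foo_module = {
    .create = create_func,
    .destroy = destroy_func,
    .write = write_func,
};

/* Stand-in for dlsym(): it returns the address of a data object, so the
   later void * to struct module * conversion is an ordinary
   object-pointer conversion. */
void *lookup_module(void) {
    return &foo_module;
}
```

The application then writes struct module *foo = lookup_module(); and calls foo->create(), foo->write(...), and so on. No function pointer is ever converted to or from void *.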

On most architectures, pointers to all normal data types have the same representation, so casting between data pointer types is a no-op.

However, it's conceivable that function pointers might require a different representation; perhaps they're larger than other pointers. If void* could hold function pointers, void*'s representation would have to be the larger size, and every cast of a data pointer to or from void* would have to perform an extra conversion.

As someone mentioned, if you need this you can achieve it using a union. But most uses of void* are just for data, so it would be onerous to increase all their memory use just in case a function pointer needs to be stored.
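One way the union idea might look is a tagged holder that stores either kind of pointer and only ever reads back the member that was written, so only the code that needs both pays for the larger size. The names any_fn, any_ptr and the wrap/unwrap helpers are hypothetical.

```c
#include <assert.h>

/* Hypothetical generic function pointer type. */
typedef void (*any_fn)(void);

enum ptr_kind { PTR_OBJECT, PTR_FUNCTION };

/* Holds one pointer of either kind plus a tag saying which. */
struct any_ptr {
    enum ptr_kind kind;
    union {
        void  *object;
        any_fn function;
    } as;
};

struct any_ptr wrap_object(void *p) {
    struct any_ptr a;
    a.kind = PTR_OBJECT;
    a.as.object = p;
    return a;
}

struct any_ptr wrap_function(any_fn f) {
    struct any_ptr a;
    a.kind = PTR_FUNCTION;
    a.as.function = f;
    return a;
}

/* Each accessor checks the tag, so the member read is the one written. */
void *unwrap_object(struct any_ptr a) {
    assert(a.kind == PTR_OBJECT);
    return a.as.object;
}

any_fn unwrap_function(struct any_ptr a) {
    assert(a.kind == PTR_FUNCTION);
    return a.as.function;
}

/* Trivial function used to exercise the holder. */
void noop(void) {}
```

The union is at least as large as the larger of the two pointer types, while plain void* elsewhere in the program stays as small as an object pointer needs to be.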

I know that this hasn't been commented on since 2012, but I thought it would be useful to add that I do know an architecture that has very incompatible pointers for data and functions since a call on that architecture checks privilege and carries extra information. No amount of casting will help. It's The Mill.
