Python Bindings With Ctypes

There are plenty of ways to write Python bindings for C/C++ code. The classic, no-frills approach is the ctypes module from the standard library, which provides a convenient way to load dynamic libraries in Python and call functions from them. Other common options are pybind11 and Cython, which are more powerful but also more complex: they bring extra dependencies and require a build step, which is not always desirable. Bindings created with those tools usually have to target specific Python versions, whereas ctypes allows for more portability.

The best choice depends on the use case. If the library was written with Python bindings in mind (i.e., no unsupported, cutting-edge C++ features), go for pybind11 or Cython. If you don’t have a say in the source code of the library you want to bind to or, even worse, don’t have access to the source code at all, ctypes is a great option. It’s easy to use and comes with no extra dependencies. For a small to medium-sized project, it’s probably all you need.

FNV Hash

I’ve written a small library that implements the FNV-1a hash function in C++. It’s a simple, non-cryptographic hash that’s easy to implement and fast to run. I’m going to demonstrate how to bind to it using ctypes, but it’s not important to understand how the function itself works. If you’re curious, here’s the pseudocode:

hash = offset_basis
for each octet_of_data to be hashed
    hash = hash xor octet_of_data
    hash = hash * FNV_prime
return hash
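
As a rough illustration, the pseudocode translates to Python almost line for line. The sketch below assumes the standard 32-bit FNV parameters (offset basis 0x811C9DC5, prime 0x01000193), which aren’t spelled out above. The name fnv1a_32 is made up for this example and is not part of the library.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5  # standard 32-bit FNV offset basis (assumed here)
FNV_PRIME_32 = 0x01000193         # standard 32-bit FNV prime (assumed here)


def fnv1a_32(data: bytes) -> int:
    """Return the 32-bit FNV-1a hash of data."""
    h = FNV_OFFSET_BASIS_32
    for octet in data:
        h ^= octet
        # Python integers are unbounded, so wrap to 32 bits by hand.
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF
    return h


print(hex(fnv1a_32(b"")))   # 0x811c9dc5 (empty input leaves the offset basis untouched)
print(hex(fnv1a_32(b"a")))  # 0xe40c292c
```

The explicit mask is the only real difference from the pseudocode: in C++ the wraparound comes for free from the fixed-width unsigned type.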

Implementation

The library computes the 32-bit version of the hash. A Fnv1a32 object may be updated as many times as needed, and the hash can be retrieved at any point using the digest method. The default update function works on an array of bytes, but it’s easy to add a templated version that hashes the in-memory bytes of any type of data.

class Fnv1a32 {
public:
    auto update(const std::uint8_t* data, std::size_t size) -> void;

    template<class T>
    auto update(const T& data) -> void {
        update(reinterpret_cast<const std::uint8_t*>(&data), sizeof(data));
    }

    auto digest() const -> std::uint32_t;

private:
    std::uint32_t _hash{OFFSET_BIAS_32};
};

As for the implementation, it’s pretty simple.

auto Fnv1a32::update(const std::uint8_t* data, std::size_t size) -> void {
    // Use std::size_t for the index so it cannot overflow before reaching size.
    for (std::size_t idx = 0; idx < size; ++idx) {
        _hash ^= data[idx];
        _hash *= FNV_PRIME_32;
    }
}

auto Fnv1a32::digest() const -> std::uint32_t {
    return _hash;
}
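
To make the update/digest behaviour concrete, here’s a pure-Python stand-in for the class. It’s only a sketch: the name Fnv1a32Ref is hypothetical, and the constants are the standard 32-bit FNV parameters, which I’m assuming match OFFSET_BIAS_32 and FNV_PRIME_32. It shows that feeding the data in several chunks gives the same digest as feeding it all at once.

```python
class Fnv1a32Ref:
    """Pure-Python stand-in for the C++ class (hypothetical name)."""

    OFFSET_BASIS = 0x811C9DC5  # assumed standard 32-bit offset basis
    PRIME = 0x01000193         # assumed standard 32-bit prime

    def __init__(self) -> None:
        self._hash = self.OFFSET_BASIS

    def update(self, data: bytes) -> None:
        for octet in data:
            self._hash ^= octet
            # Emulate the wraparound of std::uint32_t.
            self._hash = (self._hash * self.PRIME) & 0xFFFFFFFF

    def digest(self) -> int:
        return self._hash


one_shot = Fnv1a32Ref()
one_shot.update(b"abcd")

chunked = Fnv1a32Ref()
chunked.update(b"ab")
chunked.update(b"cd")

# The state depends only on the byte sequence, not on how it was split.
print(one_shot.digest() == chunked.digest())  # True
```

Since digest only reads the state, it can be called between updates without disturbing the running hash.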

Basic usage

Here’s the complete fnv.h and fnv.cpp code. A basic usage example in C++ would look something like this:

fnv::Fnv1a32 hash;
std::uint8_t x[] = "\x12\x34\x56\x78";
hash.update(x, 4);
std::cout << hash.digest();

Bindings

How ctypes works

The best thing about ctypes is that it works out of the box. There’s no need to install anything. Just import ctypes and you’re good to go.
The first step is loading the library. This is done by passing the library path to the ctypes.CDLL constructor. For example, libc = ctypes.CDLL("libc.so.6") returns a handle to the C standard library on Linux. Note that the handle gives access only to the functions exported by the library, not to every function inside it.
Once the library is loaded, you can access its functions much like accessing functions from a Python module. The most important aspect to consider is data marshalling. C and Python have different data types, so ctypes needs to know how to convert between them. You should explicitly specify the argument types (argtypes) and the return type (restype) of each function; otherwise ctypes assumes the function returns a C int and has to guess how to convert the arguments. For example, strchr from the C standard library has the following signature:

char *strchr(const char *s, int c);

It returns a pointer to the first occurrence of the character c in the string s. In order for ctypes to know how to use the function, you need to specify the argument types and the return type like this:

import ctypes

libc = ctypes.CDLL("libc.so.6")
strchr = libc.strchr
strchr.restype = ctypes.c_char_p
strchr.argtypes = [ctypes.c_char_p, ctypes.c_int]

After that, you may use the function like this:

strchr(b"hello", ord("l"))  # the second argument is declared as c_int, so pass an integer
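
For reference, here’s a self-contained version of the strchr walkthrough that can be run as is. It assumes a POSIX system and locates the C library with ctypes.util.find_library instead of hard-coding "libc.so.6". Because the second parameter is declared as c_int, it expects an integer such as ord("l"); a bytes object is rejected.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. find_library may return None in minimal
# environments, in which case CDLL(None) exposes the symbols already loaded
# into the process, libc included.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

strchr = libc.strchr
strchr.restype = ctypes.c_char_p
strchr.argtypes = [ctypes.c_char_p, ctypes.c_int]

found = strchr(b"hello", ord("l"))    # second argument is an int, not bytes
missing = strchr(b"hello", ord("z"))  # a NULL result comes back as None

print(found)    # b'llo'
print(missing)  # None

try:
    strchr(b"hello", b"l")  # bytes cannot be converted to c_int
    rejected = False
except (ctypes.ArgumentError, TypeError):
    rejected = True
print(rejected)  # True
```

Getting an exception instead of a crash is the practical payoff of declaring argtypes: the mismatch is caught in Python before anything reaches the C function.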

A list of fundamental data types and their corresponding C and Python types can be found here.
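
A few quick experiments show how these types behave on their own, before any library is involved. This is just a sketch using types from the ctypes module; the wrapping and rounding it demonstrates become relevant later, when Python values are handed to the uint32, uint64 and float wrappers.

```python
import ctypes

# Fixed-width types have the size their name promises.
print(ctypes.sizeof(ctypes.c_uint32))  # 4
print(ctypes.sizeof(ctypes.c_uint64))  # 8

# Integer types perform no overflow checking: the value simply wraps.
print(ctypes.c_uint32(2**32 + 5).value)  # 5
print(ctypes.c_uint32(-1).value)         # 4294967295

# c_float is single precision, so a Python float (a C double) gets rounded.
print(ctypes.c_float(0.1).value == 0.1)  # False

# c_char_p wraps a NUL-terminated byte string.
print(ctypes.c_char_p(b"abc").value)  # b'abc'
```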

Creating the library

C wrapper

Although it would be really cool to export the Fnv1a32 class directly to Python, that’s not possible with ctypes. We can only export and bind to C-style functions. The workaround is to create a function that instantiates the class, plus functions that explicitly call its methods. We’ll instantiate the object on the heap and return a pointer to it, which is going to be used as a handle in Python. This is easy for ctypes to interpret, as a pointer is just an integer representing a memory address. The same goes for all the parameters and return types: they’re either fundamental types or pointers to a more sophisticated type.

auto new_fnv() -> fnv::Fnv1a32* {
    return new fnv::Fnv1a32();
}

One very important aspect is memory management. Memory allocated in C must be freed in C, as Python’s garbage collector won’t manage this memory.

auto delete_fnv(fnv::Fnv1a32* fnv) -> void {
    delete fnv;
}
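
On the Python side, a handle like this is nothing more than an address that has to make a round trip intact. The sketch below uses malloc and free from the C standard library as stand-ins for new_fnv and delete_fnv (my assumption, so that it runs without building anything, on a POSIX system). The point is the declarations: if the pointer isn’t declared as c_void_p in both directions, ctypes treats it as a C int, which can truncate a 64-bit address.

```python
import ctypes
import ctypes.util

# Assumption: POSIX; malloc/free stand in for new_fnv/delete_fnv.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

libc.malloc.restype = ctypes.c_void_p    # without this, the pointer is read as a C int
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None                 # the function returns void
libc.free.argtypes = [ctypes.c_void_p]   # pass the full-width address back

handle = libc.malloc(16)                 # "constructor": returns an opaque address
address_is_int = isinstance(handle, int)
print(address_is_int)                    # True: c_void_p results arrive as plain Python ints

libc.free(handle)                        # "destructor": must be called exactly once
handle = None                            # drop the stale address so it cannot be reused
```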

Using the pointer, we can call the update and digest methods on the object it points to. That looks pretty much like trying to mimic object-oriented programming in C. Note how its usage resembles the this pointer from C++.

void fnv_update_bytes(fnv::Fnv1a32* fnv, const std::uint8_t* data, std::size_t size) {
    fnv->update(data, size);
}

void fnv_update_uint32(fnv::Fnv1a32* fnv, std::uint32_t data) {
    fnv->update(data);
}

void fnv_update_uint64(fnv::Fnv1a32* fnv, std::uint64_t data) {
    fnv->update(data);
}

void fnv_update_float(fnv::Fnv1a32* fnv, float data) {
    fnv->update(data);
}

auto fnv_digest(fnv::Fnv1a32* fnv) -> std::uint32_t {
    return fnv->digest();
}

Exporting functions

Since we’re using C++, exporting functions is not as straightforward as it is in C. C++ supports function overloading, meaning you can have multiple functions with the same name but different parameters. This is not possible in C, and more so, it would be very confusing for ctypes. Suppose we have two functions called fnv_update, which take different parameters. How would ctypes know which one to import from the library?
Internally, C++ compilers use a technique called name mangling, which encodes information about a function’s namespace and parameter types into its symbol name. This makes it impractical to export a function under a specific name, because the mangled name depends on the compiler and is not known until the code gets compiled.
Therefore, we have to tell the compiler to use C linkage for the functions we want to export, effectively disabling name mangling and using a consistent calling convention. This is done by wrapping the function declarations in an extern "C" block.

extern "C" {
    EXPORT auto new_fnv() -> fnv::Fnv1a32*;
    EXPORT void delete_fnv(fnv::Fnv1a32* fnv);
    EXPORT void fnv_update_bytes(fnv::Fnv1a32* fnv, const std::uint8_t* data, std::size_t size);
    EXPORT void fnv_update_uint32(fnv::Fnv1a32* fnv, std::uint32_t data);
    EXPORT void fnv_update_uint64(fnv::Fnv1a32* fnv, std::uint64_t data);
    EXPORT void fnv_update_float(fnv::Fnv1a32* fnv, float data);
    EXPORT auto fnv_digest(fnv::Fnv1a32* fnv) -> std::uint32_t;
}

For example, without the extern "C" block, the fnv_digest exported function would look like this (on my Linux machine, using clang):

_Z10fnv_digestPN3fnv7Fnv1a32E

Wrapping the declaration in an extern "C" block causes it to be exported in a more familiar way:

fnv_digest
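
The reason the exported name matters so much is that ctypes resolves functions by exact symbol name at the moment you access the attribute. Here’s a small sketch of that lookup, with libc standing in for libfnv (an assumption, so that it runs without the compiled library, on a POSIX system).

```python
import ctypes
import ctypes.util

# Assumption: POSIX; libc stands in for libfnv here.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Attribute access is a symbol lookup by exact name.
has_strlen = hasattr(libc, "strlen")
print(has_strlen)  # True: exported under this plain C name

# A name the library does not export raises AttributeError, which is what
# lib.fnv_digest would do if only the mangled symbol existed.
has_fnv_digest = hasattr(libc, "fnv_digest")
print(has_fnv_digest)  # False
```

Technically, getattr(lib, "_Z10fnv_digestPN3fnv7Fnv1a32E") would still find the mangled symbol, but the binding would break as soon as the signature, namespace or compiler changed.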

Note the EXPORT macro preceding each function declaration. I had to use it because, on Windows, exported functions must be decorated with __declspec(dllexport). On Linux, this is not necessary, so the macro expands to nothing.

#if defined(_WIN32) || defined(_WIN64)
#define EXPORT __declspec(dllexport)
#else
#define EXPORT
#endif

Building

There are two .cpp files that need to be compiled: fnv.cpp, which contains the implementation of the Fnv1a32 class, and bindings.cpp, which contains the wrapper functions. The latter depends on the former, so we need to link them together. For convenience, I have created a CMakeLists.txt file. Note the SHARED parameter passed to add_library. This tells CMake to create a shared library, much like the -shared flag passed to g++.

cmake_minimum_required(VERSION 3.24)

project(fnv VERSION 1.0)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_BUILD_TYPE "Release")

if (MSVC)
    add_compile_options(/Wall /W4 /Ox)
else()
    add_compile_options(-Wall -Werror -O3)
endif()

add_library(fnv SHARED fnv.cpp bindings.cpp)

Building is pretty straightforward.

mkdir -p build 
cd build
cmake ..
cmake --build . --config Release

Depending on your platform, the result is going to be either an fnv.dll or a libfnv.so file. The whole code directory is available here.

Viewing exported functions

What if you don’t have access to the source code of the library you want to bind to?
There are various tools that can be used to inspect the exported functions of a shared library. On Linux, you can use the nm -D --defined-only libfnv.so command, while on Windows you can use the dumpbin utility that comes with Visual Studio. For example, on Windows, the output of dumpbin /EXPORTS fnv.dll looks like this:

Microsoft (R) COFF/PE Dumper Version 14.32.31332.0
Copyright (C) Microsoft Corporation. All rights reserved.


Dump of file fnv.dll

File Type: DLL

Section contains the following exports for fnv.dll

00000000 characteristics
FFFFFFFF time date stamp
0.00 version
1 ordinal base
7 number of functions
7 number of names

ordinal hint RVA name

1 0 00001070 delete_fnv
2 1 00001120 fnv_digest
3 2 00001080 fnv_update_bytes
4 3 000010F0 fnv_update_float
5 4 00001090 fnv_update_uint32
6 5 000010C0 fnv_update_uint64
7 6 00001040 new_fnv

Summary

1000 .data
1000 .pdata
2000 .rdata
1000 .reloc
1000 .rsrc
1000 .text

As for the function parameters and return types, if you don’t have access to the source code, you may have to resort to reverse engineering. There are various tools that can help you with that, but I’m not going to get into it in this article.

Creating the Python module

Once you have the shared library, creating the Python module is relatively easy. You just need to load the library using ctypes.CDLL and specify the argument types and return types of the functions you want to use, sort of like “declaring” them in Python.

import ctypes
import os

class Fnv:
    lib = ctypes.CDLL(os.path.abspath('fnv.dll' if os.name == 'nt' else 'libfnv.so'))

    # auto new_fnv() -> fnv::Fnv1a32*
    lib.new_fnv.restype = ctypes.c_void_p
    lib.new_fnv.argtypes = []

The process can get repetitive, so you might as well let GitHub Copilot do some of the work for you.
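
Another way to cut down on the repetition is a tiny helper that attaches a prototype in one line. This is only a sketch: declare is a hypothetical name, not something ctypes provides, and libc functions stand in for the fnv_* exports so the example runs on a POSIX system without the compiled library.

```python
import ctypes
import ctypes.util


def declare(lib, name, restype, *argtypes):
    """Attach a prototype to lib.<name> and return the configured function."""
    func = getattr(lib, name)
    func.restype = restype
    func.argtypes = list(argtypes)
    return func


# Assumption: POSIX; libc functions stand in for the fnv_* exports.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# One line per function, mirroring the C declarations.
strlen = declare(libc, "strlen", ctypes.c_size_t, ctypes.c_char_p)  # size_t strlen(const char *)
c_abs = declare(libc, "abs", ctypes.c_int, ctypes.c_int)            # int abs(int)

print(strlen(b"hello"))  # 5
print(c_abs(-7))         # 7
```

For the FNV library, the same helper would be called along the lines of declare(lib, "fnv_digest", ctypes.c_uint32, ctypes.c_void_p), following the exported declaration.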

As already noted, you have to take care of garbage collection yourself.

    def __init__(self):
        self._fnv = Fnv.lib.new_fnv()

    def __del__(self):
        Fnv.lib.delete_fnv(self._fnv)
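
Relying on __del__ alone leaves the timing of the cleanup up to the interpreter. If you want something more deterministic, one option is an explicit close method plus context-manager support. The sketch below is not how fnv.py does it: NativeHandle is a hypothetical class, and malloc/free stand in for new_fnv/delete_fnv so it runs on a POSIX system without the library.

```python
import ctypes
import ctypes.util

# Assumption: POSIX; malloc/free stand in for new_fnv/delete_fnv.
_libc = ctypes.CDLL(ctypes.util.find_library("c"))
_libc.malloc.restype = ctypes.c_void_p
_libc.malloc.argtypes = [ctypes.c_size_t]
_libc.free.restype = None
_libc.free.argtypes = [ctypes.c_void_p]


class NativeHandle:
    """Owns one native allocation and releases it exactly once (hypothetical helper)."""

    def __init__(self) -> None:
        self._ptr = None               # defined first, so close() is safe even if malloc fails
        self._ptr = _libc.malloc(16)
        if self._ptr is None:          # a NULL c_void_p comes back as None
            raise MemoryError("native allocation failed")

    def close(self) -> None:
        if self._ptr is not None:
            _libc.free(self._ptr)
            self._ptr = None           # makes a second close() a no-op

    def __enter__(self) -> "NativeHandle":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()

    def __del__(self) -> None:
        self.close()                   # fallback for callers that skip the with block


with NativeHandle() as handle:
    print(handle._ptr is not None)  # True while the block is active
print(handle._ptr)                  # None: released on exit
```

Resetting the pointer after freeing it is what prevents a double free when both the with block and __del__ end up running.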

Also, make sure to call the correct function depending on the type of the argument.

    def update(self, data: bytes | int | float) -> None:
        if type(data) is bytes:
            Fnv.lib.fnv_update_bytes(self._fnv, ctypes.c_char_p(data), ctypes.c_size_t(len(data)))
        elif type(data) is int:
            if data.bit_length() <= 32:
                Fnv.lib.fnv_update_uint32(self._fnv, ctypes.c_uint32(data))
            else:
                Fnv.lib.fnv_update_uint64(self._fnv, ctypes.c_uint64(data))
        elif type(data) is float:
            Fnv.lib.fnv_update_float(self._fnv, ctypes.c_float(data))
        else:
            raise TypeError('data must be bytes, int or float')
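
Why the dispatch matters becomes clearer when you look at the bytes each branch ends up hashing. The templated update on the C++ side hashes the in-memory representation of its argument, so the same numeric value produces different input depending on the type it’s sent as. Here’s a sketch that mirrors the branching above without calling the library; native_bytes is a hypothetical helper.

```python
import ctypes


def native_bytes(data) -> bytes:
    """Return the bytes the C++ side would hash for data (hypothetical helper)."""
    if type(data) is bytes:
        return data
    if type(data) is int:
        ctype = ctypes.c_uint32 if data.bit_length() <= 32 else ctypes.c_uint64
        return bytes(ctype(data))           # in-memory representation, native byte order
    if type(data) is float:
        return bytes(ctypes.c_float(data))  # narrowed to single precision first
    raise TypeError('data must be bytes, int or float')


print(len(native_bytes(1)))      # 4
print(len(native_bytes(2**32)))  # 8
print(len(native_bytes(1.0)))    # 4
print(native_bytes(1) == native_bytes(1.0))          # False: same value, different bit patterns
print(native_bytes(-1) == native_bytes(0xFFFFFFFF))  # True: c_uint32 wraps negatives
```

Two consequences are worth keeping in mind: digests of integers and floats depend on the machine’s byte order, and a negative integer hashes the same as its unsigned wraparound.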

Here’s the complete fnv.py code, along with a couple of small unit tests, so you can check if everything works as expected.

Usage

The newly created Python module can be imported and used like any other module. Just make sure the shared library is in the current working directory: the module builds the library path with os.path.abspath, which resolves relative to the directory you run Python from rather than to the module’s own location. In practice, you may also want to add the library’s directory to the LD_LIBRARY_PATH environment variable on Linux, or to the PATH environment variable on Windows, but that’s not necessary for this example.

from fnv import Fnv

fnv = Fnv()
fnv.update(b'abcd')
print(fnv.digest())
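
If you’d rather not depend on where the script is launched from, the library path can be built from an explicit directory instead. Here’s a sketch with two hypothetical helpers, library_filename and library_path. The .dylib branch for macOS is my assumption, since only Windows and Linux are covered above.

```python
import sys
from pathlib import Path


def library_filename(base: str, platform: str = sys.platform) -> str:
    """Map a CMake target name to the file name a SHARED library usually gets."""
    if platform.startswith("win"):
        return f"{base}.dll"
    if platform == "darwin":
        return f"lib{base}.dylib"  # assumption: macOS is not covered in the text above
    return f"lib{base}.so"


def library_path(base: str, directory: Path) -> Path:
    """Build an absolute path to the library inside directory."""
    return (directory / library_filename(base)).resolve()


print(library_filename("fnv", "linux"))  # libfnv.so
print(library_filename("fnv", "win32"))  # fnv.dll
```

Inside fnv.py, the directory would typically be Path(__file__).parent, and the result would be passed to ctypes.CDLL as str(library_path("fnv", Path(__file__).parent)).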

Perhaps now it’s a bit more obvious why ctypes can be a great way to create Python bindings for C/C++ code. It may not fit all use cases out there, but it works for most of them. I personally believe its simplicity is also its greatest strength.