Uncreative blog title: Passing multidimensional numpy arrays to C with cffi

I've been playing with using numpy with the cffi library to allow c functions to manipulate numpy arrays directly. Doing this for one dimensional arrays is fairly straightforward, but dealing with multidimensional arrays is a bit trickier. I created a simple C file with various calling options. Given a numpy array x, the x.ctypes.data attribute gives us a pointer to the memory buffer for x. We can cast it to the appropriate type using cffi and pass it directly to the single_test function and manipulate it directly there.

def single_test(x):
    C.single_test(ffi.cast("double *", x.ctypes.data), len(x))

The easiest way to do something similar with multidimensional arrays is just to use array math as in the array_math_test example above. Then the array can be passed down as it is in single_test. If you want to work with a multidimensional array without doing array math, then you need to pass down an array of pointers to your c function. To do this you first need to allocate memory for an array of pointers using the ffi.new function, then fill each index of the resulting array with a pointer to your numpy buffer. One approach would be to break your multidimensional array into multiple numpy arrays and point to each buffer individually:

multi_array = [[1.1, 2.2, 1.3, 4.4, 5.5],
               [5.0, 3.0, 2.0, 1.0, 3],
               [6.0, 1.0, -3.2, -1, 2],
               [0.0, 1.0, 1.0, 2.0, 1]]
x = [numpy.array(v, dtype='float64') for v in multi_array]


def multi_test_a(x):
    ap = ffi.new("double* [%d]" % (len(x)))
    for i in range(len(x)):
        ap[i] = ffi.cast("double *", x[i].ctypes.data)
    C.multi_test(ap, len(x), len(x[0]))

multi_test_a(x)

Perhaps a better approach is to use a regular multidimensional numpy array and calculate the offset from the original buffer pointer for each row.

def multi_test_b(x):
    dsize = ffi.sizeof("double")
    ap = ffi.new("double* [%d]" % (x.shape[0]))
    ptr = ffi.cast("double *", x.ctypes.data)
    for i in range(x.shape[0]):
        ap[i] = ptr + i*x.shape[1]
    C.multi_test(ap, x.shape[0], x.shape[1])

Here is the full python script for this example: Output:

Before single [  1.   2.   3.   4.   5.   6.   7.   8.   9.  10.]
After single [  1.   3.   5.   7.   9.  11.  13.  15.  17.  19.]
Before multi_a [array([ 1.1,  2.2,  1.3,  4.4,  5.5]), array([ 5.,  3.,  2.,  1.,  3.]), array([ 6. ,  1. , -3.2, -1. ,  2. ]), array([ 0.,  1.,  1.,  2.,  1.])]
After multi_a [array([ 1.1,  3.2,  3.3,  7.4,  9.5]), array([ 15.,  14.,  14.,  14.,  17.]), array([ 26. ,  22. ,  18.8,  22. ,  26. ]), array([ 30.,  32.,  33.,  35.,  35.])]
Before multi_b [[ 1.1  2.2  1.3  4.4  5.5]
 [ 5.   3.   2.   1.   3. ]
 [ 6.   1.  -3.2 -1.   2. ]
 [ 0.   1.   1.   2.   1. ]] (4, 5)
After multi_b [[  1.1   3.2   3.3   7.4   9.5]
 [ 15.   14.   14.   14.   17. ]
 [ 26.   22.   18.8  22.   26. ]
 [ 30.   32.   33.   35.   35. ]]

4 comments:

UnknownMay 22, 2013 at 12:11 PM
Hi,
is it OK for you if I add your numpy examples to my repository where I am trying to figure out how to use cffi?
I will of course enclose the link to this blog post. See how it looks at short url: goo.gl/4hWba
long url:
"https://github.com/oplatek/cffi-python-playground/blob/master/test-jones-np-multi/numpy_cffi_arrays.py"

If you do not want it there let me know!
AndrewMay 22, 2013 at 1:53 PM
Go ahead! I'm glad somebody is finding it useful.
mattipNovember 25, 2015 at 2:27 PM
IMO you have made this too complicated.
There is no need for pointer-to-pointer and you need to check that x is in c-contiguous order

I would do something like (not tested)

multi_test(double *x, int n, int m)
{
int i, j;
for (i = 0; i < n; i++)
for (j = 0; j < m; j++)
x[i + m*j] += 10*i + j;
}

then you could define

def multi_test_a(x):
assert len(x.shape) == 2
assert x.flags.c_continuous
C.multi_test(ffi.cast("double *", x.ctypes.data), x.shape[0], x.shape[1])
mattipNovember 25, 2015 at 2:33 PM
Also note that while dlopen works for cases where you are given a shared object, in this case you would get better performance if you use ffi.set_source() and build a cffi module instead, especially on PyPy. See the cffi documentation for more info
http://cffi.readthedocs.org/en/latest/overview.html

Tuesday, May 7, 2013

Passing multidimensional numpy arrays to C with cffi

4 comments: