Lessons when creating a C API from Rust

I have recently created a C API for a library dealing with Boot Loader Spec files as well as the GRUB environment file. In the process I have learnt a few things that, coming from a C background, were not obvious to me at all.

Box to control unsafe ownership

Say we have this simple Rust API:

pub struct A {
  counter: u8
}

impl A {
  pub fn new(count: u8) -> A {
    A { counter: count }
  }
}

Let’s start with the new method wrapper:

#[no_mangle]
pub extern "C" fn a_new(count: u8) -> *mut A {
  let boxed_a = Box::new(A {counter: count});
  Box::into_raw(boxed_a)
}

A Box is basically a smart pointer, it allows us to control the lifetime of the data outside of the boundaries of Rust’s borrow checker. Box::into_raw returns a pointer to the allocated A instance. Let’s see how to access that data again:

#[no_mangle]
pub extern "C" fn a_get_counter(a: *mut A) -> u8 {
  let a = unsafe { Box::from_raw(a) };
  let count = a.counter;
  Box::into_raw(a);
  count
}

Box::from_raw is an unsafe method that turns a pointer into an owned Box, this allows us to access the pointer data safely from Rust. Note that Box is automatically dereferenced.

UPDATE: Sebastian Dröge has rightly pointed out that the above method is wrong, note that this is how I found most StackOverflow and other people explain how I should use the data again, but if Sebastian says it is wrong then I know it to be wrong ;-).

Turns out that casting the pointer as a reference inside an unsafe block is enough:

#[no_mangle]
pub unsafe extern "C" fn a_get_counter(a: *mut A) -> u8 {
  let a = &*a;
  a.counter
}

 

Now we need to give the C user a deallocator for instances of A, this is relatively straightforward, we wrap the object around a Box and since we don’t call into_raw again, as soon as the Box is out of scope the inner contents are dropped too:

#[no_mangle]
pub unsafe extern "C" fn a_drop(a: *mut A) {
  Box::from_raw(a);
}

Strings

In Rust there are two standard ways to interact with strings, the String type, a dynamic utf-8 string that can be modified and resized, and &str, which basically is a bare pointer to an existing String. It took me a while to realize that internally a String is not null terminated and can contain many null characters. This means that the internal represenation of String is not compatible with C strings.

To address this, Rust provides another two types and referenced counterparts:

  • OsString and &OsStr: a native string tied to the runtime platform encoding and sizing
  • CString and &CStr: a null terminated string

My main grudge with this model when creating a C API is that it creates friction with the C boundary for a couple of reasons:

  • Internal Rust APIs often expect String or &str, meaning that at the C API boundary you need to allocate a CString if you use String as the internal representation, or the other way around if you use CString as the internal representation
  • You can’t “transparently” pass ownership of a CString to C without a exposing a deallocator specific to CString, more on this on the next section.

This means that compared to a C implementation of the API your code will liekly use more allocations which might or might not be critical depending on the use case, but this is something that struck me as a drawback for Rustification.

UPDATE: I am talking solely about the C API boundary, Rust is _great_ at giving you tools to avoid extra allocations (yay slices!), you can create a parser of a large chunk of text without allocating any extra strings to chunk the source text around.

Allocator mismatch

Something else I stumbled upon was that Rust does not use malloc/free, and that mismatch has annoying side effects when you are trying to rustify an API. Say you have this C code:

char* get_name() {
  const char* STATIC_NAME = "John";
  char* name = (char*)malloc(sizeof(STATIC_NAME));
  memcpy(name, STATIC_NAME, sizeof(STATIC_NAME));
}

int main () {
  char * name = get_name();
  printf("%s\n", name);
  free(name);
  return 0;
}

Now if you want to Rustify that C function, the naive way (taking into account the String vs. CString stuff I mentioned before) would be to do this:

#[no_mangle]
pub extern "C" fn get_name() -> *mut std::os::raw::c_char {
  const STATIC_NAME: &str = "John";
  let name = std::ffi::CString::new(STATIC_NAME)
               .expect("Multiple null characters in string");
  name.into_raw()
}

But this is not exactly the same as before, note that in the C example we call free() in order to drop the memory. In this case we would have to create a new method that calls CString::from_raw() but that won’t be compatible with the original C API.

This is the best I was able to came up with:

/* You can use the libc crate as well */
extern {
  fn malloc(size: usize) -> *mut u8;
  fn memcpy(dest: *mut u8, src: *const u8, size: usize) -> *mut u8;
}

#[no_mangle]
pub extern "C" fn get_name() -> *mut u8 {
  const STATIC_NAME: &str = "John";
  let name = std::ffi::CString::new(STATIC_NAME)
               .expect("Multiple null characters in string");
  let length = name.as_bytes_with_nul().len();
  let cname = unsafe { malloc(length) };
  unsafe { memcpy(cname, name.as_bytes_with_nul().as_ptr(), length) };
  cname
}

Note that STATIC_NAME is just an example, usually the data comes from a String/&str in your Rust API. The problem here is that we allocated an extra CString to then copy its contents using malloc/memcpy and then drop it immediately.

However, later, while working on creating UEFI binaries from Rust, I learned that Rust allows you to override its own allocator and use a custom one or the native system one. This would be another way to achieve the same and save the malloc/memcpy step, but don’t trust me 100% here as I am not sure whether this is entirely safe (if you know, let me know in the comments).

UPDATE: Many people have pointed out that overriding the allocator to use the system is absolutely fine:

use std::alloc::System;

#[global_allocator]
static GLOBAL: System = System;

#[no_mangle]
pub extern "C" fn get_name() -> *mut u8 {
  const STATIC_NAME: &str = "John";
  let name = std::ffi::CString::new(STATIC_NAME).expect("Multiple null characters in string");
  name.into_raw() as *mut u8
}

Traits as fat pointers

Let’s say we have the following API with two types and a trait implemented by both:

pub struct A {}
pub struct B {}

impl A {
  pub fn new () -> A { A{} }
}

impl B {
  pub fn new () -> B { B{} }
}

pub trait T {
  fn get_name(&self) -> std::ffi::CString;
}

impl T for A {
  fn get_name(&self) -> std::ffi::CString {
    std::ffi::CString::new("I am A").expect("CString error")
  }
}

impl T for B {
  fn get_name(&self) -> std::ffi::CString {
    std::ffi::CString::new("I am B").expect("CString error")
  }
}

Now the problem is, if we want a single wrapper for T::get_name() to avoid having to wrap each trait implementation family of functions, what do we do? I banged my head on this trying to Box a reference to a trait and other things until I read about this in more detail. Basically, the internal representation of a trait is a fat pointer (or rather, a struct of two pointers, one to the data and another to the trait vtable).

So we can transmute a reference to a trait as a C struct of two pointers, the end result for type A would be like this (for B you just need another constructor and cast function):

#[repr(C)]
pub struct CTrait {
  data: *mut std::os::raw::c_void,
  vtable: *mut std::os::raw::c_void
}

#[no_mangle]
pub extern "C" fn a_new() -> *mut A {
  Box::into_raw(Box::new(A::new()))
}

#[no_mangle]
pub extern "C" fn a_drop(a: *mut A) {
  unsafe{ Box::from_raw(a) };
}

#[no_mangle]
pub extern "C" fn a_as_t (a: *mut A) -> CTrait {
  let mut boxed_a = unsafe { Box::from_raw(a) };
  let ret: CTrait = {
    let t: &mut dyn T = &mut *boxed_a;
    unsafe { std::mem::transmute::<&mut dyn T,CTrait> (t) }
  };
  Box::into_raw(boxed_a);
  ret
}

#[no_mangle]
pub extern "C" fn t_get_name(t: CTrait) -> *mut u8 {
  let t = unsafe { std::mem::transmute::<CTrait, &mut dyn T> (t) };
  t.get_name().into_raw() as *mut u8
}

The C code to consume this API would look like this:

typedef struct {} A;
typedef struct {
  void* _d;
  void* _v;
} CTrait;

A*      a_new();
void    a_drop(A* a);
CTrait  a_as_t(A* a);
char*   t_get_name(CTrait);

int main () {
  A* a = a_new();
  CTrait t = a_as_t(a);
  char* name = t_get_name(t);
  printf("%s\n", name);
  free(name);
  a_drop(a);
  return 0;
}

Error reporting

Another hurdle has been dealing with Result<> in general, however this is more of a shortcoming of C’s lack of standard error reporting mechanism. In general I tend to return NULL to C API calls that expect a pointer and let C handle it, but of course data is lost in the way as the C end has no way to know what exactly went wrong as there is no error type to query. I am tempted to mimick GLib’s error handling. I think that if I was trying to replace an existing C library with its own error reporting mapping things would become easier.

Conclusions

I am in love with Rust and its ability to impersonate C is very powerful, however it is note entirely 0 cost, for me, the mismatch between string formats is the biggest hurdle as it imposes extra allocations, something that could become really expensive when rustifying C code that passes strings back and forth from/to the API caller. The other things I mentioned were things that took me quite some time to realize and by writing it here I hope I help other people that are writing Rust code to expose it as a C API. Any feedback on my examples is welcome.

5 thoughts on “Lessons when creating a C API from Rust

  1. > This would be another way to achieve the same and save the malloc/memcpy step, but don’t trust me 100% here as I am not sure whether this is entirely safe (if you know, let me know in the comments):

    Indeed this works: by setting the global allocator to System, you can rely on it being malloc/free on Linux.

    Like

  2. Totally unrelevant to your otherwise acute post, but:

    “`
    const char* STATIC_NAME = “John”;
    char* name = (char*)malloc(sizeof(STATIC_NAME));
    “`

    That’s not how you use `sizeof()`. Either use `strlen()` here, or:

    “`
    static const char STATIC_NAME[] = “John”;
    char* name = malloc(sizeof(STATIC_NAME));
    “`

    (Also, it’s been a while since I last used pure C, but unless the standard has evolved, you don’t need an explicit cast when casting from `void *`)

    Like

Leave a comment