-TODO +This page describes the detailed semantics underlying the FFI library +and its interaction with both Lua and C code. +
++Given that the FFI library is designed to interface with C code +and that declarations can be written in plain C syntax, it +closely follows the C language semantics wherever possible. Some +concessions are needed for smoother interoperation with Lua language +semantics. Still, developers with a C or C++ background should +find it straightforward to write applications using the LuaJIT FFI.
C Language Support
-TODO +The FFI library has a built-in C parser with a minimal memory +footprint. It's used by the ffi.* library +functions to declare C types or external symbols.
++Its only purpose is to parse C declarations, as found e.g. in +C header files. Although it does evaluate constant expressions, +it's not a C compiler. The body of inline +C function definitions is simply ignored. +
++Also, this is not a validating C parser. It expects and +accepts correctly formed C declarations, but it may choose to +ignore bad declarations or show rather generic error messages. If in +doubt, please check the input against your favorite C compiler. +
++The C parser complies with the C99 language standard plus +the following extensions: +
+-
+
+
- C++-style comments (//). + +
- The '\e' escape in character and string literals. + +
- The long long 64 bit integer type. + +
- The C99/C++ boolean type, declared with the keywords bool +or _Bool. + +
- Complex numbers, declared with the keywords complex or +_Complex. + +
- Two complex number types: complex (aka +complex double) and complex float. + +
- Vector types, declared with the GCC mode or +vector_size attribute. + +
- Unnamed ('transparent') struct/union fields +inside a struct/union. + +
- Incomplete enum declarations, handled like incomplete +struct declarations. + +
- Unnamed enum fields inside a +struct/union. This is similar to a scoped C++ +enum, except that declared constants are visible in the +global namespace, too. + +
- C++-style scoped static const declarations inside a +struct/union. + +
- Zero-length arrays ([0]), empty +struct/union, variable-length arrays (VLA, +[?]) and variable-length structs (VLS, with a trailing +VLA). + +
- Alternate GCC keywords with '__', e.g. +__const__. + +
- GCC __attribute__ with the following attributes: +aligned, packed, mode, +vector_size, cdecl, fastcall, +stdcall. + +
- The GCC __extension__ keyword and the GCC +__alignof__ operator. + +
- GCC __asm__("symname") symbol name redirection for +function declarations. + +
- MSVC keywords for fixed-length types: __int8, +__int16, __int32 and __int64. + +
- MSVC __cdecl, __fastcall, __stdcall, +__ptr32, __ptr64, __declspec(align(n)) +and #pragma pack. + +
- All other GCC/MSVC-specific attributes are ignored. + +
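A few of the extensions listed above can be exercised in one short declaration block. This is a minimal sketch, not taken from the original page: all `demo_*` type names are hypothetical and exist only for this illustration, and it assumes a LuaJIT build with the FFI library enabled.

```lua
local ffi = require("ffi")

-- All demo_* names are hypothetical and exist only for this illustration.
ffi.cdef[[
// C++-style comments are accepted inside declarations.
typedef struct {
  uint8_t tag;
  union {            // unnamed (transparent) union field
    int32_t i;
    float   f;
  };
} demo_value_t;

typedef struct {
  uint32_t len;
  uint8_t  data[?];  // trailing VLA makes this a variable-length struct
} demo_buf_t;

typedef struct {
  uint8_t  a;
  uint32_t b;
} __attribute__((packed)) demo_packed_t;
]]

local v = ffi.new("demo_value_t")
v.tag = 1
v.i = 42  -- member of the transparent union, accessed directly

local buf = ffi.new("demo_buf_t", 8)  -- 8 is the element count for data[?]
buf.len = 8
buf.data[7] = 255

-- The packed struct has no padding between a and b.
local packed_size = ffi.sizeof("demo_packed_t")
print(v.i, buf.len, packed_size)
```

The element count for a variable-length struct is passed as the second argument to `ffi.new`; the struct itself does not record it, which is why the sketch stores it in `len` by hand.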
+The following C types are pre-defined by the C parser (like +a typedef, except re-declarations will be ignored): +
+-
+
+
- Vararg handling: va_list, __builtin_va_list, +__gnuc_va_list. + +
- From <stddef.h>: ptrdiff_t, +size_t, wchar_t. + +
- From <stdint.h>: int8_t, int16_t, +int32_t, int64_t, uint8_t, +uint16_t, uint32_t, uint64_t, +intptr_t, uintptr_t. + +
+You're encouraged to use these types in preference to the +compiler-specific extensions or the target-dependent standard types. +E.g. char differs in signedness and long differs in +size, depending on the target architecture and platform ABI. +
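The difference between fixed-size and target-dependent types can be checked directly with `ffi.sizeof`. The following is a small sketch, assuming a LuaJIT build with the FFI library enabled; the value reported for `long` depends on the platform ABI, so it is printed rather than asserted.

```lua
local ffi = require("ffi")

-- Pre-defined fixed-size types have the same size on every target.
local size_i32 = ffi.sizeof("int32_t")
local size_u64 = ffi.sizeof("uint64_t")

-- intptr_t always matches the pointer size of the target.
local size_iptr = ffi.sizeof("intptr_t")
local size_ptr  = ffi.sizeof("void *")

-- long is target-dependent: typically 4 or 8 bytes, depending on the ABI.
local size_long = ffi.sizeof("long")

print("int32_t:", size_i32, "uint64_t:", size_u64)
print("intptr_t:", size_iptr, "void *:", size_ptr)
print("long (target-dependent):", size_long)
```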
++The following C features are not supported: +
+-
+
+
- A declaration must always have a type specifier; it doesn't +default to an int type. + +
- Old-style empty function declarations (K&R) are not allowed. +All C functions must have a proper prototype declaration. A +function declared without parameters (int foo();) is +treated as a function taking zero arguments, like in C++. + +
- The long double C type is parsed correctly, but +there's no support for the related conversions, accesses or arithmetic +operations. + +
- Wide character strings and character literals are not +supported. + +
- See below for features that are currently +not implemented. + +
C Type Conversion Rules
TODO
+Conversions from C types to Lua objects
+Conversions from Lua objects to C types
+Conversions between C types
Initializers
Conversions between C types
Initializers
@@ -81,8 +222,8 @@ initializers and the C types involved:
C Library Namespaces
--A C library namespace is a special kind of object which allows -access to the symbols contained in libraries. Indexing it with a -symbol name (a Lua string) automatically binds it to the library. -
--TODO -
-Operations on cdata Objects
TODO @@ -158,9 +289,9 @@ Similar rules apply for Lua strings which are implicitly converted to "const char *": the string object itself must be referenced somewhere or it'll be garbage collected eventually. The pointer will then point to stale data, which may have already been -overwritten. Note that string literals are automatically kept alive as -long as the function containing it (actually its prototype) is not -garbage collected. +overwritten. Note that string literals are automatically kept +alive as long as the function containing it (actually its prototype) +is not garbage collected.
Objects which are passed as an argument to an external C function @@ -181,6 +312,121 @@ indistinguishable from pointers returned by C functions (which is one of the reasons why the GC cannot follow them).
+C Library Namespaces
++A C library namespace is a special kind of object which allows +access to the symbols contained in shared libraries or the default +symbol namespace. The default +ffi.C namespace is +automatically created when the FFI library is loaded. C library +namespaces for specific shared libraries may be created with the +ffi.load() API +function. +
++Indexing a C library namespace object with a symbol name (a Lua +string) automatically binds it to the library. First the symbol type +is resolved — it must have been declared with +ffi.cdef. Then the +symbol address is resolved by searching for the symbol name in the +associated shared libraries or the default symbol namespace. Finally, +the resulting binding between the symbol name, the symbol type and its +address is cached. Missing symbol declarations or nonexistent symbol +names cause an error. +
++This is what happens on a read access for the different kinds of +symbols: +
+-
+
+
- External functions: a cdata object with the type of the function +and its address is returned. + +
- External variables: the symbol address is dereferenced and the +loaded value is converted to a Lua object +and returned. + +
- Constant values (static const or enum +constants): the constant is converted to a +Lua object and returned. + +
+This is what happens on a write access: +
+-
+
+
- External variables: the value to be written is +converted to the C type of the +variable and then stored at the symbol address. + +
- Writing to constant variables or to any other symbol type causes +an error, like any other attempted write to a constant location. + +
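The read and write rules above can be sketched against the default `ffi.C` namespace. This is an illustration only: `DEMO_LIMIT`, `DEMO_FLAG` and `demo_undeclared_symbol` are hypothetical names introduced here, and the sketch assumes `strlen` is available in the default symbol namespace, as it is with the usual C runtime.

```lua
local ffi = require("ffi")

-- DEMO_LIMIT and DEMO_FLAG are hypothetical constants for this sketch.
ffi.cdef[[
size_t strlen(const char *s);
static const int DEMO_LIMIT = 42;
enum { DEMO_FLAG = 7 };
]]

local C = ffi.C

-- External function: indexing returns a function cdata object.
local fn_kind = type(C.strlen)

-- size_t may come back as a boxed 64 bit integer, so convert explicitly.
local len = tonumber(C.strlen("hello"))

-- Constants are converted to Lua objects on read.
local limit = C.DEMO_LIMIT
local flag = C.DEMO_FLAG

-- Writing to a constant raises an error.
local write_ok = pcall(function() C.DEMO_LIMIT = 1 end)

-- Indexing a symbol that was never declared raises an error, too.
local lookup_ok = pcall(function() return C.demo_undeclared_symbol end)

print(fn_kind, len, limit, flag, write_ok, lookup_ok)
```

Both failure cases are wrapped in `pcall` so the sketch runs to completion; in real code these errors usually point to a missing or wrong `ffi.cdef` declaration.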
+C library namespaces themselves are garbage collected objects. If +the last reference to the namespace object is gone, the garbage +collector will eventually release the shared library reference and +remove all memory associated with the namespace. Since this may +trigger the removal of the shared library from the memory of the +running process, it's generally not safe to use function +cdata objects obtained from a library if the namespace object may be +unreferenced. +
++Performance notice: the JIT compiler specializes to the identity of +namespace objects and to the strings used to index it. This +effectively turns function cdata objects into constants. It's not +useful and actually counter-productive to explicitly cache these +function objects, e.g. local strlen = ffi.C.strlen. OTOH it +is useful to cache the namespace itself, e.g. local C = +ffi.C. +
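The caching advice can be sketched as follows. The helper `total_length` is a hypothetical name used only for illustration; the point is that the namespace is cached in a local, while the function is looked up through it on every call.

```lua
local ffi = require("ffi")

ffi.cdef[[
size_t strlen(const char *s);
]]

-- Cache the namespace object itself ...
local C = ffi.C

-- ... but index it at the call site instead of caching C.strlen.
-- total_length is a hypothetical helper for this sketch.
local function total_length(strings)
  local n = 0
  for i = 1, #strings do
    -- The Lua strings stay referenced by the table during the call.
    n = n + tonumber(C.strlen(strings[i]))
  end
  return n
end

local total = total_length({ "a", "bc", "def" })
print(total)
```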
+ +No Hand-holding!
++The FFI library has been designed as a low-level library. The +goal is to interface with C code and C data types with a +minimum of overhead. This means you can do anything you can do +from C: access all memory, overwrite anything in memory, call +machine code at any memory address and so on. +
++The FFI library provides no memory safety, unlike regular Lua +code. It will happily allow you to dereference a NULL +pointer, to access arrays out of bounds or to misdeclare +C functions. If you make a mistake, your application might crash, +just like equivalent C code would. +
++This behavior is inevitable, since the goal is to provide full +interoperability with C code. Adding extra safety measures, like +bounds checks, would be futile. There's no way to detect +misdeclarations of C functions, since shared libraries only +provide symbol names, but no type information. Likewise there's no way +to infer the valid range of indexes for a returned pointer. +
++Again: the FFI library is a low-level library. This implies it needs +to be used with care, but its flexibility and performance often +outweigh this concern. If you're a C or C++ developer, it'll be easy +to apply your existing knowledge. OTOH writing code for the FFI +library is not for the faint of heart and probably shouldn't be the +first exercise for someone with little experience in Lua, C or C++. +
++As a corollary of the above, the FFI library is not safe for use by +untrusted Lua code. If you're sandboxing untrusted Lua code, you +definitely don't want to give this code access to the FFI library or +to any cdata object (except 64 bit integers or complex +numbers). Any properly engineered Lua sandbox needs to provide safety +wrappers for many of the standard Lua library functions — +similar wrappers need to be written for high-level operations on FFI +data types, too. +
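A safety wrapper of the kind mentioned above might look like the following. This is only a sketch under stated assumptions, not a complete sandbox design: `checked_array` is a hypothetical helper, and it relies on the cdata array being reachable only through the two closures, so calling code never holds the raw cdata object.

```lua
local ffi = require("ffi")

-- Hypothetical helper: an int32_t array that is only reachable through
-- bounds-checked accessors. The cdata object stays hidden in the closure.
local function checked_array(n)
  local data = ffi.new("int32_t[?]", n)

  local function check(i)
    if type(i) ~= "number" or i ~= math.floor(i) or i < 0 or i >= n then
      error("index out of bounds", 3)
    end
  end

  local wrapper = {}

  function wrapper.get(i)
    check(i)
    return data[i]
  end

  function wrapper.set(i, v)
    check(i)
    data[i] = v
  end

  return wrapper
end

local arr = checked_array(4)
arr.set(3, 123)
local last = arr.get(3)

-- An out-of-range access raises a Lua error instead of touching memory.
local oob_ok = pcall(arr.get, 4)
print(last, oob_ok)
```

The check costs a few comparisons per access, which is the overhead the FFI library itself deliberately avoids; whether that trade-off is acceptable depends on the sandbox.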
+Current Status
The initial release of the FFI library has some limitations and is @@ -200,18 +446,15 @@ obscure constructs.
The JIT compiler already handles a large subset of all FFI operations. @@ -238,6 +481,7 @@ two. value.