The string buffer library allows high-performance manipulation of string-like data.
Unlike Lua strings, which are constants, string buffers are mutable sequences of 8-bit (binary-transparent) characters. Data can be stored, formatted and encoded into a string buffer and later converted, decoded or extracted.
The convenient string buffer API simplifies common string manipulation tasks that would otherwise require creating many intermediate strings. String buffers improve performance by eliminating redundant memory copies, object creation, string interning and garbage collection overhead. In conjunction with the FFI library, they allow zero-copy operations.
Using the String Buffer Library
The string buffer library is built into LuaJIT by default, but it is not loaded automatically. Add this to the start of every Lua file that needs one of its functions:
local buffer = require("string.buffer")
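A minimal round-trip sketch, assuming LuaJIT 2.1 with the string buffer library compiled in. The table contents and the local names `config`, `blob` and `copy` are purely illustrative.

```lua
-- Assumes LuaJIT 2.1 with the string buffer library compiled in.
local buffer = require("string.buffer")

-- Illustrative nested structure made only of serializable values.
local config = {
  name = "server",
  ports = { 80, 443 },
  tls = { enabled = true, ciphers = { "aes128", "chacha20" } },
}

-- Encode into a binary string...
local blob = buffer.encode(config)
assert(type(blob) == "string")

-- ...and decode it back into a new, independent table.
local copy = buffer.decode(blob)
assert(copy ~= config)
assert(copy.name == "server" and copy.ports[2] == 443)
assert(copy.tls.enabled == true and copy.tls.ciphers[2] == "chacha20")
```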
Work in Progress
This library is a work in progress. More functions will be added soon.
Serialization of Lua Objects
The following functions and methods allow high-speed serialization (encoding) of a Lua object into a string and decoding it back to a Lua object. This allows convenient storage and transport of structured data.
The encoded data is in an internal binary format. The data can be stored in files, binary-transparent databases or transmitted to other LuaJIT instances across threads, processes or networks.
Encoding speed can reach up to 1 gigabyte per second on a modern desktop- or server-class system, even when serializing many small objects. Decoding speed is mostly constrained by the cost of object creation.
The serializer handles most Lua types, common FFI number types (int64_t, uint64_t and complex) and nested structures. Functions, thread objects, other FFI cdata, full userdata and associated metatables cannot be serialized (yet).
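The sketch below round-trips a few of the supported types, including a mixed table and the FFI 64-bit integer types. It assumes LuaJIT 2.1, whose `LL`/`ULL` literal suffixes create `int64_t`/`uint64_t` cdata; the helper `roundtrip` is hypothetical, introduced only for this example.

```lua
local buffer = require("string.buffer")

-- Hypothetical helper: encode a value and immediately decode it again.
local function roundtrip(v)
  return buffer.decode(buffer.encode(v))
end

-- Plain Lua scalars.
assert(roundtrip(nil) == nil)
assert(roundtrip(false) == false)
assert(roundtrip(42) == 42)
assert(roundtrip(0.5) == 0.5)
assert(roundtrip("a\0b") == "a\0b")  -- binary-transparent strings

-- A table with both an array part and a hash part.
local t = roundtrip({ 10, 20, 30, key = "value" })
assert(#t == 3 and t[3] == 30 and t.key == "value")

-- FFI 64-bit integers come back as cdata of the same kind.
local i = roundtrip(-5LL)
assert(type(i) == "cdata" and i == -5LL)
local u = roundtrip(7ULL)
assert(type(u) == "cdata" and u == 7ULL)
```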
The encoder serializes nested structures as trees. Multiple references to a single object will be stored separately and create distinct objects after decoding. Circular references cause an error.
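The following sketch illustrates both points: two fields that share one table decode into two separate tables, and a self-referencing table makes `buffer.encode` raise an error. The exact error message is not specified here and may differ between LuaJIT versions, so the sketch only checks that an error occurred.

```lua
local buffer = require("string.buffer")

-- Two fields pointing at the same table.
local shared = { 1, 2, 3 }
local original = { a = shared, b = shared }
assert(original.a == original.b)

-- After decoding, the fields have equal contents but are distinct tables.
local copy = buffer.decode(buffer.encode(original))
assert(copy.a ~= copy.b)
assert(copy.a[3] == 3 and copy.b[3] == 3)

-- A table that contains itself cannot be encoded.
local cyclic = {}
cyclic.self = cyclic
local ok, err = pcall(buffer.encode, cyclic)
assert(not ok and err ~= nil)
```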
str = buffer.encode(obj)
Serializes (encodes) the Lua object obj into the string str.
obj can be any of the supported Lua types — it doesn't need to be a Lua table.
This function may throw an error when attempting to serialize unsupported object types, circular references or deeply nested tables.
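A short sketch of the encoder's behavior, using `pcall` to catch its errors. The helper `encodes` is hypothetical; `print`, a coroutine and `io.stdout` (a full userdata) are chosen as examples of the unsupported types listed earlier in this section.

```lua
local buffer = require("string.buffer")

-- The top-level object can be a plain number or string.
assert(buffer.decode(buffer.encode(42)) == 42)
assert(buffer.decode(buffer.encode("hello")) == "hello")

-- Hypothetical helper: true if buffer.encode succeeds, false if it throws.
local function encodes(v)
  return (pcall(buffer.encode, v))
end

assert(encodes({ 1, 2, 3 }))
assert(not encodes(print))                             -- function
assert(not encodes(coroutine.create(function() end)))  -- thread
assert(not encodes(io.stdout))                         -- full userdata
assert(not encodes({ callback = print }))              -- unsupported value nested in a table
```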
obj = buffer.decode(str)
De-serializes (decodes) the string str into the Lua object obj.
The returned object may be any of the supported Lua types — even nil.
This function may throw an error when fed with malformed or incomplete encoded data. The standalone function throws an error if there is leftover data after decoding a single top-level object.
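The sketch below exercises the documented cases for the standalone `buffer.decode`: a `nil` top-level object, leftover data and truncated input. Only the fact that an error is raised is checked, not its message.

```lua
local buffer = require("string.buffer")

-- nil is a valid top-level object and decodes back to nil.
assert(buffer.decode(buffer.encode(nil)) == nil)

-- Two encoded objects back to back: the standalone decoder rejects the leftover data.
local one, two = buffer.encode(1), buffer.encode(2)
assert(buffer.decode(one) == 1)
assert(not pcall(buffer.decode, one .. two))

-- Truncated input: drop the last byte of an encoded string.
local s = buffer.encode("hello")
assert(not pcall(buffer.decode, s:sub(1, -2)))
```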
Serialization Format Specification
This serialization format is designed for internal use by LuaJIT applications. Serialized data is upwards-compatible and portable across all supported LuaJIT platforms.
It's an 8-bit binary format and not human-readable. For example, it uses embedded zero bytes, and it stores embedded Lua strings unmodified, which may themselves contain arbitrary 8-bit data. Encoded data can be safely concatenated for streaming and later decoded one top-level object at a time.
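A sketch of both properties, assuming the buffer object methods of LuaJIT 2.1 (`buffer.new`, `buf:put`, `buf:decode` and the `#` length operator), which are not documented in this section. The first assertion relies on the `string` rule of the format specification (tag byte 0x20 plus the length, followed by the raw bytes); the stream contents are illustrative.

```lua
local buffer = require("string.buffer")

-- Embedded zero bytes are stored as-is: tag 0x23 (0x20 + length 3), then the raw bytes.
assert(buffer.encode("a\0b") == "\35a\0b")

-- Concatenate several encoded objects into one stream.
local stream = buffer.encode({ id = 1 }) .. buffer.encode("two") .. buffer.encode(3)

-- Assumption: a buffer object can decode one top-level object per call.
local buf = buffer.new()
buf:put(stream)
local items = {}
while #buf > 0 do
  items[#items + 1] = buf:decode()
end

assert(#items == 3)
assert(items[1].id == 1 and items[2] == "two" and items[3] == 3)
```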
The encoding is reasonably compact, but tuned for maximum performance, not for minimum space usage. It compresses well with any of the common byte-oriented data compression algorithms.
Although documented here for reference, this format is explicitly not intended to be a 'public standard' for structured data interchange across computer languages (like JSON or MessagePack). Please do not use it as such.
The specification is given below as a context-free grammar with a top-level object as the starting point. Alternatives are separated by the | symbol and * indicates repeats. Grouping is implicit or indicated by {…}. Terminals are either plain hex numbers, encoded as bytes, or have a .format suffix.
object    → nil | false | true
          | null | lightud32 | lightud64
          | int | num | tab
          | int64 | uint64 | complex
          | string

nil       → 0x00
false     → 0x01
true      → 0x02

null      → 0x03            // NULL lightuserdata
lightud32 → 0x04 data.I     // 32 bit lightuserdata
lightud64 → 0x05 data.L     // 64 bit lightuserdata

int       → 0x06 int.I      // int32_t
num       → 0x07 double.L

tab       → 0x08                                         // Empty table
          | 0x09 h.U h*{object object}                   // Key/value hash
          | 0x0a a.U a*object                            // 0-based array
          | 0x0b a.U a*object h.U h*{object object}      // Mixed
          | 0x0c a.U (a-1)*object                        // 1-based array
          | 0x0d a.U (a-1)*object h.U h*{object object}  // Mixed

int64     → 0x10 int.L      // FFI int64_t
uint64    → 0x11 uint.L     // FFI uint64_t
complex   → 0x12 re.L im.L  // FFI complex

string    → (0x20+len).U len*char.B

.B = 8 bit
.I = 32 bit little-endian
.L = 64 bit little-endian
.U = prefix-encoded 32 bit unsigned number n:
     0x00..0xdf   → n.B
     0xe0..0x1fdf → (0xe0|(((n-0xe0)>>8)&0x1f)).B ((n-0xe0)&0xff).B
     0x1fe0..     → 0xff n.I
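To make the `.U` rule concrete, here is a hypothetical pure-Lua re-implementation of the prefix encoding and the `string` rule, checked against `buffer.encode` at each length boundary. Lengths 0xbf/0xc0 and 0x1fbf/0x1fc0 are where the tag `0x20+len` crosses 0xe0 and 0x1fe0. The sketch assumes LuaJIT's built-in `bit` module; `encode_U` and `encode_string` are illustrative names, not library functions. The first assertions check single-byte tags taken directly from the grammar.

```lua
local buffer = require("string.buffer")
local bit = require("bit")

-- Single-byte encodings from the grammar.
assert(buffer.encode(nil) == "\0")
assert(buffer.encode(false) == "\1")
assert(buffer.encode(true) == "\2")
assert(buffer.encode({}) == "\8")  -- empty table

-- Hypothetical helper: encode an unsigned number n with the .U prefix scheme.
local function encode_U(n)
  if n < 0xe0 then
    return string.char(n)                 -- 0x00..0xdf: one byte
  elseif n < 0x1fe0 then
    local m = n - 0xe0                    -- 0xe0..0x1fdf: two bytes
    return string.char(bit.bor(0xe0, bit.band(bit.rshift(m, 8), 0x1f)),
                       bit.band(m, 0xff))
  else
    -- 0x1fe0 and above: 0xff marker, then n as 32 bit little-endian.
    return string.char(0xff,
      bit.band(n, 0xff), bit.band(bit.rshift(n, 8), 0xff),
      bit.band(bit.rshift(n, 16), 0xff), bit.band(bit.rshift(n, 24), 0xff))
  end
end

-- Hypothetical helper: a string is (0x20+len).U followed by its raw bytes.
local function encode_string(s)
  return encode_U(0x20 + #s) .. s
end

-- Compare against the real encoder around each length boundary.
for _, len in ipairs({ 0, 1, 0xbf, 0xc0, 0x1fbf, 0x1fc0, 70000 }) do
  local s = string.rep("x", len)
  assert(encode_string(s) == buffer.encode(s), "mismatch at length " .. len)
end
```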