The string buffer library allows high-performance manipulation of string-like data.
Unlike Lua strings, which are constants, string buffers are mutable sequences of 8-bit (binary-transparent) characters. Data can be stored, formatted and encoded into a string buffer and later converted, decoded or extracted.
The convenient string buffer API simplifies common string manipulation tasks that would otherwise require creating many intermediate strings. String buffers improve performance by eliminating redundant memory copies, object creation, string interning and garbage collection overhead. In conjunction with the FFI library, they allow zero-copy operations.
Using the String Buffer Library
The string buffer library is built into LuaJIT by default, but it's not loaded by default. Add this to the start of every Lua file that needs one of its functions:
local buffer = require("string.buffer")
Work in Progress
This library is a work in progress. More functions will be added soon.
Serialization of Lua Objects
The following functions and methods allow high-speed serialization (encoding) of a Lua object into a string and decoding it back to a Lua object. This allows convenient storage and transport of structured data.
The encoded data is in an internal binary format. The data can be stored in files or binary-transparent databases, or transmitted to other LuaJIT instances across threads, processes or networks.
Encoding speed can reach 1 gigabyte per second on a modern desktop- or server-class system, even when serializing many small objects. Decoding speed is mostly constrained by object creation cost.
The serializer handles most Lua types, common FFI number types and nested structures. Functions, thread objects, other FFI cdata, full userdata and associated metatables cannot be serialized (yet).
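As a rough illustration of that type coverage, the sketch below round-trips a nested table holding plain Lua values and FFI 64 bit integers. It assumes a LuaJIT build where both the `ffi` and `string.buffer` modules are available; the table contents and variable names are made up for the example.

```lua
local ffi = require("ffi")
local buffer = require("string.buffer")

-- Plain Lua values and FFI 64 bit integers in one nested table.
local original = {
  name = "sensor-7",
  enabled = true,
  readings = { 1.5, 2.25, 3 },
  id = 9007199254740993LL,       -- FFI int64_t, not exactly representable as a double
  mask = 0xffffffffffffffffULL,  -- FFI uint64_t
}

local copy = buffer.decode(buffer.encode(original))

assert(copy.name == "sensor-7" and copy.enabled == true)
assert(copy.readings[2] == 2.25)
assert(ffi.istype("int64_t", copy.id) and copy.id == original.id)
assert(ffi.istype("uint64_t", copy.mask) and copy.mask == original.mask)

-- Other cdata, such as an FFI array, is expected to be rejected with an error.
local ok = pcall(buffer.encode, { arr = ffi.new("int[4]") })
print("FFI array serializable:", ok)
```

Wrapping the unsupported case in `pcall` keeps the script running, so the failure can be reported instead of aborting.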
The encoder serializes nested structures as trees. Multiple references to a single object will be stored separately and create distinct objects after decoding. Circular references cause an error.
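The tree behavior can be observed with a small experiment. This is a sketch rather than a reference test: the table names are hypothetical, and the exact error message for the circular case is deliberately not relied upon.

```lua
local buffer = require("string.buffer")

-- Two fields that refer to the same table object.
local shared = { 1, 2, 3 }
local obj = { a = shared, b = shared }
assert(obj.a == obj.b)

-- After a round trip the shared table has been stored twice,
-- so the two fields now refer to distinct (but equal-valued) tables.
local copy = buffer.decode(buffer.encode(obj))
print("same object after decoding:", copy.a == copy.b)
print(copy.a[1], copy.b[3])

-- A table that contains itself cannot be represented as a tree.
local cyclic = {}
cyclic.self = cyclic
local ok, err = pcall(buffer.encode, cyclic)
print("circular table serializable:", ok, err)
```

If object identity matters to the application, it has to be restored after decoding, for example by storing an ID in each table and re-linking the references by hand.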
str = buffer.encode(obj)
Serializes (encodes) the Lua object obj into the string str.
obj can be any of the supported Lua types; it doesn't need to be a Lua table.
This function may throw an error when attempting to serialize unsupported object types, circular references or deeply nested tables.
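A minimal usage sketch for `buffer.encode`, with made-up sample values. It shows that non-table values are accepted and that an unsupported type (here a function) can be handled with `pcall` instead of letting the error propagate.

```lua
local buffer = require("string.buffer")

-- Tables, strings, numbers, booleans and nil are all valid top-level objects.
local s_table  = buffer.encode({ name = "luajit", tags = { "fast", "small" }, n = 42 })
local s_string = buffer.encode("plain string")
local s_nil    = buffer.encode(nil)

-- The result is always an ordinary (binary) Lua string.
print(type(s_table), #s_table)
print(type(s_string), #s_string)
print(type(s_nil), #s_nil)

-- Functions are not serializable, so guard the call when the input is untrusted.
local ok, err = pcall(buffer.encode, { callback = function() end })
if not ok then
  print("encode failed:", err)
end
```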
obj = buffer.decode(str)
De-serializes (decodes) the string str into the Lua object obj.
The returned object may be any of the supported Lua types, even nil.
This function may throw an error when fed with malformed or incomplete encoded data. The standalone function throws when there's left-over data after decoding a single top-level object.
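The following sketch exercises `buffer.decode` on valid, truncated and over-long input. The sample table is hypothetical; the two failure cases are built by cutting off the last byte and by appending a second encoded object.

```lua
local buffer = require("string.buffer")

local str = buffer.encode({ x = 1, list = { "a", "b" } })

-- A normal round trip.
local obj = buffer.decode(str)
assert(obj.x == 1 and obj.list[2] == "b")

-- nil is a valid top-level object, too.
assert(buffer.decode(buffer.encode(nil)) == nil)

-- Incomplete data: drop the final byte.
local ok_truncated, err_truncated = pcall(buffer.decode, str:sub(1, #str - 1))
print("truncated input accepted:", ok_truncated, err_truncated)

-- Left-over data: two top-level objects in one string.
local ok_extra, err_extra = pcall(buffer.decode, str .. str)
print("input with left-over data accepted:", ok_extra, err_extra)
```

Since a decoded `nil` is a legitimate result, checking the return value alone cannot distinguish success from failure; `pcall` is the more reliable guard for data from untrusted sources.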
Serialization Format Specification
This serialization format is designed for internal use by LuaJIT applications. Serialized data is upwards-compatible and portable across all supported LuaJIT platforms.
It's an 8-bit binary format and not human-readable. For example, it uses embedded zeroes and stores embedded Lua string objects unmodified, which are 8-bit-clean, too. Encoded data can be safely concatenated for streaming and later decoded one top-level object at a time.
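A sketch of the streaming use case follows. Note that the standalone `buffer.decode` rejects left-over data, so decoding one object at a time needs something more. The example assumes the buffer object API found in later LuaJIT releases (`buffer.new()`, `buf:put()`, `buf:decode()` and the `#buf` length operator), which this text does not describe; treat those calls as assumptions about the installed LuaJIT version.

```lua
local buffer = require("string.buffer")

-- Three independently encoded objects, concatenated into one stream.
local stream = buffer.encode({ id = 1 })
            .. buffer.encode("second")
            .. buffer.encode(3)

-- The standalone decoder refuses the stream because of the left-over data.
local ok_standalone = pcall(buffer.decode, stream)
print("standalone decode accepted stream:", ok_standalone)

-- ASSUMPTION: buffer objects with put/decode are available in this build.
local buf = buffer.new()
buf:put(stream)

-- Each buf:decode() call consumes exactly one top-level object.
local decoded = {}
while #buf > 0 do
  decoded[#decoded + 1] = buf:decode()
end

print("objects decoded:", #decoded)
```

The loop collects results in an array, which only works here because none of the streamed objects is `nil`; a stream that may contain `nil` values would need an explicit counter.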
The encoding is reasonably compact, but tuned for maximum performance, not for minimum space usage. It compresses well with any of the common byte-oriented data compression algorithms.
Although documented here for reference, this format is explicitly not intended to be a 'public standard' for structured data interchange across computer languages (like JSON or MessagePack). Please do not use it as such.
The specification is given below as a context-free grammar with a top-level object as the starting point. Alternatives are separated by the | symbol and * indicates repeats. Grouping is implicit or indicated by {…}. Terminals are either plain hex numbers, encoded as bytes, or have a .format suffix.
object    → nil | false | true
          | null | lightud32 | lightud64
          | int | num | tab
          | int64 | uint64 | complex
          | string

nil       → 0x00
false     → 0x01
true      → 0x02

null      → 0x03                                        // NULL lightuserdata
lightud32 → 0x04 data.I                                 // 32 bit lightuserdata
lightud64 → 0x05 data.L                                 // 64 bit lightuserdata

int       → 0x06 int.I                                  // int32_t
num       → 0x07 double.L

tab       → 0x08                                        // Empty table
          | 0x09 h.U h*{object object}                  // Key/value hash
          | 0x0a a.U a*object                           // 0-based array
          | 0x0b a.U a*object h.U h*{object object}     // Mixed
          | 0x0c a.U (a-1)*object                       // 1-based array
          | 0x0d a.U (a-1)*object h.U h*{object object} // Mixed

int64     → 0x10 int.L                                  // FFI int64_t
uint64    → 0x11 uint.L                                 // FFI uint64_t
complex   → 0x12 re.L im.L                              // FFI complex

string    → (0x20+len).U len*char.B

.B = 8 bit
.I = 32 bit little-endian
.L = 64 bit little-endian
.U = prefix-encoded 32 bit unsigned number n:
     0x00..0xdf   → n.B
     0xe0..0x1fdf → (0xe0|(((n-0xe0)>>8)&0x1f)).B ((n-0xe0)&0xff).B
     0x1fe0..     → 0xff n.I
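To make the .U prefix encoding and the string rule more concrete, here is an illustrative re-implementation in plain Lua. The helper names `encode_u` and `encode_string` are hypothetical and not part of the library; the code follows the grammar above rather than the actual LuaJIT source, and uses only arithmetic so that it also runs on a stock Lua interpreter.

```lua
-- Encode an unsigned 32 bit number n in the prefix-encoded .U format.
local function encode_u(n)
  assert(n >= 0 and n <= 0xffffffff and n % 1 == 0,
         "n must be a 32 bit unsigned integer")
  if n < 0xe0 then
    -- 0x00..0xdf: a single byte holding n itself.
    return string.char(n)
  elseif n < 0x1fe0 then
    -- 0xe0..0x1fdf: two bytes, high 5 bits of (n-0xe0) merged into the 0xe0 prefix.
    local m = n - 0xe0
    return string.char(0xe0 + math.floor(m / 256), m % 256)
  else
    -- 0x1fe0 and above: marker byte 0xff followed by n as 32 bit little-endian.
    return string.char(0xff,
                       n % 256,
                       math.floor(n / 256) % 256,
                       math.floor(n / 65536) % 256,
                       math.floor(n / 16777216) % 256)
  end
end

-- Encode a Lua string per the rule: (0x20+len).U followed by the raw bytes.
local function encode_string(s)
  return encode_u(0x20 + #s) .. s
end

-- Print a byte string as hex for inspection.
local function hex(s)
  return (s:gsub(".", function(c) return string.format("%02x ", c:byte()) end))
end

print(hex(encode_u(5)))
print(hex(encode_u(0xe0)))
print(hex(encode_u(0x1fe0)))
print(hex(encode_string("abc")))
```

Adding the high bits to 0xe0 is equivalent to the bitwise OR in the grammar, because the low five bits of 0xe0 are zero and the shifted value never exceeds 0x1e in this range.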