LuaJIT has an integrated statistical profiler with very low overhead. It allows sampling the currently executing stack and other parameters at regular intervals.
The integrated profiler can be accessed from three levels:
- The bundled high-level profiler, invoked by the -jp command line option.
- A low-level Lua API to control the profiler.
- A low-level C API to control the profiler.
High-Level Profiler
The bundled high-level profiler offers basic profiling functionality. It generates simple textual summaries or source code annotations. It can be accessed with the -jp command line option or from Lua code by loading the underlying jit.p module.
To cut to the chase, run this to get a CPU usage profile by function name:
    luajit -jp myapp.lua
It's not a stated goal of the bundled profiler to add every possible option or to cater for special profiling needs. The low-level profiler APIs are documented below. They may be used by third-party authors to implement advanced functionality, e.g. IDE integration or graphical profilers.
Note: Sampling works for both interpreted and JIT-compiled code. The results for JIT-compiled code may sometimes be surprising. LuaJIT heavily optimizes and inlines Lua code, so there's no simple one-to-one correspondence between source code lines and the sampled machine code.
-jp=[options[,output]]
The -jp command line option starts the high-level profiler. When the application run by the command line terminates, the profiler stops and writes the results to stdout or to the specified output file.
The options argument specifies how the profiling is to be performed:
- f — Stack dump: function name, otherwise module:line. This is the default mode.
- F — Stack dump: ditto, but dump module:name.
- l — Stack dump: module:line.
- <number> — Stack dump depth (callee ← caller). Default: 1.
- -<number> — Inverse stack dump depth (caller → callee).
- s — Split stack dump after first stack level. Implies depth ≥ 2 or depth ≤ -2.
- p — Show full path for module names.
- v — Show VM states.
- z — Show zones.
- r — Show raw sample counts. Default: show percentages.
- a — Annotate excerpts from source code files.
- A — Annotate complete source code files.
- G — Produce raw output suitable for graphical tools.
- m<number> — Minimum sample percentage to be shown. Default: 3%.
- i<number> — Sampling interval in milliseconds. Default: 10ms. Note: The actual sampling precision is OS-dependent.
The default output for -jp is a list of the most CPU consuming spots in the application. Increasing the stack dump depth with (say) -jp=2 may help to point out the main callers or callees of hotspots. But sample aggregation is still flat per unique stack dump.
To get a two-level view (split view) of callers/callees, use -jp=s or -jp=-s. The percentages shown for the second level are relative to the first level.
To see how much time is spent in each line relative to a function, use -jp=fl.
To see how much time is spent in different VM states or zones, use -jp=v or -jp=z.
Combinations of v/z with f/F/l produce two-level views, e.g. -jp=vf or -jp=fv. This shows the time spent in a VM state or zone vs. hotspots. This can be used to answer questions like "Which time consuming functions are only interpreted?" or "What's the garbage collector overhead for a specific function?".
Multiple options can be combined, but not all combinations make sense (see above). E.g. -jp=3si4m1 samples three stack levels deep in 4ms intervals and shows a split view of the CPU consuming functions and their callers with a 1% threshold.
Source code annotations produced by -jp=a or -jp=A are always flat and at the line level. The source code files need to be readable by the profiler script.
The high-level profiler can also be started and stopped from Lua code with:
    require("jit.p").start(options, output)
    ...
    require("jit.p").stop()
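As a minimal sketch of the start/stop calls above, the following profiles a single call from Lua code. The workload function and the output file name profile.txt are hypothetical placeholders; the option string combines the f, i and m options from the list above. It assumes a LuaJIT 2.1 build with the bundled jit.p module on the module search path.

```lua
-- Assumption: LuaJIT 2.1 with the bundled jit.* modules available.
local p = require("jit.p")

-- Hypothetical workload; stands in for the code you want to measure.
local function workload()
  local sum = 0
  for i = 1, 5e7 do
    sum = sum + math.sqrt(i)
  end
  return sum
end

-- "f": function-level stack dump, "i5": 5ms sampling interval,
-- "m1": show entries with at least 1% of the samples.
-- "profile.txt" is a made-up output file name.
p.start("fi5m1", "profile.txt")
local result = workload()
p.stop()
```

Run the script with plain luajit (without -jp), so that only one high-level profiler instance is active. The summary should then be written to profile.txt when stop() is called.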
jit.zone — Zones
Zones can be used to provide information about different parts of an application to the high-level profiler. E.g. a game could make use of an "AI" zone, a "PHYS" zone, etc. Zones are hierarchical, organized as a stack.
The jit.zone module needs to be loaded explicitly:
    local zone = require("jit.zone")
- zone("name") pushes a named zone to the zone stack.
- zone() pops the current zone from the zone stack and returns its name.
- zone:get() returns the current zone name, or nil if the zone stack is empty.
- zone:flush() removes all zones from the zone stack.
To show the time spent in each zone use -jp=z. To show the time spent relative to hotspots use e.g. -jp=zf or -jp=fz.
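The zone calls above could be used along these lines. This is only a sketch: update_ai, update_physics and frame are hypothetical stand-ins for real application code, and the zone names follow the "AI" and "PHYS" example mentioned earlier.

```lua
-- Assumption: LuaJIT 2.1 with the bundled jit.zone module available.
local zone = require("jit.zone")

-- Hypothetical subsystems; stand-ins for real game code.
local function update_ai(n)
  local s = 0
  for i = 1, n do s = s + i % 7 end
  return s
end

local function update_physics(n)
  local s = 0
  for i = 1, n do s = s + math.sin(i) end
  return s
end

-- One frame: each subsystem runs inside its own zone.
local function frame()
  zone("AI")            -- push "AI" onto the zone stack
  update_ai(1e5)
  zone()                -- pop "AI"

  zone("PHYS")          -- push "PHYS"
  update_physics(1e5)
  zone()                -- pop "PHYS"
end

for _ = 1, 200 do frame() end
```

Running such a script with luajit -jp=z should attribute the samples to the two zones. Every push needs a matching pop; zone:flush() can be used to reset the stack after an error.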
Low-level Lua API
The jit.profile module gives access to the low-level API of the profiler from Lua code. This module needs to be loaded explicitly:
    local profile = require("jit.profile")
This module can be used to implement your own higher-level profiler. A typical profiling run starts the profiler, captures stack dumps in the profiler callback, adds them to a hash table to aggregate the number of samples, stops the profiler and then analyzes all of the captured stack dumps. Other parameters can be sampled in the profiler callback, too. But it's important not to spend too much time in the callback, since this may skew the statistics.
profile.start(mode, cb) — Start profiler
This function starts the profiler. The mode argument is a string holding options:
- f — Profile with precision down to the function level.
- l — Profile with precision down to the line level.
- i<number> — Sampling interval in milliseconds (default 10ms). Note: The actual sampling precision is OS-dependent.
The cb argument is a callback function which is called with three arguments: (thread, samples, vmstate). The callback is called on a separate coroutine; the thread argument is the state that holds the stack to sample for profiling. Note: do not modify the stack of that state or call functions on it.
samples gives the number of accumulated samples since the last callback (usually 1).
vmstate holds the VM state at the time the profiling timer triggered. This may or may not correspond to the state of the VM when the profiling callback is called. The state is one of: 'N' for native (compiled) code, 'I' for interpreted code, 'C' for C code, 'G' for the garbage collector, or 'J' for the JIT compiler.
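To illustrate the callback arguments, here is a small sketch that only tallies samples per VM state. The workload function and the names table are illustrative; the state letters are the ones listed above. It assumes LuaJIT 2.1 with the jit.profile module, and the number of samples collected depends on the machine and OS timer.

```lua
-- Assumption: LuaJIT 2.1 with the jit.profile module available.
local profile = require("jit.profile")

local counts = {}  -- VM state letter -> accumulated sample count

-- Keep the callback cheap: one table update per invocation.
local function cb(thread, samples, vmstate)
  counts[vmstate] = (counts[vmstate] or 0) + samples
end

-- Hypothetical workload that also creates some garbage.
local function workload()
  local t = {}
  for i = 1, 2e6 do
    t[i % 100 + 1] = tostring(i) .. "x"
  end
  return #t
end

profile.start("fi5", cb)  -- function-level precision, 5ms interval
workload()
profile.stop()

-- Human-readable labels for the state letters.
local names = {
  N = "native (compiled) code", I = "interpreted code", C = "C code",
  G = "garbage collector", J = "JIT compiler",
}
for state, n in pairs(counts) do
  print(n, names[state] or state)
end
```

The thread argument is unused here. Capturing stacks from it is a matter of calling profile.dumpstack inside the callback.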
profile.stop() — Stop profiler
This function stops the profiler.
dump = profile.dumpstack([thread,] fmt, depth) — Dump stack
This function allows taking stack dumps in an efficient manner. It returns a string with a stack dump for the thread (coroutine), formatted according to the fmt argument:
- p — Preserve the full path for module names. Otherwise only the file name is used.
- f — Dump the function name if it can be derived. Otherwise use module:line.
- F — Ditto, but dump module:name.
- l — Dump module:line.
- Z — Zap the following characters for the last dumped frame.
- All other characters are added verbatim to the output string.
The depth argument gives the number of frames to dump, starting at the topmost frame of the thread. A negative number dumps the frames in inverse order.
The first example prints a list of the current module names and line numbers of up to 10 frames in separate lines. The second example prints semicolon-separated function names for all frames (up to 100) in inverse order:
    print(profile.dumpstack(thread, "l\n", 10))
    print(profile.dumpstack(thread, "fZ;", -100))
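The typical profiling run described at the start of this section (capture stack dumps in the callback, aggregate them in a hash table, stop, then analyze) might look roughly like the sketch below. The workload function, the depth of 5 and the report format are arbitrary choices for illustration, not part of the API. It assumes LuaJIT 2.1 with the jit.profile module.

```lua
-- Assumption: LuaJIT 2.1 with the jit.profile module available.
local profile = require("jit.profile")

local counts = {}  -- stack dump string -> accumulated sample count
local total = 0    -- total number of samples seen

local function cb(thread, samples, vmstate)
  -- Up to 5 frames, caller first, function names joined by ";".
  -- "Z" drops the separator after the last dumped frame.
  local key = profile.dumpstack(thread, "fZ;", -5)
  counts[key] = (counts[key] or 0) + samples
  total = total + samples
end

-- Hypothetical workload; stands in for the code being measured.
local function workload()
  local sum = 0
  for i = 1, 5e7 do
    sum = sum + math.sqrt(i)
  end
  return sum
end

profile.start("fi5", cb)  -- function-level precision, 5ms interval
workload()
profile.stop()

-- Analyze after stopping: sort unique stacks by sample count.
local keys = {}
for k in pairs(counts) do keys[#keys + 1] = k end
table.sort(keys, function(a, b) return counts[a] > counts[b] end)
for _, k in ipairs(keys) do
  print(string.format("%5.1f%%  %s", 100 * counts[k] / total, k))
end
```

All analysis happens after profile.stop(), so the callback itself stays short and is less likely to skew the statistics.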
Low-level C API
The profiler can be controlled directly from C code, e.g. for use by IDEs. The declarations are in "luajit.h" (see Lua/C API extensions).
luaJIT_profile_start(L, mode, cb, data) — Start profiler
This function starts the profiler. See above for a description of the mode argument.
The cb argument is a callback function with the following declaration:
    typedef void (*luaJIT_profile_callback)(void *data, lua_State *L,
                                            int samples, int vmstate);
data is available for use by the callback. L is the state that holds the stack to sample for profiling. Note: do not modify this stack or call functions on this stack; use a separate coroutine for this purpose. See above for a description of samples and vmstate.
luaJIT_profile_stop(L) — Stop profiler
This function stops the profiler.
p = luaJIT_profile_dumpstack(L, fmt, depth, len) — Dump stack
This function allows taking stack dumps in an efficient manner. See above for a description of fmt and depth.
This function returns a const char * pointing to a private string buffer of the profiler. The int *len argument returns the length of the output string. The buffer is overwritten on the next call and deallocated when the profiler stops. You either need to consume the content immediately or copy it for later use.