From 31e7cd1c7672bde788f3d60fd3de2bbcd77ccd19 Mon Sep 17 00:00:00 2001
From: Mike Pall
Date: Tue, 10 Sep 2013 00:06:17 +0200
Subject: [PATCH] Low-overhead profiler, part 6: documentation.

---
 doc/changes.html           |   2 +
 doc/contact.html           |   2 +
 doc/ext_c_api.html         |   2 +
 doc/ext_ffi.html           |   2 +
 doc/ext_ffi_api.html       |   2 +
 doc/ext_ffi_semantics.html |   2 +
 doc/ext_ffi_tutorial.html  |   2 +
 doc/ext_jit.html           |   2 +
 doc/ext_profiler.html      | 365 +++++++++++++++++++++++++++++++++++++
 doc/extensions.html        |   9 +-
 doc/faq.html               |   2 +
 doc/install.html           |   2 +
 doc/luajit.html            |   2 +
 doc/running.html           |   3 +
 doc/status.html            |   2 +
 15 files changed, 400 insertions(+), 1 deletion(-)
 create mode 100644 doc/ext_profiler.html

diff --git a/doc/changes.html b/doc/changes.html
index b3deeaf2..f65cac6d 100644
--- a/doc/changes.html
+++ b/doc/changes.html
@@ -44,6 +44,8 @@ div.major { max-width: 600px; padding: 1em; margin: 1em 0 1em 0; }
  jit.* Library
  • Lua/C API
+ • Profiler
  • Status

diff --git a/doc/contact.html b/doc/contact.html
index 4735faf4..48c4efd6 100644
--- a/doc/contact.html
+++ b/doc/contact.html
@@ -41,6 +41,8 @@
  jit.* Library
  • Lua/C API
+ • Profiler
  • Status

diff --git a/doc/ext_c_api.html b/doc/ext_c_api.html
index c6feb8e1..e431e734 100644
--- a/doc/ext_c_api.html
+++ b/doc/ext_c_api.html
@@ -41,6 +41,8 @@
  jit.* Library
  • Lua/C API
+ • Profiler
  • Status

diff --git a/doc/ext_ffi.html b/doc/ext_ffi.html
index a146b055..c9a0e1a7 100644
--- a/doc/ext_ffi.html
+++ b/doc/ext_ffi.html
@@ -41,6 +41,8 @@
  jit.* Library
  • Lua/C API
+ • Profiler
  • Status

diff --git a/doc/ext_ffi_api.html b/doc/ext_ffi_api.html
index 8b2555b5..928589b4 100644
--- a/doc/ext_ffi_api.html
+++ b/doc/ext_ffi_api.html
@@ -46,6 +46,8 @@ td.abiparam { font-weight: bold; width: 6em; }
  jit.* Library
  • Lua/C API
+ • Profiler
  • Status

diff --git a/doc/ext_ffi_semantics.html b/doc/ext_ffi_semantics.html
index 6f84533c..f7f72138 100644
--- a/doc/ext_ffi_semantics.html
+++ b/doc/ext_ffi_semantics.html
@@ -46,6 +46,8 @@ td.convop { font-style: italic; width: 40%; }
  jit.* Library
  • Lua/C API
+ • Profiler
  • Status

diff --git a/doc/ext_ffi_tutorial.html b/doc/ext_ffi_tutorial.html
index 30213b31..6cb52bf2 100644
--- a/doc/ext_ffi_tutorial.html
+++ b/doc/ext_ffi_tutorial.html
@@ -48,6 +48,8 @@ td.idiomlua b { font-weight: normal; color: #2142bf; }
  jit.* Library
  • Lua/C API
+ • Profiler
  • Status

diff --git a/doc/ext_jit.html b/doc/ext_jit.html
index cc00e72b..d9e2520b 100644
--- a/doc/ext_jit.html
+++ b/doc/ext_jit.html
@@ -41,6 +41,8 @@
  jit.* Library
  • Lua/C API
+ • Profiler
  • Status

diff --git a/doc/ext_profiler.html b/doc/ext_profiler.html
new file mode 100644
index 00000000..8a08f3b5
--- /dev/null
+++ b/doc/ext_profiler.html
@@ -0,0 +1,365 @@
    +Profiler
    +

    +LuaJIT has an integrated statistical profiler with very low overhead. It allows sampling the currently executing stack and other parameters at regular intervals.

    +

    +The integrated profiler can be accessed from three levels: +

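    • The bundled high-level profiler, invoked by the -jp command line option.
    • A low-level Lua API to control the profiler.
    • A low-level C API to control the profiler.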

    High-Level Profiler

    +

    +The bundled high-level profiler offers basic profiling functionality. It generates simple textual summaries or source code annotations. It can be accessed with the -jp command line option or from Lua code by loading the underlying jit.p module.

    +

    +To cut to the chase, run this to get a CPU usage profile by function name:

    +
    +luajit -jp myapp.lua
    +
    +

    +It's not a stated goal of the bundled profiler to add every possible option or to cater for special profiling needs. The low-level profiler APIs are documented below. They may be used by third-party authors to implement advanced functionality, e.g. IDE integration or graphical profilers.

    +

    +Note: Sampling works for both interpreted and JIT-compiled code. The results for JIT-compiled code may sometimes be surprising: LuaJIT heavily optimizes and inlines Lua code, so there's no simple one-to-one correspondence between source code lines and the sampled machine code.

    + +

    -jp=[options[,output]]

    +

    +The -jp command line option starts the high-level profiler. When the application run by the command line terminates, the profiler stops and writes the results to stdout or to the specified output file.

    +

    +The options argument specifies how the profiling is to be performed:

    +
    • f - Stack dump: function name, otherwise module:line. This is the default mode.
    • F - Stack dump: like f, but dump module:name.
    • l - Stack dump: module:line.
    • <number> - Stack dump depth (callee ← caller). Default: 1.
    • -<number> - Inverse stack dump depth (caller → callee).
    • s - Split stack dump after the first stack level. Implies depth ≥ 2 or depth ≤ -2.
    • p - Show full path for module names.
    • v - Show VM states.
    • z - Show zones.
    • r - Show raw sample counts. Default: show percentages.
    • a - Annotate excerpts from source code files.
    • A - Annotate complete source code files.
    • G - Produce raw output suitable for graphical tools.
    • m<number> - Minimum sample percentage to be shown. Default: 3%.
    • i<number> - Sampling interval in milliseconds. Default: 10ms.
      Note: The actual sampling precision is OS-dependent.
    +

    +The default output for -jp is a list of the most CPU-consuming spots in the application. Increasing the stack dump depth with (say) -jp=2 may help to point out the main callers or callees of hotspots. However, sample aggregation is still flat per unique stack dump.

    +

    +To get a two-level view (split view) of callers/callees, use -jp=s or -jp=-s. The percentages shown for the second level are relative to the first level.
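As a worked illustration of how the split-view percentages relate, here is a minimal C sketch. It assumes the samples have already been aggregated into hypothetical (first level, second level, count) records; the `split_sample` type and the helper functions are illustrative names, not part of LuaJIT. A first-level percentage is taken relative to all samples. Each second-level percentage is taken relative to the total of its own first-level entry.

```c
#include <string.h>

/* Hypothetical aggregated record: a hotspot (first level), one of its
 * callers or callees (second level), and the samples seen for that pair. */
typedef struct {
  const char *first;
  const char *second;
  int count;
} split_sample;

/* Sum of all samples whose first level equals `first`. */
int first_level_total(const split_sample *s, int n, const char *first)
{
  int total = 0;
  for (int i = 0; i < n; i++)
    if (strcmp(s[i].first, first) == 0)
      total += s[i].count;
  return total;
}

/* Percentage of all samples attributed to the first-level entry `first`. */
double first_level_pct(const split_sample *s, int n, const char *first)
{
  int all = 0;
  for (int i = 0; i < n; i++)
    all += s[i].count;
  return all ? 100.0 * first_level_total(s, n, first) / all : 0.0;
}

/* Percentage of record i relative to its own first-level total. */
double second_level_pct(const split_sample *s, int n, int i)
{
  int total = first_level_total(s, n, s[i].first);
  return total ? 100.0 * s[i].count / total : 0.0;
}
```

For example, suppose a hotspot accounts for 60 of 100 samples and 45 of those come through one caller. The first level would then show 60%, and that caller would show 75% (45 of 60), not 45%.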

    +

    +To see how much time is spent in each line relative to a function, use +-jp=fl. +

    +

    +To see how much time is spent in different VM states or zones, use -jp=v or -jp=z.

    +

    +Combinations of v/z with f/F/l produce two-level views, e.g. -jp=vf or -jp=fv. These show the time spent in a VM state or zone vs. hotspots. They can be used to answer questions like "Which time-consuming functions are only interpreted?" or "What's the garbage collector overhead for a specific function?".

    +

    +Multiple options can be combined, but not all combinations make sense (see above). E.g. -jp=3si4m1 samples three stack levels deep at 4ms intervals and shows a split view of the CPU-consuming functions and their callers with a 1% threshold.
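To make the option grammar concrete, here is a minimal C sketch of parsing a -jp options string. It is not LuaJIT's implementation: the bundled jit.p module parses its options in Lua. The `jp_options` struct and `jp_parse()` are hypothetical. The sketch makes several assumptions, all flagged in the comments:

- A `-` marks inverse order.
- The digits after `i` and `m` are the interval and the threshold.
- Any other number is the stack depth.
- `f` is the default only when no `v`/`z` view is requested.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical container for -jp options; field names are illustrative. */
typedef struct {
  bool func, func_mod, line; /* f, F, l: stack dump formats */
  bool split;                /* s: split view */
  bool full_path;            /* p: full path for module names */
  bool vmstates, zones;      /* v, z */
  bool raw;                  /* r: raw sample counts */
  char annotate;             /* 0, 'a' (excerpts) or 'A' (complete files) */
  bool graph;                /* G: raw output for graphical tools */
  int depth;                 /* stack dump depth; negative = caller -> callee */
  int min_pct;               /* m<number>, default 3 */
  int interval_ms;           /* i<number>, default 10 */
} jp_options;

/* Reads a decimal number at *p and advances *p past it. */
static int read_number(const char **p)
{
  int v = 0;
  while (isdigit((unsigned char)**p)) {
    v = v * 10 + (**p - '0');
    (*p)++;
  }
  return v;
}

jp_options jp_parse(const char *s)
{
  jp_options o = {0};
  bool inverse = false;
  o.depth = 1;
  o.min_pct = 3;
  o.interval_ms = 10;
  while (*s) {
    char c = *s;
    if (isdigit((unsigned char)c)) {  /* bare number: stack dump depth */
      o.depth = read_number(&s);
      continue;
    }
    s++;
    switch (c) {
    case '-': inverse = true; break;  /* assumption: '-' flips the order */
    case 'i': if (isdigit((unsigned char)*s)) o.interval_ms = read_number(&s); break;
    case 'm': if (isdigit((unsigned char)*s)) o.min_pct = read_number(&s); break;
    case 'f': o.func = true; break;
    case 'F': o.func_mod = true; break;
    case 'l': o.line = true; break;
    case 's': o.split = true; break;
    case 'p': o.full_path = true; break;
    case 'v': o.vmstates = true; break;
    case 'z': o.zones = true; break;
    case 'r': o.raw = true; break;
    case 'a': case 'A': o.annotate = c; break;
    case 'G': o.graph = true; break;
    default: break;                   /* ignore unknown letters */
    }
  }
  /* Assumption: f is the default only when no other view is requested. */
  if (!o.func && !o.func_mod && !o.line && !o.vmstates && !o.zones)
    o.func = true;
  if (o.split && o.depth < 2)
    o.depth = 2;                      /* s implies |depth| >= 2 */
  if (inverse)
    o.depth = -o.depth;
  return o;
}
```

Under these assumptions, `jp_parse("3si4m1")` yields depth 3, a split view, a 4ms interval and a 1% threshold, which matches the description above. `jp_parse("-s")` yields an inverse split view with depth -2.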

    +

    +Source code annotations produced by -jp=a or -jp=A are always flat and at the line level. The source code files need to be readable by the profiler script.

    +

    +The high-level profiler can also be started and stopped from Lua code with: +

    +
    +require("jit.p").start(options, output)
    +...
    +require("jit.p").stop()
    +
    + +

    jit.zone — Zones

    +

    +Zones can be used to provide information about different parts of an application to the high-level profiler. E.g. a game could make use of an "AI" zone, a "PHYS" zone, etc. Zones are hierarchical and organized as a stack.

    +

    +The jit.zone module needs to be loaded explicitly: +

    +
    +local zone = require("jit.zone")
    +
    +
    • zone("name") pushes a named zone to the zone stack.
    • zone() pops the current zone from the zone stack and returns its name.
    • zone:get() returns the current zone name or nil.
    • zone:flush() flushes the zone stack.
    +

    +To show the time spent in each zone, use -jp=z. To show the time spent relative to hotspots, use e.g. -jp=zf or -jp=fz.
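The zone operations listed above behave like a plain stack of names. The following C sketch mirrors them for illustration only. `zone_push`, `zone_pop`, `zone_get` and `zone_flush` are hypothetical names and not part of LuaJIT's C API, since jit.zone is a Lua module. The fixed capacity of 64 entries is an arbitrary assumption.

```c
#include <stddef.h>

#define ZONE_MAX 64  /* arbitrary capacity for this sketch */

typedef struct {
  const char *names[ZONE_MAX];
  int top;  /* number of zones currently on the stack */
} zone_stack;

/* Like zone("name"): push a named zone; returns 0 on overflow. */
int zone_push(zone_stack *z, const char *name)
{
  if (z->top >= ZONE_MAX) return 0;
  z->names[z->top++] = name;
  return 1;
}

/* Like zone(): pop the current zone and return its name (NULL if empty). */
const char *zone_pop(zone_stack *z)
{
  return z->top > 0 ? z->names[--z->top] : NULL;
}

/* Like zone:get(): current zone name, or NULL (nil) if the stack is empty. */
const char *zone_get(const zone_stack *z)
{
  return z->top > 0 ? z->names[z->top - 1] : NULL;
}

/* Like zone:flush(): discard all zones. */
void zone_flush(zone_stack *z)
{
  z->top = 0;
}
```

Because zones nest, a "PHYS" zone pushed inside "AI" is reported while it is active, and "AI" becomes current again after "PHYS" is popped.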

    + +

    Low-level Lua API

    +

    +The jit.profile module gives access to the low-level API of the profiler from Lua code. This module needs to be loaded explicitly:

    +local profile = require("jit.profile")
    +
    +

    +This module can be used to implement your own higher-level profiler. A typical profiling run starts the profiler, captures stack dumps in the profiler callback, adds them to a hash table to aggregate the number of samples, stops the profiler and then analyzes all of the captured stack dumps. Other parameters can be sampled in the profiler callback, too. But it's important not to spend too much time in the callback, since this may skew the statistics.
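The aggregation step described above can be sketched in C as a small string-keyed hash table. This is a minimal sketch under stated assumptions:

- `agg_add` and `agg_count` are hypothetical names.
- The table has a fixed size of 1024 slots and uses linear probing.
- FNV-1a is the hash function.

In a real profiler, the key would be the stack dump string taken in the callback. Keys are copied on insertion because a dump buffer may be reused between callbacks.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define AGG_SLOTS 1024  /* fixed table size for this sketch */

typedef struct {
  char *key;    /* owned, NUL-terminated copy of the stack dump */
  size_t len;
  long count;   /* accumulated samples */
} agg_slot;

static agg_slot agg_table[AGG_SLOTS];

/* FNV-1a hash over len bytes. */
static uint32_t fnv1a(const char *s, size_t len)
{
  uint32_t h = 2166136261u;
  for (size_t i = 0; i < len; i++) {
    h ^= (unsigned char)s[i];
    h *= 16777619u;
  }
  return h;
}

/* Finds the slot holding key, or the empty slot where it would go.
 * Returns NULL if the table is full. */
static agg_slot *agg_find(const char *key, size_t len)
{
  uint32_t i = fnv1a(key, len) % AGG_SLOTS;
  for (size_t n = 0; n < AGG_SLOTS; n++, i = (i + 1) % AGG_SLOTS) {
    agg_slot *s = &agg_table[i];
    if (!s->key || (s->len == len && memcmp(s->key, key, len) == 0))
      return s;
  }
  return NULL;
}

/* Adds `samples` to the count for this stack dump; returns 0 on failure. */
int agg_add(const char *key, size_t len, int samples)
{
  agg_slot *s = agg_find(key, len);
  if (!s) return 0;
  if (!s->key) {
    s->key = malloc(len + 1);
    if (!s->key) return 0;
    memcpy(s->key, key, len);
    s->key[len] = '\0';
    s->len = len;
  }
  s->count += samples;
  return 1;
}

/* Returns the accumulated samples for a stack dump (0 if never seen). */
long agg_count(const char *key, size_t len)
{
  agg_slot *s = agg_find(key, len);
  return (s && s->key) ? s->count : 0;
}
```

Adding the callback's sample count rather than 1 keeps the totals correct when several samples accumulated before the callback ran. The analysis phase would then walk `agg_table` and sort the entries by count.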

    + +

    profile.start(mode, cb) - Start profiler

    +

    +This function starts the profiler. The mode argument is a string holding options:

    +
    • f - Profile with precision down to the function level.
    • l - Profile with precision down to the line level.
    • i<number> - Sampling interval in milliseconds (default 10ms).
      Note: The actual sampling precision is OS-dependent.
    +

    +The cb argument is a callback function which is called with three arguments: (thread, samples, vmstate). The callback is called on a separate coroutine; the thread argument is the state that holds the stack to sample for profiling. Note: do not modify the stack of that state or call functions on it.

    +

    +samples gives the number of accumulated samples since the last callback (usually 1).

    +

    +vmstate holds the VM state at the time the profiling timer triggered. This may or may not correspond to the state of the VM when the profiling callback is called. The state is one of 'N' for native (compiled) code, 'I' for interpreted code, 'C' for C code, 'G' for the garbage collector, or 'J' for the JIT compiler.
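A per-state summary, roughly what a -jp=v view reports, only needs a counter for each of the state codes listed above. The following C sketch makes these assumptions:

- The state arrives as one of those character codes, passed as an int as in the C callback declaration further down.
- `vm_tally`, `vm_tally_add` and `vm_tally_pct` are hypothetical helpers.
- Unknown codes still count towards the total.

```c
/* Per-state sample counters, indexed in the order N, I, C, G, J. */
typedef struct {
  long count[5];
  long total;
} vm_tally;

static const char vm_codes[] = "NICGJ";

/* Returns the counter index for a state code, or -1 if unknown. */
static int vm_index(int vmstate)
{
  for (int i = 0; i < 5; i++)
    if (vm_codes[i] == vmstate) return i;
  return -1;
}

/* Records `samples` samples taken in the given VM state. */
void vm_tally_add(vm_tally *t, int vmstate, int samples)
{
  int i = vm_index(vmstate);
  if (i >= 0) t->count[i] += samples;
  t->total += samples;
}

/* Share of all recorded samples spent in the given VM state, in percent. */
double vm_tally_pct(const vm_tally *t, int vmstate)
{
  int i = vm_index(vmstate);
  return (i >= 0 && t->total) ? 100.0 * t->count[i] / t->total : 0.0;
}
```

Since vmstate reflects the state when the timer fired, a tally like this attributes each sample to that moment, not to whatever the VM is doing when the callback runs.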

    + +

    profile.stop() - Stop profiler

    +

    +This function stops the profiler. +

    + +

    dump = profile.dumpstack([thread,] fmt, depth) - Dump stack

    +

    +This function allows taking stack dumps in an efficient manner. It returns a string with a stack dump for the thread (coroutine), formatted according to the fmt argument:

    +
    • p - Preserve the full path for module names. Otherwise, only the file name is used.
    • f - Dump the function name if it can be derived. Otherwise, use module:line.
    • F - Like f, but dump module:name.
    • l - Dump module:line.
    • Z - Zap the following characters for the last dumped frame.
    • All other characters are added verbatim to the output string.
    +

    +The depth argument gives the number of frames to dump, starting at the topmost frame of the thread. A negative number dumps the frames in inverse order.

    +

    +The first example prints a list of the current module names and line numbers of up to 10 frames on separate lines. The second example prints semicolon-separated function names for all frames (up to 100) in inverse order:

    +
    +print(profile.dumpstack(thread, "l\n", 10))
    +print(profile.dumpstack(thread, "fZ;", -100))
    +
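To make the formatting rules concrete, here is a C model of how the fmt characters and the sign of depth combine. It is not LuaJIT's implementation. It formats a hypothetical in-memory array of frames; `frame`, `dump_frames` and the buffer handling are assumptions for illustration. It applies p, f, F, l and Z as listed above and copies every other character verbatim. As a simplification, it treats p as applying to the whole dump.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical frame record; frames[0] is the topmost (callee) frame. */
typedef struct {
  const char *module;  /* e.g. "app/main.lua" */
  int line;
  const char *name;    /* function name, or NULL if it cannot be derived */
} frame;

/* Appends s to buf (capacity cap), truncating if necessary. */
static void append(char *buf, size_t cap, size_t *used, const char *s)
{
  size_t n = strlen(s);
  if (*used + n >= cap) n = cap - 1 - *used;
  memcpy(buf + *used, s, n);
  *used += n;
  buf[*used] = '\0';
}

/* Strips directories from the module path unless full_path is set. */
static const char *mod_name(const char *module, int full_path)
{
  const char *slash = strrchr(module, '/');
  return (full_path || !slash) ? module : slash + 1;
}

/* Formats up to |depth| frames; a negative depth dumps caller -> callee. */
void dump_frames(const frame *frames, int nframes, const char *fmt,
                 int depth, char *buf, size_t cap)
{
  size_t used = 0;
  int count = depth < 0 ? -depth : depth;
  int full_path = strchr(fmt, 'p') != NULL;
  char tmp[256];
  if (count > nframes) count = nframes;
  buf[0] = '\0';
  for (int k = 0; k < count; k++) {
    const frame *f = &frames[depth < 0 ? count - 1 - k : k];
    const char *mod = mod_name(f->module, full_path);
    int last = (k == count - 1), zap = 0;
    for (const char *p = fmt; *p && !zap; p++) {
      switch (*p) {
      case 'p':
        break;  /* flag only, handled above */
      case 'f':
      case 'F':
        if (!f->name) snprintf(tmp, sizeof tmp, "%s:%d", mod, f->line);
        else if (*p == 'f') snprintf(tmp, sizeof tmp, "%s", f->name);
        else snprintf(tmp, sizeof tmp, "%s:%s", mod, f->name);
        append(buf, cap, &used, tmp);
        break;
      case 'l':
        snprintf(tmp, sizeof tmp, "%s:%d", mod, f->line);
        append(buf, cap, &used, tmp);
        break;
      case 'Z':
        if (last) zap = 1;  /* drop the rest of fmt for the last frame */
        break;
      default:
        tmp[0] = *p;
        tmp[1] = '\0';
        append(buf, cap, &used, tmp);
        break;
      }
    }
  }
}
```

Take two frames, main.lua:10 (topmost) and util.lua:20. With fmt "lZ;" and depth 2, this model yields main.lua:10;util.lua:20, and the Z drops the trailing semicolon after the last frame. A depth of -2 (or -100) reverses the order.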
    + +

    Low-level C API

    +

    +The profiler can be controlled directly from C code, e.g. for use by IDEs. The declarations are in "luajit.h" (see Lua/C API extensions).

    + +

    luaJIT_profile_start(L, mode, cb, data) - Start profiler

    +

    +This function starts the profiler. See above for a description of the mode argument.

    +

    +The cb argument is a callback function with the following declaration:

    +
    +typedef void (*luaJIT_profile_callback)(void *data, lua_State *L,
    +                                        int samples, int vmstate);
    +
    +

    +data is available for use by the callback. L is the state that holds the stack to sample for profiling. Note: do not modify this stack or call functions on this stack; use a separate coroutine for this purpose. See above for a description of samples and vmstate.
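A minimal sketch of a callback matching the declaration above, under these assumptions:

- A hypothetical `prof_stats` struct is passed in through `data`.
- The forward declaration of `lua_State` and the typedef stand in for the ones from "luajit.h", so the sketch compiles on its own.

The callback only updates counters and leaves L untouched, as the note above requires. A real callback would typically also take a stack dump at this point.

```c
#include <stddef.h>

/* Stand-ins so the sketch compiles without the LuaJIT headers; real code
 * would include "luajit.h" instead of declaring these itself. */
typedef struct lua_State lua_State;
typedef void (*luaJIT_profile_callback)(void *data, lua_State *L,
                                        int samples, int vmstate);

/* Hypothetical per-run statistics, handed to the profiler as `data`. */
typedef struct {
  long samples;     /* total samples seen */
  long callbacks;   /* number of callback invocations */
  long gc_samples;  /* samples taken while in the garbage collector */
} prof_stats;

static void prof_cb(void *data, lua_State *L, int samples, int vmstate)
{
  prof_stats *st = (prof_stats *)data;
  (void)L;                 /* never modify this stack or call functions on it */
  st->samples += samples;  /* may be > 1 if samples accumulated */
  st->callbacks++;
  if (vmstate == 'G')
    st->gc_samples += samples;
}
```

Such a callback would be registered with something like `luaJIT_profile_start(L, "f", prof_cb, &stats)`, following the argument order described above.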

    + +

    luaJIT_profile_stop(L) - Stop profiler

    +

    +This function stops the profiler. +

    + +

    p = luaJIT_profile_dumpstack(L, fmt, depth, len) - Dump stack

    +

    +This function allows taking stack dumps in an efficient manner. +See above for a description of fmt +and depth. +

    +

    +This function returns a const char * pointing to a private string buffer of the profiler. The int *len argument returns the length of the output string. The buffer is overwritten on the next call and deallocated when the profiler stops. You either need to consume the content immediately or copy it for later use.
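Since the buffer is overwritten on the next call, a caller that wants to keep a dump can copy it right away. Here is a minimal sketch. `dump_copy` is a hypothetical helper that takes the returned pointer together with the reported length, passed in here as a plain value. It copies exactly that many bytes, so it does not rely on the buffer being NUL-terminated.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a NUL-terminated heap copy of a stack dump (p, len), or NULL on
 * allocation failure. The caller owns the result and must free() it. */
char *dump_copy(const char *p, size_t len)
{
  char *s = malloc(len + 1);
  if (s) {
    memcpy(s, p, len);
    s[len] = '\0';
  }
  return s;
}
```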

    +
    +
diff --git a/doc/extensions.html b/doc/extensions.html
index 8300753e..33bcfb28 100644
--- a/doc/extensions.html
+++ b/doc/extensions.html
@@ -58,6 +58,8 @@ td.excinterop {
  jit.* Library
  • Lua/C API
+ • Profiler
  • Status @@ -114,7 +116,7 @@ This module is a LuaJIT built-in — you don't need to download or install Lua BitOp. The Lua BitOp site has full documentation for all » Lua BitOp API functions. The FFI adds support for -64 bit bitwise operations, +64 bit bitwise operations, using the same API functions.

    @@ -149,6 +151,11 @@ LuaJIT adds some extra functions to the Lua/C API.

    +

    Profiler

    +

    +LuaJIT has an integrated profiler.

    +

    Enhanced Standard Library Functions

    xpcall(f, err [,args...]) passes arguments


diff --git a/doc/faq.html b/doc/faq.html
index c61b8dcf..02a2fba6 100644
--- a/doc/faq.html
+++ b/doc/faq.html
@@ -44,6 +44,8 @@ dd { margin-left: 1.5em; }
  jit.* Library
  • Lua/C API
+ • Profiler
  • Status

diff --git a/doc/install.html b/doc/install.html
index b7bf75ce..024a4057 100644
--- a/doc/install.html
+++ b/doc/install.html
@@ -69,6 +69,8 @@ td.compatno {
  jit.* Library
  • Lua/C API
+ • Profiler
  • Status

diff --git a/doc/luajit.html b/doc/luajit.html
index e8581d3a..4431553c 100644
--- a/doc/luajit.html
+++ b/doc/luajit.html
@@ -123,6 +123,8 @@ table.feature small {
  jit.* Library
  • Lua/C API
+ • Profiler
  • Status

diff --git a/doc/running.html b/doc/running.html
index 3149b381..ab238859 100644
--- a/doc/running.html
+++ b/doc/running.html
@@ -63,6 +63,8 @@ td.param_default {
  jit.* Library
  • Lua/C API
+ • Profiler
  • Status
@@ -178,6 +180,7 @@ Here are the available LuaJIT control commands:
  • -jflush — Flushes the whole cache of compiled code.
  • -jv — Shows verbose information about the progress of the JIT compiler.
  • -jdump — Dumps the code and structures used in various compiler stages.
  • +
  • -jp — Start the integrated profiler.
  • The -jv and -jdump commands are extension modules

diff --git a/doc/status.html b/doc/status.html
index 3d148b0a..2dcb3dc1 100644
--- a/doc/status.html
+++ b/doc/status.html
@@ -44,6 +44,8 @@ ul li { padding-bottom: 0.3em; }
  jit.* Library
  • Lua/C API
+ • Profiler
  • Status