This loop is now roughly 1000x faster than the Lua interpreter:
local function f(a,b,...) end; for i=1,2e8 do f(1,2,i) end
Yet another silly microbenchmark -- I know.
Need to sync GC objects to stack only during atomic GC phase.
Need to setup a proper frame structure only for calling finalizers.
Force an exit to the interpreter and let it handle the uncommon cases.
Finally solves the "NYI: gcstep sync with frames" issue.