2006 Commits

Author SHA1 Message Date
Peter Cawley
7213658b33 x64/LJ_GC64: Enable JIT compilation.
Under LJ_GC64, RID_DISPATCH is removed from the pool of available general purpose
registers, and instead retains its role as a pointer to the dispatch table
throughout JIT code. This guarantees that members of the global_State and
the jit_State can always be encoded in modrm. If the memory allocator is
kind, it also allows for various KGC and KPTR values to be encoded as
32-bit offsets from RID_DISPATCH. Likewise, when SSE instructions want to
use a KNUM as a memory operand, it often transpires that the address of
the KNUM's 64-bit payload can be expressed as a 32-bit offset from
RID_DISPATCH.
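
As a rough illustration of the "32-bit offset from RID_DISPATCH" condition, the sketch below tests whether the distance between a constant and a base address fits in a signed 32-bit displacement. The helper name `fits_disp32` and all addresses are made up for the example; this is not LuaJIT's actual code.

```python
# Sketch only: fits_disp32 and the addresses below are hypothetical.
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1

def fits_disp32(target: int, base: int) -> bool:
    """True if target - base is representable as a signed 32-bit displacement."""
    return INT32_MIN <= target - base <= INT32_MAX

dispatch = 0x0000_7F12_3456_0000      # made-up address of the dispatch table
near_const = dispatch + 0x1000_0000   # 256 MB away: could be encoded as [base + disp32]
far_const = dispatch + (1 << 33)      # 8 GB away: needs some other strategy

print(fits_disp32(near_const, dispatch))  # near constant
print(fits_disp32(far_const, dispatch))   # far constant
```

Whether a given constant passes this test depends entirely on where the allocator placed it, which is why the message speaks of the allocator being "kind".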

In some cases the recording logic has been tweaked to encode constants
as relative to RID_DISPATCH instead of as absolute addresses. This is done
via calls to lj_ir_ggfload.

LJ_GC64 also introduces a new pseudo-register: RID_RIP. If the memory
allocator isn't kind enough to put things within a 32-bit range of the
dispatch table, it is sometimes kind enough to instead put things within a
32-bit range of the mcode pointer. Furthermore, for constants which we
want (or need) to be loaded via memory operands, the constant's payload can be
copied to the low part of an mcode region, at which point it is guaranteed
to be representable as a RIP-relative operand. Fused loads can result in
an mrm referencing RID_RIP. In such cases, the fusing is only valid for
the next emitted instruction - though as a special case, one asm_guardcc call is
permitted between the fusing and the instruction into which the fusion
result is inserted.

TValue detagging is notable under LJ_GC64. The basic code pattern is:
    mov r64, [addr]
    ror r64, 47
    cmp r16, itype
    jnz ->exit
    shr r64, 17
If BMI2 is available, mov/ror are fused to be a single rorx. If BMI2 isn't
available, and a type test isn't required, ror47 becomes shl17 (and the
cmp/jnz are dropped). The type test is interesting as it only considers 16
bits of tag, despite the TValues in question nominally consisting of 47
bits of pointer and 17 bits of tag. The 16 considered bits are sufficient
to verify that the TValue is a NaN (11 bits), is a QNaN (1 bit), and has
the correct itype (4 bits). The one unconsidered bit is the sign bit of
the NaN. LuaJIT operates under the assumption that all NaNs in the system
are either canonical NaNs (as generated by the FPU) or are NaN-packed
TValues. In both cases, the sign bit of the NaN is set, and therefore does
not need to be verified during detagging. The cmp instruction encodes the
itype as an imm8, thus avoiding the LCP (length-changing prefix) stall that
an imm16 would cause. False LCP stalls are still an issue, and could be
trivially worked around by sometimes inserting an extra nop instruction, but this
could break loop realignment (as the realigned code might be one byte
larger or one byte smaller, and loop realignment operates under the
assumption that a sequence of emitted instructions always occupies the
same number of bytes, regardless of where it is emitted [1]).
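
The ror/cmp/shr sequence described above can be modelled with plain 64-bit integer arithmetic. The following is a minimal sketch, assuming a TValue is laid out as 17 bits of tag above 47 bits of pointer and that the itype is a small negative integer whose low 17 bits form the tag. The function names and the itype value -5 are chosen for illustration only.

```python
# Sketch only: models the detagging arithmetic, not the emitted machine code.
MASK64 = (1 << 64) - 1

def ror64(x: int, n: int) -> int:
    """Rotate a 64-bit value right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64

def pack(itype: int, ptr: int) -> int:
    """Build a NaN-packed value: low 17 bits of itype on top, 47-bit pointer below."""
    assert 0 <= ptr < (1 << 47)
    return ((itype << 47) & MASK64) | ptr

def detag(tv: int, itype: int):
    """ror 47; compare the low 16 bits with the itype; shr 17. None models the exit."""
    r = ror64(tv, 47)                        # tag now sits in the low 17 bits
    if (r & 0xFFFF) != (itype & 0xFFFF):     # 16-bit compare, sign bit ignored
        return None
    return r >> 17                           # drop the tag, leaving the pointer

def detag_unchecked(tv: int) -> int:
    """Without a type test: shl 17 then shr 17 clears the tag."""
    return ((tv << 17) & MASK64) >> 17

ITYPE = -5                                   # illustrative tag value
ptr = 0x0000_1234_5678_9AB0
tv = pack(ITYPE, ptr)
print(hex(detag(tv, ITYPE)), detag(tv, -4), hex(detag_unchecked(tv)))
```

In this model an ordinary double such as 1.0 (bit pattern 0x3FF0000000000000) fails the 16-bit compare, because its exponent bits are not all ones.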

[1] This assumption also results in rip-relative operands being even more
slippery. A priori, the realigned code might be able to reach things it
previously couldn't, or conversely not reach things it previously could.
To prevent this from happening, checki32/mcpofs is paired with
checki32/mctopofs: if a given address is reachable with a 32-bit
displacement from both of these points, then it'll also be reachable with
a 32-bit displacement from a realigned mcp.
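
The paired check in [1] can be read as an interval argument: if a displacement fits in 32 bits from both ends of a range, it fits from every point in between. A minimal sketch follows, assuming the realigned mcp lies between the two checked points; the function names and addresses are hypothetical.

```python
# Sketch only: hypothetical names, made-up addresses.
def checki32(x: int) -> bool:
    """True if x fits in a signed 32-bit integer."""
    return -(1 << 31) <= x <= (1 << 31) - 1

def reachable_from_both(addr: int, mcp: int, mctop: int) -> bool:
    """Accept addr only if it is within disp32 range of both mcp and mctop."""
    return checki32(addr - mcp) and checki32(addr - mctop)

mcp = 0x0000_7F00_0000_1000
mctop = mcp + 0x10000
ok_addr = mcp + 0x7FFF_0000     # within range of both ends
edge_addr = mcp - (1 << 31)     # reachable from mcp, but not from mctop

# Every candidate realigned position between the two ends can still reach ok_addr.
still_ok = all(checki32(ok_addr - p) for p in range(mcp, mctop + 1, 0x1000))
print(reachable_from_both(ok_addr, mcp, mctop), still_ok)
print(reachable_from_both(edge_addr, mcp, mctop))
```

The second case shows the purpose of the pairing: an address that is only just reachable from one end is rejected up front, so a later move of mcp cannot invalidate an already chosen encoding.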
2016-05-18 03:56:22 +01:00
Peter Cawley
79021951e5 LJ_FR2: Improve trace recording and snapshots.
The interesting changes here revolve around slots marked as TREF_FRAME /
TREF_CONT. Under !LJ_FR2, said slots contain two 32-bit values, and the
TRef for the slot primarily relates to the low 32 bits. In a snapshot, the
main SnapEntry relates to the low 32 bits, and the framelink from the
snapshot is used to restore the high 32 bits. Under LJ_FR2, TREF_FRAME /
TREF_CONT slots contain a single 64-bit value. The TRef relates to all 64
bits, the SnapEntry is used to restore all 64 bits, and no framelinks are
required to restore the slot. Restoration is done via IR_KNUM constants,
as the 64-bit values in question can be happily interpreted as denormal
numbers. These constants are created lazily: the slots in question get set
to just TREF_FRAME / TREF_CONT initially, and then if required for a
snapshot, the ref part of the TRef is changed from zero to the index of a
KNUM. Slot 1 is always zero, as although it is technically a frame link,
it never needs to be changed or saved or restored.
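
To see why such 64-bit values can be carried as KNUM constants, note that any nonzero bit pattern whose exponent field (bits 52 to 62) is zero is a denormal double, and reinterpreting the bits loses nothing. The sketch below shows this with a made-up frame word; the helper names are hypothetical.

```python
# Sketch only: frame_word is a made-up value, not a real frame link.
import struct
import sys

def bits_to_double(bits: int) -> float:
    """Reinterpret a 64-bit pattern as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def double_to_bits(x: float) -> int:
    """Reinterpret an IEEE 754 double as its 64-bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def is_denormal_pattern(bits: int) -> bool:
    """Exponent field zero and mantissa nonzero."""
    return (bits >> 52) & 0x7FF == 0 and bits & ((1 << 52) - 1) != 0

frame_word = 0x0000_0000_0000_002B
as_num = bits_to_double(frame_word)

print(is_denormal_pattern(frame_word))            # exponent field is zero
print(0.0 < as_num < sys.float_info.min)          # below the smallest normal double
print(double_to_bits(as_num) == frame_word)       # round trip keeps all 64 bits
```

The same holds for any value below 2^52, which covers 47-bit pointers as well as small frame words.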

Though the framelink part of a snapshot isn't required for slot
restoration under LJ_FR2, it is still used for restoring PC. As such,
every snapshot has exactly two framelink entries, which are used to store
a 64-bit value.
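
Storing one 64-bit value in two 32-bit entries amounts to a split and a rejoin. A minimal sketch, assuming the low half goes in the first entry (the actual ordering is not stated here) and using a made-up PC value:

```python
# Sketch only: entry order (low half first) is an assumption.
def split64(value: int) -> tuple[int, int]:
    """Split a 64-bit value into (low 32 bits, high 32 bits)."""
    return value & 0xFFFF_FFFF, (value >> 32) & 0xFFFF_FFFF

def join64(lo: int, hi: int) -> int:
    """Rebuild the 64-bit value from its two 32-bit halves."""
    return (hi << 32) | lo

pc = 0x0000_7F12_3456_789C      # made-up 64-bit PC value
lo, hi = split64(pc)
print(hex(lo), hex(hi), join64(lo, hi) == pc)
```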

Manipulations of J->maxslot are more interesting under LJ_FR2. For
example, the BC_MOV of a method call can introduce a three-slot gap under
LJ_FR2, whereas it could only introduce a one-slot gap under !LJ_FR2.
Other instructions can now introduce a one-slot gap where previously they
wouldn't ever introduce a gap.
2016-05-18 03:56:21 +01:00
Peter Cawley
a1bbfd7f18 LJ_GC64: Update IR type sizes. 2016-05-18 03:56:19 +01:00
Peter Cawley
5a6fbbe0e7 Support LJ_GC64 in the IR 2016-05-18 03:56:19 +01:00
Peter Cawley
6eaffef68d Strip out old infrastructure for 64-bit constants. 2016-05-18 03:56:16 +01:00
Peter Cawley
0d61fc1fb8 Embed 64 bit constants directly in the IR, using two slots. 2016-05-18 03:56:15 +01:00
Peter Cawley
ce3fcda9ea rec_check_ir: Walk the IR in increasing order. 2016-05-18 03:56:14 +01:00
Peter Cawley
fdf6eb4678 sink: Sweep constants in increasing order. 2016-05-18 03:56:13 +01:00
Peter Cawley
9d51806696 Have trace stitching embed the GCtrace* directly in an IR_KGC. 2016-05-18 03:56:12 +01:00
Peter Cawley
5687444c59 Allocate the final GCtrace* prior to assembly. 2016-05-18 03:56:11 +01:00
Peter Cawley
5855f6540a Introduce ra_addrename. 2016-05-18 03:43:10 +01:00
Peter Cawley
2fd5a19d64 Use lj_ir_ggfload for SIMD constants. 2016-05-18 03:43:09 +01:00
Peter Cawley
5201498a06 Introduce lj_ir_ggfload (IR_FLOAD with NIL op1 => load from GG). 2016-05-18 03:43:08 +01:00
Peter Cawley
26cd51981c Avoid generating new IR constants during assembly. 2016-05-18 03:43:07 +01:00
Peter Cawley
95545e11e2 Use IRT_PGC instead of IRT_P32 in various places. 2016-05-09 21:29:46 +01:00
Mike Pall
35b09e692e Windows/x86: Add full exception interoperability.
Contributed by Peter Cawley.
2016-05-07 12:32:15 +02:00
Mike Pall
6a9973203c Merge branch 'master' into v2.1 2016-05-06 12:09:23 +02:00
Mike Pall
f05280e415 x86/x64: Fix instruction length decoder.
Thanks to Peter Cawley.
2016-05-06 12:08:00 +02:00
Mike Pall
221268b17d Use the GDB JIT API in a thread-safe manner.
Thanks to Peter Cawley.
2016-05-03 18:31:29 +02:00
Mike Pall
ac42037db0 Constrain value range of lj_ir_kptr() to unsigned 32 bit pointers.
Thanks to Peter Cawley.
2016-04-24 17:32:12 +02:00
Mike Pall
d8ac6230ed Merge branch 'master' into v2.1 2016-04-24 17:14:35 +02:00
Mike Pall
7b26e9c998 Fix GCC 6 -Wmisleading-indentation warnings.
Thanks to Roman Tsisyk.
2016-04-24 17:13:45 +02:00
Mike Pall
344fe5f01d Merge branch 'master' into v2.1 2016-04-21 17:01:36 +02:00
Mike Pall
2f0001fad0 Fix handling of non-numeric strings in arithmetic coercions.
Thanks to Vyacheslav Egorov.
2016-04-21 17:00:58 +02:00
Mike Pall
4c6498d245 Merge branch 'master' into v2.1 2016-04-18 13:41:41 +02:00
Mike Pall
cc4f5d056a Whitespace. 2016-04-18 13:40:49 +02:00
Mike Pall
d13d420980 Merge branch 'master' into v2.1 2016-04-18 11:17:15 +02:00
Mike Pall
73680a5fc7 x86/x64: Search for exit jumps with instruction length decoder.
Contributed by Peter Cawley.
2016-04-18 11:16:13 +02:00
Mike Pall
0c6fdc1039 Rewrite memory block allocator.
Use a mix of linear probing and pseudo-random probing.
Workaround for 1GB MAP_32BIT limit on Linux/x64. Now 2GB with !LJ_GC64.
Enforce 128TB LJ_GC64 limit for > 47 bit memory layouts (ARM64).
2016-04-18 10:57:49 +02:00
Mike Pall
101115ddd8 Merge branch 'master' into v2.1 2016-04-14 00:16:17 +02:00
Mike Pall
e5b5e079c3 MIPS: Fix BC_ISNEXT fallback path.
Thanks to RT-RK.com.
2016-04-14 00:14:42 +02:00
Mike Pall
096a7cf4e4 x64/LJ_GC64: Fix BC_UCLO check for fast-path.
Thanks to Vyacheslav Egorov.
2016-04-13 16:10:03 +02:00
Mike Pall
ac9193cfeb x86: Improve disassembly of BMI2 instructions.
Thanks to Peter Cawley.
2016-04-05 15:10:14 +02:00
Mike Pall
d150fbf441 Merge branch 'master' into v2.1 2016-04-03 19:13:37 +02:00
Mike Pall
1c6fd13dbd Fix recording of select(n, ...) with off-trace varargs.
Thanks to Peter Cawley.
2016-04-03 19:12:28 +02:00
Mike Pall
25b377942a Merge branch 'master' into v2.1 2016-04-03 19:08:32 +02:00
Mike Pall
4ab6367b21 Cygwin: Allow cross-builds to non-Cygwin targets. 2016-04-03 19:07:19 +02:00
Mike Pall
296f0ca8d7 Windows/x64/LJ_GC64: Fix math.frexp() and math.modf() (again).
Thanks to Peter Cawley.
2016-03-31 04:17:21 +02:00
Mike Pall
6e623b9914 Merge branch 'master' into v2.1 2016-03-30 16:30:44 +02:00
Mike Pall
62af101524 MIPS: Fix use of ffgccheck delay slots in interpreter. 2016-03-30 16:26:27 +02:00
Mike Pall
892887e584 x86: Generate BMI2 shifts and rotates, if available.
Contributed by Peter Cawley.
2016-03-28 23:05:20 +02:00
Mike Pall
6801e7165c x86: Detect BMI2 instruction support. 2016-03-28 23:04:33 +02:00
Mike Pall
c24c8e5312 x64/LJ_GC64: Fix JIT glue code in interpreter.
Thanks to Peter Cawley.
2016-03-28 22:31:18 +02:00
Mike Pall
d7145616ae Merge branch 'master' into v2.1 2016-03-28 22:24:01 +02:00
Mike Pall
9531eb235b Windows: Remove intermediate files at end of build. 2016-03-28 22:23:37 +02:00
Mike Pall
e03e5979c4 Fix compiler warnings. 2016-03-28 22:19:45 +02:00
Mike Pall
df7bb5bb72 Merge branch 'master' into v2.1 2016-03-28 22:17:41 +02:00
Mike Pall
e23fc10883 Fix display of NULL (light)userdata in -jdump.
Thanks to Peter Cawley.
2016-03-28 22:15:13 +02:00
Mike Pall
c7305408d1 Fix formatting of some small denormals at low precision.
Contributed by Peter Cawley.
2016-03-28 21:39:31 +02:00
Mike Pall
713e34054f Merge branch 'master' into v2.1 2016-03-22 22:22:51 +01:00