mikepaul-LuaJIT

mirror of https://github.com/LuaJIT/LuaJIT.git synced 2025-02-08 23:44:08 +00:00

Author	SHA1	Message	Date
Peter Cawley	7213658b33	x64/LJ_GC64: Enable JIT compilation.
Under LJ_GC64, RID_DISPATCH is removed from the pool of available general purpose registers, and instead retains its role as a pointer to the dispatch table throughout JIT code. This guarantees that members of the global_State and the jit_State can always be encoded in modrm. If the memory allocator is kind, it also allows for various KGC and KPTR values to be encoded as 32-bit offsets from RID_DISPATCH. Likewise, when SSE instructions want to use a KNUM as a memory operand, it often transpires that the address of the KNUM's 64-bit payload can be expressed as a 32-bit offset from RID_DISPATCH.
In some cases the recording logic has been tweaked to encode constants as relative to RID_DISPATCH instead of as absolute addresses. This is done via calls to lj_ir_ggfload.
LJ_GC64 also introduces a new pseudo-register: RID_RIP. If the memory allocator isn't kind enough to put things within a 32-bit range of the dispatch table, it is sometimes kind enough to instead put things within a 32-bit range of the mcode pointer. Furthermore, for constants which we want (or need) to be loaded via memory operands, the constant's payload can be copied to the low part of an mcode region, at which point it is guaranteed to be representable as a RIP-relative operand.
Fused loads can result in an mrm referencing RID_RIP. In such cases, the fusing is only valid for the next emitted instruction, though as a special case, one asm_guardcc call is permitted between the fusing and the instruction into which the fusion result is inserted.
TValue detagging is notable under LJ_GC64. The basic code pattern is:
    mov r64, [addr]
    ror r64, 47
    cmp r16, itype
    jnz ->exit
    shr r64, 17
If BMI2 is available, mov/ror are fused to be a single rorx. If BMI2 isn't available, and a type test isn't required, ror47 becomes shl17 (and the cmp/jnz are dropped).
The type test is interesting as it only considers 16 bits of tag, despite the TValues in question nominally consisting of 47 bits of pointer and 17 bits of tag. The 16 considered bits are sufficient to verify that the TValue is a NaN (11 bits), is a QNaN (1 bit), and has the correct itype (4 bits). The one unconsidered bit is the sign bit of the NaN. LuaJIT operates under the assumption that all NaNs in the system are either canonical NaNs (as generated by the FPU) or are NaN-packed TValues. In both cases, the sign bit of the NaN is set, and therefore does not need to be verified during detagging.
The cmp instruction encodes the itype as an imm8, thus avoiding the LCP stall which using an imm16 would result in. False LCP stalls are still an issue, and could be trivially worked around by sometimes inserting an extra nop instruction, but this could break loop realignment (as the realigned code might be one byte larger or one byte smaller, and loop realignment operates under the assumption that a sequence of emitted instructions always occupies the same number of bytes, regardless of where it is emitted [1]).
[1] This assumption also results in rip-relative operands being even more slippery. A priori, the realigned code might be able to reach things it previously couldn't, or conversely not reach things it previously could. To prevent this from happening, checki32/mcpofs is paired with checki32/mctopofs: if a given address is reachable with a 32-bit displacement from both of these points, then it'll also be reachable with a 32-bit displacement from a realigned mcp.	2016-05-18 03:56:22 +01:00
Peter Cawley	79021951e5	LJ_FR2: Improve trace recording and snapshots.
The interesting changes here revolve around slots marked as TREF_FRAME / TREF_CONT. Under !LJ_FR2, said slots contain two 32-bit values, and the TRef for the slot primarily relates to the low 32 bits. In a snapshot, the main SnapEntry relates to the low 32 bits, and the framelink from the snapshot is used to restore the high 32 bits.
Under LJ_FR2, TREF_FRAME / TREF_CONT slots contain a single 64-bit value. The TRef relates to all 64 bits, the SnapEntry is used to restore all 64 bits, and no framelinks are required to restore the slot. Restoration is done via IR_KNUM constants, as the 64-bit values in question can be happily interpreted as denormal numbers. These constants are created lazily: the slots in question get set to just TREF_FRAME / TREF_CONT initially, and then if required for a snapshot, the ref part of the TRef is changed from zero to the index of a KNUM. Slot 1 is always zero, as although it is technically a frame link, it never needs to be changed or saved or restored.
Though the framelink part of a snapshot isn't required for slot restoration under LJ_FR2, it is still used for restoring PC. As such, every snapshot has exactly two framelink entries, which are used to store a 64-bit value.
Manipulations of J->maxslot are more interesting under LJ_FR2. For example, the BC_MOV of a method call can introduce a three-slot gap under LJ_FR2, whereas it could only introduce a one-slot gap under !LJ_FR2. Other instructions can now introduce a one-slot gap where previously they wouldn't ever introduce a gap.	2016-05-18 03:56:21 +01:00
Peter Cawley	a1bbfd7f18	LJ_GC64: Update IR type sizes.	2016-05-18 03:56:19 +01:00
Peter Cawley	5a6fbbe0e7	Support LJ_GC64 in the IR	2016-05-18 03:56:19 +01:00
Peter Cawley	6eaffef68d	Strip out old infrastructure for 64-bit constants.	2016-05-18 03:56:16 +01:00
Peter Cawley	0d61fc1fb8	Embed 64 bit constants directly in the IR, using two slots.	2016-05-18 03:56:15 +01:00
Peter Cawley	ce3fcda9ea	rec_check_ir: Walk the IR in increasing order.	2016-05-18 03:56:14 +01:00
Peter Cawley	fdf6eb4678	sink: Sweep constants in increasing order.	2016-05-18 03:56:13 +01:00
Peter Cawley	9d51806696	Have trace stitching embed the GCtrace* directly in an IR_KGC.	2016-05-18 03:56:12 +01:00
Peter Cawley	5687444c59	Allocate the final GCtrace* prior to assembly.	2016-05-18 03:56:11 +01:00
Peter Cawley	5855f6540a	Introduce ra_addrename.	2016-05-18 03:43:10 +01:00
Peter Cawley	2fd5a19d64	Use lj_ir_ggfload for SIMD constants.	2016-05-18 03:43:09 +01:00
Peter Cawley	5201498a06	Introduce lj_ir_ggfload (IR_FLOAD with NIL op1 => load from GG).	2016-05-18 03:43:08 +01:00
Peter Cawley	26cd51981c	Avoid generating new IR constants during assembly.	2016-05-18 03:43:07 +01:00
Peter Cawley	95545e11e2	Use IRT_PGC instead of IRT_P32 in various places.	2016-05-09 21:29:46 +01:00
Mike Pall	35b09e692e	Windows/x86: Add full exception interoperability. Contributed by Peter Cawley.	2016-05-07 12:32:15 +02:00
Mike Pall	6a9973203c	Merge branch 'master' into v2.1	2016-05-06 12:09:23 +02:00
Mike Pall	f05280e415	x86/x64: Fix instruction length decoder. Thanks to Peter Cawley.	2016-05-06 12:08:00 +02:00
Mike Pall	221268b17d	Use the GDB JIT API in a thread-safe manner. Thanks to Peter Cawley.	2016-05-03 18:31:29 +02:00
Mike Pall	ac42037db0	Constrain value range of lj_ir_kptr() to unsigned 32 bit pointers. Thanks to Peter Cawley.	2016-04-24 17:32:12 +02:00
Mike Pall	d8ac6230ed	Merge branch 'master' into v2.1	2016-04-24 17:14:35 +02:00
Mike Pall	7b26e9c998	Fix GCC 6 -Wmisleading-indentation warnings. Thanks to Roman Tsisyk.	2016-04-24 17:13:45 +02:00
Mike Pall	344fe5f01d	Merge branch 'master' into v2.1	2016-04-21 17:01:36 +02:00
Mike Pall	2f0001fad0	Fix handling of non-numeric strings in arithmetic coercions. Thanks to Vyacheslav Egorov.	2016-04-21 17:00:58 +02:00
Mike Pall	4c6498d245	Merge branch 'master' into v2.1	2016-04-18 13:41:41 +02:00
Mike Pall	cc4f5d056a	Whitespace.	2016-04-18 13:40:49 +02:00
Mike Pall	d13d420980	Merge branch 'master' into v2.1	2016-04-18 11:17:15 +02:00
Mike Pall	73680a5fc7	x86/x64: Search for exit jumps with instruction length decoder. Contributed by Peter Cawley.	2016-04-18 11:16:13 +02:00
Mike Pall	0c6fdc1039	Rewrite memory block allocator. Use a mix of linear probing and pseudo-random probing. Workaround for 1GB MAP_32BIT limit on Linux/x64. Now 2GB with !LJ_GC64. Enforce 128TB LJ_GC64 limit for > 47 bit memory layouts (ARM64).	2016-04-18 10:57:49 +02:00
Mike Pall	101115ddd8	Merge branch 'master' into v2.1	2016-04-14 00:16:17 +02:00
Mike Pall	e5b5e079c3	MIPS: Fix BC_ISNEXT fallback path. Thanks to RT-RK.com.	2016-04-14 00:14:42 +02:00
Mike Pall	096a7cf4e4	x64/LJ_GC64: Fix BC_UCLO check for fast-path. Thanks to Vyacheslav Egorov.	2016-04-13 16:10:03 +02:00
Mike Pall	ac9193cfeb	x86: Improve disassembly of BMI2 instructions. Thanks to Peter Cawley.	2016-04-05 15:10:14 +02:00
Mike Pall	d150fbf441	Merge branch 'master' into v2.1	2016-04-03 19:13:37 +02:00
Mike Pall	1c6fd13dbd	Fix recording of select(n, ...) with off-trace varargs. Thanks to Peter Cawley.	2016-04-03 19:12:28 +02:00
Mike Pall	25b377942a	Merge branch 'master' into v2.1	2016-04-03 19:08:32 +02:00
Mike Pall	4ab6367b21	Cygwin: Allow cross-builds to non-Cygwin targets.	2016-04-03 19:07:19 +02:00
Mike Pall	296f0ca8d7	Windows/x64/LJ_GC64: Fix math.frexp() and math.modf() (again). Thanks to Peter Cawley.	2016-03-31 04:17:21 +02:00
Mike Pall	6e623b9914	Merge branch 'master' into v2.1	2016-03-30 16:30:44 +02:00
Mike Pall	62af101524	MIPS: Fix use of ffgccheck delay slots in interpreter.	2016-03-30 16:26:27 +02:00
Mike Pall	892887e584	x86: Generate BMI2 shifts and rotates, if available. Contributed by Peter Cawley.	2016-03-28 23:05:20 +02:00
Mike Pall	6801e7165c	x86: Detect BMI2 instruction support.	2016-03-28 23:04:33 +02:00
Mike Pall	c24c8e5312	x64/LJ_GC64: Fix JIT glue code in interpreter. Thanks to Peter Cawley.	2016-03-28 22:31:18 +02:00
Mike Pall	d7145616ae	Merge branch 'master' into v2.1	2016-03-28 22:24:01 +02:00
Mike Pall	9531eb235b	Windows: Remove intermediate files at end of build.	2016-03-28 22:23:37 +02:00
Mike Pall	e03e5979c4	Fix compiler warnings.	2016-03-28 22:19:45 +02:00
Mike Pall	df7bb5bb72	Merge branch 'master' into v2.1	2016-03-28 22:17:41 +02:00
Mike Pall	e23fc10883	Fix display of NULL (light)userdata in -jdump. Thanks to Peter Cawley.	2016-03-28 22:15:13 +02:00
Mike Pall	c7305408d1	Fix formatting of some small denormals at low precision. Contributed by Peter Cawley.	2016-03-28 21:39:31 +02:00
Mike Pall	713e34054f	Merge branch 'master' into v2.1	2016-03-22 22:22:51 +01:00
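
The TValue detagging sequence described in the message of commit 7213658b33 can be modelled in portable C. The following is a minimal sketch and not code from LuaJIT: the helper names (rotr64, tag_pointer, detag_checked, detag_unchecked) are hypothetical, and the layout it assumes (17 bits of tag above 47 bits of pointer, with the itype given as a small negative number) is taken from the commit message only.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right, as "ror r64, n" does. Valid for 0 < n < 64. */
static uint64_t rotr64(uint64_t x, unsigned n)
{
  return (x >> n) | (x << (64u - n));
}

/* Hypothetical helper: pack a 47-bit pointer under a 17-bit tag.
** The tag is the sign-extended itype, so its upper 13 bits are all set
** (sign bit, 11 exponent bits, quiet bit) and its low 4 bits hold the itype.
*/
static uint64_t tag_pointer(uint64_t ptr, int itype)
{
  assert(ptr < (1ULL << 47));
  assert(itype >= -16 && itype < 0);
  return ((uint64_t)(int64_t)itype << 47) | ptr;
}

/* Model of: ror r64, 47; cmp r16, itype; jnz ->exit; shr r64, 17.
** Returns 1 and stores the pointer on a type match, 0 on a mismatch.
*/
static int detag_checked(uint64_t tv, int itype, uint64_t *out)
{
  uint64_t r = rotr64(tv, 47);  /* Tag now sits in the low 17 bits. */
  /* Compare only the low 16 bits; the NaN sign bit (bit 16) is ignored. */
  if ((uint16_t)r != (uint16_t)itype)
    return 0;
  *out = r >> 17;               /* Discard the tag, leaving the pointer. */
  return 1;
}

/* Model of the variant without a type test: shl 17 followed by shr 17. */
static uint64_t detag_unchecked(uint64_t tv)
{
  return (tv << 17) >> 17;
}
```

With an illustrative itype of -12, tagging a pointer and passing the result to detag_checked with the same itype returns the original pointer, while a different itype or an ordinary double such as 1.0 is rejected. Because only 16 bits are compared, a value whose sign bit has been cleared still passes, which is why the commit message leans on the assumption that every NaN in the system has its sign bit set.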

2006 Commits