[AMDGPU] Update entry point name for PAL metadata (#123581)
Old entry-point metadata being updated. Nothing is required
to account for deprecation as nothing uses the old style
"Reland "[Wunsafe-buffer-usage] Fix false positive when const sized array is indexed by const evaluatable expressions (#119340)"" (#123713)
This reverts commit 7dd34baf5505d689161c3a8678322a394d7a2929.
Fixed the assertion violation reported by
7dd34baf5505d689161c3a8678322a394d7a2929
Co-authored-by: MalavikaSamak <malavika2 at apple.com>
[lldb][Linux] Add Control Protection Fault signal (#122917)
This will be sent by Arm's Guarded Control Stack extension when an
invalid return is executed.
The signal does have an address we could show, but it's the PC at which
the fault occured. The debugger has plenty of ways to show you that
already, so I've left it out.
```
(lldb) c
Process 460 resuming
Process 460 stopped
* thread #1, name = 'test', stop reason = signal SIGSEGV: control protection fault
frame #0: 0x0000000000400784 test`main at main.c:57:1
54 afunc();
55 printf("return from main\n");
56 return 0;
-> 57 }
[9 lines not shown]
[AArch64] Improve bcvtn2 and remove aarch64_neon_bfcvt intrinsics (#120363)
This started out as trying to combine bf16 fpround to BFCVT2
instructions, but ended up removing the aarch64.neon.nfcvt intrinsics in
favour of generating fpround instructions directly. This simplifies the
patterns and can lead to other optimizations. The BFCVT2 instruction is
adjusted to makes sure the types are valid, and a bfcvt2 is now
generated in more place. The old intrinsics are auto-upgraded to fptrunc
instructions too.
[AArch64] Eliminate Common SUBS by Reassociating Non-Constants (#123344)
Commit 1eed46960c217f9480865702f06fb730c7521e61 added logic to
reassociate a (add (add x y) -c) operand to a CSEL instruction with a
comparison involving x and c (or a similar constant) in order to obtain
a common (SUBS x c) instruction.
This commit extends this logic to non-constants. In this way, we also
reassociate a (sub (add x y) z) operand of a CSEL instruction to
(add (sub x z) y) if the CSEL compares x and z, for example.
Alive proof: https://alive2.llvm.org/ce/z/SEVpR
DAG: Avoid breaking legal vector_shuffle with multiple uses
Previously this combine would undo AMDGPU's new custom legalization of
wide vector shuffles into 2 element pieces. The comment also
states that this combine is only done before legalization,
but the case with a build_vector source was unconditional.
We probably don't want to do this if the multiple uses are full
scalarization of the vector, but this seems to work well enough.
Scalarizing extracts should have folded out pre-legalize.
AMDGPU: Custom lower 32-bit element shuffles
This is so we can try to make use of v_pk_mov_b32 when available.
Note this currently has little observable effect. The combiner
will undo the common extract of shuffle pattern. The lack
of test changes should demonstrate this change is minimally
correct.
We should probably try to make better use of wider extracts in
even aligned cases, but I'm trying to avoid some really ugly
regalloc regressions in some MFMA tests. The DAG scheduler ends
up doing a worse job if we use vector extracts, resulting
in failure to do 3 address conversion of MFMAs.
[YAML] Don't validate `Fill::Size` after error
Size is required, so we don't know if it's in
uninitialized state after the previous error.
Triggers msan on llvm/test/tools/yaml2obj/ELF/custom-fill.yaml
Pull Request: https://github.com/llvm/llvm-project/pull/123280
[MLIR][NVVM] Add TMA Bulk Copy Ops (#123186)
PR #122344 adds intrinsics for Bulk Async Copy
(non-tensor variants) using TMA. This patch
adds the corresponding NVVM Dialect Ops.
lit tests are added to verify the lowering to all
variants of the intrinsics.
Signed-off-by: Durgadoss R <durgadossr at nvidia.com>
[Mips] Handle declspec(dllimport) on mipsel-windows-* triples (#120912)
On Windows, imported symbols must be searched with '__imp_' prefix.
Support imported global variables and imported functions.