[AMDGPU] Fix some cache policy checks for GFX12+ (#116396)
Fix coding errors found by inspection and check that the swz bit still
serves to prevent merging of buffer loads/stores on GFX12+.
[X86][MC] Add R_X86_64_CODE_4_GOTTPOFF (#116633)
For
  mov name@GOTTPOFF(%rip), %reg
  add name@GOTTPOFF(%rip), %reg
add
  `R_X86_64_CODE_4_GOTTPOFF` = 44
if the instruction starts at 4 bytes before the relocation offset. It's
similar to R_X86_64_GOTTPOFF.
Linker can treat `R_X86_64_CODE_4_GOTTPOFF` as `R_X86_64_GOTTPOFF` or
convert the instructions above to
  mov $name@tpoff, %reg
  add $name@tpoff, %reg
[10 lines not shown]
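A minimal sketch of the offset rule described above, for illustration only. The helper name `pick_gottpoff_reloc` is hypothetical, and falling back to `R_X86_64_GOTTPOFF` for every other distance is an assumption, not something the commit message states. The value 22 for `R_X86_64_GOTTPOFF` comes from the x86-64 psABI.

```python
# Relocation type numbers: 44 is given in the commit message,
# 22 is the psABI value of the pre-existing relocation.
R_X86_64_GOTTPOFF = 22
R_X86_64_CODE_4_GOTTPOFF = 44


def pick_gottpoff_reloc(insn_start: int, reloc_offset: int) -> int:
    """Hypothetical helper: choose the relocation type from the distance
    between the start of the instruction and the relocated field."""
    if reloc_offset - insn_start == 4:
        # The instruction starts 4 bytes before the relocation offset.
        return R_X86_64_CODE_4_GOTTPOFF
    # Assumption: every other encoding keeps the older relocation type.
    return R_X86_64_GOTTPOFF
```

A linker that does not know the new type can, per the message, handle it exactly like `R_X86_64_GOTTPOFF`.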
[llvm] Remove `br i1 undef` from some regression tests [NFC] (#117112)
This PR removes `br i1 undef` from the tests under
`llvm/test/Transforms/Loop*, Lower*`.
[InstCombine] Convert logical and/or with `icmp samesign` into bitwise ops (#116983)
See the following case:
```
define i1 @test_logical_and_icmp_samesign(i8 %x) {
%cmp1 = icmp ne i8 %x, 9
%cmp2 = icmp samesign ult i8 %x, 11
%and = select i1 %cmp1, i1 %cmp2, i1 false
ret i1 %and
}
```
Currently we cannot convert this logical and into a bitwise and due to
the `samesign` flag. But if `%cmp2` evaluates to `poison`, we can infer
that `%cmp1` is either `poison` or `true` (`samesign` violation
indicates that X is negative). Therefore, `%and` still evaluates to
`poison`.
This patch converts a logical and into a bitwise and iff "TV is poison"
implies that Cond is either poison or true. Likewise, we convert a
[14 lines not shown]
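The argument for the `@test_logical_and_icmp_samesign` case can be checked exhaustively with a small model. This is a sketch under simplified assumptions: poison is a sentinel object, `samesign` yields poison whenever the sign bits differ, and "refines" means the replacement may differ only where the original is already poison. All function names here are hypothetical and are not InstCombine APIs.

```python
POISON = object()  # sentinel standing in for LLVM's poison value


def icmp_ne(x, c):
    return x != c


def icmp_samesign_ult(x, c, bits=8):
    # samesign: result is poison unless both operands share a sign bit.
    sign = 1 << (bits - 1)
    if (x & sign) != (c & sign):
        return POISON
    return x < c


def logical_and(cond, tv):
    # Models: select i1 cond, i1 tv, i1 false
    if cond is POISON:
        return POISON
    return tv if cond else False


def bitwise_and(a, b):
    # Models: and i1 a, b (poison if either operand is poison)
    if a is POISON or b is POISON:
        return POISON
    return a and b


def refines(src, tgt):
    # tgt may replace src if src is poison or both values are equal.
    return src is POISON or src == tgt


def check_example():
    # Try every i8 bit pattern for %x.
    for x in range(256):
        cmp1 = icmp_ne(x, 9)
        cmp2 = icmp_samesign_ult(x, 11)
        if not refines(logical_and(cmp1, cmp2), bitwise_and(cmp1, cmp2)):
            return False
    return True
```

In this model `check_example()` holds because `%cmp2` is poison only for negative `%x`, where `%cmp1` is `true`. A condition that could be `false` while the other operand is poison would break the refinement.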
[ORC-RT] Test basic C++ static initialization support in the ORC runtime.
This tests that a simple C++ static initializer works as expected.
Compared to the architecture-specific, assembly-level regression tests for the
ORC runtime, this test is expected to catch cases where the compiler adopts
some new MachO feature that the ORC runtime does not yet support (e.g. a new
initializer section).
[clang][bytecode] Fix ToType/FromType diagnostic ordering (#116988)
We need to check the ToType first, then the FromType. Additionally,
remove qualifiers from the parent type of the field we're emitting a
note for.
[mlir] [IR] Allow zero strides in StridedLayoutAttr (#116463)
Disallowing memrefs with a stride of 0 was intended to prevent internal
aliasing, but this does not address all cases: internal aliasing can
still occur when the stride is less than the shape.
On the other hand, a stride of 0 can be very useful in certain
scenarios. For example, in architectures that support multi-dimensional
DMA, we can use memref::copy with a stride of 0 to achieve a broadcast
effect.
This commit removes the restriction that strides in memrefs cannot be 0.
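Both points can be illustrated with plain strided index arithmetic. This sketch does not use MLIR; `linear_offsets` and `has_internal_aliasing` are hypothetical helpers that model a strided layout as `offset + sum(index[i] * stride[i])`.

```python
from itertools import product


def linear_offsets(shape, strides, offset=0):
    """Map every index tuple to its linear element offset."""
    return {
        idx: offset + sum(i * s for i, s in zip(idx, strides))
        for idx in product(*(range(n) for n in shape))
    }


def has_internal_aliasing(shape, strides):
    """True if two distinct indices address the same element."""
    offs = linear_offsets(shape, strides)
    return len(set(offs.values())) < len(offs)


# Broadcast: a 3x4 view whose three rows all read the same 4 elements.
broadcast = linear_offsets((3, 4), (0, 1))
```

With shape `(2, 4)` and strides `(2, 1)` the rows overlap even though no stride is 0, which is the case the old restriction did not cover.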
[ControlHeightReduction] Add assert to avoid underflow (#116339)
`NumCHRedBranches - 1` is used later, so we add an assertion to make
sure it does not underflow.
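A rough illustration of the hazard, assuming the counter is an unsigned 64-bit integer (the commit message does not state its type). Python integers do not wrap, so the sketch emulates the wraparound with a mask. Both helper names are hypothetical.

```python
def unsigned_sub(a, b, bits=64):
    """Emulate unsigned subtraction with wraparound at the given width."""
    return (a - b) & ((1 << bits) - 1)


def checked_last_index(count):
    """Guard the subtraction the way an assertion would."""
    assert count > 0, "count must be positive before subtracting 1"
    return unsigned_sub(count, 1)
```

Without the guard, a count of 0 yields the maximum representable value instead of -1.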
[mlir][vector] Fix 0-d vector transfer mask inference (#116526)
When inferring the mask of a transfer operation that results in a single `i1` element,
we could represent it using either `vector<i1>` or `vector<1xi1>`. To avoid type mismatches,
this PR updates the mask inference logic to consistently generate `vector<1xi1>` for
these cases. We can enable 0-D masks if they are needed in the future.
See: https://github.com/llvm/llvm-project/issues/116197
[AArch64][NFC] NFC for const vector as Instruction operand (#116790)
The current cost model does not take into account the cost of materializing
a const vector. As the test shows, this results in some cases being
vectorized even though doing so may not always be profitable. A future patch
will try to address this issue.
[SLP] NFC. Change the comment to match the code execution. (#116022)
Making the code behave as the comment describes would modify many tests and
affect performance. We therefore change the comment instead of the code.
[TargetVersion] Only enable on RISC-V and AArch64 (#115991)
Address https://github.com/llvm/llvm-project/issues/115000.
This patch constrains the target_version feature to work only on RISC-V
and AArch64 to prevent crashes in Clang.
---------
Co-authored-by: Aaron Ballman <aaron at aaronballman.com>
[JITLink][arm64] Support arm64e JIT'd code (initially enabled for MachO only).
Adds two new JITLink passes to create and populate a pointer-signing function
that can be called via an allocation-action attached to the LinkGraph:
* createEmptyPointerSigningFunction creates a pointer signing function in a
custom section, reserving sufficient space for the signing code. It should
be run as a post-prune pass (to ensure that memory is reserved prior to
allocation).
* lowerPointer64AuthEdgesToSigningFunction pass populates the signing function
by walking the graph, decoding the ptrauth info (encoded in the edge addend) and
writing an instruction sequence to sign all ptrauth fixup locations.
rdar://61956998
Add the initializes attribute inference (#117104)
Reland https://github.com/llvm/llvm-project/pull/97373 after fixing
clang tests.
Confirmed with "ninja check-llvm" and "ninja check-clang"
[flang] Introduce hlfir.elemental lowerings to omp.workshare_loop_nest (#104748)
This patch adds parallelization support for the following expression in OpenMP
workshare constructs:
* Elemental procedures in array expressions
(reapplied with linking fix)
[mlir][Transforms][NFC] Dialect conversion: Remove "finalize" phase (#116934)
The dialect conversion driver has three phases:
- **Create** `IRRewrite` objects as the IR is traversed.
- **Finalize** `IRRewrite` objects. During this phase, source
materializations for mismatching value types are created. (E.g., when
a `Value` is replaced with a `Value` of a different type, but there is a
user of the original value that was not modified because it is already
legal.)
- **Commit** `IRRewrite` objects. During this phase, all remaining IR
modifications are materialized. In particular, SSA values are actually
being replaced during this phase.
This commit removes the "finalize" phase. This simplifies the code base
a bit and avoids one traversal over the `IRRewrite` stack. Source
materializations are now built during the "commit" phase, right before
an SSA value is being replaced.
This commit also removes the "inverse mapping" of the conversion value
[15 lines not shown]
[mlir][bufferization] Remove `finalizing-bufferize` pass (#114154)
The dialect conversion-based bufferization passes were migrated to
One-Shot Bufferize about two years ago. To clean up the code base, this
commit removes the `finalizing-bufferize` pass, one of the few remaining
parts of the old infrastructure. Most bufferization passes have already
been removed.
Note for LLVM integration: If you depend on this pass, migrate to
One-Shot Bufferize or copy the pass to your codebase.
Depends on #114152.