[LLVM][MC][AArch64] Assembler support for Armv9.6-A memory systems extensions (#112341)
Add support for the following Armv9.6-A memory systems extensions:
FEAT_LSUI - Unprivileged Load Store
FEAT_OCCMO - Outer Cacheable Cache Maintenance Operation
FEAT_PCDPHINT - Producer-Consumer Data Placement Hints
FEAT_SRMASK - Bitwise System Register Write Masks
as documented here:
https://developer.arm.com/documentation/109697/2024_09/Feature-descriptions/The-Armv9-6-architecture-extension
Co-authored-by: Jonathan Thackray <jonathan.thackray at arm.com>
---------
Co-authored-by: Jonathan Thackray <jonathan.thackray at arm.com>
[AMDGPU] Create local KnownBits in case DenseMap gets invalidated (#111568)
KnownBits retrieved from DenseMap may invalidate if insertion requires a
(re)growth.
Fixes https://github.com/llvm/llvm-project/issues/110930
[LLVM][AArch64] Add assembly/disassembly of SVE BFSCALE instruction (#113168)
This patch add assembly/disassembly and tests for sve bfscale
instruction according to https://developer.arm.com/documentation/ddi0602
.
Reland [CFIFixup] Factor CFI remember/restore insertion into a helper (NFC) (#113328)
The previous submission looked like it triggered build failure
https://lab.llvm.org/buildbot/#/builders/17/builds/3116, but this
appears to be a spurious failure due to a flaky test.
[mlir][Vector] Support 0-d vectors natively in TransferOpReduceRank (#112907)
Since
https://github.com/llvm/llvm-project/commit/ddf2d62c7dddf1e4a9012d96819ff1ed005fbb05
, 0-d vectors are supported in VectorType. This patch removes 0-d vector
handling with scalars for the TransferOpReduceRank pattern. This pattern
specifically introduces tensor.extract_slice during vectorization,
causing vectorization to not fold transfer_read/transfer_write slices
properly. The changes in vectorization test files reflect this.
There are other places where lowering patterns are still side-stepping
from handling 0-d vectors properly, by turning them into scalars, but
this patch only focuses on the vector.transfer_x patterns.
[LowerMemIntrinsics] Use i8 GEPs in memcpy/memmove lowering (#112707)
The IR lowering of memcpy/memmove intrinsics uses a target-specific type
for its load/store operations. So far, the loaded and stored addresses
are computed with GEPs based on this type. That is wrong if the
allocation size of the type differs from its store size: The width of
the accesses is determined by the store size, while the GEP stride is
determined by the allocation size. If the allocation size is greater
than the store size, some bytes are not copied/moved.
This patch changes the GEPs to use i8 addressing, with offsets based on
the type's store size. The correctness of the lowering therefore no
longer depends on the type's allocation size.
This is in support of PR #112332, which allows adjusting the memcpy loop
lowering type through a command line argument in the AMDGPU backend.
[SystemZ] Split SystemZInstPrinter to two classes based on Asm dialect (#112975)
In preparation for future work on separating the output of the GNU/HLASM
ASM dialects, we first separate the SystemZInstPrinter classes to two
versions, one for each ASM dialect.
The common code remains in a SystemZInstPrinterCommon class instead.
---------
Co-authored-by: Tony Tao <tonytao at ca.ibm.com>
[clang][dataflow] Cache accessors for bugprone-unchecked-optional-access (#112605)
Treat calls to zero-param const methods as having stable return values
(with a cache) to address issue #58510. The cache is invalidated when
non-const methods are called. This uses the infrastructure from PR
#111006.
For now we cache methods returning:
- ref to optional
- optional by value
- booleans
We can extend that to pointers to optional in a next change.
[libc++] Rewrite the transitive header checking machinery (#110554)
Since we don't generate a full dependency graph of headers, we can
greatly simplify the script that parses the result of --trace-includes.
At the same time, we also unify the mechanism for detecting whether a
header is a public/C compat/internal/etc header with the existing
mechanism in header_information.py.
As a drive-by this fixes the headers_in_modulemap.sh.py test which had
been disabled by mistake because it used its own way of determining
the list of libc++ headers. By consistently using header_information.py
to get that information, problems like this shouldn't happen anymore.
This should also unblock #110303, which was blocked because of
a brittle implementation of the transitive includes check which broke
when the repository was cloned at a path like /path/__something/more.
[LLVM][CodeGen][AArch64] while_le(#,max_int) -> all_active (#111183)
When the second operand of an incrementing while instruction is the
maximum value, comparisons that include equality can never fail.
[AMDGPU] Don't rely on !eq comparing int with bits<5>. NFC. (#113279)
Tweak VOP2eInst_Base so that it does not rely on !eq comparing an int
value (-1) with a bits<5> value. This is to avoid a change in behaviour
when #112904 lands, which is a bug fix which has the side effect of
implicitly casting template arguments to the declared template parameter
type.
[clang][OpenCL][CodeGen][AMDGPU] Do not use `private` as the default AS for when `generic` is available (#112442)
Currently, for AMDGPU, when compiling for OpenCL, we unconditionally use
`private` as the default address space. This is wrong for cases where
the `generic` address space is available, and is corrected via this
patch. In general, this AS map abuse is a bad hack and we should re-work
it altogether, but at least after this patch we will stop being
incorrect for e.g. OpenCL 2.0.
[clang-format] Make bitwise and imply requires clause (#110942)
This patch adjusts the requires clause/expression parser to imply a
requires clause if it is preceded by a bitwise and operator `&`, and
assume it is a reference qualifier. The justification is that bitwise
operations should not be used for requires expressions.
This is a band-aid fix. The real problems lie in the lookahead heuristic
in the same method. It may be worth it to rewrite that whole heuristic
to track more state in the future, instead of just blindly marching
forward across multiple unrelated definitions, since right now, the
definition following the one with the requires clause can influence
whether the heuristic chooses clause or expression.
Fixes https://github.com/llvm/llvm-project/issues/110485