[mlir][tosa] Require operand/result tensors of at least rank 1 for some operations (#131335)
This commit updates the following operations (operands/results) to be of
at least rank 1 such that it aligns with the expectations of the
specification:
- ARGMAX (input)
- REDUCE_ALL (input/output)
- REDUCE_ANY (input/output)
- REDUCE_MAX (input/output)
- REDUCE_MIN (input/output)
- REDUCE_PRODUCT (input/output)
- REDUCE_SUM (input/output)
- CONCAT (each input in input1/output)
- PAD (input1/output)
- REVERSE (input1/output)
- SLICE (input1/output)
- TILE (input1/output)
- TRANSPOSE (input1/output)
In addition to this change, PAD has been updated to allow unranked
tensors for input1/output, inline with other operations.
science/arbor: Link statically to the internal libarborsup library
This library is not intended for distribution, but because port passes
BUILD_SHARED_LIBS to cmake this library ends up in the requires list.
Reported by: pkg-devel exp-run
AMDGPU: Move insertion into V2SCopies map
Insert the start instruction directly into the map before the uses. This
prevents improperly re-visting sgpr->vgpr phi inputs multiple times which
would trigger a use after free.
I don't particularly trust the iteration scheme here. This is also
unnecessarily revisting transitive users of a phi or reg_sequence for every
input operand, but I will address that separately.
Fixes #130646. I also believe it fixes #130119, although that test fails
less consistently for me.
[OpenMP][MLIR] Refactor code related to collecting privatizer info into a shared util
Moves code needed to collect info about delayed privatizers into a
shared util instread of repeating the same patter across all relevant
constructs.
[flang][OpenMP] Enable delayed privatization by default for `omp.distribute`
Switches delayed privatization for `omp.distribute` to be on by default:
controlled by the `-openmp-enable-delayed-privatization` instead of by
`-openmp-enable-delayed-privatization-staging`
[OpenMP][MLIR] Support LLVM translation for `distribute` with delayed privatization
Adds support for tranlating delayed privatization (`private` and
`firstprivate`) for `omp.distribute` ops.
[AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x)))
This is a bit of an akward pattern that can come up as a result
of legalization and then widening of i16 operations to i32 in RegBankSelect
on AMDGPU.
This quick combine avoids redundant patterns like
```
s_sext_i32_i8 s0, s0
s_sext_i32_i16 s0, s0
s_ashr_i32 s0, s0, s1
```
With this the second sext is removed as it's redundant.
[AMDGPU][SIFoldOperands] Fold some redundant bitmasks
Instructions like shifts only read some of the bits of the shift amount operand, between 4 and 6 bits.
If the source operand is being masked, we can just ignore the mask.
Effects are minimal right now but this will kick in more once we disable uniform i16 operation widening in CGP.
With that disabled, we get more i16 shift amounts
that are zext'd and without this we'd end up with
more `s_and_b32 s1, s1, 0xFFFF` in the output.
Ideally ISel should handle this but it's proving difficult to get the patterns right, and after a few hours of trying I just decided to go with this as it's simple enough and it "just works" for this purpose.
[AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect
Make s16 G_U/SBFX legal and widen them in RegBankSelect.
This allows the set of BFX formation combines to work on s16 types.
[AMDGPU][Legalizer] Widen i16 G_SEXT_INREG
It's better to widen them to avoid it being lowered into a G_ASHR + G_SHL. With this change we just extend to i32 then trunc the result.
[AMDGPU][RegBankCombiner] Add cast_of_cast and constant_fold_cast combines (#131307)
We can add a bunch of exts/truncs during RBSelect, we should be able to fold
them away afterwards.
[NFC][analyzer] Framework for multipart checkers (#130985)
In the static analyzer codebase we have a traditional pattern where a
single checker class (and its singleton instance) acts as the
implementation of several (user-facing or modeling) checkers that have
shared state and logic, but have their own names and can be enabled or
disabled separately.
Currently these multipart checker classes all reimplement the same
boilerplate logic to store the enabled/disabled state, the name and the
bug types associated with the checker parts. This commit extends
`CheckerBase`, `BugType` and the checker registration process to offer
an easy-to-use alternative to that boilerplate (which includes the ugly
lazy initialization of `mutable std::unique_ptr<BugType>`s).
In this new framework the single-part checkers are internally
represented as "multipart checkers with just one part" (because this way
I don't need to reimplement the same logic twice) but this does not
require any changes in the code of simple single-part checkers.
[9 lines not shown]