[Headers][NFC] Deduplicate gpu_match_ between targets via inlining (#131141)
Declare a few functions before including the target-specific headers,
then define a fallback_match_{any,all} used by amdgpu and by older
nvptx.
Fixes a minor bug on pre-Volta where one of the four fallback paths was
missing a sync_lane.
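As an illustration of what a match_any fallback has to compute, here is a host-side model in plain C++. It is not taken from the patch: the function name `SimulatedMatchAny` and the leader/ballot loop are assumptions, chosen because a loop of this shape is one common way to emulate the operation when no native instruction exists. Each lane ends up with a bitmask of all lanes holding the same value.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side model only (hypothetical helper, not the header's code).
// lane_values[i] is the value held by lane i; at most 64 lanes are assumed.
std::vector<std::uint64_t> SimulatedMatchAny(
    const std::vector<std::uint32_t> &lane_values) {
  const std::size_t n = lane_values.size();
  std::vector<std::uint64_t> result(n, 0);
  std::vector<bool> done(n, false);
  for (std::size_t leader = 0; leader < n; ++leader) {
    if (done[leader])
      continue; // this lane already received its mask
    // "Ballot": collect every still-active lane that agrees with the leader.
    std::uint64_t ballot = 0;
    for (std::size_t lane = leader; lane < n; ++lane)
      if (!done[lane] && lane_values[lane] == lane_values[leader])
        ballot |= std::uint64_t{1} << lane;
    // Every matching lane gets the same mask and drops out of the loop.
    for (std::size_t lane = leader; lane < n; ++lane)
      if (ballot & (std::uint64_t{1} << lane)) {
        result[lane] = ballot;
        done[lane] = true;
      }
  }
  return result;
}
```

On real hardware the lanes run concurrently, which is why each path of such a loop needs a lane synchronization.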
[hwasan] Don't check code model if there are no globals (#131152)
Currently, the code model check is always performed even if there are no
globals, because:
1) the HWASan compiler pass always leaves a note
2) the HWASan runtime always performs the check if there is a HWASan
globals note.
This unnecessarily adds a 2**32 byte size limit.
This patch elides the check if the globals note doesn't actually contain
globals, thus allowing larger libraries to be successfully instrumented
without globals.
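The intended behavior can be sketched as follows. This is a minimal model, not the runtime's code: `GlobalsNote`, `NeedsCodeModelCheck`, and `LibraryAccepted` are hypothetical names, and treating the limit as "size must not exceed 2**32" is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the HWASan globals note.
struct GlobalsNote {
  std::size_t num_globals;
};

// Assumed form of the limit imposed by the code model check.
constexpr std::uint64_t kCodeModelLimit = std::uint64_t{1} << 32;

// The check is only needed when a note exists and actually lists globals.
bool NeedsCodeModelCheck(const GlobalsNote *note) {
  return note != nullptr && note->num_globals > 0;
}

// Returns true if a library of the given size passes.
bool LibraryAccepted(const GlobalsNote *note, std::uint64_t library_size) {
  if (!NeedsCodeModelCheck(note))
    return true; // no globals: the size limit does not apply
  return library_size <= kCodeModelLimit;
}
```

With this shape, a library larger than the limit is accepted when its note is empty and rejected only when it really contains instrumented globals.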
[flang][OpenMP] Use OmpDirectiveSpecification in simple directives
The `OmpDirectiveSpecification` contains the directive name, the list of
arguments, and the list of clauses. It was introduced to store the
directive specification in METADIRECTIVE, and could be reused everywhere
a directive representation is needed.
In the long term this would unify the handling of common directive
properties, as well as creating actual constructs from METADIRECTIVE
by linking the contained directive specification with any associated
user code.
[SeparateConstOffsetFromGEP] Preserve inbounds flag based on ValueTracking
If we know that the initial GEP was inbounds, and we change it to a
sequence of GEPs from the same base pointer where every offset is
non-negative, then the new GEPs are inbounds.
For SWDEV-516125.
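The inbounds argument above can be modelled with constant offsets. The sketch below is a simplification under stated assumptions: the real pass reasons about IR values through ValueTracking, while the hypothetical `CanPreserveInbounds` only looks at known integers. If the original GEP was inbounds and every partial offset is non-negative, each intermediate address lies between the base and the original result, so it stays inside the same object.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: `offsets` are the byte offsets of the new GEP
// sequence, all applied from the same base pointer and summing to the
// offset of the original GEP.
bool CanPreserveInbounds(bool original_inbounds,
                         const std::vector<std::int64_t> &offsets) {
  if (!original_inbounds)
    return false; // nothing to preserve
  for (std::int64_t offset : offsets)
    if (offset < 0)
      return false; // a negative step could leave the object temporarily
  return true;
}
```

A sequence such as {16, -4} is rejected even though its sum is in range, because the intermediate address base+16 is not known to be inside the object.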
AMDGPU: Move insertion into V2SCopies map
Insert the start instruction directly into the map before the uses. This
prevents improperly revisiting sgpr->vgpr phi inputs multiple times, which
would trigger a use after free.
I don't particularly trust the iteration scheme here. This is also
unnecessarily revisiting transitive users of a phi or reg_sequence for every
input operand, but I will address that separately.
Fixes #130646. I also believe it fixes #130119, although that test fails
less consistently for me.
Make sure that offsets are still folded...
...where they were folded previously, by manually applying the unsound
version of the transformation and, where necessary, increasing the
allocation size so that the transformed code is not statically
guaranteed to cause UB.
[flang][OpenMP] Add `OutlineableOpenMPOpInterface` to `omp.teams` (#131109)
Given the following input:
```fortran
program rep_loopbind
implicit none
integer :: i
real :: priv_val
!$omp teams private(priv_val)
!$omp distribute
do i=1,1000
end do
!$omp end teams
end program
```
the `AllocaOpConversion` pattern in `FIRToLLVMLowering` would **move**
the private allocations that belong to the `teams` directive (i.e. the
allocations needed for the private copies of `priv_val` and the loop's
[5 lines not shown]
[X86] combineConcatVectorOps - extend PSHUFD/LW/HW handling to support 512-bit types
VPSHUFD was already getting converted via the VPERMILPS AVX1 fallback
[MachineLateInstrsCleanup] Handle multiple kills for a preceding definition. (#119132)
When removing a redundant definition in order to reuse an earlier
identical one it is necessary to remove any earlier kill flag as well.
Previously, the assumption was that tracking a single kill of the
defined Reg is enough for this purpose, but this is not quite the
case. A kill of a super-register does not necessarily imply
that all of its subregs (including Reg) are defined at that point: a
partial definition of a register is legal. This means Reg may have been
killed earlier and is not live at that point.
This patch changes the tracking of kill flags to allow for multiple
flags to be removed: instead of remembering just the single / latest
kill flag, a vector is now used to track and remove them all.
TinyPtrVector seems ideal for this, as there is only very rarely more
than one kill flag, and it doesn't seem to make much difference in
compile time.
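The change in bookkeeping can be sketched like this. It is a simplified model, not the pass itself: `Operand` and `RegTracker` are hypothetical stand-ins for the machine operand and the per-register state, and `std::vector` replaces TinyPtrVector to keep the example self-contained.

```cpp
#include <vector>

// Hypothetical stand-in for a machine operand carrying a kill flag.
struct Operand {
  bool is_kill = false;
};

// Tracks every kill seen since the last definition of one register.
struct RegTracker {
  std::vector<Operand *> kills;

  void RecordDef() { kills.clear(); } // a new definition starts a new range
  void RecordKill(Operand &op) { kills.push_back(&op); }

  // Called when a later identical definition is removed and the earlier
  // one is reused: the register must stay live across all recorded kills.
  void ClearKillsForReuse() {
    for (Operand *op : kills)
      op->is_kill = false;
    kills.clear();
  }
};
```

Remembering only the latest kill would leave the earlier flag set, which is the situation the patch describes.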
[11 lines not shown]
[flang][OpenMP] Fix missing inode issue (#130798)
When outlining an offload region, Flang creates a unique name by
querying an inode ID. However, when the name of the actual source file
does not match the logical file in a `#line` preprocessor directive,
code-gen was failing as it could not determine the inode ID. This PR
checks for this condition and, if the logical file does not exist,
replaces the inode with a hash value created from the source code
itself.
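A minimal sketch of the fallback, assuming the caller has already tried to look up the inode and passes a null pointer when that failed. `UniqueFileId` is a hypothetical name, and `std::hash` is only a placeholder for whatever hash the PR actually uses.

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical helper: `inode` is null when the logical file named by a
// `#line` directive does not exist on disk.
std::uint64_t UniqueFileId(const std::uint64_t *inode,
                           const std::string &source_text) {
  if (inode != nullptr)
    return *inode; // normal case: the file exists
  // Fallback: derive the identifier from the source code itself.
  return static_cast<std::uint64_t>(std::hash<std::string>{}(source_text));
}
```

The fallback value is stable for identical source text within one run, which is what a name built from it needs.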
[MLIR][OpenMP] Remove the ReductionClauseInterface, NFC (#130978)
This patch removes the `ReductionClauseInterface` and all definitions of
its associated `getAllReductionVars` method.
The method mandated by this interface is not used anywhere. In addition,
when multiple reduction clauses are present in an operation, its
definitions conflict, which makes the operation definition more
convoluted. It seems better to remove it and only add something like
this if there's a clear advantage to it.
PeepholeOpt: Do not skip reg_sequence sources with subregs (#125667)
Contrary to the comment, this particular code is not responsible
for handling any composes that may be required, and unhandled cases
are already rejected later. Lift this restriction to permit composes
and reg_sequence subregisters later.
[MLIR][OpenMP] Minor improvements to BlockArgOpenMPOpInterface, NFC (#130789)
This patch introduces a use for the new `getBlockArgsPairs` to avoid
having to manually list each applicable clause.
Also, the `numClauseBlockArgs()` function is introduced, which
simplifies the implementation of the interface's verifier and enables
better memory handling within `getBlockArgsPairs`.
AMDGPU: Fix broken negative test from ancient times (#131106)
Before the dawn of civilization, instructions were printed in all
caps using the raw tablegen pseudo-names. This -NOT check was looking
for that, instead of the actual ISA output. Just switch to using generated
checks. Also replace a use of undef.
[clang][NFCI] Fix getGridValues for unsupported targets (#131023)
I broke this in
https://github.com/llvm/llvm-project/commit/f3cd2238383f695c719e7eab6aebec828781ec91.
I should have added this to the `SPIRV64` subclass, but I accidentally
added it to base `TargetInfo`.
Using an unsupported target should error in the driver well before this,
though.
Signed-off-by: Sarnie, Nick <nick.sarnie at intel.com>
[VPlan] Restrict hoisting of broadcast operations using VPDominatorTree (#117138)
This patch restricts broadcast operations from being hoisted to the vector
preheader unless the basic block that defines the broadcasted value properly
dominates the vector preheader.
This prevents potential use-before-definition issues when the broadcasted
value is defined within the plan. VPDominatorTree is used to confirm this
restriction while still allowing safe hoisting for broadcasted values defined
outside the plan.
Issue https://github.com/llvm/llvm-project/issues/117139
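The hoisting rule can be illustrated on a toy dominator tree. This is a sketch under assumptions, not VPlan code: blocks are plain integers, `idom[b]` gives the immediate dominator of block `b` (the entry block is its own idom), and `ProperlyDominates` and `CanHoistBroadcast` are hypothetical names.

```cpp
#include <vector>

// True if block `a` dominates block `b` and a != b.
bool ProperlyDominates(const std::vector<int> &idom, int a, int b) {
  if (a == b)
    return false;
  int cur = b;
  while (idom[cur] != cur) { // walk up until the entry block
    cur = idom[cur];
    if (cur == a)
      return true;
  }
  return false;
}

// A broadcast may move to the preheader if its operand is defined outside
// the plan, or in a block that properly dominates the preheader.
bool CanHoistBroadcast(const std::vector<int> &idom, int def_block,
                       int preheader, bool defined_outside_plan) {
  if (defined_outside_plan)
    return true;
  return ProperlyDominates(idom, def_block, preheader);
}
```

With `idom = {0, 0, 1}` (entry 0, preheader 1, loop body 2), a value defined in block 0 can be hoisted, while a value defined in block 2 cannot, since hoisting it would place the use before the definition.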
[X86] combineConcatVectorOps - add outstanding TODOs for missing op concatenation cases. NFC.
Keep track of the remaining issues - many of these are interrelated, making them difficult to deal with one at a time.
[mlir] Fix DistinctAttributeUniquer deleting attribute storage when crash reproduction is enabled (#128566)
Currently, `DistinctAttr` uses an allocator wrapped in a
`ThreadLocalCache` to manage attribute storage allocations. This ensures
all allocations are freed when the allocator is destroyed.
However, this setup can cause use-after-free errors when
`mlir::PassManager` runs its passes on a separate thread as a result of
crash reproduction being enabled. Distinct attribute storages are
created in the child thread's local storage and freed once the thread
joins. Attempting to access these attributes after this can result in
segmentation faults, such as during printing or alias analysis.
Example: This invocation of `mlir-opt` demonstrates the segfault issue
due to distinct attributes being created in a child thread and their
storage being freed once the thread joins:
```
mlir-opt --mlir-pass-pipeline-crash-reproducer=. --test-distinct-attrs mlir/test/IR/test-builtin-distinct-attrs.mlir
```
[12 lines not shown]