LLVM/project 96637b4clang/include/clang/AST DeclTemplate.h, clang/lib/AST DeclTemplate.cpp

[Clang] Improve `getReplacedTemplateParameterList()` const correctness (#131165)

DeltaFile
+1-1clang/include/clang/AST/DeclTemplate.h
+1-1clang/lib/AST/DeclTemplate.cpp
+2-22 files

LLVM/project 15a5b3allvm/lib/Target/AMDGPU AMDGPUInstructionSelector.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel inst-select-icmp.s16.mir

[AMDGPU][True16][CodeGen] gisel true16 for ICMP (#128913)

GlobalISel true16 selection for ICMP
DeltaFile
+280-57llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-icmp.s16.mir
+16-4llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+296-612 files

LLVM/project 01dd3f5flang/include/flang/Parser parse-tree.h, flang/lib/Lower/OpenMP OpenMP.cpp

[flang][OpenMP] Use OmpDirectiveSpecification in standalone directives

This uses OmpDirectiveSpecification in the rest of the standalone
directives.
DeltaFile
+92-33flang/lib/Semantics/check-omp-structure.cpp
+38-36flang/lib/Parser/openmp-parsers.cpp
+16-16flang/test/Parser/OpenMP/depobj-construct.f90
+13-18flang/lib/Lower/OpenMP/OpenMP.cpp
+15-15flang/lib/Parser/unparse.cpp
+12-11flang/include/flang/Parser/parse-tree.h
+186-12910 files not shown
+257-14616 files

LLVM/project c9d7f70clang/lib/Headers gpuintrin.h nvptxintrin.h

[Headers][NFC] Deduplicate gpu_match_ between targets via inlining (#131141)

Declare a few functions before including the target specific headers
then define a fallback_match_{any,all} used by amdgpu and by older
nvptx.

Fixes a minor bug on pre-volta where one of the four fallback paths was
missing a sync_lane.
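
As a rough illustration of what a match-any fallback computes, the sketch below simulates the lanes sequentially in Python. It is not the code in `gpuintrin.h`: the name `simulate_match_any` and the list-of-values lane representation are assumptions made for this example, and no lane synchronization is modelled.

```python
def simulate_match_any(values, lane_mask):
    """For each active lane, return the mask of active lanes holding the same value.

    `values[i]` is the value held by lane i; `lane_mask` has bit i set when
    lane i participates. Inactive lanes get a result of 0.
    """
    results = [0] * len(values)
    # Ignore mask bits that do not correspond to a simulated lane.
    remaining = lane_mask & ((1 << len(values)) - 1)
    while remaining:
        # Pick the lowest-numbered lane that has not been matched yet.
        first = (remaining & -remaining).bit_length() - 1
        match = 0
        for lane, value in enumerate(values):
            if (remaining >> lane) & 1 and value == values[first]:
                match |= 1 << lane
        # Every lane in the matching group receives the same mask.
        for lane in range(len(values)):
            if (match >> lane) & 1:
                results[lane] = match
        remaining &= ~match
    return results
```

Each loop iteration retires one group of equal-valued lanes, so the loop runs once per distinct value among the active lanes.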
DeltaFile
+81-1clang/lib/Headers/gpuintrin.h
+8-40clang/lib/Headers/nvptxintrin.h
+4-40clang/lib/Headers/amdgpuintrin.h
+93-813 files

LLVM/project 2044dd0compiler-rt/lib/profile InstrProfilingFile.c, lld/test/MachO start-end.s

[InstrProf] Remove -forder-file-instrumentation (#130192)

DeltaFile
+0-169llvm/lib/Transforms/Instrumentation/InstrOrderFile.cpp
+1-90compiler-rt/lib/profile/InstrProfilingFile.c
+18-36lld/test/MachO/start-end.s
+0-27llvm/include/llvm/Transforms/Instrumentation/InstrOrderFile.h
+0-23llvm/test/Instrumentation/InstrOrderFile/basic.ll
+0-22llvm/include/llvm/ProfileData/InstrProfData.inc
+19-36721 files not shown
+32-50227 files

LLVM/project 0ed5f9bmlir/lib/Dialect/Affine/Utils LoopUtils.cpp

[MLIR] NFC. Fix unused warning in affine loop utils
DeltaFile
+1-1mlir/lib/Dialect/Affine/Utils/LoopUtils.cpp
+1-11 files

LLVM/project 01aca42clang/include/clang/Driver Options.td, clang/lib/Driver/ToolChains Flang.cpp

[flang] Add support for -f[no-]verbose-asm (#130788)

This flag provides extra commentary in the assembly output.
DeltaFile
+16-0flang/test/Driver/verbose-asm.f90
+3-2clang/include/clang/Driver/Options.td
+4-0flang/lib/Frontend/CompilerInvocation.cpp
+3-0clang/lib/Driver/ToolChains/Flang.cpp
+3-0flang/include/flang/Frontend/TargetOptions.h
+3-0flang/lib/Frontend/FrontendActions.cpp
+32-26 files

LLVM/project 143bf95compiler-rt/lib/hwasan hwasan_globals.cpp

[hwasan] Don't check code model if there are no globals (#131152)

Currently, the code model check is always performed even if there are no
globals, because:
1) the HWASan compiler pass always leaves a note
2) the HWASan runtime always performs the check if there is a HWASan
globals note.
This unnecessarily adds a 2**32 byte size limit.

This patch elides the check if the globals note doesn't actually contain
globals, thus allowing larger libraries to be successfully instrumented
without globals.

DeltaFile
+9-4compiler-rt/lib/hwasan/hwasan_globals.cpp
+9-41 files

LLVM/project 64bca90flang/lib/Lower/OpenMP OpenMP.cpp, flang/lib/Parser openmp-parsers.cpp unparse.cpp

[flang][OpenMP] Use OmpDirectiveSpecification in simple directives

The `OmpDirectiveSpecification` contains directive name, the list of
arguments, and the list of clauses. It was introduced to store the
directive specification in METADIRECTIVE, and could be reused everywhere
a directive representation is needed.
In the long term this would unify the handling of common directive
properties, as well as creating actual constructs from METADIRECTIVE
by linking the contained directive specification with any associated
user code.
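
The shape described above (a directive name, a list of arguments, and a list of clauses) can be sketched as a small Python data class. This is only an illustration of the data layout: the class name `DirectiveSpecification`, its string-typed fields, and the `unparse` helper are assumptions for this example, not flang's parse-tree types.

```python
from dataclasses import dataclass, field


@dataclass
class DirectiveSpecification:
    """Hypothetical stand-in: directive name, arguments, and clauses."""
    name: str
    arguments: list = field(default_factory=list)
    clauses: list = field(default_factory=list)

    def unparse(self):
        # Render as a Fortran OpenMP sentinel line.
        text = "!$omp " + self.name
        if self.arguments:
            text += "(" + ", ".join(self.arguments) + ")"
        for clause in self.clauses:
            text += " " + clause
        return text
```

Because every directive shares one representation, a single routine such as `unparse` can handle all of them instead of one routine per directive kind.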
DeltaFile
+47-14flang/lib/Parser/openmp-parsers.cpp
+17-21flang/lib/Semantics/check-omp-structure.cpp
+1-33flang/lib/Parser/unparse.cpp
+15-8flang/lib/Semantics/resolve-names.cpp
+7-10flang/lib/Lower/OpenMP/OpenMP.cpp
+8-8flang/test/Parser/OpenMP/scan.f90
+95-949 files not shown
+131-12915 files

LLVM/project a9bb606llvm/test/Transforms/SeparateConstOffsetFromGEP/AMDGPU preserve-inbounds.ll

Simplify test case and add more
DeltaFile
+112-14llvm/test/Transforms/SeparateConstOffsetFromGEP/AMDGPU/preserve-inbounds.ll
+112-141 files

LLVM/project c957d90llvm/lib/Transforms/Scalar SeparateConstOffsetFromGEP.cpp, llvm/test/Transforms/SeparateConstOffsetFromGEP/AMDGPU preserve-inbounds.ll

[SeparateConstOffsetFromGEP] Preserve inbounds flag based on ValueTracking

If we know that the initial GEP was inbounds, and we change it to a
sequence of GEPs from the same base pointer where every offset is
non-negative, then the new GEPs are inbounds.
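
The argument can be checked on a simplified model in which an object is the byte range `[0, object_size]` (one past the end allowed) and a pointer is a byte offset into it. The Python sketch below is such a model only; the helper names are hypothetical and it does not use LLVM's ValueTracking.

```python
def intermediate_offsets(base_offset, steps):
    """Return the offset reached after each GEP in a chain starting at base_offset."""
    positions = []
    position = base_offset
    for step in steps:
        position += step
        positions.append(position)
    return positions


def all_inbounds(object_size, offsets):
    """True if every offset lies within the object or one past its end."""
    return all(0 <= offset <= object_size for offset in offsets)
```

With non-negative steps the partial sums increase monotonically from the base to the original result, so they stay inside the object whenever the original GEP did. A chain such as `[16, -4]` reaches the same final offset as `[8, 4]` but overshoots on the way, which is why a negative offset defeats the argument.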

For SWDEV-516125.
DeltaFile
+23-0llvm/test/Transforms/SeparateConstOffsetFromGEP/AMDGPU/preserve-inbounds.ll
+13-5llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
+8-8llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep-and-gvn.ll
+4-4llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep.ll
+48-174 files

LLVM/project 4a69fc8llvm/test/CodeGen/AMDGPU fold-gep-offset.ll

Add some addressing mode tests that show the result of dropping the inbounds flag on offset folding.
DeltaFile
+438-0llvm/test/CodeGen/AMDGPU/fold-gep-offset.ll
+438-01 files

LLVM/project 32cde78llvm/test/CodeGen/AMDGPU amdgpu-codegenprepare-idiv.ll srem.ll

AMDGPU: Move insertion into V2SCopies map

Insert the start instruction directly into the map before the uses. This
prevents improperly re-visiting sgpr->vgpr phi inputs multiple times which
would trigger a use after free.

I don't particularly trust the iteration scheme here. This is also
unnecessarily revisiting transitive users of a phi or reg_sequence for every
input operand, but I will address that separately.

Fixes #130646. I also believe it fixes #130119, although that test fails
less consistently for me.
DeltaFile
+1,782-2,024llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll
+1,199-1,387llvm/test/CodeGen/AMDGPU/srem.ll
+631-725llvm/test/CodeGen/AMDGPU/carryout-selection.ll
+387-485llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+309-318llvm/test/CodeGen/AMDGPU/idiv-licm.ll
+257-308llvm/test/CodeGen/AMDGPU/srem64.ll
+4,565-5,24733 files not shown
+7,414-8,11039 files

LLVM/project 0ef13b2llvm/test/CodeGen/AMDGPU flat-scratch.ll constant-address-space-32bit.ll, llvm/test/CodeGen/AMDGPU/GlobalISel flat-scratch.ll

Make sure that offsets are still folded...

...where they were folded previously, by manually applying the unsound
version of the transformation and, where necessary, increasing the
allocation size so that the transformed code is not statically
guaranteed to cause UB.
DeltaFile
+372-272llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch.ll
+286-195llvm/test/CodeGen/AMDGPU/flat-scratch.ll
+4-10llvm/test/CodeGen/AMDGPU/constant-address-space-32bit.ll
+662-4773 files

LLVM/project b003facflang/test/Fir omp-teams.fir, flang/test/Transforms stack-arrays-hlfir.f90

[flang][OpenMP] Add `OutlineableOpenMPOpInterface` to `omp.teams` (#131109)

Given the following input:
```fortran
program rep_loopbind
  implicit none
  integer :: i
  real :: priv_val

  !$omp teams private(priv_val)
    !$omp distribute
    do i=1,1000
    end do
  !$omp end teams
end program
```
the `AllocaOpConversion` pattern in `FIRToLLVMLowering` would **move**
the private allocations that belong to the `teams` directive (i.e. the
allocations needed for the private copies of `priv_val` and the loop's

    [5 lines not shown]
DeltaFile
+38-0flang/test/Fir/omp-teams.fir
+1-1flang/test/Transforms/stack-arrays-hlfir.f90
+1-1mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+40-23 files

LLVM/project 7661526llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 vector-shuffle-512-v32.ll

[X86] combineConcatVectorOps - extend PSHUFD/LW/HW handling to support 512-bit types

VPSHUFD was already getting converted via the VPERMILPS AVX1 fallback
DeltaFile
+26-12llvm/test/CodeGen/X86/vector-shuffle-512-v32.ll
+5-3llvm/lib/Target/X86/X86ISelLowering.cpp
+31-152 files

LLVM/project 57e22f5llvm/test/CodeGen/X86 vector-shuffle-512-v32.ll

[X86] Add tests showing failure to concat matching VPSHUFLW/HW ymm shuffles.
DeltaFile
+22-0llvm/test/CodeGen/X86/vector-shuffle-512-v32.ll
+22-01 files

LLVM/project 85318ballvm/lib/CodeGen MachineLateInstrsCleanup.cpp, llvm/test/CodeGen/SystemZ machine-latecleanup-kills.mir

[MachineLateInstrsCleanup] Handle multiple kills for a preceding definition. (#119132)

When removing a redundant definition in order to reuse an earlier
identical one it is necessary to remove any earlier kill flag as well.

Previously, the assumption has been that any register that kills the
defined Reg is enough to handle for this purpose, but this is actually
not quite enough. A kill of a super-register does not necessarily imply
that all of its subregs (including Reg) are defined at that point: a
partial definition of a register is legal. This means Reg may have been
killed earlier and is not live at that point.

This patch changes the tracking of kill flags to allow for multiple
flags to be removed: instead of remembering just the single / latest
kill flag, a vector is now used to track and remove them all.
TinyPtrVector seems ideal for this as there is only very rarely more
than one kill flag, and it doesn't seem to make much difference in
compile time.
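
The bookkeeping change can be sketched in Python as a map from a register to every instruction that carries a kill flag for it. This is a toy model: the `KillTracker` class and the dict-based instructions are assumptions for illustration, whereas the real pass works on machine operands and stores the entries in a TinyPtrVector.

```python
from collections import defaultdict


class KillTracker:
    """Remember all kill flags seen for a register, not just the latest one."""

    def __init__(self):
        # Register name -> list of instructions (dicts) that kill it.
        self.kills = defaultdict(list)

    def record_kill(self, reg, instr):
        self.kills[reg].append(instr)

    def clear_kills(self, reg):
        # Called when a later redundant definition of `reg` is removed and the
        # earlier definition is reused: every recorded kill flag must go.
        for instr in self.kills.pop(reg, []):
            instr["kills"].discard(reg)
```

A tracker that kept only the most recent entry would leave a stale kill flag on the earlier instruction in the two-kill case.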


    [11 lines not shown]
DeltaFile
+76-0llvm/test/CodeGen/SystemZ/machine-latecleanup-kills.mir
+24-18llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp
+100-182 files

LLVM/project 28ffa7fflang/test/Lower/OpenMP missing-inode.f90, mlir/lib/Target/LLVMIR/Dialect/OpenMP OpenMPToLLVMIRTranslation.cpp

[flang][OpenMP] Fix missing inode issue (#130798)

When outlining an offload region, Flang creates a unique name by
querying an inode ID. However, when the name of the actual source file
does not match the logical file in a `#line` preprocessor directive,
code-gen was failing as it could not determine the inode ID. This PR
checks for this condition and if the logical file name does not exist,
the inode is replaced with a hash value created from the source code
itself.
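
A minimal sketch of the fallback idea, written in Python rather than the C++ of the actual translation code. The function name `unique_file_id` is hypothetical, and the choice of SHA-256 truncated to 64 bits is an assumption; the commit message does not say which hash is used.

```python
import hashlib
import os


def unique_file_id(logical_path, source_text):
    """Return the file's inode ID, or a content hash if the file cannot be found."""
    try:
        return os.stat(logical_path).st_ino
    except OSError:
        # The logical file named by a `#line` directive may not exist on disk,
        # so fall back to an identifier derived from the source text.
        digest = hashlib.sha256(source_text.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")
```

The fallback stays deterministic for a given source text, so repeated compilations of the same input still produce the same outlined-region name.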
DeltaFile
+11-11mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+8-0flang/test/Lower/OpenMP/missing-inode.f90
+19-112 files

LLVM/project 237a910mlir/include/mlir/Dialect/OpenMP OpenMPClauses.td OpenMPOpsInterfaces.td, mlir/lib/Dialect/OpenMP/IR OpenMPDialect.cpp

[MLIR][OpenMP] Remove the ReductionClauseInterface, NFC (#130978)

This patch removes the `ReductionClauseInterface` and all definitions of
its associated `getAllReductionVars` method.

The method mandated by this interface is not used anywhere and the
conflicts its definition produces when multiple reduction clauses are
present in an operation result in a more convoluted operation
definition, so it seems better to remove it and only add something like
this if there's a clear advantage to it.
DeltaFile
+3-19mlir/include/mlir/Dialect/OpenMP/OpenMPClauses.td
+0-16mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td
+3-8mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+0-8mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+6-514 files

LLVM/project c3c97eallvm/lib/CodeGen PeepholeOptimizer.cpp, llvm/test/CodeGen/AMDGPU peephole-opt-regseq-removal.mir

PeepholeOpt: Do not skip reg_sequence sources with subregs (#125667)

Contrary to the comment, this particular code is not responsible
for handling any composes that may be required, and unhandled cases
are already rejected later. Lift this restriction to permit composes
and reg_sequence subregisters later.
DeltaFile
+49-3llvm/test/CodeGen/AMDGPU/peephole-opt-regseq-removal.mir
+1-3llvm/lib/CodeGen/PeepholeOptimizer.cpp
+50-62 files

LLVM/project 6ff33edmlir/docs/Dialects/OpenMPDialect _index.md, mlir/include/mlir/Dialect/OpenMP OpenMPOpsInterfaces.td

[MLIR][OpenMP] Minor improvements to BlockArgOpenMPOpInterface, NFC (#130789)

This patch introduces a use for the new `getBlockArgsPairs` to avoid
having to manually list each applicable clause.

Also, the `numClauseBlockArgs()` function is introduced, which
simplifies the implementation of the interface's verifier and enables
better memory handling within `getBlockArgsPairs`.
DeltaFile
+9-5mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td
+5-7mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+2-0mlir/docs/Dialects/OpenMPDialect/_index.md
+16-123 files

LLVM/project e3c80d4llvm/test/CodeGen/AMDGPU dead-machine-elim-after-dead-lane.ll

AMDGPU: Fix broken negative test from ancient times (#131106)

Before the dawn of civilization, instructions were printed in all
caps using the raw tablegen pseudo-names. This -NOT check was looking
for that, instead of the actual ISA output. Just switch to using generated
checks. Also replace a use of undef.
DeltaFile
+14-4llvm/test/CodeGen/AMDGPU/dead-machine-elim-after-dead-lane.ll
+14-41 files

LLVM/project 7a5e4f5clang/include/clang/Basic TargetInfo.h, clang/lib/Basic/Targets SPIR.h

[clang][NFCI] Fix getGridValues for unsupported targets (#131023)

I broke this in
https://github.com/llvm/llvm-project/commit/f3cd2238383f695c719e7eab6aebec828781ec91:
I should have added this to the `SPIRV64` subclass, but I accidentally
added it to the base `TargetInfo`.

Using an unsupported target should error in the driver way before this
though.

Signed-off-by: Sarnie, Nick <nick.sarnie at intel.com>
DeltaFile
+4-0clang/lib/Basic/Targets/SPIR.h
+1-1clang/include/clang/Basic/TargetInfo.h
+5-12 files

LLVM/project ffe202cllvm/lib/Transforms/Vectorize VPlan.cpp VPlanHelpers.h, llvm/test/Transforms/LoopVectorize/AArch64 extractvalue-no-scalarization-required.ll sve-widen-extractvalue.ll

Revert "[LV] Limits the splat operations be hoisted must not be defined by a recipe. (#117138)"

This reverts commit 1ff10fa82fff83bb2f0a5c1ffde6203b52bc9619.
DeltaFile
+2-11llvm/lib/Transforms/Vectorize/VPlan.cpp
+0-4llvm/lib/Transforms/Vectorize/VPlanHelpers.h
+2-2llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
+2-2llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-cost.ll
+0-2llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h
+1-1llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-extractvalue.ll
+7-226 files

LLVM/project 5d5e706llvm/lib/Transforms/Vectorize VPlan.cpp VPlanHelpers.h, llvm/test/Transforms/LoopVectorize/AArch64 extractvalue-no-scalarization-required.ll sve-widen-extractvalue.ll

[VPlan] Restrict hoisting of broadcast operations using VPDominatorTree (#117138)

This patch restricts broadcast operations from being hoisted to the vector
preheader unless the basic block that defines the broadcasted value properly
dominates the vector preheader.

This prevents potential use-before-definition issues when the broadcasted
value is defined within the plan. VPDominatorTree is used to confirm this
restriction while still allowing safe hoisting for broadcasted values defined
outside the plan.
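
The hoisting condition can be sketched in Python with a dominator tree stored as an immediate-dominator map. The block names and both helper functions are hypothetical and only model the rule stated above; they are not the VPDominatorTree API.

```python
def properly_dominates(idom, a, b):
    """True if block `a` is a strict ancestor of block `b` in the dominator tree."""
    node = idom.get(b)
    while node is not None:
        if node == a:
            return True
        node = idom.get(node)
    return False


def can_hoist_broadcast(idom, def_block, preheader):
    """Allow hoisting when the value is defined outside the plan (def_block is
    None) or its defining block properly dominates the vector preheader."""
    return def_block is None or properly_dominates(idom, def_block, preheader)
```

For example, with `idom = {"entry": None, "preheader": "entry", "body": "preheader"}`, a value defined in `body` cannot be hoisted into `preheader`, because it would be used before it is defined.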

Issue https://github.com/llvm/llvm-project/issues/117139
DeltaFile
+11-2llvm/lib/Transforms/Vectorize/VPlan.cpp
+4-0llvm/lib/Transforms/Vectorize/VPlanHelpers.h
+2-2llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
+2-2llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-cost.ll
+2-0llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h
+1-1llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-extractvalue.ll
+22-76 files

LLVM/project d4baf61llvm/lib/Target/X86 X86ISelLowering.cpp

[X86] combineConcatVectorOps - add outstanding TODOs for missing op concatenation cases. NFC.

Keep track of the remaining issues - many of these are inter-related making them difficult to deal with one at a time.
DeltaFile
+13-3llvm/lib/Target/X86/X86ISelLowering.cpp
+13-31 files

LLVM/project 0aa5ba4mlir/include/mlir/IR MLIRContext.h, mlir/lib/IR AttributeDetail.h MLIRContext.cpp

[mlir] Fix DistinctAttributeUniquer deleting attribute storage when crash reproduction is enabled (#128566)

Currently, `DistinctAttr` uses an allocator wrapped in a
`ThreadLocalCache` to manage attribute storage allocations. This ensures
all allocations are freed when the allocator is destroyed.

However, this setup can cause use-after-free errors when
`mlir::PassManager` runs its passes on a separate thread as a result of
crash reproduction being enabled. Distinct attribute storages are
created in the child thread's local storage and freed once the thread
joins. Attempting to access these attributes after this can result in
segmentation faults, such as during printing or alias analysis.

Example: This invocation of `mlir-opt` demonstrates the segfault issue
due to distinct attributes being created in a child thread and their
storage being freed once the thread joins:
```
mlir-opt --mlir-pass-pipeline-crash-reproducer=. --test-distinct-attrs mlir/test/IR/test-builtin-distinct-attrs.mlir
```

    [12 lines not shown]
DeltaFile
+42-3mlir/lib/IR/AttributeDetail.h
+22-0mlir/test/Dialect/LLVMIR/add-debuginfo-func-scope-with-crash-reproduction.mlir
+18-0mlir/test/IR/test-builtin-distinct-attrs-with-crash-reproduction.mlir
+9-0mlir/lib/Pass/PassCrashRecovery.cpp
+7-1mlir/lib/IR/MLIRContext.cpp
+8-0mlir/include/mlir/IR/MLIRContext.h
+106-46 files

LLVM/project c26ec7ellvm/utils/lit/lit reports.py

[llvm][lit] fix writing results to --time-trace-output file (#130845)

This patch fixes an issue introduced with commit:
https://github.com/llvm/llvm-project/commit/8507dbaec3f644b8a0c6291f097800d82a4f4b16
DeltaFile
+1-1llvm/utils/lit/lit/reports.py
+1-11 files

LLVM/project a58b8damlir/lib/Dialect/Linalg/Transforms Vectorization.cpp

Address review feedback
DeltaFile
+8-11mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+8-111 files