LLVM/project 5d281a4llvm/lib/Transforms/Scalar LoopInterchange.cpp, llvm/test/Transforms/LoopInterchange many-load-stores.ll

[LoopInterchange] Constrain number of load/stores in a loop (#118973)

In the current state of the code, the transform computes entries for the
dependency matrix until it reaches `MaxMemInstrCount`, which is 100. After
the 99th entry it terminates, so the work done up to that point is wasted
compile time.

It would be nice if we could compute the total number of entries upfront
and exit early if the number of entries exceeds 100. However, computing the
number of entries is not always possible, as it depends on two factors:
1. Number of load-store pairs in a loop.
2. Number of common loop levels for each pair.

This patch constrains the whole computation by the number of load and
store instructions in the loop.

In another approach, I experimented with computing factor 1 and
constraining the number of pairs, but that did not lead to any additional
benefit in terms of compile time. However, when other issues are fixed, I
can revisit this approach.
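
The early exit described above can be pictured with a toy model. This is only a sketch, not the patch itself: `Kind`, `collectMemInstrs`, and the exact comparison against the limit are assumptions made for illustration, and the real pass works on LLVM instructions rather than an enum.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy stand-in for an instruction; only the memory-access kind matters here.
enum class Kind { Load, Store, Other };

// Assumed limit, mirroring the value 100 mentioned in the commit message.
constexpr std::size_t MaxMemInstrCount = 100;

// Collect the indices of loads and stores, giving up as soon as the count
// exceeds the limit so that no pairwise dependence work is started at all.
std::optional<std::vector<std::size_t>>
collectMemInstrs(const std::vector<Kind> &Loop) {
  std::vector<std::size_t> MemInstrs;
  for (std::size_t I = 0; I < Loop.size(); ++I) {
    if (Loop[I] == Kind::Other)
      continue;
    MemInstrs.push_back(I);
    if (MemInstrs.size() > MaxMemInstrCount)
      return std::nullopt; // too many loads/stores: bail out early
  }
  return MemInstrs;
}
```

A caller would treat an empty optional as "do not attempt the transform" and skip building the dependency matrix entirely.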
DeltaFile
+260-0llvm/test/Transforms/LoopInterchange/many-load-stores.ll
+29-13llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+289-132 files

LLVM/project afced70flang/test/Lower/OpenMP/Todo allocate-clause-allocator.f90

[OpenMP][Flang] Workaround omp_lib error (#123666)

It appears that omp_lib is not correctly (or maybe not at all?) found
from the build directory. This made a few buildbots break after
[PR#121356](https://github.com/llvm/llvm-project/pull/121356) landed.
This is a workaround to unblock the buildbots.

https://lab.llvm.org/staging/#/builders/130/builds/12654
https://lab.llvm.org/buildbot/#/builders/140/builds/15102
https://lab.llvm.org/staging/#/builders/105/builds/13855
DeltaFile
+2-1flang/test/Lower/OpenMP/Todo/allocate-clause-allocator.f90
+2-11 files

LLVM/project c5caf56llvm/lib/Target/AMDGPU AMDGPUISelDAGToDAG.cpp, llvm/test/CodeGen/AMDGPU shufflevector.v4i64.v3i64.ll shufflevector.v4p0.v3p0.ll

AMDGPU: Make vector_shuffle legal for v2i32 with v_pk_mov_b32

For VALU shuffles, this saves an instruction in some cases.
DeltaFile
+485-272llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v3i64.ll
+485-272llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v3p0.ll
+112-160llvm/test/CodeGen/AMDGPU/shufflevector.v2f32.v8f32.ll
+112-160llvm/test/CodeGen/AMDGPU/shufflevector.v2i32.v8i32.ll
+112-160llvm/test/CodeGen/AMDGPU/shufflevector.v2p3.v8p3.ll
+114-0llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+1,420-1,02413 files not shown
+1,694-1,37819 files

LLVM/project 3274bf6offload/DeviceRTL/include Synchronization.h, offload/DeviceRTL/src Synchronization.cpp

[OpenMP] Make each atomic helper take an atomic scope argument (#122786)

Summary:
Right now we just default to device for each type, and mix an ad-hoc
scope with the one used by the compiler's builtins. Unify this and make
each version take the scope optionally.

For @ronlieb, this will remove the need for `add_system` in the fork as
well as the extra `cas` with system scope, just pass `system`.
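
As a rough illustration of the API shape, here is a host-side sketch in which the scope is an optional trailing argument. The names `MemScope` and `atomicAdd` are hypothetical and are not the DeviceRTL declarations; `std::atomic` has no notion of scope, so the argument is only threaded through.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical scope enum; the real DeviceRTL names and values may differ.
enum class MemScope { System, Device, Workgroup, Single };

// One helper with the scope as an optional trailing argument that defaults
// to device scope. On the host there is no scope concept, so this sketch
// only shows how callers would select a scope.
template <typename T>
T atomicAdd(std::atomic<T> &Addr, T Val,
            std::memory_order Order = std::memory_order_seq_cst,
            MemScope Scope = MemScope::Device) {
  (void)Scope; // a GPU implementation would pass this to a scoped builtin
  return Addr.fetch_add(Val, Order);
}
```

With this shape, a separate system-scope variant is unnecessary: a caller passes `MemScope::System` as the last argument instead.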
DeltaFile
+61-60offload/DeviceRTL/include/Synchronization.h
+7-3offload/DeviceRTL/src/Synchronization.cpp
+68-632 files

LLVM/project 2d9f406offload/DeviceRTL/include LibC.h Debug.h, offload/DeviceRTL/src LibC.cpp State.cpp

[OpenMP] Adjust 'printf' handling in the OpenMP runtime (#123670)

Summary:
We used to avoid a lot of this stuff because we didn't properly handle
variadics in device code. That's been solved for now, so we can just
make an internal printf handler that forwards to the external `vprintf`
function. This is either provided by NVIDIA's SDK or by the GPU libc
implementation.

The main reason for doing this is because it prevents the stupid AMDGPU
printf pass from mangling our beautiful printfs!
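
The forwarding pattern itself is standard C. A minimal host-side sketch, assuming a hypothetical wrapper name `internalPrintf` (the runtime's actual function name and attributes are not shown in this message):

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>

// Internal variadic entry point: pack the arguments into a va_list and
// forward them to the externally provided vprintf.
int internalPrintf(const char *Format, ...) {
  va_list Args;
  va_start(Args, Format);
  int Written = std::vprintf(Format, Args);
  va_end(Args);
  return Written; // number of characters written, or negative on error
}
```

Because call sites only ever see the internal wrapper, a pass that pattern-matches direct `printf` calls has nothing to rewrite.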
DeltaFile
+19-26offload/DeviceRTL/src/LibC.cpp
+4-5offload/DeviceRTL/include/LibC.h
+4-4offload/DeviceRTL/src/State.cpp
+1-6offload/DeviceRTL/include/Debug.h
+2-2offload/DeviceRTL/src/Debug.cpp
+2-1offload/DeviceRTL/src/Parallelism.cpp
+32-446 files

LLVM/project 2d8035allvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp, llvm/test/CodeGen/PowerPC vector-reduce-fadd.ll

DAG: Fix vector_shuffle -> splat fold defining undef lanes

For shuffle vector splats with undef lanes in the mask,
this was introducing real values. Filter out build_vector
results based on the undef elements in the mask.
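
The intent can be modelled outside SelectionDAG. In this sketch, which is an illustration and not the DAGCombiner code, a mask entry of -1 stands for an undef lane and `std::nullopt` stands for an undef result element; `foldSplatShuffle` is a hypothetical name.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// A lane is either a concrete value or undef (std::nullopt).
using Lane = std::optional<int>;

// Fold a splat shuffle into an explicit vector. Mask entries below zero mean
// the lane is undef; those lanes stay undef instead of receiving the splat
// value.
std::vector<Lane> foldSplatShuffle(int SplatValue,
                                   const std::vector<int> &Mask) {
  std::vector<Lane> Result;
  Result.reserve(Mask.size());
  for (int M : Mask)
    Result.push_back(M < 0 ? Lane{} : Lane{SplatValue});
  return Result;
}
```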

This avoids AMDGPU test regressions in a future change.

test/CodeGen/X86/urem-seteq-illegal-types.ll looks worse
but I didn't investigate.
DeltaFile
+32-36llvm/test/CodeGen/X86/urem-seteq-vec-nonsplat.ll
+0-30llvm/test/CodeGen/WebAssembly/simd.ll
+9-15llvm/test/CodeGen/X86/vec_smulo.ll
+10-10llvm/test/CodeGen/PowerPC/vector-reduce-fadd.ll
+6-8llvm/test/CodeGen/X86/srem-seteq-vec-nonsplat.ll
+9-1llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+66-1003 files not shown
+76-1119 files

LLVM/project 585858allvm/test/CodeGen/AMDGPU shufflevector.v2i64.v8i64.ll shufflevector.v4i64.v4i64.ll

AMDGPU: Fix asm constraints in new shuffle tests

These passed prechecks but failed after cc5eba1737146a727a61b5dbe16d8c2ac453981e
DeltaFile
+4,313-5,067llvm/test/CodeGen/AMDGPU/shufflevector.v2i64.v8i64.ll
+3,841-5,495llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v4i64.ll
+3,841-5,495llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v4p0.ll
+4,333-4,276llvm/test/CodeGen/AMDGPU/shufflevector.v2i32.v8i32.ll
+4,333-4,276llvm/test/CodeGen/AMDGPU/shufflevector.v2f32.v8f32.ll
+4,333-4,276llvm/test/CodeGen/AMDGPU/shufflevector.v2p3.v8p3.ll
+24,994-28,88573 files not shown
+114,980-144,12179 files

LLVM/project c2aa11dflang/lib/Semantics check-omp-structure.cpp, flang/test/Lower/OpenMP task.f90

[Flang] Add LLVM lowering support for UNTIED clause in Task (#121052)

Implementation details:
The UNTIED clause is recognized by setting the flag to 0 in the default
case, or by performing a logical OR on the flag if other clauses are
specified, and this flag is passed as an argument to the
`__kmpc_omp_task_alloc` runtime call.
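
One way to read the flag computation is sketched below. The bit values, the `TaskClauses` struct, and `computeTaskFlags` are assumptions made for illustration; they are not taken from the patch or from the runtime headers.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative bit assignments; the runtime's real encoding is an assumption.
constexpr std::uint32_t TiedBit = 1u << 0;
constexpr std::uint32_t FinalBit = 1u << 1;
constexpr std::uint32_t MergeableBit = 1u << 2;

// Hypothetical summary of the clauses present on a task construct.
struct TaskClauses {
  bool Untied = false;
  bool Final = false;
  bool Mergeable = false;
};

// An untied task starts from 0 (the tied bit is left clear); bits for any
// other clauses are then OR-ed in. The result would be passed as the flags
// argument of the task allocation call.
std::uint32_t computeTaskFlags(const TaskClauses &C) {
  std::uint32_t Flags = C.Untied ? 0u : TiedBit;
  if (C.Final)
    Flags |= FinalBit;
  if (C.Mergeable)
    Flags |= MergeableBit;
  return Flags;
}
```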


Resubmitting the PR with fix for the failure, as it was reverted here:
927a70daf31b1610627f346b0dc140eda72144b9
and previously merged here: https://github.com/llvm/llvm-project/pull/115283
DeltaFile
+34-0flang/lib/Semantics/check-omp-structure.cpp
+28-0flang/test/Semantics/OpenMP/task-untied01.f90
+13-11mlir/test/Target/LLVMIR/openmp-todo.mlir
+17-0flang/test/Lower/OpenMP/task.f90
+0-13flang/test/Lower/OpenMP/Todo/task_untied.f90
+12-0mlir/test/Target/LLVMIR/openmp-llvm.mlir
+104-242 files not shown
+106-258 files

LLVM/project 9d9c561llvm/lib/Target/ARM ARMBaseInstrInfo.cpp ARMFrameLowering.cpp

[ARM] Use MCRegister instead of unsigned. NFC

Primarily around uses of getSubReg/getSuperReg.
DeltaFile
+29-25llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
+12-12llvm/lib/Target/ARM/ARMFrameLowering.cpp
+9-9llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
+4-4llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp
+4-3llvm/lib/Target/ARM/A15SDOptimizer.cpp
+2-2llvm/lib/Target/ARM/ARMBaseInstrInfo.h
+60-551 files not shown
+61-567 files

LLVM/project 02bd6cbbolt/lib/Profile DataAggregator.cpp

= nullptr

Created using spr 1.3.4
DeltaFile
+1-1bolt/lib/Profile/DataAggregator.cpp
+1-11 files

LLVM/project 7786266llvm/test/CodeGen/AMDGPU shufflevector.v2i64.v8i64.ll shufflevector.v2f16.v8f16.ll

AMDGPU: Expand shuffle testing with generated tests (#123574)

Add some generated tests with every shuffle permutation
for relevant vector element types and sizes. Not sure if this
is going overboard with the number of tests. I pruned out the largest
cases (16 and 32-bit cases are impractically large), and there's
redundancy when testing the pointer cases (at least for SelectionDAG).

This uses inline assembly to produce sample values because of how the
ABI is lowered when using a function argument. Since we break all
arguments into 32-bit pieces, a shuffle never ends up forming. We
need separate handling to reconstruct shuffles involving physical
registers in ABI contexts.

I wrote a small tool to generate these, so I can easily change the
exact test body. Not sure if it's worth posting anywhere.

This is in preparation for making better use of v_pk_mov_b32,
v_mov_b64 and s_mov_b64 in shuffles.
DeltaFile
+31,395-0llvm/test/CodeGen/AMDGPU/shufflevector.v2i64.v8i64.ll
+27,671-0llvm/test/CodeGen/AMDGPU/shufflevector.v2f16.v8f16.ll
+27,671-0llvm/test/CodeGen/AMDGPU/shufflevector.v2bf16.v8bf16.ll
+27,249-0llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v4p0.ll
+27,249-0llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v4i64.ll
+27,155-0llvm/test/CodeGen/AMDGPU/shufflevector.v2i16.v8i16.ll
+168,390-073 files not shown
+900,397-079 files

LLVM/project e87b843clang/docs ClangOffloadBundler.rst ReleaseNotes.rst, clang/include/clang/Driver OffloadBundler.h

Reland [OffloadBundler] Compress bundles over 4GB (#122307)

Reland the patch after fixing the lit test.
DeltaFile
+106-32clang/lib/Driver/OffloadBundler.cpp
+39-11clang/include/clang/Driver/OffloadBundler.h
+24-0clang/test/Driver/clang-offload-bundler-zlib.c
+2-0clang/docs/ClangOffloadBundler.rst
+1-0clang/docs/ReleaseNotes.rst
+172-435 files

LLVM/project abbfed9llvm/lib/Target/X86 X86LowerAMXType.cpp, llvm/test/CodeGen/X86 amx-fp8-internal.ll

[X86][AMX] Fix handling of AMX-FP8 internal intrinsics (#123540)

This is to fix #123410.
DeltaFile
+88-0llvm/test/CodeGen/X86/amx-fp8-internal.ll
+5-1llvm/lib/Target/X86/X86LowerAMXType.cpp
+93-12 files

LLVM/project b45072dllvm/lib/Target/SPIRV SPIRVBuiltins.cpp

[SPIR-V] Fix type compatibility in memory order comparisons (#123676)

Fixed a type mismatch issue in the comparison of std::memory_order with
integers.

This fixes an issue reported by clang-debian-cpp20 buildbot for
https://github.com/llvm/llvm-project/pull/123654
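
For context, this class of error arises because `std::memory_order` became a scoped enumeration in C++20, so it no longer compares against integers implicitly. A minimal sketch of an explicit conversion that compiles under both C++17 and C++20; `isAcquire` is a hypothetical helper, not the code in SPIRVBuiltins.cpp:

```cpp
#include <atomic>
#include <cassert>

// Under C++20, `RawOrder == std::memory_order_acquire` with an integer
// RawOrder does not compile. Converting explicitly makes both sides the
// same type under either standard.
bool isAcquire(int RawOrder) {
  return static_cast<std::memory_order>(RawOrder) ==
         std::memory_order_acquire;
}
```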
DeltaFile
+3-4llvm/lib/Target/SPIRV/SPIRVBuiltins.cpp
+3-41 files

LLVM/project 72c560dclang/docs ClangOffloadBundler.rst ReleaseNotes.rst, clang/include/clang/Driver OffloadBundler.h

Revert "[OffloadBundler] Compress bundles over 4GB (#122307)"

Revert due to a buildbot failure:

https://lab.llvm.org/buildbot/#/builders/144/builds/16114

This reverts commit 4e2efc3bd500836d0fa977d6e257ffee2c92e178.
DeltaFile
+32-106clang/lib/Driver/OffloadBundler.cpp
+11-39clang/include/clang/Driver/OffloadBundler.h
+0-24clang/test/Driver/clang-offload-bundler-zlib.c
+0-2clang/docs/ClangOffloadBundler.rst
+0-1clang/docs/ReleaseNotes.rst
+43-1725 files

LLVM/project 4e2efc3clang/docs ClangOffloadBundler.rst ReleaseNotes.rst, clang/include/clang/Driver OffloadBundler.h

[OffloadBundler] Compress bundles over 4GB (#122307)

Added initial support for version 3 of the compressed offload bundle
format, which uses 64-bit fields for Total File Size and Uncompressed
Binary Size. This enables support for files larger than 4GB. The support
is currently experimental and can be enabled by setting the environment
variable `COMPRESSED_BUNDLE_FORMAT_VERSION=3`.
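
The 4GB boundary follows directly from the field width. The sketch below only illustrates that arithmetic; the helper names are hypothetical and the bundle header layout itself is not modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// A 32-bit size field can describe at most UINT32_MAX bytes (just under
// 4GB); anything larger needs a 64-bit field.
bool fitsIn32BitSizeField(std::uint64_t SizeInBytes) {
  return SizeInBytes <= std::numeric_limits<std::uint32_t>::max();
}

// Hypothetical check: true when either size would overflow a 32-bit field,
// i.e. when the 64-bit fields of the version 3 format are required.
bool needs64BitSizeFields(std::uint64_t TotalFileSize,
                          std::uint64_t UncompressedSize) {
  return !fitsIn32BitSizeField(TotalFileSize) ||
         !fitsIn32BitSizeField(UncompressedSize);
}
```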
DeltaFile
+106-32clang/lib/Driver/OffloadBundler.cpp
+39-11clang/include/clang/Driver/OffloadBundler.h
+24-0clang/test/Driver/clang-offload-bundler-zlib.c
+2-0clang/docs/ClangOffloadBundler.rst
+1-0clang/docs/ReleaseNotes.rst
+172-435 files

LLVM/project 271b338llvm/test/TableGen generic-tables-instruction.td generic-tables.td, llvm/utils/TableGen SearchableTableEmitter.cpp

[TableGen][NFC] Factor early-out range check. (#123645)

Combine the EarlyOut and IsContiguous range check.
Also avoid "comparison is always false" warnings in emitted code when
the lower-bound check is against 0.
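
The warning comes from emitting a check such as `Key < 0` for an unsigned key. A small sketch of how a generator could skip that comparison; `emitRangeCheck` and the emitted text are illustrative and do not reproduce SearchableTableEmitter's actual output.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Build the text of an early-out range check for an unsigned key. When the
// lower bound is 0, `Key < 0` can never be true, so only the upper-bound
// comparison is emitted.
std::string emitRangeCheck(const std::string &Key, std::uint64_t Lo,
                           std::uint64_t Hi) {
  assert(Lo <= Hi && "empty range");
  std::string Cond;
  if (Lo != 0)
    Cond = "(" + Key + " < " + std::to_string(Lo) + ") || ";
  Cond += "(" + Key + " > " + std::to_string(Hi) + ")";
  return "if (" + Cond + ")\n  return nullptr;";
}
```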
DeltaFile
+27-31llvm/utils/TableGen/SearchableTableEmitter.cpp
+3-6llvm/test/TableGen/generic-tables-instruction.td
+1-2llvm/test/TableGen/generic-tables.td
+31-393 files

LLVM/project a0c6811.github/workflows build-ci-container.yml

[Github] Fix container push job

This patch fixes a typo impacting functionality and also adds the relevant
variables to the step outputs list so they can actually get picked up by the
push container step.
DeltaFile
+3-1.github/workflows/build-ci-container.yml
+3-11 files

LLVM/project f38af02.github/workflows build-ci-container.yml, .github/workflows/containers/github-action-ci Dockerfile

testing
DeltaFile
+0-41.github/workflows/containers/github-action-ci/Dockerfile
+4-13.github/workflows/build-ci-container.yml
+4-542 files

LLVM/project 22b1a12.github/workflows build-ci-container.yml

[Github] Fix container push job

There were a couple of typos impacting the functionality. This patch fixes them.
DeltaFile
+1-1.github/workflows/build-ci-container.yml
+1-11 files

LLVM/project a3beb7d.github/workflows release-binaries.yml release-binaries-all.yml

Workflows: Drop Windows release builds and use more powerful runners for others (#117111)

We have community provided Windows builds that are better than what we
can build on GitHub. For the Linux/X86 builds and Mac/Aarch64 builds we
will use depot runners, for Mac/X86 we will use the larger GitHub
runners.
DeltaFile
+45-187.github/workflows/release-binaries.yml
+0-1.github/workflows/release-binaries-all.yml
+45-1882 files

LLVM/project b6287fdllvm/include/llvm/BinaryFormat DXContainerConstants.def, llvm/lib/Target/DirectX DXILShaderFlags.cpp

[DirectX] Set the EnableRawAndStructuredBuffers shader flag (#122667)

When raw or structured buffers are used, we need to set the DXIL flag
saying so.

Fixes #122663.
DeltaFile
+42-0llvm/test/CodeGen/DirectX/ShaderFlags/raw-and-structured-buffers.ll
+11-0llvm/lib/Target/DirectX/DXILShaderFlags.cpp
+1-1llvm/include/llvm/BinaryFormat/DXContainerConstants.def
+54-13 files

LLVM/project 06c54bclldb/docs/use formatting.rst, lldb/include/lldb/Core FormatEntity.h

[lldb] Implement ${target.file} format variable (#123431)

Implements a format variable to print the basename and full path to the
current target.
DeltaFile
+15-1lldb/source/Core/FormatEntity.cpp
+6-2lldb/docs/use/formatting.rst
+3-0lldb/unittests/Core/FormatEntityTest.cpp
+1-0lldb/include/lldb/Core/FormatEntity.h
+25-34 files

LLVM/project 3f0ac46.github/workflows spirv-tests.yml, llvm Maintainers.md CMakeLists.txt

Revert "[SPIR-V] Add SPIRV to LLVM_ALL_TARGETS (reapply)" (#123674)

Reverts llvm/llvm-project#123654 due to buildbot issue
DeltaFile
+2-5llvm/Maintainers.md
+0-6llvm/docs/ReleaseNotes.md
+1-1.github/workflows/spirv-tests.yml
+1-1llvm/CMakeLists.txt
+4-134 files

LLVM/project f427fef.github/workflows spirv-tests.yml, llvm Maintainers.md CMakeLists.txt

[SPIR-V] Add SPIRV to LLVM_ALL_TARGETS (reapply) (#123654)

This commit promotes the SPIR-V backend from experimental to official
status. As a result, SPIR-V will be built by default, simplifying
integration and increasing accessibility for downstream projects.

Discussion and RFC on Discourse:
https://discourse.llvm.org/t/rfc-promoting-spir-v-to-an-official-target/83614

The PR reapplies the original patch
https://github.com/llvm/llvm-project/pull/119653, reverted due to
buildbot failures.
DeltaFile
+5-2llvm/Maintainers.md
+6-0llvm/docs/ReleaseNotes.md
+1-1.github/workflows/spirv-tests.yml
+1-1llvm/CMakeLists.txt
+13-44 files

LLVM/project 7d01a8fllvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/X86 multi-node-reuse-in-bv.ll

[SLP]Fix vector factor for repeated node for bv

When adding a node vector that is already used in the shuffle for a
buildvector, the vector factor needs to be calculated from all vectors,
not only this single vector, to avoid an incorrect result. The stability
of the reused-entries detection also needs to be increased to avoid a
mismatch between cost estimation and codegen.

Fixes #123639
DeltaFile
+9-4llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+2-2llvm/test/Transforms/SLPVectorizer/X86/multi-node-reuse-in-bv.ll
+11-62 files

LLVM/project 5e4c34allvm/test/Transforms/SLPVectorizer/X86 multi-node-reuse-in-bv.ll

[SLP][NFC]Add a test with incorrect length and cost for repeated matching node
DeltaFile
+105-0llvm/test/Transforms/SLPVectorizer/X86/multi-node-reuse-in-bv.ll
+105-01 files

LLVM/project 697c188llvm/lib/Target/AMDGPU AMDGPULowerBufferFatPointers.cpp, llvm/test/CodeGen/AMDGPU buffer-fat-pointers-contents-legalization.ll lower-buffer-fat-pointers-contents-legalization.ll

Reapply "[AMDGPU] Handle natively unsupported types in addrspace(7) lowering" (#123660)

(#123657)

This reverts commit 64749fb01538fba2b56d9850497d5f3a626cabc2.

Adds a constructor to VecSlice to address the failure
DeltaFile
+3,998-0llvm/test/CodeGen/AMDGPU/buffer-fat-pointers-contents-legalization.ll
+912-386llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-contents-legalization.ll
+564-3llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
+11-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.nxv2i32.fail.ll
+6-1llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-calls.ll
+6-1llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-unoptimized-debug-data.ll
+5,497-3916 files

LLVM/project bd5e12ellvm/lib/Transforms/Vectorize VPlanUtils.h

[VPlan] Don't retrieve Def unnecessarily in isUniformAfterVector (NFC).

dyn_cast for recipes takes VPValues, so avoid calling getDefiningRecipe
unnecessarily.
DeltaFile
+7-8llvm/lib/Transforms/Vectorize/VPlanUtils.h
+7-81 files

LLVM/project 2cfdddalld/COFF DLL.h Writer.cpp

[LLD][COFF] Simplify creation of .edata chunks (NFC) (#123651)

Since commit dadc6f2488684, only the constructor of the `EdataContents`
class is used. Replace it with a function and skip the call when using a
custom `.edata` section.
DeltaFile
+2-14lld/COFF/DLL.h
+4-3lld/COFF/Writer.cpp
+1-1lld/COFF/DLL.cpp
+7-183 files