[NVPTX] Remove `NVPTX::IMAD` opcode, and rely on intruction selection only (#121724)
I noticed that NVPTX will sometimes emit `mad.lo` to multiply by 1, e.g.
in https://gcc.godbolt.org/z/4j47Y9W4c.
This happens when DAGCombiner operates on the add before the mul, so the
imad contraction happens regardless of whether the mul could have been
simplified.
To fix this, I remove `NVPTXISD::IMAD` and only combine to mad during
selection. This allows the default DAGCombiner patterns to simplify
the graph without any NVPTX-specific intervention.
[cmake] Serialize native builds for Make generator (#121021)
The build system is fragile by allowing multiple invocation of
subprocess builds in the native folder for Make generator.
For example, during sub-invocation of the native llvm-config,
llvm-min-tblgen is also built. If there is another sub-invocation of the
native llvm-min-tblgen build running in parallel, they may overwrite
each other's build results, and may lead to errors like "Text file
busy".
This patch adds a cmake script that uses file lock to serialize all
native builds for Make generator.
[PGO][AIX] Disable multi-process continuous mode test in 32-bit
In PGO continuous mode, we mmap the profile file into shared memory, which
allows multiple processes to be updating the same memory.
The -fprofile-update=atomic option forces the counter increments to be atomic,
but the counter size is always 64-bit (in -m32 and -m64), so in 32-bit mode the
atomic operations are function calls to libatomic.a and these function calls use
locks.
The lock based libatomic.a functions are per-process, so two processes will race
on the same shared memory because each will acquire their own lock.
[Hexagon] Add default clang symlinks to CLANG_LINKS_TO_CREATE (#123011)
Since this cache value overrides the defaults, we end up with `clang`
linked to `clang-20`, and some `${triple}-clang*` links, but we're
missing `clang++`. This makes for a toolchain with inconsistent behavior
when used in someone's `$PATH`.
We'll add the default symlinks to our list so that C and C++ programs
are both built as expected when `clang` and `clang++` are invoked.
[APINotes] Avoid duplicated attributes for class template instantiations
If a C++ class template is annotated via API Notes, the instantiations
had the attributes repeated twice. This is because Clang was adding the
attribute twice while processing the same class template. This change
makes sure we don't try to add attributes from API Notes twice.
There is currently no way to annotate specific instantiations using API
Notes.
rdar://142539959
[compiler-rt] Install libc++ and libc++abi in build_symbolizer.sh
This ensures that the directory layout of the libc++/libc++abi matches exactly what we would get on a real installation.
Currently the build directory happens to match the install directory layout, but this will no longer be true in the future.
[OffloadBundler] Rework the ctor of `OffloadTargetInfo` to support generic target
The current parsing of target string assumes to be in a form of
`kind-triple-targetid:feature`, such as
`hipv4-amdgcn-amd-amdhsa-gfx1030:+xnack`. Specifically, the target id does not
contain any `-`, which is not the case for generic target. Also, a generic
target may contain one or more `-`, such as `gfx10-3-generic` and
`gfx12-generic`. As a result, we can no longer depend on `rstrip` to get things
work right. This patch reworks the logic to parse the target string to make it
more robust, as well as supporting generic target.
[clang][Serialization] Add the missing block info (#122976)
HEADER_SEARCH_ENTRY_USAGE and VFS_USAGE were missing from the block info
block. Add the missing info so `llvm-bcanalyzer` can read them
correctly.
[mlir] Make single value `ValueRange`s memory safer (#121996)
A very common mistake users (and yours truly) make when using
`ValueRange`s is assigning a temporary `Value` to it. Example:
```cpp
ValueRange values = op.getOperand();
apiThatUsesValueRange(values);
```
The issue is caused by the implicit `const Value&` constructor: As per
C++ rules a const reference can be constructed from a temporary and the
address of it taken. After the statement, the temporary goes out of
scope and `stack-use-after-free` error occurs.
This PR fixes that issue by making `ValueRange` capable of owning a
single `Value` instance for that case specifically. While technically a
departure from the other owner types that are non-owning, I'd argue that
this behavior is more intuitive for the majority of users that usually
don't need to care about the lifetime of `Value` instances.
[2 lines not shown]
[CMake] Remove some always-true HAVE_XXX_H
These are unneeded even on AIX, PURE_WINDOWS, and ZOS (per #104706)
* HAVE_ERRNO_H: introduced by 1a93330ffa2ae2aa0b49461f05e6f0d51e8443f8 (2009) but unneeded.
The guarded ABI is unconditionally used by lldb.
* HAVE_FCNTL_H
* HAVE_FENV_H
* HAVE_SYS_STAT_H
Pull Request: https://github.com/llvm/llvm-project/pull/123087
[coff] Don't try to write the obj if the assembler has errors (#123007)
The ASAN and MSAN tests have been failing after #122777 because some
fields are now set in `executePostLayoutBinding` which is skipped by the
assembler if it had errors but read in `writeObject`
Since the compilation has failed anyway, skip `writeObject` if the
assembler had errors.
[Driver] Fix a warning
This patch fixes:
clang/include/clang/Driver/Driver.h:82:3: error: definition of
implicit copy assignment operator for 'CUIDOptions' is deprecated
because it has a user-declared copy constructor
[-Werror,-Wdeprecated-copy]
[flang] Inline hlfir.matmul[_transpose]. (#122821)
Inlining `hlfir.matmul` as `hlfir.eval_in_mem` does not allow
to get rid of a temporary array in many cases, but it may still be
much better allowing to:
* Get rid of any overhead related to calling runtime MATMUL
(such as descriptors creation).
* Use CPU-specific vectorization cost model for matmul loops,
which Fortran runtime cannot currently do.
* Optimize matmul of known-size arrays by complete unrolling.
One of the drawbacks of `hlfir.eval_in_mem` inlining is that
the ops inside it with store memory effects block the current
MLIR CSE, so I decided to run this inlining late in the pipeline.
There is a source commen explaining the CSE issue in more detail.
Straightforward inlining of `hlfir.matmul` as an `hlfir.elemental`
is not good for performance, and I got performance regressions
with it comparing to Fortran runtime implementation. I put it
[6 lines not shown]