LLVM/project 7b3bbd8 llvm/test/CodeGen/X86 vector-interleaved-store-i64-stride-7.ll vector-interleaved-load-i16-stride-7.ll

Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"

This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.

Reverted due to various buildbot failures.
Delta  File
+12,000 -11,992  llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-7.ll
+11,074 -11,060  llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-7.ll
+9,482 -9,377  llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-7.ll
+9,170 -9,196  llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-8.ll
+8,655 -8,552  llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-7.ll
+9,509 -5,423  llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-6.ll
+59,890 -55,600  730 files not shown
+267,439 -259,303  736 files

LLVM/project 2501ae5 llvm/test/CodeGen/X86 vector-interleaved-store-i64-stride-7.ll vector-interleaved-load-i16-stride-7.ll

[CodeGen] Really renumber slot indexes before register allocation (#67038)

PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
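
To illustrate why stale list entries matter, here is a minimal Python sketch. It is a toy model, not LLVM's SlotIndexes implementation: the `Entry` class, the `INSTR_DIST` spacing, and both renumbering functions are hypothetical names introduced for illustration. It shows that numbering every list entry, including those whose instruction was erased, inflates the apparent distance between two live instructions, while numbering only live entries does not.

```python
from dataclasses import dataclass
from typing import Optional

INSTR_DIST = 16  # hypothetical spacing between consecutive indexes


@dataclass
class Entry:
    instr: Optional[str]  # None models an entry whose instruction was erased
    index: int = 0


def renumber_all(entries):
    """Number every list entry, including stale ones for erased instructions."""
    for i, e in enumerate(entries):
        e.index = i * INSTR_DIST


def renumber_live(entries):
    """Drop stale entries first, then number only the surviving instructions."""
    live = [e for e in entries if e.instr is not None]
    for i, e in enumerate(live):
        e.index = i * INSTR_DIST
    return live


def distance(entries, a, b):
    """Index distance between two named instructions."""
    idx = {e.instr: e.index for e in entries if e.instr is not None}
    return idx[b] - idx[a]


def make_entries():
    # 'def' and 'use' are separated by two erased instructions.
    return [Entry("def"), Entry(None), Entry(None), Entry("use")]


stale = make_entries()
renumber_all(stale)
clean = renumber_live(make_entries())
print(distance(stale, "def", "use"))  # 48: erased entries still take up slots
print(distance(clean, "def", "use"))  # 16: depends only on live instructions
```

In this model, any heuristic keyed on index distance sees a live range three times longer in the first case, purely because of editing history.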
Delta  File
+11,968 -11,976  llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-7.ll
+11,080 -11,094  llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-7.ll
+9,435 -9,540  llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-7.ll
+9,201 -9,175  llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-8.ll
+8,567 -8,670  llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-7.ll
+5,428 -9,514  llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-6.ll
+55,679 -59,969  730 files not shown
+259,910 -268,046  736 files

LLVM/project df017ba llvm/test/CodeGen/ARM fptosi-sat-scalar.ll fptoui-sat-scalar.ll, llvm/test/CodeGen/RISCV half-convert.ll half-round-conv-sat.ll

[TargetLowering] Don't use ISD::SELECT_CC in expandFP_TO_INT_SAT.

This function gets called for vectors and ISD::SELECT_CC was never
intended to support vectors. Some updates were made to support
it when this function started getting used for vectors.

Overall, using separate ISD::SETCC and ISD::SELECT looks like an
improvement even for scalar.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D149481
Delta  File
+733 -918  llvm/test/CodeGen/Thumb2/mve-fptosi-sat-vector.ll
+584 -849  llvm/test/CodeGen/ARM/fptosi-sat-scalar.ll
+546 -702  llvm/test/CodeGen/Thumb2/mve-fptoui-sat-vector.ll
+464 -651  llvm/test/CodeGen/ARM/fptoui-sat-scalar.ll
+287 -332  llvm/test/CodeGen/RISCV/half-convert.ll
+160 -160  llvm/test/CodeGen/RISCV/half-round-conv-sat.ll
+2,774 -3,612  7 files not shown
+3,257 -4,191  13 files

LLVM/project f0dd12e llvm/test/CodeGen/X86 tls.ll unfold-masked-merge-vector-variablemask.ll

[x86] use zero-extending load of a byte outside of loops too (2nd try)

The first attempt missed changing test files for tools
(update_llc_test_checks.py).

Original commit message:

This implements the main suggested change from issue #56498.
Using the shorter (non-extending) instruction with only
-Oz ("minsize") rather than -Os ("optsize") is left as a
possible follow-up.

As noted in the bug report, the zero-extending load may have
shorter latency/better throughput across a wide range of x86
micro-arches, and it avoids a potential false dependency.
The cost is an extra instruction byte.

This could cause perf ups and downs from secondary effects,
but I don't think it is possible to account for those in

    [6 lines not shown]
Delta  File
+754 -305  llvm/test/CodeGen/X86/tls.ll
+522 -522  llvm/test/CodeGen/X86/unfold-masked-merge-vector-variablemask.ll
+298 -298  llvm/test/CodeGen/X86/avx512vl-intrinsics-fast-isel.ll
+243 -243  llvm/test/CodeGen/X86/extract-bits.ll
+188 -188  llvm/test/CodeGen/X86/avx512-intrinsics-fast-isel.ll
+180 -180  llvm/test/CodeGen/X86/avx512-calling-conv.ll
+2,185 -1,736  205 files not shown
+3,834 -3,292  211 files

LLVM/project 95401b0 llvm/test/CodeGen/X86 tls.ll unfold-masked-merge-vector-variablemask.ll

Revert "[x86] use zero-extending load of a byte outside of loops too"

This reverts commit 9d1ea1774c51c44ddf0b5065bf600919988d7015.
Some tests of update_llc_test_checks.py were not updated.
Delta  File
+305 -754  llvm/test/CodeGen/X86/tls.ll
+522 -522  llvm/test/CodeGen/X86/unfold-masked-merge-vector-variablemask.ll
+298 -298  llvm/test/CodeGen/X86/avx512vl-intrinsics-fast-isel.ll
+243 -243  llvm/test/CodeGen/X86/extract-bits.ll
+188 -188  llvm/test/CodeGen/X86/avx512-intrinsics-fast-isel.ll
+180 -180  llvm/test/CodeGen/X86/avx512-calling-conv.ll
+1,736 -2,185  203 files not shown
+3,284 -3,826  209 files

LLVM/project 9d1ea17 llvm/test/CodeGen/X86 tls.ll unfold-masked-merge-vector-variablemask.ll

[x86] use zero-extending load of a byte outside of loops too

This implements the main suggested change from issue #56498.
Using the shorter (non-extending) instruction with only
-Oz ("minsize") rather than -Os ("optsize") is left as a
possible follow-up.

As noted in the bug report, the zero-extending load may have
shorter latency/better throughput across a wide range of x86
micro-arches, and it avoids a potential false dependency.
The cost is an extra instruction byte.
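
The false dependency mentioned above can be sketched with a toy register model in Python. This is an illustration only, not a model of any particular micro-architecture; the function names and the 32-bit register width are assumptions. A plain byte load writes only the low 8 bits, so the result still depends on the register's previous contents, whereas a zero-extending load overwrites the whole register.

```python
MASK32 = 0xFFFFFFFF


def load_byte_merge(old_reg: int, byte: int) -> int:
    """Byte load into the low 8 bits: the upper 24 bits keep their old value."""
    return ((old_reg & 0xFFFFFF00) | (byte & 0xFF)) & MASK32


def load_byte_zext(old_reg: int, byte: int) -> int:
    """Zero-extending byte load: the old register value is ignored entirely."""
    return byte & 0xFF


# Same loaded byte, two different previous register values.
for old in (0x00000000, 0xDEADBEEF):
    print(hex(load_byte_merge(old, 0x7F)), hex(load_byte_zext(old, 0x7F)))
```

Because the merged result changes with the old register value, hardware may have to wait for whatever produced that value; the zero-extended result has no such input.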

This could cause perf ups and downs from secondary effects,
but I don't think it is possible to account for those in
advance, and that will likely also depend on exact micro-arch.
This does bring LLVM x86 codegen more in line with existing
gcc codegen, so if problems are exposed they are more likely
to occur for both compilers.

Differential Revision: https://reviews.llvm.org/D129775
Delta  File
+754 -305  llvm/test/CodeGen/X86/tls.ll
+522 -522  llvm/test/CodeGen/X86/unfold-masked-merge-vector-variablemask.ll
+298 -298  llvm/test/CodeGen/X86/avx512vl-intrinsics-fast-isel.ll
+243 -243  llvm/test/CodeGen/X86/extract-bits.ll
+188 -188  llvm/test/CodeGen/X86/avx512-intrinsics-fast-isel.ll
+180 -180  llvm/test/CodeGen/X86/avx512-calling-conv.ll
+2,185 -1,736  203 files not shown
+3,826 -3,284  209 files

LLVM/project 655ba9c llvm/test/CodeGen/X86 frem.ll fpclamptosat_vec.ll

Reland "Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""""

This resolves problems reported in commit 1a20252978c76cf2518aa45b175a9e5d6d36c4f0.
1. Promote XINT_TO_FP nodes to float during lowering
2. Bail out of shuffle combine for f16 because the vector type is not legal in the version
Delta  File
+845 -737  llvm/test/CodeGen/X86/frem.ll
+489 -715  llvm/test/CodeGen/X86/fpclamptosat_vec.ll
+669 -434  llvm/test/CodeGen/X86/half.ll
+410 -629  llvm/test/CodeGen/X86/vector-half-conversions.ll
+475 -375  llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll
+380 -372  llvm/test/CodeGen/X86/fptoui-sat-vector-128.ll
+3,268 -3,262  44 files not shown
+5,160 -4,633  50 files

LLVM/project 1a20252 llvm/test/CodeGen/X86 frem.ll fpclamptosat_vec.ll

Revert "Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""""

This reverts commit 04a3d5f3a1193fb87576425a385aa0a6115b1e7c.

I see two more issues:

- uitofp/sitofp from i32/i64 to half now generates
  __floatsihf/__floatdihf, which exists in neither compiler-rt nor
  libgcc

- This crashes when legalizing the bitcast:
```
; RUN: llc < %s -mcpu=skx
define void @main.45(ptr nocapture readnone %retval, ptr noalias nocapture readnone %run_options, ptr noalias nocapture readnone %params, ptr noalias nocapture readonly %buffer_table, ptr noalias nocapture readnone %status, ptr noalias nocapture readnone %prof_counters) local_unnamed_addr {
entry:
  %fusion = load ptr, ptr %buffer_table, align 8
  %0 = getelementptr inbounds ptr, ptr %buffer_table, i64 1
  %Arg_1.2 = load ptr, ptr %0, align 8
  %1 = getelementptr inbounds ptr, ptr %buffer_table, i64 2

    [38 lines not shown]
Delta  File
+829 -937  llvm/test/CodeGen/X86/frem.ll
+661 -435  llvm/test/CodeGen/X86/fpclamptosat_vec.ll
+485 -571  llvm/test/CodeGen/X86/half.ll
+637 -418  llvm/test/CodeGen/X86/vector-half-conversions.ll
+373 -473  llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll
+369 -377  llvm/test/CodeGen/X86/fptoui-sat-vector-128.ll
+3,354 -3,211  44 files not shown
+4,799 -5,136  50 files

LLVM/project 04a3d5f llvm/test/CodeGen/X86 frem.ll fpclamptosat_vec.ll

Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"""

Fix the crash on lowering X86ISD::FCMP.
Delta  File
+845 -737  llvm/test/CodeGen/X86/frem.ll
+489 -715  llvm/test/CodeGen/X86/fpclamptosat_vec.ll
+571 -485  llvm/test/CodeGen/X86/half.ll
+410 -629  llvm/test/CodeGen/X86/vector-half-conversions.ll
+475 -375  llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll
+380 -372  llvm/test/CodeGen/X86/fptoui-sat-vector-128.ll
+3,170 -3,313  44 files not shown
+5,036 -4,699  50 files

LLVM/project 3cd5696 llvm/test/CodeGen/X86 frem.ll fpclamptosat_vec.ll

Revert "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"""

This reverts commit e1c5afa47d37012499467b5061fc42e50884d129.

This introduces crashes in the JAX backend on CPU. A reproducer in LLVM is
below. Let me know if you have trouble reproducing this.

; ModuleID = '__compute_module'
source_filename = "__compute_module"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

@0 = private unnamed_addr constant [4 x i8] c"\00\00\00?"
@1 = private unnamed_addr constant [4 x i8] c"\1C}\908"
@2 = private unnamed_addr constant [4 x i8] c"?\00\\4"
@3 = private unnamed_addr constant [4 x i8] c"%ci1"
@4 = private unnamed_addr constant [4 x i8] zeroinitializer
@5 = private unnamed_addr constant [4 x i8] c"\00\00\00\C0"
@6 = private unnamed_addr constant [4 x i8] c"\00\00\00B"

    [205 lines not shown]
Delta  File
+829 -937  llvm/test/CodeGen/X86/frem.ll
+661 -435  llvm/test/CodeGen/X86/fpclamptosat_vec.ll
+637 -418  llvm/test/CodeGen/X86/vector-half-conversions.ll
+485 -493  llvm/test/CodeGen/X86/half.ll
+373 -473  llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll
+369 -377  llvm/test/CodeGen/X86/fptoui-sat-vector-128.ll
+3,354 -3,133  44 files not shown
+4,798 -5,056  50 files

LLVM/project e1c5afa llvm/test/CodeGen/X86 frem.ll fpclamptosat_vec.ll

Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""

Fixed the missing SQRT promotion. Adding several missing operations too.
Delta  File
+845 -737  llvm/test/CodeGen/X86/frem.ll
+489 -715  llvm/test/CodeGen/X86/fpclamptosat_vec.ll
+410 -629  llvm/test/CodeGen/X86/vector-half-conversions.ll
+493 -485  llvm/test/CodeGen/X86/half.ll
+475 -375  llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll
+380 -372  llvm/test/CodeGen/X86/fptoui-sat-vector-128.ll
+3,092 -3,313  44 files not shown
+4,956 -4,698  50 files

LLVM/project 37455b1 llvm/test/CodeGen/X86 frem.ll fpclamptosat_vec.ll

Revert "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""

This reverts commit 6e02e27536b9de25a651cfc9c2966ce471169355.

This introduces a crash in the backend. Reproducer in MLIR's LLVM
dialect follows. Let me know if you have trouble reproducing this.

module {
  llvm.func @malloc(i64) -> !llvm.ptr<i8>
  llvm.func @_mlir_ciface_tf_report_error(!llvm.ptr<i8>, i32, !llvm.ptr<i8>)
  llvm.mlir.global internal constant @error_message_2208944672953921889("failed to allocate memory at loc(\22-\22:3:8)\00")
  llvm.func @_mlir_ciface_tf_alloc(!llvm.ptr<i8>, i64, i64, i32, i32, !llvm.ptr<i32>) -> !llvm.ptr<i8>
  llvm.func @Rsqrt_CPU_DT_HALF_DT_HALF(%arg0: !llvm.ptr<i8>, %arg1: i64, %arg2: !llvm.ptr<i8>) -> !llvm.struct<(i64, ptr<i8>)> attributes {llvm.emit_c_interface, tf_entry} {
    %0 = llvm.mlir.constant(8 : i32) : i32
    %1 = llvm.mlir.constant(8 : index) : i64
    %2 = llvm.mlir.constant(2 : index) : i64
    %3 = llvm.mlir.constant(dense<0.000000e+00> : vector<4xf16>) : vector<4xf16>
    %4 = llvm.mlir.constant(dense<[0, 1, 2, 3]> : vector<4xi32>) : vector<4xi32>
    %5 = llvm.mlir.constant(dense<1.000000e+00> : vector<4xf16>) : vector<4xf16>

    [180 lines not shown]
Delta  File
+829 -937  llvm/test/CodeGen/X86/frem.ll
+661 -435  llvm/test/CodeGen/X86/fpclamptosat_vec.ll
+637 -418  llvm/test/CodeGen/X86/vector-half-conversions.ll
+485 -446  llvm/test/CodeGen/X86/half.ll
+373 -473  llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll
+369 -377  llvm/test/CodeGen/X86/fptoui-sat-vector-128.ll
+3,354 -3,086  44 files not shown
+4,798 -4,997  50 files

LLVM/project 6e02e27 llvm/test/CodeGen/X86 frem.ll fpclamptosat_vec.ll

Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"

Disabled 2 MLIR tests because the runtime doesn't support `_Float16`; see
the issue at https://github.com/llvm/llvm-project/issues/55992
Delta  File
+845 -737  llvm/test/CodeGen/X86/frem.ll
+489 -715  llvm/test/CodeGen/X86/fpclamptosat_vec.ll
+410 -629  llvm/test/CodeGen/X86/vector-half-conversions.ll
+446 -485  llvm/test/CodeGen/X86/half.ll
+475 -375  llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll
+380 -372  llvm/test/CodeGen/X86/fptoui-sat-vector-128.ll
+3,045 -3,313  44 files not shown
+4,897 -4,698  50 files

LLVM/project 5d8298a llvm/test/CodeGen/X86 frem.ll fpclamptosat_vec.ll

Revert "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"

This reverts commit 2d2da259c8726fd5c974c01122a9689981a12196.

This breaks MLIR integration test (JIT crashing), reverting in the
meantime.
Delta  File
+829 -937  llvm/test/CodeGen/X86/frem.ll
+661 -435  llvm/test/CodeGen/X86/fpclamptosat_vec.ll
+637 -418  llvm/test/CodeGen/X86/vector-half-conversions.ll
+485 -446  llvm/test/CodeGen/X86/half.ll
+373 -473  llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll
+369 -377  llvm/test/CodeGen/X86/fptoui-sat-vector-128.ll
+3,354 -3,086  42 files not shown
+4,811 -4,991  48 files

LLVM/project 2d2da25 llvm/test/CodeGen/X86 frem.ll fpclamptosat_vec.ll

[X86][RFC] Enable `_Float16` type support on X86 following the psABI

GCC and Clang/LLVM will support `_Float16` on X86 in C/C++, following
the latest X86 psABI. (https://gitlab.com/x86-psABIs)

_Float16 arithmetic will be performed using native half-precision. If
native arithmetic instructions are not available, it will be performed
at a higher precision (currently always float) and then truncated down
to _Float16 immediately after each single arithmetic operation.
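
The effect of truncating after every operation can be sketched in Python with the standard `struct` module, whose `"e"` format is IEEE binary16. This is an illustration only: the helper names are hypothetical, and Python computes the intermediate sums in double rather than float, which makes no difference for the small values used here. It compares rounding to half after each addition with rounding once at the end.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE binary16 value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def add_per_op(values):
    """Add at higher precision, truncating to half after every single add."""
    acc = to_half(values[0])
    for v in values[1:]:
        acc = to_half(acc + to_half(v))
    return acc


def add_round_once(values):
    """Add at higher precision and truncate to half only at the very end."""
    return to_half(sum(to_half(v) for v in values))


vals = [2048.0, 1.0, 1.0]
print(add_per_op(vals))      # 2048.0: each +1 is lost when rounding 2049 to half
print(add_round_once(vals))  # 2050.0: the two +1s accumulate before rounding
```

Per-operation truncation matches what native half-precision arithmetic would give, which is why the results agree whether or not native instructions are available.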

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D107082
Delta  File
+845 -737  llvm/test/CodeGen/X86/frem.ll
+489 -715  llvm/test/CodeGen/X86/fpclamptosat_vec.ll
+410 -629  llvm/test/CodeGen/X86/vector-half-conversions.ll
+446 -485  llvm/test/CodeGen/X86/half.ll
+475 -375  llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll
+380 -372  llvm/test/CodeGen/X86/fptoui-sat-vector-128.ll
+3,045 -3,313  42 files not shown
+4,891 -4,711  48 files

LLVM/project 4a36e96 llvm/test/CodeGen/AMDGPU load-constant-i16.ll amdgpu-codegenprepare-idiv.ll, llvm/test/CodeGen/AMDGPU/GlobalISel srem.i64.ll sdiv.i64.ll

RegAllocGreedy: Account for reserved registers in num regs heuristic

This simple heuristic uses the estimated live range length combined
with the number of registers in the class to switch which heuristic to
use. This was taking the raw number of registers in the class, even
though not all of them may be available. AMDGPU heavily relies on
dynamically reserved numbers of registers based on user attributes to
satisfy occupancy constraints, so the raw number is highly misleading.
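
A rough Python sketch of the idea follows. It is not the RegAllocGreedy code: the function names, the comparison `length > factor * num_regs`, and the `factor` value are assumptions made for illustration, as are the register counts. It shows how the same live range can fall on different sides of the threshold depending on whether reserved registers are counted.

```python
def num_allocatable(reg_class, reserved):
    """Count registers in the class that are not reserved."""
    return sum(1 for r in reg_class if r not in reserved)


def is_long_live_range(live_range_len, num_regs, factor=2):
    """Hypothetical switch: treat the range as 'long' relative to the class size."""
    return live_range_len > factor * num_regs


# A 256-register class where all but 24 registers are dynamically reserved.
reg_class = [f"r{i}" for i in range(256)]
reserved = set(reg_class[24:])

length = 100
print(is_long_live_range(length, len(reg_class)))                        # False
print(is_long_live_range(length, num_allocatable(reg_class, reserved)))  # True
```

With the raw class size the range looks short; with the 24 registers that are actually available it looks long, so the two counts select different heuristics.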

There are still a few problems here. In the original testcase that
made me notice this, the live range size is incorrect after the
scheduler rearranges instructions, since the instructions don't have
the original InstrDist offsets. Additionally, I think it would be more
appropriate to use the number of disjointly allocatable registers in
the class. For the AMDGPU register tuples, there are a large number of
registers in each tuple class, but only a small fraction can actually
be allocated at the same time since they all overlap with each
other. It seems we do not have a query that corresponds to the number
of independently allocatable registers. Relatedly, I'm still debugging

    [10 lines not shown]
Delta  File
+2,764 -2,764  llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll
+1,141 -1,138  llvm/test/CodeGen/AMDGPU/load-constant-i16.ll
+1,023 -1,023  llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll
+752 -740  llvm/test/CodeGen/AMDGPU/load-global-i16.ll
+644 -644  llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll
+341 -341  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-cttz.ll
+6,665 -6,650  145 files not shown
+13,666 -13,423  151 files

LLVM/project 0aef747 llvm/test/CodeGen/X86 vector-popcnt-128-ult-ugt.ll vector-popcnt-512-ult-ugt.ll

[NFC][X86][Codegen] Megacommit: mass-regenerate all check lines that were already autogenerated

The motivation is that the update script has at least two deviations
(`<...>@GOT`/`<...>@PLT` and not hiding pointer arithmetic) from
what pretty much all the checklines were generated with,
and most of the tests are still not updated, so each time one of the
non-up-to-date tests is updated to see the effect of the code change,
there is a lot of noise. Instead of having to deal with that each
time, let's just deal with everything at once.

This has been done via:
```
cd llvm-project/llvm/test/CodeGen/X86
grep -rl "; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py" | xargs -L1 <...>/llvm-project/llvm/utils/update_llc_test_checks.py --llc-binary <...>/llvm-project/build/bin/llc
```

Not all tests were regenerated, however.
Delta  File
+2,310 -2,310  llvm/test/CodeGen/X86/vector-popcnt-128-ult-ugt.ll
+836 -836  llvm/test/CodeGen/X86/vector-popcnt-512-ult-ugt.ll
+493 -493  llvm/test/CodeGen/X86/vector-popcnt-256-ult-ugt.ll
+474 -474  llvm/test/CodeGen/X86/srem-seteq-vec-nonsplat.ll
+371 -371  llvm/test/CodeGen/X86/urem-seteq-vec-nonsplat.ll
+549 -0  llvm/test/CodeGen/X86/umul-with-overflow.ll
+5,033 -4,484  666 files not shown
+14,147 -13,397  672 files

LLVM/project 0248e24 llvm/test/CodeGen/X86 vector_splat-const-shift-of-constmasked.ll avx2-intrinsics-x86.ll

[X86][update_llc_test_checks] Use a less greedy regular expression for replacing constant pool labels in tests.

While working on D97208 I noticed that these greedy regular
expressions prevent tests from failing when (%rip) appears after
a constant pool label when it didn't before.
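
The problem can be sketched with Python's `re` module standing in for FileCheck matching. The two check patterns and the assembly lines below are hypothetical examples, not the patterns used by update_llc_test_checks.py. A greedy wildcard in place of the label keeps matching when `(%rip)` newly appears, while a pattern that matches only the label itself does not.

```python
import re

# Check-line patterns expressed as Python regexes (illustrative only).
GREEDY_CHECK = re.compile(r"movaps .*, %xmm0")
NARROW_CHECK = re.compile(r"movaps \.LCPI[0-9]+_[0-9]+, %xmm0")

old_output = "movaps .LCPI0_0, %xmm0"
new_output = "movaps .LCPI0_0(%rip), %xmm0"  # (%rip) appears where it did not before


def passes(check, line):
    """Substring-style match: True if the pattern occurs in the line."""
    return check.search(line) is not None


for name, check in (("greedy", GREEDY_CHECK), ("narrow", NARROW_CHECK)):
    print(name, passes(check, old_output), passes(check, new_output))
```

The greedy check passes on both lines and so hides the change; the narrow check passes on the old output and fails on the new one.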

Reviewed By: RKSimon, pengfei

Differential Revision: https://reviews.llvm.org/D99460
Delta  File
+196 -196  llvm/test/CodeGen/X86/vector_splat-const-shift-of-constmasked.ll
+168 -168  llvm/test/CodeGen/X86/avx2-intrinsics-x86.ll
+124 -124  llvm/test/CodeGen/X86/fptosi-sat-scalar.ll
+94 -94  llvm/test/CodeGen/X86/limited-prec.ll
+73 -73  llvm/test/CodeGen/X86/fptoui-sat-scalar.ll
+72 -72  llvm/test/CodeGen/X86/cmov-fp.ll
+727 -727  138 files not shown
+1,680 -1,680  144 files

LLVM/project 07605ea llvm/lib/Target/X86 X86ISelLowering.cpp X86ISelLowering.h, llvm/test/CodeGen/X86 fptosi-sat-scalar.ll fptoui-sat-scalar.ll

[X86] Improved lowering for saturating float to int.

Adapted from D54696 by @nikic.

This patch improves lowering of saturating float to
int conversions, FP_TO_[SU]INT_SAT, for X86.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D86079
Delta  File
+153 -342  llvm/test/CodeGen/X86/fptosi-sat-scalar.ll
+139 -302  llvm/test/CodeGen/X86/fptoui-sat-scalar.ll
+164 -0  llvm/lib/Target/X86/X86ISelLowering.cpp
+1 -0  llvm/lib/Target/X86/X86ISelLowering.h
+457 -644  4 files

LLVM/project 75c0432 llvm/test/CodeGen/X86 lit.local.cfg vector-pack-128.ll

[NFC] Disallow unused prefixes in CodeGen/X86 tests.

Also fixed remaining tests that featured unused prefixes.

Differential Revision: https://reviews.llvm.org/D94330
Delta  File
+8 -0  llvm/test/CodeGen/X86/lit.local.cfg
+2 -2  llvm/test/CodeGen/X86/vector-pack-128.ll
+2 -2  llvm/test/CodeGen/X86/fptosi-sat-scalar.ll
+2 -2  llvm/test/CodeGen/X86/fptoui-sat-scalar.ll
+14 -6  4 files

LLVM/project a89d751 llvm/test/CodeGen/AArch64 fptosi-sat-vector.ll fptoui-sat-vector.ll, llvm/test/CodeGen/ARM fptosi-sat-scalar.ll

Add intrinsics for saturating float to int casts

This patch adds support for the fptoui.sat and fptosi.sat intrinsics,
which provide basically the same functionality as the existing fptoui
and fptosi instructions, but will saturate (or return 0 for NaN) on
values unrepresentable in the target type, instead of returning
poison. Related mailing list discussion can be found at:
https://groups.google.com/d/msg/llvm-dev/cgDFaBmCnDQ/CZAIMj4IBAAJ

The intrinsics have overloaded source and result type and support
vector operands:

    i32 @llvm.fptoui.sat.i32.f32(float %f)
    i100 @llvm.fptoui.sat.i100.f64(double %f)
    <4 x i32> @llvm.fptoui.sat.v4i32.v4f16(<4 x half> %f)
    // etc
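
The scalar semantics described above can be modelled in a few lines of Python. This is a reference sketch of the behaviour (round toward zero, clamp out-of-range values, map NaN to 0), not the SelectionDAG expansion; the function names and the `bits` parameter are chosen here for illustration.

```python
import math


def fptosi_sat(x: float, bits: int) -> int:
    """Saturating float-to-signed-int conversion for a `bits`-wide result."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if math.isnan(x):
        return 0
    if x <= lo:
        return lo
    if x >= hi:
        return hi
    return math.trunc(x)  # in range: round toward zero


def fptoui_sat(x: float, bits: int) -> int:
    """Saturating float-to-unsigned-int conversion for a `bits`-wide result."""
    hi = (1 << bits) - 1
    if math.isnan(x) or x <= 0:
        return 0
    if x >= hi:
        return hi
    return math.trunc(x)  # in range: round toward zero


print(fptosi_sat(1e20, 32))          # 2147483647
print(fptosi_sat(float("nan"), 32))  # 0
print(fptoui_sat(-5.0, 8))           # 0
print(fptoui_sat(300.0, 8))          # 255
```

The range checks run before `math.trunc`, so infinities saturate like any other out-of-range value instead of raising.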

On the SelectionDAG layer two new ISD opcodes are added,
FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands

    [43 lines not shown]
Delta  File
+4,711 -0  llvm/test/CodeGen/X86/fptosi-sat-scalar.ll
+4,300 -0  llvm/test/CodeGen/X86/fptoui-sat-scalar.ll
+2,812 -0  llvm/test/CodeGen/ARM/fptosi-sat-scalar.ll
+2,807 -0  llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll
+2,196 -0  llvm/test/CodeGen/AArch64/fptoui-sat-vector.ll
+676 -0  llvm/test/CodeGen/AArch64/fptosi-sat-scalar.ll
+17,502 -0  16 files not shown
+18,535 -2  22 files