[libclc] Move fma to the CLC library (#126052)
This builtin is a little more involved than others as targets deal with
fma in various different ways.
Fundamentally, the CLC __clc_fma builtin compiles to
__builtin_elementwise_fma, which compiles to the @llvm.fma intrinsic.
However, in the case of fp32 fma some targets call the __clc_sw_fma
function, which provides a software implementation of the builtin. This
in principle is controlled by the __CLC_HAVE_HW_FMA32 macro and may be a
runtime decision, depending on how the target defines that macro.
All targets build the CLC fma functions for all types. This is to the
CLC library can have a reliable internal implementation for its own
purposes.
For AMD/NVPTX targets there are no meaningful changes to the generated
LLVM bytecode. Some blocks of code have moved around, which confounds
llvm-diff.
[34 lines not shown]
math/fma: Add fp32 software implementation
Passes CTS on carrizo (when forced to use sw fma) and turks.
Reviewer: Tom Stellard <tstellar at redhat.com>
Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
llvm-svn: 334226