SIMD Classes

The purpose of the SIMD classes is to abstract and consolidate the use of compiler intrinsics for the manipulation of architecture-specific vector (SIMD) values.

The implementation is rather loosely based on the data-parallel vector types proposal P0214R6 for the C++ Parallelism TS 2.

Unless otherwise specified, all classes, namespaces and top-level functions described below reside in the top-level arb::simd namespace.

Example usage

The following code performs an element-wise vector product, storing only non-zero values in the resultant array.

#include <simd/simd.hpp>
using namespace arb::simd;

void product_nonzero(int n, const double* a, const double* b, double* result) {
    constexpr int N = simd_abi::native_width<double>::value;
    using simd = simd<double, N>;
    using mask = simd::simd_mask;

    int i = 0;
    // Vectorized main loop over full SIMD blocks.
    for (; i+N<=n; i+=N) {
        auto vp = simd(a+i)*simd(b+i);
        where(vp!=0, vp).copy_to(result+i);
    }

    // Handle the remaining n-i elements with a masked load and store.
    int tail = n-i;
    auto m = mask::unpack((1<<tail)-1);

    auto vp = simd(a+i, m)*simd(b+i, m);
    where(m && vp!=0, vp).copy_to(result+i);
}

Classes

Three user-facing template classes are provided:

  1. simd<V, N, I = simd_abi::default_abi>

    N-wide vector type of values of type V, using architecture-specific implementation I. The implementation parameter is itself a template, acting as a type-map, with I<V, N>::type being the concrete implementation class (see below) for N-wide vectors of type V for this architecture.

    The implementation simd_abi::generic provides a std::array-backed implementation for arbitrary V and N, while simd_abi::native maps to the native architecture implementation for V and N, if one is available for the target architecture.

    simd_abi::default_abi will use simd_abi::native if available, or else fall back to the generic implementation.

  2. simd_mask<V, N, I = simd_abi::default_abi>

    The result of performing a lane-wise comparison/test operation on a simd<V, N, I> vector value. simd_mask objects support logical operations and are used as arguments to where expressions.

    simd_mask<V, N, I> is a type alias for simd<V, N, I>::simd_mask.

  3. where_expression<simd<V, N, I>>

    The result of a where expression, used for masked assignment.

There is, in addition, a templated class detail::indirect_expression that holds the result of an indirect(…) expression. These arise in gather and scatter operations, and are detailed below.

Implementation typemaps live in the simd_abi namespace, while concrete implementation classes live in detail. A particular specialization for an architecture, for example 4-wide double on AVX, then requires:

  • A concrete implementation class, e.g. detail::avx_double4.

  • A specialization of its ABI map, so that simd_abi::avx<double, 4>::type is an alias for detail::avx_double4.

  • A specialization of the native ABI map, so that simd_abi::native<double, 4>::type is an alias for simd_abi::avx<double, 4>::type.

The maximum natively supported width for a scalar type V is recorded in simd_abi::native_width<V>::value.
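
For illustration, the sketch below wires up a hypothetical 4-wide double implementation for an imaginary architecture. The names detail::myarch_double4 and simd_abi::myarch are invented for this example, and the exact template signatures of the library's ABI maps are assumed; this is a sketch, not library code.

#include <simd/simd.hpp>   // assumed to declare the simd_abi map templates

namespace arb { namespace simd {

namespace detail {
    struct myarch_double4;   // hypothetical concrete implementation class (declaration only)
}

namespace simd_abi {
    // Architecture type-map: simd_abi::myarch<double, 4>::type names the concrete class.
    template <typename V, unsigned N> struct myarch;
    template <> struct myarch<double, 4> { using type = detail::myarch_double4; };

    // Specialize the native map (primary template assumed to have this signature)
    // so that simd_abi::native<double, 4>::type resolves to the new implementation.
    template <> struct native<double, 4> { using type = myarch<double, 4>::type; };
}

}} // namespace arb::simd

With such specializations in place, simd<double, 4> (via simd_abi::default_abi) would resolve to the new implementation rather than the generic fallback.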

Indirect expressions

An expression of the form indirect(p, k) or indirect(p, k, constraint) describes a sequence of memory locations based at the pointer p with offsets given by the simd variable k. A constraint of type index_constraint can be provided, which promises certain guarantees on the index values in k:

  index_constraint::none
      No restrictions.

  index_constraint::independent
      No indices are repeated, i.e. k_i = k_j implies i = j.

  index_constraint::contiguous
      Indices are sequential, i.e. k_i = k_0 + i.

  index_constraint::constant
      Indices are all equal, i.e. k_i = k_j for all i and j.
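
As an illustration, the following sketch gathers values through indirect and accumulates them back with a constrained compound assignment. The function itself is hypothetical, and it assumes that the three-argument indirect overload can appear on the left of a compound assignment.

#include <simd/simd.hpp>
using namespace arb::simd;

// Hypothetical helper: out[offsets[i]] += in[offsets[i]] for N distinct offsets.
void gather_scatter_add(double* out, const double* in, const int* offsets) {
    constexpr int N = simd_abi::native_width<double>::value;

    simd<int, N> j(offsets);           // load N index values
    simd<double, N> v(0.);
    v.copy_from(indirect(in, j));      // gather: v_i = in[j[i]]

    // Scatter-accumulate; the constraint promises the indices are not repeated.
    indirect(out, j, index_constraint::independent) += v;
}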

Class simd

The class simd<V, N, I> is an alias for detail::simd_impl<I<V, N>::type>; the class detail::simd_impl<C> provides the public interface and arithmetic operators for a concrete implementation class C.

In the following:

  • S stands for the class simd<V, N, I>.

  • s is a SIMD value of type S.

  • m is a mask value of type S::simd_mask.

  • t, u and v are const objects of type S.

  • w is a SIMD value of type simd<W, N, J>.

  • i is an index of type int.

  • j is a const object of type simd<U, N, J> where U is an integral type.

  • x is a value of type V.

  • p is a pointer to V.

  • c is a const pointer to V or a length N array of V.

Here and below, the value in lane i of a SIMD vector or mask v is denoted by v_i.

Type aliases and constexpr members

  S::scalar_type : V
      The type of one lane of the SIMD type.

  S::simd_mask : simd_mask<V, N, I>
      The simd_mask specialization resulting from comparisons of S SIMD values.

  S::width : unsigned
      The SIMD width N.

Constructors

  S(x)
      A SIMD value v with v_i equal to x for i = 0…N-1.

  S(t)
      A copy of the SIMD value t.

  S(c)
      A SIMD value v with v_i equal to c[i] for i = 0…N-1.

  S(w)
      A copy or value-cast of the SIMD value w, which has a different value type but the same width.

  S(indirect(p, j))
      A SIMD value v with v_i equal to p[j[i]] for i = 0…N-1.

  S(c, m)
      A SIMD value v with v_i equal to c[i] for i where m_i is true.

Member functions

  t.copy_to(p) : void
      Set p[i] to t_i for i = 0…N-1.

  t.copy_to(indirect(p, j)) : void
      Set p[j[i]] to t_i for i = 0…N-1.

  s.copy_from(c) : void
      Set s_i to c[i] for i = 0…N-1.

  s.copy_from(indirect(c, j)) : void
      Set s_i to c[j[i]] for i = 0…N-1.

  s.sum() : V
      Sum of s_i for i = 0…N-1.

Expressions

  t+u : S
      Lane-wise sum.

  t-u : S
      Lane-wise difference.

  t*u : S
      Lane-wise product.

  t/u : S
      Lane-wise quotient.

  fma(t, u, v) : S
      Lane-wise fused multiply-add t*u + v.

  s<t : S::simd_mask
      Lane-wise less-than comparison.

  s<=t : S::simd_mask
      Lane-wise less-than-or-equal comparison.

  s>t : S::simd_mask
      Lane-wise greater-than comparison.

  s>=t : S::simd_mask
      Lane-wise greater-than-or-equal comparison.

  s==t : S::simd_mask
      Lane-wise equality test.

  s!=t : S::simd_mask
      Lane-wise inequality test.

  s=t : S&
      Lane-wise assignment.

  s+=t : S&
      Equivalent to s=s+t.

  s-=t : S&
      Equivalent to s=s-t.

  s*=t : S&
      Equivalent to s=s*t.

  s/=t : S&
      Equivalent to s=s/t.

  s=x : S&
      Equivalent to s=S(x).

  indirect(p, j)=t : decltype(indirect(p, j))&
      Equivalent to t.copy_to(indirect(p, j)).

  indirect(p, j)+=t : decltype(indirect(p, j))&
      Compound indirect assignment: p[j[i]]+=t[i] for i = 0…N-1.

  indirect(p, j)-=t : decltype(indirect(p, j))&
      Compound indirect assignment: p[j[i]]-=t[i] for i = 0…N-1.

  t[i] : V
      Value t_i.

  s[i]=x : S::reference
      Set value s_i to x.

The (non-const) index operator operator[] returns a proxy object of type S::reference, which writes the corresponding lane in the SIMD value on assignment, and has an implicit conversion to scalar_type.
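
A short illustrative fragment of lane access through the proxy (width 4 assumed; the generic implementation covers this width if no native one exists):

#include <simd/simd.hpp>
using namespace arb::simd;

void lane_access_example() {
    simd<double, 4> v(1.0);   // broadcast constructor: all lanes 1.0
    v[2] = 3.5;               // write through the S::reference proxy
    double x = v[2];          // read back via implicit conversion to scalar_type
    (void)x;
}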

Class simd_mask

simd_mask<V, N, I> is an alias for simd<V, N, I>::simd_mask, which in turn will be an alias for a class detail::simd_mask_impl<D>, where D is a concrete implementation class for the SIMD mask representation. simd_mask_impl<D> inherits from, and is implemented in terms of, detail::simd_impl<D>, but note that the concrete implementation class D may or may not be the same as the concrete implementation class I<V, N>::type used by simd<V, N, I>.

Mask values are read and written as bool values of 0 or 1, which may differ from the internal representation in each lane of the SIMD implementation.

In the following:

  • M stands for the class simd_mask<V, N, I>.

  • m and q are const objects of type simd_mask<V, N, I>.

  • u is an object of type simd_mask<V, N, I>.

  • b is a boolean value.

  • w is a pointer to bool.

  • y is a const pointer to bool or a length N array of bool.

  • i is of type int.

  • k is of type unsigned long long.

Constructors

  M(b)
      A SIMD mask u with u_i equal to b for i = 0…N-1.

  M(m)
      A copy of the SIMD mask m.

  M(y)
      A SIMD mask u with u_i equal to y[i] for i = 0…N-1.

Note that simd_mask does not (currently) offer a masked pointer/array constructor.

Member functions

  m.copy_to(w) : void
      Write the boolean value m_i to w[i] for i = 0…N-1.

  u.copy_from(y) : void
      Set u_i to the boolean value y[i] for i = 0…N-1.

Expressions

  !m : M
      Lane-wise negation.

  m&&q : M
      Lane-wise logical and.

  m||q : M
      Lane-wise logical or.

  m==q : M
      Lane-wise equality (equivalent to m!=!q).

  m!=q : M
      Lane-wise logical xor.

  u=q : M&
      Lane-wise assignment.

  m[i] : bool
      Boolean value m_i.

  u[i]=b : M::reference
      Set u_i to the boolean value b.

Static member functions

  M::unpack(k) : M
      Mask with value m_i equal to the ith bit of k.
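
A small illustration of unpack together with the bool copy operations (width 4 assumed for concreteness):

#include <simd/simd.hpp>
using namespace arb::simd;

void mask_unpack_example() {
    using mask4 = simd_mask<double, 4>;

    mask4 m = mask4::unpack(0b0101);   // lanes 0 and 2 set
    bool out[4];
    m.copy_to(out);                    // out = {true, false, true, false}
}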

Class where_expression

where_expression<S> represents a masked subset of the lanes of a SIMD value of type S, used for conditional assignment, masked scatter, and masked gather. It is a type alias for S::where_expression, and is the result of an expression of the form where(mask, simdvalue).

In the following:

  • W stands for the class where_expression<simd<V, N, I>>.

  • s is a reference to a SIMD value of type simd<V, N, I>&.

  • t is a SIMD value of type simd<V, N, I>.

  • m is a mask of type simd<V, N, I>::simd_mask.

  • j is a const object of type simd<U, N, J> where U is an integral type.

  • x is a scalar of type V.

  • p is a pointer to V.

  • c is a const pointer to V or a length N array of V.

  where(m, s) : W
      A proxy for masked-assignment operations.

  where(m, s)=t : void
      Set s_i to t_i for i where m_i is true.

  where(m, s)=x : void
      Set s_i to x for i where m_i is true.

  where(m, s).copy_to(p) : void
      Set p[i] to s_i for i where m_i is true.

  where(m, s).copy_to(indirect(p, j)) : void
      Set p[j[i]] to s_i for i where m_i is true.

  where(m, s).copy_from(c) : void
      Set s_i to c[i] for i where m_i is true.

  where(m, s).copy_from(indirect(c, j)) : void
      Set s_i to c[j[i]] for i where m_i is true.
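
For example, a hypothetical helper that clamps negative values to zero with a masked assignment:

#include <simd/simd.hpp>
using namespace arb::simd;

// Clamp negative entries of x[0..N-1] to zero using a where() expression.
void clamp_negative(double* x) {
    constexpr int N = simd_abi::native_width<double>::value;
    using vdouble = simd<double, N>;

    vdouble v(x);                     // load N values
    where(v < vdouble(0.), v) = 0.;   // masked assignment: only negative lanes are set
    v.copy_to(x);                     // store all lanes back
}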

Top-level functions

Lane-wise mathematical operations abs(x), min(x, y) and max(x, y) are offered for all SIMD value types, while the transcendental functions are only usable for SIMD floating point types.

Vectorized implementations of some of the transcendental functions are provided: refer to the vector transcendental functions documentation for details.

In the following:

  • I and J are SIMD implementations.

  • A is a SIMD class simd<K, N, I> for some scalar type K.

  • S is a SIMD class simd<V, N, I> for a floating point type V.

  • L is a scalar type implicitly convertible from K.

  • a and b are values of type A.

  • s and t are values of type S.

  • r is a value of type std::array<K, N>.

  abs(a) : A
      Lane-wise absolute value of a.

  min(a, b) : A
      Lane-wise minimum of a and b.

  max(a, b) : A
      Lane-wise maximum of a and b.

  sin(s) : S
      Lane-wise sine of s.

  cos(s) : S
      Lane-wise cosine of s.

  log(s) : S
      Lane-wise natural logarithm of s.

  exp(s) : S
      Lane-wise exponential of s.

  expm1(s) : S
      Lane-wise \(x \mapsto e^x - 1\).

  exprelr(s) : S
      Lane-wise \(x \mapsto x / (e^x - 1)\).

  pow(s, t) : S
      Lane-wise s raised to the power of t.

  simd_cast<std::array<L, N>>(a) : std::array<L, N>
      Lane-wise cast of the values in a to scalar type L, returned as a std::array<L, N>.

  simd_cast<simd<L, N, J>>(a) : simd<L, N, J>
      Lane-wise cast of the values in a to scalar type L, returned as a simd<L, N, J>.

  simd_cast<simd<L, N, J>>(r) : simd<L, N, J>
      Lane-wise cast of the values in the std::array<K, N> value r to scalar type L, returned as a simd<L, N, J>.
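
A sketch of the simd_cast call forms follows; the helper function is invented for illustration, and the rounding behaviour of the floating-point to integer conversion is whatever the underlying implementation provides.

#include <simd/simd.hpp>
#include <array>
using namespace arb::simd;

void cast_example() {
    constexpr int N = simd_abi::native_width<double>::value;

    simd<double, N> d(2.5);
    auto i = simd_cast<simd<int, N>>(d);             // lane-wise cast to an int SIMD value
    auto a = simd_cast<std::array<double, N>>(d);    // extract lanes into a std::array
    (void)i; (void)a;
}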

Implementation requirements

Each specific architecture is represented by a templated class I, with I<V, N>::type being the concrete implementation for an N-wide SIMD value with scalar_type V.

A concrete implementation class C inherits from detail::implbase<C>, which provides (via CRTP) generic implementations of most of the SIMD functionality. The base class implbase<C> in turn relies upon detail::simd_traits<C> to look up the SIMD width, and associated types.

All the required SIMD operations are given by static member functions of C.

Some arguments to static member functions use a tag class (detail::tag) parameterized on a concrete implementation class for dispatch purposes.

Minimal implementation

In the following, let C be the concrete implementation class for an N-wide vector of scalar_type V, with low-level representation arch_vector.

The specialization of detail::simd_traits<C> then exposes these types and values, and also provides the concrete implementation class M for masks associated with C:

template <>
struct simd_traits<C> {
    static constexpr unsigned width = N;
    using scalar_type = V;
    using vector_type = arch_vector;
    using mask_impl = M;
};

The mask implementation class M may or may not be the same as C. For example, detail::avx_double4 provides both the arithmetic operations and mask operations for an AVX 4 × double SIMD vector, while the mask implementation for detail::avx512_double8 is detail::avx512_mask8.

The concrete implementation class must provide at minimum implementations of copy_to and copy_from (see the section below for semantics):

struct C: implbase<C> {
    static void copy_to(const arch_vector&, V*);
    static arch_vector copy_from(const V*);
};

If the implementation is also acting as a mask implementation, it must also provide mask_copy_to, mask_copy_from, mask_element and mask_set_element:

struct C: implbase<C> {
    static void copy_to(const arch_vector&, V*);
    static arch_vector copy_from(const V*);

    static void mask_copy_to(const arch_vector& v, bool* w);
    static arch_vector mask_copy_from(const bool* y);
    static bool mask_element(const arch_vector& v, int i);
    static void mask_set_element(arch_vector& v, int i, bool x);
};

The class detail::generic<T, N> provides an example of a minimal implementation based on an arch_vector type of std::array<T, N>.

Concrete implementation API

In the following, C represents the concrete implementation class for a SIMD class of width N and value type V.

  • u, v, and w are values of type C::vector_type.

  • r is a reference of type C::vector_type&.

  • x is a value of type C::scalar_type.

  • c is a const pointer of type const C::scalar_type*.

  • p is a pointer of type C::scalar_type*.

  • j is a SIMD index representation of type J::vector_type for an integral concrete implementation class J.

  • d is a SIMD representation of type D::vector_type for a (different) concrete implementation class D.

  • b is a bool value.

  • q is a pointer to bool.

  • y is a const pointer to bool.

  • i is an unsigned (index) value.

  • k is an unsigned long long value.

  • m is a mask representation of type C::mask_type.

  • z is an index_constraint value.

Types and constants

  C::vector_type : simd_traits<C>::vector_type
      Underlying SIMD representation type.

  C::scalar_type : simd_traits<C>::scalar_type
      Should be convertible to/from V.

  C::mask_impl : simd_traits<C>::mask_impl
      Concrete implementation class for the mask SIMD type.

  C::mask_type : C::mask_impl::vector_type
      Underlying SIMD representation for masks.

  C::width : unsigned
      The SIMD width N.

Initialization, load, store

  C::cast_from(tag<D>{}, d) : C::vector_type
      Return a vector with values v_i = d_i, where D::scalar_type is implicitly convertible to C::scalar_type.

  C::broadcast(x) : C::vector_type
      Fill the representation with the scalar x.

  C::copy_to(v, p) : void
      Store values v_i to p+i. p may be unaligned.

  C::copy_to_masked(v, p, m) : void
      Store values v_i to p+i wherever m_i is true. p may be unaligned.

  C::copy_from(c) : C::vector_type
      Return a vector with values v_i loaded from c+i. c may be unaligned.

  C::copy_from_masked(c, m) : C::vector_type
      Return a vector with values v_i loaded from c+i wherever m_i is true. c may be unaligned.

  C::copy_from_masked(u, c, m) : C::vector_type
      Return a vector with values v_i loaded from c+i wherever m_i is true, or equal to u_i otherwise. c may be unaligned.

Lane access

  C::element(v, i) : C::scalar_type
      Value in the ith lane of v.

  C::set_element(r, i, x) : void
      Set the value in lane i of r to x.

Gather and scatter

The offsets for gather and scatter operations are given by a vector type J::vector_type for some possibly different concrete implementation class J, and the static methods implementing gather and scatter are templated on this class.

Implementations can provide optimized versions for specific index classes J; this process would be simplified with more support for casts between SIMD types and their concrete implementations, functionality which is not yet provided.

The first argument to these functions is a dummy argument of type tag<J>, used only to disambiguate overloads.

  C::gather(tag<J>{}, p, j) : C::vector_type
      Vector v with values v_i = p[j[i]].

  C::gather(tag<J>{}, u, p, j, m) : C::vector_type
      Vector v with values v_i = m_i ? p[j[i]] : u_i.

  C::scatter(tag<J>{}, u, p, j) : void
      Write values u_i to p[j[i]].

  C::scatter(tag<J>{}, u, p, j, m) : void
      Write values u_i to p[j[i]] for lanes i where m_i is true.

  C::compound_indexed_add(tag<J>{}, u, p, j, z) : void
      Update values p[j[i]] += u[i] for each lane i, subject to constraint z.

Casting

Implementations can provide optimized versions of lane-wise value casting from other specific implementation classes.

The first argument is a dummy argument of type tag<D>, used only to disambiguate overloads.

  C::cast_from(tag<D>{}, d) : C::vector_type
      Returns a vector v with values v_i = d_i, cast from D::scalar_type to C::scalar_type.

Arithmetic operations

  C::negate(v) : C::vector_type
      Lane-wise unary minus.

  C::mul(u, v) : C::vector_type
      Lane-wise multiplication.

  C::add(u, v) : C::vector_type
      Lane-wise addition.

  C::sub(u, v) : C::vector_type
      Lane-wise subtraction.

  C::div(u, v) : C::vector_type
      Lane-wise division.

  C::fma(u, v, w) : C::vector_type
      Lane-wise fused multiply-add, u*v + w.

  C::reduce_add(u) : C::scalar_type
      Horizontal sum of the values u_i across lanes.

Comparison and blends

  C::cmp_eq(u, v) : C::mask_type
      Lane-wise u = v.

  C::cmp_neq(u, v) : C::mask_type
      Lane-wise u ≠ v.

  C::cmp_gt(u, v) : C::mask_type
      Lane-wise u > v.

  C::cmp_geq(u, v) : C::mask_type
      Lane-wise u ≥ v.

  C::cmp_lt(u, v) : C::mask_type
      Lane-wise u < v.

  C::cmp_leq(u, v) : C::mask_type
      Lane-wise u ≤ v.

  C::ifelse(m, u, v) : C::vector_type
      Vector w with values w_i = m_i ? u_i : v_i.

Mathematical function support

With the exception of abs, min and max, these are only required for floating point vector implementations.

  C::abs(v) : C::vector_type
      Lane-wise absolute value.

  C::min(u, v) : C::vector_type
      Lane-wise minimum.

  C::max(u, v) : C::vector_type
      Lane-wise maximum.

  C::sin(v) : C::vector_type
      Lane-wise sine.

  C::cos(v) : C::vector_type
      Lane-wise cosine.

  C::log(v) : C::vector_type
      Lane-wise natural logarithm.

  C::exp(v) : C::vector_type
      Lane-wise exponential.

  C::expm1(v) : C::vector_type
      Lane-wise \(x \mapsto e^x - 1\).

  C::exprelr(v) : C::vector_type
      Lane-wise \(x \mapsto x/(e^x - 1)\).

  C::pow(u, v) : C::vector_type
      Lane-wise u raised to the power of v.

Mask value support

Mask operations are only required if C constitutes the implementation of a SIMD mask class.

  C::mask_broadcast(b) : C::vector_type
      Fill the mask representation with the bool value b.

  C::mask_element(v, i) : bool
      Mask value v_i.

  C::mask_set_element(u, i, b) : void
      Set mask value u_i to b.

  C::mask_copy_to(v, q) : void
      Write bool values to memory at q (unaligned).

  C::mask_copy_from(y) : C::vector_type
      Load bool values from memory at y (unaligned).

  C::mask_unpack(k) : C::vector_type
      Return a mask vector v with boolean value v_i equal to the ith bit of k.

Logical operations

Logical operations are only required if C constitutes the implementation of a SIMD mask class.

  C::logical_not(u) : C::vector_type
      Lane-wise negation.

  C::logical_and(u, v) : C::vector_type
      Lane-wise logical and.

  C::logical_or(u, v) : C::vector_type
      Lane-wise logical or.

  C::select(m, v, w) : C::vector_type
      Lane-wise m ? v : w.

Missing functionality

There is no support yet for the following features, although some of these will need to be provided in order to improve the efficiency of SIMD versions of our generated mechanisms.

  • Constraint-based dispatch for indirect operations other than += and -=.

  • Vectorizable implementations of trigonometric functions.

Implementation of vector transcendental functions

When building with the Intel C++ compiler, transcendental functions on SIMD values in simd<double, 8, simd_abi::avx512> wrap calls to the Intel short vector math library (SVML).

Outside of this case, the functions exp, log, expm1 and exprelr use explicit approximations as detailed below. The algorithms follow those used in the Cephes library, with some accommodations.

Exponentials

\(\operatorname{exp}(x)\)

The exponential is computed as

\[e^x = 2^n · e^g,\]

with \(|g| ≤ 0.5\) and \(n\) an integer. The power of two is computed via direct manipulation of the exponent bits of the floating point representation, while \(e^g\) is approximated by a rational polynomial.

\(n\) and \(g\) are computed by:

\[\begin{aligned}
n &= \left\lfloor \frac{x}{\log 2} + 0.5 \right\rfloor\\
g &= x - n·\log 2
\end{aligned}\]

where the subtraction in the calculation of \(g\) is performed in two stages, to limit cancellation error:

\[\begin{aligned}
g &\leftarrow \operatorname{fl}(x - n·c_1)\\
g &\leftarrow \operatorname{fl}(g - n·c_2)
\end{aligned}\]

where \(c_1+c_2 = \log 2\), \(c_1\) comprising the first 32 bits of the mantissa. (In principle \(c_1\) might contain more bits of the logarithm, but this particular decomposition matches that used in the Cephes library.) This decomposition gives \(|g|\leq \frac{1}{2}\log 2\approx 0.347\).

The rational approximation for \(e^g\) is of the form

\[e^g \approx \frac{R(g)}{R(-g)}\]

where \(R(g)\) is a polynomial of order 6. The coefficients are again those used by Cephes, and are probably derived via a Remez algorithm. \(R(g)\) is decomposed into even and odd terms

\[R(g) = Q(g^2) + gP(g^2)\]

so that the ratio can be calculated by:

\[e^g \approx 1 + \frac{2gP(g^2)}{Q(g^2)-gP(g^2)}.\]

Randomized testing indicates the approximation is accurate to 2 ulp.
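
For reference, a scalar sketch of this argument reduction (not the library's vectorized code): the two-term split of \(\log 2\) is computed on the fly rather than taken from the Cephes constants, and std::expm1 stands in for the rational approximation of \(e^g\).

#include <cmath>

// Scalar illustration of exp(x) = 2^n * e^g with the two-stage subtraction above.
double exp_by_reduction(double x) {
    const double ln2 = std::log(2.0);
    const double c1 = std::ldexp(std::floor(std::ldexp(ln2, 32)), -32);  // leading bits of log 2
    const double c2 = ln2 - c1;                                          // remainder: c1 + c2 = log 2

    double n = std::floor(x/ln2 + 0.5);
    double g = x - n*c1;      // first stage of the subtraction
    g = g - n*c2;             // second stage limits cancellation error

    double eg = 1.0 + std::expm1(g);              // stands in for R(g)/R(-g)
    return std::ldexp(eg, static_cast<int>(n));   // multiply by 2^n via the exponent
}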

\(\operatorname{expm1}(x)\)

A similar decomposition of \(x = g + n·\log 2\) is performed so that \(|g|≤0.5\), with the exception that \(n\) is always taken to be zero for \(|x|≤0.5\), i.e.

\[\begin{split}n = \begin{cases} 0&\text{if $|x|≤0.5$,}\\ \left\lfloor \frac{x}{\log 2} + 0.5 \right\rfloor &\text{otherwise.} \end{cases}\end{split}\]

\(\operatorname{expm1}(x)\) is then computed as

\[e^x - 1 = 2^n·(e^g - 1)+(2^n-1),\]

and the same rational polynomial is used to approximate \(e^g-1\),

\[e^g - 1 \approx \frac{2gP(g^2)}{Q(g^2)-gP(g^2)}.\]

The scaling step for \(n≠0\) is in practice calculated as

\[e^x - 1 = 2·(2^{n-1}·(e^g - 1)+(2^{n-1}-0.5)),\]

in order to avoid overflow at the upper end of the range.

The order 6 rational polynomial approximation for small \(x\) is insufficiently accurate to maintain 1 ulp accuracy; randomized testing indicates a maximum error of up to 3 ulp.

\(\operatorname{exprelr}(x)\)

The function is defined as

\[\operatorname{exprelr}(x) = x/(e^x-1),\]

and is the reciprocal of the relative exponential function,

\[\begin{split}\operatorname{exprel}(x) &= {}_1F_1(1; 2; x)\\ &= \frac{e^x-1}{x}.\end{split}\]

This is computed in terms of expm1 by:

\[\begin{split}\operatorname{exprelr}(x) := \begin{cases} 1&\text{if $\operatorname{fl}(1+x) = 1$,}\\ x/\operatorname{expm1}(x)&\text{otherwise.} \end{cases}\end{split}\]

With the approximation for \(\operatorname{expm1}\) used above, randomized testing demonstrates a maximum error on the order of 4 ulp.

Logarithms

The natural logarithm is computed as

\[\log x = \log u + n·\log 2,\]

where \(n\) is an integer and \(u\) is in the interval \([ \frac{1}{2}\sqrt 2, \sqrt 2]\). The logarithm of \(u\) is then approximated by the rational polynomial used in the Cephes implementation,

\[\begin{aligned}
\log u &\approx R(u-1)\\
R(z) &= z - \frac{z^2}{2} + z^3·\frac{P(z)}{Q(z)},
\end{aligned}\]

where \(P\) and \(Q\) are polynomials of degree 5, with \(Q\) monic.

Cancellation error is minimized by computing the sum for \(\log x\) as:

\[\begin{split}s &\leftarrow \operatorname{fl}(z^3·P(z)/Q(z))\\ s &\leftarrow \operatorname{fl}(s + n·c_4)\\ s &\leftarrow \operatorname{fl}(s - 0.5·z^2)\\ s &\leftarrow \operatorname{fl}(s + z)\\ s &\leftarrow \operatorname{fl}(s + n·c_3)\end{split}\]

where \(z=u-1\) and \(c_3+c_4=\log 2\), \(c_3\) comprising the first 9 bits of the mantissa.
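
A scalar sketch of this reduction (again not the library's vector code), with std::log1p standing in for the rational approximation \(R(u-1)\):

#include <cmath>

// Scalar illustration: x = u * 2^n with u in [sqrt(2)/2, sqrt(2)], log x = log u + n*log 2.
double log_by_reduction(double x) {
    int n = 0;
    double u = std::frexp(x, &n);        // x = u * 2^n with u in [0.5, 1)
    if (u < std::sqrt(0.5)) {            // move u into [sqrt(2)/2, sqrt(2))
        u *= 2.0;
        n -= 1;
    }
    return std::log1p(u - 1.0) + n*std::log(2.0);   // stands in for R(u-1) + n*log 2
}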