Implement scalar_prod #505

sebasv · 2018-10-19T15:04:25Z

Implements #504.

jturner314

Thanks for this! Everything looks good except for a few changes to the test. (See the individual comments.)

Note for my future self: Ordinarily, I would avoid duplicating logic as in unrolled_sum and unrolled_prod, but I think it's fine in this case:

- I anticipate sum and product being the only cases like this.

- I don't anticipate ever really needing to modify unrolled_sum/unrolled_prod, so we don't have to worry much about keeping things in sync.

If we add more than two functions like this that could be combined, though, we should combine them by e.g. taking a closure as a parameter.

tests/oper.rs

sebasv · 2018-10-23T06:20:11Z

> I anticipate sum and product being the only cases like this

Actually, I realised I would also greatly benefit from scalar_min and scalar_max. Shall I try to write up a macro to cover all four cases?

jturner314 · 2018-10-24T00:22:05Z

We could implement scalar_min and scalar_max for A: Ord. However, I'd just do it in terms of fold with something like this (taking advantage of the first method from PR #507):

impl<A, S, D> ArrayBase<S, D>
where
    S: Data<Elem = A>,
    D: Dimension,
{
    /// Returns the minimum element, or `None` if the array is empty.
    fn scalar_min(&self) -> Option<&A>
    where
        A: Ord,
    {
        let first = self.first()?;
        Some(self.fold(first, |acc, x| acc.min(x)))
    }
}

We don't need to manually unroll this because the compiler does a good job automatically (checked with Compiler Explorer using the -O compiler option).
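The first-then-fold pattern above can be tried without ndarray. The following is a minimal sketch on plain slices, assuming a hypothetical free function `slice_min` (it is not part of ndarray or of this PR):

```rust
/// Hypothetical helper (not part of ndarray): returns the minimum
/// element of a slice, or `None` if the slice is empty.
fn slice_min<A: Ord>(xs: &[A]) -> Option<&A> {
    // Seed the fold with the first element so that no sentinel value
    // is needed; `?` returns `None` early for an empty slice.
    let first = xs.first()?;
    Some(xs.iter().fold(first, |acc, x| acc.min(x)))
}

/// Small usage demonstration for the sketch above.
fn demo_slice_min() {
    let v = [3, 1, 2];
    assert_eq!(slice_min(&v), Some(&1));
    let empty: [i32; 0] = [];
    assert_eq!(slice_min(&empty), None);
}
```

Seeding with the first element is what lets the function return a reference into the input rather than requiring `A: Clone` or a maximum-value constant.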

The desired behavior for floating-point types depends on the use-case because of NaN. One option is

arr.fold(::std::f64::NAN, |acc, &x| acc.min(x))

which ignores NaN values. (It returns NaN only if there are no non-NaN values.) The compiler does a decent job automatically unrolling this, so we don't need to manually unroll in this case either.
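The NaN behavior described here follows from `f64::min` returning the other operand when exactly one operand is NaN. A small std-only sketch, assuming a hypothetical helper named `nan_ignoring_min` that operates on a slice instead of an array:

```rust
/// Hypothetical helper (not part of ndarray): minimum of the non-NaN
/// values in `xs`, or NaN if there are none.
fn nan_ignoring_min(xs: &[f64]) -> f64 {
    // `f64::min` returns the non-NaN operand when exactly one operand
    // is NaN, so the NaN seed is replaced by the first non-NaN element.
    xs.iter().fold(f64::NAN, |acc, &x| acc.min(x))
}

/// Small usage demonstration for the sketch above.
fn demo_nan_ignoring_min() {
    // NaN values in the input are skipped.
    assert_eq!(nan_ignoring_min(&[1.0, f64::NAN, -2.0]), -2.0);
    // With no non-NaN values, the NaN seed survives.
    assert!(nan_ignoring_min(&[]).is_nan());
    assert!(nan_ignoring_min(&[f64::NAN, f64::NAN]).is_nan());
}
```

A caller that wants NaN to propagate instead would need a different combining function; that choice is the use-case dependence mentioned above.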

jturner314 · 2018-10-24T00:40:06Z

Will you please squash the commits into one? I don't mind squashing them myself, but then GitHub won't consider the PR merged.

Edit: It looks like you might have given me permission to push to the master branch on sebasv/ndarray since you submitted a PR using that branch? If so, and you don't mind me modifying your master branch, I can squash the commits for you.

(Ordinarily, I would just use GitHub's "Squash and merge", but that option is disabled for this repo, I don't have the permissions to enable it, and I haven't heard from @bluss in a while.)

sebasv · 2018-10-24T05:29:06Z

I'll squash the commits. I also put the unrolled code in a macro. Is this desired, or do you want to stick with separate unrolled code for prod/sum and possible future cases? The current commit does not include the macro.

// eightfold unrolled so that floating point can be vectorized
// (even with strict floating point accuracy semantics)
macro_rules! unrolled_fold {
    ($xs:expr, $unity:expr, $operation:expr) => {{
        let mut collected = $unity();
        let (mut p0, mut p1, mut p2, mut p3,
            mut p4, mut p5, mut p6, mut p7) =
            ($unity(), $unity(), $unity(), $unity(),
            $unity(), $unity(), $unity(), $unity());
        while $xs.len() >= 8 {
            p0 = $operation(p0, $xs[0].clone());
            p1 = $operation(p1, $xs[1].clone());
            p2 = $operation(p2, $xs[2].clone());
            p3 = $operation(p3, $xs[3].clone());
            p4 = $operation(p4, $xs[4].clone());
            p5 = $operation(p5, $xs[5].clone());
            p6 = $operation(p6, $xs[6].clone());
            p7 = $operation(p7, $xs[7].clone());

            $xs = &$xs[8..];
        }
        collected = $operation(collected.clone(), $operation(p0, p4));
        collected = $operation(collected.clone(), $operation(p1, p5));
        collected = $operation(collected.clone(), $operation(p2, p6));
        collected = $operation(collected.clone(), $operation(p3, p7));

        // make it clear to the optimizer that this loop is short
        // and can not be autovectorized.
        for i in 0..$xs.len() {
            if i >= 7 { break; }
            collected = $operation(collected.clone(), $xs[i].clone());
        }
        collected
    }}
}

/// Compute the sum of the values in `xs`
pub fn unrolled_sum<A>(mut xs: &[A]) -> A
    where A: Clone + Add<Output=A> + libnum::Zero,
{
    unrolled_fold!(xs, A::zero, A::add)
}

/// Compute the product of the values in `xs`
pub fn unrolled_prod<A>(mut xs: &[A]) -> A
    where A: Clone + Mul<Output=A> + libnum::One,
{
    unrolled_fold!(xs, A::one, A::mul)
}

jturner314 · 2018-10-25T23:44:37Z

Sure, a macro would be nice. By the way, I just noticed that the temporary variable in scalar_prod is named sum when it would be better named prod.

jturner314 · 2018-10-26T04:07:36Z

Fwiw, I prefer using generic functions over macros when possible. For example:

pub fn unrolled_fold<A, I, F>(mut xs: &[A], init: I, f: F) -> A
where
    A: Clone,
    I: Fn() -> A,
    F: Fn(A, A) -> A,
{
    // eightfold unrolled so that floating point can be vectorized
    // (even with strict floating point accuracy semantics)
    let mut acc = init();
    let (mut p0, mut p1, mut p2, mut p3,
         mut p4, mut p5, mut p6, mut p7) =
        (init(), init(), init(), init(),
         init(), init(), init(), init());
    while xs.len() >= 8 {
        p0 = f(p0, xs[0].clone());
        p1 = f(p1, xs[1].clone());
        p2 = f(p2, xs[2].clone());
        p3 = f(p3, xs[3].clone());
        p4 = f(p4, xs[4].clone());
        p5 = f(p5, xs[5].clone());
        p6 = f(p6, xs[6].clone());
        p7 = f(p7, xs[7].clone());

        xs = &xs[8..];
    }
    acc = f(acc.clone(), f(p0, p4));
    acc = f(acc.clone(), f(p1, p5));
    acc = f(acc.clone(), f(p2, p6));
    acc = f(acc.clone(), f(p3, p7));

    // make it clear to the optimizer that this loop is short
    // and can not be autovectorized.
    for i in 0..xs.len() {
        if i >= 7 { break; }
        acc = f(acc.clone(), xs[i].clone());
    }
    acc
}

This can be called like this for a sum:

numeric_util::unrolled_fold(slc, A::zero, A::add)

or like this for a product:

numeric_util::unrolled_fold(slc, A::one, A::mul)

This generates basically the same code as the non-generic version (tested with Compiler Explorer with -C target-cpu=native -C opt-level=3).
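To experiment with the closure-based signature outside ndarray, here is a compact std-only sketch of the same eight-accumulator idea. The name `lane_fold` is hypothetical, and this is not the code from the PR: it keeps the accumulators in an array, clones each lane on update, and combines the lanes in a different order than `unrolled_fold` does, so floating-point results may differ in the last bits. The generated code for this variant has not been checked.

```rust
/// Hypothetical compact variant of `unrolled_fold` (not the PR's code):
/// eight independent partial accumulators followed by a short tail loop.
fn lane_fold<A, I, F>(xs: &[A], init: I, f: F) -> A
where
    A: Clone,
    I: Fn() -> A,
    F: Fn(A, A) -> A,
{
    // Eight independent accumulators, mirroring p0..p7 in the original.
    let mut lanes: [A; 8] = std::array::from_fn(|_| init());
    let mut chunks = xs.chunks_exact(8);
    for chunk in &mut chunks {
        for (lane, x) in lanes.iter_mut().zip(chunk) {
            *lane = f(lane.clone(), x.clone());
        }
    }
    // Combine the lanes, then fold in the (at most seven) leftover elements.
    let mut acc = init();
    for lane in lanes {
        acc = f(acc, lane);
    }
    for x in chunks.remainder() {
        acc = f(acc, x.clone());
    }
    acc
}

/// Small usage demonstration: sum and product via closures.
fn demo_lane_fold() {
    let v: Vec<u64> = (1..=20).collect();
    // Sum of 1..=20.
    assert_eq!(lane_fold(&v[..], || 0, |a, b| a + b), 210);
    // Product of 1..=10, i.e. 10!.
    assert_eq!(lane_fold(&v[..10], || 1, |a, b| a * b), 3_628_800);
}
```

As with `unrolled_fold`, `init` must return the identity of `f` (zero for addition, one for multiplication), because it seeds every accumulator and is therefore combined into the result several times.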

sebasv · 2018-10-26T07:20:09Z

Ready for review. I agree that this does not call for a macro, unless unrolled_dot is to be included as well, but I really don't expect a lot more variants to show up that need to be unrolled.

jturner314 · 2018-10-26T15:40:08Z

Thanks for contributing this!

sebasv · 2018-10-26T16:38:12Z

Thank you for the guidance! I am learning a ton more about safety and optimization.

jturner314 reviewed Oct 22, 2018


sebasv force-pushed the master branch from 790201e to 938846e Compare October 24, 2018 05:42

sebasv mentioned this pull request Oct 24, 2018

Implement scalar_min and scalar_max for A: Ord #512

Closed

jturner314 added the enhancement label Oct 26, 2018

sebasv force-pushed the master branch from 938846e to 8048a41 Compare October 26, 2018 07:03

implement scalar_prod

d5f9cb5

sebasv force-pushed the master branch from 8048a41 to d5f9cb5 Compare October 26, 2018 07:16

jturner314 merged commit f7fb81f into rust-ndarray:master Oct 26, 2018

jturner314 mentioned this pull request Oct 27, 2018

scalar_prod implementation #504

Closed
