Skip to content

replaced map().sum() with fold #7

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

Vilhelm-Ian
Copy link

Most of the changes are cause of the formater sorry

@ChristopherBiscardi
Copy link
Owner

Most of the changes are cause of the formater sorry

no worries. I'll stick a rustfmt.toml in the root of the project so people have the right formatting stuff.

What's the motivation for moving from .map.sum to fold here?

@Vilhelm-Ian
Copy link
Author

Not sure if it's faster. Probably once compiler optimizations are made it will compile to the same assembly. I just think it's better. Since you don't have to create an array and then iterate through it to get the sum. This just does it one go

@DanikVitek
Copy link
Contributor

Since you don't have to create an array and then iterate through it to get the sum. This just does it one go

Iterators are built in a way that this would take one pass in any case. Iterators are lazy, so all filters and maps are done once per the call to .next(). Unlike, for example, in JavaScript, where every .map and .filter call creates a new array.

@Vilhelm-Ian
Copy link
Author

Vilhelm-Ian commented Dec 2, 2023

@DanikVitek if I did

(0..5).map(|x| { 
println!("hello");
x * 2
}).for_each(|x| println!("world"));

wouldn't the output be

hello
hello
helllo
hello
hello
world
world
world
world
world

EDIT: I re-read your reply not sure if I understand what you mean. Do you want to say that during the same iter call map() and sum() are called. If that is the case wouldn't the above code output be

hello
world
hello
world
hello
world

@DanikVitek yup just checked it out you are right. https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=734bda587b57f594335a7fd0b78ffac1

thanks for teaching me something new

@Vilhelm-Ian Vilhelm-Ian closed this Dec 2, 2023
@Vilhelm-Ian
Copy link
Author

Vilhelm-Ian commented Dec 8, 2023

@DanikVitek actually there is a different issue with using .map.max instead of just using fold. fold has a constant memory complexity while the map.max not. Day five I tried to do this with rayon par_iter.map().max() my entire pc froze. But when I did par_iter.fold() it barely used 2gb of memory

@Vilhelm-Ian Vilhelm-Ian reopened this Dec 8, 2023
@ChristopherBiscardi
Copy link
Owner

I updated the formatting so that the PR would have less noise

@ChristopherBiscardi
Copy link
Owner

ChristopherBiscardi commented Dec 9, 2023

DHAT profiling for the two approaches shows no difference (perhaps expectedly since we're dealing with integers and not stack allocated types of any kind).

dhat heap profiling

With Fold

part2

dhat: Total:     1,096 bytes in 3 blocks
dhat: At t-gmax: 1,096 bytes in 3 blocks
dhat: At t-end:  1,088 bytes in 2 blocks

part2_aho_corasick

dhat: Total:     57,412,096 bytes in 65,003 blocks
dhat: At t-gmax: 19,812 bytes in 28 blocks
dhat: At t-end:  1,088 bytes in 2 blocks

part2_nom

dhat: Total:     30,344 bytes in 1,411 blocks
dhat: At t-gmax: 1,096 bytes in 3 blocks
dhat: At t-end:  1,088 bytes in 2 blocks

Without Fold

part2

dhat: Total:     1,096 bytes in 3 blocks
dhat: At t-gmax: 1,096 bytes in 3 blocks
dhat: At t-end:  1,088 bytes in 2 blocks

part2_aho_corasick

dhat: Total:     57,412,096 bytes in 65,003 blocks
dhat: At t-gmax: 19,812 bytes in 28 blocks
dhat: At t-end:  1,088 bytes in 2 blocks

part2_nom

dhat: Total:     30,344 bytes in 1,411 blocks
dhat: At t-gmax: 1,096 bytes in 3 blocks
dhat: At t-end:  1,088 bytes in 2 blocks

In addition, Sum is implemented in terms of .fold (std::core::iter::traits::accum.rs)

impl Sum for $a {
    fn sum<I: Iterator<Item=Self>>(iter: I) -> Self {
        iter.fold(
            $zero,
            #[rustc_inherit_overflow_checks]
            |a, b| a + b,
        )
    }
}

Considering that iterators are lazy, it would surprise me to see increased memory usage because of map vs fold.

asm

The asm output is exactly the same.

.sum

used this code

pub fn my_sum() -> u32 {
    std::hint::black_box(0..10).into_iter().map(|n| plus2(n)).sum::<u32>()
}

fn plus2(n: u32) -> u32 {
    n + 2
}
asm for sum
# Compilation provided by Compiler Explorer at https://godbolt.org/
example::my_sum:
        movabs  rax, 42949672960
        mov     qword ptr [rsp - 8], rax
        lea     rax, [rsp - 8]
        mov     edx, dword ptr [rsp - 8]
        mov     esi, dword ptr [rsp - 4]
        xor     eax, eax
        mov     ecx, esi
        sub     ecx, edx
        jbe     .LBB0_2
        mov     eax, edx
        not     eax
        add     esi, eax
        lea     eax, [rdx + 3]
        imul    eax, esi
        add     ecx, -2
        imul    rcx, rsi
        shr     rcx
        add     eax, edx
        add     eax, ecx
        add     eax, 2
.LBB0_2:
        ret

fold

used this code

pub fn my_sum() -> u32 {
    std::hint::black_box(0..10).into_iter().fold(0, |acc, n| acc + plus2(n))
}

fn plus2(n: u32) -> u32 {
    n + 2
}
asm for fold
# Compilation provided by Compiler Explorer at https://godbolt.org/
example::my_sum:
        movabs  rax, 42949672960
        mov     qword ptr [rsp - 8], rax
        lea     rax, [rsp - 8]
        mov     edx, dword ptr [rsp - 8]
        mov     esi, dword ptr [rsp - 4]
        xor     eax, eax
        mov     ecx, esi
        sub     ecx, edx
        jbe     .LBB0_2
        mov     eax, edx
        not     eax
        add     esi, eax
        lea     eax, [rdx + 3]
        imul    eax, esi
        add     ecx, -2
        imul    rcx, rsi
        shr     rcx
        add     eax, edx
        add     eax, ecx
        add     eax, 2
.LBB0_2:
        ret

As a result I believe that if you saw changes in your application on day 5 it was due to something else and not .map().sum().

Here is the PR that made this true for Rust, if you're interested.

@Vilhelm-Ian
Copy link
Author

thanks @ChristopherBiscardi can I clean up my day 5 code ask you to take a look when you have time(since you are also busy with bevy game jame and advent of code currently)

@Vilhelm-Ian Vilhelm-Ian closed this Dec 9, 2023
@ChristopherBiscardi
Copy link
Owner

@Vilhelm-Ian sure, happy to take a look when I have time

@Vilhelm-Ian
Copy link
Author

@ChristopherBiscardi I just realized something .iter().map().sum() might not allocated more memory than .iter().fold() but it seems that's not the case for .par_iter().map().sum() and .par_iter().fold().reduce(). The first one makes my pc crash

@ChristopherBiscardi
Copy link
Owner

@Vilhelm-Ian While using rayon does mean that you will be iterating on more items in parallel, and thus could tip the limit of your PC if the individual .map calls are big there are two important notes:

First, rayon's ParallelIterator::map is it's own and is not the same as Iterator::map, as shown here: https://rust-lang-nursery.github.io/rust-cookbook/concurrency/parallel.html#map-reduce-in-parallel

Second, rayon's docs say .map.reduce and .sum are effectively the same so I would be surprised if one caused your computer to crash and the other didn't.

Basically equivalent to self.reduce(|| 0, |a, b| a + b), except that the type of 0 and the + operation may vary depending on the type of value being produced.

The rayon docs also have information relating to fold vs map/reduce which should be reviewed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants