1

Learning Rayon, I wanted to compare the performace of parallel calculation and serial calculation of Fibonacci series. Here's my code:

use rayon;
use std::time::Instant;

fn main() {
    let nth = 30;
    let now = Instant::now();
    let fib = fibonacci_serial(nth);
    println!(
        "[s] The {}th number in the fibonacci sequence is {}, elapsed: {}",
        nth,
        fib,
        now.elapsed().as_micros()
    );

    let now = Instant::now();
    let fib = fibonacci_parallel(nth);
    println!(
        "[p] The {}th number in the fibonacci sequence is {}, elapsed: {}",
        nth,
        fib,
        now.elapsed().as_micros()
    );
}

fn fibonacci_parallel(n: u64) -> u64 {
    if n <= 1 {
        return n;
    }

    let (a, b) = rayon::join(|| fibonacci_parallel(n - 2), || fibonacci_parallel(n - 1));
    a + b
}

fn fibonacci_serial(n: u64) -> u64 {
    if n <= 1 {
        return n;
    }

    fibonacci_serial(n - 2) + fibonacci_serial(n - 1)
}

Run in Rust Playground

I expected the elapsed time of parallel calculation would be smaller than the elapsed time of serial caculation, but the result was opposite:

# `s` stands for serial calculation and `p` for parallel
[s] The 30th number in the fibonacci sequence is 832040, elapsed: 12127
[p] The 30th number in the fibonacci sequence is 832040, elapsed: 990379

My implementation for serial/parallel calculation would have flaws. But if not, why am I seeing these results?

  • 3
    If you are doing speed comparision, always use --release or the button Release in the upper left corner of the playground (under Debug)! – hellow Jun 7 at 9:17
2

I think the real reason is, that you create threads which is not good. In every call of fibonacci_parallel you create another pair of threads for rayon and because you call fibonacci_parallel again in the closure you create yet another pair of threads.
This is utterly terrible for the OS/rayon.

An approach to solve this problem could be this:

fn fibonacci_parallel(n: u64) -> u64 {
    fn inner(n: u64) -> u64 {
        if n <= 1 { 
            return n;
        }   

        inner(n - 2) + inner(n - 1)
    }   

    if n <= 1 {
        return n;
    }   

    let (a, b) = rayon::join(|| inner(n - 2), || inner(n - 1));
    a + b 
}

You create two threads which both execute the inner function. With this addition I get

op@VBOX /t/t/foo> cargo run --release 40
    Finished release [optimized] target(s) in 0.03s
     Running `target/release/foo 40`
[s] The 40th number in the fibonacci sequence is 102334155, elapsed: 1373741
[p] The 40th number in the fibonacci sequence is 102334155, elapsed: 847343

But as said, for low numbers parallel execution is not worth it:

op@VBOX /t/t/foo> cargo run --release 20
    Finished release [optimized] target(s) in 0.02s
     Running `target/release/foo 20`
[s] The 10th number in the fibonacci sequence is 6765, elapsed: 82
[p] The 10th number in the fibonacci sequence is 6765, elapsed: 241
  • those are internal tasks, not real threads, but pretty much the same result here. – chpio Jun 7 at 10:48
  • The parallelization overhead is higher than the gains. – chpio Jun 7 at 10:50
  • @chpio thanks for the particularization. Please feel free to edit my answer, I will happily accept it. – hellow Jun 7 at 10:50
  • 1
    @chpio: Indeed, it's an overhead issue. Each task that is created needs to be scheduled, sent to a thread, etc... In general, parallelization is only worth it for coarse-grained parallelism. – Matthieu M. Jun 7 at 15:30

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service, privacy policy and cookie policy

Not the answer you're looking for? Browse other questions tagged or ask your own question.