2025-04-13

记录自己的瞎折腾

先给Python测速

简单计算一下fibonacci数列的值，使用递归和递归加cache的基础方法

from datetime import datetime

def fib(n):
    if n < 2:
        result = n
    else:
        result = fib(n - 1) + fib(n - 2)
    return result

## 闭包增加缓存
def cached_fib():
    cache = {1: 1, 2: 1}

    def fibonacci(n):
        if n not in cache:
            cache[n] = fibonacci(n - 1) + fibonacci(n - 2)
        return cache[n]

    return fibonacci

## 这个就不用试了(
def fib_tr(n, ret=0, tmp=1):
    if n == 0:
        return ret
    else:
        return fib_tr(n-1, ret + tmp, ret)

然后测一下用时多少

t1 = datetime.now()
result1 = fib(35)
t2 = datetime.now()
f = cached_fib()
result2 = f(35)
t3 = datetime.now()
result3 = f(36)
t4 = datetime.now()
print(f"the 35th number is {result1}, use time {(t2 - t1).total_seconds()} s")
print(f"the 35th number is {result2}, use time {(t3 - t2).total_seconds()} s")
print(f"the 36th number is {result3}, use time {(t4 - t3).total_seconds()} s")

the 35th number is 9227465, use time 0.842028 s

the 35th number is 9227465, use time 1.8e-05 s

the 36th number is 14930352, use time 1e-06 s

闭包加个缓存好像有用了，也许可以使用类、 装饰器（手写或者直接from functools import lru_cache，或者…

试用Cxx

~~虽然C++完全没学会，在可预见的未来也基本学不会，下面只使用递归的方法做，还不知道怎么加cache~~

只使用最简单的版本，不知道怎么正确地进行benchmark，但是大概要比之前的快…

#include <iostream>
#include <unordered_map>
#include <chrono>

int fib(int n) {
    if (n < 2) return n;
    return fib(n - 1) + fib(n - 2);
}

// 未使用
int fibTr(int n, int ret = 0, int tmp = 1)
{
    if (n == 0) return ret;
    else return fibTr(n - 1, ret + tmp, ret);
}

int main(int argc, char *argv[])
{
    using std::chrono::duration;
    using std::chrono::duration_cast;
    using std::chrono::high_resolution_clock;
    using std::chrono::milliseconds;
    int n{35};
    auto t1 = high_resolution_clock::now();
    auto result = fib(n);
    auto t2 = high_resolution_clock::now();
    auto time_delta = duration_cast<milliseconds>(t2 - t1);
    std::cout << "Operation took time (ms): " << time_delta.count() << std::endl;
}

Operation took time (ms): 13

Python + Cxx

我想试Python加Cxx的解决方案。 pybind11试用一下：sudo apt intall pybind11-dev

新建一个requirements.txt, 写入内容：pybind

在Python环境里(我使用的是miniconda的Base环境)安装一下pip install -r requirements.txt

创建一个fib_lib.cpp，内容如下：

#include <pybind11/pybind11.h>

namespace py = pybind11;

int fib(int n)
{
    if (n < 2)
        return n;
    return fib(n - 1) + fib(n - 2);
}

PYBIND11_MODULE(fib_lib, m){
    m.def("fib", &fib, "Calculate Fibonacci Numbers");
}

再输入以下神秘代码：（可以在vscode里搞一个Task防止整错了）

c++ -O3 -Wall -shared -std=c++20 -fPIC $(python3 -m pybind11 --includes) fib_lib.cpp -o fib_lib$(python3-config --extension-suffix)

会生成一个fib_lib.cpython-312-x86_64-linux-gnu.so的动态库文件在当前路径，以上操作都在conda的Base环境内

在路径下写其它的Python文件就可以用这个动态库了：

import datetime
import fib_lib 

if __name__ == '__main__':

    n = 35

    t1 = datetime.datetime.now()
    fib = fib_lib.fib(n)
    t2 = datetime.datetime.now()

    time_delta = (t2 - t1).total_seconds() * 1000

    print(f"{n}th fibonacci number {fib}")
    print(f"Operation took aproximately {time_delta} ms")

35th fibonacci number 9227465

Operation took aproximately 12.209 ms

为啥比上面直接整Cxx快捏 ~(￣▽￣)~*

总之调用成功了，最原始递归计算Fibonacci的用时从 0.842028 s → 12.209 ms

Julia 孔乙己来喽

上述步骤好像步骤比较繁琐，我对Julia比较熟悉，进行一个Julia的用：

## fibonacci 的 四种写法
fib1(n::Integer) = n < 3 ? 1 : fib(n-1) + fib(n-2) 
fib2 = let cache = Dict{Int, Int}()
    function (n::Int)
        if !haskey(cache, n)
          cache[n] = n < 3 ? n : fib2(n - 1) + fib2(n - 2)
        end
        return cache[n]
    end
end
fib3(n::Integer) = (BigInt.([1 1; 1 0])^n)[2]
fib4(n::Integer) = fib4(Val) 
fib4(::Val{N}) = N < 3 ? 1 : fib4(Val{N-1}) + fib4(Val{n-2}) 

### 测测速度，输出放字符串里了，直方图删掉...
using BenchmarkTools

@benchmark fib1(35)
""" 直接递归
BenchmarkTools.Trial: 244 samples with 1 evaluation per sample.
 Range (min … max):  19.783 ms …  22.563 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     20.528 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.550 ms ± 264.387 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  19.8 ms         Histogram: frequency by time         21.6 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
"""

@benchmark fib2(35)
""" 带点cache
BenchmarkTools.Trial: 10000 samples with 998 evaluations per sample.
 Range (min … max):  23.447 ns …  2.179 μs  ┊ GC (min … max): 0.00% … 98.11%
 Time  (median):     25.852 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   27.210 ns ± 31.430 ns  ┊ GC (mean ± σ):  3.15% ±  3.81%

  23.4 ns         Histogram: frequency by time        32.8 ns <

 Memory estimate: 16 bytes, allocs estimate: 1.
"""

@benchmark fib3(35)
""" 不GC的时候够了
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  11.200 μs …  69.002 ms  ┊ GC (min … max):  0.00% … 49.65%
 Time  (median):     12.200 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   20.620 μs ± 689.902 μs  ┊ GC (mean ± σ):  16.61% ±  0.50%

  11.2 μs       Histogram: log(frequency) by time      34.6 μs <

 Memory estimate: 11.38 KiB, allocs estimate: 581.
"""

@benchmark fib4(35)
""" 这个靠编译就弄好了，快到测不出速度...
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min … max):  1.500 ns … 27.500 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.600 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.604 ns ±  0.339 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  1.5 ns         Histogram: frequency by time         1.7 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
"""

Speed Up Python

记录自己的瞎折腾

先给Python测速

试用Cxx

Python + Cxx

Julia 孔乙己来喽

说些什么吧！