Wasm-friendly Field #2638
base: develop
Hi @mitschabaude, this is great work.
@sebastiencs thanks for testing it! Yeah, these low-level algorithms are finicky; it will need some debugging to get right. I have an equivalent implementation here, though, that works perfectly, so I'm sure there's just a small, fixable mistake. I won't be able to finish this PR, by the way, since I'm no longer working at o1Labs. Sorry for leaving this here in an unsatisfying state :)
@sebastiencs I'll look into this PR sometime soon, perhaps tomorrow, and will sync on the plan for continuing it.
Branch updated from d727c5f to a2422ff.
I'm trying to decide whether we should proceed with it, so I ran some benchmarks. Benchmark results in native Rust:
Benchmark results in Wasm:
@mitschabaude two questions, maybe you know?
Two important comments first:
In a concrete scenario: if we're moving to a different representation just for the MSM, then we'll do on the order of 50-100 multiplications per conversion, so at that point the conversion cost is negligible.
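To give a feel for what a representation change costs, here is a minimal sketch that repacks a 256-bit integer from four little-endian 64-bit limbs (arkworks-style) into nine 29-bit limbs. The 9 × 29-bit layout is our assumption about what an "Fp9"-style representation looks like (9 × 29 = 261 bits), and the function names are hypothetical. If both sides keep values in Montgomery form with different R (2^256 vs 2^261), the repacked value additionally needs one modular multiplication by 2^5 to rescale it. So a conversion is a linear pass over the limbs plus at most one field multiplication, which is small next to 50-100 multiplications.

```rust
/// Hypothetical helper: repack a 256-bit integer from four little-endian
/// 64-bit limbs into nine little-endian 29-bit limbs (9 * 29 = 261 bits).
fn to_29bit_limbs(x: &[u64; 4]) -> [u32; 9] {
    let mut out = [0u32; 9];
    for i in 0..9 {
        let bit = i * 29;
        let (word, shift) = (bit / 64, bit % 64);
        // Bits starting at `bit`; a limb straddles two words when shift > 35.
        let mut v = x[word] >> shift;
        if shift > 35 && word + 1 < 4 {
            v |= x[word + 1] << (64 - shift);
        }
        out[i] = (v & 0x1FFF_FFFF) as u32;
    }
    out
}

/// Inverse of `to_29bit_limbs` for inputs below 2^256.
fn from_29bit_limbs(y: &[u32; 9]) -> [u64; 4] {
    let mut out = [0u64; 4];
    for i in 0..9 {
        let bit = i * 29;
        let (word, shift) = (bit / 64, bit % 64);
        let v = y[i] as u64;
        // Low part goes into `word`; the overflow (if any) goes into the next word.
        out[word] |= v << shift;
        if shift > 35 && word + 1 < 4 {
            out[word + 1] |= v >> (64 - shift);
        }
    }
    out
}
```

The top limb only ever holds 24 bits (256 - 8 × 29), which leaves headroom in the narrow representation.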
I have detailed benchmarks for a Wasm Pasta MSM in my own implementation, which uses the Fp9 algorithm. Just to throw out a number: on my machine, a 2^16 Pasta MSM using 16 threads takes about 80ms. I know of another, better-optimized project that probably does it in 50-60ms (using yet another field representation, which I prototyped as well and which comes with some additional complications). What I sadly don't have is detailed benchmarks for the Kimchi arkworks MSM. What I can offer is the table in this README, which compares single-threaded MSM in arkworks with an older version of my code on a 384-bit curve. In that comparison, my code showed a 7x speedup: https://github.com/mitschabaude/montgomery/blob/main/doc/zprize22.md Caveat: some of that 7x is due not to different field arithmetic but to better high-level MSM algorithms, and I don't know exactly how much each contributed.
@volhovm here's how to run my Wasm Pasta MSM on your machine:
It will run the MSM 10 times and display the timings, average, deviation, and more fine-grained timing details for one particular run.
I also have raw multiplication benchmarks that match up well with your numbers:
For the Fp9 representation I get
which matches your 36ns very well. For the more complicated "51x5" representation, I even get down to
Here, a major caveat is that this is only that fast if you do two multiplications at once (because that's the only known efficient way to leverage SIMD).
Fundamentally, I think the multiplication benchmarks are sound and show the real picture. No extra speedup should be expected in Poseidon (it's all field multiplication). So my guess is that the extra speedup is not real, and comes from something getting zeroed out because of a bug, which makes it faster.
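To make "it's all field multiplication" concrete, here is a rough operation count for a Poseidon-shaped permutation. It assumes Kimchi-like parameters as we understand them (width 3, S-box x^7, 55 full rounds, dense 3 × 3 MDS); the field is a toy prime (2^31 - 1), and the MDS entries and round constants are placeholders, so only the operation counts are meaningful. Each round costs 3 × 4 multiplications for the S-boxes plus 9 for the MDS, i.e. 21, and additions are comparatively cheap, so the permutation's speedup should track the raw multiplication speedup.

```rust
/// Toy prime field p = 2^31 - 1 (NOT a Pasta field), used only to count operations.
const P: u64 = (1 << 31) - 1;

/// Counts every field multiplication performed through it.
struct Counter {
    muls: u64,
}

impl Counter {
    fn mul(&mut self, a: u64, b: u64) -> u64 {
        self.muls += 1;
        a * b % P
    }
}

fn add(a: u64, b: u64) -> u64 {
    (a + b) % P
}

/// S-box x^7 in 4 multiplications: x^2, x^4, x^6 = x^4 * x^2, x^7 = x^6 * x.
fn sbox(c: &mut Counter, x: u64) -> u64 {
    let x2 = c.mul(x, x);
    let x4 = c.mul(x2, x2);
    let x6 = c.mul(x4, x2);
    c.mul(x6, x)
}

/// One full round on a width-3 state: S-box on each element, dense MDS, add round constants.
fn full_round(c: &mut Counter, s: [u64; 3], mds: &[[u64; 3]; 3], rc: &[u64; 3]) -> [u64; 3] {
    let t = [sbox(c, s[0]), sbox(c, s[1]), sbox(c, s[2])];
    let mut out = [0u64; 3];
    for i in 0..3 {
        let mut acc = rc[i];
        for j in 0..3 {
            acc = add(acc, c.mul(mds[i][j], t[j]));
        }
        out[i] = acc;
    }
    out
}

/// `rounds` full rounds with placeholder (non-cryptographic) constants.
fn permutation(c: &mut Counter, mut s: [u64; 3], rounds: usize) -> [u64; 3] {
    let mds = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]; // placeholder, not a real MDS matrix
    let rc = [11, 12, 13]; // placeholder round constants
    for _ in 0..rounds {
        s = full_round(c, s, &mds, &rc);
    }
    s
}
```

With 55 rounds this comes to 55 × 21 = 1155 multiplications per permutation.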
@mitschabaude I've tested the JavaScript implementation (...). When I call ... When I run the same operation in JS (multiplying 1 by 1), I get a different result: ... I am probably missing something.
@sebastiencs amazing, thanks for finding the bug!!
They use different Montgomery representations; you probably have to account for that in testing. There could be a
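To illustrate why "1 × 1" can look different across implementations, here is a toy sketch with a hypothetical `Mont` type and a tiny modulus (65521, chosen only for illustration; not a Pasta field). In Montgomery form, an element x is stored as x·R mod p. Two implementations with different R (say 2^32 vs 2^29, the latter matching a 29-bit-limb layout) therefore store different raw words for the same value, and only agree after converting out of Montgomery form.

```rust
/// Toy Montgomery arithmetic modulo an odd p < 2^k, with R = 2^k (k <= 32).
/// Values are stored as x * R mod p, so the stored form depends on R.
struct Mont {
    p: u64,
    k: u32,
    mask: u64,
    neg_p_inv: u64, // -p^{-1} mod R
}

impl Mont {
    fn new(p: u64, k: u32) -> Self {
        assert!(p % 2 == 1 && k <= 32 && p < (1u64 << k));
        let mask = (1u64 << k) - 1;
        // p^{-1} mod 2^64 by Newton iteration (correct bits double each step: 1 -> 64).
        let mut inv: u64 = 1;
        for _ in 0..6 {
            inv = inv.wrapping_mul(2u64.wrapping_sub(p.wrapping_mul(inv)));
        }
        Mont { p, k, mask, neg_p_inv: inv.wrapping_neg() & mask }
    }

    /// Montgomery reduction: returns t * R^{-1} mod p, for t < p * R.
    fn redc(&self, t: u128) -> u64 {
        // m makes t + m * p divisible by R.
        let m = ((t as u64 & self.mask).wrapping_mul(self.neg_p_inv)) & self.mask;
        let u = ((t + m as u128 * self.p as u128) >> self.k) as u64;
        if u >= self.p { u - self.p } else { u }
    }

    fn to_mont(&self, x: u64) -> u64 {
        (((x as u128) << self.k) % self.p as u128) as u64
    }

    fn from_mont(&self, a: u64) -> u64 {
        self.redc(a as u128)
    }

    fn mul(&self, a: u64, b: u64) -> u64 {
        self.redc(a as u128 * b as u128)
    }
}
```

For example, `Mont::new(65521, 32).to_mont(1)` and `Mont::new(65521, 29).to_mont(1)` differ, yet both `from_mont(mul(one, one))` round-trip to 1. A cross-implementation test should therefore compare decoded values, not raw limbs.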
After applying @sebastiencs's bug fix, Poseidon is down to a 2.6x improvement for me. Still pretty nice! Results still don't agree for the full Poseidon, though.
Also, the Fp9 mul benchmark is now slower for me (52ns) than the JS version (37ns). However, it comes down to about the same as the JS version (34ns) when I refactor the benchmark to look like this:

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use mina_curves::pasta::Fp;
// `Fp9` is the Wasm-friendly field type introduced in this PR; bring it into
// scope from wherever it is defined (its module path is not shown here).

pub fn bench_basic_ops(c: &mut Criterion) {
    let mut group = c.benchmark_group("Basic ops");
    let x0: Fp = rand::random();
    let x: Fp = x0;
    let mut z: Fp = x0;
    // Chain z = z * x so each iteration depends on the previous result.
    group.bench_function("Native multiplication in Fp (single)", |b| {
        b.iter(|| {
            z = z * x;
        });
    });
    // Same chained benchmark in the Fp9 representation, from the same start value.
    let x_fp9: Fp9 = x0.into();
    let mut z_fp9: Fp9 = x0.into();
    group.bench_function("Multiplication in Fp9 (single)", |b| {
        b.iter(|| {
            z_fp9 = z_fp9 * x_fp9;
        });
    });
    group.finish();
}

criterion_group!(benches, bench_basic_ops);
criterion_main!(benches);
```

@volhovm AFAIK, JS hosts don't offer highly accurate timing, so I would make sure to structure benchmarks such that many individual operations are done within a single timing. I'm not sure whether your current benchmark, which seems to create new field elements for every multiplication, allows a completely accurate measurement.
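The "many operations per timing" advice can be sketched as follows. This is a hypothetical helper in Rust (the same idea applies to `performance.now()` in JS), using a toy Mersenne-prime field in place of a real one. A long chain of dependent multiplications is timed in one window, so clock granularity is amortized over all iterations. Because each step depends on the previous result, the work can't be skipped, and the final value can be checked against an independent computation.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Toy Mersenne prime standing in for a real field modulus.
const P: u64 = (1 << 61) - 1;

fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Times `iters` chained multiplications (z = z * x) inside ONE timing window
/// and returns (nanoseconds per multiplication, final accumulator).
fn time_chained_mul(x: u64, iters: u64) -> (f64, u64) {
    let mut z = x;
    let start = Instant::now();
    for _ in 0..iters {
        // Each step depends on the previous z, so iterations can't be reordered or dropped.
        z = mul_mod(black_box(z), x);
    }
    let elapsed = start.elapsed();
    (elapsed.as_nanos() as f64 / iters as f64, black_box(z))
}

/// Square-and-multiply reference, to confirm the timed loop did the real work.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base);
        }
        base = mul_mod(base, base);
        exp >>= 1;
    }
    acc
}
```

After `iters` steps the accumulator equals x^(iters + 1) mod P. Checking this against `pow_mod` guards against the kind of "too fast because something got zeroed out" result discussed above.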
Co-authored-by: Sebastien Chapuis <sebastiencs@users.noreply.github.com>
@mitschabaude Thanks! I've tested that implementation in our WebNode, and Poseidon hashing is ~40% faster: openmina/openmina#939. I see that your JS implementation also has a ... Lastly, are there other parts of your repository worth porting that would make our WebNode faster? Thanks again for your work!
@sebastiencs that's awesome!
Yes, absolutely.
@mitschabaude Indeed, I just ported it and we can hash the ledger in 14 seconds with this dedicated
Hmm, it's actually the test for
@sebastiencs Can you comment a bit on how hard it would be to port the changes you introduced in ... Another question, or another way to put it: how big is the gap between
Status update: I've been trying to estimate how complicated it would be to bring in an alternative field and plug it into the kimchi tests. With a lot of stubs, I managed to do it: the most intensive parts are implementing bignums for 32 bits and ... On the branch (name):
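For a sense of what "implementing bignums for 32 bits" involves at its core, here is a minimal sketch of little-endian 32-bit-limb addition with carry and schoolbook multiplication. The function names are hypothetical and unrelated to any existing crate API. Each limb product is widened to `u64` so that product, previous partial sum and carry always fit: (2^32 - 1)^2 + 2 × (2^32 - 1) = 2^64 - 1.

```rust
/// Adds two N-limb little-endian bignums (32-bit limbs); returns the sum and the final carry.
fn add<const N: usize>(a: &[u32; N], b: &[u32; N]) -> ([u32; N], bool) {
    let mut out = [0u32; N];
    let mut carry = 0u64;
    for i in 0..N {
        let s = a[i] as u64 + b[i] as u64 + carry;
        out[i] = s as u32; // low 32 bits
        carry = s >> 32; // 0 or 1
    }
    (out, carry != 0)
}

/// Schoolbook multiplication: N x N limbs -> 2N limbs (returned as a Vec).
fn mul<const N: usize>(a: &[u32; N], b: &[u32; N]) -> Vec<u32> {
    let mut out = vec![0u32; 2 * N];
    for i in 0..N {
        let mut carry = 0u64;
        for j in 0..N {
            // a[i] * b[j] + out[i + j] + carry <= 2^64 - 1, so no overflow.
            let t = a[i] as u64 * b[j] as u64 + out[i + j] as u64 + carry;
            out[i + j] = t as u32;
            carry = t >> 32;
        }
        // out[i + N] has not been written by earlier rows, so store the carry directly.
        out[i + N] = carry as u32;
    }
    out
}
```

A field implementation would build Montgomery multiplication and modular reduction on top of primitives like these.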
My hypothesis is that we could indeed continue porting it, and we'd need perhaps 3-5k lines of code (primarily the bigint and field implementations), fairly isolated in ... Any comments? Is there anything I'm missing in terms of how this could be /harder/ to port than what I just described? @mitschabaude @sebastiencs
Nice!
Nothing I can think of; your description matches what I would have expected!
WIP, experiments towards #2634
define a "wasm-friendly" Field and test it with Poseidon
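For readers new to the idea, here is a minimal sketch of Montgomery multiplication over 29-bit limbs stored in `u64` words, generic over the limb count N. It assumes, as the name suggests, that an Fp9 element is nine such limbs (9 × 29 = 261 ≥ 255 bits). The function names and the exact algorithm (a plain word-by-word CIOS loop with eager carries) are our illustration, not this PR's code. The motivation for narrow limbs: core Wasm has 64-bit integer multiplication but no widening 64 × 64 → 128-bit multiply, and a 29 × 29-bit product is below 2^58. A `u64` accumulator can therefore absorb products and carries without overflow, which is also what lets optimized versions postpone carry propagation.

```rust
/// Limb width: products of two limbs are < 2^58, leaving headroom in a u64.
const W: u32 = 29;
const MASK: u64 = (1 << W) - 1;

/// -p^{-1} mod 2^29, computed from the lowest limb of an odd modulus p.
fn mont_mu(p0: u64) -> u64 {
    // Newton iteration: correct low bits double each step (1 -> 64).
    let mut inv: u64 = 1;
    for _ in 0..6 {
        inv = inv.wrapping_mul(2u64.wrapping_sub(p0.wrapping_mul(inv)));
    }
    inv.wrapping_neg() & MASK
}

/// Montgomery product a * b * 2^(-29N) mod p (N little-endian 29-bit limbs).
/// Requires a, b < p and p odd; returns a fully reduced result (< p).
fn mont_mul<const N: usize>(a: &[u64; N], b: &[u64; N], p: &[u64; N], mu: u64) -> [u64; N] {
    // N + 1 limbs; the top limb absorbs carries (invariant: t < 2p after each round).
    let mut t = vec![0u64; N + 1];
    for i in 0..N {
        // t += a[i] * b
        let mut carry = 0u64;
        for j in 0..N {
            let s = t[j] + a[i] * b[j] + carry;
            t[j] = s & MASK;
            carry = s >> W;
        }
        t[N] += carry;
        // Pick m so that t + m * p is divisible by 2^29, then shift down one limb.
        let m = (t[0] * mu) & MASK;
        let mut carry = (t[0] + m * p[0]) >> W;
        for j in 1..N {
            let s = t[j] + m * p[j] + carry;
            t[j - 1] = s & MASK;
            carry = s >> W;
        }
        let s = t[N] + carry;
        t[N - 1] = s & MASK;
        t[N] = s >> W;
    }
    // t < 2p: compute t - p limb by limb and keep it unless it went negative.
    let mut diff = [0u64; N];
    let mut borrow = 0u64;
    for j in 0..N {
        let s = t[j].wrapping_sub(p[j]).wrapping_sub(borrow);
        diff[j] = s & MASK;
        borrow = s >> 63; // top bit set means this limb went negative
    }
    if t[N] >= borrow {
        diff // t >= p
    } else {
        let mut out = [0u64; N];
        out.copy_from_slice(&t[..N]);
        out
    }
}
```

With N = 9 and the Pasta modulus split into 29-bit limbs, this has the Fp9 shape. A small instance such as N = 2 is convenient for checking results against `u128` arithmetic.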
TODO: Poseidon results don't agree yet, need unit tests
Performance looks promising, though: close to a 4x improvement. To reproduce, run