Monday, February 25, 2008

Residues and Threads

I'm slowly changing all my algorithms to work on either raw data or residue data. Raw data are integer values in the range of 1 to 5, which enables some performance optimizations and shortcuts. Residue data, however, are essentially floats, so all my algorithms took a performance hit.

At the same time, I'm adding multi-threading support to speed things up. The improvement is significant, but I'm already seeing diminishing returns: on a quad-core system, matrix factorization using 4 cores runs at only 3.27x the speed of using one core.
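
To put the 3.27x figure in perspective, here is a rough worked calculation. The parallel efficiency follows directly from the numbers above. The second part assumes Amdahl's law is a fair model of the slowdown, which is only an assumption: memory bandwidth or synchronization overhead could account for the gap just as well. The symbols S (speedup), n (cores), E (efficiency), and p (parallelizable fraction) are introduced here for illustration.

```latex
% Parallel efficiency: measured speedup divided by the number of cores
E = \frac{S}{n} = \frac{3.27}{4} \approx 0.82

% Amdahl's law (assumed model): S = \frac{1}{(1 - p) + p/n}
% Solving for the parallelizable fraction p:
p = \frac{1 - 1/S}{1 - 1/n} = \frac{1 - 1/3.27}{1 - 1/4} \approx \frac{0.694}{0.75} \approx 0.93
```

Under that assumption, roughly 7% of the single-core run time would not be benefiting from the extra cores.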

2 comments:

Anonymous said...

Hi Newman!

A little off-topic here - I couldn't find a better place to ask.

I found your comment regarding a C++ optimisation technique I'm trying to understand:
"I didn't mention that replacing int32 with int16 or int8 gave me the most performance boost, no doubt because absolute majority of viewers have less than 65536 ratings, and a large portion of them (more than 40% if I remember correctly) have less than 256 ratings, so the amount of memory you access is significantly less. In my C++ implementation, I just wrote a template function and have one copy each for int32, int16, and int8."

Does it mean that you have 3 functions (with identical signature): one works with int32, the other with int16 and the third with int8? I'm not familiar with C++ templates... :-(
Could you please elaborate on it with more details? Or maybe could you point me to some web page describing that issue?

Thanks in advance.

Wojtek

by32768 said...

I wrote a template function, and have 3 "instantiations" of the template. So in source code, there's only one function that works with a type T. In compiled code, there are 3 functions, one for each case where T is an unsigned 32-bit, 16-bit, or 8-bit integer.
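
A minimal sketch of what that can look like, for readers unfamiliar with templates. The function name sum_selected and its parameters are hypothetical and chosen only for illustration; this is not the actual code discussed above. It shows one function written against an index type T, with the three copies requested through explicit instantiation.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical example: add up the entries of `values` picked out by
// `indices`. T is the integer type used to store each index, so a narrower
// T means less memory is read while walking the index array.
template <typename T>
double sum_selected(const std::vector<float>& values,
                    const std::vector<T>& indices) {
    double total = 0.0;
    for (T i : indices) {
        total += values[i];
    }
    return total;
}

// Explicit instantiations: the compiler emits three separate functions
// from the single definition above, one per index width.
template double sum_selected<std::uint8_t>(
    const std::vector<float>&, const std::vector<std::uint8_t>&);
template double sum_selected<std::uint16_t>(
    const std::vector<float>&, const std::vector<std::uint16_t>&);
template double sum_selected<std::uint32_t>(
    const std::vector<float>&, const std::vector<std::uint32_t>&);
```

At a call site the compiler deduces T from the argument, so passing a std::vector<std::uint16_t> of indices selects the 16-bit version without any change to the calling code. The explicit instantiations are optional in a single-file program, since calling the template with a given type instantiates it implicitly.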