Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

In fact, bit shifting is always faster than anything. At the hardware level, it's just wiring; there is no logic involved in shifting the bits, so there is no propagation delay involved, unlike even the single logic gate operations like &, |, ~, etc.

It's so much faster, that many compilers, for integer multiplication, will optimize these multiplications by converting them to shifts and adds. Integer division, however, usually just involves a lookup table.



> In fact, bit shifting is always faster than anything.

This is overly simplistic. A fixed bitwise mapping is simple wiring and about as fast as anything can be, yes. Extending that, the "fastest" 32-bit shifter might be to put down 31 circuits, one for each possible shift amount. But that still requires a mux tree to select between the different circuits, which already goes beyond simple wiring.

While that might be the minimum-delay implementation, it's also very area inefficient. A more area-efficient design would be to cascade lg(32) = 5 muxed shifters, one for each bit in the shift amount. For example, x << 19 is the same as ((x << 16) << 2) << 1. If you read a VLSI design textbook, you'll see that there are all kinds of low-level tricks for designing barrel shifters, especially when it comes to layout.

Finally, the practical reality for programmers is that variable-amount shifts and rotates are relatively expensive on quite a few processors.


As well as the practical details of barrel shifter implementation design that psykotic points out, older CPUs couldn't afford the space for a barrel shifter and implemented shifts as a sequence of one-bit shifts. So for instance the 8086 took 8 clock cycles plus another 4 cycles per bit shift, and on that kind of CPU it was definitely not as fast as an addition.

Incidentally there is a standard trick for integer division by constant which allows you to convert it into a series of 3 to 5 shift and add/subtract operations; any decent C compiler will implement this. _Hacker's Delight_ has the explanation of the algorithm, I think.


Here's an article explaining the trick:

http://ridiculousfish.com/blog/posts/labor-of-division-episo...

And a library from the same guy for generating code at runtime to do fast division by constants:

http://libdivide.com/




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: