Yes, it's too bad that cleaning up the architecture doesn't necessarily clean up the physical design. As GPUs are more recent entrants to the general-purpose space, it's clear they're trying to avoid the same mistakes. The only place you'll find a true GPU binary is buried deep in the memory of the runtime stack (for NVIDIA at least; not sure about AMD).
Right. My understanding is that NVIDIA has mucked around with their low-level instruction set at every generation. I remember reading somewhere that with Kepler the hardware doesn't even have dependency interlocks -- the compiler is responsible for scheduling instructions so they don't consume results that aren't ready yet.
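To illustrate what that static scheduling looks like: the reverse-engineered assemblers (e.g. maxas for Maxwell) show control bits baked into the instruction stream that tell the scheduler how long to stall. The notation and stall counts below are entirely made up for illustration, since the real encoding is undocumented:

```
; hypothetical SASS-style listing: the compiler/assembler, not the
; hardware, encodes the stall needed before each instruction issues
--:-:-:1   LDG R0, [R4]       ; load from global memory
--:-:-:6   FADD R1, R0, R2    ; stall long enough for R0 to be ready
```

If the compiler gets a stall count wrong, the hardware happily reads a stale register -- there's no interlock to save you, which is exactly why the ISA can't be a stable public contract.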
But at the same time, the lack of a clear specification and backwards compatibility means the software stack has to deal with a fresh crop of bugs (both hardware and software) at every iteration. That puts an IMHO pretty firm cap on the "asymptotic quality" of the stack -- you're constantly chasing bugs until the next version comes out. So you'll never see a GPU toolchain of the quality we expect from gcc (or LLVM, though that isn't quite as mature).