Hello everyone.
I am a software engineer, mostly familiar with the x86 architecture. I have always wondered whether SIMD instructions are a waste of transistors that would be better spent on more CPU cores. Even though they can provide a 100% performance boost in a few scenarios, that benefit is very circumstantial and nearly useless for most applications.
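To be fair about those few scenarios, here is a minimal sketch of the kind of loop where SIMD does pay off, assuming SSE intrinsics and a C compiler such as gcc or clang (the function names are mine, purely for illustration):

    #include <immintrin.h>
    #include <stddef.h>

    /* Scalar version: one addition per iteration. */
    float sum_scalar(const float *a, size_t n) {
        float s = 0.0f;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* SSE version: four additions per iteration (n assumed to be a multiple of 4).
       Note that this reorders the additions, which is why compilers won't always
       auto-vectorize it without something like -ffast-math. */
    float sum_sse(const float *a, size_t n) {
        __m128 acc = _mm_setzero_ps();
        for (size_t i = 0; i < n; i += 4)
            acc = _mm_add_ps(acc, _mm_loadu_ps(a + i));
        float lanes[4];
        _mm_storeu_ps(lanes, acc);
        return lanes[0] + lanes[1] + lanes[2] + lanes[3];
    }

In the best case that is up to a 4x throughput win on the arithmetic, but only for this kind of regular, data-parallel loop, which is exactly my point.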
I can imagine a few reasons behind this choice:
a) The software industry hasn't yet completed the changes required to cope easily with large-scale parallelism (it needs new programming languages that force you to make memory sharing explicit and instead encourage models that are easier to verify, software transactional memory for some cases, and other things).
b) The few scenarios that benefit from SIMD instructions are important enough for consumers (video games and video decoding).
c) Maybe SIMD instructions are really cheap and easy to add? How does their cost compare with that of a reasonably powerful core?
d) Amdahl's law. Granted, problems that fit SIMD instructions are typically parallelizable anyway, but there are still situations where it is not worth starting a new thread, and SIMD then helps with the serial bottleneck (see the rough numbers after this list).
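To put a number on d), here is a quick Amdahl-style estimate; the 20% vectorizable fraction and the 4x SIMD factor are made-up figures, purely for illustration:

    #include <stdio.h>

    /* Amdahl's law: overall speedup when a fraction p of the runtime
       is accelerated by a factor s and the rest stays serial. */
    static double amdahl(double p, double s) {
        return 1.0 / ((1.0 - p) + p / s);
    }

    int main(void) {
        /* Made-up example: 20% of the runtime vectorizes at 4x. */
        printf("speedup = %.2f\n", amdahl(0.20, 4.0));  /* prints ~1.18 */
        return 0;
    }

A ~1.18x gain is hardly worth spawning and synchronizing a thread for, but it comes essentially for free with SIMD.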
By the way, I also have two other, somewhat related, questions:
1) Are there manufacturers working on chips with transactional memory support that is not limited by cache associativity and never has to fall back to software (unlike Intel's TSX)? (The sketch after question 2 shows the fallback pattern I mean.)
2) Is there any work on "explicit hardware parallelism" (maybe a better name exists)? For example, a chip would let me declare that two function calls are to be run concurrently on the same core (one processing data while the other waits for data), without having to start an OS thread on another core. Both hardware threads should operate on independent data (A cannot write something that B needs), and an error interrupt would be raised if they violate that.
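For question 1, here is roughly the pattern that TSX/RTM code has to carry today, as a minimal sketch assuming a compiler with -mrtm support (the counter and lock names are hypothetical); it is this mandatory software fallback that I would like future hardware to make unnecessary:

    #include <immintrin.h>
    #include <stdatomic.h>

    /* Hypothetical shared counter and a simple fallback spinlock. */
    static long counter;
    static atomic_int fallback_lock;

    void increment(void) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            /* Read the lock word inside the transaction so that a concurrent
               software-path locker conflicts with us and aborts the transaction. */
            if (atomic_load(&fallback_lock))
                _xabort(0xff);
            counter++;          /* hardware transaction */
            _xend();
            return;
        }
        /* Mandatory software fallback: TSX makes no forward-progress guarantee
           (capacity, interrupts, and conflicts all abort), so a non-transactional
           path is always required. */
        while (atomic_exchange(&fallback_lock, 1))
            ;                   /* spin */
        counter++;
        atomic_store(&fallback_lock, 0);
    }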