Links
Some background reading.
Laws of Computing
A key point not normally mentioned is that "clock reach", which
depends on wire resistance, does not scale well. So, although the
number of transistors keeps going up, designs do not scale
automatically and design methodology needs to change. Below 45nm,
design methodology needs to handle "high sigma" Silicon, where
manufacturing variability is high and hence timing is inconsistent.
Devices of a few nm can be built, but not reliably. Future nano-scale
chips will need to be built with fault-tolerance and redundancy
(currently only seen in FPGAs and memories).
All systems have a bottleneck; if you fix one bottleneck, another one will appear - see The Goal.
Legacy Effects
Von Neumann syndrome - (Wikipedia)
It is useful to remember that current computing architectures have a
long history. A particular driving force is that memory chips are
manufactured by different companies and on different processes from
CPUs. The economics of shrinking Silicon and a shift to IP-based
design flows may change that.
Computer Architectures
Harvard Architecture - (Wikipedia)
Processor-In-Memory - (Wikipedia)
PIM has not been successful in the past because of issues with
processing and yield (see Von Neumann syndrome above) and the lack of
good programming tools. ParC attempts to address the latter problem.
Reconfigurable Computing - (Wikipedia)
A premise behind believing in the future success of PIM and
Reconfigurable Computing (using FPGAs) is that computer performance is
tied to the physical distance that data is moved during processing.
This ties into Amdahl's law: if you look at a computing operation as
fetch-calculate-propagate, then the fetch/propagate times are bounded
below by the speed of light and can dominate the calculate phase. For
example, in a desktop PC there is usually at least 5cm between the CPU
and the memory. Light alone needs about 0.17ns to cover 5cm one way,
and once real signalling speeds, the return trip and the memory access
itself are included, a fetch will take at least 2ns, while the CPU
cycle time is under 1ns these days. The fetch/propagate times therefore
dominate, and a faster processor doesn't help much if you need more
performance.
Ideally PIM/Reconfigurable computing only uses long data links where
latency is not an issue. The human brain is probably PIM-like in its
implementation, see: artificial neural network.
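
The fetch-calculate-propagate argument above can be checked with a rough calculation. The sketch below is illustrative only: the function names, the velocity factor of 0.5 for board traces, and the assumption that the calculate phase is 20% of an operation are all made up for the example and are not measurements. It computes the wire-only flight time over 5cm (a lower bound that ignores drivers, controllers and the memory access itself) and then applies Amdahl's law to show how little a faster calculate phase helps when fetch/propagate takes most of the time.

```python
# Back-of-envelope model of fetch-calculate-propagate.
# All workload numbers below are illustrative assumptions, not measurements.

C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum


def flight_time_ns(distance_m, velocity_factor=1.0):
    """One-way signal flight time in ns; velocity_factor scales the speed of light."""
    return distance_m / (C_VACUUM_M_PER_S * velocity_factor) * 1e9


def amdahl_speedup(accelerated_fraction, factor):
    """Amdahl's law: overall speedup when only a fraction of the time is sped up."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


if __name__ == "__main__":
    distance_m = 0.05  # 5cm between CPU and memory, as in the text

    vacuum_ns = flight_time_ns(distance_m)
    trace_ns = flight_time_ns(distance_m, velocity_factor=0.5)  # assumed trace speed
    print(f"one-way, vacuum:      {vacuum_ns:.3f} ns")
    print(f"round trip, on board: {2 * trace_ns:.3f} ns (wire only)")

    # Assume the calculate phase is 20% of each operation and the rest is
    # fetch/propagate. Making the processor faster only touches that 20%.
    calc_fraction = 0.2
    for factor in (2, 10, 1000):
        print(f"processor {factor}x faster -> overall {amdahl_speedup(calc_fraction, factor):.3f}x")
```

Under these assumed numbers the overall speedup can never reach 1.25x however fast the processor becomes, which is the case for shortening the data path rather than speeding up the calculation.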
New advances in 3D-IC are likely to produce systems where processors are stacked with memory in a PIM-like manner.