Today, I decided to read and learn a bit more about computer memory. I came across this interesting paper (https://people.freebsd.org/~lstewart/articles/cpumemory.pdf) and wanted to dig in!
CPUs today are mostly fast enough to handle our workloads; the bottleneck is usually memory access. Most attempts to address this bottleneck have been in hardware:
- RAM hardware design
- Memory controller design
- CPU cache
Commodity Hardware Today
Chipsets typically look something like this nowadays:
The CPUs are all connected via the FSB (front-side bus) to the Northbridge. The Northbridge has the memory controller and connects to different types of RAM. The Southbridge or I/O bridge is used to connect the CPU and RAM to other devices. The Southbridge connects to these devices via a bus.
The issues with this setup are:
- All communication between the CPUs and RAM happens over the same bus, which can bottleneck data transfer rates
- All communication with RAM needs to go through the Northbridge
- RAM has only a single port (sequential reads and writes)
- All communication between a CPU and a device also has to go through the Northbridge
One other problem was that devices needed to talk to the CPU first to get access to the RAM. This has since been solved with DMA (direct memory access), which most devices now support. It allows devices to access RAM directly without going through the CPU.
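To picture what DMA saves, here is a tiny Python sketch that lists the components a device's data passes through on its way to RAM. The function name `device_to_ram_path` and the component labels are my own, purely for illustration; it is a simplified picture of the chipset described above, not a model of any real hardware.

```python
def device_to_ram_path(use_dma):
    """Return the components a device's data passes through to reach RAM."""
    # Every transfer from a device first travels through both bridges.
    path = ["device", "Southbridge", "Northbridge"]
    if not use_dma:
        # Without DMA the CPU has to handle the data itself, so the data
        # takes a detour to the CPU and back through the Northbridge.
        path += ["CPU", "Northbridge"]
    path.append("RAM")
    return path


print(" -> ".join(device_to_ram_path(use_dma=False)))
print(" -> ".join(device_to_ram_path(use_dma=True)))
```

In this simplified picture, the DMA path skips the CPU entirely, which frees the CPU to do other work while the transfer happens.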
Another bottleneck is the bus between the Northbridge and the RAM. This has been improved in more recent RAM types, but not by enough: the processor still spends much of its time stalled, waiting for memory accesses to complete. Some Northbridges can instead connect to several external memory controllers, which supports more memory and allows parallel access to it.
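To make the bandwidth argument a bit more concrete, here is a back-of-the-envelope model in Python. The function name `per_cpu_bandwidth` and all the numbers are made up for illustration. It assumes every CPU hits memory at the same time and that accesses spread evenly across the memory controllers, which real workloads will not do.

```python
def per_cpu_bandwidth(bus_gb_per_s, n_cpus, n_controllers=1):
    """Rough share of memory bandwidth each CPU gets under full contention.

    Assumes each controller has its own bus of bus_gb_per_s and that
    accesses are spread evenly over all controllers (an idealisation).
    """
    if n_cpus < 1 or n_controllers < 1:
        raise ValueError("need at least one CPU and one controller")
    return bus_gb_per_s * n_controllers / n_cpus


# Hypothetical figures: a 10 GB/s memory bus shared by 4 CPUs.
single = per_cpu_bandwidth(10.0, n_cpus=4)                   # one memory bus
multi = per_cpu_bandwidth(10.0, n_cpus=4, n_controllers=2)   # two controllers
print(f"one controller:  {single:.1f} GB/s per CPU")
print(f"two controllers: {multi:.1f} GB/s per CPU")
```

Even in this idealised model, adding CPUs to a single shared bus shrinks each CPU's share, while adding controllers grows it again.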
Another way to increase memory bandwidth is to integrate memory controllers into the CPUs, which gives each CPU its own memory. The downside is that for one CPU to access another CPU's memory, it has to go over the interconnect, which can be really slow. For example, if CPU1 needs data held in the RAM attached to CPU4, the request has to travel over the interconnect to get there. The interconnects are pretty expensive, so it's not feasible to have a direct one between every pair of CPUs.
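Here is a small Python sketch of why remote memory is slower in this kind of design. The topology (four CPUs wired in a square, so each CPU links to two neighbours), the latency figures, and the names `LINKS`, `hops`, and `access_cost` are all assumptions for illustration, not numbers from the paper.

```python
from collections import deque

# Hypothetical topology: four CPUs in a square, two links per CPU.
# CPU1 and CPU4 sit on opposite corners and have no direct link.
LINKS = {
    1: [2, 3],
    2: [1, 4],
    3: [1, 4],
    4: [2, 3],
}


def hops(src, dst, links=LINKS):
    """Minimum number of interconnects crossed from src to dst (BFS)."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        cpu, dist = queue.popleft()
        if cpu == dst:
            return dist
        for neighbour in links[cpu]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    raise ValueError(f"no path from CPU{src} to CPU{dst}")


def access_cost(src, dst, local_ns=100, hop_ns=50):
    """Toy latency: local access time plus a fixed penalty per hop."""
    return local_ns + hop_ns * hops(src, dst)


for target in (1, 2, 4):
    print(f"CPU1 -> RAM of CPU{target}: {access_cost(1, target)} ns")
```

With these made-up numbers, CPU1 reading its own RAM costs 100 ns, a neighbour's RAM 150 ns, and CPU4's RAM 200 ns, because the request has to cross two interconnects.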
Note: I skipped writing about the internal details of RAM because I am saving it for another post.