Array of processors

Thread Starter

bkelly

Joined Dec 3, 2021
5
Has anyone looked into or know of the possibility of creating an array of processors? The best example is the Raspberry Pi Compute Module 3. I would like something like a carrier board capable of holding 4 to 16 of the modules with, maybe, an Ethernet interface. In that manner, multiple carrier boards could work together to become what might be called a small super-computer. I am a software engineer, not hardware. I have never programmed multiple processors but do have some projects I would like to start.
 

Papabravo

Joined Feb 24, 2006
21,228
Has anyone looked into or know of the possibility of creating an array of processors? The best example is the Raspberry Pi Compute Module 3. I would like something like a carrier board capable of holding 4 to 16 of the modules with, maybe, an Ethernet interface. In that manner, multiple carrier boards could work together to become what might be called a small super-computer. I am a software engineer, not hardware. I have never programmed multiple processors but do have some projects I would like to start.
It is actually an old idea from the 1960's

https://en.wikipedia.org/wiki/ILLIAC_IV

Only one was ever built and by most metrics it was a failure. It did set the stage for later developments that were technically and commercially successful. One of the side projects was a FORTRAN-like language for programming massively parallel operations.
 
Last edited:

nsaspook

Joined Aug 27, 2009
13,315
These types of RPi clusters really only have a benefit for education. What makes computers super is super-speed I/O back-planes and memory capacity for IPC working in concert with CPU operations. A few standard surplus x86 servers with GPU cards would likely beat a comparable RPi cluster with ease on both bang-for-the-buck and compute power per watt.
 
Last edited:

ApacheKid

Joined Jan 12, 2015
1,619
Has anyone looked into or know of the possibility of creating an array of processors? The best example is the Raspberry Pi Compute Module 3. I would like something like a carrier board capable of holding 4 to 16 of the modules with, maybe, an Ethernet interface. In that manner, multiple carrier boards could work together to become what might be called a small super-computer. I am a software engineer, not hardware. I have never programmed multiple processors but do have some projects I would like to start.
Use a videocard, these days these are exactly that.
 

ZCochran98

Joined Jul 24, 2018
304
I once read that such video processors are popular for mining cryptocurrencies.
That is correct. That's why gamers and researchers - people who actually want to use a video card (or GPU, nowadays) for useful or at least entertaining massively parallel processing - have been unbelievably frustrated over the past year, as the crypto miners (or scalpers) have set up bots to sweep every single retailer for their GPUs. They then work the cards into oblivion and resell them with degraded performance. I managed to get a new one at retail price simply because Newegg randomizes who gets to purchase one ....

I'd actually considered building an RPi cluster, but it doesn't quite have the performance capability of an actual supercomputer (due to the limited data-transfer speed over Ethernet between the boards and their limited memory). They're really neat in concept, and even more so when you get one up and running.
 

Thread Starter

bkelly

Joined Dec 3, 2021
5
It is actually an old idea from the 1960's

https://en.wikipedia.org/wiki/ILLIAC_IV

Only one was ever built and by most metrics it was a failure. It did set the stage for later developments that were technically and commercially successful. One of the side projects was a FORTRAN-like language for programming massively parallel operations.
Yes, it can be called old, but when newer technology is not considered, the statement is quite misleading. Now the entire computer is available for $30, with 512 Meg of RAM and 4 Gig of Flash. That was not possible in the times of the ILLIAC. Further, the ISSIAC was single instruction multiple data.
 

Thread Starter

bkelly

Joined Dec 3, 2021
5
Yes, their flagship chip has over 10,000 processors!

Bob
That is a good option, for many things. I have not been able to determine how much memory is available for each of those GPUs on a video board. Can you really write general code or are the GPUs optimized for graphics? My searches have not been successful. Where might I start looking?
 

ZCochran98

Joined Jul 24, 2018
304
That is a good option, for many things. I have not been able to determine how much memory is available for each of those GPUs on a video board. Can you really write general code or are the GPUs optimized for graphics? My searches have not been successful. Where might I start looking?
You can write code for GPUs. NVIDIA's library for massive parallelization is called "CUDA", and it works very well with C/C++. I believe AMD's Radeon GPUs use Vulkan, but I'm not sure how useful Vulkan is for general-purpose large-scale parallelization.

Memory on these cards also varies depending on which generation of card you get, but they typically have between 6 and 20 GB (give or take) of built-in memory. You can typically find specs for them on NVIDIA's or AMD's websites (look for "VRAM").
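
If it helps, here's a rough, untested sketch (assuming the CUDA toolkit is installed and the code is built with nvcc) of how you can ask each card how much VRAM it has from C/C++:

Code:
// Query every CUDA-capable GPU in the machine and report its on-board memory.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA-capable GPU found.\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("GPU %d: %s, %.1f GB VRAM, %d multiprocessors\n",
               i, prop.name, prop.totalGlobalMem / 1.0e9, prop.multiProcessorCount);
    }
    return 0;
}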
 

Papabravo

Joined Feb 24, 2006
21,228
Yes, it can be called old, but when newer technology is not considered, the statement is quite misleading. Now the entire computer is available for $30, with 512 Meg of RAM and 4 Gig of Flash. That was not possible in the times of the ILLIAC. Further, the ISSIAC was single instruction multiple data.
Your query was ambiguous in the extreme. You said: "Has anyone looked into or know of the possibility of creating an array of processors?" 50-60 years ago qualifies as 'anyone' or 'possibility' as sure as there are little green apples. It's not my fault you did not choose a narrower time frame, as you might have indicated with the word "recently".

The success was in the investigation of the proposition, not how many resources could be gathered together.
 

Thread Starter

bkelly

Joined Dec 3, 2021
5
Yes, it can be called old, but when newer technology is not considered, the statement is quite misleading. Now the entire computer is available for $30, with 512 Meg of RAM and 4 Gig of Flash. That was not possible in the times of the ILLIAC. Further, the ISSIAC was single instruction multiple data.
Further, the ISSIAC was single instruction multiple data. Should be: Further, the ILLIAC was single instruction multiple data. I did not see it soon enough to edit.
 

nsaspook

Joined Aug 27, 2009
13,315
Surplus server benchmarks for the house compute resource rack. For general processing needs it's hard to beat the price point in processing power and long term reliability of commodity computer arrays.
https://browser.geekbench.com/v5/cpu/compare/11494453?baseline=11494326 Threads disabled on one machine.
https://browser.geekbench.com/v5/cpu/compare/11494119?baseline=11494326 Threads disabled on one machine.
https://browser.geekbench.com/v5/cpu/compare/11494966?baseline=11494326 Threads enabled on both.
 
Last edited:

Thread Starter

bkelly

Joined Dec 3, 2021
5
Yes, their flagship chip has over 10,000 processors!

Bob
The GPU route looks good. But I cannot determine if it will work for general programming. What kinds of instructions does each GPU have? I am pretty sure it cannot do things like open a file. That's a silly question, but what can it do? How much memory does each have for code and for data?
 

ZCochran98

Joined Jul 24, 2018
304
Long post incoming.... Skip to the Tl;dr at the end if you don't want the long explanation.

.
.
.

GPUs are meant to work with the CPU - they can execute instructions and process large amounts of data that the CPU hands off to them. Like I mentioned in an earlier post, NVIDIA GPUs can be programmed using CUDA, which is typically used with C/C++ as an external library (AMD GPUs use Vulkan, if I recall right). There are other languages that work with CUDA, to be sure; C/C++ is just what I learned. The amount of memory a GPU has depends on which GPU you're looking at, but its memory is almost exclusively used for data, not instructions; in most architectures, the computer's main memory holds the instructions.

The way it works is this (somewhat oversimplified):
A program is written in a standard programming language (usually C/C++) that makes use of the CUDA libraries and instruction sets. This is compiled and executed as normal. When the CPU encounters a command meant for the GPU, it passes that instruction off to the GPU. If it encounters a data-transfer command, it will either request data from or send data to the GPU's memory (though many modern GPUs can also map the computer's main memory, so the GPU has its own private memory plus access to the computer's memory). The CPU can also send specific instructions to the GPU to start a process or to wait for it to synchronize. While the GPU is running, the CPU can do other things in the background, but usually it's just waiting for the data to be processed.

If you wanted to open a file to use the data within, you'd do that in the CPU part of the code, transfer the data to the GPU, and let it crunch on it. After the calculation, you'd have the GPU send the data back to the CPU, which can then decide what to do with it. Executing things on a GPU is more like calling a subroutine (kind of) semi-asynchronously than actually "running" a program on it. Basically, having a GPU by itself is worthless; you need the CPU as well, which runs and controls the whole program. The GPU is just there to do "simple" tasks that can be distributed.
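
To make that hand-off concrete, here's a minimal, untested sketch of the flow (the kernel, names, and sizes are just placeholders, and it assumes the CUDA toolkit and nvcc):

Code:
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// The "subroutine" the CPU hands off: each GPU thread scales one element.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    std::vector<float> host(n, 1.0f);   // CPU-side data (e.g. read from a file here)

    float *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));                                     // allocate GPU memory
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice); // CPU -> GPU

    scale<<<(n + 255) / 256, 256>>>(dev, n, 2.0f);  // launch the kernel (asynchronous)
    cudaDeviceSynchronize();                        // CPU waits here for the GPU to finish

    cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost); // GPU -> CPU
    cudaFree(dev);

    printf("first element after the GPU pass: %f\n", host[0]);
    return 0;
}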

What I can say is this: GPUs are REALLY good for massive parallelization and number crunching, if a task CAN be split into a bunch of sub-tasks. HOWEVER, GPUs are NOT generally useful for "general purpose" programming or tasks that are extremely sequential (task C must come after task B, which must come after task A, and no earlier). So, whether you should invest in learning GPU programming depends on what kinds of problems you want to solve.

A common example of a use of a GPU is for image processing and graphics. Typically, you have a bunch of parts of an image or a graphic that all require the exact same calculation performed on it (for instance, image filtering, which applies math to each pixel of the image). You can do this in the CPU, but it'd take a long time for even a high-speed CPU to execute. Alternatively, you could send the image to the GPU and let each of its slower processors handle an individual pixel by itself, do all the calculations for that one pixel, and then return the processed data back to the CPU via the memory.
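
As a rough sketch of the one-thread-per-pixel idea (the kernel and launch sizes are just illustrative):

Code:
// Each GPU thread handles exactly one pixel of a grayscale image and applies
// the same arithmetic to it (here, a simple brightness scale with clamping).
__global__ void brighten(unsigned char *img, int width, int height, float gain) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;   // pixel column for this thread
    int y = blockIdx.y * blockDim.y + threadIdx.y;   // pixel row for this thread
    if (x < width && y < height) {
        int idx = y * width + x;
        float v = img[idx] * gain;
        img[idx] = v > 255.0f ? 255 : (unsigned char)v;   // clamp to 8 bits
    }
}
// Typical CPU-side launch, one thread per pixel:
//   dim3 block(16, 16);
//   dim3 grid((width + 15) / 16, (height + 15) / 16);
//   brighten<<<grid, block>>>(d_img, width, height, 1.2f);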

An example of something that, for many cases, is NOT a good use of a GPU, is something like solving a differential equation numerically. As each subsequent step of the calculation depends on the previous, it's VERY hard to solve a bunch of steps simultaneously. Instead, you'd use a single fast processor (the CPU) to calculate each step, which depends on each previous step. There are probably clever ways to get around this, but I'm personally not aware of them. Similarly, recursive problems are not good applications of GPUs, as only a single thread will be operating.
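
For contrast, this is the kind of step-by-step loop that stays on the CPU (a toy forward-Euler solve of dy/dt = -y; the numbers are arbitrary):

Code:
#include <cstdio>

int main() {
    double y = 1.0, dt = 0.001;
    for (int k = 0; k < 10000; ++k) {
        y += dt * (-y);   // each step needs the result of the previous one
    }
    printf("y(10) is roughly %g\n", y);
    return 0;
}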

List-searching on unsorted lists is somewhere in-between; GPUs are good for them for large lists, but a waste for small lists (as it can be faster to just increment through the list on the CPU than to send the data to the GPU, let it prepare the threads, send the data to the threads, let them solve the problem, then have the threads send the data back to memory, and then have the memory transferred back to the CPU). List-searching is a bit "challenging" to properly implement on a GPU, but it can be done.
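
A rough, untested sketch of what the GPU side of such a search might look like (the host-side setup is only summarized in the comments):

Code:
// Every thread checks one element of an unsorted array; matches are recorded
// atomically so concurrent threads don't clobber each other.
__global__ void find(const int *data, int n, int target, int *found_idx) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] == target) {
        atomicMin(found_idx, i);   // keep the lowest matching index
    }
}
// Host side: set *found_idx to n (meaning "not found"), copy the list to the GPU,
// launch with one thread per element, then copy *found_idx back. For a short list
// those copies alone cost more than a plain CPU loop, which is the trade-off above.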

.
.
.

Tl;dr: look up CUDA and GPU programming. NVIDIA has a ton of resources on the subject. GPUs are good for many tasks, but not all. There are problems that are good for GPUs, ones that are not, and some that are size-dependent. They have their own memory (generally anywhere from 8-24 GB of on-board memory), but they can also map the computer's main memory (RAM) to make data transfer easier (NVIDIA calls this "unified memory"). GPU memory is NOT typically meant for instructions, as (as far as I know) they cannot natively just "run" a program. The CPU runs the program and hands off tasks to the GPU, and waits for a response from the GPU after it's done.

Hope this clears up the GPU question somewhat!

edit: here's a good introduction to CUDA programming, if you care.
 
Last edited:

ZCochran98

Joined Jul 24, 2018
304
I should also add: many modern CPUs also have multithreading/multiprocessing capabilities (my computer can run 64 independent threads at once, for instance [2 per processor core]), and each of the cores/threads in the CPU is typically faster than an individual processor core in a GPU. HOWEVER, the benefit of the GPU is that they're designed for massive thread counts, so while a CPU can have fast multithreading for small tasks, a GPU will beat them out by far on almost any large task. So if you don't have massive problems, take a look at multithreading on a CPU.
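
For a flavor of CPU-side multithreading, here's a small plain-C++ sketch (no GPU involved) that splits a big sum across however many hardware threads the machine reports:

Code:
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    const size_t n = 1 << 24;
    std::vector<double> data(n, 1.0);
    unsigned num_threads = std::max(1u, std::thread::hardware_concurrency());

    std::vector<double> partial(num_threads, 0.0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            size_t begin = t * n / num_threads, end = (t + 1) * n / num_threads;
            partial[t] = std::accumulate(data.begin() + begin, data.begin() + end, 0.0);
        });
    }
    for (auto &w : workers) w.join();   // wait for every worker to finish

    double total = std::accumulate(partial.begin(), partial.end(), 0.0);
    printf("threads: %u, sum: %.0f\n", num_threads, total);
    return 0;
}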
 

ZCochran98

Joined Jul 24, 2018
304
And it's likely that their programs are all in ROM for doing parallel processing and can't be altered by anyone using the chip.
Not primarily. NVIDIA's GPUs (the ones specified in that post) do have specific compute cores meant for particular graphics-processing tasks, but the vast majority of the compute cores are "general-purpose" processors. The processors are split into groups of 32 (if I remember the number right) called "warps" (which share a subsection of local memory for immediate access - like a cache), and all tasks are distributed amongst the warps from programs external to the GPU; there is no need for ROM on them except for the few specifically dedicated to graphics tasks.
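
A tiny, untested kernel sketch of that idea - one warp of 32 threads staging data through the fast per-block memory before writing it back (purely illustrative):

Code:
// One block = 32 threads (one warp). They load a chunk into on-chip __shared__
// memory, wait for each other, then write it back reversed. Ordinary general-purpose
// code, nothing graphics-specific and nothing burned into ROM.
__global__ void reverse32(int *data) {
    __shared__ int tile[32];                     // fast per-block scratch memory
    int t = threadIdx.x;                         // 0..31 within this warp
    tile[t] = data[blockIdx.x * 32 + t];
    __syncthreads();                             // wait until the whole block has loaded
    data[blockIdx.x * 32 + t] = tile[31 - t];
}
// Launched from the CPU as: reverse32<<<num_blocks, 32>>>(d_data);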
 