Parallel Processing

Thread Starter

Macjohn Key

Joined Jun 18, 2015
13
Please, I had an idea of buying processor chips and connecting them in parallel, and then to a main computer, but I don't know how to go about it.
 

MrAl

Joined Jun 17, 2014
11,389
Hi,

What kind of processor chips are you talking about here?
Also, how many do you need in parallel?
Also, can your applications use parallel processors? Not many applications these days can.
Or, do you plan to run a lot of applications at the same time?

You can buy an 8-core processor from AMD, but keep in mind it is really 4 sets of dual cores, where each dual core shares just one floating point unit. In the past every core had its own floating point unit, but the AMD 8-core parts have only 4 FPUs, so each FPU is shared between two integer cores. That bites if you plan to run multicore software that needs intensive floating point.
If you plan to do mostly integer processing it doesn't matter much, because you won't be using the floating point units.

Multicore processing is good for multitasking, when you have to run several programs at the same time, or when you have applications that are designed to use more than one core at a time. Many programs today still can't do that, though some can. There's no way to make a program that doesn't use multiple cores use them, unless you have access to the source code and are prepared to rewrite some of it.
Not all tasks can be broken down for parallel processing either. Anything that has to be done sequentially cannot be done this way. Anything that can be divided can make good use of it, though, like a chess program, but that chess program has to be written for multiple cores too.

If you mostly use integer processing and you have an application that can be broken down into parallel pieces, then you can gain a huge speed increase. For example, if you have 4 cores at 4GHz, your effective throughput approaches that of a single 16GHz core, and with 8 cores it approaches 32GHz, though in practice it is somewhat less. So many tasks get done faster. Not all, however, as there may be a bottleneck such as disk access, which will slow everything down anyway.
After all is said and done however, there is usually a noticeable speed increase when using programs that can take advantage of the multiple cores.
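
A minimal sketch (in Python, purely illustrative) of the kind of split described above: four independent, integer-heavy pieces of work handed to four cores. The function and chunk sizes are made up for the example.

```python
# Illustrative only: split independent integer work across cores with
# Python's standard multiprocessing module. `count_primes` is a stand-in
# for real CPU-bound work.
from multiprocessing import Pool

def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately CPU-bound)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    chunks = [5000, 5000, 5000, 5000]   # four independent pieces of work
    with Pool(processes=4) as pool:     # one worker process per core
        results = pool.map(count_primes, chunks)
    print(sum(results))                 # same answer as doing them in sequence
```

The wall-clock gain only shows up when each piece is large enough that the process startup and data transfer overhead is small by comparison.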

Building your own multiprocessor motherboard is just nuts, though. To make it worth your while you'd have to use fast processors, and they won't tie together very well because signal paths have to be very short at modern processor speeds. You might get 4 together if you are lucky, but the trace routing would be a nightmare, and that includes the power supply bussing and down-converters for each processor. Good luck with that :)

Another idea is to just use networking. That means you'd have one computer for each 8-core processor. The more computer boxes you have, the more computing power you have. The tasks have to be broken down a little more carefully here, though, because each computer will have its piece of the computation to do and the host computer will have to keep everything in order. This isn't easy, but it would be quite cool to do.
Four computers, each with an 8-core 4GHz processor, would run roughly like 128GHz if the tasks can be broken down properly. With 3GHz processors (much cheaper) it would still be around 96GHz.
If you can't break the problem down into pieces, however, then you are back to 3 or 4GHz, the speed of one single core, even though you now have 32 of them.
 
Last edited:

Papabravo

Joined Feb 24, 2006
21,159
You would do well to study the history of parallel processors and the quest for a suitable language to express parallel algorithms. IIRC it was a FORTRAN derivative. This history goes back almost 50 years. I know, it's really hard to believe.

https://en.wikipedia.org/wiki/ILLIAC_IV
https://en.wikipedia.org/wiki/STARAN
https://en.wikipedia.org/wiki/CDC_6600
https://en.wikipedia.org/wiki/CDC_6000_series
https://en.wikipedia.org/wiki/Cray-1
https://en.wikipedia.org/wiki/Gene_Amdahl
https://en.wikipedia.org/wiki/Amdahl_Corporation

PS -- This project is so far from a "slap it together in a weekend of hacking" that you really need to think long and hard about tackling this kind of thing.
 
Last edited:

nsaspook

Joined Aug 27, 2009
13,086
For me to have an idea, what could be a task ideal for parallel processing?

And I mean continuous, with no bottlenecks.
The narrow case: things like matrix transformations for computer graphics, cryptography, and physics simulations, where you apply the same operation to each independent part of a large data set. That suits SIMD/SSE instructions, and a sort of middle case is GPU cards with many cores and many more threads per core.
The general case: MIMD SMP machines like a multicore x86 or the four-core ARMv7 RPi2, with isolated program tasks running in the local core cache and no external resource contention from other cores.
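
The "same operation on every independent element" pattern in the narrow case can be illustrated with NumPy, which dispatches whole-array arithmetic to vectorized (SIMD-capable) machine code instead of a Python loop. The points and angle here are made up for the example.

```python
# Data parallelism in miniature: one matrix product rotates every point
# at once, with no per-element Python loop. NumPy hands the work to
# vectorized native code.
import numpy as np

points = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 3.0]])  # made-up 2-D points
angle = np.pi / 2
rot = np.array([[np.cos(angle), -np.sin(angle)],
                [np.sin(angle),  np.cos(angle)]])

# Each row is transformed independently -- the SIMD-friendly case.
transformed = points @ rot.T
```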
 

atferrari

Joined Jan 6, 2004
4,764
Cray-1: interesting details about how important the relative position of the boards to each other was, and the ECL logic with differential outputs!

BTW, in actual practice, who is more aware of the other side's reality: those who use these computers or those who build them? Maybe the latter?
 

Papabravo

Joined Feb 24, 2006
21,159
I think both perspectives are relevant. There seems to be a stable positive feedback loop in which requirements are defined and implemented. I remember studying the Illiac IV in my computer architecture class in 1967. I also remember writing assembly language code for the CDC 6600. We have traveled an incredible distance since then and we are still standing on the shoulders of giants.
 

nsaspook

Joined Aug 27, 2009
13,086
BTW, in actual practice, who is more aware of the other side's reality: those who use these computers or those who build them? Maybe the latter?
It's usually the 'man in the middle', the 'architect', between pure hardware design and the software abstraction of the application programmer.

"Computer architecture, like other architecture, is the art of determining the needs of the user of a structure and then designing to meet those needs as effectively as possible within economic and technological constraints."
https://en.wikipedia.org/wiki/Fred_Brooks
 

ranch vermin

Joined May 20, 2015
85
Test your efficiency after the code is done: 2 cores should be about twice as fast, or something went wrong. Then if someone mentions 2 cores sharing an FPU and you used floating point, don't worry too much about it, though it could also be the reason for a slowdown. I'd imagine you wouldn't get speed problems so much as garbled bits? :) I suppose it depends on how they make it.
 

Thread Starter

Macjohn Key

Joined Jun 18, 2015
13
This is where I can get the single processors and connect them in parallel over a network. Actually, the software needs lots of FPU: software such as Blender, Cinema 4D, and all of these need it. Thank you guys.
 

MrAl

Joined Jun 17, 2014
11,389
Hello again,

An example of a good application area for parallel processing is a chess program, or other games that involve making moves like that (Checkers, etc.).

For example, for the opening move for White we might choose any of 20 different moves. But say we stick to our previously studied opening repertoire, which includes only three moves:
c4, d4, or e4.

The computer has to evaluate c4, then each of Black's possible replies, then each next move by White, down to whatever depth it searches in this first round. Next it has to do the same evaluation starting from d4, and after that from e4. So for each first move it has to evaluate a long list of lines of possible play.
These lines have one thing in common: they are independent of each other (in the general case, that is). The computer is free to evaluate them in any order. The only problem is that a single core must do them in sequence, but since they can each be evaluated individually, a four-core machine can divide up the task by assigning c4 to one core, d4 to the next, and e4 to the next. This means it can evaluate the entire set to a deeper level than a single core could in the same time span. If you only have one minute to make a move, you'll want as many cores as you can get. At the end, you pick the move that gave the best numerical outcome.
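
The chess split can be sketched as follows: each root move is scored on its own core. Here `evaluate_line` is a placeholder for a real tree search; it just hashes the move name into a fake score so the structure is runnable.

```python
# Sketch of scoring each candidate root move in parallel, one core per
# move. The evaluation function is a made-up stand-in, not a real search.
from multiprocessing import Pool

def evaluate_line(move):
    """Stand-in for searching the tree below `move`; returns (move, score)."""
    score = sum(ord(c) for c in move) % 100   # placeholder evaluation only
    return move, score

def best_opening(moves):
    """Score each candidate root move in parallel and pick the best."""
    with Pool(processes=len(moves)) as pool:  # one core per root move
        scored = pool.map(evaluate_line, moves)
    return max(scored, key=lambda pair: pair[1])[0]

if __name__ == "__main__":
    print(best_opening(["c4", "d4", "e4"]))   # the three openings from the post
```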

Numerical evaluation of ODEs can be broken up too, one core per equation, for example.

Some problems can not be broken up, such as the familiar:
E=I*R

but i suppose you can break up:
E=I*R+K*G

assigning I*R to one core and K*G to the other core. Likewise, we can break up:
Y=A*B*C*D

by doing A*B and C*D separately and then multiplying the result to get Y.
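
The Y = A*B*C*D split can be written out literally: A*B and C*D go to separate workers, then the two partial products are combined. For numbers this small the process overhead dwarfs any gain; the point is only the shape of the decomposition.

```python
# The two-halves decomposition of Y = A*B*C*D, purely illustrative.
from concurrent.futures import ProcessPoolExecutor
from operator import mul

def parallel_product(a, b, c, d):
    """Compute a*b*c*d as two independent halves, then combine."""
    with ProcessPoolExecutor(max_workers=2) as pool:
        ab = pool.submit(mul, a, b)   # worker 1: A*B
        cd = pool.submit(mul, c, d)   # worker 2: C*D
        return ab.result() * cd.result()   # combine the partial products

if __name__ == "__main__":
    print(parallel_product(2, 3, 5, 7))   # 6 * 35 = 210
```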

This doesn't work when the overhead of figuring out how to break the problem up exceeds the time saved by breaking it up. For these simple examples it may not be worth it, but for more complicated equations it definitely is.

I often do special picture processing on lots of files. Since the files can be done in sets of 4, for example, each core of a 4-core machine can handle one file, and the result is that they get done almost 4 times faster than with only one core.
I have read that there is a slight overhead, but from what I have seen it can't be much. Of course it also depends on how the program accesses memory, and especially disk, because the cores can't all read the same medium at the same time. It still speeds up, though, when the calculations are intense.
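
The per-file pattern reads like this as code: hand each file name to a worker and let the pool keep all cores busy. `process_file` is a stand-in (it only measures the name), so nothing here touches a real disk.

```python
# One file per core at a time; a real `process_file` would read, convert,
# and write the image. This placeholder just returns the name length.
from multiprocessing import Pool

def process_file(path):
    """Placeholder for real per-file work such as decoding and filtering."""
    return path, len(path)

def process_all(paths, workers=4):
    """Fan the files out across worker processes; results keep input order."""
    with Pool(processes=workers) as pool:
        return pool.map(process_file, paths)

if __name__ == "__main__":
    print(process_all(["a.png", "bb.png", "ccc.png", "dddd.png"]))
```

Note the caveat from the post still applies: if all four workers hammer the same disk at once, I/O contention can eat the gain.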
 
Last edited:

Thread Starter

Macjohn Key

Joined Jun 18, 2015
13
Actually, those are VFX software packages, so imagine the number of pixels per image to be rendered. But then, in a network, will the program choose which core to use?
 

MrAl

Joined Jun 17, 2014
11,389
Hello there,

I am not exactly sure what you are talking about, but I write and run my own software for picture processing, so I know exactly what it is going to be doing, on any core.
On a network it will also depend on the software, and how much control it gives to the host. If I wrote the programs, I would make sure the host could tell the rest how to split up the cores.
I find that you don't have to specify which core a task uses, just how to split the work up between parts. The exception now, with the AMD cores, is that if the program uses the floating point unit a lot, it might be better to use every other core, like 1, 3, 5, 7 on an 8-core machine. I haven't had to do that yet, though, because when I found out about the AMD cores I rewrote my programs to use integers only.
Also worth mentioning: there is a way under Windows to specify which cores a given program uses. You could choose cores 1, 2, 3, ... 8 for a program that way, though I'd have to look up the details again. You can modify the shortcut, for example, and that tells Windows which cores to use for that program when launched from that shortcut. So you could specify core 3 for a program, or cores 3, 5, and 7 if that program can use multiple cores. The value is a 32-bit binary mask, so 0x0000000A would mean cores 2 and 4 (hex A = binary 1010, which has bits 1 and 3 set, corresponding to cores 2 and 4).
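
The affinity word described above is just a bitmask: bit n set means the program may run on core n+1 (counting cores from 1, as in the post). A tiny helper makes the encoding explicit:

```python
# Build the affinity bitmask from a list of 1-based core numbers.
def affinity_mask(cores):
    """Bit (core - 1) is set for every allowed core."""
    mask = 0
    for core in cores:
        mask |= 1 << (core - 1)
    return mask

print(hex(affinity_mask([2, 4])))   # cores 2 and 4 -> 0xa, matching the post
```

On Windows, a value like this can be passed from a command prompt as `start /affinity A program.exe` (the hex mask goes where `A` is), if I recall the syntax correctly.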
 
Top