In Google File System, why aren't hot spots a problem when multi-chunk files are read sequentially?

Thread Starter

terabaaphoonmein

Joined Jul 19, 2020
111
Hotspot: a region of a computer program where a high proportion of executed instructions occurs. In GFS, a hotspot is a chunkserver that receives a disproportionate share of client requests.

Lazy space allocation: https://stackoverflow.com/questions/18109582/what-is-lazy-space-allocation-in-google-file-system

With lazy space allocation, the physical allocation of space is delayed as long as possible, until data up to the chunk size (in GFS's case, 64 MB according to the 2003 paper) is accumulated.
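For intuition, here is a minimal sketch of the idea (hypothetical Python, not GFS's actual implementation; the paper only says that chunks are stored as plain files on the chunkserver's local file system and are extended only as data is written):

```python
# Hypothetical sketch of lazy space allocation (not GFS's real code).
# A chunk has a fixed logical size (64 MB), but physical space grows
# only as data is actually written into it.

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB logical chunk size

class LazyChunk:
    def __init__(self):
        self.data = bytearray()  # physical allocation starts at 0 bytes

    def append(self, payload: bytes) -> int:
        """Append data; allocate physical space only for what is written."""
        if len(self.data) + len(payload) > CHUNK_SIZE:
            raise ValueError("chunk full; the file continues in a new chunk")
        self.data.extend(payload)  # physical size grows lazily
        return len(self.data)

chunk = LazyChunk()
chunk.append(b"x" * 1024)  # a 1 KB file...
print(len(chunk.data), "bytes physically used out of a", CHUNK_SIZE, "byte chunk")
```

So a small file does not occupy a full 64 MB on disk; it only consumes the space actually written, which is how lazy allocation avoids wasting space despite the large chunk size.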
Large chunk size in GFS:
=> A large chunk size, even with lazy space allocation, has its disadvantages.
=> A small file consists of a small number of chunks, perhaps just one.
=> The chunkservers storing those chunks may become hot spots if many clients are accessing the same file (see the sketch after this list).
=> In practice, hotspots haven't been a major issue because our applications mostly read large multi-chunk files sequentially.
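To make the small-file case concrete, here is a tiny sketch (hypothetical Python; the chunkserver names and round-robin placement are made up, and replication is ignored) of how requests for a one-chunk file pile onto a single chunkserver, while readers spread over a multi-chunk file land on many:

```python
# Hypothetical illustration (made-up placement, replication ignored):
# requests for a 1-chunk file all land on the single chunkserver holding
# that chunk, while readers spread over a multi-chunk file land on many.

from collections import Counter

def place_chunks(num_chunks, servers):
    """Assign chunk i to a chunkserver round-robin (a simplification)."""
    return {i: servers[i % len(servers)] for i in range(num_chunks)}

def load(placement, client_positions):
    """Count requests per chunkserver, given which chunk each client is reading."""
    return Counter(placement[chunk] for chunk in client_positions)

servers = [f"chunkserver-{n}" for n in range(5)]
small_file = place_chunks(1, servers)      # 1-chunk file
large_file = place_chunks(1000, servers)   # 1000-chunk file

# 10 clients reading the small file are all on chunk 0 -> one hot chunkserver.
print(load(small_file, [0] * 10))

# 10 clients reading the large file sequentially started at different times,
# so at this instant they sit on different chunks -> load is spread out.
print(load(large_file, list(range(10))))
```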
I don't understand how hotspots are not an issue when we read large multi-chunk files sequentially. They say hotspots are an issue if clients are accessing the same small file (a file of just one chunk).

I will represent a scenario where a small file (i.e. a small number of chunks) is being accessed by multiple clients.


It makes sense why chunkservers would become hotspots in this case, since they are being accessed by multiple clients at once.
But it absolutely doesn't make sense to me when the research paper says, "In practice, hotspots haven't been a major issue because our applications mostly read large multi-chunk files sequentially." What's the difference? If I imagine a scenario like the one above, where the file is made up of multiple chunks and everything else is the same, what difference does that make?
 

djsfantasi

Joined Apr 11, 2010
9,163
As far as understanding goes, a paragraph usually covers one aspect of the material under consideration.

In your example, first you have to identify the aspect (subject) being described.

Then each sentence provides detail, and the writer expects the reader to mentally expand on that detail.

What are the properties of a large, chunked file? For one, there are many chunks. And what happens when there are many readers? Since there is a larger number of chunks, the probability of many readers hitting the same chunk decreases. Assume 10 simultaneous reads. A small file of one chunk will have its single chunk simultaneously hit by all 10 readers. But a large file of 1,000 chunks will have each chunk hit by only 0.01 readers on average, a factor of 1,000 difference!
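A back-of-the-envelope version of that arithmetic (a sketch, under the simplifying assumption that at any instant each reader sits on a uniformly random chunk, which sequential reading only roughly resembles):

```python
# Expected simultaneous readers per chunk, assuming each of the 10 readers
# is on a uniformly random chunk at a given instant (an illustration only).

readers = 10

for chunks in (1, 1000):
    per_chunk = readers / chunks
    print(f"{chunks:4d} chunk(s): {per_chunk:g} readers per chunk on average")

# Output:
#    1 chunk(s): 10 readers per chunk on average
# 1000 chunk(s): 0.01 readers per chunk on average
```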

The point here is that understanding a paragraph involves a mental extension.
 

Thread Starter

terabaaphoonmein

Joined Jul 19, 2020
111
Here is me explaining what you said via a figure.
I still don't understand why and how this works: "In practice, hotspots haven't been a major issue because our applications mostly read large multi-chunk files sequentially." Here the same chunk can still be accessed by multiple applications.
Oh, I see. But here there is less chance of a big hotspot. There is still a chance of a hotspot, but only of small hotspots, compared to when the file has fewer chunks.
Also, do you assume that the single chunk of the file is replicated or not?
 

Thread Starter

terabaaphoonmein

Joined Jul 19, 2020
111
Since there is a larger number of chunks, the probability of many readers hitting the same chunk decreases. Assume 10 simultaneous reads. A small file of one chunk will have its single chunk simultaneously hit by all 10 readers. But a large file of 1,000 chunks will have each chunk hit by only 0.01 readers on average, a factor of 1,000 difference!
Can you explain this?
 

djsfantasi

Joined Apr 11, 2010
9,163
Can you explain this?
Try yourself to think of a way to explain this. Maybe an analogy. Try before reading further.
.
.
.
.
.
.
.
.
.
.
Imagine you have a large barrel (the file). In it, there is one tennis ball (a chunk). Reach in blindfolded and grab the tennis ball (read the file). OK. Now put the ball back and get nine friends to join you. Then have everyone grab the ball at once. There WILL be contention (a hotspot). Now put 100 tennis balls into the barrel and have you and your friends each try to grab a ball. Most of the time, everyone will get a ball. Occasionally there will be contention (a hotspot), but it will be far less frequent.
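A small Monte Carlo sketch of the barrel analogy (assuming, purely for illustration, that each reader grabs a uniformly random ball/chunk) estimates how often at least two readers collide on the same chunk:

```python
import random

# Monte Carlo sketch of the barrel analogy: each of 10 readers "grabs" a
# random chunk; two or more readers on the same chunk is contention (a hotspot).
# Uniformly random picks are an assumption made purely for illustration.

def contention_probability(num_chunks: int, readers: int = 10, trials: int = 20_000) -> float:
    hits = 0
    for _ in range(trials):
        picks = [random.randrange(num_chunks) for _ in range(readers)]
        if len(set(picks)) < readers:  # at least two readers picked the same chunk
            hits += 1
    return hits / trials

for n in (1, 100, 1000):
    print(f"{n:5d} chunks: P(contention) ~ {contention_probability(n):.3f}")

# With 1 chunk contention is certain, with 100 chunks it still happens fairly
# often, and with 1000 chunks it is rare -- and even then only a couple of
# readers overlap rather than all ten.
```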
 

Thread Starter

terabaaphoonmein

Joined Jul 19, 2020
111
Makes sense, so we aren't considering that the files are replicated, I see.
This question has been asked for 10 marks; what should I write? The answer is simple.
The demerits are:
internal fragmentation and hotspot formation. Even in the research paper it is covered briefly, not at length.
 