world wide web statistics?

Discussion in 'Computing and Networks' started by Mathematics!, Feb 18, 2012.

  1. Mathematics!

    Thread Starter Senior Member

    Jul 21, 2008
    1,022
    4
    I am curious: would I be correct in saying
    surface web + deep/invisible web = whole world wide web?

    Or are there other sections of the world wide web that don't fit into either category?

    My understanding is that the surface web is all the websites/webpages that a search engine like Google can crawl/index,
    and the invisible/deep web is all the websites that a search engine can't reach.

    I am assuming that any website/webpage fits into one or the other.
    Is this reasoning correct, or is there some stuff that doesn't fit into either category for the world wide web? (Not talking about the internet, just the world wide web.)

    Obviously the US hosts the most webpages, according to this:

    http://www.greatstatistics.com/hosting-countries-statistics.html

    I can find statistics approximating the current number of websites/webpages on the surface web, and also a rough estimate for the invisible web. So I am curious: if those two figures were accurate, could I just add the two to get the approximate number of websites/webpages on the whole world wide web?

    Or is there another category that websites/webpages can fall into that I am not aware of?
    Logically I think there is no excluded middle: every website shows up in one category or the other, but not both, and in at least one of them.
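    In set terms (my own notation, just to pin the claim down): if $W$ is the set of all webpages, $S$ the surface web, and $D$ the deep/invisible web, I am claiming the two form a partition:

    $$S \cup D = W, \qquad S \cap D = \emptyset$$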
     
    Last edited: Feb 18, 2012
  2. thatoneguy

    AAC Fanatic!

    Feb 19, 2009
    6,357
    718
  3. Mathematics!

    Thread Starter Senior Member

    Jul 21, 2008
    1,022
    4
    That doesn't answer anything?
     
  4. emc-2

    New Member

    Jan 29, 2010
    11
    1
    netcraft.com is a good site for finding stats about the internet.
     
  5. bertus

    Administrator

    Apr 5, 2008
    15,647
    2,346
  6. Mathematics!

    Thread Starter Senior Member

    Jul 21, 2008
    1,022
    4
    Alright, but is my reasoning correct?
    surface + invisible = whole world wide web

    Also, we host over 30% of the whole world wide web: http://www.greatstatistics.com/hosting-countries-statistics.html
    The only other significant hosts are the United Kingdom and Germany; all other countries are around 0.5 to 2%.
    So in terms of web hosting we dwarf every other country. Of course, that doesn't say who owns the websites.
    People in other countries probably host their websites in America while owning them themselves... but that is another story.
    The point is that the bulk of the world wide web/internet is in America. So in theory, if the internet ever had problems and we could save the American internet/web, we would probably not lose too much info.

    Maybe I am wrong about that last statement... but I am assuming that if something is hosted in another country, chances are we have a mirror of it here.

    I also looked at the rate of growth in webpages/the internet; it seems to me it is just getting exponentially bigger.

    Question:
    I know we host the bulk of the web, but approximately how many webservers do we use to host that 30% or so of websites/webpages? I know thousands of websites are sometimes hosted on one clustered server setup, and sometimes only one website is hosted on a webserver. But I am looking for a figure on how many servers we use, the average number of websites hosted per server, and where the bulk of our webservers are in America.

    I know that for serving up pages, the web server software is mostly Apache, followed by IIS, and so on down the list.
    Having been a web programmer for a little while, that makes sense to me (not too much of a surprise there).
     
    Last edited: Feb 19, 2012
  7. evilclem

    Member

    Dec 20, 2011
    118
    16
    When you say world wide web, are you referring to web servers, or websites that are linked to by other websites, or any service that is accessible through the internet?
     
  8. thatoneguy

    AAC Fanatic!

    Feb 19, 2009
    6,357
    718
    I run around 2 dozen servers responding to about 50 domain names for 4 companies. That is a rough idea of traffic. Most companies will have at least 3 or 4 domain names that will point to their home page, usually misspelled competitor domain names.
     
  9. Mathematics!

    Thread Starter Senior Member

    Jul 21, 2008
    1,022
    4
    I am referring to the whole thing, both.
    But mainly I want to know approximately how many webservers/servers there are, versus the approximate number of websites they host, for the entire world.

    For example, if the internet were only 1000 different sites,
    we could host all 1000 on one webserver, or on 10 webservers with 100 each, etc.
    I know the ratio probably varies across different parts of the internet,
    but I am looking for an average of how many websites are hosted on one server, against how many servers there are in the world.

    And, if possible, an average number of webpages per website.
     
  10. thatoneguy

    AAC Fanatic!

    Feb 19, 2009
    6,357
    718
    Several companies have switched away from the "machine slice" method of webservers to "cloud based" hosting, taking the concept of an individual server at an individual location entirely out of the loop by virtualizing it. Companies new to hosting are setting up as clouds, since it simply makes more sense now that the protocols are essentially carved in stone.

    Hosts now set up a large cluster of computers that behaves as one "cloud", so the load is distributed automatically, even geographically. When a client wants what is today a "standard hosting package" (webserver, database, and email), they simply pay the hosting fees, and parts of the entire cloud respond to requests related to the client's domain name(s).

    This is the evolution of computer clusters that were "sliced" or portioned out to various clients, based on what each client expected for maximum storage and traffic. With clouds, the actual machine is virtual, so over- or underestimating storage and traffic doesn't matter: the site still works, and the user gets billed for exactly what was used, rather than being billed for "worst case" server capacity every month.

    To the end Internet user, the result/look/behavior is identical. For the developer, it is a bit different: a database on localhost can't be called directly; instead, PHP needs to connect to the cloud's database service to run queries. That extra layer of abstraction is what allows realistic near-100% uptime now, compared to the redundant/mirrored servers with round-robin DNS or similar load balancing of a couple of years ago.
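    For example (hostnames made up, just to illustrate the difference), the only change at the connection level is where the database client points:

    Code (Bash):
    # old style: database daemon on the same machine as the web server
    mysql -h localhost -u siteuser -p sitedb

    # cloud style: the database is a separate service reached over the network
    # (db.cloudhost.example is a hypothetical endpoint)
    mysql -h db.cloudhost.example -u siteuser -p sitedb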

    There will still be hosts that slice out users with virtual machines or SELinux, and there will surely be dedicated servers for a good while into the future.

    Short answer: there really isn't a "standard number" of clients per machine. If you broke it down into small-business clients per machine, large-business clients per machine, small bloggers/forums, and large blogs/forums, then estimates could be made. That's the problem with thinking about the hardware rather than the service provided by the hardware.
     
  11. Mathematics!

    Thread Starter Senior Member

    Jul 21, 2008
    1,022
    4
    Ok,
    So let me rephrase that.

    A static IP address is fixed to a certain webserver, say (there is a one-to-one correspondence between a static IP and that server).
    DNS gives me the associated name of that server (which most likely isn't one-to-one for major webservers).
    The webserver may have a whole cluster of other servers behind it, with all the websites distributed across them, which the server at the static IP address can serve up itself or delegate to the helper servers.

    Is there no way to get a list of all the websites a given IP address can possibly serve up, or delegate to the helper servers to serve up?

    I would think this would just be a matter of looking up all the virtual hosts set on that IP address's main webserver, traversing to those hosts/IP addresses, looking up their virtual host lists, and so on and so forth, as well as listing the www subdirectories at each server traversed.

    For instance, I have run an Apache webserver on a static IP address before,
    and configured some virtual hosts in Apache to serve up websites from another internal server.

    But if you had the list of all the virtual hosts and all the www subdirectories for each of these servers, that would be a way to find out all the websites an IP hosted at that time.
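    (On a server you administer, Apache will actually dump that list for you; a quick sketch, assuming a standard Apache install:)

    Code (Bash):
    # print the parsed virtual host configuration: every name and port
    # this Apache instance will answer for (requires access to the box)
    apachectl -S

    # on some distributions the binary is httpd instead
    httpd -S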

    SO I GUESS WHAT I AM ASKING IS: IS THERE A WAY TO LIST ALL THE POSSIBLE
    URLS/WEBSITES THAT A CLUSTER OF COMPUTERS CAN SERVE UP?

    A cluster of computers can be uniquely identified by the IP address of its main server and the top-level virtual hosts list!
     
  12. thatoneguy

    AAC Fanatic!

    Feb 19, 2009
    6,357
    718
    No, not without admin privs.

    The host command will give you the DNS entry for that IP (the name that appears in the X-Path of e-mail messages), but there is no command or list that will tell you how many domains a server or cluster is actually serving, unless you do a lot of research, like Google does.
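    For example, all an outsider gets back is the single reverse (PTR) name the IP's owner published:

    Code (Bash):
    # reverse lookup: one PTR name, not the list of domains being served
    host 74.125.226.136

    # same query with dig
    dig +short -x 74.125.226.136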

    Essentially, look at every web site in existence, and save the IP address in addition to the domain name.

    It'd be cool to be able to do a search like ip:xxx.xxx.xxx.xxx on Google to find all the sites hosted at that IP.

    You can do the opposite, e.g. capacitor site:allaboutcircuits.com

    Scratch that last paragraph. I forgot about two cool tools: dig, and then I think:

    THIS IS WHAT YOU ARE LOOKING FOR
     
    Last edited: Feb 24, 2012
  13. Mathematics!

    Thread Starter Senior Member

    Jul 21, 2008
    1,022
    4
    Yes, sort of, but I believe this does not work completely correctly.
    When I try typing 74.125.226.136, which is one of google.com's addresses,

    I don't get any websites listed?

    I tried all of the lookups below with nslookup and dig...
    And dig, AXFR, ... and zone transfers are cool, but normally they can't be done (and some people would say that letting outsiders map the inside of a network this way is a security issue).

    But that is not very important; there are other ways to secure things anyway.
    I am not talking about that, though; you know what I am talking about: just the URLs mapped to an IP...

    Code (Text):
    Server:     75.75.75.75
    Address:    75.75.75.75#53

    Non-authoritative answer:
    www.google.com  canonical name = www.l.google.com.
    Name:   www.l.google.com
    Address: 74.125.115.105
    Name:   www.l.google.com
    Address: 74.125.115.147
    Name:   www.l.google.com
    Address: 74.125.115.103
    Name:   www.l.google.com
    Address: 74.125.115.106
    Name:   www.l.google.com
    Address: 74.125.115.99
    Name:   www.l.google.com
    Address: 74.125.115.104

    Server:     75.75.75.75
    Address:    75.75.75.75#53

    Non-authoritative answer:
    Name:   google.com
    Address: 173.194.43.14
    Name:   google.com
    Address: 173.194.43.6
    Name:   google.com
    Address: 173.194.43.1
    Name:   google.com
    Address: 173.194.43.2
    Name:   google.com
    Address: 173.194.43.3
    Name:   google.com
    Address: 173.194.43.4
    Name:   google.com
    Address: 173.194.43.0
    Name:   google.com
    Address: 173.194.43.5
    Name:   google.com
    Address: 173.194.43.9
    Name:   google.com
    Address: 173.194.43.7
    Name:   google.com
    Address: 173.194.43.8
    So is this some sort of trial lookup that currently doesn't fully work for all IPs?

    In theory it wouldn't be too hard to accomplish if you were Google,
    since they already have all the links and most of the usable web spidered.
    They would just have to put the URLs and the IP addresses they belong to in a database. To keep it current, all they would have to do is have a program update/delete records that no longer exist. They could also take snapshots of the database so they could analyse how the links change over time, from a usability/evolutionary standpoint of the internet.
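    (The core of it is one lookup per site; a toy sketch of building such a name-to-IP table from a list of known sites:)

    Code (Bash):
    # resolve each known site and record the name -> IP mapping
    for site in google.com allaboutcircuits.com; do
        echo "$site -> $(dig +short "$site" | head -1)"
    done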

    Just saying, it would be nice to know what's where. This could be a way of supporting internet research, or of helping researchers who use the internet stay at the top of their fields: seeing the current top-level work going on in their research area, what is currently known from different viewpoints, and which URLs/sites hold what percentage of the data, and at what level.


    Also, 74.125.113.103 shows only michellehopee.com when I do the lookup. I would think google.com should be listed somewhere? I don't get where I am going wrong.
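    (For comparison, if I understand the tools right, the forward and reverse lookups answer different questions:)

    Code (Bash):
    # forward lookup: all the A records currently published for a name
    dig +short www.google.com

    # reverse lookup: only the single PTR name the IP's owner chose to
    # publish, which would explain why it doesn't list every site on the IP
    dig +short -x 74.125.113.103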
     
    Last edited: Feb 25, 2012
  14. bertus

    Administrator

    Apr 5, 2008
    15,647
    2,346
  15. thatoneguy

    AAC Fanatic!

    Feb 19, 2009
    6,357
    718
    That would create major security issues, such as giving script kiddies a map with a big "X" on it showing what to bog down if they want to get attention.

    I certainly wouldn't want that info in the wild, so to speak, which is probably why Google, Yahoo, Bing, etc. don't offer it, even if you are a paying subscriber to AdWords or their other corporate services. You get reports of how you are doing relative to others your size, they make suggestions based on other sites with your demographics to increase traffic, and they even show you charts of how you stack up in different areas, but they never show who you are stacking up against.
     
  16. Mathematics!

    Thread Starter Senior Member

    Jul 21, 2008
    1,022
    4
    Well, I think the benefit outweighs the risk.

    These days we can, within seconds, locate the address causing the problem and deny packets from it, etc.
    So I don't see much of an issue... we could even have their ISP throttle their bandwidth if it kept happening without a legitimate reason.
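    (For instance, a sketch with a made-up address; real filtering setups vary, but blocking an abusive source or rate-limiting SYNs is just a couple of firewall rules:)

    Code (Bash):
    # drop everything from a known-abusive source
    # (203.0.113.5 is a documentation/example address)
    iptables -A INPUT -s 203.0.113.5 -j DROP

    # rate-limit new TCP connection attempts to blunt a SYN flood
    iptables -A INPUT -p tcp --syn -m limit --limit 5/s --limit-burst 10 -j ACCEPT
    iptables -A INPUT -p tcp --syn -j DROP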

    The days of DoS or DDoS are over/mostly over... given the tools we have to stop them today.

    The worst ones would be SYN flood or ARP attacks; other than those, we have a reasonable way to handle almost every other known denial-of-service attack. (As for stealing info, that is more of a problem on the inside than the outside, and if you don't want the outside to get at it, don't put it on the internet. That simple.)

    And by the way, ARP would only work on a network segment they were on or could remote into.
    So really only a SYN DoS, or a SYN DDoS with a botnet, could happen.

    Either way, in that rare case they could just locate the guy and have a little talk with him. :)
    I am sure they could work something out.

    As for new, creative, unknown methods of attack: those very seldom come from the general PC user; more likely it would be an organization with elite specialists and million-dollar equipment. Most home users won't be able to cause enough damage by today's standards... given how software/computers are evolving to handle outside attacks.

    In most places it is so over-secured that it is inconvenient for people to even use.

    Just my 2 cents
     
    Last edited: Feb 25, 2012
  17. praneshx

    New Member

    Jan 16, 2013
    1
    0
    @Mathematics: Yeah, that's right. After I went through that site, I understood the concept.
     
  18. tinamishra

    New Member

    Dec 1, 2012
    39
    1
    The term world wide web is often confused with the internet. The internet is a global system that connects computers across places and countries into one network, over various mediums. The web is one of the services that runs on the internet: a collection of text documents and resources connected by hyperlinks. The WWW is just the collection of all those interconnected websites, links, and networks.
     