The Laws of the Web: Patterns in the Ecology of Information
Author: Bernardo A. Huberman
Publisher: Cambridge, MA: MIT Press, 2001
Review Published: November 2003
The everyday person who visualizes the World Wide Web might see it as a morass of tangled, ever-changing lines of information. These lines of information intersect and sometimes end at one virtual place or website which satisfies the searcher. Most research in this area has been performed through survey or user observation. However, the best efforts give only a snapshot of the process and still fall short in prediction and analysis. The patterns of use on the Web can also be viewed through the lenses of economics and physics.
Bernardo Huberman, Hewlett Packard Fellow at the HP Laboratories in Palo Alto, California, has seen patterns in the virtual world of the Web. Instead of tracking these patterns through surveys and questionnaires, he uses the Mother of Science, Mathematics. Specifically, Huberman uses theoretical models based on statistical mechanics and nonlinear dynamics. His book, however, is not aimed at the physicist or mathematician. Huberman's intended audience is the social scientist. In his writing he also apologizes to those of us who "wince at apparently long-winded explanations that could be compressed into a couple of equations" (viii). Be forewarned, however, that although this is an excellent primer on the dynamics of the Internet, this small 100-page book is as informationally dense as the mass of a collapsed star.
Huberman coins a new term, the "E-cology" (electronic ecology) of the Internet. This E-cology is the virtual substance in which the different components of the Internet interact. The E-cology contains for now four "micro-E-cologies" which include: 1) Growth, structure and linkage; 2) Sense of scale and methodology to study the social dynamics of the Web; 3) Regularities of surfing patterns, natural information foraging patterns; and 4) User interactions, Internet storms and congestion. Though the interactions of these four components are not covered in any depth in the book, there is an implied sense that the four "micro-E-cologies" do combine and produce systems with non-linear dynamics.
Growth, Structure and Linkage
Huberman uses a methodology to analyze the evolution and structure of the Web that looks at the relationships between the local properties of large distributed systems and their macro behaviors. Analysis of these behaviors cannot be accomplished by summing the partial system actions because the behavioral properties are non-linear. For example, on starting this type of system, one would observe periods of equilibrium, or repetition of growth in cycles; then erratic behavior would emerge, and at some point perhaps equilibrium would be found again, but not the exact same equilibrium. Moreover, if you were to start the system with the same conditions again, the results would not be replicated. The components that make up the Web (links, sites, and pages) display complex nonlinear dynamics.
To study these systems, Huberman uses the example of a stockbroker. An investor may look at the trading behavior of a stockbroker to ascertain the broker's methodology. However, this analysis will not return a method by which the investor could choose successful stocks, but only an understanding of how the stockbroker makes their trading decisions. By the same token, studying the surfing behavior of an individual would not yield a workable theory about surfing in general.
Another piece of information adds to the mystery: while there are a few web sites that have millions of pages, there are millions of sites that contain only a few pages. This diversity can be explained using a mathematical entity called the power law, in which the probability that a site has n pages is proportional to 1/n^β, where β is a number greater than or equal to 1. If β is allowed to be negative or 0, we would find the pages of the Internet collapsing, or going toward 0 pages. The exponent β also expresses the dynamics of web sites and their pages. Unfortunately, this element is glossed over in the book and needs further explanation if it is to be introduced to a more general audience.
For example, let us look at a web site with 500 pages, taking β = 1. The formula gives 1/500^1, which is the same as 500^-1, or 0.002. For a web site with 10,000 pages, it gives 1/10000^1, or 10000^-1, or 0.0001. These numbers tell us not how many sites have more than x pages, but the share of sites that have exactly x pages. This means that if one studies a system with a potential web site of 500 pages, there are 2 chances out of 1,000 (0.002) that a web site of 500 pages will occur in that system. The same follows for the site of 10,000 pages, but the probability of finding it is smaller, 0.0001. This shows that if one could predict the distribution of pages per site for one range of pages, then a prediction could be made for another range of pages. This concept also helps explain how web sites change. For example, in a site of a million pages, it is not unusual to find that the site has increased or decreased by 100 pages on the next day. A site of just a few pages would not have changed at all in the same or a slightly longer time period. The number of pages, n, on a site on a given day would be equal to the number on that site on the previous day plus or minus a random fraction of n. This information would assist in establishing such things as rules for search engines and predictions regarding how the Internet is growing.
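The arithmetic and the growth rule in the paragraph above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Huberman's own model: the function names, the choice of β = 1, and the uniform draw for the "random fraction" are all hypothetical, and the power-law weights are left unnormalized.

```python
import random

def power_law_weight(n, beta=1.0):
    """Unnormalized power-law weight 1/n**beta for a site with exactly n pages."""
    return n ** -beta

def grow(n, days, max_fraction=0.0001, rng=None):
    """Apply the multiplicative growth rule for a number of days.

    Each day the page count changes by a random fraction of its current
    size. The uniform range (-max_fraction, +max_fraction) is an assumption
    made for illustration only.
    """
    rng = rng or random.Random(0)
    for _ in range(days):
        change = rng.uniform(-max_fraction, max_fraction)
        n = max(1, round(n * (1 + change)))
    return n

# Weights for the two sites discussed in the text (beta = 1).
w_500 = power_law_weight(500)
w_10000 = power_law_weight(10_000)

# The same daily rule moves a huge site by many pages and a tiny site by none.
big_site_next_day = grow(1_000_000, days=1)
small_site_next_month = grow(3, days=30)
```

Because the daily change is proportional to the current size, a site of a million pages can gain or lose up to about 100 pages in a day under these assumed parameters, while a three-page site stays at three pages.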
Surfing the Web
According to Huberman, the surfing relationships found on the web resemble those of a Small Worlds environment. The idea of Small Worlds is best exemplified by the "six degrees of separation" concept. Huberman explains this in the game "Six Degrees of Association," where a friend is asked to deliver a note to another friend but not directly. The note is circulated through six friends before reaching the targeted person. In the case of web surfing, links placed on a web site are affiliated with a person's specific interests. If you are looking for a specific item, you will go to web pages that match your informational interest, and pages with similar interests will carry links that guide the web surfer to their information goal. If you have certain tastes, you will join and be with people who have tastes and interests similar to yours; at some point some of the people you know will know each other. A search engine that is focused on the Small Worlds concept will find the one web page that is most popular for a certain topic and perform a cluster search from links on that one page.
How an individual surfs, according to Huberman, is a predictable process. It is predictable because it is considered an economic process. The price a surfer pays when looking at a portal site is time. This economic process uses the concept called temporal discrimination. Pages are linked to one another in multiple and complex ways within a site. For each path, the user spends an associated period of time at the site, which is dependent on the number of links and the amount of time that the user spends at each page. Time is the surrogate for the price the user will pay to access information.
While we are surfing the web for information and deciding how much time we wish to spend at each site or page, there is the issue of Internet congestion, which also affects the economically modeled process of Internet surfing. Huberman uses the analogy of the "Tragedy of the Commons" articulated by ecologist Garrett Hardin. The "Tragedy of the Commons" is illustrated by the following story. A group of farmers jointly own a pasture on which all of their sheep graze. It is in the best interest of the group as a whole to manage their herds so that everyone can graze their herds equally. However, each member also realizes that it is in their own best economic interest to maximize their herd on the commons and have a larger herd of sheep for profit. To add to this, each member also knows that they, like the others, could take advantage of the fact that some members will keep their herds small, and increase the size of their own herd to maximize profit. In other words, some members may attempt to increase their herd size though it will produce a bad outcome for the group as a whole.
This can be applied to Internet bandwidth and congestion. Think of the times you have tried to visit a web site only to have it take what seems forever to download. Sometimes you simply have to quit the process, or click the "stop" button on the browser and "refresh" the page until the page you want finally loads. As in the "commons" problem, we have people "storming" (increasing their herds) the Internet at certain times while there is a limited amount of bandwidth available. Some people will stop surfing and choose to surf again at a less crowded time. This allows others to take advantage of the released bandwidth and surf to their desired sites.
Huberman found that these "storms" do not occur regularly and that the spikes in congestion are distributed according to the bell-shaped normal distribution. This means that for the most part the congestion is low on the Internet and every once in a while a large peak in congestion takes place. It also means that during the spikes, huge numbers of users must decide to access the Internet at the same time, and when the spike drops off, another significant number of users choose to stop surfing the Internet. This maps the behavior of surfing to game theory. In game-theoretic terms, what users are following here is an unspoken "policy," a collective group strategy of conditional cooperation. This means that users maximize the utility of surfing for the group. For example, the surfer will cooperate and abandon their search if the utility of doing so is greater than the time it would take to get to the site, and defect if it is less. Users expect that while the Internet is congested, other users will act as they themselves would. If the Internet is congested now, then of course it will be faster at a later time period. Abandoning the search has more utility to this user, and it allows other users to continue their searches.
Sharing information on networks has a game theory-related problem known as "free riders." For example, file-sharing services like Gnutella, Napster, and WinMX offer users the ability to upload and download music. Aside from the copyright issues, there is another concern: in a distributed network where users are basically anonymous, once the user community becomes large, more persons will be consuming than producing. In other words, the upload of music files from different users will be effectively reduced while more users will simply download and consume the free music files offered. Rampant free riding may eventually leave the system useless. Huberman suggests that a solution to this is to create a reward system such as the one used by SETI. SETI, the Search for Extra Terrestrial Intelligence, uses prestige or status to drive participation in its program. Imagine being the person with the computer that finds signs of life in a signal from outer space; this would constitute great utility to the owner. It could be prestigious to be identified as the person serving the most music, provided the real name is hidden.
Related to both of the previous discussions in this section is the downloading of information. As you have probably noticed when you surf the Internet, there is variability in information download times. What this involves is the trading of electronic packets of information between computers. Let us say that you are trading stocks one day. You place an order to either sell or purchase shares of stock. You tell your online broker that you want the order placed at the current market rate. You expect the online broker to sell or buy the stock at the rate you had on your screen. However, by the time that the packets with the order make it to the broker's machine, the price has changed, possibly not to your advantage. This is due to information congestion on the Internet. What is needed, according to Huberman, are mechanisms that reduce both the typical travel time between any two computers and the time variability. One idea is to use an automated reload mechanism. Most people click on the restart or refresh button to speed up (hopefully) the download process for a web page. The mechanism can estimate the average time it takes to download a page or send a packet of information for a given restart time, and once we have the average time, we can compute the variance around it. A "portfolio" of restart times is created, and each restart time in the portfolio is associated with a waiting time and its variance. A software agent can then be created which automates the refresh process. The software agent will resend the data over the network when the download time reaches a level specified by the portfolio of restart times and their variances. This portfolio information can be dynamically recreated at a set time interval so that the software agent has up-to-date information each time it is invoked.
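The restart "portfolio" described above can be illustrated with a small simulation. The sketch below is a hypothetical reading of the idea, not the mechanism from the book: it assumes we have a list of previously observed download times, that each resend draws a fresh time from that list, and that the agent resends whenever a fixed timeout is exceeded. All names and the sample data are invented for illustration.

```python
import random
import statistics

def wait_with_restart(samples, timeout, rng):
    """Total wait for one page when the request is resent every `timeout` seconds."""
    total = 0.0
    while True:
        attempt = rng.choice(samples)   # assumed: each attempt is an independent draw
        if attempt <= timeout:
            return total + attempt      # this attempt finished before the timeout
        total += timeout                # give up on this attempt and resend

def restart_portfolio(samples, timeouts, trials=2000, seed=0):
    """Map each candidate restart time to (mean wait, variance of wait)."""
    if min(timeouts) < min(samples):
        raise ValueError("a timeout shorter than every observed time would never finish")
    rng = random.Random(seed)
    portfolio = {}
    for timeout in timeouts:
        waits = [wait_with_restart(samples, timeout, rng) for _ in range(trials)]
        portfolio[timeout] = (statistics.mean(waits), statistics.pvariance(waits))
    return portfolio

# Hypothetical history: most downloads take 1 second, one in ten takes 30 seconds.
observed = [1.0] * 9 + [30.0]
portfolio = restart_portfolio(observed, timeouts=[2.0, 30.0])

# A software agent could pick the restart time with the lowest average wait.
best_timeout = min(portfolio, key=lambda t: portfolio[t][0])
```

With this invented history, resending after 2 seconds lowers both the average wait and its variance compared with never resending (a 30-second timeout), which is the trade-off the portfolio is meant to expose. Real download times are not independent across attempts, so the numbers are illustrative only.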
At the end of the book, Huberman talks about the "frictionless economy" and the "winners-take-all" nature of the Internet. Many envision the Internet as a free market economy and a place of pure democracy. A frictionless economy is one with strong price competition, ease of search for best values, and low margins for the producers. Huberman argues that the digital economy is indeed frictionless and that this will continue if governments keep their regulations to a minimum. As for the "winners-take-all" aspect of the Internet, many believe that the Internet has democratic behavior. It does not take a large investment and infrastructure to create a new commercial web site. However, this should not imply that the equality in opportunity would yield equality in income. The web is very large and this condition makes it difficult for single individuals to find the best sites. What occurs is similar to real life. Individuals rely on social searches; otherwise, they might find themselves spending exorbitant amounts of time searching. The social search is the "word of mouth" approach. Simply put, a site was recommended by a friend or acquaintance.
Another issue is the vast arena of similar offers of the same product. How is the buyer to ascertain that s/he can obtain a good price and good quality? Buyers do this through the recognition of branding. Branding is a short encoding of product and company attributes; Amazon.com is a fine example. Amazon was the first on the Internet to offer bookselling and searching services. Their concentration on customer service and attention to getting their "name" out locked their "brand" into the minds of consumers. When one thinks of purchasing books online, usually the name Amazon comes in first, and then possibly Barnes and Noble or Buy.com. This branding translates into tangible assets as consumers equate brand-name recognition with good-will accounting, which means users will receive a quality product. In terms of branding, the first business to make a recognizable brand will receive the market share as has occurred with prominent web sites.
In sum, Bernardo A. Huberman's The Laws of the Web: Patterns in the Ecology of Information is a good briefing on the patterns of human and web behavior in and around the Internet. More care could have been taken with clustering like topics together and transitions showing how each topic is embedded in the previous and subsequent topic. Moreover, additional detail is needed in explaining formulas of prediction. Social scientists also work with statistical phenomena and if formulas are presented, they need to be fully and coherently explained rather than glossed over. Otherwise the reader is left wondering how this formula correctly models the proposed behavior.
Jeanette Burkett is a graduate student at the University of Washington, whose focus is Internet growth, information technology, and technology and culture. She is also a research assistant in the Center for Social Science Computation and Research. <email@example.com>