Content analysis is a research method that uses a set of categorization procedures for making valid and replicable inferences from data (textual as well as graphic) to their context. In other words, content analysis is a research tool used to determine the presence of words or concepts in a text. The World Wide Web has become a major information resource for everybody. With the wide range of Web sites and Web pages available, it becomes increasingly important to have effective techniques to describe their contents and to help users find the information they require. Content analysis in the Web environment involves categorization, that is, splitting the content into meaningful units.

Content Analysis in Web Pages and Web Sites
Content analysis on the Web studies the nature of Web pages and Web sites, together with their permanence and consistency. Web documents are transitory. But how transitory are they? How often do they change, what changes, and does it matter? How permanent are Web pages, Web sites, and server-level domains? What is the death rate of each? How often do they move? Do different types of Web pages, Web sites and domains behave differently? It is a fact that the content of Web pages changes quite often. Classification and categorization are the two important elements of content analysis, so we should see how these two elements can be applied to Web documents. Web entities can be classified using a variety of markers.
The first set of markers consists of the elements or fragments incorporated into URLs. These markers include the domain names, the directory structure of the site, semiotic tag patterns, unusual ports, and the use of tildes (~) to attach documents to Web sites. The second set of markers can be derived from analysis of Web site and page size and of object counts. This set also includes the hypertext link pattern and the relationship of any given page to the other members of the site set. These markers can be used to build Web page and Web site classification schemes. They can also be used to help understand and identify the consistency and permanence behavior of Web sites and pages.
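The first set of markers can be read directly from a URL. The following is a minimal sketch in Python, assuming that "unusual port" means any explicit port other than the scheme default and that directory depth is counted in path segments. The function name url_markers, the dictionary keys and the example URL are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlsplit

# Default ports per scheme; any other explicit port is treated as "unusual"
# (an assumption made for this sketch).
DEFAULT_PORTS = {"http": 80, "https": 443}

def url_markers(url):
    """Extract simple URL-based classification markers (illustrative only)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Path segments, ignoring empty strings from leading/trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    # If the path does not end in "/", the last segment is taken to be a file.
    directories = segments if parts.path.endswith("/") else segments[:-1]
    port = parts.port
    return {
        "domain": host,
        "top_level_domain": host.rsplit(".", 1)[-1] if "." in host else "",
        "directory_depth": len(directories),
        "unusual_port": port is not None and port != DEFAULT_PORTS.get(parts.scheme),
        "has_tilde": any(s.startswith("~") for s in segments),
    }

if __name__ == "__main__":
    print(url_markers("http://www.example.edu:8080/~smith/papers/index.html"))
```

Markers of this kind could then feed a classification scheme, for example by grouping pages by top-level domain or by flagging tilde paths as likely personal pages.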
Web sites are inherently more permanent than specific Web pages, since Web sites consist of Web pages. Different Web page types manifest different permanence and consistency characteristics. Because of the changing nature of what they point to, navigational pages change, in aggregate, more often than “content” pages. However, content pages are less permanent than navigation pages, because navigation pages must point to new content, while content pages need not point to navigation.
For content analysis of Web sites and Web pages, one should look into a number of aspects of the pages, including the descriptions of the source and target pages, the link anchors and their context, and the location and availability of the target pages. For both the source and target pages, it is essential to collect the URLs, the headers and titles actually displayed on the page (if any), and a description of the pages. The description includes the page types and the percentage of occurrence of page types and page components.
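Collecting the title, headers, link anchors and simple size and object counts of a page can be sketched with Python's standard html.parser module. This is only an illustration under stated assumptions: the class PageDescriber, the function describe_page and the returned keys are hypothetical names, only h1 to h3 are treated as headers, images are the only objects counted, and text nested inside a link within a header is attributed to the link alone.

```python
from html.parser import HTMLParser

class PageDescriber(HTMLParser):
    """Collects title, headings, link anchors and an image count."""

    CAPTURED = ("title", "h1", "h2", "h3", "a")

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self.links = []        # (href, anchor text) pairs
        self.image_count = 0
        self._capture = None   # tag whose text is currently being collected
        self._buffer = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.image_count += 1
        if tag in self.CAPTURED:
            self._capture = tag
            self._buffer = []
            if tag == "a":
                self._href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._capture:
            return
        # Collapse runs of whitespace in the collected text.
        text = " ".join("".join(self._buffer).split())
        if tag == "title":
            self.title = text
        elif tag == "a":
            self.links.append((self._href, text))
        else:
            self.headings.append(text)
        self._capture = None

def describe_page(html):
    """Return a small description of one page (illustrative only)."""
    parser = PageDescriber()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "headings": parser.headings,
        "links": parser.links,
        "image_count": parser.image_count,
        "size_bytes": len(html.encode("utf-8")),
    }
```

Running describe_page on both the source page and each target page would give comparable descriptions, and the collected href values indicate which target locations still need to be checked for availability.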
The picture below provides a brief description of them.