Dear Mr. Nilesh,
Please go through the following.
Thesauri for information retrieval are usually constructed by information experts, and have their own unique vocabulary defining different kinds of terms and relationships:
Terms are the basic semantic units for conveying concepts. They are usually single-word nouns, since nouns are the most concrete part of speech. Verbs can be converted to nouns: "cleans" to "cleaning", "reads" to "reading", and so on. Adjectives and adverbs, however, seldom convey any meaning useful for indexing. When a term is confusing, a "scope note" can be added to ensure consistency and to give direction on how to interpret the term. Not every term needs a scope note, but scope notes are of considerable help in using a thesaurus correctly and in reaching a correct understanding of the given field of knowledge.
"Term relationships" are links between terms. These relationships can be divided into three types: hierarchical, equivalence, and associative.
• Hierarchical relationships indicate terms that are narrower or broader in scope. A "Broader Term" (BT), or hypernym, is a more general term; e.g. "Apparatus" is a generalization of "Computers". Reciprocally, a "Narrower Term" (NT), or hyponym, is a more specific term; e.g. "Digital Computer" is a specialization of "Computer". BT and NT are reciprocals: a broader term necessarily implies at least one other term that is narrower. BT and NT are used to indicate class relationships as well as part-whole relationships.
• The equivalence relationship is used primarily to connect synonyms and near-synonyms. The indicators Use (USE) and Used For (UF) are applied when an authorized term is to be used in place of another, unauthorized term; for example, the entry for the authorized term "Frequency" could have the indicator "UF Pitch". Reciprocally, the entry for the unauthorized term "Pitch" would have the indicator "USE Frequency". Used For (UF) terms are often called "entry points", "lead-in terms", or "non-preferred terms"; they point to the authorized term (also referred to as the Preferred Term or Descriptor) that has been chosen to stand for the concept. As such, their presence in a text can be used by automated indexing software to suggest the Preferred Term as an Indexing Term.
• Associative relationships connect two related terms whose relationship is neither hierarchical nor equivalent. This relationship is described by the indicator "Related Term" (RT). Associative relationships should be applied with caution, since excessive use of RTs will reduce specificity in searches. Consider the following: if the typical user is searching with term "A", would they also want resources tagged with term "B"? If the answer is no, then an associative relationship should not be established.
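The three relationship types above can be sketched as a small data structure. This is only a minimal illustration in Python, not a standard implementation: the class name Thesaurus, its method names, and the RT pair "Computers"/"Software" are assumptions made for the example, and the other term strings are adapted from the examples above. The lookup of index terms uses naive word matching, so it handles single-word terms only.

```python
from collections import defaultdict


class Thesaurus:
    """Toy thesaurus holding BT/NT, USE/UF, and RT links (illustrative only)."""

    def __init__(self):
        self.broader = defaultdict(set)    # term -> its Broader Terms (BT)
        self.narrower = defaultdict(set)   # term -> its Narrower Terms (NT)
        self.related = defaultdict(set)    # term -> its Related Terms (RT)
        self.use = {}                      # non-preferred term -> preferred term
        self.used_for = defaultdict(set)   # preferred term -> its UF terms

    def add_hierarchy(self, broader_term, narrower_term):
        # BT and NT are reciprocals, so both directions are recorded together.
        self.broader[narrower_term].add(broader_term)
        self.narrower[broader_term].add(narrower_term)

    def add_equivalence(self, preferred, non_preferred):
        # "USE preferred" on the non-preferred entry,
        # "UF non_preferred" on the preferred entry.
        self.use[non_preferred] = preferred
        self.used_for[preferred].add(non_preferred)

    def add_related(self, term_a, term_b):
        # RT links are stored symmetrically.
        self.related[term_a].add(term_b)
        self.related[term_b].add(term_a)

    def preferred(self, term):
        # Resolve a lead-in term to its descriptor; descriptors map to themselves.
        return self.use.get(term, term)

    def suggest_index_terms(self, text):
        # Naive matching: lowercase single words with punctuation stripped.
        words = {w.strip(".,;:!?\"'").lower() for w in text.split()}
        known = (set(self.use) | set(self.used_for)
                 | set(self.broader) | set(self.narrower))
        return sorted({self.preferred(t) for t in known if t.lower() in words})


t = Thesaurus()
t.add_hierarchy("Apparatus", "Computers")
t.add_hierarchy("Computers", "Digital Computers")
t.add_equivalence("Frequency", "Pitch")
t.add_related("Computers", "Software")  # hypothetical RT pair, not from the text

print(sorted(t.narrower["Apparatus"]))   # ['Computers']
print(t.preferred("Pitch"))              # Frequency
print(t.suggest_index_terms("The pitch of the tone was measured."))  # ['Frequency']
```

Recording each link in both directions keeps the reciprocal entries (BT/NT and USE/UF) consistent by construction, which is the property the definitions above require.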
Thesauri have been widely used in many applications such as information retrieval, natural language processing (NLP), and interactive agents. However, several problems, such as morphological analysis and the treatment of synonymous and multi-sense words, still remain and reduce the accuracy of thesaurus construction methods. In addition, adding newly coined or minor words is also difficult.
Standards and manuals for thesaurus construction:
The ANSI/NISO Z39.19 Standard of 2005 defines guidelines and conventions for the format, construction, testing, maintenance, and management of monolingual controlled vocabularies including lists, synonym rings, taxonomies, and thesauri.
Thesaurus Construction and Use: A Practical Manual. Jean Aitchison, Alan Gilchrist and David Bawden. London and New York: Europa Publications (2000).