Information retrieval: From the traditional way to the Web

10 April 2005, 20:00

lexpress.mu | All the news of Mauritius in real time.

(Part II)

Unlike other forms of retrieval, the Web relies on sophisticated information retrieval tools known as search engines. A search engine is simply "a web site used to easily locate Internet resources" (IT Portal, 2000). Search engines have made information retrieval easier by adopting techniques such as Artificial Intelligence, Bayesian statistics and probability theory, weighting and query by example (see Definitions). Without search engines, retrieving information from the Web would be practically impossible: the Web relies on search engines just as libraries rely on catalogues.

A search engine consists of three parts: an interface, an index and a Web crawler (or spider). The interface is the Web page on which users formulate their queries, while the index is the database operating behind that page. Crawlers, or spiders, are programs that travel across the Web, visiting sites and gathering information. The role of the search engine is to give the user more control when performing a search.
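The three parts described above can be illustrated with a minimal sketch. It assumes a hypothetical, in-memory collection of pages (`TOY_WEB`) standing in for the real Web, and the function names `crawl`, `build_index` and `search` are illustrative only; real crawlers fetch pages over the network and real indexes are far larger.

```python
from collections import deque

# Hypothetical stand-in for the Web: each URL maps to (page text, outgoing links).
TOY_WEB = {
    "http://a.example": ("library catalogue search", ["http://b.example"]),
    "http://b.example": ("web search engine index", ["http://a.example", "http://c.example"]),
    "http://c.example": ("search engine crawler spider", []),
}

def crawl(seed):
    """Crawler/spider: visit pages breadth-first by following links, each page once."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = TOY_WEB.get(url, ("", []))
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Index: map each term to the set of URLs whose text contains it."""
    index = {}
    for url, text in pages.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(url)
    return index

def search(index, query):
    """Interface: return the URLs that contain every query term, sorted by URL."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)

index = build_index(crawl("http://a.example"))
print(search(index, "search engine"))  # ['http://b.example', 'http://c.example']
```

Only pages the crawler actually reaches end up in the index, which is why no search engine can return pages it has never visited.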

Search engines use the index to look up the terms of a query. The more data the index holds, the higher the number of hits. Index size varies from one search engine to another, but in general the bigger the index, and the more often it is updated, the better. Search engines differ in nature from electronic databases and library catalogues.

A search engine covers a selection of free Web pages from around the world, since no search engine includes every Web page, whereas an electronic database contains citations to some of the articles published in a particular subject area or journal, and may be a fee-based service. Library catalogues, in whatever format, record the items held by libraries. The advantage of electronic databases and library catalogues is that they are regularly updated, whereas search engines have no definite update schedule and may contain links that no longer exist.

Deeper search

As Hersh (1998) puts it, "unlike MEDLINE and CD-ROM textbooks, the Web is not a single database". Given the size of the Web, no single search system can search all of it. As far as the searching process is concerned, search engines use keywords or subject headings they have established themselves, whereas library catalogues use standard subject headings such as the Library of Congress Subject Headings.

Searching with a search engine differs from searching a catalogue in that a search engine rarely offers separate fields for titles, names and subjects. A title search is possible with a search engine, but it operates on whatever title a page carries, and not all Web sources have proper titles. In a library catalogue, by contrast, knowing the title is enough to get directly to the source.

Nevertheless, a search engine searches more deeply than a library catalogue. For instance, if a book is available on the Web, a search engine can search its individual parts, whereas a library catalogue describes the book as a whole and leaves it to the user to find the relevant chapters.

Search engines also present hits in order of relevance, but it should be noted that only the person doing the search can judge whether a document is actually relevant.

As for OPACs (Online Public Access Catalogues), most present result sets sorted by author, by title or in chronological order. Compared with other information retrieval tools, search engines provide more up-to-date information at the click of a mouse, even though not all of that information is useful. Once someone writes a piece of material and puts it online, anyone in the world can reach it.

Searching a catalogue vs. searching the Web

The principles of searching a library catalogue are also very different from those of searching the Web. We are familiar with the library catalogue record, which consists of a call number and a title, author or subject entry, all in a standardised format. If users know any one of these pieces of information, they can retrieve exactly the item they want. Searching the Web, however, depends on keywords and the Boolean operators AND, OR and NOT, which are used to broaden or narrow a search.

The Boolean method is a fast and fairly easy retrieval method used in search engines, provided the user knows how to construct the search. The problem, however, is that the user needs some knowledge of the topic for the search to be efficient and effective: if the user enters the wrong term, relevant documents may not be retrieved.
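The effect of the Boolean operators can be sketched with set operations over a small index. This is a minimal illustration, assuming a hypothetical four-document collection (`DOCS`) and illustrative helper names (`boolean_and`, `boolean_or`, `boolean_not`); it is not how any particular search engine is implemented.

```python
# Hypothetical mini-collection: document id -> text.
DOCS = {
    1: "information retrieval on the web",
    2: "library catalogue retrieval",
    3: "web search engine",
    4: "library opening hours",
}

# Index: each term maps to the set of document ids containing it.
INDEX = {}
for doc_id, text in DOCS.items():
    for term in text.lower().split():
        INDEX.setdefault(term, set()).add(doc_id)

def postings(term):
    """Documents containing the term; an unknown or misspelled term matches nothing."""
    return INDEX.get(term.lower(), set())

def boolean_and(x, y):
    """AND narrows the search: documents containing both terms."""
    return postings(x) & postings(y)

def boolean_or(x, y):
    """OR broadens the search: documents containing either term."""
    return postings(x) | postings(y)

def boolean_not(x, y):
    """x NOT y narrows the search: documents containing x but not y."""
    return postings(x) - postings(y)

print(boolean_and("retrieval", "library"))  # {2}
print(boolean_or("retrieval", "library"))   # {1, 2, 4}
print(boolean_not("library", "catalogue"))  # {4}
print(boolean_and("retreival", "library"))  # set(): a misspelled term retrieves nothing
```

The last query shows the point made above: a wrong term silently excludes relevant documents, because matching is done on the exact words in the index.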

Unlike a search in a library catalogue, a search on the Web puts information only a click away. With a library catalogue, one must be physically present in the library, carry out the search, memorise the call number and go to the shelves to retrieve the book or document, which may not even be on the shelves. On the Web, one can access any document in the world, provided it is online. The main problems are the quantity and quality of information.

That is to say, a lot of "rubbish" is published on the Web, a problem we would not face in a library. Given the amount of information on the Web, finding the appropriate and relevant document is no easy task: a single search may return millions of hits, and opening every page would be time-consuming.

Library catalogues, whether online or manual, provide the user with more bibliographic information, which makes retrieval easier. A catalogue, for example, gives at least brief bibliographic information about each item found. On the Web, by contrast, each hit usually offers only a single line of information, which is at times unintelligible until you click on it to see what it is.

Tara Héléna LAM

DEFINITIONS

Understanding key words

■ 1. Artificial Intelligence refers to computer programs designed to assist in decision-making and information retrieval, with the aim of thinking and reasoning like the human brain.

2. Bayesian statistics is named after Rev. Thomas Bayes, an 18th-century mathematician, whose theorem uses probability to revise how likely a hypothesis is as new evidence becomes available.

3. Probability theory, as its name implies, deals with chance.

4. Weighting is the ability of the system to place documents containing the query terms higher up in the list of hits.

5. Query by example is the ability to search by supplying examples of what is wanted.
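Weighting (definition 4) can be illustrated with the simplest possible scheme: score each document by how many times the query terms occur in it, and list higher-scoring documents first. This is a minimal sketch under that assumption; the documents and the helper names `weight` and `rank` are hypothetical, and real search engines use considerably more elaborate weighting.

```python
from collections import Counter

# Hypothetical documents: id -> text.
DOCS = {
    "d1": "catalogue catalogue library",
    "d2": "web search engine",
    "d3": "library catalogue web search",
}

def weight(text, query_terms):
    """Score a document by the total number of occurrences of the query terms."""
    counts = Counter(text.lower().split())
    return sum(counts[term] for term in query_terms)

def rank(docs, query):
    """Return ids of documents containing at least one query term, highest weight first."""
    terms = query.lower().split()
    scored = [(weight(text, terms), doc_id) for doc_id, text in docs.items()]
    hits = [(score, doc_id) for score, doc_id in scored if score > 0]
    # Sort by descending score; ties are broken by document id for a stable order.
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in hits]

print(rank(DOCS, "library catalogue"))  # ['d1', 'd3']
print(rank(DOCS, "web search"))         # ['d2', 'd3']
```

As the article notes, a high weight only means the query terms appear often; only the searcher can judge whether the top-ranked document is actually relevant.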