There are several ways to deal with the problems that have just been described. Most of the
current solutions are rather ad hoc in nature. By means of programs that roam the Internet
(with flashy names like spider, worm or searchbot),
meta-information [1] is gathered about
everything that is available on it. The gathered information, characterised by a number of
keywords (references) and perhaps some supplementary information, is then put into a large
database. Anyone who is searching for some kind of information on the Internet can then try to localise relevant information by giving one or more query terms (keywords) to such a
search engine. [2]
Although search engines are a valuable service at this moment, they also have several
disadvantages (which will become even more apparent in the future).
A totally different solution to the problem described in section 1.2.1 is the use of so-called Intelligent Software Agents. An agent is (usually) a software program that supports a user in the accomplishment of some task or activity. [3]
"In the future, it [agents] is going to be the only way to search the Internet, because no matter how much better the Internet may be organised, it can't keep pace with the growth in information..."
(Bob Johnson, analyst at Dataquest Inc.)
Using agents to look for information has certain advantages over current methods, such as using a search engine. For each of the search engine features listed below, the improvement(s) intelligent software agents can offer are given:

1. Search engine feature: an information search is based on one or more keywords given by the user. This presupposes that the user is capable of formulating the right set of keywords to retrieve the wanted information. Querying with the wrong keywords, or with too many or too few of them, will either retrieve a lot of irrelevant information ('noise') or fail to retrieve (very) relevant information because it does not contain those exact keywords.
   Improvement agents can offer: agents are capable of searching for information more intelligently, for instance because tools (such as a thesaurus) enable them to search on related terms as well, or even on concepts. Agents will also use these tools to fine-tune, or even correct, user queries (on the basis of a user model or other user information); a sketch of such query expansion follows this comparison.

2. Search engine feature: information is mapped by gathering (meta-)information about the information and documents that are available on the Internet. This is a very time-consuming method that causes a lot of data traffic and lacks efficiency (many parties gather information in this way, but they usually do not co-operate with each other, which means the wheel is reinvented many times), and it does not account very well for the dynamic nature of the Internet and the information that can be found on it.
   Improvement agents can offer: individual user agents can create their own knowledge base about available information sources on the Internet, which is updated and expanded after every search. When information (i.e. documents) has moved to another location, agents will be able to find it and update their knowledge base accordingly. Furthermore, in the future agents will be able to communicate and co-operate with other agents (such as middle layer agents). This will enable them to perform tasks, such as information searches, more quickly and more efficiently, reducing network traffic. They will also be able to perform tasks (e.g. searches) directly at the source/service, leading to a further decrease in network traffic.

3. Search engine feature: the search for information is often limited to a few Internet services, such as the WWW. Finding information that is offered through other services (e.g. a 'Telnet-able' database) often means the user is left to his or her own devices.
   Improvement agents can offer: agents can relieve their human user of the need to worry about "clerical details", such as the way the various Internet services have to be operated. Instead, he or she only has to worry about what exactly is being sought (rather than where certain information may be found or how it should be obtained); the user's agent will take care of the rest.

4. Search engine feature: search engines cannot always be reached: the server that a service resides on may be 'down', or the Internet may be too busy to get a connection. Regular users of the service will then have to switch to some other search engine, which probably has to be operated in a different way and may offer different services.
   Improvement agents can offer: as a user agent resides on the user's own computer, it is always available to the user. An agent can perform one or more tasks day and night, sometimes even in parallel. As looking for information on the Internet is such a time-consuming activity, having an agent do this job has many advantages, one of them being that an agent does not mind doing it continuously. A further advantage of agents is that they can detect and avoid peak hours on the Internet.

5. Search engine feature: search engines are domain-independent in the way they treat gathered information and in the way they enable users to search in it. [4] Terms in gathered documents are lifted out of their context and stored as a mere list of individual keywords. A term like "information broker" is most likely stored as the two separate terms "information" and "broker" in the meta-information of the document that contains them. Someone searching for documents about an "information broker" will therefore also get documents where the words "information" and "broker" are used only as separate terms (e.g. as in "an introductory information text about stock brokers").
   Improvement agents can offer: software agents will be able to search for information based on contexts. They will deduce this context from user information (i.e. a built-up user model) or by using other services, such as a thesaurus service; a sketch of such context-sensitive matching follows this comparison. See chapters four and six for more detailed information about this.

6. Search engine feature: the information on the Internet is very dynamic: quite often search engines refer to information that has moved to another, unknown location, or that has disappeared. Search engines do not learn from these searches [5], and they do not adjust themselves to their users. Moreover, a user cannot receive information updates on one or more topics, i.e. have certain searches performed automatically at regular intervals. Searching for information this way becomes a very time-consuming activity.
   Improvement agents can offer: user agents can adjust themselves to the preferences and wishes of individual users. Ideally this will lead to agents that adjust themselves more and more to what a user wants and wishes, and to what he or she is (usually) looking for, by learning from performed tasks (i.e. searches) and from the way the user reacts to their results. Furthermore, agents are able to continuously scan the Internet for (newly available) information about topics a user is interested in; a sketch of such a recurring search follows this comparison.
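To make point 1 above a little more concrete, the following minimal sketch (in Python) shows how an agent might broaden a user's query with related terms before submitting it. The small hard-coded thesaurus and the function name are illustrative assumptions only, standing in for a real thesaurus service:

    # Illustrative sketch of thesaurus-based query expansion (point 1 above).
    # The dictionary below is a stand-in for a real thesaurus service.
    THESAURUS = {
        "broker": ["intermediary", "middleman"],
        "information": ["data", "knowledge"],
    }

    def expand_query(keywords):
        """Return the user's keywords plus related terms from the thesaurus."""
        expanded = set(keywords)
        for term in keywords:
            expanded.update(THESAURUS.get(term, []))
        return sorted(expanded)

    # A query for "information broker" is broadened with related terms,
    # so documents that use a synonym are not missed.
    print(expand_query(["information", "broker"]))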
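Point 5 can be illustrated in a similarly small way: where a keyword-only index treats "information broker" as two unrelated terms, an agent that takes the surrounding context into account can prefer documents in which the terms actually occur together. The scoring below is a deliberately crude assumption, made purely for illustration:

    # Illustrative sketch of context-sensitive matching (point 5 above):
    # documents containing the exact phrase rank above documents that
    # merely contain the individual words somewhere.
    def score(document, phrase):
        text = document.lower()
        words = phrase.lower().split()
        if phrase.lower() in text:
            return 2   # the phrase occurs as a whole
        if all(word in text for word in words):
            return 1   # the words occur, but only as separate terms
        return 0

    docs = [
        "An introductory information text about stock brokers.",
        "What does an information broker actually do?",
    ]
    for doc in sorted(docs, key=lambda d: score(d, "information broker"), reverse=True):
        print(score(doc, "information broker"), doc)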
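Finally, the recurring searches mentioned under point 6 amount to little more than repeating a query at a fixed interval and reporting only results the user has not seen before. The sketch below assumes a search function (passed in as search_internet) standing in for whatever search facility the agent actually uses:

    # Illustrative sketch of point 6: re-run a search at regular intervals
    # and yield only results that are new since the previous run.
    import time

    def monitor(topic, search_internet, interval_seconds=3600):
        seen = set()
        while True:
            for result in search_internet(topic):
                if result not in seen:
                    seen.add(result)
                    yield result          # a newly found document or location
            time.sleep(interval_seconds)  # wait until the next scheduled run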
The precise characteristics of agents are treated in more detail in chapter two. Chapter three will focus on the practical possibilities of agents.

The Internet keeps on growing, and judging by reports in the media it will continue to do so. The big threat this poses is that the Internet will become too big and too diverse for humans to comprehend, let alone to work with properly, and very soon even (conventional) software programs will not be able to get a good grip on it.
More and more scientists, but also members of the business community, are saying that a new
structure should be drawn up for the Internet which will make it easier and more convenient to use, and which will make it possible to abstract from the various techniques that are hidden under its surface. This is a kind of abstraction comparable to the way in which higher-level programming languages relieve programmers of the need to deal with the low-level hardware of a computer (such as registers and devices).
Because the thinking process with regard to these developments has started only recently, there is no clear view yet of a generally accepted standard. However, an idea is emerging that looks very promising: a three-layer structure.
[6] There are quite a number of parties which, although sometimes implicitly, are studying and working on this concept. The main idea of this three-layer model is to divide the structure of the Internet into three
layers [7] or concepts (a small illustrative sketch of this division is given at the end of this section):
1. Users;
2. Suppliers; and
3. Intermediaries.
The function and added value of this middle layer, and the role(s) agents play in this matter, are explained in chapter four.
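Purely as an illustration of this three-layer idea, the sketch below (in Python) models a user agent that never approaches suppliers directly but always goes through an intermediary; the class and method names are assumptions made for this example and are not taken from any existing system:

    # Illustrative sketch of the three-layer structure: users talk to
    # intermediaries, and intermediaries talk to suppliers.
    class Supplier:
        """A party that offers information (e.g. a database or a web site)."""
        def __init__(self, name, documents):
            self.name = name
            self.documents = documents

        def query(self, term):
            return [doc for doc in self.documents if term in doc.lower()]

    class Intermediary:
        """The middle layer: knows which suppliers exist and mediates requests."""
        def __init__(self, suppliers):
            self.suppliers = suppliers

        def search(self, term):
            results = []
            for supplier in self.suppliers:
                results.extend(supplier.query(term))
            return results

    class UserAgent:
        """The user's agent: talks only to the intermediary, never to suppliers."""
        def __init__(self, intermediary):
            self.intermediary = intermediary

        def find(self, term):
            return self.intermediary.search(term)

    # The user agent does not need to know where the documents actually live.
    library = Supplier("library", ["Introduction to software agents"])
    archive = Supplier("archive", ["Agents and the three layer model"])
    agent = UserAgent(Intermediary([library, archive]))
    print(agent.find("agents"))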

Agents come in many shapes and sizes. As can be concluded from the preceding text, this
thesis will deal mainly with one special type of intelligent software agent, namely those that are used in the process of information supply and demand. When the term "agent" is used in the forthcoming sections of this thesis, these "information agents" are usually meant. However, many of the things that are said apply to the other types of agents as well.