Dienst                                      James R. Davis, Xerox
INTERNET-DRAFT                              Carl Lagoze, Cornell
                                            July 1994
                                            Expires December 1994

Dienst, A Protocol for a Distributed Digital Document Library
This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Last Revised: 3:00 PM 8 August 1994

Internet Drafts are working documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress".
To learn the current status of any Internet-Draft, please check the 1id-abstracts.txt listing contained in the Internet-Drafts Shadow Directories on ds.internic.net (US East Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim).

This document is a DRAFT specification of a protocol in use on the Internet. Distribution of this memo is unlimited.
This document is also available in ASCII.
This document describes Dienst, a protocol for communication with distributed digital library servers. This protocol provides an object-oriented interface to a document model, which allows a user to access complete documents or named sub-parts. It also supports multiple formats for documents. Dienst protocol messages are embedded within HTTP, the protocol used over the World Wide Web. Thus, anyone using a Web browser (e.g. Mosaic, Cello) has access to the services provided by Dienst.
The document model implicit in Dienst is that each document can be in many formats (e.g., TIFF, GIF, PostScript) and consists of a set of named parts. There are two orthogonal part domains: 1) a physical domain, where the parts are numbered pages, and 2) a logical domain, where the parts are objects such as chapters, tables, lists of references, and so on. The document model is extensible, in that we may define additional logical parts in the future.
Dienst supports an object-oriented interface to this library and document model - clients encode messages within Dienst requests that address the entire document collection, a particular document in the collection, or a particular part of a document in the collection. Each Dienst protocol request contains the name of the message, the particular document (DocID) to which it applies (if any), and the arguments (e.g. page number, format) for the method (if any).
Dienst messages address four types of digital library services, not all of which are necessarily supported by a particular digital library server.
The Dienst protocol is built on the framework of the Hypertext Transfer Protocol (HTTP) [HTTP] used on the World-Wide Web [WWW]. The advantages of piggybacking Dienst on HTTP are two-fold. First, Web browsers, such as Mosaic, are reasonably ubiquitous and, at this point, free, making digital library services available to a broad constituency. Second, there is substantial momentum for further development of the functionality of the Web and its constituent technologies. Many components of this developing technology are of direct interest to the digital library community, especially those dealing with authorization and authentication. Advancements in these areas will directly benefit Dienst clients and servers.
A Dienst request is encoded within a Uniform Resource Locator (URL) [URL]. Specifically, the Dienst message is placed in the "path" portion of the URL, which is opaque to the HTTP client and, as defined in the URL specification, "may define details of how the client should communicate with the server, including information to be passed transparently to the server without any processing by the client."
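As an illustration of placing a Dienst message in the path portion of a URL, the following Python sketch composes such a URL. The function name dienst_url and its parameters are hypothetical and not part of the protocol; the "http" scheme and the dienst/1.0/service/[DocID/]method layout are assumptions taken from the grammar and examples elsewhere in this document.

```python
from urllib.parse import urlunsplit


def dienst_url(host, service, method, doc_id=None, query="", version="1.0"):
    """Compose a URL whose path carries a Dienst message (illustrative only)."""
    parts = ["dienst", version, service]
    if doc_id is not None:
        # Requests that address a specific document carry its DocID.
        parts.append(doc_id)
    parts.append(method)
    path = "/" + "/".join(parts)
    # urlunsplit takes (scheme, netloc, path, query, fragment).
    return urlunsplit(("http", host, path, query, ""))


if __name__ == "__main__":
    print(dienst_url("cs-tr.cs.cornell.edu", "misc", "version"))
    print(dienst_url("cs-tr.cs.cornell.edu", "rep", "page",
                     doc_id="CORNELLCS:TR91-1254",
                     query="page=1&type=image/tiff"))
```

The HTTP client treats the resulting path as opaque; only the Dienst server interprets the service, DocID, and method components.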
Each Dienst request that addresses a specific document includes a unique identifier, or DocID, for the document. This document identifier, as specified in the Dienst BNF grammar that follows, consists of the following components:
ISBN_NA:0-395-32943-4
CORNELLCS:TR94-1418
IANA_NA:foobar:94-5-2
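The example identifiers above have either two or three colon-separated fields. A minimal Python sketch of splitting a DocID on that basis follows; the DocID tuple and parse_docid are hypothetical names. With two fields the syntax alone does not show whether the first field is a naming authority or an RFC 1357 publisher, so the sketch simply calls it a prefix.

```python
from typing import NamedTuple, Optional


class DocID(NamedTuple):
    prefix: str               # naming authority or RFC 1357 publisher
    publisher: Optional[str]  # present only in the three-field form
    ident: str                # identifier assigned within that scope


def parse_docid(text):
    """Split a DocID into its colon-separated fields (illustrative only)."""
    fields = text.split(":")
    if any(not field for field in fields):
        raise ValueError("empty field in DocID: %r" % text)
    if len(fields) == 2:
        return DocID(fields[0], None, fields[1])
    if len(fields) == 3:
        return DocID(fields[0], fields[1], fields[2])
    raise ValueError("expected two or three fields in DocID: %r" % text)


if __name__ == "__main__":
    for example in ("ISBN_NA:0-395-32943-4",
                    "CORNELLCS:TR94-1418",
                    "IANA_NA:foobar:94-5-2"):
        print(parse_docid(example))
```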
The authors recognize that the DocID, as defined here, fits the requirements for Uniform Resource Names as defined in [URN]. When the syntax of a URN is standardized, it will be incorporated into the Dienst protocol, either as a replacement for or as a supplement to the existing DocIDs.
Responses to Dienst requests are formatted as HTTP responses. Thus, the standard components of an HTTP response such as status-code, content-type, and so on are returned. The actual Dienst response is encoded in the response data of the HTTP response. Refer to the section that describes server responses to Dienst methods and the one that describes error responses for complete information on status-codes, content-types, and response data for each request.
An initial version of Dienst and a prototype implementation were developed as part of the Computer Science Technical Report (CSTR) project, an ARPA-sponsored, CNRI-directed effort to create an online digital library of technical reports from the nation's top computer science universities. A description of this initial version is in [DIENST].
A Dienst request is carried in the path portion of a URL. An informal BNF syntax of the protocol is as follows (see the Methods section for a description of each method in the protocol). The protocol is case-sensitive - this is consistent with the rest of the path portion of the URL.
Terminals in the BNF grammar are distinguished by names that are all lower case (e.g., index); non-terminals are mixed-case (e.g., Request). The "special" characters ";" (semicolon), "/" (forward slash), "&" (ampersand), "=" (equals), and "?" (question mark) are literals in protocol requests. Finally, any non-terminals that are optional are enclosed within brackets (e.g., [PageNoArgument]). When an optional item is not included in a protocol request, the "special" character that precedes the optional item (if any) is omitted.
Request = ProtocolVersion/RequestClass
RequestClass = MiscRequest | IndexRequest | RepositoryRequest |
 UserInterfaceRequest

MiscRequest = misc/MiscReqMethod
MiscReqMethod = MISCServicesMethod | MISCTimeMethod | MISCVersionMethod
MISCServicesMethod = services
MISCTimeMethod = time
MISCVersionMethod = version

IndexRequest = index/IndexReqMethod
IndexReqMethod = INDContentsMethod | INDSearchMethod
INDContentsMethod = contents
INDSearchMethod = search/SearchType?SearchCriteria
SearchCriteria = <see description of INDSearchMethod>

RepositoryRequest = rep/RepReqMethod
RepReqMethod = REPDocFormatsMethod | REPDocPageMethod |
 REPDocPartMethod | REPDocPrintMethod
REPDocFormatsMethod = DocID/formats
REPDocPageMethod = DocID/page?PageArguments
REPDocPartMethod = DocID/DocumentPart
REPDocPrintMethod = DocID/print?PrintArguments
PrintArguments = PrintPagesArguments&PrintDestArguments
PrintPagesArguments = pages=all | PagesSomeArguments
PagesSomeArguments = pages=some&PageRangeArguments
PageRangeArguments = from=PageNumber&to=PageNumber
PrintDestArguments = destination=download |
 destination=printer&printer=PrinterName
PrinterName = <see description of REPDocPrintMethod>

UserInterfaceRequest = ui/UIReqMethod
UIReqMethod = UIDocOverviewMethod | UIDocPageMethod | UIDocPrintMethod |
 UISearchIntMethod | UIDocSummaryMethod
UIDocOverviewMethod = DocID/overview?[PageNoArgument]
PageNoArgument = PageNumberArgument
UIDocPageMethod = DocID/page?PageArguments
UIDocPrintMethod = DocID/print
UISearchIntMethod = search
UIDocSummaryMethod = DocID/summary

ProtocolVersion = dienst/1.0
DocID = Naming_Authority:Publisher_ID:ID | Naming_Authority:ID |
 RFC_1357_Publisher:ID
Naming_Authority = <refer to DocID description above>
Publisher = <refer to DocID description above>
RFC_1357_Publisher = <refer to DocID description above>
ID = <refer to DocID description above>
PageArguments = PageNumberArgument&[FormatArgument]
PageNumberArgument = page=PageNumber
PageNumber = <page number as a positive integer>
FormatArgument = type=MimeTypeValue
MimeTypeValue = MIMEType;[MIMEParameters]
MIMEType = <see [MIME]>
MIMEParameters = <see description of REPDocPageMethod>
DocumentPart = body <see REPDocPartMethod below>
SearchType = rfc-1357
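To make the grammar concrete, the following Python sketch maps a request path to the name of the method it invokes. It is a rough recognizer under stated assumptions, not a complete parser: the regular expressions, the PATTERNS table, and the classify function are all hypothetical; search and print arguments are accepted without validation; and the DocID pattern only checks for two or three colon-separated fields. The sketch follows the grammar in using index/ for index requests, whereas the example requests in this document use ind/.

```python
import re

# Two or three colon-separated fields, none containing "/", "?" or ":".
DOCID = r"[^/?:]+(?::[^/?:]+){1,2}"
# page=N, optionally followed by &type=...; the "&" is dropped when the
# optional FormatArgument is omitted, as the grammar notes require.
PAGE_ARGS = r"page=[1-9]\d*(?:&type=[^&]+)?"

PATTERNS = {
    "MISCServicesMethod": r"misc/services",
    "MISCTimeMethod": r"misc/time",
    "MISCVersionMethod": r"misc/version",
    "INDContentsMethod": r"index/contents",
    "INDSearchMethod": r"index/search/rfc-1357\?.+",
    "REPDocFormatsMethod": rf"rep/{DOCID}/formats",
    "REPDocPageMethod": rf"rep/{DOCID}/page\?{PAGE_ARGS}",
    "REPDocPartMethod": rf"rep/{DOCID}/body",
    "REPDocPrintMethod": rf"rep/{DOCID}/print\?.+",
    "UIDocOverviewMethod": rf"ui/{DOCID}/overview(?:\?page=[1-9]\d*)?",
    "UIDocPageMethod": rf"ui/{DOCID}/page\?{PAGE_ARGS}",
    "UIDocPrintMethod": rf"ui/{DOCID}/print",
    "UISearchIntMethod": r"ui/search",
    "UIDocSummaryMethod": rf"ui/{DOCID}/summary",
}

PREFIX = "dienst/1.0/"


def classify(path):
    """Return the grammar name of the method a path invokes, or None."""
    if not path.startswith(PREFIX):
        return None
    rest = path[len(PREFIX):]
    for name, pattern in PATTERNS.items():
        # Matching is case-sensitive, like the protocol itself.
        if re.fullmatch(pattern, rest):
            return name
    return None


if __name__ == "__main__":
    print(classify("dienst/1.0/misc/time"))
    print(classify("dienst/1.0/rep/CORNELLCS:TR91-1254/page?page=1&type=image/tiff"))
    print(classify("dienst/1.0/ui/Search"))
```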
Status Code - All responses described in this section have a status code of 200 ("OK"). Refer to the next section for error responses with a non-200 code.

Content Type - The MIME type of the Response Data is described for each Dienst method.

Response Data - The response to each method has a unique data content as described below.
The description of each method is followed by an example of a REQUEST and RESPONSE. Long lines in the examples (greater than 72 characters) are broken up with the continuation lines distinguished by having a leading space.
MISCServicesMethod - The server returns a text/plain document that contains the services that it provides, one per line. The possible services are those listed in the BNF grammar (misc, index, rep, and ui).
REQUEST:
dienst/1.0/misc/services
RESPONSE:
misc
ui
MISCTimeMethod - The server returns a text/plain document that contains a single line which is the local time as defined in RFC 822 [CROCKER], Section 5.1 and modified in RFC 1123 [BRADEN], Section 5.2.14.
REQUEST:
dienst/1.0/misc/time
RESPONSE:
20 June 94 12:36:47 -0500
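A client might turn the single line of the time response into a timestamp as sketched below in Python. The format string is an assumption derived only from the sample response above (day, full English month name, two-digit year, time, numeric zone); a server that emits other RFC 822 variants, such as abbreviated month names, would need a more tolerant parser. The function name parse_dienst_time is hypothetical.

```python
from datetime import datetime

# Assumed layout, matching the sample line "20 June 94 12:36:47 -0500".
# %B relies on English month names, i.e. the default C locale.
TIME_FORMAT = "%d %B %y %H:%M:%S %z"


def parse_dienst_time(line):
    """Parse the one-line time response into a timezone-aware datetime."""
    return datetime.strptime(line.strip(), TIME_FORMAT)


if __name__ == "__main__":
    stamp = parse_dienst_time("20 June 94 12:36:47 -0500")
    print(stamp.isoformat())
```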
MISCVersionMethod - The server returns a text/plain document that contains a single line which is the Dienst protocol version that it supports (e.g., 1.0).
REQUEST:
dienst/1.0/misc/version
RESPONSE:
1.0
INDContentsMethod - The server returns a text/x-dienst-response document consisting of records containing meta-information on all the documents that it indexes. The format of this meta-information follows the encoding proposed for Uniform Resource Characteristics (URC) [URC]. Each record will consist of a set of pairs in the format [attribute_name]:[value]. The attribute_names that may be returned are listed below.
REQUEST:
dienst/1.0/ind/contents
RESPONSE:
X-publisher:CORNELLCS
X-DocID:93-1334
title:Approaches to Passage Retrieval in Full Text Information Systems
author:Salton, Gerard
author:Allan, J.
author:Buckley, C.
X-date: March 1993
URL:https://cs-tr.cs.cornell.edu/dienst/1.0/rep/CORNELLCS:TR93-
 1334/body?type=application/postscript
x-pages:25

X-publisher:CORNELLCS
X-DocID:94-1420
title:Lower Bounds for Dynamic Connectivity Problems in Graphs
author:Fredman, Michael L
author:Rauch, Monika H.
X-date:April 94
URL:https://cs-tr.cs.cornell.edu/dienst/1.0/rep/CORNELLCS:TR94-1420/
 body?text/plain
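A client could read a response of this kind into a list of records as in the Python sketch below. How records are delimited on the wire is not stated in this section, so the sketch assumes one attribute_name:value pair per line and a blank line between records; both are assumptions, as is the function name parse_records. Repeated attributes such as author are collected into lists, and each line is split at its first colon so that URL values keep their own colons.

```python
def parse_records(text):
    """Parse attribute:value lines into a list of dicts of value lists."""
    records = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # Assumed record separator: a blank line.
            if current:
                records.append(current)
                current = {}
            continue
        # Split at the first colon only; values may contain colons.
        name, _, value = line.partition(":")
        current.setdefault(name.strip(), []).append(value.strip())
    if current:
        records.append(current)
    return records


SAMPLE = """\
X-publisher:CORNELLCS
X-DocID:93-1334
author:Salton, Gerard
author:Allan, J.

X-publisher:CORNELLCS
X-DocID:94-1420
title:Lower Bounds for Dynamic Connectivity Problems in Graphs
"""

if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        print(record["X-DocID"], record.get("author", []))
```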
INDSearchMethod - The server returns a text/x-dienst-response document formatted in the same fashion as for the INDContentsMethod. The records returned are documents that meet the SearchCriteria for the SearchType included in the request. The only currently supported SearchType is rfc-1357. For this SearchType, SearchCriteria has the form term[&terms], where term has the form name=value. "name" is an rfc-1357 field tag (e.g., TITLE, AUTHOR, ABSTRACT) and "value" is the text to search for in the respective field of the rfc-1357 bibliographic entry of a document. For a document to meet the rfc-1357 SearchCriteria, all terms must be true; in other words, terms are connected by "and". Note that the HTTP protocol requires that any special characters in values (e.g., space, question mark, etc.) be represented by escape sequences.
REQUEST:
dienst/1.0/ind/search/rfc-1357?author=rus&abstract=mobile+robot
RESPONSE:
X-DocID:91-1254
title:Task-Level Planning and Task-Directed Sensing for Robots in Uncertain Environments
author:Donald, Bruce Randall
author:Jennings, James
author:Brown, Russel
X-date: 12 Jun 91
URL:https://cs-tr.cs.cornell.edu/dienst/1.0/rep/CORNELLCS:TR1-1254/
 body?type=application/postscript

X-publisher:CORNELLCS
X-DocID:94-1429
title:Analyzing Teams of Cooperating Mobile Robots
author:Donald, Bruce Randall
author:Jennings, James
author:Rus, Daniela
X-date:10 Apr 94
URL:https://cs-tr.cs.cornell.edu/dienst/1.0/rep/CORNELLCS:TR94-1420/
 body?type=application/postscript
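The escaping requirement for search values can be met with a standard URL-encoding routine. The Python sketch below builds an rfc-1357 search request from name/value terms; search_request and its service parameter are hypothetical. The grammar names the index service index, while the example request above uses ind, so the prefix is left to the caller.

```python
from urllib.parse import urlencode


def search_request(terms, service="index"):
    """Build an rfc-1357 search request path from a dict of field terms."""
    if not terms:
        raise ValueError("at least one name=value term is required")
    # urlencode joins terms with "&" and escapes special characters;
    # spaces become "+", and "?" becomes "%3F".
    return "dienst/1.0/%s/search/rfc-1357?%s" % (service, urlencode(terms))


if __name__ == "__main__":
    print(search_request({"author": "rus", "abstract": "mobile robot"},
                         service="ind"))
```

Because all terms are connected by "and", adding a term can only narrow the set of records returned.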
REPDocFormatsMethod - The server returns a text/x-dienst-response document that consists of a list of tuples: a URL, a Content-Type, an optional Content-Length, and a number of pages (which may be required as specified below). The list indicates the formats in which the server is prepared to deliver the specified document. This list is encoded in the manner proposed for URCs. Note that the Content-Length is optional and can only be determined if the data for the format is stored in a single file. The number of pages is required if the URL specifies a format that is available in discrete pages.
REQUEST:
dienst/1.0/rep/CORNELLCS:TR91-1254/formats
RESPONSE:
Content-Type:text/plain
Content-Length:181249

URL:https://foo.edu/dienst/1.0/rep/CORNELLCS:TR93-1334/body?
 type=image/gif
Content-Type:image/gif
X-pages:15
REPDocPageMethod - The type argument may carry MIME parameters; image/gif;dpi=72 indicates a 72 dpi GIF image of the page. NOTE: The standard means of specifying the desired MIME type of a response in an HTTP request is to use the ACCEPT field in the request header. However, REPDocPageMethod is intended as the method used by a user interface server to compose HTML documents that consist of embedded page images and links to the previous and next pages of a document (see UIDocPageMethod). HTML has no means of explicitly setting the ACCEPT field in the IMG tag and, thus, the MIME format must be included in the protocol request.
REQUEST:
dienst/1.0/rep/CORNELLCS:TR91-1254/page?page=1&type=image/tiff
RESPONSE:
<byte stream for tiff representation of page 1>

REQUEST:
dienst/1.0/rep/CORNELLCS:TR91-1254/body
RESPONSE:
<byte stream of representation of document body in format specified by ACCEPT>
REPDocPrintMethod - The server's action depends on the destination argument that is included in the request.

If destination is download, the server returns an application/postscript representation (if available) of the pages of the document specified in the PrintPagesArguments included in the request.

If destination is printer, the server submits a print job to the PrinterName specified in the request. The print job will cause the pages of the document specified in the PrintPagesArguments to be printed at the printer specified in PrinterName. The server should extract the domain origin of the request from the HTTP header to determine if the requester is authorized to print to the specified printer. The server will return to the client a text/plain document that verifies the print request.
REQUEST:
dienst/1.0/rep/CORNELLCS:TR91-1254/print?
 destination=download&pages=some&from=5&to=8
RESPONSE:
<application/postscript representation of pages 5-8 of the document>

REQUEST:
dienst/1.0/rep/CORNELLCS:TR91-1254/print?
 destination=printer&printer=pr1&pages=all
RESPONSE:
All pages of CORNELLCS:TR91-1254 have been submitted to printer pr1.
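The print arguments combine a page selection with a destination. The Python sketch below assembles them in the order given by the PrintArguments rule of the grammar (pages first, then destination); the example requests above list the destination first, and whether servers accept either order is not stated here. The function print_query and its parameters are hypothetical.

```python
def print_query(destination, printer=None, page_range=None):
    """Build the query string for a print request (illustrative only)."""
    if page_range is None:
        pages = "pages=all"
    else:
        first, last = page_range
        # PageNumber is a positive integer; the range must not be empty.
        if not 1 <= first <= last:
            raise ValueError("invalid page range: %r" % (page_range,))
        pages = "pages=some&from=%d&to=%d" % (first, last)

    if destination == "download":
        dest = "destination=download"
    elif destination == "printer":
        if not printer:
            raise ValueError("destination=printer requires a printer name")
        dest = "destination=printer&printer=%s" % printer
    else:
        raise ValueError("unknown destination: %r" % destination)

    return pages + "&" + dest


if __name__ == "__main__":
    print(print_query("download", page_range=(5, 8)))
    print(print_query("printer", printer="pr1"))
```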
UIDocOverviewMethod - The server returns a text/html document that contains in-line, reduced-size page images (when available) of the specified document. The purpose of this is to facilitate browsing of large documents. The HTML document should be composed so that a user can select one of the reduced-size images (e.g., using the ISMAP facility) and view it in a larger, readable format. Long documents may be divided into several of these "overview" documents. In this case the request should include a "page" argument and the text/html document returned by the server will include hypertext links to get to the next or previous "overview" page.
REQUEST:
dienst/1.0/ui/CORNELLCS:TR91-1254/overview?page=1
RESPONSE:
For a sample response for this method see the prototype implementation.
UIDocPageMethod - The server returns a text/html document that contains an inline image (when available) of the page of the document that is specified in the request. The text/html document will include hypertext links to get to the next or previous page of the document.
REQUEST:
dienst/1.0/ui/CORNELLCS:TR91-1254/page?page=1
RESPONSE:
For a sample response for this method see the prototype implementation.
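The prototype's actual output is not reproduced in this document. As a rough idea of what a user interface server might compose for a page view, the Python sketch below embeds a repository page request in an IMG tag and adds previous and next links. Everything in it is an assumption: the function page_view_html, the last_page parameter (the server is presumed to know the page count), the image/gif default, and the markup itself.

```python
from html import escape


def page_view_html(doc_id, page, last_page, mime_type="image/gif"):
    """Compose an HTML page showing one page image with navigation links."""
    def ui_page(number):
        return "/dienst/1.0/ui/%s/page?page=%d" % (doc_id, number)

    # The image format travels in the URL because an IMG tag cannot set
    # the ACCEPT header of the request it triggers.
    image = "/dienst/1.0/rep/%s/page?page=%d&type=%s" % (doc_id, page, mime_type)

    links = []
    if page > 1:
        links.append('<a href="%s">Previous</a>' % escape(ui_page(page - 1)))
    if page < last_page:
        links.append('<a href="%s">Next</a>' % escape(ui_page(page + 1)))

    return (
        "<html><body>\n"
        '<img src="%s" alt="Page %d">\n' % (escape(image), page)
        + "<p>%s</p>\n" % " | ".join(links)
        + "</body></html>"
    )


if __name__ == "__main__":
    print(page_view_html("CORNELLCS:TR91-1254", 2, 3))
```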
UIDocPrintMethod - The server returns a text/html document that contains a forms-based interface which allows printing or downloading the document. This should only be permitted if the document is available in a format that is suitable for printing - usually this means it is available in PostScript or in a form convertible to PostScript. The server should use information in the HTTP request header to determine the domain origin of the client request, and from that whether printing is possible or downloading is the only option for this client. The HTML form may permit the user to select specific pages of the document to print or download, if this is possible (i.e., the PostScript representation follows PostScript document structuring conventions, which make it possible to find code for specific pages).
REQUEST:
dienst/1.0/ui/CORNELLCS:TR91-1254/print
RESPONSE:
For a sample response for this method see the prototype implementation.
UISearchIntMethod - The server returns a text/html document that contains a forms-based interface for submitting a document search. When the search is submitted, the server will handle the actual querying of one or more index servers and return a text/html document that contains links to the documents which are the result of the search. At this time the only currently supported SearchType is rfc-1357. For this type of search, the suggested search form consists of a set of text fields that are labeled as rfc-1357 field types (e.g., Title, Author, Abstract). The user can then enter data into these fields. This data can then be used by the user interface server to submit an rfc-1357 search request to an index server using the INDSearchMethod. As more SearchTypes are incorporated into the protocol, for example full text or complex boolean queries, other user interfaces for searching can be developed.
REQUEST:
dienst/1.0/ui/search
RESPONSE:
For a sample response for this method see the prototype implementation.
UIDocSummaryMethod - The server returns a text/html document that contains information about the specified document. This information should include the title, author, date, and abstract, and links to, or information about, the various formats in which the TR is available.
REQUEST:
dienst/1.0/ui/CORNELLCS:TR91-1254/summary
RESPONSE:
For a sample response for this method see the prototype implementation.
[BRADEN] R. Braden. Requirements for Internet Hosts -- Application and Support. RFC-1123.
[CROCKER] David H. Crocker. Standard for the format of ARPA Internet Messages. RFC-822.
[DIENST] James R. Davis, Carl Lagoze. A protocol and server for a distributed digital technical report library. Cornell University Computer Science Department Technical Report 94-1418, June 1994.
[GLOSS] Luis Gravano, Hector Garcia-Molina, Anthony Tomasic. The Efficiency of GLOSS for the Text Database Discovery Problem. Stanford University Technical Report CS-TN-93-2.
[HTTP] Tim Berners-Lee. Hypertext Transfer Protocol (HTTP). Internet Draft.
[MIME] Nathaniel S. Borenstein, Ned Freed. MIME (Multipurpose Internet Mail Extensions). RFC-1341.
[RUS] Daniela Rus, Devika Subramanian. Information Retrieval, Information Structure, and Information Agents. Submitted to ACM Transactions on Information Systems.
[S-HTTP] Eric Rescorla, Allan M. Schiffman. The Secure HyperText Transfer Protocol. To appear as an RFC.
[URC] Michael Mealling. Encoding and Use of Uniform Resource Characteristics. Internet Draft.
[URL] Tim Berners-Lee. Uniform Resource Locators (URL). Internet Draft.
[URN] K. Sollins, L. Masinter. Requirements of Uniform Resource Names, March 26, 1994. Internet Draft.
[WWW] Tim Berners-Lee, Robert Cailliau, Jean-François Groff, and Bernd Pollermann. World-Wide Web: The Information Universe. Electronic Networking: Research, Applications and Policy 2(1):52-58, 1992.
James R. Davis
Xerox Corporation
Design Research Institute
Cornell University
Ithaca, NY 14853
davis@dri.cornell.edu

Carl Lagoze
Computer Science Department
Cornell University
Ithaca, NY 14853
cjl2@cornell.edu