\documentstyle[11pt]{article} \setlength{\topmargin}{-.5in} \addtolength{\textheight}{1.5in} \addtolength{\textwidth}{\evensidemargin} \addtolength{\textwidth}{\oddsidemargin} \setlength{\oddsidemargin}{.25in} \setlength{\evensidemargin}{.25in} \addtolength{\textwidth}{-1.0\oddsidemargin} \addtolength{\textwidth}{-1.0\evensidemargin} \input MANUAL.mac \begin{document} \setlength{\baselineskip}{16pt} \sloppy \title{ The Fifth DIMACS \\ Implementation Challenge: \\ Data Type Definitions and Specifications } \author{C.C.McGeoch} \maketitle \section{Last Modified 4 June 1996} The following lists changes to previous versions of this document: \begin{itemize} \item 4/6. Notice to dictionary implementors/users: Section 4 contains a new discussion of assumptions about the valid lifetime of an item reference. Read the paragraph about {\bf Persistence of item references.} \item 4/6. The description of the {\em lkp} line for the dictionary driver is modified. \item 3/6. Fixed typos. In a few places $pr\_type$ was supposed to be $in\_type$, etc. The priority type for DIMACS tests is {\em double}, not {\em real}. \item 3/6. Section 5 new, correct, information on obtaining files. \item 3/6. Functions {\em infoval}, {\em prioval}, and {\em keyval} each have a new parameter pointing to the data structure. For example, {\bf ky\_type keyval ( it\_type item) } has been changed to {\bf ky\_type keyval ( dict\_ptr D, it\_type item)}. \item 3/6. The function {\em infoval} now has a declaration like {\bf in\_type *infoval (dict\_ptr D, it\_type item)} (with the star added), since {\em in\_type} is assumed to be a string for the tests. \item 3/6. The sample priority queue driver provided by DIMACs has an extra field in the {\em pqh} line. The specs haven't changed --- this is the kind of minor variation from specs that we believe will be inevitable. Implementors are responsible for documenting such changes and making suitable modifications to the test set. \item 3/6. The implementations of priority queue and dictionary test drivers provided by DIMACS cover more operations than those previously listed. \item 3/6. The test drivers provided by DIMACS can handle an empty third field in {\em ins} lines, rather than an explicit ``0'' value, when the item name is not being used. Either way is OK. \end{itemize} \section{Introduction} The Center for Discrete Mathematics and Theoretical Computer Science (DIMACS) invites participation in an Implementation Challenge to compare implementations of three basic data structures: {\em Priority Queues}, {\em Dictionaries}, and {\em Multi-Dimensional Point Sets}. Two kinds of point sets, static and dynamic, are described. This document contains specifications of each abstract data type, as well as format specifications for a set of testbed problems for each data structure to be announced later. Implementors of data structures are expected to develop independent research projects which are presumably more extensive than the testbed problems. \paragraph{Who Can Participate.} We expect that most participants will be professional computer scientists and graduate students with expertise in algorithms, data structures, and implementation. However the Challenge is open to any interested group or individual. We especially invite participation from groups involved in application projects that use one or more of the data structures. \paragraph{The Timeline.} Individuals or groups who wish to participate in the Challenge should send an email message to {\tt challenge5@dimacs.rutgers.edu} to register and get on the mailing list. We request that each participant submit a brief (one-paragraph) Project Proposal sometime before {\bf January 31, 1996} (although proposals will be accepted any time). The steering committee will review each proposal, make suggestions, and notify participants when redundant projects are planned. In {\bf April 1996} we will request progress reports from participants to help in planning the testbed specifications. By {\bf May 1996} the testbed suite will be announced, and implementors will be expected to carry out the specified trials. Ten-page abstracts describing the research results will be submitted for review sometime around {\bf August 1996}; accepted papers will be presented at a DIMACS workshop held in {\bf October 1996} and full papers will subsequently be published by the American Mathematical Society. The next three sections define the abstract data types and describe standard interfaces for comparing and testing implementations of each data structure. Note that although a C language syntax is used here for convenience, the Implementation Challenge does not specify a particular programming language. Participants who implement in C should obey the syntax conventions here. Participants are encouraged to implement in C++ and to follow LEDA specifications which are reproduced in the Appendix. Implementors using other languages should submit specifications to DIMACS early so that they can be made available to others. Particular application studies may require extentions or semantic variations on the basic operation sets given here; please notify the Challenge committee of any such changes so that they may be passed on to other participants. In these C specifications, some common names are used in different data structures. If an application program requires different instantiations of these data structure schemas, a prefix should be added to all required variables, types, and functions to distinguish one instantiation from another. The prefixes should be of the form {\em pq.}, {\em dc.}, {\em ps.}, or {\em ss.} (for static point sets) with additional one-digit identifiers when necessary. For example, an application program that uses a priority queue, a dictionary with integer keys, and a dictionary with character keys would contain variables {\em pq.item}, {\em dc1.item}, and {\em dc2.item}, type declarations for {\em pq.it\_type}, {\em dc1.it\_type}, and {\em dc2.it\_type}, and call to functions {\em pq.insert}, {\em dc1.insert}, and {\em dc2.insert}. \section{Priority Queues} A C language implementation of a Priority Queue requires the following variables and types. \begin{itemize} \item An instance of a priority queue is called {\em Q} and has type {\em pq\_ptr}, which is a C pointer type. The variable {\em Q} contains the address of the priority queue. A priority queue contains items. \item The type of an item in the priority queue is {\em it\_type}. The {\em item} variable is assumed to behave like a pointer or index, and to have some pre-defined {\em NULL\_ITEM} value. However, the underlying type can be defined by the implementor. \item Each item being referenced by {\em item} has two parts. The first part is a priority {\em prio} which is a linearly-ordered type called {\em pr\_type}. The second part is an information element {\em info}, which is of type {\em in\_type}. \end{itemize}. {\bf General assumptions and preconditions.} The operations supported by the Priority Queue are listed below. It is assumed that one call to {\em construct\_pq } occurs before before all other calls, and that the pointer {\em Q} subsequently contains the address of the priority queue. It is assumed that a call to {\em destruct\_pq} occurs after all references to {\em Q} in the program. Also, in all cases where {\em item} is passed as a parameter, it is assumed that {\em item} is defined and refers to an item in $Q$. Note that the C specifications require that the pointer $Q$ be passed by value, not by reference. If a function implementation changes the address of the priority queue data structure, a second level of indirection will be required. That is, $Q$ should point to a pointer containing a modifiable address of the priority queue. \begin{description} \item{\bf pq\_ptr construct\_pq ()}. Create an instance of the priority queue and initialize it to the empty priority queue. Return a pointer to the priority queue. \item{\bf it\_type insert ( pq\_ptr Q, pr\_type prio, in\_type info )}. Insert a new item having the specified priority and information, and return the item reference. \item{\bf it\_type find\_min ( pq\_ptr Q )}. Return an item reference for an item having minimal priority in {\em Q}, or return {\em NULL\_ITEM} if {\em Q} is empty. The item is not removed from {\em Q}. \item{\bf pr\_type del\_min ( pq\_ptr Q )}. Remove an item having minimal priority in {\em Q}, and return the priority of the item. Although ties may be broken arbitrarily, the item that this function chooses to delete must be the same one {\em find\_min} would return. {\em Precondition}: {\em Q} is not the empty queue. \item{\bf void decrease\_p ( pq\_ptr Q, it\_type item, pr\_type prio)}. Make $prio$ the new priority of {\em item}. {\bf Precondition}: Priority $prio$ is not larger than the current priority of {\em item}. \item{\bf pr\_type prioval ( pq\_ptr Q, it\_type item )}. Return the priority of {\em item}. \item{\bf in\_type infoval ( pq\_ptr Q, it\_type item )}. Return the information part of {\em item}. (When {\em in\_type} is a string, use *infoval.) \item{\bf int size ( pq\_ptr Q )}. Return the number of items in {\em Q}. \item{\bf void clear ( pq\_ptr Q )}. Make {\em Q} the empty priority queue by removing all items. \item{\bf pq\_ptr destruct\_pq ( pq\_ptr Q )}. De-allocate all memory associated with {\em Q}, and return the pointer value {\em null}. This function contains an invocation of {\em clear}, which removes all items from the priority queue. If the implementation has dynamic memory elements that remain after the call to {\em clear}, this function de-allocates them; otherwise nothing further happens. \end{description} \subsection{Comparing Priority Queue Implementations} Participants are encouraged to develop application programs that require priority queues, and to evaluate different implementation strategies on these applications. In order to provide a standard interface for exchanging and comparing implementations across different languages, we describe a ``Priority Queue Test Driver'' and an input file format for the driver. Applications programs should be able to generate procedure call traces in the file format. Data structure implementations are called by drivers that read the files and perform the specified invocations. A C language driver will be available at DIMACS; participants using other languages are invited to submit drivers for redistribution. \paragraph{Data types.} For the purposes of comparative testing, item priorities will be non-negative double-precision reals: that is, {\em pr\_type = double}. The info part of an item is assumed to be a string containing (at most) $k$ characters, with particular string lengths to be named later. For example, tests may be requested with {\em in\_type = str128} and with {\em in\_type=str4}. \paragraph{Unique priorities.} The Priority Queue specification allows priorities to be non--unique. However in these comparative tests we wish to avoid situations where a procedure-call trace produced by an application program is incorrect for some implementations because of the way ties are broken. (For example a {\em decrease\_p} may specify an item that has been removed in one implementation but not in another.) Therefore for the tests we require that all items in the priority queue have distinct priorities. In application programs that generate function-call traces, integer priorities can be made distinct by letting the decimal part of a real priority correspond to a time-stamp recording the insertion time of the item. For example, two items both having integer priority $3$ in the real application might be recorded as having priorities $3.114$ and $3.005$ in the trace. \paragraph{Item names.} A call to the {\em decrease\_p} operation requires an item reference as a parameter. For the comparative tests, items may be given {\em index} or {\em name} values which are integers in the range $1 \ldots $ {\em MAXNAMES}. An item name is associated with an item at insert time, and the association remains valid throughout the lifetime of the item. The driver program uses an array to convert index values to item references. Applications programs should generate integer ``names'' for items where necessary. \paragraph{Input File Specifications.} An input files for a Priority Queue test contains a sequence of lines each terminated by an {\em eoln} character. Each line begins with a three-character line code, specifying either a file header, a comment, or a type of procedure call. Other values may appear on the line, as specified below. Example test files will be available online from DIMACS; see Section \ref{access} for more information. \begin{description} \item {\bf pqh} The priority queue header line is the first line in the file. It begins with the three-character code and contains an integer {\em N} describing the maximum index value used in the file. You can set this to 0 if the test doesn't ever use the {\bf dcr} command. (Note: the driver implementation provided by DIMACS also reads an integer {\em M} giving the max number of items in the priority queue. This is not part of the specifications, but is typical of the small changes that might be required by specific implementations. Implementors are asked to clearly document such changes from the specifications. ) \item {\bf com} Comment lines begin with the three-character string {\em com} and then contain at most 100 characters of text. These lines are inserted for human readibility and are ignored by the driver. \item {\bf ins} An insert line causes an invocation of the {\em insert} function. Besides the three-character code this line contains a real priority {\em pri} and an optional field for integer index {\em inx}. A missing index value, or one set to 0, is ignored by the driver, and can be used for an item that never appears in a subsequent {\em decrease\_p} operation. \item {\bf fmn} A find-min line contains only the code {\em fmn}. It causes an invocation of the {\em find\_min} function. \item {\bf dmn} A delete-min line causes an invocation of the {\em del\_min} function. \item {\bf dcr} The decrease-priority line contains the {\em dcr} code, a real priority {\em pri}, and an index {\em inx}. This line causes an access of the index array to get an {\em item} value, followed by a call to {\em decrease\_p} using {\em pri} and {\em item}. \item {\bf prv} The prioval line contains the three character code and an integer item name. The {\em prioval } function is invoked. \item {\bf inv} The infoval line contains the three character code and an integer item name. The {\em infoval } function is invoked. \item {\bf siz} This line causes a call to {\em size}. \item {\bf clr} This line causes a call to {\em clear}. \end{description} \paragraph{The Driver.} A pseudo-code description of the driver program appears below. A C version of this program is available from DIMACS: see Section \ref{access} for information on obtaining the file. Participants using other languages are invited to develop similar drivers and submit them for public distribution. \begin{verbatim} /*---Constants and Variables ---*/ MAXNAMES: constant giving maximum index value in array N: integer giving max index used in input itemindex: array [1..MAXNAMES] of type it_type inx: integer index to itemindex array Q: pointer to the priority queue, of type pq_ptr cmd: three-character command code pri: priority of type real item: item of type it_type dummy: a dummy character string of type strk /*---Code---*/ Q = construct_pq( ); while (not end of file) { read (cmd); case cmd of { "pqh" : { read(N); if (N > MAXNAMES) report array bound error; exit; } "com" : { do nothing; } "ins" : { read(pri, inx); (append inx to end of dummy); item = insert (Q, pri, dummy); if (inx != 0) itemindex[inx] = item; } "fmn" : { item = find_min (Q); } "dmn" : { pri = delete_min (Q);} "dcr" : { read(pri, inx); item = itemindex[inx]; decrease_p (Q, pri, item); } "prv" : { read(inx); prioval(Q, itemindex[inx]);} "inv" : { read (inx); infoval (Q, itemindex[inx]); } "siz" : { size(Q); } "clr" : { clear(Q); } } /* end case; */ read input to next eoln; }/*end while*/ Q = destruct_pq (Q); \end{verbatim} The current DIMACS program prints a trace of procedure calls and their results. A later version of the driver will record counts and runtime statistics for key operations; participants will be asked to insert appropriate counting instructions in their data structure implementations. \section{Dictionaries} C languages implementations of Dictionaries require the following variables and types. \begin{itemize} \item The variable {\em D} contains the address of a dictionary. The type of {\em D} is {\em dict\_ptr}, which is a C pointer type. A dictionary contains items. \item The variable {\em item} is of type {\em it\_type}. The {\em item} variable behaves like a pointer or index, and is assumed to have a pre-defined {\em NULL\_ITEM} value. However the choice of base type is left to the implementor. \item Each item contains two parts. The first is a {\em key} of some linearly-ordered type {\em ky\_type}. The second is an information part {\em info} of type {\em in\_type}. \end{itemize} \paragraph{General assumptions and preconditions.} It is assumed that one call to {\em construct\_dict} occurs before all other calls, and that {\em D} subsequently contains the address of the dictionary. It is assumed that one call to {\em destruct\_dict} occurs after all references to the dictionary in a program. It is assumed that for each distinct {\em key} value there is at most one item in the dictionary. Also, in all cases where {\em item} is passed as a parameter, it is assumed that {\em item} is defined and refers to an item in the dictionary. Note that in the C specifications given here, the variable {\em D} is passed by value, not by reference. If a function is implemented that changes the address of the dictionary, then a second level of indirection is required: that is, {\em D} should point to a modifiable pointer containing the address of the dictionary. \paragraph{Persistence of item references.} Both the {\em insert} and {\em lookup} operations return item references; the {\em delete\_item} function deletes a dictionary element according to its item reference value. How long is an item reference assumed to be valid? We recognize two reasonable strategies for implementors and users of dictionaries. \begin{enumerate} \item A dictionary implementation may guarantee that item references are {\em persistent}: that is, the value returned by the {\em insert} function at insert-time remains correct throughout the lifetime of the item. A dictionary implementor who wishes to guarantee persistence must avoid copying any item from one ``location'' to another during insertion, deletion, and lookup operations. For example, a function {\em swap (itema, itemb)} could not use a standard three-way-copy because previously-assigned references to {\em itema} and {\em itemb} would become invalid. \item A dictionary implementation may not support persistent item references; that is, the implementation is assumed to be correct only if every {\em delete\_item} is immediately preceeded by a {\em lookup} of that item. \end{enumerate} We request that dictionary implementors and test-data produces {\em clearly state} the assumptions their projects require. Dictionary implementors should mention, in comments, whether item references are guaranteed persistent; and test-data producers should mention whether {\em delete\_items} are always preceeded by {\em lookups}. \begin{description} \item {\bf dict\_ptr construct\_dict ( )}. Create a dictionary and initialize it to the empty dictionary. Return the address of the dictionary. \item {\bf it\_type insert ( dict\_ptr D, ky\_type key, in\_type info )}. Insert the item with {\em key} and {\em info} values in the dictionary. If there is already an item having identical key value in the dictionary, then it is overwritten with this new item. \item {\bf it\_type lookup ( dict\_ptr D, ky\_type key )}. Return the item having the given key value, or return {\em NULL\_ITEM} if the item is not in the dictionary. \item {\bf void delete ( dict\_ptr D, ky\_type key )}. Delete the item having the given key value. If the item is not in the dictionary, then nothing happens. \item {\bf void del\_item ( dict\_ptr D, it\_type item )}. Delete the specified item from the dictionary. This function can be used right after {\em lookup} to avoid a redundant search. {\em Precondition:} The item must be in the dictionary. \item {\bf ky\_type keyval (dict\_ptr D, it\_type item )}. Return the key value of the item. (When {\em ky\_type} is a string, use *keyval). \item {\bf in\_type infoval (dict\_ptr D, it\_type item )}. Return the info value of the item. (When {\em in\_type} is a string, use *infoval.) \item {\bf int size ( dict\_ptr D )}. Return the number of items in the dictionary. \item {\bf void clear ( dict\_ptr D )}. Make {\em D} the empty dictionary by removing all items. \item{\bf dict\_ptr destruct\_dict ( dict\_ptr D ) }. De-allocate all memory associated with {\em D}, and return the pointer value {\em null}. This function contains an invocation of {\em clear}, which removes all items from the dictionary. If the implementation has dynamic memory elements that remain after the call to {\em clear}, this function de-allocates them; otherwise nothing further happens. \end{description} \subsection{Comparing Dictionary Implementations} Participants are encouraged to develop application programs that use dictionaries, and to evaluate different implementation strategies for the applications. In order to provide a standard interface for comparing implementations in different languages, we describe a ``Dictionary Test Driver'' and an input file format for the driver. Applications programs should be able to generate function call traces in the format given here. Data structure implementations can be called by drivers that read the files and perform the specified sequence of procedure calls. \paragraph{Data types.} For the purposes of comparative testing, two kinds of key types will be used. In some tests, item keys are assumed to be non-negative integers; that is, {\em ky\_type = int}. In other tests, an item key is assumed to be a string containing $k$ characters: for example {\em ky\_type = str128}. Specific values of $k$ will be named later so that participants can study the effect of key size on performance (for example participants may be asked to run separate tests with $str4$ and $str128$.) In all tests, the info type {\em in\_type} is assumed to be a character string of length $k$, with specific values of $k$ to be named later. \paragraph{Item names.} The {\em del\_item} operation requires an item reference as a parameter. For the comparative tests, items are given {\em index} values which are integers in the range $1 \ldots$ {\em MAXNAMES}. The driver program uses an array to convert the index values to particular item references. \paragraph{Input File Specifications} Input files for Dictionary tests contain a sequence of lines, each terminated with an {\em eoln} characer. Each line begins with a three-character line code, and may contain other values depending on the code. The line codes have the following meanings. \begin{description} \item {\bf dch} The dictionary header line is the first line in the file. Besides the three-character code, the line contains a integer {\em N} giving the maximum index value used in the file. \item {\bf com} Comment lines begin with the three-character code and then contain text. These lines are inserted for human readability and are ignored by the driver. \item {\bf ins} An insert line causes an invocation of the {\em insert} function. Besides the three-character code, this line contains a non-negative integer key {\em key}, and an optional index value {\em inx}. An missing index value, or one set to 0, is ignored by the driver and can be used for items that never appear in subsequent {\em del\_item} operations. \item {\bf lkp} A lookup line causes an invocation of the {\em lookup} function. This line contains an integer {\em key} and and optional index integer {\em inx}. If the index is missing or set to 0, then the item reference returned by lookup is ignored by the driver (fine for persistent dictionaries). If a nonzero index is present, the returned item reference is stored by the driver (suitable for non-persistent dictionaries). \item {\bf dlk} A delete line causes an invocation of the {\em delete} function. This line contains an integer {\em key} value. \item {\bf dli} A delete-item line causes an invocation of the {\em del\_item} function, which takes an item reference rather than a key as a parameter. This line contains an index value {\bf inx} that names an item already inserted in the dictionary. \item {\bf kyv} This line contains an index value and causes an invocation of {\em keyval}. \item {\bf inv} This line contains an index value and causes an invocation of {\em infoval}. \item {\bf siz} This causes a call to the {\em size } function. \item {\bf clr} This causes a call to {\em clear}. \end{description} \paragraph{The Driver.} A pseudo-code description of the Dictionary Driver program appears below. A C version of this program is available DIMACS: see Section \ref{access} for information on obtaining the file. Participants using other languages are invited to develop drivers that can read input files and generate procedure calls as specified here. DIMACS will make such drivers available as they arrive. \begin{verbatim} /*---Constants and Variables---*/ MAXNAMES: constant giving maximum index in array N: integer giving maximum index in input itemindex: array [1..MAXNAMES] of type it_type inx: integer index to itemindex array D: pointer to the dictionary, of type dict_ptr cmd: three-character command code key: integer key item: item of type it_type dummy: a dummy character string of type strk /*---Code---*/ Q = construct_dict ( ); while (not end of file) { read (cmd); case cmd of { "dch" : { read(N); if (N > MAXNAMES) report array bound error; exit; } "com" : { do nothing;} "ins" : { read(key, inx); item = insert (D, key, dummy); if (inx != 0) itemindex[inx] = item; } "lkp" : { read ( key , inx); item = lookup (D, key); if (inx != 0) itemindex[inx] = item; } "dlk" : { read (key); delete (D, key);} "dli" : { read (inx); item = itemindex[inx]; del_item (D, item); } "kyv" : { read (inx); keyval(D, inx); } "inv" : { read (inx); infoval(D, ins); } "siz" : { size (D); } "clr" : { clear(D); } } /* end case; */ read input to next eoln; }/*while*/ Q = destruct_dict (Q); \end{verbatim} A later version of the driver will will record counts and runtime statistics for key operations. Participants will be asked to insert appropriate counting instructions in their data structure implementations. \section{Multi-Dimensional Point Sets} This section contains a description of the abstract data type Three-Dimensional Point Set, followed by a Static Point Set variation, which does not support dynamic insertion of points. Participants are invited to develop implementations for either or both variations. Participants who wish to implement higher-dimensional point sets may do so by modifying the definition of {\em pt\_type}. Specifications of other variables and functions need not be modified for higher dimensions, although of course the code will change. Participants may develop implementations either for fixed dimensions or for dimensionalities specified by a compile-time parameter {\em DIM}. In applications where different point sets of different dimensionalities are used, prefixes of the form {\em ps2.}, {\em ps4.} (the digit denoting the dimensionality) should be attached to all variables, types, and functions associated with the different data structures. Similarly, the prefix {\em ss3.} denotes a static point set of dimension three. \subsection{Dynamic Point Sets} The following variables and types are used in Dynamic Three-Dimensional Point Sets implemented in C. \begin{itemize} \item The variable {\em S} contains the address of the point set. The type of {\em S} is {\em ps\_ptr}, which is a C pointer type. A point set contains items. \item The variable {\em item} is of type {\em it\_type}. The variable {\em item} behaves like a pointer or index, and has some predefined {\em NULL\_ITEM} value. However, the choice of base type is left to the implementor. \item Each item contains two parts. The first part is a three-dimensional point {\em point} of type {\em pt\_type}, which is an array of 3 integers indexed $0,1,2$. The second part is an information unit {\em info} which is of type {\em in\_type}. Points are unique: that is, for each distinct point value there is at most one item in the set $S$. \end{itemize} \paragraph{General assumptions and preconditions.} It is assumed that a call to {\em construct\_ps} occurs before all other calls, and that {\em S} subsequently contains the address of the point set. It is assumed that a call to {\em destruct\_ps} occurs after all operations using the point set. In cases where {\em item} is passed as a parameter, it is assumed that {\em item} is defined and refers to an item in {\em S}. Note that in the C specifications below, the pointer {\em S} is passed by value, not by reference. In implementations where a function changes the address of the point set, a second level of indirection is required: that is, {\em S} should point to a pointer containing a modifiable address of the point set. Point sets support the following operations. \begin{description} \item {\bf ps\_ptr construct\_ps ( )}. Create an empty point set. Return the address of the point set. \item {\bf it\_type insert ( ps\_ptr S, pt\_type point, in\_type info )}. Insert the point {\em point} and its associated {\em info} in the point set. If there is already an item having this value of {\em point}, the new information over-writes the old. \item {\bf ps\_item lookup ( ps\_ptr S, pt\_type point )}. Return the item in the set having the given point value, or return {\em NULL\_ITEM} if no such item exists. \item {\bf void delete ( ps\_ptr S, pt\_type point )}. Delete the item having the given point value. If the item is not in the set, nothing happens. \item {\bf void del\_item ( ps\_ptr S, it\_type item )}. Delete the specified item from the set. This function can be used after a {\em lookup} to avoid extra searches. {\bf Precondition}: The item must exist in the set. \item {\bf it\_type nearest ( ps\_ptr S, pt\_type point )}. Return the item that is nearest to the specified point, breaking ties arbitrarily. A double-precision Euclidean distance computation should be used. If the point set is empty, return {\em NULL\_ITEM}. Note that the specified point need not be an element of the set. \item {\bf ps\_ptr range\_search ( ps\_ptr S, pt\_type lo, pt\_type hi )}. The point {\em lo} specifies a lower bound in each dimension. The point {\em hi} specifies an upper bound value in each dimension. This function calls {\em construct\_ps} to create a new point set {\em R}, and then inserts into {\em R} all items in {\em S} having point values within the given bounds (including the bound values) in all dimensions. If no such points exist then {\em R} remains the empty set. The function returns the address of {\em R}. \item {\bf pt\_type pointval ( it\_type item ).} Return the point value of the item. \item {\bf in\_type infoval ( it\_type item ).} Return the info value of the item. \item {\bf int size ( ps\_ptr S )}. Return the number of points in the point set. \item {\bf void clear ( ps\_ptr S )}. Make {\em S} the empty set by removing all items. \item {\bf ps\_ptr destruct\_ps ( ps\_ptr S )}. De-allocate all memory associated with {\em S}, and return the pointer value {\em null}. This procedure contains an invocation of {\em clear}, which removes all items from the point set. If the implementation has dynamic memory elements that remain after the call to {\em clear}, this function de-allocates them; otherwise nothing further happens. \end{description} \subsection{Static Point Sets} The specification of the {\em Static Three-Dimensional Point Set} data type is identical to that above, except for the following changes. \begin{itemize} \item A {\em pointlist} is an array indexed from 0 to L-1, for some maximum list size $L$. The array type is {\em pl\_type}. Each element of the array is a record containing a a {\em point} of type {\em pt\_type} and an {\em info} of type {\em in\_type}. \item The {\bf ps\_ptr construct\_ss ( pl\_type pointlist, int n )} operation creates a point set containing pointlist elements indexed by 0 through $n-1$. The operation returns the address of the point set. This operation replaces the {\em construct\_ps} operation defined earlier. \item The {\bf insert} operation defined in the previous section is not supported by static point sets. \item The {\bf delete} and {\bf del\_item} operations are optional for static point sets. Participants may choose whether to support these operations. \item Otherwise, the static point set data type should support all operations given in the previous section. \end{itemize} \subsection{Comparing Point Set Implementations} Participants are encouraged to develop application programs that use multi-dimensional point sets and to evaluate different implementation strategies for the applications. In order to provide a standard interface for comparing implementations in different languages, we describe a ``Point Set Test Driver'' and an input file format for the driver. Applications programs should be able to generate procedure call traces in the format given here. Data structure implementations are called by drivers that read the files and perform the specified sequence of procedure calls. For the purposes of comparative testing, the info part of an item is assumed to be a string containing $k$ characters: that is {\em in\_type = strk}. Specific values of $k$ will be named later so that participants can study the effect of item record size on performance (for example participants may be asked to run separate tests with $str8$ and $str128$.) The {\em del\_item} operation requires an item reference as a parameter. For the comparative tests, items may be given {\em index} values which are integers in the range $1 \ldots$ {\em MAXNAMES}. This corresponds to the {\em name} of the point, not to its location in D-space. The driver program uses an array to convert the index values to particular item references. \paragraph{Input File Specifications} Input files for Point Set tests contain a sequence of lines, each terminated with an {\em eoln} character. Each line begins with a three-character line code, and may contain other values depending on the code. The line codes have the following meanings. \begin{description} \item {\bf psh} The point set header line is the first line in the file. Besides the three-character code, the line contains a value {\em N} giving the maximum index used in the file, and a value {\em D} giving the dimensionality of the point set. The driver is intended to be used for arbitrary dimension, specified by a compile-time parameter {\em DIM}. The input value {\em D} is compared to {\em DIM}. When $D < DIM$, the driver will pad with coordinate values of zero to produce higher-dimensional points. When $D > DIM$ only the first {\em DIM} coordinate values in the input will be used. \item {\bf com} Comment lines begin with the three-character code {\bf com} and then contain text. These lines are inserted for human readibility and are ignored by the driver. \item {\bf ins} An insert line causes an invocation of the {\em insert} function (an alternative meaning is described below for static point sets). Besides the three-character code, this line contains {\em D} integers corresponding to coordinates of the point, followed by an integer index {\em inx}. An index value of 0 is ignored by the driver and can be used for items that never appear in a subsequent {\em del\_item} operation. \item {\bf lkp} A lookup line causes an invocation of the {\em lookup} function. Besides the three-character code, this line contains {\em D} integer coordinate values. \item {\bf dlp} A delete-point line causes an invocation of the {\em delete} function. This line contains {\em D} integer coordinate values. \item {\bf dli} A delete-item line causes an invocation of the {\em del\_item} function, which takes an item reference rather than a point as a parameter. This line contains an integer index {\em inx} that names an item already inserted in the point set. \item {\bf nbr} A nearest-neighbor line causes an invocation of the {\em nearest} function. This line contains {\em D} integers specifying a point. \item {\bf rng} A range-search line causes an invocation of the {\em range\_search} function. This line contains {\em D} integers corresponding to the {\em lo} point, followed by {\em D} integers corresponding to the {\em hi} point. \end{description} \paragraph{The Driver.} A pseudo-code description of the Point Set driver program appears below. A C version of this program will soon be available from DIMACS: see Section \ref{access} for information on obtaining the file. Participants using other languages are invited to develop drivers that can read input files and generate procedure calls as specified here. DIMACS will make such drivers available as they arrive. \begin{verbatim} /*---Constants and Variables---*/ MAXNAMES: constant giving range of index values DIM: constant giving the dimensionality of the functions. N: input integer giving the max index value in the input D: input integer giving dimensionality of the input S: pointer to the point set, of type ps_ptr itemindex: array [0..MAXNAMES] of type it_type inx: integer index to itemindex array cmd: three-character command string point: a point of type pt_type, which is an array (0..DIM-1) of integers. item: an item of type it_type dummy: a dummy character string of type strk, containing k characters /*---Code---*/ pt_type function getpoint (int D, int DIM ) { Reads all D points, and pads or truncates to produce DIM points. } /*end getpoint*/ Q = construct_ps ( ); while ( not end of file ) { read ( cmd ); case cmd of { "psh" : { read( N, D ); if (N > MAXNAMES) report array bound error; exit; } "com" : { do nothing;} "ins" : { point = getpoint(D, DIM); item = insert (S, point, dummy); read (inx); if (inx != 0) itemindex[inx] = item; } "lkp" : { point = getpoint(D, DIM); item = lookup (S, point);} "dlk" : { point = getpoint(D, DIM); delete (S, point);} "dli" : { read (inx); item = itemindex[inx]; del_item (S, item); } "nbr" : { point = getpoint(D, DIM); item = nearest (S, point); } "rng" : { lo = getpoint(D, DIM); hi = getpoint(D, DIM); R = range_search (S, lo, hi); R = destruct_ps (R); } } /* end case; */ read input to next eoln; }/*while*/ S = destruct_ps ( S ); \end{verbatim} A later version of the driver will contain a data structure to record counts of key operations performed during function invocations. The driver will contain code to initialize, accumulate, and report these counts; participants will be asked to insert appropriate counting instructions in their data structure implementations. \paragraph{Test of Static Point Set Implementations.} The input file specifications for Static Point Sets are identical to those given above. The driver program is modified as follows \begin{itemize} \item The driver assumes that all {\bf ins} lines in the input file come immediately after the {\bf psh} line and before any other lines (ignoring comments). The driver first reads the {\bf ins} lines and constructs a {\em pointlist} and an {\em itemindex}. When all the {\em ins} lines are read, the driver invokes {\em construct\_ss} to initialize the point set. \item A compile-time switch determines whether the driver ignores the {\bf dlk} and {\bf dli} delete lines. \end{itemize} \section{Obtaining Files from DIMACS} \label{access} The web page {\bf http://cs.amherst.edu/\~ccm/challenge5/index.html} contains pointers to documents and software relating to the Fifth DIMACS Challenge. Files are available through the web page or by anonymous ftp from {\bf cs.amherst.edu}, in directory {\bf pub/dimacs}. For questions, register to participate, or to get on the mailing list, send email to {\bf challenge5@dimacs.rutgers.edu}. \newpage{} \section*{Appendix: LEDA Specifications} \label{leda} This Appendix reproduces excerpts from the {\em The LEDA User Manual, Variation R 3.3. beta}, by Stefan N\"aher and Christian Uhrig. The sections here give specifications for Priority Queues, Dictionaries, and Two-Dimensional Point Sets. (Because the excerpts do not modify the source text, section numbers do not match the original, and some references in the original are unresolved here.) Full LEDA documentation is available by anonymous ftp from {\bf ftp.mpi-sb.mpg.de}, in directory {\bf pub/LEDA/articles}. For more information check the WWW page {\bf http://www.mpi-sb.mpg.de/guide/staff/uhrig/leda.html} or the following reference: K. Mehlhorn and S. N\"ather. ``LEDA: A platform for combinatorial and geometric computing.'' {\em Communications of the ACM}, (38)1, January 1995. %------include file extract/p_queue.tex \manpage {p\_queue} P,I {Priority Queues} \definition An instance $Q$ of the parameterized data type $p\_queue\
$ is a collection of items (type $pq\_item$). Every item contains a priority from a linearly ordered type $P$ and an information from an arbitrary type $I$. $P$ is called the priority type of $Q$ and $I$ is called the information type of $Q$. The number of items in $Q$ is called the size of $Q$. If $Q$ has size zero it is called the empty priority queue. We use $\
$ to denote a $pq\_item$\ with priority $p$ and information $i$. \creation Q \create $p\_queue\
$ {$ $} { creates an instance $Q$ of type $p\_queue\
$ and initializes it with the
empty priority queue. }
\operations 2 5
\op $P$ {prio$(pq\_item\ it)$}
{ returns the priority of item $it$.\\
\precond $it$ is an item in $Q$. }
\op $I$ {inf$(pq\_item\ it)$}
{ returns the information of item $it$.\\
\precond $it$ is an item in $Q$. }
\op $pq\_item$ {insert$( P\ x, \ I\ i)$}
{ adds a new item $\ $ is a priority queue implemented by data type $impl$.
$impl$ must be one of the priority queue implementations listed in
section \ref{Implementations Priority Queues} or a user defined data structure
fulfilling the specification given in section \ref{User Implementations
Priority Queues}. Note that the priority type $P$ must linearly ordered.
%-------include file extract/dictionary.tex
\manpage {dictionary} K,I {Dictionaries}
\definition
An instance $D$ of the parameterized data type $dictionary\ $ to denote the
item with point $p$, and information $i$. For each point $p$ there is at most
one item $\ \in S$. Beside the normal dictionary operations, the data
type $point\_set$ provides operations for rectangular range queries and
nearest neighbor queries.
\creation S
\create $point\_set\$ {$ $}
{ creates an instance $S$ of type $point\_set\$ and initializes $S$ to
the empty set. }
\operations 2.5 5.2
\op $point$ {key$(ps\_item\ it)$}
{ returns the point of item $it$.\\
\precond $it$ is an item in $S$. }
\op $I$ {inf$(ps\_item\ it)$}
{ returns the information of item $it$.\\
\precond $it$ is an item in $S$. }
\op $ps\_item$ {insert$(point\ p, \ I\ i)$}
{ associates the information $i$ with point $p$.
If there is an item $\ $ in $S$ then $j$
is replaced by $i$, else a new item $\ $
is added to $S$. In both cases the item is
returned. }
\op $ps\_item$ {lookup$(point\ p)$}
{ returns the item with point $p$ (nil if no
such item exists in S). }
\op $ps\_item$ {nearest\_neighbor$(point\ q)$}
{ returns the item $\ \ \in\ S$ such that
the distance between $p$ and $q$ is minimal. }
\opl $list\ \ \in\ S$ with\\
$x_0 \le p$.xcoord() $\le x_1$ and\\
$y_0 \le p$.ycoord() $\le y_1$. }
\op $list\