Alo is an abbreviation for "Austrian literature online". Alo describes a group of Austrian university organizations dedicated to digitization and digital preservation. A goal of Alo is to develop a generic solution to store any kind of literature or other cultural media digitally and present it online. The focus is on digitized books and periodicals, but postcards and library record cards will also be integrated. Because of the generic architecture, other kinds of documents can easily be added. The tool developed to achieve this is called ALOx.
· De-centralized metadata and document upload: digital documents of any kind can be uploaded from any computer connected to the internet to a centralized server. ALOx follows the philosophy "central storage - distributed generation of content".
· Applied standards: The digital documents are stored following METS (“Metadata Encoding and Transmission Standard”, defined by the Library of Congress; for further information, see http://www.loc.gov/standards/mets). METS is an XML standard. Future releases will use Sun Microsystems’ Enterprise JavaBeans to take another step towards standardization (EJBs represent a standard component model for business applications).
· Searching, browsing, hierarchical structure: documents in an ALOx system are logically structured with collections. This offers the user the possibility of browsing through the documents very easily. Search functionality is also implemented.
· Advantages compared to "usual" libraries: The availability of literature is not limited to particular locations or business hours. Additionally, optical character recognition (OCR) enables blind people to read these documents, and full-text search on documents offers more comfort to library users. Even the online availability of literature on handhelds could be implemented with little effort.
To take a look at a working ALOx system, just follow the link www.literature.at. The core classes behind this site are included in this release.
ALOx is split up into 5 main components:
Additionally, several converters are available, such as Tiff2Gif, Tiff2Jpeg and Pdf2Html. These are activated automatically when the parameter “convert”, with the target type as its value, is added to the filestore request, e.g.:
http://alo.uibk.ac.at:8180/filestore/servlet/GetFile?id=XX&convert=image/gif.
The corresponding converter is called as specified in the web.xml-file of the filestore-package. Other converters can easily be added: Just implement the interface org.alo.filestore.converter.Converter and add a context-param in web.xml like this:
<context-param>
<param-name>image/tiff;image/png</param-name>
<param-value>Tiff2PngConverter</param-value>
</context-param>
(The param-name is "source mimetype;destination mimetype", and the param-value the name of the converter).
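The lookup implied by these keys can be sketched as follows. This is only an illustration under assumptions: the class ConverterRegistry and its methods are hypothetical and not part of ALOx, and the actual signature of org.alo.filestore.converter.Converter is not shown in this document, so it is not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of ALOx: mimics a lookup based on the
// "source mimetype;destination mimetype" keys used in web.xml.
class ConverterRegistry {
    private final Map<String, String> converters = new HashMap<>();

    // Registers a converter name under a key of the form "source;destination".
    void register(String paramName, String converterName) {
        String[] parts = paramName.split(";");
        if (parts.length != 2) {
            throw new IllegalArgumentException(
                    "expected 'source;destination', got: " + paramName);
        }
        converters.put(key(parts[0], parts[1]), converterName);
    }

    // Returns the converter name for the given pair, or null if none is registered.
    String lookup(String sourceMime, String destMime) {
        return converters.get(key(sourceMime, destMime));
    }

    // Normalizes both mimetypes so that lookups ignore case and stray spaces.
    private static String key(String source, String dest) {
        return source.trim().toLowerCase() + ";" + dest.trim().toLowerCase();
    }
}
```

With the entry from the example above registered, lookup("image/tiff", "image/png") would return "Tiff2PngConverter", while an unregistered pair returns null.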
To define the location of the filestore, change the context-parameter in web.xml to the desired URL, e.g.:
<context-param>
<param-name>get_url</param-name>
<param-value>http://localhost:8080/filestore/servlet/GetFile?id=</param-value>
</context-param>
The path of the files will be written to the database (in table flocat) when they are uploaded, and read when a file with the specified URL is requested.
Most converters are written in Java using Sun’s JAI (Java Advanced Imaging) class library. When developing these transformations, we made sure that no X server has to be started on Linux machines when running the filestore.
Note that it is not necessary to run the filestore on the same server as the rest of the system (that is, the mets-server and the webinterface).
Additional features
The filestore-package also includes a package specific to newspapers, which is still under development. It has two main functions:
· Colour parts of a picture (e.g. a headline of an article on a newspaper page). The parts to be coloured are specified by polygons.
· Cut out one part of a picture (e.g. cut out an article of a newspaper page). This is implemented by clipping each text block, picture, headline, subtitle, etc. of an article. A picture that contains only the desired part is returned.
In addition, a backup of the mets-file that was written to the database is copied to the filestore to avoid loss of data (e.g. if database content gets lost, the tables can be recreated from these files).
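The two operations can be sketched with the standard java.awt classes. This is a minimal illustration and not the code of the newspaper package: the class RegionTools and its method names are hypothetical, and the real package may work differently (for example with JAI).

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Polygon;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Hypothetical helper, not the ALOx newspaper package.
class RegionTools {
    // Returns a copy of the page with the polygon filled in the given colour.
    static BufferedImage highlight(BufferedImage page, Polygon region, Color colour) {
        BufferedImage copy = new BufferedImage(
                page.getWidth(), page.getHeight(), BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = copy.createGraphics();
        try {
            g.drawImage(page, 0, 0, null);
            g.setColor(colour);
            g.fillPolygon(region);
        } finally {
            g.dispose();
        }
        return copy;
    }

    // Returns a new image the size of the polygon's bounding box; pixels
    // outside the polygon stay transparent.
    static BufferedImage cutOut(BufferedImage page, Polygon region) {
        Rectangle box = region.getBounds().intersection(
                new Rectangle(0, 0, page.getWidth(), page.getHeight()));
        if (box.isEmpty()) {
            throw new IllegalArgumentException("region lies outside the page");
        }
        BufferedImage out = new BufferedImage(
                box.width, box.height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = out.createGraphics();
        try {
            g.translate(-box.x, -box.y); // move the region to the image origin
            g.setClip(region);           // draw only inside the polygon
            g.drawImage(page, 0, 0, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

Drawing into a BufferedImage like this does not need a display. Whether ALOx relies on the standard java.awt.headless system property to run without an X server is not stated here.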
The mets-object-package implements writing mets-objects to the database (that is, mapping the hierarchical structure of the XML file to a relational one) and, vice versa, reading from the database into a mets-object. The version currently used is mets 1.1 (see http://www.loc.gov/standards/mets for the current mets-schema, documentation and some example documents). These classes are used by the mets-server- and the webinterface-package.
Each element of the mets-structure is implemented by one class. Each class handles its own reading from and writing to the database; this happens hierarchically, so writing is delegated to the sub-elements. The mets-id is stored for every mets-element to ease deleting an object from the database (all parts of one object can be deleted in almost one step). The tables are connected via ids to map the hierarchical structure of an XML document to the database.
Most of the elements of the mets-files are stored in a relational structure. It is also possible to store mets-elements as a blob, which means pure XML (as text in this case, implemented by class MdDefault), or as name/value tuples (e.g. for Dublin Core entries; for further information see http://dublincore.org).
The toNode() method returns a mets DOM element. Such an object can be created with a database connection and a unique object id as parameters (the information is then retrieved from the database), or via copy constructor. The mets versions 1.0zeta and 1.1 are available in this distribution.
JUnit was used to test if a mets-document contains the same content before writing to and after reading from the database. That’s why each class contains an equals-method, which just compares the attributes (which can also be mets-elements) of the class.
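The round trip behind this test can be sketched without a database. This is an illustration under assumptions: the class Element is a hypothetical stand-in for a mets element class, and an in-memory list of rows replaces the relational tables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a mets element class; the real ALOx classes
// persist to a relational database, this sketch uses an in-memory list.
class Element {
    final String name;
    final String value;
    final List<Element> children = new ArrayList<>();

    Element(String name, String value) {
        this.name = name;
        this.value = value;
    }

    Element add(Element child) {
        children.add(child);
        return this;
    }

    // Writes this element as one row, then delegates to the sub-elements.
    // Every row carries the mets-id, so an object can be found by that id alone.
    int write(List<Object[]> table, int metsId, int parentRow) {
        int row = table.size();
        table.add(new Object[] {metsId, parentRow, name, value});
        for (Element child : children) {
            child.write(table, metsId, row);
        }
        return row;
    }

    // Rebuilds the element stored in the given row, including its sub-elements.
    static Element read(List<Object[]> table, int row) {
        Object[] r = table.get(row);
        Element e = new Element((String) r[2], (String) r[3]);
        for (int i = 0; i < table.size(); i++) {
            if ((Integer) table.get(i)[1] == row) {
                e.add(read(table, i));
            }
        }
        return e;
    }

    // Compares the attributes, which may themselves be elements.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Element)) {
            return false;
        }
        Element other = (Element) o;
        return Objects.equals(name, other.name)
                && Objects.equals(value, other.value)
                && children.equals(other.children);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, value, children);
    }
}
```

A test then writes a document with write(table, metsId, -1), reads it back with read(table, row) and checks the two objects with equals.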
The mets-server-package is the core element of the ALOx system. In the metsserver/bin-directory, there are batch files for both UNIX and Windows machines to start the mets-server. Start the mets-server by typing metsserver start, and stop it by typing metsserver stop (make sure your JAVA_HOME environment variable is set properly). This application server has to be running to upload content and to view the content in the web browser.
Another batch-file, which offers more functionality, is console. The following tasks can be performed with this program:
· import object
· delete object
· rebuild search info
· rebuild previews
· check / rebuild previews
The following types of classes are integrated in the mets-server-package:
· Collections (base class: MetsCollection): these classes provide methods that return a list of the objects and collections contained in a certain collection, or that add and remove objects or collections, which are specified by their mets-id (and by the database where the items are stored). Each implemented collection must have a collection class that inherits from class MetsCollection.
· Database (interface: DataBase): here the access methods of the database, such as getting or deleting a mets-object (identified by its unique mets-id), are defined and implemented. For each database used, an implementation of this interface has to be written. In this distribution, only one implementation is included, but it is also possible to use different databases for different types of mets-objects in one system.
A class IdDatabase is used to generate a unique id for each mets-document. The IdDatabase-class makes sure that all ids for mets-objects are unique. For this, a database "id" with table idtable is used, which stores the highest id given to a mets-object in the system.
· Preview (base class: AbstractPreview): for every type of document which is added, an entry in the preview-database is made, which stores the information necessary to display a preview of a document (such as the path to the thumbnail, an image which stands for the type of document, name, creator, etc.). The preview of an object is also uniquely identified by the mets-id. Each preview-class has to be registered in the configuration-file conf/MetsServer.properties and each class has to be combined with the document-type (e.g. ALO-BOOK_V01). Note that it is not possible to combine several object types with one and the same preview.
· Search: these classes are necessary to enable searching (simple search and advanced search) in the documents. The searchinfo-classes write metadata information to the search-database when a new document is added. Note that for each database which is used an implementation of interface SearchInfo must be created. Each searchinfo-class has to be registered in the configuration-file conf/MetsServer.properties and each class has to be combined with the document-type (e.g. METAe_Monograph).
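The "highest id" scheme that IdDatabase uses can be sketched as follows. The class IdGenerator is hypothetical and keeps the counter in memory; the real IdDatabase stores the value in the table idtable, and its API is not shown in this document.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for the idtable; the real IdDatabase
// keeps the highest id in a database table.
class IdGenerator {
    private final AtomicLong highest;

    // Starts from the highest id that has been handed out so far.
    IdGenerator(long highestSoFar) {
        highest = new AtomicLong(highestSoFar);
    }

    // Returns the next unused id and records it as the new highest id.
    // The atomic increment keeps ids unique across concurrent callers.
    long nextId() {
        return highest.incrementAndGet();
    }

    long highestId() {
        return highest.get();
    }
}
```

Because an id is only ever derived from the stored maximum, deleting a mets-object never causes its id to be handed out again.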
The following types are available in this distribution:
· Collections
· Objects
Additional features
A caching functionality is implemented to enhance the performance of the ALOx System. It generates an XML-file with the content generated by the Servlets and stores it in a specified directory. When the same data (regarding object-id, language, order, style, etc.) is requested again, the content of this cached file is used instead of being generated anew. But if the requested information has changed since the cached file was generated (e.g. when new documents were uploaded into the requested collection), the data is generated anew, and collecting data from the database becomes necessary. Caching can be turned on and off by setting the following context parameter in the webinterface’s web.xml to true or false:
<context-param>
<param-name>do_cache</param-name>
<param-value>false</param-value>
</context-param>
Caching has to be integrated into the filters of the collections (caching is only implemented for collection-types, but this can easily be changed for e.g. frequently requested documents or object types). For further implementation details, take a look at the following file: org.alo.webinterface.filters.CollectionV01Filter.java
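The caching rule described above (reuse the stored XML unless the underlying data has changed) can be sketched like this. It is not the code of CollectionV01Filter: the class XmlCache is hypothetical, it keeps entries in a map instead of files, and the way the key is built from object-id, language, order and style is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch; ALOx stores cached XML as files, this uses a map.
class XmlCache {
    private static final class Entry {
        final String xml;
        final long createdAt;

        Entry(String xml, long createdAt) {
            this.xml = xml;
            this.createdAt = createdAt;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final boolean enabled; // mirrors the do_cache context parameter

    XmlCache(boolean enabled) {
        this.enabled = enabled;
    }

    // Returns the cached XML unless caching is off, nothing is cached yet,
    // or the source data changed after the cached entry was created.
    String get(String objectId, String language, String order, String style,
               long lastModified, long now, Supplier<String> generator) {
        if (!enabled) {
            return generator.get();
        }
        String key = objectId + "|" + language + "|" + order + "|" + style;
        Entry e = entries.get(key);
        if (e == null || e.createdAt < lastModified) {
            e = new Entry(generator.get(), now);
            entries.put(key, e);
        }
        return e.xml;
    }
}
```

The generator stands for the expensive step of collecting data from the database; it only runs on a cache miss or when the cached entry is older than the last modification.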
Webinterface
The webinterface is the front-end of the whole ALOx system, which is based on the following technologies:
· Servlet Chains: for each mets-object-type, a Filter is written, which is called when a matching mets-object is requested for display. Information about the object to be displayed is then created dynamically as an XML structure. Additional information, such as the number of entries in a collection, is also added. The Filter also specifies which XML and which XSL file should be used for generating the HTML reply. Additionally, a few parameters can be sent to the Servlet to specify the result (e.g. size, page number, article number, view type, etc.).
· XSL/XML: to generate the desired result, a stylesheet transformation is applied to the dynamically created XML content. For every possible view (which mostly depends on the type of the requested mets-object), an XSL-file exists (which can be found in directory web/xsl/alo). XML-files also exist for both English and German (which can be found in the directories web/xml/alo/de and web/xml/alo/en). Other languages can easily be added. For general display settings, a cascading stylesheet is provided (web/css/alo/alo-devel.css). All images used can be found in directory web/images.
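The transformation step can be sketched with the standard javax.xml.transform API. The class XslRenderer is hypothetical; in ALOx the XML comes from the Filters and the stylesheets from web/xsl/alo, while this sketch takes both as strings to stay self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of the webinterface-package.
class XslRenderer {
    // Applies the stylesheet to the XML content and returns the result as text.
    static String render(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so that callers of this sketch need no checked exception.
            throw new IllegalStateException("stylesheet transformation failed", e);
        }
    }
}
```

Given XML such as <collection><title>Root Collection</title></collection> and a stylesheet whose template for /collection emits an h1 with the title, render returns the corresponding HTML fragment.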
Rpcrouter
The rpcRouter-package is used to perform remote procedure calls which are delegated to the mets-server. With this technology, one can call the mets-server methods (like adding or deleting objects) indirectly over the web. The format of a request is defined in the schema /schemas/Request.xsd, and the format of the reply in the schema /schemas/Reply.xsd. Even information about the mets-objects stored in the database of the mets-server can be retrieved with this technology, which is something like a web service.
Don’t forget to define the location of the mets-server service in web.xml of the rpcRouter-package:
<context-param>
<param-name>service_name</param-name>
<param-value>//138.47.230.212/MetsServer</param-value>
</context-param>
The other classes implemented in the rpcRouter-package (those that extend the class Value) are just wrappers, which store some kind of information (e.g. a Boolean value).
Utils
One additional package (org.alo.utils.*) contains further functionality, for example for XML processing, and some helper functions for database access.
A client is also included in this distribution, which enables the upload of data and documents to the server from any (Windows) computer connected to the internet. Because all uploaded documents will be published for free, no data security issues need to be taken into account. This client is written in Delphi, so its usage is limited to Windows systems. The source code of the client is also accessible via CVS. The application is self-explanatory, so no further description is given here. To connect to the database and to the application server, two URLs have to be entered:
· Server URL (e.g. http://www.literature.at:8080/rpcrouter/servlet/RpcRouter)
· Filestore URL (http://www.literature.at:8080/filestore/servlet/PutFile)
It is recommended not to change the start collection settings.
To install the ALOx server and all the components which are necessary to run the ALOx System as expected, the following steps are necessary:
· Install Tomcat. ALOx has been tested with Tomcat Version 4.0.6 and 4.1.
· Install mySQL. Oracle support is in progress.
· Copy the following webarchives (.war-files) to the webapps-directory of Tomcat:
o webinterface.war
o rpcRouter.war
o filestore.war
· Configure the mets-server (all configuration files can be found in the mets-server package in directory /conf):
o set jdbc-driver, dburl, user, password in the databasename.properties (here: alo1.properties and alo2.properties) file. Same for IdDatabase, PreviewDatabase, SearchDatabase.
o Configure databases, previews, searchinfos and collection types in file /conf/metsserver.properties (it is recommended to leave the settings as they are for the first time).
· Create all necessary databases and tables with the SQL create statements, which can be found in the directory metsserver/src/database. The version given in the filenames is the version of the mets-structure (for more information, see: www.loc.gov/standards/METS). The installation of the databases here is just an example. The names and the mets-version to be used can be defined in the file metsserver/conf/metsserver.conf. Links to other configuration files can be made there, in which, as already mentioned, more detailed information about the databases used can be defined (e.g. database passwords). When a configuration file is changed, the mets-server has to be restarted so that the changes can take effect.
The ALOx System expects a root collection entry in the database which stores the collections:
INSERT INTO dc VALUES (1,'title',1,'Root Collection',1);
INSERT INTO dc VALUES (2,'description',1,'This is the root collection of the alo system.',1);
INSERT INTO dc VALUES (3,'publisher',1,'alo partners',1);
INSERT INTO dc VALUES (4,'date',1,'2002-01-17 01:22:46',1);
INSERT INTO dc VALUES (5,'language',1,'GE',1);
INSERT INTO div VALUES (1,0,1,NULL,1,1,'','','divRoot','COLLECTION_V01');
INSERT INTO div_dmd VALUES (1,1,1,1);
INSERT INTO dmd VALUES (1,'rootDMD',1,'dmdSec',1);
INSERT INTO filegrp VALUES (1,1,'',1,NULL,'0000-00-00 00:00:00');
INSERT INTO mdwrap VALUES (1,'','','DC','','',1);
INSERT INTO mets VALUES (1,'ROOTCOLLECTION','','root collection','COLLECTION_V01','2002-01-17 00:00:00','2002-01-17 00:00:00','');
INSERT INTO structmap VALUES(1,1,1,NULL,NULL,NULL);
· Set the context parameter get_url in the filestore's web.xml as described above
· Set the context parameter service_name in the rpcRouter's web.xml as described above
To ease this process, two installation files are located in the mets-server root directory (named "install-db.sh" for Linux and "install-db.bat" for Windows machines). When these batch files are run, an example collection and three example books are also uploaded into the system.
To run the ALOx system, follow these steps:
· Start the mets-server: go to the METS_HOME/bin directory and run the mets-server with the parameter start (make sure that the Java Development Kit is installed and JAVA_HOME points to your JDK installation). If the server started successfully, the message MetsServerImpl up and working appears in the console window.
· Start Tomcat
· To make sure that the ALOx system is running properly, visit http://localhost:8080/webinterface in your web browser. The following text should come up: ENTER here: ALO Library. The root collection is located at: http://localhost:8080/webinterface/library.
· Use the Client to fill the database with content
· Check if all steps have been performed properly (if no errors occurred). For further information, take a look at the log-files (either the files in directory Tomcat %CATALINA_HOME%\4.0\logs or the files defined by log4j).
· Make sure the %JAVA_HOME% system variable and the CLASSPATH are set properly.
· When modifying a web.xml-file, Tomcat has to be restarted.
· Make sure that all jars used have the same version. If not, an exception occurs.
To add another object-type, perform the following steps:
· Implement a preview in the mets-server-package. Register the type for preview in the file metsserver/conf/metsserver.conf and assign a Preview-class to this type. Collection types also have to be registered as collection types.
· Implement a Filter in the webinterface-package. Register this Servlet in the web.xml of the webinterface.
To implement another mets-schema (e.g. another version of the mets-schema) the following steps are necessary:
· Implement the Database Interface
· Implement classes for reading and writing the new Schema to the database in the mets-object-package.
· Register the database in the metsserver/conf/metsserver.conf-file.
Of course, it would be possible to add a schema other than a mets-schema (e.g. a self-defined one), but this is not recommended.
All features of this distribution work on Linux machines. On Windows machines, there is the restriction that the filestore does not support tif2gif and pdf2html conversion, because external batch programs are called which only work on Unix machines.
As database, mySQL 3.2 or higher is used. Support for storing the mets-objects in Oracle will be implemented soon; only a few modifications are necessary for this.
JDK 1.3 or higher should be used. The source code of this distribution was only tested on JDK 1.4.
Most of the external packages used are from the Apache Jakarta Project, such as Tomcat, Xerces, Xalan, Ant and log4j. For additional information about these packages, see: http://jakarta.apache.org
As testing framework, JUnit (www.junit.org) is used.
Feel free to participate in the ALOx project:
An anonymous login for the CVS-System is set up: to log in, enter …. .
The standard coding conventions for the Java Programming language are used (with minor modifications such as the position of the braces), which can be found under http://java.sun.com/docs/codeconv/html/CodeConvTOC.doc.html.
<context-param>
<param-name>log_configurefile_name</param-name>
<param-value>conf/filestore.log4j</param-value>
</context-param>
Please report any comments or observed unexpected behaviour to albert.greinoecker@uibk.ac.at or a.egger@uni-graz.at