RUSSIAN JOURNAL OF EARTH SCIENCES, VOL. 14, ES1004, doi:10.2205/2014ES000538, 2014
V. V. Naumova, A. V. Belousov
Far East Geological Institute FEBRAS, Vladivostok, Russia.
The digital repository "Geology of the Russian Far East" is being developed at the Laboratory of Information Technologies of the Far East Geological Institute, Far East Branch of the Russian Academy of Sciences, as part of an infrastructure for spatially distributed, heterogeneous scientific data on the geology of the Russian Far East. DSpace is the base software platform of the repository. The System is adapted to geology by the introduction of a thematic block of geological and geographical thesauruses.
Open access means free online access to scientific publications, with the right to read, download, copy, distribute, publish, search, link to the full texts of articles, index them, and so on, i.e. to use them for any lawful purpose without financial, legal, or technical barriers.
According to the Berlin Declaration of 2003 (Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities, http://oa.mpg.de/lang/en-uk/berlin-prozess/berliner-erklarung/), the publication of full texts for access through the Internet must satisfy two conditions:
Two main technological approaches are distinguished: open access journals and open access repositories. Both are channels of scholarly communication. Open access journals publish peer-reviewed articles, whereas repositories collect documents that are not necessarily peer-reviewed and not necessarily articles. Open access journals and repositories are not mutually exclusive; they complement each other.
Initially, standards of bibliographic description were developed for traditional publications in public libraries: GOST 7.1 – 2003, ISO 690, ISBD, and others. To handle this information on the Internet, the following standards were developed: ISO 2709, ISO 15836-2009, NISO Standard Z39.85, etc., and on their basis the storage formats for bibliographic descriptions were developed: MARC (MARC 21, UNIMARC), Dublin Core (SDC, QDC), MODS, and others. The operation of the resulting library catalogues raised the question of their combined use. For this purpose the Library of Congress developed the distributed search protocol Z39.50, which has been evolving since the 1970s; new versions were released in 1988, 1992, 1995, and 2003. It made distributed search possible independently of the end systems, database type, storage formats, etc. In subsequent years the SRW protocol, which uses the modern technologies SOAP, HTML, and XML, was developed on its basis, along with SRU, a URL-based alternative to SRW; these have a lower entry threshold than Z39.50 [Zhizhimov and Mazov, 2004].
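The lower entry threshold of SRU can be illustrated by the fact that a search request is an ordinary URL. The following minimal sketch builds such a request. The parameter names (`operation`, `version`, `query`, `maximumRecords`, `recordSchema`) follow SRU 1.2; the endpoint address, the CQL query, and the schema short name `dc` are assumptions made for illustration only, since servers differ in the indexes and schemas they accept.

```python
from urllib.parse import urlencode


def build_sru_url(base_url, cql_query, max_records=10, record_schema="dc"):
    """Return an SRU 1.2 searchRetrieve URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,                  # CQL search expression
        "maximumRecords": str(max_records),  # page size requested from the server
        "recordSchema": record_schema,       # assumed short name; server-dependent
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and query, used only for illustration.
url = build_sru_url("http://sru.example.org/catalogue", 'dc.title = "Sea of Japan"')
print(url)
```

The whole request fits into a single HTTP GET, so it can be issued by any HTTP client without a dedicated protocol library.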
With the advance of the Open Access initiative, digital repositories began to be developed for the storage and distribution of digital material of any type. To ensure their interoperability, the OAI-PMH protocol for metadata harvesting was developed.
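As an illustration of what an OAI-PMH exchange looks like, the sketch below forms a `ListRecords` request and extracts the unqualified Dublin Core fields from a response. The verb, the `oai_dc` metadata prefix, and the XML namespaces are those of OAI-PMH 2.0; the repository address and the sample record are invented for the example and do not come from any real repository.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Return the URL of a ListRecords request."""
    return base_url + "?" + urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    )


def parse_records(xml_text):
    """Return a list of {'identifier': str, 'fields': {name: [values]}} dicts."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        fields = {}
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:
            for el in dc:
                name = el.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
                fields.setdefault(name, []).append((el.text or "").strip())
        records.append({"identifier": identifier, "fields": fields})
    return records


# Invented response, for illustration only.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:123/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Illustrative title</dc:title>
          <dc:creator>Author A.</dc:creator>
          <dc:creator>Author B.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

records = parse_records(SAMPLE)
print(list_records_url("http://repository.example.org/oai/request"))
print(records)
```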
Many technological solutions based on these protocols are available for the integration of library scientific data:
Among such systems available in the world today, the following may be noted: a common point of access to the heterogeneous resources of the USGS (U.S. Geological Survey, http://www.usgs.gov/pubprod/): maps, publications, satellite images, aerial photographs, and accompanying data.
NBII Metadata Clearing House (National Biological Information Infrastructure) is an initiative of the USGS (U.S. Geological Survey) to create a distributed information system that contains metadata describing biological data and information products and is based on a subset of the CSDGM-NBII Biological Profile.
In Europe, the European Library project (http://www.theeuropeanlibrary.org/tel4) is under way; it is intended to integrate all national libraries of Europe and the leading European research libraries. Within the project, a unified metadata portal has been created that harvests metadata over the OAI-PMH protocol. Its creators developed an algorithm that, at some cost and under certain conditions, also makes it possible to harvest metadata over the Z39.50 protocol. The library provides quick and easy access to the collections of 48 national libraries of Europe and leading European research libraries. Its users can find and use more than 18,644,265 digital items and 119,246,208 bibliographic records. To facilitate further searching, links to other websites of the European group are provided.
In Russia the number of OAI-compatible scientific repositories is rather modest. They either add an OAI module to an in-house system, as, for example, in the scientific and educational social network "Socionet" (http://www.socionet.ru) [Parinov et al., 2003], or use specialized software. An example of the latter is the Electronic Library of the Siberian Branch of RAS (SBRAS) (http://db3.nsc.ru:8080/jspui/) [Zhizhimov and Mazov, 2004; Zhizhimov et al., 2011], which is built on DSpace. Its main sections are research reports, virtual and real museums, dissertations, SBRAS integration projects, materials of the programme "Telecommunication and Multimedia Resources of the SBRAS", the Library of the Siberian Branch of RAS, personalia, subject collections, technical and regulatory documentation, and staff publications. The software has been extended with a Z39.50 server. The resulting system makes it possible to obtain data both from repositories and from the catalogues of the scientific libraries of the Siberian Branch of RAS.
The digital repository "Geology of the Russian Far East" is being developed at the Laboratory of Information Technologies of the Far East Geological Institute, Far East Branch of the Russian Academy of Sciences (FEBRAS), as part of an infrastructure for spatially distributed heterogeneous scientific data on the geology of the Russian Far East [Belousov, 2013; Naumova et al., 2011]. The System is a separate block of the common infrastructure (http://www.fareastgeology.ru).
The chosen solution is based on a metadata portal. A metadata portal is a system that provides simple and clear access to distributed information resources [Schindler and Diepenbroek, 2008].
We chose DSpace as the base software because its functionality is sufficient for our purposes: a convenient catalogue system, a built-in OAI-PMH server for metadata harvesting, full-text search based on Apache Lucene or Apache Solr, access rights management with LDAP support, and the ability to manage and store digital material of any type. Its open source code and large worldwide community of users and developers should also be noted.
The functional scheme of the Portal being developed is given in Figure 1.
The Portal integrates only those scientific publications that concern the geology of the Russian Far East. We define the Far East as the territory including Amurskaya Oblast', the Jewish Autonomous Region, Kamchatsky Krai, Magadanskaya Oblast', Primorsky Krai, Sakha Republic (Yakutia), Sakhalinskaya Oblast', Khabarovsky Krai, and Chukotsky Autonomous Region. The area of the Russian Far East is 6,169,329 km$^2$, which is 36.08% of the country's total area.
The publications we need are held in the digital repositories of scientific institutes and universities; in electronic libraries, including the Scientific Electronic Library (http://elibrary.ru); in full-text scientific databases; in the catalogues of scientific libraries, including the catalogue of the Central Scientific Library of the Far East Branch of RAS; and in other resources.
Publications are collected in two modes: manual data entry and automatic integration by harvesting modules that work over different protocols and apply a subject filter.
Scientific publications are classified into the following divisions:
Collections from archives and library catalogues can be made accessible on the Portal through three communication protocols: OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting), Z39.50, or SRU (Search/Retrieve via URL).
The choice of communication protocol strongly affects the functionality that the portal can offer to the end user. Although all three protocols provide a standard means of communication between a portal and library systems, the underlying communication paradigms differ significantly. Whereas OAI-PMH allows a portal to harvest all metadata records from libraries into a central archive, Z39.50 and SRU were designed for remote search and retrieval, so the metadata records remain with the data provider.
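The harvesting paradigm of OAI-PMH can be sketched as a loop that copies every record into the central store, following the `resumptionToken` that the protocol uses to split large result sets into pages. This is a minimal sketch, not the Portal's harvesting module: the `fetch` callable stands in for the HTTP request so that the example runs offline, and the two response pages are invented.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(fetch, metadata_prefix="oai_dc"):
    """Yield record identifiers from all pages of a ListRecords harvest.

    `fetch` is any callable that takes a dict of request parameters
    and returns the XML text of the response.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")
        token = root.findtext(".//" + OAI + "resumptionToken")
        if not token:  # element absent or empty: the list is complete
            break
        # A follow-up request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}


# Invented two-page response, keyed by the token that requests each page.
PAGES = {
    None: (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        "<record><header><identifier>oai:example:1</identifier></header></record>"
        "<resumptionToken>page2</resumptionToken>"
        "</ListRecords></OAI-PMH>"
    ),
    "page2": (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        "<record><header><identifier>oai:example:2</identifier></header></record>"
        "<resumptionToken/>"
        "</ListRecords></OAI-PMH>"
    ),
}


def fake_fetch(params):
    """Offline stand-in for an HTTP GET against an OAI-PMH endpoint."""
    return PAGES[params.get("resumptionToken")]


identifiers = list(harvest(fake_fetch))
print(identifiers)
```

With Z39.50 or SRU there is no equivalent of this "give me everything, page by page" request; a client can only issue searches, which is why harvesting over those protocols needs extra conventions on the server side.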
Harvesting metadata over the Z39.50/SRU protocols involves significant difficulty: these protocols were not originally designed for metadata harvesting, so some functionality needed for an efficient and reliable harvesting process was not included in the protocol design.
Work by authors from the Library of the University of Illinois and the European Library (Guidelines for preparing a Z39.50/SRU target to enable metadata harvesting/TELplus. The European Library: http://cyberdoc.univ-lemans.fr/PUB/CfU/Journee_UNIMARC_Lyon/TELplus-D2.3_v1.0.pdf) on the feasibility of metadata harvesting over the Z39.50 protocol showed that harvesting is possible if the Z39.50 server satisfies certain conditions.
In contrast to OAI-PMH, Z39.50 servers are available for a significant number of library management systems and are widely used. Many libraries in Russia use Irbis as their library management system. The web catalogue of the Web Irbis system supports a Z39.50 server, which allows us to harvest bibliographic metadata from library catalogues. If the Z39.50 server is not configured, we use the Web Irbis export function.
Thus, the Portal harvests bibliographic descriptions from other repositories (over the OAI-PMH protocol) and from library catalogues (over the Z39.50 protocol) [Kaczmarek and Naun, 2005], or uses the export function of the library management system. The Portal also implements metadata harvesting from full-text scientific databases, such as the Scientific Electronic Library and Science Direct.
The relevance of the harvested data is ensured by filtering: the metadata records are searched, using morphological search (stemming), for the terms of the chosen thesaurus. The metadata received are added to the database in the Dublin Core format.
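The subject filter can be sketched as follows. The source does not describe the morphological analyser used by the Portal, so a deliberately crude suffix-stripping stemmer for English stands in for it here; the function names and the sample records are hypothetical, and the two thesaurus terms are taken from the thesaurus fragments quoted in this paper.

```python
import re

# Crude English suffix list, longer suffixes first (an assumption, not the Portal's stemmer).
SUFFIXES = ("ing", "es", "ed", "s", "e")


def stem(word):
    """Lower-case a word and strip one common suffix, keeping at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems(text):
    """Split text into words and stem each of them."""
    return [stem(w) for w in re.findall(r"\w+", text)]


def contains_phrase(haystack, needle):
    """True if the list `needle` occurs as a contiguous run inside `haystack`."""
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def is_relevant(record, thesaurus):
    """True if any thesaurus term occurs, after stemming, in any metadata field."""
    text = " ".join(value for values in record.values() for value in values)
    text_stems = stems(text)
    return any(contains_phrase(text_stems, stems(term)) for term in thesaurus)


thesaurus = ["Sea of Japan", "Yaurinskaya suite"]

# Hypothetical harvested records in a simple Dublin Core-like layout.
records = [
    {"title": ["Sediments of the Sea of Japan"], "subject": ["marine geology"]},
    {"title": ["Yaurinskaya suites revisited"]},
    {"title": ["Volcanoes of Iceland"]},
]

kept = [r["title"][0] for r in records if is_relevant(r, thesaurus)]
print(kept)
```

Stemming lets the plural "suites" in the second record match the thesaurus term "Yaurinskaya suite", while the third record is rejected because none of its fields contains a thesaurus term.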
The System is adapted to the geology of the Russian Far East by incorporating several subject thesauruses.
Thesauruses perform the following functions [Kubik, 2011]:
We have constructed the following thesauruses: "Geographic Unit of the Russian Far East", "Geologic Unit of the Russian Far East" [Dal'nauka, 2006a, 2006b] and "Geologic Time Scale" (Geologic Time Scale. http://geology.com/time.htm). The thesauruses are fixed lists of parameter values. Some fragments of the Portal thesauruses are given below.
A fragment of thesaurus "Geologic Unit of the Russian Far East":
. . .
862. Yaurinskaya suite.
A fragment of thesaurus "Geographic Unit of the Russian Far East":
. . .
296. Sea of Japan.
A fragment of thesaurus "Geologic Time Scale":
. . .
At present, the Repository contains the following collections and communities:
The System stores the following objects: Metadata (Table 1), Collections, Communities, Files, Sources, Users, and service information (Figure 2). The PostgreSQL relational database is used for storage. Metadata are stored in the Qualified Dublin Core format.
A sample of metadata records is given in Figure 3.
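For readers unfamiliar with Qualified Dublin Core, the following minimal sketch shows how a qualified field name is composed of a schema, an element, and an optional qualifier, and how qualified values reduce to simple Dublin Core when the qualifiers are dropped. The class and function names are hypothetical and do not reproduce the DSpace database schema; the field names `dc.title`, `dc.contributor.author`, and `dc.date.issued` are common Qualified Dublin Core fields, and the values are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetadataValue:
    """One metadata statement: schema.element[.qualifier] = value."""
    element: str
    value: str
    qualifier: Optional[str] = None
    schema: str = "dc"

    @property
    def field(self):
        """Dotted field name, e.g. 'dc.contributor.author'."""
        parts = [self.schema, self.element]
        if self.qualifier:
            parts.append(self.qualifier)
        return ".".join(parts)


def to_simple_dc(values):
    """Collapse qualified values to simple Dublin Core by dropping qualifiers."""
    simple = {}
    for v in values:
        simple.setdefault(v.element, []).append(v.value)
    return simple


# Invented record, for illustration only.
record = [
    MetadataValue("title", "Illustrative title"),
    MetadataValue("contributor", "Author A.", qualifier="author"),
    MetadataValue("date", "2014", qualifier="issued"),
]

print([v.field for v in record])
print(to_simple_dc(record))
```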
The Repository uses the Jakarta Lucene search engine, which supports full-text search, stop words, word truncation, morphological search, phrase search, and other functions. Advanced search makes it possible to specify which document fields take part in the search; the conditions can be combined by the logical operators "AND", "OR", and "NOT" (Figure 4). The search scope can be restricted to a community or a collection.
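A fielded Boolean search of this kind can be written in the classic Lucene query syntax, where `field:term` restricts a term to a field and quoted text is a phrase. The sketch below only assembles such a query string; the helper functions, the field names (`title`, `keyword`, `author`), and the search values are hypothetical and are not taken from the Repository's configuration.

```python
def clause(field, text):
    """Format one fielded clause; multi-word values become quoted phrases."""
    text = text.replace('"', '\\"')  # escape embedded double quotes
    if " " in text:
        text = f'"{text}"'
    return f"{field}:{text}"


def build_query(first, *rest):
    """Join clauses with Boolean operators.

    `first` is a (field, text) pair; each item of `rest` is an
    (operator, field, text) triple with operator AND, OR, or NOT.
    """
    parts = [clause(*first)]
    for operator, field, text in rest:
        if operator not in ("AND", "OR", "NOT"):
            raise ValueError(f"unsupported operator: {operator}")
        parts.append(operator)
        parts.append(clause(field, text))
    return " ".join(parts)


query = build_query(
    ("title", "Sea of Japan"),
    ("AND", "keyword", "basalt"),
    ("NOT", "author", "Ivanov"),
)
print(query)
```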
A user registered on the Portal is able to:
The repository "Geology of the Russian Far East" (http://repository.fareastgeology.ru) is a single point of open access to spatially distributed scientific publications on the geology of the Russian Far East.
Belousov, A. V. (2013), Digital repository of the Far East Geological Institute, FEB of RAS: open access to scientific data on geology of the Far East of Russia, Internet and Modern Society: Collection of abstracts. Proceedings of the XVI All-Russia United Conference "Internet and Modern Society" (IMS-2013), Sankt-Petersburg, October 9–11, 2013, NIU ITMO, Sankt-Petersburg.
Dal'nauka (2006a), Geodynamics, Magmatism, and Metallogeny of East Russia. Book 1, 1–572, Vladivostok.
Dal'nauka (2006b), Geodynamics, magmatism, and metallogeny of East Russia. Book 2, 573–981, Vladivostok.
Kaczmarek, J., and C. C. Naun (2005), A statewide service using OAI, Library Hi Tech, 23, 576–586, doi:10.1108/07378830510636355.
Kubik, T. (2011), Role of thesauri in the information management in the web-based services and systems, Lecture Notes in Computer Science, 6560, 25–49, doi:10.1007/978-3-642-19968-4_2.
Naumova, V. V., I. N. Goryachev, K. A. Platonov (2011), Web-integration of heterogeneous scientific data and services on geology of the Far East of Russia on the basis of portal decision, Geoinformatics, 1, 56–62.
Parinov, S. I., V. M. Lyapunov, R. L. Pusyrev (2003), System Socionet as a platform for development of scientific information resources and online services, Electron Libraries, 5, 1.
Schindler, U., and M. Diepenbroek (2008), Generic XML-based framework for metadata portals, Computers & Geosciences, 34, 12, 1947–1955, doi:10.1016/j.cageo.2008.02.023.
Zhizhimov, O. L., and N. A. Mazov (2004), Principle of construction of the distributed information systems on the basis of Protocol Z39.50, 361 OIGGM, SB of RAS, Novosibirsk.
Zhizhimov, O. L., Yu. I. Molorodov, I. A. Pestunov, V. V. Smirnov, A. M. Fedotov (2011), Integration of heterogeneous data in solving the tasks of investigation of natural ecosystems, Bulletin of Novosibirsk State University, 9, 3, 67–74.
Received 18 June 2014; accepted 19 June 2014; published 28 June 2014.
Citation: Naumova V. V., A. V. Belousov (2014), Digital repository "Geology of the Russian Far East" – an open access to the spatially distributed online scientific publications, Russ. J. Earth Sci., 14, ES1004, doi:10.2205/2014ES000538.
Copyright 2014 by the Geophysical Center RAS.