Integration of Orthoptera collection data within a Virtual Museum: the German Orthoptera Collections Database
K. Riede (1), S. Ingrisch (1) and C. Dietrich (2)
(1) Zoologisches Forschungsinstitut and Museum Alexander Koenig (ZFMK), Adenauerallee 150-164, D-53113 Bonn, Germany
(2) Department of Neural Information Processing, University Ulm, D-89069 Ulm, Germany
The wealth of information contained in museum collections can only be tapped by digitising collection data and making them available online. The Global Biodiversity Information Facility (GBIF) has been established to provide an interoperable network of biodiversity databases and the necessary information technology tools. The German Ministry of Science and Education is funding the EDIS project (Entomological Data and Information System) to digitise and harmonise the rich but scattered entomological collections housed at various German institutions. The core of the Orthoptera subproject is a specimen-based database of important Orthoptera collections in Germany, accessible through an internet-based user interface (Virtual Museum; German Orthoptera Collections Database, DORSA; see Poster 2).
Key words:
biodiversity databases, geographical information systems, bio-acoustics, automated song classification.
Sample data sets from the respective tables are shown in Fig. 2.
The tables can be linked via their IDs into one large table containing all the information. The relational data model saves storage space and time by distributing repetitive information across separate tables.
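The linking of tables by IDs can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The schema below is an assumption, not the DORSA schema: the table and column names (`specimen`, `determination`, `collection`, `locality`, `determiner`) and all sample values are hypothetical placeholders. It shows that specimen data are stored once, even when a specimen has been determined twice, and that a join on `specimen_id` reassembles the "one large table".

```python
import sqlite3

# Hypothetical, simplified schema: one row per specimen, one row per
# determination. Column names and sample values are placeholders.
SCHEMA = """
CREATE TABLE specimen (
    specimen_id INTEGER PRIMARY KEY,
    collection  TEXT NOT NULL,
    locality    TEXT,
    sex         TEXT
);
CREATE TABLE determination (
    determination_id INTEGER PRIMARY KEY,
    specimen_id      INTEGER NOT NULL REFERENCES specimen(specimen_id),
    taxon            TEXT NOT NULL,
    determiner       TEXT
);
"""

def build_demo_db() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    # Specimen data are stored once, however often the specimen is re-determined.
    con.executemany("INSERT INTO specimen VALUES (?, ?, ?, ?)", [
        (1, "ZFMK", "Locality A", "male"),
        (2, "ZFMK", "Locality A", "female"),
    ])
    con.executemany("INSERT INTO determination VALUES (?, ?, ?, ?)", [
        (1, 1, "Galidacris sp.", "Determiner X"),
        (2, 2, "Galidacris sp.", "Determiner X"),
        (3, 1, "Galidacris sp. A", "Determiner Y"),  # hypothetical re-determination
    ])
    return con

def joined_rows(con: sqlite3.Connection):
    """Reassemble the 'one large table' by joining on specimen_id."""
    return con.execute("""
        SELECT s.specimen_id, s.collection, s.locality, d.taxon, d.determiner
        FROM specimen AS s
        JOIN determination AS d ON d.specimen_id = s.specimen_id
        ORDER BY s.specimen_id, d.determination_id
    """).fetchall()

for row in joined_rows(build_demo_db()):
    print(row)
```

The repeated collection and locality values appear only in the query result; in storage they occur once per specimen.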
Orthopterists are in the privileged position of having the Orthoptera Species File (OSF: Otte and Naskrecki 1997) as a taxonomic backbone; it is among the few global species registers already available on the World Wide Web.
Fig. 2: Sample tables (simplified) for individuals of Galidacris sp. (Specimen table; Determination table).
Storing Geo-Information
For all specimens with reliable locality information, collection sites will be geo-referenced by latitude/longitude co-ordinates, which can be mapped by any geographical information system (GIS) and intersected with environmental data. A first prototype of a Java-based graphical user interface is available online; it allows geographic queries, retrieval and mapping of species data.
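The simplest geographic query is a bounding box: return all geo-referenced records whose co-ordinates fall inside a rectangle. The following is a minimal Python sketch of that idea, not the prototype's implementation. It assumes decimal-degree co-ordinates; the `Record` fields and sample records are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

@dataclass(frozen=True)
class Record:
    specimen_id: int
    species: str
    lat: float  # decimal degrees, -90..90 (assumed)
    lon: float  # decimal degrees, -180..180 (assumed)

def in_bbox(r: Record, south: float, west: float, north: float, east: float) -> bool:
    """True if the record lies inside the box; handles boxes crossing 180 degrees longitude."""
    if not south <= r.lat <= north:
        return False
    if west <= east:
        return west <= r.lon <= east
    return r.lon >= west or r.lon <= east  # box wraps around the antimeridian

def bbox_query(records: Iterable[Record], south: float, west: float,
               north: float, east: float, species: Optional[str] = None) -> List[Record]:
    """Return records inside the box, optionally restricted to one species."""
    return [r for r in records
            if in_bbox(r, south, west, north, east)
            and (species is None or r.species == species)]

# Hypothetical records with placeholder co-ordinates.
records = [
    Record(1, "Species A", 1.5, 110.2),
    Record(2, "Species B", 4.0, 114.0),
    Record(3, "Species A", -3.0, 179.5),
]
print([r.specimen_id for r in bbox_query(records, 0.0, 109.0, 3.0, 112.0)])  # [1]
```

A real GIS would use spatial indexes and polygon queries; the linear scan here only shows the logic.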
Georeferencing - the geographic bottleneck
Computer-aided visualisation of locality data requires co-ordinates (latitude and longitude). Assigning co-ordinates to localities as written on specimen labels is called geo-referencing.
Today, collectors already geo-reference their localities reliably with the help of GPS. But if we want to tap the rich geographic information stored on specimen labels, we have to look up co-ordinates manually, using atlases or gazetteers (online: Alexandria Project).
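Co-ordinates found in atlases, gazetteers or on labels are often written in degrees, minutes and seconds, while GIS software usually works with signed decimal degrees. A minimal Python sketch of the conversion follows. The accepted label notation (for example `4°30'S` or `12°15'36"E`) is an assumption; real labels vary far more and would need a more tolerant parser.

```python
import re
from typing import Optional

# Assumed notation: degrees, optional minutes, optional seconds, hemisphere letter.
DMS_PATTERN = re.compile(
    r"(\d{1,3})°\s*(?:(\d{1,2})'\s*)?(?:(\d{1,2}(?:\.\d+)?)\"\s*)?([NSEW])",
    re.IGNORECASE,
)

def dms_to_decimal(degrees: float, minutes: float = 0.0, seconds: float = 0.0,
                   hemisphere: str = "N") -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees (S and W negative)."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value

def parse_label_coordinate(text: str) -> Optional[float]:
    """Return the first co-ordinate found in a label string, or None."""
    m = DMS_PATTERN.search(text)
    if m is None:
        return None
    deg, mins, secs, hemi = m.groups()
    minutes, seconds = float(mins or 0), float(secs or 0)
    if minutes >= 60 or seconds >= 60:
        return None  # implausible value; leave for manual checking
    return dms_to_decimal(int(deg), minutes, seconds, hemi)

print(parse_label_coordinate("4°30'S"))  # -4.5
```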
Given the huge number of specimens in museums (an estimated 5000 type specimens in Berlin alone), this sounds like an impossible task! However, the task becomes feasible if we think in terms of collectors and develop a data model to geo-reference collection trips (itineraries).
The itinerary model - a solution?
A single collector usually collects thousands of specimens. Historically, collectors gathered very different groups of organisms, from insects to plants, and the material was distributed among different institutions. As a result, taxonomists at different institutions may today be independently geo-referencing the very same localities, for example those of the famous Sarawak expedition by Mjoeberg, where, inter alia, a frog and a cricket type specimen were collected (Leptobrachella mjoebergi, Itara mjobergi CHOPARD, 1930).
It is therefore much more efficient to geo-reference Mjoeberg's itinerary once and make these data available in digital format.
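The itinerary model can be sketched as a small data structure: an itinerary lists the stops of one collecting trip, each stop is geo-referenced once, and any institution can then resolve a specimen's label locality through a central register keyed by collector and locality. This Python sketch is an illustration under assumptions; the class names, the simple case and whitespace normalisation, and the collector, expedition and co-ordinate values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

def normalise(text: str) -> str:
    """Case- and whitespace-insensitive key, so minor label variants match."""
    return " ".join(text.lower().split())

@dataclass(frozen=True)
class Stop:
    locality: str  # locality as written on the specimen labels
    lat: float     # decimal degrees
    lon: float

@dataclass
class Itinerary:
    collector: str
    expedition: str
    stops: List[Stop] = field(default_factory=list)

class ItineraryRegister:
    """Central register: each (collector, locality) pair is geo-referenced once."""

    def __init__(self) -> None:
        self._index: Dict[Tuple[str, str], Stop] = {}

    def add(self, itinerary: Itinerary) -> None:
        for stop in itinerary.stops:
            self._index[(normalise(itinerary.collector), normalise(stop.locality))] = stop

    def lookup(self, collector: str, locality: str) -> Optional[Tuple[float, float]]:
        stop = self._index.get((normalise(collector), normalise(locality)))
        return None if stop is None else (stop.lat, stop.lon)

register = ItineraryRegister()
register.add(Itinerary("Collector A", "Expedition 1", [
    Stop("Camp 1", 1.50, 110.25),  # placeholder co-ordinates
    Stop("Camp 2", 1.75, 110.50),
]))
# Specimens from the same trip, labelled slightly differently at two institutions.
print(register.lookup("Collector A", "Camp 1"), register.lookup("collector a", " camp  1 "))
```

Historical labels often spell localities in many ways, so a production register would probably need synonym lists or fuzzy matching in place of `normalise`.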
This approach is also useful today: at certain research stations and localities, large numbers of Orthoptera have been collected and distributed to various institutions (Fig. 3).
Fig. 3: A realistic scenario for biological collections. 3800 specimens have been collected by one collector. They are distributed among 8 institutions and possibly among different sections within them (depending on the diversity of the sample). If the localities are geo-referenced separately in each case, the three localities must be geo-referenced 3 × 8 = 24 times (probably more often, if more sections are involved). With a central register of itineraries, they need to be geo-referenced only 3 times. This will be the only way to geo-reference museum collections within a realistic time frame.
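The count in Fig. 3 is simple arithmetic and can be written as a small Python function. It assumes the worst case, in which every section holding part of the material geo-references every locality independently; the function name and the per-institution section counts are illustrative.

```python
from typing import Sequence, Tuple

def georeferencing_effort(n_localities: int,
                          sections_per_institution: Sequence[int]) -> Tuple[int, int]:
    """Number of geo-referencing operations without and with a central itinerary register.

    Worst case assumed: each section at each institution geo-references
    every locality on its own."""
    decentralised = n_localities * sum(sections_per_institution)
    central = n_localities  # each locality geo-referenced once, centrally
    return decentralised, central

print(georeferencing_effort(3, [1] * 8))  # (24, 3)
```

Passing more than one section for an institution reproduces the "probably more often" case: with two sections at one of the eight institutions, the decentralised count rises to 27 while the central count stays at 3.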
Bioacoustics and Neuroinformatics
DORSA is a network project connecting expertise in databasing, collection management, systematics, geographical information systems, bio-acoustics and neuroinformatics. The species-specific songs serve as a knowledge base for song recognition algorithms based on neural networks. First results indicate that reliable automated classification is possible for songs of Grylloidea from South East Asia and Amazonia.
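The poster does not specify the network architecture, so the following is only a minimal stand-in: a single-layer softmax network (linear layer plus softmax over species), trained by stochastic gradient descent on per-song feature vectors, written in plain Python. The two features (carrier frequency in kHz, pulse rate in pulses/s), the toy values and the three species labels are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

def standardize_params(X: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and standard deviation (a std of 0 is replaced by 1)."""
    n, d = len(X), len(X[0])
    means = [sum(x[j] for x in X) / n for j in range(d)]
    stds = [math.sqrt(sum((x[j] - means[j]) ** 2 for x in X) / n) or 1.0 for j in range(d)]
    return means, stds

def scale(x, means, stds):
    return [(v - m) / s for v, m, s in zip(x, means, stds)]

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

class SoftmaxNet:
    """Single-layer network: features -> linear layer -> softmax over species."""

    def __init__(self, n_features: int, n_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.01, 0.01) for _ in range(n_features)] for _ in range(n_classes)]
        self.b = [0.0] * n_classes

    def probs(self, x):
        z = [sum(wj * xj for wj, xj in zip(row, x)) + bk for row, bk in zip(self.w, self.b)]
        return softmax(z)

    def predict(self, x) -> int:
        p = self.probs(x)
        return max(range(len(p)), key=p.__getitem__)

    def fit(self, X, y, epochs: int = 300, lr: float = 0.1, seed: int = 0) -> None:
        """Stochastic gradient descent on the cross-entropy loss."""
        rng = random.Random(seed)
        order = list(range(len(X)))
        for _ in range(epochs):
            rng.shuffle(order)
            for i in order:
                p = self.probs(X[i])
                for k, pk in enumerate(p):
                    g = pk - (1.0 if k == y[i] else 0.0)  # dLoss/dz_k for softmax + cross-entropy
                    self.w[k] = [wj - lr * g * xj for wj, xj in zip(self.w[k], X[i])]
                    self.b[k] -= lr * g

# Toy per-song features: (carrier frequency in kHz, pulse rate in pulses/s); values are made up.
raw = [(4.0, 50.0), (4.2, 52.0), (3.9, 48.0),
       (6.0, 30.0), (6.1, 29.0), (5.8, 32.0),
       (3.0, 20.0), (3.1, 18.0), (2.9, 21.0)]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # hypothetical species indices
means, stds = standardize_params(raw)
X = [scale(x, means, stds) for x in raw]
net = SoftmaxNet(n_features=2, n_classes=3)
net.fit(X, labels)
print(net.predict(scale((4.1, 51.0), means, stds)))
```

Real song data would have many more features and overlapping species, which is where multi-layer networks become useful; this sketch only shows the training loop.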
Fig. 4: Acoustic analysis of cricket songs for feature extraction. The analysis tools were programmed in MATLAB by C. Dietrich.
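The MATLAB tools themselves are not reproduced here. As a hedged illustration of feature extraction, the Python sketch below computes two commonly used song features: carrier frequency, estimated from rising zero crossings (assuming a narrow-band, nearly pure-tone song), and pulse count, estimated from a thresholded amplitude envelope. The window length (1 ms), the relative threshold (0.5) and the synthetic test song are assumptions.

```python
import math
import statistics
from typing import Sequence

def carrier_frequency(samples: Sequence[float], fs: float) -> float:
    """Estimate the carrier frequency (Hz) from rising zero crossings.

    Assumes a narrow-band song: the median spacing between rising zero
    crossings approximates one carrier period, and the few long gaps
    between pulses do not shift the median."""
    crossings = [i for i in range(1, len(samples)) if samples[i - 1] < 0 <= samples[i]]
    if len(crossings) < 2:
        return 0.0
    periods = [b - a for a, b in zip(crossings, crossings[1:])]
    return fs / statistics.median(periods)

def pulse_count(samples: Sequence[float], fs: float,
                window_s: float = 0.001, rel_threshold: float = 0.5) -> int:
    """Count sound pulses as rising edges of a thresholded amplitude envelope."""
    win = max(1, int(window_s * fs))
    envelope = [max(abs(v) for v in samples[i:i + win]) for i in range(0, len(samples), win)]
    if not envelope or max(envelope) == 0:
        return 0
    threshold = rel_threshold * max(envelope)
    pulses, above = 0, False
    for level in envelope:
        if level >= threshold and not above:
            pulses += 1  # envelope has just risen above the threshold: a new pulse
        above = level >= threshold
    return pulses

# Synthetic test song: 4 kHz carrier, 10 ms pulses separated by 10 ms of silence.
fs = 44100
pulse_len = fs // 100
song = [math.sin(2 * math.pi * 4000 * n / fs) if (n // pulse_len) % 2 == 0 else 0.0
        for n in range(20 * pulse_len)]
duration = len(song) / fs
print(round(carrier_frequency(song, fs)), pulse_count(song, fs) / duration)
```

The zero-crossing estimate is coarse (it is limited by the sampling interval); a spectral peak from an FFT would be the more usual choice when more precision is needed.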
Fig. 5: Neural networks are used for cricket song classification. Many songs from several individuals of each species are necessary for reliable feature extraction. The songs are administered by the DORSA database. The neural networks are trained with a subset of all songs (at present, 215 songs from 137 individuals and 30 species).
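How the training subset is chosen is not stated. One reasonable option, offered here as an assumption, is to hold out whole individuals, so that the network is tested on individuals it has never heard. The Python sketch below does this per species. The `(song_id, individual_id, species)` record layout and the 25 % test fraction are hypothetical, and individual IDs are assumed to be unique across species.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple

Song = Tuple[str, str, str]  # (song_id, individual_id, species): hypothetical layout

def split_by_individual(songs: Sequence[Song], test_fraction: float = 0.25,
                        seed: int = 0) -> Tuple[List[Song], List[Song]]:
    """Hold out whole individuals, so no individual contributes songs to both sets."""
    individuals = defaultdict(set)
    for _, individual, species in songs:
        individuals[species].add(individual)
    rng = random.Random(seed)
    held_out = set()
    for species in sorted(individuals):
        inds = sorted(individuals[species])
        if len(inds) < 2:
            continue  # too few individuals to test on; keep them for training
        rng.shuffle(inds)
        # At least one test individual, but always leave one for training.
        n_test = min(len(inds) - 1, max(1, int(len(inds) * test_fraction)))
        held_out.update(inds[:n_test])
    train = [s for s in songs if s[1] not in held_out]
    test = [s for s in songs if s[1] in held_out]
    return train, test

songs = [("s1", "i1", "A"), ("s2", "i1", "A"), ("s3", "i2", "A"), ("s4", "i3", "A"),
         ("s5", "i4", "B"), ("s6", "i5", "B"), ("s7", "i6", "C")]
train, test = split_by_individual(songs, seed=1)
print(len(train), len(test))
```

Splitting by song instead would let songs of one individual appear in both sets, which would likely overstate classification accuracy for new individuals.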