The Mammal Networked Information System
Introduction (Barbara Stein)
The purpose of the meeting was to review MaNIS goals and objectives, forge a sense of community among the project participants and provide an introduction to the georeferencing methodology and tools to be used in the project.
Up to this point, the Mammal-Z-Net has served primarily as a mechanism to facilitate and coordinate grant preparation and submission. From now on we would like the list to be primarily a vehicle for discussion. See the bottom of this document for instructions on getting information about the Mammal-Z-Net list. Problems will undoubtedly arise during the course of the project and those problems will not be unique to any one institution. Addressing and answering all problems on the list will maximize the number of individuals who benefit from a given answer, stimulate discussion of common problems and solutions and create an archive of information for others who wish to implement such a network or join MaNIS at a later date.
Some background on John Wieczorek, lead programmer for the project:
John has worked in the MVZ since 1996 and created the Museum’s new database management system. I feel confident in saying that there is probably no aspect of mammal curation with which John is not familiar. Through creation of our system, he has also become knowledgeable about the intricacies and peculiarities of taxonomic, geographic and specimen data issues. For ca. 2 ½ months each year he conducts field work in Argentina and thus understands how data are collected, how field notes are taken and how specimens are processed. In short, it is unlikely that you will have any questions concerning organization of your collections or databases that he will not understand. The best way to describe my confidence in John's ability to make this project a success is to say that I would not have undertaken this project if he had not agreed to be its programmer.
Presentation (John Wieczorek)
MaNIS is an outgrowth of the symposium on Emerging Database Technologies that was held at the 1999 ASM annual meeting in Seattle, WA. In that symposium, Town Peterson, Curator of Birds at The University of Kansas, presented a network prototype for avian collections in North America. Excitement about the potential value of such a network led to proposal preparation and submission. More than just a prototype, MaNIS will be a large-scale, fully functional, distributed network of mammal specimen data that will disseminate information via the WWW.
There is tremendous interest in MaNIS nationally and internationally and it is important to be aware of that as we develop the network. The product we create will be as useful to the world at large as it will be to us. John has been invited to advise the North American Paleontology Convention on development of a network of paleontological research databases, and he just returned from a meeting of the Taxonomic Database Working Group (TDWG), which has strong interest in creating a global network composed of networks like MaNIS. In addition, the herpetological community is preparing to submit an NSF proposal similar to ours in July, 2001.
The creation of MaNIS will produce both curatorial tools and research tools. It will also increase visibility and use of collections. The creation of the MVZ’s new database management system and its public query interface have had a number of profound impacts on daily business. Specifically, they have:
The following graph shows that the number of requests for loans also increased following the debut of the public database.
We expect that creation of MaNIS will have similar impacts on the collections of participating institutions and that increased communication within the community will lead to common solutions to many data-related problems.
What are the tasks required to make MaNIS functional?
Project coordination and order of events
We do not want to "herd cats." We hope that you will be comfortable in collaborating with each other, particularly with respect to georeferencing.
While John begins work on developing the network, participants will begin georeferencing. This is why John asked for your data. From those data he will create a combined snapshot of unique localities, which will be used for georeferencing.
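Building a combined snapshot of unique localities amounts to pooling every institution's locality strings and collapsing duplicates. The following is a minimal sketch, assuming each institution's data can be reduced to (institution, locality text) pairs; the function names and the normalization rule (lowercasing and collapsing whitespace) are hypothetical, and real locality strings would need much more careful matching than this.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalize_locality(text: str) -> str:
    """Hypothetical normalization: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def unique_localities(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (institution, locality) pairs by normalized locality text.

    Returns a dict mapping each normalized locality to the sorted list of
    institutions whose data contain it, so shared localities need only be
    georeferenced once.
    """
    snapshot = defaultdict(set)
    for institution, locality in records:
        snapshot[normalize_locality(locality)].add(institution)
    return {loc: sorted(insts) for loc, insts in snapshot.items()}


records = [
    ("MVZ", "5 mi N Berkeley, Alameda Co."),
    ("UAM", "5 mi N  berkeley, alameda co."),
    ("UAM", "Fairbanks"),
]
print(unique_localities(records))
```

Keeping the list of institutions per locality is a design choice: it shows at a glance which localities are shared and therefore worth georeferencing cooperatively.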
Once the network software is ready, he will immediately connect the two institutions for which he has complete data access (MVZ, UAM) and he will test the network with those two alone before connecting any additional institutions. Connecting an institution involves writing migration scripts, specific to that institution, that move data from its master database to its MaNIS server, and making the network aware that a new collection has been added. Travel money has been allocated so that John can spend one week at each institution to facilitate this important task.
In thinking about writing migration scripts, John will be looking at the structure of the databases you sent him. Therefore, he will need to know as soon as possible if either your operating system or your database management system is going to change in any way between now and when he begins working on data migration for your institution.
In turn, it is understood that the collections will need as much lead time as possible before being connected to the network so that curators and staff can be available and plan their schedules accordingly.
At its simplest, georeferencing is the determination of latitude and longitude or equivalent coordinates. But it is much more than that. John demonstrated a GIS tool that has been developed by the Berkeley Digital Library Project (DLP). It is an example of the power of visualization of specimen locality data in a spatial framework.
Following is a screen image of the results of a query on the MVZ public database for Neotoma fuscipes. These results were then passed into the GIS viewer by clicking on the link to view this query result.
Following is a screen image of the results plotted on the GIS viewer with the "Relief" layer and the "Specimens" layer turned on.
Following is a screen image of the results plotted on the GIS viewer with the "California Counties" layer and the "Specimens" layer turned on.
The GIS software is available to us at no cost and will be modified by John for use in this project. In addition to using this software in a collaborative georeferencing effort, each participating institution is free to use the GIS Viewer as a visualization tool for its own purposes – source code is freely available. Geographic layers for areas outside of the USA will need to be created for the tool to be used in those regions. Traditional resources (maps and gazetteers) will be used in lieu of these geographic layers where they do not exist.
5 Easy Steps to Quicker, Better Georeferencing
Georeferencing specimen by specimen, looking up each lat/long in an atlas, map, gazetteer, etc., is quite time-consuming. Easier methods involve:
As localities are georeferenced and institutions are connected to the network, MaNIS will be a dynamically maintained gazetteer of the locality data contained therein.
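The idea of the network acting as a gazetteer can be illustrated with a small lookup table that is consulted before any locality is georeferenced by hand, and updated each time a new determination is made. This is a sketch only: the class name, the (latitude, longitude, datum) tuple, and the key normalization are hypothetical, not part of the MaNIS software.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation of a determination: (latitude, longitude, datum).
Coordinates = Tuple[float, float, str]


class Gazetteer:
    """Hypothetical in-memory gazetteer keyed by normalized locality text."""

    def __init__(self) -> None:
        self._entries: Dict[str, Coordinates] = {}

    @staticmethod
    def _key(locality: str) -> str:
        # Lowercase and collapse whitespace so trivially different strings match.
        return " ".join(locality.lower().split())

    def add(self, locality: str, coords: Coordinates) -> None:
        """Record a completed determination so later lookups can reuse it."""
        self._entries[self._key(locality)] = coords

    def lookup(self, locality: str) -> Optional[Coordinates]:
        """Return a previous determination, or None if the locality is new."""
        return self._entries.get(self._key(locality))


gaz = Gazetteer()
gaz.add("Point Reyes, Marin Co.", (38.0, -122.9, "WGS84"))  # placeholder values
print(gaz.lookup("point reyes,  marin co."))
print(gaz.lookup("Some new locality"))
```

A lookup that returns None marks a locality that still needs to be georeferenced from maps or gazetteers; every other hit is work that does not have to be repeated.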
How do we proceed with georeferencing?
We need to have a common conception of the data we wish to capture. Latitude and longitude are not sufficient. Having only those values is similar to saying you have a length of 24 without providing the units. Additional fields are needed, and include the following:
1) Datum — a datum is a geometric model that simulates the surface of the earth. If you do not specify the datum on which latitude and longitude coordinates are based, the error of any lat/long value may be as great as ca. one km (see http://www.colorado.edu/geography/gcraft/notes/datum/datum_f.html). For example, you may have taken a GPS reading in the field and then plotted that point on a map, only to find that it doesn’t correspond to the right location. It is quite likely that the datum used for the GPS reading (WGS84 by default on most GPS units) was different from that of the map on which you plotted the point (NAD27 or NAD83 for USGS maps).
2) Maximum error — this value will be derived from a rule set that the group will devise and use for all lat/long measurements to achieve and ensure data consistency.
3) Assumptions — this field will record how determinations were made for data when the assumptions differ from or fall outside of the scope of the documented rules for determining lat/long and maximum error.
4) Source — The reference from which the coordinates were determined (e.g., the name of the map, gazetteer, or software, or a GPS).
5) Determining agent — Name of the individual responsible for assigning lat/long values.
6) Determination date — Date when the determination was made.
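To make the list above concrete, a georeference can be treated as a record in which latitude and longitude never travel without the supporting fields. The sketch below is one possible shape for such a record, assuming decimal-degree coordinates and a maximum error expressed in meters; the class and field names are hypothetical, and the validation checks are illustrative rather than the group's rule set, which is still to be drafted.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Georeference:
    """Hypothetical georeference record carrying the fields listed above."""

    latitude: float          # decimal degrees, positive north
    longitude: float         # decimal degrees, positive east
    datum: str               # e.g. "WGS84", "NAD27", "NAD83"
    max_error_m: float       # maximum error distance; meters assumed here
    source: str              # map, gazetteer, software, or GPS used
    determined_by: str       # determining agent
    determined_date: date    # determination date
    assumptions: Optional[str] = None  # only when outside the documented rules

    def __post_init__(self) -> None:
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude out of range")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude out of range")
        if not self.datum:
            # Without a datum the coordinates are ambiguous (see item 1).
            raise ValueError("datum is required")
        if self.max_error_m < 0:
            raise ValueError("maximum error cannot be negative")


g = Georeference(37.87, -122.26, "WGS84", 500.0, "GPS", "J. Doe", date(2001, 6, 27))
print(g.datum, g.max_error_m)
```

Rejecting a record with an empty datum mirrors the point made in item 1: a lat/long pair without its datum is like a length without its units.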
Given the value of these data, even localities where latitude and longitude have already been assigned will benefit from being revisited.
If we establish georeferencing rules and adhere to them, people using our data will have no reason to mistrust them - it will be clear how our determinations were made. When the georeferencing is done it will be possible to query MaNIS based on the maximum error distance so that only those localities meeting a specified standard of accuracy will be returned. John and Barbara will draft a set of rules for georeferencing, and then place those on the list for discussion and refinement before georeferencing commences.
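A query on maximum error distance reduces to a simple filter. The sketch below assumes each locality carries a maximum error in meters and that localities not yet georeferenced have no error value at all; the names are hypothetical. Treating a missing error as failing every threshold is a deliberate choice here, since a locality with no documented error cannot be shown to meet any standard of accuracy.

```python
from typing import Iterable, List, NamedTuple, Optional


class Locality(NamedTuple):
    name: str
    latitude: float
    longitude: float
    max_error_m: Optional[float]  # meters assumed; None if not yet determined


def meets_accuracy(localities: Iterable[Locality], threshold_m: float) -> List[Locality]:
    """Return only localities whose maximum error does not exceed threshold_m."""
    return [
        loc for loc in localities
        if loc.max_error_m is not None and loc.max_error_m <= threshold_m
    ]


locs = [
    Locality("A", 37.0, -122.0, 100.0),
    Locality("B", 38.0, -121.0, 5000.0),
    Locality("C", 39.0, -120.0, None),
]
print([loc.name for loc in meets_accuracy(locs, 1000.0)])
```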
To maximize efficiency, it is important to start with the easy localities - those for which good maps, etc., exist or for which experts are available to help. We also stand to gain tremendous efficiency by cooperating on georeferencing. Ideally we would like to get commitments from participants to georeference localities from well-defined geographic regions that interest them (e.g., MVZ volunteers to georeference all localities in California, Argentina, Brazil, and Chile). As an institution finishes georeferencing localities from the geographic areas to which it has committed, it should reserve new geographic regions appropriate to its geographic holdings and to the expertise that can be called upon there. Commitments should be made after John has made available information about how many localities each institution has from each geographic area.
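The per-institution, per-region locality counts that commitments will be based on are a straightforward tally. This sketch assumes each record can be reduced to an (institution, region, locality) tuple with regions already assigned; the function name is hypothetical, and repeated specimens from the same locality are counted only once because it is localities, not specimens, that get georeferenced.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def localities_by_region(records: Iterable[Tuple[str, str, str]]) -> Dict[str, Counter]:
    """Count unique localities per region for each institution.

    records: (institution, region, locality) tuples. Duplicate tuples, e.g.
    several specimens from one locality, are counted once.
    """
    counts: Dict[str, Counter] = {}
    for institution, region, _locality in set(records):
        counts.setdefault(institution, Counter())[region] += 1
    return counts


records = [
    ("MVZ", "California", "locality A"),
    ("MVZ", "California", "locality A"),
    ("MVZ", "California", "locality B"),
    ("MVZ", "Chile", "locality C"),
    ("UAM", "Alaska", "locality D"),
]
print(localities_by_region(records))
```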
We need to cooperate: as I discussed above, our overall efficiency will increase if we work together.
What will happen next?
John Wieczorek, 27 June 2001
Rev. 5 Sep 2002, JRW