Or, “March of the zombie librarians” - this is what Brian Surratt calls the phenomenon of librarians going to their 8:00am meetings…
The first presenter, Jane Greenberg (Associate Professor, School of Information and Library Science, University of North Carolina at Chapel Hill), spoke about DRIADE, the Digital Repository of Information and Data for Evolution.
She explained that the word is pronounced “dree AH dee” and that it is named after a coffee shop in Chapel Hill.
In a nutshell, DRIADE is a repository for data supporting published research (i.e., the datasets behind the published research) related to evolutionary biology, a small science discipline that follows the open science model (i.e., the open approach toward research).
The open science model, apparently, has prompted the creation of small repositories. Other small repositories (in addition to DRIADE) are KNB (Knowledge Network for Biocomplexity) and MMI (Marine Metadata Initiative). Science disciplines are categorized as “small” based on
- their unique structure (data are collected by individuals and small groups)
- the size of the datasets and
- the amount of government funding received.
In the field of evolutionary biology, there are seven major journals. Molecular Biology and Evolution is one of those journals. Typically, supplemental data is stored in a repository associated with a journal, and the metadata is minimal (perhaps only the author, with no subject access, species info, etc.). Some researchers are depositing data in specialized databases like GenBank, but many of these specialized databases have limited linking between pieces of a publication.
According to Greenberg, a central repository is the ideal way to link original data with published research. Enter DRIADE, a “collaboration between NESCent (National Evolutionary Synthesis Center) and the SILS Metadata Research Center” at UNC. The project is being supported by the National Science Foundation and is slated to last six years. NESCent approached SILS because they needed metadata/cataloging.
DRIADE will provide a one-stop deposition of data objects: “one stop shopping for all.” Key features of the project include:
- handshaking with specialized databanks
- support for acquisition, preservation, discovery, and reuse of data
Journal articles will not be part of the repository, but they will be available to create metadata for subject access, etc.
The project began in September 2006, and in December 2006 there was a stakeholders’ workshop that included representatives from major journals and societies plus metadata folks, all in order to get feedback on the idea of the repository. The idea was widely supported.
Greenberg spoke about the scope of the repository and the logistics. She mentioned Maslow’s “hierarchy of needs” (preservation, access, synthesis = three levels of metadata). The repository was created in response to a need to preserve data as well as make it discoverable to facilitate data synthesis. The journals felt it was their responsibility to get the repository going and that they had a moral responsibility to provide access to data so it could be reused. Also, the repository would aid in verifying the authenticity of data. Six of the seven journals agreed to make DRIADE the place for storing data.
After it was agreed that the repository would be a good idea, the next step was to determine the functional requirements of the metadata. The group surveyed existing repositories and projects, and looked at their goals and products, but none of the databases coupled data with the object the way DRIADE wanted to: the metadata supports resource discovery and sharing, and provides extra special security for sites because data submission is linked with manuscript review.
The people involved in DRIADE are also doing research in automatic metadata creation, using OASIS as a functional model.
The application profile for DRIADE was created by mixing and matching metadata schemes. There are three levels of metadata; level 1 is based on Dublin Core, Data Documentation Initiative, Ecological Metadata Language, and PREMIS (Preservation Metadata Implementation Strategies). The DRIADE scheme is modular and simple (only 18 elements), and some metadata is generated automatically. For controlled vocabulary, LCSH and MeSH are used.
Greenberg concluded her presentation by noting that DRIADE is about teamwork; it is a partnership of stakeholders. What makes it unique is its model of linking data with published research.
From the DRIADE website:
DRIADE (Digital Repository of Information and Data for Evolution) is a planned repository for data underlying published works in the field of evolutionary biology. DRIADE will support data acquisition, ensure long-term preservation of data objects, and promote resource discovery, data sharing, and data reuse of heterogeneous digital datasets.
The project aims to address a number of challenges in effectively storing and retrieving heterogeneous scientific data. Chief among them is to use emerging information technologies to avoid placing an undue burden of metadata generation on the individual researcher while at the same time capturing sufficient metadata to enable resource discovery and reuse.
DRIADE’s plans include the following functionalities:
- Computer-aided metadata generation and augmentation to assist the data provider in capturing metadata of sufficient richness and quality to enable advanced data discovery, reusability and data integration.
- Specialized modules allowing data submission to be coordinated with manuscript review and publication, as well as with the submission process to external specialized databases (e.g. for sequence data, phylogenies, anatomical images). This will provide one-stop data submission for the user.
- Data and metadata quality control, which will be overseen by a curatorial staff, supported by a separate data curation software module that employs automatic techniques to evaluate metadata quality.
- An identity, authority and data security module will implement fine-grained data access privileges for users using global user identities.
- Resource discovery, sharing, and interoperability with external repositories will be enabled by implementing the OAI-PMH metadata harvesting standard supplemented by custom web services. These services will be exposed to collaborating journals, specialized data repositories, third-party content aggregators, and the DRIADE web portal itself.
Extensive evaluations and user testing are being employed throughout the design and implementation process by
- conducting metadata generation studies and analyzing the resulting quality of metadata content,
- conducting information retrieval experiments and usability studies to evaluate the effectiveness and performance of the system.
Stakeholders from evolutionary biology journals and scientific societies are helping NESCent to develop a business model to ensure the long-term maintenance and financial sustainability of the repository.
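For readers who haven't run into OAI-PMH (mentioned in the list of functionalities above): a harvester simply sends the repository an HTTP GET request with a "verb" and a few parameters. Here is a minimal sketch in Python of building such a request. The base URL is a placeholder, not a real DRIADE address, and the function name is made up for illustration; only the parameter names (verb, metadataPrefix, from, set) come from the OAI-PMH protocol.

```python
from urllib.parse import urlencode

# Placeholder endpoint: an assumption for illustration, not a real DRIADE URL.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    # "verb" and "metadataPrefix" are required for ListRecords;
    # "oai_dc" (unqualified Dublin Core) is the format every OAI-PMH repository must support.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # only records changed on or after this date
    if set_spec:
        params["set"] = set_spec    # restrict harvesting to one set of records
    return base_url + "?" + urlencode(params)


print(list_records_url(BASE_URL, from_date="2007-01-01"))
```

A journal or aggregator would fetch that URL and parse the XML that comes back; that part is left out here.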
The second speaker on the program, Diane Hillmann (Research Librarian, Cornell University), spoke about Dublin Core and RDA.
The most significant outcome of the RDA Data Model Meeting, held in London in May 2007, was the following recommendation (quoted from the Meeting website):
The meeting participants agreed that RDA and DCMI should work together to build on the existing work of both communities.
The participants recommend that the RDA Committee of Principals and DCMI seek funding for work to develop an RDA Application Profile — specifically that the following activities be undertaken:
- development of an RDA Element Vocabulary
- development of an RDA DC Application Profile based on FRBR and FRAD
- disclosure of RDA Value Vocabularies using RDF/RDFS/SKOS
Hillmann pointed out one significance of this recommendation - the element vocabulary:
- separates elements from instruction and application; formal representation (definitions, relations between elements and subelements)
- can be exposed to humans and machines
- includes FRBR and entities
- URI – unambiguously identifies that element in all instances
- label – can have language attribute – use for labeled display; can be whatever you want; can mess around with the label but not the name
- description = definition
- histories of term changes
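To make the list above a bit more concrete, here is a rough sketch in Python of what one entry in an element vocabulary might look like. This is my own illustration, not anything shown in the session: the class, the field names, and the example.org URI are all assumptions. The point it tries to capture is that the URI and name stay fixed while labels can vary by language and change over time, with changes recorded in a history.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One entry in a hypothetical element vocabulary."""
    uri: str                                     # identifies the element everywhere it is used
    name: str                                    # machine name; not meant to change
    labels: dict = field(default_factory=dict)   # language code -> display label
    description: str = ""                        # the definition of the element
    history: list = field(default_factory=list)  # record of label changes

    def relabel(self, lang, new_label, note=""):
        """Change the display label for one language and log the change."""
        old_label = self.labels.get(lang)
        self.labels[lang] = new_label
        self.history.append((lang, old_label, new_label, note))

    def display(self, lang="en"):
        """Return the label for a language, falling back to the name."""
        return self.labels.get(lang, self.name)


# Example values are invented for illustration.
title = Element(
    uri="http://example.org/elements/title",
    name="title",
    labels={"en": "Title"},
    description="A word or phrase that names the resource.",
)
title.relabel("fr", "Titre", note="added French label")
print(title.display("fr"))
print(title.display("de"))
```

The first print shows the French label; the second falls back to the name because no German label exists. You can mess around with the label, but the name and URI are left alone.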
The instructional material will now sit in the middle (used to be like the garlic in the tomato sauce – you know it’s in there…).
This change will help move us away from silos around data, for us and for other people who are interested in our data, and will allow us to incorporate FRBR relationships to provide clarity and help with display issues. Extensibility will be much easier (Hillmann said “technically, maybe not socially”).
The instruction part is the content standard and is provided at several levels. The approach is “top down”, from general rules to the most specific (communities will want different levels; this approach allows RDA to be used by a broader variety of people).
Hillmann said that distribution is being discussed, but some level of RDA will be free. Also, some piece of this will be available as part of the application profile.
This change opens the door to other communities (extension to specialized communities). The general instructions are still followed but may have different examples, etc. The general rules aren’t changed but are extended to fit a particular context.
RDA value vocabularies are imbedded vocabularies being pulled out of instructions (last count was 56 terms). Hillmann said that RDA is loaded with these; when they are imbedded in the text, there is no room to give more information to make them easy to apply. For the web environment, the vocabulary will be pulled out of the instruction.
There will be cross references, etc. (i.e., standard vocabulary management)
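Since the recommendation mentions disclosing value vocabularies in SKOS, here is a small sketch in Python of how one term, once pulled out of the instructions, could be written out as a SKOS concept in Turtle syntax. The term, its definition, and the example.org URIs are invented for illustration; only the SKOS namespace and property names (prefLabel, altLabel, definition, related, inScheme) are real. The function is a hypothetical helper and does not escape quotation marks inside labels.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def concept_to_turtle(uri, scheme_uri, pref_label, definition, alt_labels=(), related=()):
    """Serialize one vocabulary term as a SKOS concept in Turtle (hypothetical helper)."""
    lines = [f"<{uri}> a skos:Concept ;"]
    lines.append(f"    skos:inScheme <{scheme_uri}> ;")
    lines.append(f'    skos:prefLabel "{pref_label}"@en ;')
    for alt in alt_labels:
        # "see from" cross references become alternative labels
        lines.append(f'    skos:altLabel "{alt}"@en ;')
    for other in related:
        # "see also" cross references point at other concepts
        lines.append(f"    skos:related <{other}> ;")
    lines.append(f'    skos:definition "{definition}"@en .')
    return "\n".join(lines)


# Invented example term; not taken from the RDA text.
turtle = concept_to_turtle(
    uri="http://example.org/vocab/carrier/volume",
    scheme_uri="http://example.org/vocab/carrier",
    pref_label="volume",
    definition="One or more sheets bound or fastened together.",
    alt_labels=["book"],
    related=["http://example.org/vocab/carrier/sheet"],
)
print(SKOS_PREFIX)
print(turtle)
```

Once a term lives on its own like this, there is room for the definition, the cross references, and anything else that would not fit when the term was buried in the text of an instruction.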
Application Profiles
- provide the documentation of community understanding and intent
- what is being described, what are the key relationships
- what is required; how many may you have, etc.
- provides guidance for crosswalks
- provides ability to go from community consensus to next step (machine manipulation and evaluation and validation; someone should be able to use application profiles to determine if metadata is adequate)
- specifies appropriate controlled vocabularies and encoding schemes
- formal expression in human readable form; nothing in machine readable form yet, but this is being discussed by DCMI
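As a rough illustration of the "machine manipulation and evaluation and validation" idea in the list above, here is a sketch in Python of checking a record against an application profile. To be clear, this is not a DCMI format (as noted, none exists yet); the profile layout, the element names, and the function are all assumptions made up for this example. The profile simply says, for each element, how few and how many values a record may have.

```python
# Hypothetical profile: element name -> (minimum, maximum) occurrences.
# A maximum of None means "no upper limit".
PROFILE = {
    "title": (1, 1),       # required, exactly one
    "creator": (1, None),  # required, repeatable
    "subject": (0, None),  # optional, repeatable
    "date": (0, 1),        # optional, at most one
}


def validate(record, profile):
    """Return a list of problems found when checking a record against a profile."""
    problems = []
    for element, (minimum, maximum) in profile.items():
        count = len(record.get(element, []))
        if count < minimum:
            problems.append(f"{element}: expected at least {minimum}, found {count}")
        if maximum is not None and count > maximum:
            problems.append(f"{element}: expected at most {maximum}, found {count}")
    for element in record:
        if element not in profile:
            problems.append(f"{element}: not defined in the profile")
    return problems


# Invented record: each element maps to a list of values.
record = {"title": ["A dataset"], "date": ["2006", "2007"], "colour": ["blue"]}
for problem in validate(record, PROFILE):
    print(problem)
```

An empty list of problems would mean the record is adequate as far as this profile is concerned. A real profile would also say which controlled vocabularies and encoding schemes apply to each element, which this sketch leaves out.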
The hope is to stay out of the way of RDA development; doesn’t enforce traditional complexity of library data. The first version will be a machine readable version; funding is being worked on. A draft should be available by end of year.
JSC is aware of the need for training.
Someone asked Hillmann about catalogers using Anglo-American cataloging rules: will they need to create an application profile, too? Hillmann suggested one possible scenario: LC could come up with an application profile, but the question remains whether interoperability is guaranteed. This should improve it, but we still need to depend on a lead institution.
Hillmann also pointed out that an institution wouldn’t necessarily have just one application profile; these may vary by project. It depends on the context.
Who will maintain RDA remains a question. JSC would maintain the guidance instructions, but not the vocabulary. How to manage this over time is being discussed; it isn’t clear how much JSC wants to do in the future. We shouldn’t assume that JSC’s relationship with RDA will be the same as it is with AACR2.
Following Hillmann’s presentation, there was a business meeting. The following people were elected to serve next year:
Chair: Louise Ratliff, UCLA
Secretary: Erin Stahlberg, UVA
Program:
John Chapman, Minnesota
Rebeka Irwin, Yale
Publishing:
Jennifer O’Brien Roper, University of Maryland
Sai Deng, Wichita State
Intern: Jin Ma, CUNY
CDA representative: Greta de Groat, Stanford





