The ALCTS Electronic Resources Interest Group met on Saturday, January 12, 2008, from 10:30am-12:00pm. The program was well attended (about 115 people signed in, but the actual headcount was closer to 200), especially given the number of other programs occurring in the same time slot.
The topic of the session was making e-resources more manageable, along with related records issues, in particular those involving e-book aggregations and databases.
Our first speaker was Nicole Pelsinsky, who recently joined Serials Solutions as a product manager focused on the 360 MARC Updates service. Prior to joining Serials Solutions, Nicole worked at Microsoft Corporation for seven years. Some of her previous projects include the multifaceted search experience for the Microsoft Library Global Market Research Search Development, a glossary lookup tool, an image search system, and search usability testing. Before joining Microsoft, Nicole worked with digital asset metadata for Getty Images. In more traditional library roles, her work included positions at Johns Hopkins University and the Centers for Disease Control.
Nicole began by noting that she brings a unique perspective to her work at Serials Solutions based on her prior work on the customer side: she knows the issues from the library perspective, something “near and dear to her heart.” The problem she found with e-resources of any type, especially when she was working at Microsoft, was a lack of discoverability. People couldn’t find the resources unless they were represented in the OPAC. If the resources weren’t in the OPAC and the library didn’t provide some other way for users to access them (e.g., facilitating browsing through the library portal), people were mystified about why they weren’t having a good user experience and being connected with information. Another issue was that, with all the changes in the data for their subscriptions, they did not have enough staff and resources to keep up. They were constantly scurrying around trying to figure out ways to discover broken links, etc. Also, their customers would often discover problems before staff did, so there was a lot of retroactive cleanup. Furthermore, a new e-book subscription might mean thousands of MARC records to be loaded into the OPAC, so the sheer volume would “terrify” her IT people, since they were the ones who supported the library and loaded the records into the OPAC.
At this point, Nicole assured the audience that there are solutions for these issues, but first she gave us the provider perspective based on her experiences at Serials Solutions. Now that she is at Serials Solutions, Nicole has more of an understanding of the issues they face as a content provider:
- there are varying levels of completeness of records and the standards are all over the place, so many times the information needs to be manipulated in order to conform to some sort of standard
- the metadata for e-books is not always reliable – sometimes it may not exist at all, sometimes it may not be complete
- keeping up with changes and growth of subscriptions is problematic – titles come and go in the packages and links change often
- providers need to do a better job of defining standards they want from publishers, in terms of content and data
- the PCC has gone a long way in developing standards and Serials Solutions has reviewed those standards, but many times there is very useful data beyond what conforms to a MARC standard, and it would be very useful for them to have “forward thought” about how they could use that data and make it available to libraries
So what can Serials Solutions do for libraries?
- improve the publisher-provided data and, more importantly, improve the relationship between Serials Solutions, as a provider of content to libraries, and the publishers
- providers can improve relationships between themselves and the publishers and the librarians – eliminate the lack of good support (e.g., whom to speak to about a problem)
- supplying a standardized framework for delivering data to libraries is crucial – tools are being developed, and they are working on populating their repository with content other than e-serials and e-journals and on putting it in a framework that is usable for libraries
- they will be offering customization of records for libraries’ local practices – they want to make sure they conform to libraries’ local practices; the Serials Solutions records won’t look any different
- they provide coverage of a variety of electronic resources from different publishers and deliver it to the library in one package; instead of the library having to get e-book information from several publishers and perhaps e-resource database information from another place, Serials Solutions will bring all of that information together
- behind the scenes – they make regular updates to the data; they have a title authority control system in place for e-serials and e-journals and will be expanding its use to e-books and other types of content; they provide holdings data; and they have brief record generation to provide broader coverage of a library’s collection
Another point Nicole emphasized was finding a solution to the question “what is the most crucial issue for them as a content provider?” For Serials Solutions, the answer is really trying to get better data from publishers, which comes down to building relationships and trying to make it mutually beneficial for Serials Solutions, the publishers, and the libraries.
Serials Solutions is starting with its journal collections for its KnowledgeWorks certification program and will be expanding it to e-books and other types of content. Publishers are required to supply a title list and a dedicated contact person. In return, the publisher will get a dedicated Serials Solutions contact person so that troubleshooting will be easier, etc. Publishers will also get access to the Serials Solutions interface so they can see what libraries are seeing when they look at the publisher’s information; they can see firsthand the problems that a library might be experiencing. There is also an element of mutual publicity that goes along with this program – a good way to tie all of the pieces together and share information. The end result of certification is a collaborative environment with publishers. Serials Solutions can do lots of data manipulation in house, but that isn’t scalable in the long term. So, getting the data from the source puts them in a better position to provide better service to libraries.
Our second speaker was Timothy Savage from OCLC. Tim received a bachelor of music degree from Oberlin Conservatory and an MLIS degree from the University of Pittsburgh. He has been at OCLC for nine years working in various capacities—first as a music cataloger in TechPro and later as a project manager. Currently he serves as a product manager in Cataloging and Metadata Services where he manages vendor partnerships, particularly those involving electronic content. Chief among his duties are managing and automating the cataloging activities for numerous partners in a variety of formats including e-books, e-audiobooks, e-music, and e-video.
Tim spoke about OCLC’s activities in terms of creating records for electronic resources, particularly their efforts to automate record creation.
Dramatis personæ
Tim likes to draw a distinction between publishers and aggregators. He defines publishers as “those who publish.” They own the rights to their content. Aggregators, on the other hand, host the published content and license the content from the publishers. Oftentimes, aggregators can also be publishers and vice versa.
According to Tim, there is a great deal of electronic content “out there,” so it only makes sense to try to introduce automation to cope with the cataloging load. Automation in cataloging isn’t new, however. Devices such as constant data, macros, and “newing” records have been used for years. OCLC Contract Services has been working with aggregators to create records for e-content for nearly five years, so this really isn’t that new, but there is always room to improve upon their processes.
Tim pointed out that the LC Working Group’s report highlighted the fact that there is a great deal of redundancy in our cataloging workflows in terms of duplicate data entry and, perhaps more surprisingly, duplicate subject analysis. Many publishers assign BISAC headings, which come from a subject vocabulary created by the Book Industry Study Group, to their metadata records. There is a great opportunity to leverage some of that subject work in the assignment of LC subject headings and even in the derivation of call numbers.
But as we move forward with automation, Tim points out that we (as catalogers) shouldn’t lose sight of our core values:
- performing subject analysis and classification
- determining the relevancy of access points
- performing authority control
But as we move into the future, Tim suggests that there are some new values we might want to consider, namely, the value that can be gained from using vendor-supplied metadata to try to automate the way a lot of the descriptive tasks and the descriptive parts of cataloging are done. Tim admitted having spoken to many librarians who are understandably concerned that introducing data from these sources and interest in automation compromises quality or “dumbs down” the catalog, but he doesn’t agree. Tim thinks we really need to focus on improving how we do things so they can be done with greater efficiency so that we can have more things cataloged and, therefore, more things discoverable. By automating a lot of the “descriptive minutia,” we can repurpose our intellectual efforts to those tasks that we deem the most critical for resource discovery. He finds it interesting that the LC Working Group’s report says that more research is needed to determine those elements of the bibliographic record that most heavily drive resource discovery because he suspects that if this research were done, the results would point back to the three core values he mentioned earlier.
So what is OCLC doing?
Building on the success of the past five years of partnering with aggregators, OCLC is looking to “take it to the next level” by not only receiving metadata from aggregators but also going “further upstream” directly to the publishers themselves. Tim described OCLC’s “next generation cataloging effort”: they are receiving metadata directly from the publishing community in an XML standard called ONIX (Online Information Exchange), which is widely used across the publishing industry. As titles are being made ready for publication, publishers will often pass metadata on to places like Amazon.com or Barnes & Noble; that data is automatically uploaded, and the items it describes become available for sale on the vendor’s website. Since this metadata is already available, it makes sense for OCLC to tap into it as well. OCLC is using automated processes to convert the ONIX data into MARC and, in so doing, to make the records more robust and complete in accordance with our current cataloging standards.
At this point, Tim showed the audience a sample ONIX record and pointed out what data could be used, such as ISBN, title, author, imprint data, and even subject terms, which can be leveraged by matching these terms to controlled vocabularies.
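To make the kind of field reuse Tim described more concrete, here is a minimal sketch in Python. It is not OCLC’s actual conversion process, which was not shown in detail. The sample record is a simplified, invented ONIX-style fragment, the choice of MARC tags (020, 100, 245, 260, 650) is an assumption made for illustration, and the `BISAC_TO_LCSH` table and the `onix_to_marc_fields` function are hypothetical names standing in for a real crosswalk to a controlled vocabulary.

```python
import xml.etree.ElementTree as ET

# Simplified, invented ONIX-style record (illustrative only; real ONIX
# records carry many more elements and code lists).
SAMPLE_ONIX = """\
<Product>
  <ProductIdentifier>
    <ProductIDType>15</ProductIDType>
    <IDValue>9780000000002</IDValue>
  </ProductIdentifier>
  <Title>
    <TitleType>01</TitleType>
    <TitleText>An Example E-Book</TitleText>
  </Title>
  <Contributor>
    <ContributorRole>A01</ContributorRole>
    <PersonNameInverted>Doe, Jane</PersonNameInverted>
  </Contributor>
  <Publisher>
    <PublisherName>Example Press</PublisherName>
  </Publisher>
  <Subject>
    <SubjectSchemeIdentifier>10</SubjectSchemeIdentifier>
    <SubjectCode>MUS000000</SubjectCode>
  </Subject>
</Product>
"""

# Hypothetical one-entry crosswalk from a BISAC code to a subject heading.
# A real mapping would be far larger and curated by catalogers.
BISAC_TO_LCSH = {"MUS000000": "Music"}


def onix_to_marc_fields(onix_xml):
    """Return (tag, subfield code, value) tuples derived from one record."""
    product = ET.fromstring(onix_xml)
    fields = []

    def add(tag, code, path):
        # Copy a single element's text into a MARC-like field if present.
        value = product.findtext(path)
        if value:
            fields.append((tag, code, value.strip()))

    add("020", "a", "ProductIdentifier/IDValue")       # ISBN
    add("100", "a", "Contributor/PersonNameInverted")  # author
    add("245", "a", "Title/TitleText")                 # title
    add("260", "b", "Publisher/PublisherName")         # imprint (publisher)

    # Match publisher subject codes against the controlled vocabulary;
    # codes with no known match are skipped rather than guessed.
    for subject in product.findall("Subject"):
        heading = BISAC_TO_LCSH.get(subject.findtext("SubjectCode"))
        if heading:
            fields.append(("650", "a", heading))

    return fields


if __name__ == "__main__":
    for tag, code, value in onix_to_marc_fields(SAMPLE_ONIX):
        print(f"{tag} ${code} {value}")
```

In this sketch, unmatched subject codes are dropped rather than guessed, which leaves subject analysis to a cataloger whenever the crosswalk has no answer.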
OCLC has established a “template-based cloning process” in which they take the best parts of the ONIX record and combine these with the best parts of data that is already in WorldCat. Tim describes the process as a sequence of constant data layers being applied one on top of the other. Using the ONIX metadata received from the provider, OCLC searches and mines WorldCat to find records to clone of the same title that are perhaps in a different form (e.g., a print version of an e-book or a CD of an e-music res