QA catalogue for analysing library data

Search

Found records

Completeness of MARC21 field groups

Completeness of MARC21 fields

Issues in MARC21 records

Functional analysis

Discovery functions

Search

Search for a resource corresponding to stated criteria (i.e., to search either a single entity or a set of entities using an attribute or relationship of the entity as the search criteria).

Identify

Identify a resource (i.e., to confirm that the entity described or located corresponds to the entity sought, or to distinguish between two or more entities with similar characteristics).

Select

Select a resource that is appropriate to the user’s needs (i.e., to choose an entity that meets the user’s requirements with respect to content, physical format, etc., or to reject an entity as being inappropriate to the user’s needs).

Obtain

Access a resource either physically or electronically through an online connection to a remote computer, and/or acquire a resource through purchase, licence, loan, etc.

Usage functions

Restrict

Control access to or use of a resource (i.e., to restrict access to and/or use of an entity on the basis of proprietary rights, administrative policy, etc.).

Manage

Manage a resource in the course of acquisition, circulation, preservation, etc.

Operate

Operate a resource (i.e., to open, display, play, activate, run, etc. an entity that requires specialized equipment, software, etc. for its operation).

Interpret

Interpret or assess the information contained in a resource.

Management functions

Identify

Identify a record, segment, field, or data element (i.e., to differentiate one logical data component from another).

Process

Process a record, segment, field, or data element (i.e., to add, delete, replace, output, etc. a logical data component by means of an automated process).

Sort

Sort a field for purposes of alphabetic or numeric arrangement.

Display

Display a field or data element (i.e., to display a field or data element with the appropriate print constant or as a tracing).

The main part of the Functional Requirements for Bibliographic Records (FRBR) document defines the primary and secondary entities that became well known as the FRBR model. Years later, Tom Delsey created a mapping between the 12 functions listed above and the individual MARC elements.

Tom Delsey (2002) Functional analysis of the MARC 21 bibliographic and holdings formats. Technical report, prepared for the Network Development and MARC Standards Office, Library of Congress. Second revision: September 17, 2003. URL: https://www.loc.gov/marc/marc-functional-analysis/original_source/analysis.pdf

This page shows how well the records support these functions. The horizontal axis shows the strength of the support: a value on the left means that support is low, that is, only a small portion of the fields that support a function are present in the records; a value on the right means that support is strong. Each bar represents a range of values, and the vertical axis shows the number of records whose values fall in that range.

This analysis is experimental because the mapping covers about 2000 elements (fields, subfields, indicators, etc.), whereas an average record contains at most several hundred elements. As a result, even the best record has only about 10-15% of all the elements that support a given function. The tool therefore does not show exact numbers, and the scale is not 0-100 but 0-[best score], which is different for every catalogue.
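The idea behind these charts can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the mapping `FUNCTION_MAP`, the element identifiers in it, and the helper names `support_scores` and `histogram` are all hypothetical, and the real mapping covers about 2000 elements.

```python
# Hypothetical, heavily shortened mapping: function name -> MARC elements
# that support it. The identifiers are illustrative only.
FUNCTION_MAP = {
    "Search": {"100$a", "245$a", "650$a", "020$a"},
    "Identify": {"245$a", "020$a", "260$c"},
}


def support_scores(record_elements, function_map=FUNCTION_MAP):
    """Return, per function, the share of mapped elements present in a record."""
    present = set(record_elements)
    return {
        name: len(present & elements) / len(elements)
        for name, elements in function_map.items()
    }


def histogram(scores, bins=10):
    """Count scores per equal-width bin on a 0..[best score] scale."""
    if not scores:
        return [0] * bins
    best = max(scores)
    if best == 0:
        # Every record scored zero, so all of them fall into the first bin.
        return [len(scores)] + [0] * (bins - 1)
    counts = [0] * bins
    for score in scores:
        # The best score itself is placed into the last bin.
        index = min(int(score / best * bins), bins - 1)
        counts[index] += 1
    return counts
```

Because the bins are scaled to the best score found in the catalogue rather than to 100%, histograms of two different catalogues are not directly comparable.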

Subject analysis

Authority analysis

Terms

Serials analysis

These scores are calculated for each continuing resource, that is, a record whose type of record (LDR/6) is language material ('a') and whose bibliographic level (LDR/7) is serial component part ('b'), integrating resource ('i'), or serial ('s').
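The selection rule above can be expressed as a small check on the MARC leader. The following is a sketch that assumes the leader is available as a plain string with zero-based positions; the function name `is_continuing_resource` is hypothetical and not taken from the tool.

```python
# Bibliographic levels (LDR/7) treated as continuing resources:
# 'b' serial component part, 'i' integrating resource, 's' serial.
CONTINUING_LEVELS = {"b", "i", "s"}


def is_continuing_resource(leader: str) -> bool:
    """Check LDR/6 (type of record) and LDR/7 (bibliographic level)."""
    if len(leader) < 8:
        # Too short to contain positions 6 and 7.
        return False
    return leader[6] == "a" and leader[7] in CONTINUING_LEVELS
```

For example, a leader such as `00000nas a2200000 a 4500` passes the check, while `00000nam a2200000 a 4500` (a monograph) does not.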

The calculation is based on a slightly modified version of the method published by Jamie Carlstone in the following paper:

Jamie Carlstone (2017) Scoring the Quality of E-Serials MARC Records Using Java, Serials Review, 43:3-4, pp. 271-277, DOI: 10.1080/00987913.2017.1350525 URL: https://www.tandfonline.com/doi/full/10.1080/00987913.2017.1350525

Thompson—Traill completeness

These scores implement the scoring algorithm described in the following paper:

Kelly Thompson and Stacie Traill (2017) Leveraging Python to improve ebook metadata selection, ingest, and management, Code4Lib Journal, Issue 38, 2017-10-18. URL: http://journal.code4lib.org/articles/12828

Their approach calculates the quality of ebook records coming from different data sources.

Field frequency distribution

These charts show the field frequency patterns. Each chart shows a line that is a function of field frequency: on the X axis you can see the subfields ordered by frequency (how many times a given subfield occurred in the whole catalogue), from the most frequent 1% to the least frequent 1% of subfields. The Y axis represents the cumulative occurrence (from 0% to 100%).
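The curve described above can be approximated as follows. This is a simplified sketch under stated assumptions: it takes a dictionary of subfield counts, uses one point per subfield instead of grouping subfields into percentiles, and the function name `cumulative_distribution` and the sample counts are hypothetical.

```python
def cumulative_distribution(counts):
    """Return (subfield, cumulative %) pairs, most frequent subfield first."""
    total = sum(counts.values())
    if total == 0:
        return []
    # Order subfields from the most frequent to the least frequent.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    running = 0
    curve = []
    for subfield, occurrences in ordered:
        running += occurrences
        curve.append((subfield, 100.0 * running / total))
    return curve


# Made-up counts for illustration only.
sample = {"245$a": 50, "650$a": 30, "020$a": 20}
curve = cumulative_distribution(sample)
```

A steep rise at the start of the curve indicates that a small number of subfields accounts for most of the occurrences in the catalogue.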

This experimental website is part of a research project called Measuring Metadata Quality conducted by Péter Király. You can read more about the research at pkiraly.github.io.

This is an open source project. You can find the code at:

Credits

Thanks to Johann Rolschewski and Phú for their help in collecting the list of published library catalogues, to Jakob Voß for the Avram specification and for his help in exporting the MARC schema to Avram, and to Carsten Klee for the MARCspec. I would like to thank the early users of the software, Patrick Hochstenbach (Gent), Osma Suominen and Tuomo Virolainen (FNL), Kokas Károly and Bernátsky László (SZTE), Sören Auer and Berrit Genat (TIB), Shelley Doljack, Darsi L Rueda, and Philip E. Schreur (Stanford), Marian Lefferts (CERL), and Alex Jahnke and Maike Kittelmann (SUB), who provided data, suggestions, or other kinds of feedback, and Justin Christoffersen for language assistance. Special thanks to Reinhold Heuvelmann (DNB) for terminological and language suggestions.

I would like to thank the experts I have consulted regarding subject analysis: Rudolf Ungváry (retired, Hungarian National Library, HU), Gerard Coen (DANS and ISKO-NL, NL), Andreas Ledl (BARTOC and Uni Basel, CH), Anna Kasprzik (ZBW, DE), Jakob Voß (GBV, DE), Uma Balakrishnan (GBV, DE), Yann Y. Nicolas (ABES, FR), Michael Franke-Maier (Freie Universität Berlin, DE), Gerhard Lauer (Uni Basel, CH).