Data Standards FAQ
Q: What are data standards?
A: Data standards promote the consistent recording of information and are fundamental to the efficient exchange of information. They provide shared rules for data representation, format, definition, structure, tagging, and transmission, so that data entered into a system can be reliably read, sorted, indexed, retrieved, and communicated between systems. They help protect the long-term value of data.
Q: What is metadata and why is it important?
A: Metadata is data about data; this includes data about information resources. Identifying and managing metadata is important for facilitating access to wide ranges of materials over networks. The use of data standards improves metadata quality and facilitates metadata sharing across platforms and communities. Since the 1990s, data standards tailored to different communities and uses have proliferated (examples include Dublin Core, VRA Core, CDWA, Darwin Core, and others). The future will see further development of the Semantic Web: ways to link and discover these data sets on the Internet through the use of RDF (Resource Description Framework), LOD (Linked Open Data), and other transmission standards.
Q: Are there different types of data standards?
A: Yes, there are at least four: data structure, data content, data value, and data communication. Each type is defined below, with known examples from the cultural heritage community.
Data Structure Standards
Data structure standards define a record and the relationship of the fields within it.
VRA Core defines the fields for describing works of visual culture as well as the images that document them. It is uniquely able to capture descriptive information about both the work and the image, and to indicate relationships between the two.
MARC (MAchine Readable Cataloging)
It is a hybrid of a data structure standard and an information exchange (transmission) standard, widely used in libraries. It provides the mechanism by which computers exchange, use, and interpret bibliographic information, and its data elements make up the foundation of most library catalogs used today. MARC became USMARC in the 1980s and MARC 21 in the late 1990s.
SPECTRUM is a collections management standard developed by the Museum Documentation Association (MDA). It provides what are called “procedures” for documenting museum collections, including a procedure for object creation.
Encoded Archival Description (EAD)
XML markup designed for encoding archival finding aids.
Data Content Standards
Data content standards are rules for how data should be entered. They are sometimes called cataloging rules. They can include rules for collecting and formatting data values, as well as the authority files (data value standards) that should be consulted.
Cataloging Cultural Objects (CCO)
Cataloging Cultural Objects (CCO) provides standards that guide the choice of terms and define the order, syntax, and form in which data values should be entered into a data structure for documenting works of art, architecture, cultural artifacts, and images of these things. Although the guide is not about system design, it may also be useful to system designers who need to understand the nature and form of cultural object information. CCO helps create shareable metadata; build common practice for museums, digital libraries, and archives; complement diverse data structure and value standards in any system; and improve discovery and access of cultural works.
Describing Archives: A Content Standard (DACS)
It provides a set of rules for describing archives, personal papers, and manuscript collections, and can be applied to all material types. It is the U.S. implementation of international standards (i.e., ISAD[G] and ISAAR[CPF]) for the description of archival materials and their creators.
Resource Description and Access (RDA)
It is a standard for descriptive cataloging, initially released in June 2010, that provides instructions and guidelines on formulating bibliographic data. Intended for use by libraries and other cultural organizations such as museums and archives, RDA is the successor to Anglo-American Cataloguing Rules, Second Edition (AACR2).
Data Value Standards
Data value standards usually take the form of controlled vocabularies, including subject-specific terminologies and authorities for names and places.
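As a rough illustration of how a controlled vocabulary constrains data entry, the sketch below checks a cataloger's input against a small term list. The term list and the check_value helper are hypothetical and invented for this example; a real workflow would draw its terms from an authority such as the AAT.

```python
# Hypothetical, hand-made term list; a real project would load terms
# from a published authority rather than hard-coding them.
CONTROLLED_MATERIALS = {"oil paint", "bronze", "marble", "albumen print"}


def check_value(value, vocabulary):
    """Return the normalized term if it is in the vocabulary, else None."""
    # Normalize case and collapse runs of whitespace before comparing.
    normalized = " ".join(value.strip().lower().split())
    return normalized if normalized in vocabulary else None


print(check_value("  Oil  Paint ", CONTROLLED_MATERIALS))
print(check_value("oil on canvas", CONTROLLED_MATERIALS))
```

The first call returns the accepted term; the second returns None, signaling that the cataloger should pick a term from the vocabulary or propose a new one.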
The Getty Vocabularies contain structured terminology for art, architecture, decorative arts, archival materials, visual surrogates, conservation, and bibliographic materials. Compliant with international standards, they provide authoritative information for catalogers, researchers, and data providers in multiple languages, including English, Dutch, Spanish, Chinese, and more. The vocabularies grow through contributions from institutions and projects comprising the expert user community. In linked, open environments, the Getty Vocabularies also provide Linked Open Data services.
The Getty Vocabularies include:
The Art & Architecture Thesaurus (AAT): it is a thesaurus containing generic terms, dates, relationships, sources, and notes for work types, roles, materials, styles, cultures, techniques, and other concepts related to art, architecture, and other cultural heritage.
The Getty Thesaurus of Geographic Names (TGN): it focuses on places relevant to art, architecture, and related disciplines, recording names, relationships, place types, dates, notes, and coordinates for current and historical cities, nations, empires, archaeological sites, lost settlements, and physical features. It is a thesaurus and may be linked to GIS and maps.
The Union List of Artist Names (ULAN): it contains names, relationships, notes, sources, and biographical information for artists, architects, firms, studios, repositories, patrons, sitters, and other individuals and corporate bodies, both named and anonymous.
The Cultural Objects Name Authority (CONA): it compiles titles/names and other metadata for works of art, architecture, and other cultural works, current and historical, documented as items or in groups, whether works are extant, destroyed, or never built. It is in development and may be used to record works depicted in visual surrogates and for other purposes.
The Categories for the Description of Works of Art (CDWA): it is a set of guidelines and cataloging rules for the description of art, architecture, and other cultural works. CDWA advises which fields and categories in art metadata are appropriate for the use of vocabularies.
The Getty Iconography Authority (IA): it covers topics relevant to art, architecture, and related disciplines, and includes multilingual proper names, relationships, and dates for iconographical narratives, religious or fictional characters, themes, historical events, and named literary works and performing arts.
Library of Congress Linked Data Service
Library of Congress Linked Data Service provides both interactive and machine access to commonly used ontologies, controlled vocabularies, and other lists for bibliographic description. Those most commonly used in the cultural heritage community include Library of Congress Subject Headings (LCSH), the LC Name Authority File (LCNAF), LC Genre/Form Terms (LCGFT), and more.
Wikidata collects structured data from major vocabulary authorities, including the Getty Vocabularies, the Library of Congress, and many more. It powers Wikipedia and other Wikimedia projects. A Google search for a concept usually leads to a Wikipedia page, where the corresponding “Wikidata item” page can be reached from the sidebar under “Tools”. It is a good supplementary source for vocabularies not included in major authorities.
Q: What are record interchange/transmission standards? What are XML and RDF?
A: The success of shared cataloging is due in part to the adoption of cataloging standards, but it is also attributable to the development of effective data communication/record interchange standards and protocols. These standards define the technical framework for exchanging information between systems, either within a single institution or among systems in multiple institutions. XML and RDF are the most relevant and widely used in the cultural heritage community.
A: VRA Core 4.0 is the first version to have an XML schema written for it. XML stands for eXtensible Markup Language. It is a markup language much like HTML, but it was designed to store and transport data. It is intended to be both machine-readable and human-readable. VRA Core XML data can also be converted to RDF triples using an XSLT stylesheet.
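To show what "machine-readable and human-readable" can look like in practice, here is a minimal sketch that reads a small XML record with Python's standard library. The record is simplified and illustrative: its element and attribute names are modeled loosely on VRA Core and are not guaranteed to be schema-valid, and the identifiers and values are made up.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative record. Element and attribute names are modeled
# loosely on VRA Core and are NOT guaranteed to be schema-valid.
RECORD = """
<vra>
  <work id="w_1">
    <titleSet><title>Water Lilies</title></titleSet>
    <agentSet><agent><name>Claude Monet</name></agent></agentSet>
  </work>
  <image id="i_1">
    <titleSet><title>Full view</title></titleSet>
    <relationSet><relation type="imageOf" relids="w_1"/></relationSet>
  </image>
</vra>
"""

root = ET.fromstring(RECORD)
work = root.find("work")
image = root.find("image")

# Pull descriptive values out of the work record.
work_title = work.findtext("titleSet/title")
agent_name = work.findtext("agentSet/agent/name")

# Follow the image's relation back to the work it documents.
related_work_id = image.find("relationSet/relation").get("relids")

print(work_title, "by", agent_name)
print("image", image.get("id"), "documents work", related_work_id)
```

The same text that a person can read in an editor is parsed here into a tree that a program can query, which is the property that makes XML useful for storing and transporting records.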
A: RDF stands for Resource Description Framework. It is a framework for describing resources on the web. Its subject–predicate–object triple structure is one of the technologies that Linked Data is built upon. RDF can be serialized in many formats, such as RDF/XML, Turtle, and JSON-LD.
Members of the VRA Core Oversight Committee (now a part of the VRA Cataloging and Metadata Standards Committee) developed the VRA RDF Ontology in response to Linked Data initiatives. For project details, see its GitHub site at https://github.com/mixterj/VRA-RDF-Project
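The subject–predicate–object structure described above can be sketched in a few lines of plain Python. This is only an illustration and is unrelated to the VRA RDF Ontology: the example.org URIs are placeholders, the Dublin Core predicates are used purely as familiar examples, and to_ntriples is a hypothetical helper that writes the simple N-Triples serialization by hand rather than using an RDF library.

```python
# Each statement is a (subject, predicate, object) triple.
# The example.org URIs are placeholders; the dcterms predicates are real
# Dublin Core terms, used here only for illustration.
DCTERMS = "http://purl.org/dc/terms/"
WORK = "http://example.org/work/1"

triples = [
    # Object is a literal, so it is written in double quotes.
    (WORK, DCTERMS + "title", '"Water Lilies"'),
    # Object is another resource, so it is written as a URI in angle brackets.
    (WORK, DCTERMS + "creator", "<http://example.org/agent/monet>"),
]


def to_ntriples(statements):
    """Serialize triples as N-Triples lines of the form: <s> <p> o ."""
    lines = []
    for subject, predicate, obj in statements:
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return "\n".join(lines)


print(to_ntriples(triples))
```

Because the creator is identified by a URI rather than a text string, another data set that uses the same URI is automatically talking about the same agent, which is what allows data sets to be linked.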
Q: What is Linked Open Data?
A: Linked Open Data is an initiative to make data sets openly accessible and able to communicate with one another on the web. A list of resources on web-enabled data (RDF/LOD, IIIF) is available in the MyVRA Member Portal.
Q: How do data value standards differ from classification schemes?
A: Classification systems are sometimes treated as data value standards because their elements can be used as values. However, in the visual resources community, the use of classification tends to be collection-specific and dictated by the changing needs of users. Some classification terms have migrated from analog filing schemes to digital information. Museum classification may reflect departments or collection divisions. While some standards have a classification element, VRA Core does not; instead, classification is determined locally. Chapter 7 (Class) of Cataloging Cultural Objects discusses cataloging Class information.
Q: Can multiple data standards be used within one institution?
A: Yes. Doing so requires mapping: either using an existing standard’s “crosswalk” or developing a local crosswalk that identifies fields holding equivalent data values across multiple standards. Dublin Core is frequently used as a sort of “lowest common denominator” in this scenario, as it addresses the core asset and descriptive information of who, what, where, and when. This allows common searching and discovery of assets in an institutional system or digital asset management system (DAM); richer, more detailed standards may then be deployed at the collection or discipline-specific department level. In addition, an institution usually has to apply different standards to describe different materials, e.g., biological specimens versus visual resource materials. A bibliography of resources on crosswalks, mapping, database design, and data output is available through the MyVRA Member Portal.
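A crosswalk of the kind described above can be sketched as a simple lookup table. The local field names and the crosswalk helper below are hypothetical, invented for illustration; the target names are Dublin Core elements, but a real project would follow a published or locally documented mapping rather than this one.

```python
# Hypothetical crosswalk: local field name -> Dublin Core element name.
# The local field names are invented for this example.
LOCAL_TO_DC = {
    "work_title": "title",
    "artist": "creator",
    "creation_date": "date",
    "site": "coverage",
}


def crosswalk(record, mapping):
    """Map a local record onto the target standard, collecting unmapped fields."""
    mapped, unmapped = {}, []
    for field_name, value in record.items():
        target = mapping.get(field_name)
        if target is None:
            # No equivalent in the target standard; report it instead of guessing.
            unmapped.append(field_name)
        else:
            # Several local fields may map to one target element, so keep a list.
            mapped.setdefault(target, []).append(value)
    return mapped, unmapped


local_record = {
    "work_title": "Water Lilies",
    "artist": "Claude Monet",
    "creation_date": "1906",
    "storage_bin": "B-12",
}
dc_record, leftovers = crosswalk(local_record, LOCAL_TO_DC)
print(dc_record)
print(leftovers)
```

Reporting unmapped fields separately makes the loss of detail visible: local administrative data such as the storage location has no place in the shared record and stays in the richer local standard.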
Q: My collection contains surrogates of objects which are neither art nor architecture; they include such items or topics as musical instruments, geography, natural science, etc. Will VRA Core work for this use?
A: VRA Core is a data standard for the description of works of visual culture as well as the images that document them. Because of the needs of teaching collections based on image surrogates, VRA Core is uniquely able to capture descriptive information about both the work and the image, and to indicate relationships between the two. VRA Core’s data model, which captures one-to-many relationships between any two of collection, work, and image, can be applied to describing objects from many disciplines. Find more on VRA Core and its data model online.
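The one-to-many relationships in this data model can be pictured with a small sketch. The class and attribute names below are assumptions made for illustration and are not VRA Core element names; the sketch shows only the collection-to-work and work-to-image directions, using a musical instrument as the example object.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Image:
    image_id: str
    title: str


@dataclass
class Work:
    work_id: str
    title: str
    # One work may be documented by many images.
    images: List[Image] = field(default_factory=list)


@dataclass
class Collection:
    collection_id: str
    title: str
    # One collection may hold many works.
    works: List[Work] = field(default_factory=list)


lute = Work("w_1", "Lute")
lute.images.append(Image("i_1", "Front view"))
lute.images.append(Image("i_2", "Detail of rosette"))
instruments = Collection("c_1", "Musical instruments", works=[lute])

image_count = sum(len(w.images) for w in instruments.works)
print(image_count)
```

Nothing in this structure is specific to art or architecture, which is why the same pattern can describe surrogates of objects from other disciplines.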
Q: What do I do if I need more fields than are covered by the VRA Core (or another standard)?
A: Just as there are different types of data standards, there are different types of data: descriptive, administrative, and technical. Most data standards will not cover all local administrative needs; rather, they identify the essential elements that describe the data common to most collections and that are shareable. Most curators of visual resources collections will decide to incorporate additional fields in their local databases in order to adequately describe and manage their holdings. Commonly available guidelines, including the Categories for the Description of Works of Art (CDWA), Cataloging Cultural Objects (CCO), MARC, and the CIDOC categories, are recommended for guidance when developing local fields.
Q: Is the current VRA Core 4.0 definitive or will it change?
A: VRA Core has been released in versions 1.0 through 4.0; 4.0 is the current version. Core 4.0 is the first version to have an XML schema written for it. VRA Core is developed and maintained by the VRA Core Oversight Committee (currently a part of the VRA Cataloging and Metadata Standards Committee); any changes would be released under a new version number. For questions and inquiries about the Core standard, please email the committee at vra-cams@googlegroups.com.
Additional Sources
Background Questions
Q: What are standards?
A: Standards are mutually agreed-upon statements that help control an action or product.
Q: Are there different types of standards?
A: Yes, they include technical standards, conventions, and guidelines. The most exacting standards, technical standards, are codified forms of common practice that yield consistent results. Conventions are similar to technical standards; however, they are intended to accommodate variation in local practice. Guidelines are criteria against which products, systems, or programs can be measured or evaluated.
Q: Where do standards come from?
A: Standards originate in various sectors and exist as both proprietary and open standards. The commercial sector develops standards for such products as computer hardware. Organizations such as the International Organization for Standardization (ISO), the American National Standards Institute (ANSI), and the British Standards Institution (BSI) are official standard-makers. Professional organizations such as the Visual Resources Association (VRA) and the Art Libraries Society of North America (ARLIS/NA) draft standards in an attempt to meet the needs of their constituencies. Standards are also developed in conjunction with collaborative projects, such as the core record developed by Colum Hourihane for the Van Eyck project.
Acknowledgment
The original FAQ was first posted in 1996 with many of the questions excerpted from the papers which Eric Childress, Elisa Lanzi and Roy McKeown presented at the session Faces and Names: Standards for Navigating Visual Databases, at the XXIX International Congress of the History of Art: VRA Satellite Meeting (1996, Amsterdam). The basic concepts remain the same; the page was updated in 2020 to reflect new standards and transmission mechanisms.