
A research project on archiving audiovisual (AV) materials, led by Stiftung Deutsche Kinemathek (SDK) and Zuse Institute Berlin (ZIB).


Problem description:

During the digitization of film material, various image and sound objects are produced, sometimes several months apart. It is not always possible or feasible to combine these items into a single Archival Information Package (OAIS AIP, see footnote 1) at a given time. Nevertheless, the data arriving at different times should be transferred continuously to long-term archiving (LTA).

Proposed approach:

Persistent identifiers (PIDs) will be assigned to all archival packages at the work, version and data object levels to provide the respective references. This makes the relationships of the data packages to each other and to a superordinate structure unambiguous. Each AIP contains the Manifestation/Item Physical Description metadata (according to FIAF D7) as well as the core metadata for identifying the work, represented as the EN 15744 minimum set. The metadata will be searchable via a user interface.

As part of the long-term archiving strategy of the SDK-ZIB project, the metadata will be automatically extracted from the SDK-internal Adlib database and written to a metadata file in METS format, which comprehensively describes the data package. METS is the established standard in the long-term archiving community for capturing structural and descriptive information.

The METS file serves as a container in which the metadata, which may be in different formats, can be embedded.

Each METS container in a data package is intended to contain:

  • the checksums collected during data transfer at file level
  • the EN 15744 minimum set for the identification of the filmographic work
  • the Manifestation/Item Physical Description metadata (according to FIAF D7) for the respective data package
  • the persistent identifiers from the PID system
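
The container contents listed above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: it uses the METS elements `dmdSec`, `mdWrap`, `fileSec`, `file` and `FLocat` with the `CHECKSUM` and `CHECKSUMTYPE` attributes, but the function name `build_mets`, the placeholder element `workIdentifier`, the choice of SHA-256 and the example PIDs are assumptions. The result is not schema-valid METS (for instance, the mandatory `structMap` and the FIAF physical description are omitted).

```python
import hashlib
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(work_pid: str, package_pid: str, files: dict[str, bytes]) -> ET.Element:
    """Build a minimal METS tree for one data package.

    `files` maps a file name to its content; a real package would
    stream the files from disk instead of holding them in memory.
    """
    mets = ET.Element(q("mets"), {"OBJID": package_pid})

    # Descriptive metadata: placeholder for the embedded EN 15744 minimum set.
    dmd = ET.SubElement(mets, q("dmdSec"), {"ID": "DMD_WORK"})
    wrap = ET.SubElement(dmd, q("mdWrap"), {"MDTYPE": "OTHER", "OTHERMDTYPE": "EBUCORE"})
    xml_data = ET.SubElement(wrap, q("xmlData"))
    # Hypothetical placeholder element; the real payload would be EBUCore XML.
    ET.SubElement(xml_data, "workIdentifier").text = work_pid

    # File section: one entry per file, with its checksum recorded at file level.
    file_grp = ET.SubElement(ET.SubElement(mets, q("fileSec")), q("fileGrp"))
    for i, (name, content) in enumerate(sorted(files.items()), start=1):
        file_el = ET.SubElement(file_grp, q("file"), {
            "ID": f"FILE_{i:04d}",
            "CHECKSUM": hashlib.sha256(content).hexdigest(),
            "CHECKSUMTYPE": "SHA-256",
        })
        ET.SubElement(file_el, q("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": name})
    return mets
```

Calling `ET.tostring(build_mets(...), encoding="unicode")` on the result yields the XML text that would be stored alongside the data package.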

This approach achieves two objectives:

Persistent identifiers make it possible to transfer individual parts of, for example, a restoration event to the long-term archive without losing the relationships between the parts and to the superordinate structure. The PID system is not run as a proprietary system inside the institution but outside the archive, on the web. The use of established standards and openness to other identifier systems promote broad acceptance in the community. Through the identifier system, the work, version or technical metadata also become visible to the outside world.

A user interface allows searching the metadata at all recorded levels. A PID system with standardised core metadata has great potential to become a kind of union catalogue of institutions collecting AV materials. A search in the interface of the PID system then leads, for example, to the much more extensive metadata of the SDK database on its website. The relations of the metadata in the PID system to the data packages in the long-term archive and to the SDK database are preserved by the issued persistent identifiers. Using these references, it will also be possible to request individual parts of a publication event from the digital long-term archive.

Tasks:

1: Metadata of the identifier system

The EN 15744 standard is used for the basic identification of audiovisual works at the work level (max. 15 fields). The minimum set defines the data elements that enable interoperability between multilingual cinematographic catalogues. So far, no standardized XML implementation of EN 15744 is available. At present, institutions develop their own XML schemas for EN 15744 or use elements of EN 15907 in their own exchange schemas.

The aim is to represent the EN 15744 metadata set in a standardized XML schema so that it can be embedded in a METS file. For this purpose, EN 15744 is implemented in EBUCore and serialized in the EBUCore XML schema.

EBUCore is a widespread, active and accepted international metadata standard for media assets. It is therefore very well suited as a target format for the technical as well as descriptive metadata to be stored with the archive packages. Since the filmographic data from EN 15744 is a small, uncomplicated set, it also fits into EBUCore. EN 15744 is implemented in EBUCore according to the specifications and is not changed. An additional level, "version", is added to the metadata set; it is extracted from the manifestation fields in the SDK Adlib system. The version can be used to assign data objects to a manifestation. The work, the version and the data object each receive a PID.
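
The three PID levels and their references can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class names `PidRecord` and `PidRegistry`, the level names as strings and the rule that each data object points to exactly one version (and each version to exactly one work) are illustrative choices, not part of the project specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PidRecord:
    pid: str
    level: str                     # "work", "version" or "data_object"
    parent: Optional[str] = None   # PID of the superordinate record, if any


class PidRegistry:
    """Hypothetical in-memory stand-in for the external PID system."""

    # Which level each level must point to as its parent.
    PARENT_LEVEL = {"work": None, "version": "work", "data_object": "version"}

    def __init__(self) -> None:
        self._records: dict[str, PidRecord] = {}

    def register(self, pid: str, level: str, parent: Optional[str] = None) -> PidRecord:
        expected = self.PARENT_LEVEL[level]
        if expected is None:
            if parent is not None:
                raise ValueError("a work has no superordinate record")
        elif parent not in self._records or self._records[parent].level != expected:
            raise ValueError(f"a {level} must reference an existing {expected}")
        record = PidRecord(pid, level, parent)
        self._records[pid] = record
        return record

    def ancestors(self, pid: str) -> list[str]:
        """Follow the parent references up to the work level."""
        chain = []
        current = self._records[pid].parent
        while current is not None:
            chain.append(current)
            current = self._records[current].parent
        return chain

    def children(self, pid: str) -> list[str]:
        """Return the PIDs that directly reference `pid` as their parent."""
        return sorted(r.pid for r in self._records.values() if r.parent == pid)
```

Because every record carries only a reference to its parent, a data object that arrives months after the others can be registered at any time and still be resolved to its version and work via `ancestors`.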

2: Building a test system for research

Based on the Handle System, a test data set with the metadata fields defined in task 1 is developed.
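
A test record for such a data set might be assembled as shown below. The `handle`/`values`/`index`/`type`/`data` layout is modelled on the JSON representation used by the Handle.net server REST interface and should be checked against the server documentation. Everything else is an assumption made for illustration: the function name `handle_record`, the prefix `12345`, the URL, the type names and the sample values.

```python
import json


def handle_record(handle: str, landing_url: str, metadata: dict[str, str]) -> dict:
    """Assemble one handle record as a plain dictionary.

    Index 1 holds the URL the handle resolves to; the metadata fields
    follow from index 100 upwards (an arbitrary, hypothetical convention).
    """
    values = [{
        "index": 1,
        "type": "URL",
        "data": {"format": "string", "value": landing_url},
    }]
    for offset, (field_name, value) in enumerate(sorted(metadata.items())):
        values.append({
            "index": 100 + offset,
            "type": field_name,  # hypothetical type names, one per metadata field
            "data": {"format": "string", "value": value},
        })
    return {"handle": handle, "values": values}


# Sample values for illustration only, not real catalogue data.
record = handle_record(
    "12345/work-0001",
    "https://example.org/work/0001",
    {"TITLE": "Example title", "YEAR_OF_REFERENCE": "1927"},
)
payload = json.dumps(record, indent=2)
```

The resulting `payload` string is the kind of document that would be sent to a test Handle server when registering the record.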

3: Interface between the PID system and the Adlib database

When works and their subordinate parts are created in the SDK database, identifiers are generated automatically via an interface to the PID system and transferred to Adlib. The associated descriptive metadata is automatically extracted from Adlib and linked to the identifiers in the PID system (see footnote 2).
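
The two directions of this interface (PID into the database, metadata into the PID system) could be sketched as follows. No real Adlib or PID system API is used here: `InMemoryPidService`, `on_record_created`, `CORE_FIELDS` and the field names are hypothetical stand-ins, and the database record is modelled as a plain dictionary.

```python
import uuid

# Hypothetical subset of database fields that is published to the PID system.
CORE_FIELDS = {"title", "year_of_reference"}


class InMemoryPidService:
    """Stand-in for the external PID system (not a real client library)."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.metadata: dict[str, dict[str, str]] = {}

    def mint(self) -> str:
        """Create a new identifier under the configured prefix."""
        pid = f"{self.prefix}/{uuid.uuid4()}"
        self.metadata[pid] = {}
        return pid

    def attach_metadata(self, pid: str, metadata: dict[str, str]) -> None:
        """Link descriptive metadata to an existing identifier."""
        self.metadata[pid].update(metadata)


def on_record_created(record: dict[str, str], pid_service: InMemoryPidService) -> dict[str, str]:
    """Handle a newly created database record.

    1. Mint a PID and write it back to the record.
    2. Extract the core metadata and link it to the PID.
    """
    pid = pid_service.mint()
    updated = {**record, "pid": pid}
    core = {key: value for key, value in record.items() if key in CORE_FIELDS}
    pid_service.attach_metadata(pid, core)
    return updated
```

Only the fields named in `CORE_FIELDS` leave the database, so internal notes and other institution-specific fields stay out of the publicly visible PID system.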



1 Archival Information Package: an archive package consisting of content and metadata in LTA. The content may be an image, for example, while the metadata provides further information about the object.

2 Both Adlib and all common PID systems have well-documented interfaces that allow metadata to be exchanged automatically.
