Holly Becker edited this page Mar 31, 2017 · 3 revisions

The METS reader/writer (metsrw) is a library to make it easier to work with METS XML files in Python.

Goals

Design

  • Easy to use API
  • Abstract away the details of XML
  • Read and write a METS file
  • Validate a METS file against a METS profile
  • Works with METS files from a variety of sources
  • Plugins to work with other standards inside METS, e.g. PREMIS, Dublin Core, etc.

Technical

  • Python 2 & 3 support
  • Fully tested
  • Modular
  • Documented

Use Cases & Proposed API

Please add and update use cases! If you think a use case is a bad idea, please suggest it for removal. Ideas for what the API for a use case could look like are welcome too!

Creation

Read a METS file

mets = metsrw.METSDocument.fromfile('path/to/file')  # Reads a file
mets = metsrw.METSDocument.fromstring('<mets document>')  # Parses a string
mets = metsrw.METSDocument.fromtree(tree)  # Parses an lxml.etree element or ElementTree (tree is an already-parsed document)

Create a new METS file

mets = metsrw.METSDocument()

Search

Note: "find" returns a single file; "filter" returns multiple files.

Find a file by FILEID, GROUPID, FLocat@href, structMap/div@LABEL

Filter files by fileGrp@USE, structMap/div@TYPE

Find/filter dmdSec/techMD/digiprovMD/sourceMD/rightsMD by tag attributes (e.g. a digiprovMD with DC metadata, the PREMIS:OBJECT, all rightsMDs)

Find/filter objects by above
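One idea for how the find/filter split could behave, sketched with the standard library's xml.etree.ElementTree rather than metsrw itself. The helper names `find_file` and `filter_files` and the sample document are made up for illustration and are not part of any existing API.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
NS = {"mets": METS_NS}

# Made-up sample document with one original and one preservation file.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                       xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="original">
      <mets:file ID="file-1" GROUPID="group-1">
        <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="objects/cat.tiff"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="preservation">
      <mets:file ID="file-2" GROUPID="group-1">
        <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="objects/cat.jp2"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def find_file(root, file_id=None, href=None):
    """Hypothetical 'find': return the first matching mets:file, or None."""
    for file_el in root.iterfind(".//mets:file", NS):
        if file_id is not None and file_el.get("ID") != file_id:
            continue
        if href is not None:
            flocat = file_el.find("mets:FLocat", NS)
            if flocat is None or flocat.get("{%s}href" % XLINK_NS) != href:
                continue
        return file_el
    return None


def filter_files(root, use):
    """Hypothetical 'filter': return every mets:file inside fileGrps with this USE."""
    return root.findall(".//mets:fileGrp[@USE='%s']/mets:file" % use, NS)


root = ET.fromstring(SAMPLE)
cat = find_file(root, href="objects/cat.tiff")
originals = filter_files(root, "original")
```

Returning `None` from find and a (possibly empty) list from filter keeps the two cases easy to tell apart; raising an exception on a failed find would be an equally reasonable choice.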

Create & Update

Add a dmdSec with Dublin Core metadata
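A rough sketch of the XML this use case needs to produce, again using only xml.etree.ElementTree. `add_dc_dmdsec` is a hypothetical helper name, and the sketch simply appends the dmdSec to the root without enforcing the element order the METS schema expects.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("dc", DC_NS)


def mets_tag(name):
    """Qualify a local name with the METS namespace."""
    return "{%s}%s" % (METS_NS, name)


def add_dc_dmdsec(mets_root, dmd_id, fields):
    """Hypothetical helper: append a dmdSec wrapping simple Dublin Core fields.

    `fields` is a list of (element name, text) pairs, e.g. ("title", "...").
    """
    dmdsec = ET.SubElement(mets_root, mets_tag("dmdSec"), ID=dmd_id)
    mdwrap = ET.SubElement(dmdsec, mets_tag("mdWrap"), MDTYPE="DC")
    xmldata = ET.SubElement(mdwrap, mets_tag("xmlData"))
    for name, value in fields:
        el = ET.SubElement(xmldata, "{%s}%s" % (DC_NS, name))
        el.text = value
    return dmdsec


root = ET.Element(mets_tag("mets"))
dmdsec = add_dc_dmdsec(
    root, "dmdSec_1", [("title", "Cat photos"), ("creator", "A. Archivist")]
)
xml_text = ET.tostring(root, encoding="unicode")
```

Taking the fields as a list of pairs rather than a dict allows repeated elements (several `dc:creator` entries, for example) and keeps their order.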

Add a dmdSec with other metadata

Add multiple dmdSecs for the same object

Replace a dmdSec with an updated one and version the change
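One possible convention for versioning, sketched under assumptions: the old dmdSec stays in the file, both sections share a GROUPID, and the STATUS attribute distinguishes them. The status values "superseded" and "current" and the helper name `replace_dmdsec` are invented here; METS leaves STATUS as free text.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)
DMDSEC = "{%s}dmdSec" % METS_NS


def replace_dmdsec(mets_root, old_id, new_id, created):
    """Hypothetical helper: supersede dmdSec `old_id` with a new, empty dmdSec.

    The old section is kept (not deleted) so the history stays in the file.
    Raises StopIteration if no dmdSec has the given ID.
    """
    old = next(el for el in mets_root.iter(DMDSEC) if el.get("ID") == old_id)
    # Assumption: when no GROUPID exists yet, the first ID doubles as the group ID.
    group = old.get("GROUPID") or old_id
    old.set("GROUPID", group)
    old.set("STATUS", "superseded")  # assumed vocabulary
    new = ET.Element(
        DMDSEC, ID=new_id, GROUPID=group, CREATED=created, STATUS="current"
    )
    # Insert directly after the old section so all dmdSecs stay together.
    mets_root.insert(list(mets_root).index(old) + 1, new)
    return new


root = ET.Element("{%s}mets" % METS_NS)
ET.SubElement(root, DMDSEC, ID="dmdSec_1")
updated = replace_dmdsec(root, "dmdSec_1", "dmdSec_2", "2017-03-31T12:00:00")
```

The caller would then fill the returned element with the updated metadata and point the relevant DMDID attributes at the new ID.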

Add an amdSec

Add multiple amdSecs for the same object

Add a techMD/digiprovMD/sourceMD/rightsMD

Add multiple techMD/digiprovMD/sourceMD/rightsMD for the same object

Add techMD/digiprovMD/sourceMD/rightsMD to a specific amdSec when there are multiple amdSecs

Replace a techMD/digiprovMD/sourceMD/rightsMD with an updated one and version the change

Add a techMD/digiprovMD/sourceMD/rightsMD containing PREMIS metadata (event, agent, object, etc.)

Add an mdWrap or mdRef inside a techMD/digiprovMD/sourceMD/rightsMD
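A sketch of how one call could cover both the embedded (mdWrap) and referenced (mdRef) cases. `add_md` and its parameters are hypothetical, the PREMIS element is an empty placeholder, and the example URL is made up.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(name):
    """Qualify a local name with the METS namespace."""
    return "{%s}%s" % (METS_NS, name)


def add_md(amdsec, kind, md_id, mdtype, payload=None, href=None):
    """Hypothetical helper: add a techMD/digiprovMD/sourceMD/rightsMD.

    Pass `payload` (an Element) to embed it via mdWrap/xmlData, or
    `href` to point at external metadata via mdRef.
    """
    if (payload is None) == (href is None):
        raise ValueError("give exactly one of payload or href")
    section = ET.SubElement(amdsec, m(kind), ID=md_id)
    if payload is not None:
        mdwrap = ET.SubElement(section, m("mdWrap"), MDTYPE=mdtype)
        ET.SubElement(mdwrap, m("xmlData")).append(payload)
    else:
        ET.SubElement(section, m("mdRef"), {
            "MDTYPE": mdtype,
            "LOCTYPE": "URL",
            "{%s}href" % XLINK_NS: href,
        })
    return section


amdsec = ET.Element(m("amdSec"), ID="amdSec_1")
premis_object = ET.Element("{http://www.loc.gov/premis/v3}object")  # placeholder
tech = add_md(amdsec, "techMD", "techMD_1", "PREMIS:OBJECT", payload=premis_object)
rights = add_md(amdsec, "rightsMD", "rightsMD_1", "OTHER",
                href="https://example.org/rights.xml")
```

Passing the target amdSec explicitly also covers the case where a document has several amdSecs and the caller needs to choose one.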

Add a new object

Add a new object with children

Set an object's: fileSec href, USE & CHECKSUM; structMap LABEL & TYPE; FILEID; GROUPID

Sensibly derive the above from a subset of the information

Set where DMDIDs and ADMIDs appear (structMap vs fileSec)

Define a structMap

Define a second structMap with a different structure

Define a second structMap with the same structure but different labels
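The structMap use cases above could share one builder, sketched here with xml.etree.ElementTree. The nested-tuple input format, the `labels` override, and the name `build_structmap` are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def m(name):
    """Qualify a local name with the METS namespace."""
    return "{%s}%s" % (METS_NS, name)


def build_structmap(mets_root, map_id, map_type, tree, labels=None):
    """Hypothetical helper: build a structMap from nested tuples.

    Each node is (name, div_type, file_id_or_None, children). `labels`
    optionally maps a name to a different LABEL, so a second structMap
    can reuse the same structure with other labels.
    """
    labels = labels or {}
    structmap = ET.SubElement(mets_root, m("structMap"), ID=map_id, TYPE=map_type)

    def add_div(parent, node):
        name, div_type, file_id, children = node
        div = ET.SubElement(
            parent, m("div"), TYPE=div_type, LABEL=labels.get(name, name)
        )
        if file_id is not None:
            ET.SubElement(div, m("fptr"), FILEID=file_id)
        for child in children:
            add_div(div, child)

    add_div(structmap, tree)
    return structmap


TREE = ("objects", "Directory", None, [
    ("cat.tiff", "Item", "file-1", []),
    ("dog.tiff", "Item", "file-2", []),
])

root = ET.Element(m("mets"))
physical = build_structmap(root, "structMap_1", "physical", TREE)
logical = build_structmap(root, "structMap_2", "logical", TREE,
                          labels={"cat.tiff": "Photograph of a cat"})
```

A second structMap with a different structure would simply be another call with a different tree.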

Validation

Specify a METS profile to affect parsing & validation

Add a new METS profile

Validate a METS file against a profile
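A toy illustration of the validation idea: a "profile" is modelled as a list of named checks, and validating returns the names of the checks that fail. This representation is an assumption for discussion only; a real implementation would more likely lean on XML Schema or Schematron, and `validate` and `EXAMPLE_PROFILE` are made-up names.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
NS = {"mets": METS_NS}

# A made-up "profile": (human-readable rule, check on the parsed root) pairs.
EXAMPLE_PROFILE = [
    ("has exactly one fileSec",
     lambda root: len(root.findall("mets:fileSec", NS)) == 1),
    ("has at least one structMap",
     lambda root: len(root.findall("mets:structMap", NS)) >= 1),
    ("every file has an ID",
     lambda root: all(f.get("ID") for f in root.iterfind(".//mets:file", NS))),
]


def validate(root, profile):
    """Hypothetical validator: return the descriptions of all failed rules."""
    return [rule for rule, check in profile if not check(root)]


# This document has a fileSec but no structMap, and its file lacks an ID.
doc = ET.fromstring(
    '<mets:mets xmlns:mets="http://www.loc.gov/METS/">'
    "<mets:fileSec><mets:fileGrp><mets:file/></mets:fileGrp></mets:fileSec>"
    "</mets:mets>"
)
errors = validate(doc, EXAMPLE_PROFILE)
```

Returning every failed rule, instead of stopping at the first, gives the caller a complete report in one pass.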

Output

Write a METS file

mets.write('path/to/output')  # Writes to a file
mets.tofile('path/to/output')  # Writes to a file
mets.tostring()  # Outputs a string
mets.serialize()  # Outputs an lxml.etree element
mets.totree()  # Outputs an lxml.etree element or ElementTree

Other standards

METS can wrap other standards, notably PREMIS & Dublin Core. Since this is the METS reader/writer, support for these should be kept in optional plugins that are used only as needed.

Possible external libraries: