
Maintainer's Sheet

Matthieu Gautier edited this page Jan 20, 2021 · 2 revisions

Description

This page lists all assumptions, requirements, features, components, aspects, etc. related to libzim that its maintainer must know about. When making or reviewing a non-local enhancement or change to libzim, scan this list and ask yourself whether your change affects any of its items. Of course, this list must be kept up to date.

Requirements

  • Cross-platformness (note: a limited subset of functionality may be exempt from this requirement)

    • Supported platforms:
      • Linux
      • Android
      • iOS
      • MacOS
      • Windows
      • FreeBSD (?)
    • Cross-platformness of the source code
    • Cross-platformness of ZIM archives
  • Backward compatibility

    • Read ZIM archives of previous versions
  • Forward compatibility

    • Can ZIM archives created by the current version be read by previous versions of the library?
    • If forward compatibility is broken, the major version must be changed.
  • Thread safety

    • At reading: libzim can be called from several threads, so critical sections must be protected. For now, libzim doesn't launch any threads when reading.
    • At creation: libzim creates threads to parallelize writing, so some (user) code will be called from different threads.
  • Bindings to high level programming languages

    • Python
    • node
  • Low memory usage: libzim (reading) must be usable on platforms with little available memory. (There is no strict specification of the actual memory usage.)

  • API/ABI stability: libzim code should avoid introducing API/ABI breaks. If a break is unavoidable, it must be clearly identified (by a major version number change).
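The reading-side thread-safety requirement can be illustrated with a minimal C++ sketch. When several threads share an object whose internal state is filled lazily, every access to that state belongs in a critical section. `SharedLookup` and `read_concurrently` are hypothetical names used only for illustration. They are not libzim classes, and the sketch does not claim to mirror how libzim protects its own state.

```cpp
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical illustration (not libzim code): a lazily filled lookup table
// shared by several reader threads. The map is mutated on first access, so
// every access goes through the same mutex (the "critical section").
class SharedLookup {
 public:
  explicit SharedLookup(std::function<std::string(int)> loader)
      : loader_(std::move(loader)) {}

  std::string get(int key) {
    std::lock_guard<std::mutex> lock(mutex_);  // protects map_ and loads_
    auto it = map_.find(key);
    if (it != map_.end()) return it->second;
    ++loads_;  // first access to this key: load and remember it
    return map_.emplace(key, loader_(key)).first->second;
  }

  int loads() {
    std::lock_guard<std::mutex> lock(mutex_);
    return loads_;
  }

 private:
  std::function<std::string(int)> loader_;
  std::mutex mutex_;
  std::map<int, std::string> map_;
  int loads_ = 0;
};

// Simulates several reader threads requesting the same keys concurrently.
void read_concurrently(SharedLookup& lookup, int thread_count, int keys) {
  std::vector<std::thread> threads;
  for (int t = 0; t < thread_count; ++t) {
    threads.emplace_back([&lookup, keys] {
      for (int k = 0; k < keys; ++k) lookup.get(k);
    });
  }
  for (auto& th : threads) th.join();
}
```

Holding the lock while the loader runs keeps the sketch simple but serializes all loads. A real reader might release the lock during expensive work (such as decompression), at the cost of handling duplicate loads.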

Non-Requirements

There is no commitment to support the items in this section.

  • Handling of ZIM archives created by an official version of libzim that was declared buggy or obsolete

Public features

  • Access by path/url order
  • Access by title order
  • Access by efficient/internal order
  • Support for direct access to (some) items
  • Support for multipart ZIM files
  • Compression
    • LZMA
    • Zstd
  • Search
    • by title (suggestion mode)
    • full text
  • ZIM namespaces
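Access by title order can be pictured as an indirection. Entries stay in path order, and a separate table lists entry indices sorted by title. The sketch below assumes that model, which fits the "Table of title ordered listing" described further down, and shows a binary search over the title-ordered index. `Entry`, `build_title_index` and `find_by_title` are hypothetical names, not libzim API.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory model: entries are stored in path order, and a
// separate table lists entry indices sorted by title.
struct Entry {
  std::string path;
  std::string title;
};

// Builds the title-ordered index from path-ordered entries.
std::vector<uint32_t> build_title_index(const std::vector<Entry>& entries) {
  std::vector<uint32_t> idx(entries.size());
  for (uint32_t i = 0; i < idx.size(); ++i) idx[i] = i;
  std::stable_sort(idx.begin(), idx.end(), [&entries](uint32_t a, uint32_t b) {
    return entries[a].title < entries[b].title;
  });
  return idx;
}

// Finds an entry by exact title with a binary search over the title index.
std::optional<uint32_t> find_by_title(const std::vector<Entry>& entries,
                                      const std::vector<uint32_t>& title_index,
                                      const std::string& title) {
  auto it = std::lower_bound(
      title_index.begin(), title_index.end(), title,
      [&entries](uint32_t idx, const std::string& t) {
        return entries[idx].title < t;
      });
  if (it != title_index.end() && entries[*it].title == title) return *it;
  return std::nullopt;
}
```

Because `lower_bound` stops at the first title that is not less than the query, the same search also yields the starting point for prefix-style iteration in title order.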

Internal/implementation-specific features

  • Caching of clusters
  • Caching of dirents
  • Usage of mmap
  • Lazy/incremental decompression of clusters
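Cluster and dirent caches are commonly bounded with a least-recently-used (LRU) policy. The following is a generic LRU sketch under that assumption. `LruCache` is a hypothetical class, and this page does not state which eviction policy or capacity libzim actually uses. Note that `get` reorders the internal list, so even lookups mutate the cache. Under concurrent reading, it would need the same critical-section protection as any other shared state.

```cpp
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Minimal least-recently-used cache (hypothetical; not libzim's own class).
// Keys could be cluster or dirent indices; values the decoded objects.
template <typename Key, typename Value>
class LruCache {
 public:
  explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

  std::optional<Value> get(const Key& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return std::nullopt;
    // Move the entry to the front: it is now the most recently used.
    items_.splice(items_.begin(), items_, it->second);
    return it->second->second;
  }

  void put(const Key& key, Value value) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      it->second->second = std::move(value);
      items_.splice(items_.begin(), items_, it->second);
      return;
    }
    if (capacity_ == 0) return;
    if (items_.size() == capacity_) {
      // Evict the least recently used entry (back of the list).
      index_.erase(items_.back().first);
      items_.pop_back();
    }
    items_.emplace_front(key, std::move(value));
    index_[key] = items_.begin();
  }

  std::size_t size() const { return items_.size(); }

 private:
  using List = std::list<std::pair<Key, Value>>;
  std::size_t capacity_;
  List items_;  // front = most recently used
  std::unordered_map<Key, typename List::iterator> index_;
};
```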

ZIM file content & layout

  • Table of dirent pointers: required; can be anywhere in the archive; currently created at the end (before the checksum)
  • Table of title ordered listing: required; can be anywhere in the archive
  • Table of cluster pointers: required; can be anywhere in the archive
  • MIME list: required; at a fixed location, immediately following the ZIM header. It can technically be empty, but every item MUST have a MIME type, so if there is content, the MIME list is not empty.
  • Checksum: optional; if present, is at the very end of the archive (16 bytes long)
  • Clusters: technically optional (if there is no content); can be anywhere in the archive (cluster offsets are stored in the "Table of cluster pointers"). Currently created contiguously, starting at offset 1024.
  • Dirents: technically optional (if there are no entries); can be anywhere in the archive (dirent offsets are stored in the "Table of dirent pointers").
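The placement rules above can be checked from the fixed-size header. This sketch assumes the header layout from the openZIM format description: 80 bytes, little-endian, magic number 72173914, with the positions of the pointer tables, MIME list and checksum stored as 64-bit offsets. `ZimHeader`, `parse_header` and `layout_is_consistent` are hypothetical names. Whether a checksum is present is passed in explicitly rather than derived from the file.

```cpp
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <stdexcept>
#include <vector>

constexpr uint32_t kZimMagic = 72173914;  // bytes "ZIM\x04" read little-endian
constexpr uint64_t kHeaderSize = 80;      // assumed header size

// Reads a little-endian unsigned integer of type T at byte offset `pos`.
template <typename T>
T read_le(const std::vector<uint8_t>& buf, std::size_t pos) {
  if (pos + sizeof(T) > buf.size()) throw std::out_of_range("truncated header");
  T value = 0;
  for (std::size_t i = 0; i < sizeof(T); ++i)
    value |= static_cast<T>(static_cast<T>(buf[pos + i]) << (8 * i));
  return value;
}

// Subset of the (assumed) header fields relevant to the layout rules.
struct ZimHeader {
  uint16_t major_version, minor_version;
  uint32_t entry_count, cluster_count;
  uint64_t path_ptr_pos, title_ptr_pos, cluster_ptr_pos;
  uint64_t mime_list_pos, checksum_pos;
};

ZimHeader parse_header(const std::vector<uint8_t>& buf) {
  if (read_le<uint32_t>(buf, 0) != kZimMagic)
    throw std::runtime_error("not a ZIM archive");
  ZimHeader h{};
  h.major_version = read_le<uint16_t>(buf, 4);
  h.minor_version = read_le<uint16_t>(buf, 6);
  // Bytes 8..23 hold the UUID (skipped here).
  h.entry_count = read_le<uint32_t>(buf, 24);
  h.cluster_count = read_le<uint32_t>(buf, 28);
  h.path_ptr_pos = read_le<uint64_t>(buf, 32);
  h.title_ptr_pos = read_le<uint64_t>(buf, 40);
  h.cluster_ptr_pos = read_le<uint64_t>(buf, 48);
  h.mime_list_pos = read_le<uint64_t>(buf, 56);
  // Bytes 64..71 hold the main page and layout page (skipped here).
  h.checksum_pos = read_le<uint64_t>(buf, 72);
  return h;
}

bool layout_is_consistent(const ZimHeader& h, uint64_t file_size, bool has_checksum) {
  // MIME list: fixed location, immediately following the header.
  if (h.mime_list_pos != kHeaderSize) return false;
  // Checksum (optional): if present, the last 16 bytes of the archive.
  if (has_checksum && h.checksum_pos + 16 != file_size) return false;
  // Pointer tables: anywhere after the header, but inside the file.
  for (uint64_t pos : {h.path_ptr_pos, h.title_ptr_pos, h.cluster_ptr_pos}) {
    if (pos < kHeaderSize || pos >= file_size) return false;
  }
  return true;
}
```

The table check is deliberately loose. Because the tables may be anywhere, it only verifies that each one starts after the header and inside the file.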

Unsorted

Current assumptions

  • MIME list size <= 944 bytes (the space between the end of the 80-byte ZIM header and offset 1024, where cluster data currently begins)
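The 944-byte bound follows from 1024 - 80: clusters currently start at offset 1024, and the header occupies the first 80 bytes. The header size is an assumption taken from the openZIM format description. The sketch below also assumes the on-disk MIME list is a sequence of NUL-terminated strings closed by an empty string. `serialize_mime_list` and `mime_list_fits` are hypothetical helpers.

```cpp
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t kHeaderSize = 80;            // assumed ZIM header size
constexpr std::size_t kFirstClusterOffset = 1024;  // where clusters currently start
constexpr std::size_t kMaxMimeListSize = kFirstClusterOffset - kHeaderSize;
static_assert(kMaxMimeListSize == 944, "space between header and clusters");

// Serializes a MIME list as NUL-terminated strings followed by an empty
// string (a final NUL byte), as assumed for the on-disk format.
std::vector<char> serialize_mime_list(const std::vector<std::string>& mime_types) {
  std::vector<char> out;
  for (const auto& m : mime_types) {
    out.insert(out.end(), m.begin(), m.end());
    out.push_back('\0');
  }
  out.push_back('\0');  // empty string terminates the list
  return out;
}

// True if the serialized list fits before the first cluster.
bool mime_list_fits(const std::vector<std::string>& mime_types) {
  return serialize_mime_list(mime_types).size() <= kMaxMimeListSize;
}
```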

Non-assumptions

This section lists things that one might wrongly assume about ZIM archives or libzim.

  • Dirents in a ZIM archive are not necessarily laid out contiguously and/or in the same order as their pointers in the dirent table.
  • Clusters in a ZIM archive are not necessarily laid out contiguously and/or in the same order as their pointers in the cluster table.
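Code that walks clusters physically, for example to read them sequentially, should derive the on-disk order from the offsets instead of assuming it matches the table. `clusters_in_file_order` and `cluster_size_upper_bounds` in this minimal sketch are hypothetical helpers. The size bound additionally assumes clusters never overlap. Because other data may sit between clusters, the distance to the next cluster in file order is only an upper bound.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Returns cluster indices sorted by file offset, without assuming that the
// table order matches the physical order.
std::vector<uint32_t> clusters_in_file_order(const std::vector<uint64_t>& offsets) {
  std::vector<uint32_t> order(offsets.size());
  std::iota(order.begin(), order.end(), 0u);
  std::sort(order.begin(), order.end(), [&offsets](uint32_t a, uint32_t b) {
    return offsets[a] < offsets[b];
  });
  return order;
}

// Upper bound on each cluster's size: distance to the next cluster *in file
// order*, or to `end` for the physically last one. Using the next table
// entry instead would be wrong whenever the two orders differ.
std::vector<uint64_t> cluster_size_upper_bounds(const std::vector<uint64_t>& offsets,
                                                uint64_t end) {
  std::vector<uint64_t> bounds(offsets.size());
  const std::vector<uint32_t> order = clusters_in_file_order(offsets);
  for (std::size_t i = 0; i < order.size(); ++i) {
    uint64_t next = (i + 1 < order.size()) ? offsets[order[i + 1]] : end;
    bounds[order[i]] = next - offsets[order[i]];
  }
  return bounds;
}
```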