richardlehane/siegfried

Siegfried

Siegfried is a signature-based file format identification tool, implementing:

  • the National Archives UK's PRONOM file format signatures
  • freedesktop.org's MIME-info file format signatures
  • the Library of Congress's FDD file format signatures (beta)
  • Wikidata (beta).
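To make "signature-based" concrete, here is a minimal sketch of the general idea: compare bytes at a known offset against a table of magic numbers. This is not siegfried's matching engine. The `signature` type, the `identify` function, and the short labels are hypothetical and are not real PRONOM identifiers.

```go
package main

import (
	"bytes"
	"fmt"
)

// signature is a hypothetical, simplified stand-in for a format signature:
// a byte sequence expected at a fixed offset from the start of the file.
type signature struct {
	id     string // illustrative label, not a real PRONOM entry
	offset int
	magic  []byte
}

// Well-known magic numbers, used here purely as examples.
var signatures = []signature{
	{id: "PDF", offset: 0, magic: []byte("%PDF-")},
	{id: "PNG", offset: 0, magic: []byte{0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A}},
	{id: "ZIP", offset: 0, magic: []byte{'P', 'K', 0x03, 0x04}},
}

// identify returns the id of the first signature whose bytes appear at
// the expected offset in buf, or "UNKNOWN" if none match.
func identify(buf []byte) string {
	for _, s := range signatures {
		end := s.offset + len(s.magic)
		if end <= len(buf) && bytes.Equal(buf[s.offset:end], s.magic) {
			return s.id
		}
	}
	return "UNKNOWN"
}

func main() {
	fmt.Println(identify([]byte("%PDF-1.7\n...")))
	fmt.Println(identify([]byte("plain text")))
}
```

Real signature sets are far richer than this (wildcards, end-of-file sequences, container checks, priorities between formats), which is why a dedicated tool is useful.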

Version

1.11.1

Usage

Command line

sf file.ext
sf *.ext
sf DIR

Options

sf -csv file.ext | *.ext | DIR             // Output CSV rather than YAML
sf -json file.ext | *.ext | DIR            // Output JSON rather than YAML
sf -droid file.ext | *.ext | DIR           // Output DROID CSV rather than YAML
sf -nr DIR                                 // Don't scan subdirectories
sf -z file.zip | *.ext | DIR               // Decompress and scan zip, tar, gzip, warc, arc
sf -zs gzip,tar file.tar.gz | *.ext | DIR  // Selectively decompress and scan 
sf -hash md5 file.ext | *.ext | DIR        // Calculate md5, sha1, sha256, sha512, or crc hash
sf -sig custom.sig *.ext | DIR             // Use a custom signature file
sf -                                       // Scan stream piped to stdin
sf -name file.ext -                        // Provide filename when scanning stream 
sf -f myfiles.txt                          // Scan list of files and directories
sf -v | -version                           // Display version information
sf -home c:\junk -sig custom.sig file.ext  // Use a custom home directory
sf -serve hostname:port                    // Server mode
sf -throttle 10ms DIR                      // Pause for duration (e.g. 1s) between file scans
sf -multi 256 DIR                          // Scan multiple (e.g. 256) files in parallel 
sf -log [comma-sep opts] file.ext          // Log errors etc. to stderr (default) or stdout
sf -log e,w file.ext | *.ext | DIR         // Log errors and warnings to stderr
sf -log u,o file.ext | *.ext | DIR         // Log unknowns to stdout
sf -log d,s file.ext | *.ext | DIR         // Log debugging and slow messages to stderr
sf -log p,t DIR > results.yaml             // Log progress and time while redirecting results
sf -log fmt/1,c DIR > results.yaml         // Log instances of fmt/1 and chart results
sf -replay -log u -csv results.yaml        // Replay results file, convert to csv, log unknowns
sf -setconf -multi 32 -hash sha1           // Save flag defaults in a config file
sf -setconf -serve :5138 -conf srv.conf    // Save/load named config file with '-conf filename' 
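Results written with -json can be post-processed by other programs. The sketch below tallies matches per format ID. The field names (`files`, `filename`, `matches`, `ns`, `id`) are assumptions about the JSON layout, and the sample document is hand-written for illustration, so check both against real output from your version of sf before relying on them.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// result mirrors only the fields this sketch needs. The field names are
// assumptions about sf's JSON output; compare with real output first.
type result struct {
	Files []struct {
		Filename string `json:"filename"`
		Matches  []struct {
			NS string `json:"ns"`
			ID string `json:"id"`
		} `json:"matches"`
	} `json:"files"`
}

// countByID tallies how many matches were reported per format ID.
func countByID(raw string) (map[string]int, error) {
	var r result
	if err := json.NewDecoder(strings.NewReader(raw)).Decode(&r); err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, f := range r.Files {
		for _, m := range f.Matches {
			counts[m.ID]++
		}
	}
	return counts, nil
}

// sample is hand-written for illustration, not captured from a real run.
const sample = `{"files":[
 {"filename":"a.pdf","matches":[{"ns":"pronom","id":"fmt/18"}]},
 {"filename":"b.pdf","matches":[{"ns":"pronom","id":"fmt/18"}]},
 {"filename":"c.bin","matches":[{"ns":"pronom","id":"UNKNOWN"}]}]}`

func main() {
	counts, err := countByID(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(counts)
}
```

In practice the input would come from a file produced by something like `sf -json DIR > results.json` rather than from a constant.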

Signature files

By default, siegfried uses the latest PRONOM signatures without buffer limits (i.e. it may do full file scans). To use MIME-info or LOC signatures, or to add buffer limits or other customisations, use the roy tool to build your own signature file.
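As a rough illustration of what a buffer limit trades away, the sketch below reads either a bounded prefix of a stream or the whole stream. It only demonstrates the concept and does not reflect how roy or siegfried implement limits. `readHead` and `headOfString` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// readHead returns at most limit bytes from r. A limit of zero or less
// means "no limit", so the whole stream is read.
func readHead(r io.Reader, limit int64) ([]byte, error) {
	if limit > 0 {
		r = io.LimitReader(r, limit)
	}
	return io.ReadAll(r)
}

// headOfString is a convenience wrapper for trying readHead on a literal.
func headOfString(s string, limit int64) string {
	b, err := readHead(strings.NewReader(s), limit)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	data := "%PDF-1.7 ... trailer %%EOF"
	fmt.Printf("%q\n", headOfString(data, 8)) // bounded scan
	fmt.Printf("%q\n", headOfString(data, 0)) // full scan
}
```

A bounded scan touches less data, but any byte sequence that sits beyond the limit (for example, near the end of the file) is never seen.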

Install

With go installed:

go install github.com/richardlehane/siegfried/cmd/sf@latest

sf -update

Or, without go installed:

Windows:

Download a pre-built binary from the releases page. Unzip to a location in your system path. Then run:

sf -update

Mac Homebrew (or Linuxbrew):

brew install mistydemeo/digipres/siegfried

Or, for the most recent updates, you can install from this fork:

brew install richardlehane/digipres/siegfried

Ubuntu/Debian (64 bit):

curl -sL "http://keyserver.ubuntu.com/pks/lookup?op=get&search=0x20F802FE798E6857" | gpg --dearmor | sudo tee /usr/share/keyrings/siegfried-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/siegfried-archive-keyring.gpg] https://www.itforarchivists.com/ buster main" | sudo tee -a /etc/apt/sources.list.d/siegfried.list
sudo apt-get update && sudo apt-get install siegfried

FreeBSD:

pkg install siegfried

Arch Linux:

git clone https://aur.archlinux.org/siegfried.git
cd siegfried
makepkg -si

Changes

v1.11.1 (2024-06-28)

Added

  • WASM build. See wasm/README.md for more details. Feature sponsored by Archives New Zealand. Inspired by Andy Jackson.
  • -sym flag enables following symbolic links to files during scanning. Requested by Max Moser

Changed

  • XDG_DATA_DIRS checked when determining siegfried home location. Requested by Michał Górny
  • Windows 7 build on releases page (built with go 1.20). Requested by Aleksandr Sergeev
  • update PRONOM to v118
  • update LOC to 2024-06-14

Fixed

  • zips piped into STDIN are decompressed with -z flag. Reported by Max Moser
  • panics from OS calls in init functions. Reported by Jürgen Enge

v1.11.0 (2023-12-17)

Changed

  • default location for siegfried HOME now follows XDG Base Directory Specification; see #216. Implemented by Bernhard Hampel-Waffenthal
  • siegfried prints version before erroring with failed signature load; requested by Ross Spencer
  • update PRONOM to v116
  • update LOC to 2023-12-14
  • update tika-mimetypes to v3.0.0-BETA
  • update freedesktop.org to v2.4

Fixed

  • panic on malformed zip file during container matching; reported by James Mooney

v1.10.2 (2023-12-17)

Changed

  • update PRONOM to v116
  • update LOC to 2023-12-14
  • update tika-mimetypes to v3.0.0-BETA
  • update freedesktop.org to v2.4

v1.10.1 (2023-04-24)

Fixed

  • glob expansion now only on Windows & when no explicit path match. Implemented by Bernhard Hampel-Waffenthal
  • compression algorithm for debian packages changed back to xz. Implemented by Paul Millar
  • -multi droid setting returned empty results when priority lists contained self-references. See #218
  • CGO disabled for debian package and linux binaries. See #219

v1.10.0 (2023-03-25)

Added

  • format classification included as "class" field in PRONOM results. Requested by Robin François. Implemented by Ross Spencer
  • -noclass flag added to roy build command. Use this flag to build signatures that omit the new "class" field from results.
  • glob paths can be used in place of file or directory paths for identification (e.g. sf *.jpg). Implemented by Ross Spencer
  • -multi droid setting for roy build command. Applies priorities after rather than during identification for more DROID-like results. Reported by David Clipsham
  • /update command for server mode. Requested by Luis Faria

Changed

  • new algorithm for dynamic multi-sequence matching for improved wildcard performance
  • update PRONOM to v111
  • update LOC to 2023-01-27
  • update tika-mimetypes to v2.7.0
  • minimum go version to build siegfried is now 1.18

Fixed

  • archivematica extensions built into wikidata signatures. Reported by Ross Spencer
  • trailing slash for folder paths in URI field in droid output. Reported by Philipp Wittwer
  • crash when using sf -replay with droid output

See the CHANGELOG for the full history.

Rights

Copyright 2024 Richard Lehane, Ross Spencer

Licensed under the Apache License, Version 2.0

Announcements

Join the Google Group for updates, signature releases, and help.

Contributing

Like siegfried and want to get involved in its development? That'd be wonderful! There are some notes on the wiki to get you started, and please get in touch.

Thanks

Thanks TNA for http://www.nationalarchives.gov.uk/pronom/ and http://www.nationalarchives.gov.uk/information-management/projects-and-work/droid.htm

Thanks Ross for https://github.com/exponential-decay/skeleton-test-suite-generator and http://exponentialdecay.co.uk/sd/index.htm, both are very handy!

Thanks Misty for the brew and ubuntu packaging

Thanks Steffen for the FreeBSD and Arch Linux packaging