Name of the tool and version: FITS, version 0.8.6 from May 2015
Date of the test: July 2015
System used: Windows 7
Use case: check files and folders and write the FITS XML output to a file
User experience: FITS has no GUI and can only be used as a command-line tool. For some librarians and archivists this might not be the usual way to work with tools.
If the commands are known and well documented, using FITS is feasible, though (see the FITS manual further down this page). For users who are not used to command-line tools (like me), it can be cumbersome to type in all the paths or to copy and paste them into the black window. By the way: Ctrl+V does not work, but you can paste with a right-click of the mouse.
Furthermore, I personally found it better to work with the full paths to the files or folders I want to examine. FITS also lets you change into the folder you want to examine and then type much shorter commands using relative paths, but this has proven error-prone if you are not used to this kind of work.
The FITS XML output is understandable, but not ideal. An XSLT stylesheet could present the findings in a more user-friendly order and omit the information that does not seem important or understandable. More about this further down under "enhancement ideas".
- Download FITS from the Harvard Website
- Unzip the folder
- There is no GUI. In Windows, open the command line (CMD) by clicking the Start button, typing "cmd" into the search field and pressing Enter (Screenshot find CMD)
- FITS is Java-based and needs at least Java version 1.6. You can check your currently installed Java version by typing "java -version" into the CMD (Screenshot finding out which java version your computer currently uses)
- Now navigate to your FITS directory in the CMD. You can change the active drive by typing, for example, "D:". Typing "dir" shows which other folders and files exist in the current directory, and "cd" means "change directory" (Screenshot navigation to the fits-folder)
- In this example, fits is here:
fits.bat -h: shows all the possible commands (Screenshot all Fits commands)
fits.bat -i <arg>: Gives FITS a file to work with. Type in the whole path like this:
fits.bat -i C:\output\PDF\PDF1.pdf
If you leave it like this, the output is printed to the CMD window. If you prefer the output in a file, this can be done as well:
fits.bat -o <arg>: Saves the output in a text file. An example would be
fits.bat -i C:\output\PDF\PDF1.pdf -o text.txt
Even better would be text.xml, as the output is in FITS XML.
- There might be "log4j" warnings in the CMD. These can be ignored for now.
fits.bat -r -i <arg> -o <arg>: You can also check folders. FITS checks each file in the folder. In this case the output must be a folder as well, and this folder must already exist (create it beforehand) in the FITS folder. An example is (the quotation marks are needed because the path contains a space):
fits.bat -i "C:\Users\Friese Yvonne\FITS" -o log
fits.bat -v: shows the version number of the currently used FITS version
fits.bat -x: converts the FITS XML into the standard XML schema for the technical metadata of the format
fits.bat -xc: outputs both FITS XML and standard XML
- Most of the commands can be combined
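For anyone who would rather not retype long paths, the options listed above can also be assembled by a small script. The following Python sketch is only an illustration and not part of FITS: the helper names `build_fits_command` and `run_fits` and their parameters are made up for this example, and only the options described above (-i, -o, -r, -x, -xc) are used. Passing the arguments as a list means paths containing spaces do not have to be quoted by hand.

```python
import subprocess


def build_fits_command(fits_bat, input_path, output_path=None,
                       recursive=False, standard_xml=False, combined=False):
    """Assemble a FITS call as an argument list (hypothetical helper)."""
    cmd = [fits_bat, "-i", input_path]
    if recursive:
        cmd.append("-r")        # the -r option from the folder example above
    if combined:
        cmd.append("-xc")       # FITS XML and standard XML together
    elif standard_xml:
        cmd.append("-x")        # standard XML only
    if output_path is not None:
        cmd += ["-o", output_path]
    return cmd


def run_fits(cmd):
    """Run the assembled command; raises CalledProcessError if FITS fails."""
    return subprocess.run(cmd, check=True)


# Paths taken from the examples above; adjust them to your own system.
single_file = build_fits_command("fits.bat", r"C:\output\PDF\PDF1.pdf", "text.xml")
folder = build_fits_command("fits.bat", r"C:\Users\Friese Yvonne\FITS", "log",
                            recursive=True)
print(single_file)
print(folder)
```

`run_fits(single_file)` would then start FITS, assuming the script is run from the FITS folder (or `fits_bat` is given as a full path) and, for the folder case, that the output folder already exists.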
An example of a FITS output can be found in the attachment.
The FITS XML output consists of the following parts:
The usual XML header
The identification section contains the name and the MIME type of the format plus the version information of all tools used to identify it. Sometimes there are conflicts (i.e. the tools disagree about the format).
The fileinfo section outputs some technical metadata such as the date of the last modification, the file name and the file path.
The filestatus section deals with file format validation. Usually, JHOVE is used. Well-formedness and validity are checked and output as either true or false. If either is false, the JHOVE error messages are output as well.
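The format identification and the validation results described above can also be read out with a few lines of Python instead of scanning the XML by eye. This is a minimal sketch under stated assumptions: the namespace and the element names (identification, identity, filestatus, well-formed, valid, message) are assumed to match the FITS output and should be checked against your own files, the sample XML is hand-written and heavily shortened, and the function name `summarize` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of FITS output files; compare with the root element of your own output.
NS = {"f": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}

# Hand-written, shortened sample for illustration only (not real tool output).
SAMPLE = """<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
  <identification>
    <identity format="Portable Document Format" mimetype="application/pdf"/>
  </identification>
  <filestatus>
    <well-formed>true</well-formed>
    <valid>false</valid>
    <message>hypothetical JHOVE error message</message>
  </filestatus>
</fits>"""


def summarize(xml_text):
    """Collect format names, validation flags and error messages."""
    root = ET.fromstring(xml_text)
    identities = root.findall("f:identification/f:identity", NS)
    status = root.find("f:filestatus", NS)

    def status_text(tag):
        # Returns None when the section or the element is missing,
        # e.g. for formats without a JHOVE module.
        if status is None:
            return None
        return status.findtext(f"f:{tag}", default=None, namespaces=NS)

    messages = [] if status is None else [m.text for m in status.findall("f:message", NS)]
    return {
        # More than one entry here would mean the tools disagree about the format.
        "formats": [(i.get("format"), i.get("mimetype")) for i in identities],
        "well_formed": status_text("well-formed"),
        "valid": status_text("valid"),
        "messages": messages,
    }


result = summarize(SAMPLE)
print(result)
```

The flags are returned as the strings found in the file rather than converted to booleans, so that a missing value (None) stays distinguishable from "false".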
The metadata section contains some extracted metadata for the file. For JPEG2000, for example, there are quite a few chunks of information about the ICC profile used; see the attachment with the metadata part of the output for a JPEG2000 file.
Depending on the file type, the metadata differs a lot. The tool that extracts the metadata also differs by type: it can be Exiftool (for image and video) or JHOVE (for PDF and TIFF). I have tested several file formats:
There is no extracted metadata. Example-xml epub
Extracted metadata is "isRightsManaged" and "isProtected". Example-xml excel
Metadata is "charset" (in this case UTF-8), "markupBasis" (XML) and "markupBasisVersion" (1.0). Example-xml xml
Metadata is "charset" and "markupBasis". Example-xml html
There is no metadata for this kind of MIME type ("application/x-iso9660-image"). Example-xml Iso-image
Metadata: compressionScheme, imageWidth, imageHeight and colorSpace. Example-xml JPEG2000
Metadata: bitRate, sampleRate and channels. Example-xml Mp3
Metadata: title, author, pageCount, isTagged, hasOutline, hasAnnotations, isRightsManaged, isProtected and hasForms. Example-xml pdf
Metadata: quite a few tags, such as byteOrder, colorSpace etc. Example-xml tiff
Metadata: duration, frameRate, audioSampleRate, channels, imageWidth and imageHeight. Example-xml video
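Because the extracted fields differ so much between formats, it can help to list them per file automatically. The following Python sketch does this under a few assumptions: the namespace is assumed to be the one used in FITS output, the metadata part is assumed to contain one group per content type (here `document`) with one child element per field, the sample XML and its values are invented, and the names `metadata_fields` and `local_name` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of FITS output files.
FITS_NS = "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"

# Hand-written sample with invented values, for illustration only.
SAMPLE = """<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
  <metadata>
    <document>
      <pageCount>3</pageCount>
      <isTagged>no</isTagged>
      <hasForms>no</hasForms>
    </document>
  </metadata>
</fits>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def metadata_fields(xml_text):
    """Return {group: {field: value}} for everything below the metadata element."""
    root = ET.fromstring(xml_text)
    metadata = root.find(f"{{{FITS_NS}}}metadata")
    if metadata is None:
        return {}
    return {
        local_name(group.tag): {local_name(field.tag): field.text for field in group}
        for group in metadata
    }


fields = metadata_fields(SAMPLE)
print(fields)
```

For a file without extracted metadata (as with the epub and ISO image above) the function returns an empty dictionary. If a field occurs more than once in a group, only the last value is kept in this simple version.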
The statistics section reports which of the tools available within FITS were actually used and how much time the processing took.
It would be nice to have an XSLT stylesheet to transform the XML output into HTML. Creating one entirely on your own is difficult, because the original output XML has to be altered at least once to refer to the XSLT stylesheet.
So far, there is an obstacle: you have to manually add a second line to the output XML that refers to the XSLT stylesheet, like this:
<?xml-stylesheet type="text/xsl" href="FitsCustomized.xsl"?>
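The manual step of adding this line could be scripted. The Python sketch below is one possible way to do it and nothing that FITS provides itself: the function name `add_stylesheet_reference` is made up, and the sample text stands in for a real output file. It inserts the instruction directly after the XML declaration and leaves files alone that already contain such a line.

```python
def add_stylesheet_reference(xml_text, href="FitsCustomized.xsl"):
    """Return xml_text with an xml-stylesheet instruction as its second line."""
    instruction = f'<?xml-stylesheet type="text/xsl" href="{href}"?>'
    if "<?xml-stylesheet" in xml_text:
        return xml_text  # already present, leave the file untouched
    if xml_text.startswith("<?xml"):
        # Insert right after the closing "?>" of the XML declaration.
        end = xml_text.index("?>") + 2
        return xml_text[:end] + "\n" + instruction + xml_text[end:]
    # No XML declaration: put the instruction in front of the root element.
    return instruction + "\n" + xml_text


# Stand-in for the content of a FITS output file.
original = '<?xml version="1.0" encoding="UTF-8"?>\n<fits></fits>\n'
patched = add_stylesheet_reference(original)
print(patched)
```

To apply it to a real file, the text could be read and written back with `pathlib.Path("text.xml").read_text(encoding="utf-8")` and `write_text(...)`, where "text.xml" is the output file name from the example further up.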
Apart from these obstacles, it is possible to build a stylesheet, which can surely be enhanced with some if-clauses (e.g. if JHOVE has decided that the file is not valid, the stylesheet could output the error messages as well).
The attachment contains a stylesheet, an output XML (altered because of the second line and the long fits node) and a screenshot of the output:
Comments on the output and the XSLT stylesheet: The number of pages is only extracted for certain file formats, e.g. PDF, and might better be omitted. Furthermore, well-formedness and validity are only tested for file formats that have a JHOVE module. So for some files, some rows might stay empty.