FileFerret design

From HypertWiki

Woozle: Projects: FileFerret: design


This page is for FileFerret design details, as I create them.


FileFerret must know how to access a given file from each of several machines, which complicates things somewhat. This is necessary so that FileFerret can manage accessible files across all machines, not just machines with a FileFerret agent installed, thus maximizing its ability to track files with a minimum of deployment.

FileFerret also has the idea of an "elemental" file ("Filament"), which is an abstraction of a particular data pattern that may be contained in multiple Files on multiple machines (e.g. backup copies).
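
This page does not say how FileFerret decides that two Files hold the same Filament. One plausible approach, shown here purely as an assumption, is to key each Filament by a content digest, so that byte-identical copies (such as backups under different names) fall into the same group. The names `filament_key` and `group_into_filaments` and the choice of SHA-256 are hypothetical.

```python
import hashlib
from collections import defaultdict


def filament_key(data: bytes) -> str:
    """Hypothetical Filament identity: SHA-256 digest of the file's bytes."""
    return hashlib.sha256(data).hexdigest()


def group_into_filaments(files):
    """Group (machine, path, data) tuples by content.

    Each key stands in for one Filament; each value lists the Files
    (as (machine, path) pairs) that contain exactly that data.
    """
    filaments = defaultdict(list)
    for machine, path, data in files:
        filaments[filament_key(data)].append((machine, path))
    return dict(filaments)


if __name__ == "__main__":
    files = [
        ("desktop", "/home/woozle/notes.txt", b"hello"),
        ("nas", "/backup/notes-copy.txt", b"hello"),  # backup copy, different name
        ("desktop", "/home/woozle/todo.txt", b"buy milk"),
    ]
    for key, copies in group_into_filaments(files).items():
        print(key[:12], copies)
```

Hashing whole files is simple but costly for large media; a real implementation might compare size first and hash only on a size match.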

A Filament can correspond to zero or more Files. A File is contained within a hierarchy of Folders; somewhere in that hierarchy there must be a Location, which can be Mapped for each machine so that the machine can access it. The File and Folder objects contain only relative path information; the Mapping gives an absolute path for the top of the hierarchy, and appending the relative path to it yields the full local (i.e. machine-specific) filespec.
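
A minimal sketch of that resolution, assuming in-memory dictionaries in place of FileFerret's database: walk up from the File's Folder, collecting relative path components, until reaching a Folder that is a Location mapped for the requesting machine, then prepend that Mapping's filespec. The classes `Folder` and `File`, the function `local_filespec`, and the `location_of_folder` and `mappings` lookups are hypothetical; the '/' joining is a simplification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Folder:
    id: int
    name: str
    parent_id: Optional[int]  # None for a top-level folder


@dataclass
class File:
    name: str
    folder_id: int


def local_filespec(file: File, machine: str,
                   folders: Dict[int, Folder],
                   location_of_folder: Dict[int, int],
                   mappings: Dict[Tuple[int, str], str]) -> Optional[str]:
    """Return the machine-specific filespec for `file`, or None if unreachable.

    location_of_folder: Folder ID -> Location ID (only for Location folders).
    mappings: (Location ID, machine) -> absolute filespec of that Location.
    """
    parts: List[str] = [file.name]
    folder_id: Optional[int] = file.folder_id
    while folder_id is not None:
        loc_id = location_of_folder.get(folder_id)
        if loc_id is not None and (loc_id, machine) in mappings:
            base = mappings[(loc_id, machine)]
            # Assumes '/' separators; a real version would respect each filespec's syntax.
            return base.rstrip("/") + "/" + "/".join(reversed(parts))
        # Not a mapped Location for this machine: this folder's name is part
        # of the relative path, so record it and keep climbing.
        folder = folders[folder_id]
        parts.append(folder.name)
        folder_id = folder.parent_id
    return None  # no Location above this file is mapped for this machine


if __name__ == "__main__":
    folders = {1: Folder(1, "data", None), 2: Folder(2, "photos", 1), 3: Folder(3, "2019", 2)}
    location_of_folder = {1: 10, 2: 20}  # "data" is a disk root, "photos" a share
    mappings = {(10, "server"): "/srv/data", (20, "laptop"): "smb://server/photos"}
    cat = File("cat.jpg", 3)
    print(local_filespec(cat, "server", folders, location_of_folder, mappings))
    # -> /srv/data/photos/2019/cat.jpg
```

Because the walk stops at the first mapped Location, a "root level" machine and a "share level" machine can reach the same File through different Locations.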

From the user's point of view, this should all be hidden as much as possible by assuming the mappings available to the user's machine, but I haven't yet worked out a good conceptual system for handling this. So far the technique has been to provide a set of utility routines for translating between local filespecs, Files, and Filaments.
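
The utility routines themselves aren't described here. As an illustration of one direction only (local filespec to Location plus relative path), here is a minimal sketch that chooses the longest Mapping prefix registered for the current machine. The function name `split_filespec`, the `mappings` dictionary shape, and the '/'-separator handling are all assumptions.

```python
from typing import Dict, List, Optional, Tuple


def split_filespec(filespec: str, machine: str,
                   mappings: Dict[Tuple[int, str], str]) -> Optional[Tuple[int, List[str]]]:
    """Split a local filespec into (Location ID, relative path components).

    mappings: (Location ID, machine) -> absolute filespec of that Location.
    The longest matching prefix wins, so the deepest mapped Location is used.
    """
    best: Optional[Tuple[int, str]] = None
    for (loc_id, m), base in mappings.items():
        if m != machine:
            continue
        base = base.rstrip("/")
        # Require a separator boundary so "/srv/data" does not match "/srv/database".
        if filespec == base or filespec.startswith(base + "/"):
            if best is None or len(base) > len(best[1]):
                best = (loc_id, base)
    if best is None:
        return None  # filespec is not under any Location mapped for this machine
    loc_id, base = best
    rest = filespec[len(base):].strip("/")
    return loc_id, (rest.split("/") if rest else [])
```

The returned components would then be matched against Folder and File records under that Location's Folder to find the File, and from there its Filament.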

Data Concepts

  • A File corresponds to the traditional operating system concept of a file, extended somewhat by:
    • the ability to locate files in non-local and offline media
    • the need for different machines to be able to locate the same file from the same information
  • A Filament (portmanteau for "elemental file") corresponds to a particular ordered set of data values which in turn may correspond to zero or more actual files (if more than one file corresponds to the same filament, then those files contain identical data, though they do not necessarily have the same filename, timestamps, security attributes, etc.). Any filament may also have the following attributes in FileFerret's database:
    • a description
    • one or more topics
  • A Folder corresponds to the traditional operating system concept of a folder or directory, i.e. it is a named location which may contain zero or more files, and also has a number of OS-dependent attributes such as creation time, access permissions, etc. Any given folder may or may not be reachable by any particular machine; if a folder is reachable by a particular machine, however, it is assumed that all folders underneath it are also reachable by that machine (although a given folder's contents may be inaccessible due to security restrictions).
  • A Location records one of the highest-level folders accessible to a particular class of machine. It consists of:
    • ID: unique identifier, referred to by Mappings
    • ID_Folder: ID of highest-level Folder which is being pinned down
    Locations will be entirely human-maintained in early versions of FileFerret; later on, it may be possible to have FileFerret detect known folders that have been moved to other filespecs.
    Locations generally correspond to the root folders of storage media (including fixed disks) and folders shared over the network.
    There are currently two classes of machine, from a file-access point of view:
    • "root level" machines can access the entire file structure (including local file structures) if they have the right user/password
    • "share level" machines can only access explicitly shared folders (via Samba or other network file-sharing protocols), typically due to limitations in the network protocols available for them to use.
  • A Mapping represents a particular machine's view of a Location, i.e. how to access it. Mappings generally correspond to storage media reading devices (including fixed disks) and network shares. Each Mapping consists of:
    • ID: unique identifier for easy reference
    • ID_Location: which location is being mapped
    • ID_Machine: which machine this location is being mapped for
    • FileSpec: how to access the location on the specified machine; includes "protocol://user@servername:password" where applicable
    • isPrimary: is Machine primarily responsible for maintaining the record of this Location (i.e. the mapped Folder and all files under it)?
    Note: Location x Machine must be unique, i.e. there is at most one Mapping per (ID_Location, ID_Machine) pair.
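
As a sketch of how these concepts might sit in a relational database, the following uses Python's built-in `sqlite3` module. The table and column names follow the fields listed above where the page gives them; the Filament, File, and Machine columns (`Descr`, `ID_Filament`, `Name`, etc.) are assumptions, and topics are omitted. The `UNIQUE (ID_Location, ID_Machine)` clause enforces the Location x Machine rule.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Filament (
    ID          INTEGER PRIMARY KEY,
    Descr       TEXT                                  -- optional description
);
CREATE TABLE Folder (
    ID          INTEGER PRIMARY KEY,
    ID_Parent   INTEGER REFERENCES Folder(ID),        -- NULL for a top-level folder
    Name        TEXT NOT NULL
);
CREATE TABLE File (
    ID          INTEGER PRIMARY KEY,
    ID_Folder   INTEGER NOT NULL REFERENCES Folder(ID),
    ID_Filament INTEGER REFERENCES Filament(ID),      -- many Files may share one Filament
    Name        TEXT NOT NULL
);
CREATE TABLE Machine (
    ID          INTEGER PRIMARY KEY,
    Name        TEXT NOT NULL
);
CREATE TABLE Location (
    ID          INTEGER PRIMARY KEY,
    ID_Folder   INTEGER NOT NULL REFERENCES Folder(ID)
);
CREATE TABLE Mapping (
    ID          INTEGER PRIMARY KEY,
    ID_Location INTEGER NOT NULL REFERENCES Location(ID),
    ID_Machine  INTEGER NOT NULL REFERENCES Machine(ID),
    FileSpec    TEXT NOT NULL,
    isPrimary   INTEGER NOT NULL DEFAULT 0,
    UNIQUE (ID_Location, ID_Machine)                  -- Location x Machine must be unique
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketched schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    db = open_db()
    db.execute("INSERT INTO Folder (ID, ID_Parent, Name) VALUES (1, NULL, 'data')")
    db.execute("INSERT INTO Machine (ID, Name) VALUES (1, 'server')")
    db.execute("INSERT INTO Location (ID, ID_Folder) VALUES (10, 1)")
    db.execute("INSERT INTO Mapping (ID_Location, ID_Machine, FileSpec, isPrimary) "
               "VALUES (10, 1, '/srv/data', 1)")
    try:
        # A second Mapping of the same Location for the same Machine is rejected.
        db.execute("INSERT INTO Mapping (ID_Location, ID_Machine, FileSpec) "
                   "VALUES (10, 1, '/mnt/data')")
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```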

Notes for Next Version

  • Is there any real reason for Locations to have their own table? If the problem is just that we want to be able to choose from a shorter list than "all folders", wouldn't a flag be good enough? (Filtering on ID_Parent IS NULL wouldn't be sufficient in cases where a share is below the root folder.)
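
To make the flag alternative concrete, here is a minimal sketch, again using Python's `sqlite3`, in which a hypothetical `isLocation` column on Folder replaces the Location table and Mappings point straight at flagged Folders. It also shows why filtering on `ID_Parent IS NULL` misses a share that sits below a root folder. All names beyond those on this page are assumptions.

```python
import sqlite3


def open_flag_db() -> sqlite3.Connection:
    """In-memory schema where a Folder flag stands in for the Location table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE Folder (
        ID          INTEGER PRIMARY KEY,
        ID_Parent   INTEGER REFERENCES Folder(ID),
        Name        TEXT NOT NULL,
        isLocation  INTEGER NOT NULL DEFAULT 0      -- hypothetical flag
    );
    CREATE TABLE Mapping (
        ID          INTEGER PRIMARY KEY,
        ID_Folder   INTEGER NOT NULL REFERENCES Folder(ID),  -- a flagged Folder
        ID_Machine  INTEGER NOT NULL,
        FileSpec    TEXT NOT NULL,
        isPrimary   INTEGER NOT NULL DEFAULT 0,
        UNIQUE (ID_Folder, ID_Machine)
    );
    """)
    return conn


def location_folders(conn: sqlite3.Connection) -> list:
    """The shorter list to choose from: only folders flagged as Locations."""
    return [name for (name,) in conn.execute(
        "SELECT Name FROM Folder WHERE isLocation = 1 ORDER BY ID")]


def top_level_folders(conn: sqlite3.Connection) -> list:
    """The insufficient alternative: folders with no parent."""
    return [name for (name,) in conn.execute(
        "SELECT Name FROM Folder WHERE ID_Parent IS NULL ORDER BY ID")]


if __name__ == "__main__":
    db = open_flag_db()
    db.executemany("INSERT INTO Folder VALUES (?, ?, ?, ?)", [
        (1, None, "data", 1),    # root of a disk
        (2, 1, "photos", 1),     # network share below the root
        (3, 2, "2019", 0),       # ordinary folder
    ])
    print(location_folders(db), top_level_folders(db))
```

With the flag, the "photos" share appears in the list of Locations even though it has a parent, which the `ID_Parent IS NULL` filter would miss.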