ContainerSystem

ContainerSystem(self, cDir='./containers', mDir='./modulefiles', forceImage=False, prereqs='', threads=8, cache_dir='/Users/gzynda/rgc_cache', force_cache=False, verbose=False)

Class for managing the rgc image cache

Parameters

  • cDir (str): Path to output container directory
  • mDir (str): Path to output module directory
  • forceImage (bool): Option to force the creation of singularity images
  • prereqs (str): string of prerequisite modules separated by ":"
  • threads (int): Number of threads to use for concurrent operations
  • cache_dir (str): Path to rgc cache
  • force_cache (bool): Whether to force overwrite the cache
  • verbose (bool): Whether to enable verbose logging

Attributes

  • system (str): Container system
  • containerDir (str): Path to use for containers
  • moduleDir (str): Path to use for module files
  • forceImage (bool): Force singularity image creation
  • invalid (set): Set of invalid urls
  • valid (set): Set of valid urls
  • images (dict): Path of singularity image or docker url after pulling
  • registry (dict): Registry of origin
  • progs (dict): Set of programs in a container
  • name_tag (dict): (name, tag) tuple of a URL
  • keywords (dict): List of keywords for a container
  • categories (dict): List of categories for a container
  • homepage (dict): Original homepage of software in container
  • description (dict): Description of software in container
  • full_url (dict): Full URL to container in registry
  • blocklist (set): Set of programs to be blocked from being output
  • prog_count (Counter): Occurrence count of each program seen
  • lmod_prereqs (list): List of prerequisite modules
  • n_threads (int): Number of threads to use for concurrent operations
  • logger (logging): Class level logger
  • cache_dir (str): Location for metadata cache
  • force_cache (bool): Force the regeneration of the metadata cache

validateURL

ContainerSystem.validateURL(self, url, include_libs=False)

Adds url to the self.invalid set when the URL is invalid and to the self.valid set when the URL works.

By default, containers designated as libraries on bio.tools are excluded.

Parameters

  • url (str): Image url used to pull
  • include_libs (bool): Include containers of libraries

Attributes

  • self.valid (set): Where valid URLs are stored
  • self.invalid (set): Where invalid URLs are stored
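
The bookkeeping described above can be sketched without rgc itself. This is a minimal illustration, not rgc's implementation: `validate_url`, `has_tag`, and the example URLs are hypothetical, and the `is_reachable` predicate stands in for whatever registry lookup the real method performs.

```python
def validate_url(url, valid, invalid, is_reachable):
    """Record url in `valid` or `invalid` and report which set it joined.

    `is_reachable` is a hypothetical predicate standing in for the
    registry lookup; it is not part of rgc.
    """
    if is_reachable(url):
        valid.add(url)
        return True
    invalid.add(url)
    return False


def has_tag(url):
    """Hypothetical check: treat a URL as usable only if it carries a tag."""
    return ":" in url.rsplit("/", 1)[-1]


valid, invalid = set(), set()
for url in ["biocontainers/samtools:v1.9", "biocontainers/missing"]:
    validate_url(url, valid, invalid, has_tag)

print(sorted(valid))
print(sorted(invalid))
```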

validateURLs

ContainerSystem.validateURLs(self, url_list, include_libs=False)

Validates every URL in url_list, adding each invalid URL to the self.invalid set.

Parameters

  • url_list (list): List of URLs to validate
  • include_libs (bool): Include containers of libraries

pullAll

ContainerSystem.pullAll(self, url_list, delete_old=False)

Uses worker threads to concurrently pull

  • image
  • metadata
  • repository info

for a list of urls.

Parameters

  • url_list (list): List of urls to pull
  • delete_old (bool): Delete old images that are no longer used
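
A worker-thread pull of this kind can be sketched with the standard library. The example below is an assumption-laden illustration rather than rgc's code: `pull_stub` is a hypothetical placeholder that fabricates a result instead of contacting a registry, and the `.sif` suffix and the dictionary layout are made up for the demonstration.

```python
from concurrent.futures import ThreadPoolExecutor


def pull_stub(url):
    """Hypothetical stand-in for pulling image, metadata, and repository info."""
    return url, {"image": url + ".sif", "metadata": {}, "repo": "unknown"}


def pull_all(url_list, n_threads=8):
    """Run pull_stub over url_list with a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # map() yields results in input order even if workers finish out of order
        return dict(pool.map(pull_stub, url_list))


results = pull_all(
    ["quay.io/biocontainers/a:1", "quay.io/biocontainers/b:2"], n_threads=2
)
```

Because pulling is dominated by network and disk waits, threads are a reasonable fit here even under the Python GIL.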

pull

ContainerSystem.pull(self, url)

Pulls the following

  • image
  • metadata
  • repository info

Parameters

  • url (str): Image url used to pull

deleteImage

ContainerSystem.deleteImage(self, url)

Deletes a cached image

Parameters

  • url (str): Image url used to pull

scanAll

ContainerSystem.scanAll(self)

Runs self.cacheProgs on all containers concurrently with threads

cacheProgs

ContainerSystem.cacheProgs(self, url, force=False)

Crawls all directories on a container's PATH and caches a list of all executable files in

  • self.progs[url]

and counts the global occurrence of each program in

  • self.prog_count[prog]

Parameters

  • url (str): Image url used to pull
  • force (bool): Force a re-scan and print results (for debugging only)
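
The crawl-and-count idea can be illustrated on an ordinary directory. This sketch assumes a POSIX filesystem and scans local directories, whereas the real method inspects the PATH inside a container; `list_executables`, the demo directory, and the `demo/url:1` key are all hypothetical.

```python
import os
import stat
import tempfile
from collections import Counter

EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def list_executables(path_dirs):
    """Return the sorted names of executable files found in path_dirs."""
    found = set()
    for directory in path_dirs:
        if not os.path.isdir(directory):
            continue  # PATH entries that do not exist are skipped
        for name in os.listdir(directory):
            full = os.path.join(directory, name)
            if os.path.isfile(full) and os.stat(full).st_mode & EXEC_BITS:
                found.add(name)
    return sorted(found)


# Build a throwaway directory so the sketch runs anywhere.
demo_dir = tempfile.mkdtemp()
for name, mode in [("tool", 0o755), ("notes.txt", 0o644)]:
    path = os.path.join(demo_dir, name)
    open(path, "w").close()
    os.chmod(path, mode)

progs = {"demo/url:1": list_executables([demo_dir])}
prog_count = Counter()
prog_count.update(progs["demo/url:1"])  # one increment per program seen
```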

getProgs

ContainerSystem.getProgs(self, url, blocklist=True)

Returns a list of all programs on the path of a url that are not blocked

Parameters

  • url (str): Image url used to pull
  • blocklist (bool): Filter out blocked programs

Returns

list: programs on PATH in container
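
The filtering step can be shown with plain data structures. This is a sketch under assumed data, not the method's source: `get_progs`, the program names, and the URL key are hypothetical.

```python
def get_progs(progs, blocked, url, use_blocklist=True):
    """Return programs cached for url, optionally dropping blocked names."""
    names = progs[url]
    if use_blocklist:
        return [name for name in names if name not in blocked]
    return list(names)


# Hypothetical cache contents for a single container URL.
progs = {"demo/samtools:1": ["ls", "samtools", "bgzip"]}
blocked = {"ls"}

filtered = get_progs(progs, blocked, "demo/samtools:1")
everything = get_progs(progs, blocked, "demo/samtools:1", use_blocklist=False)
```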

getAllProgs

ContainerSystem.getAllProgs(self, url)

Returns a list of all programs on the path of url.

This is a shortcut for self.getProgs(url, blocklist=False)

Parameters

  • url (str): Image url used to pull

findCommon

ContainerSystem.findCommon(self, p=25, baseline=[])

Creates a blocklist containing all programs that are in at least p% of the images

  • self.blocklist[url] = set([prog, prog, ...])

Parameters

  • p (int): Percentage of images in which a program must appear to be blocked
  • baseline (list): Exclude all programs from this list of urls

Attributes

  • permitlist (set): Set of programs that are always included when present
  • blocklist (set): Set of programs to be excluded
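
The thresholding rule can be sketched as follows. This is an illustration under stated assumptions, not rgc's implementation: `find_common` here is a free function over hypothetical data, it returns a single flat set, it ignores the baseline argument, and it treats the permitlist as names that are never blocked.

```python
from collections import Counter


def find_common(progs, p=25, permitlist=frozenset()):
    """Return programs that appear in at least p% of the images in progs."""
    # Count each program once per image, even if it is listed twice.
    counts = Counter(name for names in progs.values() for name in set(names))
    threshold = len(progs) * p / 100.0
    return {
        name
        for name, seen in counts.items()
        if seen >= threshold and name not in permitlist
    }


# Hypothetical cache: four images, "ls" in all of them, "python" in two.
progs = {
    "a": ["ls", "samtools"],
    "b": ["ls", "bwa"],
    "c": ["ls", "python"],
    "d": ["ls", "python"],
}
blocked = find_common(progs, p=50)
blocked_keep_python = find_common(progs, p=50, permitlist={"python"})
```

With four images and p=50 the threshold is two images, so common utilities such as `ls` are blocked while tool-specific programs survive.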

genModFiles

ContainerSystem.genModFiles(self, pathPrefix, contact_url, modprefix, delete_old)

Generates an Lmod modulefile for every valid image

Parameters

  • pathPrefix (str): Prefix to prepend to containerDir (think environment variables)
  • contact_url (list): List of contact urls for reporting issues
  • modprefix (str): Container module files can be tagged with modprefix-tag for easy stratification from native modules
  • delete_old (bool): Delete outdated module files

genLMOD

ContainerSystem.genLMOD(self, url, pathPrefix, contact_url, modprefix='')

Generates an Lmod modulefile based on the cached container.

Parameters

  • url (str): Image url used to pull
  • pathPrefix (str): Prefix to prepend to containerDir (think environment variables)
  • contact_url (list): List of contact urls for reporting issues
  • modprefix (str): Container module files can be identified with modprefix-tag for easy stratification from native modules
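
To give a feel for what a generated modulefile might contain, here is a hedged sketch that renders a small Lua modulefile as text. The layout is an assumption and may differ from what rgc actually writes: `render_modulefile`, the image path, and the module names are hypothetical, while `whatis`, `prereq`, and `set_shell_function` are standard Lmod modulefile functions.

```python
def render_modulefile(name, tag, image_path, progs, prereqs=()):
    """Render a minimal Lmod modulefile exposing container programs."""
    lines = [f'whatis("Name: {name}")', f'whatis("Version: {tag}")']
    lines += [f'prereq("{module}")' for module in prereqs]
    for prog in progs:
        # set_shell_function takes a bash body followed by a csh body
        bash_cmd = f'singularity exec {image_path} {prog} "$@"'
        csh_cmd = f"singularity exec {image_path} {prog} $*"
        lines.append(f"set_shell_function(\"{prog}\", '{bash_cmd}', '{csh_cmd}')")
    return "\n".join(lines) + "\n"


# Hypothetical inputs; the prerequisite string is split on ":" as in prereqs.
text = render_modulefile(
    name="samtools",
    tag="1.9",
    image_path="/containers/samtools-1.9.sif",
    progs=["samtools"],
    prereqs="singularity".split(":"),
)
print(text)
```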