FileCache

class hybrid_learning.datasets.caching.FileCache(cache_root=None)

Bases: Cache, ABC

Base class to cache objects as files under a cache folder. An implementation needs to set FILE_ENDING and implement the object-type-specific put_file() and load_file() methods. Mind that writing to the files is not multiprocess-safe, so ensure that no objects in the cache are overwritten while other processes are reading from the cache.

The descriptors are used to create the filenames by appending the FILE_ENDING.
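To illustrate the subclassing contract described above (set FILE_ENDING, implement put_file() and load_file()), here is a minimal, self-contained sketch. It does not import hybrid_learning: SketchFileCache is a hypothetical stand-in for FileCache, and PickleCache is a hypothetical subclass. The path layout cache_root/descriptor + FILE_ENDING follows the text; creating parent folders and the exact miss check are assumptions about how the real class behaves.

```python
import os
import pickle
import tempfile
from abc import ABC, abstractmethod
from typing import Any, Optional


class SketchFileCache(ABC):
    """Hypothetical stand-in for FileCache, only to illustrate the pattern."""

    FILE_ENDING: Optional[str] = None  # subclasses must set this, e.g. ".pkl"

    def __init__(self, cache_root: str):
        self.cache_root = cache_root

    def descriptor_to_fp(self, descriptor: str) -> str:
        # Assumed layout: <cache_root>/<descriptor><FILE_ENDING>
        return os.path.join(self.cache_root, descriptor + self.FILE_ENDING)

    def put(self, descriptor: str, obj: Any) -> None:
        if obj is None:
            raise ValueError("obj must not be None")
        fp = self.descriptor_to_fp(descriptor)
        os.makedirs(os.path.dirname(fp), exist_ok=True)  # assumption
        self.put_file(fp, obj)

    def load(self, descriptor: str) -> Optional[Any]:
        fp = self.descriptor_to_fp(descriptor)
        # A missing file is a cache miss and yields None.
        return self.load_file(fp) if os.path.isfile(fp) else None

    @abstractmethod
    def put_file(self, filepath: str, obj: Any) -> None: ...

    @abstractmethod
    def load_file(self, filepath: str) -> Optional[Any]: ...


class PickleCache(SketchFileCache):
    """Hypothetical subclass: only FILE_ENDING, put_file() and load_file()."""

    FILE_ENDING = ".pkl"

    def put_file(self, filepath: str, obj: Any) -> None:
        with open(filepath, "wb") as f:
            pickle.dump(obj, f)

    def load_file(self, filepath: str) -> Optional[Any]:
        with open(filepath, "rb") as f:
            return pickle.load(f)


root = tempfile.mkdtemp()
cache = PickleCache(root)
cache.put("img_0001", {"label": 3})
print(cache.load("img_0001"))  # {'label': 3}
print(cache.load("missing"))   # None
```

The design keeps all path handling in the base class, so a subclass for a new object type (tensors, images, JSON) only decides how a single file is serialized.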

Public Data Attributes:

FILE_ENDING

The file ending to append to descriptors to get the file path.

Public Methods:

put(descriptor, obj)

Store obj under the cache root using put_file().

load(descriptor)

Load object from file descriptor + FILE_ENDING under cache root.

clear()

Remove all files from cache root.

descriptors()

Provide paths of all cached files with ending stripped and relative to cache root.

descriptor_to_fp(descriptor)

Return the file path of the cache file for a given descriptor.

put_file(filepath, obj)

Save obj under filepath.

load_file(filepath)

Load object from filepath.

Inherited from Cache

put(descriptor, obj)

Store obj under the cache root using put_file().

load(descriptor)

Load object from file descriptor + FILE_ENDING under cache root.

put_batch(descriptors, objs)

Store a batch of objs in this cache using the corresponding descriptors.

load_batch(descriptors[, return_none_if])

Load a batch of objects.

clear()

Remove all files from cache root.

descriptors()

Provide paths of all cached files with ending stripped and relative to cache root.

as_dict()

Return a dict with all cached descriptors and objects.

wrap(getitem[, descriptor_map])

Add this cache to the deterministic function getitem (which should have no side effects).
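wrap() is only summarized here. The following hypothetical sketch shows the memoization idea as it reads from that summary: derive a descriptor from the index, return the cached object on a hit, otherwise compute it once and store it. DictCache and the free function wrap() are illustrative names, not the library API, and falling back to str(i) when no descriptor_map is given is an assumption.

```python
from typing import Any, Callable, Dict, Optional


class DictCache:
    """Hypothetical in-memory cache with the put()/load() contract above."""

    def __init__(self):
        self._store: Dict[str, Any] = {}

    def put(self, descriptor: str, obj: Any) -> None:
        self._store[descriptor] = obj

    def load(self, descriptor: str) -> Optional[Any]:
        return self._store.get(descriptor)  # None signals a cache miss


def wrap(cache, getitem: Callable[[int], Any],
         descriptor_map: Optional[Callable[[int], str]] = None
         ) -> Callable[[int], Any]:
    """Memoize a deterministic getitem through cache (sketch only)."""
    to_descriptor = descriptor_map or str  # assumed default mapping

    def cached_getitem(i: int) -> Any:
        desc = to_descriptor(i)
        obj = cache.load(desc)
        if obj is None:  # miss: compute once, then store
            obj = getitem(i)
            cache.put(desc, obj)
        return obj

    return cached_getitem


calls = []


def expensive_getitem(i: int) -> int:
    calls.append(i)  # record every real computation
    return i * i


fast = wrap(DictCache(), expensive_getitem,
            descriptor_map=lambda i: f"item_{i:04d}")
print(fast(3), fast(3), calls)  # 9 9 [3]
```

This only works because getitem is deterministic and side-effect free: a cached result must be interchangeable with a freshly computed one.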

Special Methods:

__init__([cache_root])

Init the cache with root folder cache_root.

__repr__()

Return repr(self).

Inherited from Cache

__repr__()

Return repr(self).

__add__(other)

Return a (cascaded) cache which will first look up self, then other, with the default sync mode.

__radd__(other)

Return a (cascaded) cache which will first look up other, then self, with the default sync mode.
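The lookup order of a cascaded cache can be sketched as follows. CascadeSketch and DictCache are hypothetical names, and the library's sync modes (how entries are copied between the member caches) are deliberately not modeled; only the "first cache that has the entry wins" order is shown.

```python
from typing import Any, Dict, Optional


class DictCache:
    """Hypothetical in-memory cache: load() returns None on a miss."""

    def __init__(self):
        self.store: Dict[str, Any] = {}

    def put(self, descriptor: str, obj: Any) -> None:
        self.store[descriptor] = obj

    def load(self, descriptor: str) -> Optional[Any]:
        return self.store.get(descriptor)


class CascadeSketch:
    """Look up member caches in order; sync behavior is NOT modeled."""

    def __init__(self, *caches):
        self.caches = list(caches)

    def load(self, descriptor: str) -> Optional[Any]:
        for cache in self.caches:
            obj = cache.load(descriptor)
            if obj is not None:  # first hit wins
                return obj
        return None  # miss in every member cache


fast, slow = DictCache(), DictCache()
fast.put("a", "from fast")
slow.put("a", "from slow")
slow.put("b", "from slow")

print(CascadeSketch(fast, slow).load("a"))  # from fast (order as in fast + slow)
print(CascadeSketch(slow, fast).load("a"))  # from slow (order reversed)
print(CascadeSketch(fast, slow).load("b"))  # from slow (falls through)
```

A typical reason for this order is to put a fast cache (e.g. in memory) in front of a slower, persistent one such as a FileCache.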


Parameters

cache_root (str) –

__init__(cache_root=None)

Init the cache with root folder cache_root.

Parameters

cache_root (Optional[str]) – see cache_root

__repr__()

Return repr(self).

Return type

str

clear()

Remove all files from cache root.

Warning

This also removes files which were not created by this cache handle.
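The warning above matters whenever the cache root is shared with other data. The following sketch assumes that clear() deletes the whole content of cache_root (including subfolders) while keeping the root folder itself; clear_cache_root() is a hypothetical helper, and the real method may differ in these details. It shows a file the cache never wrote being deleted as well, which is why each cache is best given a dedicated cache_root.

```python
import os
import shutil
import tempfile


def clear_cache_root(cache_root: str) -> None:
    """Hypothetical equivalent of clear(): delete everything below cache_root."""
    for entry in os.listdir(cache_root):
        path = os.path.join(cache_root, entry)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)  # remove subfolder and its content
        else:
            os.remove(path)  # remove plain file or symlink


root = tempfile.mkdtemp()
open(os.path.join(root, "sample_0.pkl"), "wb").close()  # written by the cache
os.makedirs(os.path.join(root, "sub"))
open(os.path.join(root, "sub", "notes.txt"), "w").close()  # NOT from the cache
clear_cache_root(root)
print(os.listdir(root))  # [] (the foreign notes.txt is gone as well)
```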

descriptor_to_fp(descriptor)

Return the file path of the cache file for a given descriptor.

Parameters

descriptor (str) –

Return type

str
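Going by the descriptions of FILE_ENDING and load(), the file path is the cache root joined with the descriptor plus FILE_ENDING. This hypothetical free-function version assumes os.path.join is used for the joining; it also shows that a descriptor may itself be a relative path with subfolders.

```python
import os


def descriptor_to_fp(cache_root: str, descriptor: str, file_ending: str) -> str:
    """Assumed mapping: <cache_root>/<descriptor><FILE_ENDING>."""
    return os.path.join(cache_root, descriptor + file_ending)


print(descriptor_to_fp("cache", "img_0001", ".pt"))
# cache/img_0001.pt on POSIX systems
print(descriptor_to_fp("cache", os.path.join("train", "img_0001"), ".pt"))
# cache/train/img_0001.pt on POSIX systems
```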

descriptors()

Provide paths of all cached files with ending stripped and relative to cache root. These can be used as descriptors for accessing the cached files via load(). The paths are normalized using os.path.normpath().

Return type

Iterable
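The following sketch reproduces what descriptors() is described to return: paths relative to the cache root, with the ending stripped and normalized via os.path.normpath(). list_descriptors() is a hypothetical helper, and skipping files with a different ending is an assumption rather than documented behavior.

```python
import os
import tempfile
from typing import List


def list_descriptors(cache_root: str, file_ending: str) -> List[str]:
    """Sketch of descriptors(): relative, normalized paths without the ending."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            if not name.endswith(file_ending):
                continue  # assumption: files with other endings are skipped
            rel = os.path.relpath(os.path.join(dirpath, name), cache_root)
            stem = rel[: len(rel) - len(file_ending)]  # strip the ending
            found.append(os.path.normpath(stem))
    return sorted(found)


root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "train"))
for rel in ["img_0001.pt", os.path.join("train", "img_0002.pt")]:
    open(os.path.join(root, rel), "wb").close()
print(list_descriptors(root, ".pt"))  # ['img_0001', 'train/img_0002'] on POSIX
```

Each returned descriptor maps back to an existing file when the ending is appended again, which is what makes the result usable with load().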

load(descriptor)

Load object from file descriptor + FILE_ENDING under cache root. Return None if file is not in cache.

Parameters

descriptor (str) – The (unique) file name to use without the FILE_ENDING; may also be a file path relative to the cache_root

Return type

Optional[Tensor]
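Because load() returns None for a descriptor that is not cached, callers can tell a hit from a miss with an `is None` check; this is also why put() refuses to store None. A minimal sketch of that behavior, using a hypothetical load_text() helper in which reading a UTF-8 text file stands in for load_file():

```python
import os
import tempfile
from typing import Optional


def load_text(cache_root: str, descriptor: str,
              file_ending: str = ".txt") -> Optional[str]:
    """Sketch of load(): None on a miss, otherwise read the cached file."""
    fp = os.path.join(cache_root, descriptor + file_ending)
    if not os.path.isfile(fp):
        return None  # cache miss
    with open(fp, encoding="utf-8") as f:  # stands in for load_file(fp)
        return f.read()


root = tempfile.mkdtemp()
with open(os.path.join(root, "greeting.txt"), "w", encoding="utf-8") as f:
    f.write("hello")
print(load_text(root, "greeting"))  # hello
print(load_text(root, "missing"))   # None
```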

abstract load_file(filepath)

Load object from filepath.

Parameters

filepath (str) –

Return type

Optional[Any]

put(descriptor, obj)

Store obj under the cache root using put_file(). The file name is descriptor + FILE_ENDING.

Warning

This put method is not multiprocessing-capable! Files that were already created may be overwritten by parallel processes. Make sure that no two processes attempt to put an object under the same descriptor (as, e.g., handled by torch.utils.data.DataLoader for map-style datasets).

Parameters
  • descriptor (str) – The (unique) file name to use without FILE_ENDING; may also be a file path relative to the cache_root

  • obj (Any) – the object to save; must not be None
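A sketch of the put() steps as described: reject None, build descriptor + FILE_ENDING under the cache root, and hand the object to a put_file()-like writer. put_pickle() is a hypothetical helper; creating missing parent folders for descriptors with subpaths is an assumption. Deriving the descriptor from a unique key such as the dataset index is one way to keep parallel writers on distinct files, in line with the warning above.

```python
import os
import pickle
import tempfile
from typing import Any


def put_pickle(cache_root: str, descriptor: str, obj: Any,
               file_ending: str = ".pkl") -> str:
    """Sketch of put(): reject None, create folders, write descriptor + ending."""
    if obj is None:
        raise ValueError("obj must not be None")  # None marks a cache miss
    fp = os.path.join(cache_root, descriptor + file_ending)
    os.makedirs(os.path.dirname(fp), exist_ok=True)  # assumption: subfolders
    with open(fp, "wb") as f:  # stands in for put_file(fp, obj)
        pickle.dump(obj, f)
    return fp


root = tempfile.mkdtemp()
fp = put_pickle(root, os.path.join("train", "img_0001"), [1, 2, 3])
print(os.path.relpath(fp, root))  # train/img_0001.pkl on POSIX
```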

abstract put_file(filepath, obj)

Save obj under filepath.

Parameters
  • filepath (str) –

  • obj (Any) –

FILE_ENDING = None

The file ending to append to descriptors to get the file path. See descriptor_to_fp().

cache_root

The path to the root folder under which to store cached files.