Dataset Caching
===============

The base dataset class
:py:class:`~hybrid_learning.datasets.base.BaseDataset`
defined in this library supports basic caching functionality.
The corresponding cache handles are defined in
:py:mod:`~hybrid_learning.datasets.caching`.


About Caches
------------

The base class of cache handles is
:py:class:`~hybrid_learning.datasets.caching.Cache`.
A cache implementation provides the methods
:py:meth:`~hybrid_learning.datasets.caching.Cache.put` and
:py:meth:`~hybrid_learning.datasets.caching.Cache.load`,
which store a given object and load a previously stored object,
respectively, using a given descriptor.
If ``load`` is called with a descriptor for which no object has been
stored so far, ``None`` is returned.
The following example uses a file cache that stores objects to disk
using ``torch.save()``:

>>> import os, torch
>>> from hybrid_learning.datasets.caching import PTCache
>>> mycache = PTCache(cache_root=".pytest_tmpdir")
>>> obj: torch.Tensor = torch.tensor([1,2,3])
>>> descriptor: str = "unique_descriptor"
>>> mycache.put(descriptor, obj)
>>> assert os.path.exists(os.path.join(mycache.cache_root, descriptor + ".pt"))
>>> print(mycache.load(descriptor))
tensor([1, 2, 3])
>>> print(mycache.load("descriptor_of_not_yet_stored_object"))
None


Adding a Cache to a Dataset
---------------------------

Implementations of
:py:class:`~hybrid_learning.datasets.base.BaseDataset`
allow specifying a cache handle in order to cache transformed items.
They feature a
:py:meth:`~hybrid_learning.datasets.base.BaseDataset.descriptor`
method that returns the unique descriptor of the sample at a given index.
If the dataset is assigned a
:py:attr:`~hybrid_learning.datasets.base.BaseDataset.transforms_cache`
handle, these descriptors are used to load a transformed sample from
the cache or to put it into the cache.

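
The lookup flow described above can be illustrated with a minimal sketch.
It does not use the library's classes: ``DictCache`` and ``TinyDataset``
are hypothetical stand-ins that only assume the documented contract
(``put`` stores under a descriptor, ``load`` returns ``None`` for an
unknown descriptor). The actual implementation in
:py:class:`~hybrid_learning.datasets.base.BaseDataset` may differ.

```python
from typing import Any, Callable, Dict, Optional, Sequence


class DictCache:
    """Hypothetical in-memory cache handle (not the library's class)."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def put(self, descriptor: str, obj: Any) -> None:
        # Store the object under its descriptor.
        self._store[descriptor] = obj

    def load(self, descriptor: str) -> Optional[Any]:
        # Assumed contract: return None if nothing was stored yet.
        return self._store.get(descriptor)


class TinyDataset:
    """Hypothetical dataset that caches transformed items by descriptor."""

    def __init__(self, items: Sequence[Any],
                 transforms: Callable[[Any], Any],
                 transforms_cache: Optional[DictCache] = None) -> None:
        self.items = list(items)
        self.transforms = transforms
        self.transforms_cache = transforms_cache
        self.transform_calls = 0  # counts how often transforms really ran

    def descriptor(self, i: int) -> str:
        # Unique descriptor of the sample at index i.
        return f"item_{i}"

    def __getitem__(self, i: int) -> Any:
        desc = self.descriptor(i)
        # First try to load the transformed item from the cache.
        if self.transforms_cache is not None:
            cached = self.transforms_cache.load(desc)
            if cached is not None:
                return cached
        # Cache miss: transform the raw item and store the result.
        self.transform_calls += 1
        transformed = self.transforms(self.items[i])
        if self.transforms_cache is not None:
            self.transforms_cache.put(desc, transformed)
        return transformed


data = TinyDataset([1, 2, 3], transforms=lambda x: x * 10,
                   transforms_cache=DictCache())
first = data[1]   # transformed and put into the cache
second = data[1]  # loaded from the cache, transforms not called again
```

In this sketch the second access to the same index is served from the
cache, so the transformation runs only once per descriptor.
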
To apply further transformations to items, regardless of whether they
were loaded from the cache or newly transformed using
:py:attr:`~hybrid_learning.datasets.base.BaseDataset.transforms`,
use :py:meth:`~hybrid_learning.datasets.base.BaseDataset.after_cache_transforms`.

>>> from hybrid_learning.datasets.custom import coco
>>> concept_data = coco.ConceptDataset(
...     body_parts=[coco.BodyParts.FACE],
...     dataset_root=os.path.join("dataset", "coco_test", "images", "train2017"),
...     transforms_cache=mycache
... )

Caching can also be applied manually by decorating ``__getitem__``-like
functions with a cache's
:py:meth:`~hybrid_learning.datasets.caching.Cache.wrap` method.
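
To illustrate the idea of decorating a ``__getitem__``-like function,
here is a minimal sketch with a hypothetical in-memory cache.
The ``wrap`` method below is an assumption made for illustration: it
simply uses ``str(index)`` as the descriptor, whereas the signature and
descriptor handling of the library's
:py:meth:`~hybrid_learning.datasets.caching.Cache.wrap` may differ.

```python
import functools
from typing import Any, Callable, Dict, List, Optional


class DictCache:
    """Hypothetical in-memory cache (not the library's class)."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def put(self, descriptor: str, obj: Any) -> None:
        self._store[descriptor] = obj

    def load(self, descriptor: str) -> Optional[Any]:
        # Return None if nothing was stored under this descriptor.
        return self._store.get(descriptor)

    def wrap(self, getitem: Callable[[int], Any]) -> Callable[[int], Any]:
        """Decorate a getitem-like function with load-or-compute logic.

        Assumption: the descriptor is derived as str(index).
        """
        @functools.wraps(getitem)
        def wrapper(i: int) -> Any:
            desc = str(i)
            cached = self.load(desc)
            if cached is not None:
                return cached
            result = getitem(i)
            self.put(desc, result)
            return result
        return wrapper


calls: List[int] = []  # records which indices were actually computed
cache = DictCache()


@cache.wrap
def get_square(i: int) -> int:
    calls.append(i)
    return i * i


results = [get_square(3), get_square(3), get_square(4)]
```

Here the wrapped function is only executed for indices that are not yet
cached. Note that a cache following the ``None``-on-miss contract cannot
distinguish a stored ``None`` from a missing entry, so such results
would be recomputed on every call in this sketch.
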