BaseDataset
- class hybrid_learning.datasets.base.BaseDataset(split=None, dataset_root=None, transforms=None, transforms_cache=None, after_cache_transforms=None, device='cpu')
Bases: Dataset
Abstract base class for tuple datasets with storage location.
Derived datasets should yield tuples of (input, target). The transformation transforms that is applied to data tuples before they are returned from __getitem__() can be controlled. The default for transforms is given by _default_transforms; override it in sub-classes if necessary. The default combination of collected dataset tuples and _default_transforms should yield a tuple of torch.Tensor or dicts thereof.
The dataset_root is assumed to provide information about the storage location. Ideally, all components (input data, annotations, etc.) should be stored relative to this root location.
The transformed tuple values are cached by transforms_cache if it is given. Values are then only collected and transformed if they cannot be loaded from the cache. To get the cache descriptor for an entry, the descriptor() method is consulted. Make sure to override this appropriately (e.g. by image ID or image file name).
Note
In case a CacheTuple is used, make sure that None is returned if any tuple value is None.
Public Data Attributes:
settings: Settings of the instance.
Public Methods:
getitem(idx): Get data item tuple from idx in this dataset.
descriptor(i): Return a unique descriptor for the item at position i.
Special Methods:
__init__([split, dataset_root, transforms, ...]): Init.
__len__(): Number of data points in the dataset; to be implemented in subclasses.
__getitem__(idx): Get item from idx in dataset with transformations applied.
__repr__(): Nice printing function.
Inherited from Dataset:
__getitem__(idx): Get item from idx in dataset with transformations applied.
__add__(other)
- __getitem__(idx)
Get item from idx in dataset with transformations applied. Transformations must be stored as a single tuple transformation in transforms.
- Returns
tuple output of getitem() transformed by transforms
- Parameters
idx (int)
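The split between getitem() (collect the raw tuple) and __getitem__() (apply transforms) can be illustrated with a minimal sketch. ToyBaseDataset, ToyDataset and double_input are hypothetical stand-ins written with the standard library only; they are not the library's implementation, they leave out caching and after_cache_transforms, and they assume that the tuple transformation is called with input and target as two separate arguments.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Optional, Tuple


class ToyBaseDataset(ABC):
    """Hypothetical stand-in mimicking the documented call order (no caching)."""

    def __init__(self, transforms: Optional[Callable] = None):
        # Fall back to the sub-class-specific default if nothing is given.
        self.transforms = (transforms if transforms is not None
                           else self._default_transforms)

    @staticmethod
    def _default_transforms(inp: Any, target: Any) -> Tuple[Any, Any]:
        # Identity default for this sketch.
        return inp, target

    @abstractmethod
    def __len__(self) -> int:
        """Number of data points; implemented in sub-classes."""

    @abstractmethod
    def getitem(self, idx: int) -> Tuple[Any, Any]:
        """Collect the raw (input, target) tuple."""

    def __getitem__(self, idx: int) -> Tuple[Any, Any]:
        # One single transformation that maps a tuple to a tuple.
        return self.transforms(*self.getitem(idx))


class ToyDataset(ToyBaseDataset):
    """Hypothetical sub-class serving tuples from an in-memory list."""

    def __init__(self, samples, transforms=None):
        super().__init__(transforms=transforms)
        self.samples = list(samples)

    def __len__(self):
        return len(self.samples)

    def getitem(self, idx):
        return self.samples[idx]


def double_input(inp, target):
    """Example tuple transformation: changes the input, keeps the target."""
    return inp * 2, target


raw = ToyDataset([(1, "a"), (2, "b")])
doubled = ToyDataset([(1, "a"), (2, "b")], transforms=double_input)
print(raw[1], doubled[1])
```

Keeping getitem() free of transformations means a sub-class only has to describe how to load one raw tuple; everything applied afterwards stays configurable per instance.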
- __init__(split=None, dataset_root=None, transforms=None, transforms_cache=None, after_cache_transforms=None, device='cpu')
Init.
- Parameters
split (Optional[DatasetSplit]) – The split of the dataset (e.g. DatasetSplit.TRAIN, DatasetSplit.VAL, DatasetSplit.TEST).
dataset_root (Optional[str]) – The location in which to store the dataset.
transforms (Optional[Callable]) – The transformations to be applied to the data when loaded; defaults to _default_transforms
transforms_cache (Optional[Cache]) – optional cache instance for caching transformed tuples; must return None in case one of the tuple values has not been cached yet; see transforms_cache
after_cache_transforms (Optional[Callable]) – transformations applied after consulting the cache (regardless of whether the tuple was retrieved from the cache); by default, tensor gradients are disabled and tensors are moved to a common device
device (Optional[Union[str, device]]) – device to use in the default after_cache_transforms
- abstract descriptor(i)
Return a unique descriptor for the item at position i. This can e.g. be an image ID or the image file name. It is used for caching.
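As a sketch of a file-name-based descriptor, the hypothetical helper class below (FileListDescriptors, not part of the library) keeps a list of image paths and returns the file name without directory and extension. The example paths are made up for illustration; the resulting key is unique only as long as file names are unique within the dataset.

```python
import os
from typing import List


class FileListDescriptors:
    """Hypothetical helper: maps item positions to file-name-based cache keys."""

    def __init__(self, image_paths: List[str]):
        self.image_paths = list(image_paths)

    def descriptor(self, i: int) -> str:
        # Strip directory and extension so the key does not depend on
        # where the dataset root happens to be located.
        file_name = os.path.basename(self.image_paths[i])
        return os.path.splitext(file_name)[0]


# Made-up example paths.
paths = ["images/train/000000000139.jpg", "images/train/000000000285.jpg"]
helper = FileListDescriptors(paths)
keys = [helper.descriptor(i) for i in range(len(paths))]
print(keys)
```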
- abstract getitem(idx)
Get data item tuple from idx in this dataset.
- Parameters
idx (int) – index to retrieve data point from
- Returns
tuple (input, label) with
input one of: image (as PIL.Image.Image), Radar/Lidar point cloud
label one of:
  - None
  - class label (as torch.Tensor or bool)
  - semantic segmentation map (as PIL.Image.Image or torch.Tensor compatible with torchvision transforms)
  - bounding box
  - string-indexed dict of combinations
- Return type
Tuple[Union[Tensor, Image], Union[Tensor, Image, Dict[Tensor, Image]]]
- __parameters__ = ()
- after_cache_transforms: Callable
Transformation function applied after consulting the cache (regardless of whether the tuple was retrieved from the cache). Use these transformations instead of transforms to ensure the transformation is always applied, regardless of caching. By default, tensor gradients are disabled and tensors are moved to a common device (see _get_default_after_cache_trafo()).
- dataset_root: str
Root of the storage location in which the dataset is assumed to be saved; the starting point from which to navigate to the dataset information.
- property settings: Dict[str, Any]
Settings of the instance. The transforms info is skipped if set to default.
- split: Optional[DatasetSplit]
Optional specification of the use-case this dataset is meant to represent, e.g. training, validation, or testing.
- transforms: Callable
Transformation function applied to each item tuple before return. Applied in __getitem__(). Default transformations are sub-class-specific. Items transformed using transforms can be cached by setting transforms_cache. If the transformations should always be applied, regardless of caching, use after_cache_transforms.
- transforms_cache: Optional[Cache]
Cache for the transformed (input, target) tuples. If set, __getitem__() will first try to load the tuple from cache before loading and transforming it normally. Items not in the cache are put in there after transforms is applied.
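The lookup order described for transforms_cache and after_cache_transforms can be sketched as follows. DictCache and fetch are hypothetical, standard-library-only stand-ins; the load/put method names and the calling convention of the transformations (input and target as two arguments) are assumptions made for this illustration, not a statement about the library's Cache interface.

```python
from typing import Any, Dict, List, Optional, Tuple


class DictCache:
    """Hypothetical in-memory cache; load() returns None on a miss."""

    def __init__(self):
        self._store: Dict[str, Tuple[Any, Any]] = {}

    def load(self, descriptor: str) -> Optional[Tuple[Any, Any]]:
        return self._store.get(descriptor)

    def put(self, descriptor: str, item: Tuple[Any, Any]) -> None:
        self._store[descriptor] = item


def fetch(idx, getitem, descriptor, transforms,
          cache=None, after_cache_transforms=None):
    """Return the item at idx, consulting the cache before transforming."""
    item = cache.load(descriptor(idx)) if cache is not None else None
    if item is None:
        # Cache miss (or no cache): collect and transform, then store.
        item = transforms(*getitem(idx))
        if cache is not None:
            cache.put(descriptor(idx), item)
    # Applied on every access, whether or not the cache was hit.
    if after_cache_transforms is not None:
        item = after_cache_transforms(*item)
    return item


calls: List[int] = []  # records how often the expensive transform runs


def expensive(inp, target):
    calls.append(inp)
    return inp * 10, target


def tag(inp, target):
    return inp, f"{target}!"


samples = [(1, "a"), (2, "b")]
cache = DictCache()
kwargs = dict(getitem=samples.__getitem__, descriptor=lambda i: f"item_{i}",
              transforms=expensive, cache=cache, after_cache_transforms=tag)
first = fetch(0, **kwargs)   # miss: expensive() runs and the result is stored
second = fetch(0, **kwargs)  # hit: expensive() is skipped, tag() still runs
print(first, second, calls)
```

Only the output of transforms is stored in this sketch, so anything that must not be frozen into the cache (such as device placement) belongs in after_cache_transforms.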