rising.loading¶

DataLoader¶

default_transform_call¶

BatchTransformer¶

patch_worker_init_fn¶

patch_collate_fn¶

Dataset¶

Dataset¶

class rising.loading.dataset.Dataset(*args, **kwargs)¶

Bases: torch.utils.data.Dataset

Extends torch.utils.data.Dataset with a get_subset method that returns a sub-dataset.

get_subset(indices)¶

Returns a torch.utils.data.Subset of the current dataset, restricted to the given indices.

Parameters: indices (Sequence[int]) – valid indices to extract subset from current dataset
Returns: the subset of the current dataset
Return type: Subset
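
The idea behind get_subset can be illustrated without torch. The sketch below is a minimal stand-in using plain Python: ListDataset and IndexSubset are hypothetical names that mimic a map-style dataset and torch.utils.data.Subset, and they are not part of rising or PyTorch.

```python
from typing import Any, Sequence


class IndexSubset:
    """Hypothetical stand-in for torch.utils.data.Subset: a view onto chosen indices."""

    def __init__(self, dataset: Any, indices: Sequence[int]) -> None:
        self.dataset = dataset
        self.indices = list(indices)

    def __getitem__(self, idx: int) -> Any:
        # Translate the subset-local index into an index of the parent dataset.
        return self.dataset[self.indices[idx]]

    def __len__(self) -> int:
        return len(self.indices)


class ListDataset:
    """Hypothetical map-style dataset that offers a get_subset method."""

    def __init__(self, samples: Sequence[Any]) -> None:
        self.samples = list(samples)

    def __getitem__(self, idx: int) -> Any:
        return self.samples[idx]

    def __len__(self) -> int:
        return len(self.samples)

    def get_subset(self, indices: Sequence[int]) -> IndexSubset:
        # The subset keeps a reference to the parent instead of copying samples.
        return IndexSubset(self, indices)


full = ListDataset(["a", "b", "c", "d"])
train = full.get_subset([0, 2, 3])
train_samples = [train[i] for i in range(len(train))]
```

A typical use is splitting one dataset into training and validation parts by passing two disjoint index lists.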

AsyncDataset¶

class rising.loading.dataset.AsyncDataset(data_path, load_fn, mode='append', num_workers=0, verbose=False, **load_kwargs)¶

Bases: Dataset

A dataset that preloads all data and caches it for the lifetime of the instance.

Parameters

data_path (Union[Path, str, list]) – the path(s) containing the actual data samples
load_fn (Callable) – function to load the actual data
mode (str) – whether to append each loaded sample to the list or to extend the list by it. Supported modes are append and extend. Default: append
num_workers (Optional[int]) – the number of workers to use for preloading. 0 means that all data is loaded in the main process; None means that the number of processes defaults to the number of logical cores.
verbose (bool) – whether to show the loading progress.
**load_kwargs – additional keyword arguments, passed directly to load_fn

Warning

When multiprocessing is used to load data, not every load_fn is supported, because the function has to be serialized for the worker processes. Refer to the dill or pickle documentation for the restrictions.
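
The preload-and-cache behaviour described above can be pictured with a small stand-in. This is a sketch under assumptions, not rising's implementation: PreloadedDataset and load_text are hypothetical names, loading happens in the main process only, and the `upper` keyword merely shows how extra keyword arguments could be forwarded to the load function.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable, List, Sequence, Union


class PreloadedDataset:
    """Hypothetical stand-in: loads every sample once at construction and caches it."""

    def __init__(
        self,
        data_path: Sequence[Union[Path, str]],
        load_fn: Callable[..., Any],
        **load_kwargs: Any,
    ) -> None:
        # All loading happens here, so later indexing never touches the disk.
        self.data: List[Any] = [load_fn(Path(p), **load_kwargs) for p in data_path]

    def __getitem__(self, index: int) -> Any:
        return self.data[index]

    def __len__(self) -> int:
        return len(self.data)


def load_text(path: Path, upper: bool = False) -> str:
    """Illustrative load function; `upper` is an assumed extra keyword argument."""
    text = path.read_text()
    return text.upper() if upper else text


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name in ("a", "b"):
        file_path = Path(tmp) / f"{name}.txt"
        file_path.write_text(name)
        paths.append(file_path)
    dataset = PreloadedDataset(paths, load_text, upper=True)

# The temporary files are gone by now, but the cached samples are still available.
cached = [dataset[i] for i in range(len(dataset))]
```

The trade-off is memory for speed: every sample stays in RAM, which suits datasets that fit in memory and are expensive to read repeatedly.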

static _add_item(data, item, mode)¶

Adds items to the given data list. How the items are added depends on mode.

Parameters

data (list) – the list containing the already loaded data
item (Any) – the current item which will be added to the list
mode (str) – the string specifying how the item should be added.

Raises

TypeError – if mode is not one of the known modes

Return type

None
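
The difference between the two modes matters when one load call returns several samples. The following is a minimal sketch of the documented behaviour; add_item is an illustrative name and the body is an assumption about how the modes could be handled, not a copy of rising's code.

```python
from typing import Any, List


def add_item(data: List[Any], item: Any, mode: str) -> None:
    """Add `item` to `data` in place, either as one element or element by element."""
    if mode == "append":
        # The loaded item becomes a single entry, even if it is itself a list.
        data.append(item)
    elif mode == "extend":
        # The loaded item is treated as an iterable of samples and flattened in.
        data.extend(item)
    else:
        # Mirrors the documented TypeError for unknown modes.
        raise TypeError(f"Unknown mode: {mode!r}")


appended: List[Any] = []
add_item(appended, [1, 2], "append")
add_item(appended, [3], "append")

extended: List[Any] = []
add_item(extended, [1, 2], "extend")
add_item(extended, [3], "extend")
```

With append, each file contributes exactly one dataset entry; with extend, a file that yields several patches or slices contributes one entry per patch.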

_make_dataset(path, mode)¶

Builds the entire dataset.

Parameters

path (Union[Path, str, list]) – the path(s) containing the data samples
mode (str) – whether to append or extend the dataset by the loaded sample

Returns

the loaded data

Return type

list

load_multi_process(load_fn, path)¶

Helper function to load the dataset with multiple processes.

Parameters

load_fn (Callable) – function to load a single sample
path (Sequence) – a sequence of paths which should be loaded

Returns

loaded data

Return type

list

load_single_process(load_fn, path)¶

Helper function to load the dataset in a single process.

Parameters

load_fn (Callable) – function to load a single sample
path (Sequence) – a sequence of paths which should be loaded

Returns

iterator of loaded data

Return type

Iterator
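
Because the single-process helper is documented as returning an iterator rather than a list, samples can be produced one at a time. The sketch below shows what that implies, assuming lazy evaluation; load_lazily and fake_load are hypothetical names and the real helper may be implemented differently.

```python
from typing import Any, Callable, Iterator, List, Sequence


def load_lazily(load_fn: Callable[[Any], Any], path: Sequence[Any]) -> Iterator[Any]:
    """Yield one loaded sample per path; nothing is loaded until iteration starts."""
    for sample_path in path:
        yield load_fn(sample_path)


calls: List[str] = []


def fake_load(sample_path: str) -> str:
    """Illustrative load function that records which paths were requested."""
    calls.append(sample_path)
    return sample_path.upper()


iterator = load_lazily(fake_load, ["a", "b"])
calls_before_iteration = list(calls)  # still empty: no sample has been loaded yet
loaded = list(iterator)  # consuming the iterator triggers the actual loading
```

Returning an iterator lets the caller wrap the loop in a progress bar or add items to the cache one by one.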

dill_helper¶

rising.loading.dataset.dill_helper(payload)¶

Loads a single sample from data serialized by dill.

Parameters: payload (Any) – data which is loaded with dill
Returns: loaded data
Return type: Any

load_async¶

rising.loading.dataset.load_async(pool, fn, *args, callback=None, **kwargs)¶

Loads data asynchronously and serializes it via dill.

Parameters

pool (Pool) – multiprocessing pool to use for apply_async()
fn (Callable) – function to load a single sample
*args – positional arguments to dump with dill
callback (Optional[Callable]) – optional callback, invoked once loading has finished. Defaults to None.
**kwargs – keyword arguments to dump with dill

Returns

reference to obtain data with get()

Return type

Any
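
The pattern described here (serialize the call, hand it to a pool with apply_async(), collect the result with get()) can be sketched with the standard library alone. Two substitutions are assumptions made so the example runs anywhere: pickle stands in for dill, and multiprocessing.pool.ThreadPool stands in for a process pool since it exposes the same apply_async() interface. submit_async and unpack_and_call are hypothetical names, not rising's functions.

```python
import pickle
from multiprocessing.pool import ThreadPool
from typing import Any, Callable, List, Optional


def unpack_and_call(payload: bytes) -> Any:
    """Deserialize (fn, args, kwargs) and run the call; pickle stands in for dill."""
    fn, args, kwargs = pickle.loads(payload)
    return fn(*args, **kwargs)


def submit_async(
    pool: Any,
    fn: Callable[..., Any],
    *args: Any,
    callback: Optional[Callable[[Any], None]] = None,
    **kwargs: Any,
) -> Any:
    """Serialize the call and schedule it; returns a handle whose get() yields the result."""
    payload = pickle.dumps((fn, args, kwargs))
    return pool.apply_async(unpack_and_call, (payload,), callback=callback)


finished: List[Any] = []
with ThreadPool(processes=2) as pool:
    # `sorted` is used as the load function because builtins pickle by reference.
    handle = submit_async(pool, sorted, [3, 1, 2], callback=finished.append, reverse=True)
    result = handle.get(timeout=10)
```

Serializing the whole call up front is what allows a more capable serializer such as dill to carry functions that plain pickle would reject, for example lambdas or closures.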

Collation¶

numpy_collate¶

rising.loading.collate.numpy_collate(batch)¶

Collates the samples into a batch of numpy arrays. PyTorch tensors, scalar values and sequences are cast to arrays automatically.

Parameters

batch (Any) – a batch of samples, in most cases a sequence, a mapping or a mixture of both

Returns

collated batch, with types optionally converted to numpy.ndarray

Return type

Any

Raises

TypeError – if the batch could not be collated automatically
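
Collation of nested samples is usually done recursively: mappings are collated key by key, sequences position by position, and leaves are gathered into one container. The sketch below illustrates that recursion with plain lists standing in for numpy arrays, so it is an assumption-laden illustration rather than the behaviour of numpy_collate itself; list_collate is a hypothetical name.

```python
from collections.abc import Mapping, Sequence
from typing import Any


def list_collate(batch: Any) -> Any:
    """Recursively collate a batch of samples; lists stand in for numpy arrays."""
    first = batch[0]
    if isinstance(first, (int, float, str)):
        # Leaves: gather the values of all samples into one list.
        return list(batch)
    if isinstance(first, Mapping):
        # Mappings: collate each key separately across all samples.
        return {key: list_collate([sample[key] for sample in batch]) for key in first}
    if isinstance(first, Sequence):
        # Sequences: transpose, then collate each position across all samples.
        return [list_collate(list(field)) for field in zip(*batch)]
    raise TypeError(f"Cannot collate samples of type {type(first).__name__}")


dict_batch = list_collate([{"id": "a", "label": 0}, {"id": "b", "label": 1}])
tuple_batch = list_collate([("a", 0), ("b", 1)])
```

The string check has to come before the sequence check, since a str is itself a Sequence and would otherwise be split into characters.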

do_nothing_collate¶

rising.loading.collate.do_nothing_collate(batch)¶

Returns the batch as is (without any collation).

Parameters: batch (Any) – input batch (typically a sequence, mapping or mixture of those)

Returns: the batch as given to this function
Return type: Any