Data Manager
- class DataManager(root, seed, save_dir=None)
Base class for all data managers: every other data manager inherits from this class. Child classes should implement four abstract methods: get_identifiers, make_datasets, make_transforms, and partition_local_data.
Warning
When subclassing, call super().__init__(...) at the end of the child constructor, because the base-class constructor calls the abstract methods, which may rely on attributes set by the child class.
- Parameters
root (str) -- root directory of the dataset to partition.
seed (int) -- random seed used for partitioning.
save_dir (str, optional) -- path where the partitioned indices are saved.
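The warning above follows from a general Python rule: if a base-class constructor calls methods that child classes override, those overrides can only use attributes the child has already set. A minimal sketch of the two orderings, using hypothetical stand-in classes (ToyBase, GoodManager and BadManager are not part of the library):

```python
from typing import Sequence


class ToyBase:
    """Hypothetical stand-in for a base class whose constructor calls an abstract method."""

    def __init__(self, root: str, seed: int):
        self.root = root
        self.seed = seed
        # The base constructor calls a method that child classes must implement.
        self.identifiers = self.get_identifiers()

    def get_identifiers(self) -> Sequence[str]:
        raise NotImplementedError


class GoodManager(ToyBase):
    def __init__(self, root: str, seed: int, num_clients: int):
        self.num_clients = num_clients  # set child attributes first...
        super().__init__(root, seed)    # ...then call the base constructor last

    def get_identifiers(self) -> Sequence[str]:
        return ("toy", f"clients{self.num_clients}")


class BadManager(ToyBase):
    def __init__(self, root: str, seed: int, num_clients: int):
        # Wrong order: the base constructor runs get_identifiers() before num_clients exists.
        super().__init__(root, seed)
        self.num_clients = num_clients

    def get_identifiers(self) -> Sequence[str]:
        return ("toy", f"clients{self.num_clients}")


good = GoodManager("data", seed=0, num_clients=10)
print(good.identifiers)  # ('toy', 'clients10')

bad_failed = False
try:
    BadManager("data", seed=0, num_clients=10)
except AttributeError:
    bad_failed = True  # num_clients did not exist yet when get_identifiers() ran
print(bad_failed)  # True
```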
- get_global_dataset() → Dict[str, torch.utils.data.dataset.Dataset]
Returns the global dataset.
- Returns
Dict[str, Dataset] -- global dataset for each split
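One way to picture the returned mapping: a single raw global dataset plus a dictionary of split indices (the shape partition_global_data returns) yields one lazy view per split. A minimal sketch assuming plain Python sequences; IndexView and build_split_views are hypothetical helpers, and for real PyTorch datasets torch.utils.data.Subset plays the role of IndexView.

```python
from typing import Dict, Iterable, List, Sequence


class IndexView:
    """Hypothetical lazy view over selected indices of a sequence (like torch Subset)."""

    def __init__(self, data: Sequence, indices: Iterable[int]):
        self.data = data
        self.indices: List[int] = list(indices)

    def __len__(self) -> int:
        return len(self.indices)

    def __getitem__(self, i: int):
        # Map the view position to the underlying example index.
        return self.data[self.indices[i]]


def build_split_views(
    data: Sequence, split_indices: Dict[str, Iterable[int]]
) -> Dict[str, IndexView]:
    # One view per split name, e.g. {"train": ..., "test": ...}.
    return {name: IndexView(data, idx) for name, idx in split_indices.items()}


raw = [f"sample{i}" for i in range(10)]
global_views = build_split_views(raw, {"train": range(0, 8), "test": range(8, 10)})
print(sorted(global_views))  # ['test', 'train']
print(len(global_views["train"]), global_views["test"][0])  # 8 sample8
```

No example is copied: each view only stores indices into the shared raw data.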
- get_global_splits_names()
Returns the names of the global splits (e.g., train, test).
- Returns
List[str] -- list of global split names
- get_group_dataset(ids: Iterable[int]) → Dict[str, torch.utils.data.dataset.Dataset]
Returns the local dataset corresponding to the given group of partition ids.
- Parameters
ids (Iterable[int]) -- a list or tuple of partition ids
- Returns
Dict[str, Dataset] -- a mapping of split_name: dataset
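A sketch of one plausible reading of "the local dataset corresponding to a group of partition ids": the union of those clients' partitions, split by split. It operates on the {split: client-indexed iterables of example indices} structure that partition_local_data returns; the union semantics and the helper group_indices are assumptions for illustration.

```python
from itertools import chain
from typing import Dict, Iterable, List, Sequence


def group_indices(
    local_partition: Dict[str, Sequence[Iterable[int]]], ids: Iterable[int]
) -> Dict[str, List[int]]:
    # Materialize ids so any iterable (e.g. a generator) can be reused for every split.
    ids = list(ids)
    # For each split, concatenate the example indices of every requested client.
    return {
        split: list(chain.from_iterable(clients[i] for i in ids))
        for split, clients in local_partition.items()
    }


partition = {
    "train": [[0, 1, 2], [3, 4], [5, 6, 7]],
    "test": [[8], [9], [10, 11]],
}
print(group_indices(partition, (0, 2)))
# {'train': [0, 1, 2, 5, 6, 7], 'test': [8, 10, 11]}
```

Wrapping each resulting index list in torch.utils.data.Subset over the local dataset would turn these indices into datasets.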
- get_identifiers() → Sequence[str]
Returns identifiers to be used for saving the partition info.
- Raises
NotImplementedError -- this abstract method should be implemented by child classes
- Returns
Sequence[str] -- a sequence of str identifying the class instance.
- get_local_dataset(id: int) → Dict[str, torch.utils.data.dataset.Dataset]
Returns the local dataset for the given partition id.
- Parameters
id (int) -- partition id
- Returns
Dict[str, Dataset] -- a mapping of split_name: dataset
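A sketch of a per-client lookup, assuming the local partition is held as {split: [indices of client 0, indices of client 1, ...]} as partition_local_data describes. The helper local_views is hypothetical and materializes plain lists for brevity; a real implementation would more likely wrap the local dataset in torch.utils.data.Subset.

```python
from typing import Dict, Iterable, List, Sequence


def local_views(
    data: Sequence, local_partition: Dict[str, Sequence[Iterable[int]]], client_id: int
) -> Dict[str, List]:
    # Select client `client_id` in every split and gather its examples.
    return {
        split: [data[i] for i in clients[client_id]]
        for split, clients in local_partition.items()
    }


data = [chr(ord("a") + i) for i in range(6)]  # 'a' .. 'f'
partition = {"train": [[0, 1], [2, 3]], "test": [[4], [5]]}
print(local_views(data, partition, 1))  # {'train': ['c', 'd'], 'test': ['f']}
```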
- get_local_splits_names()
Returns the names of the local splits (e.g., train, test).
- Returns
List[str] -- list of local split names
- get_oracle_dataset() → Dict[str, torch.utils.data.dataset.Dataset]
Returns all of the local datasets stacked together.
- Returns
Dict[str, Dataset] -- the oracle dataset for each split.
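Stacking every client's partition within each split gives the oracle view: the pooled data a centralized learner would see. A minimal sketch at the index level, assuming the same {split: client-indexed indices} layout; oracle_indices is hypothetical, and with PyTorch a ConcatDataset over per-client Subset objects would serve the same purpose.

```python
from itertools import chain
from typing import Dict, Iterable, List, Sequence


def oracle_indices(
    local_partition: Dict[str, Sequence[Iterable[int]]]
) -> Dict[str, List[int]]:
    # Concatenate the indices of all clients, split by split.
    return {
        split: list(chain.from_iterable(clients))
        for split, clients in local_partition.items()
    }


partition = {"train": [[0, 1], [2], [3, 4]], "test": [[5], [6, 7], []]}
oracle = oracle_indices(partition)
print(oracle)  # {'train': [0, 1, 2, 3, 4], 'test': [5, 6, 7]}
```

Each oracle split is exactly as large as all client splits combined, including clients that hold no data for that split.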
- get_partitioning_name() → str
Returns a unique name for the DataManager instance.
Note
This name helps store and retrieve the partitioning indices, so that experiments can be reproduced on the same machine.
- Returns
str -- a unique name for the DataManager instance.
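A sketch of how the identifiers from get_identifiers and the seed could be combined into a unique name, which then keys a file under save_dir holding the partition indices. The joining scheme, the JSON format, and the class and helper names (ToyManager, save_partition, load_partition) are assumptions for illustration, not the library's actual storage format.

```python
import json
import os
import tempfile
from typing import Dict, List, Sequence


class ToyManager:
    """Hypothetical manager; only the naming and save/load logic is illustrated."""

    def __init__(self, root: str, seed: int, save_dir: str, num_clients: int):
        self.root, self.seed, self.save_dir = root, seed, save_dir
        self.num_clients = num_clients

    def get_identifiers(self) -> Sequence[str]:
        # Hypothetical identifiers: every setting that changes the partition.
        return ("toydata", "iid", str(self.num_clients))

    def get_partitioning_name(self) -> str:
        # Identifiers plus seed, so different settings map to different names.
        return "_".join([*self.get_identifiers(), f"seed{self.seed}"])

    def _path(self) -> str:
        return os.path.join(self.save_dir, self.get_partitioning_name() + ".json")

    def save_partition(self, partition: Dict[str, List[List[int]]]) -> None:
        os.makedirs(self.save_dir, exist_ok=True)
        with open(self._path(), "w") as f:
            json.dump(partition, f)

    def load_partition(self) -> Dict[str, List[List[int]]]:
        with open(self._path()) as f:
            return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    dm = ToyManager("data", seed=0, save_dir=tmp, num_clients=2)
    print(dm.get_partitioning_name())  # toydata_iid_2_seed0
    dm.save_partition({"train": [[0, 2], [1, 3]]})
    restored = dm.load_partition()
print(restored)  # {'train': [[0, 2], [1, 3]]}
```

A second run with the same settings resolves to the same file, so the stored indices can be reloaded instead of re-partitioning.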
- make_datasets(root: str) → Tuple[object, object]
Makes and returns the local and global dataset objects. The created datasets do not need transforms, because they are recompiled on the fly with the transforms provided separately (see make_transforms).
- Parameters
root (str) -- directory to download and manipulate data.
- Raises
NotImplementedError -- this abstract method should be implemented by child classes
- Returns
Tuple[object, object] -- local and global dataset
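A sketch of what a child class's make_datasets might return, assuming a hypothetical in-memory data source: one raw pool is returned as the local dataset (to be partitioned among clients) and a held-out pool as the global dataset, both without any transform. How raw data is divided between the two is a choice each child class makes; this split is only an example.

```python
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (raw features, label)


def make_datasets(root: str) -> Tuple[List[Sample], List[Sample]]:
    # Hypothetical toy data; a real child class would download or read files under `root`.
    samples = [([float(i), float(i) ** 2], i % 2) for i in range(12)]
    local_dataset = samples[:10]   # pool later partitioned among clients
    global_dataset = samples[10:]  # pool kept for global use (e.g., evaluation)
    # No transforms are attached: they are supplied separately and applied on the fly.
    return local_dataset, global_dataset


local_ds, global_ds = make_datasets("data")
print(len(local_ds), len(global_ds))  # 10 2
print(global_ds[0])  # ([10.0, 100.0], 0)
```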
- make_transforms() → Tuple[object, object]
Makes and returns the dataset transformations for the local and global splits.
- Raises
NotImplementedError -- this abstract method should be implemented by child classes
- Returns
Tuple[Dict[str, Callable], Dict[str, Callable]] -- a tuple of two dictionaries: first the local transform mapping, second the global transform mapping.
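A sketch of the two transform mappings and of applying them at access time, which is why make_datasets can return untransformed data. The split names, the toy transforms, and the TransformedView wrapper are hypothetical; with torchvision the callables would typically be transforms.Compose objects.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def make_transforms() -> Tuple[Dict[str, Callable], Dict[str, Callable]]:
    def normalize(x: List[float]) -> List[float]:
        # Toy normalization: divide by the largest absolute value (1.0 if all zeros).
        m = max(abs(v) for v in x) or 1.0
        return [v / m for v in x]

    def augment(x: List[float]) -> List[float]:
        # Toy stand-in for an augmentation such as a flip, then normalize.
        return normalize(list(reversed(x)))

    local_transforms = {"train": augment, "test": normalize}
    global_transforms = {"test": normalize}
    return local_transforms, global_transforms


class TransformedView:
    """Hypothetical wrapper applying a transform to features each time an item is read."""

    def __init__(self, data: Sequence[Tuple[List[float], int]], transform: Callable):
        self.data, self.transform = data, transform

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, i: int) -> Tuple[List[float], int]:
        x, y = self.data[i]
        return self.transform(x), y


local_tf, global_tf = make_transforms()
raw = [([2.0, 4.0], 1)]
print(TransformedView(raw, local_tf["train"])[0])  # ([1.0, 0.5], 1)
print(TransformedView(raw, global_tf["test"])[0])  # ([0.5, 1.0], 1)
```

Because the raw data is never modified, the same pool can be served under different transforms per split.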
- partition_global_data(dataset: object) → Dict[str, Iterable[int]]
Partitions the global data indices into splits (e.g., train, test).
- Parameters
dataset (object) -- global dataset
- Returns
Dict[str, Iterable[int]] -- dictionary of {split: example indices of the global dataset}.
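A sketch of a seeded train/test split of global indices. It is written as a standalone function, and the 80/20 ratio, the split names, and the use of random.Random are assumptions; the point is that a fixed seed makes the split reproducible.

```python
import random
from typing import Dict, List, Sequence


def partition_global_data(
    dataset: Sequence, seed: int, test_fraction: float = 0.2
) -> Dict[str, List[int]]:
    indices = list(range(len(dataset)))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_test = round(len(dataset) * test_fraction)
    return {"train": sorted(indices[n_test:]), "test": sorted(indices[:n_test])}


splits = partition_global_data(list(range(10)), seed=42)
print(len(splits["train"]), len(splits["test"]))  # 8 2
```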
- partition_local_data(dataset: object) → Dict[str, Iterable[Iterable[int]]]
Partitions the local data indices into client-indexed iterables.
- Parameters
dataset (object) -- local dataset
- Raises
NotImplementedError -- this abstract method should be implemented by child classes
- Returns
Dict[str, Iterable[Iterable[int]]] -- dictionary of {split: client-indexed iterables of example indices}.
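A sketch of the simplest client partitioning, an IID split: shuffle the local indices with the seed and deal them round-robin into num_clients nearly equal shards. The function name iid_partition, the num_clients parameter, the single "train" split, and the IID rule itself are assumptions; a non-IID rule (e.g., Dirichlet label skew) would produce the same output structure.

```python
import random
from typing import Dict, List, Sequence


def iid_partition(
    dataset: Sequence, num_clients: int, seed: int, split: str = "train"
) -> Dict[str, List[List[int]]]:
    # Hypothetical IID rule; this sketch produces a single split.
    indices = list(range(len(dataset)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    # Deal shuffled indices round-robin: client k gets positions k, k + n, k + 2n, ...
    shards = [sorted(indices[k::num_clients]) for k in range(num_clients)]
    return {split: shards}


parts = iid_partition(list(range(10)), num_clients=3, seed=0)
print([len(s) for s in parts["train"]])  # [4, 3, 3]
```

Every index lands in exactly one shard, and shard sizes differ by at most one.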