A Basic Data Manager

class BasicDataManager(root='data', dataset='mnist', num_partitions=500, rule='iid', sample_balance=0.0, label_balance=1.0, local_test_portion=0.0, global_valid_portion=0.0, seed=10, save_dir='partitions')

A basic data manager for partitioning the data. Currently three partitioning rules are supported:

  • iid:

    Every client gets the same label distribution. sample_balance determines each client's sample quota, which is drawn from a log-normal distribution.

  • dir:

    A Dirichlet distribution whose concentration parameter is given by label_balance determines the label balance of each client. sample_balance determines each client's sample quota, which is drawn from a log-normal distribution.

  • exclusive:

    The samples of each label are randomly split among k clients, where k = total_sample_size * label_balance. sample_balance determines how this split is made (the quota). This rule is also known as "shards splitting".

  • root (str) -- root dir of the dataset to partition

  • dataset (str) -- name of the dataset

  • num_partitions (int) -- number of partitions or clients

  • rule (str) -- rule of partitioning

  • sample_balance (float) -- balance of number of samples among clients

  • label_balance (float) -- balance of the labels on each client

  • local_test_portion (float) -- portion of the train data used as the local test set

  • global_valid_portion (float) -- portion of the global samples used for the validation split. The remaining global samples go to the test split.

  • seed (int) -- random seed of partitioning

  • save_dir (str, optional) -- dir to save partitioned indices.
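
The interaction between sample_balance and label_balance under the dir rule can be illustrated with a minimal, standard-library sketch. This is not the library's implementation: the function names `lognormal_quotas` and `dirichlet_partition` are hypothetical, and it assumes that sample_balance acts as the sigma of the log-normal distribution (so 0 means equal quotas) and that label_balance is the Dirichlet concentration `alpha`.

```python
import random
from collections import defaultdict


def lognormal_quotas(num_samples, num_partitions, sigma, rng):
    """Per-client sample quotas that sum to num_samples.

    Assumption: sigma stands in for sample_balance; sigma == 0 gives
    every client the same quota (up to rounding).
    """
    if sigma == 0:
        weights = [1.0] * num_partitions
    else:
        weights = [rng.lognormvariate(0.0, sigma) for _ in range(num_partitions)]
    total = sum(weights)
    quotas = [int(num_samples * w / total) for w in weights]
    # Hand out the samples lost to flooring, one per client.
    for i in range(num_samples - sum(quotas)):
        quotas[i % num_partitions] += 1
    return quotas


def dirichlet_partition(labels, num_partitions, alpha, sigma, seed=10):
    """Return one list of sample indices per client (hypothetical helper)."""
    rng = random.Random(seed)
    # Shuffled pool of sample indices for every label.
    pools = defaultdict(list)
    for idx, label in enumerate(labels):
        pools[label].append(idx)
    for pool in pools.values():
        rng.shuffle(pool)
    classes = sorted(pools)

    partitions = []
    for quota in lognormal_quotas(len(labels), num_partitions, sigma, rng):
        # Unnormalised Dirichlet(alpha) draw: Gamma(alpha, 1) variates,
        # floored so the weights can never all be zero.
        weights = [max(rng.gammavariate(alpha, 1.0), 1e-12) for _ in classes]
        client = []
        for _ in range(quota):
            # Only labels that still have unassigned samples can be drawn.
            open_classes = [c for c in classes if pools[c]]
            open_weights = [w for c, w in zip(classes, weights) if pools[c]]
            label = rng.choices(open_classes, weights=open_weights)[0]
            client.append(pools[label].pop())
        partitions.append(client)
    return partitions


if __name__ == "__main__":
    toy_labels = [i % 10 for i in range(1000)]
    parts = dirichlet_partition(toy_labels, num_partitions=5, alpha=0.3, sigma=0.5)
    print([len(p) for p in parts])
```

In this sketch a small `alpha` leaves most clients dominated by a few labels, while a large `alpha` pushes every client toward the overall label distribution, which is the behaviour of the iid rule.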


Returns identifiers to be used for saving the partition info.


Sequence[str] -- a sequence of strings identifying the class instance


Makes and returns the local and global dataset objects. The created datasets do not need a transform, because they are recompiled on the fly with separately provided transforms.

  • dataset_name (str) -- name of the dataset.

  • root (str) -- directory to download and manipulate data.


Tuple[object, object] -- local and global dataset


Makes and returns the dataset transformations for the local and global splits.


Tuple[Dict[str, Callable], Dict[str, Callable]] -- tuple of two dictionaries: first the local transform mapping, then the global transform mapping.


Partitions global data indices into splits (e.g., train, test, ...).


dataset (object) -- global dataset


Dict[str, Iterable[int]] -- dictionary of {split:example indices of global dataset}.
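
As a rough illustration of the returned structure, the sketch below shuffles the global indices and gives a global_valid_portion share of them to a validation split and the rest to the test split. The function name `split_global_indices` and the split keys "valid" and "test" are assumptions made for this example, not names confirmed by the library.

```python
import random


def split_global_indices(num_samples, global_valid_portion, seed=10):
    """Map split name -> indices of the global dataset (hypothetical helper)."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    num_valid = int(num_samples * global_valid_portion)
    # Whatever is not used for validation goes to the test split.
    splits = {"test": indices[num_valid:]}
    if num_valid > 0:
        splits["valid"] = indices[:num_valid]
    return splits


if __name__ == "__main__":
    splits = split_global_indices(100, global_valid_portion=0.2)
    print({name: len(idx) for name, idx in splits.items()})
```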


Partitions local data indices into a client-indexed Iterable.


dataset (object) -- local dataset


Dict[str, Iterable[Iterable[int]]] -- dictionary of {split:client-indexed iterables of example indices}.
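
To make the nested return type concrete, here is a minimal sketch that takes already partitioned per-client indices and carves a local_test_portion share off each client into a local test split. The helper name `split_local_indices` and the split keys "train" and "test" are hypothetical and only serve the illustration.

```python
def split_local_indices(client_indices, local_test_portion):
    """Map split name -> client-indexed lists of indices (hypothetical helper)."""
    splits = {"train": [], "test": []}
    for indices in client_indices:
        num_test = int(len(indices) * local_test_portion)
        # The position in each outer list is the client id.
        splits["test"].append(list(indices[:num_test]))
        splits["train"].append(list(indices[num_test:]))
    return splits


if __name__ == "__main__":
    clients = [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13]]
    print(split_local_indices(clients, local_test_portion=0.25))
```

In this layout, `splits["train"][i]` holds the train indices of client `i`.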