A Basic Data Manager

class BasicDataManager(root='data', dataset='mnist', num_partitions=500, rule='iid', sample_balance=0.0, label_balance=1.0, local_test_portion=0.0, global_valid_portion=0.0, seed=10, save_dir='partitions')

A basic data manager for partitioning the data. Currently, three partitioning rules are supported:

  • iid:

    All clients have the same label distribution. sample_balance determines each client's sample quota, which is drawn from a lognormal distribution.

  • dir:

    A Dirichlet distribution with concentration parameter label_balance determines the label balance of each client. sample_balance determines each client's sample quota, which is drawn from a lognormal distribution.

  • exclusive:

    Samples of each label are randomly split among k clients, where k = total_sample_size * label_balance. sample_balance determines how this split happens (the quota). This rule is also known as "shards splitting".
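The iid and dir rules above can be sketched in plain Python as follows. This is a minimal sketch, not the library's implementation: it assumes sample_balance acts as the sigma of the lognormal distribution (so 0.0 gives equal quotas), and the helpers lognormal_quotas, dirichlet, partition_iid and partition_dir are hypothetical names. The exclusive rule is not shown.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def lognormal_quotas(num_samples: int, num_clients: int, sample_balance: float,
                     rng: random.Random) -> List[int]:
    """Split num_samples into per-client quotas with lognormal weights.

    Assumption: sample_balance is used as the lognormal sigma, so 0.0 yields equal quotas.
    """
    weights = [rng.lognormvariate(0.0, sample_balance) for _ in range(num_clients)]
    total = sum(weights)
    quotas = [int(num_samples * w / total) for w in weights]
    # Rounding down loses fewer than num_clients samples; hand them out one per client.
    for i in range(num_samples - sum(quotas)):
        quotas[i % num_clients] += 1
    return quotas


def dirichlet(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Draw a symmetric Dirichlet(alpha) sample of size k via normalized gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # guard against underflow for very small alpha
        return [1.0 / k] * k
    return [d / total for d in draws]


def partition_iid(labels: Sequence[int], num_clients: int, sample_balance: float,
                  seed: int) -> List[List[int]]:
    """Shuffle all indices, then cut them by quota (same label mix in expectation)."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    quotas = lognormal_quotas(len(indices), num_clients, sample_balance, rng)
    clients, start = [], 0
    for q in quotas:
        clients.append(indices[start:start + q])
        start += q
    return clients


def partition_dir(labels: Sequence[int], num_clients: int, label_balance: float,
                  sample_balance: float, seed: int) -> List[List[int]]:
    """Each client draws label proportions from Dirichlet(label_balance), then fills its quota."""
    rng = random.Random(seed)
    pools: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        pools[y].append(idx)
    classes = sorted(pools)
    for c in classes:
        rng.shuffle(pools[c])
    quotas = lognormal_quotas(len(labels), num_clients, sample_balance, rng)
    clients = []
    for q in quotas:
        props = dirichlet(label_balance, len(classes), rng)
        picked = []
        for _ in range(q):
            # Only labels that still have unassigned samples can be chosen.
            avail = [i for i, c in enumerate(classes) if pools[c]]
            if not avail:
                break
            weights = [props[i] for i in avail]
            if sum(weights) == 0.0:
                weights = [1.0] * len(avail)
            i = rng.choices(avail, weights=weights)[0]
            picked.append(pools[classes[i]].pop())
        clients.append(picked)
    return clients
```

With sample_balance=0.0 every client in this sketch receives the same number of samples, and smaller label_balance values make each client's label mix more skewed.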

Parameters
  • root (str) -- root dir of the dataset to partition

  • dataset (str) -- name of the dataset

  • num_partitions (int) -- number of partitions or clients

  • rule (str) -- rule of partitioning

  • sample_balance (float) -- balance of number of samples among clients

  • label_balance (float) -- balance of the labels on each client

  • local_test_portion (float) -- portion of the local train data held out as a local test set

  • global_valid_portion (float) -- portion of the global data assigned to the valid split. The remaining global samples go to the test split.

  • seed (int) -- random seed of partitioning

  • save_dir (str, optional) -- dir to save partitioned indices.

get_identifiers()

Returns identifiers to be used for saving the partition info.

Returns

Sequence[str] -- a sequence of str identifying the class instance
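To illustrate how such identifiers can be used, the sketch below builds a file name for saved partitions from constructor arguments, so that different settings do not overwrite each other's files. The functions make_identifiers and partition_file and the naming scheme are hypothetical; the identifiers actually returned by get_identifiers may differ.

```python
import os
from typing import List


def make_identifiers(dataset: str, num_partitions: int, rule: str,
                     sample_balance: float, label_balance: float, seed: int) -> List[str]:
    """Hypothetical: one identifier per argument that changes the resulting partition."""
    return [
        dataset,
        str(num_partitions),
        rule,
        f"sb{sample_balance}",
        f"lb{label_balance}",
        f"seed{seed}",
    ]


def partition_file(save_dir: str, identifiers: List[str], suffix: str = ".json") -> str:
    """Hypothetical: join the identifiers into a file name under save_dir."""
    return os.path.join(save_dir, "_".join(identifiers) + suffix)
```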

make_datasets(root)

Makes and returns local and global dataset objects. The created datasets do not need a transform, because the datasets are later recompiled on the fly with separately provided transforms.

Parameters
  • root (str) -- directory to download and manipulate data.

Returns

Tuple[object, object] -- local and global dataset
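Attaching transforms later can be sketched with a small wrapper that keeps the raw samples and applies a transform only when an item is read. TransformedView is a hypothetical name and not part of this library; it follows the map-style dataset protocol (__len__ and __getitem__) used by PyTorch, assuming each base item is an (input, label) pair.

```python
from typing import Any, Callable, Iterable, Optional, Sequence, Tuple


class TransformedView:
    """Hypothetical wrapper: a view over selected indices with an on-the-fly transform."""

    def __init__(self, base: Sequence[Tuple[Any, Any]], indices: Iterable[int],
                 transform: Optional[Callable[[Any], Any]] = None) -> None:
        self.base = base              # raw dataset, stored without any transform
        self.indices = list(indices)  # e.g. one client's partition or one global split
        self.transform = transform

    def __len__(self) -> int:
        return len(self.indices)

    def __getitem__(self, i: int) -> Tuple[Any, Any]:
        x, y = self.base[self.indices[i]]
        # The transform is applied at read time, so the same raw data can be reused
        # with different transforms for different splits.
        if self.transform is not None:
            x = self.transform(x)
        return x, y
```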

make_transforms()

Makes and returns the dataset transformations for the local and global splits.

Returns

Tuple[Dict[str, Callable], Dict[str, Callable]] -- a tuple of two dictionaries: first the local transform mapping, then the global transform mapping.

partition_global_data(dataset)

Partitions global data indices into splits (e.g., train, test, ...).

Parameters

dataset (object) -- global dataset

Returns

Dict[str, Iterable[int]] -- dictionary of {split:example indices of global dataset}.
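A minimal sketch of a global split driven by global_valid_portion, assuming the splits are named "valid" and "test" and that indices are shuffled with the given seed before splitting; split_global is a hypothetical helper, not the library's method.

```python
import random
from typing import Dict, List


def split_global(num_samples: int, global_valid_portion: float,
                 seed: int) -> Dict[str, List[int]]:
    """Hypothetical: the first portion of shuffled indices is valid, the rest is test."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    n_valid = int(num_samples * global_valid_portion)
    return {"valid": indices[:n_valid], "test": indices[n_valid:]}
```

With the default global_valid_portion=0.0, the valid split in this sketch is empty and every global sample goes to the test split.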

partition_local_data(dataset)

Partitions local data indices into client-indexed iterables.

Parameters

dataset (object) -- local dataset

Returns

Dict[str, Iterable[Iterable[int]]] -- dictionary of {split:client-indexed iterables of example indices}.
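Similarly, a local test split controlled by local_test_portion could be carved out of each client's indices as below, producing the {split: client-indexed iterables} shape described above. The function split_local and the "train"/"test" keys are assumptions made for illustration.

```python
import random
from typing import Dict, Iterable, List


def split_local(client_indices: Iterable[Iterable[int]], local_test_portion: float,
                seed: int) -> Dict[str, List[List[int]]]:
    """Hypothetical: hold out local_test_portion of each client's indices as local test data."""
    rng = random.Random(seed)
    train: List[List[int]] = []
    test: List[List[int]] = []
    for idxs in client_indices:
        idxs = list(idxs)
        rng.shuffle(idxs)
        n_test = int(len(idxs) * local_test_portion)
        test.append(idxs[:n_test])   # client i's local test indices
        train.append(idxs[n_test:])  # client i's remaining train indices
    return {"train": train, "test": test}
```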