Guide to data manager#
The simulator ships with a basic DataManager called BasicDataManager,
which for now supports the following datasets:
It also supports the popular partitioning schemes (IID, Dirichlet distribution, unbalanced, etc.).
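To make the partitioning schemes concrete, here is a minimal, standard-library-only sketch of an IID split and a label-based Dirichlet split over example indices. It is an illustration under stated assumptions, not BasicDataManager's actual code: the function names `iid_partition` and `dirichlet_partition`, and the `alpha` and `seed` parameters, are hypothetical. The Dirichlet proportions are sampled by normalizing gamma draws.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def iid_partition(num_examples: int, num_clients: int, seed: int = 0) -> List[List[int]]:
    """Shuffle all indices and deal them out to clients in near-equal shares."""
    rng = random.Random(seed)
    indices = list(range(num_examples))
    rng.shuffle(indices)
    return [indices[i::num_clients] for i in range(num_clients)]


def dirichlet_partition(
    labels: Sequence[int], num_clients: int, alpha: float, seed: int = 0
) -> List[List[int]]:
    """Split each class among clients with Dirichlet(alpha) proportions.

    Smaller alpha gives more skewed (less IID) label distributions per client.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients: List[List[int]] = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # A symmetric Dirichlet sample is a vector of normalized Gamma(alpha, 1) draws.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        start, cumulative = 0, 0.0
        for client_id, draw in enumerate(draws):
            cumulative += draw / total
            # The last client takes whatever remains, so no index is dropped.
            if client_id == num_clients - 1:
                end = len(idxs)
            else:
                end = int(cumulative * len(idxs))
            clients[client_id].extend(idxs[start:end])
            start = end
    return clients
```

Both functions return a client-indexed list of index lists, so every example index is assigned to exactly one client.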
Custom DataManager#
Any custom data manager class should inherit from fedsim.distributed.data_management.DataManager
(or one of its children) and implement its abstract methods.
DataManager Template#
from typing import Dict, Iterable, Sequence, Tuple

from fedsim.distributed.data_management import DataManager


class CustomDataManager(DataManager):
    def __init__(self, root, seed, save_dir=None, other_arg="default value"):
        # add any further constructor arguments your data manager needs
        self.other_arg = other_arg
        # apply the changes required by the abstract methods here (before
        # calling super's constructor).
        super(CustomDataManager, self).__init__(root, seed, save_dir=save_dir)
        # apply the operations that assume the abstract methods have already
        # run here (after calling super's constructor).

    def make_datasets(self, root: str) -> Tuple[object, object]:
        """makes and returns local and global dataset objects. The created
        datasets do not need a transform, since datasets are recompiled on the
        fly with the separately provided transforms (for vision datasets).

        Args:
            root (str): directory to download and manipulate data.

        Raises:
            NotImplementedError: this abstract method should be
                implemented by child classes

        Returns:
            Tuple[object, object]: local and global dataset
        """
        raise NotImplementedError

    def make_transforms(self) -> Tuple[object, object]:
        """makes and returns the dataset transformations for the local and
        global splits.

        Raises:
            NotImplementedError: this abstract method should be
                implemented by child classes

        Returns:
            Tuple[Dict[str, Callable], Dict[str, Callable]]: tuple of two
                dictionaries: first the local transform mapping, second the
                global transform mapping.
        """
        raise NotImplementedError

    def partition_local_data(
        self,
        dataset: object,
    ) -> Dict[str, Iterable[Iterable[int]]]:
        """partitions local data indices into splits and, within each split,
        into a client-indexed iterable. Returns a dictionary of these splits
        (e.g., train, test, ...).

        Args:
            dataset (object): local dataset

        Raises:
            NotImplementedError: this abstract method should be
                implemented by child classes

        Returns:
            Dict[str, Iterable[Iterable[int]]]:
                dictionary of {split: client-indexed iterables of example indices}.
        """
        raise NotImplementedError

    def partition_global_data(
        self,
        dataset: object,
    ) -> Dict[str, Iterable[int]]:
        """partitions global data indices into desired splits (e.g., train, test, ...).

        Args:
            dataset (object): global dataset

        Returns:
            Dict[str, Iterable[int]]:
                dictionary of {split: example indices of global dataset}.
        """
        raise NotImplementedError

    def get_identifiers(self) -> Sequence[str]:
        """returns identifiers to be used for saving the partition info.
        A unique identifier for each unique setup ensures that comparisons
        between your experiment results are credible.

        Raises:
            NotImplementedError: this abstract method should be
                implemented by child classes

        Returns:
            Sequence[str]: a sequence of str identifying the class instance
        """
        raise NotImplementedError
You can use BasicDataManager as a working template.
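As a rough illustration of the return shapes the template's docstrings describe, the standalone sketch below builds a local partition, a global partition, and a list of identifiers without importing fedsim. The function names, the split names "train" and "test", and the choice of identifier fields are all assumptions made for this example, not part of the library.

```python
from typing import Dict, List, Sequence


def local_partition_sketch(num_examples: int, num_clients: int) -> Dict[str, List[List[int]]]:
    """Round-robin assignment of local example indices to clients.

    The result maps a split name to a client-indexed list of index lists.
    """
    per_client = [list(range(c, num_examples, num_clients)) for c in range(num_clients)]
    return {"train": per_client}


def global_partition_sketch(num_examples: int) -> Dict[str, List[int]]:
    """Put every global example index into a single split."""
    return {"test": list(range(num_examples))}


def identifiers_sketch(dataset_name: str, num_clients: int, rule: str) -> Sequence[str]:
    """Collect the values that distinguish one partitioning setup from another."""
    return [dataset_name, str(num_clients), rule]
```

For instance, `local_partition_sketch(7, 3)` assigns indices 0, 3, 6 to the first client, 1, 4 to the second, and 2, 5 to the third.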
Integration with fedsim-cli#
To automatically include your custom data manager in the provided CLI tool, define it in a Python file and pass the file's path (without .py) to the -a or --data-manager option, followed by a colon and the name of the data-manager definition (class or method).
For example, if you have a data manager DataManager stored in foo/bar/my_custom_dm.py, you can pass --data-manager foo/bar/my_custom_dm:DataManager.
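The path:definition convention above can be pictured with the following sketch, which splits such a string and loads the named object from the file using the standard importlib machinery. This is not fedsim-cli's implementation; `split_spec` and `load_definition` are hypothetical helpers written only to show how a specification of this form could be resolved.

```python
import importlib.util
from pathlib import Path
from typing import Any, Tuple


def split_spec(spec: str) -> Tuple[str, str]:
    """Split '<path without .py>:<definition>' at the last colon."""
    path, sep, name = spec.rpartition(":")
    if not sep or not path or not name:
        raise ValueError(f"expected '<path>:<definition>', got {spec!r}")
    return path, name


def load_definition(spec: str) -> Any:
    """Import the Python file named in spec and return the requested attribute."""
    path, name = split_spec(spec)
    file_path = Path(path + ".py")
    module_spec = importlib.util.spec_from_file_location(file_path.stem, str(file_path))
    if module_spec is None or module_spec.loader is None:
        raise ImportError(f"cannot load a module from {file_path}")
    module = importlib.util.module_from_spec(module_spec)
    module_spec.loader.exec_module(module)  # runs the file's top-level code
    return getattr(module, name)
```

With the example from the text, `split_spec("foo/bar/my_custom_dm:DataManager")` yields the path `foo/bar/my_custom_dm` and the definition name `DataManager`.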
Note
Constructor arguments of any data manager can be given in arg:value
format after its name (or after its path, if a local file is provided). Examples:
fedsim-cli fed-learn --data-manager BasicDataManager num_clients:1100 ...
fedsim-cli fed-learn --data-manager foo/bar/my_custom_dm:DataManager arg1:value ...
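One way to picture the arg:value format is as a list of tokens turned into keyword arguments for the constructor. The sketch below is only an assumption about how such tokens could be interpreted: `parse_arg_values` is a hypothetical helper, and the source does not state how fedsim-cli converts values, so the use of `ast.literal_eval` with a fallback to plain strings is a guess for illustration.

```python
import ast
from typing import Any, Dict, Sequence


def parse_arg_values(tokens: Sequence[str]) -> Dict[str, Any]:
    """Turn ['name:value', ...] tokens into a keyword-argument dictionary."""
    kwargs: Dict[str, Any] = {}
    for token in tokens:
        key, sep, raw = token.partition(":")
        if not sep or not key:
            raise ValueError(f"expected 'arg:value', got {token!r}")
        try:
            # Interpret numbers, booleans, lists, etc. as Python literals.
            kwargs[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything that is not a literal is kept as a plain string.
            kwargs[key] = raw
    return kwargs
```

Under these assumptions, the tokens `num_clients:1100` and `arg1:value` would become `{"num_clients": 1100, "arg1": "value"}`.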