Dataloading utils #85
base: master
Conversation
Reformatted and added a method to split and directly create a vertical federated dataset.
Check out this pull request on ReviewNB: see visual diffs & provide feedback on Jupyter Notebooks.
Resolves #81
Updated split_data_create_vertical_dataset to match the current dataset classes (i.e. samplesetwithlabels).
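For context, a minimal sketch of what such a vertical split can look like; the function name, signature, and return shape below are illustrative assumptions, not the actual code from this PR:

```python
# Illustrative sketch only: the real split_data_create_vertical_dataset in this PR
# may differ in names and return types. Assumes a plain (features, labels) pair.
import torch

def split_vertically(features, labels, n_workers):
    """Split a feature tensor column-wise into n_workers partitions.

    Each worker receives (its columns, shared row index); the server receives
    (labels, shared row index). The shared index keeps rows aligned across parties.
    """
    index = torch.arange(features.shape[0])
    columns = torch.chunk(features, n_workers, dim=1)
    worker_datasets = [list(zip(part, index)) for part in columns]
    server_dataset = list(zip(labels, index))
    return worker_datasets, server_dataset

# Example: 8 samples with 6 features split across 3 workers.
X, y = torch.randn(8, 6), torch.randint(0, 2, (8,))
workers, server = split_vertically(X, y, n_workers=3)
```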
import datasets

"""I think this is not needed anymore"""
Do you mean we don't need the partitioned dataloader?
No, I mean that the default DataLoader in PyTorch works, so we do not need a custom one (the way it is done now). See the notebook for an example.
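For reference, the stock PyTorch loader works with any map-style dataset; the TensorDataset below is a placeholder, not one of the classes from this PR:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder map-style dataset; any dataset implementing __len__ and __getitem__
# works with the default DataLoader, including the ones introduced in this PR.
dataset = TensorDataset(torch.randn(100, 4), torch.randint(0, 2, (100,)))

loader = DataLoader(dataset, batch_size=16, shuffle=True)
for features, labels in loader:
    pass  # standard training loop would go here
```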
self.values = torch.Tensor(values) if is_labels else torch.stack(values)

self.worker_id = None
if worker_id != None:
that can simplify to if worker_id:
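A small sketch of the trade-off, assuming worker ids may be integers (which the diff does not rule out): the truthiness form is shorter but treats 0 as missing, while an explicit None check keeps the current behaviour.

```python
def resolve_worker_id(worker_id):
    # `if worker_id:` (the suggested form) and `if worker_id != None:` (the diff)
    # differ when worker_id is 0: a truthiness check skips 0, an explicit None
    # check does not. `is not None` is the idiomatic spelling of the original test.
    if worker_id is not None:
        return worker_id
    return None

assert resolve_worker_id(0) == 0        # a plain `if worker_id:` would drop this id
assert resolve_worker_id(None) is None
```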
fmt_str = "FederatedDataset\n"
fmt_str += f"    Distributed across: {', '.join(str(x) for x in self.workers)}\n"
fmt_str += f"    Number of datapoints: {self.__len__()}\n"
return fmt_str
newline at the end of the file
self.dataset = dataset  # It can also be None, and then it would be only computational
self.model = model

self.level = level if level >= 0 else 0  # it should start from zero, otherwise throw error  # TODO: implement error throwing
simplify to max(level, 0)
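Applied to the line above, the suggestion reads as follows; the class name in this sketch is a placeholder for whichever class the diff belongs to:

```python
class Node:  # placeholder name; stands in for the class under review
    def __init__(self, level=0):
        # Clamp negative levels to zero; the TODO in the diff suggests raising
        # an error for negative input instead of clamping silently.
        self.level = max(level, 0)

assert Node(-3).level == 0
assert Node(2).level == 2
```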
This code is meant to be used with dual-headed Neural Networks, where several different workers
each hold part of the data for the same set of samples, and a server holds the labels only.
Code built upon:
- Abbas Ismail's (@abbas5253) work on dual-headed NNs. In particular, check Configuration 1:
Does this PR require abbas' PR to be merged?
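For readers unfamiliar with the setup described in the docstring above, here is a minimal sketch of a dual-headed arrangement; all module names, feature splits, and shapes are illustrative assumptions, not code from this PR or from @abbas5253's branch:

```python
import torch
import torch.nn as nn

# Each worker holds a private "head" that maps its feature partition to an embedding.
worker_heads = [nn.Linear(2, 8), nn.Linear(4, 8)]   # two workers with 2 and 4 features

# The server, which holds only the labels, owns the part that consumes the embeddings.
server_model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))

x_parts = [torch.randn(32, 2), torch.randn(32, 4)]  # vertically partitioned batch
labels = torch.randint(0, 2, (32,))

# Forward pass: workers compute embeddings locally, the server concatenates them
# (rows aligned by the shared index) and computes the loss against its labels.
embeddings = [head(x) for head, x in zip(worker_heads, x_parts)]
logits = server_model(torch.cat(embeddings, dim=1))
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
```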
the third the index, which is to keep track of the same data point.
"""

if worker_list == None:
if worker_list:
or if worker_list is None:
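A sketch of how the index-carrying tuple and the worker_list default might fit together; the class name, constructor arguments, and the first two tuple elements are assumptions made for illustration (the docstring above only pins down that the third element is the row index):

```python
import torch
from torch.utils.data import Dataset

class VerticalPartitionDataset(Dataset):
    """Holds one party's partition of the data; __getitem__ also returns the
    row index so the same data point can be matched across workers."""

    def __init__(self, values, worker_id=None, worker_list=None):
        if worker_list is None:   # the check discussed in the thread above
            worker_list = []
        self.worker_list = worker_list
        self.worker_id = worker_id
        self.values = torch.stack([torch.as_tensor(v) for v in values])

    def __len__(self):
        return len(self.values)

    def __getitem__(self, idx):
        # The third element is the index, used to keep track of the same data point.
        return self.values[idx], self.worker_id, idx
```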
Description
Work-in-progress pull request for dataloading utils, dataloaders, and datasets.
Affected Dependencies
Currently uses PySyft 2.0. This is to be changed to either not use PySyft at all or, eventually, to use PySyft 3.0.
How has this been tested?
Manually; unit and integration tests are still to be added.
Checklist