PyVertical

We introduce PyVertical, a library for vertical federated learning using split neural networks (SplitNN) and Private Set Intersection (PSI). PyVertical is built on the Duet framework in PySyft, which allows a data scientist to run computations on a data owner's data without accessing the raw data. This repository contains a minimal example; more code will be released upon publication, as it is currently withheld to preserve anonymity.

How it works

We assume that each data subject in a data owner's dataset is associated with a unique ID. We use PSI to identify the IDs shared among all datasets. Each data owner then discards data whose ID is not shared by all parties and orders its dataset by ID, so that rows line up across data owners. No party can access another data owner's raw IDs, except for the IDs shared among all parties.
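The alignment step can be sketched as below. A plain set intersection stands in for PSI here, so this sketch is not private: a real PSI protocol lets the parties learn the intersection without revealing the rest of their IDs, whereas this code sees all IDs at once. The function name align_by_shared_ids and the dict-of-records layout are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def align_by_shared_ids(
    datasets: Sequence[Dict[str, object]],
) -> Tuple[List[str], List[List[object]]]:
    """Keep only records whose ID appears in every dataset, ordered by ID.

    NOTE: a plain set intersection is a stand-in for PSI and is NOT private;
    it assumes all IDs are visible in one process.
    """
    # IDs present in every data owner's dataset (what PSI would reveal)
    shared = set(datasets[0])
    for data in datasets[1:]:
        shared &= set(data)
    # Every owner sorts by ID so that row i refers to the same subject everywhere
    ordered_ids = sorted(shared)
    aligned = [[data[i] for i in ordered_ids] for data in datasets]
    return ordered_ids, aligned


if __name__ == "__main__":
    owner_a = {"id3": "a3", "id1": "a1", "id7": "a7"}
    owner_b = {"id1": "b1", "id9": "b9", "id3": "b3"}
    ids, (rows_a, rows_b) = align_by_shared_ids([owner_a, owner_b])
    print(ids)     # ['id1', 'id3']
    print(rows_a)  # ['a1', 'a3']
    print(rows_b)  # ['b1', 'b3']
```

Records that are not shared by every party ("id7", "id9") are dropped, and the remaining rows are in the same order for both owners.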

Each data owner controls a model segment (part of a SplitNN) that converts its raw data into an abstract, shareable representation. The representations from multiple data owners are combined (concatenated) and sent to the data scientist, who completes the forward pass on the combined data using a separate model segment that performs classification. The data scientist calculates the loss and updates their model segment. They then send the gradients back to the data owners, who update their model segments.
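The training loop described above can be sketched as follows. This is a minimal NumPy sketch, not PyVertical's implementation: the names OwnerSegment, ScientistSegment and split_train_step, the single-layer segments and the layer sizes are all assumptions, and everything runs in one process rather than over Duet. It shows the key SplitNN property: the data scientist only sees the concatenated representations, and each data owner only receives the gradient slice for its own representation.

```python
import numpy as np


class OwnerSegment:
    """Model segment held by a data owner: a linear layer followed by ReLU."""

    def __init__(self, n_in: int, n_out: int, rng: np.random.Generator) -> None:
        self.w = rng.normal(0.0, 0.01, (n_in, n_out))

    def forward(self, x: np.ndarray) -> np.ndarray:
        # Cache inputs and pre-activations for the backward pass
        self.x = x
        self.pre = x @ self.w
        return np.maximum(self.pre, 0.0)

    def backward(self, grad_out: np.ndarray, lr: float) -> None:
        # Backpropagate through ReLU, then take an SGD step on the weights
        grad_pre = grad_out * (self.pre > 0)
        self.w -= lr * self.x.T @ grad_pre


class ScientistSegment:
    """Model segment held by the data scientist: a linear layer + softmax."""

    def __init__(self, n_in: int, n_classes: int, rng: np.random.Generator) -> None:
        self.w = rng.normal(0.0, 0.01, (n_in, n_classes))

    def step(self, h: np.ndarray, labels: np.ndarray, lr: float):
        n = len(labels)
        logits = h @ self.w
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        loss = -np.log(probs[np.arange(n), labels]).mean()
        # Gradient of the mean cross-entropy w.r.t. the logits
        d_logits = probs
        d_logits[np.arange(n), labels] -= 1.0
        d_logits /= n
        d_h = d_logits @ self.w.T  # gradient for the owners (pre-update weights)
        self.w -= lr * h.T @ d_logits
        return loss, d_h


def split_train_step(owners, scientist, inputs, labels, lr=0.1):
    # 1. Each data owner turns its (aligned) raw data into a representation
    reps = [owner.forward(x) for owner, x in zip(owners, inputs)]
    # 2. The representations are concatenated and sent to the data scientist
    h = np.concatenate(reps, axis=1)
    # 3. The data scientist computes the loss and updates its own segment
    loss, d_h = scientist.step(h, labels, lr)
    # 4. The gradient is split column-wise; each owner gets only its own slice
    cuts = np.cumsum([r.shape[1] for r in reps])[:-1]
    for owner, grad in zip(owners, np.split(d_h, cuts, axis=1)):
        owner.backward(grad, lr)
    return float(loss)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical shapes: two owners, each with 392 features (half of 28x28)
    owners = [OwnerSegment(392, 64, rng), OwnerSegment(392, 64, rng)]
    scientist = ScientistSegment(128, 10, rng)
    x_left, x_right = rng.random((32, 392)), rng.random((32, 392))
    labels = rng.integers(0, 10, 32)  # labels are held by the data scientist
    for epoch in range(5):
        print(epoch, split_train_step(owners, scientist, [x_left, x_right], labels))
```

Because split_train_step slices the gradient by each representation's width, the same function works unchanged with more than two data owners.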

PySyft, upon which PyVertical is built, gives the data scientist access only to pointers to the raw data. The data scientist therefore controls the training process without being able to access the raw, private data.
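To make the pointer idea concrete, here is a toy, single-process illustration. It is not PySyft's API (DataOwner, Pointer, put, apply and reveal are hypothetical names), and it provides no real isolation: it only shows the pattern in which the data scientist holds references, requests computations that run on the owner's side, and receives raw values only with the owner's approval.

```python
from typing import Any, Callable, Dict


class DataOwner:
    """Toy data owner: stores private objects and runs operations on them."""

    def __init__(self) -> None:
        self._store: Dict[int, Any] = {}
        self._next_id = 0

    def put(self, value: Any) -> "Pointer":
        obj_id = self._next_id
        self._next_id += 1
        self._store[obj_id] = value
        return Pointer(self, obj_id)

    def run(self, fn: Callable[[Any], Any], obj_id: int) -> "Pointer":
        # The computation happens on the owner's side; only a new pointer leaves
        return self.put(fn(self._store[obj_id]))

    def reveal(self, obj_id: int, approved: bool) -> Any:
        # Raw values are released only if the owner explicitly approves
        if not approved:
            raise PermissionError("data owner did not approve this request")
        return self._store[obj_id]


class Pointer:
    """What the data scientist holds: a reference, not the data itself."""

    def __init__(self, owner: DataOwner, obj_id: int) -> None:
        self._owner = owner
        self.obj_id = obj_id

    def apply(self, fn: Callable[[Any], Any]) -> "Pointer":
        return self._owner.run(fn, self.obj_id)

    def __repr__(self) -> str:
        return f"<Pointer to object {self.obj_id}>"


if __name__ == "__main__":
    owner = DataOwner()
    ptr = owner.put([3, 1, 4, 1, 5])  # private data stays with the owner
    total_ptr = ptr.apply(sum)        # computed owner-side; result is a pointer
    print(total_ptr)                  # <Pointer to object 1>
    print(owner.reveal(total_ptr.obj_id, approved=True))  # 14
```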

This demo uses only two input heads. However, PyVertical is extensible and could facilitate massively multi-party vertical federated learning.

Requirements

Important: The PSI library does not work on Windows. Model training works on macOS, Windows and Ubuntu (other Linux distributions have not been tested).
Run pip install -r requirements.txt to install the Python libraries necessary for this demo.

Experiment

We demonstrate PyVertical on a vertically partitioned version of the MNIST dataset. We provide a utility function that splits each MNIST image into a left half and a right half. One data owner holds the left halves and another holds the right halves. We assume that the data scientist has access to the data labels. Note that, to reduce communication bandwidth, the models are trained on a subset of MNIST rather than on all images.
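A utility of this kind can be sketched with NumPy slicing. The name split_left_right is hypothetical (the repository's own utility may differ), and random arrays stand in for MNIST images so that the sketch is self-contained.

```python
import numpy as np


def split_left_right(images: np.ndarray):
    """Split a batch of images of shape (N, H, W) into left and right halves."""
    if images.ndim != 3:
        raise ValueError("expected an array of shape (N, H, W)")
    mid = images.shape[2] // 2
    # Columns [0, mid) go to one data owner, [mid, W) to the other
    return images[:, :, :mid], images[:, :, mid:]


if __name__ == "__main__":
    # Stand-in for MNIST: 5 random 28x28 "images"
    images = np.random.default_rng(0).random((5, 28, 28))
    left, right = split_left_right(images)
    print(left.shape, right.shape)        # (5, 28, 14) (5, 28, 14)
    # Flattened, each owner holds 14 * 28 = 392 features per image
    print(left.reshape(len(left), -1).shape)  # (5, 392)
```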

To demonstrate the method cleanly, we separate data linkage and federated learning into different notebooks; in practice, both would be done in a single notebook or script. The demo is in the examples folder.

License

Code is released under the Apache 2.0 license.
