Dynamic feature selection based on conditional mutual information


q8888620002/dynamic-selection

 
 


Greedy dynamic feature selection (GDFS)

GDFS is a method for dynamically selecting features based on conditional mutual information. It was introduced in the paper "Learning to Maximize Mutual Information for Dynamic Feature Selection" (ICML 2023).

In the dynamic feature selection (DFS) problem, each example is handled at test time as follows: you begin with no features, select features one at a time until a pre-specified budget is reached, and then make a prediction from the features acquired. The problem can be addressed in many different ways; GDFS greedily selects, at each step, the feature with the highest conditional mutual information (CMI) with the response variable given the features already observed. CMI is difficult to calculate, so GDFS approximates these selections using a custom training approach.
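The test-time procedure described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the package's API: `dynamic_select_and_predict`, `toy_policy`, and `toy_predictor` are hypothetical names, and the toy policy returns fixed scores where GDFS would use a learned network.

```python
from typing import Callable, List, Sequence, Tuple


def dynamic_select_and_predict(
    x: Sequence[float],
    policy: Callable[[List[float], List[int]], List[float]],
    predictor: Callable[[List[float], List[int]], int],
    budget: int,
) -> Tuple[int, List[int]]:
    """Hypothetical test-time loop for a single example (not the package API)."""
    d = len(x)
    mask = [0] * d  # 1 = feature already acquired; start with no features
    order: List[int] = []
    for _ in range(min(budget, d)):
        masked_x = [xi * mi for xi, mi in zip(x, mask)]
        scores = policy(masked_x, mask)
        # Only features that have not been acquired yet are candidates.
        candidates = [i for i in range(d) if mask[i] == 0]
        best = max(candidates, key=lambda i: scores[i])  # greedy (argmax) choice
        mask[best] = 1
        order.append(best)
    masked_x = [xi * mi for xi, mi in zip(x, mask)]
    return predictor(masked_x, mask), order


# Toy stand-ins for the learned networks (assumptions for illustration only).
def toy_policy(masked_x: List[float], mask: List[int]) -> List[float]:
    return [3.0, 1.0, 2.0, 0.5]  # fixed preference scores per feature


def toy_predictor(masked_x: List[float], mask: List[int]) -> int:
    return int(sum(masked_x) > 0)


prediction, order = dynamic_select_and_predict(
    [1.0, -5.0, 2.0, 4.0], toy_policy, toy_predictor, budget=2
)
```

With a budget of 2, the loop acquires two features and predicts from the masked input alone; unobserved features are zeroed out and the mask is passed alongside so a model could tell "missing" apart from "zero".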

Installation

You can get started by cloning the repository, and then pip installing the package in your Python environment as follows:

pip install .

Usage

GDFS involves learning two separate networks: one responsible for making predictions (the predictor) and one responsible for making selections (the policy). Both networks receive a subset of features as their input, and the policy outputs probabilities for selecting each feature. During training, we sample a random feature using the Concrete distribution, but at test time we simply use the argmax.
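The difference between training-time sampling and test-time selection can be illustrated with a minimal sketch of the Concrete distribution (also known as Gumbel-softmax). It assumes the policy outputs one logit per feature; the function names and the temperature value are hypothetical and this is not the package's implementation.

```python
import math
import random
from typing import List


def sample_concrete(logits: List[float], temperature: float, rng: random.Random) -> List[float]:
    """Draw a relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    perturbed = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel = -math.log(-math.log(u))
        perturbed.append((logit + gumbel) / temperature)
    # Numerically stable softmax.
    m = max(perturbed)
    exps = [math.exp(p - m) for p in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


def select_argmax(logits: List[float]) -> int:
    """Test-time selection: pick the highest-scoring feature deterministically."""
    return max(range(len(logits)), key=lambda i: logits[i])


rng = random.Random(0)
policy_logits = [0.1, 2.0, -1.0]  # hypothetical policy output for three features
soft_selection = sample_concrete(policy_logits, temperature=0.5, rng=rng)
hard_selection = select_argmax(policy_logits)
```

The soft sample is a probability vector that becomes closer to one-hot as the temperature decreases, which keeps the selection differentiable during training; at test time no noise is added and the argmax is used directly.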

[Figure: diagram of the GDFS training approach]

For usage examples, see the following:

  • Spam: a notebook showing an example with the UCI spam detection dataset.
  • MNIST: a notebook example with MNIST (digit recognition).
  • MNIST-Grouped: shows how to use feature grouping, which is necessary for some datasets (e.g., when using one-hot encoded categorical features).
  • Experiments: the experiments directory contains code to reproduce the experiments from the original paper.
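The feature grouping mentioned in the MNIST-Grouped entry can be pictured as selecting at the level of groups and then expanding the group mask to individual columns, so that all columns of a one-hot encoded variable are acquired together. The sketch below is an assumption about how such a mapping might look, not the package's API; `expand_group_mask` and the column layout are hypothetical.

```python
from typing import List


def expand_group_mask(group_mask: List[int], groups: List[List[int]]) -> List[int]:
    """Expand a mask over groups into a mask over individual feature columns.

    groups[g] lists the column indices that belong to group g.
    """
    num_features = sum(len(cols) for cols in groups)
    feature_mask = [0] * num_features
    for g, cols in enumerate(groups):
        for c in cols:
            feature_mask[c] = group_mask[g]
    return feature_mask


# Hypothetical layout: columns 0-2 one-hot encode a 3-level categorical,
# column 3 is a numeric feature, columns 4-5 one-hot encode a binary variable.
groups = [[0, 1, 2], [3], [4, 5]]
feature_mask = expand_group_mask([1, 0, 1], groups)
```

Here the selection budget counts groups rather than columns: choosing the first and third groups reveals five columns but only two underlying variables.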

Authors

References

Ian Covert, Wei Qiu, Mingyu Lu, Nayoon Kim, Nathan White, Su-In Lee. "Learning to Maximize Mutual Information for Dynamic Feature Selection." ICML, 2023.
