Commit

Move config inside of package such that it is accessible as a default
giuliabaldini committed Jul 23, 2024
1 parent 46be4c8 commit cba38d0
Showing 6 changed files with 6 additions and 12 deletions.
2 changes: 0 additions & 2 deletions .github/workflows/gitlab.yml
@@ -6,13 +6,11 @@ on:
     paths:
       - "exact_kmeans/**"
       - "tests/**"
-      - "config/**"
       - "poetry.lock"
   pull_request:
     paths:
       - "exact_kmeans/**"
       - "tests/**"
-      - "config/**"
       - "poetry.lock"
   # Allows you to run this workflow manually from the Actions tab
   workflow_dispatch:
2 changes: 0 additions & 2 deletions .github/workflows/mypy-flake-test.yml
@@ -7,13 +7,11 @@ on:
     paths:
       - "exact_kmeans/**"
       - "tests/**"
-      - "config/**"
       - "poetry.lock"
   pull_request:
     paths:
       - "exact_kmeans/**"
       - "tests/**"
-      - "config/**"
       - "poetry.lock"
   # Allows you to run this workflow manually from the Actions tab
   workflow_dispatch:
4 changes: 2 additions & 2 deletions README.md
@@ -25,7 +25,7 @@ ILP#2 gets cluster sizes $c_1,\ldots, c_i$ with $2\leq i\leq k$ as input. If $c_

 To search for the cluster sizes of an optimal solution we use a branch and bound approach. In a branch node of level $i$ the $i$ largest cluster sizes $c_1, \ldots, c_i$ are already fixed. The variables `branching_levels` and `fill_cluster_sizes` define the behavior on these branching nodes. If `branching_levels` is greater equal to the level $i$ of the node then we use ILP#2 to bound the current cost and decide if we branch, otherwise we always branch on this node. If the variable `fill_cluster_sizes` is set to true we compute the smallest possible remaining cluster sizes $c_{i+1},\ldots, c_{k}$ and run ILP#2 with cluster sizes $c_1,\ldots, c_k$. If the variable `fill_cluster_sizes` is set to false we run ILP#2 only with the fixed cluster sizes $c_1,\ldots, c_i$. Setting `fill_cluster_sizes` to true may lead to less branching but can increase the solving time of ILP#2.

-To customize the runs, you can create a config file. The default config file is [`config/default.yaml`](config/default.yaml). You can also pass a different config file as an argument.
+To customize the runs, you can create a config file. The default config file is [`exact_kmeans/config/default.yaml`](exact_kmeans/config/default.yaml). You can also pass a different config file as an argument.
 - `num_processes` (integer or float) sets the number of processes used. The algorithm was parallelized using the `multiprocessing` package, so you can set the number of processes that you want to use. If you use an integer, at most that number of processes will be taken, otherwise if you use a float, it will be a fraction of the available CPUs. If the parameter is not passed, the algorithm will use all available CPUs.
 - `bound_model_params` are the arguments that are passed to the ILP#1 model. Please have a look at the [Gurobi documentation](https://www.gurobi.com/documentation/9.1/refman/parameters.html) for more information.
 - `model_params` are the arguments that are passed to the ILP#2 model. Please have a look at the [Gurobi documentation](https://www.gurobi.com/documentation/9.1/refman/parameters.html) for more information.
@@ -90,7 +90,7 @@ poetry install

 Run the program
 ```bash
-poetry run python -m exact_kmeans --data-path iris.csv --verbose --results-path test/iris.json --k 3 --config-file config/default.yaml
+poetry run python -m exact_kmeans --data-path iris.csv --verbose --results-path test/iris.json --k 3 --config-file exact_kmeans/config/default.yaml
 ```

 Your `data-path` should be a file with a header containing only the comma-separated data points. Example:
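Note on the README hunk above: the `num_processes` rule it describes (integer = upper bound, float = fraction of the available CPUs, unset = all CPUs) can be sketched as below. This is only an illustration of the documented behavior, not the package's code: the helper name `resolve_num_processes`, the `available_cpus` parameter, and the rounding and clamping choices are all assumptions.

```python
import os
from typing import Optional, Union


def resolve_num_processes(
    num_processes: Optional[Union[int, float]] = None,
    available_cpus: Optional[int] = None,
) -> int:
    """Hypothetical helper: map the config value to a process count."""
    # Fall back to the machine's CPU count when none is supplied.
    cpus = available_cpus if available_cpus is not None else (os.cpu_count() or 1)

    # Parameter not passed: use all available CPUs.
    if num_processes is None:
        return cpus

    # Integer: use at most that many processes (assumed clamp to [1, cpus]).
    if isinstance(num_processes, int):
        return max(1, min(num_processes, cpus))

    # Float: a fraction of the available CPUs (assumed: round down, at least 1).
    return max(1, min(cpus, int(num_processes * cpus)))
```

For instance, on a hypothetical 8-CPU machine, `resolve_num_processes(0.5, available_cpus=8)` yields 4 under these assumptions.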
4 changes: 3 additions & 1 deletion exact_kmeans/__main__.py
@@ -39,7 +39,9 @@ def set_up_logger(log_file: Path, mode: str = "w+") -> None:

 parser.add_argument("--k", type=int, required=True)
 parser.add_argument("--data-path", type=Path, required=True)
-parser.add_argument("--config-file", type=Path, default="config/default.yaml")
+parser.add_argument(
+    "--config-file", type=Path, default="exact_kmeans/config/default.yaml"
+)
 parser.add_argument("--kmeans-iterations", type=int, default=100)
 parser.add_argument("--results-path", type=Path, default=None)
 parser.add_argument("--load-existing-run-path", type=Path, default=None)
config/default.yaml → exact_kmeans/config/default.yaml
File renamed without changes.
6 changes: 1 addition & 5 deletions exact_kmeans/ilp.py
@@ -51,9 +51,7 @@ def __init__(
 self,
 X: Union[np.ndarray, pd.DataFrame],
 k: int,
-config_file: Union[str, Path] = Path(
-    os.path.split(__file__)[0]
-).parent.resolve()
+config_file: Union[str, Path] = Path(__file__).parent.resolve()
 / "config"
 / "default.yaml",
 cache_current_run_path: Optional[Path] = None,
@@ -69,8 +67,6 @@ def __init__(
 self.k = k
 self.n = len(X)

-# self.dmax = compute_largest_distance(X)
-
 self._v = 1
 self._n = self.n + self._v
 self._k = self.k + self._v
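Note on the `ilp.py` hunk above: the effect of the change on the default path can be sketched as follows. The old expression pointed one directory above the package, while the new one stays inside it. The function names and the `/repo/...` path are hypothetical, and `PurePosixPath` without `.resolve()` is used only to keep the illustration independent of the local filesystem.

```python
from pathlib import PurePosixPath


def old_default_config(module_file: str) -> PurePosixPath:
    # Before the commit: Path(os.path.split(__file__)[0]).parent is the
    # directory that contains the package, i.e. the repository root.
    return PurePosixPath(module_file).parent.parent / "config" / "default.yaml"


def new_default_config(module_file: str) -> PurePosixPath:
    # After the commit: Path(__file__).parent is the package directory itself,
    # so the default config is looked up inside the package.
    return PurePosixPath(module_file).parent / "config" / "default.yaml"


# Hypothetical location of ilp.py, for illustration only.
module = "/repo/exact_kmeans/ilp.py"
print(old_default_config(module))
print(new_default_config(module))
```

Unlike this package-relative default, the `--config-file` default in `__main__.py` is a plain relative path (`exact_kmeans/config/default.yaml`), so it is interpreted relative to the directory the command is run from.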
