Documentation updates

1. Doc updates
2. Vignette updates
mingdeyu · Dec 12, 2024 · 903bd33
1 parent b4fdd62

Showing 74 changed files with 775 additions and 1,295 deletions.
diff --git a/.Rbuildignore b/.Rbuildignore
@@ -11,4 +11,9 @@
 ^RESEARCH-NOTICE\.md$
 ^vignettes/images
 ^vignettes/motorcycle.Rmd$
+^vignettes/classification.Rmd$
+^vignettes/large_scale_emulation.Rmd$
+^vignettes/linked_DGP.Rmd$
+^vignettes/seq_design.Rmd$
+^vignettes/seq_design_2.Rmd$
 ^LICENSE\.md$
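
Editorial sketch (not part of the commit): each `.Rbuildignore` line is treated by R as a Perl-like regular expression matched case-insensitively against file paths relative to the package root. The snippet below mimics that matching with `grepl()` for the five patterns added above. The path `vignettes/example.Rmd` is a hypothetical file used only for illustration, and this is an approximation of the build tool's behaviour, not its actual implementation.

```r
# Patterns added to .Rbuildignore in this commit
patterns <- c(
  "^vignettes/classification.Rmd$",
  "^vignettes/large_scale_emulation.Rmd$",
  "^vignettes/linked_DGP.Rmd$",
  "^vignettes/seq_design.Rmd$",
  "^vignettes/seq_design_2.Rmd$"
)

# Candidate paths, relative to the package root.
# "vignettes/example.Rmd" is hypothetical.
paths <- c(
  "vignettes/classification.Rmd",
  "vignettes/seq_design_2.Rmd",
  "vignettes/example.Rmd",
  "R/alm.R"
)

# A path is excluded from the build if any pattern matches it
is_ignored <- vapply(paths, function(p) {
  any(vapply(patterns, function(re) {
    grepl(re, p, perl = TRUE, ignore.case = TRUE)
  }, logical(1)))
}, logical(1))

print(is_ignored)
```

Only the first two paths match, so under these assumptions the listed vignettes would be left out of the built package while the rest of the source is kept.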
diff --git a/NAMESPACE b/NAMESPACE
@@ -29,9 +29,6 @@ S3method(validate,lgp)
 S3method(vigf,bundle)
 S3method(vigf,dgp)
 S3method(vigf,gp)
-export(Hetero)
-export(NegBin)
-export(Poisson)
 export(alm)
 export(combine)
 export(continue)
@@ -42,7 +39,6 @@ export(draw)
 export(get_thread_num)
 export(gp)
 export(init_py)
-export(kernel)
 export(lgp)
 export(mice)
 export(nllik)

diff --git a/NEWS.md b/NEWS.md
@@ -15,8 +15,8 @@
 - The `plot()` function has been updated to generate validation plots for DGP classifiers (i.e., DGP emulators with categorical likelihoods) and linked emulators created by `lgp()` using the new data frame form for `struc`.  
 - The `summary()` function has been redesigned to provide both summary tables and visualizations of structure and model specifications for (D)GP and linked (D)GP emulators.  
 - A `sample_size` argument has been added to the `validate()` and `plot()` functions, allowing users to adjust the number of samples used for validation when the validation method is set to `sampling`.
-- The following functions are deprecated as of this version and will be removed in the next release: `combine()`, `set_linked_idx()`, `kernel()`, `Poisson()`, `Hetero()`, and `NegBin()`. These functions are no longer maintained. Please refer to the updated package documentation for alternative workflows.
-- The basic node functions `kernel()`, `Hetero()`, `Poisson()`, and `NegBin()`, along with the `struc` argument in the `gp()` and `dgp()` functions, have been deprecated as of this version and will be removed in the next release. Customization of (D)GP specifications can be achieved by modifying the other arguments in `gp()` and `dgp()`.
+- `combine()` and `set_linked_idx()` are deprecated as of this version and will be removed in the next release. These two functions are no longer maintained. Please refer to the updated package documentation for alternative workflows.
+- The basic node functions `kernel()`, `Hetero()`, `Poisson()`, and `NegBin()`, along with the `struc` argument in the `gp()` and `dgp()` functions, have been removed as of this version. Customization of (D)GP specifications can be achieved by modifying the other arguments in `gp()` and `dgp()`.
 - The `draw()` function has been updated for instances of the `bundle` class to allow drawing of design and evaluation plots of all emulators in a single figure.  
 - The `plot()` function has been updated for linked emulators generated by `lgp()` using the new data frame form for `struc`.  
 - The `design()` function has been redesigned to allow new specifications of the user-supplied `method` function.  
@@ -28,6 +28,8 @@
 - The `write()` function now allows `light = TRUE` for both GP emulators and bundles of GP emulators.  
 - Two new functions, `serialize()` and `deserialize()`, have been added to allow users to export emulators to multi-session workers for parallel processing.
 - Additional vignettes are available, showcasing large-scale DGP emulation and DGP classification.
+- Enhanced clarity and consistency across the documentation.
+- Improved examples and explanations in vignettes for better user guidance.
 
 # dgpsi 2.4.0
 - One can now use `design()` to implement sequential designs using `f` and a fixed candidate set passed to `x_cand` with `y_cand = NULL`.

diff --git a/R/alm.R b/R/alm.R
@@ -7,10 +7,10 @@
 #' * the S3 class `gp`.
 #' * the S3 class `dgp`.
 #' * the S3 class `bundle`.
-#' @param x_cand a matrix (with each row containing a design point and column representing an input dimension) that gives a candidate set
-#'     from which the next design point(s) are determined. If `object` is an instance of the `bundle` class and `aggregate` is not supplied, `x_cand` could also
-#'     be a list with length equal to the number of emulators contained in `object`. In this case, each slot in `x_cand` should be a candidate set matrix
-#'     for each emulator included in the bundle. Defaults to `NULL`.
+#' @param x_cand a matrix (with each row being a design point and column being an input dimension) that gives a candidate set
+#'     from which the next design point(s) are determined. If `object` is an instance of the `bundle` class and `aggregate` is not supplied, `x_cand` can also be a list.
+#'     The list must have a length equal to the number of emulators in `object`, with each element being a matrix representing the candidate set for a corresponding
+#'     emulator in the bundle. Defaults to `NULL`.
 #' @param n_start an integer that gives the number of initial design points to be used to determine next design point(s). This argument
 #'     is only used when `x_cand` is `NULL`. Defaults to `20`.
 #' @param batch_size an integer that gives the number of design points to be chosen. Defaults to `1`.
@@ -33,37 +33,40 @@
 #'   of the matrix is equal to:
 #'   - the emulator output dimension if `object` is an instance of the `dgp` class; or
 #'   - the number of emulators contained in `object` if `object` is an instance of the `bundle` class.
-#' * the output should be a vector that aggregates scores across outputs or emulators at different design points.
+#' * the output should be a vector that gives aggregate scores at different design points.
 #'
-#' Set to `NULL` to disable the aggregation. Defaults to `NULL`.
+#' Set to `NULL` to disable aggregation. Defaults to `NULL`.
 #' @param ... any arguments (with names different from those of arguments used in [alm()]) that are used by `aggregate`
 #'     can be passed here.
 #'
 #' @return
-#' 1. If `x_cand` is not `NULL` and:
-#'    - `object` is an instance of the `gp` class, a vector is returned with length equal to `batch_size`, giving the positions (i.e., row numbers)
-#'      of next design points from `x_cand`.
-#'    - `object` is an instance of the `dgp` class, a vector is returned with length equal to `batch_size * D`, giving positions (i.e., row numbers)
-#'      of next design points from `x_cand` to be added to the DGP emulator. `D` equals to the number of output dimensions of the DGP
-#'      emulator if there is no likelihood layer in the hierarchy. If `object` is a DGP emulator with either `Hetero` or `NegBin` likelihood layer,
-#'      `D = 2`. If `object` is a DGP emulator with a `Categorical` likelihood layer, `D` equals to one (for binary output) or `K` (for multi-class output with `K` classes).
-#'    - `object` is an instance of the `bundle` class, a matrix is returned with row number equal to `batch_size` and column number equal to the number of
-#'      emulators in the bundle, giving positions (i.e., row numbers) of next design points from `x_cand` to be added to individual emulators.
-#' 2. If `x_cand = NULL` and:
-#'    - `object` is an instance of the `gp` class, a matrix is returned with row number equal to `batch_size`, giving the next design points to be evaluated.
-#'    - `object` is an instance of the `dgp` class, a matrix is returned with row number equal to `batch_size * D` where `D` is the number of output dimensions of the DGP
-#'      emulator if no likelihood layer is included. If `object` is a DGP emulator with either `Hetero` or `NegBin` likelihood layer, `D = 2`. If `object` is a DGP emulator
-#'      with a `Categorical` likelihood layer, `D` equals to one (for binary output) or `K` (for multi-class output with `K` classes).
-#'    - `object` is an instance of the `bundle` class, a list is returned with the length equal to the number of
-#'      emulators in the bundle. Each element in the list is a matrix with row number equal to `batch_size`, giving next design points to be added to individual emulators.
+#' 1. If `x_cand` is not `NULL`:
+#'    - When `object` is an instance of the `gp` class, a vector of length `batch_size` is returned, containing the positions
+#'      (row numbers) of the next design points from `x_cand`.
+#'    - When `object` is an instance of the `dgp` class, a vector of length `batch_size * D` is returned, containing the positions
+#'      (row numbers) of the next design points from `x_cand` to be added to the DGP emulator.
+#'      * `D` is the number of output dimensions of the DGP emulator if no likelihood layer is included.
+#'      * For a DGP emulator with a `Hetero` or `NegBin` likelihood layer, `D = 2`.
+#'      * For a DGP emulator with a `Categorical` likelihood layer, `D = 1` for binary output or `D = K` for multi-class output with `K` classes.
+#'    - When `object` is an instance of the `bundle` class, a matrix is returned with `batch_size` rows and a column for each emulator in
+#'      the bundle, containing the positions (row numbers) of the next design points from `x_cand` for individual emulators.
+#' 2. If `x_cand` is `NULL`:
+#'    - When `object` is an instance of the `gp` class, a matrix with `batch_size` rows is returned, giving the next design points to be evaluated.
+#'    - When `object` is an instance of the `dgp` class, a matrix with `batch_size * D` rows is returned, where:
+#'      - `D` is the number of output dimensions of the DGP emulator if no likelihood layer is included.
+#'      - For a DGP emulator with a `Hetero` or `NegBin` likelihood layer, `D = 2`.
+#'      - For a DGP emulator with a `Categorical` likelihood layer, `D = 1` for binary output or `D = K` for multi-class output with `K` classes.
+#'    - When `object` is an instance of the `bundle` class, a list is returned with a length equal to the number of emulators in the bundle. Each
+#'      element of the list is a matrix with `batch_size` rows, where each row represents a design point to be added to the corresponding emulator.
 #'
 #' @note
-#' The column order of the first argument of `aggregate` must be consistent with the order of emulator output dimensions (if `object` is an instance of the
-#'     `dgp` class), or the order of emulators placed in `object` if `object` is an instance of the `bundle` class.
+#' The first column of the matrix supplied to the first argument of `aggregate` must correspond to the first output dimension of the DGP emulator
+#'     if `object` is an instance of the `dgp` class, and so on for subsequent columns and dimensions. If `object` is an instance of the `bundle` class,
+#'     the first column must correspond to the first emulator in the bundle, and so on for subsequent columns and emulators.
 #' @references
 #' MacKay, D. J. (1992). Information-based objective functions for active data selection. *Neural Computation*, **4(4)**, 590-604.
 #'
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
 #' @examples
 #' \dontrun{
 #'

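Editorial sketch (not part of the commit): the `aggregate` argument documented above takes a matrix of scores, with one row per design point and one column per output dimension or emulator, and returns a vector with one aggregate score per design point. A minimal function with that shape might look like the following. The name `weighted_aggregate`, its `weights` argument, and the score values are illustrative assumptions, not part of dgpsi.

```r
# Hypothetical aggregation function: weighted sum of scores across columns.
# `scores`: matrix, rows = candidate design points, columns = outputs/emulators.
# `weights`: optional numeric vector, one weight per column (equal by default).
weighted_aggregate <- function(scores, weights = NULL) {
  scores <- as.matrix(scores)
  if (is.null(weights)) {
    weights <- rep(1 / ncol(scores), ncol(scores))
  }
  stopifnot(length(weights) == ncol(scores))
  # One aggregate score per row (design point)
  as.vector(scores %*% weights)
}

# Made-up scores for 3 candidate points and 2 outputs
scores <- matrix(c(0.1, 0.4,
                   0.3, 0.2,
                   0.5, 0.9), nrow = 3, byrow = TRUE)

agg <- weighted_aggregate(scores, weights = c(0.25, 0.75))
best <- which.max(agg)  # row with the largest aggregate score
```

Because the documentation states that extra arguments in `...` are forwarded to `aggregate`, a function like this could presumably be supplied as `aggregate = weighted_aggregate` together with `weights = c(0.25, 0.75)`; that call is not run here because it requires a fitted emulator.
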
diff --git a/R/design.R b/R/design.R
@@ -51,9 +51,9 @@
 #' * if `object` is an instance of the `bundle` class, `y_test` is a matrix with each row representing the outputs for the corresponding row of `x_test` and each column representing the output of the different emulators in the bundle.
 #'
 #' Set to `NULL` for LOO-based emulator validation. Defaults to `NULL`. This argument is only used if `eval = NULL`.
-#' @param reset A boolean or a vector of booleans indicating whether to reset the hyperparameters of the emulator(s) to their initial values (as set during initial construction) before re-fitting.
+#' @param reset A bool or a vector of bools indicating whether to reset the hyperparameters of the emulator(s) to their initial values (as set during initial construction) before re-fitting.
 #'     The re-fitting occurs based on the frequency specified by `freq[1]`. This option is useful when hyperparameters are suspected to have converged to a local optimum affecting validation performance.
-#' - If a single boolean is provided, it applies to every iteration of the sequential design.
+#' - If a single bool is provided, it applies to every iteration of the sequential design.
 #' - If a vector is provided, its length must equal `N` (even if the re-fit frequency specified in `freq[1]` is not 1) and it will apply to the corresponding iterations of the sequential design.
 #'
 #' Defaults to `FALSE`.
@@ -91,18 +91,18 @@
 #'
 #' If no custom function is provided, a built-in evaluation metric (RMSE or log-loss, in the case of DGP emulators with categorical likelihoods) will be used.
 #' Defaults to `NULL`. See the *Note* section below for additional details.
-#' @param verb a boolean indicating if trace information will be printed during the sequential design.
+#' @param verb a bool indicating if trace information will be printed during the sequential design.
 #'     Defaults to `TRUE`.
 #' @param autosave a list that contains configuration settings for the automatic saving of the emulator:
-#' * `switch`: a boolean indicating whether to enable automatic saving of the emulator during sequential design. When set to `TRUE`,
+#' * `switch`: a bool indicating whether to enable automatic saving of the emulator during sequential design. When set to `TRUE`,
 #'   the emulator in the final iteration is always saved. Defaults to `FALSE`.
 #' * `directory`: a string specifying the directory path where the emulators will be stored. Emulators will be stored in a sub-directory
 #'   of `directory` named 'emulator-`id`'. Defaults to './check_points'.
 #' * `fname`: a string representing the base name for the saved emulator files. Defaults to 'check_point'.
 #' * `save_freq`: an integer indicating the frequency of automatic saves, measured in the number of iterations. Defaults to `5`.
-#' * `overwrite`: a boolean value controlling the file saving behavior. When set to `TRUE`, each new automatic save overwrites the previous one,
+#' * `overwrite`: a bool value controlling the file saving behavior. When set to `TRUE`, each new automatic save overwrites the previous one,
 #'   keeping only the latest version. If `FALSE`, each automatic save creates a new file, preserving all previous versions. Defaults to `FALSE`.
-#' @param new_wave a boolean indicating whether the current call to [design()] will create a new wave of sequential designs or add the next sequence of designs to the most recent wave.
+#' @param new_wave a bool indicating whether the current call to [design()] will create a new wave of sequential designs or add the next sequence of designs to the most recent wave.
 #'     This argument is relevant only if waves already exist in the emulator. Creating new waves can improve the visualization of sequential design performance across different calls
 #'     to [design()] via [draw()], and allows for specifying a different evaluation frequency in `freq`. However, disabling this option can help limit the number of waves visualized
 #'     in [draw()] to avoid issues such as running out of distinct colors for large numbers of waves. Defaults to `TRUE`.
@@ -123,9 +123,9 @@
 #'     if the DGP emulator was constructed without the Vecchia approximation. Otherwise, the number of processes is set to `max physical cores available %/% 2`.
 #'     Only use multiple processes when there is a large number of GP components in different layers and optimization of GP components
 #'     is computationally expensive. Defaults to `1`.
-#' @param pruning a boolean indicating if dynamic pruning of DGP structures will be implemented during the sequential design after the total number of
+#' @param pruning a bool indicating if dynamic pruning of DGP structures will be implemented during the sequential design after the total number of
 #'     design points exceeds `min_size` in `control`. The argument is only applicable to DGP emulators (i.e., `object` is an instance of `dgp` class)
-#'     produced by `dgp()` with `struc = NULL`. Defaults to `TRUE`.
+#'     produced by `dgp()`. Defaults to `TRUE`.
 #' @param control a list that can supply any of the following components to control the dynamic pruning of the DGP emulator:
 #' * `min_size`, the minimum number of design points required to trigger dynamic pruning. Defaults to 10 times the number of input dimensions.
 #' * `threshold`, the \eqn{R^2} value above which a GP node is considered redundant. Defaults to `0.97`.
@@ -156,8 +156,8 @@
 #'     If `target` is not `NULL`, the following additional elements are also included:
 #'     - `target`: the target evaluating metric computed by the `eval` or built-in function to stop the sequential design.
 #'     - `reached`: indicates whether the `target` was reached at the end of the sequential design:
-#'        - a boolean if `object` is an instance of the `gp` or `dgp` class.
-#'        - a vector of booleans if `object` is an instance of the `bundle` class, with its length determined as follows:
+#'        - a bool if `object` is an instance of the `gp` or `dgp` class.
+#'        - a vector of bools if `object` is an instance of the `bundle` class, with its length determined as follows:
 #'          - equal to the number of emulators in the bundle when `eval = NULL`.
 #'          - equal to the length of the output from `eval` when a custom `eval` function is provided.
 #'   - a slot called `type` that gives the type of validation:
@@ -201,7 +201,7 @@
 #'   within `f` are handled by appropriately returning `NA`s.
 #' * When defining `eval`, the output metric needs to be positive if [draw()] is used with `log = T`. And one needs to ensure that a lower metric value indicates
 #'   a better emulation performance if `target` is set.
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
 #'
 #' @examples
 #' \dontrun{
@@ -3237,10 +3237,6 @@ check_reset <- function(reset, N){
 check_auto <- function(object){
   auto_pruning <- T
   # exclude user-defined structure
-  if (!"internal_dims" %in% names(object[['specs']])) {
-    auto_pruning <- F
-    return(auto_pruning)
-  } else {
     n_layer <- object$constructor_obj$n_layer
     if (object$constructor_obj$all_layer[[n_layer]][[1]]$type!='gp') {
       n_layer <- n_layer - 1
@@ -3257,7 +3253,7 @@ check_auto <- function(object){
         }
       }
     }
-  }
+
   return(auto_pruning)
 }
 
@@ -3342,24 +3338,24 @@ reverse_minmax <- function(normalized_data, limits) {
   return(original_data)
 }
 
-generic_wrapper <- function(r_func) {
-  function(...) {
-    # Capture the arguments
-    args <- list(...)
-
-    # Convert Python-native arguments to R-native if necessary
-    args <- lapply(args, function(arg) {
-      if (inherits(arg, "python.builtin.object")) {
-        reticulate::py_to_r(arg)
-      } else {
-        arg
-      }
-    })
-
-    # Call the user-provided R function with converted arguments
-    result <- do.call(r_func, args)
-
-    # Convert the result back to Python-native types
-    reticulate::r_to_py(result)
-  }
-}
+#generic_wrapper <- function(r_func) {
+#  function(...) {
+#    # Capture the arguments
+#    args <- list(...)
+#
+#    # Convert Python-native arguments to R-native if necessary
+#    args <- lapply(args, function(arg) {
+#      if (inherits(arg, "python.builtin.object")) {
+#        reticulate::py_to_r(arg)
+#      } else {
+#        arg
+#      }
+#    })
+#
+#    # Call the user-provided R function with converted arguments
+#    result <- do.call(r_func, args)
+#
+#    # Convert the result back to Python-native types
+#    reticulate::r_to_py(result)
+#  }
+#}
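
Editorial sketch (not part of the commit): the `reset` documentation in `R/design.R` above says that a single bool applies to every iteration, while a vector must have length `N`. The helper below illustrates that rule in plain R. The name `expand_reset_sketch` is hypothetical, and this is not the package's own `check_reset()`, whose body is not shown in this diff.

```r
# Hypothetical helper: expand `reset` to one logical value per iteration.
expand_reset_sketch <- function(reset, N) {
  if (!is.logical(reset) || anyNA(reset)) {
    stop("'reset' must be TRUE/FALSE or a logical vector without NAs.")
  }
  if (length(reset) == 1L) {
    # A single bool applies to every iteration of the sequential design
    return(rep(reset, N))
  }
  if (length(reset) != N) {
    stop("When 'reset' is a vector, its length must equal N.")
  }
  reset
}

per_iteration <- expand_reset_sketch(FALSE, N = 3)
custom <- expand_reset_sketch(c(TRUE, FALSE, TRUE), N = 3)

# A vector of the wrong length is rejected
wrong_length <- tryCatch(
  expand_reset_sketch(c(TRUE, FALSE), N = 3),
  error = function(e) "error"
)
```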