Automated Data Linkage




Introduction

autolink gives data analysts an easy, user-friendly way to develop linkage algorithms and use them to run data linkage tests. The package allows multiple algorithms to be tested per dataset, helping analysts reach a satisfactory linkage rate.

This package is most useful in data science, specifically data linkage and data analysis, where autolink streamlines algorithm design and testing. End users do not need to make programmatic changes to an R script; instead, they mix and match blocking and matching variables and the rules applied to each, repeating until the desired linkage rate is achieved.

Installation

RStudio Installation

To install autolink from GitHub, begin by installing and loading the devtools package:

# install.packages("devtools") # Uncomment if devtools is not yet installed.
library(devtools)

Afterwards, you may install the automated data linkage package using install_github():

devtools::install_github("CHIMB/autolink")

Local Installation

To install autolink locally from GitHub, select the most recent release from the right-hand tab on the GitHub repository page and download the Source code (zip) file. Then switch to RStudio and run:

path_to_pkg <- file.choose() # Select the unmodified package you downloaded from GitHub.
devtools::install_local(path_to_pkg)
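
Either installation route can be checked afterwards with a short snippet. This is a minimal sketch using only base R; is_installed() is a hypothetical helper introduced here for illustration and is not part of autolink.

```r
# Hypothetical helper (not part of autolink): TRUE if the package can be loaded.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

if (is_installed("autolink")) {
  # Report the installed version so it can be compared with the GitHub release.
  message("autolink version: ", as.character(utils::packageVersion("autolink")))
} else {
  message("autolink was not found; re-run one of the installation steps above.")
}
```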

Usage

Generating Empty Metadata File

To start working with the autolink package, create an empty linkage metadata file:

output_dir <- choose.dir() # Select the output directory for the .SQLite file (choose.dir() is Windows-only).
autolink::create_new_metadata("linkage_metadata", output_dir)
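
To confirm that the metadata file was written, one option is to look for it in the output directory and list its tables. The sketch below makes several assumptions: the DBI and RSQLite packages are installed (neither is mentioned in this README), the file has a .sqlite extension, and find_sqlite_files() and list_sqlite_tables() are hypothetical helpers, not autolink functions. It makes no claim about which tables the file contains.

```r
# Hypothetical helper (not part of autolink): find SQLite files in a directory.
# The ".sqlite" extension (any letter case) is an assumption.
find_sqlite_files <- function(dir) {
  list.files(dir, pattern = "\\.sqlite$", ignore.case = TRUE, full.names = TRUE)
}

# Hypothetical helper (not part of autolink): list the tables in a SQLite file.
# Assumes the DBI and RSQLite packages are installed.
list_sqlite_tables <- function(sqlite_path) {
  stopifnot(file.exists(sqlite_path))
  con <- DBI::dbConnect(RSQLite::SQLite(), sqlite_path)
  on.exit(DBI::dbDisconnect(con), add = TRUE) # Always close the connection.
  DBI::dbListTables(con)
}

# Example usage, with output_dir as chosen above:
# sqlite_files <- find_sqlite_files(output_dir)
# list_sqlite_tables(sqlite_files[1])
```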

Working With The GUI

Starting from an empty file, you may add datasets, algorithms, and iteration-specific information to the metadata file using the provided R Shiny application. To launch the application, make the following call in your R environment:

linkage_file <- file.choose() # Select the .SQLite file you wish to modify.
autolink::start_linkage_metadata_ui(linkage_file, "Data Analyst")

Within the GUI, first add the file paths of the datasets you wish to use for the linkage process. Once uploaded, you can select a pair of datasets and add algorithms to it, adding, modifying, and disabling any number of passes. If you are unsure what the GUI has to offer, consider reading the User Documentation listed below.

Running Algorithms

Once your algorithms have been created, you may run them either through the GUI or programmatically, starting from inputs such as the following:

left_dataset  <- file.choose() # The left dataset you plan on using.
right_dataset <- file.choose() # The right dataset (spine) you plan on using.
metadata_file <- file.choose() # The .SQLite file that contains all saved information.
algorithm_ids <- c(1, 3, 4)    # The ID(s) of the algorithm(s) to run for the dataset pair.
extra_params  <- autolink::create_extra_parameters_list(...) # Any optional/extra parameters you may want (export options & data).

Additional Information & Documentation

For more details on how the package is structured and how the stored algorithms are retrieved and used to link data, consider reading the Developer Facing Documentation (474KB).

For more details on how to use the function calls, how to navigate the pages of the user interface, and how to change or add information in the metadata, consider reading the User Facing Documentation (978KB).