Add raw price threshold for sales val #142
Merged
14 commits
d5c254e Add first pass at hard threshold (wagnerlmichael)
006cfaf Update log transform (wagnerlmichael)
2039c7c Change cap amount (wagnerlmichael)
d259afe Add to docstring (wagnerlmichael)
a93bbbe Add arg for raw price thresh (wagnerlmichael)
cb44fdc Add yaml entry (wagnerlmichael)
0cb5c1d Revert changes (wagnerlmichael)
11143b3 Revert changes (wagnerlmichael)
5e8dff0 Update glue/sales_val_flagging.py (wagnerlmichael)
126ee19 Update manual_flagging/yaml/inputs.yaml (wagnerlmichael)
324e47a Persist raw price thresh (wagnerlmichael)
e294c73 Standardize threshold naming (wagnerlmichael)
cbec904 Update glue/sales_val_flagging.py (wagnerlmichael)
448b2b4 Update manual_flagging/yaml/inputs.yaml (wagnerlmichael)

@@ -144,7 +144,8 @@ def classify_outliers(df, stat_groups: list, min_threshold):
     2. Implement our group threshold requirement. In the statistical flagging process, if
     the group a sale belongs too is below N=30 then we want to manually set these flags to
     non-outlier status, even if they were flagged in the mansueto script. This requirement
-    is bypasses for ptax outliers - we don't care about group threshold in this case.
+    is bypasses for ptax outliers and raw price threshold outliers - we don't care about
+    group threshold in this case.
 
     Inputs:
         df: The data right after we perform the flagging script (go()), when the exploded

@@ -178,6 +179,7 @@ def classify_outliers(df, stat_groups: list, min_threshold):
         "sv_ind_ptax_flag_w_high_price_sqft": "High price per square foot",
         "sv_ind_price_low_price_sqft": "Low price per square foot",
         "sv_ind_ptax_flag_w_low_price_sqft": "Low price per square foot",
+        "sv_ind_raw_price_threshold": "Raw price threshold",
         "sv_ind_ptax_flag": "PTAX-203 Exclusion",
         "sv_ind_char_short_term_owner": "Short-term owner",
         "sv_ind_char_family_sale": "Family Sale",
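The mapping above turns each boolean indicator column into a human-readable outlier reason. A minimal sketch of how such a mapping could be applied to one sale, assuming the indicators are boolean columns on a pandas row; `collect_reasons` is a hypothetical helper, and only the entries visible in this hunk are used:

```python
import pandas as pd

# Subset of the indicator-to-reason mapping visible in this hunk.
# The full mapping in the PR contains more entries.
REASON_LABELS = {
    "sv_ind_price_low_price_sqft": "Low price per square foot",
    "sv_ind_raw_price_threshold": "Raw price threshold",
    "sv_ind_ptax_flag": "PTAX-203 Exclusion",
    "sv_ind_char_short_term_owner": "Short-term owner",
    "sv_ind_char_family_sale": "Family Sale",
}


def collect_reasons(row: pd.Series) -> list:
    """Hypothetical helper: labels of every indicator that is True for this sale.

    Columns missing from the row are treated as False, and NaN never counts as a hit.
    """
    reasons = []
    for col, label in REASON_LABELS.items():
        value = row.get(col, False)
        if pd.notna(value) and bool(value):
            reasons.append(label)
    return reasons


sales = pd.DataFrame(
    {
        "sv_ind_raw_price_threshold": [True, False],
        "sv_ind_ptax_flag": [True, False],
        "sv_ind_char_family_sale": [False, True],
    }
)
for _, sale in sales.iterrows():
    print(collect_reasons(sale))
# ['Raw price threshold', 'PTAX-203 Exclusion']
# ['Family Sale']
```

Labels come out in the mapping's insertion order, so the order of the dictionary decides which reason is listed first in this sketch.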

@@ -199,6 +201,11 @@ def classify_outliers(df, stat_groups: list, min_threshold):
 
     Note: This doesn't apply for sales that also have a ptax outlier status.
     In this case, we still assign the price outlier status.
+
+    We also don't apply this threshold with sv_raw_price_threshold,
+    since this is designed to be a safeguard that catches very high price
+    sales that may have slipped through the cracks due to the group
+    threshold requirement
     """
     group_thresh_price_fix = [
         "sv_ind_price_high_price",
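The docstring describes two exemptions from the group-size requirement. A minimal sketch of that rule, assuming boolean flag columns and a hypothetical `group_size` column holding the size of each sale's statistical group; `apply_group_threshold` is a hypothetical name, and the flag list is only the subset visible in this PR. The raw price threshold flag is left out of the list, so in this sketch it is never reset:

```python
import pandas as pd

# Statistical price flags subject to the group-size requirement.
# Only part of group_thresh_price_fix is visible in the diff; this list is an assumption.
GROUP_THRESH_PRICE_FIX = ["sv_ind_price_high_price", "sv_ind_price_low_price_sqft"]


def apply_group_threshold(df: pd.DataFrame, min_group_threshold: int = 30) -> pd.DataFrame:
    """Reset statistical price flags for sales in groups below the size threshold.

    Sales that also carry a ptax flag keep their price flags. The
    sv_ind_raw_price_threshold column is not in GROUP_THRESH_PRICE_FIX, so the
    raw price safeguard survives regardless of group size.
    """
    df = df.copy()
    small_group = df["group_size"] < min_group_threshold  # hypothetical column
    reset = small_group & ~df["sv_ind_ptax_flag"]
    for col in GROUP_THRESH_PRICE_FIX:
        df.loc[reset, col] = False
    return df
```

For example, a sale in a group of 10 with a high-price flag, no ptax flag, and a raw price flag would lose the high-price flag but stay an outlier through the raw price flag.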

@@ -237,12 +244,14 @@ def fill_outlier_reasons(row):
     # Drop the _merge column
     df = df.drop(columns=["_merge"])
 
-    # Assign outlier status
+    # Assign outlier status, these are the outlier types
+    # that assign a sale as an outlier
     values_to_check = {
         "High price",
         "Low price",
         "High price per square foot",
         "Low price per square foot",
+        "Raw price threshold",
     }
 
     df["sv_is_outlier"] = np.where(
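With "Raw price threshold" added to values_to_check, a sale whose reasons include that label is marked as an outlier. The np.where call is truncated in the diff, so this is only a sketch of the idea, assuming three hypothetical reason columns (sv_outlier_reason1 through sv_outlier_reason3) that hold the labels:

```python
import numpy as np
import pandas as pd

# Outlier types that mark a sale as an outlier, as listed in the diff.
VALUES_TO_CHECK = {
    "High price",
    "Low price",
    "High price per square foot",
    "Low price per square foot",
    "Raw price threshold",
}

# Hypothetical names for the per-sale reason columns.
REASON_COLUMNS = ["sv_outlier_reason1", "sv_outlier_reason2", "sv_outlier_reason3"]


def assign_outlier_status(df: pd.DataFrame) -> pd.DataFrame:
    """Set sv_is_outlier to True when any reason column holds a price-type reason."""
    df = df.copy()
    # Missing reasons (None/NaN) never match an entry in VALUES_TO_CHECK.
    has_price_reason = df[REASON_COLUMNS].isin(list(VALUES_TO_CHECK)).any(axis=1)
    df["sv_is_outlier"] = np.where(has_price_reason, True, False)
    return df
```

In this sketch, reasons such as "PTAX-203 Exclusion" or "Family Sale" are still recorded but do not set sv_is_outlier on their own, matching the comment that only the listed types assign outlier status.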

@@ -471,8 +480,9 @@ def get_parameter_df(
     ptax_sd,
     rolling_window,
     time_frame,
-    short_term_thresh,
-    min_group_thresh,
+    short_term_threshold,
+    min_group_threshold,
+    raw_price_threshold,
     run_id,
 ):
     """

@@ -488,8 +498,9 @@ def get_parameter_df(
         ptax_sd: list of standard deviations used for ptax flagging
         rolling_window: how many months used in rolling window methodology
         date_floor: parameter specification that limits earliest flagging write
-        short_term_thresh: short-term threshold for Mansueto's flagging model
+        short_term_threshold: short-term threshold for Mansueto's flagging model
         min_group_thresh: minimum group size threshold needed to flag as outlier
+        raw_price_threshold: raw price threshold at which we unconditionally classify sales as outliers
         run_id: unique run_id to flagging program run
     Outputs:
         df_parameters: parameters table associated with flagging run
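The new parameter is documented as the raw price at which sales are unconditionally classified as outliers. The flag computation itself is not shown in this diff; a minimal sketch, assuming a hypothetical sale price column meta_sale_price and an inclusive comparison (price at or above the threshold), both of which are assumptions:

```python
import pandas as pd


def flag_raw_price_threshold(
    df: pd.DataFrame, raw_price_threshold: float, price_col: str = "meta_sale_price"
) -> pd.DataFrame:
    """Mark sales priced at or above raw_price_threshold, regardless of group size.

    price_col and the inclusive >= comparison are assumptions for illustration.
    Missing prices compare as False, so they are never flagged.
    """
    df = df.copy()
    df["sv_ind_raw_price_threshold"] = df[price_col] >= raw_price_threshold
    return df


# Illustrative threshold only; not the value configured in the PR.
sales = pd.DataFrame({"meta_sale_price": [350_000, 20_000_000, None]})
print(flag_raw_price_threshold(sales, 15_000_000)["sv_ind_raw_price_threshold"].tolist())
# [False, True, False]
```

Because the resulting column is not part of the group-threshold reset, a flag set here is meant to act as the safeguard the docstring describes.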

@@ -512,8 +523,9 @@ def get_parameter_df(
         "ptax_sd": [ptax_sd],
         "rolling_window": [rolling_window],
         "time_frame": [time_frame],
-        "short_term_owner_threshold": [short_term_thresh],
-        "min_group_thresh": [min_group_thresh],
+        "short_term_owner_threshold": [short_term_threshold],
+        "min_group_thresh": [min_group_threshold],
+        "raw_price_threshold": [raw_price_threshold],
     }
 
     df_parameters = pd.DataFrame(parameter_dict_to_df)

Review comment (on the "min_group_thresh" line): This min_group_thresh key should eventually be migrated to min_group_threshold to maintain naming style.
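Each value in parameter_dict_to_df is wrapped in a single-element list, so pd.DataFrame builds a one-row parameters table for the run; passing only bare scalars without an index would raise a ValueError. A reduced sketch, where build_parameter_df is a hypothetical stand-in for get_parameter_df and the "run_id" key is an assumption:

```python
import pandas as pd


def build_parameter_df(
    run_id: str,
    short_term_threshold: int,
    min_group_threshold: int,
    raw_price_threshold: float,
) -> pd.DataFrame:
    """Hypothetical, reduced version of get_parameter_df: one row per flagging run."""
    parameter_dict_to_df = {
        "run_id": [run_id],  # hypothetical key
        "short_term_owner_threshold": [short_term_threshold],
        # Key name kept as it appears in the diff.
        "min_group_thresh": [min_group_threshold],
        "raw_price_threshold": [raw_price_threshold],
    }
    return pd.DataFrame(parameter_dict_to_df)
```

Persisting raw_price_threshold alongside the other parameters lets each flagging run record which cutoff produced its raw price flags.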
Review comment: This reason is assigned priority right after the other price flags that are generated with the statistical flagging groups.
Order of assignment between the three outlier reasons will be