-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserve old CR3 metadata when CRIS updates a crash #1427
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't test but 🚢
@@ -25,7 +25,7 @@ def get_pgfutter_path(): | |||
|
|||
def invalidate_cr3(pg, crash_id): | |||
invalidate_cr3_sql = f"""UPDATE public.atd_txdot_crashes | |||
SET cr3_stored_flag = 'N', cr3_file_metadata = null, cr3_ocr_extraction_date = null | |||
SET cr3_stored_flag = 'N' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm going to pick back up on this later so I wanted to drop some notes because I'm finding it hard to trace the dependencies between the cris import, cr3 download, narrative extract, and populate CR3 metadata scripts right now in addition to picturing the entire flow.
I'm looking at the query that the CR3 metadata script uses and the one that the OCR extract one uses. I'll take a closer look at this later but wanted to share in case anyone else is looking at this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mddilley - thanks for these great questions.
Here's a little bit of info that might be helpful, at least in terms of the OCR script. Back when the OCR script was first put into place, I was behind a little bit of questionable database design. The cr3_ocr_extraction_date
field is a nullable timestamp, and it's intended to be both the most recent time that OCR was completed on the CR3 and, if null, to mean that OCR is required / requested for the CR3.
It's the implied meaning of the null
value that doesn't jump off the page for me, and I wish I had not opted for that choice now, but it's not entirely wrong either ... it's not great.
I certainly don't think that I've answered Mike's questions here, but it was a little bit of history that might be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ugh, thanks @mddilley for mentioned those other scripts. i had only looked at the CR3 download query, but overlooked the obvious way that these other ETLs depend on this field. so this PR as written will have the effect of causing CR3s to never be re-OCR'd after they're downloaded anew.
i am going to cancel my review request and probably move this into the backlog. it feels like something that we can pick up again in the data model work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nah, i don't think that it is obvious at all! thanks for thinking through this, y'all! (i was way too distracted w/ the eclipse yesterday). I do think that we could get the switches right if we put all our heads together, but, yep, agreed that it will be nice to take another swing at this in the near future!
Never mind! I had the wrong idea about the intent of this PR and my question above was 100% ill-informed. When I reviewed the issue itself, this all came together for me. Thank you for working to keep the CR3s available to the end user, which is 100% a better behavior. Here's the query that the downloader uses, and all that really matters is that the Also, thank you to Mike for asking good questions to help ensure all the parts of the stack stay in concert with each other. 🙏 Thanks! |
Associated issues
Testing
I'm not sure what the best way is to test the CRIS import ETL. Looking for guidance here :)
Ship list