-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip CRIS import record updates if a key column is missing a value #1437
Conversation
for key_column in key_columns: | ||
key_column_value = source[key_column] | ||
if key_column_value == None: | ||
print("\nSkipping because unique key column is missing") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super open to whatever amount of logging we want here. 🙏 I used this because it works good for testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks good to me. maybe add another print
here that prints the entire source
, so that we have some hope of tracking this record down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mike pointed out that we don't want to print people records to our logs. i retract this suggestion 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chatted with John about this and decided to keep as is so we don't print out sensitive people data. 🔒
for source in imported_records: | ||
# Check unique key columns to make sure they all have a value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed specifically looking at people tables, but this should protect us from updating records in other tables when we don't have all the info we need to make sure we are updating the correct record.
Drop a CRIS extract zip file into your development folder as described above, and run the import script: | ||
```bash | ||
docker compose run cris-import | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙌
# If we are missing a column that uniquely identifies the record, we should skip the update | ||
if should_skip_update: | ||
for key_column in key_columns: | ||
print(f"{key_column}: {source[key_column]}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think you can remove this print
since you have the more verbose one above 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we aren't printing the source record, i'll keep this here since it will gives us the details to track down which record was skipped no matter what table it is in.
atd-etl/cris_import/env-template
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you!!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have not tested, but this looks fantastic too me. thank you @mddilley!! 🚢 🚢 🚢 🚢 🚢
atd-etl/cris_import/cris_import.py
Outdated
# Check unique key columns to make sure they all have a value | ||
for key_column in key_columns: | ||
key_column_value = source[key_column] | ||
if key_column_value == None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super minor, but i always see this test written as is None
instead of == None
. there is some arcane pythonic reason for it that i am too lazy to look up 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds great 🐍 I'll push this up along with the other minor ones above. Thanks! 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before the fix:
Traceback (most recent call last):
File "/app/./cris_import.py", line 831, in <module>
main()
File "/app/./cris_import.py", line 108, in main
align_records_token = align_records(typed_token)
File "/app/./cris_import.py", line 622, in align_records
if util.fetch_target_record(pg, output_map, table, record_key_sql):
File "/app/lib/sql.py", line 169, in fetch_target_record
cursor.execute(sql)
File "/usr/local/lib/python3.10/dist-packages/psycopg2/extras.py", line 236, in execute
return super().execute(query, vars)
psycopg2.errors.UndefinedColumn: column "none" does not exist
LINE 5: ...id = 2 and public.atd_txdot_person.prsn_occpnt_pos_id = None
After the fix
Finding updated records
Skipping because unique key column is missing
Table: person
Missing key column: prsn_occpnt_pos_id
crash_id: 1*******0
unit_nbr: 2
prsn_nbr: 3
prsn_type_id: 2
prsn_occpnt_pos_id: None
Skipping because unique key column is missing
Table: person
Missing key column: prsn_occpnt_pos_id
crash_id: 1********0
unit_nbr: 2
prsn_nbr: 4
prsn_type_id: 2
prsn_occpnt_pos_id: None
root@85b604a1dcaa:/app#
Perfect! 🚢🚢🚢🚢
Associated issues
prsn_occpnt_pos_id
isnull
atd-data-tech#16823See https://austininnovation.slack.com/archives/CM9MK950S/p1713190489691989
TLDR: Older records in the person and primary person tables that are missing the
prsn_occpnt_pos_id
are breaking the SQL that we use to query and update records that receive updates from CRIS. This is one of the columns that we use to check if a person row is unique.Testing
URL to test:
Local
Steps to test:
prsn_occpnt_pos_id
column value logged as skippedReleasing this:
We should see the production image of the CR3 import update after merging this to
master
. GH action codeShip list