Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip CRIS import record updates if a key column is missing a value #1437

Merged
merged 4 commits into from
Apr 15, 2024

Conversation

mddilley
Copy link
Contributor

@mddilley mddilley commented Apr 15, 2024

Associated issues

TLDR: Older records in the person and primary person tables that are missing the prsn_occpnt_pos_id are breaking the SQL that we use to query and update records that receive updates from CRIS. This is one of the columns that we use to check if a person row is unique.

Testing

URL to test:
Local

Steps to test:

  1. Ask me for a zip file that I created that contains csvs with only one crash and its related person, primary person, and unit records that can replicate the original bug 🪄🪱 You can see our current values for the people records in the test csv with the following query (which some of these appear to be duplicates):
SELECT
	atc.crash_id,
	atc.crash_date,
	atp.prsn_nbr,
	atp.unit_nbr,
	atp.prsn_type_id,
	atp.prsn_occpnt_pos_id
FROM
	atd_txdot_person atp
	LEFT JOIN atd_txdot_crashes atc ON atp.crash_id = atc.crash_id
WHERE
	atc.crash_id = 14003860;
  1. See readme for local testing instructions
  2. Run the import on the zip and you should see the person records that are missing the prsn_occpnt_pos_id column value logged as skipped

Releasing this:
We should see the production image of the CR3 import update after merging this to master. GH action code


Ship list

  • Check migrations for any conflicts with latest migrations in master branch
  • Confirm Hasura role permissions for necessary access
  • Code reviewed
  • Product manager approved

for key_column in key_columns:
key_column_value = source[key_column]
if key_column_value == None:
print("\nSkipping because unique key column is missing")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super open to whatever amount of logging we want here. 🙏 I used this because it works good for testing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good to me. maybe add another print here that prints the entire source, so that we have some hope of tracking this record down?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mike pointed out that we don't want to print people records to our logs. i retract this suggestion 🙏

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chatted with John about this and decided to keep as is so we don't print out sensitive people data. 🔒

for source in imported_records:
# Check unique key columns to make sure they all have a value
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed specifically looking at people tables, but this should protect us from updating records in other tables when we don't have all the info we need to make sure we are updating the correct record.

Drop a CRIS extract zip file into your development folder as described above, and run the import script:
```bash
docker compose run cris-import
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙌

# If we are missing a column that uniquely identifies the record, we should skip the update
if should_skip_update:
for key_column in key_columns:
print(f"{key_column}: {source[key_column]}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you can remove this print since you have the more verbose one above 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we aren't printing the source record, i'll keep this here since it will gives us the details to track down which record was skipped no matter what table it is in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you!!!

Copy link
Member

@johnclary johnclary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have not tested, but this looks fantastic too me. thank you @mddilley!! 🚢 🚢 🚢 🚢 🚢

# Check unique key columns to make sure they all have a value
for key_column in key_columns:
key_column_value = source[key_column]
if key_column_value == None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super minor, but i always see this test written as is None instead of == None. there is some arcane pythonic reason for it that i am too lazy to look up 😅

Copy link
Contributor Author

@mddilley mddilley Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds great 🐍 I'll push this up along with the other minor ones above. Thanks! 🙏

Copy link
Member

@frankhereford frankhereford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before the fix:

Traceback (most recent call last):
  File "/app/./cris_import.py", line 831, in <module>
    main()
  File "/app/./cris_import.py", line 108, in main
    align_records_token = align_records(typed_token)
  File "/app/./cris_import.py", line 622, in align_records
    if util.fetch_target_record(pg, output_map, table, record_key_sql):
  File "/app/lib/sql.py", line 169, in fetch_target_record
    cursor.execute(sql)
  File "/usr/local/lib/python3.10/dist-packages/psycopg2/extras.py", line 236, in execute
    return super().execute(query, vars)
psycopg2.errors.UndefinedColumn: column "none" does not exist
LINE 5: ...id = 2 and public.atd_txdot_person.prsn_occpnt_pos_id = None

After the fix

Finding updated records

Skipping because unique key column is missing
Table: person
Missing key column: prsn_occpnt_pos_id
crash_id: 1*******0
unit_nbr: 2
prsn_nbr: 3
prsn_type_id: 2
prsn_occpnt_pos_id: None

Skipping because unique key column is missing
Table: person
Missing key column: prsn_occpnt_pos_id
crash_id: 1********0
unit_nbr: 2
prsn_nbr: 4
prsn_type_id: 2
prsn_occpnt_pos_id: None
root@85b604a1dcaa:/app# 

Perfect! 🚢🚢🚢🚢

@mddilley mddilley merged commit 0a66f60 into master Apr 15, 2024
9 checks passed
@mddilley mddilley deleted the md-patch-cris-import-prsn branch April 15, 2024 21:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] CRIS import fails when prsn_occpnt_pos_id is null
3 participants