bug in some NorbuKetaka imports #275

Closed
eroux opened this issue Sep 4, 2024 · 2 comments

@eroux

eroux commented Sep 4, 2024

Some NorbuKetaka imports have incorrect pagination. The cause is a set of errors in the source csv files, which are easy to fix. The example I'm looking at is the pagination of volume 65 (I4PD4036) of W4PD502, currently in pecha I31F08544: https://github.com/OpenPecha-Data/I31F08544/blob/master/I31F08544.opf/layers/I4PD4036/Pagination.yml

If we look at s3://ocr.bdrc.io/NorbuKetaka2/W4PD502-I4PD4036.csv we see things like

W4PD502,I4PD4036,I4PD403682,1,ཆད་པ་གཡུང་དྲུང་གི་དབྱིངས་ཐམས་ཅད་ཡེ་ནས་སངས་རྒྱས་པའི་ཕྱིར། སེམས་ཀྱི་

Lines like this are technically incorrect because I4PD403682 is not an image name in I4PD4036 (see the image list). In that case the correct image file name is I4PD40360082. It is actually relatively easy to go from the wrong name to the correct one with the following algorithm:

  • start from I4PD403682
  • remove the prefix of the volume id: I4PD403682 - I4PD4036 = 82
  • pad with 0s to the left so that it's 4 digits long: 0082
  • add the prefix of the volume id again: I4PD40360082
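
The steps above can be sketched in Python as follows. This is only an illustration, not the code used in the Norbu-Keta-eText-import repo: the function name `fix_image_name` and its `width` parameter are hypothetical, and it assumes the volume id is known for each csv row (it is the second column in the example line).

```python
def fix_image_name(image_name: str, volume_id: str, width: int = 4) -> str:
    """Rebuild an image name whose numeric suffix lost its zero padding.

    Hypothetical helper: the name and the default width of 4 come from the
    steps described in this issue, not from the import code itself.
    """
    if not image_name.startswith(volume_id):
        raise ValueError(f"{image_name!r} does not start with {volume_id!r}")

    # Remove the volume id prefix: "I4PD403682" -> "82"
    suffix = image_name[len(volume_id):]
    if not suffix.isdigit():
        raise ValueError(f"unexpected suffix {suffix!r} in {image_name!r}")

    # Left-pad with zeros to the expected width ("0082"), then add the
    # volume id prefix again. Suffixes that are already wide enough are
    # left unchanged by zfill.
    return volume_id + suffix.zfill(width)


print(fix_image_name("I4PD403682", "I4PD4036"))
```

Names that are already correct, such as I4PD40360082, pass through unchanged, so the function could be applied to every row rather than only to the broken ones.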

Let's reimport the NorbuKetaka etexts, if possible with the same pecha numbers (i.e. I31F08544).

eroux added a commit to OpenPecha/Norbu-Keta-eText-import that referenced this issue Sep 5, 2024
@ta4tsering

Hi @eroux, I was going to handle this bug, but it looks like you have already fixed it in the Norbu-Keta-eText-import repo. There is still a need to re-import the etexts, right? And with the same pecha number (pecha_id) as before?

@eroux

eroux commented Sep 14, 2024

I've already reimported the pechas, see for instance https://github.com/OpenPecha-Data/I31F08544/commit/a04caa73b4ffd0a6c9bbea82da634dfa3331b25b#diff-cdf00ceee7565a2f1be4b4852ea304254dac8cc7321293f6a3854f583ab9976e. I think we can close this.

eroux closed this as completed Sep 14, 2024