Python issues: Splitting long text by folia2txt and FLAT in the custom software #106

Open
osherenko opened this issue Nov 15, 2022 · 1 comment

Comments


osherenko commented Nov 15, 2022

  1. I've installed folia-utils and used "folia2txt -s ..." from the CLI to split a long string into sentences. Unfortunately, when I split the Old Slavonic text "Искони бе Слово и Слово бе отъ Бога. и Богъ бе слово." into sentences, I get the wrong result:
    Искони бе Слово и Слово бе отъ Бога. и
    Богъ бе слово.

    If I split an English text, it works just fine.
  2. Is it possible to run FLAT not as a tab in an internet browser, but as a PySide widget?
    BTW, I can't import folia2html from the foliatools package in my Python script as I did with foliatools.folia2txt, foliatools.foliafreqlist, foliatools.foliatree. Nevertheless, I can run it from the CLI by "python.exe foliatools\folia2txt.py -s myannotation.xml"
Owner

proycon commented Nov 15, 2022

  1. I've installed folia-utils and used the "folia2txt -s ..." from CLI to split a long string in sentences.

folia2txt -s is not a proper sentence splitter, it simply assumes each line of a text file is already its own sentence!

For an actual tokeniser and sentence splitter with rich FoLiA support, consider ucto: https://github.com/LanguageMachines/ucto
It has no specific rules for Old Church Slavonic, but you can use the generic ruleset (named generic) or the Russian one (tokconfig-rus).
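
To illustrate the difference between line-based output and actual sentence splitting, here is a minimal sketch of a naive punctuation-based splitter in plain Python. It is not how ucto works and is not part of foliatools; the function name `naive_split_sentences` is made up for this example, and the rule (terminal punctuation followed by whitespace) will mis-split abbreviations, numbers and the like. It only shows the kind of result a sentence splitter would give for the example text.

```python
import re

# Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
# Illustrative stand-in only, not ucto's algorithm.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def naive_split_sentences(text: str) -> list[str]:
    """Split text into sentences on terminal punctuation plus whitespace."""
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]


text = "Искони бе Слово и Слово бе отъ Бога. и Богъ бе слово."
for sentence in naive_split_sentences(text):
    print(sentence)
```

For anything beyond a quick experiment, a rule-based tokeniser such as ucto is the safer choice, since it handles abbreviations and language-specific punctuation that this regular expression ignores.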

  2. Is it possible to run FLAT not as a tab in an internet browser, but as a PySide widget?

I hadn't heard of PySide until now, so I don't know. I suppose if there is a Qt widget that embeds a whole web browser, then yes.
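
As a rough sketch of what that could look like: Qt for Python provides `QWebEngineView`, a widget that embeds a Chromium-based browser. The example below assumes PySide6 with the Qt WebEngine module installed and a FLAT server already running and reachable; the URL `http://127.0.0.1:8000/` is an assumption (a typical local development address), not something FLAT guarantees. FLAT itself still runs as a web application; the widget only displays it inside a Qt window.

```python
# Assumption: FLAT is already being served at this address.
FLAT_URL = "http://127.0.0.1:8000/"


def main(url: str = FLAT_URL) -> int:
    """Open the given URL in a standalone Qt window and run the event loop."""
    # Imported here so the module can be loaded without Qt installed.
    import sys
    from PySide6.QtCore import QUrl
    from PySide6.QtWidgets import QApplication
    from PySide6.QtWebEngineWidgets import QWebEngineView

    app = QApplication(sys.argv)
    view = QWebEngineView()  # a QWidget that embeds a web browser
    view.setWindowTitle("FLAT")
    view.load(QUrl(url))
    view.resize(1200, 800)
    view.show()
    return app.exec()
```

Calling `main()` opens the window; in a larger PySide application the `QWebEngineView` could instead be placed in a layout like any other widget.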

BTW, I can't import folia2html from the foliatools package in my Python script as I did with foliatools.folia2txt, foliatools.foliafreqlist, foliatools.foliatree. Nevertheless, I can run it from the CLI by "python.exe foliatools\folia2txt.py -s myannotation.xml"

Hmm, I see. That should probably be improved, yes.
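
Until then, one possible workaround is to call the command-line tool from a script through `subprocess` instead of importing it. This is only a sketch: the helper `run_cli_tool` is hypothetical, and the commented usage assumes that a `folia2html` command is on the PATH and writes its result to standard output, which should be checked against the tool's own help text.

```python
import subprocess


def run_cli_tool(command: list[str]) -> str:
    """Run a command-line tool and return its standard output as text."""
    result = subprocess.run(
        command,
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode bytes to str
        encoding="utf-8",
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Hypothetical usage, assuming folia2html is installed and prints to stdout:
# html = run_cli_tool(["folia2html", "myannotation.xml"])
```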
