Ubuntu 18.04, 18.10 - czech characters replaced with different ascii characters #196

Open
imlask opened this issue Dec 5, 2018 · 15 comments

@imlask

imlask commented Dec 5, 2018

Since Ubuntu 18.04, Czech characters (and probably other international characters as well) are replaced with different ASCII characters. E.g. 'š' is replaced with 'a', 'Š' is replaced with '`' (backtick), 'ž' is replaced with '~', etc. All other applications work correctly; Password Gorilla is the only one with this problem, so I don't think a system misconfiguration is to blame.

@rich123
Collaborator

rich123 commented Dec 5, 2018

Which version are you running: 1) the Ubuntu package, or 2) the latest tip of the pre160 branch?

If you are running #1 above, what Gorilla version is the package installing?

Where within Gorilla is the replacement happening? Is it in the UI, or is it in the stored password data fields, or is it both?

@imlask
Author

imlask commented Dec 5, 2018

I am running Password Gorilla version 1.6.0 beta1 from the Ubuntu package (I am on Xubuntu, btw). Group names and login titles are displayed incorrectly. The login notes are affected as well.

@rich123
Collaborator

rich123 commented Dec 5, 2018

Do you have Ubuntu's native Tcl/Tk installed? I think the Ubuntu package is a source package, so I suspect you do from the auto-dependency handling.

So, can you open a terminal and type "tclsh" (without the quotes)? That should give you a % prompt. When you see that, type "encoding system" (without quotes) and it should reply with what it has set for the system character encoding. Let me know what "encoding system" replies.

If you get "command not found" (or something similar) after typing "tclsh", then you may need to ask the Ubuntu package installer to install Tcl/Tk.

@imlask
Author

imlask commented Dec 5, 2018

Yes, I have Ubuntu's native Tcl/Tk installed. Here is the terminal output:
% encoding system
utf-8

@rich123
Collaborator

rich123 commented Dec 5, 2018

Hmm, it should work with that setting. I was expecting to see something other than utf-8.

Can you try creating a brand new empty DB file, adding a random entry or two containing several of the Czech characters that are causing problems in your main DB file, saving it, closing and restarting Gorilla, then reopening the file to see what happens?

@imlask
Author

imlask commented Dec 5, 2018

I created a brand new empty DB and the result is the same. Interestingly, when I type the Czech characters, they show up correctly. Once the login data is saved, it immediately shows up with incorrect characters. Closing and restarting Gorilla has no effect.

@rich123
Collaborator

rich123 commented Dec 5, 2018

Ok, that may give me something to look into then.

Can you type the Czech characters that you used in the test DB here on GitHub (and do they show correctly)? To cover the "do they show" part, could you also find their Unicode names here: https://en.wikipedia.org/wiki/List_of_Unicode_characters (I don't know which sub-section they fall into). Having the names would rule out any 'translation' problem over GitHub.

And could you attach the test DB with the same characters (change the password on it to something meaningless like 'qwerty' before posting it) so I can see what should have gone in vs. what actually ended up present.

@imlask
Author

imlask commented Dec 5, 2018

In the test database attached, there is just one login with title "Šimon". In login notes, there are two lines with these characters:
ěščřžýáíéúů
ĚŠČŘŽÝÁÍÉÚŮ

Here are the letters from the first row as specified in the Unicode tables:
U+011B | ě | 283 | &ecaron; | Latin Small Letter E with caron | 0219 (Latin Extended-A)
U+0161 | š | 353 | š | Latin Small Letter S with caron | 0289 (Latin Extended-A)
U+010D | č | 269 | č | Latin Small Letter C with caron | 0205 (Latin Extended-A)
U+0159 | ř | 345 | ř | Latin Small Letter R with caron | 0281 (Latin Extended-A)
U+017E | ž | 382 | ž | Latin Small Letter Z with caron | 0318 (Latin Extended-A)
U+00FD | ý | 0253 | ý | Latin Small Letter Y with acute | 0189 (Latin-1 Supplement)
U+00E1 | á | 0225 | á | Latin Small Letter A with acute | 0161 (Latin-1 Supplement)
U+00ED | í | 0237 | í | Latin Small Letter I with acute | 0173 (Latin-1 Supplement)
U+00E9 | é | 0233 | é | Latin Small Letter E with acute | 0169 (Latin-1 Supplement)
U+00FA | ú | 0250 | ú | Latin Small Letter U with acute | 0186 (Latin-1 Supplement)
U+016F | ů | 367 | ů | Latin Small Letter U with ring above | 0303 (Latin Extended-A)

The letters from the second row are capital versions of the letters in the first row. In both rows, Latin Extended-A characters show up correctly, Latin-1 Supplement characters are displayed incorrectly.

Testing database is zipped, password is 'test'.
test.zip

@imlask
Author

imlask commented Dec 5, 2018

Sorry, I just noticed that I swapped the sections: Latin Extended-A characters show up incorrectly, Latin-1 Supplement characters are displayed correctly.
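
The split reported above (Latin-1 Supplement intact, Latin Extended-A wrong) is consistent with each character being cut down to the low 8 bits of its code point before it is stored. The sketch below only illustrates that arithmetic; it is an assumption about the mechanism, not Gorilla's actual code, and `lowByte` is a hypothetical helper name.

```tcl
# Hypothetical helper: keep only the low 8 bits of a character's code point,
# which is what happens when a Unicode string is handled as a byte string.
proc lowByte {ch} {
    set code [scan $ch %c]
    return [format %c [expr {$code & 0xFF}]]
}

# U+0161 (s caron), U+0160 (S caron), U+017E (z caron), U+00E1 (a acute)
foreach ch [list \u0161 \u0160 \u017e \u00e1] {
    puts [format "U+%04X -> 0x%02X" [scan $ch %c] [scan [lowByte $ch] %c]]
}
# U+0161 -> 0x61   (ASCII 'a')
# U+0160 -> 0x60   (ASCII backtick)
# U+017E -> 0x7E   (ASCII '~')
# U+00E1 -> 0xE1   (unchanged, already fits in one byte)
```

The first three results match the replacements described in the original report, while code points at or below U+00FF pass through untouched.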

@rich123
Collaborator

rich123 commented Dec 5, 2018

Does Ubuntu's packaged Gorilla install the Tcl sources? I think it does, but am unsure of that fact.

If it does, can you apply the patch below and then try creating a new DB file? I was able to recreate the same issue here using your example characters, and this patch fixed the issue for me locally, so I'm very hopeful it will fix it for you as well. If all goes well, I'll push a new commit to GitHub with this change.

The patch is simple enough that while you could use 'patch' to apply it, editing by hand is almost as easy. You'd edit the file pwsafe/pwsafe-db.tcl and make the changes below. Lines prefixed with a plus (+) are added, lines prefixed with a minus (-) are removed. Lines without prefixes are unchanged context so you can locate the proper line in the file to edit.

diff --git a/sources/pwsafe/pwsafe-db.tcl b/sources/pwsafe/pwsafe-db.tcl
index 0c7f305..73afb5a 100644
--- a/sources/pwsafe/pwsafe-db.tcl
+++ b/sources/pwsafe/pwsafe-db.tcl
@@ -257,12 +257,13 @@ itcl::class pwsafe::db {
 
        #
        # Encrypt a field, so that we don't store anything in cleartext
        #
 
        private method encryptField {data} {
+               set data [encoding convertto utf-8 $data]
                set dataLen [string length $data]
                set msg [pwsafe::int::randomString 4]
                append msg [binary format I $dataLen]
                append msg $data
                incr dataLen 8
                if {($dataLen % 16) != 0} {
@@ -289,13 +290,13 @@ itcl::class pwsafe::db {
                        [string range $encryptedMsg [expr {16*$i}] [expr {16*$i+15}]]]
                }
                binary scan $decryptedMsg @4I msgLen
                set res [string range $decryptedMsg 8 [expr {7+$msgLen}]]
                pwsafe::int::randomizeVar decryptedMsg
 
-               return $res
+               return [encoding convertfrom utf-8 $res]
        }
 
        #
        # Accessors for our data members
        #
 

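As a rough illustration of what the two added calls do, here is a minimal round trip outside of Gorilla, using the title from the test database. The variable names are made up for the example; only `encoding convertto` and `encoding convertfrom` come from the patch.

```tcl
# U+0160 followed by "imon": the five-character title from the test DB.
set title "\u0160imon"

# convertto produces a byte string; the non-ASCII character takes two bytes.
set bytes [encoding convertto utf-8 $title]
puts [string length $title]    ;# 5 (characters)
puts [string length $bytes]    ;# 6 (bytes)

# convertfrom reverses the conversion, as the patch does after decryption.
set back [encoding convertfrom utf-8 $bytes]
puts [expr {$back eq $title}]  ;# 1
```

Because the patch converts before `set dataLen [string length $data]`, the stored length field counts bytes rather than characters, which is what the decrypting side needs in order to slice the message correctly.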
@imlask
Author

imlask commented Dec 6, 2018

It looks like Ubuntu's packaged Gorilla does install the Tcl sources: I found the mentioned file under the /usr/share/password-gorilla folder. However, the encryptField procedure in that source file doesn't match your patch. Here is the procedure code:

	private method encryptField {data} {
		set dataLen [string length $data]
		set msg [pwsafe::int::randomString 4]
		append msg [binary format I $dataLen]
		append msg $data
		incr dataLen 8
		if {($dataLen % 16) != 0} {
			set padLen [expr {16-($dataLen%16)}]
			append msg [pwsafe::int::randomString $padLen]
			incr dataLen $padLen
		}
		set blocks [expr {$dataLen/16}]
		set encryptedMsg ""
		for {set i 0} {$i < $blocks} {incr i} {
			append encryptedMsg [$engine encryptBlock \
			[string range $msg [expr {16*$i}] [expr {16*$i+15}]]]
		}
		pwsafe::int::randomizeVar msg
		return $encryptedMsg
	}

Is it a different version? Here is what apt gives me:
password-gorilla/cosmic,cosmic,now 1.6.0~git20180203.228bbbb-1 all

BTW I tried to apply the patch and missing lines manually, but I got error "unknown variable $decryptedMsg" when opening a database (obviously).

@rich123
Collaborator

rich123 commented Dec 6, 2018

Revert back to your original file.

The first insertion, of "set data [encoding convertto utf-8 $data]" goes right below the line "private method encryptField {data} {" and above the line "set dataLen [string length $data]".

Then, if you look below where your quote above ended, you'll find a line starting "private method decryptField {encryptedMsg} {". If you find its matching closing brace "}", you'll be one line below a line that reads "return $res". Change the "return $res" line to read "return [encoding convertfrom utf-8 $res]" and you should be good to test.

@imlask
Author

imlask commented Dec 6, 2018

I apologize for my inattention when applying the patch manually. The patch FIXED the issue for newly created items. Unfortunately, all the existing logins still show up incorrectly. Still, only a few dozen entries out of hundreds in my database are affected.

Thank you very much for your prompt and professional approach.

@rich123
Collaborator

rich123 commented Dec 6, 2018

Good news that the fix corrected the issue.
Unfortunately, yes, for existing items, the corruption has already occurred and been stored, so they will need to be fixed manually. Good that it was only a small number.

rich123 added a commit that referenced this issue Dec 6, 2018
explicitly convert them to a binary value.
@ajraymond
Contributor

I will try updating the Debian/Ubuntu package soon.
