Text encoding #12

Open
singpolyma opened this issue Jul 9, 2011 · 4 comments
Comments

@singpolyma

Since all text in Xapian is UTF-8, strings coming back out of xapian-fu should be marked as UTF-8 (probably just by calling force_encoding('utf-8') on strings as they come out).

Right now the strings come out marked as the local encoding, but they are actually UTF-8, and this causes some problems.
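
To illustrate the suggestion, here is a minimal sketch in plain Ruby. It does not touch xapian-fu or the Xapian bindings. The binary-tagged string is only a stand-in for whatever the bindings hand back, and `tag_as_utf8` is a hypothetical helper name, not something that exists in the library.

```ruby
# Stand-in for a string returned by a C binding: the bytes are valid UTF-8
# ("café"), but the string is tagged with a different encoding (binary here).
raw = "caf\xC3\xA9".b

raw_encoding = raw.encoding   # Encoding::ASCII_8BIT
raw_length   = raw.length     # each byte is counted as one character

# Hypothetical helper: relabel the bytes as UTF-8 without changing them.
# force_encoding only changes the encoding tag; it never transcodes.
def tag_as_utf8(str)
  str.dup.force_encoding(Encoding::UTF_8)
end

fixed = tag_as_utf8(raw)

fixed_encoding = fixed.encoding          # Encoding::UTF_8
fixed_length   = fixed.length            # characters, not bytes
fixed_valid    = fixed.valid_encoding?   # the bytes really are UTF-8
same_bytes     = (fixed.bytes == raw.bytes)
```

Because force_encoding leaves the bytes alone, this is only safe if the data really is UTF-8, which is the premise of this issue.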

@djanowski
Collaborator

What if you set Encoding.default_external?

@singpolyma
Author

Sure, I can get around it, but the point is that since all of the data is always in fact going to be UTF-8, the library should honour that.

@djanowski
Collaborator

I guess that's right, as long as Xapian always stores/returns UTF-8.

What should we do when storing? Should an exception be raised if the string is not UTF-8?

@singpolyma
Author

I'm not sure how the Xapian bindings handle things, but if they just use the raw bytestream and assume it's UTF-8 (because, yes, Xapian always stores/returns UTF-8), then you should probably call .encode('utf-8'), and if there's a problem Ruby will raise the exception for you :)
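
A rough sketch of what that could look like on the storing side, in plain Ruby only. `utf8_for_storage` is a hypothetical helper, not part of xapian-fu. One caveat it assumes: on the Ruby versions I know of, calling encode on a string that is already tagged UTF-8 does not validate the bytes, so the sketch adds an explicit valid_encoding? check.

```ruby
# Hypothetical helper: transcode to UTF-8 before handing a string to Xapian.
def utf8_for_storage(str)
  # Transcodes from the string's current encoding; raises an EncodingError
  # subclass if the conversion is not possible.
  utf8 = str.encode(Encoding::UTF_8)
  # A string already tagged UTF-8 passes through encode without being
  # checked, so reject invalid byte sequences explicitly.
  raise ArgumentError, "invalid UTF-8 byte sequence" unless utf8.valid_encoding?
  utf8
end

# A Latin-1 string is transcoded: byte 0xE9 becomes the two bytes 0xC3 0xA9.
latin1 = "caf\xE9".b.force_encoding(Encoding::ISO_8859_1)
stored = utf8_for_storage(latin1)

# Binary data with high bytes has no defined conversion, so Ruby raises.
binary = "\xFF\xFE".b
binary_error =
  begin
    utf8_for_storage(binary)
    nil
  rescue EncodingError => e
    e.class
  end
```

Here `stored` ends up tagged UTF-8, and `binary_error` holds the exception class Ruby raised for the binary input.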


2 participants