Loading of numpy types fails #7
Comments
Thanks for the very detailed summary. I am getting this when I run your script with
So both the vanilla and the serialize versions are failing, right?
Ignore the
But I do expect the "serialize-yaml" versions to work. They do serialize to SOMETHING (look at foo46.bin, for instance), and that something looks like what I'd expect to come from serializing a numpy object, but loading that object fails with a type error. serialize uses its own custom Dumper/Loader, I think, and it's probably not round-tripping numpy.array objects because they reduce differently than others, or something?
We basically subclass Dumper and Loader. Take a look at yaml.py
I basically removed the modified
I would say "yes, it needs to be addressed"... numpy is in common use, and the underlying pyyaml correctly handles it. It's a surprising result that using serialize prevents that from working. If serialize is meant to make the job of using the various serialization formats easier, it shouldn't break anything.
I made a small change that seems to fix this issue in a way that does not destroy the other functionality of
By the way, while it is great that pyyaml can deal with numpy arrays natively, this is not the usual case. That is why serialize provides a way to create custom functions to dump/load per class. They are also sometimes more efficient.
I'll give it a shot, probably tomorrow.
That's the entire reason I'm using serialize. It's very handy! Thank you.
I got around to trying this. It works as written for the test case I gave you, but it causes errors in some other stuff in my larger application. I don't know if that's incompatibilities between the files on disk serialized with the older version (likely), or a bigger problem. I think there's a better solution that's guaranteed to be right, but is more work. I'll describe it below, and later I'll see about putting together an example. Ultimately you're trying to determine if a
If a classes
I think the proper fix is to modify the "representer" behavior of a Dumper. Take a look at
You'll want to make your own representer class, and every time that someone calls
Then
Where tag is a unique string that allows you to identify that it's "serialized-encoded for class XYZ". On the load side, you'd sort of do what you're doing in the released version today, but instead of just checking if
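A minimal sketch of the tag-based approach described above, using only PyYAML's public `add_representer` / `add_constructor` hooks. The `Point` class, the `TaggedDumper` / `TaggedLoader` names, and the tag string are all hypothetical and chosen for illustration; this is not serialize's actual code.

```python
import yaml


class Point:
    """Hypothetical user class standing in for any class registered with serialize."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class TaggedDumper(yaml.Dumper):
    """Subclass so that registrations stay off the stock yaml.Dumper."""


class TaggedLoader(yaml.Loader):
    """Subclass so that registrations stay off the stock yaml.Loader."""


# Assumed tag format; any string unique to this mechanism would do.
POINT_TAG = "!serialize/Point"


def represent_point(dumper, obj):
    # Emit the object as a mapping carrying the unique tag.
    return dumper.represent_mapping(POINT_TAG, {"x": obj.x, "y": obj.y})


def construct_point(loader, node):
    # deep=True builds nested values fully before Point() receives them.
    fields = loader.construct_mapping(node, deep=True)
    return Point(**fields)


TaggedDumper.add_representer(Point, represent_point)
TaggedLoader.add_constructor(POINT_TAG, construct_point)

text = yaml.dump({"p": Point(1, 2), "plain": [1, 2, 3]}, Dumper=TaggedDumper)
restored = yaml.load(text, Loader=TaggedLoader)
```

Because only nodes carrying the custom tag are routed to `construct_point`, everything else (including PyYAML's own Python-object tags) would still go through the stock handlers, which is the separation the comment is asking for.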
Sounds like a really good plan. We might need to add a hook to
Documented in hgrecco#7. serialize was overloading the object encoding/decoding, but didn't properly differentiate itself from the built-in object handlers in pyyaml. pyyaml has a tagging facility to deal with this sort of thing, so now serialize-handled classes are handled explicitly with their own unique tag. This is a breaking change: previously serialized data will no longer decode correctly. This adds a register_class callback for formats, and yaml implements that to register those classes with the built-in serialize.yaml.Dumper/Loader for tagging/encoding.
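The callback idea in that commit message could look roughly like the sketch below. Every name here (`Format`, `register_format`, the module-level `register_class`, the `seen` list) is a hypothetical stand-in written for illustration, not serialize's real API; the point is only that each format gets notified about each registered class so it can install its own tagging.

```python
class Format:
    """Hypothetical format object exposing a register_class callback."""

    def __init__(self, name):
        self.name = name
        self.seen = []  # classes this format has been told about

    def register_class(self, cls, to_builtin, from_builtin):
        # A YAML format would install a representer/constructor pair here.
        self.seen.append(cls)


FORMATS = {}  # format name -> Format
CLASSES = {}  # class -> (to_builtin, from_builtin)


def register_format(fmt):
    FORMATS[fmt.name] = fmt
    # Replay classes that were registered before this format existed.
    for cls, (to_builtin, from_builtin) in CLASSES.items():
        fmt.register_class(cls, to_builtin, from_builtin)


def register_class(cls, to_builtin, from_builtin):
    CLASSES[cls] = (to_builtin, from_builtin)
    # Notify every known format so each can set up its own encoding.
    for fmt in FORMATS.values():
        fmt.register_class(cls, to_builtin, from_builtin)


yaml_format = Format("yaml")
register_format(yaml_format)
register_class(complex, lambda c: [c.real, c.imag], lambda v: complex(*v))

late_format = Format("other")
register_format(late_format)
```

Replaying earlier registrations in `register_format` is one design choice among several; it keeps the result independent of the order in which classes and formats are registered.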
I mocked this up in jtbraun@abe54f4. There are a couple of outstanding things that need to be fixed in serialize.yaml.py: the tag needs to be confirmed/changed, and the deep=? flag needs to be set correctly. Both are marked with #BUGBUG. This is a breaking change: previously dumped data may not load, and newly dumped data won't load with old versions.
I like the code and I think it is the way to go. My suggestions:
and then I merge one other change that I have and we publish a new version.
Tests added, and yaml_legacy is there (I backed out your other changes to it that were never released; they broke some of the tests anyway). Pushed to that same fix_pickle branch for your review.
Looks good. Make a PR against my repo and I will merge it.
Attached is a small example.
PyYaml itself is capable of loading/dumping numpy data. When called through serialize, though, RepresenterErrors are thrown.
It's possible that this is a problem with numpy itself, but given that numpy works through the yaml.Loader, I figure it's something to do with serialize.
I used python 3.6.8 to reproduce these.
pyyaml_numpy_failure.zip