race condition on slow disk #349
Hmmm! Very interesting... From the error, I assume this happens while trying to connect to a Unix-based control socket. By "networked storage", do you mean NFS, or something else? (I have no idea how Unix sockets behave on such storage ;) )
Yes. An AWS EBS gp3 volume is mounted as the storage for a Docker container. It may sound complicated, but it works surprisingly well: across many applications, this is the first issue I've encountered. https://github.com/cariaso/txtorcon/commits/main
Obviously a delay isn't ever going to be the right thing (and, for Twisted code, …).

So, I think what's really going on here is this: when Tor is launched, it takes some amount of time until we can connect to the control socket. Currently, that moment is determined by watching Tor's logs (e.g. https://github.com/meejah/txtorcon/blob/main/txtorcon/controller.py#L1280 looks for the "Opening control ..." line).

I suspect that on your "slow" disk, Tor creates the control socket and prints that line to stdout, but the actual file hasn't been synced (or whatever) yet. So immediately after that, txtorcon tries to connect, but there's no socket.

So I can think of two "more proper" fixes:
The latter will make things more robust, but it might also fail slightly slower in some cases (oh well). I like that the latter doesn't need any special-case code (e.g. "is it a unix-socket?", parse file, etc.).
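The two fixes aren't spelled out in the comment, so the following is only a minimal sketch, assuming they are (1) the special-case approach of waiting for the socket file itself to appear and (2) the generic approach of simply retrying the connection until a deadline. It uses plain blocking Python (standard-library `socket`, `os`, `stat`, `time`) rather than Twisted to stay dependency-free; the function names `wait_for_unix_socket` and `connect_with_retry` and all timeout/backoff values are hypothetical, not txtorcon's API. Inside txtorcon the waits would have to be non-blocking (Deferred-based), because `time.sleep` stalls the Twisted reactor.

```python
import os
import socket
import stat
import time


def wait_for_unix_socket(path, timeout=10.0, interval=0.05):
    """Special-case fix (sketch): poll until `path` exists and is a socket."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if stat.S_ISSOCK(os.stat(path).st_mode):
                return
        except FileNotFoundError:
            pass  # not created yet, or not yet visible on this filesystem
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no socket at {path!r} after {timeout}s")
        time.sleep(interval)


def connect_with_retry(path, timeout=10.0, initial_delay=0.05, max_delay=1.0):
    """Generic fix (sketch): retry the connect itself with exponential backoff.

    Returns a connected socket, or re-raises the last connect error once
    the next wait would overrun the deadline.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return sock
        except (FileNotFoundError, ConnectionRefusedError):
            # Missing file -> FileNotFoundError; file present but nobody
            # listening yet -> ConnectionRefusedError. Both may be transient.
            sock.close()
            if time.monotonic() + delay > deadline:
                raise
            time.sleep(delay)
            delay = min(delay * 2, max_delay)
```

The generic retry also covers the case where the socket file exists but Tor isn't accepting connections yet (`ConnectionRefusedError`), which a file-existence check alone would miss, and it needs no knowledge of whether the control endpoint is a Unix socket or TCP port beyond the connect call itself.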
https://github.com/magic-wormhole/magic-wormhole uses txtorcon. If I run `wormhole receive 3-some-code --tor --launch-tor`, it calls into txtorcon.
However, in my current environment it quickly crashes, 100% of the time, with the message

But if I add a `time.sleep(0.5)` after the call at line 360 of `txtorcon/controller.py`, the problem goes away. (Smaller sleeps seem to work too, but I haven't measured the exact threshold.) I expect this is somehow related to the fact that I'm running off networked storage.

Can anyone offer deeper insight into this, and perhaps a suitable solution?
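To see why a fixed `time.sleep(0.5)` makes the symptom disappear without removing the race, here is a small reproduction that needs no Tor, assuming a POSIX system with `AF_UNIX` sockets. A background thread stands in for Tor by creating the control socket only after a delay; `start_late_listener` and `try_connect` are hypothetical helpers for illustration, not part of txtorcon or magic-wormhole.

```python
import os
import socket
import tempfile
import threading
import time


def start_late_listener(path, delay):
    """Simulate Tor: the control socket appears only after `delay` seconds."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)

    def create():
        time.sleep(delay)
        server.bind(path)
        server.listen(1)

    thread = threading.Thread(target=create)
    thread.start()
    return server, thread


def try_connect(path):
    """One connection attempt, like connecting right after the log line."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return True
    except (FileNotFoundError, ConnectionRefusedError):
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "control")
    server, thread = start_late_listener(path, delay=0.2)
    print("immediate connect:", try_connect(path))
    time.sleep(0.5)  # the workaround described in this issue
    print("after sleep(0.5):", try_connect(path))
    thread.join()
    server.close()
```

On a typical machine the immediate attempt should fail and the one after the sleep should succeed. The sleep only helps while the real delay is shorter than the pause, though: raising `delay` above 0.5 makes the second attempt fail as well, which is why a bounded retry is more robust than any fixed sleep.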