Connect requests for unknown channels don't trigger a client-side search #50

Open
anjohnson opened this issue Aug 29, 2023 · 10 comments

Comments

@anjohnson
Member

When a CA name-server sits in front of a CA Gateway, clients that use the name-server can't make new connections to some PVs that the gateway should be serving. It seems that the gateway only searches for and sets up a new VC for PV names that come in from a UDP search.

Our name-server knows the names published by a set of IOCs that live behind our "RBB" gateway, which has its client side on the RBB subnet and its server side on our "EP" subnet. CA clients from machines on the EP subnet can connect to the RBB IOC PVs just fine through the gateway. However we have CA clients in our "MCR" subnet which are being told (correctly) by the name-server to contact the RBB gateway for PVs from those IOCs, and some connect fine but others don't. The PVs that don't connect are the ones that no client in the EP subnet has tried to connect to yet, so there is no VC for them in the gateway. Running a caget for those PVs on an EP client sends a broadcast search which triggers the creation of the VC for that PV, and the MCR clients can then connect.

I plan to have the RBB gateway serve the MCR subnet as well to work around the problem, so this isn't urgent. When I can find someone with spare time, I'll have them take a look to see whether it can be resolved.

@ralphlange
Contributor

If I understand your setup correctly, the clients do not send a TCP name resolution request to the Gateway, but only to the nameserver. In that case, the title of the issue is not correct. It should rather be "No name searches don't trigger a client-side search" - which sounds more like an expected behavior...

Using a nameserver works with regular IOCs, as their list of served channels is static. A TCP connection request for an existing channel always succeeds, no matter if there had been a name resolution request or not.
For a Gateway, the list is not static, and a name resolution request does more than a hash lookup. A TCP connection request will only succeed if there has been a name resolution request before.
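A minimal sketch of the difference described here, as a toy model in Python. The class and method names are hypothetical and are not taken from the Gateway or PCAS sources; the IOC-side search is replaced by a simple set lookup.

```python
class ToyGateway:
    """Toy model only: not the real CA Gateway code."""

    def __init__(self, ioc_side_pvs):
        # Names that an IOC-side search would find (assumption: a fixed set).
        self.ioc_side_pvs = set(ioc_side_pvs)
        # Names for which a virtual circuit (VC) already exists.
        self.virtual_circuits = set()

    def handle_search(self, name):
        """Name resolution request: may search the IOC side and create a VC."""
        if name in self.virtual_circuits:
            return True
        if name in self.ioc_side_pvs:  # stands in for a successful IOC-side search
            self.virtual_circuits.add(name)
            return True
        return False

    def handle_create_channel(self, name):
        """TCP connection request: only consults the existing VC table."""
        return name in self.virtual_circuits


gw = ToyGateway(["rbb:pv1", "rbb:pv2"])
print(gw.handle_create_channel("rbb:pv1"))  # False: no search has created a VC yet
gw.handle_search("rbb:pv1")
print(gw.handle_create_channel("rbb:pv1"))  # True: the earlier search created the VC
```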

I'm not sure if using a nameserver for Gateway PVs is a good and valid mode of operation after all. The Gateway's list is dynamic, and you can easily get into a situation where the nameserver points to the Gateway for a PV that the Gateway lost access to, starting a ping-pong between the nameserver pointing to the Gateway and the Gateway rejecting the connection attempt.

The easiest and IMHO conceptually cleanest workaround would be removing the Gateway PVs from the nameserver and adding the Gateway to the client EPICS_CA_NAME_SERVERS list. The Gateway works fine for search requests over TCP - if these actually reach it.

What you suggest is running CA without name resolution requests - that's at least how it looks from the Gateway's perspective. Is that covered by the CA protocol?

@anjohnson
Member Author

My clients were sending a UDP (not TCP) name resolution request to the name-server and then a TCP create channel request to the Gateway, so yes you were mostly right there. I had assumed that those were the same message, but I see that they aren't (and realize that would be bad). I don't see anything in the CA protocol description which requires a client to send a CA_PROTO_SEARCH before sending a CA_PROTO_CREATE_CHAN message with the same name — that is normally what happens, but nothing in the document says it has to. I wonder whether we should add a note about that?
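To make the distinction concrete, here is a hedged sketch that builds the two messages with Python's `struct` module. The header layout and field usage follow my reading of the CA protocol specification (16-byte big-endian header, name padded to a multiple of 8 bytes); the helper names and the minor version value 13 are assumptions for illustration, so check the specification before relying on any of it.

```python
import struct

CA_PROTO_SEARCH = 6
CA_PROTO_CREATE_CHAN = 18
CA_DO_REPLY = 10      # search "reply" flag asking for an explicit failure reply
CA_DONT_REPLY = 5     # search "reply" flag usually used for UDP searches
MINOR_VERSION = 13    # assumption: client minor protocol version


def padded_name(name):
    """Null-terminate the channel name and pad it to a multiple of 8 bytes."""
    raw = name.encode("ascii") + b"\0"
    return raw + b"\0" * (-len(raw) % 8)


def ca_header(command, payload_size, data_type, data_count, param1, param2):
    """Pack the standard 16-byte CA message header (network byte order)."""
    return struct.pack(">HHHHII", command, payload_size,
                       data_type, data_count, param1, param2)


def search_request(name, cid, reply_flag=CA_DONT_REPLY):
    """CA_PROTO_SEARCH: asks 'who serves this name?'."""
    payload = padded_name(name)
    return ca_header(CA_PROTO_SEARCH, len(payload), reply_flag,
                     MINOR_VERSION, cid, cid) + payload


def create_chan_request(name, cid):
    """CA_PROTO_CREATE_CHAN: asks a specific server to open the channel."""
    payload = padded_name(name)
    return ca_header(CA_PROTO_CREATE_CHAN, len(payload), 0, 0,
                     cid, MINOR_VERSION) + payload


print(search_request("rbb:pv1", cid=1).hex())
print(create_chan_request("rbb:pv1", cid=1).hex())
```

Both messages carry the same padded name as payload; only the header differs, which is why a server can in principle receive the second without ever having seen the first.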

Your statement about IOCs:

> Using a nameserver works with regular IOCs, as their list of served channels is static. A TCP connection request for an existing channel always succeeds

isn't actually true when you consider server-side filters. An IOC's channel list is now potentially infinite, and even when the record and field names match a search operation doesn't actually check the field modifiers. That means an IOC can respond "yes I have that PV" to a search but not actually be able to create a channel using the filter requested. Thus any IOC can play the same ping-pong game anyway.

Adding the second network interface to our Gateway succeeded, so this isn't a problem now, but I would still have preferred to be able to put the name-server in front of the gateway. That would prevent CA searches for PV names that the name-server knows don't exist from getting through to the protected subnet. The Gateway's regex rules can't encode every record name that those IOCs serve, while our name-server does actually know them all. However, that wouldn't stop bad field names or channel filter specifications from getting through, so supporting my layout wouldn't block every DoS attack through the Gateway.

@ralphlange
Contributor

You are right about the chances of playing ping-pong with an IOC by asking for a PV with a wrong field name or filter.

Also, using a dedicated nameserver should be a legitimate mode of operation - in that case the regular IOC also gets the connect message on TCP before any search request.

The pvAttach() method of the PCAS API does have the option of doing asynchronous operation, so technically your request is doable.
I am afraid that this mode of operation has not been tested well in the last decade, though. If at all. In terms of added risk, this is probably relevant.

I was not aware that your know-it-all nameserver has a better understanding than the Gateway of which channels exist where. I wouldn't have expected that.
How do you sync things like temporarily unavailable record names?

If you (still...) think that the Gateway should have a mode that issues IOC side search requests based on clients trying to connect to unknown channels, please reopen.
I can create a branch you can play with and/or require yet another command line option to activate this behaviour - making it pretty safe for all others.
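A rough sketch of what such an opt-in mode could look like, again as a toy model. Nothing here reflects the actual Gateway or PCAS API: the `search_on_connect` flag, the timeout, and the method names are all hypothetical, and the asynchronous IOC-side search is simulated with `asyncio`.

```python
import asyncio


class ToyGateway:
    """Toy model of an opt-in 'search on connect' mode (hypothetical)."""

    def __init__(self, ioc_side_pvs, search_on_connect=False, search_timeout=0.1):
        self.ioc_side_pvs = set(ioc_side_pvs)
        self.virtual_circuits = set()
        self.search_on_connect = search_on_connect  # off by default, as proposed
        self.search_timeout = search_timeout

    async def _ioc_side_search(self, name):
        # Stand-in for the network round trip of a real IOC-side search.
        await asyncio.sleep(0)
        return name in self.ioc_side_pvs

    async def handle_create_channel(self, name):
        if name in self.virtual_circuits:
            return True
        if not self.search_on_connect:
            return False  # behaviour without the flag: reject unknown channels
        try:
            found = await asyncio.wait_for(
                self._ioc_side_search(name), self.search_timeout)
        except asyncio.TimeoutError:
            found = False
        if found:
            self.virtual_circuits.add(name)
        return found


default_gw = ToyGateway(["rbb:pv1"])
opt_in_gw = ToyGateway(["rbb:pv1"], search_on_connect=True)
print(asyncio.run(default_gw.handle_create_channel("rbb:pv1")))
print(asyncio.run(opt_in_gw.handle_create_channel("rbb:pv1")))
```

Keeping the flag off by default means existing installations see no change in behaviour.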

@ralphlange ralphlange changed the title Name searches from TCP don't trigger a client-side search Connect requests for unknown channels don't trigger a client-side search Sep 1, 2023
@mdavidsaver

> I don't see anything in the CA protocol description which requires a client to send a CA_PROTO_SEARCH before sending a CA_PROTO_CREATE_CHAN message with the same name ...

Regarding PVA: while this bit of semantics has never been spelled out, it is, and has been, required in practice. To keep PVA consistent with CA on this point, both iterations of the PVA gateway, and pvAccessCPP, would also have to be changed (maybe pvAccessJava too?).

pvAccessCPP has a long running... oddity that search before connect is only required when a Server has more than one ChannelProvider.

@mdavidsaver

> You are right about the chances of playing ping-pong with an IOC by asking for a PV with a wrong field name or filter.

To my mind, a better solution would be to integrate this list of explicit names into CA/PVA gateways with, or instead of, a PV List file. (I wonder how bad the performance would be to have an RE with 10,000 names |'d together?)

@ralphlange
Contributor

> > You are right about the chances of playing ping-pong with an IOC by asking for a PV with a wrong field name or filter.
>
> To my mind, a better solution would be to integrate this list of explicit names into CA/PVA gateways with, or instead of, a PV List file. (I wonder how bad the performance would be to have an RE with 10,000 names |'d together?)

Not sure. We have existing Gateway instances with >100K records behind them. Times fields, times filters, that would create many millions of legit names.

Adding the EPICS database structure in the Gateway would be conceptually wrong and couldn't cover IOCs having different definitions.

@mdavidsaver

> ... that would create many millions of legit names.

Right, I was thinking (but not writing) to match prefix on record names. That is why I was thinking about the (lack of) performance for a ridiculously large regexp. (eg. (?:name:one|name:two)(?:\..*)? with 9999 more |)
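For anyone who wants to measure this, a small Python sketch that builds such a prefix-matching regexp and, for comparison, a plain set lookup on the record-name part. The function names and the generated `name:N` record names are made up for illustration, and the set variant assumes record names never contain a `.`. Python's `re` is only a stand-in here and need not behave like the regexp library the Gateway uses.

```python
import re
import timeit


def build_prefix_regex(record_names):
    """One big alternation of escaped record names, then an optional '.suffix'."""
    alternation = "|".join(re.escape(n) for n in sorted(record_names))
    return re.compile(rf"(?:{alternation})(?:\..*)?")


def allowed_by_regex(pattern, channel_name):
    """True if the whole channel name is a known record plus optional suffix."""
    return pattern.fullmatch(channel_name) is not None


def allowed_by_set(record_names, channel_name):
    """Same check via a hash lookup on everything before the first '.'."""
    record, _, _ = channel_name.partition(".")
    return record in record_names


if __name__ == "__main__":
    names = {f"name:{i}" for i in range(10_000)}
    pattern = build_prefix_regex(names)
    probe = "name:9999.VAL"
    print("regex:", timeit.timeit(lambda: allowed_by_regex(pattern, probe), number=1000))
    print("set:  ", timeit.timeit(lambda: allowed_by_set(names, probe), number=1000))
```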

Alternately, maybe @anjohnson 's nameserver can learn a "validation" or "filter" mode where it will pass through searches for valid names to an upstream server/gateway.

@ralphlange
Contributor

> Alternately, maybe @anjohnson 's nameserver can learn a "validation" or "filter" mode where it will pass through searches for valid names to an upstream server/gateway.

A variation of that would be having the Gateway resolve names through the nameserver. (Obviously on a different port that returns the IOC location, not the Gateway location.)
That would shut down name resolution through the Gateway completely. Interesting.

@mdavidsaver

Ah, yes. Move the nameserver behind the gateway... This seems like it should work so long as there is only one level of gateway.

@anjohnson
Member Author

It's going to be more efficient to put an omniscient name-server in front of the Gateway than behind it.

> Move the nameserver behind the gateway...

We used to have name-servers behind our main office Gateways here. I'll have to ask why we removed them, since the configuration did work. It may have been that we needed to support too many IOCs that didn't follow our production standards and weren't writing out their lists of record names.

Our new name-server is much more likely to continue to know all of the record names in the future because our IOCs now live in different subnets to the control-room workstations, so the operator screens and physics apps rely on the name-server for routing all channel names to their IOCs. Our MCR name-server currently reads over 700,000 record names from the dbl output files of 229 IOCs, and we still have many more IOCs to add to it (we're expecting well over 800 IOCs). The hash table is currently sized at a million entries but I expect to increase that soon.

> nameserver can learn a "validation" or "filter" mode

The name-server already has that ability. If an initial name lookup in the hash table fails it can pattern-match the name, search for those that match inside its client subnets and associate any that resolve to the TCP:Port that serves them. Names that don't get found in the client subnets are added to a negative name cache so it doesn't try to search for them too frequently, but I don't know how long it keeps them there.
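The lookup order described above can be sketched roughly as follows. This is a toy model, not the name-server's code: the class name, the `search_subnets` callback, and the 60 second negative-cache lifetime are assumptions (the real lifetime is not known here).

```python
import re
import time


class ToyNameServer:
    """Hash lookup first, then pattern-gated search with a negative cache."""

    def __init__(self, known, patterns, search_subnets,
                 negative_ttl=60.0, clock=time.monotonic):
        self.known = dict(known)                    # name -> "host:port"
        self.patterns = [re.compile(p) for p in patterns]
        self.search_subnets = search_subnets        # callable: name -> address or None
        self.negative_ttl = negative_ttl            # assumed value, in seconds
        self.clock = clock
        self.negative = {}                          # name -> expiry time

    def lookup(self, name):
        # 1. Initial lookup in the hash table.
        if name in self.known:
            return self.known[name]
        # 2. Only names matching a configured pattern are worth searching for.
        if not any(p.fullmatch(name) for p in self.patterns):
            return None
        # 3. Skip names that recently failed to resolve.
        expiry = self.negative.get(name)
        if expiry is not None and self.clock() < expiry:
            return None
        # 4. Search the client subnets; remember the outcome either way.
        address = self.search_subnets(name)
        if address is None:
            self.negative[name] = self.clock() + self.negative_ttl
            return None
        self.negative.pop(name, None)
        self.known[name] = address
        return address
```

A caller would pass in whatever function actually performs the CA search, for example `ToyNameServer({}, [r"rbb:.*"], my_search)` with a hypothetical `my_search`.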

> I was not aware that your know-it-all nameserver has a better understanding than the Gateway of which channels exist where. I wouldn't have expected that.
> How do you sync things like temporarily unavailable record names?

The name-server monitors the dbl output files from all of the IOCs, so when an IOC reboots and rewrites that file it can immediately notice the update and reload all the names from it. It also maintains a CA connection to each IOC (preferably to the $(IOC):heartbeat PV which must appear in the file) so when the IOC goes down it will stop serving those names until that channel reconnects. We write the dbl file before iocInit for obvious reasons.

The name-server also detects any record names that are duplicates, which is useful.
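As an illustration of the bookkeeping involved, a short sketch that turns per-IOC dbl listings into a served-name table and a duplicate report. It assumes one record name per line in the dbl output and takes the IOC up/down state as a plain dictionary (standing in for the heartbeat channel); withholding duplicate names from the table is my assumption, not a statement about how the name-server treats them.

```python
from collections import defaultdict


def parse_dbl(text):
    """Return the record names from dbl output (assumed: one name per line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_name_table(dbl_by_ioc, ioc_is_up):
    """Map record name -> IOC for reachable IOCs, and report duplicate names.

    dbl_by_ioc: IOC name -> contents of that IOC's dbl output file
    ioc_is_up:  IOC name -> True while its heartbeat channel is connected
    """
    owners = defaultdict(list)
    for ioc, text in dbl_by_ioc.items():
        for name in sorted(set(parse_dbl(text))):
            owners[name].append(ioc)

    duplicates = {n: iocs for n, iocs in owners.items() if len(iocs) > 1}
    table = {
        n: iocs[0]
        for n, iocs in owners.items()
        if len(iocs) == 1 and ioc_is_up.get(iocs[0], False)
    }
    return table, duplicates


table, duplicates = build_name_table(
    {"ioc1": "ioc1:heartbeat\nx:a\n", "ioc2": "ioc2:heartbeat\nx:a\nx:b\n"},
    {"ioc1": True, "ioc2": False},
)
print(table)       # only names owned by a single IOC that is currently up
print(duplicates)  # names claimed by more than one IOC
```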

@ralphlange if you think you can implement this search I would be interested, but it's not urgent and I can understand your wariness. Using a flag to enable it would be a good idea as I might only want to enable it on one or two specific gateways.

@anjohnson anjohnson reopened this Sep 1, 2023

3 participants