Connect requests for unknown channels don't trigger a client-side search #50

anjohnson · 2023-08-29T17:47:28Z

When a CA name-server sits in front of a CA Gateway, clients that use the name-server don't make new connections to PVs the gateway should be serving. It seems that the gateway only searches for and sets up a new VC for PV names that come in from a UDP search.

Our name-server knows the names published by a set of IOCs that live behind our "RBB" gateway, which has its client side on the RBB subnet and its server side on our "EP" subnet. CA clients from machines on the EP subnet can connect to the RBB IOC PVs just fine through the gateway. However we have CA clients in our "MCR" subnet which are being told (correctly) by the name-server to contact the RBB gateway for PVs from those IOCs, and some connect fine but others don't. The PVs that don't connect are the ones that no client in the EP subnet has tried to connect to yet, so there is no VC for them in the gateway. Running a caget for those PVs on an EP client sends a broadcast search which triggers the creation of the VC for that PV, and the MCR clients can then connect.

I plan to have the RBB gateway serve the MCR subnet as well to work around the problem, so this isn't urgent. When I can find someone with spare time, I'll have them take a look to see whether it can be resolved.


ralphlange · 2023-08-29T20:33:09Z

If I understand your setup correctly, the clients do not send a TCP name resolution request to the Gateway, but only to the nameserver. In that case, the title of the issue is not correct. It should rather be "No name searches don't trigger a client-side search" - which sounds more like an expected behavior...

Using a nameserver works with regular IOCs, as their list of served channels is static. A TCP connection request for an existing channel always succeeds, no matter if there had been a name resolution request or not.
For a Gateway, the list is not static, and a name resolution request does more than a hash lookup. A TCP connection request will only succeed if there has been a name resolution request before.
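As a rough illustration of that difference, here is a toy model in Python. It is not the Gateway's or an IOC's actual code: the class names, method names, and PV names are all made up, and the "upstream" object merely stands in for the IOC-side network.

```python
class StaticServer:
    """Toy IOC: the set of served channels is fixed at startup."""

    def __init__(self, names):
        self._names = frozenset(names)

    def search(self, name):
        # A plain lookup in a static table.
        return name in self._names

    def create_channel(self, name):
        # Succeeds for any existing channel, searched for or not.
        return name in self._names


class ToyGateway:
    """Toy Gateway: a VC for a name exists only after a successful search."""

    def __init__(self, upstream):
        self._upstream = upstream  # hypothetical stand-in for the IOC side
        self._vcs = set()

    def search(self, name):
        # A search does more than a lookup: it may set up a new VC.
        if name not in self._vcs and self._upstream.search(name):
            self._vcs.add(name)
        return name in self._vcs

    def create_channel(self, name):
        # Without an existing VC the connect request is rejected.
        return name in self._vcs


ioc = StaticServer({"RBB:temp", "RBB:pressure"})
gateway = ToyGateway(ioc)

direct_connect = ioc.create_channel("RBB:temp")    # no search needed
before_search = gateway.create_channel("RBB:temp")  # no VC yet
gateway.search("RBB:temp")                          # creates the VC
after_search = gateway.create_channel("RBB:temp")

print(direct_connect, before_search, after_search)
```

In this sketch a client that was pointed at the gateway by a name-server, and therefore never sent the gateway a search, ends up in the `before_search` case.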

I'm not sure if using a nameserver for Gateway PVs is a good and valid mode of operation after all. The Gateway's list is dynamic, and you can easily get into a situation where the nameserver points to the Gateway for a PV that the Gateway lost access to, starting a ping-pong between the nameserver pointing to the Gateway and the Gateway rejecting the connection attempt.

The easiest and IMHO conceptually cleanest workaround would be removing the Gateway PVs from the nameserver and adding the Gateway to the client EPICS_CA_NAME_SERVERS list. The Gateway works fine for search requests over TCP - if these actually reach it.

What you suggest is running CA without name resolution requests - that's at least how it looks from the Gateway's perspective. Is that covered by the CA protocol?

anjohnson · 2023-08-31T22:19:06Z

My clients were sending a UDP (not TCP) name resolution request to the name-server and then a TCP create channel request to the Gateway, so yes you were mostly right there. I had assumed that those were the same message, but I see that they aren't (and realize that would be bad). I don't see anything in the CA protocol description which requires a client to send a CA_PROTO_SEARCH before sending a CA_PROTO_CREATE_CHAN message with the same name — that is normally what happens, but nothing in the document says it has to. I wonder whether we should add a note about that?
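To make the distinction between the two messages concrete, here is a minimal sketch that builds both requests with Python's `struct` module. The 16-byte header layout, the command numbers (6 and 18), and the use of the header fields follow my reading of the CA protocol description and should be checked against it before being relied on; the helper names, the channel name, and the minor version value are assumptions for illustration only.

```python
import struct

CA_PROTO_SEARCH = 6        # name resolution request
CA_PROTO_CREATE_CHAN = 18  # channel creation request
DONT_REPLY = 5             # search flag: server stays silent if name unknown
MINOR_VERSION = 13         # assumed client minor protocol version


def _payload(name):
    """Null-terminate the channel name and pad it to a multiple of 8 bytes."""
    raw = name.encode("ascii") + b"\0"
    return raw + b"\0" * (-len(raw) % 8)


def _header(command, payload_size, data_type, data_count, param1, param2):
    """Pack the standard big-endian 16-byte CA message header."""
    return struct.pack(">HHHHII", command, payload_size,
                       data_type, data_count, param1, param2)


def search_request(name, cid):
    body = _payload(name)
    # Assumed field use: data_type = reply flag, data_count = minor version,
    # both parameters = client-chosen channel ID.
    return _header(CA_PROTO_SEARCH, len(body), DONT_REPLY,
                   MINOR_VERSION, cid, cid) + body


def create_chan_request(name, cid):
    body = _payload(name)
    # Assumed field use: param1 = channel ID, param2 = minor version.
    return _header(CA_PROTO_CREATE_CHAN, len(body), 0, 0,
                   cid, MINOR_VERSION) + body


search = search_request("RBB:temp", cid=1)
create = create_chan_request("RBB:temp", cid=1)
print(search.hex())
print(create.hex())
```

Both messages carry the same padded channel name as payload, but they are different commands, so a server can receive the second without ever having seen the first.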

Your statement about IOCs:

> Using a nameserver works with regular IOCs, as their list of served channels is static. A TCP connection request for an existing channel always succeeds

isn't actually true when you consider server-side filters. An IOC's channel list is now potentially infinite, and even when the record and field names match, a search operation doesn't actually check the field modifiers. That means an IOC can respond "yes I have that PV" to a search but not actually be able to create a channel using the filter requested. Thus any IOC can play the same ping-pong game anyway.

Adding the second network interface to our Gateway succeeded, so this isn't a problem now, but I would still have preferred to be able to put the name-server in front of the gateway. That would prevent CA searches for PV names that the name-server knows don't exist from getting through to the protected subnet. The Gateway's regex rules can't encode every record name that those IOCs serve, while our name-server does actually know them all. However, that wouldn't stop bad field names or channel filter specifications from getting through, so supporting my layout wouldn't block every DoS attack through the Gateway.

ralphlange · 2023-09-01T09:38:55Z

You are right about the chances of playing ping-pong with an IOC by asking for a PV with a wrong field name or filter.

Also, using a dedicated nameserver should be a legitimate mode of operation - in that case the regular IOC also gets the connect message on TCP before any search request.

The pvAttach() method of the PCAS API does have the option of doing asynchronous operation, so technically your request is doable.
I am afraid that this mode of operation has not been tested well in the last decade, though. If at all. In terms of added risk, this is probably relevant.

I was not aware that your know-it-all nameserver has a better understanding than the Gateway of which channels exist where. I wouldn't have expected that.
How do you sync things like temporarily unavailable record names?

If you (still...) think that the Gateway should have a mode that issues IOC side search requests based on clients trying to connect to unknown channels, please reopen.
I can create a branch you can play with and/or require yet another command line option to activate this behaviour - making it pretty safe for all others.

mdavidsaver · 2023-09-01T09:57:26Z

> I don't see anything in the CA protocol description which requires a client to send a CA_PROTO_SEARCH before sending a CA_PROTO_CREATE_CHAN message with the same name ...

wrt. PVA. While this bit of semantics has never been spelled out, it is, and has been, required in practice. To maintain this consistency with CA, both iterations of PVA gateway, and pvAccessCPP, would also have to be changed. (maybe pvAccessJava too?)

pvAccessCPP has a long running... oddity that search before connect is only required when a Server has more than one ChannelProvider.

mdavidsaver · 2023-09-01T10:00:35Z

> You are right about the chances of playing ping-pong with an IOC by asking for a PV with a wrong field name or filter.

To my mind, a better solution would be to integrate this list of explicit names into CA/PVA gateways with, or instead of, a PV List file. (I wonder how bad the performance would be to have an RE with 10,000 names |'d together?)

ralphlange · 2023-09-01T10:08:45Z

> You are right about the chances of playing ping-pong with an IOC by asking for a PV with a wrong field name or filter.

> To my mind, a better solution would be to integrate this list of explicit names into CA/PVA gateways with, or instead of, a PV List file. (I wonder how bad the performance would be to have an RE with 10,000 names |'d together?)

Not sure. We have existing Gateway instances with >100K records behind them. Times fields, times filters, that would create many millions of legit names.

Adding the EPICS database structure to the Gateway would be conceptually wrong and couldn't cover IOCs having different definitions.

mdavidsaver · 2023-09-01T11:49:27Z

> ... that would create many millions of legit names.

Right, I was thinking (but not writing) to match a prefix on record names. That is why I was thinking about the (lack of) performance for a ridiculously large regexp. (e.g. (?:name:one|name:two)(?:\..*)? with 9999 more |)
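A quick way to get a feel for this is to build such an expression and time a few matches. The sketch below uses Python's `re` module purely as a stand-in; the Gateway uses its own regex library, so timings will not transfer directly. The record names are synthetic, and `build_prefix_matcher` is a made-up helper.

```python
import re
import time


def build_prefix_matcher(record_names):
    """Compile (?:name1|name2|...)(?:\\..*)? from a list of record names."""
    alternation = "|".join(re.escape(name) for name in record_names)
    return re.compile(r"(?:" + alternation + r")(?:\..*)?")


# 10,000 synthetic record names: name:0 ... name:9999
names = [f"name:{i}" for i in range(10_000)]
matcher = build_prefix_matcher(names)

candidates = ["name:42", "name:9999.VAL", "name:10000", "other:1.VAL"]

start = time.perf_counter()
hits = [bool(matcher.fullmatch(pv)) for pv in candidates]
elapsed_ms = (time.perf_counter() - start) * 1e3

print(hits)
print(f"{len(candidates)} lookups took {elapsed_ms:.3f} ms")
```

The optional `(?:\..*)?` suffix accepts any field name or filter after a known record name, so the expression only has to list records, not every legitimate channel name.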

Alternately, maybe @anjohnson 's nameserver can learn a "validation" or "filter" mode where it will pass through searches for valid names to an upstream server/gateway.

ralphlange · 2023-09-01T11:54:20Z

> Alternately, maybe @anjohnson 's nameserver can learn a "validation" or "filter" mode where it will pass through searches for valid names to an upstream server/gateway.

A variation of that would be having the Gateway resolve names through the nameserver. (Obviously on a different port that returns the IOC location, not the Gateway location.)
That would shut down name resolution through the Gateway completely. Interesting.

mdavidsaver · 2023-09-01T12:21:11Z

Ah, yes. Move the nameserver behind the gateway... This seems like it should work so long as there is only one level of gateway.

anjohnson · 2023-09-01T21:34:08Z

It's going to be more efficient to put an omniscient name-server in front of the Gateway than behind it.

> Move the nameserver behind the gateway...

We used to have name-servers behind our main office Gateways here. I'll have to ask why we removed them, since the configuration did work. It may have been that we needed to support too many IOCs that didn't follow our production standards and weren't writing out their lists of record names.

Our new name-server is much more likely to continue to know all of the record names in the future because our IOCs now live in different subnets to the control-room workstations, so the operator screens and physics apps rely on the name-server for routing all channel names to their IOCs. Our MCR name-server currently reads over 700,000 record names from the dbl output files of 229 IOCs, and we still have many more IOCs to add to it (we're expecting well over 800 IOCs). The hash table is currently sized at a million entries but I expect to increase that soon.

> nameserver can learn a "validation" or "filter" mode

The name-server already has that ability. If an initial name lookup in the hash table fails it can pattern-match the name, search for those that match inside its client subnets and associate any that resolve to the TCP:Port that serves them. Names that don't get found in the client subnets are added to a negative name cache so it doesn't try to search for them too frequently, but I don't know how long it keeps them there.
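The lookup order described above can be sketched as follows. This is a toy model, not the name-server's real code: the class, the `searcher` callback (standing in for a CA search in the client subnets), the pattern, the addresses, and the 60 second negative-cache lifetime are all assumptions made for illustration.

```python
import re
import time


class ToyNameServer:
    def __init__(self, known, pattern, searcher,
                 negative_ttl=60.0, clock=time.monotonic):
        self._table = dict(known)        # name -> (host, port)
        self._pattern = re.compile(pattern)
        self._searcher = searcher        # callable: name -> (host, port) or None
        self._negative = {}              # name -> time the entry expires
        self._ttl = negative_ttl         # assumed lifetime, in seconds
        self._clock = clock

    def resolve(self, name):
        # 1. Fast path: the hash table of known names.
        hit = self._table.get(name)
        if hit is not None:
            return hit
        # 2. Only names matching the configured pattern are searched for.
        if not self._pattern.fullmatch(name):
            return None
        # 3. Recently failed names are not searched for again yet.
        now = self._clock()
        expiry = self._negative.get(name)
        if expiry is not None and expiry > now:
            return None
        # 4. Search the client subnets and remember the outcome.
        found = self._searcher(name)
        if found is None:
            self._negative[name] = now + self._ttl
        else:
            self._table[name] = found
        return found


search_calls = []


def fake_search(name):
    """Stand-in for a real search: only RBB:new can be found."""
    search_calls.append(name)
    return ("10.0.0.5", 5064) if name == "RBB:new" else None


ns = ToyNameServer({"RBB:temp": ("10.0.0.4", 5064)}, r"RBB:.*", fake_search)
results = [ns.resolve(pv) for pv in
           ("RBB:temp", "RBB:new", "RBB:missing", "RBB:missing", "MCR:other")]
print(results)
print(search_calls)
```

The second request for `RBB:missing` is answered from the negative cache, and `MCR:other` never reaches the search step because it does not match the pattern.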

> I was not aware that your know-it-all nameserver has a better understanding than the Gateway of which channels exist where. I wouldn't have expected that.
> How do you sync things like temporarily unavailable record names?

The name-server monitors the dbl output files from all of the IOCs, so when an IOC reboots and rewrites that file it can immediately notice the update and reload all the names from it. It also maintains a CA connection to each IOC (preferably to the $(IOC):heartbeat PV which must appear in the file) so when the IOC goes down it will stop serving those names until that channel reconnects. We write the dbl file before iocInit for obvious reasons.
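A minimal sketch of that scheme, assuming the dbl file holds one record name per line and that change detection is done by polling the file's modification time and size. The real name-server may detect changes differently; the class name, file name, and PV names here are hypothetical, and the heartbeat state is passed in as a plain flag rather than coming from a real CA connection.

```python
import os
import tempfile


class DblFileWatcher:
    """Reloads record names from a dbl output file when the file changes."""

    def __init__(self, path):
        self._path = path
        self._stamp = None
        self.names = set()

    def poll(self):
        """Reload the names if the file changed; return True on reload."""
        st = os.stat(self._path)
        stamp = (st.st_mtime_ns, st.st_size)
        if stamp == self._stamp:
            return False
        with open(self._path, encoding="ascii") as f:
            self.names = {line.strip() for line in f if line.strip()}
        self._stamp = stamp
        return True

    def served_names(self, heartbeat_connected):
        """Serve the names only while the IOC's heartbeat channel is up."""
        return set(self.names) if heartbeat_connected else set()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ioc1.dbl")
    with open(path, "w", encoding="ascii") as f:
        f.write("IOC1:heartbeat\nIOC1:temp\n")

    watcher = DblFileWatcher(path)
    first = watcher.poll()    # initial load
    second = watcher.poll()   # nothing changed

    # Simulate an IOC reboot that rewrites the file with one more record.
    with open(path, "w", encoding="ascii") as f:
        f.write("IOC1:heartbeat\nIOC1:temp\nIOC1:pressure\n")
    third = watcher.poll()

    served_up = watcher.served_names(heartbeat_connected=True)
    served_down = watcher.served_names(heartbeat_connected=False)

print(first, second, third, sorted(served_up), sorted(served_down))
```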

The name-server also detects any record names that are duplicates, which is useful.

@ralphlange if you think you can implement this search I would be interested, but it's not urgent and I can understand your wariness. Using a flag to enable it would be a good idea as I might only want to enable it on one or two specific gateways.

anjohnson closed this as completed Aug 31, 2023

ralphlange changed the title ~~Name searches from TCP don't trigger a client-side search~~ Connect requests for unknown channels don't trigger a client-side search Sep 1, 2023

anjohnson reopened this Sep 1, 2023
