Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proxy WS requests to chromium instead of returning its URL directly #8

Merged
merged 7 commits into from
Aug 30, 2024

Conversation

nadiamoe
Copy link
Member

@nadiamoe nadiamoe commented Jul 23, 2024

On recent chromium versions, allegedly at some point after 126.0.6478.126-r0, chromium has stopped honoring --remote-debugging-address, specifically --remote-debugging-address=0.0.0.0. Before this PR, Crocochrome assumed chromium would honor this flag and forwarded chromium's debugging URL to clients untouched, confident that clients would be able to reach that URL.

As this is no longer true, we now need to do some plumbing to somehow forward requests to Chromium.

This PR implements a websocket proxy that, given a session ID present in the URL, looks up the chromium URL for that session and proxies the connection to that URL. To make this possible, a couple satellite changes were required:

  • Adding a Session(id string) endpoint to Supervisor to fetch existing sessions by ID after they are created
  • Thus, renaming the previous Session() method that created the session to Create()
  • Storing session metadata in memory so Session(id string) can retrieve it, in addition to the cancel func that was already stored
  • Replacing the URL sent in the response to that of the proxy endpoint
  • The proxy endpoint itself
  • Additionally, upgrade alpine back to 3.20.0 (reverting Dockerfile: downgrade alpine to 3.19.2 #7, which was the work around for the original problem)

I have tested this manually for now, until we add the integration test to CI/CD. This will happen in a subsequent PR as there is still some plumbing left to do to get there.


Related slack thread: https://raintank-corp.slack.com/archives/C073DULU8TE/p1721663571781409

@nadiamoe nadiamoe marked this pull request as ready for review July 23, 2024 09:36
@nadiamoe nadiamoe requested a review from a team as a code owner July 23, 2024 09:36
@nadiamoe
Copy link
Member Author

Runner integration tests also now pass when using a build from this branch:

13:06:00 ~/Devel/sm-k6-runner $> TEST_CROCOCHROME_IMAGE=test.local/crocochrome go test ./internal/integration -test.run=TestIntegration/browser_test
ok  	github.com/grafana/sm-k6-runner/internal/integration	8.326s

Copy link
Contributor

@mem mem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor things.

http/http.go Outdated Show resolved Hide resolved
crocochrome.go Outdated Show resolved Hide resolved
crocochrome.go Outdated Show resolved Hide resolved
http/http.go Show resolved Hide resolved
// Currently, creating a new session will terminate other existing ones, if present. Clients should not rely on this
// behavior and should delete their sessions when they finish. If a session has to be terminated when a new one is
// created, an error is logged.
func (s *Supervisor) Session() (SessionInfo, error) {
func (s *Supervisor) Create() (SessionInfo, error) {
s.sessionsMtx.Lock()
defer s.sessionsMtx.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'm trying to follow what you said elsewhere.

This whole function runs with the mutex held.

That means that adding the session to the map on line 137 only has an effect if this function returns early.

The goroutine on lines 147-181 does not interact with the session.

The entry is removed on line 189.

On line 193 onwards is the only place where sess is mutated.

So the only place where it matters that this is already in place is on line 120.

It's OK to call Delete with an id that is not present on the map, because Delete also takes the mutex.

It looks like it should be possible to defer adding the session to the map down to line ~ 193.

So, having a pointer in the map allows me to mutate the object, but I'm missing a place where mutating the object in the map is necessary.

Basically what I'm trying to wrap my head around is whether returning from line 166 with an error causes the context to be cancelled. Unless I'm missing something, that doesn't happen. The context will be cancelled after the timeout, which will cause the session to be removed from the map, but that's it (line 213). What I think is that in that case the session shouldn't have been added to the map in the first place.

I'm just trying to clarify whether I'm missing something here.

Copy link
Member Author

@nadiamoe nadiamoe Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole function runs with the mutex held.
That means that adding the session to the map on line 137 only has an effect if this function returns early.

Correct

The goroutine on lines 147-181 does not interact with the session.
Basically what I'm trying to wrap my head around is whether returning from line 166 with an error causes the context to be cancelled. Unless I'm missing something, that doesn't happen. The context will be cancelled after the timeout, which will cause the session to be removed from the map, but that's it (line 213). What I think is that in that case the session shouldn't have been added to the map in the first place.

Also correct. This is not needed because if creating the process fails, then connecting to chromium will necessarily fail and trigger deletion anyway:

crocochrome/crocochrome.go

Lines 186 to 191 in e8f7d6a

version, err := s.cclient.Version(versionCtx, net.JoinHostPort("localhost", s.opts.ChromiumPort))
if err != nil {
logger.Error("could not get chromium info", "err", err)
s.delete(id) // We were not able to connect to chrome, the session is borked.
return SessionInfo{}, err
}

The entry is removed on line 189.
On line 193 onwards is the only place where sess is mutated.
It's OK to call Delete with an id that is not present on the map, because Delete also takes the mutex.

Correct

It looks like it should be possible to defer adding the session to the map down to line ~ 193.

Right now, the cancel func needs to be on the map so line 189 can cancel it if connecting to chromium fails:

s.delete(id) // We were not able to connect to chrome, the session is borked.

I think you are implying that we could simplify this by not registering the object on the map before registering the AfterFunc, and I think you are right. Let me prototype that and see if the tests pass.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried this out in the last commit, tests are happy and I think it makes more sense. LMK what you think!

s.sessions[id] = cancel
sess := &session{cancel: cancel}

s.sessions[id] = sess

context.AfterFunc(ctx, func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cancel on line 216 causes this function to be called, which causes (L#144) Delete to be called, which calls delete again. What is preventing the loop? (and the deadlock) Shouldn't the order of lines 216 and 217 be swapped?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supervisor.delete is always called with the mutex held, so cancelling the context is safe and won't race with Supervisor.Delete on the AfterFunc on L139. When the mutex is released. Supervisor.Delete will be a noop as L217 removed the entry from the map.

This is not particularly great but we really really want timeouts to cancel and remove the session from the map, and I wasn't able to figure out a less confusing way of doing it unfortunately :(

@nadiamoe nadiamoe merged commit f6e202c into main Aug 30, 2024
4 checks passed
@nadiamoe nadiamoe deleted the wsproxy branch August 30, 2024 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants