feat: allow a port value of zero for service ports (auto-assign port)#3828
Conversation
* tcp/rest/websocket/metrics/discv5 default port changed to 0 * all CLI port types are now uint16 (instead of Port) * any port set to 0 on conf results in a random port bound * write back bound values to both WakuConf and WakuNode.ports * REST GET /info reports actually bound ports for enabled services * conf builders now err if any port config is unset * setupDiscoveryV5 returns Result and errors out on port 0 * rename setupAndStartDiscv5WithAutoPort to setupAndStartDiscv5 * updateWaku ENR rebuild now runs after discv5 startup * remove Port(0) conf from tests (0 is default) * add port = 0 to conf builder tests (conf builder has no default)
|
You can find the image built from this PR at Built from cd61419 |
There was a problem hiding this comment.
I would leave the good defaults here. Due these are a kind of commonly known ports of a delivery node.
But yes we should allow the user to set them to 0 or even add a new switch option like useRandomPorts.
WDYT?
cc: @igor-sirotin , @Ivansete-status
There was a problem hiding this comment.
I have no particular preference on any of the possible semantics here. Whatever is decided here I'll just push to the PR.
There was a problem hiding this comment.
I mean we have good default as for popular games back in time, everyone know which port to open on firewall :-). So it is predictable in the 98% of the cases... we try tackle those 2% now.
Also I would include into the PR that logos.dev/twn/ or any preset should default to port = 0s with use override possible. Hm?
There was a problem hiding this comment.
Even though that represents a breaking change I think this is the simplest option. Then, if a node operator is aiming to configure a firewall, etc, then s/he would explicitly set the ports of interest (as we do in our fleets.)
I suggest to make the description clear that 0 means auto-port lookup, or sth like that.
There was a problem hiding this comment.
I have pushed a commit now that changes the port=0 feature to only apply when an explicit --preset is given via the CLI. If we decide that that's not what we want, I'll revert it.
There was a problem hiding this comment.
Even though that represents a breaking change I think this is the simplest option. Then, if a node operator is aiming to configure a firewall, etc, then s/he would explicitly set the ports of interest (as we do in our fleets.)
I suggest to make the description clear that
0 means auto-port lookup, or sth like that.
Whoops, I just pushed a "applies only to --preset" commit. Should I revert it?
There was a problem hiding this comment.
haha no worries @fcecin
I suggest to revert it, yes.
That kind of internal conditions look nice now but we will forget in the future so the more explicit we are with the defaults, the better. I mean, I think config params should be independent, and the preset one is a weak helper that should be overridden by any other config param.
EDITED: preset is convenient because it sets good defaults but any of those "good defaults" should get overridden by any explicit config set by the user.
Ivansete-status
left a comment
There was a problem hiding this comment.
Some comments so far, thanks for the PR!
The only big concern is the change in the loop, in waku/node/waku_node.nim.
Cheers
| try: | ||
| let failRes = | ||
| await startMetricsServerAndLogging(buildMetricsConf(takenPort), 0'u16) | ||
| check failRes.isErr |
There was a problem hiding this comment.
Super nitpick sorry. Applies elsewhere.
| check failRes.isErr | |
| check failRes.isErr() |
| finally: | ||
| taken.stop() | ||
| await taken.closeWait() |
There was a problem hiding this comment.
Maybe we can have these snippets in defer: section. Another option could be having a tearDown (and setup) section for all tests.
| mb.withEnabled(true) | ||
| mb.withHttpPort(0'u16) | ||
| let metricsRes = await startMetricsServerAndLogging(mb.build().value.get(), 0'u16) | ||
| check metricsRes.isErr |
There was a problem hiding this comment.
What is being validated in this test?
| waku.conf.portsShift, | ||
| ) | ||
| ).valueOr: | ||
| raiseAssert error |
There was a problem hiding this comment.
I think is interesting to give some more context.
| raiseAssert error | |
| assert false, "failed setup discv5 in test: " & $error |
|
|
||
| # key and crypto modules different | ||
| type | ||
| BoundPorts* = object ## Set by the factory once each service has bound to a port. |
There was a problem hiding this comment.
Let's better have this in a dedicated module so that we keep waku_node "simple" :)
Also, important to have init proc, considering that it is an object type.
| BoundPorts* = object ## Set by the factory once each service has bound to a port. | |
| BoundPorts* {.requiresInit.} = object ## Set by the factory once each service has bound to a port. |
| enrAutoUpdate: b.enrAutoUpdate.get(true), | ||
| tableIpLimit: b.tableIpLimit.get(10), | ||
| udpPort: b.udpPort.get(9000.Port), | ||
| udpPort: b.udpPort.get(), |
There was a problem hiding this comment.
Maybe to make it a bit more explicit?
| udpPort: b.udpPort.get(), | |
| udpPort: b.udpPort.get(0), |
| MetricsServerConf( | ||
| httpAddress: b.httpAddress.get(static parseIpAddress("127.0.0.1")), | ||
| httpPort: b.httpPort.get(8008.Port), | ||
| httpPort: b.httpPort.get(), |
There was a problem hiding this comment.
| httpPort: b.httpPort.get(), | |
| httpPort: b.httpPort.get(0), |
| 60000.Port | ||
| if builder.p2pTcpPort.isNone(): | ||
| return err("p2pTcpPort is not specified") | ||
| let p2pTcpPort = builder.p2pTcpPort.get() |
There was a problem hiding this comment.
Perhaps?
| let p2pTcpPort = builder.p2pTcpPort.get() | |
| let p2pTcpPort = builder.p2pTcpPort.get(0) |
| let autoMode = serverPort == Port(0) | ||
| let attempts = if autoMode: AutoPortRetryCount else: 1 |
There was a problem hiding this comment.
It seems the algorithm in this proc for trial, while being great, it seems is a bit duplicated. Maybe we can create a template in auto_port.nim.
|
|
||
| #### Serialization and deserialization | ||
|
|
||
| proc writeValue*( |
There was a problem hiding this comment.
That's great. It would be interesting if we could have the chance to expose this information into the Messaging Debug API, if not done already ;P
| @[ | ||
| "--preset=logos.dev", | ||
| "--tcp-port=12345", | ||
| "--rest=true", | ||
| "--rest-port=23456", | ||
| "--metrics-server=true", | ||
| "--metrics-server-port=34567", | ||
| "--discv5-udp-port=45678", | ||
| "--websocket-support=true", | ||
| "--websocket-port=56789", | ||
| ], | ||
| ) | ||
| let wakuConf = wakuNodeConf.toWakuConf().valueOr: | ||
| raiseAssert error | ||
|
|
||
| check wakuConf.endpointConf.p2pTcpPort == Port(0) | ||
| check wakuConf.discv5Conf.isSome() | ||
| check wakuConf.discv5Conf.get().udpPort == Port(0) | ||
| check wakuConf.metricsServerConf.isSome() | ||
| check wakuConf.metricsServerConf.get().httpPort == Port(0) | ||
| check wakuConf.restServerConf.isSome() | ||
| check wakuConf.restServerConf.get().port == Port(0) | ||
| check wakuConf.webSocketConf.isSome() | ||
| check wakuConf.webSocketConf.get().port == Port(0) |
There was a problem hiding this comment.
I'm not pretty sure if this is ok.
Given that explicit config says --tcp-port=23456, I suspect the user wants it to be there.
How I see, we had good defaults so far, now the default should be 0 = auto select, but when I configure it explicitly the preset overrides?
How should the user than predict the port?
I think preset should only default to 0 autoselect if not other that good default (if we leave it).
We can think of it as an optional config (if not set we use autoselect when preset is also used).
Also this is even worse in case of rest-port... in order to use rest api I should gather the bound port from logs or how?
Same for metrics port, I see it as a node operator feature, so unlikely normal devs/users will handle it. Having it random makes Prometheus config a nightmare, imo.
There was a problem hiding this comment.
How I see, we had good defaults so far, now the default should be 0 = auto select, but when I configure it explicitly the preset overrides?
Yes, 0 performs auto and any explicit port config should override it.
How should the user than predict the port?
A desktop or mobile user doesn's need to specify the port.
Only server node operators will explicitly take care of the ports that need to be accessible and hence they will explicitly set values to the ports of interest.
Also this is even worse in case of rest-port... in order to use rest api I should gather the bound port from logs or how?
Yes, also by checking debug API results.
Same for metrics port, I see it as a node operator feature, so unlikely normal devs/users will handle it. Having it random makes Prometheus config a nightmare, imo.
For the case of metrics, I think node operators should lookup which port to configure anyways.
There was a problem hiding this comment.
I'm not pretty sure if this is ok. Given that explicit config says
--tcp-port=23456, I suspect the user wants it to be there. How I see, we had good defaults so far, now the default should be 0 = auto select, but when I configure it explicitly the preset overrides? How should the user than predict the port?
Sorry, this "--preset" feature was completely broken. I just reverted it back to "port 0 is the default for everything" which is what we ended up deciding to do.
Also this is even worse in case of rest-port... in order to use rest api I should gather the bound port from logs or how? Same for metrics port, I see it as a node operator feature, so unlikely normal devs/users will handle it. Having it random makes Prometheus config a nightmare, imo.
It's not usual for servers to auto-bind ports, but if we're going to go down that road, we'll have to deal with the fallout.
I'm thinking that if logos-delivery is linked as a library in the app, the user should call LMAPI procs to ask for the effective port configs (e.g. support arbitrary REST queries via LMAPI). It's the same as passing a config that has specific ports, but if the app doesn't care about the ports then it can just ask after the fact.
d4038a5 to
abaae88
Compare
* fix node info to use announcedAddresses again
* expose ports via Debug API ("MyBoundPorts")
* builders default to port 0 in discv5/metrics/p2pTcp
* add waku/net/bound_ports.nim
* move auto_port and net_config to waku/net/
* extract port-0 retry loop to tryWithAutoPort[T]
* misc refactors and fixes
|
I'm thinking here that maybe (app users == library users) and (node operators == CLI users). If that holds, then we could restore the previous non-zero port defaults to wakunode2 (or any other standalone "node" / CLI binary that front-ends the library for operators) and leave the zero/auto-port anti-port-collision feature for library use only. If so, we can also remove the REST changes in this PR. |
* apply auto-port/port-0 default to library users only * revert REST API additions * relax rest/ws conf builders to accept an unset port for consistency * apply fixes to tests and testing infra
Formerly I was suggesting a third WDYT? |
So maybe the idea would be:
So the mode selector drives the application profile (is this an application user, or is this a node operator), instead of the layer (wakunode2 vs. library). Is that it? EDIT: I think that #3845 basically settles this discussion: |
Ivansete-status
left a comment
There was a problem hiding this comment.
LGTM! Thanks for it! 🙌
Just adding some minor nitpicks that I hope you find useful.
Cheers.
| if bound.tcpPort.isSome(): | ||
| waku[].node.ports.tcp = some(bound.tcpPort.get().uint16) | ||
| if bound.websocketPort.isSome(): | ||
| waku[].node.ports.webSocket = some(bound.websocketPort.get().uint16) |
There was a problem hiding this comment.
Maybe better to explicitly set port zero as default? Or perhaps not needed.
| if bound.tcpPort.isSome(): | |
| waku[].node.ports.tcp = some(bound.tcpPort.get().uint16) | |
| if bound.websocketPort.isSome(): | |
| waku[].node.ports.webSocket = some(bound.websocketPort.get().uint16) | |
| waku[].node.ports.tcp = some(bound.tcpPort.get(0).uint16) | |
| waku[].node.ports.webSocket = some(bound.websocketPort.get(0).uint16) |
There was a problem hiding this comment.
The root issue here is that BoundPorts should arguably have uint16 fields, not Option[uint16] as it is now. Arguably, there's no benefit from differentiating "the bound port was reported as actually zero" (to catch extremely weird/rare errors) vs. "the port was not specified or the service is disabled, so the port is reported as zero here." I'll just remove the Option wrapper and use raw uint16.
| Port(getAutoPort()) | ||
| else: | ||
| startingPort | ||
| let res = await attempt(port) |
There was a problem hiding this comment.
Maybe set a .withTimeout? Not a big deal though
| lastErr = res.error | ||
| if autoMode: | ||
| return err("auto-port exhausted; last error: " & lastErr) | ||
| return err(lastErr) |
There was a problem hiding this comment.
| return err(lastErr) | |
| return err("reached last err in tryWithAuthoPort: " & lastErr) |
| var rng = initRand() | ||
|
|
||
| proc getAutoPort*(): uint16 = | ||
| uint16(rng.rand(AutoPortMin.int .. AutoPortMax.int)) |
There was a problem hiding this comment.
Maybe?
| var rng = initRand() | |
| proc getAutoPort*(): uint16 = | |
| uint16(rng.rand(AutoPortMin.int .. AutoPortMax.int)) | |
| proc getAutoPort*(): uint16 = | |
| let rng = initRand() | |
| uint16(rng.rand(AutoPortMin.int .. AutoPortMax.int)) |
There was a problem hiding this comment.
Ha! This proc is called literally just twice, and we can drop the memory for the RNG. Nice! Must be var to compile, tho.
| metrics*: Option[uint16] | ||
|
|
||
| proc init*(T: type BoundPorts): BoundPorts = | ||
| BoundPorts( |
There was a problem hiding this comment.
Super nitpick one, sorry
| BoundPorts( | |
| return BoundPorts( |
| var obj = newJObject() | ||
| for name, value in fieldPairs(p): | ||
| if value.isSome(): | ||
| obj[name] = %value.get() | ||
| return $obj |
There was a problem hiding this comment.
| var obj = newJObject() | |
| for name, value in fieldPairs(p): | |
| if value.isSome(): | |
| obj[name] = %value.get() | |
| return $obj | |
| return $(%*p) |
There was a problem hiding this comment.
By making BoundPorts drop the Option wrapper on the uint16 port values (see above) this change will fit well.
| info "Starting metrics HTTP server", serverIp = $serverIp, serverPort = $port | ||
|
|
||
| let server = MetricsHttpServerRef.new($serverIp, port).valueOr: | ||
| return err($error) |
There was a problem hiding this comment.
| return err($error) | |
| return err("fail to start service metrics server, attempt:" & $error) |
| try: | ||
| await server.start() | ||
| except CatchableError: | ||
| return err(getCurrentExceptionMsg()) |
There was a problem hiding this comment.
| return err(getCurrentExceptionMsg()) | |
| return err("exception while startMetricsServer, attempt: " & getCurrentExceptionMsg()) |
| return err(error) | ||
| let startRes = await wd.start() | ||
| if startRes.isErr(): | ||
| return err(startRes.error) |
There was a problem hiding this comment.
| return err(startRes.error) | |
| return err("failed to start discovery, attempt: " & startRes.error) |
NagyZoltanPeter
left a comment
There was a problem hiding this comment.
Approve not to block. Still have the impression that for metrics and rest server - if enabled - user must assign exact port or just use the good old default, we should not allow config port 0 for them and try assign random.
|
|
||
| info "Metrics HTTP server started", serverIp = $serverIp, serverPort = $serverPort | ||
| return ok(server) | ||
| let started = (await tryWithAutoPort[StartedMetricsServer](serverPort, attempt)).valueOr: |
There was a problem hiding this comment.
I think metrics server needs an explicit port set. It has no mean to assign an automatic port. It will avoid collision but will make prometheus data collection impossible or inconvenient, imo.
There was a problem hiding this comment.
Yes I agree with Zoltán.
The metrics ports will come from metricsServerConf , and this config is only set when using the cli wakunode2 app, i.e., not used from the library.
| allowOrigin: b.allowOrigin, | ||
| listenAddress: b.listenAddress.get(), | ||
| port: b.port.get(), | ||
| port: b.port.get(Port(0)), |
There was a problem hiding this comment.
Same as metrics server. How inconvenient to use a rest endpoint if you need to dig out the port of it from the logs?
I think we can do a further refinement of this PR. We could change this PR to support a config of "port=0" ("please, logos-delivery instance, auto-allocate a port for me") but we would not make the value of zero (auto-alloc) the default value of any server port parameter. Instead, if no value is given by the app, for any logos-delivery server port parameter, then logos-delivery just keeps the existing non-zero defaults (that would apply for all cases: both for the CLI and the library use case). I think that would accomplish two key things:
As for metrics vs. p2pTcp etc., we can't just do something different for every service port. If we tried to write auto-allocation of ports just for services that are enabled by default, like p2pTcp, and do something different for metrics, etc., the problem would persist: N components launch their own local-machine copies of If you are using (It's worth noting that "ports" being a scarce system-wide resource will always give heartburn to any P2P component system. P2P component systems can't live in the world of "server" metaphors where the port config is some known and pre-configured value by a centralized entity. We could instead have native pid:port routing or the like, achievable with a machine-wide socket/message "Connection/Message Broker" routing service -- which would actually own all the physical sockets -- and at least 10% of the reason why people use Docker for emulating P2P networks on a single machine would be gone... bottom line, there's no "100% clean" solution to this problem since the "communication port values are a system-wide resource known ahead of time per-instance" is fundamentally broken for P2P.) I'll wait for a decision here before merging; CC: @Ivansete-status @igor-sirotin ; I think the PR in its current form is mergeable, but requiring explicit port=0 arguments from the app stack for the auto-alloc feature is a clear improvement, IMO. |
Ivansete-status
left a comment
There was a problem hiding this comment.
Adding some more comments.
We can merge but is important to leave auto-port only for library, and keep fixed ports for app-config items such as metrics service or REST service.
| p2pListenAddress: IpAddress, | ||
| portsShift: uint16, | ||
| ): WakuDiscoveryV5 = | ||
| ): Result[WakuDiscoveryV5, string] = |
There was a problem hiding this comment.
Kindly add a comment in this proc explaining that this is public only for testing purposes and the exposed proc is actually: setupAndStartDiscv5.
| ## Update waku data that is set dynamically on node start | ||
| try: | ||
| (await updateWaku(waku)).isOkOr: | ||
| return err("Error in updateWaku: " & $error) |
There was a problem hiding this comment.
Sorry, nitpick one
| return err("Error in updateWaku: " & $error) | |
| return err("Error in startWaku: " & $error) |
| tcp: 0'u16, webSocket: 0'u16, rest: 0'u16, discv5Udp: 0'u16, metrics: 0'u16 | ||
| ) | ||
|
|
||
| proc toJsonString*(p: BoundPorts): string = |
There was a problem hiding this comment.
| proc toJsonString*(p: BoundPorts): string = | |
| proc `$`*(p: BoundPorts): string = |
|
|
||
| info "Metrics HTTP server started", serverIp = $serverIp, serverPort = $serverPort | ||
| return ok(server) | ||
| let started = (await tryWithAutoPort[StartedMetricsServer](serverPort, attempt)).valueOr: |
There was a problem hiding this comment.
Yes I agree with Zoltán.
The metrics ports will come from metricsServerConf , and this config is only set when using the cli wakunode2 app, i.e., not used from the library.
Yes. So, to get that done, I will first go ahead and execute here, on this PR, my last proposal above: at the builder level, the defaults are concrete nonzero port values, for all service ports, which is the existing behavior for both CLI and library. That fixes my earlier interpretation that we wanted automatic port selection by default for all ports all the time, which was a broken assumption. So this PR now mostly just adds the capability of automatically selecting a port for a service if a value of zero is given to it. No Port == 0 defaults here, anywhere. In the follow-up API Consistency PR #3866, we will take advantage of what @igor-sirotin specified: that the
... and it is inside the "createNode() for app developers" library API that we will implicitly pre-set the ports to a default value of "0", that is, automatic port selection. That way, we won't have to try and guess the user profile (advanced vs dev) from the interface being used (CLI vs library), which doesn't work -- one of the central points of the API Consistency issue is to fix that. Then, if the app developer still wants concrete ports, they can inform concrete ports in the This effectively "keeps auto-port only for library," in the sense that you must use one of the library APIs (maybe the only one, if we choose to eliminate
I will make it so PR #3866 will not apply a 0-port (auto-port) default for Metrics and REST via the createNode app-dev API. It will keep the default-port-0 (auto-port) for Discv5, p2p-tcp, and websocket port only. If we still get port collision problems after that, we can relax that and set 0-port for all services again, under the app-developer library API for createNode. If I still screwed up somewhere in my interpretation here, let me know. But I think that's right this time and that should cover everyone's reviews. |
* Remove auto-port default value (Port==0) from all services * Fixes from Ivan's latest review
Description
This PR adds the capability of specifying a port value of zero (0) for any service port, which results in the choosing of a free port, automatically, for that service.
NOTE: conf builder defaults remain concrete (60000 for p2pTcp, 9000 for discv5 UDP, 8000 for WebSocket, 8645 for REST, 8008 for metrics). Auto-port is an opt-in capability: callers pass Port(0) explicitly to get auto-bind.
For TCP, WebSocket, and REST the bind logic was prepared to work with a port of 0. For discv5 and metrics, port 0 can't be passed through directly because eth discv5 and nim-metrics don't echo the actually bound port back, so we add a retry loop on our side.
setupDiscoveryV5is now stricter: it returns Result and errs on udpPort==0. If config port is 0, must go through the newsetupAndStartDiscv5.Once any service has bound, its real port is written back into the running
WakuConfref and into a newBoundPortsstruct onWakuNode. The port value inBoundPortsfor a service is reported as 0 if the service is disabled.The ENR rebuild (
updateWaku) now runs after discv5 startup so the advertised multiaddresses reflect the actually-bound UDP port, not whatever was in the conf before (potentially "port=0").The Messaging Debug API also gains a
MyBoundPortsinfo item exposing the same bound port values as a JSON string for library consumers.Changes
Superseded PRs
closes #3822
Issue
iterates logos-co/logos-delivery-module#18
iterates #3819