libtcb: skip privilege drop in unprivileged user namespaces#38
Conversation
|
Thank you @Blarse! Are these changes already in use in ALT's package perhaps? I was concerned at first that the Another approach I'd consider is only checking for |
"The unknown" includes possible future changes to the kernel, where the |
|
Hi,
Not yet, but I'm working on porting mkosi to ALT Linux and stumbled upon a systemd-sysusers segfault caused by this. I can confirm that this patch fixes it. Apparently, when useradd invoked libtcb and libtcb could not drop privileges after a failed setgroups() call, it left an empty shadow file, which in turn triggered another issue in ALT's downstream patch to systemd-sysusers that adds tcb support.
(For reference, virtiofsd hit the same setgroups + user namespace issue and applied a similar fix in !207.)
|
9f42bc3 to
ad67dc3
Compare
|
Thank you for revising the proc file reading/check per my request - this part passes my review now.
This looks like they're skipping only @ldv-alt Do we need to wait for your review here? |
|
I'd definitely prefer if the code would do the following:
|
You're right. According to the man page referenced above: The current PR effectively handles only case (b), where it makes no sense to switch uid/gid (though without the single line mapping check that the virtiofsd patch has). In case (a) the mapping can cover an arbitrary range, and I'll update the patch to skip only setgroups(0, NULL) when it's denied, while still calling ch_uid()/ch_gid(). tcb_gain_priv_r() needs a matching change to skip the setgroups() restoration in that case. |
Would something like this work for you? }
free(dir);
- res = getgroups(p->number_of_groups, p->grplist);
- if (res < 0 || res > p->number_of_groups)
- return -1;
-
- p->number_of_groups = res;
+ if (setgroups_allowed()) {
+ res = getgroups(p->number_of_groups, p->grplist);
+ if (res < 0 || res > p->number_of_groups)
+ return -1;
+ p->number_of_groups = res;
+ if (sys_setgroups(0, NULL) == -1)
+ return -1;
+ } else {
+ p->number_of_groups = -1;
+ }
- if (sys_setgroups(0, NULL) == -1)
- return -1;
if (!ch_gid(shadow_gid, &p->old_gid))
return -1;
if (!ch_uid(st.st_uid, &p->old_uid))And for tcb_gain_priv_r: return -1;
if (!ch_gid(p->old_gid, NULL))
return -1;
- if (sys_setgroups(p->number_of_groups, p->grplist) == -1)
+ if (p->number_of_groups >= 0 &&
+ sys_setgroups(p->number_of_groups, p->grplist) == -1)
return -1; |
I'd rather invoked |
Detect /proc/self/setgroups == "deny" to recognize an unprivileged user namespace where setgroups(2) is permanently denied by the kernel. No-op on non-Linux.
In a user namespace where /proc/self/setgroups is "deny", setgroups(2) is permanently rejected by the kernel. Perform the regular privilege drop and only tolerate sys_setgroups(0, NULL) failing with EPERM in such a namespace; in that case the kernel guarantees no supplementary group could have been gained via the namespace, so leaving the list in place is safe. Record this with a new PRIV_MAGIC_NOSETGROUPS state so that tcb_gain_priv_r() skips the matching setgroups() call. Fixes failures of pam_tcb, libnss_tcb, tcb_unconvert and shadow's shadowtcb_drop_priv() when running under rootless container.
ad67dc3 to
b8805df
Compare
I've updated the patch according to your suggestions. |
|
Hi, is there anything else I need to address here, or is this good to merge? |
When an unprivileged user sets up a user namespace without newuidmap, it must first write "deny" to /proc/self/setgroups before writing gid_map, after which setgroups() becomes permanently forbidden[1]. tcb_drop_priv_r() then trips over sys_setgroups(0, NULL) and bails out with EPERM, even though euid 0 inside the namespace is not real root and there is nothing to drop in the first place.
In my case this breaks mkosi tool that builds images inside an unprivileged userns, where any tcb-aware utility aborts on the setgroups step. The patch detects the "deny" state via a small setgroups_allowed() helper and short-circuits tcb_drop_priv_r() with PRIV_MAGIC_NONROOT, the same path already taken for non-root callers. No behavioural change outside user namespaces; non-Linux builds get a stub that always returns 1.
[1] https://man7.org/linux/man-pages/man7/user_namespaces.7.html