Failures in chroot #135

Open
dbnicholson opened this issue Jan 4, 2017 · 14 comments

@dbnicholson

In our Endless image builder, we chroot into the ostree deployment to install apps with Flatpak. The Flatpak triggers, which run under bubblewrap, always fail for two reasons:

  1. Making / a slave mount fails because the deployment directory is not actually a mount point. This is easily fixed by bind-mounting the directory onto itself beforehand, but I think bubblewrap could do this too. systemd does this: https://github.com/systemd/systemd/blob/master/src/core/namespace.c#L910.

  2. pivot_root fails with EINVAL for reasons I can't quite grok. See https://github.com/torvalds/linux/blob/master/fs/namespace.c#L3035. FWIW, I can't really see why pivot_root is needed. It seems that you could just build up the new root, then move the mount over /. This is also what systemd does; it used to use pivot_root, but changed that in systemd/systemd@ac0930c.
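One way to see which of the two conditions a given chroot violates is to read /proc/self/mountinfo from outside the chroot (it only lists mounts reachable from the reading process's root). The pivot_root(2) man page lists, among its EINVAL cases, a current root that is not a mount point (for example because of an earlier chroot(2)) and a shared parent mount. The following is a minimal C sketch, assuming the mountinfo format documented in proc(5); the struct and function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Fields taken from one line of /proc/self/mountinfo (format per proc(5)). */
struct mountinfo_entry {
    int mount_id;
    int parent_id;
    char mount_point[4096]; /* octal escapes such as \040 are not decoded */
    bool shared;            /* an optional field "shared:N" is present */
};

/*
 * Parse a line such as
 *   "36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw"
 * Returns 0 on success, -1 on malformed input.
 */
int parse_mountinfo_line(const char *line, struct mountinfo_entry *out)
{
    char buf[8192];
    char *save = NULL;
    int field = 0;

    assert(line != NULL && out != NULL);
    if (strlen(line) >= sizeof buf)
        return -1;
    strcpy(buf, line);
    memset(out, 0, sizeof *out);

    for (char *tok = strtok_r(buf, " \n", &save); tok != NULL;
         tok = strtok_r(NULL, " \n", &save), field++) {
        if (field == 0 && sscanf(tok, "%d", &out->mount_id) != 1)
            return -1;
        if (field == 1 && sscanf(tok, "%d", &out->parent_id) != 1)
            return -1;
        if (field == 4) {
            if (strlen(tok) >= sizeof out->mount_point)
                return -1;
            strcpy(out->mount_point, tok);
        }
        if (field >= 6) {
            /* Optional fields (shared:N, master:N, ...) end at "-". */
            if (strcmp(tok, "-") == 0)
                return 0;
            if (strncmp(tok, "shared:", 7) == 0)
                out->shared = true;
        }
    }
    return -1; /* no "-" separator: not a mountinfo line */
}

/* Find the entry for mount_point; if several mounts are stacked there, the
 * last listed one (normally the topmost) wins. Returns 0 if found. */
int find_mount(const char *mount_point, struct mountinfo_entry *out)
{
    char line[8192];
    struct mountinfo_entry e;
    int found = -1;
    FILE *f = fopen("/proc/self/mountinfo", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_mountinfo_line(line, &e) == 0 &&
            strcmp(e.mount_point, mount_point) == 0) {
            *out = e;
            found = 0;
        }
    }
    fclose(f);
    return found;
}
```

Run from the host, find_mount() failing for the chroot directory corresponds to problem 1; if it succeeds but the entry whose mount_id equals the result's parent_id is marked shared, that corresponds to problem 2.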

@jlebon
Contributor

jlebon commented Jan 4, 2017

This same issue also happens when running bubblewrap inside mock. I wrote a little compat script to make it work: https://gist.github.com/jlebon/fb6e7c6dcc3ce17d3e2a86f5938ec033

Of relevance to your (2):

# The parent of mount in which we'll chroot can't be shared
# or pivot_root will barf. So we just remount onto itself,
# but make sure to make the first parent mount private.
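The workaround in the gist can be sketched in C as two mount(2) calls made by the privileged caller before it chroots. This is a sketch under assumptions: prepare_chroot_for_bwrap is a hypothetical name, parent_mount must already be known to be the mount point of the mount containing the chroot directory (for example via /proc/self/mountinfo), and only the argument checks can be exercised without privileges.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/mount.h>

/*
 * Hypothetical helper, run before chroot(2):
 *   1. make the mount containing chroot_dir private (non-recursive)
 *   2. bind-mount chroot_dir onto itself
 * Afterwards the chroot root is a mount point, and neither it nor its
 * parent mount is shared, which pivot_root(2) inside bwrap requires.
 */
int prepare_chroot_for_bwrap(const char *parent_mount, const char *chroot_dir)
{
    if (parent_mount == NULL || chroot_dir == NULL ||
        parent_mount[0] != '/' || chroot_dir[0] != '/') {
        errno = EINVAL;
        return -1;
    }
    /* Step 1: equivalent of "mount --make-private PARENT". */
    if (mount(NULL, parent_mount, NULL, MS_PRIVATE, NULL) != 0) {
        perror("make parent mount private");
        return -1;
    }
    /* Step 2: equivalent of "mount --bind DIR DIR". The new mount is a
     * clone of the now-private source mount, so it is private as well. */
    if (mount(chroot_dir, chroot_dir, NULL, MS_BIND, NULL) != 0) {
        perror("bind-mount chroot onto itself");
        return -1;
    }
    return 0;
}
```

The order matters: making the parent private first means the bind mount neither joins a shared peer group nor propagates anywhere.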

@dbnicholson
Author

@jlebon Thank you! I could not figure out the right private/bind magic to make that happen. Let me try that out.

@cgwalters
Collaborator

See some discussion on this in opencontainers/runc#41

@cgwalters
Collaborator

That said, I think you should use bubblewrap instead of chroot: we should support nested containerization if you're root, or if the host has unprivileged userns enabled.

@alexlarsson
Collaborator

The nice thing about pivot_root is that we can completely clean out any references to any mounts we didn't create from the sandbox. MS_MOVE would just cover them. This seems like a safer option.
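For comparison, a pivot_root(2)-based root switch can detach the old tree instead of just covering it. This sketch uses the pivot_root(".", ".") idiom described in the pivot_root(2) man page rather than bubblewrap's own sequence, which differs in detail; the function name is hypothetical, and it assumes it runs in a fresh mount namespace with CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper. Assumes: called after unshare(CLONE_NEWNS), new_root
 * is an absolute path that is already a mount point, and neither new_root's
 * parent mount nor the current root's parent mount is shared.
 */
int switch_root_with_pivot(const char *new_root)
{
    if (new_root == NULL || new_root[0] != '/') {
        errno = EINVAL;
        return -1;
    }
    if (chdir(new_root) != 0)
        return -1;
    /* pivot_root(".", ".") stacks the old root on top of the new one... */
    if (syscall(SYS_pivot_root, ".", ".") != 0)
        return -1;
    /* ...so lazily unmounting "." detaches the old root and every mount
     * below it from this namespace, instead of leaving them hidden. */
    if (umount2(".", MNT_DETACH) != 0)
        return -1;
    return chdir("/");
}
```

glibc has no pivot_root() wrapper, hence the raw syscall(2).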

@alexlarsson
Collaborator

Auto-creating a mount point for the root seems nice, though.

@dbnicholson
Author

Yeah, I noticed that later while searching around about pivot_root. I tried hacking in the root mount, but it didn't quite work out.

@dbnicholson
Author

In case anyone ever feels like picking this up, https://gist.github.com/dbnicholson/da8aa72ea3bd7ee8731c9da2792fd5a3 is what I played with before but didn't get working.

dbnicholson added a commit to endlessm/eos-image-builder that referenced this issue Mar 16, 2021
Bubblewrap uses pivot_root to provide a clean environment for its
sandbox. Unfortunately, pivot_root requires that the current root mount
and its parent mount are not shared mounts, which they are by default
when making new mounts.

To accomplish that, make the chroot root mount private and then bind
mount the chroot on top of itself. This will guarantee that both
conditions are satisfied.

See containers/bubblewrap#135 for details
and the workaround suggested in
https://gist.github.com/jlebon/fb6e7c6dcc3ce17d3e2a86f5938ec033.

https://phabricator.endlessm.com/T14860
@safinaskar

safinaskar commented Apr 28, 2023

I understand how to fix this. We need to bind-mount / onto /. Simply doing the C analog of mount --rbind / / will not work, because the root directory of our process will still point to the "old" root. In more precise terms: the root directory of our process (i.e. task_struct::fs->root ( https://elixir.bootlin.com/linux/v6.3/source/include/linux/fs_struct.h#L15 )) will still point to the path ( https://elixir.bootlin.com/linux/v6.3/source/include/linux/path.h#L8 ) of the "old" root, not the "new" one. See also: https://elixir.bootlin.com/busybox/1.36.0/source/util-linux/switch_root.c#L356 .

So we need to do the C analog of this: mount --rbind / /foo; cd /foo; mount --move . /; chroot .
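A direct C translation of that command sequence, as a hedged sketch: scratch stands in for /foo, and the function name is hypothetical. Only the final chroot(".") makes the process root refer to the newly moved mount.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mount.h>
#include <unistd.h>

/*
 * Hypothetical helper equivalent to:
 *   mount --rbind / SCRATCH; cd SCRATCH; mount --move . /; chroot .
 * scratch must be an existing, empty, absolute directory other than "/".
 */
int remount_root_as_mountpoint(const char *scratch)
{
    if (scratch == NULL || scratch[0] != '/' || strcmp(scratch, "/") == 0) {
        errno = EINVAL;
        return -1;
    }
    /* mount --rbind / SCRATCH */
    if (mount("/", scratch, NULL, MS_BIND | MS_REC, NULL) != 0)
        return -1;
    /* cd SCRATCH */
    if (chdir(scratch) != 0)
        return -1;
    /* mount --move . / */
    if (mount(".", "/", NULL, MS_MOVE, NULL) != 0)
        return -1;
    /* chroot . : updates the process root to the moved mount */
    return chroot(".");
}
```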

But this creates another problem: all filesystems mounted in the original namespace will remain mounted inside bubblewrap, even after two pivot_root calls. They will be hidden and inaccessible, but still mounted.

What is wrong with that? Consider this: we insert a USB flash drive and mount it, then start bubblewrap, then unmount the flash drive on the host system. It actually remains mounted in bubblewrap's namespace, so when we remove the flash drive, we risk data loss. There are two ways to fix this:

  • Escape the chroot. This can be done :)
  • Unmount as much as possible

I prefer the second solution (but I can implement both). It would look like this:

  • mount --rbind / /foo
  • Iterate over all mounts (using getmntent), except everything below /foo, and unmount them
  • cd /foo; mount --move . /; chroot .
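The selection step of that second solution can be sketched in C. It assumes getmntent(3) over /proc/self/mounts, and goes slightly beyond the list above by also keeping the ancestors of /foo (including /), since unmounting one of them would either fail with EBUSY or, with MNT_DETACH, take the bind mount with it. The names are hypothetical, and the actual umount2(2) calls are left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <mntent.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* True if path equals prefix or lies below it ("/foo/x" is under "/foo",
 * "/foobar" is not). */
bool path_is_under(const char *path, const char *prefix)
{
    size_t n = strlen(prefix);
    assert(n > 0);
    if (strncmp(path, prefix, n) != 0)
        return false;
    return path[n] == '\0' || path[n] == '/' || prefix[n - 1] == '/';
}

/* Number of '/' separators: deeper mounts must be unmounted first. */
static int depth(const char *p)
{
    int d = 0;
    for (; *p; p++)
        if (*p == '/')
            d++;
    return d;
}

static int by_depth_desc(const void *a, const void *b)
{
    const char *pa = *(const char *const *)a;
    const char *pb = *(const char *const *)b;
    return depth(pb) - depth(pa);
}

/* Keep `keep`, everything below it and its ancestors; return the rest in
 * out (room for n entries), deepest first. */
size_t plan_unmounts(const char *const *mounts, size_t n, const char *keep,
                     const char **out)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        if (path_is_under(mounts[i], keep) || path_is_under(keep, mounts[i]))
            continue;
        out[m++] = mounts[i];
    }
    qsort(out, m, sizeof *out, by_depth_desc);
    return m;
}

/* Collect up to max mount points via getmntent(3); caller frees them. */
size_t read_mount_points(char **out, size_t max)
{
    FILE *f = setmntent("/proc/self/mounts", "r");
    struct mntent *ent;
    size_t n = 0;

    if (f == NULL)
        return 0;
    while (n < max && (ent = getmntent(f)) != NULL) {
        char *s = strdup(ent->mnt_dir);
        if (s == NULL)
            break;
        out[n++] = s;
    }
    endmntent(f);
    return n;
}
```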

I can write a patch if you want.

I can also describe a workaround for users of bubblewrap.

@smcv
Collaborator

smcv commented May 1, 2023

I can write patch if you want

If you think you know how to solve this, please do: reviewing a pull request and checking for things that can go wrong there will be a lot easier than reviewing a text description that is less precise than code.

@safinaskar

@smcv, I spent some more time thinking about bubblewrap. Now I think bubblewrap doesn't need to work in a chroot. Let me explain why. As far as I understand, bubblewrap needs root privileges, and it acquires them in one of three ways:

  1. Bubblewrap already started as root
  2. Bubblewrap is setuid
  3. Bubblewrap creates new user namespace and thus becomes root

In the 3rd way it is absolutely impossible to make bubblewrap work in a chroot, because the kernel refuses to create a user namespace for a chrooted process ( https://elixir.bootlin.com/linux/v6.2/source/kernel/user_namespace.c#L105 ). You can easily verify this by running this command (as root, outside any chroot): chroot --userspec=1000:1000 /somedir unshare -r bash. The command fails because of the referenced line in the kernel sources.
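The same kernel check can be observed from C. This is a minimal probe under stated assumptions (Linux, a hypothetical function name). Per unshare(2), EPERM for CLONE_NEWUSER can mean the caller is chrooted, but it can also come from other restrictions such as a seccomp filter, so a result of EPERM alone does not prove the cause.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Try unshare(CLONE_NEWUSER) in a child process so the caller's namespaces
 * stay unchanged. Returns 0 on success, the child's errno on failure, or -1
 * if the probe itself (fork/wait) failed.
 */
int probe_new_userns(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        int err = unshare(CLONE_NEWUSER) == 0 ? 0 : errno;
        _exit(err & 0xff);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling it once outside and once inside the chroot (as the same user) should show the difference described above.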

So I think that in the 1st and 2nd ways, bubblewrap should not work in a chroot either, for consistency. (But keep in mind that sometimes it still works.) So I will not write a patch, sorry about that. (Of course, you can try to convince me.)

Also, it would be very cool to disable setuid mode, i.e. simply exit if we run as a setuid binary, because, as far as I understand, user namespaces are enabled by default in modern distros anyway.

@smcv
Collaborator

smcv commented May 4, 2023

As well as I understand bubblewrap needs root privileges

Not exactly, it needs CAP_SYS_ADMIN and various other capabilities(7) in its current namespace. "root" is specifically uid 0, but in the most common use-cases for bubblewrap, uid 1000 without capabilities (in the initial namespace) becomes uid 1000 with capabilities (in the new namespace), with no "root" involved.

If you're saying "root" but you really mean "CAP_SYS_ADMIN" (and other capabilities), it's probably easier to understand the constraints if we're as precise as possible about what is happening.

(Just to make this extra-confusing, there is a concept called a capabilities-based security system, but the capabilities(7) feature is using a different meaning for that word.)

  3. Bubblewrap creates new user namespace and thus becomes root

Again, this would be more accurately stated as: bubblewrap creates a new user namespace, and thus gains all capabilities(7) in that user namespace. It doesn't matter whether it's uid 0 in the new userns or not.
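To make the distinction concrete: the capability that matters can be read from the CapEff line of /proc/self/status, which reflects the thread's effective capabilities in its current user namespace, whatever its uid. A minimal sketch with hypothetical helper names; 21 is the value of CAP_SYS_ADMIN in <linux/capability.h>.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CAP_SYS_ADMIN_BIT 21 /* CAP_SYS_ADMIN in <linux/capability.h> */

/* Test bit `cap` of a hex capability mask as printed in /proc/PID/status. */
bool capmask_has(const char *hex_mask, int cap)
{
    char *end;
    unsigned long long mask = strtoull(hex_mask, &end, 16);

    assert(cap >= 0 && cap < 64);
    return end != hex_mask && ((mask >> cap) & 1ULL);
}

/* 1 if the calling thread has CAP_SYS_ADMIN in its current user namespace,
 * 0 if not, -1 if /proc/self/status could not be read. */
int have_cap_sys_admin(void)
{
    char line[256];
    int result = -1;
    FILE *f = fopen("/proc/self/status", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "CapEff:", 7) == 0) {
            result = capmask_has(line + 7, CAP_SYS_ADMIN_BIT);
            break;
        }
    }
    fclose(f);
    return result;
}
```

Unlike a getuid() == 0 check, this gives the answer that matters for mount(2) and pivot_root(2) inside a new user namespace.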

In 3rd way it is absolutely impossible to make bubblewrap to work in chroot, because this is prohibited in the kernel ( https://elixir.bootlin.com/linux/v6.2/source/kernel/user_namespace.c#L105 )

Right, yes.

So I think in 1st way and in 2nd way bubblewrap in chroot should not work, too. For consistency purposes.

For what you're calling the 1st way, where bubblewrap is started such that it already has elevated capabilities (most commonly by being root, for example sudo bwrap ...), it's not obvious to me that it shouldn't work. I am mostly only interested in running bubblewrap as an unprivileged user (for use-cases like Flatpak), but some of the other bubblewrap maintainers seem to value the ability to have elevated privileges (which is why issues like #518 and #551 stay open, instead of being closed as out-of-scope). But if you're doing that, then you could probably equally well use something else that is less limited than bwrap, like unshare and newuidmap; so I can see an argument that having bwrap provide that functionality is unnecessary.

For what you're calling the 2nd way, where bubblewrap is setuid root, the design principle is that bubblewrap shouldn't allow anything that an unprivileged user on a suitable kernel wouldn't be allowed to do. So I agree with your assertion that a setuid bubblewrap shouldn't work when run inside a chroot.

Also, this will be very cool to disable setuid mode. I. e. simply to exit if we run as setuid binary. Because, as well as I understand, in modern distros user namespaces are enabled by default anyway

Sorry, I don't understand "this will be very cool to".

Are you requesting a new feature: the ability to configure bubblewrap at build-time so that if it is run while setuid (as detected via AT_SECURE or by comparing real uid with effective uid), it will simply refuse to run and exit with an error? I have thought about that myself. If you or someone else can propose a pull request implementing that feature, I'll try to review it.
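If such a build-time option existed, the check itself could be small. This sketch is hypothetical (neither function exists in bubblewrap) and combines both detection methods mentioned above: getauxval(AT_SECURE), which the kernel sets for setuid/setgid executables and file-capability gains, and comparing real with effective ids.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/auxv.h>
#include <sys/types.h>
#include <unistd.h>

/* Pure decision: were we started with privileges we did not inherit? */
bool started_setuid(unsigned long at_secure, uid_t ruid, uid_t euid,
                    gid_t rgid, gid_t egid)
{
    return at_secure != 0 || ruid != euid || rgid != egid;
}

/* Hypothetical option: refuse to run at all when started setuid. */
void refuse_if_setuid(void)
{
    if (started_setuid(getauxval(AT_SECURE), getuid(), geteuid(),
                       getgid(), getegid())) {
        fprintf(stderr, "refusing to run setuid; use user namespaces\n");
        exit(EXIT_FAILURE);
    }
}
```

Keeping the decision in a pure function makes it testable without installing a setuid binary.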

Or are you saying that something related to chroots would be a useful mechanism to use to provide that feature? If so, I don't understand: please be more specific.

@dbnicholson
Author

I don't see any reason why case 1 (bubblewrap started by a privileged user inside a chroot) couldn't work. It's why I opened the issue and started working on fixes to make it comply with pivot_root, after all. Even to this day our image builder, which runs highly privileged because it needs to do things like set up loop devices, goes through the mount dance before chrooting just so that bwrap won't fail when we try to install flatpaks within it.

This could certainly be deemed wontfix, but I think it's a legitimate use case.

@safinaskar

Are you requesting a new feature: the ability to configure bubblewrap at build-time so that if it is run while setuid (as detected via AT_SECURE or by comparing real uid with effective uid), it will simply refuse to run and exit with an error?

I want bubblewrap to always refuse to run and exit with an error if it detects that it is running setuid.

6 participants