[Question] bwrap in LXC #362

Closed
smtalk opened this issue Mar 30, 2020 · 26 comments

Comments

@smtalk

smtalk commented Mar 30, 2020

Hello,

Is bwrap suitable for sandboxing apps/users in an LXC environment? If so, does it need any special flags?

# su - test -s /usr/bin/bwshell
Last login: Mon Mar 30 20:30:16 EEST 2020 on pts/2
bwrap: pivot_root: Permission denied
[pid 14230] pivot_root("/tmp", "oldroot") = -1 EACCES (Permission denied)
[pid 14230] write(2, "bwrap: ", 7bwrap: )      = 7
[pid 14230] write(2, "pivot_root", 10pivot_root)  = 10
[pid 14230] write(2, ": Permission denied\n", 20: Permission denied
@smcv
Collaborator

smcv commented Mar 30, 2020

LXC probably doesn't leave it with enough privileges to run successfully. In general, "nesting" containers is harder to do successfully or securely than creating a single level of container on a "bare metal" machine or VM.

@smtalk
Author

smtalk commented Mar 30, 2020

@smcv yes, I was just wondering whether bwrap could be used universally for a jailed shell :) Even if it’s not as “secure” as jailing in KVM/Xen or on bare metal.

@foresto

foresto commented Aug 5, 2021

Related issue: https://gitlab.steamos.cloud/steamrt/steam-runtime-tools/-/issues/35
(Recent versions of Steam Proton are failing in LXC, despite earlier versions working fine, due to the addition of Pressure Vessel, which uses bubblewrap.)

@foresto

foresto commented Aug 5, 2021

LXC probably doesn't leave it with enough privileges to run successfully. In general, "nesting" containers is harder to do successfully or securely than creating a single level of container

LXC explicitly supports nested containers. Here is the relevant LXC container config option:

lxc.apparmor.profile = lxc-container-default-with-nesting

The bubblewrap errors that I am encountering are due to the above apparmor profile being permissive enough for nested LXC containers, but not permissive enough for whatever system calls bubblewrap is attempting on Proton's behalf. That apparmor profile isn't static, though; we can configure whatever rules we want.

Given that LXC can already nest containers if the appropriate apparmor profile is used, it seems to me that there should be a way to make bubblewrap and LXC cooperate. (And if that can be done, it also seems there should be a solution for Steam & Proton.)

@smcv
Collaborator

smcv commented Aug 5, 2021

The bubblewrap errors that I am encountering are due to the above apparmor profile being permissive enough for nested LXC containers, but not permissive enough for whatever system calls bubblewrap is attempting on Proton's behalf. That apparmor profile isn't static, though; we can configure whatever rules we want.

As a step towards this, please try to get it to work with the LXC container being completely unconfined. If that can't work, then it definitely won't work with AppArmor restrictions.

I think there are probably two sides to this. One is that the LXC container can apply AppArmor (and maybe seccomp?) restrictions that prevent bubblewrap from doing its job; you can avoid this factor by making the LXC container unconfined. After you have a proof-of-concept with it unconfined, we can either allow more operations in the AppArmor profile, or potentially do things slightly differently in bubblewrap so that it is only doing things that LXC's AppArmor profile would allow.

The other is that running bwrap in a chroot (#135) is known not to work, because the chroot breaks one of the conditions for pivot_root(); LXC might be suffering from something similar. The conditions for a successful pivot_root() are not obvious, and when they are not met the only diagnostic is EINVAL, so it is not straightforward to determine what bubblewrap should be doing differently here (I've tried and failed in the past).
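The chroot-style condition can at least be checked from a shell. The sketch below assumes the mountinfo layout described in proc(5), where the fifth field of each line is the mount point as seen from the process's root; `root_is_mount_point` is a hypothetical helper name, not part of bubblewrap. It covers only the "current root is not a mount point" case and says nothing about AppArmor denials or the other pivot_root() conditions.

```shell
# Succeed if some mount listed in the given mountinfo file has "/" as its
# mount point. Defaults to the calling process's own mountinfo.
# Assumption: field 5 is the mount point relative to the process root.
root_is_mount_point() {
    awk '$5 == "/" { found = 1 } END { exit !found }' \
        "${1:-/proc/self/mountinfo}"
}
```

Usage: `root_is_mount_point || echo "/ is not a mount point here"`. A failure suggests the environment resembles the chroot case; a success does not prove that pivot_root() will work.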

@foresto

foresto commented Aug 5, 2021

@smcv I'd like to investigate further, but running Steam with an unconfined apparmor profile defeats the purpose of my running it in LXC at all. For the sake of testing without having to poke giant holes in my sandbox, can you tell me a bubblewrap command line I could use to test this chroot issue?

Also, what is creating the chroot you're referring to? Steam? If the chroot syscall turns out to be a real blocker for bwrap, couldn't it be replaced with a different approach, like a mount namespace?

@smcv
Collaborator

smcv commented Aug 5, 2021

For the sake of testing without having to poke giant holes in my sandbox, can you tell me a bubblewrap command line I could use to test this chroot issue?

Any bubblewrap command line would do. The simplest is bwrap --dev-bind / / true, or for something more thorough, you could run bubblewrap's own test suite (./autogen.sh && make && make check).
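A small smoke test along those lines might look like the sketch below. The first invocation is the one quoted above; the second, which adds a read-only root plus fresh /dev, /proc and /tmp, is an assumption about what a "slightly less trivial" case could be, not something taken from bubblewrap's test suite. `bwrap_smoke_test` and the `BWRAP` override variable are hypothetical names.

```shell
# Run two bwrap invocations of increasing complexity and report failures.
# BWRAP can point at an alternative binary; it defaults to "bwrap" on PATH.
bwrap_smoke_test() {
    bwrap=${BWRAP:-bwrap}
    status=0

    # 1. The simplest possible case: bind / onto / and run true.
    "$bwrap" --dev-bind / / true ||
        { echo "FAIL: --dev-bind / /" >&2; status=1; }

    # 2. Assumed follow-up case: read-only root with new /dev, /proc, /tmp.
    "$bwrap" --ro-bind / / --dev /dev --proc /proc --tmpfs /tmp true ||
        { echo "FAIL: --ro-bind / / with /dev, /proc, /tmp" >&2; status=1; }

    return "$status"
}
```

Usage: `bwrap_smoke_test && echo it works`. If only the second case fails, the problem is probably with mounting /proc or a tmpfs rather than with pivot_root itself.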

Also, what is creating the chroot you're referring to? Steam?

No, it's the equivalent of your use of LXC. Most Steam users run it on "the real system" (for which bubblewrap is fine), but some people run Steam in a LXC container, or in a Docker container, or in schroot (which uses chroot(2) internally), or some similar environment, and that doesn't currently work in all cases.

Similarly, most people who run bubblewrap for other purposes (Flatpak, WebKitGTK, libgnome-desktop, etc.) are running it on "the real system", but some people try to run it inside a LXC container, inside a Docker container, or in schroot, and that doesn't currently work.

bubblewrap does create a new mount namespace, but it needs to use either pivot_root(2) or chroot(2) to get its root directory to be the new root that it has created. It currently uses pivot_root(2) for that.

When Steam games run under the Steam container runtime or recent Proton versions, that is exactly a mount namespace, using bubblewrap. Steam doesn't use chroot or pivot_root itself.

@foresto

foresto commented Aug 12, 2021

As a step towards this, please try to get it to work with the LXC container being completely unconfined. If that can't work, then it definitely won't work with AppArmor restrictions.

Okay, I made another (unprivileged) lxc container and configured it with lxc.apparmor.profile = unconfined. Then, inside the container:

$ bwrap --dev-bind / / true && echo it works
it works

I think that means there should be a solution here, if we can find the minimal set of apparmor permissions (or bwrap changes) to get it working without the container running unconfined. Right?

What would you suggest next?

@foresto

foresto commented Sep 16, 2021

Progress!

The following steps got bwrap to work in an (unprivileged) lxc container:

  • Set this key in the lxc container profile:
    lxc.apparmor.profile = lxc-container-bwrap
  • cp /etc/apparmor.d/lxc/lxc-default-with-nesting /etc/apparmor.d/lxc/lxc-bwrap
  • Change the profile name in the new file from lxc-container-default-with-nesting to lxc-container-bwrap
  • Add these rules to the new profile:
    # bwrap support
    pivot_root oldroot=/tmp/oldroot/ /tmp/,
    pivot_root /newroot/,
    mount options=rbind /oldroot/ -> /newroot/,
    mount options=rbind /tmp/newroot/ -> /tmp/newroot/,
    mount options=(remount,bind,nosuid) options in (relatime) -> /newroot/{,**},
    mount options=rprivate -> /oldroot/,
    
  • If you want Steam's pressure-vessel to work, add these rules as well (and consider choosing a profile name like lxc-container-steam):
    # steam pressure-vessel bwrap support
    mount options=rbind -> /newroot/**,
    mount options=(remount,bind,nosuid,nodev) options in (noexec,relatime,ro) -> /newroot/{,**},
    
  • Load the new LXC AppArmor profile into the kernel:
    apparmor_parser -r /etc/apparmor.d/lxc-containers
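
The file-editing steps above could be scripted roughly as follows. This is a sketch that assumes the stock profile file defines a single profile and ends with a line containing only `}`; `make_bwrap_profile` is a hypothetical helper name. The inserted rules are the basic bwrap rules from the list, copied verbatim (the pressure-vessel rules are left out).

```shell
# Derive an lxc-container-bwrap profile from the stock nesting profile.
# $1: source profile file, $2: destination profile file
make_bwrap_profile() {
    awk '
        # Rename the profile wherever its name appears.
        { sub(/lxc-container-default-with-nesting/, "lxc-container-bwrap") }
        # Emit the extra rules just before the closing brace of the profile.
        NF == 1 && $1 == "}" {
            print "  # bwrap support"
            print "  pivot_root oldroot=/tmp/oldroot/ /tmp/,"
            print "  pivot_root /newroot/,"
            print "  mount options=rbind /oldroot/ -> /newroot/,"
            print "  mount options=rbind /tmp/newroot/ -> /tmp/newroot/,"
            print "  mount options=(remount,bind,nosuid) options in (relatime) -> /newroot/{,**},"
            print "  mount options=rprivate -> /oldroot/,"
        }
        { print }
    ' "$1" > "$2"
}
```

Usage, as root on the host: `make_bwrap_profile /etc/apparmor.d/lxc/lxc-default-with-nesting /etc/apparmor.d/lxc/lxc-bwrap`, followed by the `apparmor_parser -r` step. It is worth reading the generated file before loading it.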

I have only tested this on Ubuntu 20.04 LTS so far. Different LXC versions and configurations might require something different. This command is handy for watching AppArmor complaints on a distro that uses systemd: journalctl _TRANSPORT=audit _COMM=bwrap --follow
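
To turn that stream of complaints into a list of candidate rules, the denials can be grouped by operation and path. The sketch below assumes the usual AppArmor audit message layout, with `apparmor="DENIED"`, `operation="..."` and `name="..."` fields appearing in that order; `summarize_denials` is a hypothetical helper name, and the field layout should be checked against real log lines before relying on it.

```shell
# Read audit lines on stdin and print "count operation name" for each
# distinct AppArmor denial, most frequent first.
# Assumption: operation="..." precedes name="..." on each DENIED line.
summarize_denials() {
    grep 'apparmor="DENIED"' |
        sed -n 's/.* operation="\([^"]*\)".* name="\([^"]*\)".*/\1 \2/p' |
        sort | uniq -c | sort -rn
}
```

Usage: `journalctl _TRANSPORT=audit _COMM=bwrap --no-pager | summarize_denials`. Denials that carry no `name` field are silently skipped by this sketch.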

I haven't thought deeply about the security implications of these rules, but just seeing it work was an exciting step forward. It might be worth a review from the LXC maintainers, and maybe asking them to include a bubblewrap/steam AppArmor profile alongside the ones they already provide for nested containers and other common cases. This would relieve users from duplicating upstream policy files, and avoid falling out of sync with upstream changes.

Both bwrap and pressure-vessel do more {bind,re,}mount juggling than I would have expected. I tried to minimize the number of AppArmor rules required to accommodate it all, by leaning on globbing and the in conditional operator, but the result is still a bit more verbose than I would like and might not be as restrictive as it should be. I wonder if the bubblewrap and pressure-vessel maintainers could do anything differently to help with this.

@smcv
Collaborator

smcv commented Sep 16, 2021

I think if you were using bubblewrap non-trivially (actually changing processes' view of the filesystem), you would find that the rules you found that you needed for pressure-vessel are also necessary for most (all?) non-trivial uses of bubblewrap. bwrap --dev-bind / / true was intentionally the simplest possible test-case, which only has one user-specified bind-mount, / onto / (implementation detail: this ends up as /oldroot onto /newroot because of the way it works internally), but a less trivial bubblewrap invocation (like the ones done by Flatpak, or by libgnome-desktop's support for sandboxed thumbnailers) is just as complicated as the ones done by pressure-vessel.

I'm somewhat surprised you didn't also need to add options to mount a tmpfs below /newroot.

I wonder if the bubblewrap and pressure-vessel maintainers could do anything differently to help with this.

Everything pressure-vessel does, it does because there is some reason why it has to. I wish it could be simpler, but I don't think it can.

If you look into the code and commit history for either bubblewrap, pressure-vessel, or another user of bubblewrap such as Flatpak, I think you'll see that it's all there for a reason. Simpler would be better, but we do have to make things "as simple as possible, but no simpler".

@Maryse47

Maryse47 commented Sep 17, 2021

I haven't thought deeply about the security implications of these rules

I'm afraid pivot_root is a known bypass for AppArmor rules: https://bugs.launchpad.net/apparmor/+bug/1791711. Allowing it may therefore create a false sense of security.

Generally, something like bubblewrap that lets you arbitrarily manipulate filesystem paths cannot be protected by something like AppArmor, which relies on those same paths for its protection.

@foresto

foresto commented Sep 21, 2021

I'm afraid pivot_root is a known bypass for AppArmor rules: https://bugs.launchpad.net/apparmor/+bug/1791711. Allowing it may therefore create a false sense of security.

Thanks for pointing that out. Given that the current recommendation for running bubblewrap in LXC containers is to use lxc.apparmor.profile = unconfined, I don't think having a bwrap-specific AppArmor profile would be any worse. It would presumably be offered with the same cautions that come with unconfined, thus avoiding a false sense of security, and it should still enforce path-agnostic rules even if exploited, thus being better than running unconfined.

Also, as noted in comment #8 of that report, the AppArmor maintainers intend to fix the pivot_root problem.

In the meantime, I wonder if an AppArmor policy could be crafted that would allow pivot_root only for the container's /usr/bin/bwrap, meaning an attacker would first have to become uid 0 within the LXC container in order to exploit it.

@smcv
Collaborator

smcv commented Sep 22, 2021

I'm afraid pivot_root is a known bypass for AppArmor rules

It depends on the AppArmor rules and how they are being used.

If you're using AppArmor in a way that runs all programs in the same mount namespace and completely relies on path-based policies, like the traditional use of /etc/apparmor.d in openSUSE/Ubuntu/Debian to apply rules like "Firefox can run Evince" and "Evince can't read ~/.gnupg", then yes, any filesystem manipulation like pivot_root will defeat that.

If you're using AppArmor as part of lxc, as a way to prevent container breakout by blocking things like mount operations and other VFS manipulation, similar to the way Docker and Flatpak use seccomp, then the path-based parts of AppArmor are hopefully a lot less important, because anything the container shouldn't be able to access shouldn't be visible in the container's filesystem namespace at all. You can't read a file if there is no name you can give that will result in it being opened! That's how Docker and Flatpak manage to do access-control for files despite not using a path-based LSM like AppArmor: they build a filesystem namespace where only the allowed files exist.

@smcv
Collaborator

smcv commented Sep 22, 2021

I'm somewhat surprised you didn't also need to add options to mount a tmpfs below /newroot.

Ah, I see this is because /etc/apparmor.d/abstractions/lxc/container-base already allows that.

@Maryse47

If you're using AppArmor as part of lxc, as a way to prevent container breakout by blocking things like mount operations and other VFS manipulation, similar to the way Docker and Flatpak use seccomp, then the path-based parts of AppArmor are hopefully a lot less important, because anything the container shouldn't be able to access shouldn't be visible in the container's filesystem namespace at all

The thing is, LXC's AppArmor profiles rely on AppArmor blocking access to various sensitive files that are visible in the container namespace. With bubblewrap, you can mount those files at a different path in the container, which makes the AppArmor protection useless.

@smcv
Collaborator

smcv commented Sep 22, 2021

The thing is, LXC's AppArmor profiles rely on AppArmor blocking access to various sensitive files that are visible in the container namespace

If that's the case, then it is not possible to do this securely, and you'll need to either:

  • use a different container technology that does not rely on path-based access control (such as Docker or Podman) for container payloads that need the ability to rearrange the namespace; or
  • use a container technology that forbids namespace rearrangement but does allow limited application control over creation of extra containers in parallel (such as Flatpak "sub-sandboxing", which pressure-vessel has specific code to make use of); or
  • use a weaker AppArmor profile, and live with the fact that if you get root in the container, you get root in real life (sometimes summarized as "containers don't contain")

Flatpak protects a few sensitive files in /proc//sys by mounting an inaccessible read-only file over the top, and I think Docker does the same, but it looks as though that approach wouldn't work for lxc, because /etc/apparmor.d/abstractions/lxc/container-base allows umount.

If you are using lxc to confine Steam, I'd recommend the unofficial Flatpak app as an alternative to that. It has some limitations that make Valve hesitant to recommend it in general or treat it as official, but running Steam inside lxc will already have most of those limitations anyway. Using fast user switching to run Steam as an unprivileged user, either on its own or combined with Flatpak, is another way Steam can be given fewer privileges.

@stgraber

Feels like a bunch of the issues here may have been related to the direct use of LXC rather than something like LXD which knows to configure a variety of LXC features to make nesting work nicely.

Here is an example on LXD which seems to behave just fine:

stgraber@castiana:~$ lxc launch images:ubuntu/21.10 u1 -c security.nesting=true
Creating u1
Starting u1
stgraber@castiana:~$ lxc exec u1 bash
root@u1:~# apt install bubblewrap
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  bubblewrap
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 39.9 kB of archives.
After this operation, 116 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu impish/main amd64 bubblewrap amd64 0.4.1-3 [39.9 kB]
Fetched 39.9 kB in 0s (232 kB/s)   
debconf: unable to initialize frontend: Dialog
debconf: (Dialog frontend requires a screen at least 13 lines tall and 31 columns wide.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (Can't locate Term/ReadLine.pm in @INC (you may need to install the Term::ReadLine module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.32.1 /usr/local/share/perl/5.32.1 /usr/lib/x86_64-linux-gnu/perl5/5.32 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl-base /usr/lib/x86_64-linux-gnu/perl/5.32 /usr/share/perl/5.32 /usr/local/lib/site_perl) at /usr/share/perl5/Debconf/FrontEnd/Readline.pm line 7, <> line 1.)
debconf: falling back to frontend: Teletype
Selecting previously unselected package bubblewrap.
(Reading database ... 14498 files and directories currently installed.)
Preparing to unpack .../bubblewrap_0.4.1-3_amd64.deb ...
Unpacking bubblewrap (0.4.1-3) ...
Setting up bubblewrap (0.4.1-3) ...
sysctl: permission denied on key "kernel.unprivileged_userns_clone", ignoring
root@u1:~# vi /usr/local/bin/bwshell
root@u1:~# chmod +x /usr/local/bin/bwshell
root@u1:~# su - ubuntu -s /usr/local/bin/bwshell
bwrap-demo$ 
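
The session above edits /usr/local/bin/bwshell without showing its contents. A minimal sketch of what such a wrapper might contain follows. Everything in it is an assumption: the bind mounts presume a merged-/usr layout like Ubuntu's, the `BWRAP` override variable is hypothetical, and the prompt string is chosen only to match the one seen in the log.

```shell
# Hypothetical body for a bwshell login wrapper: start /bin/sh inside a
# bubblewrap sandbox that exposes a read-only system and the user's home.
bwshell() {
    "${BWRAP:-bwrap}" \
        --ro-bind /usr /usr \
        --ro-bind /etc /etc \
        --symlink usr/bin /bin \
        --symlink usr/sbin /sbin \
        --symlink usr/lib /lib \
        --symlink usr/lib64 /lib64 \
        --proc /proc \
        --dev /dev \
        --tmpfs /tmp \
        --bind "$HOME" "$HOME" \
        --chdir "$HOME" \
        --unshare-pid \
        --die-with-parent \
        --setenv PS1 'bwrap-demo$ ' \
        /bin/sh "$@"
}
```

Saved as an executable script with a `#!/bin/sh` line and a final `bwshell "$@"`, it could be used as in the log: `su - ubuntu -s /usr/local/bin/bwshell`. Which directories to expose, and whether to share the network or the whole home directory, depends entirely on what the jailed user is supposed to reach.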

Everything that LXD does can of course be done by hand with LXC, but that may require quite a lot of care, especially when dealing with something as tricky as nesting. The exact config may also be quite dependent on the exact set of kernel features and host OS configuration.

In this case, the main things to get right are:

  • Run the LXC container unprivileged; privileged containers cannot do any of this safely at all and have a bunch of restrictions around nesting
  • /proc, /sys and /dev setup modes
  • Use AppArmor namespacing instead of a simple nesting profile (allows full use of AppArmor in the container)
  • Relax host side apparmor profile to allow nesting (primarily mounting new instances of /proc, /sys and pivot)
  • Pre-mount hidden non-overmounted copies of /proc and /sys (to keep the kernel's overmounting protection from kicking in)
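
For a hand-written LXC config, a rough check for the settings that seem to correspond to this list might look like the sketch below. The mapping is an assumption based on lxc.container.conf(5) rather than something stated in this thread: `lxc.idmap` for unprivileged operation, `lxc.mount.auto` for the /proc and /sys modes, and `lxc.apparmor.profile` together with `lxc.apparmor.allow_nesting` for the AppArmor side. `missing_nesting_keys` is a hypothetical helper name, and the presence of a key says nothing about whether its value is suitable.

```shell
# Print each nesting-related key that has no assignment in the given
# LXC container config file ($1).
# Note: the dots in the key names are treated as regex wildcards, which is
# loose but harmless for this purpose.
missing_nesting_keys() {
    config=$1
    for key in lxc.idmap lxc.apparmor.profile \
               lxc.apparmor.allow_nesting lxc.mount.auto; do
        grep -q "^[[:space:]]*$key[[:space:]]*=" "$config" || echo "$key"
    done
}
```

Usage: `missing_nesting_keys /path/to/container/config`. An empty result only means the keys are present; whether the kernel and host AppArmor setup honour them still has to be tested with an actual bwrap run.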

@foresto

foresto commented Sep 29, 2021

I can't speak for @smtalk, who opened this issue, but I can respond to these suggestions with respect to my own use case...

@smcv suggested:

If you are using lxc to confine Steam, I'd recommend the unofficial Flatpak app as an alternative to that.

AFAICT, Flatpak allows images to define their own security policy, which is a curious choice that effectively makes externally-built ones not confined at all; I think I would have to either build my own Steam flatpaks (a hassle) or diligently review and override the permissions of every community-built release (more hassle). Also, Flatpak doesn't seem to make modifying a runtime environment or running multiple applications in the same container particularly easy, so operations like that would mean still more hassles compared to lxc.

On the other hand, perhaps those ongoing hassles would be tolerable (at least until AppArmor fixes the pivot_root issue) if accepting them meant my Steam games would work again.

The possible blocker that comes to mind is ALSA. Last time I considered Flatpak for Steam, it didn't support ALSA-only systems without resorting to --device=all, but I see they finally merged a fix for that (flatpak/flatpak#3663). Last time I tried the Freedesktop runtime, on which I think the Steam flatpak depends, some of its ALSA libs were broken, so nontrivial ALSA functionality didn't work even with a patched Flatpak. Maybe that has been fixed by now as well?

I suppose I could give it another try, and at least get an updated view of what problems remain.

Does the Steam flatpak require a particular version of Flatpak, or a particular version of bubblewrap on the host system?

@foresto

foresto commented Sep 29, 2021

@stgraber suggested:

Feels like a bunch of the issues here may have been related to the direct use of LXC rather than something like LXD which knows to configure a variety of LXC features to make nesting work nicely.

As far as I know, LXD didn't exist when I started using Steam in an LXC container. (Inspired by your blog, by the way.) At least, I wasn't aware of its existence back then.

I have considered moving to LXD, but as I recall, I was told it had no way for an unprivileged container to run on (a subtree of) the host's filesystem. I use that functionality in LXC. It makes a number of things convenient and efficient, such as using GUI tools to manage the container's files without having to install GUI tools in the container, and allowing a contained program to communicate with a program on the host via a unix domain socket. It was probably a couple of years ago when I last asked, though; has this filesystem limitation been lifted since then?

If not, are the steps that LXD takes to make LXC nesting work nicely available in a format that could be (relatively easily) copied to an LXC-only system?

@gitfan2

gitfan2 commented Nov 15, 2021

@foresto suggested:

It makes a number of things convenient and efficient, such as using GUI tools to manage the container's files without having to install GUI tools in the container.

Sorry to barge in on an unrelated matter: would you mind naming the GUI tools you use to manage the container's files?

@foresto

foresto commented Nov 15, 2021

Sorry to barge in on an unrelated matter: would you mind naming the GUI tools you use to manage the container's files?

When the guest's filesystem is a subtree of the host's (which LXC allows), all the software running on the host system can directly access guest files. That includes shells, scripts, desktop file managers, save game editors... everything.

@gitfan2

gitfan2 commented Jul 26, 2022

A quick note: bubblewrap works perfectly in a "nested LXC" in Proxmox 6.4.
Obviously, it may or may not work in other virtualisation environments.
Thanks to @smtalk, the OP, for his invaluable guidance in resolving the issue.

@foresto

foresto commented Jul 26, 2022

@gitfan2 For the sake of others who find their way here, what steps were involved in that guidance?

@gitfan2

gitfan2 commented Nov 15, 2022

@foresto Essentially, Proxmox is creating the structure in LXC that enables the nesting and is generating an apparmor profile to support it. Here are the relevant lines from the profile.

### Configuration: nesting
pivot_root,
ptrace,
signal,

deny /dev/.lxc/proc/** rw,
deny /dev/.lxc/sys/** rw,

mount fstype=proc -> /usr/lib/*/lxc/**,
mount fstype=sysfs -> /usr/lib/*/lxc/**,

# TODO: There doesn't seem to be a way to ask for:
# mount options=(ro,nosuid,nodev,noexec,remount,bind),
# as we always get mount to $cdir/proc/sys with those flags denied
# So allow all mounts until that is straightened out:
mount,

Conclusion: In a nested LXC in Proxmox 6.4, bubblewrap 0.5+ works flawlessly and creates the required sandbox for users.

In the same apparmor profile, there's an additional section for nesting LXDs.

# Allow nested LXD
mount none -> /var/lib/lxd/shmounts/,
mount /var/lib/lxd/shmounts/ -> /var/lib/lxd/shmounts/,
mount options=bind /var/lib/lxd/shmounts/** -> /var/lib/lxd/**,

@smcv
Collaborator

smcv commented Nov 15, 2022

I don't think there's an actionable request for a change to be made in bwrap here, so I'm closing this issue.

@smcv closed this as not planned on Nov 15, 2022
@rcarmo

rcarmo commented Jan 31, 2024

I'd like to know if someone has a solution for this -- running flatpak/bwrap inside an LXC sandbox is something I would like to be able to do (I was able to do it in LXD using its nesting feature, but not in LXC, even with the apparmor changes in this discussion).
