Migrate network-setup to nftables and improve it into a better state #4877

Closed
wants to merge 77 commits into from
Changes from 6 commits
Commits
a8cd106
Migrate network-setup to nftables except for cleaning up
kanpov Oct 28, 2024
55b0586
Update "Cleaning Up"
kanpov Oct 28, 2024
c8fba11
Add guest ip boot arg section and multiple guests section
kanpov Oct 28, 2024
5582ddf
Slightly alter Multiple guests section, add IPv6 section
kanpov Oct 28, 2024
64e1eb8
Minor factual correction
kanpov Oct 29, 2024
5154147
Remove IPv6 section due to its fragility
kanpov Oct 29, 2024
24c2ee6
Update test_deflate_on_oom test
JackThomson2 Oct 29, 2024
73c2562
build(deps): Bump the firecracker group across 1 directory with 22 up…
dependabot[bot] Oct 29, 2024
e41f4af
chore: adjust to api changes in rust-vmm crates
roypat Oct 29, 2024
9c1f03e
test: add Ubuntu 24.10 to popular test
pb8o Oct 29, 2024
889b13d
test: drop support for unsupported versions
pb8o Oct 21, 2024
cead6be
feat(IovDeque): configurable queue len
ShadowCurse Oct 28, 2024
fa26074
feat(virtio-net): increase max queue size to 512
ShadowCurse Oct 28, 2024
b65d236
chore: update CHANGELOG with virtio-net changes
ShadowCurse Oct 30, 2024
f07a345
Apply recommendations
kanpov Nov 1, 2024
7cffa67
Remove redundant mention of bridge-based routing, clarify on the Adva…
kanpov Nov 1, 2024
f87d9a3
feat(gdb): Support config over api
JackThomson2 Oct 22, 2024
ef05c4f
Add changelog entry for GDB debugging
JackThomson2 Oct 22, 2024
55d676f
test: refactor: Simplify CpuMap._cpus
roypat Oct 28, 2024
52aa865
test: replace ventored chdir context manager with contextlib
roypat Oct 29, 2024
50979d0
test: do not set host_os dimension to `None`
roypat Oct 29, 2024
65151e9
test: fix: stop doing PR A/B-tests across host commands
roypat Oct 28, 2024
e8d57ee
test: ab: Add function for A/B-Tests across precompiled binaries
roypat Oct 29, 2024
24f588a
test: ab: operate on directories instead of commit SHAs
roypat Oct 28, 2024
0286a91
devtool: Add flag to build to allow compiling arbitrary revisions
roypat Oct 29, 2024
0dea3fe
doc: Update A/B-testing documentation
roypat Oct 29, 2024
9b30c64
test: stop compiling firecracker inside A/B-tests
roypat Oct 29, 2024
8a9bda6
test: do pre-PR A/B-test checkout into temporary directory
roypat Oct 29, 2024
a95c7ed
test: remove `@tag` parsing from `record_props` fixture
roypat Oct 30, 2024
77fc94c
test: use pytest.raises in `test_empty_jailer_id`
roypat Oct 30, 2024
352fca1
test(aarch64): add host vs guest cpu feature test
ShadowCurse Oct 30, 2024
20fb58b
test(x86_64): add host vs guest cpu feature test
ShadowCurse Nov 1, 2024
6b0e167
chore(tests): rename file with cpu feature tests for x86_64
ShadowCurse Nov 4, 2024
290145d
build(deps): Bump the firecracker group with 7 updates
dependabot[bot] Nov 4, 2024
16b4474
chore: Update release policy
JackThomson2 Nov 6, 2024
2308c83
net: revert virtio-net queue size to 256
bchalios Nov 6, 2024
b8df9de
chore: bump version to 1.11.0-dev
JackThomson2 Nov 7, 2024
3996275
test(net): check output in test_high_ingress_traffic
kalyazin Nov 7, 2024
87fae71
test(net): use iperf3-vsock in test_high_ingress_traffic
kalyazin Nov 7, 2024
21c1983
fix(test): Handle ssbs correctly in host/guest feature comparison
zulinx86 Nov 7, 2024
c15931d
fix(test): Remove flush_l1d from host/guest feature diff on kernel v6.4+
zulinx86 Nov 7, 2024
0c512f7
fix(test): Handle invpcid_single in guest/host feature comparison
zulinx86 Nov 7, 2024
630a49e
chore: Update changelog with v1.10.0 section
JackThomson2 Nov 8, 2024
3936447
chore(test): Double refill time for RX rate limiter
zulinx86 Nov 8, 2024
dc88ba1
fix(net): use correct constant for preallocation
ShadowCurse Nov 4, 2024
ff8fbe1
fix(iovec): update default used constants
ShadowCurse Nov 4, 2024
6b5a70d
chore: Bump snapshot version
JackThomson2 Nov 12, 2024
851ed01
build(deps): Bump the firecracker group with 10 updates
dependabot[bot] Nov 11, 2024
b04ba30
fix: Adjust for thiserror 2.0
roypat Nov 12, 2024
5299f8a
chore: update PR checklist
Manciukic Nov 12, 2024
ecd35e7
test: test ARM CPU templates in Linux host 5.10
pb8o Nov 13, 2024
14fed3f
chore: Update to v1.10.1 patch
JackThomson2 Nov 13, 2024
69a6aec
snapshot: Remove max_connections and max_pending_resets fields
zulinx86 Nov 13, 2024
3fdd15a
test(mmds): Do not use MmdsNetworkStack::new() in tests
zulinx86 Nov 13, 2024
316a0ae
chore: Clarify user action
zulinx86 Nov 14, 2024
df61998
ci: generate ext4 image after downloading artifacts
pb8o Sep 6, 2024
e1c7a28
ci: build debug kernels
pb8o Sep 17, 2024
ca21cd9
ci: compress squashfs with zstd
pb8o Sep 6, 2024
4cb86c5
fix: workaround socat 1.8.0 bug
pb8o Oct 17, 2024
fc7da1d
chore(rootfs): update rootfs to Ubuntu 24.04
pb8o Jun 20, 2024
98acad6
ci: generate SSH key after downloading artifacts
pb8o Oct 9, 2024
c777491
tests: add Microvm.ssh.Popen command
pb8o Jun 19, 2024
c292c84
devctr: add trace-cmd
pb8o Oct 16, 2024
e0906a5
tests: add a trace-cmd helper
pb8o Oct 16, 2024
58ca732
ci: use new CI rootfs 24.04
pb8o Oct 17, 2024
40ecc8f
ci: move create_snapshot_artifact to a test
pb8o Oct 18, 2024
c6b34cc
ci: reduce storage of snapshots in cross-restore test
pb8o Oct 18, 2024
f2aced7
doc: fix downloading kernel for ARM instances
pb8o Oct 21, 2024
65d3f1f
test: drop ubuntu version from rootfs fixture name
pb8o Oct 23, 2024
d9b85d9
devctr: add zstd
pb8o Oct 29, 2024
fdcaf14
test: add debug information to debug kernels
pb8o Oct 29, 2024
57075cf
devctr: pin cargo-deny until we upgrade Rust version
pb8o Nov 18, 2024
e5d4de3
build(deps): Bump the firecracker group with 13 updates
dependabot[bot] Nov 18, 2024
415fbf0
build(deps): Bump aiohttp from 3.10.5 to 3.10.11 in /tools/devctr
dependabot[bot] Nov 18, 2024
88b59f6
fix: devtool install --path broken
mpbb Nov 11, 2024
2d0bbc6
ci: infer instance architecture from a heuristic
pb8o Nov 20, 2024
bf71223
Fix style issues
kanpov Nov 20, 2024
247 changes: 188 additions & 59 deletions docs/network-setup.md
@@ -1,51 +1,76 @@
# Getting Started Firecracker Network Setup

This is a very simple quick-start guide to getting a Firecracker guest connected
to the network. If you're using Firecracker in production, or even want to run
multiple guests, you'll need to adapt this setup.
This is a simple quick-start guide to getting one or more Firecracker microVMs
connected to the Internet via the host. If you run a production setup, you should
consider modifying this setup to accommodate your specific needs.

**Note** Currently firecracker supports only TUN/TAP network backend with no
**Note:** Currently, Firecracker supports only a TUN/TAP network backend with no
multi queue support.

The simple steps in this guide assume that your internet-facing interface is
`eth0`, you have nothing else using `tap0` and no other `iptables` rules. Check
out the *Advanced:* sections if that doesn't work for you.
The steps in this guide assume `eth0` to be your Internet-facing network interface
on the host. If `eth0` isn't your main network interface, you should change the
value to the correct one in the commands below. IPv4 is also assumed to be used,
so you will need to adapt the instructions accordingly to support IPv6.

## On The Host
To run multiple microVMs with this approach, check out the
_Advanced: Multiple guests_ section.

The first step on the host is to create a `tap` device:
The `nftables` Linux firewall with the `nft` command should be used instead of
`iptables`, since `iptables` and the associated tools are
[no longer recommended](https://access.redhat.com/solutions/6739041) for use on
production Linux systems.

## On the Host

The first step on the host for any microVM is to create a Linux `tap` device, which Firecracker
will use for networking.

For this setup, only two IP addresses will be necessary - one for the `tap` device and one for
the guest itself, through which you will, for example, `ssh` into the guest. So, we'll choose the
smallest IPv4 subnet needed for 2 addresses: `/30`. For this VM, let's use the `172.16.0.1` `tap` IP
and the `172.16.0.2` guest IP.

```bash
# Create the tap device.
sudo ip tuntap add tap0 mode tap
# Assign it the tap IP and start up the device.
sudo ip addr add 172.16.0.1/30 dev tap0
sudo ip link set tap0 up
```

Then you have a few options for routing traffic out of the tap device, through
your host's network interface. One option is NAT, set up like this:
We'll use **NAT** for routing packets from the TAP device to `eth0` - you might want to consider
a bridge interface instead in order to connect the guest to your local network (LAN), for which
you can check out the _Advanced: Bridge-based routing_ section.

Firstly, we'll need to enable IPv4 forwarding on the system.
```bash
sudo ip addr add 172.16.0.1/24 dev tap0
sudo ip link set tap0 up
sudo sh -c "echo 1 > /proc/sys/net/ipv4/ip_forward"
sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
sudo iptables -A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
sudo iptables -A FORWARD -i tap0 -o eth0 -j ACCEPT
```

*Note:* The IP of the TAP device should be chosen such that it's not in the same
subnet as the IP address of the host.

*Advanced:* If you are running multiple Firecracker MicroVMs in parallel, or
have something else on your system using `tap0` then you need to create a `tap`
for each one, with a unique name.
Then, we'll need an nftables table for our routing needs, and 2 chains inside that table: one
for NAT at the `postrouting` stage, and another one for filtering at the `forward` stage:
```bash
sudo nft add table firecracker
sudo nft 'add chain firecracker postrouting { type nat hook postrouting priority srcnat; policy accept; }'
sudo nft 'add chain firecracker filter { type filter hook forward priority filter; policy accept; }'
```

*Advanced:* You also need to do the `iptables` set up for each new `tap`. If you
have `iptables` rules you care about on your host, you may want to save those
rules before starting.
The first rule we'll need will masquerade packets from the guest IP as if they came from the
host's IP, by changing the source IP address of these packets:
```bash
sudo nft add rule firecracker postrouting ip saddr 172.16.0.2 oifname eth0 counter masquerade
```

The second rule we'll need will accept packets that arrive on the `tap0` interface (the guest uses
the tap IP as its gateway, so its outbound packets enter the host through the tap device) and are
forwarded out through the host network interface:
```bash
sudo iptables-save > iptables.rules.old
sudo nft add rule firecracker filter iifname tap0 oifname eth0 accept
```

**Note:** The IP of the TAP device should be chosen such that it's not in the same
subnet as the IP address of the host.

## Setting Up Firecracker

Before starting the guest, configure the network interface using Firecracker's
@@ -85,14 +110,20 @@ configuration file like this:
```

Alternatively, if you are using firectl, add
--tap-device=tap0/06:00:AC:10:00:02\` to your command line.
`--tap-device=tap0/06:00:AC:10:00:02` to your command line.

## In The Guest

Once you have booted the guest, bring up networking within the guest:
Once you have booted the guest, it will have its networking interface with the
name specified by `iface_id` in the Firecracker configuration.

You'll now need to assign the guest its IP, activate the guest's networking
interface and set up the `tap` IP as the guest's gateway address, so that packets
are routed through the `tap` device, where they are then picked up by the setup
on the host prepared before:

```bash
ip addr add 172.16.0.2/24 dev eth0
ip addr add 172.16.0.2/30 dev eth0
ip link set eth0 up
ip route add default via 172.16.0.1 dev eth0
```
@@ -107,7 +138,121 @@ your environment. For testing, you can add a public DNS server to
nameserver 8.8.8.8
```

## \[Advanced\] Setting Up a Bridge Interface
**Note:** Sometimes, it's undesirable to have `iproute2` (providing the `ip` command)
installed on your guest OS, or you simply want to have these steps be performed
automatically. To do this, check out the
_Advanced: Guest network configuration at kernel level_ section.

## Cleaning up

The first step to cleaning up is to delete the tap device on the host:

```bash
sudo ip link del tap0
```

You'll then want to delete the two nftables rules for NAT routing from the
`postrouting` and `filter` chains. To do this with nftables, you'll need to
look up the _handles_ (identifiers) of these rules by running:

```bash
sudo nft -a list ruleset
```

Now, find the `# handle` comments next to the two rules and delete the rules by
their handles. For example, if the handle of the masquerade rule is 1 and the
handle of the other rule is 2:
```bash
sudo nft delete rule firecracker postrouting handle 1
sudo nft delete rule firecracker filter handle 2
```
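
The handle lookup can also be scripted. The following is a minimal sketch, not part of the original guide: `rule_handle` is a hypothetical helper that reads `nft -a list ...` output on stdin and prints the handle of the first rule whose text contains a given pattern. It assumes each rule line ends in `# handle N`. Matching on the guest IP rather than on an interface name avoids depending on how `nft` quotes interface names in its output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the handle of the first listed rule containing a pattern.
# Reads `nft -a list ...` output on stdin; assumes rule lines end in "# handle N".
rule_handle() {
    local pattern="$1"
    awk -v pat="$pattern" 'index($0, pat) && /# handle [0-9]+$/ { print $NF; exit }'
}

# Usage (commented out: requires root and an existing `firecracker` table):
# handle=$(sudo nft -a list chain firecracker postrouting | rule_handle "ip saddr 172.16.0.2")
# sudo nft delete rule firecracker postrouting handle "$handle"
```

Check the printed handle against the `nft -a list ruleset` output before deleting anything, since a pattern that is too short may match a different rule.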

_Advanced:_ If you created a bridge interface, delete it using the following:
```bash
sudo ip link del br0
```

Run the following steps only **if you have no more guests** running on the host:

Set IPv4 forwarding back to disabled:
```bash
sudo sh -c "echo 0 > /proc/sys/net/ipv4/ip_forward" # usually the default
```

Delete the `firecracker` nftables table to revert your nftables configuration
fully back to its initial state:
```bash
sudo nft delete table firecracker
```

## Advanced: Multiple guests

To configure multiple guests, we will only need to repeat some of the steps in this setup
for each of the microVMs:

1. Each microVM has its own subnet and the two IP addresses inside of it: the `tap` IP and
the guest IP.
2. Each microVM has its own two nftables rules for masquerading and forwarding, while the same
table and two chains can be shared between the microVMs.
3. Each microVM has its own routing configuration inside the guest itself (achieved through
`iproute2` or the method described in the _Advanced: Guest network configuration at kernel level_
section).

To give a more concrete example, **let's add a second microVM** to the one you've already configured:

Let's assume we allocate /30 subnets in the 172.16.0.0/16 range sequentially to give out as
few addresses as needed.

The next /30 subnet in the 172.16.0.0/16 range will give us these two IPs: 172.16.0.5 as the
`tap` IP and 172.16.0.6 as the guest IP.

Our new `tap` device will, sequentially, have the name `tap1`:
```bash
sudo ip tuntap add tap1 mode tap
sudo ip addr add 172.16.0.5/30 dev tap1
sudo ip link set tap1 up
```

Now, let's add the two new nftables rules, also with the new values:
```bash
sudo nft add rule firecracker postrouting ip saddr 172.16.0.6 oifname eth0 counter masquerade
sudo nft add rule firecracker filter iifname tap1 oifname eth0 accept
```

Modify your Firecracker configuration with the `host_dev_name` now being `tap1` instead of `tap0`,
boot up the guest and perform the routing inside of it like so, changing the guest IP and `tap` IP:
```bash
ip addr add 172.16.0.6/30 dev eth0
ip link set eth0 up
ip route add default via 172.16.0.5 dev eth0
```

Or, you can use the setup from _Advanced: Guest network configuration at kernel level_ by simply
changing the G and T variables, i.e. the guest IP and `tap` IP.
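
Since the host-side steps are the same for every additional guest, they can be generated from three values. This is a hedged sketch: `guest_host_commands` is a hypothetical helper that only prints the commands used in this section (it does not run them), and it assumes the `firecracker` table and its two chains already exist and that every guest gets a /30 subnet.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) the host-side commands for one additional guest.
# Arguments: tap device name, tap IP, guest IP; the host interface defaults to eth0.
guest_host_commands() {
    local tap="$1" tap_ip="$2" guest_ip="$3" host_iface="${4:-eth0}"
    cat <<EOF
ip tuntap add $tap mode tap
ip addr add $tap_ip/30 dev $tap
ip link set $tap up
nft add rule firecracker postrouting ip saddr $guest_ip oifname $host_iface counter masquerade
nft add rule firecracker filter iifname $tap oifname $host_iface accept
EOF
}

# Print the commands for the second microVM from the example above.
guest_host_commands tap1 172.16.0.5 172.16.0.6
```

After reviewing the printed commands, they can be piped to a root shell, for example `guest_host_commands tap1 172.16.0.5 172.16.0.6 | sudo sh`.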

**Note:** if you'd like to calculate the guest and `tap` IPs using the sequential subnet allocation
method that has been used here, you can use the following formulas specific to IPv4 addresses:

`tap` IP = `172.16.[(A*O+1)/256].[(A*O+1)%256]`.

Guest IP = `172.16.[(A*O+2)/256].[(A*O+2)%256]`.

Round down the division and replace `A` with the number of IP addresses inside your subnet (for a
/30 subnet, that will be 4 addresses, for example) and replace `O` with the sequential number of
your microVM, starting at 0. You can replace `172.16` with any other values that fit between
1 and 255 as usual with an IPv4 address.

For example, let's calculate the addresses of the 1000-th microVM with a /30 subnet in
the `172.16.0.0/16` range:

`tap` IP = `172.16.[(4*999+1)/256].[(4*999+1)%256]` = `172.16.15.157`.

Guest IP = `172.16.[(4*999+2)/256].[(4*999+2)%256]` = `172.16.15.158`.

This allocation setup has been used successfully in the `firecracker-demo` project for launching several
thousand microVMs on the same host: [relevant lines](https://github.com/firecracker-microvm/firecracker-demo/blob/63717c6e7fbd277bdec8e26a5533d53544a760bb/start-firecracker.sh#L45).
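
The formulas above translate directly into shell arithmetic. The sketch below is illustrative only: `microvm_ips` is a hypothetical function name, the `172.16` prefix is hard-coded, and the subnet size defaults to 4 (a /30).

```bash
#!/usr/bin/env bash
# Hypothetical helper implementing the sequential allocation formulas above.
# $1 = O, the zero-based index of the microVM; $2 = A, addresses per subnet (default 4, a /30).
# For a /30 inside 172.16.0.0/16 the index must stay below 16384 so the third octet fits in 0-255.
microvm_ips() {
    local index="$1" subnet_size="${2:-4}"
    local tap_offset=$(( subnet_size * index + 1 ))
    local guest_offset=$(( subnet_size * index + 2 ))
    # Shell integer division truncates, which matches "round down" for non-negative values.
    echo "172.16.$(( tap_offset / 256 )).$(( tap_offset % 256 )) 172.16.$(( guest_offset / 256 )).$(( guest_offset % 256 ))"
}

microvm_ips 1     # second microVM:  172.16.0.5 172.16.0.6
microvm_ips 999   # 1000th microVM:  172.16.15.157 172.16.15.158
```

The function prints the `tap` IP followed by the guest IP, so the two values can be read with `read -r tap_ip guest_ip < <(microvm_ips 999)` in bash.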

## Advanced: Bridge-based routing

### On The Host

@@ -184,36 +329,20 @@ nameserver 8.8.8.8
nameserver 192.168.1.1
```

## Cleaning up

The first step to cleaning up is deleting the tap device:

```bash
sudo ip link del tap0
```

If you don't have anything else using `iptables` on your machine, clean up those
rules:
## Advanced: Guest network configuration at kernel level

```bash
sudo iptables -F
sudo sh -c "echo 0 > /proc/sys/net/ipv4/ip_forward" # usually the default
```

If you have an existing iptables setup, you'll want to be more careful about
cleaning up.

*Advanced:* If you saved your iptables rules in the first step, then you can
restore them like this:
The Linux kernel supports an `ip` command-line argument that can be passed to it at boot.
Boot arguments in Firecracker are configured in the `boot_args` property of the boot source
(`boot-source` object in the JSON configuration or the equivalent endpoint in the API server).

```bash
if [ -f iptables.rules.old ]; then
sudo iptables-restore < iptables.rules.old
fi
```
The value of the `ip` argument for our setup will be of this format:
`G::T:GM::GI:off`. G is the guest IP (without the subnet), T is the `tap` IP (without the subnet),
GM is the netmask of the guest subnet in dotted-decimal ("long") form and GI is the name of the guest network interface.

*Advanced:* If you created a bridge interface, delete it using the following:
Substituting our values, we get: `ip=172.16.0.2::172.16.0.1:255.255.255.252::eth0:off`. Insert this
at the end of your boot arguments for your microVM, and the guest Linux kernel will automatically
perform the routing configuration done in the _In The Guest_ section without needing `iproute2`
installed in the guest. This argument doesn't configure DNS, however.

```bash
sudo ip link del br0
```
As soon as you boot the guest, it will already be connected to the network (assuming you have
correctly performed the other steps).
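
To avoid writing the netmask by hand, the `ip=` value can be assembled from the guest IP, the `tap` IP and the prefix length. This is a minimal sketch under stated assumptions: `prefix_to_netmask` and `ip_boot_arg` are hypothetical helper names, and the guest interface is assumed to be `eth0` unless a fourth argument is given.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a prefix length (e.g. 30) to a dotted-decimal netmask.
prefix_to_netmask() {
    local prefix="$1"
    local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    echo "$(( (mask >> 24) & 255 )).$(( (mask >> 16) & 255 )).$(( (mask >> 8) & 255 )).$(( mask & 255 ))"
}

# Hypothetical helper: build the `ip=` argument in the G::T:GM::GI:off format.
# Arguments: guest IP, tap IP, prefix length, guest interface name (default eth0).
ip_boot_arg() {
    local guest_ip="$1" tap_ip="$2" prefix="$3" guest_iface="${4:-eth0}"
    echo "ip=${guest_ip}::${tap_ip}:$(prefix_to_netmask "$prefix")::${guest_iface}:off"
}

ip_boot_arg 172.16.0.2 172.16.0.1 30
# prints: ip=172.16.0.2::172.16.0.1:255.255.255.252::eth0:off
```

The printed string can then be appended to the existing `boot_args` value of the boot source.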