Netlink tile #4049

Draft: wants to merge 6 commits into base: main
7 changes: 7 additions & 0 deletions book/.vitepress/config.mts
@@ -52,6 +52,13 @@ export default defineConfig({
{ text: 'Troubleshooting', link: 'troubleshooting' },
{ text: 'Frequently Asked Questions', link: 'faq' },
]
},
{
text: 'Internals',
collapsed: false,
items: [
{ text: 'Netlink', link: 'netlink' },
]
}
] },

15 changes: 15 additions & 0 deletions book/api/metrics-generated.md
@@ -462,3 +462,18 @@
| gossip_​gossip_​peer_​counts_​total | `gauge` | Number of gossip peers tracked (Total Peers Detected) |
| gossip_​gossip_​peer_​counts_​active | `gauge` | Number of gossip peers tracked (Active) |
| gossip_​gossip_​peer_​counts_​inactive | `gauge` | Number of gossip peers tracked (Inactive) |

## Netlnk Tile
| Metric | Type | Description |
|--------|------|-------------|
| netlnk_​drop_​events | `counter` | Number of netlink drop events caught |
| netlnk_​link_​full_​syncs | `counter` | Number of full link table syncs done |
| netlnk_​route_​full_​syncs | `counter` | Number of full route table syncs done |
| netlnk_​updates_​link | `counter` | Number of netlink live updates processed (Link) |
| netlnk_​updates_​neigh | `counter` | Number of netlink live updates processed (Neighbor Table Entry) |
| netlnk_​updates_​ipv4_​route | `counter` | Number of netlink live updates processed (IPv4 Route Table Entry) |
| netlnk_​interface_​count | `gauge` | Number of network interfaces |
| netlnk_​route_​count_​local | `gauge` | Number of IPv4 routes (Local) |
| netlnk_​route_​count_​main | `gauge` | Number of IPv4 routes (Main) |
| netlnk_​neighbor_​solicits_​sent | `counter` | Number of neighbor solicit requests sent to kernel |
| netlnk_​neighbor_​solicits_​fails | `counter` | Number of neighbor solicit requests that failed to send |
111 changes: 111 additions & 0 deletions book/guide/netlink.md
@@ -0,0 +1,111 @@
# Netlink Integration

## Summary

Firedancer's userland networking stack sources its configuration from
netlink to allow mostly zero-config interoperability with Linux.

This contrasts with other fast networking stacks which typically require
complex network configuration or a dedicated IP address.

The following describes the netlink integration in detail.

## Tile Overview

Firedancer uses XDP for fast networking. This means that some packet
processing steps traditionally done in the kernel (with UDP sockets) now
have to be done in the Firedancer software, specifically routing and
resolving link-level neighbors.

The required information in these steps is requested from the kernel via
the [rtnetlink API](https://man7.org/linux/man-pages/man7/rtnetlink.7.html).
Doing all netlink requests in the data path (i.e. in the net tile) bears
security risk and is slow.

The reasons netlink requests are done in a separate tile are:
- **Improved security architecture.** Firedancer's sandbox isolates the
netlink interface from untrusted user traffic.
- **Better performance.** The netlink tile provides shared memory caches
that greatly reduce the number of netlink requests.

### "Netbase" shared memory region

The netlink tile keeps a read-only cache of the following information:

- Interface table
- IPv4 route tables `local` and `main`
- Neighbor tables (only for XDP-enabled Ethernet interfaces)

The objects containing the above information are stored in the "netbase"
workspace. (A workspace is a shared memory region.)

### Security

A netlink tile requires an rtnetlink socket. On startup, it subscribes
to route and neighbor table changes. It will also issue RTM_GETROUTE
and RTM_GETNEIGH requests. On RHEL 8 with a Linux 4.18 kernel, all
netlink interactions (including creation of the socket) can be done from
a regular unprivileged user without capabilities.
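
The startup sequence described above can be sketched in C as follows. This is a minimal illustration, not the tile's actual code: the names `route_dump_req`, `build_route_dump_req`, and `open_rtnetlink_socket` are hypothetical, and the choice of multicast groups is an assumption based on the "route and neighbor table changes" mentioned in the text.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout: a netlink header followed by an rtmsg. */
struct route_dump_req {
  struct nlmsghdr hdr;
  struct rtmsg    msg;
};

/* Fill in an RTM_GETROUTE request that dumps all IPv4 routes. */
void
build_route_dump_req( struct route_dump_req * req,
                      unsigned int            seq ) {
  assert( req );
  memset( req, 0, sizeof(*req) );
  req->hdr.nlmsg_len   = NLMSG_LENGTH( sizeof(struct rtmsg) );
  req->hdr.nlmsg_type  = RTM_GETROUTE;
  req->hdr.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
  req->hdr.nlmsg_seq   = seq;
  req->msg.rtm_family  = AF_INET;
}

/* Open an rtnetlink socket subscribed to neighbor and IPv4 route change
   notifications (assumed group set).  Returns the file descriptor, or
   -1 on failure. */
int
open_rtnetlink_socket( void ) {
  int fd = socket( AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE );
  if( fd<0 ) return -1;

  struct sockaddr_nl sa;
  memset( &sa, 0, sizeof(sa) );
  sa.nl_family = AF_NETLINK;
  sa.nl_groups = RTMGRP_NEIGH | RTMGRP_IPV4_ROUTE;
  if( bind( fd, (struct sockaddr *)&sa, sizeof(sa) )<0 ) {
    close( fd );
    return -1;
  }
  return fd;
}
```

A caller would send the request built by `build_route_dump_req` on the returned socket and then read multipart replies until it sees `NLMSG_DONE`.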

The kernel's netlink interface exposes a large attack surface.
Therefore, this tile attempts to isolate itself from direct untrusted
inputs.

### Data flows

- `[net tiles] <-- [netbase]` <br/>
Net tiles have read-only access to the shared memory region backing
the netbase object. A malicious netlink tile can compromise net tiles
by corrupting the netbase object, but not vice versa.

- `[changes by sysadmin] --> [netlink] --> [netlink tile]` <br/>
Route table updates are forwarded to the netlink tile. This occurs
rarely (typically when the sysadmin performs manual changes or a system
daemon updates the routes).

- `[netlink tile] --> [netbase]` <br/>
The netlink tile writes neighbor and route table updates to a shared
memory region.

- `[neighbor discovery] --> [netlink] --> [netlink tile]` <br/>
Neighbor table updates are forwarded to the netlink tile. This path
has limited throughput (roughly a few 100K updates per second).

- `[untrusted traffic] --> [net tile] --> [app tile]` <br/>
`--> [net tile] --> [netlink tile] --> [neighbor discovery]` <br/>
App tiles will blindly respond to the source IP found in untrusted
packets. This source IP can be spoofed. Neighbor solicitation might
be required in order to find out the MAC address of that IP. On IPv4,
these are ARP requests broadcast to the local network.

Net tiles cannot solicit neighbors directly, so they notify the
netlink tile that neighbor solicitation is needed. (Potentially at
line rate if the host is part of a huge subnet.)

The netlink tile will deduplicate these requests and forward them to
the kernel.

This path is the only direct 'untrusted traffic' -> 'netlink tile'
data flow, so the internal neighbor solicit message format is kept
as simple as possible for security.
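
The deduplication step in the last data flow could look roughly like the sketch below. It assumes a small direct-mapped cache keyed by IPv4 address with a cooldown window. The names `solicit_dedup` and `solicit_dedup_check`, the cache size, and the cooldown mechanism are all hypothetical and are not taken from the Firedancer source.

```c
#include <assert.h>
#include <stdint.h>

#define SOLICIT_SLOT_CNT 256U  /* hypothetical cache size (power of two) */

struct solicit_slot {
  uint32_t ip4_addr;  /* IPv4 address last solicited via this slot */
  uint64_t last_ts;   /* timestamp of that solicitation */
  int      used;      /* 0 until the slot is first written */
};

struct solicit_dedup {
  uint64_t            cooldown;  /* min ticks between solicits per IP */
  struct solicit_slot slot[ SOLICIT_SLOT_CNT ];
};

/* Returns 1 if a solicit request for ip4_addr should be forwarded to
   the kernel, 0 if it is suppressed as a recent duplicate.  A hash
   collision simply evicts the older entry, which can only cause an
   extra solicit, never a missed one. */
int
solicit_dedup_check( struct solicit_dedup * d,
                     uint32_t               ip4_addr,
                     uint64_t               now ) {
  assert( d );
  uint32_t idx = ( ( ip4_addr * 2654435761U ) >> 16 ) & ( SOLICIT_SLOT_CNT-1U );
  struct solicit_slot * s = &d->slot[ idx ];
  if( s->used && s->ip4_addr==ip4_addr && ( now - s->last_ts ) < d->cooldown ) {
    return 0;
  }
  s->ip4_addr = ip4_addr;
  s->last_ts  = now;
  s->used     = 1;
  return 1;
}
```

Keeping the state to a fixed-size array of plain integers fits the goal stated above of keeping this path as simple as possible.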

### Neighbor discovery (ARP)

A concurrent open-addressed hash table is used to store ARP entries
(henceforth called "neighbor table"). This table attempts to
continuously stay in sync with the kernel.
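
To illustrate the data structure, here is a single-threaded sketch of an open-addressed table with linear probing and a bounded probe length. It leaves out the concurrency control and deletion that the real neighbor table needs. All names and sizes are hypothetical, although `NEIGH_ELE_MAX` and `NEIGH_PROBE_MAX` loosely mirror the `ele_max` and `probe_max` parameters passed to `fd_neigh4_hmap_footprint` in this PR.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NEIGH_ELE_MAX   64U  /* hypothetical slot count (power of two) */
#define NEIGH_PROBE_MAX  8U  /* hypothetical max probe sequence length */

struct neigh_entry {
  uint32_t ip4_addr;       /* key; 0 marks an empty slot */
  uint8_t  mac_addr[ 6 ];  /* resolved Ethernet address */
};

struct neigh_table {
  struct neigh_entry ele[ NEIGH_ELE_MAX ];
};

static uint32_t
neigh_hash( uint32_t ip4_addr ) {
  return ( ip4_addr * 2654435761U ) >> 16;
}

/* Insert or update an entry.  Returns 0 on success, -1 if every slot in
   the probe sequence is taken by other keys. */
int
neigh_upsert( struct neigh_table * t,
              uint32_t             ip4_addr,
              uint8_t const        mac_addr[ 6 ] ) {
  assert( t && ip4_addr );
  uint32_t idx = neigh_hash( ip4_addr );
  for( uint32_t i=0U; i<NEIGH_PROBE_MAX; i++ ) {
    struct neigh_entry * e = &t->ele[ ( idx+i ) & ( NEIGH_ELE_MAX-1U ) ];
    if( e->ip4_addr==0U || e->ip4_addr==ip4_addr ) {
      e->ip4_addr = ip4_addr;
      memcpy( e->mac_addr, mac_addr, 6 );
      return 0;
    }
  }
  return -1;
}

/* Look up an entry.  Returns NULL if the address is not in the table. */
struct neigh_entry const *
neigh_query( struct neigh_table const * t,
             uint32_t                   ip4_addr ) {
  assert( t );
  if( !ip4_addr ) return NULL;
  uint32_t idx = neigh_hash( ip4_addr );
  for( uint32_t i=0U; i<NEIGH_PROBE_MAX; i++ ) {
    struct neigh_entry const * e = &t->ele[ ( idx+i ) & ( NEIGH_ELE_MAX-1U ) ];
    if( e->ip4_addr==ip4_addr ) return e;
    if( e->ip4_addr==0U       ) return NULL;  /* probe chain ends here */
  }
  return NULL;
}
```

Bounding the probe length keeps the worst-case lookup cost fixed, at the price of rejecting inserts when a probe sequence fills up.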

The netlink tile requests neighbor solicitations via the netlink
equivalent of `ip neigh add dev DEVICE IP use`.
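
As a rough sketch of what that netlink equivalent might look like, the following builds an `RTM_NEWNEIGH` message carrying the `NTF_USE` flag, which is what the `use` keyword of `ip neigh` maps to. The struct and function names are hypothetical. The header flags (`NLM_F_CREATE` without `NLM_F_EXCL`) and the `NUD_NONE` state are assumptions and may differ from what the netlink tile actually sends.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/neighbour.h>

/* Hypothetical fixed-size request: header, ndmsg, one NDA_DST attribute. */
struct neigh_solicit_req {
  struct nlmsghdr hdr;
  struct ndmsg    ndm;
  struct rtattr   dst_attr;
  uint32_t        dst_ip4;  /* IPv4 address in network byte order */
};

/* Build a request asking the kernel to resolve ip4_addr_net on the
   interface with index if_idx. */
void
build_neigh_solicit_req( struct neigh_solicit_req * req,
                         int                        if_idx,
                         uint32_t                   ip4_addr_net,
                         uint32_t                   seq ) {
  assert( req );
  memset( req, 0, sizeof(*req) );
  req->hdr.nlmsg_len     = (uint32_t)sizeof(*req);
  req->hdr.nlmsg_type    = RTM_NEWNEIGH;
  req->hdr.nlmsg_flags   = NLM_F_REQUEST | NLM_F_CREATE;  /* assumed flags */
  req->hdr.nlmsg_seq     = seq;
  req->ndm.ndm_family    = AF_INET;
  req->ndm.ndm_ifindex   = if_idx;
  req->ndm.ndm_state     = NUD_NONE;                      /* assumed state */
  req->ndm.ndm_flags     = NTF_USE;
  req->dst_attr.rta_type = NDA_DST;
  req->dst_attr.rta_len  = (unsigned short)RTA_LENGTH( sizeof(uint32_t) );
  req->dst_ip4           = ip4_addr_net;
}
```

Because the message has a fixed layout with a single attribute, it can be built without any dynamic parsing or allocation.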

### Routing

The Firedancer network stack supports very simple routing tables as
typically seen on cloud instances, servers directly connected to an
Ethernet switch, or a router.

Only the "local" and "main" routing tables are synchronized. Policy
based routing and additional routing tables are NOT supported.

Outgoing traffic matching the "local" table is sent to the loopback
device.
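
The lookup order described above can be sketched as a longest-prefix match over two small route arrays, checking "local" before "main" as Linux's default rule order does. This is an illustrative linear scan with hypothetical names (`route4`, `route4_lookup`, `route4_resolve`). It is not the `fd_fib4` API, and it ignores route types such as broadcast or unreachable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route entry; addresses are in host byte order. */
struct route4 {
  uint32_t dst;         /* destination network */
  uint8_t  prefix_len;  /* 0..32 */
  uint32_t gateway;     /* next hop, 0 if the destination is on-link */
  uint32_t if_idx;      /* outgoing interface index */
};

static uint32_t
prefix_mask( uint8_t prefix_len ) {
  assert( prefix_len<=32 );
  return prefix_len ? ( 0xffffffffU << ( 32U - prefix_len ) ) : 0U;
}

/* Longest-prefix match over one table.  Returns NULL if nothing matches. */
struct route4 const *
route4_lookup( struct route4 const * tbl,
               size_t                cnt,
               uint32_t              ip4_addr ) {
  struct route4 const * best = NULL;
  for( size_t i=0; i<cnt; i++ ) {
    uint32_t mask = prefix_mask( tbl[ i ].prefix_len );
    if( ( ip4_addr & mask )!=( tbl[ i ].dst & mask ) ) continue;
    if( !best || tbl[ i ].prefix_len > best->prefix_len ) best = &tbl[ i ];
  }
  return best;
}

/* Consult the "local" table first, then "main".  *is_local is set to 1
   if the match came from "local" (traffic goes to loopback), else 0. */
struct route4 const *
route4_resolve( struct route4 const * local_tbl, size_t local_cnt,
                struct route4 const * main_tbl,  size_t main_cnt,
                uint32_t              ip4_addr,
                int *                 is_local ) {
  struct route4 const * r = route4_lookup( local_tbl, local_cnt, ip4_addr );
  *is_local = ( r!=NULL );
  if( r ) return r;
  return route4_lookup( main_tbl, main_cnt, ip4_addr );
}
```

A linear scan is adequate for illustration because the tables involved are small. The config comments in this PR note that route tables on typical servers rarely exceed 16 entries.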
1 change: 1 addition & 0 deletions src/app/fdctl/Local.mk
@@ -18,6 +18,7 @@ $(OBJDIR)/obj/app/fdctl/version.d: src/app/fdctl/version.h

# fdctl core
$(call add-objs,main1 config config_parse caps utility keys ready mem spy help version,fd_fdctl)
$(call add-objs,netconf,fd_fdctl)
$(call add-objs,run/run run/run1 run/run_agave,fd_fdctl)
$(call add-objs,monitor/monitor monitor/helper,fd_fdctl)
$(call make-fuzz-test,fuzz_fdctl_config,fuzz_fdctl_config,fd_fdctl fd_ballet fd_util)
49 changes: 21 additions & 28 deletions src/app/fdctl/config.c
@@ -11,6 +11,9 @@
#include "../../flamenco/runtime/fd_blockstore.h"
#include "../../flamenco/runtime/fd_txncache.h"
#include "../../funk/fd_funk.h"
#include "../../waltz/ip/fd_fib4.h"
#include "../../waltz/mib/fd_dbl_buf.h"
#include "../../waltz/neigh/fd_neigh4_map.h"
#include "../../util/net/fd_eth.h"
#include "../../util/net/fd_ip4.h"

@@ -218,12 +221,22 @@ fdctl_obj_align( fd_topo_t const * topo,
return fd_fseq_align();
} else if( FD_UNLIKELY( !strcmp( obj->name, "metrics" ) ) ) {
return FD_METRICS_ALIGN;
} else if( FD_UNLIKELY( !strcmp( obj->name, "opaque" ) ) ) {
ulong align = fd_pod_queryf_ulong( topo->props, ULONG_MAX, "obj.%lu.align", obj->id );
if( FD_UNLIKELY( align==ULONG_MAX ) ) FD_LOG_ERR(( "obj.%lu.align was not set", obj->id ));
return align;
} else if( FD_UNLIKELY( !strcmp( obj->name, "dbl_buf" ) ) ) {
return fd_dbl_buf_align();
} else if( FD_UNLIKELY( !strcmp( obj->name, "blockstore" ) ) ) {
return fd_blockstore_align();
} else if( FD_UNLIKELY( !strcmp( obj->name, "funk" ) ) ) {
return fd_funk_align();
} else if( FD_UNLIKELY( !strcmp( obj->name, "txncache" ) ) ) {
return fd_txncache_align();
} else if( FD_UNLIKELY( !strcmp( obj->name, "neigh4_hmap" ) ) ) {
return fd_neigh4_hmap_align();
} else if( FD_UNLIKELY( !strcmp( obj->name, "fib4" ) ) ) {
return fd_fib4_align();
} else {
FD_LOG_ERR(( "unknown object `%s`", obj->name ));
return 0UL;
@@ -259,12 +272,20 @@ fdctl_obj_footprint( fd_topo_t const * topo,
return fd_fseq_footprint();
} else if( FD_UNLIKELY( !strcmp( obj->name, "metrics" ) ) ) {
return FD_METRICS_FOOTPRINT( VAL("in_cnt"), VAL("cons_cnt") );
} else if( FD_UNLIKELY( !strcmp( obj->name, "opaque" ) ) ) {
return VAL("footprint");
} else if( FD_UNLIKELY( !strcmp( obj->name, "dbl_buf" ) ) ) {
return fd_dbl_buf_footprint( VAL("mtu") );
} else if( FD_UNLIKELY( !strcmp( obj->name, "blockstore" ) ) ) {
return fd_blockstore_footprint( VAL("shred_max"), VAL("block_max"), VAL("idx_max"), VAL("txn_max") ) + VAL("alloc_max");
} else if( FD_UNLIKELY( !strcmp( obj->name, "funk" ) ) ) {
return fd_funk_footprint();
} else if( FD_UNLIKELY( !strcmp( obj->name, "txncache" ) ) ) {
return fd_txncache_footprint( VAL("max_rooted_slots"), VAL("max_live_slots"), VAL("max_txn_per_slot"), FD_TXNCACHE_DEFAULT_MAX_CONSTIPATED_SLOTS );
} else if( FD_UNLIKELY( !strcmp( obj->name, "neigh4_hmap" ) ) ) {
return fd_neigh4_hmap_footprint( VAL("ele_max"), VAL("lock_cnt"), VAL("probe_max") );
} else if( FD_UNLIKELY( !strcmp( obj->name, "fib4" ) ) ) {
return fd_fib4_footprint( VAL("route_max") );
} else {
FD_LOG_ERR(( "unknown object `%s`", obj->name ));
return 0UL;
@@ -504,34 +525,6 @@ fdctl_cfg_from_env( int * pargc,
config->tiles.net.ip_addr = iface_ip;
mac_address( config->tiles.net.interface, config->tiles.net.mac_addr );

/* support for multihomed hosts */
ulong multi_cnt = config->tiles.net.multihome_ip_addrs_cnt;
for( ulong j = 0; j < multi_cnt; ++j ) {
int success = fd_cstr_to_ip4_addr( config->tiles.net.multihome_ip_addrs[j],
&config->tiles.net.multihome_ip4_addrs[j] );
if( !success ) {
FD_LOG_ERR(( "configuration option [tiles.net.multihome_ip_addrs] "
"specifies malformed IP address `%s`",
config->tiles.net.multihome_ip_addrs[j] ));
}
}

/* look for duplicate addresses */
/* there's only a few, so do the O(n^2) comparison */
for( ulong j = 0; j < multi_cnt; ++j ) {
if( config->tiles.net.ip_addr == config->tiles.net.multihome_ip4_addrs[j] ) {
FD_LOG_ERR(( "configuration option [tiles.net.multihome_ip_addrs] "
"specifies an address that matches [tiles.net.src_ip_addr]" ));
}
for( ulong k = j+1; k < multi_cnt; ++k ) {
if( config->tiles.net.multihome_ip4_addrs[j] == config->tiles.net.multihome_ip4_addrs[k] ) {
FD_LOG_ERR(( "configuration option [tiles.net.multihome_ip_addrs] "
"specifies duplicate ip addresses `%s`",
config->tiles.net.multihome_ip_addrs[j] ));
}
}
}

}

username_to_id( config );
15 changes: 9 additions & 6 deletions src/app/fdctl/config.h
@@ -17,7 +17,7 @@
/* config_t represents all available configuration options that could be
set in a user defined configuration toml file. For information about
the options, see the `default.toml` file provided. */
typedef struct {
struct fdctl_config {
char name[ NAME_SZ ];
char user[ 256 ];
char hostname[ FD_LOG_NAME_MAX ];
@@ -216,12 +216,13 @@ typedef struct {
uint xdp_aio_depth;

uint send_buffer_size;

ulong multihome_ip_addrs_cnt; /* number of home ip addresses */
char multihome_ip_addrs[FD_NET_MAX_SRC_ADDR][32];
uint multihome_ip4_addrs[FD_NET_MAX_SRC_ADDR];
} net;

struct {
ulong max_routes;
ulong max_neighbors;
} netlink;

struct {
ushort regular_transaction_listen_port;
ushort quic_transaction_listen_port;
@@ -319,7 +320,9 @@ typedef struct {
} batch;

} tiles;
} config_t;
};

typedef struct fdctl_config config_t;

FD_PROTOTYPES_BEGIN

29 changes: 23 additions & 6 deletions src/app/fdctl/config/default.toml
@@ -882,12 +882,29 @@ dynamic_port_range = "8900-9000"
# this really be configurable?
send_buffer_size = 16384

# The XDP program will filter packets that aren't destined for
# the IPv4 address of the interface bound above, but sometimes a
# validator may advertise multiple IP addresses. In this case
# the additional addresses can be specified here, and packets
# addressed to them will be accepted.
multihome_ip_addrs = []
# The netlink tile forwards Linux network configuration to net tiles.
Review comment (Contributor): These docs are operator facing so please expand a little bit, and explain things in language for operators not developers. At least a paragraph describing the tile, e.g. see net above.

# This config section contains advanced options that typically do not
# need to be changed.
# For further info, see https://docs.firedancer.io/guide/netlink.html
[tiles.netlink]
# The maximum number of routes per route table.
Review comment (Contributor): Per above, lengthen docs, e.g. describe why you might need to increase this, what's a route table, ..

Review comment (Contributor): I'd probably note that these are advanced configuration and don't need to be changed except for unique / custom networking configurations?

#
# The netlink tile imports two route tables from Linux, namely
# `local` and `main`. You can view them by running
# `ip route show table main`. Decreasing this option can result
# in connectivity issues. Increasing this option can drastically
# decrease performance.
#
# For virtually all cloud and bare-metal server providers, the
# number of routes per table does not exceed 16.
max_routes = 128

# The maximum number of Ethernet neighbors.
#
# This should be roughly as large as the size of your Ethernet subnet.
# E.g. if your IP address is 198.51.100.3/24, then your subnet has
# up to 256 neighbors (2^(32-24)).
max_neighbors = 4096

# QUIC tiles are responsible for serving network traffic, including
# parsing and responding to packets and managing connection timeouts
7 changes: 6 additions & 1 deletion src/app/fdctl/config_parse.c
@@ -290,7 +290,9 @@ fdctl_pod_to_cfg( config_t * config,
CFG_POP ( uint, tiles.net.xdp_tx_queue_size );
CFG_POP ( uint, tiles.net.xdp_aio_depth );
CFG_POP ( uint, tiles.net.send_buffer_size );
CFG_POP_ARRAY( cstr, tiles.net.multihome_ip_addrs );

CFG_POP ( ulong, tiles.netlink.max_routes );
CFG_POP ( ulong, tiles.netlink.max_neighbors );

CFG_POP ( ushort, tiles.quic.regular_transaction_listen_port );
CFG_POP ( ushort, tiles.quic.quic_transaction_listen_port );
@@ -461,6 +463,9 @@ fdctl_cfg_validate( config_t * cfg ) {
CFG_HAS_NON_ZERO ( tiles.net.xdp_aio_depth );
CFG_HAS_NON_ZERO ( tiles.net.send_buffer_size );

CFG_HAS_NON_ZERO( tiles.netlink.max_routes );
CFG_HAS_NON_ZERO( tiles.netlink.max_neighbors );

CFG_HAS_NON_ZERO( tiles.quic.regular_transaction_listen_port );
CFG_HAS_NON_ZERO( tiles.quic.quic_transaction_listen_port );
CFG_HAS_NON_ZERO( tiles.quic.max_concurrent_connections );
7 changes: 5 additions & 2 deletions src/app/fdctl/fdctl.h
@@ -123,8 +123,7 @@ fdctl_obj_loose( fd_topo_t const * topo,
fd_topo_run_tile_t
fdctl_tile_run( fd_topo_tile_t * tile );

#define ACTIONS_CNT (11UL)
extern action_t ACTIONS[ ACTIONS_CNT ];
extern action_t ACTIONS[];

void fdctl_boot( int * pargc,
char *** pargv,
@@ -207,6 +206,10 @@ void
spy_cmd_fn( args_t * args,
config_t * const config );

void
netconf_cmd_fn( args_t * args,
config_t * config );

void
help_cmd_fn( args_t * args,
config_t * const config );
2 changes: 1 addition & 1 deletion src/app/fdctl/help.c
@@ -13,7 +13,7 @@ help_cmd_fn( args_t * args,
--config parameter. */
FD_LOG_STDOUT(( " --config <PATH> Path to config TOML file\n\n" ));
FD_LOG_STDOUT(( "SUBCOMMANDS:\n" ));
for( ulong i=0; i<ACTIONS_CNT ; i++ ) {
for( ulong i=0; ACTIONS[ i ].name; i++ ) {
FD_LOG_STDOUT(( " %9s %s\n", ACTIONS[ i ].name, ACTIONS[ i ].description ));
}
}
2 changes: 2 additions & 0 deletions src/app/fdctl/main.c
@@ -15,6 +15,7 @@ configure_stage_t * STAGES[ CONFIGURE_STAGE_COUNT ] = {
};

extern fd_topo_run_tile_t fd_tile_net;
extern fd_topo_run_tile_t fd_tile_netlink;
extern fd_topo_run_tile_t fd_tile_quic;
extern fd_topo_run_tile_t fd_tile_verify;
extern fd_topo_run_tile_t fd_tile_dedup;
@@ -47,6 +48,7 @@ extern fd_topo_run_tile_t fd_tile_rpcserv;

fd_topo_run_tile_t * TILES[] = {
&fd_tile_net,
&fd_tile_netlink,
&fd_tile_quic,
&fd_tile_verify,
&fd_tile_dedup,