What is BPF?
BPF (Berkeley Packet Filter) is a bytecode language for defining filters that select interesting (or, inversely, uninteresting) packets. The language itself is extremely fast, and is usually executed in the kernel without the need for superfluous copying or other waste.
What does it look like?
What: List all packets to/from 8.8.8.8
Text Form:
host 8.8.8.8
Bytecode Form (Link-Layer based):
(000) ldh [12]
(001) jeq #0x800 jt 2 jf 6
(002) ld [26]
(003) jeq #0x8080808 jt 12 jf 4
(004) ld [30]
(005) jeq #0x8080808 jt 12 jf 13
(006) jeq #0x806 jt 8 jf 7
(007) jeq #0x8035 jt 8 jf 13
(008) ld [28]
(009) jeq #0x8080808 jt 12 jf 10
(010) ld [38]
(011) jeq #0x8080808 jt 12 jf 13
(012) ret #65535
(013) ret #0
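To make the listing less opaque, here is a minimal sketch in Python of how such a program is evaluated. It is not the kernel's implementation: it supports only the four instructions used above, and the names run_filter and ethernet_ipv4 are hypothetical helpers made up for this illustration. The jt/jf values are taken as absolute instruction numbers, exactly as printed in the listing.

```python
import socket
import struct

# The listing above as (opcode, k, jt, jf). jt/jf are the absolute
# instruction numbers shown in the listing, not relative offsets.
PROGRAM = [
    ("ldh", 12, None, None),        # (000) load EtherType
    ("jeq", 0x0800, 2, 6),          # (001) IPv4?
    ("ld", 26, None, None),         # (002) IPv4 source address
    ("jeq", 0x08080808, 12, 4),     # (003)
    ("ld", 30, None, None),         # (004) IPv4 destination address
    ("jeq", 0x08080808, 12, 13),    # (005)
    ("jeq", 0x0806, 8, 7),          # (006) ARP?
    ("jeq", 0x8035, 8, 13),         # (007) RARP?
    ("ld", 28, None, None),         # (008) ARP sender address
    ("jeq", 0x08080808, 12, 10),    # (009)
    ("ld", 38, None, None),         # (010) ARP target address
    ("jeq", 0x08080808, 12, 13),    # (011)
    ("ret", 65535, None, None),     # (012) accept
    ("ret", 0, None, None),         # (013) reject
]


def run_filter(program, packet):
    """Run the program on raw frame bytes; non-zero means 'matched'."""
    acc = 0  # the accumulator register
    pc = 0   # index of the next instruction
    while True:
        op, k, jt, jf = program[pc]
        if op in ("ldh", "ld"):
            size, fmt = (2, "!H") if op == "ldh" else (4, "!I")
            if k + size > len(packet):
                return 0  # a load past the end of the packet rejects it
            acc = struct.unpack_from(fmt, packet, k)[0]
            pc += 1
        elif op == "jeq":
            pc = jt if acc == k else jf
        elif op == "ret":
            return k
        else:
            raise ValueError(f"unsupported opcode: {op}")


def ethernet_ipv4(src, dst):
    """Build a bare Ethernet + IPv4 header pair (hypothetical test frame)."""
    eth = bytes(12) + struct.pack("!H", 0x0800)
    ip = (bytes([0x45, 0]) + struct.pack("!H", 20) + bytes(8)
          + socket.inet_aton(src) + socket.inet_aton(dst))
    return eth + ip


print(run_filter(PROGRAM, ethernet_ipv4("8.8.8.8", "10.0.0.1")))   # 65535
print(run_filter(PROGRAM, ethernet_ipv4("10.0.0.2", "10.0.0.1")))  # 0
```

The return value is the number of bytes to keep, so 65535 means "take the whole packet" and 0 means "not interested".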
How can I use this?
Most system administrators probably use BPF primarily to control the packets captured by utilities such as tcpdump. However, there are many other uses for this bytecode, specifically the one I want to explore in this post: the IPTables xt_bpf module.
Why use BPF with IPTables?
There are many reasons; it really depends on what you are using IPTables for.
1. A Reduction in Rules
It's not uncommon to see IPTables rulesets like the following:
-A INPUT --fragment -j DROP
-A INPUT -s 1.2.3.4 -j ACCEPT # prevent lock out
-A INPUT -d 1.1.1.1 -p tcp -m tcp --dport 80 -j ACCEPT # Bob's Web Server
-A INPUT -d 1.1.1.1 -p udp -m udp --dport 53 -j ACCEPT # Bob's DNS Server
-A INPUT -d 1.1.1.2 -p tcp -m tcp --dport 80 -j ACCEPT # Alice's Web Server
-A INPUT -d 1.1.1.0/24 -j DROP # drop for all on server
or with stateful filtering (fewer overall rule hits):
-A INPUT --fragment -j DROP
-A INPUT -m state --state ESTABLISHED -j ACCEPT
-A INPUT -m state --state INVALID -j DROP
-A INPUT -s 1.2.3.4 -j ACCEPT # prevent lock out
-A INPUT -d 1.1.1.1 -p tcp -m tcp --dport 80 -j ACCEPT # Bob's Web Server
-A INPUT -d 1.1.1.1 -p udp -m udp --dport 53 -j ACCEPT # Bob's DNS Server
-A INPUT -d 1.1.1.2 -p tcp -m tcp --dport 80 -j ACCEPT # Alice's Web Server
-A INPUT -d 1.1.1.0/24 -j DROP # drop for all on server
Although simple in this example, in the real world this could expand to many hundreds of rules. Worse, the computation cost scales linearly. If Bob needs 100 ports that can't be expressed as ranges, then it requires 100 rules at 100 times the CPU cost. While this cost may be acceptable for you now, under a DDoS attack it can be a very significant restriction on the packets per second you can absorb.
IPTables is not the fastest filtering engine out there. In fact it's rather slow (go nftables!), and many of the performance bottlenecks are per rule (e.g. incrementing rule hit counters), so reducing the number of rules significantly improves performance. With BPF the above rule set can be reduced to:
BPF="$(nfbpf_compile RAW 'ip and ip[6:2] & 0x1fff == 0 and (src 1.2.3.4 or (dst 1.1.1.1 and (tcp dst port 80 or udp dst port 53) or (dst 1.1.1.2 and tcp dst port 80)))')"
-A INPUT -m bpf --bytecode "$BPF" -j ACCEPT
-A INPUT -d 1.1.1.0/24 -j DROP
or
-A INPUT --fragment -j DROP
-A INPUT -m state --state ESTABLISHED -j ACCEPT
-A INPUT -m state --state INVALID -j DROP
BPF="$(nfbpf_compile RAW 'ip and (src 1.2.3.4 or (dst 1.1.1.1 and (tcp dst port 80 or udp dst port 53) or (dst 1.1.1.2 and tcp dst port 80)))')"
-A INPUT -m bpf --bytecode "$BPF" -j ACCEPT
-A INPUT -d 1.1.1.0/24 -j DROP
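For the "100 ports" case mentioned earlier, the expression would be generated rather than typed by hand. The following Python sketch shows one way to do that; port_filter is a hypothetical helper, not part of any iptables tooling, and it only builds the text form that would then be handed to the compiler. One caveat I am fairly sure of but have not re-checked: xt_bpf caps the number of instructions per program, so a very long port list may have to be split across a few rules.

```python
def port_filter(dst_ip, proto, ports):
    """Fold a list of ports into one pcap-style filter expression."""
    if proto not in ("tcp", "udp"):
        raise ValueError("proto must be 'tcp' or 'udp'")
    if not ports:
        raise ValueError("at least one port is required")
    # De-duplicate and sort so the output is stable between runs.
    alternatives = " or ".join(
        f"{proto} dst port {port}" for port in sorted(set(ports))
    )
    return f"dst {dst_ip} and ({alternatives})"


print(port_filter("1.1.1.1", "tcp", [443, 80, 8080]))
```

The resulting string can be tested with tcpdump first and compiled afterwards, which keeps the rule count at one no matter how many ports Bob asks for (up to the program size limit).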
Fewer rules, less CPU cost. You can also easily test this expression using tcpdump, with no more looking at IPTables packet counters or relying on the xt_LOG module (a slow way to debug):
# tcpdump -n "ip and ip[6:2] & 0x1fff == 0 and (src 1.2.3.4 or (dst 1.1.1.1 and (tcp dst port 80 or udp dst port 53) or (dst 1.1.1.2 and tcp dst port 80)))"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
^C
0 packets captured
1 packet received by filter
0 packets dropped by kernel
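The ip[6:2] & 0x1fff == 0 term in that expression is the least obvious part, so here is a small Python sketch of the arithmetic. It reads the two bytes at offset 6 of the IPv4 header (3 flag bits followed by a 13-bit fragment offset) and masks off the flags. A zero offset means the packet is either unfragmented or the first fragment, which mirrors what --fragment does not match. The function and helper names are made up for this example.

```python
import struct


def has_zero_fragment_offset(ip_header):
    """Python equivalent of the pcap test 'ip[6:2] & 0x1fff == 0'."""
    (flags_and_offset,) = struct.unpack_from("!H", ip_header, 6)
    # The low 13 bits are the fragment offset; the top 3 bits are flags.
    return (flags_and_offset & 0x1FFF) == 0


def ipv4_header(flags_and_offset):
    """Build a 20-byte IPv4 header with the given flags/offset field."""
    return (bytes([0x45, 0]) + struct.pack("!HHH", 20, 0, flags_and_offset)
            + bytes(12))


print(has_zero_fragment_offset(ipv4_header(0x4000)))  # DF set, offset 0
print(has_zero_fragment_offset(ipv4_header(0x2000)))  # first fragment (MF set)
print(has_zero_fragment_offset(ipv4_header(0x00B9)))  # later fragment, offset 185
```

Only the last case has a non-zero offset, so it is the only one the expression rejects.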
2. Representation of More Complex Filters
BPF is a very expressive language, far more so than anything else in IPTables (even u32). Furthermore, when compared to u32 it is far more human-readable. For example (MONLIST is command 42):
Drop all NTP MONLIST (u32):
-A INPUT -p udp -m multiport --ports 123 -m u32 --u32 "0>>22&0x3C@8&0xFF=42" -j DROP
Drop all NTP MONLIST (bpf):
-A INPUT -m bpf --bytecode "$(nfbpf_compile RAW 'udp port 123 and udp[11]==42')" -j DROP
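To show that the two rules look at the same byte, here is a Python sketch that walks through each expression by hand on a raw IPv4 packet. It covers only the payload byte test, not the UDP/port 123 match, and the packet builder is a hypothetical helper that fills in just the fields the two tests read.

```python
import struct


def u32_monlist_match(ip_packet):
    """Step through the u32 expression 0>>22&0x3C@8&0xFF=42."""
    # "0>>22&0x3C": read 4 bytes at offset 0 and extract IHL * 4.
    (word0,) = struct.unpack_from("!I", ip_packet, 0)
    ip_header_len = (word0 >> 22) & 0x3C
    # "@8&0xFF": skip the IP header, read 4 bytes at offset 8 from there
    # (bytes 8..11 of the UDP datagram) and keep the last one.
    (word,) = struct.unpack_from("!I", ip_packet, ip_header_len + 8)
    return (word & 0xFF) == 42


def bpf_monlist_match(ip_packet):
    """Python equivalent of the pcap test 'udp[11] == 42'."""
    ip_header_len = (ip_packet[0] & 0x0F) * 4
    udp = ip_packet[ip_header_len:]
    # udp[11] is 8 bytes of UDP header plus byte 3 of the NTP payload.
    return udp[11] == 42


def ntp_mode7_packet(request_code, ihl_words=5):
    """Build a bare IPv4 + UDP + NTP mode 7 packet (illustrative only)."""
    ip = bytes([0x40 | ihl_words]) + bytes(8) + bytes([17]) + bytes(10)
    ip += bytes((ihl_words - 5) * 4)  # zeroed IP options, if any
    ntp = bytes([0x17, 0x00, 0x03, request_code]) + bytes(4)
    udp = struct.pack("!HHHH", 54321, 123, 8 + len(ntp), 0)
    return ip + udp + ntp


packet = ntp_mode7_packet(42)
print(u32_monlist_match(packet), bpf_monlist_match(packet))
```

Both functions land on the same byte, but the BPF form states the offset directly relative to the UDP header instead of encoding the IP header length calculation inline.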
What about nftables?
While quite a nice idea, it's still quite a way from being feature complete (in comparison with iptables, and in general) or even classed as stable. IPTables is immediately available, and BPF is great for bringing that little bit of extra power.
Appendix I: nfbpf_compile
For those who don't know, you can get the utility required to compile the text form of BPF into bytecodes that can be accepted by xt_bpf from http://git.netfilter.org/iptables/tree/utils/nfbpf_compile.c
Unfortunately tcpdump won't suffice, as the bytecode it generates begins at the link layer, whereas xt_bpf expects bytecode that starts at the IP layer.
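For reference, the string that --bytecode expects is, as I understand the iptables-extensions man page, the instruction count followed by one "code jt jf k" group per instruction, all in decimal and separated by commas. The Python sketch below only formats such a string from already-compiled instructions; it does not compile anything. The to_xt_bpf name is hypothetical, and the sample program is the man page's "ip proto 6" example as I remember it.

```python
def to_xt_bpf(instructions):
    """Format (code, jt, jf, k) tuples as an xt_bpf --bytecode string."""
    fields = [str(len(instructions))]
    for code, jt, jf, k in instructions:
        fields.append(f"{code} {jt} {jf} {k}")
    return ",".join(fields)


# 'ip proto 6' starting at the IP layer: load the protocol byte at
# offset 9, compare it with 6 (TCP), then return pass or fail.
IP_PROTO_TCP = [
    (48, 0, 0, 9),   # load byte at offset 9 (IP protocol)
    (21, 0, 1, 6),   # equal to 6? fall through, else skip one
    (6, 0, 0, 1),    # return non-zero: match
    (6, 0, 0, 0),    # return zero: no match
]

print(to_xt_bpf(IP_PROTO_TCP))
```

Note that the load uses offset 9, which is the protocol field counted from the start of the IP header. The same test compiled for an Ethernet capture would use offset 23, because of the 14-byte link-layer header in front.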