The seccomp filter should check the most frequently used syscalls first #1022

dhrgit · 2019-03-19T16:52:27Z

The seccomp whitelist should have a way to specify the order in which the rules are checked. This would allow for the most frequently used syscalls to be checked first, so that they are quickly allowed through the filter.

An idea is to use a vector to hold the whitelist, such that the order of items in that vector is reflected in the order of BPF statements produced by the filter. A quick PoC for that idea can be found here.
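To make the ordering idea concrete, here is a minimal sketch in Python of a linear filter model. It is not Firecracker's seccomp code and does not emit real BPF: the `Instruction` tuples, `build_linear_filter` and `comparisons_until_decision` are hypothetical names used only for illustration, and the syscall numbers are x86_64 values.

```python
from typing import List, Tuple

# Hypothetical, simplified instruction model: each tuple is one step of a
# linear filter, not a real BPF encoding.
Instruction = Tuple[str, int]


def build_linear_filter(ordered_rules: List[int]) -> List[Instruction]:
    """Emit one compare-and-allow step per syscall, preserving list order."""
    program = [("ALLOW_IF_EQ", nr) for nr in ordered_rules]
    program.append(("DENY", -1))  # default action when nothing matched
    return program


def comparisons_until_decision(program: List[Instruction], syscall_nr: int) -> int:
    """Count how many compare steps run before the filter decides."""
    steps = 0
    for op, nr in program:
        if op == "DENY":
            break
        steps += 1
        if nr == syscall_nr:
            break
    return steps


# x86_64 numbers, for illustration only: 0 = read, 1 = write, 45 = recvfrom.
hot_first = build_linear_filter([45, 0, 1])
by_number = build_linear_filter([0, 1, 45])
```

In this model the cost of allowing a syscall is simply its position in the vector, which is why putting frequently used syscalls first would shorten the common path.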

alindima · 2020-09-15T08:27:49Z

I have done a small investigation to see whether there is significant room for improvement with regard to seccomp ordering.

The measurements were taken by attaching an strace process (with the -c and -f flags) to each jailer PID while running Firecracker’s CI test suite. This produces a large set of files, each containing the number of syscalls issued by that FC process and the total time it spent in syscalls (both across all threads).
Summing these two values across all files and dividing the total time spent in syscalls by the total number of syscalls gives the average time per syscall.
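The aggregation described above could look roughly like the following sketch. It is not the script that was actually used. It assumes the usual `strace -c` summary table, where each per-syscall row has the columns % time, seconds, usecs/call, calls, an optional errors column, and the syscall name. `parse_strace_summary` and `average_seconds_per_syscall` are hypothetical helper names, and the sample table is made up.

```python
from typing import Iterable, Tuple


def parse_strace_summary(text: str) -> Tuple[int, float]:
    """Return (total calls, total seconds) from one `strace -c` summary."""
    calls, seconds = 0, 0.0
    for line in text.splitlines():
        fields = line.split()
        # Per-syscall rows: % time, seconds, usecs/call, calls, [errors], name.
        if len(fields) < 5 or fields[-1] == "total":
            continue
        try:
            float(fields[0])
            row_seconds = float(fields[1])
            row_calls = int(fields[3])
        except ValueError:
            continue  # header or separator line
        seconds += row_seconds
        calls += row_calls
    return calls, seconds


def average_seconds_per_syscall(summaries: Iterable[str]) -> float:
    """Total time in syscalls divided by total number of syscalls."""
    total_calls, total_seconds = 0, 0.0
    for text in summaries:
        calls, seconds = parse_strace_summary(text)
        total_calls += calls
        total_seconds += seconds
    return total_seconds / total_calls if total_calls else 0.0


# Made-up summary, for illustration only.
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 75.00    0.000300          10        30           read
 25.00    0.000100           5        20         2 write
------ ----------- ----------- --------- --------- ----------------
100.00    0.000400                    50         2 total
"""
```

Summing the per-syscall rows rather than reading the `total` line avoids depending on which columns that line carries.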

This process was repeated with both --seccomp-level=0 and --seccomp-level=2.

The average time spent by FC in a syscall is 0.0400369 milliseconds with advanced seccomp enabled. With seccomp disabled, it drops to 0.0378411 milliseconds, a decrease of only 5.484%. Keep in mind that this is the gap to running with no seccomp at all, so it is an upper bound on what reordering could recover. Sorting the syscalls therefore seems unlikely to bring any significant improvement, especially given the potential reduction of the filter (from the recent manual audit, yet to be merged).
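For reference, the percentage can be reproduced from the two averages. This is only the arithmetic; `relative_decrease_percent` is a hypothetical helper name.

```python
def relative_decrease_percent(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100.0


# Averages reported above, in milliseconds per syscall.
with_seccomp_ms = 0.0400369
without_seccomp_ms = 0.0378411

upper_bound_percent = relative_decrease_percent(with_seccomp_ms, without_seccomp_ms)
print(f"{upper_bound_percent:.3f}%")
```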

Before taking a decision, with the help of @iulianbarbu, we will run some IO-intensive performance tests with and without seccomp to see the effect they have on measurements like throughput, latency, etc.

alindima · 2021-05-14T15:09:06Z

Looking over the above again, strace incurs a pretty big overhead, so the measurements are likely not relevant. I'll run some IO-intensive performance tests when I get some time.

acatangiu · 2021-05-14T15:26:49Z

> This results in a big list of files, where each of them contains a statistic on the nr of syscalls issued by that FC process and the total time spent in syscalls by that process (these are across all threads).

If we have a sorted list of numbers of calls for each syscall type, it might be easier to just rearrange our filters to match that.
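As a rough sketch of that rearrangement, assuming the filter is a linear chain in which each rule costs one comparison: the call counts below are made up, and `order_by_frequency` and `expected_comparisons` are hypothetical helpers, not part of Firecracker.

```python
from typing import Dict, List


def order_by_frequency(call_counts: Dict[int, int]) -> List[int]:
    """Most frequently issued syscalls first; ties broken by syscall number."""
    return sorted(call_counts, key=lambda nr: (-call_counts[nr], nr))


def expected_comparisons(order: List[int], call_counts: Dict[int, int]) -> float:
    """Average number of compare steps per call for a linear filter."""
    total = sum(call_counts.values())
    weighted = sum((pos + 1) * call_counts[nr] for pos, nr in enumerate(order))
    return weighted / total


# Made-up counts for illustration only; keys are x86_64 syscall numbers.
counts = {0: 100, 1: 50, 45: 850}
by_number = sorted(counts)
by_frequency = order_by_frequency(counts)
```

Comparing `expected_comparisons` for the two orderings gives a statistical estimate of the saving in filter steps, though not of the wall-clock gain.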

If the investigation cost of finding out how much we gain outweighs the cost of just doing it, I think we should just do it based only on the knowledge that it will be statistically more efficient (without knowing if negligibly or significantly so).

What are the downsides here?

alindima · 2021-05-17T07:17:19Z

The only thing we should pay attention to is that it may be throwaway work.
Running Firecracker's performance tests with seccomp disabled would give some insight into the absolute room for improvement. I'll do this in the coming days; it shouldn't take long now that we have some performance tests in the framework.

The only way of realistically measuring the improvement will be to implement the feature and see.

alindima · 2021-05-20T14:58:51Z

I ran the network and vsock throughput performance tests and noticed no difference. This is not conclusive, since the results themselves have deltas of up to ~10%.

The next step would be to measure the average syscall duration with and without seccomp, using ftrace or perf, to see the difference. They use kernel tracepoints, so their overhead should be pretty low compared to strace.

alindima · 2021-05-21T10:22:25Z

I ran some tests with iperf-vsock, with host-to-VM traffic, and used perf to record the syscall entry and exit points for the recvfrom syscall. I chose recvfrom for two reasons. Its syscall number is 45, so it is not among the first in the filter (the filter is sorted by syscall number, in increasing order). It is also called heavily throughout the iperf-vsock test to receive data from the UDS, always with the same 4096 B buffer size, so its duration should be relatively constant.

I've written a script that parses the perf output and computes the average.
Here are the results:

with seccomp-level 2:
perf sampled 415525 recvfrom syscalls
average duration: 1.267261897581429e-06 s (1.2672 microseconds)

with seccomp-level 0:
perf sampled 430318 recvfrom syscalls
average duration: 1.26224559512035e-06 s (1.2622 microseconds)
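For anyone wanting to repeat the measurement, a parsing script along these lines might work. It is a hypothetical sketch, not the script used for the numbers above. It assumes `perf script` text output with one event per line in the layout comm, tid, [cpu], timestamp in seconds, then the tracepoint name (`syscalls:sys_enter_recvfrom` or `syscalls:sys_exit_recvfrom`). The sample lines are made up.

```python
import re
from typing import Dict, Iterable, List

# Assumed line layout: comm, tid, [cpu], timestamp, tracepoint name.
LINE_RE = re.compile(
    r"^\s*\S.*?\s+(?P<tid>\d+)\s+\[\d+\]\s+(?P<ts>\d+\.\d+):\s+"
    r"syscalls:sys_(?P<kind>enter|exit)_recvfrom:"
)


def recvfrom_durations(lines: Iterable[str]) -> List[float]:
    """Pair each sys_enter_recvfrom with the next sys_exit_recvfrom per thread."""
    entered: Dict[int, float] = {}
    durations: List[float] = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        tid, ts = int(match["tid"]), float(match["ts"])
        if match["kind"] == "enter":
            entered[tid] = ts
        elif tid in entered:
            durations.append(ts - entered.pop(tid))
    return durations


def average(values: List[float]) -> float:
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


# Made-up events, for illustration only.
sample = [
    "firecracker  1234 [002]  1000.000001: syscalls:sys_enter_recvfrom: fd: 0x0000000a",
    "firecracker  1234 [002]  1000.000003: syscalls:sys_exit_recvfrom: 0x1000",
    "firecracker  1235 [003]  1000.000010: syscalls:sys_enter_recvfrom: fd: 0x0000000b",
    "firecracker  1235 [003]  1000.000014: syscalls:sys_exit_recvfrom: 0x1000",
]
durations = recvfrom_durations(sample)
```

Pairing entries and exits per thread id matters because several threads can be inside recvfrom at the same time.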

There is also no visible difference in the reported throughput when running iperf-vsock, with seccomp enabled and disabled.

We can now conclude the investigation, since the overhead of seccomp filtering is not noticeable: the two averages differ by about 0.4%.

Closing this issue per the above investigation and reasoning.

alxiord added Feature: Jailing labels Mar 19, 2019

alxiord added the Priority: Low label Apr 8, 2019

This was referenced Jul 14, 2019

[Hardening] Seccomp improvements #1177

Closed

[RFC] 2020 Roadmap #1104

Closed

alindima self-assigned this Jul 23, 2020

alindima added the Performance: Misc label May 14, 2021

alindima closed this as completed May 21, 2021
