The seccomp filter should check the most frequently used syscalls first #1022

Closed
dhrgit opened this issue Mar 19, 2019 · 6 comments
Labels
Priority: Low Indicates that an issue or pull request should be resolved behind issues or pull requests labelled `

Comments

@dhrgit
Contributor

dhrgit commented Mar 19, 2019

The seccomp whitelist should have a way to specify the order in which the rules are checked. This would allow for the most frequently used syscalls to be checked first, so that they are quickly allowed through the filter.

An idea is to use a vector to hold the whitelist, such that the order of items in that vector is reflected in the order of BPF statements produced by the filter. A quick PoC for that idea can be found here.
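
The ordering idea can be illustrated with a small model. This is only a sketch, not Firecracker's filter builder and not the linked PoC: the instruction tuples and the `build_filter` and `evaluate` names are hypothetical, and the model ignores the architecture check and argument filtering that a real seccomp BPF program performs. It assumes a purely linear chain of comparisons, one per whitelisted syscall, emitted in the order of the input list.

```python
from typing import List, Tuple

# Hypothetical instruction model: ("allow_if_eq", syscall_nr) or ("default_deny",).
Instruction = Tuple


def build_filter(ordered_whitelist: List[int]) -> List[Instruction]:
    """Emit one comparison per whitelisted syscall, in the order given."""
    program: List[Instruction] = [("allow_if_eq", nr) for nr in ordered_whitelist]
    program.append(("default_deny",))
    return program


def evaluate(program: List[Instruction], syscall_nr: int) -> Tuple[str, int]:
    """Return (verdict, number of instructions executed) for one syscall."""
    for steps, insn in enumerate(program, start=1):
        if insn[0] == "allow_if_eq" and insn[1] == syscall_nr:
            return "allow", steps
        if insn[0] == "default_deny":
            return "deny", steps
    raise ValueError("program has no default action")


# x86_64 syscall numbers: read = 0, write = 1, recvfrom = 45.
by_number = build_filter([0, 1, 45])
hot_first = build_filter([45, 0, 1])
```

In this model, `evaluate(by_number, 45)` walks three instructions before allowing `recvfrom`, while `evaluate(hot_first, 45)` allows it after one.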

@alxiord alxiord added the Priority: Low Indicates that an issue or pull request should be resolved behind issues or pull requests labelled ` label Apr 8, 2019
This was referenced Jul 14, 2019
@alindima alindima self-assigned this Jul 23, 2020
@alindima
Contributor

I have done a small investigation to see whether there is significant room for improvement with regard to seccomp rule ordering.

The measurements were done by attaching an strace process to each jailer PID while running Firecracker’s CI test suite (using the -c and -f flags). This produces a large set of files, each containing statistics on the number of syscalls issued by that FC process and the total time that process spent in syscalls (both aggregated across all threads).
By summing these two values across all files and dividing the total time spent in syscalls by the total number of syscalls, we get the average time per syscall.
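
A minimal sketch of that aggregation follows. It is not the script used for the measurements. It assumes each file holds the usual `strace -c` summary table, whose per-syscall rows have the columns `% time`, `seconds`, `usecs/call`, `calls`, an optional `errors`, and the syscall name. The function names are hypothetical, and the per-syscall rows are summed instead of reading the `total` row, because the layout of that row differs between strace versions.

```python
from typing import Iterable, Tuple


def parse_strace_summary(text: str) -> Tuple[float, int]:
    """Return (total seconds, total calls) from one `strace -c` summary."""
    seconds = 0.0
    calls = 0
    for line in text.splitlines():
        fields = line.split()
        # Per-syscall rows: %time seconds usecs/call calls [errors] name
        if len(fields) < 5 or fields[-1] == "total":
            continue
        try:
            row_seconds = float(fields[1])
            row_calls = int(fields[3])
        except ValueError:
            continue  # header, separator, or unrelated strace message
        seconds += row_seconds
        calls += row_calls
    return seconds, calls


def average_syscall_time(summaries: Iterable[str]) -> float:
    """Average seconds per syscall across all summaries."""
    total_seconds = 0.0
    total_calls = 0
    for text in summaries:
        file_seconds, file_calls = parse_strace_summary(text)
        total_seconds += file_seconds
        total_calls += file_calls
    if total_calls == 0:
        raise ValueError("no syscalls found in the given summaries")
    return total_seconds / total_calls
```

Passing the contents of every output file to `average_syscall_time` gives the single average described above, in seconds.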

This process was repeated with both --seccomp-level=0 and --seccomp-level=2.

With advanced seccomp enabled, the average time FC spends in a syscall is 0.0400369 milliseconds. With seccomp disabled, it drops to 0.0378411 milliseconds, a decrease of only 5.484%. Keep in mind that this is the upper bound on the possible gain, since it compares against running with no seccomp filter at all. Sorting the syscalls can recover only part of that difference, so it is unlikely to bring a significant improvement, especially given the upcoming reduction of the filter (the result of the recent manual audit, yet to be merged).
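
For reference, the percentage can be reproduced from the two averages quoted in this comment. The helper name is hypothetical; the inputs are the reported values.

```python
def percent_decrease(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


with_seccomp_ms = 0.0400369     # average with --seccomp-level=2
without_seccomp_ms = 0.0378411  # average with --seccomp-level=0

print(f"{percent_decrease(with_seccomp_ms, without_seccomp_ms):.3f}%")  # prints 5.484%
```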

Before making a decision, we will run, with the help of @iulianbarbu, some IO-intensive performance tests with and without seccomp to see its effect on measurements such as throughput and latency.

@alindima
Contributor

Looking over the above again: strace incurs a fairly large overhead, so the measurements are likely not representative. I'll run some IO-intensive performance tests when I get some time.

@acatangiu
Contributor

acatangiu commented May 14, 2021

> This results in a big list of files, where each of them contains a statistic on the nr of syscalls issued by that FC process and the total time spent in syscalls by that process (these are across all threads).

If we have a sorted list of numbers of calls for each syscall type, it might be easier to just rearrange our filters to match that.
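
As a rough illustration of what such a rearrangement could buy, the sketch below orders syscalls by call count and computes the mean number of comparisons per call. It assumes the filter is a linear chain of comparisons, one per syscall. The function names and the call counts are hypothetical, not measured data.

```python
from typing import Dict, List


def order_by_frequency(call_counts: Dict[int, int]) -> List[int]:
    """Most frequently called syscall first; ties broken by syscall number."""
    return sorted(call_counts, key=lambda nr: (-call_counts[nr], nr))


def mean_comparisons(order: List[int], call_counts: Dict[int, int]) -> float:
    """Average number of comparisons per call for a linear filter in `order`."""
    position = {nr: index + 1 for index, nr in enumerate(order)}
    total_calls = sum(call_counts.values())
    weighted = sum(position[nr] * count for nr, count in call_counts.items())
    return weighted / total_calls


# Hypothetical counts keyed by x86_64 syscall number (read, write, recvfrom).
counts = {0: 100, 1: 50, 45: 850}

by_number = sorted(counts)
by_frequency = order_by_frequency(counts)
```

With these made-up counts, the number-sorted order averages 2.75 comparisons per call and the frequency-sorted order averages 1.2. Whether that saving is visible in practice is a separate question.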

If the investigation cost of finding out how much we gain outweighs the cost of just doing it, I think we should just do it based only on the knowledge that it will be statistically more efficient (without knowing if negligibly or significantly so).

What are the downsides here?

@alindima
Contributor

The only thing to watch out for is that it may turn out to be throwaway work.
Running Firecracker's performance tests with seccomp disabled would give some insight into the absolute room for improvement. I'll do this in the coming days; it shouldn't take long now that we have some performance tests in the framework.

The only realistic way to measure the improvement is to implement the feature and see.

@alindima
Contributor

I ran the network and vsock throughput performance tests and noticed no difference. This is not conclusive, since the results of these tests show deltas of up to ~10%.

The next step would be to measure the average syscall duration with and without seccomp using ftrace or perf. They use kernel tracepoints, so their overhead should be much lower than that of strace.

@alindima
Contributor

I ran some tests with iperf-vsock, with host-to-VM traffic, and used perf to record the syscall entry and exit points for the recvfrom syscall. I chose recvfrom for two reasons. First, its syscall number is 45, so it is not among the first in the filter (the filter is sorted by syscall number, in increasing order). Second, it is called heavily throughout the iperf-vsock test to receive data from the UDS, always with the same buffer size (4096 B), so its duration should be relatively constant.

I've written a script that parses the perf output and computes the average.
Here are the results:

  • with seccomp-level 2
    perf sampled 415525 recvfrom syscalls
    average duration: 1.267261897581429e-06 s (1.2673 microseconds)

  • with seccomp-level 0
    perf sampled 430318 recvfrom syscalls
    average duration: 1.26224559512035e-06 s (1.2622 microseconds)
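
The script itself was not posted. A minimal sketch of such a parser could look like the following, assuming the trace was dumped as text with one event per line in the shape `<comm> <tid> [<cpu>] <seconds>: syscalls:sys_enter_recvfrom: ...` (and likewise for `sys_exit_recvfrom`). The regular expression and the function names are assumptions; the actual output depends on the perf version and the fields selected.

```python
import re
from typing import Dict, Iterable, List

# Assumed line shape: "<comm> <tid> [<cpu>] <seconds>: syscalls:sys_enter_recvfrom: ..."
LINE_RE = re.compile(
    r"^\s*.+?\s+(?P<tid>\d+)\s+\[\d+\]\s+(?P<ts>\d+\.\d+):\s+"
    r"syscalls:sys_(?P<kind>enter|exit)_recvfrom:"
)


def recvfrom_durations(lines: Iterable[str]) -> List[float]:
    """Pair each enter event with the next exit on the same thread."""
    pending: Dict[int, float] = {}  # tid -> timestamp of the open enter event
    durations: List[float] = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # unrelated event or unparsable line
        tid = int(match.group("tid"))
        timestamp = float(match.group("ts"))
        if match.group("kind") == "enter":
            pending[tid] = timestamp
        elif tid in pending:
            durations.append(timestamp - pending.pop(tid))
        # An exit without a recorded enter (trace started mid-call) is dropped.
    return durations


def average(values: List[float]) -> float:
    """Arithmetic mean; raises if no complete syscall was found."""
    if not values:
        raise ValueError("no complete recvfrom calls found")
    return sum(values) / len(values)
```

Feeding the lines of the text dump to `recvfrom_durations` and then to `average` yields a mean duration in seconds, the same unit as the figures above.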

There is also no visible difference in the reported throughput when running iperf-vsock, with seccomp enabled and disabled.

We can now conclude the investigation, since the overhead of the seccomp filtering is not noticeable.
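
To put a number on "not noticeable", the difference between the two reported averages can be computed directly. The variable names are hypothetical; the inputs are the averages listed above.

```python
with_seccomp_s = 1.267261897581429e-06    # seccomp-level 2
without_seccomp_s = 1.26224559512035e-06  # seccomp-level 0

overhead_s = with_seccomp_s - without_seccomp_s
overhead_pct = overhead_s / without_seccomp_s * 100.0

# prints: 5.02 ns per call, 0.40% slower
print(f"{overhead_s * 1e9:.2f} ns per call, {overhead_pct:.2f}% slower")
```

That is roughly 5 nanoseconds per recvfrom call.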

Closing this issue per the above investigation and reasoning.

4 participants