-
Notifications
You must be signed in to change notification settings - Fork 1.9k
vsock saga, episode 1: vhost-free vsock device #1106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -28,7 +28,4 @@ panic = "abort" | |||
lto = true | |||
panic = "abort" | |||
|
|||
[features] | |||
vsock = ["api_server/vsock", "jailer/vsock"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How difficult would it be to keep the vsock-vhost implementation as an alternative to vsock-unixsockets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, aside from the considerable implementation and maintenance effort, supporting two different implementations for the same feature would not really align well with our tenets.
pub const NUM_QUEUES: usize = 3; | ||
/// Virtio queue sizes, in number of descriptor chain heads. | ||
/// There are 3 queues for a virtio device (in this order): RX, TX, Event | ||
pub const QUEUE_SIZES: &[u16] = &[256; NUM_QUEUES]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The queue sizes shoud be [256, 256, 2]
. There's no reason for allocating 256 descriptors for the reset event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We aren't allocating anything, though. Those are just the max sizes that the driver can choose to allocate.
/// | ||
/// The device processes available buffers in the same order in which the device | ||
/// offers them. | ||
pub const VIRTIO_F_IN_ORDER: usize = 35; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one and the next are virtio device independent feature bits, so maybe we should get them out of the Vsock device related modules (they might already be defined somewhere else as well) for neatness reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I considered that, especially since all these UAPI defs should be neatly stored somewhere else, but that would require some clean up effort throughout all our devices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: at least leave a TODO
for this, someone might do it when they're bored if they see it when browsing code
defs::NUM_QUEUES, | ||
queues.len() | ||
); | ||
return Err(ActivateError::BadActivate); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this leas to the VMM exiting. I am not sure this is the expected behavior on driver initialization failure. I think the device can be reset and re-initialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea we don't support device initialization failures/resets/etc right now, and you're right it error causes an exit. We plan to get rid of it as we ingest more rust-vmm components, but that might change if there's a pressing need along the way.
mut queue_evts: Vec<EventFd>, | ||
) -> ActivateResult { | ||
if queues.len() != defs::NUM_QUEUES || queue_evts.len() != defs::NUM_QUEUES { | ||
error!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would argue that this is misbehavior from the guest and might be worth a warning, but not an error. I think errors should be reserved for non-recoverable states of the VMM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only current practical implication of the log level is whether a message actually shows up as you restrict what actually gets written to the log file. We see this as pretty important given the use cases we know of, so we've made it (and similar events) an error. Things might definitely change in the future.
@@ -13,6 +13,8 @@ pub struct VsockDeviceConfig { | |||
pub id: String, | |||
/// A 32-bit Context Identifier (CID) used to identify the guest. | |||
pub guest_cid: u32, | |||
/// Path to local unix socket. | |||
pub uds_path: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a VsockBackend
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. However, currently, there's no clean way of handling these config structs properly. The time is close when we'll have to, though, and this will get solved.
vmm/src/vmm_config/vsock.rs
Outdated
@@ -13,6 +13,8 @@ pub struct VsockDeviceConfig { | |||
pub id: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not clear what the purpose of this is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just a device identifier, since currently there's support for multiple devices, even if the Linux driver doesn't.
devices/src/virtio/vsock/device.rs
Outdated
backend: Option<B>, | ||
avail_features: u64, | ||
acked_features: u64, | ||
config_space: Vec<u8>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed? There's no reason for a guest to write random data to the config_space of this device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, the device specific configuration space for vsock appears to only contain the guest_cid
. If we don't want to allow guests to change that (which is also the simpler thing to do for now), we can remove this member (as it seems the read_config
function only makes use of cid
anyway), and not do any actual change for write_config
.
4ccf155
to
636e4a9
Compare
7c28a35
to
41e214d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Posting/updating some comments from a while back regarding the first part of the PR.
@@ -102,13 +102,3 @@ def test_firecracker_swagger(): | |||
os.path.join(os.getcwd(), '../api_server/swagger/firecracker.yaml') | |||
) | |||
check_swagger_style(yaml_spec) | |||
|
|||
|
|||
def test_experimental_firecracker_swagger(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would make sense for the test to declare itself as skipped if the experimental swagger file does not exist. I imagine we might bring it back in the future if we try out new experimental features, and we'll want the check to happen again at that point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we may bring an experimental swagger file back at some point, but I think that's not enough reason to keep an unused test hanging around. I think it's a pretty straight-forward test to write, once we'll need it (mostly copy-paste from the stable version).
devices/src/virtio/mod.rs
Outdated
@@ -49,6 +51,7 @@ pub const NOTIFY_REG_OFFSET: u32 = 0x50; | |||
pub enum ActivateError { | |||
EpollCtl(IOError), | |||
BadActivate, | |||
VsockError(self::vsock::VsockError), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, couldn't notice from the code, why is VsockError
a special variant of ActivateError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, that's a leftover from an old version of the code - will remove it.
// } else { | ||
// warn!("vsock: unexpected backend event flags={:08x}", evset_bits); | ||
// } | ||
self.backend.notify(epoll::Events::EPOLLIN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be self.backend.notify(evset);
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, it's fixed.
devices/src/virtio/vsock/packet.rs
Outdated
/// | ||
/// Warning: Both `ptr` and `len` must be insured as valid by the caller. | ||
/// | ||
fn from_fat_ptr_unchecked(ptr: *const u8, len: usize) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove the _fat
part from the name, as there's no risk for confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole code is gone now.
2861dcd
to
4971ff2
Compare
260f5ef
to
60efa35
Compare
/// | ||
pub struct VsockPacket { | ||
hdr: *mut u8, | ||
buf: Option<*mut u8>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some general concerns with these pointers to memory: the backing memory could definitely go away at some point before the VsockPacket disappears. VsockPacket should not exceed the lifetime of the backing memory or all the operations done to it should be otherwise unsafe.
There is a comment around the unsafe code, but it's not entirely accurate.
It's true that you're using it in a way that's safe, as far as I can tell, but in the general case, this breaks memory safety rules, as the compiler can't tell if hdr
and buf
still point to valid memory.
// This is safe since bound checks have already been performed when creating the packet
// from the virtq descriptor.
I think adding a lifetime parameter to this type would improve things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you referring to the backing memory going away once the virtio buffers are marked used, and thusly their ownership returned to the driver? If that's so, I think a rigorously correct solution would have us synchronize the lifetime of virtio buffer wrappers (i.e. DescriptorChain
) and VsockPacket
in a non-trivial manner. That would require a considerable amount of work on our virtio handling code.
If you were referring to something else, could you please dive into a bit more detail? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that's exactly right. The lifetime of both hdr
and buf
depends on the logical lifetime of associated virtqueue descriptors. I haven't looked into an exact solution, but there should be some lifetime dependency between the VsockPacket and the memory it has pointers to. Not sure how easy that would be considering you hold direct pointers to memory instead of references.
Maybe something like this would work?
pub struct VsockPacket<'a, 'b> {
hdr: UnsafeCell<u8>,
buf: Option<UnsafeCell<u8>>,
phantom_hdr: PhantomData<&'a u8>,
phantom_buf: PhantomData<&'b u8>,
}
There's probably something even simpler than this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, then I don't think rust lifetimes are going to help, since descriptor lifetimes are not known at compilation time. I.e. the data pointed to by a descriptor lives up until the moment we add that descriptor to the used vring.
If I understand you correctly, you are referring to the rust lifetimes explicitly set for DescriptorChain
(please correct me if I misunderstood). Those are actually not what they might appear to be, though. I.e. they aren't connected to the data lifetime (since, as I was saying, that can only be known at runtime), but are just there so DescriptorChain
can hold a reference to GuestMemory
. That pattern itself needs to be cleaned up, since references to GuestMemory
make little sense (GuestMemory
is just an Arc
, so a reference to it is basically a statically-tracked reference to a dynamically-tracked reference).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The descriptors themselves no, but the wrapping structure does have a lifetime.
let used_len = match VsockPacket::from_rx_virtq_head(&head)
This creates a new VsockPacket
from &head
by extracting the internal memory region of head
and the next descriptor. However, after this creation succeeds, VsockPacket
is completely disconnected from head
, instead of owning a reference to it with a proper lifetime to it. So, you could add it to the used queue and the memory gets reclaimed by the guest, while the VsockPacket
is still alive and usable from the VMM.
Speaking of which:
self.rxvq.add_used(&self.mem, head.index, used_len);
This needs to consume head
and expire its lifetime instead of using head.index
.
I think I'll turn this into a larger issue instead of commenting on this particular PR.
b5f09d0
to
4038418
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vsock saga: Review v1
(Reached final commit)
LittleEndian::read_u64(&self.hdr()[HDROFF_SRC_CID..]) | ||
} | ||
|
||
pub fn set_src_cid(&mut self, cid: u64) -> &mut Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use references here (i.e cid: &u64) to avoid copying of u64 and then dereferencing.
It applies to all of the above args passed by value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A reference to an u64
would be another u64
(pointers are 64-bit wide on 64 bit architectures). Even on 32-bit, though, we still wouldn't want to force passing 64-bit args by reference, since that would imply a memory access. Large args can be passed by value more efficiently, via various regs, without touching the memory. E.g. the compiler may choose to use AVX-512 registers to pass an argument that's 512-bit wide.
Passing by value vs by reference is actually decided by the compiler, when applying compile-time optimizations. Using &
is a higher-level abstraction, telling rust we want to borrow, not pass ownership. The compiler then decides the optimal way to pass the argument.
@@ -142,6 +213,10 @@ where | |||
} | |||
} | |||
|
|||
if raise_irq { | |||
self.signal_used_queue().unwrap_or_default(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not self.signal_used_queue()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rust warns you when you're not handling Results in any way. This seems like a hack to silence the compiler about it. However, is there any good reason why this shouldn't return the error? At least logging such an event would be a better solution than simply failing silently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error is already logged by signal_used_queue()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not referring to logging the error but to propagating it. Right now the unwrap_or_default
is silencing it in case it returns error.
Is that what you want?
The net device is propagating it: https://github.com/firecracker-microvm/firecracker/blob/master/devices/src/virtio/net.rs#L581.
assert_eq!(data, [0u8, 1, 2, 3, 4, 5, 6, 7]); | ||
|
||
// Just covering lines here. | ||
ctx.device.write_config(0, &data[..4]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd actually check that read_config
returns the same value after calling write_config
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The vsock device has no writable config data. The only config available is the device CID, and that's not writable by the guest driver. However, the VirtioDevice
trait does require a write_config()
function, hence the "just covering lines" comment. I've amended the comment to make this more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, yes, no writable config data for vsock which is exactly what I was implying with my comment. Since the write_config
is supposed to not change the config, I would do:
x = read_config;
write_config(smth);
y = read_config;
assert_eq(x,y);
You are checking that write_config
for the vsock device does not actually write the config data (which in this case consists only of the CID).
} | ||
|
||
// Test a correct activation. | ||
ctx.device |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these are unit tests why not use a explicit assert
(i.e
assert!(ctx.device
.activate(
ctx.mem.clone(),
EventFd::new().unwrap(),
Arc::new(AtomicUsize::new(0)),
vec![
VirtQueue::new(256),
VirtQueue::new(256),
VirtQueue::new(256),
],
vec![
EventFd::new().unwrap(),
EventFd::new().unwrap(),
EventFd::new().unwrap(),
],
)
.is_ok());
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really see how the two are different. Can you elaborate on why this is preferable, please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to use asserts in unit tests for verbosity. See below.
Output when test fails and you use unwrap
---- virtio::vsock::device::tests::test_virtio_device stdout ----
thread 'virtio::vsock::device::tests::test_virtio_device' panicked at 'called `Result::unwrap()` on an `Err` value: BadActivate', src/libcore/result.rs:997:5
Output when test fails and you use assert!:
---- virtio::vsock::device::tests::test_virtio_device stdout ----
thread 'virtio::vsock::device::tests::test_virtio_device' panicked at 'assertion failed: ctx.device.activate(ctx.mem.clone(), EventFd::new().unwrap(),
Arc::new(AtomicUsize::new(0)),
vec![VirtQueue :: new ( 256 ) , VirtQueue :: new ( 256 ) ,
VirtQueue :: new ( 256 ) ,],
vec![EventFd :: new ( ) . unwrap ( ) , EventFd :: new (
) . unwrap ( ) , EventFd :: new ( ) . unwrap ( )
,]).is_ok()', devices/src/virtio/vsock/device.rs:301:9
However, this is a personal preference which I do not feel strongly about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of nits I forgot about before. Looks good otherwise; needs rebasing.
vsock_id='vsock2', | ||
guest_cid=16 | ||
guest_cid=15, | ||
uds_path='vsock.sock' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v.sock
sounds so much cooler. There are some other occurrences which could benefit from this change as well.
@@ -111,6 +119,20 @@ where | |||
} | |||
defs::BACKEND_EVENT => { | |||
debug!("vsock: backend event"); | |||
// if let Some(evset) = epoll::Events::from_bits(evset_bits) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be a nice touch to remove these unused lines from the current commit as well.
Prepared the way for the new AF_UNIX-backed vsock device by: * removing old vhost code; and * removing the "vsock" feature. Signed-off-by: Dan Horobeanu <[email protected]>
Updated the devices crate to use Rust edition 2018 features. This was done in preparation for the new vsock device, the code for which will make use of some nifty 2018 features, such as non-lexical lifetimes. Signed-off-by: Dan Horobeanu <[email protected]>
Added the skeleton for a new vsock device, that will use Unix sockets as a backend, instead of vhost. Signed-off-by: Dan Horobeanu <[email protected]>
Added the definition for a generic backend trait, that could be used to possibly offload vsock traffic to something other than Unix sockets. Signed-off-by: Dan Horobeanu <[email protected]>
Added code to handle reading and writing vsock packets from/to the virtio queues. Signed-off-by: Dan Horobeanu <[email protected]>
Added unit tests for: - the vsock virtio device implementation - the vsock epoll handler - the vsock packet handling interface Signed-off-by: Dan Horobeanu <[email protected]>
Closed as a subset of #1176 . |
Issue: #650
Description of changes
Note: the vsock series is now split across this PR, together with #1175 and #1176, for a leaner review process. Considering that the experimental, vhost-based, vsock functionality gets removed here, the entire series should get merged as closely as possible.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.