Skip to content

vsock saga, episode 1: vhost-free vsock device #1106

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

dhrgit
Copy link
Contributor

@dhrgit dhrgit commented May 26, 2019

Issue: #650

Description of changes

  • Removed experimental vhost-based vsock support;
  • Added a vhost-less vsock device;
  • Added a generic vsock backend trait;
  • Implemented the vsock packet assembly.

Note: the vsock series is now split across this PR, together with #1175 and #1176, for a leaner review process. Considering that the experimental, vhost-based, vsock functionality gets removed here, the entire series should get merged as closely as possible.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@dhrgit dhrgit self-assigned this May 26, 2019
This was referenced May 26, 2019
@@ -28,7 +28,4 @@ panic = "abort"
lto = true
panic = "abort"

[features]
vsock = ["api_server/vsock", "jailer/vsock"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How difficult would it be to keep the vsock-vhost implementation as an alternative to vsock-unixsockets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, aside from the considerable implementation and maintenance effort, supporting two different implementations for the same feature would not really align well with our tenets.

pub const NUM_QUEUES: usize = 3;
/// Virtio queue sizes, in number of descriptor chain heads.
/// There are 3 queues for a virtio device (in this order): RX, TX, Event
pub const QUEUE_SIZES: &[u16] = &[256; NUM_QUEUES];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The queue sizes shoud be [256, 256, 2]. There's no reason for allocating 256 descriptors for the reset event.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We aren't allocating anything, though. Those are just the max sizes that the driver can choose to allocate.

///
/// The device processes available buffers in the same order in which the device
/// offers them.
pub const VIRTIO_F_IN_ORDER: usize = 35;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this required?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one and the next are virtio device independent feature bits, so maybe we should get them out of the Vsock device related modules (they might already be defined somewhere else as well) for neatness reasons.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I considered that, especially since all these UAPI defs should be neatly stored somewhere else, but that would require some clean up effort throughout all our devices.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: at least leave a TODO for this, someone might do it when they're bored if they see it when browsing code

defs::NUM_QUEUES,
queues.len()
);
return Err(ActivateError::BadActivate);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this leas to the VMM exiting. I am not sure this is the expected behavior on driver initialization failure. I think the device can be reset and re-initialized.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea we don't support device initialization failures/resets/etc right now, and you're right it error causes an exit. We plan to get rid of it as we ingest more rust-vmm components, but that might change if there's a pressing need along the way.

mut queue_evts: Vec<EventFd>,
) -> ActivateResult {
if queues.len() != defs::NUM_QUEUES || queue_evts.len() != defs::NUM_QUEUES {
error!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that this is misbehavior from the guest and might be worth a warning, but not an error. I think errors should be reserved for non-recoverable states of the VMM.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only current practical implication of the log level is whether a message actually shows up as you restrict what actually gets written to the log file. We see this as pretty important given the use cases we know of, so we've made it (and similar events) an error. Things might definitely change in the future.

@@ -13,6 +13,8 @@ pub struct VsockDeviceConfig {
pub id: String,
/// A 32-bit Context Identifier (CID) used to identify the guest.
pub guest_cid: u32,
/// Path to local unix socket.
pub uds_path: String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a VsockBackend instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. However, currently, there's no clean way of handling these config structs properly. The time is close when we'll have to, though, and this will get solved.

@@ -13,6 +13,8 @@ pub struct VsockDeviceConfig {
pub id: String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear what the purpose of this is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just a device identifier, since currently there's support for multiple devices, even if the Linux driver doesn't.

backend: Option<B>,
avail_features: u64,
acked_features: u64,
config_space: Vec<u8>,
Copy link
Contributor

@petreeftime petreeftime May 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? There's no reason for a guest to write random data to the config_space of this device.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, the device specific configuration space for vsock appears to only contain the guest_cid. If we don't want to allow guests to change that (which is also the simpler thing to do for now), we can remove this member (as it seems the read_config function only makes use of cid anyway), and not do any actual change for write_config.

@dhrgit dhrgit force-pushed the vsock branch 2 times, most recently from 4ccf155 to 636e4a9 Compare June 8, 2019 14:14
@dhrgit dhrgit force-pushed the vsock branch 3 times, most recently from 7c28a35 to 41e214d Compare June 20, 2019 08:03
Copy link
Contributor

@alexandruag alexandruag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posting/updating some comments from a while back regarding the first part of the PR.

@@ -102,13 +102,3 @@ def test_firecracker_swagger():
os.path.join(os.getcwd(), '../api_server/swagger/firecracker.yaml')
)
check_swagger_style(yaml_spec)


def test_experimental_firecracker_swagger():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would make sense for the test to declare itself as skipped if the experimental swagger file does not exist. I imagine we might bring it back in the future if we try out new experimental features, and we'll want the check to happen again at that point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we may bring an experimental swagger file back at some point, but I think that's not enough reason to keep an unused test hanging around. I think it's a pretty straight-forward test to write, once we'll need it (mostly copy-paste from the stable version).

@@ -49,6 +51,7 @@ pub const NOTIFY_REG_OFFSET: u32 = 0x50;
pub enum ActivateError {
EpollCtl(IOError),
BadActivate,
VsockError(self::vsock::VsockError),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, couldn't notice from the code, why is VsockError a special variant of ActivateError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, that's a leftover from an old version of the code - will remove it.

// } else {
// warn!("vsock: unexpected backend event flags={:08x}", evset_bits);
// }
self.backend.notify(epoll::Events::EPOLLIN);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be self.backend.notify(evset); here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, it's fixed.

///
/// Warning: Both `ptr` and `len` must be insured as valid by the caller.
///
fn from_fat_ptr_unchecked(ptr: *const u8, len: usize) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove the _fat part from the name, as there's no risk for confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole code is gone now.

///
pub struct VsockPacket {
hdr: *mut u8,
buf: Option<*mut u8>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some general concerns with these pointers to memory: the backing memory could definitely go away at some point before the VsockPacket disappears. VsockPacket should not exceed the lifetime of the backing memory or all the operations done to it should be otherwise unsafe.

There is a comment around the unsafe code, but it's not entirely accurate.
It's true that you're using it in a way that's safe, as far as I can tell, but in the general case, this breaks memory safety rules, as the compiler can't tell if hdr and buf still point to valid memory.

// This is safe since bound checks have already been performed when creating the packet
// from the virtq descriptor.

I think adding a lifetime parameter to this type would improve things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to the backing memory going away once the virtio buffers are marked used, and thusly their ownership returned to the driver? If that's so, I think a rigorously correct solution would have us synchronize the lifetime of virtio buffer wrappers (i.e. DescriptorChain) and VsockPacket in a non-trivial manner. That would require a considerable amount of work on our virtio handling code.

If you were referring to something else, could you please dive into a bit more detail? Thanks!

Copy link
Contributor

@petreeftime petreeftime Jul 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's exactly right. The lifetime of both hdr and buf depends on the logical lifetime of associated virtqueue descriptors. I haven't looked into an exact solution, but there should be some lifetime dependency between the VsockPacket and the memory it has pointers to. Not sure how easy that would be considering you hold direct pointers to memory instead of references.
Maybe something like this would work?

pub struct VsockPacket<'a, 'b> {
    hdr:  UnsafeCell<u8>,
    buf: Option<UnsafeCell<u8>>,
    phantom_hdr: PhantomData<&'a u8>,
    phantom_buf: PhantomData<&'b u8>,
}

There's probably something even simpler than this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, then I don't think rust lifetimes are going to help, since descriptor lifetimes are not known at compilation time. I.e. the data pointed to by a descriptor lives up until the moment we add that descriptor to the used vring.

If I understand you correctly, you are referring to the rust lifetimes explicitly set for DescriptorChain (please correct me if I misunderstood). Those are actually not what they might appear to be, though. I.e. they aren't connected to the data lifetime (since, as I was saying, that can only be known at runtime), but are just there so DescriptorChain can hold a reference to GuestMemory. That pattern itself needs to be cleaned up, since references to GuestMemory make little sense (GuestMemory is just an Arc, so a reference to it is basically a statically-tracked reference to a dynamically-tracked reference).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The descriptors themselves no, but the wrapping structure does have a lifetime.

let used_len = match VsockPacket::from_rx_virtq_head(&head)

This creates a new VsockPacket from &head by extracting the internal memory region of head and the next descriptor. However, after this creation succeeds, VsockPacket is completely disconnected from head, instead of owning a reference to it with a proper lifetime to it. So, you could add it to the used queue and the memory gets reclaimed by the guest, while the VsockPacket is still alive and usable from the VMM.

Speaking of which:

self.rxvq.add_used(&self.mem, head.index, used_len);

This needs to consume head and expire its lifetime instead of using head.index.

I think I'll turn this into a larger issue instead of commenting on this particular PR.

@acatangiu acatangiu changed the title Vsock saga: the code is here vsock saga, episode 1: vhost-free vsock device Jul 18, 2019
@dhrgit dhrgit force-pushed the vsock branch 2 times, most recently from b5f09d0 to 4038418 Compare August 3, 2019 06:48
Copy link
Contributor

@dianpopa dianpopa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vsock saga: Review v1
(Reached final commit)

LittleEndian::read_u64(&self.hdr()[HDROFF_SRC_CID..])
}

pub fn set_src_cid(&mut self, cid: u64) -> &mut Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use references here (i.e cid: &u64) to avoid copying of u64 and then dereferencing.
It applies to all of the above args passed by value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A reference to an u64 would be another u64 (pointers are 64-bit wide on 64 bit architectures). Even on 32-bit, though, we still wouldn't want to force passing 64-bit args by reference, since that would imply a memory access. Large args can be passed by value more efficiently, via various regs, without touching the memory. E.g. the compiler may choose to use AVX-512 registers to pass an argument that's 512-bit wide.

Passing by value vs by reference is actually decided by the compiler, when applying compile-time optimizations. Using & is a higher-level abstraction, telling rust we want to borrow, not pass ownership. The compiler then decides the optimal way to pass the argument.

@@ -142,6 +213,10 @@ where
}
}

if raise_irq {
self.signal_used_queue().unwrap_or_default();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not self.signal_used_queue()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust warns you when you're not handling Results in any way. This seems like a hack to silence the compiler about it. However, is there any good reason why this shouldn't return the error? At least logging such an event would be a better solution than simply failing silently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error is already logged by signal_used_queue().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not referring to logging the error but to propagating it. Right now the unwrap_or_default is silencing it in case it returns error.
Is that what you want?
The net device is propagating it: https://github.com/firecracker-microvm/firecracker/blob/master/devices/src/virtio/net.rs#L581.

assert_eq!(data, [0u8, 1, 2, 3, 4, 5, 6, 7]);

// Just covering lines here.
ctx.device.write_config(0, &data[..4]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually check that read_config returns the same value after calling write_config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vsock device has no writable config data. The only config available is the device CID, and that's not writable by the guest driver. However, the VirtioDevice trait does require a write_config() function, hence the "just covering lines" comment. I've amended the comment to make this more clear.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, yes, no writable config data for vsock which is exactly what I was implying with my comment. Since the write_config is supposed to not change the config, I would do:

x = read_config;
write_config(smth);
y = read_config;
assert_eq(x,y);

You are checking that write_config for the vsock device does not actually write the config data (which in this case consists only of the CID).

}

// Test a correct activation.
ctx.device
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are unit tests why not use a explicit assert(i.e

assert!(ctx.device
            .activate(
                ctx.mem.clone(),
                EventFd::new().unwrap(),
                Arc::new(AtomicUsize::new(0)),
                vec![
                    VirtQueue::new(256),
                    VirtQueue::new(256),
                    VirtQueue::new(256),
                ],
                vec![
                    EventFd::new().unwrap(),
                    EventFd::new().unwrap(),
                    EventFd::new().unwrap(),
                ],
            )
            .is_ok());

)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see how the two are different. Can you elaborate on why this is preferable, please?

Copy link
Contributor

@dianpopa dianpopa Aug 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to use asserts in unit tests for verbosity. See below.
Output when test fails and you use unwrap

---- virtio::vsock::device::tests::test_virtio_device stdout ----
thread 'virtio::vsock::device::tests::test_virtio_device' panicked at 'called `Result::unwrap()` on an `Err` value: BadActivate', src/libcore/result.rs:997:5

Output when test fails and you use assert!:

---- virtio::vsock::device::tests::test_virtio_device stdout ----
thread 'virtio::vsock::device::tests::test_virtio_device' panicked at 'assertion failed: ctx.device.activate(ctx.mem.clone(), EventFd::new().unwrap(),
                    Arc::new(AtomicUsize::new(0)),
                    vec![VirtQueue :: new ( 256 ) , VirtQueue :: new ( 256 ) ,
                         VirtQueue :: new ( 256 ) ,],
                    vec![EventFd :: new (  ) . unwrap (  ) , EventFd :: new (
                         ) . unwrap (  ) , EventFd :: new (  ) . unwrap (  )
                         ,]).is_ok()', devices/src/virtio/vsock/device.rs:301:9

However, this is a personal preference which I do not feel strongly about.

Copy link
Contributor

@alexandruag alexandruag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of nits I forgot about before. Looks good otherwise; needs rebasing.

vsock_id='vsock2',
guest_cid=16
guest_cid=15,
uds_path='vsock.sock'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v.sock sounds so much cooler. There are some other occurrences which could benefit from this change as well.

@@ -111,6 +119,20 @@ where
}
defs::BACKEND_EVENT => {
debug!("vsock: backend event");
// if let Some(evset) = epoll::Events::from_bits(evset_bits) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be a nice touch to remove these unused lines from the current commit as well.

dhrgit added 6 commits August 28, 2019 10:44
Prepared the way for the new AF_UNIX-backed vsock device by:
* removing old vhost code; and
* removing the "vsock" feature.

Signed-off-by: Dan Horobeanu <[email protected]>
Updated the devices crate to use Rust edition 2018 features. This was
done in preparation for the new vsock device, the code for which will
make use of some nifty 2018 features, such as non-lexical lifetimes.

Signed-off-by: Dan Horobeanu <[email protected]>
Added the skeleton for a new vsock device, that will use Unix sockets as
a backend, instead of vhost.

Signed-off-by: Dan Horobeanu <[email protected]>
Added the definition for a generic backend trait, that could be used to
possibly offload vsock traffic to something other than Unix sockets.

Signed-off-by: Dan Horobeanu <[email protected]>
Added code to handle reading and writing vsock packets from/to the
virtio queues.

Signed-off-by: Dan Horobeanu <[email protected]>
Added unit tests for:
- the vsock virtio device implementation
- the vsock epoll handler
- the vsock packet handling interface

Signed-off-by: Dan Horobeanu <[email protected]>
@dhrgit
Copy link
Contributor Author

dhrgit commented Sep 2, 2019

Closed as a subset of #1176 .

@dhrgit dhrgit closed this Sep 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants