Source code: linux-3.16.37-git, qemu-v2.7

1. QEMU code at VM startup

The qemu command line for a virtio block device:
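The exact command line is not preserved here; a typical invocation attaching a qcow2 disk over virtio-blk might look like the following (illustrative only; the image path, memory size, and other options are placeholders, not the author's original command):

```shell
qemu-system-x86_64 -enable-kvm -m 2048 -smp 2 \
    -drive file=/path/to/guest.qcow2,if=virtio,format=qcow2 \
    -nographic
```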

The key files containing the virtio blk code in QEMU are:

hw/virtio/virtio.c

hw/virtio/virtio-bus.c

hw/virtio/virtio-rng.c

hw/block/virtio-blk.c

hw/net/virtio-net.c

During type_initialize, every virtio device class gets initialized once: virtio_device_class_init, virtio_rng_class_init, virtio_bus_class_init, virtio_pci_bus_class_init, and virtio_blk_class_init are all invoked.

The backtrace captured with gdb is as follows:

#0 0x0000555555804ffc in virtio_device_class_init (klass=0x5555566f0090, data=0x0) at /home/oenhan/workspace/src/qemu-v2.7.0/hw/virtio/virtio.c:1968

#1 0x0000555555b1e542 in type_initialize (ti=0x5555566c9060) at qom/object.c:328

#2 0x0000555555b1e2ba in type_initialize (ti=0x5555566c5b80) at qom/object.c:280

#3 0x0000555555b1f6f6 in object_class_foreach_tramp (key=0x5555566a9710, value=0x5555566c5b80, opaque=0x7fffffffd870) at qom/object.c:798

#4 0x00007ffff2bd43d0 in g_hash_table_foreach () at /lib64/libglib-2.0.so.0

#5 0x0000555555b1f7cd in object_class_foreach (fn=0x555555b1f922 <object_class_get_list_tramp>, implements_type=0x555555c4816e "machine", include_abstract=false, opaque=0x7fffffffd8c0) at qom/object.c:820

#6 0x0000555555b1f99d in object_class_get_list (implements_type=0x555555c4816e "machine", include_abstract=false) at qom/object.c:874

#7 0x00005555558bdf1c in find_default_machine () at vl.c:1470

#8 0x00005555558c20bf in select_machine () at vl.c:2732

#9 0x00005555558c2bc3 in main (argc=12, argv=0x7fffffffdd38, envp=0x7fffffffdda0) at vl.c:3986

For virtio blk, the function to focus on is virtio_blk_device_realize.

Within virtio_blk_device_realize, virtio_add_queue_aio is responsible for initializing the vq structure.

Note that at this point the vq's handle_output is set to virtio_blk_handle_output.

Below is an explanation of VirtIOBlock:

2. virtio module handling in the guest kernel

The virtio PV driver code is in virtio_blk.c; its module init consists of just three function calls:

The focus is virtio_blk, though driver->driver.bus = &virtio_bus also deserves attention. Looking at virtio_blk directly: when the kernel probes the device, virtblk_probe is called:

Next look at init_vq, which leads to virtio_find_single_vq; vdev->config->find_vqs(vdev, 1, &vq, callbacks, names) is found in the virtio_pci_config_ops object, i.e. the path goes from vp_find_vqs to vp_try_to_find_vqs. Note that the callback passed in here is virtblk_done. In vp_try_to_find_vqs:

In setup_vq:

At the guest kernel block layer, submission goes through submit_bio, which uses generic_make_request to submit the bio.

Looking back at the virtblk_probe function: vblk->disk->queue = blk_mq_init_queue(&vblk->tag_set).

blk_mq_init_queue calls blk_queue_make_request to set q->make_request_fn to blk_mq_make_request. Inside blk_mq_make_request there is q->mq_ops->queue_rq(data.hctx, rq), and virtio_mq_ops.queue_rq = virtio_queue_rq.

In virtio_queue_rq:

3. QEMU's handling of the virtio vring

In virtio_blk_device_realize, the chain ultimately calls virtio_add_queue_internal, where vdev->vq[i].handle_output = handle_output. On the QEMU side, the guest's iowrite is implemented by virtio_pci_config_write; in virtio_ioport_write, when the offset matches VIRTIO_PCI_QUEUE_NOTIFY, virtio_queue_notify runs, then virtio_queue_notify_vq, and finally vq->handle_output, i.e. virtio_blk_handle_output.

In virtio_blk_handle_output, the blk_data_plane mechanism does not differ much from the ordinary single-thread path, so it is not described separately. In virtio_blk_handle_vq, virtio_blk_get_request obtains a req via virtqueue_pop, with the gpa-to-hva translation done in virtqueue_map_desc. Once virtio_blk_handle_request has the req it calls virtio_blk_submit_multireq, and after the loop exits there is another virtio_blk_submit_multireq call outside it.

virtio_blk_submit_multireq calls submit_requests:

Take blk_aio_preadv as an example:

In blk_aio_prwv, a coroutine is created to execute blk_aio_read_entry, i.e. blk_co_preadv, then bdrv_co_preadv, eventually reaching bdrv_aligned_preadv; then in bdrv_driver_preadv there is:

drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov)

At this point the whole IO path is complete.

What remains is resolving the different image formats, from qcow2 down to raw; see the gdb backtrace:

#0 0x0000555555b83b05 in raw_co_preadv (bs=0x5555570ee4f0, offset=1175007232, bytes=28672, qiov=0x55555b66d330, flags=0) at block/raw-posix.c:1274

#1 0x0000555555b8cb0b in bdrv_driver_preadv (bs=0x5555570ee4f0, offset=1175007232, bytes=28672, qiov=0x55555b66d330, flags=0) at block/io.c:815

#2 0x0000555555b8d441 in bdrv_aligned_preadv (bs=0x5555570ee4f0, req=0x55555b66d240, offset=1175007232, bytes=28672, align=1, qiov=0x55555b66d330, flags=0) at block/io.c:1039

#3 0x0000555555b8d92b in bdrv_co_preadv (child=0x55555708b4a0, offset=1175007232, bytes=28672, qiov=0x55555b66d330, flags=0) at block/io.c:1131

#4 0x0000555555b54c08 in qcow2_co_preadv (bs=0x5555570e8250, offset=1174548480, bytes=28672, qiov=0x55555b66d660, flags=0) at block/qcow2.c:1509

#5 0x0000555555b8cb0b in bdrv_driver_preadv (bs=0x5555570e8250, offset=1174548480, bytes=28672, qiov=0x55555b66d660, flags=0) at block/io.c:815

#6 0x0000555555b8d441 in bdrv_aligned_preadv (bs=0x5555570e8250, req=0x55555b66d550, offset=1174548480, bytes=28672, align=1, qiov=0x55555b66d660, flags=0) at block/io.c:1039

#7 0x0000555555b8d92b in bdrv_co_preadv (child=0x5555570c61f0, offset=1174548480, bytes=28672, qiov=0x55555b66d660, flags=0) at block/io.c:1131

#8 0x0000555555b549cb in qcow2_co_preadv (bs=0x5555570b9c70, offset=1174548480, bytes=28672, qiov=0x555558153fb0, flags=0) at block/qcow2.c:1446

#9 0x0000555555b8cb0b in bdrv_driver_preadv (bs=0x5555570b9c70, offset=1174548480, bytes=28672, qiov=0x555558153fb0, flags=0) at block/io.c:815

For a qcow2 image with a backing file, the first qcow2_co_preadv misses, and the big recursion re-enters qcow2_co_preadv on the backing file, which finally resolves to raw and uses raw_co_prw.

Then from raw_co_prw to paio_submit_co:

thread_pool_submit_co(pool, aio_worker, acb)

aio_worker calls handle_aiocb_rw, which continues into handle_aiocb_rw_linear,

finally reaching the end of the journey.

On the setup of read and write requests

In the guest:

The header of a disk write is placed in the first out_iovec, and the header of a disk read is placed in the last in_iovec.

When the vring fills up, the guest kicks qemu, and one round of pops drains everything.

4.guest区分读写request

1. virtio_queue_rq distinguishes by req->cmd_type and writes VIRTIO_BLK_T_IN, VIRTIO_BLK_T_OUT, VIRTIO_BLK_T_SCSI_CMD, VIRTIO_BLK_T_FLUSH, or VIRTIO_BLK_T_GET_ID into vbr->out_hdr.type. Each req has its own purpose; IN/OUT have the highest precedence and can override the others. Based on vbr->out_hdr.type, the read/write direction is inherited by num_out and num_in.

2. Walking the sgs, the read/write direction is inherited by desc[i].flags via VRING_DESC_F_WRITE.

When actual data is involved, read and write are mutually exclusive within a req, but even an all-write req may still include some non-data command reads.

So in qemu's virtio block code there are read and write elements; in theory the read and write counts in virtqueue_pop will not both be greater than 1, and gdb testing confirms this.


"KVM virtIO block source code analysis" is by OenHan.

Link: http://oenhan.com/kvm-virtio-block-src

6 thoughts on "KVM virtIO block source code analysis":

  1. Question: when is the virtio virtual bus in the guest OS registered? I know it is registered via bus_unregister(&virtio_bus);, but I do not understand in which flow virtio_bus gets registered. Thanks.

  2. Correction to the previous comment: it should be registered via bus_register(&virtio_bus), not bus_unregister(&virtio_bus);. I see the code registers the virtio device onto the virtio bus directly via dev->dev.bus = &virtio_bus, but I have not found in which flow the virtio virtual bus itself is registered. Does the virtio bus not need to be registered? Thanks.
