Percpu variable in sv6
sv6是mit的pdos组在xv6基础上开发的一个research OS,加入了refcache、radixvm、rcu等一系列高级特性,能在multicore的平台上达到线性scalability。为了实现这些复杂的数据结构,作者Austin真是无所不用其极,用c++重写了xv6几乎所有的代码,并用到了c++很多晦涩的为人诟病的语法特性,让人看得是云里雾里。不过其中Percpu variable的实现还是非常精妙的,这也是本文要介绍的主题。
关于Percpu variable的好处意义这里就不多介绍了,Linux中Percpu variable的实现需要借助一堆的宏,这是因为c本来语法规则就非常简单,可以利用的空间不多,现在用了c++,我们就可以做到使用Percpu variable对用户完全透明,以下就来介绍sv6中Percpu variable的实现。
// This defines CPU 0's instance of this percpu variable and the
// static_percpu wrapper for accessing it. The gunk after the section
// makes .percpu a BSS-like section (allocated, writable, but with no
// data in the object). It also create an initializer function and
// records a pointer to it in the .percpuinit_array section.
#define DEFINE_PERCPU(type, name, ...) \
DEFINE_PERCPU_NOINIT(type, name, ##__VA_ARGS__); \
static void __##name##_init(size_t c) { new(&name[c]) type(); } \
static void (*__##name##_initp)(size_t) \
__attribute__((section(".percpuinit_array"),used)) = __##name##_init;
// This is like DEFINE_PERCPU, but doesn't call the class's
// constructor. This is for special cases like cpus that must be
// initialized remotely.
#define DEFINE_PERCPU_NOINIT(type, name, ...) \
type __##name##_key __attribute__((__section__(".percpu,\"aw\",@nobits#"))); \
static_percpu<type, &__##name##_key, ##__VA_ARGS__> name
#define DECLARE_PERCPU(type, name, ...) \
extern type __##name##_key; \
extern static_percpu<type, &__##name##_key, ##__VA_ARGS__> name
在DEFINE_PERCPU_NOINIT中首先定义了类型为type的__##name##_key变量,并将其放到了.percpu这个section当中,随后又实例化了一个static_percpu的模板类,将type和__##name##_key变量的指针传了进去。我们知道c++中类的实例变量是需要调用构造函数初始化的(不像c中的memset一遍就可以),为了在系统启动时能及时初始化这些Percpu variable,我们还必须一一记录下这些Percpu variable的初始化函数,这一点是通过DEFINE_PERCPU宏下半部分来完成的,在__##name##_init函数中用&name[c]来取得第c个core所对应的Percpu variable的地址,并用placement new来完成初始化,最后将这个__##name##_init函数的地址也放到一个叫做.percpuinit_array的特殊section中。
接下来我们再来看看static_percpu这个模板类的实现,这也是sv6中Percpu variable实现的精妙所在。
// A per-CPU variable of type T at location 'key' in CPU 0's per-CPU
// region. For debugging, CM specifies how interruptable the current
// context can be when accessing this per-CPU variable. It defaults
// to requiring the preemption be disabled to prevent the accessing
// thread from migrating, but for per-CPU variables that may be
// accessed by interrupt handlers, it should be set to NO_INT.
template<class T, T *key, critical_mask CM = NO_SCHED>
struct static_percpu
{
constexpr static_percpu() = default;
static_percpu(const static_percpu &o) = delete;
static_percpu(static_percpu &&o) = delete;
static_percpu &operator=(const static_percpu &o) = delete;
static_percpu &operator=(static_percpu &&o) = delete;
T* get_unchecked() const
{
uintptr_t val;
// The per-CPU memory offset is stored at %gs:32.
// XXX Having to subtract __percpu_start makes this several
// instructions longer than strictly necessary. Alternatively, we
// could locate .percpu at address 0 and use the key as a direct
// offset.
__asm("add %%gs:32, %0" : "=r" (val) : "0" ((char*)key - __percpu_start));
return (T*)val;
}
T* get() const
{
#if DEBUG
assert(check_critical(CM));
#endif
return get_unchecked();
}
T* operator->() const
{
return get();
}
T& operator*() const
{
return *get();
}
T& operator[](int id) const
{
assert(percpu_offsets[id]);
return *(T*)((char*)percpu_offsets[id] + ((char*)key - __percpu_start));
}
};
可以看到这个模板类中重载了operator ->以及operator *和operator [],这样的话,再写name->field的时候,实际上调用的就是get_unchecked()函数,get_unchecked()函数通过%%gs段寄存器访问这个core的struct cpu结构体,根据偏移取得这个core的Percpu variable数据区的地址,再加上core 0上要访问的变量相当于Percpu variable数据区首地址的偏移,就得到了要访问变量的实际地址。同样的如果要访问第n个core的Percpu variable只需要这么写,name[n].field即可,做到了使用Percpu variable对用户的完全透明。
关于Percpu variable的初始化,在mpboot函数中有这么一段内容,相信很容易就能看懂。
// Call per-CPU static initializers. This is the per-CPU equivalent
// of the init_array calls in cmain.
extern void (*__percpuinit_array_start[])(size_t);
extern void (*__percpuinit_array_end[])(size_t);
for (size_t i = 0; i < __percpuinit_array_end - __percpuinit_array_start; i++)
(*__percpuinit_array_start[i])(bcpuid);
blog comments powered by Disqus