Ceph Source Code Analysis (3): Exploring the rados put Process | Ivan's Blog (ivanjobs.github.io)

2016 July 13

In an earlier source-code analysis post I covered the mapping from object to PG, where the key function is ceph_stable_mod. I did not cover the next layer, the mapping from PG to OSDs, which is the core of the CRUSH algorithm. It corresponds to the _pg_to_osds function in OSDMap.cc. The code is as follows:

```
int OSDMap::_pg_to_osds(const pg_pool_t& pool, pg_t pg,
                        vector<int> *osds, int *primary,
                        ps_t *ppps) const
{
  // map to osds[]
  ps_t pps = pool.raw_pg_to_pps(pg);  // placement ps
  unsigned size = pool.get_size();

  // what crush rule?
  int ruleno = crush->find_rule(pool.get_crush_ruleset(), pool.get_type(), size);
  if (ruleno >= 0)
    crush->do_rule(ruleno, pps, *osds, size, osd_weight);

  _remove_nonexistent_osds(pool, *osds);

  *primary = -1;
  for (unsigned i = 0; i < osds->size(); ++i) {
    if ((*osds)[i] != CRUSH_ITEM_NONE) {
      *primary = (*osds)[i];
      break;
    }
  }
  if (ppps)
    *ppps = pps;

  return osds->size();
}
```

The basic logic is easy to read off the code: find the matching crush rule, run do_rule, then walk the resulting OSDs and return the first one that is not CRUSH_ITEM_NONE as the primary. I originally wanted to print logs with ldout, but ldout depends on a cct being passed in, so I used cout directly. After recompiling the Ceph source, creating a new pool and uploading a new object, the log looks like this:

```
demo@ubuntu:~/ceph/src$ echo "test" > /tmp/test
demo@ubuntu:~/ceph/src$ ./rados -p test put test /tmp/test
IvanJobs: calling _pg_to_osds...
```
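The primary-selection rule at the end of _pg_to_osds can be looked at in isolation. The following is a minimal sketch, not Ceph code: pick_primary and kItemNone are hypothetical names introduced here, and the sentinel value 0x7fffffff is an assumption about how CRUSH_ITEM_NONE is defined in crush.h.

```cpp
#include <cassert>
#include <vector>

// Assumption: mirrors the CRUSH_ITEM_NONE sentinel ("no result") from crush.h.
constexpr int kItemNone = 0x7fffffff;

// Hypothetical helper: return the first entry that is not the "none"
// sentinel, or -1 when the list is empty or every slot is a hole.
int pick_primary(const std::vector<int>& osds) {
  for (int osd : osds) {
    if (osd != kItemNone) return osd;
  }
  return -1;
}

// Small usage example: a hole in the first slot is skipped.
inline void pick_primary_example() {
  assert(pick_primary({kItemNone, 2, 0}) == 2);
}
```

The point of the sketch is only the scan order: position in the list decides the primary, and holes are skipped rather than treated as errors.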
From this we can conclude that the PG-to-OSDs mapping is invoked when an object is uploaded. That raises another question: if two objects map to the same PG and the PG-to-OSDs mapping has already been computed once, is it skipped the second time? Let's upload a different value under the same key:

```
demo@ubuntu:~/ceph/src$ echo "test2" > /tmp/test2
demo@ubuntu:~/ceph/src$ ./rados -p test put test /tmp/test2
IvanJobs: calling _pg_to_osds...
```

It is still called once. At this point we need more information. I won't paste the cout code itself; here is the console output:

```
demo@ubuntu:~/ceph/src$ ./ceph osd pool create test 8
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
pool 'test' created
demo@ubuntu:~/ceph/src$ echo "test" >/tmp/test
demo@ubuntu:~/ceph/src$ ./rados -p test put test /tmp/test
IvanJobs: calling _pg_to_osds...
pg.m_pool:11
pg.m_seed:1088989877
pg.m_prefered:-1
1 2 0
pool.type:
pool.size:
pool.min_size:
pool.crush_ruleset:
pool.object_hash:
demo@ubuntu:~/ceph/src$ ./ceph osd tree
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
ID WEIGHT  TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 3.00000 root default
-2 3.00000     host ubuntu
 0 1.00000         osd.0        up  1.00000          1.00000
 1 1.00000         osd.1        up  1.00000          1.00000
 2 1.00000         osd.2        up  1.00000          1.00000
```

The above is for reference only. I had assumed that _pg_to_osds would be reached through pg_to_osds, but none of the call sites of pg_to_osds in the source looks like part of rados put. So we need to look at the other callers of _pg_to_osds, namely pg_to_raw_up and _pg_to_up_acting_osds, and add logging to find out which one is actually called. The answer is _pg_to_up_acting_osds.

_pg_to_up_acting_osds is called by pg_to_up_acting_osds, so the next question is who calls pg_to_up_acting_osds. A grep shows that it is used in many places. Which one is it? Without being familiar with the code, the only option is to add logging to each of them in turn. The logs show that _calc_target in osdc/Objecter.cc calls pg_to_up_acting_osds. So we keep tracing the callers of _calc_target. We know that Objecter is the lowest-level object on the client side, the instance used to communicate with the OSDs. My debugging indicated that _calc_target is called from the map_pg_creates method in mon/PGMonitor.cc.

While debugging I found that Ceph's default log levels produce a lot of output. To make debugging easier I set all log levels to 0 and wrote my own messages at level 0, so that only my own log output is visible.

Debugging shows that map_pg_creates is triggered when update_from_paxos is called, and update_from_paxos is part of the Paxos implementation. In other words, the rados put process involves Paxos. Specifically, the only place in the whole source tree where update_from_paxos is called is the PaxosService class; other classes inherit from PaxosService and override the method. From here I cannot trace any further up until I understand the Paxos algorithm. We can change direction, though, and look at what do_rule does.

do_rule comes from CrushWrapper:

```
void do_rule(int rule, int x, vector<int>& out, int maxout,
             const vector<__u32>& weight) const {
  Mutex::Locker l(mapper_lock);
  int rawout[maxout];
  int scratch[maxout * 3];
  int numrep = crush_do_rule(crush, rule, x, rawout, maxout,
                             &weight[0], weight.size(), scratch);
  if (numrep < 0)
    numrep = 0;
  out.resize(numrep);
  for (int i=0; i<numrep; i++)
    out[i] = rawout[i];
}
```

As you can see, do_rule actually calls crush_do_rule:

```
int crush_do_rule(const struct crush_map *map,
                  int ruleno, int x, int *result, int result_max,
                  const __u32 *weight, int weight_max,
                  int *scratch)
{
    int result_len;
    int *a = scratch;
    int *b = scratch + result_max;
    int *c = scratch + result_max*2;
    int recurse_to_leaf;
    int *w;
    int wsize = 0;
    int *o;
    int osize;
    int *tmp;
    struct crush_rule *rule;
    __u32 step;
    int i, j;
    int numrep;
    int out_size;
    /*
     * the original choose_total_tries value was off by one (it
     * counted "retries" and not "tries").  add one.
     */
    int choose_tries = map->choose_total_tries + 1;
    int choose_leaf_tries = 0;
    /*
     * the local tries values were counted as "retries", though,
     * and need no adjustment
     */
    int choose_local_retries = map->choose_local_tries;
    int choose_local_fallback_retries = map->choose_local_fallback_tries;

    int vary_r = map->chooseleaf_vary_r;

    if ((__u32)ruleno >= map->max_rules) {
        dprintk(" bad ruleno %d\n", ruleno);
        return 0;
    }

    rule = map->rules[ruleno];
    result_len = 0;
    w = a;
    o = b;

    for (step = 0; step < rule->len; step++) {
        int firstn = 0;
        struct crush_rule_step *curstep = &rule->steps[step];

        switch (curstep->op) {
        case CRUSH_RULE_TAKE:
            if ((curstep->arg1 >= 0 &&
                 curstep->arg1 < map->max_devices) ||
                (-1-curstep->arg1 >= 0 &&
                 -1-curstep->arg1 < map->max_buckets &&
                 map->buckets[-1-curstep->arg1])) {
                w[0] = curstep->arg1;
                wsize = 1;
            } else {
                dprintk(" bad take value %d\n", curstep->arg1);
            }
            break;

        case CRUSH_RULE_SET_CHOOSE_TRIES:
            if (curstep->arg1 > 0)
                choose_tries = curstep->arg1;
            break;

        case CRUSH_RULE_SET_CHOOSELEAF_TRIES:
            if (curstep->arg1 > 0)
                choose_leaf_tries = curstep->arg1;
            break;

        case CRUSH_RULE_SET_CHOOSE_LOCAL_TRIES:
            if (curstep->arg1 >= 0)
                choose_local_retries = curstep->arg1;
            break;

        case CRUSH_RULE_SET_CHOOSE_LOCAL_FALLBACK_TRIES:
            if (curstep->arg1 >= 0)
                choose_local_fallback_retries = curstep->arg1;
            break;

        case CRUSH_RULE_SET_CHOOSELEAF_VARY_R:
            if (curstep->arg1 >= 0)
                vary_r = curstep->arg1;
            break;

        case CRUSH_RULE_CHOOSELEAF_FIRSTN:
        case CRUSH_RULE_CHOOSE_FIRSTN:
            firstn = 1;
            /* fall through */
        case CRUSH_RULE_CHOOSELEAF_INDEP:
        case CRUSH_RULE_CHOOSE_INDEP:
            if (wsize == 0)
                break;

            recurse_to_leaf =
                curstep->op ==
                 CRUSH_RULE_CHOOSELEAF_FIRSTN ||
                curstep->op ==
                CRUSH_RULE_CHOOSELEAF_INDEP;

            /* reset output */
            osize = 0;

            for (i = 0; i < wsize; i++) {
                int bno;
                /*
                 * see CRUSH_N, CRUSH_N_MINUS macros.
                 * basically, numrep <= 0 means relative to
                 * the provided result_max
                 */
                numrep = curstep->arg1;
                if (numrep <= 0) {
                    numrep += result_max;
                    if (numrep <= 0)
                        continue;
                }
                j = 0;
                /* make sure bucket id is valid */
                bno = -1 - w[i];
                if (bno < 0 || bno >= map->max_buckets) {
                    // w[i] is probably CRUSH_ITEM_NONE
                    dprintk("  bad w[i] %d\n", w[i]);
                    continue;
                }
                if (firstn) {
                    int recurse_tries;
                    if (choose_leaf_tries)
                        recurse_tries =
                            choose_leaf_tries;
                    else if (map->chooseleaf_descend_once)
                        recurse_tries = 1;
                    else
                        recurse_tries = choose_tries;
                    osize += crush_choose_firstn(
                        map,
                        map->buckets[bno],
                        weight, weight_max,
                        x, numrep,
                        curstep->arg2,
                        o+osize, j,
                        result_max-osize,
                        choose_tries,
                        recurse_tries,
                        choose_local_retries,
                        choose_local_fallback_retries,
                        recurse_to_leaf,
                        vary_r,
                        c+osize,
                        0);
                } else {
                    out_size = ((numrep < (result_max-osize)) ?
                                numrep : (result_max-osize));
                    crush_choose_indep(
                        map,
                        map->buckets[bno],
                        weight, weight_max,
                        x, out_size, numrep,
                        curstep->arg2,
                        o+osize, j,
                        choose_tries,
                        choose_leaf_tries ?
                           choose_leaf_tries : 1,
                        recurse_to_leaf,
                        c+osize,
                        0);
                    osize += out_size;
                }
            }

            if (recurse_to_leaf)
                /* copy final _leaf_ values to output set */
                memcpy(o, c, osize*sizeof(*o));

            /* swap o and w arrays */
            tmp = o;
            o = w;
            w = tmp;
            wsize = osize;
            break;

        case CRUSH_RULE_EMIT:
            for (i = 0; i < wsize && result_len < result_max; i++) {
                result[result_len] = w[i];
                result_len++;
            }
            wsize = 0;
            break;

        default:
            dprintk(" unknown op %d at step %d\n",
                    curstep->op, step);
            break;
        }
    }
    return result_len;
}
```

The code is fairly long, so let's work through the logic step by step. The core idea is to look up the rule in the crush_map. A rule consists of multiple steps that are processed in order, so the key is to become familiar with these steps. The enum values for the step operations are defined in crush.h:

```
/* step op codes */
enum {
    CRUSH_RULE_NOOP = 0,
    CRUSH_RULE_TAKE = 1,          /* arg1 = value to start with */
    CRUSH_RULE_CHOOSE_FIRSTN = 2, /* arg1 = num items to pick */
                                  /* arg2 = type */
    CRUSH_RULE_CHOOSE_INDEP = 3,  /* same */
    CRUSH_RULE_EMIT = 4,          /* no args */
    CRUSH_RULE_CHOOSELEAF_FIRSTN = 6,
    CRUSH_RULE_CHOOSELEAF_INDEP = 7,

    CRUSH_RULE_SET_CHOOSE_TRIES = 8, /* override choose_total_tries */
    CRUSH_RULE_SET_CHOOSELEAF_TRIES = 9, /* override chooseleaf_descend_once */
    CRUSH_RULE_SET_CHOOSE_LOCAL_TRIES = 10,
    CRUSH_RULE_SET_CHOOSE_LOCAL_FALLBACK_TRIES = 11,
    CRUSH_RULE_SET_CHOOSELEAF_VARY_R = 12
};
```

With the step op values in hand, let's go through the switch branches of crush_do_rule one by one.

CRUSH_RULE_TAKE: takes a node, typically a bucket, as the starting point. It has a single argument, the bucket id (the check in the code also accepts a valid device id).

CRUSH_RULE_SET_CHOOSE_TRIES / CRUSH_RULE_SET_CHOOSELEAF_TRIES / CRUSH_RULE_SET_CHOOSE_LOCAL_TRIES / CRUSH_RULE_SET_CHOOSE_LOCAL_FALLBACK_TRIES / CRUSH_RULE_SET_CHOOSELEAF_VARY_R: these steps all set parameters that are used while the rule executes.

CRUSH_RULE_CHOOSELEAF_FIRSTN / CRUSH_RULE_CHOOSE_FIRSTN: these two steps only set firstn to 1 and then fall through into the shared body below.

CRUSH_RULE_CHOOSELEAF_INDEP / CRUSH_RULE_CHOOSE_INDEP: these case labels hold the core of the function. The shared body calls crush_choose_firstn when firstn is set and crush_choose_indep otherwise. What is the logic of these two functions? The plan is to add logging first and watch the core crush process: TBA

CRUSH_RULE_EMIT: what this step does is obvious: it collects the OSDs that were mapped.
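To make the take / choose / emit flow of the step loop more concrete, here is a toy model. It is only a sketch under loud assumptions: Op, Step, Tree, resolve_numrep, toy_choose and run_rule are all hypothetical names, the tree is a plain map rather than a crush_map, and toy_choose is a simple rotation that stands in for crush_choose_firstn. It does not implement CRUSH's hashing, weights, retries or type handling. What it does mirror is how numrep <= 0 is resolved relative to result_max, and how each choose step's output becomes the next step's working set.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, simplified step ops; the real op codes live in crush.h.
enum class Op { Take, Choose, Emit };

struct Step {
  Op op;
  int arg1;  // Take: start item; Choose: item count (<= 0 is relative)
};

// Toy hierarchy: bucket id (negative) -> child ids. Not a real crush_map.
using Tree = std::map<int, std::vector<int>>;

// numrep <= 0 is taken relative to result_max, as in crush_do_rule.
// The result may still be <= 0, in which case the caller skips the step.
int resolve_numrep(int arg1, int result_max) {
  assert(result_max > 0);
  int numrep = arg1;
  if (numrep <= 0) numrep += result_max;
  return numrep;
}

// Stand-in for crush_choose_firstn: pick up to numrep distinct children,
// starting at offset x % n. This rotation is NOT the CRUSH selection.
std::vector<int> toy_choose(const std::vector<int>& children, unsigned x,
                            int numrep) {
  std::vector<int> out;
  const int n = static_cast<int>(children.size());
  for (int i = 0; i < n && i < numrep; ++i) {
    const unsigned idx =
        (x + static_cast<unsigned>(i)) % static_cast<unsigned>(n);
    out.push_back(children[idx]);
  }
  return out;
}

// Walk the steps in order, keeping a working set w between steps.
std::vector<int> run_rule(const Tree& tree, const std::vector<Step>& rule,
                          unsigned x, int result_max) {
  std::vector<int> result;
  std::vector<int> w;  // working set: input of the current step
  for (const Step& step : rule) {
    switch (step.op) {
      case Op::Take:
        w = {step.arg1};
        break;
      case Op::Choose: {
        std::vector<int> o;  // output set of this step
        const int numrep = resolve_numrep(step.arg1, result_max);
        for (int item : w) {
          if (numrep <= 0) continue;
          auto it = tree.find(item);
          if (it == tree.end()) continue;  // not a bucket: skip it
          const int room = result_max - static_cast<int>(o.size());
          for (int child : toy_choose(it->second, x, std::min(numrep, room)))
            o.push_back(child);
        }
        w = std::move(o);  // the output becomes the next working set
        break;
      }
      case Op::Emit:
        for (int item : w) {
          if (static_cast<int>(result.size()) >= result_max) break;
          result.push_back(item);
        }
        w.clear();
        break;
    }
  }
  return result;
}
```

For example, with a hypothetical tree in which bucket -2 holds devices 0, 1 and 2, the rule Take(-2), Choose(0), Emit with result_max = 3 resolves numrep to 3 and emits all three devices. The order in which they come out depends only on the toy rotation and says nothing about what CRUSH would choose.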
