Mongo Wire Protocol
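A while back I spent some time studying the MySQL protocol and MySQL UDFs. I got the MySQL protocol figured out and even implemented it with Twisted. The result annoyed me, though: the MySQL protocol is not simple enough, and the side project I was tinkering with stalled halfway. Then I came across the MongoDB protocol. A quick look showed it to be clean and simple, which is exactly what I wanted, so I decided to study it.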
Introduction
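The Mongo protocol is a simple, socket-based, request-response protocol. It is used to exchange data between a Mongo client and a Mongo server.
A client connects to the server over a regular TCP/IP socket. By default there is no handshake between client and server.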
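To make the "regular TCP/IP socket" point concrete, here is a minimal sketch in Python. The helper name `connect` and the timeout are my own choices for illustration; port 27017 is the usual default port for mongod, which this post does not otherwise mention.

```python
import socket


def connect(host="localhost", port=27017, timeout=5.0):
    """Open a plain TCP connection to a Mongo server.

    Nothing is sent or received here: since there is no handshake,
    the first bytes on the wire are the client's first request message.
    """
    return socket.create_connection((host, port), timeout=timeout)


# Usage, assuming a server is listening locally:
# sock = connect()
# sock.sendall(message_bytes)  # message_bytes: a complete wire message
```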
Message Types and Formats
I only cover the few message types and formats that I am likely to use.
Standard Message Header
This is usually just called the message header. Every MongoDB protocol message contains one. Its structure is:
struct MsgHeader {
int32 messageLength; // total message size, including this
int32 requestID; // identifier for this message
int32 responseTo; // requestID from the original request
// (used in responses from db)
int32 opCode; // request type - see table below
}
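messageLength: the total length of the message in bytes, including the messageLength field itself.
requestID: an identifier for this message, generated by the client. The server copies this requestID into the responseTo field of its reply, so the client can match each reply to the request that caused it.
responseTo: as described above, in a reply this holds the same value as the requestID of the client's request.
opCode: the type of the message. The possible values are listed in the table below.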
| Opcode Name | opCode value | Comment |
|---|---|---|
| OP_REPLY | 1 | Reply to a client request. responseTo is set |
| OP_MSG | 1000 | generic msg command followed by a string |
| OP_UPDATE | 2001 | update document |
| OP_INSERT | 2002 | insert new document |
| RESERVED | 2003 | formerly used for OP_GET_BY_OID |
| OP_QUERY | 2004 | query a collection |
| OP_GET_MORE | 2005 | Get more data from a query. See Cursors |
| OP_DELETE | 2006 | Delete documents |
| OP_KILL_CURSORS | 2007 | Tell database client is done with a cursor |
Each field is four bytes, so MsgHeader is 16 bytes in total.
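The 16-byte header maps nicely onto Python's `struct` module. The sketch below is only an illustration: the helper names `pack_header` and `unpack_header` are mine, and it relies on the fact that integers in the wire protocol are little-endian, which the struct definition above does not spell out.

```python
import struct

# Four little-endian signed 32-bit integers: 16 bytes in total.
HEADER = struct.Struct("<iiii")

# A few opCode values from the table above.
OP_REPLY = 1
OP_QUERY = 2004
OP_GET_MORE = 2005


def pack_header(body_length, request_id, response_to, op_code):
    """Build a MsgHeader; messageLength counts the header plus the body."""
    return HEADER.pack(HEADER.size + body_length, request_id, response_to, op_code)


def unpack_header(data):
    """Return (messageLength, requestID, responseTo, opCode) from the first 16 bytes."""
    return HEADER.unpack_from(data, 0)


header = pack_header(body_length=0, request_id=1, response_to=0, op_code=OP_QUERY)
print(len(header), unpack_header(header))  # 16 (16, 1, 0, 2004)
```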
OP_QUERY
The OP_QUERY message is used to query the database for documents. Its format is:
struct OP_QUERY {
MsgHeader header; // standard message header
int32 flags; // bit vector of query options. See below for details.
cstring fullCollectionName; // "dbname.collectionname"
int32 numberToSkip; // number of documents to skip
int32 numberToReturn; // number of documents to return
// in the first OP_REPLY batch
document query; // query object. See below for details.
[ document returnFieldSelector; ] // Optional. Selector indicating the fields
// to return. See below for details.
}
flags: a bit vector with the following values:
| bit num | name | description |
|---|---|---|
| 0 | Reserved | Must be set to 0. |
| 1 | TailableCursor | Tailable means cursor is not closed when the last data is retrieved. Rather, the cursor marks the final object’s position. You can resume using the cursor later, from where it was located, if more data were received. Like any “latent cursor”, the cursor may become invalid at some point (CursorNotFound) – for example if the final object it references were deleted. |
| 2 | SlaveOk | Allow query of replica slave. Normally these return an error except for namespace “local”. |
| 3 | OplogReplay | Internal replication use only – driver should not set |
| 4 | NoCursorTimeout | The server normally times out idle cursors after an inactivity period (10 minutes) to prevent excess memory use. Set this option to prevent that. |
| 5 | AwaitData | Use with TailableCursor. If we are at the end of the data, block for a while rather than returning no data. After a timeout period, we do return as normal. |
| 6 | Exhaust | Stream the data down full blast in multiple “more” packages, on the assumption that the client will fully read all data queried. Faster when you are pulling a lot of data and know you want to pull it all down. Note: the client is not allowed to not read all the data unless it closes the connection. |
| 7 | Partial | Get partial results from a mongos if some shards are down (instead of throwing an error) |
| 8-31 | Reserved | Must be set to 0. |
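fullCollectionName: the full collection name, which is the database name and the collection name joined by a dot. For example, for a database named foo and a collection named bar, the full collection name is "foo.bar".
numberToSkip: the number of documents to skip at the start of the result set, much like OFFSET in SQL.
numberToReturn: limits the number of documents returned in the first OP_REPLY, much like LIMIT in SQL. If the query matches more than numberToReturn documents, the server creates a cursor and returns its cursorID.
query: a BSON document that describes the query. It contains one or more elements; the possible elements include $query, $orderby, $hint, $explain, and $snapshot.
returnFieldSelector: an optional BSON document that limits the fields included in the returned documents.
The database responds to an OP_QUERY with an OP_REPLY message.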
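Here is a rough sketch of how an OP_QUERY message could be assembled by hand in Python. The function `build_op_query` and the constant names are my own. To stay self-contained it takes the query as already-encoded BSON bytes and defaults to the empty document (five bytes: an int32 length of 5 followed by a zero terminator); a real client would use a BSON library instead. Integers are packed little-endian.

```python
import struct

OP_QUERY = 2004
EMPTY_BSON = b"\x05\x00\x00\x00\x00"  # empty BSON document, i.e. "match everything"

# A few flag bits from the flags table (bit n corresponds to 1 << n).
TAILABLE_CURSOR = 1 << 1
SLAVE_OK = 1 << 2
NO_CURSOR_TIMEOUT = 1 << 4


def build_op_query(request_id, full_collection_name, query=EMPTY_BSON,
                   flags=0, number_to_skip=0, number_to_return=0,
                   return_field_selector=None):
    """Return the bytes of a complete OP_QUERY message (header included)."""
    body = struct.pack("<i", flags)
    body += full_collection_name.encode("utf-8") + b"\x00"  # cstring: NUL-terminated
    body += struct.pack("<ii", number_to_skip, number_to_return)
    body += query
    if return_field_selector is not None:  # optional trailing document
        body += return_field_selector
    # responseTo is 0 because this is a request, not a reply.
    header = struct.pack("<iiii", 16 + len(body), request_id, 0, OP_QUERY)
    return header + body


msg = build_op_query(1, "foo.bar", flags=SLAVE_OK, number_to_return=10)
print(len(msg))  # 41 = 16 (header) + 4 (flags) + 8 (name) + 8 (skip, return) + 5 (query)
```

Taking the documents as raw bytes keeps the framing logic separate from BSON encoding, which is a topic of its own.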
OP_GET_MORE
The OP_GET_MORE message is used to fetch more documents from the cursor created by an earlier query. Its format is:
struct {
MsgHeader header; // standard message header
int32 ZERO; // 0 - reserved for future use
cstring fullCollectionName; // "dbname.collectionname"
int32 numberToReturn; // number of documents to return
int64 cursorID; // cursorID from the OP_REPLY
}
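fullCollectionName: the full collection name, which is the database name and the collection name joined by a dot. For example, for a database named foo and a collection named bar, the full collection name is "foo.bar".
numberToReturn: limits the number of documents returned in this reply, much like LIMIT in SQL. If more results remain after this batch, the reply carries a non-zero cursorID so the client can ask again.
cursorID: the cursorID taken from the OP_REPLY message that the database returned for the OP_QUERY.
The database responds to an OP_GET_MORE with an OP_REPLY message.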
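Building an OP_GET_MORE message is even shorter. Again, this is just a sketch: `build_op_get_more` is a name I made up, and the cursor id in the usage line is an arbitrary example value rather than something a real server returned. Integers are packed little-endian.

```python
import struct

OP_GET_MORE = 2005


def build_op_get_more(request_id, full_collection_name, number_to_return, cursor_id):
    """Return the bytes of a complete OP_GET_MORE message (header included)."""
    body = struct.pack("<i", 0)  # ZERO: reserved for future use
    body += full_collection_name.encode("utf-8") + b"\x00"  # cstring: NUL-terminated
    body += struct.pack("<iq", number_to_return, cursor_id)  # int32, then int64
    header = struct.pack("<iiii", 16 + len(body), request_id, 0, OP_GET_MORE)
    return header + body


msg = build_op_get_more(2, "foo.bar", number_to_return=10, cursor_id=123456789)
print(len(msg))  # 40 = 16 (header) + 4 (ZERO) + 8 (name) + 4 (return) + 8 (cursor id)
```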
OP_REPLY
The OP_REPLY message is what the database sends in response to an OP_QUERY or OP_GET_MORE message. Its format is:
struct {
MsgHeader header; // standard message header
int32 responseFlags; // bit vector - see details below
int64 cursorID; // cursor id if client needs to do get more's
int32 startingFrom; // where in the cursor this reply is starting
int32 numberReturned; // number of documents in the reply
document* documents; // documents
}
responseFlags: a bit vector with the following values:
| bit num | name | description |
|---|---|---|
| 0 | CursorNotFound | Set when getMore is called but the cursor id is not valid at the server. Returned with zero results. |
| 1 | QueryFailure | Set when query failed. Results consist of one document containing an “$err” field describing the failure. |
| 2 | ShardConfigStale | Drivers should ignore this. Only mongos will ever see this set, in which case, it needs to update config from the server. |
| 3 | AwaitCapable | Set when the server supports the AwaitData Query option. If it doesn’t, a client should sleep a little between getMore’s of a Tailable cursor. Mongod version 1.6 supports AwaitData and thus always sets AwaitCapable. |
| 4-31 | Reserved | Ignore |
cursorID: if the query results fit into a single OP_REPLY message, cursorID is 0. Otherwise it is the value the client passes in OP_GET_MORE to fetch more data.
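To round things off, here is a sketch of parsing an OP_REPLY in Python. The function `parse_op_reply` and the dictionary keys are my own naming. It does not decode BSON; it only splits the documents apart, relying on the fact that every BSON document starts with its own int32 length. The sample reply at the bottom is hand-built for illustration, not captured from a real server.

```python
import struct

OP_REPLY = 1

# responseFlags bits from the table above.
CURSOR_NOT_FOUND = 1 << 0
QUERY_FAILURE = 1 << 1


def parse_op_reply(data):
    """Split a complete OP_REPLY message into its fields and raw BSON documents."""
    message_length, request_id, response_to, op_code = struct.unpack_from("<iiii", data, 0)
    if op_code != OP_REPLY:
        raise ValueError("not an OP_REPLY message: opCode %d" % op_code)
    response_flags, cursor_id, starting_from, number_returned = struct.unpack_from(
        "<iqii", data, 16)
    documents = []
    offset = 36  # 16 bytes of header + 20 bytes of fixed reply fields
    for _ in range(number_returned):
        (doc_length,) = struct.unpack_from("<i", data, offset)
        documents.append(data[offset:offset + doc_length])
        offset += doc_length
    return {
        "response_to": response_to,
        "cursor_not_found": bool(response_flags & CURSOR_NOT_FOUND),
        "query_failure": bool(response_flags & QUERY_FAILURE),
        "cursor_id": cursor_id,
        "starting_from": starting_from,
        "documents": documents,
    }


# Hand-built sample: a reply to request 1 that carries two empty documents.
empty_doc = b"\x05\x00\x00\x00\x00"
body = struct.pack("<iqii", 0, 0, 0, 2) + empty_doc * 2
sample = struct.pack("<iiii", 16 + len(body), 7, 1, OP_REPLY) + body

reply = parse_op_reply(sample)
print(reply["cursor_id"], len(reply["documents"]))  # 0 2
```

A cursor_id of 0 in the result means everything fit in this one reply, so no OP_GET_MORE is needed.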