Work Is Only a Part of Life: python struct

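The Python standard library provides a struct module, used mainly to read data from binary files and interpret it according to C data types.
It works by building a format string that describes the layout of the data to be read.
The table below lists the symbols that can appear in a format string.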
http://docs.python.org/library/struct.html

Format	C Type	Python type	Standard size	Notes
`x`	pad byte	no value
`c`	`char`	string of length 1	1
`b`	`signed char`	integer	1	(3)
`B`	`unsigned char`	integer	1	(3)
`?`	`_Bool`	bool	1	(1)
`h`	`short`	integer	2	(3)
`H`	`unsigned short`	integer	2	(3)
`i`	`int`	integer	4	(3)
`I`	`unsigned int`	integer	4	(3)
`l`	`long`	integer	4	(3)
`L`	`unsigned long`	integer	4	(3)
`q`	`long long`	integer	8	(2), (3)
`Q`	`unsigned long long`	integer	8	(2), (3)
`f`	`float`	float	4	(4)
`d`	`double`	float	8	(4)
`s`	`char[]`	string
`p`	`char[]`	string
`P`	`void *`	integer		(5), (3)

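For example, suppose a binary file contains four 1-byte values followed by one 4-byte unsigned integer. The format string can be written as "BBBBI". Four consecutive B's can also be written as 4B, so the same format string can be shortened to "4BI".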

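P.S. Whitespace between symbols in a format string is ignored (a repeat count and its symbol must stay together, though).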
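
As a quick check of the two points above, the following sketch packs the same five values with three spellings of the format string and compares the results. It assumes Python 3, where struct.pack returns a bytes object; the variable names are only illustrative.

```python
import struct

values = (1, 2, 3, 4, 5)

# Three spellings of the same layout: four unsigned chars, one unsigned int.
long_form = struct.pack("BBBBI", *values)
short_form = struct.pack("4BI", *values)     # repeat count instead of BBBB
spaced_form = struct.pack("4B I", *values)   # whitespace between symbols is ignored

print(long_form == short_form == spaced_form)  # True
```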

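In addition, data stored in a file is subject to byte ordering, so the first character of the format string should state the byte order of the data being read. See the table below.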

Character	Byte order	Size	Alignment
`@`	native	native	native
`=`	native	standard	none
`<`	little-endian	standard	none
`>`	big-endian	standard	none
`!`	network (= big-endian)	standard	none

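If the first character is omitted (as in the example above), the default is native byte order, size, and alignment. "4BI" is the same as "@4BI".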
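
To make the byte-order prefixes concrete, here is a small sketch that packs the same unsigned int with each explicit prefix. It assumes Python 3 (bytes output); the last comparison uses sys.byteorder to show which of the two layouts the native order matches on the machine running it.

```python
import struct
import sys

value = 5

little = struct.pack("<I", value)    # least significant byte first
big = struct.pack(">I", value)       # most significant byte first
network = struct.pack("!I", value)   # network order, same layout as big-endian

print(little)          # b'\x05\x00\x00\x00'
print(big)             # b'\x00\x00\x00\x05'
print(network == big)  # True

# "=" means native byte order with standard sizes, so it matches
# whichever layout this machine uses.
native = struct.pack("=I", value)
matches_host = native == (little if sys.byteorder == "little" else big)
print(matches_host)    # True
```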

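Once you have a format string, you can pass it to struct.calcsize(fmt) to get the size, in bytes, of the data it describes. In this example, struct.calcsize("@4BI") returns 8 on common platforms (four 1-byte values plus one 4-byte int).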
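
The Size and Alignment columns of the byte-order table also affect what calcsize reports. The sketch below, assuming Python 3, compares a standard-size format with its native counterpart. The native result depends on the platform's C compiler, so no specific value is claimed for it here.

```python
import struct

# Standard sizes, no alignment: 1 byte + 4 bytes.
standard_size = struct.calcsize("<BI")
print(standard_size)   # 5

# Four 1-byte fields followed by one 4-byte field: 4 + 4 bytes.
packed_size = struct.calcsize("<4BI")
print(packed_size)     # 8

# Native mode may insert padding bytes before the int so that it is
# aligned the way the C compiler would align it (platform-dependent).
native_size = struct.calcsize("@BI")
print(native_size)
```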

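Once the format string is ready, you can work with the data using the struct.pack and struct.unpack functions.
struct.pack(fmt,v1,v2,...)
The first argument, fmt, is the format string; the remaining arguments are the values that correspond to it. The return value is the binary string produced by packing those values according to the format string.
struct.unpack(fmt,string)
The first argument, fmt, is the format string; the second is a binary string (in other words, a byte stream) whose length must equal struct.calcsize(fmt). The return value is a tuple of the individual values obtained by unpacking the binary string according to the format string.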

Example:
>>> import struct
>>> a = struct.pack("4BI",1,2,3,4,5)
>>> a
'\x01\x02\x03\x04\x05\x00\x00\x00'
>>> struct.unpack("4BI",a)
(1, 2, 3, 4, 5)
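
Since the module is mostly used for reading binary files, the same calls can be combined into a small reader. This is a minimal sketch assuming Python 3; HEADER_FORMAT, read_header, and the in-memory stream standing in for a real file are all hypothetical names made up for illustration, and the little-endian layout is an assumption.

```python
import io
import struct

# Hypothetical header layout: four 1-byte values and one 4-byte unsigned
# int, little-endian, with no padding.
HEADER_FORMAT = "<4BI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def read_header(stream):
    """Read exactly one header from a binary stream and unpack it."""
    raw = stream.read(HEADER_SIZE)
    if len(raw) != HEADER_SIZE:
        raise ValueError("stream ended before a full header was read")
    return struct.unpack(HEADER_FORMAT, raw)


# An in-memory stream stands in for a file opened with open(path, "rb").
fake_file = io.BytesIO(struct.pack(HEADER_FORMAT, 1, 2, 3, 4, 5) + b"payload")

header = read_header(fake_file)
rest = fake_file.read()

print(header)  # (1, 2, 3, 4, 5)
print(rest)    # b'payload'
```

Using calcsize to decide how many bytes to read keeps the read length and the format string from drifting apart when the layout changes.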

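In addition, when s is used in a format string, the number in front of it is the size of the char[] rather than a repeat count.
For example, 16s denotes a char[16].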
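
The behavior of the s count can be seen in a short sketch, assuming Python 3, where the value passed for s must be a bytes object. Shorter values are padded with null bytes and longer ones are truncated to the declared size.

```python
import struct

# 16s reserves 16 bytes; a shorter value is padded with null bytes.
packed = struct.pack("<16s", b"hello")
print(len(packed))  # 16

# Unpacking returns all 16 bytes, so the padding is stripped by hand.
(raw_name,) = struct.unpack("<16s", packed)
name = raw_name.rstrip(b"\x00").decode("ascii")
print(name)  # hello

# A value longer than the declared size is truncated to fit.
truncated = struct.pack("<4s", b"hello")
print(truncated)  # b'hell'
```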

Tuesday, October 05, 2010
