2009-10-19

Pythonでバイトコードを書き換える

python

Pythonでバイトコードを書き換える方法。co_codeがreadonlyなのでcodeオブジェクトをわざわざ作り直している。
以下の例では乗算を累乗に変更している。

import new
import opcode

def hoge(n):
    return 2 * n

hoge(16) #=>32

piyo = hoge.func_code.co_code.replace(chr(opcode.opmap['BINARY_MULTIPLY']), chr(opcode.opmap['BINARY_POWER']))

code = new.code(
    hoge.func_code.co_argcount,
    hoge.func_code.co_nlocals,
    hoge.func_code.co_stacksize,
    hoge.func_code.co_flags,
    piyo,
    hoge.func_code.co_consts,
    hoge.func_code.co_names,
    hoge.func_code.co_varnames,
    hoge.func_code.co_filename,
    hoge.func_code.co_name,
    hoge.func_code.co_firstlineno,
    hoge.func_code.co_lnotab
)
func = new.function(
    code,
    hoge.func_globals,
    hoge.func_name,
    hoge.func_defaults,
    hoge.func_closure
)

func(16) #=>65536

newの代わりにtypesを使ってもよい。その辺りはお好みで。

import new
import types
new.code is types.CodeType #=>True
new.function is types.FunctionType #=>True

2009-10-15

htmlentitydefs

python tips

こんなモジュールがあるのか。

>>> import htmlentitydefs
>>> htmlentitydefs.name2codepoint['copy']
169
>>> htmlentitydefs.codepoint2name[ord('&')]
'amp'

2009-10-12

turtleモジュールでturtle graphics

python

面白い。

2009-10-10

PythonでBrainfuckコンパイラ

python

PythonでBrainfuckコンパイラを書いた*1。MS-DOS上で実行可能な.COMファイルを出力する。32ビットWindows上で動くはず。
helloworld*2がコンパイル/実行出来ることは確認した。
最適化など、改良の余地はまだあるが、今のところこれ以上改良するつもりは無い。

使い方

以下のように標準入力にソースコードを、標準出力に出力ファイルを渡せば良い。

C:\>python brainfuck.py < helloworld.bf > helloworld.com

動作画面

ソースコード

#!/usr/bin/env python
# coding: utf-8
import sys
import os
import re
import struct
try:
    import msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
except ImportError:
    pass

class Brainfuck(object):
    bf2x86 = \
        {
            '>': (
                    '\x46', # inc si
                 ),
            '<': (
                    '\x4e', # dec si
                 ),
            '+': (
                    '\xfe\x02', # inc byte ptr [bp+si]
                 ),
            '-': (
                    '\xfe\x0a', # dec byte ptr [bp+si]
                 ),
            '.': (
                    '\xb4\x02', # mov ah, 2
                    '\x8a\x12', # mov dl, [bp+si]
                    '\xcd\x21', # int 21h
                 ),
            ',': (
                    '\xb4\x08', # mov ah, 8
                    '\xcd\x21', # int 21h
                    '\x88\x02', # mov [bp+si], al
                 ),
            '[': (
                    '\x80\x3a\x00', # cmp byte ptr [bp+si], 0
                    '\x74\x00',     # jz xxx
                 ),
            ']': (
                    '\x80\x3a\x00', # cmp byte ptr [bp+si], 0
                    '\x75\x00',     # jnz xxx
                 ),
        }
    object_header = ''.join(
        (
            '\xbd\x00\x00', # mov bp, memory
            '\x31\xf6',     # xor si, si
            # メモリをゼロに初期化
            '\xfc',         # cld
            '\x89\xef',     # mov di, bp
            '\x30\xc0',     # xor al, al
            '\xb9\x30\x75', # mov cx, 30000
            '\xf3\xaa',     # rep stosb
        )
    )
    object_footer = ''.join(
        (
            '\xb8\x00\x4c', # mov ax, 4c00h
            '\xcd\x21',     # int 21h
        )
    )
    
    @classmethod
    def compile(cls, input, output):
        bf2x86 = dict(map(lambda (x, y): (x, ''.join(y)), tuple(cls.bf2x86.iteritems())))
        
        code = re.findall(r'[><+\-.,\]\[]', input.read())
        stack = []
        jmp_map = {}
        offsets = [0]
        
        for i, j in enumerate(code):
            if j == '[':
                stack.append(i)
            elif j == ']':
                t = stack.pop()
                jmp_map[t] = i + 1
                jmp_map[i] = t
            offsets.append(offsets[-1]+len(bf2x86[j]))
        
        object_header = list(cls.object_header)
        object_header[1:3] = cls.to_bin(offsets[-1] + 0x0100 + len(object_header) + len(cls.object_footer), 2)
        object_header = ''.join(object_header)
        output.write(object_header)
        
        for i, j in enumerate(code):
            x86 = bf2x86[j]
            if j in ('[', ']'):
                jmp = cls.to_bin(offsets[jmp_map[i]] - offsets[i], 1)
                x86 = list(x86)
                x86[4] = jmp
                x86 = ''.join(x86)
            output.write(x86)
        
        output.write(cls.object_footer)
        
    @classmethod
    def to_bin(cls, number, size):
        return struct.pack('<i', number)[:size]

def main():
    Brainfuck.compile(sys.stdin, sys.stdout)

if __name__ == '__main__':
    main()

メモ/解説

import msvcrtの部分はWindowsで標準出力にバイナリを出力するのに必要。これがないと正しく動かない。
.COMのファイルフォーマットはメモリイメージそのままという実に簡単なものである。展開される先のアドレスが0からではなく100hからなのに注意する事。
機械語をわざわざ分割し、tupleにして後で''.joinしているのは可読性のため。こうすると機械語に相当するアセンブリ言語を横に記述する事が出来る。
ジャンプが1バイトに収まらない場合は正常にジャンプ出来ない。これは簡略化のためである。

*1:実は二週間ほど前に完成させていたのだが、このブログに投稿するのを忘れていた。

*2:+++++++++[>++++++++>+++++++++++>+++++<<<-]>.>++.+++++++..+++.>-.------------.<++++++++.--------.+++.------.--------.>+.

2009-10-06

str.endswithには配列も指定できる

python

Pythonのstr.endswithには配列も指定できることを知った。

path.endswith(('/','\\'))

2009-10-06

様々な書き方

python

変数pathが、使用しているOSのファイルパスの区切り文字で終わっていなければ区切り文字を終端に付加するプログラム。

import os
# 1
if path.endswith(os.path.sep):
    path += os.path.sep
# 2
path += os.path.sep if path.endswith(os.path.sep) else ''
# 3
path += (path.endswith(os.path.sep) and os.path.sep) or ''

2009-10-05

new

python

Python 2.6.3 (r263rc1:75186, Oct  2 2009, 20:40:30) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> class Hoge(object):
...   def __new__(self):
...     return 0
...
>>> Hoge()
0