Tuesday, August 9, 2011

Convert Windows-1251 (Cyrillic) to Unicode using Python

Just a few lines of Python are enough to read a file encoded in Windows-1251 (Cyrillic) into a Unicode string and write it back out as UTF-8:

import codecs

# filename and output hold the paths of the source and destination files
with codecs.open(filename, 'r', 'cp1251') as f:
    u = f.read()   # the contents are now decoded into a Unicode string
with codecs.open(output, 'w', 'utf-8') as out:
    out.write(u)   # the contents are written as UTF-8; the file is closed on exit
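
The same idea can be wrapped in a small reusable function. This is only a sketch, not part of the original snippet: the function name convert_cp1251_to_utf8 and its parameters are made up for illustration, and it uses io.open (available in Python 2.6+ and identical to the built-in open in Python 3) instead of codecs.open. Passing newline='' is meant to keep line endings untouched, which matches how codecs.open behaves, since it opens files in binary mode.

```python
import io


def convert_cp1251_to_utf8(src_path, dst_path):
    """Re-encode a Windows-1251 text file as UTF-8 (hypothetical helper)."""
    # Decode the cp1251 bytes into a Unicode string.
    # newline='' disables newline translation so line endings are preserved.
    with io.open(src_path, 'r', encoding='cp1251', newline='') as src:
        text = src.read()

    # Encode the Unicode string as UTF-8 on the way out.
    with io.open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        dst.write(text)
```

Calling convert_cp1251_to_utf8('in.txt', 'out.txt') (both file names are placeholders) should leave in.txt untouched and create out.txt with the same text in UTF-8. If the input contains bytes that cp1251 does not define, the read raises UnicodeDecodeError rather than silently producing garbage.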