
Tuesday, August 9, 2011

Convert Windows-1251 (Cyrillic) to Unicode using Python

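Just 4 lines of code to convert a file's content from Windows-1251 (Cyrillic) to Unicode with Python: read the file decoded as cp1251, then write the resulting Unicode string back out encoded as UTF-8.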

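# Python 3: the built-in open() decodes and encodes for you.
# Hypothetical file names: replace them with your own paths.
filename = 'input_cp1251.txt'
output = 'output_utf8.txt'

with open(filename, 'r', encoding='cp1251') as f:
    u = f.read()   # now the contents have been decoded into a Unicode string
with open(output, 'w', encoding='utf-8') as out:
    out.write(u)   # and now the contents have been written out as UTF-8 (the with block closes and flushes the file)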
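For large files you may not want to read everything into memory at once. Here is a minimal sketch of the same conversion done in chunks. The helper name convert_file and its parameters are hypothetical, not part of the original post; it assumes Python 3 and uses only the standard library (shutil.copyfileobj).

```python
import shutil


def convert_file(src_path, dst_path, src_encoding="cp1251", dst_encoding="utf-8"):
    """Re-encode a text file piece by piece (hypothetical helper).

    The source bytes are decoded with src_encoding and the resulting text is
    encoded with dst_encoding as it is written, so the whole file never has
    to sit in memory at once.
    """
    # newline="" turns off newline translation, so line endings are copied unchanged.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        # Reads and writes the text in fixed-size chunks instead of all at once.
        shutil.copyfileobj(src, dst)
```

Usage: convert_file('input_cp1251.txt', 'output_utf8.txt'). Each Cyrillic letter takes one byte in Windows-1251 and two bytes in UTF-8, so expect the output file to be larger than the input.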
 

A multi-valued string regex pattern

Let’s write a regular expression to identify user inputs that consist of two values separated by either a comma or whitespace, such as "12.6, 3" or "12.6    3".

The conditions: each value must be either a single (a decimal number) or an integer; whitespace between the values should be handled; there is no need to check for spaces at the start or end of the string; and inputs such as "12.6" or "12.6," must not be accepted.

This should work (code by les patter):

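using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Two numbers (each with at most one digit after the decimal point),
        // separated by a comma or a whitespace character, then optional whitespace.
        // The pattern is not anchored, so IsMatch succeeds if it occurs anywhere in the input.
        string pattern = @"\d+(\.\d)?(\s|\,)\s*\d+(\.\d)?";
        string[] tests = {
            "12.6, 3",
            "12.6 3",
            "12.6",
            "12.6,",
            ", 3",
            " 4",
            "1, 2.3",
            "123, 456",
        };
        foreach (string test in tests)
            Console.WriteLine("{0}: {1}", Regex.IsMatch(test, pattern), test);
    }
}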
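Because Regex.IsMatch looks for the pattern anywhere in the input, strings with extra content such as "12.6, 3, 4" or "12.65, 3" (which matches on the substring "65, 3") are also accepted. Below is a rough Python port of the same pattern, written in Python rather than C# to keep one language for the added examples. It contrasts re.search, which behaves like IsMatch, with re.fullmatch, which requires the whole string to match. The helper names is_two_values and contains_two_values are hypothetical, and the stricter full-match behaviour is a suggested variant, not part of the original answer.

```python
import re

# Same pattern as the C# version above.
TWO_VALUES = re.compile(r"\d+(\.\d)?(\s|,)\s*\d+(\.\d)?")


def contains_two_values(text):
    """Like Regex.IsMatch: True if the pattern occurs anywhere in text."""
    return TWO_VALUES.search(text) is not None


def is_two_values(text):
    """Stricter variant: True only if the entire text matches the pattern."""
    return TWO_VALUES.fullmatch(text) is not None


if __name__ == "__main__":
    tests = ["12.6, 3", "12.6    3", "12.6", "12.6,", ", 3", " 4",
             "1, 2.3", "123, 456", "12.6, 3, 4", "12.65, 3"]
    for test in tests:
        print(contains_two_values(test), is_two_values(test), repr(test))
```

For the original eight test strings both functions agree; they differ only on inputs with extra content, where the full-match version rejects them.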