This article is reproduced from serverfault.com.

Windows XP encoding for non-English and English characters

Posted on 2020-06-07 10:34:07

The problem:

I am writing a text file containing Greek characters, using Python and the cp1253 encoding, but the program throws an error on some characters.

UnicodeEncodeError: 'charmap' codec can't encode character '\u2265' in position 389: character maps to <undefined>
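
For readers who want to reproduce the error without the original file, here is a minimal sketch. The helper name first_unencodable and the sample string are made up for illustration; the only assumption taken from the question is the cp1253 encoding and the character U+2265 named in the error message.

```python
def first_unencodable(text, encoding="cp1253"):
    """Return (position, character) of the first character that cannot
    be encoded, or None if the whole text encodes cleanly."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first offending character
        return exc.start, text[exc.start]
    return None


# Hypothetical sample: Greek letters are fine, U+2265 is not in cp1253.
sample = "\u03b1 \u2265 \u03b2"
print(first_unencodable(sample))
```

The position reported here corresponds to the "in position 389" part of the error message in the question.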

The question:

I believe that this problem can be solved if I use an encoding that includes both languages and is compatible with Windows XP. So my question is:

How does Windows XP handle bilingual text? Does it use "mixed" encodings?


Edit: I am returning after some months and realize how naive my question was. I am keeping it pretty much unchanged anyway, and I will answer it for new developers who have the same problem.

Questioner: Charalamm
Charalamm 2020-12-02 05:23:36

The problem, obviously, is that the text I was trying to write contains characters that cp1253 cannot represent, such as '\u2265' (the "greater than or equal to" sign) from the error message.

To solve the problem I replaced all the "bad" characters with normal ones. To find all these characters I used the following script:

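# name (output path) and whole_text (the text to write) are defined earlier
bad_chars = []
with open(name, 'w', encoding='cp1253') as res:
    for ch in whole_text:
        try:
            res.write(ch)
        except UnicodeEncodeError:
            # ch has no mapping in cp1253; collect it for later replacement
            bad_chars.append(ch)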
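
The script above opens the output file only to probe each character. A possible alternative, sketched here under the assumption that only the list of offending characters is needed, does the same check in memory with str.encode. The function name find_unencodable and the sample string are hypothetical.

```python
def find_unencodable(text, encoding="cp1253"):
    """Return the distinct characters of text that the encoding cannot
    represent, in order of first appearance."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


# Hypothetical sample containing U+2265, U+2126 (ohm sign) and U+2082.
sample = "\u0394T \u2265 5 \u2126, H\u2082O \u2265 1"
print(find_unencodable(sample))
```

Each character is reported only once, which keeps the list short when the same symbol appears many times in a long text.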

Then I created a dictionary that maps each bad character to a cp1253-compatible equivalent and applied the replacements to the text.

chars_to_change = {'∆':'Δ', 'Ω':'Ω', '₂':'2'}
for c1, c2 in chars_to_change.items():
    whole_text = whole_text.replace(c1, c2)
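
Some of these pairs look identical on screen, so it can help to spell them out as code points. The sketch below is an assumption about which characters were meant (U+2206 INCREMENT, U+2126 OHM SIGN, U+2082 SUBSCRIPT TWO), not a confirmed reading of the dictionary above. It uses str.translate so that all replacements happen in a single pass. The names REPLACEMENTS and make_cp1253_safe are made up for illustration.

```python
# Assumed code points for the look-alike characters.
REPLACEMENTS = str.maketrans({
    "\u2206": "\u0394",  # INCREMENT -> GREEK CAPITAL LETTER DELTA
    "\u2126": "\u03a9",  # OHM SIGN -> GREEK CAPITAL LETTER OMEGA
    "\u2082": "2",       # SUBSCRIPT TWO -> DIGIT TWO
})


def make_cp1253_safe(text):
    """Replace the known look-alike characters with cp1253 equivalents."""
    return text.translate(REPLACEMENTS)


cleaned = make_cp1253_safe("\u2206T = 5 \u2126, H\u2082O")
# Raises UnicodeEncodeError if any unmapped character is still present.
cleaned.encode("cp1253")
```

Any character not listed in the table is left unchanged, so the final encode call still fails loudly if the text contains something that was missed.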

Note that there may be better solutions, especially for the first part. Please edit if you find an improvement or an error.