Two Apparently Equal Python Unicode Utf8-encoded Strings Don't Match

August 20, 2024

>>> str1 = unicode('María','utf8')
>>> str2 = u'María'.encode('utf8')
>>> str1 == str2
False

How is that possible? Just in case it is relevant, I'm us

Solution 1:

You have a unicode string and a byte string. They are not the same thing.

str1 holds a Unicode value, María. str2 holds a UTF-8 encoded series of bytes, 'Mar\xc3\xada'.
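As a rough illustration of that difference, the sketch below builds the text value and its UTF-8 encoding side by side. The variable names are made up for the example, and the escape \xed stands in for í so the snippet does not depend on the source file's encoding.

```python
# Illustrative names only; none of these come from the original question.
text = u'Mar\xeda'               # Unicode text; \xed is the code point for í
encoded = text.encode('utf-8')   # the same text serialized as UTF-8 bytes

text_length = len(text)          # counts code points
byte_length = len(encoded)       # counts bytes; í takes two bytes in UTF-8

# The encoded form is the byte sequence quoted in the answer.
bytes_match = (encoded == b'Mar\xc3\xada')
```

The two lengths differ because í is a single code point but occupies two bytes once encoded.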

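Python 2 does attempt an implicit conversion when comparing Unicode and byte string values, but you should not count on it. The conversion decodes the byte string with Python's default codec, which is normally ASCII. The bytes \xc3\xad are not valid ASCII, so the decode fails and the comparison returns False.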
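One way to avoid relying on the implicit conversion is to decode byte strings explicitly before comparing. This is a minimal sketch, assuming the bytes are known to be UTF-8. The helper name to_text is hypothetical and not part of any library.

```python
def to_text(value, encoding='utf-8'):
    """Return value as Unicode text, decoding it first if it is a byte string.

    Hypothetical helper for illustration; the default encoding is an assumption.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


str1 = u'Mar\xeda'              # Unicode text (\xed is í)
str2 = str1.encode('utf-8')     # UTF-8 encoded bytes

# Compare like with like: both sides are Unicode text after normalizing.
equal_after_decoding = to_text(str1) == to_text(str2)
```

Decoding once at the boundary where bytes enter the program, and working with Unicode text everywhere else, tends to keep this kind of mismatch from appearing in the first place.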

If you don't yet know what Unicode really is, or why UTF-8 is not the same thing, or want to know anything else about encodings, see:
