
Two Apparently Equal Python Unicode Utf8-encoded Strings Don't Match

>>> str1 = unicode('María','utf8')
>>> str2 = u'María'.encode('utf8')
>>> str1 == str2
False

How is that possible? Just in case it is relevant, I'm us

Solution 1:

You have a unicode string and a byte string. They are not the same thing.

str1 holds a Unicode value, u'María'. str2 holds a UTF-8 encoded series of bytes, 'Mar\xc3\xada'.
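To make the difference concrete, here is a minimal sketch that inspects both values. The names text and raw are illustrative, not from the question, and the non-ASCII character is written as an escape so the snippet does not depend on the source file encoding. It should behave the same on Python 2.7 and Python 3.

```python
text = u'Mar\xeda'           # Unicode text: \xed is the code point for í
raw = text.encode('utf8')    # byte string: í becomes the two bytes \xc3\xad

print(len(text))                 # 5 (code points)
print(len(raw))                  # 6 (bytes)
print(raw == b'Mar\xc3\xada')    # True
```

The lengths differ because one object counts characters and the other counts encoded bytes.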

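Python 2 does attempt an implicit conversion when comparing Unicode and byte string values: it decodes the byte string with the default codec, which is ASCII unless it has been changed on your system. The bytes \xc3\xad are not valid ASCII, so the decode fails, Python issues a UnicodeWarning, and the comparison returns False. You should not count on that conversion; decode or encode explicitly so that both sides are the same type.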
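A safer pattern is to convert explicitly before comparing. The sketch below assumes the bytes really are UTF-8; to_text is a hypothetical helper introduced here for illustration, not a standard library function. It is written to run on both Python 2.7 and Python 3.

```python
import warnings

text = u'Mar\xeda'        # Unicode text (\xed is í)
raw = b'Mar\xc3\xada'     # the same name as UTF-8 encoded bytes

def to_text(value, encoding='utf8'):
    # Hypothetical helper: decode byte strings, pass Unicode text through.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Mixed comparison: unequal on Python 2 (the implicit ASCII decode fails)
# and on Python 3 (no implicit conversion at all). Warnings are silenced
# here only to keep the output clean.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    mixed = (text == raw)

print(mixed)                            # False
print(to_text(raw) == to_text(text))    # True
print(text.encode('utf8') == raw)       # True
```

Decoding at the point where bytes enter the program, and comparing only Unicode values after that, avoids depending on the default codec.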

If you don't yet know what Unicode really is, or why UTF-8 is not the same thing, or want to know anything else about encodings, see:

Solution 2:

A string cannot be both "Unicode" and "UTF-8 encoded"; they are mutually exclusive. Hence, different strings.
