
Character Comparison Issue

 Keywords: ßß ss Special Character Comparison Issue Relational Operator Operators == 

Question:

I have a small coding problem. Here is some WinBatch code:
text = "Some text containing ßß"
text_old = text
text = strreplace(text,"ß","ss")
if text_old == text then
   message ("Text is","Equal")
end if
if text_old != text then
   message ("Text is","Unequal")
end if
Am I wrong to expect the second message box (Unequal) to appear? Unfortunately, the first message box, saying that text_old and text are equal, appears instead.

I'm using WinBatch 2008b.

Answer:

Interesting. I can reproduce this, but I do not think it is a WinBatch bug. We let Windows do the comparison for us, and this may be some kind of linguistic convention. For example, search Google for ßß: Google also treats ßß as ssss.
If "ßß" == "ssss" Then Message("ßß ssss", "Equal")
Note:
Google thinks they are the same.
Microsoft thinks they are the same.
WinBatch thinks they are the same.

From Wikipedia (http://en.wikipedia.org/wiki/%C3%9F):

In Switzerland and Liechtenstein ss usually replaces every ß. This is officially sanctioned by the German orthography rules, which state in §25 E2: In der Schweiz kann man immer „ss“ schreiben ("In Switzerland, one can always write 'ss'").

Apparently we all live in Switzerland now.

The article has much more on this, and the equivalence appears to be an official part of the Microsoft character collation (dictionary sort) sequence.

Note that WinBatch's string comparison operators do not compare strings by numeric character code. They use the character collation sequence, which, I believe, is based on the country (locale) setting of the machine.
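A small sketch can make the difference visible on a particular machine: it compares "ß" and "ss" with the == operator, then displays the numeric character codes that Char2Num reports. The variable names are illustrative, and the outcome of the == test is assumed to depend on the machine's locale settings. The character codes themselves do not: "ß" is 223 both in Windows code page 1252 and in Unicode (U+00DF), and "s" is 115.

```winbatch
; Illustrative demo; str1 and str2 are arbitrary names.
str1 = "ß"
str2 = "ss"

; The relational operator uses the collation sequence (result is locale dependent).
If str1 == str2 Then
   Message("Relational operator ==", "Equal under the collation sequence")
Else
   Message("Relational operator ==", "Not equal under the collation sequence")
EndIf

; Char2Num returns the numeric code of the FIRST character of its argument.
Message("Character codes", StrCat("ß = ", Char2Num(str1), @CRLF, "s = ", Char2Num(str2)))
```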

Our condolences.

User Reply:

If "ß" is equal to "ss", then StrIndex should return something other than 0, shouldn't it?

If you search for "ß", you'll get 22. If you search for "ss", you'll get 0. This behavior is correct, because the character codes for "ß" and "ss" are different, and that's what I'm interested in.

Whether "ss" is equal to "ß" depends on your point of view. I don't work on German spelling rules (though I'm sometimes confused by them), and they are subject to change. As far as I know, most of the time "ß" is not equal to "ss".

Can I change the behavior so that strings are compared by numeric character code? If not, I'll have to create a workaround, e.g. with a loop from 1 to StrLen.
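The loop workaround described here could look roughly like the following UDF. The function name udfStrExactEqual is hypothetical; it is a sketch that assumes Char2Num returns the numeric code of a single character extracted with StrSub, so two strings count as equal only when every code matches in order.

```winbatch
; Hypothetical UDF: @TRUE only if both strings hold the same character codes in the same order.
#DefineFunction udfStrExactEqual(str1, str2)
   len1 = StrLen(str1)
   ; Strings of different length can never be an exact match.
   If len1 != StrLen(str2) Then Return @FALSE
   ; Compare the numeric code of each character, position by position.
   For i = 1 To len1
      If Char2Num(StrSub(str1, i, 1)) != Char2Num(StrSub(str2, i, 1)) Then Return @FALSE
   Next
   Return @TRUE
#EndFunction

text = "Some text containing ßß"
text_old = text
text = StrReplace(text, "ß", "ss")
If udfStrExactEqual(text_old, text) Then
   Message("Text is", "Equal")
Else
   Message("Text is", "Unequal")
EndIf
```

With the original test strings, the length check alone already reports them as unequal, because replacing each "ß" with "ss" adds one character per replacement.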

Answer:

> If "ß" is equal to "ss", then StrIndex should return something other than 0, shouldn't it?

Whether it 'should' is debatable, but that is not how it is implemented. StrIndex compares each character directly by its numeric code. For your purposes, this is a good thing.
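The user's StrIndex observation can be reproduced with a short script. It assumes the StrIndex(main-string, sub-string, start, direction) form with @FWDSCAN, and relies on the numeric, character-by-character matching described in the answer; the variable names are illustrative. In "Some text containing ßß" the first "ß" is the 22nd character, and the string contains no literal "ss".

```winbatch
; StrIndex returns the position of sub-string in main-string, or 0 if not found.
text_old = "Some text containing ßß"
posSharpS = StrIndex(text_old, "ß", 1, @FWDSCAN)   ; position of the first ß
posSS = StrIndex(text_old, "ss", 1, @FWDSCAN)      ; no literal "ss" in text_old
Message("StrIndex results", StrCat("ß: ", posSharpS, @CRLF, "ss: ", posSS))
```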

> If you search for "ß", you'll get 22. If you search for "ss", you'll get 0. This behavior is correct, because the character codes for "ß" and "ss" are different, and that's what I'm interested in.
>
> Whether "ss" is equal to "ß" depends on your point of view. I don't work on German spelling rules (though I'm sometimes confused by them), and they are subject to change. As far as I know, most of the time "ß" is not equal to "ss".

The previously noted authorities appear not to entirely agree with you.

> Can I change the behavior so that strings are compared by numeric character code? If not, I'll have to create a workaround, e.g. with a loop from 1 to StrLen.

There are some tricks you might try, such as setting a Microsoft Windows code page (which Microsoft sometimes incorrectly calls an "ANSI code page") for WinBatch to use for conversion and then performing Unicode conversions. Alternatively, simply write yourself a UDF (user-defined function) using StrIndex.

Depending on your ANSI code page and how your text is converted to Unicode, such equivalences are quite possible. Multibyte-to-Unicode conversions, as well as the difference between Unicode precomposed code points and composite (decomposed) code point sequences, can lead to unexpected equivalences between different sequences of characters.

Now that WinBatch supports Unicode, it is possible to encounter these kinds of "issues".
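A StrIndex-based UDF might look like the sketch below. The name udfStrEqualByIndex is hypothetical, and the approach assumes, as stated in this thread, that StrIndex matches characters by numeric code. When both strings have the same length, the second can only be found at position 1 of the first if every character matches, so a result of 1 means the strings are identical.

```winbatch
; Hypothetical UDF: exact (numeric) string equality built on StrIndex.
#DefineFunction udfStrEqualByIndex(str1, str2)
   len1 = StrLen(str1)
   ; Different lengths: cannot be identical.
   If len1 != StrLen(str2) Then Return @FALSE
   ; Two empty strings are identical; avoid calling StrIndex with an empty sub-string.
   If len1 == 0 Then Return @TRUE
   ; Equal lengths: a match can only start at position 1, and only if all codes match.
   Return StrIndex(str1, str2, 1, @FWDSCAN) == 1
#EndFunction

If udfStrEqualByIndex("ßß", "ssss") Then
   Message("ßß ssss", "Equal")
Else
   Message("ßß ssss", "Unequal")
EndIf
```

Compared with the character loop, this version hands the per-character work to a single built-in call, at the cost of relying on StrIndex's matching rules.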


Article ID:   W18282
Filename:   Character Comparison Issue.txt
File Created: 2009:04:10:08:08:16
Last Updated: 2009:04:10:08:08:16