Apache OpenOffice (AOO) Bugzilla – Issue 97808
sort with advanced option case sensitive doesn't work
Last modified: 2016-01-14 18:02:55 UTC
Steps to reproduce the bug:
1. Clear the flag in Tools -> Cell Contents -> AutoInput, if it is set.
2. Type the string Io in A1, Tu in A2, io in A3 and tu in A4.
3. Select the range from A1 to A4.
4. Select Data -> Sort. Leave the Ascending option selected and open the Options tab. Check "Case sensitivity" and uncheck "Range contains column labels".

The expected result (according to the manual) is:
Io
io
Tu
tu

The result I get is the same whether the flag is set or not. Also, the manual should be amended to say that case sensitivity only affects strings that differ only by case.
Can confirm it on OOo 3.1m11 and DEV300 m50
Seems to be solved with latest build AOO 4.1.1 Result is now: io Io tu Tu
oooforum: this is actually the wrong result (and I confirm it happens on 4.1.1 as you describe). Correct: Io io Tu tu In 4.1.1: io Io tu Tu
*** Issue 121428 has been marked as a duplicate of this issue. ***
In the ScTable::Sort method, the behavior of the default collator can be described with the following code:

    Sub DefaultCollator
      Dim locale as new com.sun.star.lang.Locale
      locale.Language = "en"
      locale.Country = "US"
      op = com.sun.star.i18n.CollatorOptions.CollatorOptions_IGNORE_CASE
      op = 0 ' case sensitive
      c = CreateUnoService("com.sun.star.i18n.Collator")
      n = c.loadDefaultCollator(locale, op)
      if n = 0 then
        s = "a, A: " & CStr(c.compareString("a", "A")) & chr(10)
        s = s & "A, a: " & CStr(c.compareString("A", "a")) & chr(10)
        s = s & "a, b: " & CStr(c.compareString("a", "b")) & chr(10)
        s = s & "b, a: " & CStr(c.compareString("b", "a")) & chr(10)
        msgbox s
        '1 if the first string is greater than the second string
        '0 if the first string is equal to the second string
        '-1 if the first string is less than the second string
      end if
    End Sub

With case sensitive: cmp(a, A): -1, cmp(A, a): 1, cmp(a, b): -1, cmp(b, a): 1

In the QuickSort method, when there are only two cells they are swapped if cmp() > 0. The cells for cmp(b, a) have to be swapped, but the cells for cmp(A, a) should not be. It is hard to sort with these results from the collator in case sensitive mode.
In the case of Python's cmp function, the results are simple: cmp("a", "A"): 1, cmp("A", "a"): -1, cmp("a", "b"): -1, cmp("b", "a"): 1
Cells with cmp() > 0 have to be swapped in these cases.
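The two sets of comparison results above can be reproduced with a small Python 3 sketch. Python 3 no longer has a built-in cmp(), so codepoint_cmp below recreates it. lower_first_cmp is only an assumed stand-in for the collator results reported in the previous comment (ignore case first, then lower case before upper case); it is not the com.sun.star.i18n.Collator implementation, and both function names are made up for this illustration.

```python
from functools import cmp_to_key


def codepoint_cmp(a, b):
    """Python 2 style cmp(): compare by code point, return -1, 0 or 1."""
    return (a > b) - (a < b)


def lower_first_cmp(a, b):
    """Hypothetical stand-in for the reported collator results:
    compare ignoring case first, then put lower case before upper case."""
    primary = codepoint_cmp(a.lower(), b.lower())
    if primary != 0:
        return primary
    # swapcase() flips the case, so the lower-case original gets the
    # smaller code points and therefore sorts first.
    return codepoint_cmp(a.swapcase(), b.swapcase())


pairs = [("a", "A"), ("A", "a"), ("a", "b"), ("b", "a")]
codepoint_results = [codepoint_cmp(x, y) for x, y in pairs]
collator_results = [lower_first_cmp(x, y) for x, y in pairs]

data = ["Io", "Tu", "io", "tu"]
by_codepoint = sorted(data, key=cmp_to_key(codepoint_cmp))
by_lower_first = sorted(data, key=cmp_to_key(lower_first_cmp))

print(codepoint_results)  # [1, -1, -1, 1]
print(collator_results)   # [-1, 1, -1, 1]
print(by_codepoint)       # ['Io', 'Tu', 'io', 'tu']
print(by_lower_first)     # ['io', 'Io', 'tu', 'Tu']
```

Under these assumptions the lower-first comparator gives the io Io tu Tu order seen in 4.1.1, while a plain code point comparison groups all upper-case strings before all lower-case ones. Neither produces the expected Io io Tu tu.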
Created attachment 85065 [details]
Patch to set upper first if ignore case is not specified

It seems the default case order for tertiary differences is lower first. The patch sets it to upper first for tertiary differences. Collator::setStrength has been deprecated since ICU 2.6; Collator::setAttribute can be used instead.
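The effect of an "upper first" case order can be illustrated with a rough Python sketch. This is not the patch and it does not use ICU. The helper name upper_first_key is hypothetical, and the key only approximates a tertiary-level tie-break for plain ASCII strings: strings are ordered ignoring case first, and only strings that are equal ignoring case are ordered by their original code points, which puts upper case first.

```python
def upper_first_key(s):
    """Hypothetical sort key: case-insensitive order first, then
    original code points as tie-break (ASCII upper case sorts first)."""
    return (s.lower(), s)


calc_cells = ["Io", "Tu", "io", "tu"]
writer_cells = ["b", "B", "a", "A"]

calc_sorted = sorted(calc_cells, key=upper_first_key)
writer_sorted = sorted(writer_cells, key=upper_first_key)

print(calc_sorted)    # ['Io', 'io', 'Tu', 'tu']
print(writer_sorted)  # ['A', 'a', 'B', 'b']
```

With this ordering the Calc example from the original report comes out as Io io Tu tu, which is the result the reporter expected.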
Thanks hanya Status changed to PATCH Maybe targeted to 4.1.2?
OpenOffice 4.1.2 is ready (Release Candidate is being voted upon right now, see dev list) so we won't be able to incorporate this in 4.1.2. But thanks Hanya, and the patch should be reviewed and/or committed to trunk soon.
Add me.
4.2.0 build Rev. 1722749 linux-32. The patch works correctly for a vertical sort A1:A4, but it doesn't seem to fix the problem for a horizontal sort -- A1:D1 -- assuming I'm doing the selection correctly. I'll be happy to commit this one, however. Would like additional feedback.
(In reply to Kay from comment #11)
> The patch works correctly for a vertical sort A1:A4, but it doesn't seem to
> fix the problem for a horizontal sort -- A1:D1 -- assuming I'm doing the
> selection correctly.

It works for both column and row sorting in my environment. Could you try again, or attach an example that gives you trouble? Since the attached patch affects every function that uses the collator service to compare strings, such as sorting throughout the office suite, the results should be identical everywhere in the office suite.
I found a problem with sorting when using the patch attached in Comment 7. In a Writer table, sorting with and without the Match case option gave me the same result.
Comment on attachment 85065 [details] Patch to set upper first if ignore case is not specified See Comment 13 for the reason.
I tried to analyze the sorting behavior in Writer's tables.

Data: b B a A

On 4.1.2, without Match case: a A b B or A a B b
On 4.1.2, with Match case: a A b B (stable but wrong result)
Patched, without Match case: a A b B or A a B b
Patched, with Match case: A a B b (stable, correct order)

If you sort again without the Match case option, you can observe the order changing. The observation described in Comment 13 was wrong. Sorting in Writer's tables is not stable, but the result is correct. The attached patch is not obsolete, but some people might misjudge the result at first glance.
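Why a sort that ignores case can legitimately give either a A b B or A a B b can be shown with a short Python sketch. It assumes a comparison that treats a and A as equal, so their relative order is left to the sorting algorithm. Python's sorted() is stable and therefore keeps the input order of equal items, whereas the Writer sort is described above as not stable, so there the order may change between runs. The function name sort_ignoring_case is hypothetical.

```python
def sort_ignoring_case(cells):
    """Sort with a key under which 'a' and 'A' compare as equal."""
    return sorted(cells, key=str.lower)


# Same letters, different starting order.
first = sort_ignoring_case(["b", "B", "a", "A"])
second = sort_ignoring_case(["B", "b", "A", "a"])

print(first)   # ['a', 'A', 'b', 'B']
print(second)  # ['A', 'a', 'B', 'b']
```

Both outputs are correct for a sort that ignores case, because the letters are in the right order and only the tie between the cases is resolved differently.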
Comment on attachment 85065 [details] Patch to set upper first if ignore case is not specified Not obsolete as per Comment 15.
Created attachment 85249 [details]
Simple writer document with 4 x 4 table

Sorts as expected either across first row or down first column. Leaving Case sensitive unchecked sorts with caps before lower case as expected. Using case sensitive sorts with lower case first.
Created attachment 85250 [details]
Simple calc doc with some values for 4.1.2 on Linux-32

Left to right sort of 1st row does nothing no matter what.

Top to bottom of first column produces the following results for me regardless if Case Sensitive is selected or not:
a
A
b
B

If Case Sensitive is NOT selected, the normal collation sequence should produce the following results:
A
a
B
b
Linux-32 on 4.1.2

I've added two little test documents that I'm using. My finding so far is that the sort of table elements in Writer works correctly, whereas the sort in Calc does not. Case insensitive -- the normal collation sort -- should produce capital letters before lower case. Case sensitive should do the opposite. Writer uses the phrase "Match case" while Calc uses "Case Sensitive", but I think these phrases should mean the same thing to a user.
Comment 17, 18 and 19 are incorrect. In general the comments all seem to indicate that Case Insensitive means sorting in REVERSE collating order. That is NOT what Case Insensitive means.

Case Insensitive is NEITHER Upper Case before Lower Case NOR Lower Case before Upper Case. It IGNORES case. Hence the name: Case Insensitive.

Case insensitive: ABC = abc = AbC = aBc

A case sensitive sort takes the case of a character into account. The collating sequence of ASCII and UNICODE for the ASCII characters is identical. Upper case comes first. So a case sensitive sort should sort the above as:
ABC
AbC
aBc
abc

If Lower Case is to be sorted first, that is a Reverse Case Sensitive sort. It is NOT a Case Insensitive sort.

A Case Insensitive sort is important. Sort: nice, Nicer, nicest, Nicely, jump
Case insensitive: jump, nice, Nicely, Nicer, nicest
Case sensitive (Natural Order): Nicely, Nicer, jump, nice, nicest
Case sensitive (Reverse Order): nicest, nice, jump, Nicer, Nicely

NOTE: Case Insensitive would be preferred when sorting a list of names where some may be capitalized and others not. It is also the sort that would be used by either a Dictionary or Encyclopedia.
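The three orderings listed in this comment can be checked with a few lines of Python. The sketch takes "case sensitive (natural order)" to mean a plain code point comparison, as the comment describes for ASCII, and "case insensitive" to mean comparing lower-cased copies. It says nothing about how OpenOffice's collator behaves.

```python
words = ["nice", "Nicer", "nicest", "Nicely", "jump"]

# Case insensitive: compare lower-cased copies, so case is ignored.
case_insensitive = sorted(words, key=str.lower)

# Case sensitive, natural order: plain code point comparison,
# so every upper-case letter sorts before every lower-case letter.
case_sensitive = sorted(words)

# Case sensitive, reverse order: the same comparison, descending.
case_sensitive_reverse = sorted(words, reverse=True)

print(case_insensitive)        # ['jump', 'nice', 'Nicely', 'Nicer', 'nicest']
print(case_sensitive)          # ['Nicely', 'Nicer', 'jump', 'nice', 'nicest']
print(case_sensitive_reverse)  # ['nicest', 'nice', 'jump', 'Nicer', 'Nicely']

# The ABC example from the same comment, in code point order.
print(sorted(["abc", "aBc", "AbC", "ABC"]))  # ['ABC', 'AbC', 'aBc', 'abc']
```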
(In reply to Alan from comment #20)
> Comment 17, 18 and 19 are incorrect. in general the comments all seem to
> indicate that Case Insensitive means sorting in REVERSE collating order.
> That is NOT what Case Insensitive means.
>
> Case Insensitive is NEITHER Upper Case before Lower Case NOR Lower Case
> before Upper Case. It IGNORES case. Hence the name: Case Insensitive.
>
> Case insensitive: ABC = abc = AbC = aBc
>
> A case sensitive sort takes the case of a character into account. The
> collating sequence of ASCII and UNICODE for the ASCII characters is
> identical. Upper case comes first. So a case sensitive sort should sort the
> above as:
> ABC
> AbC
> aBc
> abc
>
> If Lower Case is to be sorted first, that is a Reverse Case Sensitive sort.
> It is NOT a Case Insensitive sort.
>
> A Case Insensitive sort is important. Sort: nice, Nicer, nicest, Nicely, jump
> Case insensitive: jump, nice, Nicely, Nicer, nicest
> Case sensitive (Natural Order): Nicely, Nicer, jump, nice, nicest
> Case sensitive (Reverse Order): nicest, nice, jump, Nicer, Nicely
>
> NOTE: Case Insensitive would be preferred when sorting a list of names where
> some may be capitalized and others not. It is also the sort that would be
> used by either a Dictionary or Encyclopedia.

You are correct in your assessment. My comments were in relation to what apparently "case insensitive" means within OpenOffice for an alphabetic sort.