Apache OpenOffice (AOO) Bugzilla – Issue 26565
Natural Sort Option in Sort Dialog
Last modified: 2017-05-20 11:11:31 UTC
When cells containing string-prefixed numbers (e.g. A34) are sorted using the current sort function found under Tool - Sort, the result is often not what one would normally expect because the cell contents are compared as strings despite the presence of the numeric component. For instance, when cells containing a series of A1, A2, A3, ... ,A19, A20 are sorted using the regular sorting algorithm, the result will be A1, A10, A11, ... ,A19, A2, A20, A3, A4, ..., A8, A9. This is because the numeric part of a string is evaluated digit by digit, which inadvertently declares A2 to be greater than A19, A3 greater than A20, and so on. This is simply not the natural human way of doing the sort; therefore the natural sorting algorithm needs to be introduced to solve this problem.
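For illustration only, here is a minimal Python sketch of the difference described above. It is not the code from the attached patches (those are C++ against the Calc sources); the helper name `natural_key` is made up for this example. It splits each string into text and digit runs and compares the digit runs as integers.

```python
import re

def natural_key(text):
    """Build a sort key in which digit runs compare as integers.

    re.split with a capturing group always yields alternating chunks:
    text, digits, text, digits, ..., text. Odd indexes are therefore
    digit runs, so keys never compare an int against a str.
    """
    parts = re.split(r"([0-9]+)", text)
    return [int(chunk) if index % 2 else chunk
            for index, chunk in enumerate(parts)]

cells = [f"A{n}" for n in range(1, 21)]   # A1 .. A20

plain = sorted(cells)                     # character-by-character comparison
natural = sorted(cells, key=natural_key)  # numeric part compared as a number

print(plain)
print(natural)
```

The plain sort places A10 through A19 before A2, while the keyed sort keeps A1 through A20 in numeric order.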
Created attachment 15180 [details] Preliminary implementation against 111fix2
Created attachment 15491 [details] Feature complete patch against OpenOffice 1.1.1 final
The above patch enables case sensitivity and user-defined sort lists in the natural sort algorithm. The number of helper functions has been reduced from three to two by use of XCharacterClassification::parseAnyToken(...). It also uses rtl::math::stringToDouble instead of rtl::OUString::toDouble in order to handle conversion in multiple locales. The reference to NativeNumberWrapper has been removed, as its use in the algorithm would degrade sorting performance to a great degree. As far as I'm concerned, this patch is ready to go. Kohei
Created attachment 15492 [details] Removed one-line printf from global.cxx
As Eike commented on dev@sc.ooo, I will try to use the existing ScGlobal::pCharClass instead of creating a whole new XCharacterClassification interface to implement the same algorithm. Stay tuned.
Created attachment 15728 [details] Updated patch that removes the redundant XCharacterClassification instance
This last patch will probably be the last patch for the 1.1 branch unless I find a bug within my code. I plan to issue another patch for the 680 branch sometime in the future. Kohei
This effort is already started.
Created attachment 18188 [details] patch re-issued for OpenOffice_1_1_3
Hi Kohei, please open a new Issue and attach your patch to it. As OOo1.1.3 is already available, the target should be OOo1.1.4. Frank
Hi Frank, Actually Niklas said that this feature will have to wait till post 2.0, which means the target milestone is still a moving target (sorry about the pun ;). So, I prefer using this issue to hold the patch I just submitted. The re-issuance of this patch is mainly for those who want to try out this feature in the stable branch. If this is a problem, just let me know so I can go ahead and create a new issue for the patch. I'm okay either way. :) Kohei
Hi Kohei, as the decision was made to not include the patch for now, there is no longer any need for a new Issue with target 1.1.4. Thanks. Frank
Hi Kohei, I have been taking a look at your work on "natural" sorting in the spreadsheet. I have a couple of observations that may be of interest: 1. The "natural" sort you are implementing always reads numbers as floating point. It may be preferable to be able to read numbers as integers as well (treating decimal points etc. as ordinary text). Imagine for instance that you want to sort a sequence of OpenOffice source-file version numbers, e.g. 1.8, 1.9, 1.10, 1.11, which would still be sorted "unnaturally", as 1.10, 1.11, 1.8, 1.9. Also, when trying to sort other types of version numbers, e.g. 1.1.3, 1.1.4 etc., the comparison algorithm would probably throw an exception when trying to read the strings as floating point numbers. 2. If I have read your changes correctly, the new comparison function will always use the standard locale decimal points and thousands separators when parsing floating point numbers. Maybe it should be possible for the user to change this, for instance when importing files from other countries (the number formats change, but the strings do not). Regards, Søren
Hi Søren, thank you for your feedback. I agree with you on both points. With regard to 1, I should probably change it so that, if there is more than one decimal point, it will fall back to treating the decimal separator as a string. If there is only one decimal point, then it will be up to the user. I will look into incorporating your observations when I port this code to the 680 base. Regards, Kohei
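The fallback rule discussed in the two comments above can be sketched roughly as follows, in Python rather than the C++ of the patch. This is only one reading of the proposal: the function name `natural_key` and the flag `single_point_as_decimal` are hypothetical, and "." is hard-coded as the decimal separator, so the locale question from point 2 is not addressed.

```python
import re

# A run of digits, optionally followed by more ".digits" groups (assumed syntax).
_NUMERIC_RUN = re.compile(r"[0-9]+(?:\.[0-9]+)*")

def natural_key(text, single_point_as_decimal=True):
    """Sort key: text chunks stay strings, numeric runs become tuples.

    A run with exactly one "." is read as a decimal number only if the
    (hypothetical) user option allows it. Any other run is split on "."
    and each component is compared as an integer.
    """
    key, pos = [], 0
    for match in _NUMERIC_RUN.finditer(text):
        run = match.group()
        key.append(text[pos:match.start()])          # text before the run
        if run.count(".") == 1 and single_point_as_decimal:
            key.append((float(run),))                # "1.10" -> (1.1,)
        else:
            key.append(tuple(int(part) for part in run.split(".")))
        pos = match.end()
    key.append(text[pos:])                           # trailing text
    return key

versions = ["1.10", "1.8", "1.11", "1.9"]
as_decimals = sorted(versions, key=natural_key)
as_integers = sorted(
    versions, key=lambda s: natural_key(s, single_point_as_decimal=False)
)
multi_point = sorted(["1.1.10", "1.1.3", "1.1.4"], key=natural_key)
```

Read as decimals, the versions come out in the "unnatural" order Søren describes. Read as integers per component, they sort as 1.8, 1.9, 1.10, 1.11. Strings with more than one point never go through float parsing at all.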
Created attachment 23753 [details] Patch for SRC680_m84
New patch re-issued for SRC680_m84. It also addresses Søren's first point, which turned out to be due to an incorrect use of the parseAnyToken method. It should also yield a slightly better performance.
Added the external project URL. I also want to set the target milestone to 2.1 (or 3.0, whichever it will be), but such is not available yet. :(
Set target milestone to "not determined". This will be changed as soon as a numbered target milestone for the next non-micro release becomes available.
Tentatively setting target milestone to 3.0, and adding Falko to CC.
The spec file is available here: http://specs.openoffice.org/calc/ease-of-use/natural_sort_algorithm.sxw
We have working builds from 2.0.2. I will publish the links when they are available... Could you target it for 2.0.3?
setting target to 2.0.3 as kami_ requested. Kohei
added "ufi" on cc for online help
Setting the target means we have people assigned to work on the cws, QA etc. Who will do all this?
nn or er: IIRC this feature cannot be integrated because of the file format issue. Is this correct? If so, should we abandon this feature?
Stefan - an interesting patch with a spec [!?] blocking on file format work & apparently interest from Sun ...
In the future, there will certainly be features that require additional information in the file format, so there will have to be a solution, and there's no need to abandon this feature. As soon as we have a general procedure for file format extensions, this one can be completed quite quickly.
reassigning this issue to nn so that this issue will get properly tracked.
I think it would be great if this feature could sort IP addresses as well. Example: Before: 192.168.1.1 192.168.1.10 192.168.1.2 After: 192.168.1.1 192.168.1.2 192.168.1.10
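As a rough illustration of the request above (a Python sketch, not part of any attached patch; `octet_key` is a made-up name), reading every digit run as an integer instead of parsing "192.168" as a decimal number is enough to get the requested order:

```python
import re

def octet_key(address):
    """Compare an address by its digit runs, each read as an integer."""
    return tuple(int(run) for run in re.findall(r"[0-9]+", address))

before = ["192.168.1.1", "192.168.1.10", "192.168.1.2"]
after = sorted(before, key=octet_key)
print(after)
```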
changing target
Just an update to show this hasn't been forgotten: The file format extension has been proposed to the OASIS TC.
With the file format issue coming to a solution, we need agreement from User Experience for this. Frank, can you take a look at the spec (http://specs.openoffice.org/calc/ease-of-use/natural_sort_algorithm.sxw)?
relevant ODF Change proposal. http://lists.oasis-open.org/archives/office/200702/msg00047.html
Target 3.0
It's too late now for UI changes for 3.0. I know this has been open far too long, and I'll try to get it into 3.1.
adjusting target
*** Issue 109227 has been marked as a duplicate of this issue. ***
retargeting to 3.4 for time reasons
set target to 3.x since this is not release relevant for 3.4.
I'm adding this comment to all open issues with Issue Type == PATCH. We have 220 such issues, many of them quite old. I apologize for that. We need your help in prioritizing which patches should be integrated into our next release, Apache OpenOffice 4.0. If you have submitted a patch and think it is applicable for AOO 4.0, please respond with a comment to let us know. On the other hand, if the patch is no longer relevant, please let us know that as well. If you have any general questions or want to discuss this further, please send a note to our dev mailing list: dev@openoffice.apache.org Thanks! -Rob
Reset assignee to the default "issues@openoffice.apache.org".