Issue 26565 - Natural Sort Option in Sort Dialog
Summary: Natural Sort Option in Sort Dialog
Status: CONFIRMED
Alias: None
Product: Calc
Classification: Application
Component: code
Version: OOo 1.1.1
Hardware: All All
Importance: P3 Trivial with 1 vote
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL: http://kohei.us/ooo/nsort/
Keywords:
Depends on:
Blocks: 63864
Reported: 2004-03-16 14:44 UTC by kyoshida
Modified: 2017-05-20 11:11 UTC (History)
CC List: 16 users

See Also:
Issue Type: PATCH
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Preliminary implementation against 111fix2 (26.44 KB, patch)
2004-05-11 18:18 UTC, kyoshida
Feature complete patch against OpenOffice 1.1.1 final (30.92 KB, patch)
2004-05-27 03:30 UTC, kyoshida
Removed one-line printf from global.cxx (30.86 KB, patch)
2004-05-27 03:58 UTC, kyoshida
Updated patch that removes the redundant XCharacterClassification instance (26.88 KB, patch)
2004-06-08 06:10 UTC, kyoshida
patch re-issued for OpenOffice_1_1_3 (27.04 KB, patch)
2004-10-07 02:35 UTC, kyoshida
Patch for SRC680_m84 (22.08 KB, patch)
2005-03-13 04:24 UTC, kyoshida

Description kyoshida 2004-03-16 14:44:03 UTC
When cells containing string-prefixed numbers (e.g. A34) are sorted using the
current sort function found under Data - Sort, the result is often not what one
would normally expect because the cell contents are compared as strings despite
the presence of the numeric component.  For instance, when cells containing a
series of A1, A2, A3, ... ,A19, A20 are sorted using the regular sorting
algorithm, the result will be A1, A10, A11, ... ,A19, A2, A20, A3, A4, ..., A8,
A9.  This is because the numeric part of a string is evaluated digit by digit,
which inadvertently declares A2 to be greater than A19, A3 greater than A20, and
so on.  This is simply not the natural human way of doing the sort; therefore
the natural sorting algorithm needs to be introduced to solve this problem.
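
To illustrate the difference described above, here is a minimal Python sketch. It is not taken from the attached patches (which target the Calc C++ code); the function name `natural_key` and the chunk-tagging scheme are assumptions made for this example only. It splits each string into digit runs and non-digit runs and compares the digit runs as integers.

```python
import re

_DIGIT_RUN = re.compile(r"([0-9]+)")

def natural_key(text):
    """Hypothetical helper: build a sort key in which digit runs compare as integers."""
    key = []
    # With one capturing group, split() alternates text, digits, text, digits, ...
    for i, chunk in enumerate(_DIGIT_RUN.split(text)):
        if i % 2:                      # odd positions hold the captured digit runs
            key.append((0, int(chunk), ""))
        elif chunk:                    # skip empty text chunks
            key.append((1, 0, chunk))
    return key

cells = [f"A{n}" for n in range(1, 21)]

plain = sorted(cells)                      # character-by-character comparison
natural = sorted(cells, key=natural_key)   # digit runs compared as numbers

print(plain[:4])    # ['A1', 'A10', 'A11', 'A12']
print(natural[:4])  # ['A1', 'A2', 'A3', 'A4']
```

The tags (0 for numbers, 1 for text) keep integers and strings from being compared with each other directly, which Python would otherwise reject.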
Comment 1 kyoshida 2004-05-11 18:18:37 UTC
Created attachment 15180
Preliminary implementation against 111fix2
Comment 2 kyoshida 2004-05-27 03:30:53 UTC
Created attachment 15491
Feature complete patch against OpenOffice 1.1.1 final
Comment 3 kyoshida 2004-05-27 03:38:28 UTC
The above patch enables case sensitivity and the user-defined sort list in the
natural sort algorithm.  The number of helper functions has been reduced from
three to two by use of XCharacterClassification::parseAnyToken(...).  It also uses
rtl::math::stringToDouble instead of rtl::OUString::toDouble in order to handle
conversion in multiple locales.  The reference to NativeNumberWrapper has been
removed as its use in the algorithm will degrade sorting performance to a great
degree.

As far as I'm concerned, this patch is ready to go.

Kohei
Comment 4 kyoshida 2004-05-27 03:58:49 UTC
Created attachment 15492
Removed one-line printf from global.cxx
Comment 5 kyoshida 2004-05-28 01:25:08 UTC
As Eike commented on dev@sc.ooo, I will try to use the existing
ScGlobal::pCharClass instead of creating a whole new XCharacterClassification
interface to implement the same algorithm.  Stay tuned.
 
Comment 6 kyoshida 2004-06-08 06:10:37 UTC
Created attachment 15728
Updated patch that removes the redundant XCharacterClassification instance
Comment 7 kyoshida 2004-06-08 14:18:24 UTC
This last patch will probably be the last patch for the 1.1 branch unless I 
find a bug within my code.  I plan to issue another patch for the 680 branch 
sometime in the future. 
 
Kohei 
Comment 8 kyoshida 2004-06-30 15:30:46 UTC
This effort is already started.
Comment 9 kyoshida 2004-10-07 02:35:56 UTC
Created attachment 18188
patch re-issued for OpenOffice_1_1_3
Comment 10 frank 2004-10-07 07:54:59 UTC
Hi Kohei,

Please open a new issue and attach your patch to it.  As OOo 1.1.3 is already
available, the target should be OOo 1.1.4.

Frank
Comment 11 kyoshida 2004-10-07 14:35:26 UTC
Hi Frank,

Actually Niklas said that this feature will have to wait till post 2.0, which
means the target milestone is still a moving target (sorry about the pun ;). 
So, I prefer using this issue to hold the patch I just submitted.  The
re-issuance of this patch is mainly for those who want to try out this feature
in the stable branch.

If this is a problem, just let me know so I can go ahead and create a new issue
for the patch.  I'm okay either way. :)

Kohei
Comment 12 frank 2004-10-08 10:35:26 UTC
Hi Kohei,

As the decision was made not to include the patch for now, there is no longer
any need for a new issue with target 1.1.4.

Thanks.

Frank
Comment 13 soren 2005-01-27 21:23:49 UTC
Hi Kohei,

I have been taking a look at your work on "natural" sorting in the spreadsheet. 
I have a couple of observations that may be of interest: 

1. The "natural" sort you are implementing always reads numbers as floating 
point. It may be preferable to be able to read numbers as integers as well 
(regarding decimal points etc. as common text). Imagine for instance that you 
want to read a sequence of OpenOffice source-file version numbers, e.g. 1.8, 1.9,
1.10, 1.11, which would still be sorted "unnaturally", as 1.10, 1.11, 1.8, 1.9.
Also, when trying to sort other types of version numbers, e.g. 1.1.3, 1.1.4 etc. 
the comparison algorithm would probably throw an exception when trying to read 
the strings as floating point numbers.

2. If I have read your changes correctly, the new comparison function will 
always use the standard locale decimal points and thousand separators when 
parsing floating point numbers. Maybe it should be possible for the user to
change this, for instance when importing files from other countries (the number 
formats change, but the strings do not).

Regards, Søren
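
The first observation above can be reproduced with a short Python sketch. This is an illustration only, not the patch's code: `integer_key` and `parses_as_float` are hypothetical helpers, and the failure shown for "1.1.3" is the behaviour of Python's `float()`, which may differ from what the C++ conversion routines do.

```python
import re

versions = ["1.8", "1.9", "1.10", "1.11"]

# Reading each cell as a single floating-point number turns "1.10" into 1.1,
# so it sorts ahead of 1.8 and 1.9.
by_float = sorted(versions, key=float)

def integer_key(text):
    """Hypothetical helper: digit runs compare as integers, '.' is ordinary text."""
    chunks = re.split(r"([0-9]+)", text)
    return [(0, int(c), "") if i % 2 else (1, 0, c)
            for i, c in enumerate(chunks) if c]

by_integer = sorted(versions, key=integer_key)

def parses_as_float(text):
    """Report whether Python's float() accepts the whole string."""
    try:
        float(text)
        return True
    except ValueError:
        return False

print(by_float)                   # ['1.10', '1.11', '1.8', '1.9']
print(by_integer)                 # ['1.8', '1.9', '1.10', '1.11']
print(parses_as_float("1.1.3"))   # False
```

Treating the decimal point as plain text also lets multi-part versions such as 1.1.3 and 1.1.4 go through the same key without any float conversion.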
Comment 14 kyoshida 2005-01-29 00:24:02 UTC
Hi Søren, thank you for your feedback.

I agree with you on both points.  With regard to 1, I should probably change it
so that, if there is more than one decimal point it will fall back to treating
the decimal separator as string.  If there is only one decimal point, then it
will be up to the user.

I will look into incorporating your observations when I port this code to the
680 base.

Regards,

Kohei
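
The fallback rule proposed in this reply can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the behaviour of any attached patch: the function `natural_key` and its parameters `decimal_sep` and `single_sep_is_decimal` are invented here to stand in for the "up to the user" choice and for a configurable separator.

```python
import re

def natural_key(text, decimal_sep=".", single_sep_is_decimal=True):
    """Hypothetical key: one separator may mean a decimal, several never do."""
    sep = re.escape(decimal_sep)
    # A number token is digit runs joined by the separator: "1", "1.5", "1.1.3".
    token = re.compile(rf"([0-9]+(?:{sep}[0-9]+)*)")
    key = []
    for i, chunk in enumerate(token.split(text)):
        if not chunk:
            continue
        if i % 2 == 0:
            key.append((1, 0, chunk))          # ordinary text
        elif chunk.count(decimal_sep) == 1 and single_sep_is_decimal:
            # Exactly one separator and the user opted in: read as a decimal.
            key.append((0, float(chunk.replace(decimal_sep, ".")), ""))
        else:
            # Zero or several separators: integers, with separators kept as text.
            for j, part in enumerate(re.split(rf"({sep})", chunk)):
                key.append((1, 0, part) if j % 2 else (0, int(part), ""))
    return key

print(sorted(["1.1.10", "1.1.3", "1.1.4"], key=natural_key))
# ['1.1.3', '1.1.4', '1.1.10']
print(sorted(["1.8", "1.10"], key=natural_key))
# ['1.10', '1.8']
print(sorted(["1.8", "1.10"],
             key=lambda s: natural_key(s, single_sep_is_decimal=False)))
# ['1.8', '1.10']
```

Passing `decimal_sep=","` would apply the same rule to data that uses a comma as the decimal separator, which is one possible way to approach the locale concern raised in the previous comment.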
Comment 15 kyoshida 2005-03-13 04:24:23 UTC
Created attachment 23753
Patch for SRC680_m84
Comment 16 kyoshida 2005-03-13 04:26:47 UTC
New patch re-issued for SRC680_m84.  It also addresses Søren's first point,
which turned out to be due to an incorrect use of the parseAnyToken method.  It
should also yield a slightly better performance.
Comment 17 kyoshida 2005-08-19 05:30:12 UTC
Added the external project URL.

I also want to set the target milestone to 2.1 (or 3.0, whichever it will be),
but such is not available yet. :(
Comment 18 kyoshida 2005-08-25 19:54:17 UTC
Set target milestone to "not determined".  This will be changed as soon as a
numbered target milestone for the next non-micro release becomes available.
Comment 19 kyoshida 2005-08-30 20:26:32 UTC
Tentatively setting target milestone to 3.0, and adding Falko to CC.
Comment 20 kyoshida 2005-08-30 20:27:48 UTC
The spec file is available here:

http://specs.openoffice.org/calc/ease-of-use/natural_sort_algorithm.sxw
Comment 21 kami911 2006-03-05 15:18:28 UTC
We have working builds from 2.0.2. I will publish the links when they are
available. Could you target it for 2.0.3?
Comment 22 kyoshida 2006-03-06 15:03:33 UTC
setting target to 2.0.3 as kami_ requested.

Kohei
Comment 23 Uwe Fischer 2006-04-04 14:59:41 UTC
added "ufi" on cc for online help
Comment 24 pavel 2006-05-04 06:45:59 UTC
Setting the target means we have people assigned to work on the cws, QA etc.

Who will do all this?
Comment 25 kyoshida 2006-05-04 17:20:18 UTC
nn or er: IIRC this feature cannot be integrated because of the file format
issue.  Is this correct?  If so, should we abandon this feature?
Comment 26 mmeeks 2006-05-04 18:03:29 UTC
Stefan - an interesting patch with a spec [!?] blocking on file format work &
apparently interest from Sun ...
Comment 27 niklas.nebel 2006-05-04 18:12:37 UTC
In the future, there will certainly be features that require additional
information in the file format, so there will have to be a solution, and there's
no need to abandon this feature. As soon as we have a general procedure for file
format extensions, this one can be completed quite quickly.
Comment 28 kyoshida 2006-06-08 04:26:55 UTC
reassigning this issue to nn so that this issue will get properly tracked.
Comment 29 rail_ooo 2006-09-12 06:13:30 UTC
I think it would be great if this feature could sort IP addresses as well.
Example:

Before:
192.168.1.1
192.168.1.10
192.168.1.2

After:
192.168.1.1
192.168.1.2
192.168.1.10
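
The requested ordering can be sketched in Python by comparing each dotted part as an integer. The helper name `ip_key` is an assumption for this example, and the sketch only handles well-formed dotted-decimal strings; anything else would make `int()` raise a `ValueError`.

```python
def ip_key(address):
    """Hypothetical helper: turn '192.168.1.10' into (192, 168, 1, 10)."""
    return tuple(int(part) for part in address.split("."))

before = ["192.168.1.1", "192.168.1.10", "192.168.1.2"]
after = sorted(before, key=ip_key)

print(after)  # ['192.168.1.1', '192.168.1.2', '192.168.1.10']
```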
Comment 30 niklas.nebel 2006-10-13 16:24:19 UTC
changing target
Comment 31 niklas.nebel 2007-02-26 12:50:11 UTC
Just an update to show this hasn't been forgotten: The file format extension has
been proposed to the OASIS TC.
Comment 32 niklas.nebel 2007-05-24 12:49:56 UTC
With the file format issue coming to a solution, we need agreement from User
Experience for this. Frank, can you take a look at the spec
(http://specs.openoffice.org/calc/ease-of-use/natural_sort_algorithm.sxw)?
Comment 33 kyoshida 2007-08-02 20:51:05 UTC
relevant ODF Change proposal.

http://lists.oasis-open.org/archives/office/200702/msg00047.html
Comment 34 niklas.nebel 2007-12-04 18:25:34 UTC
Target 3.0
Comment 35 niklas.nebel 2008-05-29 18:22:27 UTC
It's too late now for UI changes for 3.0.
I know this has been open far too long, and I'll try to get it into 3.1.
Comment 36 niklas.nebel 2009-10-14 17:04:23 UTC
adjusting target
Comment 37 kyoshida 2010-02-13 18:33:24 UTC
*** Issue 109227 has been marked as a duplicate of this issue. ***
Comment 38 niklas.nebel 2010-08-25 15:41:10 UTC
retargeting to 3.4 for time reasons
Comment 39 Martin Hollmichel 2011-03-15 12:50:25 UTC
set target to 3.x since not release relevant for 3.4.
Comment 40 Rob Weir 2013-03-11 14:58:27 UTC
I'm adding this comment to all open issues with Issue Type == PATCH.  We have 220 such issues, many of them quite old.  I apologize for that.  

We need your help in prioritizing which patches should be integrated into our next release, Apache OpenOffice 4.0.

If you have submitted a patch and think it is applicable for AOO 4.0, please respond with a comment to let us know.

On the other hand, if the patch is no longer relevant, please let us know that as well.

If you have any general questions or want to discuss this further, please send a note to our dev mailing list:  dev@openoffice.apache.org

Thanks!

-Rob
Comment 41 Marcus 2017-05-20 11:11:31 UTC
Reset assignee to the default "issues@openoffice.apache.org".