Issue 69482 - Breakiterator: mismatch of nextWord and getWordBoundary
Summary: Breakiterator: mismatch of nextWord and getWordBoundary
Status: CLOSED FIXED
Alias: None
Product: Internationalization
Classification: Code
Component: i18npool
Version: 680m182
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: frank
QA Contact: issues@l10n
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-09-12 12:41 UTC by thomas.lange
Modified: 2013-08-07 15:02 UTC
CC List: 2 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Sample document with macro (8.18 KB, application/octet-stream)
2006-09-12 12:43 UTC, thomas.lange
New extended version of previous test (8.43 KB, application/octet-stream)
2006-09-25 14:22 UTC, thomas.lange

Description thomas.lange 2006-09-12 12:41:46 UTC
There are some oddities where nextWord and getWordBoundary do not match up as
expected (see the macro in the attached document).
Comment 1 thomas.lange 2006-09-12 12:43:06 UTC
Created attachment 39095
Sample document with macro
Comment 2 thomas.lange 2006-09-12 12:43:32 UTC
.
Comment 3 thomas.lange 2006-09-12 12:59:23 UTC
Basically, the macro calls nextWord for a given index and then calls
getWordBoundary with the start index returned by nextWord.
In most cases one would expect both word boundaries to be the same.
This is not always the case. There are also some unexpected results regarding
the word boundary chosen.

For example:
- for index 0..2: the boundary is [3,6[, which puts "(11" in one word;
  one would expect 11 to be a word on its own.
- for index 3: nextWord returns [4,5[, which is only the single '1' instead
  of both of them.
  Also, getWordBoundary returns [3,6[, which is quite unexpected since,
  according to nextWord, the word should start at position 4.
  (This one actually caused Calc to freeze because of non-optimal code
  in svx. See issue 69416.)
- for index 4: [5,6[ is returned (the second 1 in 11). Shouldn't that have
  been the '/' character?
  Also, nextWord and getWordBoundary once again do not match up.
- for index 5: ok
- for index 6: returns [7,9[, which denotes "8)". Again, the ')' should
  probably not be included with the number.
- for index 8: should have been [8,9[, that is, the ')' char, at least if
  that char should not be included with the '8'.

TL->Karl: Please have a look. Thanks!
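The consistency check described in Comment 3 can be sketched in Python for readers without the attached document. This is a minimal illustration under stated assumptions, not the macro itself: the real test calls nextWord and getWordBoundary on the UNO break iterator, while here both are replaced by hypothetical toy functions (`segment`, `next_word`, `get_word_boundary`, `count_mismatches`, `buggy_next_word`). The toy word rule (runs of letters/digits form one word, every other non-space character is a word on its own) is an assumption chosen for illustration. The test string "ab (11/8)" comes from Comment 10. The boundaries printed here are the toy's, not OOo's actual output.

```python
from typing import Callable, List, NamedTuple


class Boundary(NamedTuple):
    """Half-open span [start, end[, loosely modelled on the UNO Boundary struct."""
    start: int
    end: int


def segment(text: str) -> List[Boundary]:
    """Hypothetical toy rule: alphanumeric runs are one word, any other
    non-space character is a one-character word, whitespace is skipped."""
    words: List[Boundary] = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
        elif text[i].isalnum():
            j = i
            while j < len(text) and text[j].isalnum():
                j += 1
            words.append(Boundary(i, j))
            i = j
        else:
            words.append(Boundary(i, i + 1))
            i += 1
    return words


def get_word_boundary(text: str, pos: int) -> Boundary:
    """Word containing pos, or the empty span [pos, pos[ if pos is in no word."""
    for w in segment(text):
        if w.start <= pos < w.end:
            return w
    return Boundary(pos, pos)


def next_word(text: str, pos: int) -> Boundary:
    """First word starting after pos, or [len, len[ if there is none."""
    for w in segment(text):
        if w.start > pos:
            return w
    return Boundary(len(text), len(text))


def buggy_next_word(text: str, pos: int) -> Boundary:
    """Simulated defect (hypothetical): digit runs are reported one character
    at a time, similar to the [4,5[ result described for index 3."""
    b = next_word(text, pos)
    if b.start < len(text) and text[b.start].isdigit():
        return Boundary(b.start, b.start + 1)
    return b


def count_mismatches(text: str,
                     nxt: Callable[[str, int], Boundary],
                     wb: Callable[[str, int], Boundary]) -> int:
    """For every index, call nextWord, then getWordBoundary at the start it
    returned; count the indices where the two spans differ."""
    mismatches = 0
    for pos in range(len(text)):
        b = nxt(text, pos)
        if b.start >= len(text):
            continue  # no further word after pos
        if wb(text, b.start) != b:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    text = "ab (11/8)"
    print(next_word(text, 3))                                     # Boundary(start=4, end=6)
    print(count_mismatches(text, next_word, get_word_boundary))        # 0
    print(count_mismatches(text, buggy_next_word, get_word_boundary))  # 1
```

Under this toy rule "11" is a single word [4,6[ and nextWord from index 4 yields '/' at [6,7[, which is what the report expected. The simulated defect that splits the digit run is caught as one mismatch (index 3), which is the kind of disagreement the macro counts.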
Comment 4 thomas.lange 2006-09-12 12:59:56 UTC
TL: wrong owner, reassigning...
Comment 5 thomas.lange 2006-09-12 13:00:17 UTC
.
Comment 6 thomas.lange 2006-09-12 13:13:25 UTC
Summary corrected.
Comment 7 thomas.lange 2006-09-12 13:19:33 UTC
TL->Karl: I just discussed the above-mentioned Calc issue with FST and we agreed
that it would be best if you fix this issue in the same CWS. Thus, please fix it
in CWS tl29.
Comment 8 karl.hong 2006-09-20 02:27:27 UTC
Fixed in CWS tl29.
Comment 9 thomas.lange 2006-09-21 08:41:23 UTC
.
Comment 10 thomas.lange 2006-09-25 13:32:10 UTC
TL->Karl: The issue is fixed for the exact case the macro checks in its
current form.
But if you modify the macro slightly, e.g. by using other WordTypes (1, 2
and 3), or change the language by using "de-DE" as locale with the text
"ab (11/8)", the problem still exists.

Can you fix it in time for OOo 2.0.1?
Comment 11 thomas.lange 2006-09-25 14:22:00 UTC
Created attachment 39358
new extended version of previous test
Comment 12 frank 2006-10-17 15:10:19 UTC
Hi Karl,

can you fix the problem described in Thomas' latest comment in time for 2.1? I
would like to have CWS tl29 closed ASAP.

Thanks

Frank
Comment 13 karl.hong 2006-10-17 16:56:59 UTC
Hi Frank,

Sorry, I fixed it last month but forgot to update the issue.

The updated file is i18npool/source/breakiterator/breakiterator_cjk.cxx.

Karl.
Comment 14 karl.hong 2006-10-17 17:14:56 UTC
fixed.
Comment 15 karl.hong 2006-10-17 17:38:11 UTC
Hi Frank,

I missed Thomas' last comment about testing other languages; I will take a look
today.

Thanks,
Karl.
Comment 16 karl.hong 2006-10-17 19:13:39 UTC
Hi Thomas,

I just downloaded the Windows build from CWS tl29 and tested the macro in BI2; I
don't see the problem. Final result: 'mismatchs found: 0'.

Karl.
Comment 17 frank 2006-10-20 13:36:57 UTC
Hi,

Checked again and found it fixed. I don't know what was wrong with the earlier test.

Frank
Comment 18 frank 2006-11-06 12:11:17 UTC
Found fixed on master using Solaris, Windows and Linux builds.