Issue 84979 - Logic flaw in Create Names range checking
Summary: Logic flaw in Create Names range checking
Status: CONFIRMED
Alias: None
Product: Calc
Classification: Application
Component: code (show other issues)
Version: OOo 2.3.1
Hardware: All All
: P3 Trivial with 1 vote (vote)
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-01-04 04:41 UTC by terrye
Modified: 2017-05-20 11:11 UTC (History)
6 users (show)

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description terrye 2008-01-04 04:41:03 UTC
http://user.services.openoffice.org/en/forum/viewtopic.php?f=33&t=1343 refers

Insert->Names->Create calls ScAddress::Parse in sc/source/core/tool/address.cxx
to  parse and validate the range address against for three variants CONV_OOO,
CONV_XL_A1, CONV_XL_R1C1.  

The intent here is to capture any range names which have a valid A1 or RC
notation syntax and prefix such with an underscore to prevent a possible
conflict of the form: Named range IF99 refers to D23. If cell A1 contains the
formula =IF99 does this equal cell IF99 or cell D23?

However the implementation is not based on matching the entire string but on
certain valid prefix strings.  Hence underscore prefix occurs if the name matches:
   ^(\a+)(\d+).* and $1 <= IV    0 < $2 <= 63356
   ^[RrCc]
   ^[RrCc][^0-9].*
   ^[RrCc]([0-9].*) and   0 < $1 <= 63556

Hence Cot, Row, If1stop, In9v9999xx, R65536xx, R6553x7zz generate _ prefixes.
Boat, Saw, IX2, A0, In99999xx, R65537zz do not.

OK including RC notation is sensible future proofing but
 
* There is nothing to be gained by enabling prefix matching. The rule should be
at the prefix only occurs if the name in its *entirety* would also be a valid
cell reference in A1 or RC notation.

* You need to decide a policy on maximum bounds.  This should either be the
current worksheet dimensions (256x65536) for *both* A1 and RC notation or we
adopt a clearly defined growth factor (e.g. 2Gb x 2Gb).

* Help should clearly state the rules for prefixing

Incidentally Excel postfixes *only* if the entire reference would generate an
address conflict (excepting patterns of the form R99C999 which it rejects).

Again this one should be easy to fix IF we do an entire match and use current
bounds.  Let me know if you want me to propose a patch.
Comment 1 ooo 2008-01-04 15:04:21 UTC
The problem may as well be how the return values of ScRange::Parse() and
ScAddress::Parse() are used, judging without having stepped through that is.
Interestingly the loop of ScRangeData::MakeValidName() does something that one
method below in ScRangeData::IsNameValid() is marked with a comment: THIS IS
WRONG; well, yes, obviously.. Instead of checking just for any flag, a check for
presence of all relevant flag that are needed to form a valid address should be
done, maybe that would already be sufficient. Maybe the underlying ::Parse()
methods do need changes anyway to correctly handle R1C1 notation. CONV_XL_A1 and
CONV_XL_R1C1 aren't enabled in the UI of upstream builds yet so not well tested
either, but are available just in patched versions of go-oo releases. You're
welcome to try things out.

As changes might affect also range names created during Excel file import I Cc
'dr' on this.
Comment 2 terrye 2008-01-06 16:08:45 UTC
The enum convention (address.hxx:255) defines 5 types (CONV_OOO, CONV_XL_A1,
CONV_XL_R1C1, CONV_XL_OOX, CONV_LOTUS_A1).  ScRangeData::MakeValidName
enumerates over these calling ScAddress::Parse and ScRange::Parse as object
methods with two acquisition objects: aName and aRange.

The main logic flaw is that these Parse functions process the range name
left-to-right returning a USHORT containing SCA_FLAGS (see address.hxx:201 et
seq) which indicate which component parts of a range have been decoded.  There
is no flag to indicate that garbage is found after the range reference.

However, ScRangeData::MakeValidName treats them as a simple boolean.  Hence if
any of the parse methods see a fragment reference then this is treated as a
match and the _ prefix is applied.

Incidentally these parse methods only decode for CONV_OOO (the default),
CONV_XL_A1 and CONV_XL_R1C1 so CONV_OOO is called 3 times, and as the match
routine does not quit on first match, multiple transformations can occur.  For
example, in the case of
  A123:B124  aRange.Parse(,,CONV_OOO) returns 0xF700   resulting in A123_B124
             aRange.Parse(,,CONV_XLA1) returns 0xC700  resulting in _A123_B124

Because this is processing 10 parse routines which could (partial) match under
various conditions we get the bizarre variants described in the original post.

What is very cleat ro me is that whoever coded this up did not start by getting
a stable logical functional specification, physicalising it and the refactoring
it before moving onto design and implementation.  This whole area of
functionality it a mess.  You are not going to sort it out as a 3.x release. 
You need a consistent detailed specification before you can sort this out so I
think that this is a 4.0 issue.  What you could do in 3.x is a minimum change to
disable some of more bizarre aspects.  

I will make some note on what I regard as a sensible FS.  I wouldn't attempt to
try to fix this tangled code "bottom up".  One options might be to disable all
bar OOO parsing for 3.1.   Thoughts?
Comment 3 ooo 2008-01-07 18:59:58 UTC
> What is very cleat ro me is that whoever coded this up did not start by getting
> a stable logical functional specification, physicalising it and the refactoring
> it before moving onto design and implementation.  This whole area of
> functionality it a mess.

Please let's stick to the topic of this issue and not theorize about the
history of implementation.

> You are not going to sort it out as a 3.x release. 
> You need a consistent detailed specification before you can sort this out so I
> think that this is a 4.0 issue.

I doubt that there is need for a much detailed specification. The method
has to assure that a generated name does not clash with any address
convention OOo supports, hence the attempt to do this via the Parse()
methods. What the Parse() methods need is a return flag to indicate
whether all input was digested, and/or not set SCA_VALID if that was not
the case. Needs to be checked with already existing semantics of other
callers of course. What the caller here in this case does wrong is that
it doesn't check the return flags at all, but converts them to a boolean
instead.

Furthermore the Parse(,,CONV_XL_...) methods seem to need some fixing if
with A123:B124 aRange.Parse(,,CONV_XL_A1) returns 0xC700, where
SCA_VALID_ROW2 and SCA_VALID_COL2 are missing.

> What you could do in 3.x is a minimum change to
> disable some of more bizarre aspects.  

What else do you think is necessary than what I mentioned?

> I will make some note on what I regard as a sensible FS.  I wouldn't attempt to
> try to fix this tangled code "bottom up".  One options might be to disable all
> bar OOO parsing for 3.1.   Thoughts?

I don't quite understand what you mean by "disable all bar OOO parsing".
Probably "disable all but OOo parsing" instead? That's not really an
option, because then later on when we wanted to support the other
address conventions we might have clashing names in already existing
documents. And as mentioned, patched go-oo releases do already ship with
R1C1 address convention activated in the UI.

If you do have some better approach than fixing Parse() and caller then
please don't hesitate and submit a patch.

Thanks
  Eike
Comment 4 terrye 2008-01-08 02:36:55 UTC
Having spent 30 mins running through various test cases the Excel validation
business rule is:
  (1) replace all non alphanumeric (+list of special chars) with _  
  (2) append _ to any names which would otherwise be valid sheet references.   

Note that since (1) removes ! and : we only need to test a limit number of cases
in (2):

The validation process is therefore as follows:
1) Scan the name replacing any character other than as follows with “_â€
    Any Alphabetic Character (regardless of language)
    Any Numeric Character
    The Special Characters: "." (46), "?" (63), "\" (92), "ˆ" (95), 
                "_" (128), "¨" (136), "¯" (152), "´" (153), "¸" (168), 
                "˜" (175), "•" (180), "€" (183), "™" (184)
2) If any substitutions have occurred then trim any leading or trailing “_â€
3) If   the string matches /^([a-zA-Z]{1,2})(\d+)$/ and 
        uppercase($1) < â€IV†and $2 < 65537 
   Then append “_†to the name
3) If   the string matches /^([Rr](\d*)$/ and $1 < 65537 or 
                           /^([Cr](\d*)$/ and $1 < 257 or
                           /^([Rr](\d*) ([Cr](\d*)$/ and $1 < 65537 $2 < 257 
   Then append “_†to the name and flag an error on this name.

Don't ask my why that list of special chars, but that's what it is.  Likewise
why it objects to RC1_ and not A1_ etc.. 

Given that the only change here is that Calc uses the . separator then then the
only change that you would need to do to drop "." from the special character list.

Given that OOo already contains a perfectly good Regexp engine we could code
this up is about 30 lines of code rather than the current 1,000 or so, and even
put in a few comments to say what we are doing and why. 
Comment 5 terrye 2008-01-08 02:54:54 UTC
Sorry there is an extraneous space in the last regexp above -- a typo.

Also I would use the same validation algorithm for Define Names.  At them moment
this only validates against A1 format names so RC1 is a perfectly valid name,
which will then become inaccessible if saved as XLS and opened in Excel. 
Comment 6 frank 2008-01-09 14:23:38 UTC
Hi Eike,

as you've commented on this one and set the target, I think you should get it.

Frank
Comment 7 terrye 2008-01-13 01:17:01 UTC
Eike, If you want me to document this properly or suggest a patch then let me
know, but until then this one is with you. TerryE 
Comment 8 ooo 2008-01-16 12:09:17 UTC
Sure, a working patch would be welcomed. I'm not convinced of using the
regex engine though (besides that I wouldn't call it "perfectly good",
which for sure it isn't), that would add performance penalty impacting
at least the current Excel import.

Regarding the list of special characters: that may even differ between
different localized Excel versions or with different language support
being enabled. I wouldn't count on that. A short test also revealed that
if a name is to be created consisting of only one of such special
character, an underscore is prefixed, e.g. '.' is changed to '_.'

@drr: Daniel, does ECMA/MOOXML say anything about valid range names?
Comment 9 terrye 2008-01-16 17:46:53 UTC
> I'm not convinced of using the regex engine though (besides that I wouldn't 
> call it "perfectly good", which for sure it isn't).

After I did this post, I had a look at this implementation. In its current
version is not usable.

> that would add performance penalty impacting at least the current Excel import.

Actually minuscule since this would only be invoked once per defined name.
Comment 10 Marcus 2017-05-20 11:11:36 UTC
Reset assigne to the default "issues@openoffice.apache.org".