Apache OpenOffice (AOO) Bugzilla – Issue 71158
ODFF: Inconsistent GCD() and LCM() results for non-integers
Last modified: 2013-08-07 15:15:24 UTC
Example =GCD(1.2;2.4) returns 1.2 =GCD(1.2;3.6) returns 0 You could say that anyone using GCD with non-integers deserves this.... but the behaviour really ought to be defined consistently - returning an error or zero. It should never return a non-integer, or we might have GCD(5;6) giving 1.2
Excel's documentation for GCD() states that Excel truncates to integers before applying, this approach makes sense to me. Zero shouldn't be the answer unless both a and b are zero: see http://en.wikipedia.org/wiki/Greatest_common_divisor. Excel also fails on negative integers, but GCD seems well defined for negative integers. However, Calc will return negative results sometimes; I'm not sure that's a good idea. I'd suggest always returning the positive GCD, regardless of the signs of the arguments.
Perhaps it would be useful to talk to whoever implemented this function, if that is possible, to find out the thinking behind it? There is another function GCD_ADD which should be the same as GCD in Excel; one of the differences with Calc GCD is that Calc GCD can handle negative numbers. If there is to be a difference between Calc and Excel I suppose this is a good one, and returning a negative number when the arguments have mixed signs is at least defined, even if not in maths textbooks. I'm not very convinced though. Note that Calc GCD(0,0) currently returns an error so this probably needs fixing too (Excel does manage a 0 here). Also note that GCD_ADD(A1;A2) where: A1="text", A2=2 returns 2; A1=2, A2="text" returns error; A1="text", A2=-2 returns error; A1="text", A2 blank returns 0; A1 blank, A2 blank returns 0; so there is some inconsistency that should really be fixed. I think all these should return errors as in Excel. [Calc GCD(A1;A2) works OK here] There seem to be different mathematical definitions of the GCD function. MathWorld http://mathworld.wolfram.com/GreatestCommonDivisor.html is usually reliable. But it talks about the Mathematica implementation that deals with non- integers. That might be useful to imitate? It seems to me that we could do with a maths specialist to help decide what is really wanted.
As for sign convention, I guess I don't have a big problem with the convention, "mixed signs returns negative GCD, same sign returns positive". It has a certain symmetry. To extend to an arbitrary number of arguments, use even/odd parity. I think this is what the current implementation does -- if so, the documentation needs to be clear. As for nonintegers, Mathworld defines it for rationals, which makes mathematical sense to me. Under that definition, GCD(1.2;2.4) = GCD(1.2;3.6) = 1.2. It may be that the implementor was trying for something along these lines, but didn't get all the way there. I worry about the difficulties (both for the implementor and potential users) of the definition on noninteger rationals, because rounding in the conversion between binary floats and decimal representations may cause unexpected results. Ditto for doing integer arithmetic algorithms on floats. We'd have to be very careful about defining and testing the behavior, but without a clear idea of who'd use it. This isn't a problem for symbolic math packages like Mathematica which know about rational numbers. Another source to check is the OpenFormula specification-in-development, available from http://www.oasis-open.org/committees/documents.php?wg_abbrev=office-formula . It appears to follow the OO documentation, so it will likely need updating as well.
Created attachment 41844 [details] Current GCD behaviour for non-integers
I think you are absolutely right about the difficulties with non-integers. The Mathematica manual uses fractions for its examples: GCD(1/2, 3/4) is 1/4 - and this is pretty sensible. But with floats we get very messy. For instance the open-formula draft says that Calc GCD(18.1;24.1) returns zero. It doesn't. It returns 0.0000000000001 or something similar. I've attached a file showing the odd results that Calc produces at present. I don't think it is seriously trying to evaluate rationals - it's more likely a byproduct of the method of calculating with integers. No user could/should have relied on this behaviour - so I'd suggest we would be perfectly safe to abandon any pretence and simply truncate any non-integer arguments as Excel does. We'd then be left with whether to calculate with negative integers and mixed pos/neg integers. Unfortunately this behaviour could have been relied on by users, so I think we are stuck with it. I'd be pleased if you disagreed... Gnumeric says its GCD function is the same as Excel, so it could be that Calc is then the only one marching in step. I'd suggest that negative numbers are truncated = rounded towards zero so that GCD(3.5;12) and GCD(-3.5;-12) both return 3. And if we clean up GCD_ADD so that it returns appropriate errors and mimics Excel by being Calc new GCD without the negative arguments will we be home and dry on this one?
Grabbing issue.
Of course a result of zero is never correct except maybe if both arguments are zero. Note however that the visible zeros of the testcases in fact are only rounded for the display value, despite the fact that GCD for non-integers isn't defined. Strictly spoken GCD(0;0) is undefined because of the infinite number of common divisors, however, it can be convenient to define GCD(0;0)=0 as Excel does. As for the handling of negative arguments I'm undecided. Mathematically the relation |a*b| = GCD(a,b) * LCM(a,b) is given, so a negative result would play well. Some sources define GCD to be the largest positive integer, but neither that nor the non-negative argument seem to be a mathematical restriction. That relationship btw possibly is the reason why the functions were implemented the way they are, working on non-integers and delivering the "strange" results; in the testcases, calculating the LCM of the same numbers and multiplying with the GCD produces the same amount as multiplying the numbers. However, the algorithm for finding the GCD for fractional numbers is not appropriate, as can easily be seen with GCD(1.2;3.6) GCD_ADD of course should strictly mimic the Excel behavior. Even though a GCD of rational numbers isn't defined, a LCM of rational numbers actually may make sense to compute the recurrence of a wave form. To comply with the relationship given above a GCD should also be able to handle it. So either enhance the GCD algorithm or change both to truncate fractional arguments to integers.
Thank you. LCM doesn't currently work with non-integers eg LCM(0.4;1.2) so you'd probably need to tackle that as well if you wanted to have GCD working with non-integers. The other issue about negative arguments is when they are mixed, eg -12 and +15. GCD and LCM return negative results - I guess as a matter of definition rather than mathematics? That still obeys |a*b| = GCD(a,b) * LCM(a,b) but the | | is then crucial. The formula does appear sometimes as a*b = GCD(a,b) * LCM(a,b) in which case it fails
Well, actually LCM(0.4;1.2) "sort of" works, it just doesn't give the result one would expect, which is because it uses the broken GCD method to determine the value. LCM(0.4;1.2) * GCD(0.4;1.2) gives 0.48 that is correct. As for negative results if one of the arguments is negative, a GCD always returns one of the possible factors. Even GCD(12;3) has two solutions, 4 and -4. Since the "greatest" semantically somehow implies to choose the positive value, and especially as some sources say to not regard factorization into negative numbers and some sources say the result is a positive integer (natural number), I tend to use that. For the |a*b|=... relationship, most sources omit the absolute value there, including the http://en.wikipedia.org/wiki/Greatest_common_divisor page. However, I think that's wrong. Several sources talk of integer values, not mentioning positive integers nor natural numbers. The German http://de.wikipedia.org/wiki/Gr%C3%B6%C3%9Fter_gemeinsamer_Teiler page states the relationship, but what finally convinced me was that the "Bronstein/Semendjajew" standard book of mathematics does so as well, and also mentions only integer numbers for arguments, nothing about positive values. As a conclusion we should allow negative arguments, truncate fractional values to integer (we might come up with a proper fractional algorithm later as an addition if someone feels challenged :) and return always positive numbers.
er: "actually LCM(0.4;1.2) "sort of" works, it just doesn't give the result one would expect" Enjoyed that, thank you :) "As a conclusion we should allow negative arguments, truncate fractional values to integer" OK "(we might come up with a proper fractional algorithm later as an addition if someone feels challenged :)" Doesn't seem likely does it ;) "and return always positive numbers." This is a change in behaviour. That's fine by me because the definition is sensible, but might it cause trouble for previous users? It's useful that the current behaviour is undocumented. I guess this is your decision to make. It would also be useful to have GCD(0;0) return zero by definition. Many thanks.
change target from 2.x to 3.x according to http://wiki.services.openoffice.org/wiki/Target_3x
ODFF relevant.
For the records: a word on negative arguments, forgot to update this issue back then: We discussed that in the OASIS ODFF subcommittee. Though mathematically valid, for interoperability with other spreadsheet applications we will not allow negative arguments, but generate an error in that case. See also http://lists.oasis-open.org/archives/office-formula/200701/msg00013.html and discussion thread.
Created attachment 50953 [details] A patch for i71158
Hi Lvhue, The patch looks good in general, but all hunks didn't apply. It looks like you created the diff against some already modified file where tab indents were replaced by spaces. Using the patch --ignore-whitespace option helped. Please submit patches only as diff against an original file revision as present in CVS. Thanks. Some detail: approxFloor() should be called before the argument value is checked for <0.0 because precision rounding errors may lead to very small negative values where approxFloor() would still result in 0.0 I didn't compile and test the modification yet in the application. Does it meet all needs as specified in the ODFF draft? Especially the "Given GCD(a;0), a is returned, so GCD(0;0) is 0.", also with more than two arguments? It looks like it does, just want to make sure. Thanks Eike
Created attachment 51067 [details] Another patch for i71158
Hi Eike, Thanks for your comments, I have modified the patch according to that. For the detail, calling approxFloor() after checking for <0.0 is just because I thought GCD(-0.0000000000000000000001)=0 looks a littlt strange. :) I have modified that. And GCD(a;0) will return a, GCD(0;0) returns 0. Please check this patch, thanks.
Created attachment 51301 [details] A patch based on patch 2, and enhanced the handling of negative.
In cws odff03: sc/source/core/tool/interpr5.cxx 1.30.20.1 Actually rewrote the current implementation (too different that the patch would apply) such that it matches more or less the changes of the second patch dated Jan-22, plus got rid of some clumsiness by replacing if(first){statement;while(next){statement}} with if(first){do{statement}while(next);}. Since GCD(0,a)=>a also the if(bFirst){fy=...}else{fx=...} was superfluous and fx=... applies always. See cvs diff if all this sounds gibberish to you ;-) The approxFloor() precision problem is solved with the changes from issue 86775.
Created attachment 53488 [details] GCD and LCM testcases with Excel result comparison
Reassigning to QA for verification.
@er Thank you!
Created attachment 54335 [details] TestCaseSpecification
verified in internal build cws_odff3
ok in OOo3.00RC1->closed