Apache OpenOffice (AOO) Bugzilla – Issue 79924
Strong Typing in Formulas
Last modified: 2013-08-07 15:12:27 UTC
*This is a killer feature*! = *INTRO* = Every decent programing language implements a typing mechanism of variables to reduce the rate of errors. One advantage of C++ over C was exactly the stronger typing mechanism (in addition to namespaces; classes and templates contributed strongly to the more robust typing). However, spreadsheets have failed to introduce typing, although a great number of errors could be prevented IF such a mechanism was built in. I consider this a *major design flaw* of existing spreadsheet applications, and one that makes it in my personal *TOP 5* of fundamental design flaws. I estimate (blindly) that almost one third of significant spreadsheet errors could be prevented by such a mechanism. Please note, that I have a vast experience in working with spreadsheets, overseeing 100+ employees that work extensively with spreadsheets. This issue has been partially discussed on the OASIS mailing list (see http://lists.oasis-open.org/archives/office-comment/200706/msg00005.html and the ensuing discussion). Though the aforementioned thread discussed units on a more broader scale, I wish to focus here entirely on currency. = *CURRENCY* = Calc should implement a strong typing system. In a first phase, I advocate implementing a general typing system for currency (a generic currency type), later extending it to further units (probably 'time-units' being next and followed by other units). = *OPERATIONS* = Valid operations with this unit: 1. (currency) + (currency) => (currency) 2. (currency) * scalar => (currency) 3. (currency) / scalar => (currency) 4. (currency) / (currency) => scalar Invalid operations: 1. (currency) + scalar !!! (many errors) 2. (currency) * (currency) (see statistics comment; maybe allowing this operation, BUT IF the cell formatting has UNIT<currency>, raise an ERROR) 3. (currency)^(x), where x != 1 [statistics should be possible to compute with currencies, although that would involve terms of form (currency)^2; this should be solved somehow] = *FURTHER DEVELOPMENT* = Various subtypes of currency should be possible to create in later versions of the implementation, e.g. 1. Euro as type <currency> 2. US$ as type <currency> 3. various other currencies (custom currencies) Still later, there should exist the possibility to define conversions between these currencies subtypes. There would be 2 types of conversions: 1. fixed: - defined once and stored with the document, e.g. 1 Euro = US$ 1.33 - every conversion between Euro and US$ would be accomplished using this explicit conversion factor 2. dynamic: - allow the conversion factor to be retrieved dynamically from a web-site, server, some other place, ... - conversions are done dynamically, either at: -- opening the document -- some pre-specified time point -- x-times per day -- some other condition [This would be especially interesting for international enterprises and the financial market.] = *ADVANTAGES* = - many spreadsheet ERRORS would be timely prevented (many at design time) - significant financial losses averted - a clear advantage over competing products - would set the standard in the field, not merely copying MS Excel
I add here some other useful comments from the OASIS mailing list: 1. http://lists.oasis-open.org/archives/office-comment/200706/msg00028.html 2. http://lists.oasis-open.org/archives/office-comment/200706/msg00030.html 3. http://lists.oasis-open.org/archives/office-comment/200706/msg00032.html 4. http://lists.oasis-open.org/archives/office-comment/200706/msg00033.html 5. http://lists.oasis-open.org/archives/office-comment/200706/msg00036.html Thats a brief discussion of problems and possible solutions. I hope that the developers recognise the real potential of data typing and - ultimately - decide to implement this feature.
I found some more interesting data on the OOo web site: Issue 58309 (http://qa.openoffice.org/issues/show_bug.cgi?id=58309) is a similar request for units of length (basically for feet-type units). Also, there was a similar request on the OOo forums some years ago, see http://www.oooforum.org/forum/viewtopic.phtml?t=14674 for a detailed discussion.
Hi, IMHO this would confuse most of the users especially those who came from Excel. But let's requirements decide. Frank
discoleo->fst Please have a look at the following article: http://portal.acm.org/citation.cfm?id=1140346 To quote from the abstract: Since the error rate in spreadsheets is very high and since those errors have significant impact, methods and tools that can help detect and remove errors from spreadsheets are very much needed. Type systems have traditionally played a strong role in detecting errors in programming languages, and it is therefore reasonable to ask whether type systems could not be helpful in improving the current situation of spreadsheet programming. No, it won't confuse users, it will really help them to avoid errors. A lot of errors. The article I posted is from 2006. Indeed, there is recently much work in this area.
discoleo->fst discoleo->all Please apologise posting again. I noticed that the article posted previously is not available to non-registered users (it must be purchased). Here is a free link to that article: http://web.engr.oregonstate.edu/~erwig/papers/TypeInf_PPDP06.pdf. It contains a lot of useful information. Please read it as it is real research on spreadsheets.
Do we really need the formula to be blocked from working at all? What about presenting a dialogue box at the time the formula is entered (e.g. "This formula may contain an error as it multiplies the contents of two cells which are formatted as currency. Edit formula / Ignore warning / Help"), or putting on an error warning note in the same way that Excel does for e.g. inconsistent formulas? There is a flaw in saying that e.g. currency * currency is always invalid. currency * currency / currency ought to be valid (it could be used e.g. in various rescaling calculations) so therefore by extension the (yucky!) (currency * currency) / currency ought to be as well. Moreover currency * currency / 1.48 ought to be valid if the 1.48 really refers to a currency. That would be bad spreadsheet design (the 1.48 ought to entered into a cell formatted as currency and then referenced in the formula) but for example students who are just starting to learn how to use Spreadsheets (I teach a lot of them!) often progress gradually from entering formulas without references (e.g. "=2.12*3.57/1.48") to formulas that consist entirely of references, via formulas that contain partly numbers and partly cell references. Using data typing (especially of currencies) to IDENTIFY potential errors is a fantastic idea!! But while it should warn users of the strong possibility of an error I do not believe the formula should be invalidated.
> currency * currency / currency ought to be valid (it could be used > e.g. in various rescaling calculations) so therefore by extension the > (yucky!) (currency * currency) / currency ought to be as well While "currency * currency / currency" is valid as it returns currency, this does NOT justify "currency * currency". During my years of Auditing / Quality Control involving spreadsheets I found way too many errors to be comfortable with such formulas. As said, IF the whole formula was written within one cell, NO error would be raised. Strong typing would still found that the result is "currency", so NO error. [Although I do not encourage such usage.] IF however, one cell contains "currency * currency", then this is at least bad design. IF the student needs a conversion factor (please note, I do teach spreadsheets, too), then the correct way is to define it somewhere like: = currency / currency which is a scalar [ I may add, that this definition should proceed in a special area of the spreadsheet, for which I posted a new feature request: http://www.openoffice.org/issues/show_bug.cgi?id=67499 ] Multiplying with this scalar is OK and never would raise an error. Also, I advocate introducing beyond the generic currency, more specific currencies and handle conversions on a higher - more structure - level. [see the FURTHER DEVELOPMENT in my first post] I was also NOT very explicit about raising errors for valid operations, even IF the result is a bogus unit: > 2. (currency) * (currency) (see statistics comment; maybe allowing > this operation, BUT IF the cell formatting has UNIT<currency>, > raise an ERROR) However, in the meantime, I am even less convinced that "currency * currency" should be accepted without raising some error/warning. The only argument against is the variance, which has a valid unit of "currency * currency" in this example.