Apache OpenOffice (AOO) Bugzilla – Issue 32344
Simple wildcards * and ? as opposed to regular expressions
Last modified: 2015-03-28 01:42:58 UTC
Follow-up task of Q-PCD issue 20494. Some competitors know only simple wildcards * and ? as opposed to regular expressions. Need option how to treat wildcard searches, especially when importing files.
Accepted.
reassign to myself
Should this not apply to the other major components (Writer, etc.) as well? Unless you are an expert user, wildcards are a life-saver.
Watersrw, This is not about the "Search & Replace" functionality, but the wildcards that some spreadsheet functions may take as arguments, producing different results when executed in different applications. Eike
This issue has been proposed as a SO9 requirement. Please clarify what kinds of functions and parameters should be able to use wildcards.
The following spreadsheet functions are currently able to process regular expressions in the arguments that specify search criteria for data ranges: DSUM DCOUNT DCOUNTA DAVERAGE DGET DMAX DMIN DPRODUCT DSTDEV DSTDEVP DVAR DVARP MATCH COUNTIF SUMIF LOOKUP VLOOKUP HLOOKUP SEARCH Similar to the "Enable regular expressions in formulas" option under Tools.Options.Spreadsheet.Calculate, another option would be needed like "Enable simple wildcards in formulas". These two are mutually exclusive => radio button "None / RegExp / Wildcard", where Wildcard would be enabled by the Excel filter import for interoperable formula results.
*** Issue 46706 has been marked as a duplicate of this issue. ***
*** Issue 48020 has been marked as a duplicate of this issue. ***
The problem is Excel formulas that do not give the correct result in Open Office. When I open an Excel spreadsheet in Open Office, I need to either get the correct results or to get a message warning that some cells contain formulas that cannot be handled correctly. Open Office does neither. As an Excel user, I don't want to know about regular expressions - I just want Open Office to give the same answers as Excel. I also don't want to accidentally create Open Office formulas that will give the wrong results in Excel. If there is a radio button in Tools/Options to change the formula-handling method, I would expect that to be automatically set correctly to match the way that Excel works, whenever an Excel file is opened. Alternatively, when the troublesome formulas are detected, a warning box should open when the file is opened, preferably giving a quick way to make the necessary changes.
*** Issue 51072 has been marked as a duplicate of this issue. ***
Instead of disabling regular expressions in the options when opening an excel document that uses wildcard characters or adding a wildcard option to the options, wouldn't it be easier to actually convert the ? and * wildcard characters to regular expressions on import and then convert the regular expressions back to wildcards when exporting back to excel formats, giving a warning for any regular expressions that cannot be converted exactly to wildcards.
I agree with drensink. Automatic translation is what the user needs and expects. Changing options that affect all workbooks, will cause more problems than it solves. If the user sees a warning that a regular expression could not be translated to Excel format, they will learn that Open Office is more powerful than Excel. Currently the inadequate translation gives the opposite impression, if the user has never heard of 'regular expressions'.
Drensink, Navvy, Please note that "automatic translation" is not feasible. It might work in simple cases of literal string arguments as parameters during import, but as soon as the search expression is concatenated from substrings or obtained via formula functions it would fail. Furthermore it wouldn't work with round-trip documents, i.e. load from Excel and save to Excel file format again. The only viable way is another wildcard mode, which would also ease use for the not so experienced user for whom regular expressions are too complicated. Eike
Eike, I did not quite understand your comment at first, because the strings I was using for the DSUM criteria were a string (substring) in a cell (reference to cell being a formula). I now see that you could not just go replacing them all because there is no way to know if they are just an * or ? or actual wildcards short of scanning the whole sheet to see if each is an input into a formula that handles wildcards which would not be so good logic and performance wise. I then wondered what would happen with a spreadsheet using wildcards was imported, but the user wanted to use regular expressions on that same sheet and still have the wildcards calculated. The obvious hangup with that is the ? and * do different things depending which option you want. A good answer to that would be for the functions like DSUM to have two implementations, one that calculates with regular expressions and one that calculates with wildcards. This would create yet another problem since DSUM used in calc is not the same as DSUM in excel, but the same formula name has been used by both excel and calc users, though this might be easy to take care of on import and export by translating DSUM in excel to something like DSUMWILDCARD in calc and something similar on export. This all assumes someone would actually want to use both wildcards and regular expressions on the same sheet, but that might be a bit crazy and I understand implementing the option you propose is probably far quicker, especially if you could have it coded in 2.0 or 2.0.1 instead of later versions.
I gave my last comment more thought and realized you can maintain the same behavior openoffice uses now with regular expressions, make the excel imports and exports compatible, and mix the use of both regular expressions and wildcards on the same sheet and in any cells by doing the following. 1. Provide an optional argument to any formula function that will take regular expressions in openoffice, but takes wildcards in excel. 2. If the argument is set to true, assume the function arguments use ? and * as wildcards. 3. If the argument is false or not set, assume the function arguments may contain regular expressions. 4. For documents imported from excel, add the argument set to true to the cell containing the formula. 5. For documents in openoffice assume the argument is not set by default. 6. For documents in openoffice to be exported to excel the user can set the argument to use wildcards. 7. If the user doesnt set the argument, warn that excel does not fully support the use of regular expressions. This would allow the use of the same formula names for both cases and should be fairly easy to code into the existing regular expression code, since it then becomes a simple replacement of ? and * in the arguments fed to the functions with the equivalent regular expression.
Hi Drensink, > 6. For documents in openoffice to be exported to excel the user can set the > argument to use wildcards. What should be exported if that argument is the result of a formula? > This would allow the use of the same formula names for both cases and should be > fairly easy to code into the existing regular expression code, since it then > becomes a simple replacement of ? and * in the arguments fed to the functions > with the equivalent regular expression. I'd rather implement a simple wildcard search than fiddling around with the argument's string content. Bear in mind that all other content that possibly could form a regular expression would have to be escaped otherwise. Eike
This gets a bit more complicated than it seems at face value, but I will continue a bit more to see if there is an optimal solution to the problem. > What should be exported if that argument is the result of a formula? Keep in mind this would not be the argument with the wildcards/regular expressions in it, but an additional, optional argument whose value would be restricted to 0 or 1, True or False. If it is set to 1, then the formula uses wildcards and you export the other argument to the function with the wildcards in it as it is written by the author of the document and strip the 1 during export since Excel does not use it because by default it uses wildcards. If it is not set or set to 0, then you know you are exporting a document that is very likely not to be compatible with Excel, but could warn and continue exporting anyway. If the additional, optional argument is a result of a formula, which should not be necessary, then that formula should result in True or False, 0 or 1 otherwise it would be in error. That still would not matter because you would just check to see if it results in 0 and 1 and strip the whole formula off just like you would strip the 1 off. Though it would be easier to just restrict that argument to only the values 0,1,True,False and anything else would result in a formula error. > I'd rather implement a simple wildcard search than fiddling around with >the argument's string content. Bear in mind that all other content that >possibly could form a regular expression would have to be escaped >otherwise. The additional, optional argument would just determine which code path is taken. If you want to do a whole new implementation of the formula using wildcards instead of messing with the string content of the argument containing wildcards/regular expressions, the same principle would still work. The biggest problem I see with the supposedly simple solution, just turn off regular expressions and turn on wildcards is that it immediately eliminates the use of more powerful regular expressions in that document, preventing someone from importing a document containing wildcards from excel for the purpose of using regular expressions. The continued incompatibility on export of formulas with the same names using regular expressions would also still have to be dealt with in the simple case. The basis of my approach is almost exactly the same as doing it for the whole document or sheet, with the exception that it is done for each cell where those formulas are used, rather than the whole document at once. You still have to decide whether its a regular expression or wildcard and fix the code for each formula to handle it if you do it for the whole document.
One more note: It would be an irritatant to a user to have to add the ;1 constantly to a formula. The option for the document could be added anyway so when it is set to wildcards the ;1 is automatically added upon finishing the formula input.
Hi Drensink, > > What should be exported if that argument is the result of a formula? > Keep in mind this would not be the argument with the wildcards/regular > expressions in it, but an additional, optional argument whose value would be > restricted to 0 or 1, True or False. You didn't get my point. The argument doesn't have to be a direct value, it could be a formula result instead, e.g. FALSE(), or a cell reference where the user could play with the value, or any sophisticated formula, even with side effects. In these cases stripping away the parameter simply can't be done. Eike
> You didn't get my point. The argument doesn't have to be a direct value I think I did get your point and thought I had explained that in the statement below to be interpreted as (The argument DOES have to be a direct value) "Though it would be easier to just restrict that argument to only the values 0,1,True,False and anything else would result in a formula error and if it was not that would be in error." I can understand that you might want to or have to generalize from this specific case because the next complaint would be that a formula result could not be used. I can also understand that there may be no existing code in the current formula handling to force an argument to be only a direct value. Since it seems all of the above is not a realistic option, would you be willing to agree a reasonable option would be to make the setting applicable to each sheet in a workbook, rather than to the entire workbook? This would allow an import from Excel using wildcards and then adding a new sheet to the workbook that uses regular expressions within the same workbook. The user could reference cells on those sheets within the same workbook that are results of regular expressions. Thankyou for being patient in explaining your objections and the reasons for those objections in detail.
Just an overdue update: the OASIS ODF TC approved the proposal to include a table:use-wildcards attribute in the file format, see "Proposal for a new calculation setting: table:use-wildcards" under http://wiki.oasis-open.org/office/OpenDocument_v1%2e2_Action_Items#head-4854119a1f3d7df4f22fc453bb50e8c6e5b0acd7
Text comparison in Calc occurs in 1. Formulas by operator =A1=B1 2. Functions like MATCH etc. 3. Filters 4. Find/Replace It is influenced by following options: - Calc>Calculation>Case sensitive (affects 1. only) - Calc>Calculation>Search criteria = and <> must apply to whole cells (affects 2. and 3.) - Enable regular expressions in formulas (affects 2. only, but not 1.) Better labeled "Enable regular expressions in function arguments"? Filters, SortDescriptors and Find/Replace have their own options for regex and case. There are too many options and implications. The behaviour is inconsistent. For instance: Did anybody notice that case-insensitive cell values obsoletes many benefits of regular expressions? My radical proposal: One option to toggle between native mode and compatibility mode. Native mode: All strings in Calc are case sensitive and support regular expressions, "=" and "<>" do not apply to whole cells unless you use them with "^" and "$". Compatibility mode: Excel files are loaded in compatibility mode: All strings are case-insensitive except for EXACT and FIND. They support wildcards. "=" and "<>" apply to whole cells.
*** Issue 87106 has been marked as a duplicate of this issue. ***
I was working with an xls file and discovered this problem. From a users perspective, wouldn't it be simpler to simply have a message stating that OOo is trying to open a file that contains regular expressions/wild cards and asks whether the user wants to enable the regular expression option? i'm not sure about more complex formulas but it seems that there is no problem with OOo understanding and translating into excel's way of handling things, so long as the option is turned on. i would think therefore that simply letting the user know that the option has to be turned on because there are regular expressions in the formula would be good enough, so that the user doesn't get shocked to see all zeros in the cells as I did. I mean if there isn't any technical problem as regards OOo translating and understanding the way other spreadsheet applications handle formulas (OOo can open and save either way), then there's really no need to replace and auto translate formulas. if all the user needs to do to work with the spreadsheet (whichever format) and to save it (in whichever format as well) is to turn the option on or off, then so be it. a simple prompt like: "The spreadsheet file you are trying to open may contain regular expression. Do you want to turn these feature on?" or something...
*** Issue 96557 has been marked as a duplicate of this issue. ***
*** Issue 102843 has been marked as a duplicate of this issue. ***
*** Issue 101712 has been marked as a duplicate of this issue. ***
*** Issue 113243 has been marked as a duplicate of this issue. ***
*** Issue 121015 has been marked as a duplicate of this issue. ***
*** Issue 121018 has been marked as a duplicate of this issue. ***
Related section at ODF 1.2: http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#__RefHeading__1017868_715980110 Introduced on 06/May/10: https://tools.oasis-open.org/issues/browse/OFFICE-2679?page=com.atlassian.jira.plugin.system.issuetabpanels%3Achangehistory-tabpanel Related AOO.org discussion: http://mail-archives.apache.org/mod_mbox/incubator-ooo-dev/201207.mbox/%3C73f2681577a4ff873095d71260bf3d27@xaox.net%3E Related LO bugreport: http://en.libreofficeforum.org/node/6950 What's needed? AOO and LO need to implement support for HOST-USE-WILDCARDS and on XLS/XLSX import need to set it to true.
Related bugreport in LibreOffice bugzilla: https://www.libreoffice.org/bugzilla/show_bug.cgi?id=72196
erack no longer active here, so back to list.
*** Issue 124567 has been marked as a duplicate of this issue. ***