Apache OpenOffice (AOO) Bugzilla – Issue 76156
Counting number of substrings within a string
Last modified: 2017-05-20 11:13:50 UTC
Unfortunately, Calc lacks even basic support for string functions (see my other issues on strings, like issue 66590). This makes the program unsuitable for any work that relies heavily on strings. One of the functionalities that I am missing is a function that counts the number of occurrences of a substring within a given string. Lets say, we have the string: "ab cd ab de", and we want to count how often "ab" occurs inside this string. I have included in the attached file the C++ code for a new function, ScInterpreter::ScFindCount(), that performs this task. This function counts the number of occurences of a given string within another string // - IF string is NOT found, it returns 0 // - otherwise it returns the number of occurences of the substring // - parameters: // 1. string: this is the search item // 2. string: this will be searched for the first string // 3. is overalpping of search results allowed: // - default: FALSE (NO) // 4. START position: default = begining of string (=1) // 5. END position: default = LEN(string) I noticed now, that the function will probably NOT recognise wildcards, so I will look forward to correct that. I hope that such a function will make it in the OpenFormula specification, too.
Created attachment 44257 [details] Proposed C++ implementation
1. I just wanted to add a real life example. My primary work involves auditing the whole medical activity taking place in a big county. One frequent task is to count the number of different diagnoses for individual patients. The string looks like this: "B18.1; G20; I20.9; I12.9". These tokens represent individual ICD10-coded diagnoses. So, I would like to count how many diagnoses are there (i.e. FindCount(";",'the_string') + 1). 2. I found the code for *REGEXP* searches inside the ScInterpreter::ScSearch() function. Unfortunately, I do NOT understand much of it and I fear that there is NO good documentation on it. I will try to implement a ScSearchCount() function later, just by cloning the ScSearch() function.
Hi Niklas, please have a look at this patch. Frank
We generally don't want to add many new internal functions. Additional functions can preferably be added as add-in components. Also, issue type "patch" should be limited to complete patches, ready to apply.
If A1 contains "ab cd ab de" and B1 contains "ab", we can use =(LEN(A1)-LEN(SUBSTITUTE(A1;A2;"")))/LEN(A2) which I realize is messy but it gets the job done.
Oops, I meant: If A1 contains "ab cd ab de" and A2 contains "ab" …
Reset assigne to the default "issues@openoffice.apache.org".