Apache OpenOffice (AOO) Bugzilla – Issue 95900
UTF-8 is not in the selectable character encoding list when importing/exporting DIF format
Last modified: 2017-05-20 11:35:23 UTC
1) Open a DIF-format spreadsheet encoded in UTF-8. 2) oocalc prompts for a character encoding, but UTF-8 is missing from the list of encodings. Expected: 2) the user can choose to import in UTF-8 encoding. The same problem occurs when saving in DIF format. Because of this flaw, it is not possible to correctly open a DIF file saved from Gnumeric (which by default saves in the user's locale, often UTF-8 on Linux). If the user wishes to avoid the Excel format, then DIF is the only format that allows users of Gnumeric and oocalc to exchange spreadsheets with data type settings (digit or text). Thus it makes sense to correct this bug.
> If the user wishes to avoid the Excel format, then DIF is the only format that allows
> users of Gnumeric and oocalc to exchange spreadsheets with data type settings
> (digit or text).
Partly because ODS support in Gnumeric is still experimental. They should enhance ODS support, but it also makes sense to make OOo stronger in import/export.
I would think this is an issue with Gnumeric. AFAIK, DIF uses ASCII for encoding, hence OOo is correct in suppressing the UTF-8 option for both import and export. Hi Oliver, I'm trying to push some issues submitted by the Beijing (non-RF2000!) OOo community. Would you be so kind as to comment on my assumption and set the resolution accordingly? (Maybe ask Eike?!) Greetings from Beijing, Peter
Hi, thanks for your comment. The issue started from a practical (not "in theory xxx should") requirement: we ourselves use Linux for all office work, and some people choose Gnumeric for its GNOME integration and lightness while others choose oocalc, so we find we have to exchange spreadsheets in xls format, which we would prefer to stay away from. The requirements for most spreadsheets are not high; correct rows/columns and data types would be enough. So I thought of DIF, which then failed because of the Chinese ideographs the files contain. Nowadays it is difficult to tell whether something is ASCII or not, thanks to the many extensions to ASCII. The only real difference is between multi-byte and single-byte charsets. Quoted from Wikipedia: "DIF stores everything in an ASCII text file to mitigate many cross-platform issues back in the days of its creation. However modern spreadsheet software, e.g. OpenOffice.org Calc and Gnumeric, offer more character encoding to export/import."
Confirming. This is an artificial limitation. What makes UTF-8 so useful is that it is a byte-oriented, ASCII-compatible encoding that works anywhere 8-bit-clean text does. For the file format it makes no difference whether UTF-8 or ASCII is stored (when only characters from the ASCII range are used, the UTF-8 bytes are even identical to ASCII). If the filter can handle Windows codepages, Latin-N, etc., then it can also handle UTF-8. There is no technical reason for not supporting UTF-8.
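To illustrate the point about ASCII compatibility, here is a minimal Python sketch. The header string and the sample cell text are illustrative values chosen for this example, not taken from any file attached to this issue, and the helper name `utf8_is_ascii_transparent` is hypothetical.

```python
# Sketch: UTF-8 leaves ASCII bytes unchanged and encodes every
# non-ASCII character using only bytes >= 0x80, so structural
# characters such as quotes, commas and line breaks stay unambiguous.

# Illustrative start of a DIF header (ASCII only).
DIF_HEADER = 'TABLE\r\n0,1\r\n"EXCEL"\r\n'

# Illustrative quoted string cell containing Chinese ideographs.
CELL_TEXT = '"北京"'


def utf8_is_ascii_transparent(text: str) -> bool:
    """Return True if ASCII characters keep their ASCII byte and all
    other characters encode to bytes in the range 0x80..0xFF only."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if ord(ch) < 0x80:
            if encoded != ch.encode("ascii"):
                return False
        elif any(b < 0x80 for b in encoded):
            return False
    return True


if __name__ == "__main__":
    # ASCII-only content is byte-for-byte identical in both encodings.
    print(DIF_HEADER.encode("ascii") == DIF_HEADER.encode("utf-8"))
    # Non-ASCII content never produces bytes that look like ASCII.
    print(utf8_is_ascii_transparent(CELL_TEXT))
```

A reader that splits the file on ASCII delimiters would therefore see the same structure whether the cell text is stored as ASCII, a single-byte codepage, or UTF-8.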
Furthermore, http://wiki.services.openoffice.org/wiki/Documentation/DevGuide/Spreadsheets/Filter_Options lists UTF-8 in the section "Filter Options for Lotus, dBase and DIF Filters":
"These filters accept a string containing the numerical index of the used character set for single-byte characters, that is, 0 for the system character set. [...] Unicode (UTF-8) 76 [...]"
So apparently it is already possible to load/save DIF with UTF-8 via the API, just not via the UI.
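As a rough sketch of what the API route might look like, the Python snippet below only assembles the load arguments as plain name/value pairs. The index 76 comes from the documentation quoted above; the filter name "DIF" and the helper `build_dif_load_args` are assumptions made for illustration, and the snippet does not talk to a running office instance.

```python
# Sketch: assemble the arguments that would select UTF-8 for the DIF filter.
# Assumptions: the filter is registered under the name "DIF", and the
# FilterOptions string is just the numerical character set index.

UTF8_CHARSET_INDEX = 76  # "Unicode (UTF-8)" per the quoted Filter Options page


def build_dif_load_args(charset_index: int = UTF8_CHARSET_INDEX):
    """Return (name, value) pairs for a DIF import with the given charset."""
    return (
        ("FilterName", "DIF"),
        ("FilterOptions", str(charset_index)),
    )


if __name__ == "__main__":
    for name, value in build_dif_load_args():
        print(f"{name} = {value}")

# In a UNO script, each pair would typically become a
# com.sun.star.beans.PropertyValue and the sequence would be passed as the
# Arguments parameter of loadComponentFromURL (or storeToURL for export).
```

Keeping the index as a named constant makes it obvious which value the UI list is missing.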
Hi Eike, please have a look.
Reset assignee to the default "issues@openoffice.apache.org".