Issue 125636 - Choice of decimal point when exporting/saving .csv files
Summary: Choice of decimal point when exporting/saving .csv files
Status: REOPENED
Alias: None
Product: Calc
Classification: Application
Component: save-export (show other issues)
Version: 4.1.1
Hardware: All Windows 7
: P3 Normal (vote)
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-09-18 06:30 UTC by Lars Jødal
Modified: 2015-09-19 18:19 UTC (History)
3 users (show)

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Sample data file using comma as decimal point (35 bytes, application/vnd.ms-excel)
2014-09-18 06:30 UTC, Lars Jødal
no flags Details

Note You need to log in before you can comment on or make changes to this issue.
Description Lars Jødal 2014-09-18 06:30:06 UTC
Created attachment 83963 [details]
Sample data file using comma as decimal point

Saving a .csv file uses the language setting of the system, not the language setting the file was opened with. 

As an example is attaced a small .csv file using comma (,) as decimal point and tab as separator. It can be correctly opened/imported specifying e.g. Danish language setting. However, when the file is saved, the current systems language setting will be used. If the system language setting is e.g. English, it will be saved with period (.) as decimal point.

Or inversely: I use Danish language setting, but also work with files where period (.) is used as decimal point. If I edit them in Calc (choosing Language as English in the Open dialogue) and saves them as .txt or .csv file, I afterwards have to re-open in Notepad to change all "," to ".".

One way to fix this could be to add "Language" setting as a choice in the Save dialogue for .csv and .txt files (like in the Open dialogue for .csv). Preferentially, the default setting of the language setting should be the same as was used for opening the file.

This issue on exporting .csv files can be seen as a mirror of issue ID 49978 about IMPORTING .csv files.

--Lars J
Comment 1 Lars Jødal 2015-09-18 11:31:34 UTC
Since my original posting I have found a work-around: If I change the formatting language for the cells, then saving as CSV will save with the decimal point of that formatting. But I have to do this manually - Calc does remember which languange I opened the file as.

This could easily be implemented as an enhancement.

Enhancement proposal: When importing/opening Text CSV files, the "language" setting in the cell formatting should be the language specified during import. (If no language is specified, this defaults to the default language.)

Consequences: When opening a Text CSV file, the user is already presented with the Text import dialog box, which gives the possibility of specifying a language. Users not changing the default language will not experience any difference.
Users deliberately chaning the language from the default setting will both SEE and SAVE the data in the same format.

--Lars J
Comment 2 mroe 2015-09-18 13:05:24 UTC
Have a look at issue 124654.

*** This issue has been marked as a duplicate of issue 124654 ***
Comment 3 Andreas Säger 2015-09-18 14:17:56 UTC
(In reply to mroe from comment #2)
> Have a look at issue 124654.
> 
> *** This issue has been marked as a duplicate of issue 124654 ***

This feature request has nothing to do with issue 124654 which was about the global locale and the number format locale to override the global one. IMHO, this request results from popular misconceptions about csv. The text import locale serves another purpose and it can not be overridden. It provides an interpretation schema for incoming text.

1. CSV is not a spreadsheet format. It is completely unrelated to spreadsheets even though many millions of users never load csv into any type of application other than spreadsheets. Valid csv is entirely about database exchange (record sets with equal number of fields of distinct types). 
2. CSV must not make _any_ assumptions about formatting attributes. There are no number formats, date formats, text colors, row borders, fonts, font sizes or anything like that in a plain text file. In an ideal world all csv date/times would be ISO-date/times and all numbers would be decimals with digits, a minus sign for negative numbers and one decimal point where needed, but with no thousands-separators, currency symbols nor decimal separators other than point.

Calc interpretes the incoming text snippets in the given context of the import locale set in the import dialog and then dumps the resulting numbers (and text) into a brand new unformatted spreadsheet which is not even derived from your default template.
The only "formatting" that takes place is the formatting of number format "General" which displays all special numbers (dates, times, percent, currencies, booleans) in one particular number format, otherwise dates would be displayed as integers (what they actually are once they landed in a sheet). Number format "General" provides one number format for each subtype.

I think, it is perfectly OK to not interprete any formatting attributes from incoming text. This automagic formatting would be extremely difficult to implement for many thousands of possible number formats. Furthermore, the uniform number format indicates clearly that the incoming text has been interpreted correctly.

Formatting imported text data:
1) Use a template and then menu:Insert>Sheet_form_File. Then you can apply the template's cell styles within seconds.
2) Open the csv, copy all and paste-special data into a second, pre-formatted spreadsheet.
3) Use Base connected to a csv-directory and use pre-formatted import ranges (F4, drag&drop)
4) Some macro driven solution performing 1), 2) or 3) automatically.

[Example] Loading CSV into preformatted spreadsheets 
https://forum.openoffice.org/en/forum/viewtopic.php?f=100&t=23727
[Tutorial] Using registered datasources in Calc:
https://forum.openoffice.org/en/forum/viewtopic.php?f=75&t=18511
Comment 4 orcmid 2015-09-18 14:54:11 UTC
I'm sorry, it is not possible to save the locale in a CSV file.  This enhancement request is to reuse the locale specified on open (if different) for saving. That is, it is preservation from open to save, not save to open.

It is correct that this is an enhancement request.

I concur with comment 3 and have re-opened the issue.
Comment 5 mroe 2015-09-18 15:19:11 UTC
(In reply to Andreas Säger from comment #3)
> This feature request has nothing to do with issue 124654 which was about the
> global locale and the number format locale to override the global one. IMHO,
> this request results from popular misconceptions about csv. The text import
> locale serves another purpose and it can not be overridden. It provides an
> interpretation schema for incoming text.

If 124654 would be solved then the selected locale at csv import could be saved in the resulting calc sheet. This was my intention. What do I misunderstand?
Comment 6 Andreas Säger 2015-09-18 18:11:55 UTC
You can save the locale within the document. Change the number format locale of the default cell style. This affects all other cell styles where the locale is not set otherwise.
Comment 7 Lars Jødal 2015-09-19 07:10:28 UTC
Agreed, CSV is not a spreadsheet format and formatting cannot be saved in a CSV file. CSV is a simple-but-not-too-well-defined data format. And any application (such as Calc) interpreting the contents as more than simply a text file, must make assumptions about the interpretation.

When I open a CSV file, I am given the chance to state some of these assumptions:
- is the delimitor character the standard comma (,) or another character?
- which character set should be assumed?
- which language settings should be assumed?

My choices in the loading process influences how the contents of the file is interpreted by Calc. Most specifically, the language setting will influence how decimal numbers and dates are read. For instance:
- Is 1,23 a valid number? With language setting Danish it is, with language settings English it is not.
- Does 01-03-2015 means January 3rd or March 1st?

(I expect we all agree this far)

Now, what I have suggested is for Calc to format the cells with the chosen language setting WITHIN CALC. If the user saves in CSV format, the format is not saved, but the FORMATTING used in writing the data does depend on the language settings of the cells. Calc's internal representation of the number 1.23 should not concern me as a user, but it does concern me if it is written as 1.23 or 1,23 when I choose to write the data as a CSV file. I can make this choice by formatting the cell in Calc with a suitable language settign.

As it is, I can choose "English" when loading the CSV file into Calc. But my choice of "English" is forgotten after the loading, so if I save the file then my standard language format is assumed. I am suggesting Calc to remember my choice by formatting the cells with the language from loading. In that way, data will be written with the assumptions from the language chosen at loading, enhancing the probability that the format is correct. Put another way: Enhancing the probability that opening a CSV file and then writing the same data to a CSV file will produce identical files.

Yes, writing in CSV format is risky unless you know what you are doing (and even then). But I am not aware of any cases where the suggested enhancement could make it LESS likely that the files were identical. If the user does not make any choice of language when loading the file, then it will be loaded with the user's standard language, and the cells will be formatted with that language.

PS: Thanks for the links on reading CSV files.

--Lars J
Comment 8 orcmid 2015-09-19 14:45:15 UTC
I need clarification here.

The title of this issue is Choice of Decimal Point when exporting/saving .csv files.  It is not about saving .ods or .xls or other formats.  saving .csv.

My sense of the initial explanation is that this mattered for round-tripping .csv in and then out of OpenOffice.  

Is this still the essential case?

I am concerned that too much is being piggy-backed onto what is a fairly straightforward issue, however it is handled.
Comment 9 orcmid 2015-09-19 14:48:14 UTC
(In reply to orcmid from comment #4)
> I'm sorry, it is not possible to save the locale in a CSV file.  This
> enhancement request is to reuse the locale specified on open (if different)
> for saving. That is, it is preservation from open to save, not save to open.
> I concur with comment 3 and have re-opened the issue.

I think the original description and comment 1 are very clear.  Can we leave that as the essence of the enhancement request?
Comment 10 orcmid 2015-09-19 14:51:22 UTC
(In reply to orcmid from comment #9)

> I think the original description and comment 1 are very clear.  Can we leave
> that as the essence of the enhancement request?

Also, the latest comment from Lars, #7 seems to reaffirm the case.
Comment 11 Lars Jødal 2015-09-19 18:19:51 UTC
(In reply to orcmid from comment #8)
> I need clarification here.
> 
> The title of this issue is Choice of Decimal Point when exporting/saving
> .csv files.  It is not about saving .ods or .xls or other formats.  saving
> .csv.
> 
> My sense of the initial explanation is that this mattered for round-tripping
> .csv in and then out of OpenOffice.  
> 
> Is this still the essential case?

Talking for myself: Yes, the essential case is for round-tripping .csv in and then out of OpenOffice, and retaining the decimal point of the original.

Ideally, the read and written .csv file should be identical if no chances are made. This will be very hard to guarantee, but my suggestion for enhancement would (as I see it) retain the decimal point of the original AND have the positive side effect that dates are would also be more likely to retain their format, since they would be written with the same language assumptions as they were read with.

--Lars J