Policies‎ > ‎

CSV syntax

Overview

The CSV ("Comma Separated Value") file format is often used to exchange data between disparate applications. The file format, as it is used in Microsoft Excel, has become a pseudo standard throughout the industry, even among non-Microsoft platforms.
CSV files have become somewhat of a legacy format. 
CSV it is not a formal standard, there are a lot of CSV "flavors". This document describes the syntax recognized by LinkedData.Center.

Character encoding

Only UTF-8 supported.

The CSV File Format

Each record is one line

A record separator MUST consist of a line feed (ASCII/LF=0x0A). Note that fields may contain embedded line-breaks (see below) so a record may span more than one line.

Fields MUST be  separated with commas.

Example John,Doe,120 any st.,"Anytown, WW",08123
Leading and trailing space characters adjacent to comma field separators are ignored.
So John , Doe ,... resolves to "John" and "Doe", etc. Space characters can be spaces or tabs.

Fields with embedded commas MUST be delimited with double-quote characters.

In the above example. "Anytown, WW" had to be delimited in double quotes because it had an embedded comma.

Fields that contain double quote characters MUST be surrounded by double-quotes, and the embedded double-quotes MUST each be represented by a pair of consecutive double quotes.

So, John "Da Man" Doe would convert to "John ""Da Man""",Doe, 120 any st.,...

A field that contains embedded line-breaks MUST be surrounded by double-quotes

So:
Field 1: Conference room 1
Field 2:
John,
Please bring the M. Mathers file for review
-J.L.
Field 3: 10/18/2002
...

would convert to:
Conference room 1, "John,
Please bring the M. Mathers file for review
-J.L.
",10/18/2002,...

Note that this is a single CSV record, even though it takes up more than one line in the CSV file. This works because the line breaks are embedded inside the double quotes of the field.
Implementation Note: In Excel, leading spaces between the comma used as a field separator and the double quote will sometimes cause fields to be read in as unquoted fields, even though the first non-space character is a double quote. To avoid this quirk, simply remove all leading spaces after the field-separator comma and before the double quote character in your CSV export files.

Fields with leading or trailing spaces MUST be delimited with double-quote characters

So to preserve the leading and trailing spaces around the last name above: John ," Doe ",...
Usage Note: Some applications will insist on helping you by removing leading and trailing spaces from all fields regardless of whether the CSV used quotes to preserve them. They may also insist on removing leading zeros from all fields regardless of whether you need them. One such application is Excel. :-( For some help with this quirk, see the section below entitled Excel vs. Leading Zero & Space.

Fields MAY always be delimited with double quotes.

The delimiters will always be discarded.

The first record in a CSV file MUST  be a header record containing column (field) names

There is no mechanism for automatically discerning if the first record is a header row, so in the general case, this will have to be provided by an outside process (such as prompting the user). The header row is encoded just like any other CSV record following the rules above. A header row for the multi-line example above might be:
Location, Notes, "Start Date", ...

Example Data

Here is a small set of records that demonstrate some of the constructs discussed above. 

Name,Surname,Address,City,State,Zip
John,Doe,120 jefferson st.,Riverside, NJ, "08075"
Jack,McGinnis,220 hobo Av.,Phila, PA,"09119"
"John ""Da Man""",Repici,120 Jefferson St.,Riverside, NJ,"08075"
Stephen,Tyler,"7452 Terrace ""At the Plaza"" road",SomeTown,SD, "91234"
,Blankman,,SomeTown, SD, "00298"
"Joan ""the bone"", Anne",Jet,"9th, at Terrace plc",Desert City,CO,"00123"

These records show how the CSV format behaves under a variety of conditions, such as using quotes to delimit a field with embedded quotes and embedded commas; leaving some fields blank (Mr. Blankman, for example, has no First Name field); as well as combinations ('Joan, "the bone", Ann' for example, who's middle name is included along with her first name and nickname).