Comma-Separated Values (CSV)

CSV files are commonly used as export and import formats for spreadsheets and other software applications. While they do not contain the formatting used on a spreadsheet, they do contain the two-dimensional data.

Anyone familiar with spreadsheets has probably had to import from or export to a CSV file at one time or another. The CSV file format is very popular for conveyance of two-dimensional data between software applications. Unfortunately, what is considered a valid CSV file varies among applications and systems. This variation can cause incompatibilities between software applications and often considerable frustration.

The official specification for CSV file formats is RFC 4180. We’ve summarized some of the main points below.

Definitions

  • CRLF – In the notation below, “CRLF” refers to a line break or carriage return / line feed. Note that UNIX (including Mac) and Windows systems have slightly different standards for CRLF. Luckily, this is often taken care of at the browser level when uploading or downloading a CSV file.
  • Record – A record in a CSV file corresponds to a single row in the source data or spreadsheet and contains one or more fields. In many cases, it will also correspond to a single line of text in the file, but it’s important to note that CSV does support embedded CRLF or line breaks within a field.
  • Field – A field in a CSV file corresponds to the contents of a single row/column location in the source data or spreadsheet.

Rules

  1. Each record is located on a separate line, delimited by a CRLF.
  2. Each record must contain the same number of fields throughout the file.
  3. Spaces are considered part of a field and are not ignored.
  4. Each field may or may not be enclosed by double quotes. IMPORTANT NOTE: Microsoft Excel does not use double quotes at all. This may lead to incompatible CSV files that must be edited prior to importing into Aligni.
  5. Fields containing CRLF, double quotes, and commas must be enclosed in double-quotes.
  6. Double quote characters within a field must be “escaped” by including an additional double quote character before them.

File Encoding

The term File Encoding or Character Encoding refers to the set of characters that are allowed to be in the file. This is established so that any software reading or writing to that file can operate with the contents. Common examples are: ASCII, ISO-8859 and Unicode. ASCII is a character set common on American computers but is limited to English characters.

Microsoft Excel

Microsoft Excel exports CSV files in ANSI encoding. While ANSI itself is not a character set, it often refers to one of several ISO-8859 encodings which are locale-specific. Excel is supposed to save files in the machine-specific ANSI encoding, but doesn’t always do so due to bugs.

One workaround is to open the resulting CSV file in Notepad, then click “File > Save As” and select UTF-8 as the encoding format.

Importing CSV Files into Aligni

When importing a CSV file, Aligni will attempt to import the file in UTF-8, ASCII, and then ISO8859-1, in that order.

Example

An example CSV file is shown below along with the table form of its parsed contents to illustrate how Aligni will interpret this file. A few points are notable of the contents of this example:

  • The description for part 100123 includes a comma so the description field must be enclosed in double quotes.
  • The description for part 100124 includes a CRLF so the description field must be enclosed in double quotes.
  • The description for part 100125 includes a double quote. Therefore, the quote character is replaced by a two double quotes. This is called “escaping” the double quote.
Part Number,Manufacturer,"Manufacturer P/N" CRLF
100123,Texas Instruments,TPS73218DBV,"1.8 volt regulator, 250 mA" CRLF
100124,Analog Devices,AD860,"Something
Second Line Description" CRLF
100125,Consolidated Diversified,3055BK-3IN,"4"" wire segment" CRLF

Part Number Manufacturer Manufacturer P/N Description
100123 Texas Instruments TPS73218DBV 1.8 volt regulator, 250 mA
100124 Analog Devices AD860 Something
Second Line Description
100125 Consolidated Diversified 3055BK-3IN 3.25" wire segment