Skip to content

Frequently Asked Questions

The specification of CSV format

The CSV parser used by csvtk follows the RFC4180 specification.

bare " in non-quoted-field

   5.  Each field may or may not be enclosed in double quotes (however
       some programs, such as Microsoft Excel, do not use double quotes
       at all).  If fields are not enclosed with double quotes, then
       double quotes may not appear inside the fields.  For example:

       "aaa","bbb","ccc" CRLF
       zzz,yyy,xxx

   6.  Fields containing line breaks (CRLF), double quotes, and commas
       should be enclosed in double-quotes.  For example:

       "aaa","b CRLF
       bb","ccc" CRLF
       zzz,yyy,xxx

   7.  If double-quotes are used to enclose fields, then a double-quote
       appearing inside a field must be escaped by preceding it with
       another double quote.  For example:

       "aaa","b""bb","ccc"

If a single double-quote exists in one non-quoted-field, an error will be reported. e.g,

$ echo 'a,abc" xyz,d'
a,abc" xyz,d

$ echo 'a,abc" xyz,d' | csvtk cut -f 1-
[ERRO] parse error on line 1, column 6: bare " in non-quoted-field

You can add the flag -l/--lazy-quotes to fix this.

$ echo 'a,abc" xyz,d' | csvtk cut -f 1- -l
a,"abc"" xyz",d

extraneous or missing " in quoted-field

But for the situation below, -l/--lazy-quotes won't help:

$ echo 'a,"abc" xyz,d'
a,"abc" xyz,d

$ echo 'a,"abc" xyz,d' | csvtk cut -f 1-
[ERRO] parse error on line 1, column 7: extraneous or missing " in quoted-field

$ echo 'a,"abc" xyz,d' | csvtk cut -f 1- -l
a,"abc"" xyz,d
"

$ echo 'a,"abc" xyz,d' | csvtk cut -f 1- -l | csvtk dim
file  num_cols  num_rows
-            2         0

You need to use csvtk fix-quotes (available in v0.29.0 or later versions):

$ echo 'a,"abc" xyz,d' | csvtk fix-quotes
a,"""abc"" xyz",d

$ echo 'a,"abc" xyz,d' | csvtk fix-quotes | csvtk cut -f 1-
a,"""abc"" xyz",d

$ echo 'a,"abc" xyz,d' | csvtk fix-quotes | csvtk cut -f 1- | csvtk dim
file  num_cols  num_rows
-            3         0

Use del-quotes if you need the original format after some operations.

$ echo 'a,"abc" xyz,d' | csvtk fix-quotes | csvtk cut -f 1- | csvtk del-quotes
a,"abc" xyz,d