Wednesday, September 3, 2014

Writing a CSV Parser

Thomas Burette:

If a supplied CSV is arbitrary, the only real way to make sure the data is correct is for a user to check it and eventually specify the delimiter, quoting rule, and so on. Barring that, you may end up with an error or, worse, silently corrupted data.

Writing CSV code that works with files out there in the real world is a difficult task. The rabbit hole goes deep. The Ruby CSV library is 2321 lines.

On the surface, it seems like almost a one-liner.
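
To make the gap concrete, here is a minimal sketch in Python (my choice of language, not something from Burette's post) comparing the tempting one-liner with the standard library's `csv` module. The sample data and variable names are made up for illustration.

```python
import csv
import io

# Hypothetical sample: a quoted field containing a comma and an escaped quote.
data = 'id,name,note\n1,"Smith, John","said ""hi"""\n'

# The tempting "one-liner": split each line on commas, ignoring quoting.
naive_rows = [line.split(",") for line in data.splitlines()]

# The standard-library parser honors quotes and doubled-quote escapes.
parsed_rows = list(csv.reader(io.StringIO(data)))

print(naive_rows[1])   # the quoted name is broken into two pieces
print(parsed_rows[1])  # three fields, with the quotes resolved

# A semicolon-delimited export (as some locales produce) parses only
# if the delimiter is specified; here it is passed explicitly.
semicolon_rows = list(csv.reader(io.StringIO("1;3,14\n"), delimiter=";"))
print(semicolon_rows[0])
```

The naive split yields four fields for a three-column row and leaves the quote characters in place. Even the real parser needs to be told the delimiter for the semicolon case, which is the point of the quote above: someone may have to check the file and specify the rules.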

1 Comment

The first time I saw a CSV file exported by Excel, items were delimited by semicolons (because in French the comma is the decimal separator, obviously), and I immediately understood that the format was impossible to parse in any sane way. The only sane explanation is that the format was invented at a time when people were exchanging documents within the same office space.

So I always try to use TSV (tab-separated values) instead. At least the separator is the same everywhere.
