
Broken CSV, How Can I Fix It?

I'm trying to parse a CSV. I'd like to get it into a DB or just parse it with JavaScript, but either way fails due to the broken syntax. My entire CSV file is here: https://gist.gi

Solution 1:

You may be able to trick it and use a regex to look for:

"(.*?)"(?=,|$)

But that's kind of hack-ish: it only accepts an end quote when it is immediately followed by a comma or the end of the line. The same logic would apply to a find-and-replace. Again, this all assumes that a "stray" quote never looks like a legitimate CSV quote, i.e. it never has a comma or a line boundary directly before or after it.
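As a rough sketch of how that pattern could be used from JavaScript: the function name `splitQuotedLine` and the sample line are made up for illustration, and the sketch assumes every field on the line is wrapped in double quotes, which may not hold for the real file.

```javascript
// Hypothetical helper: pull out the quoted fields of one CSV line.
// Assumes every field is quoted and that a stray quote is never
// directly followed by a comma or the end of the line.
function splitQuotedLine(line) {
  // A closing quote only counts when a comma or end of string follows it.
  const fieldPattern = /"(.*?)"(?=,|$)/g;
  // matchAll needs the g flag; capture group 1 is the field content.
  return Array.from(line.matchAll(fieldPattern), (match) => match[1]);
}

console.log(splitQuotedLine('"a","say "hi" there","c"'));
// → [ 'a', 'say "hi" there', 'c' ]
```

A line such as `"a "b", c","d"` would still be split in the wrong place, because there the stray quote is followed by a comma.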

I assume you have no control over the original data and have to work with what you have?

EDIT

Though I've only tried this on a small sample of your data, this appears to find the "stray" quotes, which you can then replace with "":

(?<!^|"|,)"(?!"|,|$)
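A minimal sketch of that find-and-replace in JavaScript, assuming an engine with lookbehind support (current Node.js and browsers). The name `escapeStrayQuotes` and the sample input are illustrative only. The replacement defaults to `""`, the doubled quote that standard CSV uses to escape a quote inside a quoted field; pass an empty string instead if you would rather drop the stray quotes.

```javascript
// Hypothetical helper: rewrite quotes that do not sit on a field boundary.
// m flag: ^ and $ match at every line boundary; g flag: replace all matches.
function escapeStrayQuotes(csvText, replacement = '""') {
  // A quote is treated as stray when it is NOT preceded by line start,
  // another quote, or a comma, and NOT followed by a quote, a comma,
  // or the end of the line.
  const strayQuote = /(?<!^|"|,)"(?!"|,|$)/gm;
  // Function form so "$" in a custom replacement is never interpreted.
  return csvText.replace(strayQuote, () => replacement);
}

console.log(escapeStrayQuotes('"a","say "hi" there","c"'));
// → "a","say ""hi"" there","c"
```

Note that engines requiring fixed-width lookbehind (Python's `re`, for example) reject the `^` alternative inside `(?<!...)`, so the pattern would need to be split into separate lookbehinds there.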

Solution 2:

The quotes don't matter as much as the commas do. If comma is the delimiter, then you can't have commas in the values unless those values are properly quoted, which is exactly what is broken here. If you can get the CSV saved using a different delimiter, you might get better results. Use a character like ~ or ^ instead of a comma as the delimiter.
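If the file can be re-exported that way, parsing becomes a plain split. The sketch below is only an illustration: `parseDelimited` and the sample rows are invented, and it assumes the chosen delimiter never occurs inside a value and that no value spans multiple lines.

```javascript
// Hypothetical helper: split delimiter-separated text into rows of fields.
// No quote handling at all, so commas and quotes in values pass through as-is.
function parseDelimited(text, delimiter = "~") {
  return text
    .split(/\r?\n/)                        // one row per line (LF or CRLF)
    .filter((line) => line.length > 0)     // skip blank lines, e.g. a trailing newline
    .map((line) => line.split(delimiter)); // fields are whatever sits between delimiters
}

const rows = parseDelimited('1~Smith, John~said "hi"\n2~Doe, Jane~ok\n');
console.log(rows[0][1]); // → Smith, John
```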

Solution 3:

Assuming you are either on Windows or can do this on a Windows box, check out Logparser. It is a free command line utility that can parse many data formats including CSV, and can output to many formats including SQL.
