Remove leading and trailing whitespace from a csv file with Python
I'm reading a csv file with the Python csv module and could not find a setting to remove trailing whitespace. I found this setting, Dialect.skipinitialspace, but it I think it only applies to leading whitespace. Here's a one-liner to delete leading and trailing whitespace that worked for me.
import csv
reader = csv.DictReader(
open('myfile.csv'),
fieldnames=('myfield1', 'myfield1', 'myfield3'),
)
# skip the header row
next(reader)
# remove leading and trailing whitespace from all values
reader = (
dict((k, v.strip()) for k, v in row.items() if v) for row in reader)
# print results
for row in reader:
print row
Comments
Wouldn't this load all the csv values into memory? And is this a practical solution for large data-sets?
Tom:
It actually creates another generator so it won't load all values into memory at one time.
print type(reader)
gives:
<type 'generator'>
If you make it a list instead of a generator (change the parentheses to square brackets), it will load all values into memory at one time:
reader = [
dict((k, v.strip()) for k, v in row.items()) for row in reader]
print type(reader)
gives:
<type 'list'>
Although not a python solution, you may be interested in looking into Google Refine for tasks like this. It makes cleaning up CSV data a breeze. It also has support for Jython so you can run Python expressions on the data.