How to sort a table by columns in Python
I have a 2-dimensional table of data implemented as a list of lists in Python. I would like to sort the data by an arbitrary column. This is a common task with tabular data. For example, Windows Explorer allows me to sort the list of files by Name, Size, Type, or Date Modified. I tried the code from this article, however, if there are duplicate entries in the column being sorted, the duplicates are removed. This is not what I wanted, so I did some further searching, and found a nice solution from the HowTo/Sorting article on the PythonInfo Wiki. This method also uses the built-in sorted()
function, as well as the key
paramenter, and operator.itemgetter()
. (See section 2.1 and 6.7 of the Python Library Reference for more information.) The following code sorts the table by the second column (index 1). Note, Python 2.4 or later is required.
import operator
def sort_table(table, col=0):
return sorted(table, key=operator.itemgetter(col))
if __name__ == '__main__':
mytable = (
('Joe', 'Clark', '1989'),
('Charlie', 'Babbitt', '1988'),
('Frank', 'Abagnale', '2002'),
('Bill', 'Clark', '2009'),
('Alan', 'Clark', '1804'),
)
for row in sort_table(mytable, 1):
print row
Results:
('Frank', 'Abagnale', '2002') ('Charlie', 'Babbitt', '1988') ('Joe', 'Clark', '1989') ('Bill', 'Clark', '2009') ('Alan', 'Clark', '1804')
This works well, but I would also like the table to be sorted by column 0 in addition to column 1. In this example, column 1 holds the Last Name and column 0 holds the First Name. I would like the table to be sorted first by Last Name, and then by First Name. Here is the code to sort the table by multiple columns. The cols argument is a tuple specifying the columns to sort by. The first column to sort by is listed first, the second second, and so on.
import operator
def sort_table(table, cols):
""" sort a table by multiple columns
table: a list of lists (or tuple of tuples) where each inner list
represents a row
cols: a list (or tuple) specifying the column numbers to sort by
e.g. (1,0) would sort by column 1, then by column 0
"""
for col in reversed(cols):
table = sorted(table, key=operator.itemgetter(col))
return table
if __name__ == '__main__':
mytable = (
('Joe', 'Clark', '1989'),
('Charlie', 'Babbitt', '1988'),
('Frank', 'Abagnale', '2002'),
('Bill', 'Clark', '2009'),
('Alan', 'Clark', '1804'),
)
for row in sort_table(mytable, (1,0)):
print row
Results:
('Frank', 'Abagnale', '2002') ('Charlie', 'Babbitt', '1988') ('Alan', 'Clark', '1804') ('Bill', 'Clark', '2009') ('Joe', 'Clark', '1989')
Related posts
- An example using Python's groupby and defaultdict to do the same task — posted 2014-10-09
- python enum types — posted 2012-10-10
- Python data object motivated by a desire for a mutable namedtuple with default values — posted 2012-08-03
- How to sort a list of dicts in Python — posted 2010-04-02
- Python setdefault example — posted 2010-02-09
- How to conditionally replace items in a list — posted 2008-08-22
Comments
Very Helpful.
Thanks.
Thanks a lot!! :-) It's that I was looking for! :-)
Ric.
Awesome. Thanks!
Really useful ! Thanks !
Just what I was after. Thanks.
Thanks! New to python, and had trouble with complex sorting. This does the trick!
Nice, works as fast as sorted(table, cmp=lambda a, b:cmp((a[1],a[0]), (b[1],b[0])))
Thank you for this! Very simple and straightforward, that's what I needed!
http://wiki.python.org/moin/HowTo/Sorting
why do you have to use "for col in reversed(cols)", instead of "for col in range(cols)"?? tks
sorry i mean why do you have to "reverse" the col numbers instead of just passing the normal order?