Python MongoDB notes
MongoDB is a popular new schemaless, document-oriented, NoSQL database. It is useful for logging and real-time analytics. I'm working on a tool that stores log files from multiple remote hosts in MongoDB, then analyzes them in real time and prints pretty plots. My work in progress is located on GitHub.
Here are my first steps using PyMongo. I store an Apache access log in MongoDB and then query it for the number of requests in the last minute. I am running on Ubuntu Karmic 32-bit (though I think MongoDB really wants to run on 64-bit).
Install and run MongoDB
- Download and install MongoDB (Reference)
cd ~/lib
curl http://downloads.mongodb.org/linux/mongodb-linux-i686-latest.tgz | tar zx
ln -s mongodb-linux-i686-2010-02-22 mongodb
- Create data directory
mkdir -p ~/var/mongodb/db
- Run MongoDB (Reference)
~/lib/mongodb/bin/mongod --dbpath ~/var/mongodb/db
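Before moving on, it can help to confirm that mongod is actually accepting connections. The following is a minimal sketch, assuming mongod is listening on its default port 27017 on localhost; the helper name is_port_open is my own and not part of MongoDB or PyMongo.

```python
import socket


def is_port_open(host="localhost", port=27017, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds.

    27017 is MongoDB's default port; adjust it if mongod was started
    with a different --port value.
    """
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("mongod reachable:", is_port_open())
```

This only checks that something is listening on the port; it does not verify that the listener is MongoDB.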
Install PyMongo
- Install pip
- Install PyMongo (Reference)
sudo pip install pymongo
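A quick way to check that the install worked is to import the module and look at its version. This is only a sketch: the helper installed_version is a hypothetical name, and it falls back to __version__ because not every module exposes a version attribute the way pymongo does.

```python
import importlib


def installed_version(module_name="pymongo"):
    """Return the module's version string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # pymongo exposes `pymongo.version`; many other modules use `__version__`
    version = getattr(module, "version", None) or getattr(module, "__version__", None)
    return str(version) if version is not None else "unknown"


if __name__ == "__main__":
    print("pymongo version:", installed_version())
```

If this prints None, the pip install did not land in the Python environment you are running.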
Simple Example
writer.py:
import re
from datetime import datetime
from subprocess import Popen, PIPE, STDOUT
from pymongo import Connection
from pymongo.errors import CollectionInvalid
HOST = 'us-apa1'
LOG_PATH = '/var/log/apache2/http-mydomain.com-access.log'
DB_NAME = 'mydb'
COLLECTION_NAME = 'apache_access'
MAX_COLLECTION_SIZE = 5 # in megabytes
def main():
    # connect to mongodb
    mongo_conn = Connection()
    mongo_db = mongo_conn[DB_NAME]
    try:
        # capped collection: fixed size, oldest documents are discarded first
        mongo_coll = mongo_db.create_collection(COLLECTION_NAME,
                                                capped=True,
                                                size=MAX_COLLECTION_SIZE*1048576)
    except CollectionInvalid:
        # the collection already exists, so reuse it
        mongo_coll = mongo_db[COLLECTION_NAME]

    # open remote log file
    cmd = 'ssh -f %s tail -f %s' % (HOST, LOG_PATH)
    p = Popen(cmd, shell=True, stdout=PIPE, stderr=STDOUT)

    # parse and store data
    while True:
        line = p.stdout.readline()
        if not line:
            break  # the ssh/tail process has exited
        data = parse_line(line)
        if not data:
            continue  # skip lines that do not match the expected format
        data['time'] = convert_time(data['time'])
        mongo_coll.insert(data)


def parse_line(line):
    """Apache combined log format
    %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"
    """
    m = re.search(' '.join([
        r'(?P<host>(\d+\.){3}\d+)',
        r'.*',
        r'\[(?P<time>[^\]]+)\]',
        r'"\S+ (?P<url>\S+)',
    ]), line)
    if m:
        return m.groupdict()
    else:
        return {}


def convert_time(time_str):
    # drop the UTC offset (e.g. " -0800" or " +0100") before parsing
    time_str = re.sub(r' [+-]\d{4}', '', time_str)
    return datetime.strptime(time_str, "%d/%b/%Y:%H:%M:%S")


if __name__ == '__main__':
    main()
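To see what the parsing step in writer.py produces without a running database, here is a standalone sketch of the same regular expression applied to one log line. The sample line, its IP address, and the URL are made up for illustration, and the offset pattern accepts both + and - signs.

```python
import re
from datetime import datetime

# Same pattern pieces as writer.py, joined by single spaces
LOG_PATTERN = re.compile(' '.join([
    r'(?P<host>(\d+\.){3}\d+)',   # client IPv4 address
    r'.*',                        # identity and user fields (ignored)
    r'\[(?P<time>[^\]]+)\]',      # timestamp between square brackets
    r'"\S+ (?P<url>\S+)',         # request method, then the URL
]))


def parse_line(line):
    """Return a dict with host, time and url, or {} if the line does not match."""
    m = LOG_PATTERN.search(line)
    return m.groupdict() if m else {}


def convert_time(time_str):
    """Strip the UTC offset and parse the rest into a naive datetime."""
    time_str = re.sub(r' [+-]\d{4}$', '', time_str)
    return datetime.strptime(time_str, "%d/%b/%Y:%H:%M:%S")


# Hypothetical log line in Apache combined format
sample = ('192.0.2.10 - - [22/Feb/2010:14:05:01 -0800] '
          '"GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"')

doc = parse_line(sample)
if doc:
    doc['time'] = convert_time(doc['time'])
print(doc)
```

The resulting dict has the keys host, time, and url, with time converted to a datetime. That is the document writer.py hands to MongoDB.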
reader.py:
import time
from datetime import datetime, timedelta
from pymongo import Connection
DB_NAME = 'mydb'
COLLECTION_NAME = 'apache_access'
def main():
    # connect to mongodb
    mongo_conn = Connection()
    mongo_db = mongo_conn[DB_NAME]
    mongo_coll = mongo_db[COLLECTION_NAME]

    # find the number of requests in the last minute
    while True:
        d = datetime.now() - timedelta(seconds=60)
        N_requests = mongo_coll.find({'time': {'$gt': d}}).count()
        print 'Requests in the last minute:', N_requests
        time.sleep(2)


if __name__ == '__main__':
    main()
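The scripts above use the PyMongo API as it existed when this was written. Newer PyMongo releases dropped Connection in favor of MongoClient and replaced the cursor's count() with the collection method count_documents(). Here is a rough sketch of the same "requests in the last minute" query in that style. The helper names recent_filter and count_recent are my own, and the commented-out usage assumes a local mongod with the same database and collection names.

```python
from datetime import datetime, timedelta


def recent_filter(now, seconds=60):
    """Build the query document matching entries newer than now - seconds."""
    return {'time': {'$gt': now - timedelta(seconds=seconds)}}


def count_recent(coll, now=None, seconds=60):
    """Count documents in the time window.

    `coll` is assumed to be a PyMongo collection, or anything else
    that provides a compatible count_documents(filter) method.
    """
    now = now or datetime.now()
    return coll.count_documents(recent_filter(now, seconds))


# Usage against a real server (assumes a recent PyMongo and a local mongod):
#
#   from pymongo import MongoClient
#   coll = MongoClient()['mydb']['apache_access']
#   print('Requests in the last minute:', count_recent(coll))
```

Both scripts work with naive datetimes, so the comparison only lines up when the web server and the machine running the reader share a time zone.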
Running python writer.py in one terminal and python reader.py in another terminal, I get the following results:
Requests in the last minute: 13
Requests in the last minute: 14
Requests in the last minute: 14
Requests in the last minute: 14
Requests in the last minute: 13
Requests in the last minute: 14
Requests in the last minute: 15
...
Comments
Nice! Thanks :)
Thanks for the post! Glad you're enjoying PyMongo - feel free to let me know if you have any questions.
Mike: thank you for your support! (and for the great software and documentation!)
You might find http://Graylog2.org really interesting. The server side is a Java app that listens for syslog messages on UDP port 514 and puts them in a capped MongoDB collection. There's also a web front end (Ruby) that lets you view, search and sort the messages. I am using it to capture log info, and am working on something similar to your project to analyze what's in the DB.
Also, this post from Grig Gheorghiu is relevant: http://agiletesting.blogspot.com/2010/07/tracking-and-visualizing-mail-logs-with.html
Good luck with the project!
Thanks for the links. It looks like some good stuff. I have enjoyed a lot of Grig Gheorghiu's articles, but I hadn't seen this one before.
Good luck on your project as well!
Thanks, this helps during my setup with mongodb.
hi,
I would like to know how to connect to a MongoDB instance that is running remotely. Can you please send me some code?
Regards, Francis