How can I create multiple hashes of a file using only one pass?
How can I get MD5, SHA, and other hashes from a file while making only one pass? I have 100 MB files, so I'd hate to process them multiple times.
Answers
Here's a modified version of @ʞɔıu's answer, using @Jason S's suggestion.
from __future__ import with_statement
from hashlib import md5, sha1

filename = 'hash_one-pass.py'

hashes = md5(), sha1()
chunksize = max(4096, max(h.block_size for h in hashes))
with open(filename, 'rb') as f:
    while True:
        chunk = f.read(chunksize)
        if not chunk:
            break
        for h in hashes:
            h.update(chunk)

for h in hashes:
    print h.name, h.hexdigest()
Something like this perhaps?
>>> import hashlib
>>> hashes = (hashlib.md5(), hashlib.sha1())
>>> f = open('some_file', 'rb')
>>> for line in f:
...     for hash in hashes:
...         hash.update(line)
...
>>> for hash in hashes:
...     print hash.name, hash.hexdigest()
Or loop over f.read(1024) or something similar to get fixed-length blocks, as sketched below.
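For example, a minimal sketch of that fixed-block variant (the 1024-byte block size is just an illustrative choice, and the digests dict at the end is only there to collect the results):

import hashlib

hashes = (hashlib.md5(), hashlib.sha1())
with open('some_file', 'rb') as f:
    # iter() with a sentinel keeps calling f.read(1024) until it returns an empty bytes object (EOF)
    for block in iter(lambda: f.read(1024), b''):
        for h in hashes:
            h.update(block)

digests = dict((h.name, h.hexdigest()) for h in hashes)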
I don't know Python, but I am familiar with hash calculations.
If you handle the reading of the file manually, just read one block (of 256 bytes, 4096 bytes, or whatever) at a time, and pass each block of data to update the hash state of each algorithm. (You'll have to initialize the state for each algorithm at the beginning and finalize it at the end.)
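In Python, hashlib maps directly onto that initialize/update/finalize pattern. A rough sketch (the hash_file helper name and the 4096-byte default block size are just illustrative choices):

import hashlib

def hash_file(path, algorithms=('md5', 'sha1'), blocksize=4096):
    # initialize: one hash state per requested algorithm
    states = [hashlib.new(name) for name in algorithms]
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(blocksize), b''):
            # update: feed the same block to every algorithm's state
            for state in states:
                state.update(block)
    # finalize: produce the hex digest from each state
    return dict((s.name, s.hexdigest()) for s in states)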