# numpy indexing with multiple arrays

Given two sequences of data (of equal length) and quality values for each data point, I want to calculate a similarity score based upon a given scoring matrix.

What is the most efficient way to vectorize the following loop:

    score = 0
    for i in xrange(len(seq1)):
        score += similarity[seq1[i], seq2[i], qual1[i], qual2[i]]

`similarity` is a 4-dimensional float array with shape `(32, 32, 100, 100)`; `seq1`, `seq2`, `qual1` and `qual2` are 1-dimensional int arrays of equal length (on the order of 1,000 to 40,000 elements).

## Answers

Shouldn't this Just Work(tm)?

    >>> score = 0
    >>> for i in xrange(len(seq1)):
    ...     score += similarity[seq1[i], seq2[i], qual1[i], qual2[i]]
    ...
    >>> score
    498.71792400493433
    >>> similarity[seq1, seq2, qual1, qual2].sum()
    498.71792400493433
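In case it isn't obvious why the one-liner matches the loop: when every axis is indexed with an integer array of the same length, NumPy pairs the index arrays element by element, so entry `i` of the result is `similarity[seq1[i], seq2[i], qual1[i], qual2[i]]`. Here is a small sketch of the same idea on a made-up 2-D table (`table`, `rows` and `cols` are illustrative names, not part of the question):

```python
import numpy as np

# Hypothetical 3x4 lookup table holding the values 0..11
table = np.arange(12).reshape(3, 4)

rows = np.array([0, 2, 1])
cols = np.array([1, 3, 0])

# The index arrays are paired elementwise: (0, 1), (2, 3), (1, 0)
picked = table[rows, cols]

# The same values gathered with an explicit Python loop
looped = np.array([table[r, c] for r, c in zip(rows, cols)])

total = picked.sum()
print(picked, total)
```

The 4-D case in the question works the same way, just with four index arrays instead of two.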

Code:

    import numpy as np

    similarity = np.random.random((32, 32, 100, 100))
    n = 1000
    seq1, seq2, qual1, qual2 = [np.random.randint(0, s, n) for s in similarity.shape]

    def slow():
        score = 0
        for i in xrange(len(seq1)):
            score += similarity[seq1[i], seq2[i], qual1[i], qual2[i]]
        return score

    def fast():
        return similarity[seq1, seq2, qual1, qual2].sum()

gives:

    >>> timeit slow()
    100 loops, best of 3: 3.59 ms per loop
    >>> timeit fast()
    10000 loops, best of 3: 143 us per loop
    >>> np.allclose(slow(), fast())
    True
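The code above is Python 2 (`xrange`) and relies on IPython's `timeit` magic. A rough Python 3 port might look like the following. It assumes a seeded `np.random.default_rng` generator in place of the legacy `np.random` calls and uses the standard-library `timeit` module; the seed and the repeat count are arbitrary choices. Timings vary by machine, so none are quoted here.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
similarity = rng.random((32, 32, 100, 100))
n = 1000
# One random index array per axis, each bounded by that axis length
seq1, seq2, qual1, qual2 = [rng.integers(0, s, n) for s in similarity.shape]

def slow():
    # Element-by-element lookup in a Python loop
    score = 0.0
    for i in range(len(seq1)):
        score += similarity[seq1[i], seq2[i], qual1[i], qual2[i]]
    return score

def fast():
    # Single integer-array indexing operation followed by a sum
    return similarity[seq1, seq2, qual1, qual2].sum()

same = np.allclose(slow(), fast())
t_slow = timeit.timeit(slow, number=20)  # total seconds for 20 calls
t_fast = timeit.timeit(fast, number=20)
print(same, t_slow, t_fast)
```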