Need help making a permutation faster

Here is my working code, I am trying to find ways to make it faster in finding the valid words, I was thinking about possibly making separate dictionary lists for each words, what do yall think?

import random
import itertools

file_name='words.txt'

def load_words():
    try:
        f=open(file_name,'r')
        str1=f.read()
        f.close()
    except:
        print('Problem opening the file',file_name)
    list1=[]
    list1=str1.split()
    return(list1)

def is_valid(str1,list1):
    valid=False
    if str1 in list1:
        valid=True
    return valid

def generate(words,letters):
    answers=[]
    for length in range(2,len(letters)+1):
        for x in itertools.permutations(letters,length):
            word=''
            for let in x:
                word+=let
            if is_valid(word.upper(),words):
                answers.append(word)
                print(word)
    print(answers)

def main():
    words=load_words()
    letters = input('Enter your letters')
    answers = generate(words,letters)

main()

Answers


If you are too keen in increasing the speed at the cost of making it less readable you may try the following

def is_valid(str1,list1):
    return str1 in list1
words=["BAD","CAB","BEC"]
def generate2(words,letters):
    answers=[]
    [[answers.append(''.join(x).upper()) for x in itertools.permutations(letters,length) if ''.join(x).upper() in words] for length in range(2,len(letters)+1)]
    #print(answers)
    return answers

List comprehension is faster than loops. So combined both the loops to a single comprehension. Apart from that the statement

       word=''
        for let in x:
            word+=let
        if is_valid(word.upper(),words):

can be combined to if is_valid(''.join(x).upper,words) or even ''.join(x).upper in words, remember function call is costly.

I have done a comparison in speed and looks list comprehension is running 50% faster.

Its now upto you to decide


>>> stmt1="""
def is_valid(str1,list1):
    valid=False
    if str1 in list1:
        valid=True
    return valid
words=["BAD","CAB","BEC"]
def generate1(words,letters):
    answers=[]
    for length in range(2,len(letters)+1):
        for x in itertools.permutations(letters,length):
            word=''
            for let in x:
                word+=let
            if is_valid(word.upper(),words):
                answers.append(word)
                #print(word)
    #print(answers)
    return answers
generate1(words,['A','B','C','D','E'])
"""
>>> 
>>> stmt2="""
def is_valid(str1,list1):
    return str1 in list1
words=["BAD","CAB","BEC"]
def generate2(words,letters):
    answers=[]
    [[answers.append(''.join(x).upper()) for x in itertools.permutations(letters,length) if ''.join(x).upper() in words] for length in range(2,len(letters)+1)]
    #print(answers)
    return answers
generate2(words,['A','B','C','D','E'])
"""
>>> 
>>> t1=timeit.Timer(stmt=stmt1)
>>> t2=timeit.Timer(stmt=stmt2)
>>> t1.repeat(number=1000)
>>> t2=timeit.Timer(stmt=stmt2)
>>> t1.repeat(number=1000)
[0.47923321640178074, 0.4353549521401874, 0.4362746333173959]
>>> t2.repeat(number=1000)
[0.2536238984591819, 0.2470974750062851, 0.24726312027155473]

Firstly, profile the code. That will tell you where the slow parts are.

Secondly, you might consider converting the words list to a set, which should have a faster 'in' operator for checking if the word is there.

Thirdly, consider simplifying the code to remove unnecessary statements, eg.

def is_valid(str1,list1):
   return str1 in list1

Change your list1 to a set:

set1 = set(list1)

The testing str1 in set1 will be much faster than str1 in list1 if you do frequent tests and the list is long.


Try replacing the inner loop with:

for x in itertools.permutations(letters,length):
    word = ''.join(x)
    if word.upper() in words:
        answers.append(word)
        print(word)

The issue is with your algorithm is basically O(n * m!) where n is the word list size and m is number of letters. Changing the word list to a set should make the searches log time and improve performance to O( log(n) * m!) as recommended by other people here.

The real performance gain will however come from completely eliminating permuting the letters for the search. First sort the letters in each individual word in the word list alphabetically; it should take O( n * p log(p) ) time where p is average word length. Then sort the entire list alphabetically in O( n * log(n) ) time. Keep track of the original words as well, so that you can go from a string in the sorted word list to the original word in O(1). Next sort the imputed letters alphabetically and search for them in the sorted word list.

The slowest operation in the above algorithm is sorting the list of alphabetically sorted strings, which is O(n Log(n) ). Searching such a list is Log( n ) and results in the entire algorithm executing in O( n Log(n) ) time. It should scale linearly to m, the number of letters inputed.

Implementation is left to the reader.


What exactly are you trying to accomplish with this? It looks like you have some dictionary of valid words you are reading from already. Why are you permutating all combination of words that could be built from the input given by the user?

Something you need to consider about your algorithm. Every permutation you create you are iterating over every known word in your dictionary (list1). When you create all permutations of the words you are creating m! words where m number of the letters given by the user.

You basically have O(n * m!). That's ridiculously huge for even small number of things like 7. By using a set, instead of a list, you can take that n term and reduce it to a constant which changes your algorithm to O(m!) which is still too big. If I had to guess what this algorithm is doing I'd say you want to know how many known words you can create out of the letters you are given. Again you didn't say that, but let's assume this is what you meant.

A faster algorithm would be to iterate over every word in your dictionary and see if you can make that word by picking letters from the input. So you only walk over the dictionary once O(n * m). That eliminates the need to permutate your input. Here's the algorithm:

 user_input = input("Give me some words")
 for word in list1:
     current = user_input
     found = True
     for letter in word:
         if letter in current:
            current.remove( letter )
         else
            found = False
            break;
     if found:
        answers.add( word )
 print( answers )

Sorry my python is a little rusty but hopefully you'll get the idea.


Need Your Help

Use JGrowl with Theme roller

javascript jquery jquery-ui jquery-plugins jgrowl

I want to know how to use my Jquery UI theme with JGrowl. The site says it can be done. The questions is answered here:

Combining multiple maps with vectors into one map

c++ dictionary vector stdmap

I have a question about combining maps that have vectors as the value section. for instance, I might have the following:

About UNIX Resources Network

Original, collect and organize Developers related documents, information and materials, contains jQuery, Html, CSS, MySQL, .NET, ASP.NET, SQL, objective-c, iPhone, Ruby on Rails, C, SQL Server, Ruby, Arrays, Regex, ASP.NET MVC, WPF, XML, Ajax, DataBase, and so on.