What is this dictionary assignment doing?

I am learning Python and am trying to use it to perform sentiment analysis. I am following an online tutorial from this link: http://www.alex-hanna.com/tworkshops/lesson-6-basic-sentiment-analysis/. I have taken a piece of code as a mapper class, an excerpt of which looks like this:

sentimentDict = {
    'positive': {},
    'negative': {}
}

def loadSentiment():
    with open('Sentiment/positive_words.txt', 'r') as f:
        for line in f:
            sentimentDict['positive'][line.strip()] = 1

    with open('Sentiment/negative_words.txt', 'r') as f:
        for line in f:
            sentimentDict['negative'][line.strip()] = 1

Here, I can see that a new dictionary is created with two keys, positive and negative, but no values.

Following this, two text files are opened and each line is stripped and mapped to the dictionary.

However, what is the = 1 part for? Why is this required (and if it isn't how could it be removed?)

Answers


The loop creates a nested dictionary, and sets all values to 1, presumably to then just use the keys as a way to weed out duplicate values.

You could use sets instead and avoid the = 1 value:

sentimentDict = {}

def loadSentiment():
    with open('Sentiment/positive_words.txt', 'r') as f:
        sentimentDict['positive'] = {line.strip() for line in f}

    with open('Sentiment/negative_words.txt', 'r') as f:
        sentimentDict['negative'] = {line.strip() for line in f}

Note that you don't even need to create the initial dictionaries; you can create the whole set with one statement, a set comprehension.

If other code does rely on dictionaries with the values being set to 1 (perhaps to update counts at a later stage), it'd be more performant to use the dict.fromkeys() class method instead:

sentimentDict = {}

def loadSentiment():
    with open('Sentiment/positive_words.txt', 'r') as f:
        sentimentDict['positive'] = dict.fromkeys((line.strip() for line in f), 1)

    with open('Sentiment/negative_words.txt', 'r') as f:
        sentimentDict['negative'] = dict.fromkeys((line.strip() for line in f), 1)

Looking at your source blog article however shows that the dictionaries are only used to do membership testing against the keys, so using sets here is much better and transparent to the rest of the code to boot.


Need Your Help

Performance-comparison of Sort() and BinarySearch() with Integers/Strings

.net vb.net performance generics

Originally i wanted to ask if it's faster to sort Integers than Strings.

Using SSRS instead of Crystal reports to generate admin forms

layout reporting-services crystal-reports report

I'm looking into upgrading a .net 2.0 app. The app is used by the public authorities of a certain city to keep track of expenses and generate reports and forms.

About UNIX Resources Network

Original, collect and organize Developers related documents, information and materials, contains jQuery, Html, CSS, MySQL, .NET, ASP.NET, SQL, objective-c, iPhone, Ruby on Rails, C, SQL Server, Ruby, Arrays, Regex, ASP.NET MVC, WPF, XML, Ajax, DataBase, and so on.