Count of all hashtags in a set of tweets

I have some JSON Twitter data from the streaming API, and I would like to use collections.Counter to get an idea of the most popular hashtags in this dataset. My problem is handling tweets that have more than one hashtag: I want to pull out every hashtag in the tweet, not just the first one.

Question: how do I loop through a list of dicts nested inside a dict to extract all of the hashtags in a tweet, not just the first one?

In [1]: import json

In [2]: from collections import Counter

In [3]: data = []

In [4]: for line in open('DC.json'):
   ...:     try:
   ...:         data.append(json.loads(line))
   ...:     except ValueError:
   ...:         pass
   ...:     

In [5]: hashtags = []

In [6]: for i in data:
   ...:     if 'entities' in i and len(i['entities']['hashtags']) > 0:
   ...:         hashtags.append(i['entities']['hashtags']['text'])
   ...:     else:
   ...:         pass
   ...:     
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-66d7538509f9> in <module>()
      1 for i in data:
      2     if 'entities' in i and len(i['entities']['hashtags']) > 0:
----> 3         hashtags.append(i['entities']['hashtags']['text'])
      4     else:
      5         pass

TypeError: list indices must be integers, not str

In [7]: Counter(hashtags).most_common()[:10]

Example of a tweet with four hashtags in i['entities']['hashtags']:

In [12]: i[0]['entities']['hashtags']
Out[12]: 
[{u'indices': [28, 35], u'text': u'selfie'},
 {u'indices': [82, 92], u'text': u'omg'},
 {u'indices': [93, 104], u'text': u'Champ'},
 {u'indices': [105, 117], u'text': u'FIRST'}]

Answers


You say that i['entities']['hashtags'] is a list of dicts, so the line:

hashtags.append(i['entities']['hashtags']['text'])

is trying to index a list with a string. Lists accept only integer indices (or slices), which is why Python raises the TypeError. I think you would be better off splitting this into steps, first collecting all of the hashtag dictionaries:

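hashtags = []
for i in data:
    # skip records that have no 'entities' key
    if 'entities' in i:
        # extend() adds every hashtag dict from this tweet, not just the first
        hashtags.extend(i['entities']['hashtags'])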

then extracting the 'text':

hashtags = [tag['text'] for tag in hashtags]

then dumping it into Counter:

Counter(hashtags).most_common(10)
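
The three steps above can also be folded into a single pass. The following is only a sketch under a few assumptions: the helper name count_hashtags and the sample records are made up for illustration, and each hashtag dict is assumed to carry a 'text' key as in the output shown in the question.

```python
from collections import Counter


def count_hashtags(tweets):
    """Count every hashtag across an iterable of tweet dicts (hypothetical helper)."""
    counts = Counter()
    for tweet in tweets:
        # .get() with defaults skips records that lack 'entities' or 'hashtags'
        tags = tweet.get('entities', {}).get('hashtags', [])
        # update() counts each hashtag in the tweet, not only the first one
        counts.update(tag['text'] for tag in tags)
    return counts


# Made-up sample data shaped like the records in the question
sample = [
    {'entities': {'hashtags': [{'indices': [0, 7], 'text': 'selfie'},
                               {'indices': [8, 12], 'text': 'omg'}]}},
    {'entities': {'hashtags': [{'indices': [0, 4], 'text': 'omg'}]}},
    {'delete': {'status': {'id': 1}}},  # a record with no 'entities' key
]

top = count_hashtags(sample).most_common(10)
print(top)  # [('omg', 2), ('selfie', 1)]
```

With the real data this would be called as count_hashtags(data).most_common(10). Hashtags are counted case-sensitively here, so 'FIRST' and 'first' would be separate entries; lower-casing tag['text'] before counting is one way to merge them if that is what you want.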
