Turkish characters in Python
I am playing around with the Twitter API, but I have several questions regarding the encoding of Turkish characters. Here is the code I'm working with:
# -*- coding: cp1254 -*-
import sys
import csv
import tweepy
import locale
import string

locale.setlocale(locale.LC_ALL, "Turkish")

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)

f = open("tweets.csv", "wb")
for q in [list of search queries]:
    a = [tweet.text.encode("utf-8") for tweet in tweepy.Cursor(api.search, q, result_type="recent", include_entities=True, lang="tr").items(20)]
    wr = csv.writer(f, quoting=csv.QUOTE_ALL)
    wr.writerow(q)
Basically, I run the search API over a list of search queries and then write the tweets to a CSV file that I open in Excel. However, no matter what I do, the Turkish characters in the tweets are replaced with other symbols. I've tried several things (setting the locale, adding the .encode("utf-8") part, etc.), but I still don't know how to fix it.
Here is what I am talking about:
what is written: DÃ¼n akÅŸam Ãœlker Arena
what I want it to write: Dün akşam Ülker Arena
What I don't understand is that ü, Ü and ş are all among the locale's letters once I set the locale to Turkish, yet Python still substitutes them.
I reproduced your setup on my system (Windows 7 with Office 2010) and got it working. I used your code but simplified the search query as follows:
search_results = api.search(q="canan1405", count=10)
for tweet in search_results:
    print tweet.text.encode('utf-8')
I pulled tweets from the 'canan1405' user as they contained Turkish characters. (Hope she doesn't mind!)
I simply redirected the output of my script to a file, as follows:
python so_24038317.py > tweets.csv
At this point, the tweets.csv file contains Unicode characters encoded as UTF-8. If I double-click on the file as you did, Excel does not treat it as UTF-8 and the default display shows garbage characters much like in your case.
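To illustrate why the garbage appears, here is a minimal Python 3 sketch (the code in the question is Python 2). It assumes the viewer decodes the UTF-8 bytes with the Windows Turkish code page cp1254; the variable names are only illustrative. Decoding the bytes that way reproduces exactly the text from the question, which suggests the file itself is fine and only the interpretation is wrong.

```python
# -*- coding: utf-8 -*-
# Python 3 sketch: reproduce the garbling by decoding UTF-8 bytes
# with the wrong (single-byte) code page.

original = "Dün akşam Ülker Arena"

# This is what .encode("utf-8") puts into the file: ü, ş and Ü each
# become two bytes.
utf8_bytes = original.encode("utf-8")
print(utf8_bytes)

# Assumption: the viewer reads those bytes as cp1254, so every byte is
# shown as a separate character.
garbled = utf8_bytes.decode("cp1254")
print(garbled == "DÃ¼n akÅŸam Ãœlker Arena")

# The damage is reversible, because no bytes were lost along the way.
recovered = garbled.encode("cp1254").decode("utf-8")
print(recovered == original)
```

Both comparisons should print True, so the fix belongs on the reading side (telling Excel the file is UTF-8), not in the script's encoding step.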
Instead of double-clicking on the csv file, use the following steps to import the file:
- Start Excel.
- Click the "Data" tab on the ribbon.
- Click the "From Text" icon in the "Get External Data" group.
- Locate the CSV file and click the "Import" button.
- A wizard will be displayed. In my case, it guessed the file's encoding correctly (see the "File origin" drop-down). If it does not, select "65001 : Unicode (UTF-8)" there.
You can complete the remaining wizard steps, but they are optional. The file then displayed correctly.
As far as I can tell, it contains (and correctly displays) the following Turkish characters:
ş, Ğ, İ, ğ, ı, ç
Note that the character immediately after the string "Oyy şirin kedi" is an emoticon, which Excel may not be able to render; it is not a corrupted Turkish character. Hope this helps.
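As a possible alternative to the import wizard, the script could write the file with a UTF-8 byte-order mark, which Excel generally uses to detect UTF-8 when a CSV is opened by double-clicking. The following is a minimal Python 3 sketch, not the code I ran above: write_tweets_csv is a hypothetical helper, the sample text stands in for real tweet.text values, and I have not verified the behaviour on every Excel version.

```python
# -*- coding: utf-8 -*-
# Python 3 sketch: write a CSV with a UTF-8 BOM so that Excel can
# recognise the encoding without the import wizard (assumption).
import csv
import os
import tempfile


def write_tweets_csv(path, tweet_texts):
    """Hypothetical helper: write one tweet text per row."""
    # "utf-8-sig" prepends the byte-order mark EF BB BF to the file.
    # newline="" lets the csv module control the line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f, quoting=csv.QUOTE_ALL)
        for text in tweet_texts:
            # writerow expects a sequence of fields, so wrap the text
            # in a list; passing a bare string would split it into
            # one column per character.
            writer.writerow([text])


# Stand-in data; in the real script these would be tweet.text values.
sample = ["Dün akşam Ülker Arena"]
path = os.path.join(tempfile.mkdtemp(), "tweets.csv")
write_tweets_csv(path, sample)

# Check that the file really starts with the BOM.
with open(path, "rb") as f:
    raw = f.read()
print(raw[:3] == b"\xef\xbb\xbf")
```

Note that in Python 3 the text is passed to the csv writer as str and encoded by open(); calling .encode("utf-8") on each tweet first would write the bytes' repr instead of the text.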