Slice specific characters in CSV using Python

I have data in tab-delimited format that looks like:

0/0:23:-1.03,-7.94,-83.75:69.15    0/1:34:-1.01,-11.24,-127.51:99.00    0/0:74:-1.02,-23.28,-301.81:99.00

I am only interested in the first 3 characters of each entry (i.e. 0/0 and 0/1). I figured the best way to do this would be to use re.match together with NumPy's genfromtxt. This example is as far as I have gotten:

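import re
from numpy import genfromtxt

csvfile = 'home/python/batch1.hg19.table'
# dtype=None lets NumPy infer column types; encoding=None returns str instead of bytes
data = genfromtxt(csvfile, delimiter="\t", dtype=None, encoding=None)
for i in data[1]:
    # re.match only looks for the genotype (e.g. 0/0) at the start of the cell
    m = re.match(r'[0-9]/[0-9]', i)
    if m:
        print(m.group(0), end=' ')
    else:
        print("NA", end=' ')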

This works for the first row of the data, but I am having a hard time figuring out how to extend it to every row of the input file.

Should I make it a function and apply it to each row separately, or is there a more Pythonic way to do this?

Answers


NumPy is great when you want to load an array of numbers. The format you have here is too complicated for NumPy to parse into numbers, so you just get an array of strings. That doesn't really play to NumPy's strengths.

Here's a simple way to do it without NumPy:

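import re

csvfile = 'home/python/batch1.hg19.table'
result = []
with open(csvfile, 'r') as f:
    for line in f:
        row = []
        for text in line.split('\t'):
            # re.search finds a genotype such as 0/1 anywhere in the cell
            match = re.search(r'([0-9]/[0-9])', text)
            if match:
                row.append(match.group(1))
            else:
                row.append("NA")  # no genotype in this cell
        result.append(row)
print(result)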

yields

# [['0/0', '0/1', '0/0'], ['NA', '0/1', '0/0']]

on this tab-delimited data:

0/0:23:-1.03,-7.94,-83.75:69.15 0/1:34:-1.01,-11.24,-127.51:99.00   0/0:74:-1.02,-23.28,-301.81:99.00
---:23:-1.03,-7.94,-83.75:69.15 0/1:34:-1.01,-11.24,-127.51:99.00   0/0:74:-1.02,-23.28,-301.81:99.00
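On the question of making it a function and applying it to each row: a minimal sketch of that approach, assuming the genotype is always the leading digit/digit of a cell. It swaps the manual split for the standard-library csv module (csv.reader with delimiter='\t') and uses re.match, which is anchored at the start of the cell like the question's code, instead of re.search. The names GENOTYPE, genotypes and read_genotypes are hypothetical.

```python
import csv
import io
import re

# Hypothetical pattern: a genotype is assumed to be digit/digit at the start of a cell.
GENOTYPE = re.compile(r'[0-9]/[0-9]')

def genotypes(row):
    """Return the leading genotype of each cell in one row, or "NA" if there is none."""
    out = []
    for cell in row:
        m = GENOTYPE.match(cell)  # match() only checks the beginning of the cell
        out.append(m.group(0) if m else "NA")
    return out

def read_genotypes(f):
    """Apply genotypes() to every row of a tab-delimited file object."""
    return [genotypes(row) for row in csv.reader(f, delimiter='\t')]

if __name__ == "__main__":
    sample = io.StringIO(
        "0/0:23:-1.03,-7.94,-83.75:69.15\t0/1:34:-1.01,-11.24,-127.51:99.00\t0/0:74:-1.02,-23.28,-301.81:99.00\n"
        "---:23:-1.03,-7.94,-83.75:69.15\t0/1:34:-1.01,-11.24,-127.51:99.00\t0/0:74:-1.02,-23.28,-301.81:99.00\n"
    )
    print(read_genotypes(sample))
    # [['0/0', '0/1', '0/0'], ['NA', '0/1', '0/0']]
```

With a real file, open it with newline='' as the csv documentation recommends, e.g. with open(csvfile, newline='') as f: result = read_genotypes(f).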

Unless you really want to use NumPy, try this:

with open('home/python/batch1.hg19.table') as f:
    for line in f:
        for cell in line.split('\t'):
            # print only the first three characters, e.g. 0/0
            print(cell[:3])

This iterates through each line of the file, splits the line on the tab character, and prints the first three characters of each cell, which is the part you are looking for.
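The slice cell[:3] assumes every cell starts with a well-formed genotype; a cell such as ---:23:... would be printed as ---. A minimal sketch that collects the slices row by row and substitutes "NA" when the three characters are not shaped like digit/digit, without using a regex. The helper name first_three and the digit/digit rule are assumptions for illustration.

```python
import io
import string

def first_three(f):
    """For each non-blank line, return the first three characters of every
    tab-separated cell, or "NA" when they are not shaped like digit/digit.

    Hypothetical helper: assumes a valid genotype looks like 0/0 or 0/1.
    """
    result = []
    for line in f:
        if not line.strip():
            continue  # skip blank lines, e.g. an empty last line
        row = []
        for cell in line.rstrip('\n').split('\t'):
            head = cell[:3]  # the same slice as in the answer above
            ok = (len(head) == 3 and head[0] in string.digits
                  and head[1] == '/' and head[2] in string.digits)
            row.append(head if ok else "NA")
        result.append(row)
    return result

if __name__ == "__main__":
    sample = io.StringIO("0/0:23:-1.03\t0/1:34:-1.01\n---:23:-1.03\t0/0:74:-1.02\n")
    print(first_three(sample))
    # [['0/0', '0/1'], ['NA', '0/0']]
```

Returning a list of rows instead of printing makes it easy to reuse the result, for example to load it into a table later.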

