Create a list of values between certain occurrences within a sequence (Python)

I am posting a follow-up question to a previous question I had regarding reading frames.

sequence = 'AAATGAAATAAGGATGGGGTAGTATGATGTGTTT'

I am ultimately looking for the pattern 'ATG', and I want to scan the input sequence until it is found. Once it is found, I want to proceed in a reading frame of 3 until I reach one of 'TAA', 'TAG' or 'TGT', and then continue scanning for the next 'ATG' that has a downstream 'TAA', 'TAG' or 'TGT'.

codon_list = [['ATG','AAA','TAA'],['ATG','GGG','TAG'],['ATG','ATG','TGT']]

I was trying this

start_frame = sequence.find('ATG')

but it only gives me the first occurrence of 'ATG' (i.e. index 2).

Just for the first list of codons I wrote

codon_list = []
# Step through the sequence three bases at a time, starting at the first 'ATG'.
for _ in range(len(sequence)):
    next_codon = sequence[start_frame:start_frame + 3]
    codon_list.append(next_codon)
    start_frame = start_frame + 3
    # Stop at the first in-frame 'TAA', 'TAG' or 'TGT'.
    if next_codon in ('TAA', 'TAG', 'TGT'):
        break
print codon_list
>>> ['ATG', 'AAA', 'TAA']

It only works for the first occurrence of 'ATG'.

The next part is where I want to create a name for each codon (0,1,2,3,...) and I think I figured that part out:

indx = range(0,len(codon_list))

indx_codon = dict(zip(indx,codon_list))

indx_codon = {0:['ATG','AAA','TAA'],1:['ATG','GGG','TAG'],2:['ATG','ATG','TGT']}

codon_start = ['2','13','23']
codon_end = ['8','21','31']
codon_positions = []

for p,q in zip(codon_start,codon_end):
    codon_positions.append(str(p)+':'+str(q))

print codon_positions
>>> ['2:8', '13:21', '23:31']

So my biggest problem is that .find() only returns the first occurrence, and my indexing gets thrown off if a 'TAA', 'TAG' or 'TGT' appears before the 'ATG' ('ATG' is the one that is supposed to start the reading frame of 3).

How can I create a list of multiple sequences that follow these criteria (i.e. turn sequence into codon_list)?

Answers


Here is a fairly concise solution using regular expressions:

import re
sequence = 'AAATGAAATAAGGATGGGGTAGTATGATGTGTTT'
codons = re.findall(r'ATG(?:...)*?(?:TAA|TAG|TGT)', sequence)
codon_list = [[s[i:i+3] for i in range(0, len(s), 3)] for s in codons]

Result:

>>> codon_list
[['ATG', 'AAA', 'TAA'], ['ATG', 'GGG', 'TAG'], ['ATG', 'ATG', 'TGT']]
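
If the start and end positions are needed as well (the codon_positions part of the question), one possible extension is to swap re.findall for re.finditer, which yields match objects that carry their offsets. This is only a sketch built on the same pattern: the helper name find_frames and the (start, end, codons) tuple layout are my own choices for illustration, and it is written to run under both Python 2 and Python 3.

```python
import re

# 'ATG', then as few whole codons as possible, then the first in-frame
# 'TAA', 'TAG' or 'TGT' (the stop set is taken from the question as written).
FRAME_PATTERN = re.compile(r'ATG(?:...)*?(?:TAA|TAG|TGT)')


def find_frames(sequence):
    """Return a list of (start, end, codons), one per non-overlapping match.

    start is the index of the 'A' in 'ATG'; end is exclusive, so
    sequence[start:end] is exactly the matched text.
    """
    frames = []
    for match in FRAME_PATTERN.finditer(sequence):
        text = match.group(0)
        codons = [text[i:i + 3] for i in range(0, len(text), 3)]
        frames.append((match.start(), match.end(), codons))
    return frames


sequence = 'AAATGAAATAAGGATGGGGTAGTATGATGTGTTT'
frames = find_frames(sequence)

# Number each codon list by order of appearance, as in the question.
indx_codon = dict((i, codons) for i, (_, _, codons) in enumerate(frames))
codon_positions = ['%d:%d' % (start, end) for start, end, _ in frames]

print(indx_codon)
print(codon_positions)
```

For the example sequence the positions come out as '2:11', '13:22' and '23:32'. Note that match.end() is exclusive, so each end value is one past the last base of the closing codon. Matches are also non-overlapping, and '.' matches any character, so the sketch assumes the input contains only A, C, G and T.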
