# Create a list of values between certain occurrences within a sequence (Python)

I am posting a follow-up question to a previous question I had regarding reading frames.

```sequence = 'AAATGAAATAAGGATGGGGTAGTATGATGTGTTT'
```

I am ultimately looking for a specific pattern, 'ATG', and I want to scan the input sequence until it is found. Once it is found, I want to proceed in a reading frame of 3 until I reach one of 'TAA', 'TAG', or 'TGT', and then resume scanning for the next 'ATG' that has a downstream 'TAA', 'TAG', or 'TGT'. The result I am after is:

```codon_list = ['ATG','AAA','TAA'],['ATG','GGG','TAG'],['ATG','ATG','TGT']
```

I was trying this

```start_frame = sequence.find('ATG')
```

but it only gives me the index of the first occurrence of 'ATG' (i.e. 2).

Just for the first list of codons I wrote

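```codon_list = []  # start from an empty list for this attempt
for codon in range(len(sequence)):
    # Read the next three bases in frame, starting at the first 'ATG'
    next_codon = sequence[start_frame:start_frame + 3]
    codon_list.append(next_codon)
    start_frame = start_frame + 3
    # Stop as soon as one of the terminating codons is read
    if next_codon in ('TAA', 'TAG', 'TGT'):
        break
print(codon_list)
>>> ['ATG', 'AAA', 'TAA']
```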

It only works for the first occurrence of 'ATG'.

The next part is where I want to give each list of codons a name (0, 1, 2, 3, ...) and record its position, and I think I figured that part out:

```indx = range(0, len(codon_list))

indx_codon = dict(zip(indx, codon_list))
# indx_codon is now:
# {0: ['ATG','AAA','TAA'], 1: ['ATG','GGG','TAG'], 2: ['ATG','ATG','TGT']}

codon_start = ['2','13','23']
codon_end = ['8','21','31']
codon_positions = []

# Join each start/end pair into a 'start:end' string
for p, q in zip(codon_start, codon_end):
    codon_positions.append(str(p) + ':' + str(q))

print(codon_positions)
>>> ['2:8', '13:21', '23:31']
```

So my biggest problem is that `.find()` only returns the first occurrence. In addition, my indexing breaks if a 'TAA', 'TAG', or 'TGT' appears before the 'ATG', since 'ATG' is what is supposed to start the reading frame of 3.

How can I create a list of multiple sequences that follow these criteria (i.e. turn sequence into codon_list)?

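Here is a fairly concise solution using regular expressions. The pattern matches 'ATG', then lazily consumes as few whole three-letter codons as possible (`(?:...)*?`), and ends at the first in-frame 'TAA', 'TAG', or 'TGT'. Because `re.findall` returns non-overlapping matches, scanning resumes after each terminating codon, and any 'TAA', 'TAG', or 'TGT' that appears before an 'ATG' is simply skipped: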

```import re
sequence = 'AAATGAAATAAGGATGGGGTAGTATGATGTGTTT'
codons = re.findall(r'ATG(?:...)*?(?:TAA|TAG|TGT)', sequence)
codon_list = [[s[i:i+3] for i in range(0, len(s), 3)] for s in codons]
```

Result:

```>>> codon_list
[['ATG', 'AAA', 'TAA'], ['ATG', 'GGG', 'TAG'], ['ATG', 'ATG', 'TGT']]
```