
Translate My Sequence?

I have to write a script to translate this sequence: dict = {'TTT':'F|Phe','TTC':'F|Phe','TTA':'L|Leu','TTG':'L|Leu','TCT':'S|Ser','TCC':'S|Ser', 'TCA':'S|Ser','TCG':

Solution 1:

You can use the following (note that this would be much easier with Biopython's translate method):

dicti = {your dictionary here}

def translate(seq):
    x = 0
    aaseq = []
    while True:
        try:
            # Look up the next codon; an empty or incomplete slice at the end
            # of the sequence is not in the table and raises KeyError.
            aaseq.append(dicti[seq[x:x+3]])
            x += 3
        except (IndexError, KeyError):
            break
    return aaseq

seq = "TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA"
for frame in range(3):
    print('+%i' %(frame+1), ''.join(item.split('|')[1] for item in translate(seq[frame:])))

Note that I renamed your dictionary to dicti so that it does not overwrite the built-in dict.


Some comments to help you understand:

translate takes your sequence and returns a list in which each item is the amino acid encoded by the triplet at that position. For example:

aaseq = ["L|Leu","L|Leu","P|Pro", ....]

You could process this data further inside translate (for example, keep only the one-letter or the three-letter code), or return it as it is and process it later, as I have done.
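As a rough sketch of the first option (doing the processing inside translate), the variant below takes a hypothetical letters argument that selects the one-letter or the three-letter code. The parameter name, the tiny codon_table subset and the step-by-three loop are assumptions made for illustration; they are not part of the code above.

```python
# Small subset of the codon table from the question, just enough for the demo.
codon_table = {
    "TTT": "F|Phe", "CAA": "Q|Gln", "TAC": "Y|Tyr", "TAG": "*|Stp",
}

def translate(seq, table, letters=1):
    """Translate seq codon by codon; letters=1 gives 'F', letters=3 gives 'Phe'."""
    index = 0 if letters == 1 else 1  # position of the wanted code in 'F|Phe'
    aaseq = []
    # range stops before a trailing incomplete codon.
    for x in range(0, len(seq) - 2, 3):
        aaseq.append(table[seq[x:x + 3]].split("|")[index])
    return aaseq

one_letter = "".join(translate("TTTCAATACTAG", codon_table, letters=1))
three_letter = "".join(translate("TTTCAATACTAG", codon_table, letters=3))
print(one_letter)    # FQY*
print(three_letter)  # PheGlnTyrStp
```

Returning the raw "X|Xxx" strings, as the original translate does, keeps the function simpler and leaves the choice of code to the caller.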

translate is called in

''.join(item.split('|')[1] for item in translate(seq[frame:]))

for each frame. For frame values 0, 1 and 2 it passes seq[frame:] to translate. That is, the sequences for the three reading frames are translated one after another. Then, in

item.split('|')[1]

I split each entry into its one-letter and three-letter codes and take the item at index 1 (the second one, i.e. the three-letter code). The results are then joined into a single string.
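To make the two steps concrete, here is a minimal sketch on a short made-up sequence. The sequence and the hand-written aaseq list are assumptions for illustration; aaseq stands in for what translate would return for the first frame.

```python
seq = "ATGGCCTAAG"

# seq[frame:] drops the first 0, 1 or 2 bases, which shifts the reading frame.
frames = [seq[frame:] for frame in range(3)]
print(frames)  # ['ATGGCCTAAG', 'TGGCCTAAG', 'GGCCTAAG']

# Stand-in for the list translate returns for frame 0 (ATG, GCC, TAA).
aaseq = ["M|Met", "A|Ala", "*|Stp"]

# split('|') turns 'M|Met' into ['M', 'Met']; index 1 is the three-letter code.
three_letter = "".join(item.split("|")[1] for item in aaseq)
one_letter = "".join(item.split("|")[0] for item in aaseq)
print(three_letter)  # MetAlaStp
print(one_letter)    # MA*
```

Switching the index from 1 to 0 is all it takes to print the one-letter code instead.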

Solution 2:

Not too pretty, but it does what you want:

dct = {"TTT":"F|Phe","TTC":"F|Phe","TTA":"L|Leu","TTG":"L|Leu","TCT":"S|Ser","TCC":"S|Ser", 
"TCA":"S|Ser","TCG":"S|Ser", "TAT":"Y|Tyr","TAC":"Y|Tyr","TAA":"*|Stp","TAG":"*|Stp", 
"TGT":"C|Cys","TGC":"C|Cys","TGA":"*|Stp","TGG":"W|Trp", "CTT":"L|Leu","CTC":"L|Leu", 
"CTA":"L|Leu","CTG":"L|Leu","CCT":"P|Pro","CCC":"P|Pro","CCA":"P|Pro","CCG":"P|Pro", 
"CAT":"H|His","CAC":"H|His","CAA":"Q|Gln","CAG":"Q|Gln","CGT":"R|Arg","CGC":"R|Arg", 
"CGA":"R|Arg","CGG":"R|Arg", "ATT":"I|Ile","ATC":"I|Ile","ATA":"I|Ile","ATG":"M|Met", 
"ACT":"T|Thr","ACC":"T|Thr","ACA":"T|Thr","ACG":"T|Thr", "AAT":"N|Asn","AAC":"N|Asn", 
"AAA":"K|Lys","AAG":"K|Lys","AGT":"S|Ser","AGC":"S|Ser","AGA":"R|Arg","AGG":"R|Arg", 
"GTT":"V|Val","GTC":"V|Val","GTA":"V|Val","GTG":"V|Val","GCT":"A|Ala","GCC":"A|Ala", 
"GCA":"A|Ala","GCG":"A|Ala", "GAT":"D|Asp","GAC":"D|Asp","GAA":"E|Glu", 
"GAG":"E|Glu","GGT":"G|Gly","GGC":"G|Gly","GGA":"G|Gly","GGG":"G|Gly"}


seq = "TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA"

def get_amino_list(s):
    for y in range(3):
        yield [s[x:x+3] for x in range(y, len(s) - 2, 3)]

for n, amn in enumerate(get_amino_list(seq), 1):
    print("+%d " % n + "".join(dct[x][2:] for x in amn))

print(seq)
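The generator relies on range(y, len(s) - 2, 3) to produce only complete codons. The small check below, on a made-up eight-base sequence (an assumption for illustration), shows which triplets each frame yields and that the trailing leftover bases are dropped.

```python
def get_amino_list(s):
    # One list of codons per reading frame (offsets 0, 1 and 2).
    for y in range(3):
        # len(s) - 2 is one past the last index where a full codon can start.
        yield [s[x:x + 3] for x in range(y, len(s) - 2, 3)]

codons_per_frame = list(get_amino_list("TTTCAATA"))
for offset, codons in enumerate(codons_per_frame):
    print(offset, codons)
# 0 ['TTT', 'CAA']
# 1 ['TTC', 'AAT']
# 2 ['TCA', 'ATA']
```

Because every yielded item has exactly three bases, the later dct[x] lookup cannot fail on a partial codon.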

Solution 3:

Here's my solution. I've called your "dict" variable "aminos". The function method3 returns a list of the values to the right of the "|". To merge them into a single string, just join them on "".

From looking at your code, I believe that your aminos dict contains all possible three-letter combinations. Therefore, I've removed the checks that verify this. It should run a lot faster as a result.

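def overlapping_groups(seq, group_len=3):
    """Yields consecutive groups of `group_len` items (codons) from a sequence.
    """
    # Step by group_len so each base is read once per frame; the +1 keeps
    # the last complete group.
    for i in range(0, len(seq) - group_len + 1, group_len):
        yield seq[i:i+group_len]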

def method3(seq, aminos):
    return [aminos[k][2:] for k in overlapping_groups(seq, 3)]

for i in range(3):
    print("%d: %s" % (i, "".join(method3(seq[i:], aminos))))
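Dropping the checks is safe only while every triplet is made of A, C, G and T and is three bases long. If a sequence might contain an ambiguous base such as N, a plain aminos[k] lookup raises KeyError. A minimal sketch of one way to tolerate that with dict.get follows; the helper name lookup_codon, the "X|Xaa" placeholder and the small aminos subset are assumptions for illustration, not part of the answer above.

```python
# Small subset of the codon table, just enough for the demo.
aminos = {"TTT": "F|Phe", "CAA": "Q|Gln", "TAG": "*|Stp"}

def lookup_codon(codon, table, unknown="X|Xaa"):
    """Return the table entry for codon, or a placeholder if it is missing."""
    return table.get(codon, unknown)

# 'CNA' contains an ambiguous base and 'TA' is an incomplete codon.
codons = ["TTT", "CNA", "TAG", "TA"]
result = "".join(lookup_codon(c, aminos)[2:] for c in codons)
print(result)  # PheXaaStpXaa
```

The extra lookup cost is small, so this is mainly a trade-off between failing loudly on bad input and producing a marked-up translation.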
