gff_toolkit package

Submodules

gff_toolkit.gff module

class gff_toolkit.gff.Gff(*args, **kwargs)

Bases: object

Work in progress: holds GffSubPart objects.

add_fasta(filename)
Parameters: filename – FASTA-formatted DNA sequence file
Returns:
fix_orf(subfeature, starts=('ATG', 'CTG'), stops=('TAA', 'TGA', 'TAG'), min_len=6)

Finds longest ORF in spliced transcript.

Parameters:
  • subfeature – GffSubPart object
  • starts – list/tuple with start codons
  • stops – list/tuple with stop codons
  • min_len – minimum ORF length
Returns:

True if ORF is found
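A minimal sketch of a longest-ORF search, assuming the transcript has already been spliced into one forward-strand DNA string and that min_len counts nucleotides including the stop codon. The name longest_orf and its (begin, end) return value are hypothetical; this illustrates the technique and is not gff_toolkit's fix_orf implementation.

```python
from typing import Optional, Sequence, Tuple


def longest_orf(
    seq: str,
    starts: Sequence[str] = ("ATG", "CTG"),
    stops: Sequence[str] = ("TAA", "TGA", "TAG"),
    min_len: int = 6,
) -> Optional[Tuple[int, int]]:
    """Return (begin, end) of the longest ORF as a 0-based half-open slice
    (stop codon included), or None if no ORF of at least min_len nt exists.

    Hypothetical helper: only the three forward reading frames are scanned.
    """
    seq = seq.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        open_start: Optional[int] = None  # first start codon of the current ORF
        # i + 3 <= len(seq), so every codon examined is complete
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if open_start is None and codon in starts:
                open_start = i
            elif open_start is not None and codon in stops:
                end = i + 3
                length = end - open_start
                if length >= min_len and (best is None or length > best[1] - best[0]):
                    best = (open_start, end)
                open_start = None  # look for the next start after this stop
    return best
```

For example, longest_orf("CCATGAAATTTTAGCC") returns (2, 14), the slice covering ATGAAATTTTAG. Keeping only the first start codon seen in each open stretch is what makes the result the longest ORF in that frame.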

get_children(key, reverse=False, featuretype=None, seen=None)
Parameters:
  • key – subfeature ID or subfeature object
  • reverse – reverses the return order: reverse=True returns CDS->mRNA->gene; reverse=False returns gene->mRNA->CDS
  • featuretype – string or list with featuretypes to be returned
Returns:

nested generator of subfeature objects

TODO: prevent the same subfeature from being yielded twice.
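The reverse flag amounts to choosing between a pre-order and a post-order depth-first walk. The sketch below shows both over a plain parent -> children dict of IDs (a hypothetical stand-in for the children attribute) and uses a seen set to skip subfeatures reachable through more than one parent, which is one way the double-yield TODO could be handled. Whether get_children's own seen parameter is meant for this is an assumption; nested is a hypothetical name.

```python
from typing import Dict, Iterator, List, Optional, Set


def nested(tree: Dict[str, List[str]], node: str, reverse: bool = False,
           seen: Optional[Set[str]] = None) -> Iterator[str]:
    """Depth-first walk over a parent -> children mapping of IDs.

    reverse=False: pre-order, parents before children (gene -> mRNA -> CDS).
    reverse=True: post-order, children before parents (CDS -> mRNA -> gene).
    `seen` is shared across the recursion so no ID is yielded twice.
    """
    if seen is None:
        seen = set()
    if node in seen:
        return
    seen.add(node)
    if not reverse:
        yield node
    for child in tree.get(node, []):
        yield from nested(tree, child, reverse, seen)
    if reverse:
        yield node
```

With tree = {"gene1": ["mRNA1", "mRNA2"], "mRNA1": ["exon1"], "mRNA2": ["exon1"]}, list(nested(tree, "gene1")) yields exon1 only once even though both transcripts list it.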

get_parents(key, reverse=True, featuretype=None)
getclosest(seqid, pos, strand=None, featuretype=None)
Parameters:
  • seqid – Scaffold name (string) [REQUIRED]
  • pos – Position on scaffold (int) [REQUIRED]
  • strand – search on the positive strand, the negative strand, or both ('+', '-', or None for both). Default is both
  • featuretype – returned featuretypes (string or list of strings)
Returns:

GffSubPart closest to given position on given scaffold
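One way to define "closest" is the gap between pos and the nearer edge of a feature, with a distance of 0 when pos falls inside it. The sketch below assumes that definition and uses a hypothetical Feature tuple in place of GffSubPart; getclosest itself may measure distance differently, for instance from the feature start only.

```python
from typing import Iterable, NamedTuple, Optional, Sequence, Union


class Feature(NamedTuple):
    # Hypothetical stand-in for GffSubPart: 1-based, inclusive coordinates.
    seqid: str
    start: int
    end: int
    strand: str
    featuretype: str


def distance(feature: Feature, pos: int) -> int:
    """0 if pos lies inside the feature, else the gap to its nearest edge."""
    if feature.start <= pos <= feature.end:
        return 0
    return min(abs(feature.start - pos), abs(feature.end - pos))


def closest(features: Iterable[Feature], seqid: str, pos: int,
            strand: Optional[str] = None,
            featuretype: Union[str, Sequence[str], None] = None) -> Optional[Feature]:
    """Return the feature nearest to pos on seqid, or None if nothing matches."""
    if isinstance(featuretype, str):
        featuretype = [featuretype]
    candidates = [
        f for f in features
        if f.seqid == seqid
        and (strand is None or f.strand == strand)
        and (featuretype is None or f.featuretype in featuretype)
    ]
    # Ties keep the first candidate in input order
    return min(candidates, key=lambda f: distance(f, pos), default=None)
```

With genes on scaf1 at 100..200 (+) and 500..600 (-), a query at position 450 returns the second gene, or the first one when strand='+' is given.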

getitems(seqid=None, start=None, end=None, strand=None, featuretype=None)
Parameters:
  • seqid – Scaffold name (string)
  • start – left-bound position (int)
  • end – right-bound position (int)
  • strand – search on the positive strand, the negative strand, or both ('+', '-', or None for both). Default is both
  • featuretype – returned featuretypes (string or list of strings)
Returns:

Generator yielding all GffSubParts that match the given criteria (interval, strand, featuretype)
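getitems combines several optional filters, any of which may be None. A sketch of that pattern as a generator over a hypothetical Feature tuple; it assumes "within" means overlapping the start..end window (both 1-based, inclusive), whereas the real method may require full containment. iter_items is a hypothetical name.

```python
from typing import Iterator, NamedTuple, Optional, Sequence, Union


class Feature(NamedTuple):
    # Hypothetical stand-in for GffSubPart (1-based, inclusive coordinates).
    seqid: str
    start: int
    end: int
    strand: str
    featuretype: str


def iter_items(features: Sequence[Feature], seqid: Optional[str] = None,
               start: Optional[int] = None, end: Optional[int] = None,
               strand: Optional[str] = None,
               featuretype: Union[str, Sequence[str], None] = None) -> Iterator[Feature]:
    """Lazily yield features matching every filter that is not None."""
    if isinstance(featuretype, str):
        featuretype = [featuretype]
    for f in features:
        if seqid is not None and f.seqid != seqid:
            continue
        if start is not None and f.end < start:  # entirely left of the window
            continue
        if end is not None and f.start > end:    # entirely right of the window
            continue
        if strand is not None and f.strand != strand:
            continue
        if featuretype is not None and f.featuretype not in featuretype:
            continue
        yield f
```

Because nothing is materialised up front, a caller can stop after the first hit, e.g. next(iter_items(features, seqid="s1", featuretype="CDS"), None).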

getseq(feature=None, subfeaturetype=None, topfeaturetype=None)

Deprecated: superseded by a combination of GffSubPart properties: seq, pep and siblings.

remove(key, nested=True)
Parameters:
  • key – string or GffSubPart or list
  • nested – bool
Returns:

None

set_children()

Sets the children attribute of all subfeatures.
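In GFF3 a feature points to its parent(s) through the Parent attribute, so a children attribute has to be derived by inverting that relation. A minimal sketch over plain ID strings, assuming set_children does something equivalent; build_children is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence


def build_children(parents: Mapping[str, Sequence[str]]) -> Dict[str, List[str]]:
    """Invert an ID -> [parent IDs] mapping into parent ID -> [child IDs].

    A GFF3 feature may list several parents, so one child can appear
    under more than one parent.
    """
    children: Dict[str, List[str]] = defaultdict(list)
    for child_id, parent_ids in parents.items():
        for parent_id in parent_ids:
            children[parent_id].append(child_id)
    return dict(children)
```

Children keep the order in which they appear in the input, which for a parsed file is file order.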

split()
stringify()
Parameters: None
Returns: Entire Gff object as a GFF-formatted string
typecounts()
Parameters: None
Returns: Dictionary with counts per GffSubPart type
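typecounts amounts to a tally of column 3 (the feature type) over all features. A standalone sketch that works on raw GFF lines rather than a Gff object; skipping '#' and blank lines, and assuming no ##FASTA section follows the features, are simplifications of this sketch.

```python
from collections import Counter
from typing import Dict, Iterable


def typecounts(gff_lines: Iterable[str]) -> Dict[str, int]:
    """Count features per type, read from column 3 of tab-separated GFF lines."""
    counts = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line, comment or ## directive
        counts[line.split("\t")[2]] += 1
    return dict(counts)
```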
uniqueID
update(subfeature)
Parameters: subfeature – GffSubPart object
Returns:
write_tbl()

Parameters: None
Returns: .tbl-formatted string

gff_toolkit.gffsubpart module

exception gff_toolkit.gffsubpart.CoordinateError

Bases: exceptions.Exception

class gff_toolkit.gffsubpart.GffSubPart(*args, **kwargs)

Bases: object

Work in progress: contained by a Gff object. Basically this is one line of a GFF file, with parents/children indicated by ID, plus some methods to determine overlap with other GffSubPart instances.

TODO:
  • work on a getter/setter for seq
  • work on a getter/setter for ID; RATT uses locus_tag as the ID of transferred annotations, which is currently handled by parser.py
  • make this a sort of 'base class' (for instance, only mRNA should have a pep option); do this by subclassing

ID
exact_match(other)
forward
fromdict(dic)
get_children(*args, **kwargs)
get_end()
Returns: end if strand == '+' else start
get_start()
Returns: start if strand == '+' else end
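GFF always stores start <= end regardless of strand, so the biological start of a minus-strand feature is its higher coordinate. The documented behaviour of get_start/get_end, restated as free functions (a sketch; the real methods read these values from the GffSubPart itself). As documented, any strand other than '+', including '.', takes the else branch.

```python
def get_start(start: int, end: int, strand: str) -> int:
    """Biological start: the lower coordinate on '+', the higher one otherwise."""
    return start if strand == "+" else end


def get_end(start: int, end: int, strand: str) -> int:
    """Biological end: the higher coordinate on '+', the lower one otherwise."""
    return end if strand == "+" else start
```

For a '-' strand CDS stored as 100..200, translation begins at 200, which is what get_start(100, 200, "-") reports.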
getattribute(key)
getnested(reverse=False, featuretype=None)

Generator yielding all nested subfeatures. reverse=True starts at the tips of the tree: CDS/exon -> mRNA -> gene. reverse=False starts at the top of the tree: gene -> mRNA -> CDS/exon.

gff_fields
gtf_fields
match(other)
parents
pep
reverse
seq

    if cds.start + 1 == cds.end and index == len(children) - 1:
        cds.seq = self.seq[cds.seqid][cds.start-1]
    else:
        cds.seq = self.seq[cds.seqid][cds.start-1:cds.end]
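The slice [cds.start-1:cds.end] converts GFF's 1-based, inclusive coordinates into a Python half-open slice. The sketch below applies the same conversion and concatenates CDS segments into a spliced sequence; reverse-complementing '-' strand features is an assumption about how seq and pep handle strand, and segment/spliced are hypothetical names.

```python
from typing import Dict, List, Tuple

# Complement table for upper/lower-case DNA; N maps to itself
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def segment(genome: Dict[str, str], seqid: str, start: int, end: int) -> str:
    """Slice a 1-based, inclusive GFF interval out of a Python string."""
    return genome[seqid][start - 1:end]


def spliced(genome: Dict[str, str], seqid: str,
            segments: List[Tuple[int, int]], strand: str = "+") -> str:
    """Concatenate segments in coordinate order; reverse-complement on '-'."""
    seq = "".join(segment(genome, seqid, s, e) for s, e in sorted(segments))
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq
```

With genome = {"s1": "AAATGCCCGGGTAAA"}, segment(genome, "s1", 3, 5) returns "ATG": GFF positions 3 to 5 map to Python indices 2, 3 and 4.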
set_attribute(key, value)
set_end(value)

Used in Gff._change_cds to change start/stop based on strand.

Parameters: value – int

set_start(value)

Used in Gff._change_cds to change start/stop based on strand.

Parameters: value – int

siblings

Returns a sorted list of siblings.

stringify(filetype='gff')

Returns the subfeature as a GFF-style, tab-separated string.
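A GFF3 line has nine tab-separated columns: seqid, source, type, start, end, score, strand, phase and attributes, with '.' for empty values and attributes written as key=value pairs separated by ';'. A minimal sketch of rendering one such line; gff_line is a hypothetical name, and the percent-encoding GFF3 requires for reserved characters is left out.

```python
from typing import Dict, Optional


def gff_line(seqid: str, source: str, featuretype: str, start: int, end: int,
             strand: str, attributes: Dict[str, str],
             score: Optional[float] = None, phase: Optional[int] = None) -> str:
    """Render one GFF3 feature line as nine tab-separated columns.

    Missing score/phase are written as '.'; attributes keep dict order.
    """
    columns = [
        seqid, source, featuretype, str(start), str(end),
        "." if score is None else str(score),
        strand,
        "." if phase is None else str(phase),
        ";".join(f"{key}={value}" for key, value in attributes.items()),
    ]
    return "\t".join(columns)
```

For example, gff_line("s1", "src", "CDS", 3, 11, "+", {"ID": "cds1", "Parent": "mRNA1"}, phase=0) gives s1, src, CDS, 3, 11, ., +, 0 and ID=cds1;Parent=mRNA1 joined by tabs.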

target
todict()

Returns the subfeature as a dict.

exception gff_toolkit.gffsubpart.IDError

Bases: exceptions.Exception

exception gff_toolkit.gffsubpart.SeqError

Bases: exceptions.Exception

exception gff_toolkit.gffsubpart.TranslateError

Bases: exceptions.Exception

gff_toolkit.parser module

class gff_toolkit.parser.Parser(gff_file, filetype='standard', fasta_file=None, remove_noncoding=False, limit=None, author=None)

Bases: object

Parser for GFF3-formatted annotation files. The parse() method returns the processed Gff object. For known incorrectly formatted files it has dedicated methods, _ratt() and _manual(), which are selected through the filetype argument.

parse()
gff_toolkit.parser.fasta_iter(fasta_file)
gff_toolkit.parser.linereader(gff_file, strict=True)

Reads a single GFF-formatted line.
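Reading a line is the inverse of stringify. A sketch assuming linereader splits the nine columns, converts start/end to integers and parses the attributes; what strict does is not documented, so raising ValueError on a wrong column count (and returning None when strict is False) is purely an assumption. read_gff_line is a hypothetical name.

```python
from typing import Any, Dict, Optional

FIELDS = ("seqid", "source", "featuretype", "start", "end",
          "score", "strand", "phase", "attributes")


def read_gff_line(line: str, strict: bool = True) -> Optional[Dict[str, Any]]:
    """Split one GFF3 line into named fields (hypothetical helper).

    Returns None for blank lines and '#' comment/directive lines. A line
    without exactly nine tab-separated columns raises ValueError when
    strict is True and is skipped (None) otherwise.
    """
    line = line.rstrip("\r\n")
    if not line.strip() or line.startswith("#"):
        return None
    parts = line.split("\t")
    if len(parts) != 9:
        if strict:
            raise ValueError(f"expected 9 columns, got {len(parts)}: {line!r}")
        return None
    record: Dict[str, Any] = dict(zip(FIELDS, parts))
    record["start"], record["end"] = int(parts[3]), int(parts[4])
    # Attributes: 'key=value' pairs separated by ';' (no percent-decoding here)
    record["attributes"] = {
        key: value
        for key, _, value in (pair.partition("=") for pair in parts[8].split(";"))
        if key
    }
    return record
```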

gff_toolkit.parser.parser(gff_file, filetype='standard', fasta_file=None, remove_noncoding=False, limit=None, author=None)

Wrapper function for Parser object.

gff_toolkit.test module

docstring for testing

gff_toolkit.test.test()

Module contents