gff_toolkit package¶

Submodules¶

gff_toolkit.gff module¶

class gff_toolkit.gff.Gff(*args, **kwargs)¶

Bases: object

Work in progress: holds GffSubPart objects

add_fasta(filename)¶

Parameters:	filename – FASTA-formatted DNA sequence file
Returns:

fix_orf(subfeature, starts=('ATG', 'CTG'), stops=('TAA', 'TGA', 'TAG'), min_len=6)¶

Finds longest ORF in spliced transcript.

Parameters:
	subfeature – GffSubPart instance
	starts – list/tuple with start codons
	stops – list/tuple with stop codons
	min_len – minimum ORF length
Returns:	True if ORF is found
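The idea behind a longest-ORF search can be sketched on a plain string. This is a minimal illustration, not the code behind fix_orf: the function name longest_orf is hypothetical, only the forward strand is scanned, and min_len is assumed to be counted in nucleotides including the stop codon.

```python
def longest_orf(seq, starts=("ATG", "CTG"), stops=("TAA", "TGA", "TAG"), min_len=6):
    """Return (start, end) of the longest ORF in seq, or None if there is none.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    Assumption: min_len is measured in nucleotides.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        orf_start = None  # first start codon seen since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon in starts:
                orf_start = i
            elif orf_start is not None and codon in stops:
                length = i + 3 - orf_start
                if length >= min_len and (best is None or length > best[1] - best[0]):
                    best = (orf_start, i + 3)
                orf_start = None  # look for the next ORF in this frame
    return best


print(longest_orf("CCATGAAATAGCC"))  # (2, 11), i.e. ATGAAATAG
```

Each of the three reading frames is scanned independently, so an ORF is only reported when its start and stop codons are in frame with each other.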

get_children(key, reverse=False, featuretype=None, seen=None)¶

Parameters:
	key – subfeature ID or subfeature object
	reverse – reverses the return order: reverse=True returns CDS->mRNA->gene, reverse=False returns gene->mRNA->CDS
	featuretype – string or list with featuretypes to be returned
Returns:	nested generator of subfeature objects

TODO: add something that prevents double yields
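One common way to avoid double yields in a nested traversal is to thread a set of already visited IDs through the recursion. The sketch below shows the pattern on a plain dictionary; iter_nested and the children mapping are hypothetical stand-ins, not the library's data structures.

```python
def iter_nested(feature_id, children, reverse=False, seen=None):
    """Yield feature_id and all of its descendants, each at most once.

    children maps a feature ID to a list of child IDs (assumed layout).
    reverse=False yields parents before children, reverse=True the opposite.
    """
    if seen is None:
        seen = set()
    if feature_id in seen:
        return  # already yielded through another parent
    seen.add(feature_id)
    if not reverse:
        yield feature_id
    for child_id in children.get(feature_id, ()):
        yield from iter_nested(child_id, children, reverse, seen)
    if reverse:
        yield feature_id


# exon1 is shared by two transcripts but is yielded only once.
tree = {"gene1": ["mRNA1", "mRNA2"], "mRNA1": ["exon1"], "mRNA2": ["exon1"]}
print(list(iter_nested("gene1", tree)))
# ['gene1', 'mRNA1', 'exon1', 'mRNA2']
```

Creating the set inside the function when seen is None avoids the shared-mutable-default pitfall, so separate top-level calls do not influence each other.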

get_parents(key, reverse=True, featuretype=None)¶

getclosest(seqid, pos, strand=None, featuretype=None)¶

Parameters:
	seqid – scaffold name (string) [REQUIRED]
	pos – position on scaffold (int) [REQUIRED]
	strand – strand to search: '+', '-' or None for both. Default is both
	featuretype – returned featuretypes (string or list of strings)
Returns:	GffSubPart closest to given position on given scaffold
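A closest-feature lookup of this kind can be sketched as a filter followed by a minimum over distances. The Feature tuple, the distance definition (0 inside the feature, otherwise the gap to the nearest edge) and the linear scan are all assumptions made for illustration; the library may use a different distance or an index.

```python
from collections import namedtuple

# Hypothetical stand-in for a GffSubPart, holding only the fields used here.
Feature = namedtuple("Feature", "seqid start end strand featuretype")


def distance(feature, pos):
    """Distance from pos to the feature; 0 when pos lies inside it."""
    if feature.start <= pos <= feature.end:
        return 0
    return min(abs(feature.start - pos), abs(feature.end - pos))


def closest(features, seqid, pos, strand=None, featuretype=None):
    """Return the feature nearest to pos on seqid, or None if nothing matches."""
    if isinstance(featuretype, str):
        featuretype = [featuretype]
    candidates = [
        f for f in features
        if f.seqid == seqid
        and (strand is None or f.strand == strand)
        and (featuretype is None or f.featuretype in featuretype)
    ]
    return min(candidates, key=lambda f: distance(f, pos), default=None)


features = [
    Feature("scaf1", 100, 200, "+", "gene"),
    Feature("scaf1", 500, 600, "-", "gene"),
]
print(closest(features, "scaf1", 300).start)              # 100
print(closest(features, "scaf1", 300, strand="-").start)  # 500
```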

getitems(seqid=None, start=None, end=None, strand=None, featuretype=None)¶

Parameters:
	seqid – scaffold name (string)
	start – leftbound position (int)
	end – rightbound position (int)
	strand – strand to search: '+', '-' or None for both. Default is both
	featuretype – returned featuretypes (string or list of strings)
Returns:	Generator object containing all GffSubParts within the given specs (interval, featuretype)
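The interval part of such a query can be sketched as a lazy filter. The example assumes that any overlap with the requested interval counts as a match, which the description above leaves open, and it uses plain dictionaries and a hypothetical function name instead of GffSubPart objects.

```python
def items_in_range(features, seqid=None, start=None, end=None):
    """Lazily yield features that overlap [start, end] on seqid.

    Any argument left as None is not used as a filter.
    Assumption: overlap, not full containment, decides a match.
    """
    for feature in features:
        if seqid is not None and feature["seqid"] != seqid:
            continue
        if start is not None and feature["end"] < start:
            continue  # feature lies entirely left of the interval
        if end is not None and feature["start"] > end:
            continue  # feature lies entirely right of the interval
        yield feature


features = [
    {"seqid": "scaf1", "start": 100, "end": 200},
    {"seqid": "scaf1", "start": 500, "end": 600},
]
hits = list(items_in_range(features, seqid="scaf1", start=150, end=300))
print([f["start"] for f in hits])  # [100]
```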

getseq(feature=None, subfeaturetype=None, topfeaturetype=None)¶: Replaced by a combination of GffSubPart properties: seq, pep and siblings

remove(key, nested=True)¶

Parameters:
	key – string or GffSubPart or list
	nested – bool
Returns:	None

set_children()¶: Sets the children attribute of all subfeatures

split()¶

stringify()¶

Parameters:	None
Returns:	Entire Gff object as gff formatted string

typecounts()¶

Parameters:	None
Returns:	Dictionary with counts per GffSubPart type

uniqueID¶

update(subfeature)¶

Parameters:	subfeature – GffSubFeature object
Returns:

write_tbl()¶: Takes no arguments. Returns a .tbl formatted string

gff_toolkit.gffsubpart module¶

exception gff_toolkit.gffsubpart.CoordinateError¶: Bases: exceptions.Exception

class gff_toolkit.gffsubpart.GffSubPart(*args, **kwargs)¶

Bases: object

Work in progress. Contained by a Gff object; basically this is one line of a GFF file, with parents and children indicated by ID. Has some methods to determine overlap with other GffSubPart instances.

TODO:
	Work on getter/setter for seq.
	Work on getter/setter for ID. ratt uses locus_tag for the ID of transferred annotations; currently this is handled by parser.py.
	Make this a sort of 'base class'; for instance, only mRNA should have a pep option. Do this by subclassing.

ID¶

exact_match(other)¶

forward¶

fromdict(dic)¶

get_children(*args, **kwargs)¶

get_end()¶

Returns:	end if strand == + else start

get_start()¶

Returns:	start if strand == + else end
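The strand-aware accessors described above can be written as two small functions. This sketch takes the coordinates as plain arguments instead of reading them from a GffSubPart, so the signatures differ from the methods documented here.

```python
def get_start(start, end, strand):
    """Biological start: the left coordinate on '+', the right one otherwise."""
    return start if strand == "+" else end


def get_end(start, end, strand):
    """Biological end: the right coordinate on '+', the left one otherwise."""
    return end if strand == "+" else start


print(get_start(100, 200, "+"), get_end(100, 200, "+"))  # 100 200
print(get_start(100, 200, "-"), get_end(100, 200, "-"))  # 200 100
```

GFF always stores start <= end regardless of strand, which is why a feature on the minus strand begins, in transcription order, at its end coordinate.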

getattribute(key)¶

getnested(reverse=False, featuretype=None)¶: Generator that yields all nested subfeatures. reverse=True starts at the tips of the tree: CDS/exon -> mRNA -> gene. reverse=False starts at the top of the tree: gene -> mRNA -> CDS/exon.

gff_fields¶

gtf_fields¶

match(other)¶

parents¶

pep¶

reverse¶

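seq¶: The docstring consists of a code fragment showing how a CDS sequence is sliced from the scaffold sequence:

	if cds.start + 1 == cds.end and index == len(children) - 1:
		cds.seq = self.seq[cds.seqid][cds.start-1]
	else:
		cds.seq = self.seq[cds.seqid][cds.start-1:cds.end]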
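The slicing in this fragment converts GFF coordinates, which are 1-based and inclusive at both ends, into a Python slice, which is 0-based and end-exclusive. A minimal sketch of that conversion, using a hypothetical helper on a plain string:

```python
def slice_feature(sequence, start, end):
    """Return the bases covered by a 1-based, inclusive [start, end] interval."""
    # Shift start down by one for 0-based indexing; end stays unchanged
    # because the inclusive 1-based end equals the exclusive 0-based end.
    return sequence[start - 1:end]


print(slice_feature("ACGTACGT", 2, 4))  # CGT
```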

set_attribute(key, value)¶

set_end(value)¶: Used in Gff._change_cds to change start/stop based on strand

Parameters:	value – int

set_start(value)¶: Used in Gff._change_cds to change start/stop based on strand

Parameters:	value – int

siblings¶: return sorted list of siblings

stringify(filetype='gff')¶: Return gff style tab separated string

target¶

todict()¶: Return a dict

exception gff_toolkit.gffsubpart.IDError¶: Bases: exceptions.Exception

exception gff_toolkit.gffsubpart.SeqError¶: Bases: exceptions.Exception

exception gff_toolkit.gffsubpart.TranslateError¶: Bases: exceptions.Exception

gff_toolkit.parser module¶

class gff_toolkit.parser.Parser(gff_file, filetype='standard', fasta_file=None, remove_noncoding=False, limit=None, author=None)¶

Bases: object

Parser object for GFF3-formatted annotation files. The parse() method returns the processed Gff object. Has several methods for known incorrectly formatted files, _ratt() and _manual(), which are called by specifying filetype

parse()¶

gff_toolkit.parser.fasta_iter(fasta_file)¶

gff_toolkit.parser.linereader(gff_file, strict=True)¶: Reads a single gff formatted line
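What reading a single GFF line involves can be sketched with the standard nine-column GFF3 layout. This is not the code behind linereader: parse_gff_line, GFF_FIELDS and the returned dictionary are hypothetical, and details such as URL-decoding or multi-valued attributes are left out.

```python
# The nine tab-separated columns defined by the GFF3 format.
GFF_FIELDS = ("seqid", "source", "type", "start", "end",
              "score", "strand", "phase", "attributes")


def parse_gff_line(line):
    """Parse one GFF3 line into a dict; return None for blanks and comments."""
    if not line.strip() or line.startswith("#"):
        return None
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(columns))
    record = dict(zip(GFF_FIELDS, columns))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Column 9 holds semicolon-separated key=value pairs such as ID and Parent.
    attributes = {}
    for pair in record["attributes"].strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()
    record["attributes"] = attributes
    return record


record = parse_gff_line("scaf1\tsrc\tmRNA\t100\t200\t.\t+\t.\tID=mRNA1;Parent=gene1\n")
print(record["type"], record["start"], record["attributes"]["Parent"])
# mRNA 100 gene1
```

The ID and Parent attributes are what link a line to its parent and child features.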

gff_toolkit.parser.parser(gff_file, filetype='standard', fasta_file=None, remove_noncoding=False, limit=None, author=None)¶: Wrapper function for Parser object.

gff_toolkit.test module¶

docstring for testing

gff_toolkit.test.test()¶

Module contents¶