gff_toolkit package¶
Submodules¶
gff_toolkit.gff module¶
-
class
gff_toolkit.gff.
Gff
(*args, **kwargs)¶ Bases:
object
Work in progress: holds GffSubPart objects
-
add_fasta
(filename)¶ Parameters: filename – FASTA-formatted DNA sequence file
-
fix_orf
(subfeature, starts=('ATG', 'CTG'), stops=('TAA', 'TGA', 'TAG'), min_len=6)¶ Finds longest ORF in spliced transcript.
Parameters: - subfeature – GffSubPart object
- starts – list/tuple with start codons
- stops – list/tuple with stop codons
- min_len – minimum ORF length
Returns: True if ORF is found
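The ORF search described above can be illustrated with a minimal, self-contained sketch. This is not the gff_toolkit implementation: the function name longest_orf, the forward-strand-only scan, the 0-based end-exclusive coordinates, and the reading of min_len as a nucleotide count are all assumptions made for illustration.

```python
def longest_orf(seq, starts=("ATG", "CTG"), stops=("TAA", "TGA", "TAG"), min_len=6):
    """Return (start, end) of the longest forward-strand ORF, or None.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    min_len is treated as a length in nucleotides (an assumption).
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        orf_start = None  # position of the first start codon in the open stretch
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon in starts:
                orf_start = i
            elif orf_start is not None and codon in stops:
                length = i + 3 - orf_start
                if length >= min_len and (best is None or length > best[1] - best[0]):
                    best = (orf_start, i + 3)
                orf_start = None  # look for the next ORF in this frame
    return best


# Hypothetical spliced transcript: the ORF ATG AAA TAG sits in frame 2.
example_orf = longest_orf("CCATGAAATAGCC")
```

A boolean result like the one documented for fix_orf would then simply be `longest_orf(...) is not None`.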
-
get_children
(key, reverse=False, featuretype=None, seen=None)¶ Parameters: - key – subfeature ID or subfeature object
- reverse – reverses the return order: reverse=True returns CDS->mRNA->gene; reverse=False returns gene->mRNA->CDS
- featuretype – string or list with featuretypes to be returned
Returns: nested generator of subfeature objects
TODO: add something that prevents double yields
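One way to avoid the double yields mentioned in the TODO is to thread a shared set of visited IDs through the recursion. The sketch below is an illustration only and does not reflect how gff_toolkit stores features: the function iter_nested and the two plain dictionaries (children and types) are hypothetical stand-ins. It uses Python 3 syntax (yield from).

```python
def iter_nested(node_id, children, types, reverse=False, featuretype=None, seen=None):
    """Yield node_id and its descendants once each.

    children: {parent_id: [child_id, ...]}, types: {feature_id: featuretype}.
    reverse=False yields parents before children; reverse=True the opposite.
    """
    if seen is None:
        seen = set()
    if node_id in seen:  # already yielded through another parent
        return
    seen.add(node_id)
    if isinstance(featuretype, str):
        featuretype = [featuretype]
    wanted = featuretype is None or types[node_id] in featuretype
    if wanted and not reverse:
        yield node_id
    for child in children.get(node_id, ()):
        yield from iter_nested(child, children, types, reverse, featuretype, seen)
    if wanted and reverse:
        yield node_id


# Hypothetical gene model in which two mRNAs share one CDS.
children = {"gene1": ["mrna1", "mrna2"], "mrna1": ["cds1"], "mrna2": ["cds1"]}
types = {"gene1": "gene", "mrna1": "mRNA", "mrna2": "mRNA", "cds1": "CDS"}
top_down = list(iter_nested("gene1", children, types))
bottom_up = list(iter_nested("gene1", children, types, reverse=True))
```

Because the seen set is shared across recursive calls, cds1 is reported only once even though it is reachable through both mRNAs.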
-
get_parents
(key, reverse=True, featuretype=None)¶
-
getclosest
(seqid, pos, strand=None, featuretype=None)¶ Parameters: - seqid – Scaffold name (string) [REQUIRED]
- pos – Position on scaffold (int) [REQUIRED]
- strand – search on the positive strand, the negative strand, or both ('+', '-', or None for both). Default is both
- featuretype – returned featuretypes (string or list of strings)
Returns: GffSubPart closest to given position on given scaffold
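The documentation does not say how "closest" is measured. The following sketch assumes one plausible definition: distance is zero when the position falls inside a feature, and otherwise the gap to the nearer end. The function closest_feature and the dictionary-based feature records are hypothetical and stand in for GffSubPart objects.

```python
def closest_feature(features, seqid, pos, strand=None, featuretype=None):
    """Return the feature nearest to pos on seqid, or None if nothing matches."""
    if isinstance(featuretype, str):
        featuretype = [featuretype]

    def distance(feature):
        # Zero inside the feature, otherwise the gap to the nearer boundary.
        if feature["start"] <= pos <= feature["end"]:
            return 0
        return min(abs(feature["start"] - pos), abs(feature["end"] - pos))

    candidates = [
        f for f in features
        if f["seqid"] == seqid
        and (strand is None or f["strand"] == strand)
        and (featuretype is None or f["type"] in featuretype)
    ]
    return min(candidates, key=distance) if candidates else None


# Hypothetical features on two scaffolds.
closest_demo = [
    {"id": "a", "seqid": "s1", "start": 100, "end": 200, "strand": "+", "type": "gene"},
    {"id": "b", "seqid": "s1", "start": 500, "end": 600, "strand": "-", "type": "gene"},
    {"id": "c", "seqid": "s2", "start": 10, "end": 20, "strand": "+", "type": "gene"},
]
nearest = closest_feature(closest_demo, "s1", 260)
```

A linear scan like this is fine for small inputs; a real implementation would more likely use a sorted index or interval tree.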
-
getitems
(seqid=None, start=None, end=None, strand=None, featuretype=None)¶ Parameters: - seqid – Scaffold name (string)
- start – leftbound position (int)
- end – rightbound position (int)
- strand – search on the positive strand, the negative strand, or both ('+', '-', or None for both). Default is both
- featuretype – returned featuretypes (string or list of strings)
Returns: Generator object containing all GffSubParts within the given specs (interval, featuretype)
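As an illustration of this kind of query, here is a small generator that filters feature records by scaffold, interval, strand, and type. It is a sketch under assumptions, not the library's code: iter_features is a hypothetical name, features are plain dictionaries, and "within the given specs" is interpreted as overlapping the interval rather than being fully contained in it.

```python
def iter_features(features, seqid=None, start=None, end=None, strand=None, featuretype=None):
    """Lazily yield features that match every filter that is not None."""
    if isinstance(featuretype, str):
        featuretype = [featuretype]
    for f in features:
        if seqid is not None and f["seqid"] != seqid:
            continue
        if strand is not None and f["strand"] != strand:
            continue
        if featuretype is not None and f["type"] not in featuretype:
            continue
        # Overlap test on 1-based inclusive coordinates.
        if start is not None and f["end"] < start:
            continue
        if end is not None and f["start"] > end:
            continue
        yield f


# Hypothetical feature records.
items_demo = [
    {"id": "a", "seqid": "s1", "start": 100, "end": 200, "strand": "+", "type": "gene"},
    {"id": "b", "seqid": "s1", "start": 500, "end": 600, "strand": "-", "type": "gene"},
    {"id": "c", "seqid": "s2", "start": 10, "end": 20, "strand": "+", "type": "mRNA"},
]
hits = [f["id"] for f in iter_features(items_demo, seqid="s1", start=150, end=550)]
```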
-
getseq
(feature=None, subfeaturetype=None, topfeaturetype=None)¶ Replaced by a combination of GffSubPart properties: seq, pep and siblings
-
remove
(key, nested=True)¶ Parameters: - key – string or GffSubPart or list
- nested – bool
Returns: None
-
set_children
()¶ Sets the children attribute of all subfeatures
-
split
()¶
-
stringify
()¶ Returns: the entire Gff object as a GFF-formatted string
-
typecounts
()¶ Returns: dictionary with counts per GffSubPart type
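A per-type tally of this kind can be sketched with the standard library. The helper type_counts below is hypothetical and works on a plain list of feature type strings rather than on a Gff object.

```python
from collections import Counter


def type_counts(feature_types):
    """Return {featuretype: count} for an iterable of type strings."""
    return dict(Counter(feature_types))


# Hypothetical list of feature types as they might appear in column 3 of a GFF file.
counts = type_counts(["gene", "mRNA", "CDS", "CDS", "exon", "exon"])
```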
-
uniqueID
¶
-
update
(subfeature)¶ Parameters: subfeature – GffSubPart object
-
write_tbl
()¶ Returns: .tbl formatted string
-
gff_toolkit.gffsubpart module¶
-
exception
gff_toolkit.gffsubpart.
CoordinateError
¶ Bases:
exceptions.Exception
-
class
gff_toolkit.gffsubpart.
GffSubPart
(*args, **kwargs)¶ Bases:
object
Work in progress: contained by a Gff object. Basically, this is one line of a GFF file, with parents and children indicated by ID. It provides some methods to determine overlap with other GffSubPart instances.
TODO:
- Work on getter/setter for seq.
- Work on getter/setter for ID. RATT uses locus_tag for the ID of transferred annotations; currently this is handled by parser.py.
- Make this a sort of 'base class'; for instance, only mRNA should have a pep option. Do this by subclassing.
-
ID
¶
-
exact_match
(other)¶
-
forward
¶
-
fromdict
(dic)¶
-
get_children
(*args, **kwargs)¶
-
get_end
()¶ Returns: end if strand == + else start
-
get_start
()¶ Returns: start if strand == + else end
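The strand-aware accessors can be illustrated with a tiny stand-in class. StrandedInterval is a hypothetical name used only for this sketch; it mirrors the documented return values of get_start and get_end and nothing else.

```python
class StrandedInterval:
    """Minimal stand-in holding GFF-style coordinates (start <= end) and a strand."""

    def __init__(self, start, end, strand):
        self.start = start
        self.end = end
        self.strand = strand

    def get_start(self):
        # Biological start: leftmost on '+', rightmost on '-'.
        return self.start if self.strand == "+" else self.end

    def get_end(self):
        # Biological end: rightmost on '+', leftmost on '-'.
        return self.end if self.strand == "+" else self.start


plus = StrandedInterval(100, 200, "+")
minus = StrandedInterval(100, 200, "-")
```

For a minus-strand feature, transcription begins at the larger coordinate, which is why the two values swap.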
-
getattribute
(key)¶
-
getnested
(reverse=False, featuretype=None)¶ Generator that yields all nested subfeatures. reverse=True starts at the tips of the tree: CDS/exon -> mRNA -> gene. reverse=False starts at the top of the tree: gene -> mRNA -> CDS/exon.
-
gff_fields
¶
-
gtf_fields
¶
-
match
(other)¶
-
parents
¶
-
pep
¶
-
reverse
¶
-
seq
¶
if cds.start + 1 == cds.end and index == len(children) - 1:
    cds.seq = self.seq[cds.seqid][cds.start-1]
else:
    cds.seq = self.seq[cds.seqid][cds.start-1:cds.end]
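The slice start-1:end in the fragment above converts 1-based, inclusive GFF coordinates into Python's 0-based, end-exclusive slicing. The sketch below shows that conversion in isolation. The function extract is hypothetical, and the reverse-complement step for minus-strand features is an assumption about typical behaviour, not something stated in this documentation. It relies on Python 3's str.maketrans.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract(sequence, start, end, strand="+"):
    """Return sequence[start..end] using 1-based inclusive GFF coordinates."""
    sub = sequence[start - 1:end]
    if strand == "-":
        # Assumed convention: report minus-strand features as reverse complement.
        sub = sub.translate(COMPLEMENT)[::-1]
    return sub


forward_piece = extract("AACCGGTT", 3, 6)
reverse_piece = extract("ATGC", 1, 3, strand="-")
```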
-
set_attribute
(key, value)¶
-
set_end
(value)¶ Used in Gff._change_cds to change start/stop based on strand. Parameters: value – int
-
set_start
(value)¶ Used in Gff._change_cds to change start/stop based on strand. Parameters: value – int
-
siblings
¶ Returns a sorted list of siblings
-
stringify
(filetype='gff')¶ Returns a GFF-style tab-separated string
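For reference, a GFF3 record has nine tab-separated columns, with the last column holding key=value attribute pairs separated by semicolons. The sketch below builds such a line from plain values. The function to_gff_line is hypothetical, is not the method documented here, and does not cover the GTF variant or attribute escaping.

```python
def to_gff_line(seqid, source, ftype, start, end, score, strand, phase, attributes):
    """Join the nine GFF3 columns into one tab-separated line (no trailing newline)."""
    attr_column = ";".join("{}={}".format(key, value) for key, value in attributes.items())
    columns = [seqid, source, ftype, start, end, score, strand, phase, attr_column]
    return "\t".join(str(column) for column in columns)


# Hypothetical mRNA record; '.' marks an undefined score or phase.
gff_line = to_gff_line(
    "scaffold_1", "example", "mRNA", 100, 900, ".", "+", ".",
    {"ID": "mrna1", "Parent": "gene1"},
)
```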
-
target
¶
-
todict
()¶ Return a dict
-
-
exception
gff_toolkit.gffsubpart.
IDError
¶ Bases:
exceptions.Exception
-
exception
gff_toolkit.gffsubpart.
SeqError
¶ Bases:
exceptions.Exception
-
exception
gff_toolkit.gffsubpart.
TranslateError
¶ Bases:
exceptions.Exception
gff_toolkit.parser module¶
-
class
gff_toolkit.parser.
Parser
(gff_file, filetype='standard', fasta_file=None, remove_noncoding=False, limit=None, author=None)¶ Bases:
object
Parser object for GFF3-formatted annotation files. The parse() method returns the processed Gff object. It has several methods for known incorrectly formatted files, _ratt() and _manual(), which are selected by specifying filetype.
-
parse
()¶
-
-
gff_toolkit.parser.
fasta_iter
(fasta_file)¶
-
gff_toolkit.parser.
linereader
(gff_file, strict=True)¶ Reads a single gff formatted line
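To show what reading a single GFF line involves, here is a minimal parser for one record. It is a sketch under assumptions: parse_gff_line is a hypothetical function, it returns a plain dictionary instead of a GffSubPart, and it does not model the strict parameter, whose exact effect is not documented here.

```python
def parse_gff_line(line):
    """Parse one GFF3 line into a dict; return None for blank or comment lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    columns = line.split("\t")
    if len(columns) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(columns))
    seqid, source, ftype, start, end, score, strand, phase, attr_column = columns
    attributes = {}
    for pair in attr_column.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key.strip()] = value.strip()
    return {
        "seqid": seqid, "source": source, "type": ftype,
        "start": int(start), "end": int(end),
        "score": score, "strand": strand, "phase": phase,
        "attributes": attributes,
    }


record = parse_gff_line("s1\tsrc\tgene\t100\t200\t.\t+\t.\tID=gene1;Name=abc\n")
```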
-
gff_toolkit.parser.
parser
(gff_file, filetype='standard', fasta_file=None, remove_noncoding=False, limit=None, author=None)¶ Wrapper function for Parser object.