bioutils.seqfetcher module
Provides sequence fetching from NCBI and Ensembl.
- bioutils.seqfetcher.fetch_seq(ac, start_i=None, end_i=None)
Fetches sequences and subsequences from NCBI eutils and Ensembl REST interfaces.
- Parameters:
ac (str) – The accession of the sequence to fetch.
start_i (int, optional) – The start index (interbase coordinates) of the subsequence to fetch. Defaults to None. It is recommended to retrieve a subsequence by providing an index here, rather than by Python slicing the whole sequence.
end_i (int, optional) – The end index (interbase coordinates) of the subsequence to fetch. Defaults to None. It is recommended to retrieve a subsequence by providing an index here, rather than by Python slicing the whole sequence.
- Returns:
The requested sequence.
- Return type:
str
- Raises:
RuntimeError – If the accession syntax doesn’t match that of any of the supported databases.
RuntimeError – If the request to the database fails.
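The interbase coordinates used by start_i and end_i can be read as zero-based, half-open positions, which is why they line up with Python slice indices in the examples on this page. The sketch below is only an illustration: one_based_to_interbase is a hypothetical helper, not part of bioutils, and it assumes the caller starts from a 1-based, inclusive range such as those shown in many sequence viewers.

```python
def one_based_to_interbase(start, end):
    """Convert a 1-based, inclusive range to interbase (0-based, half-open).

    Hypothetical helper for illustration; it is not part of bioutils.
    """
    if start < 1 or end < start:
        raise ValueError(f"invalid 1-based range: {start}-{end}")
    # Only the start shifts; the inclusive end equals the half-open end.
    return start - 1, end


# First ten residues of NP_056374.2, copied from the examples on this page.
seq = "MESRETLSSS"

# Residues 1 through 4 in 1-based terms correspond to interbase (0, 4).
start_i, end_i = one_based_to_interbase(1, 4)
subseq = seq[start_i:end_i]
```

The resulting pair could then be passed as start_i and end_i, so that only the requested region is retrieved.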
Examples
>>> len(fetch_seq('NP_056374.2'))
1596
>>> fetch_seq('NP_056374.2',0,10)   # This!
'MESRETLSSS'
>>> fetch_seq('NP_056374.2')[0:10]  # Not this!
'MESRETLSSS'
# Providing intervals is especially important for large sequences:
>>> fetch_seq('NC_000001.10',2000000,2000030)
'ATCACACGTGCAGGAACCCTTTTCCAAAGG'
# This call will pull back 30 bases plus overhead; without the
# interval, one would receive 250MB of chr1 plus overhead!
# Essentially any RefSeq, GenBank, BIC, or Ensembl sequence may be
# fetched.
>>> fetch_seq('NM_9.9')
Traceback (most recent call last):
   ...
RuntimeError: No sequence available for NM_9.9
>>> fetch_seq('QQ01234')
Traceback (most recent call last):
   ...
RuntimeError: No sequence fetcher for QQ01234
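Because both failure modes surface as RuntimeError, a caller may want to catch that exception and carry on. The following is a minimal sketch, not part of bioutils: fetch_or_none is a hypothetical wrapper, and fake_fetch_seq is a stand-in for fetch_seq (same argument order, canned data) so that the example runs without network access.

```python
def fetch_or_none(fetch, ac, start_i=None, end_i=None):
    """Call `fetch` and return None instead of raising RuntimeError.

    Hypothetical wrapper; `fetch` is any callable with the same
    (ac, start_i, end_i) signature as fetch_seq.
    """
    try:
        return fetch(ac, start_i, end_i)
    except RuntimeError as exc:
        print(f"could not fetch {ac}: {exc}")
        return None


def fake_fetch_seq(ac, start_i=None, end_i=None):
    """Offline stand-in for fetch_seq (an assumption for this sketch)."""
    # Only the first ten residues shown in the examples on this page.
    sequences = {"NP_056374.2": "MESRETLSSS"}
    if ac not in sequences:
        raise RuntimeError(f"No sequence available for {ac}")
    return sequences[ac][start_i:end_i]


found = fetch_or_none(fake_fetch_seq, "NP_056374.2", 0, 4)
missing = fetch_or_none(fake_fetch_seq, "NM_9.9")
```

In real use, the actual fetch_seq would presumably be passed in place of fake_fetch_seq; whether returning None is appropriate depends on how the calling code treats missing sequences.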