bioutils.seqfetcher module

Provides sequence fetching from NCBI and Ensembl.

bioutils.seqfetcher.fetch_seq(ac, start_i=None, end_i=None)

Fetches sequences and subsequences from the NCBI E-utilities and Ensembl REST interfaces.

Parameters:
  • ac (str) – The accession of the sequence to fetch.

  • start_i (int, optional) – The start index of the subsequence to fetch, in interbase (0-based, half-open) coordinates, so that (0, 10) selects the first ten residues. Defaults to None. To retrieve a subsequence, pass indices here rather than slicing the whole sequence in Python; only the requested region is then downloaded.

  • end_i (int, optional) – The end index of the subsequence to fetch, in interbase coordinates. Defaults to None. As with start_i, prefer passing an index here over slicing the whole sequence in Python.
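Interbase coordinates number the gaps between residues starting at 0, so the interval (start_i, end_i) covers the same residues as the Python slice seq[start_i:end_i]. Services that use 1-based, inclusive positions (such as the seq_start and seq_stop parameters of NCBI EFetch) need the interval converted. The helper below is a hypothetical sketch of that conversion for illustration only; it is not part of bioutils and makes no network calls.

```python
def interbase_to_one_based(start_i: int, end_i: int) -> tuple[int, int]:
    """Convert a 0-based, half-open interbase interval to 1-based, inclusive positions.

    Hypothetical helper for illustration; not part of bioutils.
    """
    if start_i < 0 or end_i < start_i:
        raise ValueError(f"invalid interbase interval ({start_i}, {end_i})")
    # The first residue after gap start_i is residue start_i + 1 (1-based);
    # the last residue before gap end_i is residue end_i (1-based).
    return start_i + 1, end_i


seq = "MESRETLSSS"  # first ten residues of NP_056374.2, as in the examples
first, last = interbase_to_one_based(2, 5)
# The interbase slice and the 1-based inclusive range select the same residues.
assert seq[2:5] == seq[first - 1:last] == "SRE"
```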

Returns:

The requested sequence.

Return type:

str

Raises:
  • RuntimeError – If the accession's format does not match that of any supported database.

  • RuntimeError – If the request to the database fails.
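Because both failure modes surface as RuntimeError, a caller that wants to skip unavailable accessions rather than abort can catch that single exception type. The sketch below relies only on the behavior documented above; the wrapper name fetch_or_none is hypothetical, and the fetch function is passed in so the wrapper can be exercised without network access.

```python
from typing import Callable, Optional

# Matches the documented call shape of fetch_seq: (ac, start_i, end_i) -> str
Fetcher = Callable[[str, Optional[int], Optional[int]], str]


def fetch_or_none(
    fetch: Fetcher, ac: str, start_i: Optional[int] = None, end_i: Optional[int] = None
) -> Optional[str]:
    """Return the fetched sequence, or None if the fetcher raises RuntimeError.

    Hypothetical wrapper for illustration; not part of bioutils.
    """
    try:
        return fetch(ac, start_i, end_i)
    except RuntimeError as err:
        # Covers both documented cases: no fetcher for the accession's
        # format, and a failed request to the database.
        print(f"skipping {ac}: {err}")
        return None
```

For real use, pass bioutils.seqfetcher.fetch_seq as the fetch argument; given the documented example, fetch_or_none(fetch_seq, 'QQ01234') would print the "No sequence fetcher" message and return None.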

Examples

>>> len(fetch_seq('NP_056374.2'))
1596
>>> fetch_seq('NP_056374.2', 0, 10)   # Preferred: requests only 10 residues
'MESRETLSSS'
>>> fetch_seq('NP_056374.2')[0:10]    # Avoid: downloads all 1596 residues, then slices
'MESRETLSSS'

# Providing intervals is especially important for large sequences:

>>> fetch_seq('NC_000001.10',2000000,2000030)
'ATCACACGTGCAGGAACCCTTTTCCAAAGG'

# This call will pull back 30 bases plus overhead; without the interval, one would receive 250 MB of chr1 plus overhead!
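The same interval arguments also let a caller pull a long region in bounded pieces, so that no single request is large. The sketch below walks a half-open interbase interval in fixed-size windows and concatenates the results. The function name fetch_region_in_chunks and its chunk_size parameter are hypothetical, and the fetch function is injected (pass bioutils.seqfetcher.fetch_seq for real use).

```python
from typing import Callable, Optional

Fetcher = Callable[[str, Optional[int], Optional[int]], str]


def fetch_region_in_chunks(
    fetch: Fetcher, ac: str, start_i: int, end_i: int, chunk_size: int = 10_000
) -> str:
    """Fetch the interbase interval [start_i, end_i) in windows of at most chunk_size.

    Hypothetical helper for illustration; not part of bioutils.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    pieces = []
    for win_start in range(start_i, end_i, chunk_size):
        # Clamp the last window so it never extends past end_i.
        win_end = min(win_start + chunk_size, end_i)
        pieces.append(fetch(ac, win_start, win_end))
    return "".join(pieces)
```

Each window costs one request, so this trades a larger request count for a smaller size per request; for short regions like the 30-base example above, a single call with start_i and end_i is simpler.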

# Essentially any RefSeq, GenBank, BIC, or Ensembl sequence may be fetched.

>>> fetch_seq('NM_9.9')
Traceback (most recent call last):
...
RuntimeError: No sequence available for NM_9.9
>>> fetch_seq('QQ01234')
Traceback (most recent call last):
...
RuntimeError: No sequence fetcher for QQ01234
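The two errors above come from different stages: NM_9.9 has RefSeq-style syntax but names no available record, whereas QQ01234 matches no supported accession format at all. The format check can be sketched as prefix matching with regular expressions. The patterns below are simplified assumptions covering only RefSeq and Ensembl identifiers; they are not the patterns bioutils uses, and GenBank and other formats are omitted. The name guess_source is hypothetical.

```python
import re
from typing import Optional

# Simplified, assumed patterns for illustration only; bioutils' own
# dispatch rules may differ and also cover GenBank-style accessions.
_SOURCE_PATTERNS = [
    ("ncbi", re.compile(r"^(?:AC|N[CGMPRTW]|X[MPR])_\d+")),  # RefSeq prefixes
    ("ensembl", re.compile(r"^ENS[A-Z]*[GTP]\d+")),          # Ensembl stable IDs
]


def guess_source(ac: str) -> Optional[str]:
    """Return the name of the first source whose pattern matches ac, else None.

    Hypothetical helper for illustration; not part of bioutils.
    """
    for name, pattern in _SOURCE_PATTERNS:
        if pattern.match(ac):
            return name
    return None
```

Under these assumed patterns, guess_source('NM_9.9') returns 'ncbi' (the format is recognized, so any failure happens at lookup time), while guess_source('QQ01234') returns None, mirroring the two error messages above.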