bioutils.digests module

bioutils.digests.seq_md5(seq, normalize=True)

Converts a sequence to its MD5 digest, as a hexadecimal string.

Parameters:
  • seq (str) – A sequence.

  • normalize (bool, optional) – Whether to normalize the sequence before hashing, i.e., remove whitespace and asterisks and convert it to uppercase. Defaults to True.

Returns:

MD5 hex digest of the sequence.

Return type:

str

Examples

>>> seq_md5('')
'd41d8cd98f00b204e9800998ecf8427e'
>>> seq_md5('ACGT')
'f1f8f4bf413b16ad135722aa4591043e'
>>> seq_md5('ACGT*')
'f1f8f4bf413b16ad135722aa4591043e'
>>> seq_md5(' A C G T ')
'f1f8f4bf413b16ad135722aa4591043e'
>>> seq_md5('acgt')
'f1f8f4bf413b16ad135722aa4591043e'
>>> seq_md5('acgt', normalize=False)
'db516c3913e179338b162b2476d1c23f'
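The normalize-then-hash behavior described above can be sketched with the standard library. This is a minimal sketch, not the bioutils source: `normalize_sketch` and `md5_sketch` are hypothetical names, and the sketch assumes that normalization means deleting whitespace and asterisks and uppercasing, and that the sequence is ASCII-encoded before hashing.

```python
import hashlib
import re


def normalize_sketch(seq: str) -> str:
    """Hypothetical helper: drop whitespace and '*', then uppercase.

    Assumption: this mirrors the normalization described above; the real
    bioutils function may apply additional checks.
    """
    return re.sub(r"[\s*]", "", seq).upper()


def md5_sketch(seq: str, normalize: bool = True) -> str:
    """Hypothetical stand-in for seq_md5: MD5 hex digest of the ASCII bytes."""
    if normalize:
        seq = normalize_sketch(seq)
    return hashlib.md5(seq.encode("ascii")).hexdigest()


print(md5_sketch(" a c g t* "))
print(md5_sketch("acgt", normalize=False))
```

If those assumptions hold, the sketch reproduces the digests in the examples above: `md5_sketch(" a c g t* ")` agrees with `seq_md5('ACGT')`, and disabling normalization changes the digest.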

bioutils.digests.seq_seguid(seq, normalize=True)

Converts a sequence to its SEGUID (SEquence Globally Unique IDentifier).

The result is compatible with the SEGUID computed by Biopython.

Parameters:
  • seq (str) – A sequence.

  • normalize (bool, optional) – Whether to normalize the sequence before hashing, i.e., remove whitespace and asterisks and convert it to uppercase. Defaults to True.

Returns:

SEGUID of the sequence.

Return type:

str

Examples

>>> seq_seguid('')
'2jmj7l5rSw0yVb/vlWAYkK/YBwk'
>>> seq_seguid('ACGT')
'IQiZThf2zKn/I1KtqStlEdsHYDQ'
>>> seq_seguid('acgt')
'IQiZThf2zKn/I1KtqStlEdsHYDQ'
>>> seq_seguid('acgt', normalize=False)
'lII0AoG1/I8qKY271rgv5CFZtsU'
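A SEGUID is conventionally the SHA-1 digest of the sequence, encoded in standard Base64 with the trailing `=` padding removed. The sketch below assumes that construction and an already-normalized input; `seguid_sketch` is a hypothetical name, not part of bioutils.

```python
import base64
import hashlib


def seguid_sketch(seq: str) -> str:
    """Hypothetical stand-in for seq_seguid; hashes `seq` exactly as given.

    Assumption: SEGUID = standard Base64 of the 20-byte SHA-1 digest of the
    ASCII-encoded sequence, with the trailing '=' padding stripped.
    """
    digest = hashlib.sha1(seq.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


print(seguid_sketch("ACGT"))
print(seguid_sketch("acgt"))  # corresponds to normalize=False
```

The outputs use the standard Base64 alphabet, which is why `/` appears in the SEGUIDs above; `seq_seqhash` output uses the URL-safe alphabet instead (note the `-`).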

bioutils.digests.seq_seqhash(seq, normalize=True)

Converts a sequence to a 24-byte truncated digest (the first 24 bytes of its SHA-512 digest, encoded as URL-safe Base64).

Parameters:
  • seq (str) – A sequence.

  • normalize (bool, optional) – Whether to normalize the sequence before hashing, i.e., remove whitespace and asterisks and convert it to uppercase. Defaults to True.

Returns:

24-byte truncated digest of the sequence, as a 32-character string.

Return type:

str

Examples

>>> seq_seqhash("")
'z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc'
>>> seq_seqhash("ACGT")
'aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2'
>>> seq_seqhash("acgt")
'aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2'
>>> seq_seqhash("acgt", normalize=False)
'eFwawHHdibaZBDcs9kW3gm31h1NNJcQe'
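The example outputs are consistent with taking the first 24 bytes of the SHA-512 digest (compare `seq_sha512`) and encoding them with URL-safe Base64, which yields 32 characters and no padding. A minimal sketch under that assumption, with normalization omitted; `truncated_digest_sketch` and its `digest_size` parameter are hypothetical:

```python
import base64
import hashlib


def truncated_digest_sketch(seq: str, digest_size: int = 24) -> str:
    """Hypothetical stand-in for seq_seqhash; hashes `seq` exactly as given.

    Assumption: SHA-512 of the ASCII-encoded sequence, truncated to
    `digest_size` bytes, then encoded with URL-safe Base64.
    """
    digest = hashlib.sha512(seq.encode("ascii")).digest()[:digest_size]
    return base64.urlsafe_b64encode(digest).decode("ascii")


print(truncated_digest_sketch("ACGT"))
```

Because 24 is a multiple of 3, the Base64 output needs no `=` padding, so the result is always exactly 32 characters.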

bioutils.digests.seq_sha1(seq, normalize=True)

Converts a sequence to its SHA-1 digest, as a hexadecimal string.

Parameters:
  • seq (str) – A sequence.

  • normalize (bool, optional) – Whether to normalize the sequence before hashing, i.e., remove whitespace and asterisks and convert it to uppercase. Defaults to True.

Returns:

SHA-1 hex digest of the sequence.

Return type:

str

Examples

>>> seq_sha1('')
'da39a3ee5e6b4b0d3255bfef95601890afd80709'
>>> seq_sha1('ACGT')
'2108994e17f6cca9ff2352ada92b6511db076034'
>>> seq_sha1('acgt')
'2108994e17f6cca9ff2352ada92b6511db076034'
>>> seq_sha1('acgt', normalize=False)
'9482340281b5fc8f2a298dbbd6b82fe42159b6c5'
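The `seq_sha1` hex digests above and the `seq_seguid` values appear to encode the same 20 SHA-1 bytes, differing only in representation. The hypothetical helper below checks this by decoding a SEGUID back to hex, assuming the SEGUID is unpadded standard Base64:

```python
import base64


def seguid_to_sha1_hex(seguid: str) -> str:
    """Hypothetical helper: recover the SHA-1 hex digest from a SEGUID.

    Assumes the SEGUID is standard Base64 of the 20-byte SHA-1 digest
    with its trailing '=' padding removed.
    """
    padded = seguid + "=" * (-len(seguid) % 4)  # restore the stripped padding
    return base64.b64decode(padded).hex()


print(seguid_to_sha1_hex("IQiZThf2zKn/I1KtqStlEdsHYDQ"))
```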

bioutils.digests.seq_sha512(seq, normalize=True)

Converts a sequence to its SHA-512 digest, as a hexadecimal string.

Parameters:
  • seq (str) – A sequence.

  • normalize (bool, optional) – Whether to normalize the sequence before hashing, i.e., remove whitespace and asterisks and convert it to uppercase. Defaults to True.

Returns:

SHA-512 hex digest of the sequence.

Return type:

str

Examples

>>> seq_sha512('')
'cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e'
>>> seq_sha512('ACGT')
'68a178f7c740c5c240aa67ba41843b119d3bf9f8b0f0ac36cf701d26672964efbd536d197f51ce634fc70634d1eefe575bec34c83247abc52010f6e2bbdb8253'
>>> seq_sha512('acgt')
'68a178f7c740c5c240aa67ba41843b119d3bf9f8b0f0ac36cf701d26672964efbd536d197f51ce634fc70634d1eefe575bec34c83247abc52010f6e2bbdb8253'
>>> seq_sha512('acgt', normalize=False)
'785c1ac071dd89b69904372cf645b7826df587534d25c41edb2862e54fb2940d697218f2883d2bf1a11cdaee658c7f7ab945a1cfd08eb26cbce57ee88790250a'
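The `seq_seqhash` values appear to be the first 24 bytes (48 hex characters) of these SHA-512 digests. The hypothetical helper below decodes a seqhash and compares it with the prefix of the `seq_sha512('ACGT')` output shown above:

```python
import base64


def seqhash_to_hex(seqhash: str) -> str:
    """Hypothetical helper: decode a 32-character URL-safe Base64 seqhash
    into the hex form of its 24 underlying bytes."""
    return base64.urlsafe_b64decode(seqhash).hex()


# seq_sha512('ACGT') from the examples above, split for readability
sha512_acgt = (
    "68a178f7c740c5c240aa67ba41843b119d3bf9f8b0f0ac36"
    "cf701d26672964efbd536d197f51ce634fc70634d1eefe57"
    "5bec34c83247abc52010f6e2bbdb8253"
)
# 24 bytes = 48 hex characters: the truncated prefix of the full digest
print(seqhash_to_hex("aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2") == sha512_acgt[:48])  # prints True
```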

bioutils.digests.seq_vmc_id(seq, normalize=True)

Converts a sequence to a VMC id.

See https://github.com/ga4gh/vmc

Parameters:
  • seq (str) – A sequence.

  • normalize (bool, optional) – Whether to normalize the sequence before hashing, i.e., remove whitespace and asterisks and convert it to uppercase. Defaults to True.

Returns:

VMC id of the sequence.

Return type:

str

Examples

>>> seq_vmc_id("")
'VMC:GS_z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc'
>>> seq_vmc_id("ACGT")
'VMC:GS_aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2'
>>> seq_vmc_id("acgt")
'VMC:GS_aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2'
>>> seq_vmc_id("acgt", normalize=False)
'VMC:GS_eFwawHHdibaZBDcs9kW3gm31h1NNJcQe'
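In the examples, each VMC id is the corresponding `seq_seqhash` value prefixed with `VMC:GS_`. A sketch of that composition, assuming the prefix is fixed for sequences; `vmc_id_from_seqhash` and the two constants are hypothetical names:

```python
VMC_NAMESPACE = "VMC"  # namespace seen in the examples above
SEQUENCE_PREFIX = "GS_"  # sequence prefix seen in the examples above


def vmc_id_from_seqhash(seqhash: str) -> str:
    """Hypothetical helper: build a VMC sequence id from a seq_seqhash value."""
    if len(seqhash) != 32:  # 24 bytes in Base64 is 32 characters
        raise ValueError("expected a 32-character truncated digest")
    return f"{VMC_NAMESPACE}:{SEQUENCE_PREFIX}{seqhash}"


print(vmc_id_from_seqhash("aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2"))
```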

bioutils.digests.seq_vmc_identifier(seq, normalize=True)

Converts a sequence to a VMC identifier record.

See https://github.com/ga4gh/vmc

Parameters:
  • seq (str) – A sequence.

  • normalize (bool, optional) – Whether to normalize the sequence before hashing, i.e., remove whitespace and asterisks and convert it to uppercase. Defaults to True.

Returns:

VMC identifier record of the sequence, with 'namespace' and 'accession' keys.

Return type:

dict

Examples

>>> seq_vmc_identifier("") == {'namespace': 'VMC', 'accession': 'GS_z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc'}
True
>>> seq_vmc_identifier("ACGT") == {'namespace': 'VMC', 'accession': 'GS_aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2'}
True
>>> seq_vmc_identifier("acgt") == {'namespace': 'VMC', 'accession': 'GS_aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2'}
True
>>> seq_vmc_identifier("acgt", normalize=False) == {'namespace': 'VMC', 'accession': 'GS_eFwawHHdibaZBDcs9kW3gm31h1NNJcQe'}
True
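The identifier record carries the same information as the VMC id, split at the first colon. A hypothetical helper that derives the record from an id, assuming the layout shown in the examples:

```python
def vmc_identifier_from_id(vmc_id: str) -> dict:
    """Hypothetical helper: split "VMC:GS_..." into a namespace/accession record.

    Assumes the record layout shown in the examples above.
    """
    namespace, sep, accession = vmc_id.partition(":")
    if not sep:
        raise ValueError(f"not a namespaced id: {vmc_id!r}")
    return {"namespace": namespace, "accession": accession}


print(vmc_identifier_from_id("VMC:GS_aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2"))
```

`str.partition` splits only at the first colon, so any later colons would remain part of the accession.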