bioutils.digests module
- bioutils.digests.seq_md5(seq, normalize=True)
Converts a sequence to its MD5 hex digest.
- Parameters:
seq (str) – A sequence.
normalize (bool, optional) – Whether to normalize the sequence before conversion, i.e., to remove whitespace and asterisks and convert it to uppercase. Defaults to True.
- Returns:
MD5 hex digest of the sequence, as a Unicode string.
- Return type:
str
Examples
>>> seq_md5('')
'd41d8cd98f00b204e9800998ecf8427e'
>>> seq_md5('ACGT')
'f1f8f4bf413b16ad135722aa4591043e'
>>> seq_md5('ACGT*')
'f1f8f4bf413b16ad135722aa4591043e'
>>> seq_md5(' A C G T ')
'f1f8f4bf413b16ad135722aa4591043e'
>>> seq_md5('acgt')
'f1f8f4bf413b16ad135722aa4591043e'
>>> seq_md5('acgt', normalize=False)
'db516c3913e179338b162b2476d1c23f'
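The normalization step and the MD5 conversion can be sketched with the standard library alone. This is a minimal sketch, not the bioutils implementation: it assumes normalization means removing whitespace and asterisks and then uppercasing (as described above), and that the sequence is ASCII-encoded before hashing. `normalize_sketch` and `md5_sketch` are hypothetical names.

```python
import hashlib
import re


def normalize_sketch(seq: str) -> str:
    """Hypothetical helper: drop whitespace and '*', then uppercase."""
    return re.sub(r"[\s*]", "", seq).upper()


def md5_sketch(seq: str, normalize: bool = True) -> str:
    """Hypothetical equivalent of seq_md5: MD5 hex digest of the sequence."""
    if normalize:
        seq = normalize_sketch(seq)
    # Assumption: the sequence is encoded as ASCII before hashing.
    return hashlib.md5(seq.encode("ascii")).hexdigest()


print(md5_sketch(" a C g T* ") == md5_sketch("ACGT"))  # True
print(md5_sketch(""))  # d41d8cd98f00b204e9800998ecf8427e
```

With `normalize=False` the raw string is hashed, so case and stray characters change the digest.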
- bioutils.digests.seq_seguid(seq, normalize=True)
Converts a sequence to a SEGUID (SEquence Globally Unique IDentifier).
This SEGUID is compatible with Biopython’s seguid.
- Parameters:
seq (str) – A sequence.
normalize (bool, optional) – Whether to normalize the sequence before conversion, i.e., to remove whitespace and asterisks and convert it to uppercase. Defaults to True.
- Returns:
SEGUID of the sequence.
- Return type:
str
Examples
>>> seq_seguid('')
'2jmj7l5rSw0yVb/vlWAYkK/YBwk'
>>> seq_seguid('ACGT')
'IQiZThf2zKn/I1KtqStlEdsHYDQ'
>>> seq_seguid('acgt')
'IQiZThf2zKn/I1KtqStlEdsHYDQ'
>>> seq_seguid('acgt', normalize=False)
'lII0AoG1/I8qKY271rgv5CFZtsU'
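A SEGUID is the SHA-1 digest of the sequence written in standard (not URL-safe) base64 with the trailing `=` padding removed; the examples above are consistent with this (note the `/` characters and the missing padding). The sketch below assumes that definition and the normalization described above; `seguid_sketch` is a hypothetical name, not part of bioutils.

```python
import base64
import hashlib
import re


def seguid_sketch(seq: str, normalize: bool = True) -> str:
    """Hypothetical equivalent of seq_seguid."""
    if normalize:
        # Assumed normalization: strip whitespace and '*', then uppercase.
        seq = re.sub(r"[\s*]", "", seq).upper()
    digest = hashlib.sha1(seq.encode("ascii")).digest()
    # Standard base64 of the 20-byte digest, with '=' padding removed.
    return base64.b64encode(digest).decode("ascii").rstrip("=")


print(seguid_sketch(""))  # 2jmj7l5rSw0yVb/vlWAYkK/YBwk
```

A 20-byte digest encodes to 28 base64 characters, one of which is padding, so every SEGUID produced this way is 27 characters long.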
- bioutils.digests.seq_seqhash(seq, normalize=True)
Converts a sequence to a 24-byte truncated digest: the first 24 bytes of its SHA-512 digest, encoded as URL-safe base64.
- Parameters:
seq (str) – A sequence.
normalize (bool, optional) – Whether to normalize the sequence before conversion, i.e., to remove whitespace and asterisks and convert it to uppercase. Defaults to True.
- Returns:
24-byte truncated digest of the sequence, as a 32-character URL-safe base64 string.
- Return type:
str
Examples
>>> seq_seqhash("")
'z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc'
>>> seq_seqhash("ACGT")
'aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2'
>>> seq_seqhash("acgt")
'aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2'
>>> seq_seqhash("acgt", normalize=False)
'eFwawHHdibaZBDcs9kW3gm31h1NNJcQe'
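The truncated digest can be reproduced in three steps: hash the (normalized) sequence with SHA-512, keep the first 24 bytes, and encode them as URL-safe base64. This is a minimal sketch assuming that construction and the normalization described above; `seqhash_sketch` is a hypothetical name, not the bioutils code.

```python
import base64
import hashlib
import re


def seqhash_sketch(seq: str, normalize: bool = True) -> str:
    """Hypothetical equivalent of seq_seqhash: truncated, URL-safe base64 SHA-512."""
    if normalize:
        # Assumed normalization: strip whitespace and '*', then uppercase.
        seq = re.sub(r"[\s*]", "", seq).upper()
    digest = hashlib.sha512(seq.encode("ascii")).digest()
    # Keep only the first 24 bytes; 24 bytes encode to 32 characters with no padding.
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


print(seqhash_sketch(""))  # z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc
```

URL-safe base64 uses `-` and `_` instead of `+` and `/`, which is why the `ACGT` example above contains a `-`.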
- bioutils.digests.seq_sha1(seq, normalize=True)
Converts a sequence to its SHA-1 hex digest.
- Parameters:
seq (str) – A sequence.
normalize (bool, optional) – Whether to normalize the sequence before encoding, i.e., to remove whitespace and asterisks and convert it to uppercase. Defaults to True.
- Returns:
SHA-1 hex digest of the sequence, as a Unicode string.
- Return type:
str
Examples
>>> seq_sha1('')
'da39a3ee5e6b4b0d3255bfef95601890afd80709'
>>> seq_sha1('ACGT')
'2108994e17f6cca9ff2352ada92b6511db076034'
>>> seq_sha1('acgt')
'2108994e17f6cca9ff2352ada92b6511db076034'
>>> seq_sha1('acgt', normalize=False)
'9482340281b5fc8f2a298dbbd6b82fe42159b6c5'
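With the same `normalize` setting, `seq_sha1` and `seq_seguid` appear to hash the same bytes and differ only in how the 20-byte SHA-1 digest is written: hexadecimal versus unpadded base64 (compare the empty-sequence examples of the two functions). Assuming that relationship, the sketch below converts one form into the other; `sha1_hex_to_seguid` and `seguid_to_sha1_hex` are hypothetical helpers.

```python
import base64


def sha1_hex_to_seguid(sha1_hex: str) -> str:
    """Hypothetical helper: re-encode a SHA-1 hex digest as a SEGUID."""
    digest = bytes.fromhex(sha1_hex)
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def seguid_to_sha1_hex(seguid: str) -> str:
    """Hypothetical helper: recover the SHA-1 hex digest from a SEGUID."""
    # Restore the '=' padding stripped from the base64 text before decoding.
    padded = seguid + "=" * (-len(seguid) % 4)
    return base64.b64decode(padded).hex()


print(sha1_hex_to_seguid("da39a3ee5e6b4b0d3255bfef95601890afd80709"))
# 2jmj7l5rSw0yVb/vlWAYkK/YBwk
```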
- bioutils.digests.seq_sha512(seq, normalize=True)
Converts a sequence to its SHA-512 hex digest.
- Parameters:
seq (str) – A sequence.
normalize (bool, optional) – Whether to normalize the sequence before conversion, i.e., to remove whitespace and asterisks and convert it to uppercase. Defaults to True.
- Returns:
SHA-512 hex digest of the sequence, as a Unicode string.
- Return type:
str
Examples
>>> seq_sha512('')
'cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e'
>>> seq_sha512('ACGT')
'68a178f7c740c5c240aa67ba41843b119d3bf9f8b0f0ac36cf701d26672964efbd536d197f51ce634fc70634d1eefe575bec34c83247abc52010f6e2bbdb8253'
>>> seq_sha512('acgt')
'68a178f7c740c5c240aa67ba41843b119d3bf9f8b0f0ac36cf701d26672964efbd536d197f51ce634fc70634d1eefe575bec34c83247abc52010f6e2bbdb8253'
>>> seq_sha512('acgt', normalize=False)
'785c1ac071dd89b69904372cf645b7826df587534d25c41edb2862e54fb2940d697218f2883d2bf1a11cdaee658c7f7ab945a1cfd08eb26cbce57ee88790250a'
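The `seq_seqhash` examples are consistent with being a truncation of these SHA-512 digests: taking the first 24 bytes (48 hex characters) of a `seq_sha512` result and encoding them as URL-safe base64 reproduces the matching `seq_seqhash` value. A sketch of that derivation, assuming this relationship; `sha512_hex_to_seqhash` is a hypothetical helper.

```python
import base64

TRUNCATED_BYTES = 24  # assumed truncation length, matching the "24-byte" digest


def sha512_hex_to_seqhash(sha512_hex: str) -> str:
    """Hypothetical helper: derive the truncated digest from a SHA-512 hex digest."""
    # Each byte is two hex characters, so 24 bytes are the first 48 characters.
    truncated = bytes.fromhex(sha512_hex[: 2 * TRUNCATED_BYTES])
    return base64.urlsafe_b64encode(truncated).decode("ascii")


acgt_sha512 = (
    "68a178f7c740c5c240aa67ba41843b119d3bf9f8b0f0ac36"
    "cf701d26672964efbd536d197f51ce634fc70634d1eefe575bec34c83247abc52010f6e2bbdb8253"
)
print(sha512_hex_to_seqhash(acgt_sha512))  # aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2
```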
- bioutils.digests.seq_vmc_id(seq, normalize=True)
Converts a sequence to a VMC id, a string of the form VMC:GS_ followed by the sequence's 24-byte truncated digest.
See https://github.com/ga4gh/vmc
- Parameters:
seq (str) – A sequence.
normalize (bool, optional) – Whether to normalize the sequence before conversion, i.e., to remove whitespace and asterisks and convert it to uppercase. Defaults to True.
- Returns:
VMC id of the sequence.
- Return type:
str
Examples
>>> seq_vmc_id("")
'VMC:GS_z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc'
>>> seq_vmc_id("ACGT")
'VMC:GS_aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2'
>>> seq_vmc_id("acgt")
'VMC:GS_aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2'
>>> seq_vmc_id("acgt", normalize=False)
'VMC:GS_eFwawHHdibaZBDcs9kW3gm31h1NNJcQe'
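Comparing these examples with the `seq_seqhash` examples shows that the VMC id is the truncated digest prefixed with `VMC:GS_`. A minimal sketch of that composition, assuming the truncated digest is built from SHA-512 as described for `seq_seqhash` and the same normalization; `vmc_id_sketch` is a hypothetical name.

```python
import base64
import hashlib
import re


def vmc_id_sketch(seq: str, normalize: bool = True) -> str:
    """Hypothetical equivalent of seq_vmc_id: 'VMC:GS_' + truncated digest."""
    if normalize:
        # Assumed normalization: strip whitespace and '*', then uppercase.
        seq = re.sub(r"[\s*]", "", seq).upper()
    # Assumed truncated digest: first 24 bytes of SHA-512, URL-safe base64.
    digest = hashlib.sha512(seq.encode("ascii")).digest()[:24]
    accession = "GS_" + base64.urlsafe_b64encode(digest).decode("ascii")
    return f"VMC:{accession}"


print(vmc_id_sketch(""))  # VMC:GS_z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc
```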
- bioutils.digests.seq_vmc_identifier(seq, normalize=True)
Converts a sequence to a VMC identifier record, a dict with namespace and accession keys.
See https://github.com/ga4gh/vmc
- Parameters:
seq (str) – A sequence.
normalize (bool, optional) – Whether to normalize the sequence before conversion, i.e., to remove whitespace and asterisks and convert it to uppercase. Defaults to True.
- Returns:
VMC identifier record of the sequence.
- Return type:
dict
Examples
>>> seq_vmc_identifier("") == {'namespace': 'VMC', 'accession': 'GS_z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc'}
True
>>> seq_vmc_identifier("ACGT") == {'namespace': 'VMC', 'accession': 'GS_aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2'}
True
>>> seq_vmc_identifier("acgt") == {'namespace': 'VMC', 'accession': 'GS_aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2'}
True
>>> seq_vmc_identifier("acgt", normalize=False) == {'namespace': 'VMC', 'accession': 'GS_eFwawHHdibaZBDcs9kW3gm31h1NNJcQe'}
True
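The record and the `seq_vmc_id` string carry the same information: joining `namespace` and `accession` with a colon gives the id form shown in the `seq_vmc_id` examples. A sketch of converting between the two; `identifier_to_id` and `id_to_identifier` are hypothetical helpers, not bioutils functions.

```python
def identifier_to_id(record: dict) -> str:
    """Hypothetical helper: render a namespace/accession record as 'namespace:accession'."""
    return f"{record['namespace']}:{record['accession']}"


def id_to_identifier(vmc_id: str) -> dict:
    """Hypothetical helper: split 'namespace:accession' back into a record."""
    # Split on the first colon only, so any later colons stay in the accession.
    namespace, accession = vmc_id.split(":", 1)
    return {"namespace": namespace, "accession": accession}


record = {"namespace": "VMC", "accession": "GS_aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2"}
print(identifier_to_id(record))  # VMC:GS_aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2
```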