bioutils.normalize module

Provides functionality for normalizing alleles, ensuring comparable representations.

class bioutils.normalize.NormalizationMode(value)

Bases: enum.Enum

Enum passed to normalize to select the normalization mode.

EXPAND: Normalize alleles to the maximal extent both left and right.

LEFTSHUFFLE: Normalize alleles to the maximal extent left.

RIGHTSHUFFLE: Normalize alleles to the maximal extent right.

TRIMONLY: Only trim the common prefix and suffix of alleles.

VCF: Normalize according to VCF conventions.

EXPAND = 1

LEFTSHUFFLE = 2

RIGHTSHUFFLE = 3

TRIMONLY = 4

VCF = 5
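The mode can be given either as an enum member or as the matching string. The sketch below shows one way such a lookup can work. The enum is a local mirror of the members listed above so that the snippet runs without bioutils installed, and `resolve_mode` is a hypothetical helper, not part of the library.

```python
from enum import Enum


class NormalizationMode(Enum):
    """Local mirror of the documented members (assumption: for illustration only)."""
    EXPAND = 1
    LEFTSHUFFLE = 2
    RIGHTSHUFFLE = 3
    TRIMONLY = 4
    VCF = 5


def resolve_mode(mode):
    """Hypothetical helper: accept an enum member or its name as a string."""
    if isinstance(mode, NormalizationMode):
        return mode
    # Enum classes support lookup by member name via indexing.
    return NormalizationMode[mode]


print(resolve_mode("TRIMONLY"))
print(resolve_mode(NormalizationMode.VCF).value)
```

An unknown name raises `KeyError` in this sketch; how the library itself reports an invalid mode is not stated here.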

bioutils.normalize.normalize(sequence, interval, alleles, mode=<NormalizationMode.EXPAND: 1>, bounds=None, anchor_length=0)

Normalizes the alleles that co-occur on sequence at interval, ensuring comparable representations.

Parameters

sequence (str or iterable) – The reference sequence; must support indexing via __getitem__.
interval (2-tuple of int) – The location of alleles in the reference sequence as (start, end). Interbase coordinates.
alleles (iterable of str) – The sequences to be normalized. The first element corresponds to the reference sequence being unchanged and must be None.
bounds (2-tuple of int, optional) – Maximal extent of normalization left and right. Must be provided if sequence doesn’t support __len__. Defaults to (0, len(sequence)).
mode (NormalizationMode Enum or string, optional) – A NormalizationMode Enum or the corresponding string. Defaults to EXPAND.
anchor_length (int, optional) – Number of flanking residues left and right. Defaults to 0.

Returns

(new_interval, [new_alleles])

Return type

tuple

Raises

ValueError – If normalization mode is VCF and anchor_length is nonzero.
ValueError – If the interval start is greater than the end.
ValueError – If the first (reference) allele is not None.
ValueError – If there are not at least two distinct alleles.

Examples

>>> sequence = "CCCCCCCCACACACACACTAGCAGCAGCA"
>>> normalize(sequence, interval=(22,25), alleles=(None, "GC", "AGCAC"), mode='TRIMONLY')
((22, 24), ('AG', 'G', 'AGCA'))

>>> normalize(sequence, interval=(22, 22), alleles=(None, 'AGC'), mode='RIGHTSHUFFLE')
((29, 29), ('', 'GCA'))

>>> normalize(sequence, interval=(22, 22), alleles=(None, 'AGC'), mode='EXPAND')
((19, 29), ('AGCAGCAGCA', 'AGCAGCAGCAGCA'))
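Interbase coordinates line up with Python slicing, which makes the examples above easy to check by hand. The following sketch uses only plain string slicing (no bioutils calls) to show that the EXPAND result describes the same insertion as the original input; the variable names are illustrative.

```python
sequence = "CCCCCCCCACACACACACTAGCAGCAGCA"

# An interbase interval (start, end) selects the same residues as [start:end].
ref_allele = sequence[22:25]      # the reference allele of the first example
insertion_site = sequence[22:22]  # a zero-width interval is an empty slice

# The EXPAND example reports the interval (19, 29). Rebuilding both alleles
# over that window from the original insertion of "AGC" at position 22:
start, end = 19, 29
expanded_ref = sequence[start:end]
expanded_alt = sequence[start:22] + "AGC" + sequence[22:end]

print(ref_allele, repr(insertion_site))
print(expanded_ref, expanded_alt)
```

The last line prints the same two alleles that the EXPAND example returns, which is the sense in which the different modes yield comparable representations of one variant.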

bioutils.normalize.roll_left(sequence, alleles, ref_pos, bound)

Determines the common distance by which all alleles can be rolled (circularly permuted) left within the reference sequence without altering it.

Parameters

sequence (str) – The reference sequence.
alleles (list of str) – The sequences to be normalized.
ref_pos (int) – The beginning index for rolling.
bound (int) – The lower bound index in the reference sequence for normalization, hence also for rolling.

Returns

The distance that the alleles can be rolled.

Return type

int
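The idea of rolling left can be sketched in a few lines. This is an illustration of the technique, not the library's code: `roll_left_sketch` is a hypothetical name, and it assumes that `ref_pos` is the index of the reference residue immediately left of the alleles and that `bound` is inclusive.

```python
def roll_left_sketch(sequence, alleles, ref_pos, bound):
    """Count how far all alleles can be circularly shifted left (assumed semantics)."""
    d = 0
    while ref_pos - d >= bound:
        residue = sequence[ref_pos - d]
        # Shifting left by one moves an allele's last residue to its front, so
        # step d compares the reference against the allele read cyclically
        # from its end. Empty alleles impose no constraint.
        if any(a and a[-(d + 1) % len(a)] != residue for a in alleles):
            break
        d += 1
    return d


sequence = "CCCCCCCCACACACACACTAGCAGCAGCA"
# Insertion of "AGC" at interbase position 22; the residue to its left is index 21.
print(roll_left_sketch(sequence, ["", "AGC"], 21, 0))
```

With these assumptions the insertion rolls three positions left, from 22 to 19, which agrees with the start of the EXPAND interval in the normalize examples.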

bioutils.normalize.roll_right(sequence, alleles, ref_pos, bound)

Determines the common distance by which all alleles can be rolled (circularly permuted) right within the reference sequence without altering it.

Parameters

sequence (str) – The reference sequence.
alleles (list of str) – The sequences to be normalized.
ref_pos (int) – The start index for rolling.
bound (int) – The upper bound index in the reference sequence for normalization, hence also for rolling.

Returns

The distance that the alleles can be rolled.

Return type

int
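Rolling right mirrors rolling left. The sketch below illustrates the technique under stated assumptions and is not the library's implementation: `roll_right_sketch` is a hypothetical name, `ref_pos` is taken to be the index of the first reference residue right of the alleles, and `bound` is treated as exclusive.

```python
def roll_right_sketch(sequence, alleles, ref_pos, bound):
    """Count how far all alleles can be circularly shifted right (assumed semantics)."""
    d = 0
    while ref_pos + d < bound:
        residue = sequence[ref_pos + d]
        # Shifting right by one moves an allele's first residue to its end, so
        # step d compares the reference against the allele read cyclically
        # from its start. Empty alleles impose no constraint.
        if any(a and a[d % len(a)] != residue for a in alleles):
            break
        d += 1
    return d


sequence = "CCCCCCCCACACACACACTAGCAGCAGCA"
allele = "AGC"
d = roll_right_sketch(sequence, ["", allele], 22, len(sequence))
# Apply the same rotation to the inserted allele.
rolled = allele[d % len(allele):] + allele[:d % len(allele)]
print(d, rolled)
```

Here the insertion at 22 rolls seven positions to 29 and the allele becomes `GCA`, matching the RIGHTSHUFFLE example for normalize.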

bioutils.normalize.trim_left(alleles)

Removes common prefix of given alleles.

Parameters: alleles (list of str) – A list of alleles.
Returns: (number_trimmed, [new_alleles]).
Return type: tuple

Examples

>>> trim_left(["","AA"])
(0, ['', 'AA'])

>>> trim_left(["A","AA"])
(1, ['', 'A'])

>>> trim_left(["AT","AA"])
(1, ['T', 'A'])

>>> trim_left(["AA","AA"])
(2, ['', ''])

>>> trim_left(["CAG","CG"])
(1, ['AG', 'G'])
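For readers who want to see the mechanics, common-prefix trimming can be sketched with the standard library. `trim_left_sketch` is a hypothetical stand-in written for illustration; it reproduces the documented outputs above but is not taken from the bioutils source.

```python
import os.path


def trim_left_sketch(alleles):
    """Strip the longest prefix shared by every allele (illustrative only)."""
    # os.path.commonprefix compares character by character, so it works on
    # arbitrary strings, not just filesystem paths.
    n = len(os.path.commonprefix(list(alleles)))
    return n, [a[n:] for a in alleles]


print(trim_left_sketch(["CAG", "CG"]))
print(trim_left_sketch(["AA", "AA"]))
```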

bioutils.normalize.trim_right(alleles)

Removes common suffix of given alleles.

Parameters: alleles (list of str) – A list of alleles.
Returns: (number_trimmed, [new_alleles]).
Return type: tuple

Examples

>>> trim_right(["","AA"])
(0, ['', 'AA'])

>>> trim_right(["A","AA"])
(1, ['', 'A'])

>>> trim_right(["AT","AA"])
(0, ['AT', 'AA'])

>>> trim_right(["AA","AA"])
(2, ['', ''])

>>> trim_right(["CAG","CG"])
(1, ['CA', 'C'])
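Suffix trimming can be sketched as prefix trimming on reversed strings. `trim_right_sketch` is a hypothetical helper for illustration only; it matches the documented outputs above but makes no claim about how bioutils implements the function.

```python
import os.path


def trim_right_sketch(alleles):
    """Strip the longest suffix shared by every allele (illustrative only)."""
    # A common suffix is a common prefix of the reversed strings.
    n = len(os.path.commonprefix([a[::-1] for a in alleles]))
    # Slice by explicit length so that n == 0 leaves each allele unchanged.
    return n, [a[:len(a) - n] for a in alleles]


print(trim_right_sketch(["CAG", "CG"]))
print(trim_right_sketch(["AT", "AA"]))
```

Slicing with `a[:len(a) - n]` rather than `a[:-n]` avoids the pitfall that `a[:-0]` yields an empty string.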
