API

Here are the main functions a user might need. I intend users to import all functions with from isweep import *. I ordered the functions according to their importance in estimating selection coefficients. Some parameters typed as numpy.array also accept a list. Some non-essential functions are not listed, for example, the functions in the isweep.utilities module that create complex *.ne files. Under the hood, object-oriented programming (class objects) and dynamic programming let isweep simulate the IBD of haplotypes with a fast algorithm.

isweep.slow.empty_function()

isweep.inference.read_ibd_file(ibd_file, header=1, include_length=1)

Create a list of IBD segments

Parameters:
  • ibd_file (str) – Name of text file with IBD segments

  • header (int) – 1 (default) if the file has a header line, 0 if it has none

  • include_length (int) – 1 (default) to read segment lengths, 0 to omit them

Returns:

(ID1, ID2, cM length) tuples

Return type:

list
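The reading step can be sketched without isweep. The parser below, parse_ibd_lines, is hypothetical (not part of isweep): it works on lines that are already read and assumes whitespace-delimited columns in the order ID1, ID2, cM length. The real file layout read by read_ibd_file may differ.

```python
def parse_ibd_lines(lines, header=1, include_length=1):
    """Hypothetical sketch of read_ibd_file operating on text lines.

    Assumes each data row is 'ID1 ID2 length_cM', whitespace-separated.
    """
    rows = iter(lines)
    if header:
        next(rows, None)  # skip the single header line
    segments = []
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if include_length:
            segments.append((fields[0], fields[1], float(fields[2])))
        else:
            segments.append((fields[0], fields[1]))
    return segments


example = ["id1 id2 length", "A B 3.5", "A C 2.25"]
print(parse_ibd_lines(example))
# [('A', 'B', 3.5), ('A', 'C', 2.25)]
```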

isweep.utilities.bin_ibd_segments(ell, ab)

Count IBD segments in length bins

Parameters:
  • ell (numpy.array) – IBD segment lengths in centiMorgans

  • ab (numpy.array) – Bin edges: increasing floats in centiMorgans

Returns:

Observed counts of IBD segments in each bin

Return type:

numpy.array
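Binning lengths against increasing edges is what numpy.histogram does. The function below is a hypothetical stand-in, not isweep's code; in particular, the edge handling (half-open bins, last bin closed, out-of-range segments dropped) follows numpy.histogram and is an assumption about bin_ibd_segments.

```python
import numpy as np


def bin_ibd_segments_sketch(ell, ab):
    """Hypothetical stand-in for bin_ibd_segments.

    Bins are [ab[0], ab[1]), ..., [ab[-2], ab[-1]] as in numpy.histogram;
    segments outside [ab[0], ab[-1]] are not counted.
    """
    ell = np.asarray(ell, dtype=float)  # lists are accepted too
    ab = np.asarray(ab, dtype=float)
    if np.any(np.diff(ab) <= 0):
        raise ValueError("ab must be strictly increasing")
    counts, _ = np.histogram(ell, bins=ab)
    return counts


print(bin_ibd_segments_sketch([2.1, 2.5, 3.0, 4.9, 7.0], [2.0, 3.0, 4.0, 5.0]))
# [2 1 1]
```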

isweep.utilities.read_Ne(file)

Read a *.ne file

Parameters:

file (str) – Input file name

Returns:

dict[generation] = size

Return type:

dict
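A minimal sketch of turning *.ne lines into the dict[generation] = size mapping. The column layout is an assumption: the hypothetical parse_ne_lines below reads the first two whitespace-separated columns as generation and size and skips lines whose first field is not an integer, such as a header.

```python
def parse_ne_lines(lines):
    """Hypothetical sketch of read_Ne for lines that are already read.

    Assumes column 1 is the generation and column 2 the effective size.
    """
    Ne = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            generation = int(fields[0])
        except ValueError:
            continue  # header or comment line
        Ne[generation] = float(fields[1])
    return Ne


print(parse_ne_lines(["GEN\tNE", "0\t10000", "1\t9500.5"]))
# {0: 10000.0, 1: 9500.5}
```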

isweep.inference.chi2_isweep(s, p0, Ne, n, obs, ab, one_step_model='a', tau0=0, sv=-0.01, ploidy=2)

Chi-squared statistic for the sweep model (unlabeled)

Parameters:
  • s (float) – Selection coefficient

  • p0 (float) – Variant frequency at generation 0

  • Ne (dict) – Effective population sizes

  • n (int) – Sample size (number of pairs)

  • obs (numpy.array) – Observed counts for IBD segment bins (unlabeled)

  • ab (numpy.array) – Bin edges: increasing floats in centiMorgans

  • one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’

  • tau0 (int) – Generation at which neutrality begins

  • sv (float) – Allele frequency of standing variation (the default, -0.01, assumes a de novo sweep)

  • ploidy (int) – 1 for haploid or 2 for diploid

Returns:

Goodness-of-fit statistic

Return type:

float
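chi2_isweep compares observed bin counts with counts expected under the sweep model. The sketch below shows only the standard Pearson form of such a statistic, given an expected-count vector; how isweep derives expected counts from s, p0, Ne, and n is not shown, and whether it uses exactly this form is an assumption.

```python
import numpy as np


def pearson_chi2(obs, expected):
    """Standard Pearson goodness-of-fit statistic: sum((O - E)^2 / E)."""
    obs = np.asarray(obs, dtype=float)
    expected = np.asarray(expected, dtype=float)
    if np.any(expected <= 0):
        raise ValueError("expected counts must be positive")
    return float(np.sum((obs - expected) ** 2 / expected))


print(pearson_chi2([10, 20, 30], [12, 18, 30]))
```

A natural use of such a statistic is to evaluate it over a grid of candidate selection coefficients and keep the minimizer; that workflow is an assumption here, not a description of isweep's estimator.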

isweep.inference.chi2_labeled_isweep(s, p0, Ne, n, obs1, obs0, ab, one_step_model='a', tau0=0, sv=-0.01, ploidy=2)

Chi-squared statistic for the sweep model (labeled)

Parameters:
  • s (float) – Selection coefficient

  • p0 (float) – Variant frequency at generation 0

  • Ne (dict) – Effective population sizes

  • n (int) – Sample size (number of pairs)

  • obs1 (numpy.array) – Observed counts for IBD segment bins (labeled 1)

  • obs0 (numpy.array) – Observed counts for IBD segment bins (labeled 0)

  • ab (numpy.array) – Bin edges: increasing floats in centiMorgans

  • one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’

  • tau0 (int) – Generation at which neutrality begins

  • sv (float) – Allele frequency of standing variation (the default, -0.01, assumes a de novo sweep)

  • ploidy (int) – 1 for haploid or 2 for diploid

Returns:

Goodness-of-fit statistic

Return type:

float

isweep.coalescent.simulate_ibd(n, Ne, long_ibd=2.0, short_ibd=1.0, ploidy=2, record_dist=True, pairwise_output=True, left_length=inf, right_length=inf, wf_approx=False)

Simulate IBD segments from a coalescent

Parameters:
  • n (int) – Sample size (individuals)

  • Ne (dict) – Effective population sizes

  • long_ibd (float) – cM length threshold

  • short_ibd (float) – cM length threshold

  • ploidy (int) – 1 for haploid or 2 for diploid

  • record_dist (bool) – Whether to save tract length and coalescent time distributions (default True)

  • pairwise_output (bool) – Whether to save pairwise segments (default True)

  • left_length (float) – Distance to the left chromosome end (default numpy.inf)

  • right_length (float) – Distance to the right chromosome end (default numpy.inf)

  • wf_approx (bool) – Use a binomial approximation for the early Wright-Fisher (WF) process (default False)

Returns:

(number of tracts, group sizes, length distr., time distr., count distr., pairwise segments)

Return type:

tuple

isweep.coalescent.simulate_ibd_constant(n, Ne, long_ibd=2.0, short_ibd=1.0, ploidy=2, record_dist=True, pairwise_output=True, left_length=inf, right_length=inf)

Simulate IBD segments from a coalescent with constant Ne

Parameters:
  • n (int) – Sample size (individuals)

  • Ne (int) – Constant effective population size

  • long_ibd (float) – cM length threshold

  • short_ibd (float) – cM length threshold

  • ploidy (int) – 1 for haploid or 2 for diploid

  • record_dist (bool) – Whether to save tract length and coalescent time distributions (default True)

  • pairwise_output (bool) – Whether to save pairwise segments (default True)

  • left_length (float) – Distance to the left chromosome end (default numpy.inf)

  • right_length (float) – Distance to the right chromosome end (default numpy.inf)

Returns:

(number of tracts, group sizes, length distr., time distr., count distr., pairwise segments)

Return type:

tuple

isweep.coalescent.simulate_ibd_isweep(n, s, p0, Ne, long_ibd=2.0, short_ibd=1.0, random_walk=True, one_step_model='a', tau0=0, sv=-0.01, ploidy=2, record_dist=True, pairwise_output=True, left_length=inf, right_length=inf, wf_approx=False)

Simulate IBD segments from a coalescent with selection

Parameters:
  • n (int) – Sample size (individuals)

  • s (float) – Selection coefficient

  • p0 (float) – Variant frequency at generation 0

  • Ne (dict) – Effective population sizes

  • long_ibd (float) – cM length threshold

  • short_ibd (float) – cM length threshold

  • random_walk (bool) – True for random walk

  • one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’

  • tau0 (int) – Generation when neutrality begins

  • sv (float) – Allele frequency of standing variation (the default, -0.01, assumes a de novo sweep)

  • ploidy (int) – 1 for haploid or 2 for diploid

  • record_dist (bool) – Whether to save tract length and coalescent time distributions (default True)

  • pairwise_output (bool) – Whether to save pairwise segments (default True)

  • left_length (float) – Distance to the left chromosome end (default numpy.inf)

  • right_length (float) – Distance to the right chromosome end (default numpy.inf)

  • wf_approx (bool) – Use a binomial approximation for the early Wright-Fisher (WF) process (default False)

Returns:

(all, adaptive allele, non-adaptive allele) tuples, then pairwise segments. Each tuple is (number of tracts, group sizes, length distr., time distr., count distr.)

Return type:

tuple(s)

isweep.coalescent.simulate_ibd_isweep_tv(n, s_s, g_s, p0, Ne, long_ibd=2.0, short_ibd=1.0, random_walk=True, ploidy=2, record_dist=True, pairwise_output=True, left_length=inf, right_length=inf, wf_approx=False)

Simulate IBD segments from a coalescent with time-varying selection

Parameters:
  • n (int) – Sample size (individuals)

  • s_s (list) – List of selection coefficients

  • g_s (list) –

    List of transition times (in generations) for the selection coefficients, aligned with the s_s parameter.

    e.g., s_s=[0.01,0.03,0.02] and g_s=[50,100] means s=0.01 between generations 0 and 50, s=0.03 between 50 and 100, and s=0.02 from 100 onward.

    Should have length one less than s_s.

  • p0 (float) – Variant frequency at generation 0

  • Ne (dict) – Effective population sizes

  • long_ibd (float) – cM length threshold

  • short_ibd (float) – cM length threshold

  • random_walk (bool) – True for random walk

  • ploidy (int) – 1 for haploid or 2 for diploid

  • record_dist (bool) – Whether to save tract length and coalescent time distributions (default True)

  • pairwise_output (bool) – Whether to save pairwise segments (default True)

  • left_length (float) – Distance to the left chromosome end (default numpy.inf)

  • right_length (float) – Distance to the right chromosome end (default numpy.inf)

  • wf_approx (bool) – Use a binomial approximation for the early Wright-Fisher (WF) process (default False)

Returns:

(all, adaptive allele, non-adaptive allele) tuples, then pairwise segments. Each tuple is (number of tracts, group sizes, length distr., time distr., count distr.)

Return type:

tuple(s)
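The s_s / g_s convention amounts to a step-function lookup. The helper s_at_generation below is hypothetical (not an isweep function), and treating each transition time as the start of the next interval is an assumption, since the text does not say which interval owns a boundary generation.

```python
from bisect import bisect_right


def s_at_generation(g, s_s, g_s):
    """Hypothetical helper: selection coefficient in effect at generation g.

    A transition time is assigned to the later interval, so with
    g_s=[50, 100], generation 50 uses s_s[1].
    """
    if len(g_s) != len(s_s) - 1:
        raise ValueError("g_s must have length len(s_s) - 1")
    return s_s[bisect_right(g_s, g)]


s_s, g_s = [0.01, 0.03, 0.02], [50, 100]
print([s_at_generation(g, s_s, g_s) for g in (0, 49, 50, 99, 100, 500)])
# [0.01, 0.01, 0.03, 0.03, 0.02, 0.02]
```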

isweep.coalescent.simulate_ibd_split(n, q, split_gen, Ne, long_ibd=2.0, short_ibd=1.0, ploidy=2, record_dist=True, pairwise_output=True, left_length=inf, right_length=inf, wf_approx=False)

Simulate IBD segments from a coalescent with a population split

Parameters:
  • n (int) – Sample size (individuals)

  • q (float) – Split proportion

  • split_gen (int) – Number of generations since the population split

  • Ne (dict) – Effective population sizes

  • long_ibd (float) – cM length threshold

  • short_ibd (float) – cM length threshold

  • ploidy (int) – 1 for haploid or 2 for diploid

  • record_dist (bool) – Whether to save tract length and coalescent time distributions (default True)

  • pairwise_output (bool) – Whether to save pairwise segments (default True)

  • left_length (float) – Distance to the left chromosome end (default numpy.inf)

  • right_length (float) – Distance to the right chromosome end (default numpy.inf)

  • wf_approx (bool) – Use a binomial approximation for the early Wright-Fisher (WF) process (default False)

Returns:

(all, first admix allele, second admix allele) tuples, then pairwise segments. Each tuple is (number of tracts, group sizes, length distr., time distr., count distr.)

Return type:

tuple(s)

isweep.coalescent.basic_coalescent(n)

Simulate times in the basic coalescent (scale by population size afterward)

Parameters:

n (int) – Sample size

Returns:

Interarrival times between coalescent events

Return type:

numpy.array
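In the basic coalescent, while k lineages remain, the wait until the next merger is exponential with rate k(k-1)/2 in coalescent time units. The sketch below draws those n-1 waits with NumPy; it illustrates the standard model and is not isweep's implementation. The 2N-generation scaling for a constant-size diploid population is the usual convention and is stated as an assumption about the "post-hoc" scaling.

```python
import numpy as np


def basic_coalescent_sketch(n, rng=None):
    """Sketch: interarrival times of the standard coalescent.

    Returns n - 1 exponential waits, ordered from the first merger
    (k = n lineages) to the last (k = 2), in coalescent units.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = np.arange(n, 1, -1)  # n, n-1, ..., 2 lineages
    rates = k * (k - 1) / 2.0
    return rng.exponential(scale=1.0 / rates)


waits = basic_coalescent_sketch(10, np.random.default_rng(1))
# Post-hoc scaling (assumed): for a diploid population of constant
# size N, one coalescent unit is 2N generations.
N = 10_000
print(np.cumsum(waits) * 2 * N)  # arrival times in generations
```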

isweep.coalescent.varying_Ne_coalescent(n, Ne, ploidy=2, to_tmrca=True)

Simulate times in a coalescent with varying population size

Parameters:
  • n (int) – Sample size

  • Ne (dict) – Effective population sizes

  • ploidy (int) – 1 for haploid or 2 for diploid

  • to_tmrca (bool) – Whether to simulate until the time to the most recent common ancestor (TMRCA) (default True)

Returns:

Arrival times in generations

Return type:

numpy.array
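With a varying population size, a simple approach steps back one generation at a time and lets a pair of lineages merge with probability about C(k, 2) / (ploidy * Ne[g]). The sketch below is that discrete-generation approximation (at most one merger per generation), not necessarily isweep's algorithm; reusing the oldest listed size for unlisted generations is also an assumption.

```python
import numpy as np


def varying_ne_coalescent_sketch(n, Ne, ploidy=2, rng=None):
    """Discrete-generation sketch of a coalescent with varying size.

    In generation g = 1, 2, ... with k lineages, one pair merges with
    probability min(1, C(k, 2) / (ploidy * Ne[g])). Generations missing
    from Ne use the size of the oldest listed generation. Runs to the
    TMRCA and returns the generations of the n - 1 mergers.
    """
    rng = np.random.default_rng() if rng is None else rng
    oldest = Ne[max(Ne)]
    k, g, times = n, 0, []
    while k > 1:
        g += 1
        size = Ne.get(g, oldest)
        p = min(1.0, k * (k - 1) / 2.0 / (ploidy * size))
        if rng.random() < p:
            times.append(g)
            k -= 1
    return np.array(times)


Ne = {g: 5000.0 for g in range(501)}
print(varying_ne_coalescent_sketch(5, Ne, rng=np.random.default_rng(7)))
```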

isweep.coalescent.walk_variant_backward(s, p0, Ne, random_walk=False, one_step_model='a', tau0=0, sv=-0.01, ploidy=2)

Variant frequencies backward in time

Parameters:
  • s (float) – Selection coefficient

  • p0 (float) – Variant frequency at generation 0

  • Ne (dict) – Effective population sizes

  • random_walk (bool) – True for random walk

  • one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’

  • tau0 (int) – Generation when neutrality begins

  • sv (float) – Allele frequency of standing variation (the default, -0.01, assumes a de novo sweep)

  • ploidy (int) – 1 for haploid or 2 for diploid

Returns:

NumPy arrays for frequencies and sizes

Return type:

tuple
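A deterministic (random_walk=False) backward walk inverts a one-generation update. As an assumed, swapped-in model (not necessarily any of the ‘m’, ‘a’, ‘d’, or ‘r’ options), the sketch uses the haploid-style update p' = p(1 + s) / (1 + s p), whose inverse is p = p' / (1 + s - s p'). It also assumes Ne has every generation from 0 up to its maximum key and stops once fewer than one copy of the variant remains.

```python
import numpy as np


def walk_backward_sketch(s, p0, Ne, ploidy=2):
    """Deterministic backward frequency walk (sketch, assumed model).

    Returns (frequencies, sizes) starting at generation 0. Stops when
    p * ploidy * Ne[g] < 1 or when the generations in Ne run out.
    """
    freqs, sizes = [p0], [Ne[0]]
    p = p0
    for g in range(1, max(Ne) + 1):
        p = p / (1.0 + s - s * p)  # inverse of p' = p(1 + s) / (1 + s p)
        if p * ploidy * Ne[g] < 1.0:
            break  # fewer than one copy left
        freqs.append(p)
        sizes.append(Ne[g])
    return np.array(freqs), np.array(sizes)


Ne = {g: 10000.0 for g in range(1001)}
freqs, sizes = walk_backward_sketch(0.05, 0.5, Ne)
print(len(freqs), freqs[-1])
```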

isweep.coalescent.walk_variant_forward(s, pG, Ne, random_walk=False, one_step_model='a', tau0=0, ploidy=2)

Variant frequencies forward in time

Parameters:
  • s (float) – Selection coefficient

  • pG (float) – Variant frequency at maximum generation

  • Ne (dict) – Effective population sizes

  • random_walk (bool) – True for random walk

  • one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’

  • tau0 (int) – Generation when neutrality begins

  • ploidy (int) – 1 for haploid or 2 for diploid

Returns:

NumPy arrays for frequencies and sizes

Return type:

tuple

isweep.coalescent.walk_variant_backward_tv(s_s, g_s, p0, Ne, random_walk=False, ploidy=2)

Variant frequencies backward in time with time-varying selection

Parameters:
  • s_s (list) – List of selection coefficients

  • g_s (list) –

    List of transition times (in generations) for the selection coefficients, aligned with the s_s parameter.

    e.g., s_s=[0.01,0.03,0.02] and g_s=[50,100] means s=0.01 between generations 0 and 50, s=0.03 between 50 and 100, and s=0.02 from 100 onward.

    Should have length one less than s_s.

  • p0 (float) – Variant frequency at generation 0

  • Ne (dict) – Effective population sizes

  • random_walk (bool) – True for random walk

  • ploidy (int) – 1 for haploid or 2 for diploid

Returns:

NumPy arrays for frequencies and sizes

Return type:

tuple

isweep.inference.bootstrap_standard(val, boot, alpha1=0.025, alpha2=0.975)

Implements the standard bootstrap interval estimator

Parameters:
  • val (float) – Parameter estimate

  • boot (numpy.array) – Bootstrap estimates

  • alpha1 (float) – Lower percentile, as a proportion (default 0.025)

  • alpha2 (float) – Upper percentile, as a proportion (default 0.975)

Returns:

(lower, middle, upper) interval estimator

Return type:

tuple
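The standard (normal-approximation) bootstrap interval uses the spread of the bootstrap estimates as a standard error around the point estimate. This sketch shows that textbook construction with Python's statistics.NormalDist for the normal quantiles; matching isweep's exact choices (for example, ddof=1) is an assumption.

```python
import numpy as np
from statistics import NormalDist


def standard_interval_sketch(val, boot, alpha1=0.025, alpha2=0.975):
    """Standard bootstrap interval: val + z(alpha) * SD(boot).

    Returns (lower, val, upper); z(alpha1) is negative for alpha1 < 0.5.
    """
    se = np.std(np.asarray(boot, dtype=float), ddof=1)
    z = NormalDist()
    return (val + z.inv_cdf(alpha1) * se, val, val + z.inv_cdf(alpha2) * se)


rng = np.random.default_rng(3)
boot = rng.normal(0.03, 0.005, size=200)  # synthetic bootstrap draws
print(standard_interval_sketch(0.03, boot))
```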

isweep.inference.bootstrap_standard_bc(val, boot, alpha1=0.025, alpha2=0.975)

Implements the standard bootstrap interval estimator (with bias correction)

Parameters:
  • val (float) – Parameter estimate

  • boot (numpy.array) – Bootstrap estimates

  • alpha1 (float) – Lower percentile, as a proportion (default 0.025)

  • alpha2 (float) – Upper percentile, as a proportion (default 0.975)

Returns:

(lower, middle, upper) interval estimator

Return type:

tuple
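One common bias correction estimates the bias as mean(boot) - val and re-centers the interval on 2 * val - mean(boot). Whether bootstrap_standard_bc uses exactly this correction, and whether its middle value is the corrected estimate, are assumptions in the sketch below.

```python
import numpy as np
from statistics import NormalDist


def standard_bc_interval_sketch(val, boot, alpha1=0.025, alpha2=0.975):
    """Standard bootstrap interval around a bias-corrected estimate.

    Corrected estimate: 2 * val - mean(boot). Returns
    (lower, corrected estimate, upper).
    """
    boot = np.asarray(boot, dtype=float)
    center = 2.0 * val - boot.mean()
    se = boot.std(ddof=1)
    z = NormalDist()
    return (center + z.inv_cdf(alpha1) * se, center, center + z.inv_cdf(alpha2) * se)


print(standard_bc_interval_sketch(0.02, [0.02, 0.03, 0.04]))
```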

isweep.inference.bootstrap_percentile(val, boot, alpha1=0.025, alpha2=0.975)

Implements the percentile-based interval estimator

Parameters:
  • val (float) – Parameter estimate

  • boot (numpy.array) – Bootstrap estimates

  • alpha1 (float) – Lower percentile, as a proportion (default 0.025)

  • alpha2 (float) – Upper percentile, as a proportion (default 0.975)

Returns:

(lower, middle, upper) interval estimator

Return type:

tuple
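The percentile interval reads its bounds directly off the empirical quantiles of the bootstrap estimates. In this sketch the middle value is taken to be the point estimate val, which is an assumption about bootstrap_percentile.

```python
import numpy as np


def percentile_interval_sketch(val, boot, alpha1=0.025, alpha2=0.975):
    """Percentile bootstrap interval (sketch).

    Bounds are the alpha1 and alpha2 quantiles of the bootstrap
    estimates (NumPy's default linear interpolation).
    """
    lower, upper = np.quantile(np.asarray(boot, dtype=float), [alpha1, alpha2])
    return (float(lower), val, float(upper))


print(percentile_interval_sketch(50.0, np.arange(101)))
```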

isweep.inference.when_freq(maf, s, p0, Ne, random_walk=True, one_step_model='a', tau0=0, sv=-0.01, ploidy=2)

Report when the variant frequency reaches a given value

Parameters:
  • maf (float) – Target variant frequency

  • s (float) – Selection coefficient

  • p0 (float) – Variant frequency at generation 0

  • Ne (dict) – Effective population sizes

  • random_walk (bool) – True for random walk

  • one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’

  • tau0 (int) – Generation when neutrality begins

  • sv (float) – Allele frequency of standing variation (the default, -0.01, assumes a de novo sweep)

  • ploidy (int) – 1 for haploid or 2 for diploid

Returns:

Generation (back in time) at which the variant frequency reaches maf

Return type:

int
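Once a backward frequency trajectory is available (generation 0 first), finding when it reaches maf is a search for the first generation at or below that value. The helper below is hypothetical: it assumes the trajectory is already computed, for example by walk_variant_backward, and returns -1 when the frequency never falls that low, a convention invented for this sketch.

```python
import numpy as np


def first_generation_at_or_below(freqs, maf):
    """Hypothetical lookup: index of the first generation (back in time)
    whose frequency is at or below maf, or -1 if none is."""
    hits = np.flatnonzero(np.asarray(freqs, dtype=float) <= maf)
    return int(hits[0]) if hits.size else -1


print(first_generation_at_or_below([0.5, 0.4, 0.3, 0.2], 0.3))
# 2
```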

isweep.inference.bootstrap_freq(maf, B, boots, bootp, Ne, random_walk=True, one_step_model='a', tau0=0, sv=-0.01, ploidy=2)

Parametric bootstrap for the generation at which the variant reaches a given frequency

Parameters:
  • maf (float) – Target variant frequency

  • B (int) – Number of bootstraps

  • boots (numpy.array) – NumPy array of bootstraps for selection coefficient

  • bootp (numpy.array) – NumPy array of bootstraps for variant frequency

  • Ne (dict) – Effective population sizes

  • random_walk (bool) – True for random walk

  • one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’

  • tau0 (int) – Generation when neutrality begins

  • sv (float) – Allele frequency of standing variation (the default, -0.01, assumes a de novo sweep)

  • ploidy (int) – 1 for haploid or 2 for diploid

Returns:

NumPy array of bootstrap estimates of the generation at which the variant reaches frequency maf

Return type:

numpy.array

isweep.outgroups.make_ibd_graph(ibd_segments)

Create an IBD graph from pairwise segments

Parameters:

ibd_segments (tuple) – Collection of pairwise segments

Returns:

Graph with an edge between two haplotypes if they share a detectable IBD segment

Return type:

networkx.Graph
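The edge rule above can be illustrated without networkx by an adjacency-set dictionary. This sketch is not isweep's code and returns a plain dict rather than a networkx.Graph; it assumes each segment starts with the two haplotype IDs. With networkx installed, the same edges could be passed to networkx.Graph().add_edges_from.

```python
from collections import defaultdict


def make_ibd_graph_sketch(ibd_segments):
    """Adjacency-set sketch of an IBD graph (not a networkx.Graph).

    Assumes each segment looks like (ID1, ID2, ...); repeated pairs
    collapse into a single edge.
    """
    graph = defaultdict(set)
    for segment in ibd_segments:
        a, b = segment[0], segment[1]
        if a == b:
            continue  # ignore self-pairs
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


print(make_ibd_graph_sketch([("A", "B", 3.1), ("B", "C", 2.4), ("A", "B", 5.0)]))
```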

isweep.outgroups.diameter_communities(graph, K=3, max_communities=inf)

Find connected communities with diameter at most 2*K

Parameters:
  • graph (networkx.Graph) – Graph to search, such as the output of make_ibd_graph

  • K (int) – Half the maximum community diameter (default 3, i.e., diameter at most 6)

  • max_communities (int) – Maximum number of communities to find (default: find all)

Returns:

Node sets for connected subgraphs with diameter at most 2*K

Return type:

list

isweep.utilities.write_Ne(Ne, output_file)

Write an Ne dictionary to a *.ne file

Parameters:
  • Ne (dict) – Effective population sizes

  • output_file (str) – File name

Return type:

None

isweep.utilities.make_constant_Ne(file, size, maxg)

Create a *.ne file for a constant-size population

Parameters:
  • file (str) – Output file name

  • size (float) – Effective population size

  • maxg (int) – Maximum generation

Return type:

None
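The content of a constant-size *.ne file is a generation-to-size table with the same size at every generation. In the sketch below, both helpers are hypothetical; including maxg itself and the two-column, tab-separated layout are assumptions about the file that make_constant_Ne writes.

```python
def constant_ne_sketch(size, maxg):
    """Sketch: generation -> size for generations 0..maxg (inclusive, assumed)."""
    return {g: size for g in range(maxg + 1)}


def ne_to_text(Ne):
    """Hypothetical two-column, tab-separated *.ne layout (an assumption)."""
    return "".join(f"{g}\t{Ne[g]}\n" for g in sorted(Ne))


Ne = constant_ne_sketch(10000.0, 3)
print(ne_to_text(Ne), end="")
# prints 4 lines, "0\t10000.0" through "3\t10000.0"
```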

isweep.utilities.make_exponential_Ne(file, size, maxg, rate)

Create a *.ne file for an exponentially growing population

Parameters:
  • file (str) – Output file name

  • size (float) – Effective population size at generation 0

  • maxg (list (int)) – Maximum generation(s)

  • rate (list (float)) – Exponential growth rate(s)

  • rate (list (float)) – Exponential growth rate(s)

Return type:

None
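With lists for maxg and rate, a natural reading is piecewise exponential growth: each rate applies up to its matching maximum generation. The sketch below builds such a table backward in time, assuming N(g) = N(g - 1) * exp(-rate[i]) for generations after maxg[i-1] up to maxg[i]; that convention is an assumption, not necessarily make_exponential_Ne's formula.

```python
import math


def exponential_ne_sketch(size, maxg, rate):
    """Piecewise exponential growth, written backward in time (sketch).

    Generation 0 has `size`; within segment i, each step back in time
    multiplies the size by exp(-rate[i]), so a positive rate means the
    population grew forward in time.
    """
    if len(maxg) != len(rate):
        raise ValueError("maxg and rate must have the same length")
    Ne = {0: float(size)}
    g = 0
    for end, r in zip(maxg, rate):
        while g < end:
            g += 1
            Ne[g] = Ne[g - 1] * math.exp(-r)
    return Ne


Ne = exponential_ne_sketch(100000, [50, 200], [0.05, 0.01])
print(round(Ne[50]), round(Ne[200]))
# 8208 1832
```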

isweep.coalescent.probability_ibd(ps, Ns, long_ibd=2, ploidy=2)

Approximate probability of IBD

Parameters:
  • ps (numpy.array) – Variant frequencies (back in time)

  • Ns (numpy.array) – Effective population sizes

  • long_ibd (float) – cM length threshold

  • ploidy (int) – 1 for haploid or 2 for diploid

Returns:

Approximate P(ell > c), where ell is the IBD segment length and c is long_ibd

Return type:

float
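A standard approximation combines two pieces. First, if two haplotypes coalesce t generations ago, the segment around a focal point has length (in Morgans) equal to a sum of two Exp(2t) distances, so P(ell > c) = (1 + 2tc) exp(-2tc). Second, the pair coalesces in generation t with probability 1 / (ploidy * ps[t] * Ns[t]) given no earlier coalescence. The sketch below sums over t; it is a textbook-style approximation, and the indexing (entry t describes generation t, with t = 0 the present) is an assumption rather than isweep's exact computation.

```python
import numpy as np


def probability_ibd_sketch(ps, Ns, long_ibd=2.0, ploidy=2):
    """Approximate P(ell > c) for a pair carrying the variant (sketch).

    ps[t], Ns[t] describe generation t back in time (t = 0 is the present).
    Per-generation coalescence probability is 1 / (ploidy * ps[t] * Ns[t]),
    capped at 1; given coalescence at t, P(ell > c) = (1 + 2tc) exp(-2tc).
    """
    ps = np.asarray(ps, dtype=float)[1:]
    Ns = np.asarray(Ns, dtype=float)[1:]
    t = np.arange(1, len(ps) + 1)
    c = long_ibd / 100.0  # cM -> Morgans
    coal = np.minimum(1.0, 1.0 / (ploidy * ps * Ns))
    # probability of not having coalesced before generation t
    survive = np.concatenate(([1.0], np.cumprod(1.0 - coal)[:-1]))
    pmf = survive * coal  # P(T = t)
    tail = (1.0 + 2.0 * t * c) * np.exp(-2.0 * t * c)
    return float(np.sum(pmf * tail))


Ns = np.full(1001, 10000.0)
print(probability_ibd_sketch(np.ones(1001), Ns))       # neutral, p = 1
print(probability_ibd_sketch(np.full(1001, 0.1), Ns))  # carriers at 10%
```

Smaller carrier frequencies shrink the pool of ancestors, so recent coalescence and long IBD become more likely, which is the signal the sweep model exploits.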

isweep.coalescent.probability_ibd_isweep(s, p0, Ne, long_ibd=2, one_step_model='a', tau0=0, sv=-0.01, ploidy=2)

Approximate probability of IBD given a sweep model

Parameters:
  • s (float) – Selection coefficient

  • p0 (float) – Variant frequency at generation 0

  • Ne (dict) – Effective population sizes

  • long_ibd (float) – cM length threshold

  • one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’

  • tau0 (int) – Generation when neutrality begins

  • sv (float) – Allele frequency of standing variation (the default, -0.01, assumes a de novo sweep)

  • ploidy (int) – 1 for haploid or 2 for diploid

Returns:

Approximate P(ell > c), where ell is the IBD segment length and c is long_ibd

Return type:

float

isweep.utilities.big_format_distribution(distr, counts)

Reformat a vector for plotting with matplotlib or seaborn

Parameters:
  • distr (array-like) – Vector of realizations (lengths, times, etc.)

  • counts (array-like) – Vector of realization multiplicities

Returns:

Vector in which each realization is repeated according to its multiplicity

Return type:

numpy.array
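Expanding realizations by their multiplicities is what numpy.repeat does, so the reformatting can be sketched in a few lines. Whether big_format_distribution uses numpy.repeat internally is an assumption; the arrays below are illustrative values only.

```python
import numpy as np

lengths = np.array([2.5, 3.0, 4.0])  # distinct realizations
counts = np.array([3, 1, 2])         # multiplicity of each realization
expanded = np.repeat(lengths, counts)  # one entry per observation
print(expanded)
```

The expanded vector can then go straight to a histogram or density plot, which expects one entry per observation.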