API
These are the main functions a user is likely to need. I intend users to import everything with from isweep import *. The functions are ordered by their importance for estimating selection coefficients. Some parameters typed numpy.array also accept a list. A few non-essential functions are not listed, for example the functions in the isweep.utilities module that create complex *.ne files. Under the hood, object-oriented programming (class objects) and dynamic programming make the simulation of haplotype IBD fast.
- isweep.inference.read_ibd_file(ibd_file, header=1, include_length=1)
Create list of IBD segments
- isweep.utilities.bin_ibd_segments(ell, ab)
Put ibd segments into bins
- Parameters:
ell (numpy.array) – ibd segments
ab (numpy.array) – Increasing floats in centiMorgans
- Returns:
Observed counts for ibd segment bins
- Return type:
numpy.array
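A minimal sketch of the binning idea, assuming the bin edges ab are used the way numpy.histogram uses them (half-open bins, last bin closed). The name bin_segments_sketch is hypothetical and this is not the package's implementation; how the package treats segments outside the edges is not stated here.

```python
import numpy as np

def bin_segments_sketch(ell, ab):
    """Count segment lengths (cM) falling between consecutive edges in `ab`."""
    ell = np.asarray(ell, dtype=float)
    ab = np.asarray(ab, dtype=float)
    # np.histogram uses [a, b) bins, except the last bin, which is [a, b].
    counts, _ = np.histogram(ell, bins=ab)
    return counts

ell = [2.1, 2.5, 3.2, 4.8, 7.0]    # illustrative segment lengths in cM
ab = [2.0, 3.0, 4.0, 6.0, 10.0]    # increasing bin edges in cM
counts = bin_segments_sketch(ell, ab)
```

The returned array has one fewer entry than ab, one count per bin.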
- isweep.inference.chi2_isweep(s, p0, Ne, n, obs, ab, one_step_model='a', tau0=0, sv=-0.01, ploidy=2)
Chi-squared statistic for sweep model (unlabeled)
- Parameters:
s (float) – Selection coefficient
p0 (float) – Variant frequency at generation 0
Ne (dict) – Effective population sizes
n (int) – Pairs sample size
obs (numpy.array) – Observed counts for ibd segment bins (unlabeled)
ab (numpy.array) – Increasing floats in centiMorgans
one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’
tau0 (int) – Generation at which neutrality begins
sv (float) – Allele frequency of standing variation (Default -0.01 will assume de novo sweep)
ploidy (int) – 1 for haploid or 2 for diploid
- Returns:
Goodness-of-fit statistic
- Return type:
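For orientation, a sketch of a Pearson chi-squared goodness-of-fit statistic on binned counts. It assumes the expected counts are already available; how the package derives them from (s, p0, Ne, n) is not described here, so expected is a hypothetical placeholder, and whether the package uses exactly this form is an assumption.

```python
import numpy as np

def chi2_statistic_sketch(obs, expected):
    """Pearson statistic: sum over bins of (observed - expected)^2 / expected."""
    obs = np.asarray(obs, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.sum((obs - expected) ** 2 / expected))

obs = [12, 7, 4]               # illustrative observed counts per bin
expected = [10.0, 8.0, 5.0]    # hypothetical model-based expected counts
stat = chi2_statistic_sketch(obs, expected)
```

A smaller statistic indicates a closer fit, so a selection coefficient estimate could be obtained by minimizing such a statistic over s.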
- isweep.inference.chi2_labeled_isweep(s, p0, Ne, n, obs1, obs0, ab, one_step_model='a', tau0=0, sv=-0.01, ploidy=2)
Chi-squared statistic for sweep model (labeled)
- Parameters:
s (float) – Selection coefficient
p0 (float) – Variant frequency at generation 0
Ne (dict) – Effective population sizes
n (int) – Pairs sample size
obs1 (numpy.array) – Observed counts for ibd segment bins (labeled 1)
obs0 (numpy.array) – Observed counts for ibd segment bins (labeled 0)
ab (numpy.array) – Increasing floats in centiMorgans
one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’
tau0 (int) – Generation at which neutrality begins
sv (float) – Allele frequency of standing variation (Default -0.01 will assume de novo sweep)
ploidy (int) – 1 for haploid or 2 for diploid
- Returns:
Goodness-of-fit statistic
- Return type:
- isweep.coalescent.simulate_ibd(n, Ne, long_ibd=2.0, short_ibd=1.0, ploidy=2, record_dist=True, pairwise_output=True, left_length=inf, right_length=inf, wf_approx=False)
ibd segments from a coalescent
- Parameters:
n (int) – Sample size (individuals)
Ne (dict) – Effective population sizes
long_ibd (float) – cM length threshold
short_ibd (float) – cM length threshold
ploidy (int) – 1 for haploid or 2 for diploid
record_dist (bool) – Whether to save the tract length and coalescent time distributions (default True)
pairwise_output (bool) – Whether to save pairwise segments (default True)
left_length (float) – Distance to the left chromosome end (default numpy.inf)
right_length (float) – Distance to the right chromosome end (default numpy.inf)
wf_approx (bool) – Use Binomial approximation for early WF process (default False)
- Returns:
(number of tracts, group sizes, length distr., time distr., count distr., pairwise segments)
- Return type:
- isweep.coalescent.simulate_ibd_constant(n, Ne, long_ibd=2.0, short_ibd=1.0, ploidy=2, record_dist=True, pairwise_output=True, left_length=inf, right_length=inf)
ibd segments from a coalescent w/ constant Ne
- Parameters:
n (int) – Sample size (individuals)
Ne (int) – Constant effective population size
long_ibd (float) – cM length threshold
short_ibd (float) – cM length threshold
ploidy (int) – 1 for haploid or 2 for diploid
record_dist (bool) – Whether to save the tract length and coalescent time distributions (default True)
pairwise_output (bool) – Whether to save pairwise segments (default True)
left_length (float) – Distance to the left chromosome end (default numpy.inf)
right_length (float) – Distance to the right chromosome end (default numpy.inf)
- Returns:
(number of tracts, group sizes, length distr., time distr., count distr., pairwise segments)
- Return type:
- isweep.coalescent.simulate_ibd_isweep(n, s, p0, Ne, long_ibd=2.0, short_ibd=1.0, random_walk=True, one_step_model='a', tau0=0, sv=-0.01, ploidy=2, record_dist=True, pairwise_output=True, left_length=inf, right_length=inf, wf_approx=False)
ibd segments from a coalescent with selection
- Parameters:
n (int) – Sample size (individuals)
s (float) – Selection coefficient
p0 (float) – Variant frequency at generation 0
Ne (dict) – Effective population sizes
long_ibd (float) – cM length threshold
short_ibd (float) – cM length threshold
random_walk (bool) – True for random walk
one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’
tau0 (int) – Generation when neutrality begins
sv (float) – Allele frequency of standing variation (Default -0.01 will assume de novo sweep)
ploidy (int) – 1 for haploid or 2 for diploid
record_dist (bool) – Whether to save the tract length and coalescent time distributions (default True)
pairwise_output (bool) – Whether to save pairwise segments (default True)
left_length (float) – Distance to the left chromosome end (default numpy.inf)
right_length (float) – Distance to the right chromosome end (default numpy.inf)
wf_approx (bool) – Use Binomial approximation for early WF process (default False)
- Returns:
(all, adaptive allele, non-adaptive allele), then pairwise segments. Each tuple is (number of tracts, group sizes, length distr., time distr., count distr.)
- Return type:
tuple(s)
- isweep.coalescent.simulate_ibd_isweep_tv(n, s_s, g_s, p0, Ne, long_ibd=2.0, short_ibd=1.0, random_walk=True, ploidy=2, record_dist=True, pairwise_output=True, left_length=inf, right_length=inf, wf_approx=False)
ibd segments from a coalescent with time-varying selection
- Parameters:
n (int) – Sample size (individuals)
s_s (list) – List of selection coefficients
g_s (list) – List of transition times for the selection coefficients, aligned with s_s and one element shorter than s_s; e.g., s_s=[0.01,0.03,0.02] and g_s=[50,100] means s=0.01 for generations 0-50, s=0.03 for generations 50-100, and s=0.02 from generation 100 onward
p0 (float) – Variant frequency at generation 0
Ne (dict) – Effective population sizes
long_ibd (float) – cM length threshold
short_ibd (float) – cM length threshold
random_walk (bool) – True for random walk
ploidy (int) – 1 for haploid or 2 for diploid
record_dist (bool) – Whether to save the tract length and coalescent time distributions (default True)
pairwise_output (bool) – Whether to save pairwise segments (default True)
left_length (float) – Distance to the left chromosome end (default numpy.inf)
right_length (float) – Distance to the right chromosome end (default numpy.inf)
wf_approx (bool) – Use Binomial approximation for early WF process (default False)
- Returns:
(all, adaptive allele, non-adaptive allele), then pairwise segments. Each tuple is (number of tracts, group sizes, length distr., time distr., count distr.)
- Return type:
tuple(s)
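The alignment between s_s and g_s can be sketched as a lookup of the selection coefficient in force at a given generation. The helper selection_at_generation is hypothetical, and assigning a transition generation itself (e.g., generation 50) to the later interval is an assumption; the documentation does not say which side the boundary belongs to.

```python
from bisect import bisect_right

def selection_at_generation(g, s_s, g_s):
    """Return the coefficient from s_s that applies at generation g."""
    if len(g_s) != len(s_s) - 1:
        raise ValueError("g_s must have one fewer element than s_s")
    # bisect_right counts how many transition times are <= g,
    # which is the index of the interval that contains g.
    return s_s[bisect_right(g_s, g)]

s_s = [0.01, 0.03, 0.02]
g_s = [50, 100]
early = selection_at_generation(10, s_s, g_s)    # first interval
middle = selection_at_generation(75, s_s, g_s)   # second interval
late = selection_at_generation(150, s_s, g_s)    # open-ended last interval
```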
- isweep.coalescent.simulate_ibd_split(n, q, split_gen, Ne, long_ibd=2.0, short_ibd=1.0, ploidy=2, record_dist=True, pairwise_output=True, left_length=inf, right_length=inf, wf_approx=False)
ibd segments from a coalescent with a population split
- Parameters:
n (int) – Sample size (individuals)
q (float) – Split proportion
split_gen (int) – Time ago of the population split
Ne (dict) – Effective population sizes
long_ibd (float) – cM length threshold
short_ibd (float) – cM length threshold
ploidy (int) – 1 for haploid or 2 for diploid
record_dist (bool) – Whether to save the tract length and coalescent time distributions (default True)
pairwise_output (bool) – Whether to save pairwise segments (default True)
left_length (float) – Distance to the left chromosome end (default numpy.inf)
right_length (float) – Distance to the right chromosome end (default numpy.inf)
wf_approx (bool) – Use Binomial approximation for early WF process (default False)
- Returns:
(all, first admix allele, second admix allele), then pairwise segments. Each tuple is (number of tracts, group sizes, length distr., time distr., count distr.)
- Return type:
tuple(s)
- isweep.coalescent.basic_coalescent(n)
Simulate times in basic coalescent (scale post-hoc by population size)
- Parameters:
n (int) – Sample size
- Returns:
Interarrival times
- Return type:
numpy.array
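As an illustration of what interarrival times in the basic coalescent look like, here is a sketch of the standard Kingman coalescent: while k lineages remain, the waiting time to the next coalescence is exponential with rate k(k-1)/2 in coalescent time units. The function name and the rng argument are hypothetical, and the package's own implementation and time scaling may differ.

```python
import numpy as np

def basic_coalescent_sketch(n, rng=None):
    """Draw the n-1 interarrival times of a Kingman coalescent for n lineages."""
    rng = np.random.default_rng() if rng is None else rng
    k = np.arange(n, 1, -1)            # lineage counts n, n-1, ..., 2
    rates = k * (k - 1) / 2.0          # coalescence rate with k lineages
    return rng.exponential(scale=1.0 / rates)

times = basic_coalescent_sketch(10, np.random.default_rng(0))
tmrca = times.sum()                    # time to the most recent common ancestor
```

The times are unscaled; multiplying by a population size afterwards matches the "scale post-hoc" note above.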
- isweep.coalescent.varying_Ne_coalescent(n, Ne, ploidy=2, to_tmrca=True)
Simulate times in varying population size coalescent
- isweep.coalescent.walk_variant_backward(s, p0, Ne, random_walk=False, one_step_model='a', tau0=0, sv=-0.01, ploidy=2)
Variant frequencies backward in time
- Parameters:
s (float) – Selection coefficient
p0 (float) – Variant frequency at generation 0
Ne (dict) – Effective population sizes
random_walk (bool) – True for random walk
one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’
tau0 (int) – Generation when neutrality begins
sv (float) – Allele frequency of standing variation (Default -0.01 will assume de novo sweep)
ploidy (int) – 1 for haploid or 2 for diploid
- Returns:
NumPy arrays for frequencies and sizes
- Return type:
- isweep.coalescent.walk_variant_forward(s, pG, Ne, random_walk=False, one_step_model='a', tau0=0, ploidy=2)
Variant frequencies forward in time
- Parameters:
s (float) – Selection coefficient
pG (float) – Variant frequency at maximum generation
Ne (dict) – Effective population sizes
random_walk (bool) – True for random walk
one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’
tau0 (int) – Generation when neutrality begins
ploidy (int) – 1 for haploid or 2 for diploid
- Returns:
NumPy arrays for frequencies and sizes
- Return type:
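To make the backward and forward walks concrete, here is a deterministic sketch under a textbook haploid (genic) selection recursion, p' = p(1+s)/(1+sp). This is an assumption chosen for illustration and not necessarily any of the package's 'm', 'a', 'd', or 'r' one-step models; population sizes, the random walk, tau0, and standing variation are all left out, and every function name is hypothetical.

```python
def step_forward(p, s):
    """One generation toward the present under haploid selection."""
    return p * (1 + s) / (1 + s * p)

def step_backward(p, s):
    """Algebraic inverse of step_forward: one generation into the past."""
    return p / (1 + s - s * p)

def walk_backward_sketch(p0, s, generations):
    freqs = [p0]                       # index 0 is the present
    for _ in range(generations):
        freqs.append(step_backward(freqs[-1], s))
    return freqs

def walk_forward_sketch(pG, s, generations):
    freqs = [pG]                       # index 0 is the oldest generation
    for _ in range(generations):
        freqs.append(step_forward(freqs[-1], s))
    return freqs

back = walk_backward_sketch(0.5, 0.02, 100)
fwd = walk_forward_sketch(back[-1], 0.02, 100)   # returns to about 0.5
```

Walking backward from p0 and then forward from the oldest frequency recovers p0, which is a handy consistency check for any pair of one-step updates.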
- isweep.coalescent.walk_variant_backward_tv(s_s, g_s, p0, Ne, random_walk=False, ploidy=2)
Variant frequencies backward in time (time-varying selection)
- Parameters:
s_s (list) – List of selection coefficients
g_s (list) – List of transition times for the selection coefficients, aligned with s_s and one element shorter than s_s; e.g., s_s=[0.01,0.03,0.02] and g_s=[50,100] means s=0.01 for generations 0-50, s=0.03 for generations 50-100, and s=0.02 from generation 100 onward
p0 (float) – Variant frequency at generation 0
Ne (dict) – Effective population sizes
random_walk (bool) – True for random walk
ploidy (int) – 1 for haploid or 2 for diploid
- Returns:
NumPy arrays for frequencies and sizes
- Return type:
- isweep.inference.bootstrap_standard(val, boot, alpha1=0.025, alpha2=0.975)
Implements the standard bootstrap interval estimator
- isweep.inference.bootstrap_standard_bc(val, boot, alpha1=0.025, alpha2=0.975)
Implements the standard bootstrap interval estimator (w/ bias-correction)
- isweep.inference.bootstrap_percentile(val, boot, alpha1=0.025, alpha2=0.975)
Implements percentile-based interval estimator
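The difference between the standard and percentile intervals can be sketched with their textbook definitions. The functions below are hypothetical stand-ins that mirror the (val, boot, alpha1, alpha2) arguments; the package's exact formulas, and the form of its bias correction, are not given here, so the bias-corrected variant is not sketched.

```python
import numpy as np
from statistics import NormalDist

def standard_interval_sketch(val, boot, alpha1=0.025, alpha2=0.975):
    """Normal-theory interval: estimate plus z-quantiles times bootstrap SD."""
    sd = float(np.std(boot, ddof=1))
    z = NormalDist()
    return (val + z.inv_cdf(alpha1) * sd, val + z.inv_cdf(alpha2) * sd)

def percentile_interval_sketch(boot, alpha1=0.025, alpha2=0.975):
    """Percentile interval: empirical quantiles of the bootstrap replicates."""
    lo, hi = np.quantile(boot, [alpha1, alpha2])
    return (float(lo), float(hi))

rng = np.random.default_rng(1)
val = 0.02                                     # illustrative point estimate
boot = rng.normal(0.02, 0.005, size=1000)      # illustrative bootstrap replicates
std_ci = standard_interval_sketch(val, boot)
pct_ci = percentile_interval_sketch(boot)
```

The standard interval is symmetric about val by construction, while the percentile interval follows the shape of the bootstrap distribution.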
- isweep.inference.when_freq(maf, s, p0, Ne, random_walk=True, one_step_model='a', tau0=0, sv=-0.01, ploidy=2)
Report when variant frequency reaches set value
- Parameters:
maf (float) – Variant frequency
s (float) – Selection coefficient
p0 (float) – Variant frequency at generation 0
Ne (dict) – Effective population sizes
random_walk (bool) – True for random walk
one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’
tau0 (int) – Generation when neutrality begins
sv (float) – Allele frequency of standing variation (Default -0.01 will assume de novo sweep)
ploidy (int) – 1 for haploid or 2 for diploid
- Returns:
Generation time
- Return type:
- isweep.inference.bootstrap_freq(maf, B, boots, bootp, Ne, random_walk=True, one_step_model='a', tau0=0, sv=-0.01, ploidy=2)
Parametric bootstrap for variant frequency generation time
- Parameters:
maf (float) – Variant frequency
B (int) – Number of bootstraps
boots (numpy.array) – NumPy array of bootstraps for selection coefficient
bootp (numpy.array) – NumPy array of bootstraps for variant frequency
Ne (dict) – Effective population sizes
random_walk (bool) – True for random walk
one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’
tau0 (int) – Generation when neutrality begins
sv (float) – Allele frequency of standing variation (Default -0.01 will assume de novo sweep)
ploidy (int) – 1 for haploid or 2 for diploid
- Returns:
NumPy array of bootstraps for variant frequency generation time
- Return type:
numpy.array
- isweep.outgroups.make_ibd_graph(ibd_segments)
Create IBD graph from pairwise segments
- Parameters:
ibd_segments (tuple) – Collection of pairwise segments
- Returns:
Graph w/ edges if haplotypes have a detectable IBD segment
- Return type:
networkx.Graph
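The graph construction can be pictured with a dependency-free stand-in: a dict of neighbor sets instead of a networkx.Graph. The segment layout (haplotype a, haplotype b, length in cM) is an assumption made for this sketch, as is the function name; the package's actual tuple layout is not documented here.

```python
from collections import defaultdict

def ibd_adjacency_sketch(ibd_segments):
    """Link two haplotypes whenever they share at least one listed segment."""
    adjacency = defaultdict(set)
    for hap_a, hap_b, _length_cm in ibd_segments:
        adjacency[hap_a].add(hap_b)
        adjacency[hap_b].add(hap_a)
    return dict(adjacency)

# Hypothetical pairwise segments: (haplotype id, haplotype id, length in cM)
segments = ((0, 1, 3.2), (1, 2, 2.4), (3, 4, 5.0))
graph = ibd_adjacency_sketch(segments)
```

Here haplotypes 0, 1, and 2 form one connected component and haplotypes 3 and 4 another.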
- isweep.outgroups.diameter_communities(graph, K=3, max_communities=inf)
Method to find connected communities with max diameter 2*K
- isweep.utilities.make_constant_Ne(file, size, maxg)
Create *.ne file for constant size population
- isweep.utilities.make_exponential_Ne(file, size, maxg, rate)
Create *.ne file for exponentially growing population
- isweep.coalescent.probability_ibd(ps, Ns, long_ibd=2, ploidy=2)
Approximate probability of ibd
- isweep.coalescent.probability_ibd_isweep(s, p0, Ne, long_ibd=2, one_step_model='a', tau0=0, sv=-0.01, ploidy=2)
Approximate probability of ibd given a sweep model
- Parameters:
s (float) – Selection coefficient
p0 (float) – Variant frequency at generation 0
Ne (dict) – Effective population sizes
long_ibd (float) – cM length threshold
one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’
tau0 (int) – Generation when neutrality begins
sv (float) – Allele frequency of standing variation (Default -0.01 will assume de novo sweep)
ploidy (int) – 1 for haploid or 2 for diploid
- Returns:
approx P(ell > c) where ell is ibd length
- Return type:
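For intuition about P(ell > c), here is a neutral, constant-size sketch that does not reproduce the sweep model. It assumes that, given a common ancestor t generations back, the IBD length around a focal point is the sum of two exponentials with rate 2t per Morgan, so P(ell > c | t) = (1 + 2tc) exp(-2tc), and that coalescence times are geometric with per-generation probability 1/(ploidy * N). The function name and the truncation at max_gen are hypothetical choices.

```python
import math

def prob_ibd_longer_than_sketch(c_cM, N, ploidy=2, max_gen=5000):
    """Approximate P(ell > c) for constant size N under neutrality."""
    c = c_cM / 100.0                       # convert centiMorgans to Morgans
    coal = 1.0 / (ploidy * N)              # per-generation coalescence probability
    not_yet = 1.0                          # probability of no coalescence so far
    total = 0.0
    for t in range(1, max_gen + 1):
        x = 2.0 * t * c
        total += not_yet * coal * (1.0 + x) * math.exp(-x)
        not_yet *= 1.0 - coal
    return total

p_2cm = prob_ibd_longer_than_sketch(2.0, 10_000)
p_3cm = prob_ibd_longer_than_sketch(3.0, 10_000)
```

Raising the length threshold lowers the probability, since longer segments require more recent common ancestors.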
- isweep.utilities.big_format_distribution(distr, counts)
Reformat a vector for plotting with matplotlib, seaborn
- Parameters:
distr (array-like) – Vector of realizations (lengths, times, etc.)
counts (array-like) – Vector of realization multiplicities
- Returns:
Adds copies of realization if multiplicity > 1
- Return type:
numpy.array
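The reformatting described above amounts to repeating each realization by its multiplicity. A minimal sketch with numpy.repeat follows; the function name is hypothetical and the package's implementation may differ in details such as the output dtype.

```python
import numpy as np

def expand_by_counts_sketch(distr, counts):
    """Repeat each realization as many times as its multiplicity."""
    return np.repeat(np.asarray(distr), np.asarray(counts, dtype=int))

lengths = [2.5, 3.0, 4.2]     # illustrative realizations (e.g., cM lengths)
counts = [1, 3, 2]            # multiplicity of each realization
expanded = expand_by_counts_sketch(lengths, counts)
```

The expanded vector can be passed directly to histogram or density plotting functions in matplotlib or seaborn.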