API

Here are the main functions a user might consider. I intend for users to import all functions with from isweep import *. The functions are ordered by their importance in estimating selection coefficients. Some parameters typed as numpy.array also accept a list. Some non-essential functions are not listed; for example, the functions in the isweep.utilities module that create complex *.ne files. Under the hood, object-oriented programming (class objects) and dynamic programming make it possible to simulate the IBD of haplotypes with a fast algorithm.

isweep.slow.empty_function()

isweep.inference.read_ibd_file(ibd_file, header=1, include_length=1)

Create list of IBD segments

Parameters:

ibd_file (str) – Name of text file with IBD segments
header (int) – 0 for no header
include_length (int) – 0 for no length

Returns:

(ID1, ID2, cM length) tuples, one per IBD segment

Return type:

list

isweep.utilities.bin_ibd_segments(ell, ab)

Put ibd segments into bins

Parameters:

ell (numpy.array) – ibd segment lengths in centiMorgans
ab (numpy.array) – Increasing bin edges (floats) in centiMorgans

Returns:

Observed counts for ibd segment bins

Return type:

numpy.array
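As a rough illustration of the binning step, the sketch below counts segment lengths between consecutive edges. It is not the isweep implementation: the helper name count_in_bins, the half-open bin convention [a, b), and the choice to drop lengths outside the edges are assumptions made for this example, and plain lists stand in for numpy arrays.

```python
from bisect import bisect_right

def count_in_bins(lengths, edges):
    """Count lengths falling in [edges[i], edges[i+1]) for each i (hypothetical helper)."""
    counts = [0] * (len(edges) - 1)
    for length in lengths:
        i = bisect_right(edges, length) - 1  # index of the last edge <= length
        if 0 <= i < len(counts):             # skip lengths outside the edges
            counts[i] += 1
    return counts

lengths_cm = [2.1, 2.5, 3.2, 4.8, 7.0, 12.3]   # made-up segment lengths in cM
edges_cm = [2.0, 3.0, 4.0, 6.0, float("inf")]  # increasing bin edges in cM
obs = count_in_bins(lengths_cm, edges_cm)
print(obs)
```

Ending the edges with infinity keeps every long segment in the last bin, which is one common way to handle an open-ended tail.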

isweep.utilities.read_Ne(file)

Read *.ne file

Parameters:

file (str) – Input file name

Returns:

dict[generation] = size

Return type:

dict
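The returned dictionary maps each generation to a population size. The sketch below builds such a dictionary from text, assuming a layout of one header line followed by whitespace-separated rows whose first two columns are generation and size. That layout and the name parse_ne are assumptions for illustration; check an actual *.ne file before relying on them.

```python
import io

def parse_ne(handle):
    """Map generation -> size from a header plus 'generation size' rows (assumed layout)."""
    ne = {}
    next(handle)  # skip the header line
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or malformed lines
        ne[int(fields[0])] = float(fields[1])
    return ne

# Made-up file contents held in memory instead of on disk.
example = io.StringIO("GEN\tNE\n0\t10000\n1\t9900\n2\t9801\n")
Ne = parse_ne(example)
print(Ne)
```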

isweep.inference.chi2_isweep(s, p0, Ne, n, obs, ab, one_step_model='a', tau0=0, sv=-0.01, ploidy=2)

Chi-squared statistic for sweep model (unlabeled)

Parameters:

s (float) – Selection coefficient
p0 (float) – Variant frequency at generation 0
Ne (dict) – Effective population sizes
n (int) – Pairs sample size
obs (numpy.array) – Observed counts for ibd segment bins (unlabeled)
ab (numpy.array) – Increasing bin edges (floats) in centiMorgans
one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’
tau0 (int) – Generation at which neutrality begins
sv (float) – Allele frequency of standing variation (Default -0.01 will assume de novo sweep)
ploidy (int) – 1 for haploid or 2 for diploid

Returns:

Goodness-of-fit statistic

Return type:

float
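For orientation, a goodness-of-fit statistic of this kind compares observed bin counts with the counts expected under a model. The sketch below shows the usual Pearson form. How isweep derives the expected counts from s, p0, and Ne, and whether it applies any extra normalization, is not documented here, so the expected counts are made-up numbers and pearson_chi2 is a hypothetical helper.

```python
def pearson_chi2(observed, expected):
    """Sum of (O - E)^2 / E over bins; bins with E == 0 are skipped."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

obs = [40, 25, 20, 15]          # made-up observed counts per bin
exp = [35.0, 30.0, 20.0, 15.0]  # made-up expected counts under some model
stat = pearson_chi2(obs, exp)
print(stat)
```

A smaller statistic means the model's expected counts sit closer to the observed ones, so an estimate of s can be sought by minimizing it over candidate values.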

isweep.inference.chi2_labeled_isweep(s, p0, Ne, n, obs1, obs0, ab, one_step_model='a', tau0=0, sv=-0.01, ploidy=2)

Chi-squared statistic for sweep model (labeled)

Parameters:

s (float) – Selection coefficient
p0 (float) – Variant frequency at generation 0
Ne (dict) – Effective population sizes
n (int) – Pairs sample size
obs1 (numpy.array) – Observed counts for ibd segment bins (labeled 1)
obs0 (numpy.array) – Observed counts for ibd segment bins (labeled 0)
ab (numpy.array) – Increasing bin edges (floats) in centiMorgans
one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’
tau0 (int) – Generation at which neutrality begins
sv (float) – Allele frequency of standing variation (Default -0.01 will assume de novo sweep)
ploidy (int) – 1 for haploid or 2 for diploid

Returns:

Goodness-of-fit statistic

Return type:

float

isweep.coalescent.simulate_ibd(n, Ne, long_ibd=2.0, short_ibd=1.0, ploidy=2, record_dist=True, pairwise_output=True, left_length=inf, right_length=inf, wf_approx=False)

ibd segments from a coalescent

Parameters:

n (int) – Sample size (individuals)
Ne (dict) – Effective population sizes
long_ibd (float) – cM length threshold
short_ibd (float) – cM length threshold
ploidy (int) – 1 for haploid or 2 for diploid
record_dist (bool) – Whether to save the tract length and coalescent time distributions (default True)
pairwise_output (bool) – Whether to save pairwise segments (default True)
left_length (float) – Distance to the left chromosome end (default numpy.inf)
right_length (float) – Distance to the right chromosome end (default numpy.inf)
wf_approx (bool) – Use Binomial approximation for early WF process (default False)

Returns:

(number of tracts, group sizes, length distr., time distr., count distr., pairwise segments)

Return type:

tuple

isweep.coalescent.simulate_ibd_constant(n, Ne, long_ibd=2.0, short_ibd=1.0, ploidy=2, record_dist=True, pairwise_output=True, left_length=inf, right_length=inf)

ibd segments from a coalescent w/ constant Ne

Parameters:

n (int) – Sample size (individuals)
Ne (int) – Constant effective population size
long_ibd (float) – cM length threshold
short_ibd (float) – cM length threshold
ploidy (int) – 1 for haploid or 2 for diploid
record_dist (bool) – Whether to save the tract length and coalescent time distributions (default True)
pairwise_output (bool) – Whether to save pairwise segments (default True)
left_length (float) – Distance to the left chromosome end (default numpy.inf)
right_length (float) – Distance to the right chromosome end (default numpy.inf)

Returns:

(number of tracts, group sizes, length distr., time distr., count distr., pairwise segments)

Return type:

tuple

isweep.coalescent.simulate_ibd_isweep(n, s, p0, Ne, long_ibd=2.0, short_ibd=1.0, random_walk=True, one_step_model='a', tau0=0, sv=-0.01, ploidy=2, record_dist=True, pairwise_output=True, left_length=inf, right_length=inf, wf_approx=False)

ibd segments from a coalescent with selection

Parameters:

n (int) – Sample size (individuals)
s (float) – Selection coefficient
p0 (float) – Variant frequency at generation 0
Ne (dict) – Effective population sizes
long_ibd (float) – cM length threshold
short_ibd (float) – cM length threshold
random_walk (bool) – True for random walk
one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’
tau0 (int) – Generation when neutrality begins
sv (float) – Allele frequency of standing variation (Default -0.01 will assume de novo sweep)
ploidy (int) – 1 for haploid or 2 for diploid
record_dist (bool) – Whether to save the tract length and coalescent time distributions (default True)
pairwise_output (bool) – Whether to save pairwise segments (default True)
left_length (float) – Distance to the left chromosome end (default numpy.inf)
right_length (float) – Distance to the right chromosome end (default numpy.inf)
wf_approx (bool) – Use Binomial approximation for early WF process (default False)

Returns:

(all, adaptive allele, non-adaptive allele), then pairwise segments. Each tuple is (number of tracts, group sizes, length distr., time distr., count distr.)

Return type:

tuple(s)

isweep.coalescent.simulate_ibd_isweep_tv(n, s_s, g_s, p0, Ne, long_ibd=2.0, short_ibd=1.0, random_walk=True, ploidy=2, record_dist=True, pairwise_output=True, left_length=inf, right_length=inf, wf_approx=False)

ibd segments from a coalescent with time-varying selection

Parameters:

n (int) – Sample size (individuals)
s_s (list) – List of selection coefficients
g_s (list) – List of transition times (in generations) for the selection coefficients, aligned with s_s; its length should be one less than that of s_s. For example, s_s=[0.01,0.03,0.02] and g_s=[50,100] means s=0.01 between generations 0 and 50, s=0.03 between 50 and 100, and s=0.02 from generation 100 onward.
p0 (float) – Variant frequency at generation 0
Ne (dict) – Effective population sizes
long_ibd (float) – cM length threshold
short_ibd (float) – cM length threshold
random_walk (bool) – True for random walk
ploidy (int) – 1 for haploid or 2 for diploid
record_dist (bool) – Whether to save the tract length and coalescent time distributions (default True)
pairwise_output (bool) – Whether to save pairwise segments (default True)
left_length (float) – Distance to the left chromosome end (default numpy.inf)
right_length (float) – Distance to the right chromosome end (default numpy.inf)
wf_approx (bool) – Use Binomial approximation for early WF process (default False)

Returns:

(all, adaptive allele, non-adaptive allele), then pairwise segments. Each tuple is (number of tracts, group sizes, length distr., time distr., count distr.)

Return type:

tuple(s)
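The pairing of s_s and g_s described for this function amounts to a piecewise-constant selection coefficient over generations. The lookup below is a minimal sketch of that idea, not isweep code. The helper name selection_at is hypothetical, and assigning a transition generation itself (for example generation 50) to the later interval is an assumption the reference does not settle.

```python
from bisect import bisect_right

def selection_at(generation, s_s, g_s):
    """Return the selection coefficient in force at a generation (hypothetical helper)."""
    if len(g_s) != len(s_s) - 1:
        raise ValueError("g_s should have one fewer entry than s_s")
    # bisect_right counts how many transition times are <= generation.
    return s_s[bisect_right(g_s, generation)]

s_s = [0.01, 0.03, 0.02]
g_s = [50, 100]
for g in (0, 49, 50, 99, 100, 500):
    print(g, selection_at(g, s_s, g_s))
```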

isweep.coalescent.simulate_ibd_split(n, q, split_gen, Ne, long_ibd=2.0, short_ibd=1.0, ploidy=2, record_dist=True, pairwise_output=True, left_length=inf, right_length=inf, wf_approx=False)

ibd segments from a coalescent with a population split

Parameters:

n (int) – Sample size (individuals)
q (float) – Split proportion
split_gen (int) – Time ago of the population split
Ne (dict) – Effective population sizes
long_ibd (float) – cM length threshold
short_ibd (float) – cM length threshold
ploidy (int) – 1 for haploid or 2 for diploid
record_dist (bool) – Whether to save the tract length and coalescent time distributions (default True)
pairwise_output (bool) – Whether to save pairwise segments (default True)
left_length (float) – Distance to the left chromosome end (default numpy.inf)
right_length (float) – Distance to the right chromosome end (default numpy.inf)
wf_approx (bool) – Use Binomial approximation for early WF process (default False)

Returns:

(all, first admix allele, second admix allele), then pairwise segments. Each tuple is (number of tracts, group sizes, length distr., time distr., count distr.)

Return type:

tuple(s)

isweep.coalescent.basic_coalescent(n)

Simulate times in basic coalescent (scale post-hoc by population size)

Parameters:

n (int) – Sample size

Returns:

Interarrival times

Return type:

numpy.array
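In the textbook (Kingman) coalescent, while k lineages remain, the waiting time to the next coalescence is exponential with rate k(k-1)/2 in coalescent time units. The sketch below draws those interarrival times with the standard library. Whether isweep uses exactly this parameterization is an assumption, and the function name is hypothetical.

```python
import random

def coalescent_interarrival_times(n, rng=random):
    """Waiting times while k = n, n-1, ..., 2 lineages remain, each Exp(k choose 2)."""
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2  # number of lineage pairs that could coalesce
        times.append(rng.expovariate(rate))
    return times

rng = random.Random(1)  # seeded generator so the draw is repeatable
times = coalescent_interarrival_times(5, rng)
print(times)
```

Under the usual convention these times are in units of the number of haplotypes in the population (2N for diploids), which is the post-hoc scaling the description refers to.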

isweep.coalescent.varying_Ne_coalescent(n, Ne, ploidy=2, to_tmrca=True)

Simulate times in varying population size coalescent

Parameters:

n (int) – Sample size
Ne (dict) – Effective population sizes
ploidy (int) – 1 for haploid or 2 for diploid
to_tmrca (bool) – Go to TMRCA

Returns:

Arrival times in generations

Return type:

numpy.array

isweep.coalescent.walk_variant_backward(s, p0, Ne, random_walk=False, one_step_model='a', tau0=0, sv=-0.01, ploidy=2)

Variant frequencies backward in time

Parameters:

s (float) – Selection coefficient
p0 (float) – Variant frequency at generation 0
Ne (dict) – Effective population sizes
random_walk (bool) – True for random walk
one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’
tau0 (int) – Generation when neutrality begins
sv (float) – Allele frequency of standing variation (Default -0.01 will assume de novo sweep)
ploidy (int) – 1 for haploid or 2 for diploid

Returns:

NumPy arrays for frequencies and sizes

Return type:

tuple
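To give a feel for what a backward frequency walk computes, the sketch below steps a variant frequency back in time deterministically (the analogue of random_walk=False). The one_step_model codes are not expanded in this reference, so the sketch uses the simple haploid (genic) update p' = p(1 + s) / (1 + s p) and its inverse, which may or may not match any of them. It also ignores Ne, tau0, and sv, and both function names are hypothetical.

```python
def step_forward_haploid(s, p):
    """One generation forward under haploid selection: p' = p (1 + s) / (1 + s p)."""
    return p * (1 + s) / (1 + s * p)

def walk_backward_haploid(s, p0, generations):
    """Frequencies at generations 0, 1, ..., generations back in time."""
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = p / (1 + s - s * p)  # algebraic inverse of the forward step
        freqs.append(p)
    return freqs

freqs = walk_backward_haploid(0.05, 0.5, 20)
print(freqs[:3], freqs[-1])
```

Applying step_forward_haploid to any entry recovers the entry before it, which is a quick consistency check on the inverse.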

isweep.coalescent.walk_variant_forward(s, pG, Ne, random_walk=False, one_step_model='a', tau0=0, ploidy=2)

Variant frequencies forward in time

Parameters:

s (float) – Selection coefficient
pG (float) – Variant frequency at maximum generation
Ne (dict) – Effective population sizes
random_walk (bool) – True for random walk
one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’
tau0 (int) – Generation when neutrality begins
ploidy (int) – 1 for haploid or 2 for diploid

Returns:

NumPy arrays for frequencies and sizes

Return type:

tuple

isweep.coalescent.walk_variant_backward_tv(s_s, g_s, p0, Ne, random_walk=False, ploidy=2)

Variant frequencies backward in time (time-varying selection)

Parameters:

s_s (list) – List of selection coefficients
g_s (list) – List of transition times (in generations) for the selection coefficients, aligned with s_s; its length should be one less than that of s_s. For example, s_s=[0.01,0.03,0.02] and g_s=[50,100] means s=0.01 between generations 0 and 50, s=0.03 between 50 and 100, and s=0.02 from generation 100 onward.
p0 (float) – Variant frequency at generation 0
Ne (dict) – Effective population sizes
random_walk (bool) – True for random walk
ploidy (int) – 1 for haploid or 2 for diploid

Returns:

NumPy arrays for frequencies and sizes

Return type:

tuple

isweep.inference.bootstrap_standard(val, boot, alpha1=0.025, alpha2=0.975)

Implements the standard bootstrap interval estimator

Parameters:

val (float) – Parameter estimate
boot (numpy.array) – Bootstraps
alpha1 (float) – Lower percentile
alpha2 (float) – Upper percentile

Returns:

(lower, middle, upper) interval estimator

Return type:

tuple
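A standard (normal-approximation) bootstrap interval is usually formed as the estimate plus a normal quantile times the standard deviation of the bootstrap replicates. The sketch below shows that textbook construction with the same argument names. It is an illustration under that assumption rather than the isweep source, and the function name and replicate values are made up.

```python
from statistics import NormalDist, stdev

def normal_bootstrap_interval(val, boot, alpha1=0.025, alpha2=0.975):
    """(lower, middle, upper) using val + z_alpha * sd(boot)."""
    sd = stdev(boot)      # sample standard deviation of the replicates
    normal = NormalDist() # standard normal, mean 0 and sd 1
    lower = val + normal.inv_cdf(alpha1) * sd
    upper = val + normal.inv_cdf(alpha2) * sd
    return (lower, val, upper)

boot = [0.018, 0.021, 0.025, 0.019, 0.023, 0.020, 0.022, 0.024]  # made-up replicates
interval = normal_bootstrap_interval(0.021, boot)
print(interval)
```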

isweep.inference.bootstrap_standard_bc(val, boot, alpha1=0.025, alpha2=0.975)

Implements the standard bootstrap interval estimator (w/ bias-correction)

Parameters:

val (float) – Parameter estimate
boot (numpy.array) – Bootstraps
alpha1 (float) – Lower percentile
alpha2 (float) – Upper percentile

Returns:

(lower, middle, upper) interval estimator

Return type:

tuple

isweep.inference.bootstrap_percentile(val, boot, alpha1=0.025, alpha2=0.975)

Implements percentile-based interval estimator

Parameters:

val (float) – Parameter estimate
boot (numpy.array) – Bootstraps
alpha1 (float) – Lower percentile
alpha2 (float) – Upper percentile

Returns:

(lower, middle, upper) interval estimator

Return type:

tuple
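A percentile interval reads its endpoints directly from the empirical quantiles of the bootstrap replicates. The sketch below uses linear interpolation between order statistics. The interpolation rule, the function name, and the replicate values are assumptions for illustration, and isweep may compute quantiles differently.

```python
def percentile_interval(val, boot, alpha1=0.025, alpha2=0.975):
    """(lower, middle, upper) with endpoints taken from quantiles of the replicates."""
    ordered = sorted(boot)

    def quantile(q):
        # Linear interpolation between neighbouring order statistics.
        pos = q * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    return (quantile(alpha1), val, quantile(alpha2))

boot = [float(i) for i in range(101)]  # made-up replicates 0.0, 1.0, ..., 100.0
interval = percentile_interval(50.0, boot)
print(interval)
```

Unlike the normal-approximation interval, this one need not be symmetric around the estimate, which matters when the bootstrap distribution is skewed.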

isweep.inference.when_freq(maf, s, p0, Ne, random_walk=True, one_step_model='a', tau0=0, sv=-0.01, ploidy=2)

Report when variant frequency reaches set value

Parameters:

maf (float) – Variant frequency
s (float) – Selection coefficient
p0 (float) – Variant frequency at generation 0
Ne (dict) – Effective population sizes
random_walk (bool) – True for random walk
one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’
tau0 (int) – Generation when neutrality begins
sv (float) – Allele frequency of standing variation (Default -0.01 will assume de novo sweep)
ploidy (int) – 1 for haploid or 2 for diploid

Returns:

Generation time

Return type:

int
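For a deterministic haploid (genic) model, the time at which a variant had a given frequency has a closed form, because the odds p / (1 - p) are multiplied by (1 + s) every generation. The sketch below uses that fact. It ignores drift, Ne, tau0, and sv, returns a float rather than an int, and is a simplified stand-in rather than what when_freq computes. The function name is hypothetical.

```python
import math

def generations_since_frequency(maf, s, p0):
    """Generations back at which the frequency was maf, given p0 now (haploid, no drift)."""
    def odds(p):
        return p / (1 - p)
    # odds(p0) = odds(maf) * (1 + s) ** t, solved for t.
    return math.log(odds(p0) / odds(maf)) / math.log(1 + s)

t = generations_since_frequency(0.1, 0.05, 0.5)
print(t)
```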

isweep.inference.bootstrap_freq(maf, B, boots, bootp, Ne, random_walk=True, one_step_model='a', tau0=0, sv=-0.01, ploidy=2)

Parametric bootstrap for variant frequency generation time

Parameters:

maf (float) – Variant frequency
B (int) – Number of bootstraps
boots (numpy.array) – NumPy array of bootstraps for selection coefficient
bootp (numpy.array) – NumPy array of bootstraps for variant frequency
Ne (dict) – Effective population sizes
random_walk (bool) – True for random walk
one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’
tau0 (int) – Generation when neutrality begins
sv (float) – Allele frequency of standing variation (Default -0.01 will assume de novo sweep)
ploidy (int) – 1 for haploid or 2 for diploid

Returns:

NumPy array of bootstraps for variant frequency generation time

Return type:

numpy.array

isweep.outgroups.make_ibd_graph(ibd_segments)

Create IBD graph from pairwise segments

Parameters:

ibd_segments (tuple) – Collection of pairwise segments

Returns:

Graph w/ edges if haplotypes have a detectable IBD segment

Return type:

networkx.Graph
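The idea of an IBD graph is that haplotypes are nodes and an edge joins two haplotypes that share a detectable segment. The sketch below builds such a structure with plain dictionaries instead of networkx, and then finds connected components. The (ID1, ID2, length) segment layout, the function names, and the example data are assumptions for illustration, not the isweep implementation.

```python
from collections import defaultdict

def build_adjacency(segments):
    """Undirected adjacency sets from (id1, id2, length) tuples (assumed layout)."""
    graph = defaultdict(set)
    for id1, id2, _length in segments:
        graph[id1].add(id2)
        graph[id2].add(id1)
    return dict(graph)

def connected_components(graph):
    """Group nodes that are linked by a chain of shared segments."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first search from the start node
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components

segments = [("a", "b", 3.1), ("b", "c", 2.4), ("d", "e", 5.0)]  # made-up segments
graph = build_adjacency(segments)
components = connected_components(graph)
print(components)
```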

isweep.outgroups.diameter_communities(graph, K=3, max_communities=inf)

Method to find connected communities with max diameter 2*K

Parameters:

graph (networkx.Graph) – IBD graph, e.g. from make_ibd_graph
K (int) – Communities have max diameter 2*K (default K=3, i.e. max diameter 6)
max_communities (int) – Default is to find all communities

Returns:

Node sets for connected subgraphs with max diameter 2*K

Return type:

list

isweep.utilities.write_Ne(Ne, output_file)

Write Ne dictionary to .ne file

Parameters:

Ne (dict) – Effective population sizes
output_file (str) – File name

Return type:

None

isweep.utilities.make_constant_Ne(file, size, maxg)

Create *.ne file for constant size population

Parameters:

file (str) – Output file name
size (float) – Effective population size
maxg (int) – Maximum generation

Return type:

None

isweep.utilities.make_exponential_Ne(file, size, maxg, rate)

Create *.ne file for exponentially growing population

Parameters:

file (str) – Output file name
size (float) – Effective population size at generation 0
maxg (list (int)) – Maximum generation(s)
rate (list (float)) – Exponential growth rate(s)

Return type:

None
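The two helpers above write files, but the size histories they describe can be pictured as the generation-to-size dictionaries used elsewhere in this reference. The sketch below builds such dictionaries in memory. It assumes a single growth rate, and it assumes that "exponentially growing" means the size shrinks going back in time as size * exp(-rate * g). Neither assumption is confirmed by this reference, and both function names are hypothetical.

```python
import math

def constant_sizes(size, maxg):
    """Same size at every generation 0..maxg."""
    return {g: float(size) for g in range(maxg + 1)}

def exponential_sizes(size, maxg, rate):
    """Size at generation g back in time: size * exp(-rate * g) (assumed form)."""
    return {g: size * math.exp(-rate * g) for g in range(maxg + 1)}

flat = constant_sizes(10000, 3)
growing = exponential_sizes(10000, 3, 0.01)
print(flat)
print(growing)
```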

isweep.coalescent.probability_ibd(ps, Ns, long_ibd=2, ploidy=2)

Approximate probability of ibd

Parameters:

ps (numpy.array) – Variant frequencies (back in time)
Ns (numpy.array) – Effective population sizes
long_ibd (float) – cM length threshold
ploidy (int) – 1 for haploid or 2 for diploid

Returns:

Approximate P(ell > c), where ell is the ibd length and c is the long_ibd threshold

Return type:

float
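One standard way to approximate P(ell > c) at a focal point is to sum over the generation g at which a pair of haplotypes coalesces. Given coalescence at generation g, the segment spanning the focal point is the sum of two exponential flanks, each with rate g/50 per cM, so its length exceeds c with probability exp(-gc/50)(1 + gc/50). The sketch below applies that to the neutral case only. It ignores the variant frequencies ps that the isweep function takes, so it is a simplified assumption-laden illustration with a hypothetical function name.

```python
import math

def prob_ibd_longer_than(c_cm, sizes, ploidy=2):
    """Approximate P(ell > c), summing over coalescence at generations 1..len(sizes)."""
    total = 0.0
    not_yet = 1.0  # probability the pair has not coalesced before generation g
    for g, size in enumerate(sizes, start=1):
        p_coal = 1.0 / (ploidy * size)  # chance the two lineages coalesce at g
        lam = g / 50.0                  # crossover rate per cM on each flank (2g meioses)
        p_long = math.exp(-lam * c_cm) * (1.0 + lam * c_cm)  # Erlang-2 survival
        total += not_yet * p_coal * p_long
        not_yet *= 1.0 - p_coal
    return total

sizes = [10000] * 500  # made-up constant history, generations 1..500
p2 = prob_ibd_longer_than(2.0, sizes)
p4 = prob_ibd_longer_than(4.0, sizes)
print(p2, p4)
```

The sum is truncated at the last generation supplied, which is reasonable for long segments because ancient coalescences rarely leave segments above a threshold of a few cM.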

isweep.coalescent.probability_ibd_isweep(s, p0, Ne, long_ibd=2, one_step_model='a', tau0=0, sv=-0.01, ploidy=2)

Approximate probability of ibd given a sweep model

Parameters:

s (float) – Selection coefficient
p0 (float) – Variant frequency at generation 0
Ne (dict) – Effective population sizes
long_ibd (float) – cM length threshold
one_step_model (str) – ‘m’, ‘a’, ‘d’, or ‘r’
tau0 (int) – Generation when neutrality begins
sv (float) – Allele frequency of standing variation (Default -0.01 will assume de novo sweep)
ploidy (int) – 1 for haploid or 2 for diploid

Returns:

Approximate P(ell > c), where ell is the ibd length and c is the long_ibd threshold

Return type:

float

isweep.utilities.big_format_distribution(distr, counts)

Reformat a vector for plotting with matplotlib, seaborn

Parameters:

distr (array-like) – Vector of realizations (lengths, times, etc.)
counts (array-like) – Vector of realization multiplicities

Returns:

Vector in which each realization is repeated according to its multiplicity

Return type:

numpy.array
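The reformatting described here, repeating each realization by its multiplicity, can be sketched in a few lines. The version below uses plain lists and a hypothetical helper name; with arrays, numpy.repeat(values, counts) gives the same expansion. It illustrates the idea and is not the isweep source.

```python
def expand_by_counts(values, counts):
    """Repeat each value as many times as its count says."""
    return [v for v, c in zip(values, counts) for _ in range(c)]

lengths_cm = [2.5, 3.0, 7.1]  # made-up realizations
multiplicity = [1, 3, 2]      # how often each one occurred
expanded = expand_by_counts(lengths_cm, multiplicity)
print(expanded)
```

The expanded vector can be handed directly to histogram or density plotting routines, which expect one entry per observation.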