Documentation
isweep is a Python package and a series of automated workflows to study natural selection with identity-by-descent (IBD) segments. The Python package simulates IBD segments around a locus and estimates selection coefficients. The automated workflows are:
Selection scan: detects selected loci with rigorous multiple testing thresholds
Modeling hard sweeps: estimates location, allele frequency, and selection coefficient of sweep
Case-control scan: detects loci where IBD rates differ between binary cases and controls
Phasing and ancestry: supports haplotype phasing, local ancestry, and kinship inference
Each automated workflow has a dedicated page under Usage. The general way to run these methods is:
Navigate to the appropriate workflow directory
Modify parameters in YAML configuration files
Send the jobs to a cluster with
nohup snakemake [...] &
I made a Zenodo repository with some simulated data to test the workflows. See Testing workflows.
The source code is at https://github.com/sdtemple/isweep
Installation
git clone https://github.com/sdtemple/isweep.git
mamba env create -f isweep-environment.yml
bash get-software.sh
pip install isweep
Note
This project is in a stable state. I am committed to providing quick support via GitHub Issues at least into 2026.
Data Requirements
The main requirements are a tab-separated genetic recombination map and enough samples to detect at least one IBD segment at every position.
Phased haplotypes (VCF files)
Samples with a similar ancestry
No close relatives
Recombining autosomes
If you don’t already have your data phased or cohort selected, we support Phasing and ancestry with another workflow.
In humans, more than 1000 samples is enough, but more than 3000 samples is recommended.
At some point, there are no gains in statistical power with more samples. I do not recommend analyzing more than 100k samples in a biobank. See our Temple and Browning (2025+) publication.
The tree of life is messy. Email or make a GitHub Issue for analysis advice about the nuances of your sample population.
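The requirement that every position be covered by at least one IBD segment can be checked before running a workflow. The following is a minimal sketch, not part of isweep: the function names `ibd_counts` and `uncovered_positions` are hypothetical, the segments are assumed to be already loaded as a list of (start, end) pairs in the same units as the positions, and the toy numbers are made up for illustration.

```python
from bisect import bisect_left, bisect_right

def ibd_counts(segments, positions):
    """Count how many segments cover each position.

    segments: list of (start, end) pairs, endpoints inclusive
    positions: list of positions in the same units (e.g. cM)
    """
    starts = sorted(start for start, _ in segments)
    ends = sorted(end for _, end in segments)
    counts = []
    for x in positions:
        started = bisect_right(starts, x)  # segments with start <= x
        finished = bisect_left(ends, x)    # segments with end < x
        counts.append(started - finished)
    return counts

def uncovered_positions(segments, positions):
    """Return the positions that no segment covers."""
    counts = ibd_counts(segments, positions)
    return [x for x, c in zip(positions, counts) if c == 0]

# toy data (hypothetical values, in cM)
segments = [(0.0, 4.0), (2.5, 7.0), (9.0, 12.0)]
positions = [1.0, 3.0, 8.0, 10.0]

counts = ibd_counts(segments, positions)
gaps = uncovered_positions(segments, positions)
print(counts)
print(gaps)
```

An empty `gaps` list means every tested position has at least one segment. A non-empty list suggests the sample size is too small for those regions.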
Vignette
Outside of running the workflows, the main functions are:
isweep.coalescent.simulate_ibd_isweep: generate long IBD segments around a locus (with selection)
isweep.coalescent.chi2_isweep: minimize this with a scalar optimizer to estimate the selection coefficient
isweep.utilities.read_Ne: load in recent effective population sizes
For instance:
import numpy as np
from scipy.optimize import minimize_scalar
from isweep import *
# parameter settings
s = 0.03
p=float(0.5) # allele freq
Ne=read_Ne('constant-10k.ne') # demo history
model='m'
long_ibd=3.0
ab=[long_ibd,np.inf]
nsamples=200
# calculate denominator
ploidy=2
msamples=ploidy*nsamples
N=msamples*(msamples-1)/2-nsamples
# simulate data
out=simulate_ibd_isweep(
nsamples,
s,
p,
Ne,
long_ibd,
long_ibd,
one_step_model=model,
ploidy=ploidy,
)
ibd=out[0][0]
# estimating the selection coefficient
se = minimize_scalar(
chi2_isweep,
args=(p,Ne,N,(ibd,),ab,model,0,-0.01,ploidy),
bounds=(0,0.5),
method='bounded'
).x
print('true selection coefficient')
print(s)
print('estimate selection coefficient')
print(se)
There are also some IPython notebooks as examples in vignettes/.
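The denominator `N` in the vignette can be worked out by hand. The sketch below repeats the same arithmetic in a standalone function. The name `pair_denominator` is hypothetical and not part of isweep. Reading the subtracted `nsamples` as the one within-individual haplotype pair per diploid sample is my interpretation of the formula, not something the documentation states. The segment count used for the rate is made up.

```python
def pair_denominator(nsamples, ploidy=2):
    """Same arithmetic as the vignette: msamples choose 2, minus nsamples.

    Assumption: for diploids, subtracting nsamples drops the pair formed
    by the two haplotypes of the same individual.
    """
    msamples = ploidy * nsamples
    return msamples * (msamples - 1) // 2 - nsamples

# vignette settings: 200 diploid samples give 400 haplotypes
N = pair_denominator(200, ploidy=2)
print(N)  # 400 * 399 / 2 - 200 = 79600

# hypothetical count of IBD segments overlapping the locus
num_segments = 398
rate = num_segments / N
print(rate)
```

Integer division is used here so that `N` stays an integer. The vignette's `/` gives the same value as a float.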
Citations
This software and its methods are the basis of six publications.
The software Beagle, ibd-ends, hap-ibd, and flare are also used and should be cited.
workflow/scan-selection: hap-ibd and ibd-ends
workflow/scan-case-control: hap-ibd and ibd-ends
workflow/model-selection: hap-ibd
workflow/phasing-ancestry: Beagle, flare, and hap-ibd
Contents
API
empty_function()
read_ibd_file()
bin_ibd_segments()
read_Ne()
chi2_isweep()
chi2_labeled_isweep()
simulate_ibd()
simulate_ibd_constant()
simulate_ibd_isweep()
simulate_ibd_isweep_tv()
simulate_ibd_split()
basic_coalescent()
varying_Ne_coalescent()
walk_variant_backward()
walk_variant_forward()
walk_variant_backward_tv()
bootstrap_standard()
bootstrap_standard_bc()
bootstrap_percentile()
when_freq()
bootstrap_freq()
make_ibd_graph()
diameter_communities()
write_Ne()
make_constant_Ne()
make_exponential_Ne()
probability_ibd()
probability_ibd_isweep()
big_format_distribution()
Contact
Seth Temple (sethtem@umich.edu) or GitHub Issues