2SNP:  Phasing Package


The 2SNP package can be downloaded from this page or requested by e-mail to alexz@cs.gsu.edu. The request should include:
subject: Request for 2SNP Package
body:
1. Name.
2. Affiliation.
3. Maximum datasize (# of genotypes in your data / # of SNPs in your data)

If your genotype data are in Excel (ACTG) format, you can use the interactive web server instead.
Its output is also given in Excel format.


A NEW VERSION OF THE 2SNP PACKAGE WAS RELEASED ON JULY 20, 2007.

2SNP Phasing Package
version 1.7 +TRIO


IT IS FASTER THAN PREVIOUS VERSIONS AND CAN PHASE DATA WITH TENS OF
THOUSANDS OF SNPs IN A MATTER OF MINUTES. THE MAXIMUM NUMBER OF SNPs
THAT CAN BE PHASED IS 82K ON A 32-bit ARCHITECTURE AND 5000K ON A
64-bit ARCHITECTURE. IT ALSO HANDLES TRIO DATA AND DATA WITH TWO
TYPES OF MISSING SNP VALUES: BOTH ALLELES UNKNOWN, OR ONE ALLELE
KNOWN.

SPEED COMPARISON WITH VERSION 1.1:
ON AN INTEL (32-bit) 3 GHz PROCESSOR, 2 GB RAM, LINUX

#GENOTYPES = 30
RUNTIME IS AT MOST LINEAR IN THE NUMBER OF GENOTYPES

#SNPs     ver.1.1     ver.1.5
 1.5K      60 sec       2 sec
 2.5K     270 sec       8 sec
 5.0K    2000 sec      25 sec
10.0K           -      55 sec
20.0K           -     220 sec
40.0K           -      17 min
60.0K           -      35 min
80.0K           -      70 min
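
The table gives no scaling law in the number of SNPs, but one can be estimated from the ver.1.5 column. The following is a rough sketch, not part of the package: it fits a straight line to log(runtime) against log(#SNPs) by least squares. The name `scaling_exponent` is hypothetical, and a fit to eight timings on one machine is only indicative.

```python
import math

# ver.1.5 runtimes from the table above: (#SNPs in thousands, seconds).
# Entries given in minutes are converted to seconds.
RUNTIMES_V15 = [
    (1.5, 2), (2.5, 8), (5.0, 25), (10.0, 55),
    (20.0, 220), (40.0, 17 * 60), (60.0, 35 * 60), (80.0, 70 * 60),
]


def scaling_exponent(points):
    """Least-squares slope of log(runtime) against log(#SNPs)."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(t) for _, t in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


exponent = scaling_exponent(RUNTIMES_V15)
print(f"runtime grows roughly as (#SNPs)^{exponent:.2f}")
```

On these eight points the fitted exponent comes out somewhat below 2, which suggests that doubling the number of SNPs multiplies the runtime by a factor of about three to four.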


PROGRAM SPECIFICATION:

Input parameters:

input_genotype_file_name
output_haplotype_file_name

Sample run:

./2snp genotypes.txt haplotypes.txt [-TRIO] [-HALFKNOWN]
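
A minimal sketch of how the sample run could be scripted from Python. The helper names `build_2snp_command` and `run_2snp` are hypothetical. Only the `./2snp` binary name, the two file arguments, and the -TRIO and -HALFKNOWN options come from this page, and placing the options after the file names simply follows the sample line above.

```python
import subprocess


def build_2snp_command(genotype_file, haplotype_file,
                       trio=False, halfknown=False, binary="./2snp"):
    """Build the argument list for one 2SNP run (options follow the file names)."""
    command = [binary, genotype_file, haplotype_file]
    if trio:
        command.append("-TRIO")
    if halfknown:
        command.append("-HALFKNOWN")
    return command


def run_2snp(genotype_file, haplotype_file, trio=False, halfknown=False):
    """Run 2SNP and raise CalledProcessError if it exits with a non-zero status."""
    command = build_2snp_command(genotype_file, haplotype_file, trio, halfknown)
    return subprocess.run(command, check=True)
```

For example, `run_2snp("genotypes.txt", "haplotypes.txt", trio=True)` would phase the file as trios, assuming the binary is in the current directory.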

File formats:

input_genotype_file:
One line per genotype; SNP values are in {0,1,2,?}, or {0,1,2,?,3,4} when -HALFKNOWN is used:
0 - homozygous SNP with major allele
1 - homozygous SNP with minor allele
2 - heterozygous SNP
? - missing data
3 - missing data with one known allele equal to 0
4 - missing data with one known allele equal to 1
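
A small sketch of a pre-flight check for the input file, written under two assumptions that this page does not state: each SNP value is a single character, and any whitespace between values can be ignored. It also assumes that the values 3 and 4 are accepted only together with -HALFKNOWN. The function names are hypothetical and not part of the package.

```python
BASE_VALUES = set("012?")      # homozygous major, homozygous minor, heterozygous, missing
HALFKNOWN_VALUES = set("34")   # missing with one known allele (0 or 1)


def validate_genotype_line(line, halfknown=False):
    """Return the SNP values on one line, or raise ValueError on an unknown value."""
    allowed = (BASE_VALUES | HALFKNOWN_VALUES) if halfknown else BASE_VALUES
    values = [ch for ch in line if not ch.isspace()]
    unknown = sorted(set(values) - allowed)
    if unknown:
        raise ValueError(f"unexpected SNP values: {unknown}")
    return values


def validate_genotypes(lines, halfknown=False):
    """Validate all non-empty lines and check that every genotype has the same length."""
    rows = [validate_genotype_line(line, halfknown) for line in lines if line.strip()]
    if len({len(row) for row in rows}) > 1:
        raise ValueError("genotypes have different numbers of SNPs")
    return rows
```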

-TRIO - parameter for trio data; if omitted, the data are phased as unrelated individuals
-HALFKNOWN - parameter for missing data with one known allele

For trio data, genotypes should be given in triplets:
FATHER 1
MOTHER 1
CHILD 1
FATHER 2
MOTHER 2
CHILD 2
.......
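
The triplet ordering can be sketched as follows: consecutive input lines are grouped three at a time as father, mother, child. The function name `group_trios` is hypothetical and is not part of the package.

```python
def group_trios(genotypes):
    """Group a list of genotype lines into (father, mother, child) trios."""
    if len(genotypes) % 3 != 0:
        raise ValueError("trio data must contain a multiple of 3 genotypes")
    return [
        {"father": genotypes[i], "mother": genotypes[i + 1], "child": genotypes[i + 2]}
        for i in range(0, len(genotypes), 3)
    ]
```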

output_haplotype_file:
Two haplotypes per genotype.
One line per haplotype; SNP values are in {0,1}
0 - major allele SNP
1 - minor allele SNP
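
To illustrate how the two encodings relate, the sketch below recombines a pair of haplotypes into a genotype and checks a pair against an input genotype. It assumes one character per SNP. The handling of 3 and 4 (at least one allele equal to 0 or 1, respectively) is an interpretation of the input format description, and the function names are hypothetical.

```python
def genotype_from_haplotypes(h1, h2):
    """Combine two 0/1 haplotype strings into a 0/1/2 genotype string."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes differ in length")
    genotype = []
    for a, b in zip(h1, h2):
        if a not in "01" or b not in "01":
            raise ValueError("haplotype values must be 0 or 1")
        # Equal alleles give a homozygous SNP (0 or 1); different alleles give 2.
        genotype.append(a if a == b else "2")
    return "".join(genotype)


def is_consistent(genotype, h1, h2):
    """True if the haplotype pair explains every SNP of the genotype."""
    if not (len(genotype) == len(h1) == len(h2)):
        return False
    for g, a, b in zip(genotype, h1, h2):
        if g == "?":
            continue                      # both alleles unknown: anything fits
        if g == "3":
            ok = "0" in (a, b)            # one allele known to be 0
        elif g == "4":
            ok = "1" in (a, b)            # one allele known to be 1
        elif g == "2":
            ok = a != b                   # heterozygous
        else:
            ok = a == b == g              # homozygous 0 or 1
        if not ok:
            return False
    return True
```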

For TRIO data, the output haplotype file contains 4 haplotypes per trio, in the following order:

non-transmitted haplotype of FATHER 1
non-transmitted haplotype of MOTHER 1
haplotype of child received from FATHER 1
haplotype of child received from MOTHER 1
.........
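
Because each parent's second haplotype is the one the child received, the three genotypes of a trio can be rebuilt from its four output lines. The sketch below assumes one character per SNP and uses hypothetical function names.

```python
def combine(h1, h2):
    """Combine two 0/1 haplotype strings into a 0/1/2 genotype string."""
    return "".join(a if a == b else "2" for a, b in zip(h1, h2))


def trio_genotypes_from_output(block):
    """Rebuild father, mother and child genotypes from one 4-line output block.

    block order: non-transmitted father haplotype, non-transmitted mother
    haplotype, child haplotype from father, child haplotype from mother.
    """
    father_nt, mother_nt, child_from_father, child_from_mother = block
    return {
        "father": combine(father_nt, child_from_father),
        "mother": combine(mother_nt, child_from_mother),
        "child": combine(child_from_father, child_from_mother),
    }
```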

Mendelian errors present in the TRIO data are reported in the file mendel.err
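
For reference, a Mendelian error at a SNP means the child's genotype cannot be formed from one allele of each parent. The sketch below checks this for the 0/1/2 encoding. It is a simplified illustration, not the package's own check: it treats any SNP with a missing value (?, 3 or 4) as compatible, and it does not reproduce the format of mendel.err, which this page does not describe. The function names are hypothetical.

```python
# Alleles a parent with the given genotype value can transmit.
TRANSMITTABLE = {"0": {"0"}, "1": {"1"}, "2": {"0", "1"}}


def is_mendelian(father, mother, child):
    """Check one SNP; SNPs with missing values are treated as compatible."""
    if any(g not in TRANSMITTABLE for g in (father, mother, child)):
        return True
    possible_children = set()
    for a in TRANSMITTABLE[father]:
        for b in TRANSMITTABLE[mother]:
            possible_children.add(a if a == b else "2")
    return child in possible_children


def mendelian_errors(father, mother, child):
    """Return the 0-based SNP positions where the trio is inconsistent."""
    return [
        i for i, (f, m, c) in enumerate(zip(father, mother, child))
        if not is_mendelian(f, m, c)
    ]
```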

Sample input and output:

The input file genotypes.txt contains 9 genotypes (3 trios), each with 96 SNPs.

The output file haplotypes-unrel.txt is the result of phasing genotypes.txt with 2SNP
as unrelated individuals; it contains 18 haplotypes, each with 96 SNPs.

The output file haplotypes-trio.txt is the result of phasing genotypes.txt with 2SNP
as trios; it contains 12 haplotypes, each with 96 SNPs. The file mendel.err tracks Mendelian errors.
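
The haplotype counts in the sample follow directly from the output formats: two lines per genotype for unrelated data, four lines per trio for trio data. A quick sanity check of an output file could use a helper like the one below, whose name is hypothetical.

```python
def expected_haplotype_lines(n_genotypes, trio=False):
    """Number of haplotype lines expected in the output file."""
    if trio:
        if n_genotypes % 3 != 0:
            raise ValueError("trio data must contain a multiple of 3 genotypes")
        return 4 * (n_genotypes // 3)   # 4 haplotypes per trio
    return 2 * n_genotypes              # 2 haplotypes per genotype


print(expected_haplotype_lines(9))             # unrelated: 18
print(expected_haplotype_lines(9, trio=True))  # trios: 12
```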



Related articles:
  • Brinza, D. and Zelikovsky, A. (2007) "2SNP: Scalable Phasing Method for Trios and Unrelated Individuals", IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 2007.
  • Brinza, D. and Zelikovsky, A. (2006) "2SNP: Scalable Phasing Based on 2-SNP Haplotypes", Bioinformatics, 22(3):371-373, 2006 Feb 1 (Epub 2005 Nov 15).
  • Brinza, D. and Zelikovsky, A. (2006) "Phasing of 2-SNP Genotypes based on Non-Random Mating Model", International Workshop on Bioinformatics Research and Applications (IWBRA'06), Proc. of ICCS 2006, LNCS 3992, pp. 767–774.

    Contacts:

    Alexander Zelikovsky
    Phone: (404) 651-0676
    Fax: (815) 642-0052
    Email: alexz@cs.gsu.edu
    Office: 1443, Peachtree Str. 34
    web:http://www.cs.gsu.edu/~cscazz/

    Dumitru Brinza
    Phone: (858) 822-2496
    Email: dima@cs.ucsd.edu
    Office: 9500 Gilman Dr., San Diego
    web:http://www-cse.ucsd.edu/~dbrinza/

    ---------------------------------------------------------

    This code may be freely used for all non-commercial purposes.
    (c) Copyright, 2005 by Professor Alexander Zelikovsky
    Department of Computer Science, Georgia State University
    Atlanta, GA 30303 (404) 651-0676
    alexz@cs.gsu.edu http://www.cs.gsu.edu/~cscazz/

