Population Structure Inference using Kernel-pca and Optimisation

Description

Population Structure Inference using Kernel-pca and Optimisation (PSIKO) is a software tool written in C++ for quick and accurate estimation of individual ancestry coefficients of a dataset exhibiting population structure.

PSIKO is based on:

Andrei-Alin Popescu, Andrea L. Harper, Martin Trick, Ian Bancroft, and Katharina T. Huber. A Novel and Fast Approach for Population Structure Inference using Kernel-PCA and optimisation (PSIKO). Genetics, Early Online 2014, doi:10.1534/genetics.114.171314.

PSIKO is a two-step approach. In the first step, it uses kernel-PCA together with bit-level arithmetic to compute a PCA reduction of the given dataset efficiently. In the second step, an iterative least-squares optimisation algorithm infers individual ancestry coefficients from the PCA-reduced dataset.
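The two steps can be illustrated with a minimal Python sketch. It is not PSIKO's C++ implementation: it assumes a linear kernel, uses NumPy instead of bit-level arithmetic, and replaces the iterative optimisation with a single least-squares solve in which the positions of the founder populations in PCA space are taken as already known. The function names linear_kernel_pca and ancestry_by_least_squares are hypothetical and chosen only for this example.

```python
import numpy as np


def linear_kernel_pca(genotypes, n_components):
    """Project individuals onto leading principal components via a Gram matrix.

    genotypes: array of shape (n_snps, n_individuals), i.e. rows are SNPs and
    columns are individuals. Assumption: a plain linear kernel is used.
    """
    G = np.asarray(genotypes, dtype=float)
    # Centre each SNP (row) across individuals.
    Gc = G - G.mean(axis=1, keepdims=True)
    # Linear kernel: an (n_individuals x n_individuals) Gram matrix, which is
    # far smaller than a SNP-by-SNP covariance when n_snps >> n_individuals.
    K = Gc.T @ Gc
    eigvals, eigvecs = np.linalg.eigh(K)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    # Coordinates of each individual along the leading components.
    return eigvecs[:, order] * np.sqrt(top_vals)


def ancestry_by_least_squares(points, vertices):
    """Least-squares mixing coefficients of points w.r.t. founder vertices.

    points: (n_individuals, d) PCA coordinates.
    vertices: (k, d) assumed founder positions (hypothetical input; in this
    sketch they are given, not estimated).
    """
    P = np.asarray(points, dtype=float)
    V = np.asarray(vertices, dtype=float)
    # The appended row of ones makes the fit also pull the coefficients
    # toward summing to one; the sum is exact when d = k - 1 and the
    # vertices are affinely independent.
    A = np.vstack([V.T, np.ones(V.shape[0])])  # shape (d + 1, k)
    B = np.vstack([P.T, np.ones(P.shape[0])])  # shape (d + 1, n_individuals)
    Q, *_ = np.linalg.lstsq(A, B, rcond=None)  # shape (k, n_individuals)
    return Q.T
```

Working on the Gram matrix keeps the eigenproblem at the size of the number of individuals rather than the number of SNPs, which is why a kernel formulation suits genotype data with many more SNPs than samples. In the second function, a point lying inside the simplex spanned by the vertices gets non-negative coefficients that can be read as ancestry proportions under the stated assumptions.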

PSIKO takes as input a file in the .geno format (see the manual, page 1), in which each row corresponds to a SNP and each column to an individual. It then estimates the number of founder populations and outputs ancestry estimates together with the principal components of the dataset, which can be used in subsequent association studies. For the included example file (found in the archive at Example/OSRMatrix_Complete.txt.geno), run the following command:

./PSIKO -i OSRMatrix_Complete.txt.geno

Optionally, the user can provide a value for the number K of founder populations instead of letting PSIKO infer it. For the included example file and K=2, the command is:

./PSIKO -i OSRMatrix_Complete.txt.geno -K 2
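When PSIKO is called from a script, for example to compare several values of K, the two command lines above can be assembled programmatically. The following Python sketch uses only the -i and -K options shown on this page; the helper names build_psiko_command and run_psiko, and the default binary path ./PSIKO, are assumptions made for illustration. Output file names are not described here, so the sketch does not attempt to read any results.

```python
import subprocess


def build_psiko_command(geno_file, k=None, binary="./PSIKO"):
    """Return the argument list for a PSIKO run.

    geno_file: path to the .geno input file.
    k: optional number of founder populations; if None, -K is omitted and
       PSIKO is left to infer K itself.
    binary: path to the PSIKO executable (assumed location).
    """
    cmd = [binary, "-i", str(geno_file)]
    if k is not None:
        k = int(k)
        if k < 1:
            raise ValueError("K must be a positive integer")
        cmd += ["-K", str(k)]
    return cmd


def run_psiko(geno_file, k=None, binary="./PSIKO"):
    """Run PSIKO; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_psiko_command(geno_file, k, binary), check=True)
```

For instance, build_psiko_command("OSRMatrix_Complete.txt.geno", k=2) yields the same arguments as the second command above, and passing the list form to subprocess.run avoids shell quoting problems with unusual file names.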

PSIKO2 is an extension of PSIKO that allows PSIKO to be used in a Mac environment and also supports Local Ancestry Inference using a sliding-window approach (Popescu and Huber, submitted).

 

If you use PSIKO, please cite the following paper:

Andrei-Alin Popescu, Andrea L. Harper, Martin Trick, Ian Bancroft, and Katharina T. Huber. A Novel and Fast Approach for Population Structure Inference using Kernel-PCA and optimisation (PSIKO). Genetics, Early Online 2014, doi:10.1534/genetics.114.171314.

Availability

The source code of PSIKO and its extension PSIKO2 is freely available as the file PSIKO.zip (zip, 6MB). It comes with an example file (Example/OSRMatrix_Complete.txt.geno) and a binary compiled for Linux operating systems (in the PSIKOBinary folder). Details about PSIKO and PSIKO2, and how to run them, can be found in the accompanying manual (PDF, 81KB).

Support

If you have any trouble using PSIKO, please send an email to Andrei-Alin.Popescu@uea.ac.uk or K.Huber@uea.ac.uk.

Disclaimer

This software is supplied as-is, with no warranty of any kind expressed or implied. We have made every effort to avoid errors in design and execution of this software, but we will not be liable for its use or misuse. The user is solely responsible for the validity and consequences of any results generated.

Research Team

Andrei-Alin Popescu, Dr. Andrea L. Harper, Dr. Martin Trick, Prof. Ian Bancroft, Dr. Katharina Huber