DeepHicIntegrator’s Documentation
DeepHicIntegrator integrates a Hi-C matrix with one or several histone marks by interpolating in the latent space of an autoencoder.
Installation
Clone the repository
git clone https://github.com/kabhel/DeepHicIntegrator.git
cd DeepHicIntegrator
Requirements
- A Linux distribution.
- Python 3 and the following Python packages: tensorflow, keras, docopt, schema, pandas, numpy, scipy, matplotlib, sklearn, cooler, hic2cool and m2r (for Sphinx).
pip3 install -r requirements.txt
- A Hi-C matrix in .hic file format.
To run the toy example, download the GSE63525 HUVEC Hi-C matrix:
wget ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE63nnn/GSE63525/suppl/GSE63525_HUVEC_combined_30.hic.gz
gunzip GSE63525_HUVEC_combined_30.hic.gz
- One or several histone marks, each as a 2D matrix.
Run the program
Toy example
./deep_hic_integrator data/hic_matrix/GSE63525_HUVEC_combined_30.hic data/histone_marks/100K/
Get help
Usage:
    ./deep_hic_integrator <HIC_FILE> <HM_PATH> [--resolution INT]
                                               [--chr_train INT]
                                               [--chr_test INT]
                                               [--hist_mark_train STR]
                                               [--square_side INT]
                                               [--epochs INT]
                                               [--batch_size INT]
                                               [--encoder STR]
                                               [--decoder STR]
                                               [--output PATH]
                                               [--help]
Arguments:
    <HIC_FILE>                       Path of the Hi-C matrix file (.hic format)
    <HM_PATH>                        Path of the directory containing the histone mark files
Options:
    -r INT, --resolution INT         Resolution representing the number of pair-ended reads
                                     spanning between a pair of bins. [default: 25000]
    -a INT, --chr_train INT          Chromosome used to train the autoencoder [default: 1]
    -t INT, --chr_test INT           Chromosome used to test the autoencoder [default: 20]
    -m STR, --hist_mark_train STR    Name of the histone mark used to train the autoencoder
                                     [default: h3k4me3]
    -n INT, --square_side INT        Size N*N of a sub-matrix [default: 20]
    -p INT, --epochs INT             Number of epochs for the training [default: 50]
    -b INT, --batch_size INT         Batch size for the training [default: 64]
    -e STR, --encoder STR            Trained encoder model (H5 format) [default: None]
    -d STR, --decoder STR            Trained decoder model (H5 format) [default: None]
    -o PATH, --output PATH           Output path [default: results/]
    -h, --help                       Show this help message
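When several runs are scripted, it can help to assemble the command line from the documented long options. The following is a minimal sketch only: the helper `build_command` is hypothetical (it is not part of DeepHicIntegrator), and the option values are chosen purely for illustration.

```python
import shlex


def build_command(hic_file, hm_path, **options):
    """Assemble a deep_hic_integrator command line (hypothetical helper)."""
    argv = ["./deep_hic_integrator", hic_file, hm_path]
    for name, value in options.items():
        # Each keyword becomes a documented long option, e.g. epochs=100 -> --epochs 100
        argv += [f"--{name}", str(value)]
    return argv


cmd = build_command(
    "data/hic_matrix/GSE63525_HUVEC_combined_30.hic",
    "data/histone_marks/100K/",
    chr_train=2,          # illustrative value, overrides the default of 1
    epochs=100,           # illustrative value, overrides the default of 50
    output="results_chr2/",
)

# Print a shell-safe version of the command; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Options that are not passed keep the defaults listed above.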
Documentation
The documentation is generated with Sphinx and built on ReadTheDocs.
Author
Hélène Kabbech: bioinformatics master's student and intern at the Medical Center University of Goettingen (Germany)
License
This project is licensed under the GNU License.
Implemented classes
Autoencoder
Matrix
class src.matrix.Hic(cooler, *args, **kwargs)
    This class inherits from the Matrix class and sets the matrix numpy array for Hi-C data.

    cooler
        Storage of the Hi-C matrix
        Type: cooler
class src.matrix.HistoneMark(bed_file, *args, **kwargs)
    This class inherits from the Matrix class and sets the matrix numpy array for a histone mark.

    mark_df
        Histone modification sparse matrix
        Type: Pandas DataFrame
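To illustrate how a sparse histone mark table could be turned into a dense numpy array, here is a minimal sketch. The column names `bin1`, `bin2` and `value`, the symmetry of the matrix, and the helper `sparse_to_dense` are all assumptions made for this example; the actual layout of `mark_df` is not described here.

```python
import numpy as np
import pandas as pd

# Hypothetical sparse representation: one row per non-zero cell (column names are assumed).
mark_df = pd.DataFrame(
    {"bin1": [0, 1, 3], "bin2": [0, 2, 3], "value": [1.0, 0.5, 2.0]}
)


def sparse_to_dense(df, n_bins):
    """Fill a dense n_bins x n_bins array from (bin1, bin2, value) triplets."""
    dense = np.zeros((n_bins, n_bins), dtype=float)
    rows = df["bin1"].to_numpy()
    cols = df["bin2"].to_numpy()
    vals = df["value"].to_numpy()
    dense[rows, cols] = vals
    dense[cols, rows] = vals  # assumption: the matrix is symmetric
    return dense


dense_mark = sparse_to_dense(mark_df, n_bins=4)
print(dense_mark.shape)
```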
class src.matrix.Matrix(resolution, chrom_num, side)
    This class stores a matrix and its related numpy arrays, and plots and writes the matrix.

    resolution
        Resolution (or bin size) of the matrix
        Type: int
    chrom_num
        Chromosome chosen for processing
        Type: int
    side
        Square side of a numpy array sub-matrix
        Type: int
    matrix
        Matrix stored in a numpy array
        Type: numpy array
    sub_matrices
        The matrix is divided into S sub-matrices of size side*side and stored in a numpy array of shape (S, side, side, 1)
        Type: numpy array
    white_sub_matrices_ind
        Positions of the blank sub-matrices
        Type: list
    total_sub_matrices
        Total number of sub-matrices
        Type: int
    latent_spaces
        Latent spaces (encoded sub-matrices) stored in a numpy array
        Type: numpy array
    predicted_sub_matrices
        Predicted sub-matrices (decoded latent spaces) stored in a numpy array
        Type: numpy array

    plot_distribution_matrix(matrix_type, path)
        Plot the distribution of the matrix.
        Parameters:
            - matrix_type (str) – Matrix’s name
            - path (str) – Path of the output plot
    plot_matrix(matrix_type, color_map, path)
        Plot the matrix in a file.
        Parameters:
            - matrix_type (str) – Matrix’s name
            - color_map (matplotlib.colors.ListedColormap) – Color map
            - path (str) – Path of the output plot
    plot_sub_matrices(matrix_type, index_list, color_map, path)
        Plot 40 random sub-matrices in a file.
        Parameters:
            - matrix_type (str) – Matrix’s name
            - index_list (list) – List of the 40 sub-matrix indexes to plot
            - color_map (matplotlib.colors.ListedColormap) – Color map
            - path (str) – Path of the output plot
    set_predicted_latent_spaces(latent_spaces)
        Set the latent spaces predicted by the encoder.
        Parameters:
            - latent_spaces (numpy array) – The predicted latent spaces
    set_predicted_sub_matrices(predicted_sub_matrices)
        Set the sub-matrices predicted by the whole autoencoder.
        Parameters:
            - predicted_sub_matrices (numpy array) – The predicted sub-matrices
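The sub-matrix attributes described above can be illustrated with a small numpy sketch. It assumes non-overlapping tiles taken over the whole matrix, with any remainder cropped; the tool's actual tiling strategy may differ, and the function name `split_into_sub_matrices` is hypothetical.

```python
import numpy as np


def split_into_sub_matrices(matrix, side):
    """Cut a square matrix into non-overlapping side x side tiles.

    Returns an array of shape (S, side, side, 1), tiles ordered row by row.
    """
    n = (matrix.shape[0] // side) * side   # crop so the size is a multiple of side
    k = n // side                          # number of tiles per row/column
    cropped = matrix[:n, :n]
    tiles = cropped.reshape(k, side, k, side).swapaxes(1, 2)
    return tiles.reshape(-1, side, side, 1)


# Toy 6 x 6 matrix with two non-empty diagonal blocks.
matrix = np.zeros((6, 6))
matrix[:3, :3] = 1.0
matrix[3:, 3:] = 2.0

sub_matrices = split_into_sub_matrices(matrix, side=3)

# Indexes of blank (all-zero) tiles, in the spirit of white_sub_matrices_ind.
blank_ind = [i for i, tile in enumerate(sub_matrices) if not tile.any()]

print(sub_matrices.shape, blank_ind)
```

The trailing axis of size 1 plays the role of a single image channel, which is the layout convolutional Keras layers expect.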
Interpolation
class src.interpolation.Interpolation(alphas)
    This class groups attributes and functions which aim to construct, write in a sparse matrix and plot two or several interpolated matrices.

    alphas
        List of float values to use for the interpolation (alpha parameter)
        Type: list
    interpolated_submatrices
        List of all the interpolated sub-matrices. Each item in the list contains an interpolation with a different alpha.
        Type: list
    integrated_matrix
        List of all the integrated (interpolated) reconstructed matrices. Each item in the list contains an interpolation with a different alpha.
        Type: list

    construct_integrated_matrix(hic)
        Construct the whole integrated matrices from the interpolated sub-matrices.
        Parameters:
            - hic (Hic(Matrix) object) – Hi-C matrix
    plot_integrated_matrix(hic, color_map, path)
        Plot the integrated matrices for each alpha value.
class
src.interpolation.
InterpolationInLatentSpace
(*args, **kwargs)[source]¶ -
class
InterpolationInLatentSpace
¶ -
This class inherits the Interpolation class and interpolate sub-matrices in the latent space
-
interpolate_latent_spaces
(hist_marks, hic_latent_spaces)[source]¶ Double linear interpolation of the latent spaces of the Hi-C and histone marks.
Parameters: - hist_marks (dict) – Dictionary containing all histone mark HistoneMark objects.
- predicted_hic (numpy array) – Predicted sub-matrices of the Hi-C
-
class
-
class
src.interpolation.
NormalInterpolation
(*args, **kwargs)[source]¶ -
class
InterpolationInLatentSpace
¶ -
This class inherits the Interpolation class and interpolate sub-matrices in the pixel space
-
(= without the use of encoder and decoder).
-
alphas
¶ List of float values to use for the interpolation (alpha parameter)
Type: list
-
interpolated_submatrices
¶ List of all the interpolated sub-matrices. Each item in the list contains an interpolation with a different alpha.
Type: list
-
integrated_matrix
¶ List of all the integrated (interpolated) reconstructed matrices. Each item in the list contains an interpolation with a different alpha.
Type: list
-
interpolate_predicted_img
(hist_marks, predicted_hic)[source]¶ Double linear interpolation of the predicted sub-matrices of the Hi-C and histone marks.
Parameters: - hist_marks (dict) – Dictionary containing all histone mark HistoneMark objects.
- predicted_hic (numpy array) – Predicted sub-matrices of the Hi-C
-
class
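Pixel-space interpolation applies the blend directly to the sub-matrix values, with no encoder or decoder involved. The sketch below is an assumption-laden illustration: how the tool combines several histone marks is not specified here, so each mark is blended with the Hi-C sub-matrices separately, and plain numpy arrays stand in for the HistoneMark objects held in `hist_marks`.

```python
import numpy as np

rng = np.random.default_rng(1)
side = 4

predicted_hic = rng.random((3, side, side, 1))   # toy Hi-C sub-matrices

# Toy stand-in for the dictionary of histone marks (plain arrays, not HistoneMark objects).
hist_marks = {
    "h3k4me3": rng.random((3, side, side, 1)),
    "h3k27ac": rng.random((3, side, side, 1)),
}
alphas = [0.25, 0.5, 0.75]

# For every mark, build one blended array per alpha value.
interpolated = {
    name: [(1.0 - a) * predicted_hic + a * mark for a in alphas]
    for name, mark in hist_marks.items()
}

for name, blends in interpolated.items():
    print(name, [b.shape for b in blends])
```

Each inner list has one entry per alpha, which mirrors the per-alpha layout of `interpolated_submatrices`.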