Duplicate code between PCA.py and plot_PCA.py #10

Open
AshlinHarris opened this issue Oct 14, 2021 · 1 comment
Labels
maintenance 🛠️ code maintenance, clarity, styling, etc.

@AshlinHarris
Contributor

Some code for filtering, analysis, and plotting is duplicated between src/PCA.py and src/plot_PCA.py.

@arisp99
Member

arisp99 commented Oct 16, 2021

In particular, the duplicate code is in the following lines:

MIPTools/src/PCA.py

Lines 107 to 163 in 6035a73

pca = decomposition.PCA(n_components=n_components)
pca.fit(variant_table)
X = pca.transform(variant_table)
sample_names = variant_table.index.tolist()
if (meta_data is not None) and (hue_column is not None):
    sites = meta_data.set_index("Sample ID").loc[sample_names,
                                                 hue_column].values
    unique_sites = set(sites)
    site_color_dict = dict(zip(unique_sites, cycle(all_colors)))
else:
    sites = np.ones(len(sample_names))
    site_color_dict = {1: "k"}
fig, axes = plt.subplots(3, 1)
ax = axes[0]
for ctry, color in site_color_dict.items():
    ctry_mask = sites == ctry
    if len(X[ctry_mask, 0]) > 0:
        ax.scatter(X[ctry_mask, 0], X[ctry_mask, 1],
                   c=color, label=ctry, s=scatter_size)
if meta_data is not None:
    ax.legend()
ax.set_xticklabels([])
ax.set_yticklabels([])
ax.set_xlabel("PC1 (%0.1f%%)" % pca.explained_variance_[0],
              fontsize=fontsize)
ax.set_ylabel("PC2 (%0.1f%%)" % pca.explained_variance_[1],
              fontsize=fontsize)
ax = axes[1]
for ctry, color in site_color_dict.items():
    ctry_mask = sites == ctry
    if len(X[ctry_mask, 0]) > 0:
        ax.scatter(X[ctry_mask, 1], X[ctry_mask, 2],
                   c=color, label=ctry, s=scatter_size)
if meta_data is not None:
    ax.legend()
ax.set_xticklabels([])
ax.set_yticklabels([])
ax.set_xlabel("PC2 (%0.1f%%)" % pca.explained_variance_[1],
              fontsize=fontsize)
ax.set_ylabel("PC3 (%0.1f%%)" % pca.explained_variance_[2],
              fontsize=fontsize)
ax = axes[2]
for ctry, color in site_color_dict.items():
    ctry_mask = sites == ctry
    if len(X[ctry_mask, 0]) > 0:
        ax.scatter(X[ctry_mask, 0], X[ctry_mask, 2],
                   c=color, label=ctry, s=scatter_size)
if meta_data is not None:
    ax.legend()
ax.set_xticklabels([])
ax.set_yticklabels([])
ax.set_xlabel("PC1 (%0.1f%%)" % pca.explained_variance_[0],
              fontsize=fontsize)
ax.set_ylabel("PC3 (%0.1f%%)" % pca.explained_variance_[2],
              fontsize=fontsize)
fig.set_size_inches(*fig_size)
fig.set_dpi(fig_dpi)

and:

MIPTools/src/plot_PCA.py

Lines 46 to 93 in 6035a73

pca = decomposition.PCA(n_components=3)
pca.fit(filled_table.T)
X = pca.transform(filled_table.T)
sample_names = filled_table.columns.tolist()
if (meta_data is not None) and (hue_column is not None):
    sites = meta_data.set_index("Sample ID").loc[
        sample_names, hue_column].values
    unique_sites = set(sites)
    site_color_dict = dict(zip(unique_sites, cycle(all_colors)))
else:
    sites = np.ones(len(sample_names))
    site_color_dict = {1: "k"}
fig, axes = plt.subplots(3, 1)
ax = axes[0]
for ctry, color in site_color_dict.items():
    ctry_mask = sites == ctry
    if len(X[ctry_mask, 0]) > 0:
        ax.scatter(X[ctry_mask, 0], X[ctry_mask, 1],
                   c=color, label=ctry, s=10)
ax.legend()
ax.set_xticklabels([])
ax.set_yticklabels([])
ax.set_xlabel("PC1 (%0.1f%%)" % pca.explained_variance_[0], fontsize=16)
ax.set_ylabel("PC2 (%0.1f%%)" % pca.explained_variance_[1], fontsize=16)
ax = axes[1]
for ctry, color in site_color_dict.items():
    ctry_mask = sites == ctry
    if len(X[ctry_mask, 0]) > 0:
        ax.scatter(X[ctry_mask, 1], X[ctry_mask, 2],
                   c=color, label=ctry, s=15)
ax.legend()
ax.set_xticklabels([])
ax.set_yticklabels([])
ax.set_xlabel("PC2 (%0.1f%%)" % pca.explained_variance_[1], fontsize=16)
ax.set_ylabel("PC3 (%0.1f%%)" % pca.explained_variance_[2], fontsize=16)
ax = axes[2]
for ctry, color in site_color_dict.items():
    ctry_mask = sites == ctry
    if len(X[ctry_mask, 0]) > 0:
        ax.scatter(X[ctry_mask, 0], X[ctry_mask, 2],
                   c=color, label=ctry, s=15)
ax.legend()
ax.set_xticklabels([])
ax.set_yticklabels([])
ax.set_xlabel("PC1 (%0.1f%%)" % pca.explained_variance_[0], fontsize=16)
ax.set_ylabel("PC3 (%0.1f%%)" % pca.explained_variance_[2], fontsize=16)
fig.set_size_inches(4, 12)
fig.set_dpi(150)
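
The two snippets differ in only a few places: the input table (`variant_table` vs. `filled_table.T`), the number of components, whether the legend depends on `meta_data`, and the hard-coded marker size, font size, figure size, and DPI. One possible way to remove the duplication is a shared plotting helper that both scripts call. The sketch below is an illustration only, not code from the repository. The names `site_colors`, `plot_pc_pairs`, `PC_PAIRS`, and `DEFAULT_COLORS` are hypothetical; `DEFAULT_COLORS` stands in for `all_colors`, which is defined outside the quoted lines; and a single `scatter_size` replaces the 10/15/15 sizes used in plot_PCA.py. The helper receives the axis-label values as an argument, so the PCA fit itself stays in each caller.

```python
from itertools import cycle

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical palette standing in for `all_colors` from the original scripts.
DEFAULT_COLORS = ["tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple"]

# (x, y) component indices, in the order both scripts draw their three panels.
PC_PAIRS = [(0, 1), (1, 2), (0, 2)]


def site_colors(sample_names, meta_data=None, hue_column=None,
                colors=DEFAULT_COLORS):
    """Return (sites, site_color_dict) as both snippets build them."""
    if (meta_data is not None) and (hue_column is not None):
        sites = meta_data.set_index("Sample ID").loc[
            sample_names, hue_column].values
        # dict.fromkeys keeps first-appearance order, so colors are assigned
        # deterministically (the original uses set(), whose order can vary).
        site_color_dict = dict(zip(dict.fromkeys(sites), cycle(colors)))
    else:
        sites = np.ones(len(sample_names))
        site_color_dict = {1: "k"}
    return sites, site_color_dict


def plot_pc_pairs(X, explained, sites, site_color_dict, show_legend=True,
                  scatter_size=10, fontsize=16, fig_size=(4, 12), fig_dpi=150):
    """Draw PC1/PC2, PC2/PC3 and PC1/PC3 scatter panels for projected data X.

    `explained` holds one value per component and is printed in the axis
    labels with a percent sign, exactly as the original snippets do.
    """
    sites = np.asarray(sites)
    fig, axes = plt.subplots(len(PC_PAIRS), 1)
    for ax, (i, j) in zip(axes, PC_PAIRS):
        for site, color in site_color_dict.items():
            mask = sites == site
            if mask.any():
                ax.scatter(X[mask, i], X[mask, j],
                           c=color, label=site, s=scatter_size)
        if show_legend:
            ax.legend()
        ax.set_xticklabels([])
        ax.set_yticklabels([])
        ax.set_xlabel("PC%d (%0.1f%%)" % (i + 1, explained[i]),
                      fontsize=fontsize)
        ax.set_ylabel("PC%d (%0.1f%%)" % (j + 1, explained[j]),
                      fontsize=fontsize)
    fig.set_size_inches(*fig_size)
    fig.set_dpi(fig_dpi)
    return fig, axes
```

Under these assumptions, PCA.py would pass its own `scatter_size`, `fontsize`, `fig_size`, and `fig_dpi` along with `show_legend=meta_data is not None`, while plot_PCA.py could rely on the defaults. Passing `pca.explained_variance_` as `explained` reproduces the current labels. Note that in scikit-learn this attribute holds the variance itself, while `explained_variance_ratio_` holds the fraction of variance, so `pca.explained_variance_ratio_ * 100` may be what the percent labels intend.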

@AshlinHarris AshlinHarris added this to the 1.5.0 milestone Nov 4, 2021
@arisp99 arisp99 added the maintenance 🛠️ code maintenance, clarity, styling, etc. label Mar 16, 2022