Uniprot data to ingest #52
Also mentioned here: #1126
Thanks Val. Note to self, here's the updated API URL, adding the extra fields: and the commands to update the data file in SVN:
cd pombe-embl
curl 'https://rest.uniprot.org/uniprotkb/stream?compressed=true&fields=accession%2Cft_signal%2Cft_transit%2Cxref_pombase%2Cft_binding%2Cft_act_site%2Ccc_catalytic_activity%2Cgene_synonym%2Ccc_ptm%2Cft_mod_res%2Ccc_cofactor%2Ckinetics&format=tsv&query=%28%28organism_id%3A284812%29%29' | gzip -d > external_data/uniprot_data_from_api.tsv
svn commit external_data/uniprot_data_from_api.tsv
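Since the field list in that URL keeps growing, here is a minimal sketch of building the same URL from a plain list, so adding a field is a one-line change. The names `FIELDS`, `BASE_URL` and `build_stream_url` are made up for illustration and are not part of any PomBase script; the field names and query are copied from the curl command above.

```python
from urllib.parse import urlencode

# Hypothetical constant names; the values are taken from the curl command above.
BASE_URL = "https://rest.uniprot.org/uniprotkb/stream"

FIELDS = [
    "accession", "ft_signal", "ft_transit", "xref_pombase",
    "ft_binding", "ft_act_site", "cc_catalytic_activity",
    "gene_synonym", "cc_ptm", "ft_mod_res", "cc_cofactor", "kinetics",
]


def build_stream_url(fields, organism_id=284812):
    """Return the UniProtKB stream URL for a TSV download of `fields`."""
    params = {
        "compressed": "true",
        "fields": ",".join(fields),
        "format": "tsv",
        "query": f"((organism_id:{organism_id}))",
    }
    # urlencode percent-encodes commas, colons and parentheses,
    # which reproduces the escaping used in the curl command.
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_stream_url(FIELDS))
```

The printed URL can be pasted into the curl command in place of the hand-written one; the `gzip -d` and `svn commit` steps stay the same.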
Add new fields to GeneDetails Refs pombase/website#2115 Refs pombase/pombase-chado#52 Refs pombase/pombase-chado#1172
Review of features
I think this is the current situation. Can you confirm and answer my queries about the bold ones?
I'm going to close this. Everything is done but I have one question about some data we could import.
final:
OK we can ignore SITE
OK I checked the first 20 or so cofactors and all are present as IEA GO binding annotations (I expected they would be but I wanted to check). Any examples where the coordinates are known have binding site annotations in the protein viewer (so we can ignore cofactor)
I downloaded the coiled coil data from Pfam while it still existed. InterPro doesn't provide coiled coil data in their XML file. We also get the low complexity regions and disordered regions from the Pfam download. The file is from 2021 so it's quite out of date now.
I will ask if InterPro could include it in their XML...
We can map all of the:
LIPID 485; /note="GPI-anchor amidated serine"; /evidence="ECO:0000255"
LIPID 202; /note="S-geranylgeranyl cysteine"; /evidence="ECO:0000250|UniProtKB:P62745"
LIPID 404; /note="S-farnesyl cysteine"; /evidence="ECO:0000250"
LIPID 116; /note="Phosphatidylethanolamine amidated glycine"; /evidence="ECO:0000250|UniProtKB:P38182"
LIPID 2; /note="N-myristoyl glycine"; /evidence="ECO:0000269|PubMed:14722091"
What does that leave?
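For the mapping step, a rough sketch of pulling the position, note and evidence out of LIPID strings in the format shown above. It assumes every entry follows the `LIPID <pos>; /note="..."; /evidence="..."` layout; `LipidFeature` and `parse_lipid_features` are hypothetical names, not existing loader code.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches one LIPID feature as it appears in the examples above.
LIPID_RE = re.compile(r'LIPID (\d+); /note="([^"]*)"; /evidence="([^"]*)"')


@dataclass
class LipidFeature:
    position: int          # residue position in the protein
    note: str              # e.g. "N-myristoyl glycine"
    eco: str               # evidence code, e.g. "ECO:0000269"
    source: Optional[str]  # e.g. "PubMed:14722091", or None if absent


def parse_lipid_features(text: str) -> List[LipidFeature]:
    """Extract every LIPID feature found in `text`."""
    features = []
    for pos, note, evidence in LIPID_RE.findall(text):
        # Evidence is either "ECO:..." or "ECO:...|Source:ID".
        eco, _, source = evidence.partition("|")
        features.append(LipidFeature(int(pos), note, eco, source or None))
    return features


if __name__ == "__main__":
    example = (
        'LIPID 202; /note="S-geranylgeranyl cysteine"; '
        '/evidence="ECO:0000250|UniProtKB:P62745" '
        'LIPID 2; /note="N-myristoyl glycine"; '
        '/evidence="ECO:0000269|PubMed:14722091"'
    )
    for feature in parse_lipid_features(example):
        print(feature)
```

Grouping the parsed features by `note` would give the list of distinct lipidation types that still need a mapping.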
I mailed InterPro to ask if this is possible, but now I've started worrying that we have a lot of features from InterPro that have incorrect coordinates relative to the current PomBase sequences (because they are based on UniProt), i.e. everything that has any features does have coordinate changes.
Last time I tried to install InterProScan it was too hard and I failed. That was a long time ago though and they now provide a helpful Docker image. I'll give it a go. There will be a bit of downstream work because the output format of InterProScan isn't the same as the XML file from InterPro.
Here's the complete list. Only 51 genes have this type of data.
The remaining mappings:
SPBC13G1.11; LIPID 193; /note="S-palmitoyl cysteine"; /evidence="ECO:0000250"
SPBPJ4664.02; LIPID 3944; /note="GPI-anchor amidated alanine"; /evidence="ECO:0000255"
SPAC212.08c; LIPID 96; /note="GPI-anchor amidated glycine"; /evidence="ECO:0000255"
SPCC1322.10; LIPID 242; /note="GPI-like-anchor amidated asparagine"; /evidence="ECO:0000255"
I've moved the Lipidation stuff to: And the InterPro task is here: So I think we can close this.
Most of the information in UniProt is already in GeneDB, but UniProt may have information about catalytic activities and residues which we could potentially load into Chado.
Original comment by: ValWood