Possible modifications to import #1216
About 1100 genes have one or more modified residue annotations, and there are 2636 annotations in total. Here are the counts of each note across the 2636 annotations:
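A per-note tally like this can be produced with `collections.Counter`. This is a minimal sketch that assumes the `/note` values have already been pulled out of the UniProt file; the sample list just reuses notes from the MOD_RES sample quoted in this thread.

```python
from collections import Counter


def count_notes(notes):
    """Tally how many annotations carry each /note value, most common first."""
    # Ties keep the order in which each note was first seen.
    return Counter(notes).most_common()


sample_notes = [
    "Phosphoserine",
    "N6-(pyridoxal phosphate)lysine",
    "Phosphothreonine",
    "Cysteine methyl ester",
    "N6-(pyridoxal phosphate)lysine",
    "N6-acetyllysine",
]

for note, n in count_notes(sample_notes):
    print(f"{n}\t{note}")
```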
@Antonialock, are these modifications from experiments or inferred? (I don't think I would infer phosphosites.) My concern is that they use human/cerevisiae names in the "added by". Maybe we should skip the phosphosites because we have pretty good EXP coverage for those. @kimrutherford, do they have residues associated?
I guess the way forward: first, I think we need to extend our MOD data format file to include an "assigned by" column so that all of these can have "assigned_by" set to UniProt.
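A sketch of what an extra "assigned_by" column could look like when writing the MOD file as TSV. The column names and order here are hypothetical (the real PomBase MOD data format may differ); rows without an explicit value default to "UniProt". MOD:00046 is PSI-MOD's O-phospho-L-serine.

```python
import csv
import io

# Hypothetical column layout; only "assigned_by" is the new column discussed here.
COLUMNS = ["gene", "modification", "evidence", "residue", "extension",
           "reference", "date", "assigned_by"]


def write_mod_rows(rows, out):
    """Write modification rows as TSV, filling assigned_by with "UniProt" if absent."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")  # missing columns become ""
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "assigned_by": row.get("assigned_by", "UniProt")})


buf = io.StringIO()
write_mod_rows([{"gene": "SPAC144.13c", "modification": "MOD:00046",
                 "evidence": "ECO:0000269", "residue": "S62",
                 "reference": "PMID:10921878"}], buf)
print(buf.getvalue())
```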
in the current annotation guidelines
so there should be an EXP associated with the annotation. Is there any non-overlap?
We don't know yet. I checked the 8 annotations at the top of the file Kim provided (see the first comment in this ticket), and we only had 6 of them. I suspect we will have most of the phosphosites, but we will filter out any redundant ones.
Yep, they all have a position.
I had a look at the PSI-MOD OBO file to see how tricky that would be. It's not too bad because there are EXACT synonyms for most of the UniProt modification names, e.g.:
There are 37 unique modification names we get from the UniProt file and all but 3 have a matching synonym. They are:
Phosphohistidine is likely to be this term (there is an EXACT synonym):
I'm unsure about 3,4-dihydroxyproline as there isn't an exact match. The closest is:
I can't see anything in PSI-MOD that looks like "N6-acetyl-N6-methyllysine".
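The synonym matching described above could be sketched like this: index every `[Term]` name and EXACT synonym from the OBO text, then look up each UniProt note. The OBO fragment below is hand-written for illustration (not copied from PSI-MOD), and stripping qualifiers after ";" (as in "Pyruvic acid (Ser); by autocatalysis") is an assumption about how such notes should be normalised.

```python
import re

# An OBO synonym line looks like: synonym: "Phosphoserine" EXACT []
# Escaped quotes inside the synonym text are kept as-is (not unescaped) here.
EXACT_SYNONYM_RE = re.compile(r'synonym:\s+"((?:[^"\\]|\\.)*)"\s+EXACT\b')


def exact_synonym_index(obo_text):
    """Map each [Term] name and EXACT synonym (case-folded) to the term's ID."""
    index = {}
    term_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are indexed.
            in_term = line == "[Term]"
            term_id = None
            continue
        if not in_term:
            continue
        if line.startswith("id:"):
            term_id = line[len("id:"):].strip()
        elif term_id and line.startswith("name:"):
            index.setdefault(line[len("name:"):].strip().casefold(), term_id)
        elif term_id:
            m = EXACT_SYNONYM_RE.match(line)
            if m:
                index.setdefault(m.group(1).casefold(), term_id)
    return index


def map_notes(notes, index):
    """Split UniProt notes into (note -> term ID) matches and unmatched notes."""
    mapped, unmapped = {}, []
    for note in notes:
        # Drop qualifiers after ";" such as "; by autocatalysis".
        key = note.split(";", 1)[0].strip().casefold()
        term_id = index.get(key)
        if term_id:
            mapped[note] = term_id
        else:
            unmapped.append(note)
    return mapped, unmapped


SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: MOD:00046
name: O-phospho-L-serine
synonym: "Phosphoserine" EXACT []

[Term]
id: MOD:00047
name: O-phospho-L-threonine
synonym: "Phosphothreonine" EXACT []
"""

index = exact_synonym_index(SAMPLE_OBO)
mapped, unmapped = map_notes(
    ["Phosphoserine", "Phosphothreonine", "N6-acetyl-N6-methyllysine"], index)
print(mapped)
print(unmapped)  # ['N6-acetyl-N6-methyllysine']
```

Anything left in `unmapped` is what needs a manual decision, like the three names listed above.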
Wow Kim, you got your curator badge! Correct for the first 2. I asked ChatGPT, and the second answer was: 👍

Here's why this is the case: Acetylation involves attaching an acetyl group (–COCH₃) to the ε-amino group of lysine, which neutralizes its positive charge and removes the free amine group.

Possible scenarios for "N6-acetyl-N6-methyllysine":

In vitro or synthetic chemical modification: In a controlled laboratory setting, chemists may create synthetic molecules where both modifications appear on the same lysine. These compounds can be useful for studying the effects of such modifications on protein function, even though this does not naturally occur in cells.

Misnomer or error: It could also be that the term "N6-acetyl-N6-methyllysine" is used imprecisely, referring to two distinct modification states (either acetylated or methylated lysine) without both modifications actually existing simultaneously on the same nitrogen atom.

In summary, while lysine can be either acetylated or methylated at the N6 position, both modifications cannot occur simultaneously on the same residue under normal biological conditions. The phrase "N6-acetyl-N6-methyllysine" might be more theoretical or used in contexts where sequential or competing modifications are considered.
I think we can ignore "N6-acetyl-N6-methyllysine" for now. Which protein is it on?
:-)
https://www.pombase.org/gene/SPAC1834.03c, residue K6: https://www.uniprot.org/uniprotkb/P09322/feature-viewer

It has a reference: https://pubmed.ncbi.nlm.nih.gov/37731000/
The changes to load the modifications from UniProt into Chado as PSI-MOD annotations are mostly done. I've still got a bit of testing and configuration to do, so I won't commit the changes today. Also, I'd like to get these two issues finished and deployed at the same time:
For testing, I have a version with just the UniProt features and the PomBase-curated modifications here: https://desktop.kmr.nz/reference/PMID:36408920

There is quite a bit of redundancy in the new modifications. You've curated most of the modifications already, so we should think about adding some filtering.
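One simple form the filtering could take, as a sketch only: treat a UniProt modification as redundant when PomBase already has the same gene, position and PSI-MOD term, and also drop repeats within the UniProt set. The `Modification` tuple is a hypothetical record type; a real filter might also count a curated, more specific child term as covering a UniProt parent term, which this exact-match version does not do.

```python
from typing import NamedTuple


class Modification(NamedTuple):
    gene: str
    position: int
    mod_id: str


def filter_redundant(uniprot_mods, curated_mods):
    """Keep UniProt modifications not already curated (exact gene/position/term match)."""
    seen = set(curated_mods)
    kept = []
    for mod in uniprot_mods:
        if mod not in seen:
            kept.append(mod)
            seen.add(mod)  # also removes duplicates within the UniProt list
    return kept


curated = [Modification("SPAC144.13c", 62, "MOD:00046")]
uniprot = [
    Modification("SPAC144.13c", 62, "MOD:00046"),
    Modification("SPAC22A12.07c", 451, "MOD:00047"),
    Modification("SPAC22A12.07c", 451, "MOD:00047"),
]
print(filter_redundant(uniprot, curated))
```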
That's good to know!
They are now parsed and included in the output of pombase-create-annotations. Refs pombase/pombase-chado#1216
Phosphohistidine and 3,4-dihydroxyproline. Refs pombase/pombase-chado#1216
We now allow "123" as well as "K123" for the residue() extension of modifications. These now appear in the protein feature viewer. Refs pombase/pombase-chado#1216
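Accepting both residue() forms could look like the sketch below. The regex (an optional single upper-case amino-acid letter followed by a 1-based position) is an assumption for illustration, not the pombase-chado implementation.

```python
import re

# Optional one-letter amino-acid code, then a 1-based position: "K123" or "123".
RESIDUE_RE = re.compile(r"([A-Z])?([0-9]+)")


def parse_residue(value):
    """Return (amino_acid or None, position) for a residue() extension value."""
    m = RESIDUE_RE.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"unrecognised residue() value: {value!r}")
    amino_acid, position = m.groups()
    return amino_acid, int(position)


print(parse_residue("K123"))  # ('K', 123)
print(parse_residue("123"))   # (None, 123)
```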
I think we can close this now. The modifications are in Chado (so they appear in the Modifications section on gene pages) and in the protein feature viewer.
This is for the data from UniProt. Refs pombase/curation#3748 Refs pombase/pombase-chado#1216
"MOD:0116" doesn't exist. Fixed to "MOD:00793" for "2,3-didehydroalanine (Cys)" Refs pombase/pombase-chado#1216
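A QC check could catch IDs like "MOD:0116" before loading. This sketch assumes PSI-MOD IDs are "MOD:" followed by five digits and that the set of known IDs has been read from the OBO file; a well-formed but unknown ID is reported separately from a malformed one.

```python
import re

MOD_ID_RE = re.compile(r"MOD:[0-9]{5}")


def check_mod_ids(ids, known_ids):
    """Return (id, problem) pairs for IDs that are malformed or not in the ontology."""
    problems = []
    for term_id in ids:
        if not MOD_ID_RE.fullmatch(term_id):
            problems.append((term_id, "malformed"))
        elif term_id not in known_ids:
            problems.append((term_id, "not in ontology"))
    return problems


print(check_mod_ids(["MOD:0116", "MOD:00793", "MOD:99999"], {"MOD:00793"}))
```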
From #52 (comment)
Here is a sample of the "Modified residue" data:
SPAC144.13c; │ MOD_RES 62; /note="Phosphoserine"; /evidence="ECO:0000269|PubMed:10921878"
SPBC428.11; │ MOD_RES 210; /note="N6-(pyridoxal phosphate)lysine"; /evidence="ECO:0000250|UniProtKB:P06721"
SPAC22A12.07c; │ MOD_RES 451; /note="Phosphothreonine"; /evidence="ECO:0000269|PubMed:18257517"
SPAC23C4.08; │ MOD_RES 202; /note="Cysteine methyl ester"; /evidence="ECO:0000250|UniProtKB:P62745"
SPAC2F3.09; │ MOD_RES 377; /note="N6-(pyridoxal phosphate)lysine"; /evidence="ECO:0000250|UniProtKB:P18079"
SPAC31G5.15; │ MOD_RES 912; /note="Pyruvic acid (Ser); by autocatalysis"; /evidence="ECO:0000255|HAMAP-Rule:MF_03209"
SPBC428.02c; │ MOD_RES 256; /note="N6-(pyridoxal phosphate)lysine"; /evidence="ECO:0000250"
SPAC10F6.09c; │ MOD_RES 105; /note="N6-acetyllysine"; /evidence="ECO:0000250"
SPAC23C4.08; │ MOD_RES 202; /note="Cysteine methyl ester"; /evidence="ECO:0000250|UniProtKB:P62745"
SPAC10F6.09c; │ MOD_RES 105; /note="N6-acetyllysine"; /evidence="ECO:0000250"
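Lines in the layout of this sample could be parsed and de-duplicated as sketched below (the sample itself contains verbatim repeats). The regex is written for exactly this layout, which is an assumption about the full file. The evidence is split into the ECO code and its source; ECO:0000269 is the code UniProt uses for experimental evidence, which helps answer the "experiments or inferred?" question.

```python
import re

# Layout of the sample lines: <gene>; │ MOD_RES <pos>; /note="..."; /evidence="..."
MOD_RES_RE = re.compile(
    r'(?P<gene>[^;\s]+);\s*[│|]\s*MOD_RES\s+(?P<pos>[0-9]+);\s*'
    r'/note="(?P<note>[^"]*)";\s*/evidence="(?P<evidence>[^"]*)"'
)

EXPERIMENTAL_ECO = "ECO:0000269"  # experimental evidence used in manual assertion


def parse_mod_res(lines):
    """Parse sample MOD_RES lines into unique (gene, pos, note, eco, source) tuples."""
    records, seen = [], set()
    for line in lines:
        m = MOD_RES_RE.fullmatch(line.strip())
        if m is None:
            continue  # skip anything that doesn't fit the sample layout
        eco, _, source = m["evidence"].partition("|")
        record = (m["gene"], int(m["pos"]), m["note"], eco, source or None)
        if record not in seen:  # drop verbatim repeats
            seen.add(record)
            records.append(record)
    return records


sample = [
    'SPAC144.13c; │ MOD_RES 62; /note="Phosphoserine"; /evidence="ECO:0000269|PubMed:10921878"',
    'SPAC10F6.09c; │ MOD_RES 105; /note="N6-acetyllysine"; /evidence="ECO:0000250"',
    'SPAC10F6.09c; │ MOD_RES 105; /note="N6-acetyllysine"; /evidence="ECO:0000250"',
]
for gene, pos, note, eco, source in parse_mod_res(sample):
    print(gene, pos, note, "EXP" if eco == EXPERIMENTAL_ECO else eco, source)
```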
If we can do a mapping for the terms, we can add these (we will decide once we have the numbers).
We could import only the ones for which our sequence matches UniProt.
We could also add an additional check to make sure the residues are sensible (this would be a useful QC check anyway).
(i.e. phosphoserine only on serine)
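Both checks could be combined roughly as follows. `EXPECTED_RESIDUE` is a hypothetical, hand-filled table covering a few notes from the sample; a real version would need an entry (or a PSI-MOD-derived origin residue) for every mapped modification. Positions are 1-based, matching MOD_RES.

```python
# Hypothetical table: which amino acid each UniProt note can sit on.
EXPECTED_RESIDUE = {
    "Phosphoserine": "S",
    "Phosphothreonine": "T",
    "N6-acetyllysine": "K",
    "N6-(pyridoxal phosphate)lysine": "K",
    "Cysteine methyl ester": "C",
}


def check_residue(sequence, position, note):
    """Return None if the residue at the 1-based position fits the note, else a message."""
    if not 1 <= position <= len(sequence):
        return f"position {position} is outside the sequence (length {len(sequence)})"
    expected = EXPECTED_RESIDUE.get(note)
    if expected is None:
        return f"no expected residue recorded for {note!r}"
    actual = sequence[position - 1]
    if actual != expected:
        return f"{note} at {position} expects {expected} but the sequence has {actual}"
    return None


def should_import(pombase_seq, uniprot_seq, position, note):
    """Import only if the sequences agree and the residue passes the QC check."""
    return pombase_seq == uniprot_seq and check_residue(pombase_seq, position, note) is None


print(check_residue("MSKT", 3, "Phosphoserine"))
```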