Search and analysis of amino acid sequences of the three isoforms of glutamate dehydrogenase (EC 1.4.1.[2/3/4])

A. Search for all amino acid sequences of GDH

  • 1. Go to the NCBI.
  • 2. Enter : "glutamate dehydrogenase OR GDH".
  • 3. In the menu, select "Protein", then "Search".
  • 4. Choose the option "Advanced". This allows the use of keywords and of the Boolean operators "AND" / "OR" / "NOT".

The number of hits is indicated on the right of the screen. To see the results ("Summary"), click on this number in your browser.


1. On the page "Protein Advanced Search Builder", use the fields of the "Builder". In the "History", select the desired search by clicking on its number (#1, #2, ...).

2. In a new field of the menu, paste the following terms : (this list has not been updated)

decarboxylase NOT topoisomerase NOT monooxygenase NOT transaminase NOT kinase NOT oxidase NOT thioredoxin NOT glycerate NOT glyoxylate NOT glucose NOT glutamine NOT glutamyl NOT glucarate NOT glycerol NOT proline NOT valine NOT semialdehyde NOT aldehyde NOT glyceraldehyde NOT dihydropyrimidine NOT formyltetrahydrofolate NOT fatty NOT isocitrate NOT saccharopine NOT methylmalonate NOT coenzyme NOT glutathione NOT quinone NOT ammonium NOT histidinol NOT carboxylate NOT alcohol NOT purine NOT sulfide NOT putative NOT similar NOT hypothetical NOT probable NOT related NOT similarity NOT homolog NOT homologue NOT synthetic NOT unknown NOT mutant NOT unnamed NOT imported NOT validated NOT partial NOT peptide NOT chain NOT line NOT tentative NOT supported NOT patent NOT expressed NOT transcript NOT precursor NOT collection NOT regulator NOT anion NOT yweB NOT ypcA NOT NAGSA NOT P5C NOT GSA NOT RIKEN

3. Click on boolean "NOT". All keywords and booleans are written in the main field (top of the page). Click on "Search".
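The way the search string is assembled in the main field can be sketched in a few lines of Python. This is only an illustration: the function name `build_query` is hypothetical, quoting the multi-word term is an assumption about how the phrase should be kept together, and only the first three exclusion keywords are used.

```python
def build_query(base_terms, excluded_terms):
    """Join base terms with OR, then append one NOT per excluded keyword."""
    # Assumption: multi-word terms are quoted so they stay together as a phrase.
    quoted = [f'"{t}"' if " " in t else t for t in base_terms]
    query = "(" + " OR ".join(quoted) + ")"
    for term in excluded_terms:
        query += f" NOT {term}"
    return query


# Only the first few exclusion keywords of the list above are used here.
excluded = ["decarboxylase", "topoisomerase", "monooxygenase"]
print(build_query(["glutamate dehydrogenase", "GDH"], excluded))
```

Each "NOT" narrows the result set, whereas the initial "OR" widens it.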

Questions :

  • What is the goal of this selection ?
  • What are the effects of the Boolean operators "AND", "OR" and "NOT" ?

4. In the field "Builder", choose "Sequence length"; type "400:1700"; select the boolean "AND"; click on "Search".

Question : what is the goal of this selection ?
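The same length selection can be reproduced locally on a downloaded FASTA file. The sketch below is not part of the NCBI interface: `read_fasta` and `filter_by_length` are hypothetical helper names, and the toy sequences are invented.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_by_length(records, min_len=400, max_len=1700):
    """Keep records whose sequence length lies in [min_len, max_len]."""
    return [(h, s) for h, s in records if min_len <= len(s) <= max_len]


# Toy data: three invented sequences of 50, 450 and 1800 residues.
toy = ">short\n" + "M" * 50 + "\n>medium\n" + "M" * 450 + "\n>long\n" + "M" * 1800
kept = filter_by_length(read_fasta(toy))
print([header for header, _ in kept])
```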


B. Removing redundant sequences

This part is the most tedious and time-consuming, since the redundant sequences must be removed for each organism. This can be done using Multalin.

Questions :

  • What type of file could be used to find the name of the organism ?
  • Why are there multiple files for the same protein from the same organism ?
  • What kind of information are the various accession numbers in those files linked to ?

  • 1. In the field "Add Term(s)", choose the option "Organism"
  • 2. Type the chosen name : for example "Agaricus bisporus"
  • 3. Click on boolean "AND"
  • 4. Click on "Preview".
  • 5. Click on the number corresponding to the hits returned. The "Summary" files are returned
  • 6. In the field "Display", choose the option "Fasta". This is one of the various data formats used by sequence alignment algorithms
  • 7. Click on "Display". The files in FASTA format are returned
  • 8. In the field "Send to", choose "Text" : a new HTML page is returned. Copy the data
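Exact duplicates among the copied FASTA records can be spotted automatically before any alignment. The sketch below assumes the records are already available as (header, sequence) pairs; the function name `remove_identical` and the toy headers are hypothetical. It only catches strictly identical sequences, so near-identical entries still have to be inspected by alignment.

```python
def remove_identical(records):
    """Keep the first record of each distinct sequence.

    Returns (kept, dropped), where dropped lists
    (duplicate_header, header_of_the_record_kept).
    """
    first_seen = {}
    kept, dropped = [], []
    for header, seq in records:
        key = seq.upper()
        if key in first_seen:
            dropped.append((header, first_seen[key]))
        else:
            first_seen[key] = header
            kept.append((header, seq))
    return kept, dropped


# Toy records with invented identifiers and sequences.
records = [
    ("entry1 GDH [Agaricus bisporus]", "MSNLPV"),
    ("entry2 GDH [Agaricus bisporus]", "msnlpv"),
    ("entry3 GDH [Agaricus bisporus]", "MSNLPA"),
]
kept, dropped = remove_identical(records)
print(len(kept), "kept;", len(dropped), "dropped")
```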

Go back to the NCBI search window.

1. Copy the following accession numbers of redundant files to be removed. Paste them in the main field (top of the page of the NCBI search window) :

NOT CAA58312 NOT NP_692731 NOT CAD58715 NOT T49883 NOT CAB87933 NOT AAB01222 NOT S71217 NOT AAA82615 NOT A25275 NOT AAA34642 NOT CAA67475 NOT NP_111279 NOT NP_111278 NOT NP_460265 NOT AAO68835 NOT NP_456213 NOT CAD02055 NOT JN0854 NOT AAL81726 NOT AAA83390 NOT D75176 NOT CAB49491 NOT AAL64915 NOT AAL63869 NOT AAK42230 NOT AAK42099 NOT AAK42126 NOT AAK41684 NOT CAA40341 NOT AAO77080 NOT AAO77077 NOT BAB42058 NOT BAB94705 NOT NP_645657 NOT NP_371482 NOT BAB57120 NOT AAO04251 NOT AE3467 NOT AAL52904 NOT NP_539149 NOT CAD21426 NOT 1919235A NOT CAD63684 NOT NP_761460 NOT NP_761459 NOT AAO35861 NOT AAA62756 NOT AAK78713 NOT AAL94684 NOT AAN80621 NOT AAG56747 NOT CAA25495 NOT AAA87979 NOT AAN00206 NOT AAK99984 NOT AAK75409 NOT AAN24452 NOT AAM87403 NOT AAM24566 NOT AAM24435 NOT CAA51376 NOT BAB75954 NOT S77064 NOT CAA54601 NOT AAG19574 NOT AAG18779 NOT CAA45327 NOT BAB07661 NOT BAB06437 NOT BAB05341 NOT BAB05820 NOT AAC63990 NOT AAB40142 NOT BAA08445 NOT CAB94836 NOT CAA69600 NOT CAA69601 NOT CAA34252 NOT AAA29155 NOT AAB20267 NOT AAA25611 NOT AAN36776 NOT CAA73390 NOT AAK77969 NOT CAA46994 NOT AAA52525 NOT AAM73240 NOT AAB20267 NOT S06938 NOT CAA34434

(This list is indicative.)
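The same exclusion can be applied to a FASTA file that has already been saved. This is a sketch under one stated assumption: each header starts with the accession number, optionally followed by a version suffix such as ".1". Older headers of the form "gi|...|" are not handled. The names `accession_of` and `drop_accessions` and the second toy record are hypothetical.

```python
def accession_of(header):
    """Return the leading accession of a FASTA header, without version suffix."""
    parts = header.split()
    if not parts:
        return ""
    return parts[0].split(".")[0]


def drop_accessions(records, excluded):
    """Remove (header, sequence) records whose accession is in `excluded`."""
    excluded = set(excluded)
    return [(h, s) for h, s in records if accession_of(h) not in excluded]


# Two accessions taken from the list above; sequences are invented.
excluded = ["CAA58312", "NP_692731"]
records = [
    ("CAA58312.1 glutamate dehydrogenase", "MSNLPV"),
    ("TOY00001.1 glutamate dehydrogenase", "MKTAYI"),
]
print(drop_accessions(records, excluded))
```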

2. Click on "Search" :

  • Field "Display", choose the option "Fasta"
  • Click on "Display". The files in FASTA format are returned
  • Field "Send to", choose "Text" and copy the data
  • Save the file


C. Selection of the GDH sequences according to EC number, membership of Viridiplantae and polypeptide chain length

1. Subset A : EC 1.4.1.2 isoform from Viridiplantae with amino acid length range [411 : 470]

| Type of GDH | "Add Term(s)" option | Typed value | Boolean | Hits ("Search") |
| EC 1.4.1.2 isoform | EC/RN Number | 1.4.1.2 | AND | 20 (*) |
| Viridiplantae | Organism | viridiplantae | AND | 9 |
| Length range [411 : 470] | Sequence length | 411:470 | AND | 9 |
See the taxonomy for Viridiplantae.
  • Click on the last number (9), then on "Display" with the option "Fasta"
  • Field "Send to", choose "Text" and copy the data
  • Using a text editor, save the file choosing the "Courier" font, size

Get the files in FASTA format (compressed) : Subset A (Tgz)


2. Subset B : EC 1.4.1.2 isoform from organisms other than Viridiplantae, with amino acid length range [411 : 470]

Go back to the "Preview" window (*) : 20 hits (EC 1.4.1.2 isoform)

| Type of GDH | "Add Term(s)" option | Typed value | Boolean | Hits ("Search") |
| NOT Viridiplantae | Organism | viridiplantae | NOT | 11 (*) |
| Length range [411 : 470] | Sequence length | 411:470 | AND | 8 |

Proceed as before for saving.

Get the files in FASTA format (compressed) : Subset B (Tgz)


3. Subset C : EC 1.4.1.2 isoform from organisms other than Viridiplantae, with amino acid length range [1607 : 1651]

Go back to the "Preview" window (*) : 11 hits (NOT Viridiplantae)

| Type of GDH | "Add Term(s)" option | Typed value | Boolean | Hits ("Search") |
| Length range [1607 : 1651] | Sequence length | 1607:1651 | AND | 3 |

Proceed as before for saving.

Get the files in FASTA format (compressed) : Subset C (Tgz)

... and so on for all other GDH : EC 1.4.1.3, EC 1.4.1.4 and not EC-classified.
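The three selections above follow one pattern, which can be written as a single query template. The sketch below assumes that the field tags `[EC/RN Number]`, `[Organism]` and `[Sequence Length]` are accepted by the NCBI Protein search; this should be checked against the current NCBI help before use. The function name `subset_query` is hypothetical, and the resulting string still has to be combined with the earlier GDH search from the "History".

```python
def subset_query(ec, plants, length_range):
    """Build the EC / organism / length part of a subset query."""
    low, high = length_range
    # Assumed field tags; verify them in the NCBI documentation.
    boolean = "AND" if plants else "NOT"
    return (
        f"{ec}[EC/RN Number] {boolean} viridiplantae[Organism] "
        f"AND {low}:{high}[Sequence Length]"
    )


# Subsets A, B and C as defined above.
SUBSETS = {
    "A": ("1.4.1.2", True, (411, 470)),
    "B": ("1.4.1.2", False, (411, 470)),
    "C": ("1.4.1.2", False, (1607, 1651)),
}
for name, args in SUBSETS.items():
    print(name, ":", subset_query(*args))
```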


D. Summary of the 116 full sequences of GDH from 83 organisms classified in 15 subsets

The table below indicates the number of polypeptide sequences of GDH for each subset (letter).

See the classification by organism (Word document - compressed files) => Organisms (Tgz)

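| EC number | Viridiplantae L1 | Viridiplantae L2 | Viridiplantae L3 | Viridiplantae L4 | not Viridiplantae L1 | not Viridiplantae L2 | not Viridiplantae L3 | not Viridiplantae L4 | total |
| 1.4.1.2 | 9 (A) | | | | 7 (B) | | | 1 (C) | 17 |
| 1.4.1.3 | 1 (D) | | | | 6 (E) | 7 (F) | | | 14 |
| 1.4.1.4 | | 2 (Ref) | | | 15 (G) | | | | 17 |
| not classified | 6 (H) | | | | 13 (I1), 18 (I2), 14 (I3) | 5 (J) | 5 (K) | 7 (L) | 68 |
| total | 16 | 2 | | | 73 | 12 | 5 | 8 | 116 |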
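The totals of the table can be checked with a short script. The counts are copied from the table; only the variable names are invented.

```python
# Number of sequences per subset, grouped by EC number (from the table above).
COUNTS = {
    "1.4.1.2": {"A": 9, "B": 7, "C": 1},
    "1.4.1.3": {"D": 1, "E": 6, "F": 7},
    "1.4.1.4": {"Ref": 2, "G": 15},
    "not classified": {"H": 6, "I1": 13, "I2": 18, "I3": 14, "J": 5, "K": 5, "L": 7},
}

row_totals = {ec: sum(subsets.values()) for ec, subsets in COUNTS.items()}
grand_total = sum(row_totals.values())
n_subsets = sum(len(subsets) for subsets in COUNTS.values())

print(row_totals)
print("sequences:", grand_total, "- subsets:", n_subsets)
```

The row totals (17, 14, 17, 68) add up to the 116 sequences and 15 subsets stated in the title of this section.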


E. Analysis of the full sequences of GDH classified in 15 subsets

Get the GDH sequences from each subset in FASTA format (ZIP compressed) from the table below.

The length ranges of the polypeptide chains are : L1 = [411 : 470] - L2 = [503 : 558] - L3 = [1029 : 1106] - L4 = [1607 : 1651]

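| EC number | Viridiplantae L1 | Viridiplantae L2 | Viridiplantae L3 | Viridiplantae L4 | not Viridiplantae L1 | not Viridiplantae L2 | not Viridiplantae L3 | not Viridiplantae L4 |
| 1.4.1.2 | Subset A | ----- | ----- | ----- | Subset B | ----- | ----- | Subset C |
| 1.4.1.3 | Subset D | ----- | ----- | ----- | Subset E | Subset F | ----- | ----- |
| 1.4.1.4 | ----- | REF | ----- | ----- | Subset G | ----- | ----- | ----- |
| not classified | Subset H | ----- | ----- | ----- | Subset I1, Subset I2, Subset I3 | Subset J | Subset K | Subset L |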
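Assigning a sequence to one of the four length classes is a simple range lookup. The sketch below uses the ranges L1 to L4 given above; the function name `length_class` is hypothetical, and lengths that fall outside every range return `None`.

```python
# Length classes as defined above (inclusive bounds, in amino acids).
LENGTH_CLASSES = {
    "L1": (411, 470),
    "L2": (503, 558),
    "L3": (1029, 1106),
    "L4": (1607, 1651),
}


def length_class(n_residues):
    """Return the name of the class containing n_residues, or None."""
    for name, (low, high) in LENGTH_CLASSES.items():
        if low <= n_residues <= high:
            return name
    return None


for n in (450, 520, 1050, 1620, 480):
    print(n, "->", length_class(n))
```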


1. Open the file with a text editor. Copy only the data beginning with ">".

2. Go to Clustal Omega. Paste the data into the window.

3. Select the appropriate matrix and parameters (Examples : matrix = Gonnet / Gapopen = 1 / Gapext = 1 / Other parameters = default value).

4. Run the software. The results are returned, with links to various types of files : ".aln" for the alignment / ".dnd" for the dendrogram.

Remark : "Jalview" is a Java multiple alignment editor that offers many functions [Examples : calculating a consensus / adding or removing sequences / pairwise alignment of selected sequences / visualisation of an alignment coloured according to different physico-chemical properties / editing sequences (font, size ...)]

5. Click on the file ".aln" to see the alignment. Save the file. The ".aln" extension allows it to be used with various other programs.
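A saved ".aln" file can also be read programmatically. The sketch below is a minimal reader for the Clustal text layout (a "CLUSTAL" header line, then blocks of "name sequence" lines, each followed by a conservation line that starts with spaces). The function name `parse_clustal` and the toy alignment are invented, and the layout is assumed rather than taken from this page.

```python
def parse_clustal(text):
    """Return {name: aligned_sequence} from Clustal-format alignment text."""
    alignment = {}
    for line in text.splitlines():
        # Skip blank lines and conservation lines (they start with a space).
        if not line.strip() or line[0].isspace():
            continue
        # Skip the header line, which comes before any sequence line.
        if not alignment and line.upper().startswith("CLUSTAL"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, chunk = fields[0], fields[1]
        alignment[name] = alignment.get(name, "") + chunk
    return alignment


# Toy alignment in two blocks (invented sequences).
toy = "\n".join([
    "CLUSTAL O(1.2.4) multiple sequence alignment",
    "",
    "seq1      MKT-AY",
    "seq2      MKTGAY",
    "          *** **",
    "",
    "seq1      IAK",
    "seq2      IGK",
    "          * *",
])
print(parse_clustal(toy))
```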


F. Obtaining the full consensus sequence for each subset

  • 1. Go to Clustal Omega. Load the data (file ".aln") or paste it.
  • 2. Select the matrix and parameters and run the software.
  • 3. Processing and editing of the consensus sequence. After this step, an HTML page is returned with the consensus sequence. Copy this sequence directly from the HTML page.
  • 4. Convert the consensus sequence to FASTA format using EMBOSS Seqret.

... and so on for all other subsets.
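The principle of a consensus sequence can be illustrated with a simple majority rule. This is not the algorithm of the tools named above, only a sketch under stated assumptions: a residue is kept when it occurs in at least 60% of the aligned sequences, "X" is written otherwise, and columns made mostly of gaps are skipped. The function names and the toy alignment are invented.

```python
from collections import Counter


def consensus(aligned, threshold_percent=60, unknown="X"):
    """Majority-rule consensus of equal-length aligned sequences."""
    seqs = list(aligned)
    if not seqs or any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("need at least one sequence, all of equal length")
    result = []
    for column in zip(*seqs):
        # Assumption: a column with more than half gaps yields no residue.
        if 2 * column.count("-") > len(column):
            continue
        residue, count = Counter(c for c in column if c != "-").most_common(1)[0]
        if count * 100 >= threshold_percent * len(column):
            result.append(residue)
        else:
            result.append(unknown)
    return "".join(result)


def to_fasta(name, sequence, width=60):
    """Format one sequence as FASTA text with wrapped lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


toy = ["MKTAY", "MKSAY", "MRTA-", "MKSGY", "MKAAY"]
print(to_fasta("consensus_toy", consensus(toy)), end="")
```

In the toy alignment the third column has no residue reaching 60%, so it is written as "X".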


G. Alignment of the 15 full consensus sequences

  • 1. Open the text file containing the 15 full consensus sequences in FASTA format obtained as described above. Or untar it from the file : FullConsSeq.tar
  • 2. Copy all data starting at the first ">".
  • 3. Go to Clustal Omega.
  • 4. Paste the data in the window.
  • 5. Select the appropriate matrix and parameters and run the software.

The alignment of the consensus sequences shows that the GDH subunit consists of two or three regions :

  • the N-terminal extension
  • a pattern common to all consensus sequences, corresponding to the central domain
  • and, for large GDH (subsets C, K and L), the C-terminal extension.

[Figure : alignment of the 15 full consensus sequences]


H. Analysis of the central domain of GDH : the dinucleotide-binding motif

A β - α - β fold is found in the NAD(P)H-binding subdomain (β7 - α8 - β8). This Rossmann fold begins with the motif G313AGNVA318 in the case of Ref. However, the alignment indicates that the actual motifs could be more complex.

This higher complexity of the signature for the NAD(P)H-binding motif makes it possible to discriminate the three isoforms more precisely.

[Figure : alignment of the consensus subsequences of the central domain]

This figure was generated using the software ESPript.

  • Secondary structures indicated above the alignment were generated using as the template the bovine GDH3 complexed with NADPH and Glu (PDB # 1HWZ).
  • Amino acid position indicated above the alignments is that of Ref (blue sequence).
  • Plain red vertical boxes : amino acids identical for all consensus subsequences.
  • Open red vertical boxes : amino acids whose homology between all consensus subsequences was greater than 60%.
  • The letter "X" accounts for an amino acid whose identity level was less than 60% after the first alignment of full consensus sequences.
  • The NAD(P)H-binding motif G313AGNVA318 (Ref) is indicated at the bottom of the frame with red circles.
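Locating such a motif in a sequence is a pattern search. The sketch below uses the regular expression `G.G..[GA]`, i.e. the classical G-x-G-x-x-G/A dinucleotide-binding fingerprint, which is an assumption here and not the more complex signature suggested by the alignment. The function name `find_motif` and the toy sequence are invented; the toy sequence is not the sequence of Ref.

```python
import re


def find_motif(sequence, pattern):
    """Return (start, end, match) tuples with 1-based inclusive positions.

    A lookahead is used so that overlapping matches are also reported.
    """
    hits = []
    for m in re.finditer(f"(?=({pattern}))", sequence):
        found = m.group(1)
        hits.append((m.start() + 1, m.start() + len(found), found))
    return hits


toy_sequence = "MKLVGAGNVAQTS"
print(find_motif(toy_sequence, "G.G..[GA]"))
```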


I. Search for a second NAD(P)H-binding site

Aldehyde dehydrogenase from Vibrio harveyi is one of the most NADP-specific aldehyde dehydrogenases. The alignment of GDH from Ref and aldehyde DH shows that :

  • there are three putative key residues for the binding of NADP(H) in Ref: Lys202, Ser205 (triangles) and Arg248 (asterisk)
  • the NAD(P)H-binding motif G229SVGGG234 of aldehyde DH is aligned with the motif G266VLTGKG272 of Ref (open circles)

Therefore, the latter is likely a second nucleotide-binding motif specific to GDH4.

[Figure : alignment of GDH from Ref with aldehyde dehydrogenase from Vibrio harveyi]

This figure was generated using the software ESPript.


K. Modelling of the dinucleotide-binding motifs and key residues of GDH4 with NADPH (NDP562) and Glu

A theoretical 3D structure of GDH4 from Ref was generated with the homology-modeling program ESyPred3D using as the template the structure of bovine GDH3 (PDB # 1HWZ).

The modelling and the drawing of a putative structure of GDH4 were performed with the protein structure homology-modeling program DeepView (Swiss-PdbViewer v. 3.7).

[Figure : model of GDH4 with NADPH (NDP562) and Glu]

Some interactions (plain lines) between the motif G313AGNVA318 or key residues and the coenzyme are indicated : NDP562AO3 - Gly313CA; NDP562AO1 - Asn316ND2; NDP562AO1 - Val317N; NDP562AO2 - Gly244N; NDP562NC4 - Thr285OG1

The distances between the protonated carbon atom of the nicotinamide moiety (NDP562NC4) and the motif G313AGNVA318 are too long for direct interactions. However, this motif is stabilized by an internal H-bond Gly315O - Ala318N (dotted line).

Two distances (Glu557OE2 - Lys166NZ and Glu557O - Lys190NZ) are compatible with H-bond interactions between the enzyme and Glu.

The position of the motif G266VLTGKG272 is shown with the potential H-bond Lys166NZ - Thr269OG1.
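Whether two atoms are close enough for an H-bond can be tested from their coordinates. The sketch below is only a distance check: the 3.5 Å cutoff is a common rule of thumb and an assumption here, the bond geometry (angles) is ignored, and the coordinates are invented rather than taken from the model.

```python
import math

# Assumed donor-acceptor distance cutoff, in ångströms.
HBOND_CUTOFF = 3.5


def distance(atom_a, atom_b):
    """Euclidean distance between two (x, y, z) coordinates."""
    return math.dist(atom_a, atom_b)


def hbond_compatible(atom_a, atom_b, cutoff=HBOND_CUTOFF):
    """True when the two atoms are within the cutoff distance."""
    return distance(atom_a, atom_b) <= cutoff


# Hypothetical coordinates, for illustration only.
donor = (0.0, 0.0, 0.0)
near_acceptor = (1.0, 2.0, 2.0)
far_acceptor = (3.0, 4.0, 0.0)

print(distance(donor, near_acceptor), hbond_compatible(donor, near_acceptor))
print(distance(donor, far_acceptor), hbond_compatible(donor, far_acceptor))
```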
