Bioinformatique glutamate deshydrogenase EC 1414 acide amine amino acid metabolism dehydrogenase Enseignement et recherche Biochimie - Universite Angers Emmanuel Jaspard biochimej

Search and analysis of amino acid sequences of the three isoforms of glutamate dehydrogenase (EC 1.4.1.[2/3/4])

A. Search for all amino acid sequences of GDH

1. Go to the NCBI.
2. Enter: "glutamate dehydrogenase OR GDH".
3. In the menu, select "Protein", then "Search".
4. Choose the "Advanced" option. This allows the use of keywords and Boolean search operators ("AND" / "OR" / "NOT").

The number of hits is indicated on the right of the screen. To see the results ("Summary"), click on this number in your browser.

1. On the "Protein Advanced Search Builder" page, use the fields of the "Builder". In the "History", select the desired search by clicking on its number (#1, #2, ...).

2. In a new field of the menu, paste the following terms (this list has not been updated):

decarboxylase NOT topoisomerase NOT monooxygenase NOT transaminase NOT kinase NOT oxidase NOT thioredoxin NOT glycerate NOT glyoxylate NOT glucose NOT glutamine NOT glutamyl NOT glucarate NOT glycerol NOT proline NOT valine NOT semialdehyde NOT aldehyde NOT glyceraldehyde NOT dihydropyrimidine NOT formyltetrahydrofolate NOT fatty NOT isocitrate NOT saccharopine NOT methylmalonate NOT coenzyme NOT glutathione NOT quinone NOT ammonium NOT histidinol NOT carboxylate NOT alcohol NOT purine NOT sulfide NOT putative NOT similar NOT hypothetical NOT probable NOT related NOT similarity NOT homolog NOT homologue NOT synthetic NOT unknown NOT mutant NOT unnamed NOT imported NOT validated NOT partial NOT peptide NOT chain NOT line NOT tentative NOT supported NOT patent NOT expressed NOT transcript NOT precursor NOT collection NOT regulator NOT anion NOT yweB NOT ypcA NOT NAGSA NOT P5C NOT GSA NOT RIKEN

3. Click on the Boolean "NOT". All keywords and Boolean operators are now written in the main field (top of the page). Click on "Search".

Questions:

What is the goal of this selection?
What are the effects of the Boolean operators "AND", "OR" and "NOT"?

4. In the "Builder" field, choose "Sequence length"; type "400:1700"; select the Boolean "AND"; click on "Search".

Question: what is the goal of this selection?
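The query assembled in steps 1 to 4 can also be written as one search string. The sketch below is only illustrative: the helper name `build_query` is hypothetical, and `[SLEN]` is assumed to be the Entrez field tag that corresponds to the "Sequence length" option of the Builder.

```python
def build_query(any_of, none_of, length_range=None):
    """Combine keywords into one Boolean search string.

    any_of       -> terms joined with OR (at least one must match)
    none_of      -> terms excluded with NOT
    length_range -> optional (min, max) chain length, joined with AND
    """
    # Quote multi-word terms so that they are searched as phrases.
    quoted = [f'"{t}"' if " " in t else t for t in any_of]
    query = "(" + " OR ".join(quoted) + ")"
    for term in none_of:
        query += f" NOT {term}"
    if length_range is not None:
        low, high = length_range
        # Assumption: [SLEN] is the sequence-length field tag.
        query = f"({query}) AND {low}:{high}[SLEN]"
    return query


example = build_query(
    ["glutamate dehydrogenase", "GDH"],
    ["decarboxylase", "topoisomerase", "putative"],  # first terms of the list
    length_range=(400, 1700),
)
print(example)
```

The explicit parentheses make the grouping visible, which helps when reasoning about what each Boolean operator keeps or discards.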

B. Removing redundant sequences

This part is the most tedious and time-consuming, since the redundant sequences must be removed for each organism. This can be done using Multalin.

Questions:

What type of file could be used to find the name of the organism?
Why are there multiple files for the same protein from the same organism?
To what kind of information are the various accession numbers in those files linked?

1. In the "Add Term(s)" field, choose the "Organism" option.
2. Type the chosen name, for example "Agaricus bisporus".
3. Click on the Boolean "AND".
4. Click on "Preview".
5. Click on the number of hits returned. The "Summary" files are returned.
6. In the "Display" field, choose the "Fasta" option. FASTA is one of the data formats used by sequence alignment algorithms.
7. Click on "Display". The files are returned in FASTA format.
8. In the "Send to" field, choose "Text": a new HTML page is returned. Copy the data.
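Once the FASTA text has been copied, strictly identical entries can be spotted automatically before any alignment. The following is a minimal sketch, assuming that "redundant" means an exactly identical sequence; near-identical entries still have to be compared by alignment (for example with Multalin). The accessions and sequences in the example are invented.

```python
from collections import defaultdict


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def group_identical(records):
    """Map each distinct sequence to the headers that share it."""
    groups = defaultdict(list)
    for header, seq in records:
        groups[seq.upper()].append(header)
    return groups


# Hypothetical records: accessions and sequences are made up.
fasta_text = """>ACC0001 glutamate dehydrogenase [Agaricus bisporus]
MSNLPVEPEF
EQAYKELAST
>ACC0002 GDH [Agaricus bisporus]
MSNLPVEPEFEQAYKELAST
>ACC0003 glutamate dehydrogenase [Agaricus bisporus]
MSNLPHEPEFEQAYKELAST
"""

records = parse_fasta(fasta_text)
for seq, headers in group_identical(records).items():
    if len(headers) > 1:
        print("Redundant entries:", [h.split()[0] for h in headers])
```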

Go back to the NCBI search window.

1. Copy the following accession numbers of the redundant files to be removed. Paste them in the main field (top of the page of the NCBI search window):

NOT CAA58312 NOT NP_692731 NOT CAD58715 NOT T49883 NOT CAB87933 NOT AAB01222 NOT S71217 NOT AAA82615 NOT A25275 NOT AAA34642 NOT CAA67475 NOT NP_111279 NOT NP_111278 NOT NP_460265 NOT AAO68835 NOT NP_456213 NOT CAD02055 NOT JN0854 NOT AAL81726 NOT AAA83390 NOT D75176 NOT CAB49491 NOT AAL64915 NOT AAL63869 NOT AAK42230 NOT AAK42099 NOT AAK42126 NOT AAK41684 NOT CAA40341 NOT AAO77080 NOT AAO77077 NOT BAB42058 NOT BAB94705 NOT NP_645657 NOT NP_371482 NOT BAB57120 NOT AAO04251 NOT AE3467 NOT AAL52904 NOT NP_539149 NOT CAD21426 NOT 1919235A NOT CAD63684 NOT NP_761460 NOT NP_761459 NOT AAO35861 NOT AAA62756 NOT AAK78713 NOT AAL94684 NOT AAN80621 NOT AAG56747 NOT CAA25495 NOT AAA87979 NOT AAN00206 NOT AAK99984 NOT AAK75409 NOT AAN24452 NOT AAM87403 NOT AAM24566 NOT AAM24435 NOT CAA51376 NOT BAB75954 NOT S77064 NOT CAA54601 NOT AAG19574 NOT AAG18779 NOT CAA45327 NOT BAB07661 NOT BAB06437 NOT BAB05341 NOT BAB05820 NOT AAC63990 NOT AAB40142 NOT BAA08445 NOT CAB94836 NOT CAA69600 NOT CAA69601 NOT CAA34252 NOT AAA29155 NOT AAB20267 NOT AAA25611 NOT AAN36776 NOT CAA73390 NOT AAK77969 NOT CAA46994 NOT AAA52525 NOT AAM73240 NOT AAB20267 NOT S06938 NOT CAA34434

(This list is indicative.)

2. Click on "Search":

In the "Display" field, choose the "Fasta" option.
Click on "Display". The files are returned in FASTA format.
In the "Send to" field, choose "Text" and copy the data.
Save the file.
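The exclusion clause pasted into the main field is simply one "NOT" per accession number. As a small sketch (the function name `exclude_accessions` is hypothetical), it can be generated from a list, which also removes accidental repeats:

```python
def exclude_accessions(query, accessions):
    """Append one NOT clause per accession, skipping repeated entries."""
    seen = []
    for acc in accessions:
        if acc not in seen:
            seen.append(acc)
    return query + "".join(f" NOT {acc}" for acc in seen)


# A few accessions taken from the list above; AAB20267 is repeated on
# purpose because it also appears twice in that list.
redundant = ["CAA58312", "NP_692731", "AAB20267", "CAD58715", "AAB20267"]
result = exclude_accessions("(glutamate dehydrogenase OR GDH)", redundant)
print(result)
```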

C. Selection of the GDH sequences according to the EC number, membership of Viridiplantae and the length of the polypeptide chain

1. Subset A: EC 1.4.1.2 isoform from Viridiplantae, with an amino acid length range of [411 : 470]

Type of GDH	Add Term(s) option	Text to type	Boolean	Hits ("Search")
EC 1.4.1.2 isoform	EC/RN Number	1.4.1.2	AND	20 (*)
Viridiplantae	Organism	viridiplantae	AND	9
Length range [411 : 470]	Sequence length	411:470	AND	9
See the taxonomy for Viridiplantae.

Click on the last number (9), then on "Display" with the "Fasta" option.
In the "Send to" field, choose "Text" and copy the data.
Using a text editor, save the file, choosing the "Courier" font and size.

Get the files in FASTA format (compressed) : Subset A (Tgz)

2. Subset B: EC 1.4.1.2 isoform from organisms other than Viridiplantae (NOT Viridiplantae), with an amino acid length range of [411 : 470]

Go back to the "Preview" window (*): 20 hits (EC 1.4.1.2 isoform).

Type of GDH	Add Term(s) option	Text to type	Boolean	Hits ("Search")
NOT Viridiplantae	Organism	viridiplantae	NOT	11 (*)
Length range [411 : 470]	Sequence length	411:470	AND	8

Proceed as before for saving.

Get the files in FASTA format (compressed) : Subset B (Tgz)

3. Subset C: EC 1.4.1.2 isoform from organisms other than Viridiplantae (NOT Viridiplantae), with an amino acid length range of [1607 : 1651]

Go back to the "Preview" window (*): 11 hits (NOT Viridiplantae).

Type of GDH	Add Term(s) option	Text to type	Boolean	Hits ("Search")
Length range [1607 : 1651]	Sequence length	1607:1651	AND	3

Proceed as before for saving.

Get the files in FASTA format (compressed) : Subset C (Tgz)

... and so on for all other GDH: EC 1.4.1.3, EC 1.4.1.4 and those not classified by EC number.
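The three selections above differ only in the Boolean applied to the organism and in the length range, so they can be generated from one template. This is a sketch under stated assumptions: the field tags `[ECNO]`, `[ORGN]` and `[SLEN]` are assumed to correspond to the "EC/RN Number", "Organism" and "Sequence length" options, the function `subset_query` is hypothetical, and the URL targets the NCBI E-utilities `esearch` endpoint without actually sending a request. Hit counts obtained today are likely to differ from those in the tables.

```python
from urllib.parse import urlencode

# Assumed Entrez field tags: [ECNO] = EC/RN Number, [ORGN] = Organism,
# [SLEN] = Sequence length.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def subset_query(ec_number, viridiplantae, length_range):
    """Build the Boolean query for one subset (EC number, taxon, length)."""
    low, high = length_range
    taxon_op = "AND" if viridiplantae else "NOT"
    return (
        f"{ec_number}[ECNO] {taxon_op} viridiplantae[ORGN] "
        f"AND {low}:{high}[SLEN]"
    )


subsets = {
    "A": subset_query("1.4.1.2", True, (411, 470)),
    "B": subset_query("1.4.1.2", False, (411, 470)),
    "C": subset_query("1.4.1.2", False, (1607, 1651)),
}

urls = {}
for name, term in subsets.items():
    # The URL is only constructed here; no network request is made.
    urls[name] = ESEARCH + "?" + urlencode({"db": "protein", "term": term})
    print(name, term)
    print("  ", urls[name])
```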

D. Summary of the 116 full sequences of GDH from 83 organisms classified in 15 subsets

The table below indicates the number of GDH polypeptide sequences in each subset (letter in parentheses). L1 to L4 are the chain-length ranges defined in section E.

See the classification by organism (Word document - compressed files) => Organisms (Tgz)

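EC number	Viridiplantae L1	Viridiplantae L2	Viridiplantae L3	Viridiplantae L4	Not Viridiplantae L1	Not Viridiplantae L2	Not Viridiplantae L3	Not Viridiplantae L4	Total
1.4.1.2	9 (A)	-----	-----	-----	7 (B)	-----	-----	1 (C)	17
1.4.1.3	1 (D)	-----	-----	-----	6 (E)	7 (F)	-----	-----	14
1.4.1.4	-----	2 (Ref)	-----	-----	15 (G)	-----	-----	-----	17
Not classified	6 (H)	-----	-----	-----	13 (I1) + 18 (I2) + 14 (I3)	5 (J)	5 (K)	7 (L)	68
Total	16	2	-----	-----	73	12	5	8	116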
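The totals of the table can be checked with a few lines of arithmetic. The sketch below only re-enters the counts given above; the dictionary layout is an arbitrary choice made for the example.

```python
# Counts copied from the table above, keyed by EC number, then subset label.
counts = {
    "1.4.1.2": {"A": 9, "B": 7, "C": 1},
    "1.4.1.3": {"D": 1, "E": 6, "F": 7},
    "1.4.1.4": {"Ref": 2, "G": 15},
    "not classified": {"H": 6, "I1": 13, "I2": 18, "I3": 14,
                       "J": 5, "K": 5, "L": 7},
}

row_totals = {ec: sum(subsets.values()) for ec, subsets in counts.items()}
grand_total = sum(row_totals.values())
n_subsets = sum(len(subsets) for subsets in counts.values())

for ec, total in row_totals.items():
    print(f"{ec:15s} {total:3d}")
print(f"{'total':15s} {grand_total:3d} sequences in {n_subsets} subsets")
```

The result agrees with the section title: 116 sequences in 15 subsets, provided that Ref is counted as a subset.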

E. Analysis of the full sequences of GDH classified in 15 subsets

Get the GDH sequences from each subset in FASTA format (ZIP compressed) from the table below.

The length ranges of the polypeptide chains are: L1 = [411 : 470] - L2 = [503 : 558] - L3 = [1029 : 1106] - L4 = [1607 : 1651]

EC number	Viridiplantae L1	Viridiplantae L2	Viridiplantae L3	Viridiplantae L4	Not Viridiplantae L1	Not Viridiplantae L2	Not Viridiplantae L3	Not Viridiplantae L4
1.4.1.2	Subset A	-----	-----	-----	Subset B	-----	-----	Subset C
1.4.1.3	Subset D	-----	-----	-----	Subset E	Subset F	-----	-----
1.4.1.4	-----	REF	-----	-----	Subset G	-----	-----	-----
Not classified	Subset H	-----	-----	-----	Subsets I1, I2, I3	Subset J	Subset K	Subset L
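Assigning a sequence to one of the four length classes is a simple range test. A minimal sketch, using the ranges given above (the function name `length_class` is hypothetical):

```python
# Length classes as defined above (inclusive bounds, in amino acids).
LENGTH_CLASSES = {
    "L1": (411, 470),
    "L2": (503, 558),
    "L3": (1029, 1106),
    "L4": (1607, 1651),
}


def length_class(sequence):
    """Return 'L1'..'L4' for a sequence, or None if it fits no class."""
    n = len(sequence)
    for name, (low, high) in LENGTH_CLASSES.items():
        if low <= n <= high:
            return name
    return None


# Placeholder sequences of given lengths (not real GDH sequences).
for n in (411, 480, 1650):
    print(n, length_class("A" * n))
```

A sequence whose length falls between two ranges (480 residues in the example) is left unassigned rather than forced into the nearest class.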

1. Open the file with a text editor. Copy only the data beginning with a ">".

2. Go to Clustal Omega. Paste the data into the window.

3. Select the appropriate matrix and parameters (example: matrix = Gonnet / Gapopen = 1 / Gapext = 1 / other parameters = default values).

4. Run the software. The results are returned. There are links to various types of files: ".aln" for the alignment / ".dnd" for the dendrogram.

Remark: "Jalview" is a Java multiple alignment editor that offers many functions [examples: calculating a consensus / adding or removing sequences / pairwise alignment of selected sequences / displaying an alignment coloured according to different physico-chemical properties / editing sequences (font, size ...)]

5. Click on the ".aln" file to see the alignment. Save the file. The ".aln" extension allows it to be used with various other programs.

F. Obtaining the full consensus sequence for each subset

1. Go to Clustal Omega. Load the data (".aln" file) or paste it.
2. Select the matrix and parameters and run the software.
3. Processing and editing of the consensus sequence: after this step, an HTML page is returned with the consensus sequence. Copy this sequence directly from the HTML page.
4. Convert the consensus sequence to FASTA format using EMBOSS Seqret.

... and so on for all other subsets.
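The idea of a consensus sequence can be illustrated column by column. The sketch below is not the procedure of the tool used above, whose rules are not specified here; it assumes a 60% identity threshold and the letter "X" for less conserved positions, as in the figure legend of section H, and it drops columns that are mostly gaps. The toy alignment is invented.

```python
from collections import Counter


def consensus(aligned, threshold=0.6, unknown="X", gap="-"):
    """Column-wise consensus of equal-length aligned sequences.

    A residue is kept when its frequency in the column reaches
    `threshold`; otherwise `unknown` is written. Columns where gaps
    reach the threshold are dropped from the consensus.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have the same length")
    out = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        if count / len(column) < threshold:
            out.append(unknown)
        elif residue != gap:
            out.append(residue)
    return "".join(out)


# Toy alignment (hypothetical sequences, not taken from any subset).
alignment = [
    "MGAGNVA-K",
    "MGAGNVA-R",
    "MGSGNVAQE",
    "-GAGNIA-D",
]
print(consensus(alignment))
```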

G. Alignment of the 15 full consensus sequences

1. Open the text file containing the 15 full consensus sequences in FASTA format obtained as described above. Or untar it from the file : FullConsSeq.tar
2. Copy all data starting at the first ">".
3. Go to Clustal Omega.
4. Paste the data in the window.
5. Select the appropriate matrix and parameters and run the software.

The alignment of the consensus sequences shows that the GDH subunit consists of two or three regions:

the N-terminal extension
a common pattern to all consensus sequences corresponding to the central domain
and, for large GDH (subsets C, K and L), the C-terminal extension.
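One crude way to delimit these regions from an alignment is to take the block between the first and the last gap-free column as the shared central part. This is only a sketch of that idea, under the assumption that gap-free columns approximate the common pattern; real domain boundaries would need the full alignment and structural data. The two aligned sequences are invented.

```python
def core_bounds(aligned, gap="-"):
    """Return (start, end) column indices of the block shared by all
    sequences: first and last columns where no sequence has a gap."""
    full = [i for i, col in enumerate(zip(*aligned)) if gap not in col]
    if not full:
        raise ValueError("no gap-free column found")
    return full[0], full[-1]


def split_regions(aligned_seq, start, end, gap="-"):
    """Split one aligned sequence into N-terminal, central and
    C-terminal parts, removing gap characters."""
    n_term = aligned_seq[:start].replace(gap, "")
    central = aligned_seq[start:end + 1].replace(gap, "")
    c_term = aligned_seq[end + 1:].replace(gap, "")
    return n_term, central, c_term


# Toy alignment (hypothetical): one short and one long sequence.
alignment = [
    "--MKGAGNVAT-----",
    "MSLKGAGNIATWEEDK",
]
start, end = core_bounds(alignment)
for seq in alignment:
    print(split_regions(seq, start, end))
```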

H. Analysis of the central domain of GDH : the dinucleotide-binding motif

A β-α-β fold is found in the NAD(P)H-binding subdomain (β₇-α₈-β₈). In Ref, this Rossmann fold begins with the motif G³¹³AGNVA³¹⁸. However, the alignment indicates that the actual motifs could be more complex.

This greater complexity of the NAD(P)H-binding signature makes it possible to discriminate between the three isoforms more precisely.
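Locating a motif in a sequence and reporting its residue numbers is a plain substring search. In the sketch below only the two motif strings come from this page; the flanking residues of the test fragment are invented, so the positions printed are those of the fragment, not of Ref.

```python
def find_motif(sequence, motif):
    """Return 1-based (start, end) positions of every occurrence of
    `motif` in `sequence`, including overlapping ones."""
    hits = []
    pos = sequence.find(motif)
    while pos != -1:
        hits.append((pos + 1, pos + len(motif)))
        pos = sequence.find(motif, pos + 1)
    return hits


# Hypothetical fragment: only the two motifs come from the text; the
# flanking residues are invented for the example.
fragment = "MKTGVLTGKGDEELLGAGNVAQR"
for motif in ("GAGNVA", "GVLTGKG"):
    print(motif, find_motif(fragment, motif))
```

A more complex signature, as suggested by the alignment, could be expressed as a regular expression instead of a fixed string.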

This figure was generated using the software ESPript.

The secondary structures indicated above the alignment were generated using bovine GDH3 complexed with NADPH and Glu (PDB # 1HWZ) as the template.
The amino acid positions indicated above the alignment are those of Ref (blue sequence).
Plain red vertical boxes: amino acids identical in all consensus subsequences.
Open red vertical boxes: amino acids whose similarity across all consensus subsequences was greater than 60%.
The letter "X" denotes an amino acid whose identity level was less than 60% after the first alignment of full consensus sequences.
The NAD(P)H-binding motif G³¹³AGNVA³¹⁸ (Ref) is indicated by red circles at the bottom of the frame.

I. Search for a second NAD(P)H-binding site

Aldehyde dehydrogenase from Vibrio harveyi is one of the most NADP-specific aldehyde dehydrogenases. The alignment of GDH from Ref with this aldehyde DH shows that:

there are three putative key residues for the binding of NADP(H) in Ref: Lys²⁰², Ser²⁰⁵ (triangles) and Arg²⁴⁸ (asterisk)
the NAD(P)H-binding motif G²²⁹SVGGG²³⁴ of aldehyde DH is aligned with the motif G²⁶⁶VLTGKG²⁷² of Ref (open circles)

Therefore, the latter is likely a second nucleotide-binding motif specific to GDH4.

This figure was generated using the software ESPript.

K. Modelling of the dinucleotide-binding motifs and key residues of GDH4 with NADPH (NDP⁵⁶²) and Glu

A theoretical 3D structure of GDH4 from Ref was generated with the homology-modelling program ESyPred3D, using the structure of bovine GDH3 (PDB # 1HWZ) as the template.

The modelling and drawing of a putative structure of GDH4 were performed with the protein structure homology-modelling program DeepView (Swiss-PdbViewer v. 3.7).

Some interactions (plain lines) between the motif G³¹³AGNVA³¹⁸ or key residues and the coenzyme are indicated : NDP⁵⁶²AO3 - Gly³¹³CA; NDP⁵⁶²AO1 - Asn³¹⁶ND2; NDP⁵⁶²AO1 - Val³¹⁷N; NDP⁵⁶²AO2 - Gly²⁴⁴N; NDP⁵⁶²NC4 - Thr²⁸⁵OG1

The distances between the protonated carbon atom of the nicotinamide moiety (NDP⁵⁶²NC4) and the motif G³¹³AGNVA³¹⁸ are too long for direct interactions. However, this motif is stabilized by an internal H-bond Gly³¹⁵O - Ala³¹⁸N (dotted line).

Two distances (Glu⁵⁵⁷OE2 - Lys¹⁶⁶NZ and Glu⁵⁵⁷O - Lys¹⁹⁰NZ) are compatible with H-bond interactions between the enzyme and Glu.

The position of the motif G²⁶⁶VLTGKG²⁷² is shown with the potential H-bond Lys¹⁶⁶NZ - Thr²⁶⁹OG1.
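Whether two atoms of a model are close enough for a hydrogen bond can be estimated from their coordinates. The sketch below is a distance-only test under stated assumptions: the 3.5 angstrom cutoff is a commonly used approximation chosen for this example, bond angles are ignored, and the coordinates are invented rather than taken from the GDH4 model or from 1HWZ.

```python
import math


def distance(a, b):
    """Euclidean distance between two (x, y, z) points, in the same
    unit as the coordinates (angstroms for PDB files)."""
    return math.dist(a, b)


def may_hbond(donor, acceptor, cutoff=3.5):
    """Crude distance-only test for a possible hydrogen bond.

    Assumption: a donor-acceptor distance up to `cutoff` angstroms is
    treated as compatible with an H-bond; angles are ignored.
    """
    return distance(donor, acceptor) <= cutoff


# Hypothetical coordinates (not taken from the model or from 1HWZ).
lys_nz = (12.0, 8.5, 3.0)
thr_og1 = (14.0, 9.5, 5.0)
print(round(distance(lys_nz, thr_og1), 2), may_hbond(lys_nz, thr_og1))
```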