Wednesday, January 26, 2011

...EXCEL...

Today I learned how to perform linear regression and quadratic regression using Microsoft Office Excel 2007.

LINEAR REGRESSION

In statistics, linear regression is an approach to modeling the relationship between a scalar variable y and one or more variables denoted X. The data are modeled using linear functions, and the unknown model parameters are estimated from the data. Such models are called linear models.


Linear regression was the first type of regression analysis to be studied rigorously, and to be used extensively in practical applications. This is because models which depend linearly on their unknown parameters are easier to fit than models which are non-linearly related to their parameters and because the statistical properties of the resulting estimators are easier to determine.



Linear regression has many practical uses. Most applications of linear regression fall into one of the following two broad categories:
  • If the goal is prediction, or forecasting, linear regression can be used to fit a predictive model to an observed data set of y and X values. After developing such a model, if an additional value of X is then given without its accompanying value of y, the fitted model can be used to make a prediction of the value of y.
  • Given a variable y and a number of variables X1, ..., Xp that may be related to y, linear regression analysis can be applied to quantify the strength of the relationship between y and the Xj, to assess which Xj may have no relationship with y at all, and to identify which subsets of the Xj contain redundant information about y, so that once one of them is known, the others are no longer informative.
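The prediction use case above can be sketched in a few lines of Python. This is only a minimal illustration of ordinary least squares with one X variable; the function names `fit_line` and `predict` and the data values are made up for the example and are not taken from the Excel worksheet.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x-y cross deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Use the fitted line to predict y for a new x value."""
    return slope * x + intercept


# Made-up illustrative data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print("slope:", slope, "intercept:", intercept)
print("predicted y at x = 5:", predict(slope, intercept, 5.0))
```

With real measurements the points will not lie exactly on a line, and the fitted slope and intercept are then the values that minimize the sum of squared vertical distances to the data.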


Application of linear regression


Trend line
A trend line represents a trend, the long-term movement in time series data after other components have been accounted for. It tells whether a particular data set (say GDP, oil prices or stock prices) has increased or decreased over a period of time. A trend line could simply be drawn by eye through a set of data points, but more properly its position and slope are calculated using statistical techniques like linear regression. Trend lines typically are straight lines, although some variations use higher degree polynomials depending on the degree of curvature desired in the line.
Trend lines are sometimes used in business analytics to show changes in data over time. This has the advantage of being simple. Trend lines are often used to argue that a particular action or event (such as training, or an advertising campaign) caused observed changes at a point in time. This is a simple technique, and does not require a control group, experimental design, or a sophisticated analysis technique. However, it suffers from a lack of scientific validity in cases where other potential changes can affect the data.

QUADRATIC REGRESSION

In mathematics, a quadratic equation is a polynomial equation of the second degree. The general form is
ax^2 + bx + c = 0,
where x represents a variable, and a, b, and c are constants, with a ≠ 0. (If a = 0, the equation becomes a linear equation.) The constants a, b, and c are called, respectively, the quadratic coefficient, the linear coefficient, and the constant term or free term. The term "quadratic" comes from quadratus, which is the Latin word for "square".
In statistics, to build a second order (quadratic) model, linear regression is used, sometimes iteratively, to obtain results. 
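A quadratic model y = a*x^2 + b*x + c is still linear in its unknown coefficients a, b, and c, which is why linear regression can fit it. The sketch below illustrates this by treating x^2, x, and 1 as three input columns and solving the least-squares normal equations directly. The helper names `solve_linear_system` and `fit_quadratic` and the data are assumptions made for this example; this is not necessarily how Excel computes its result.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * p = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix, copied so the inputs are not modified
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c)."""
    # Each data point becomes one row [x^2, x, 1] of the design matrix A
    rows = [[x * x, x, 1.0] for x in xs]
    # Normal equations: (A^T A) p = A^T y
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(3)]
           for i in range(3)]
    aty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(3)]
    a, b, c = solve_linear_system(ata, aty)
    return a, b, c


# Made-up illustrative data lying exactly on y = x^2 - 2x + 3
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x - 2 * x + 3 for x in xs]

a, b, c = fit_quadratic(xs, ys)
print("a:", round(a, 6), "b:", round(b, 6), "c:", round(c, 6))
```

The only difference from the straight-line case is the extra x^2 column; the fitting step itself is unchanged.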

These are the graphs that I made using Microsoft Excel:









Wednesday, January 12, 2011

....SMILES...

SMILES stands for Simplified Molecular Input Line Entry Specification.

Introduction of SMILES
  • SMILES is a line notation (a typographical method using printable characters) for entering and representing molecules and reactions.
  • The primary reason SMILES is more useful than a connection table is that it is a linguistic construct, rather than a computer data structure. 
  • SMILES is a true language, albeit with a simple vocabulary (atom and bond symbols) and only a few grammar rules.
  • SMILES representations of structure can in turn be used as "words" in the vocabulary of other languages designed for storage of chemical information (information about chemicals) and chemical intelligence (information about chemistry). 
 A) Canonicalization
SMILES denotes a molecular structure as a graph with optional chiral indications. This is essentially the two-dimensional picture chemists draw to describe a molecule.
  • "unique SMILES" - A canonicalization algorithm exists to generate one special generic SMILES among all valid possibilities.
  • "isomeric SMILES" - SMILES written with isotopic and chiral specifications.
Examples:
Input SMILES          Unique SMILES
OCC                   CCO
[CH3][CH2][OH]        CCO
OC(=O)C(Br)(Cl)N      NC(Cl)(Br)C(=O)O
ClC(Br)(N)C(=O)O      NC(Cl)(Br)C(=O)O

B) Atoms
Atoms are represented by their atomic symbols. Each non-hydrogen atom is specified independently by its atomic symbol enclosed in square brackets, [ ]. Elements in the "organic subset" (B, C, N, O, P, S, F, Cl, Br, and I) may be written without brackets if the number of attached hydrogens conforms to the lowest normal valence consistent with explicit bonds.
  • Atoms in aromatic rings are specified by lowercase letters.
  • For example, aliphatic carbon is represented by the capital letter C, and aromatic carbon by lowercase c.
The following atomic symbols are valid SMILES notations:

Examples:
C methane CH4
N ammonia NH3

Atoms with valences other than "normal" and elements not in the "organic subset" must be described in brackets [ ].

Examples:
[S] elemental sulfur
[Au] elemental gold

C) Bonds
Single, double, triple, and aromatic bonds are represented by the symbols -, =, #, and :, respectively.

Examples:
CC ethane CH3CH3
C=O formaldehyde CH2O
O=C=O carbon dioxide CO2
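The atom and bond rules can be checked with a small Python sketch that works out how many hydrogens each atom carries implicitly. It is deliberately limited: it only handles unbranched strings without rings, brackets, or aromatic atoms. The function name `implicit_hydrogens` is made up for this example, and the valence table lists the usual normal valences of the organic subset as I understand them from the SMILES rules.

```python
import re

# Normal valences for the organic subset (assumed from the SMILES rules)
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}
BOND_ORDERS = {"-": 1, "=": 2, "#": 3}
# Two-letter symbols must be tried before one-letter symbols
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#]")


def implicit_hydrogens(smiles):
    """Return [(symbol, hydrogen_count), ...] for an unbranched, ring-free,
    non-aromatic SMILES string that uses only organic-subset atoms."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported SMILES for this sketch: " + smiles)
    atoms = []      # atomic symbols in written order
    bond_sums = []  # total explicit bond order at each atom
    pending = 1     # order of the bond to the next atom (single if omitted)
    for tok in tokens:
        if tok in BOND_ORDERS:
            pending = BOND_ORDERS[tok]
            continue
        atoms.append(tok)
        bond_sums.append(0)
        if len(atoms) > 1:
            # Bond the new atom to the atom written just before it
            bond_sums[-1] += pending
            bond_sums[-2] += pending
        pending = 1
    result = []
    for symbol, used in zip(atoms, bond_sums):
        # Lowest normal valence that can hold the explicit bonds
        valence = next((v for v in NORMAL_VALENCES[symbol] if v >= used), used)
        result.append((symbol, valence - used))
    return result


for example in ["C", "N", "CC", "C=O", "O=C=O"]:
    print(example, implicit_hydrogens(example))
```

For C=O the carbon already uses two of its four valences in the double bond, so it gets two hydrogens, which matches the formula CH2O for formaldehyde.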

D) Branches
Branches are specified by enclosing them in parentheses, and can be nested or stacked. In all cases, the implicit connection to a parenthesized expression (a "branch") is to the left.

Example: 
 

CCN(CC)CC Triethylamine
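The rule that a branch connects to the atom on its left maps naturally onto a stack. The Python sketch below is a simplified illustration, not a full SMILES parser: it ignores rings, bracket atoms, and aromatic atoms, and the function name `parse_branched` is made up for this example.

```python
import re

BOND_ORDERS = {"-": 1, "=": 2, "#": 3}
# Two-letter symbols must be tried before one-letter symbols
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#()]")


def parse_branched(smiles):
    """Parse a ring-free, non-aromatic SMILES made of organic-subset atoms.

    Returns (atoms, bonds): atoms is a list of symbols and bonds is a list
    of (index_a, index_b, order) tuples.
    """
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported SMILES for this sketch: " + smiles)
    atoms, bonds = [], []
    previous = None  # index of the atom the next atom attaches to
    stack = []       # atoms to return to when a branch closes
    order = 1        # bond order for the next bond (single if omitted)
    for tok in tokens:
        if tok == "(":
            # A branch starts: remember the atom to its left
            stack.append(previous)
        elif tok == ")":
            if not stack:
                raise ValueError("unbalanced parentheses")
            # The branch ends: continue from the remembered atom
            previous = stack.pop()
        elif tok in BOND_ORDERS:
            order = BOND_ORDERS[tok]
        else:
            atoms.append(tok)
            index = len(atoms) - 1
            if previous is not None:
                bonds.append((previous, index, order))
            previous = index
            order = 1
    if stack:
        raise ValueError("unbalanced parentheses")
    return atoms, bonds


atoms, bonds = parse_branched("CCN(CC)CC")
print(atoms)
print(bonds)
```

In triethylamine the nitrogen is atom 2, and the bond list shows it connected to atoms 1, 3, and 5, that is, to one carbon from each of the three ethyl groups.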



 
These are the examples of SMILES that I made using ACD/ChemSketch:









SMILES is not only comprehensive but also well documented!!!


Wednesday, January 5, 2011

iNtrOductiOn of Protein Data Bank (PDB)

<a href="http://www.rcsb.org/pdb/home/home.do">Visit RCSB</a>

An Information Portal to Biological Macromolecular Structures

The PDB archive contains information about experimentally-determined structures of proteins, nucleic acids, and complex assemblies.

1) Crystal structure of Bacillus subtilis Lon N-terminal domain




Journal: (2010) J.Mol.Biol. 401: 653-670 
PubMed: 20600124    
DOI: 10.1016/j.jmb.2010.06.030    

PubMed Abstract:
Lon ATP-dependent proteases are key components of the protein quality control systems of bacterial cells and eukaryotic organelles. Eubacterial Lon proteases contain an N-terminal domain, an ATPase domain, and a protease domain, all in one polypeptide chain. The N-terminal domain is thought to be involved in substrate recognition, the ATPase domain in substrate unfolding and translocation into the protease chamber, and the protease domain in the hydrolysis of polypeptides into small peptide fragments.  Here, we present crystal structures of truncated versions of Lon protease from Bacillus subtilis (BsLon), which reveal previously unknown architectural features of Lon complexes. Our analytical ultracentrifugation and electron microscopy show different oligomerisation of Lon proteases from two different bacterial species, Aquifex aeolicus and B. subtilis.

Molecular Description

Classification: hydrolase
Structure Weight: 48027.40
Molecule: ATP-dependent protease La 1
Polymer: 1
Type: polypeptide(L)
Chains: A, B


2) Crystal Structure of Dipeptidyl Aminopeptidase IV from Stenotrophomonas maltophilia 

 


Dipeptidyl aminopeptidase IV from Stenotrophomonas maltophilia exhibits activity against a substrate containing a 4-hydroxyproline residue.

Journal: (2008) J.Bacteriol. 190: 7819-7829
PubMed: 18820015  
PubMedCentral: PMC2583625  
DOI: 10.1128/JB.02010-07  
PubMed Abstract:

The crystal structure of dipeptidyl aminopeptidase IV from Stenotrophomonas maltophilia was determined at 2.8-Å resolution by the multiple isomorphous replacement method, using platinum and selenomethionine derivatives. The crystals belong to space group P4(3)2(1)2, with unit cell parameters a = b = 105.9 Å and c = 161.9 Å. Dipeptidyl aminopeptidase IV is a homodimer, and the subunit structure is composed of two domains, namely, N-terminal beta-propeller and C-terminal catalytic domains. At the active site, a hydrophobic pocket to accommodate a proline residue of the substrate is conserved as well as those of mammalian enzymes. Stenotrophomonas dipeptidyl aminopeptidase IV exhibited activity toward a substrate containing a 4-hydroxyproline residue at the second position from the N terminus. One of the residues at the active sites is changed to Asn611. Accordingly, it was considered that Asn611 would be one of the major factors involved in the recognition of substrates containing 4-hydroxyproline.

Molecular Description

Classification: hydrolase
Structure Weight: 82181.10
Molecule: Dipeptidyl peptidase IV
Polymer: 1
Type: polypeptide(L)
Chains: A

3) Human START domain of Acyl-coenzyme A thioesterase 11 (ACOT11)


Journal: to be published

Molecular Description

Classification: lipid transport
Structure Weight: 60191.69
Molecule: Thioesterase, adipose associated, isoform BFIT2
Polymer: 1
Type: polypeptide(L)
Chains: A, B
Fragment: START domain, UNP residues 339-594