item 45822864 (no title)
whitten|3 months ago
Does the SMILE (or Simplified Molecular Input Line Entry System) code have an EBNF definition? https://en.wikipedia.org/wiki/Simplified_Molecular_Input_Lin... claims there is a context-free grammar.
dalke|3 months ago
That's "SMILES".
Yes. Here is the yacc grammar for the SMILES parser in the RDKit: https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/Smi...
There's also one from OpenSMILES at http://opensmiles.org/opensmiles.html#_grammar . It has a shift/reduce conflict (as I recall) that I was not competent enough to fix.
I prefer to parse almost completely in the lexer, with a small amount of lexer state to handle balanced parens, bracket atoms, and matching ring closures. See https://hg.sr.ht/~dalke/opensmiles-ragel and more specifically https://hg.sr.ht/~dalke/opensmiles-ragel/browse/opensmiles.r... .
dalke|3 months ago
Oh, I should have pointed out my Python lexer-driven parser at https://hg.sr.ht/~dalke/smiview/browse/smiview.py
The lexer: https://hg.sr.ht/~dalke/smiview/browse/smiview.py?rev=tip#L3...
The lexer state transitions: https://hg.sr.ht/~dalke/smiview/browse/smiview.py?rev=tip#L3...
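To make the lexer-driven idea concrete, here is a minimal sketch of how such a checker might look. It is not the code from smiview or opensmiles-ragel. The token patterns, the `ALLOWED` transition table, and the name `is_well_formed` are all assumptions made for illustration, and it covers only a simplified subset of SMILES: bracket atoms are taken as opaque tokens, and ring-closure bond types are not compared.

```python
import re

# Token patterns for a simplified SMILES subset (illustrative, not full OpenSMILES).
TOKEN = re.compile("|".join([
    r"(?P<atom>\[[^\[\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|\*)",
    r"(?P<bond>[-=#$:/\\])",
    r"(?P<dot>\.)",
    r"(?P<open>\()",
    r"(?P<close>\))",
    r"(?P<ring>%\d\d|\d)",
]))

# Lexer state = kind of the previous token; value = kinds allowed next.
# Assumption: ring closures must come before branches on an atom.
ALLOWED = {
    "start": {"atom"},
    "atom": {"atom", "bond", "dot", "open", "close", "ring"},
    "bond": {"atom", "ring"},
    "dot": {"atom"},
    "open": {"atom", "bond", "dot"},
    "close": {"atom", "bond", "dot", "open", "close"},
    "ring": {"atom", "bond", "dot", "open", "close", "ring"},
}
END_OK = {"atom", "close", "ring"}


def is_well_formed(smiles: str) -> bool:
    """Token-level check using only the lexer state, a paren depth, and open rings."""
    state = "start"
    depth = 0           # balanced-parenthesis counter
    open_rings = set()  # ring-closure numbers still waiting for a partner
    pos = 0
    while pos < len(smiles):
        m = TOKEN.match(smiles, pos)
        if m is None or m.lastgroup not in ALLOWED[state]:
            return False
        kind = m.lastgroup
        if kind == "open":
            depth += 1
        elif kind == "close":
            depth -= 1
            if depth < 0:
                return False
        elif kind == "ring":
            number = int(m.group().lstrip("%"))
            # First sighting opens the ring, second sighting closes it.
            if number in open_rings:
                open_rings.remove(number)
            else:
                open_rings.add(number)
        state = kind
        pos = m.end()
    return state in END_OK and depth == 0 and not open_rings
```

With this table, `is_well_formed("CC(=O)O")` and `is_well_formed("c1ccccc1")` pass, while `"C(C"` (unbalanced paren) and `"C1CC"` (unclosed ring) are rejected. All of the structure lives in the transition table plus two small pieces of state, so no separate grammar is needed.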
dekhn|3 months ago
I wrote a very simple SMILES parser using pyparsing: https://github.com/dakoner/smilesparser/tree/master
I wouldn't say it's intended for production work, but it has been useful in situations where I didn't want to pull in rdkit.
dalke|3 months ago
I see you include the dot disconnect "." as part of the Bond definition.
You also define Chain as:
  Chain <<= pp.Group(pp.Optional(Bond) + pp.Or([Atom, RingClosure]))
I believe this means your grammar allows the invalid SMILES C=.N
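One way to rule out inputs like C=.N is to treat "." as a separator between components rather than as a kind of bond. The sketch below shows that idea with the standard `re` module instead of pyparsing. It is not dekhn's grammar and makes no claim about what that grammar actually accepts. The names `ATOM`, `BOND`, `RING`, `CHAIN`, and `accepts` are hypothetical, and the pattern handles unbranched chains only, with no check that ring-closure digits pair up.

```python
import re

ATOM = r"(?:\[[^\[\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|\*)"
BOND = r"[-=#$:/\\]"      # deliberately excludes "."
RING = r"(?:%\d\d|\d)"

# One connected, unbranched chain: an atom, then zero or more
# (optional bond, then an atom or a ring-closure number).
CHAIN = re.compile(rf"{ATOM}(?:{BOND}?(?:{ATOM}|{RING}))*")


def accepts(smiles: str) -> bool:
    """Split on the dot disconnect first, then require each part to be a chain."""
    return all(CHAIN.fullmatch(part) for part in smiles.split("."))
```

Under these assumptions `accepts("C.N")` and `accepts("C=N")` are true, while `accepts("C=.N")` is false: the "=" ends its component with no atom or ring closure after it, so a bond can never sit directly next to a dot.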
fred_tandemai|3 months ago
[deleted]