ga6840
|
1 year ago
|
on: Confirmed: Reflection 70B's official API is a wrapper for Sonnet 3.5
Who is Sahil Chaudhary? Why doesn't he announce such a great advancement himself? Why did Matt Shumer announce it first, only because (according to a later claim on X.com) he trusted Sahil? Does that mean Matt was unable to take part in most of the work? Then why announce a breakthrough without mentioning that he was not involved closely enough to verify the result in the first place?
ga6840
|
5 years ago
|
on: Docker's Second Death
I also want to use Docker Swarm in production, but I keep hearing people say it has network bugs, e.g. after some time services can no longer talk to each other. Have you experienced any such issue?
ga6840
|
5 years ago
|
on: China delists all remaining poverty-stricken counties
Mao wanted to surpass the United Kingdom in 15 years and then the US. Given China's condition at that time, if that was not arrogance, it was at the very least an overestimation of his power.
ga6840
|
5 years ago
|
on: China delists all remaining poverty-stricken counties
Not that difficult: enslave them, intimidate them, and threaten their social security, and that will make any species not lazy.
ga6840
|
5 years ago
|
on: China delists all remaining poverty-stricken counties
You should not underestimate the CCP. Remember that they launched a large-scale DDoS attack targeting GitHub? And they blocked Gist but not GitHub.
ga6840
|
5 years ago
|
on: China delists all remaining poverty-stricken counties
If it weren't for the CCP, China could be much better off now. Remember why China was poor back then? It's called the Cultural Revolution: 10 years of it.
ga6840
|
5 years ago
|
on: China delists all remaining poverty-stricken counties
I have lived most of my life in China so far. IMHO, the root of our earlier poverty was arrogance and one-man rule (see Mao's policy, which led to a nationwide famine just because he wanted to surpass the US in a short time), and that left us behind the rest of the world economically. The later so-called "lifting out of poverty" (a tune often used by Chinese spokesmen to rebut others' accusations of human rights violations) is nothing but opening up the nation and letting our hardworking people make products for the world (mostly downstream, application-level products; even nowadays, huge Chinese tech companies cannot live without GitHub, which, btw, is why the government doesn't block GitHub). To me, this is just recovering the years we lost to politics and the Cultural Revolution.
ga6840
|
5 years ago
|
on: Writing a full-text search engine using Bloom filters (2013)
Microsoft had a SIGIR best paper on using Bloom filters for search.
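For readers new to the idea in the linked article, here is a minimal sketch of per-document Bloom filters used as a term-membership index. It is a generic illustration, not the design from the Microsoft paper; the bit-array size, the number of hashes, and deriving hash positions from salted SHA-256 are all assumptions. A filter never misses a term that was added but can report false positives, so a real engine would verify candidate documents afterwards.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, term: str):
        # Derive k bit positions by hashing the term with a per-hash salt (an assumption).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term: str) -> None:
        for pos in self._positions(term):
            self.bits |= 1 << pos

    def might_contain(self, term: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(term))


def build_index(docs: dict[str, str]) -> dict[str, BloomFilter]:
    """Build one Bloom filter per document from its lowercase whitespace-split words."""
    index = {}
    for doc_id, text in docs.items():
        bf = BloomFilter()
        for word in text.lower().split():
            bf.add(word)
        index[doc_id] = bf
    return index


def search(index: dict[str, BloomFilter], query: str) -> list[str]:
    """Return ids of documents whose filters may contain every query word."""
    words = query.lower().split()
    return [doc_id for doc_id, bf in index.items()
            if all(bf.might_contain(w) for w in words)]
```

Querying `search(build_index({"d1": "bloom filters for search"}), "bloom search")` returns a candidate list that always includes `"d1"`; other documents may appear only as rare false positives.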
ga6840
|
9 years ago
|
on: A math-aware search engine, enable the ability to search mathematics online
I appreciate your feedback. Could you elaborate a little and show me what you mean by "a system of equations" and "the whole of the literature"?
ga6840
|
9 years ago
|
on: Offer HN: Free logo design for an open source project
ga6840
|
10 years ago
|
on: MathML is a failed web standard
We can argue all day about whether <math> should be like <img> or like <svg>, but I do not think I am wrong to ask whether we really need to manipulate a math expression. When I said that writing "<math>\frac a b</math>" "makes sense to me", that does not necessarily mean I stand firmly for designing <math> this way. If you think there are cases where we need manipulation AND we really need to sacrifice HTTP length (transmission time) and simplicity to enable it, that is totally fine. I accept your points and will still argue for mine; I do not believe there is an evident truth on this issue (nor on this thread), and that is OK. However, I should point out that I am quite confident I can hand-write "<math>\frac a b</math>" more quickly than people who use whatever advanced rich-text editor they like to write its MathML equivalent. You can still doubt how many people want to write HTML by hand, but shorter HTML is not bad in every respect; many high-volume websites benefit from it. Think of a very popular math Q&A website in the future that has to handle a lot of requests: doing the math rendering computation on the client side is the logical solution, and in that case MathJax makes a lot of sense. I would agree to a solution that defines a short <math> and converts it into lengthy MathML on the client side; that way neither of us has to compromise.
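The "short <math>, expanded into MathML" compromise can be illustrated with a tiny converter. This is a hypothetical sketch (the name `tex_to_mathml` and its symbol table are made up here) that handles only single-letter identifiers, numbers, `^`, braces, `\frac` and two symbols; in a real page this step would run as JavaScript in the browser, so treat it only as an illustration of the conversion itself.

```python
import re
from xml.sax.saxutils import escape

# Tokens: TeX commands, single letters, digit runs, any other non-space character.
TOKEN = re.compile(r"\\[a-zA-Z]+|[a-zA-Z]|[0-9]+|\S")
# Hypothetical, deliberately tiny symbol table.
SYMBOLS = {r"\pm": "\u00b1", r"\cdot": "\u22c5"}


def parse_arg(tokens, i):
    """Parse one argument (braced group or single atom); return (mathml, next_index)."""
    tok = tokens[i]
    if tok == "{":
        nodes, i = parse_seq(tokens, i + 1, closing="}")
        return "<mrow>" + "".join(nodes) + "</mrow>", i + 1
    if tok == r"\frac":
        num, i = parse_arg(tokens, i + 1)
        den, i = parse_arg(tokens, i)
        return f"<mfrac>{num}{den}</mfrac>", i
    if tok in SYMBOLS:
        return f"<mo>{SYMBOLS[tok]}</mo>", i + 1
    if tok.startswith("\\"):
        raise ValueError(f"unsupported command {tok}")
    if tok.isalpha():
        return f"<mi>{tok}</mi>", i + 1
    if tok.isdigit():
        return f"<mn>{tok}</mn>", i + 1
    return f"<mo>{escape(tok)}</mo>", i + 1


def parse_seq(tokens, i, closing=None):
    """Parse atoms until `closing` (or end of input), attaching ^ superscripts."""
    nodes = []
    while i < len(tokens) and tokens[i] != closing:
        if tokens[i] == "^":
            sup, i = parse_arg(tokens, i + 1)
            nodes.append(f"<msup>{nodes.pop()}{sup}</msup>")
        else:
            node, i = parse_arg(tokens, i)
            nodes.append(node)
    if closing is not None and i >= len(tokens):
        raise ValueError("missing closing brace")
    return nodes, i


def tex_to_mathml(tex):
    nodes, _ = parse_seq(TOKEN.findall(tex), 0)
    return "<math>" + "".join(nodes) + "</math>"
```

For example, `tex_to_mathml(r"\frac a b")` yields `<math><mfrac><mi>a</mi><mi>b</mi></mfrac></math>`: the author writes the short form, and only the machine ever sees the long one.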
As for my "works pretty well", please refer to my answer in another thread below. In short, I use subjective words about search engine effectiveness because NTCIR makes it difficult to compare my TeX search engine with "MathML search engines". But I have already shown that my engine is more efficient than Tangent, and an important factor is that Tangent has to use LaTeXML to convert every TeX expression into MathML. Setting NTCIR aside, I am willing to compare my engine (probably after finishing its new version) with some established open-source math engine (e.g. Tangent) on effectiveness and efficiency, using a corpus annotated with both MathML (used by Tangent) and TeX (used by my engine).
ga6840
|
10 years ago
|
on: MathML is a failed web standard
Lastly, I am not just being childish in complaining about NTCIR and refusing to submit a paper. I gave up putting unworthy, duplicated effort into implementing a MathML parser that generates the expression tree I need (this step is the most difficult part, not the XML parsing itself) and instead focused on finding another conference to publish my work. It turned out my paper (a demo) got accepted at ECIR 2016, so I am glad I did not waste too much time on NTCIR; otherwise I would have missed ECIR.
ga6840
|
10 years ago
|
on: MathML is a failed web standard
Thank you for answering my first two questions; now I understand NTCIR's problem.
At first I tried to compare my results (MAP, recall, precision) with NTCIR participants, but it took a lot of effort to get the dataset, after which I found I could not convert MathML back into TeX with confidence. Most importantly, the tree structure my parser generates is fine-tuned for, and heavily dependent on, TeX input, so I cannot just take the MathML tree structure directly; it needs much more effort than importing an existing XML parser. Because of this, I cannot compare my results with mainstream NTCIR researchers. I tried very hard, but sadly I gave up. If NTCIR someday provides TeX data for the competition (even on request), I will consider comparing my results with NTCIR participants (and will be able and willing to do so) in order to "prove" it.
Writing a TeX parser only for math search is not that difficult. I have written one, and it parses most user-created documents on math.stackexchange.com. Although I cannot convince you that I get better results, I can argue that parsing the TeX subset relevant to search is easy (if you only care about math-related TeX); I even open-sourced my search engine's TeX parser. Again, the problem is that it is not easy to grab an XML parser and reuse it in my project. I believe a good math-aware search engine needs a tree structure very different from the one MathML represents. You get a tree by reusing the MWS parser, so WHAT? That tree is not the tree I want, and converting it takes a lot of effort. The easy way for me would be to convert MathML back into TeX (since I already handle TeX), but sadly that turns out to be too complicated to be worth a shot.
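The claim that a search-oriented TeX subset is easy to handle can be illustrated with a small normalizer. This is a hypothetical sketch, not the open-sourced parser mentioned above: it tokenizes TeX and drops presentation-only commands such as `\color` and `\mbox` (both command lists are assumptions), leaving tokens that a real parser would then turn into an expression tree.

```python
import re

# TeX commands, escaped single characters (e.g. "\,"), letters, digit runs, other symbols.
TOKEN = re.compile(r"\\[a-zA-Z]+|\\.|[a-zA-Z]|[0-9]+|\S")

# Hypothetical choice of what is irrelevant to search.
DROP_WITH_ARG = {r"\color", r"\mbox", r"\text"}  # drop the command and its {...} argument
DROP = {r"\left", r"\right", r"\displaystyle", r"\,", r"\;", r"\quad"}  # drop the command only


def skip_group(tokens, i):
    """Skip a {...} group starting at tokens[i]; return the index just after it."""
    if i >= len(tokens) or tokens[i] != "{":
        return i  # no braced argument to skip
    depth = 0
    while i < len(tokens):
        if tokens[i] == "{":
            depth += 1
        elif tokens[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced braces")


def search_tokens(tex):
    """Tokenize TeX and keep only tokens that matter for matching formulas."""
    tokens = TOKEN.findall(tex)
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in DROP_WITH_ARG:
            i = skip_group(tokens, i + 1)
        elif tok in DROP:
            i += 1
        else:
            out.append(tok)
            i += 1
    return out
```

With this, `\left( a + \color{red} b \right)` and `(a+b)` normalize to the same token list, which is the point: the search engine never needs to know about coloring or sizing.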
ga6840
|
10 years ago
|
on: MathML is a failed web standard
1. In the NTCIR (main) dataset, I see many cases where <m:math> does not have an alttext attribute (and thus no TeX). I asked the LaTeXML author, Bruce Miller, about this, and he said LaTeXML always puts the original TeX string in an alttext attribute on the <m:math>. So I assume you are using an outdated LaTeXML version? I really want to plead with NTCIR to keep the original LaTeX annotation in the main dataset, or to provide both MathML and LaTeX versions of the corpus so researchers can choose freely. This would allow LaTeX-only math search engines to compare results with MathML search engines. As you know, it is hard to convert all of them back into LaTeX correctly.
2. I wish the NTCIR corpus were not so difficult to download (I once sent a request for the NTCIR corpus, but no one replied). Please make it publicly accessible, just like MIaS does:
https://mir.fi.muni.cz/mias/
3. My search engine (http://tkhost.github.io/opmes) actually uses a structural method, but I still gave up on MathML and parse TeX directly instead. Why? In TeX I can simply omit irrelevant commands like "\color" and "\mbox" and focus only on a handful of math-related TeX, and the results are great. Although my search engine can only handle "toy formula syntax", it may be better than MathWebSearch (https://zbmath.org/formulae/) and may even beat Tangent (http://saskatoon.cs.rit.edu/tangent/random) on long queries.
With MathML, on the other hand, I see no reason why I should read its lengthy spec and write a MathML parser.
The NTCIR-Math conference (and its unfriendly website) makes me unwilling to submit a single paper.
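Point 1 (formulas without TeX) is easy to measure on a corpus file. Below is a hypothetical sketch using Python's standard `xml.etree.ElementTree`; it assumes the file is well-formed XML with MathML in its standard namespace, which real NTCIR files may not satisfy (they may need an HTML parser instead). It collects the TeX found in `alttext` and counts the `math` elements that lack it.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def split_by_alttext(xml_text):
    """Return (list of TeX strings from alttext, number of <math> without alttext)."""
    root = ET.fromstring(xml_text)
    with_tex, without_tex = [], 0
    # Clark notation "{namespace}local" matches <m:math> regardless of the prefix used.
    for math in root.iter(f"{{{MATHML_NS}}}math"):
        tex = math.get("alttext")
        if tex:
            with_tex.append(tex)
        else:
            without_tex += 1
    return with_tex, without_tex
```

A nonzero second value means those formulas are only available as MathML, which is exactly the case a TeX-only engine cannot index without a MathML-to-TeX converter.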
ga6840
|
10 years ago
|
on: MathML is a failed web standard
What I mean is "really need". In fact, we might also want to highlight a portion of an image or copy a sub-image, but was our HTML <img> tag designed the way I described?
I am the author of a math search engine, OPMES (tkhost.github.io/opmes), and it works pretty well without knowing the DOM structure of math expressions.
Actually, MathML caused a lot of inconvenience during OPMES development, to the point that I chose not to support it.
BTW, if we want a <math> tag that no one will write by hand and that only machines will try to understand, then why wasn't HTML designed as some open binary format in the first place?
ga6840
|
10 years ago
|
on: MathML is a failed web standard
What I really want to say is that whether f(x+1) is a function or a multiplication does not matter much for either browser presentation or math-aware search; moreover, semantics can be extracted algorithmically from context. Considering that few authors want to annotate the semantics of their expressions, and that adding semantics does not really help math-aware search, I argue against the necessity of bringing math semantic notation into the Web.
ga6840
|
10 years ago
|
on: MathML is a failed web standard
Thanks for your response. I totally agree that it is not useful to try to encode semantics into math markup.
ga6840
|
10 years ago
|
on: MathML is a failed web standard
I think we do not need semantic representation in the HTML case at all. For example, f(x+1) can be a multiplication or a function application, but should we write something like \function f (x+1)? I think knowing the layout similarity with the query is enough for a math-aware search engine to identify similar math expressions. Adding too much to a math HTML standard is not helpful, just redundant.
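To make "layout similarity is enough" concrete, here is a hypothetical sketch that compares two expressions only through their layout trees: it collects root-to-leaf label paths and scores the overlap of the two path multisets with a Dice coefficient. The tree encoding (a leaf is a string, an inner node is `(label, children)`, and `"juxt"` marks juxtaposition) and the scoring are assumptions for illustration, not OPMES's actual algorithm. Note that the tree never says whether f(x+1) is a function call or a product.

```python
from collections import Counter


def leaf_paths(node, prefix=()):
    """Yield every root-to-leaf path of labels; a leaf is a str, an inner node is (label, children)."""
    if isinstance(node, str):
        yield prefix + (node,)
        return
    label, children = node
    for child in children:
        yield from leaf_paths(child, prefix + (label,))


def layout_similarity(a, b):
    """Dice coefficient over the multisets of leaf paths (sibling order is ignored)."""
    pa, pb = Counter(leaf_paths(a)), Counter(leaf_paths(b))
    common = sum((pa & pb).values())  # multiset intersection keeps the minimum counts
    total = sum(pa.values()) + sum(pb.values())
    return 2 * common / total if total else 1.0


# f(x+1) and g(x+1) encoded purely by layout: juxtaposition of a symbol and a sum.
fx = ("juxt", ["f", ("+", ["x", "1"])])
gx = ("juxt", ["g", ("+", ["x", "1"])])
```

Here `layout_similarity(fx, gx)` is 2/3: the two expressions share the paths through `x` and `1` and differ only in the leading symbol.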
ga6840
|
10 years ago
|
on: MathML is a failed web standard
Agreed. We do not include a 4*4 image in HTML by writing <img><row><col><pixel r="255" b="0" g="128" a="0"> ......
At the browser level, I think we should treat a math expression as a simple, atomic component. The only benefit of exposing DOM/XML/JSON or any other structural information in the web page is probably that you can manipulate it or extract information from it (e.g. using JavaScript). Do we really need to manipulate a math expression? A simple "<math>\frac a b</math>" makes much more sense to me. I think it is about the trade-off in HTML granularity.
ga6840
|
10 years ago
|
on: MathML is a failed web standard
I am building a project and doing research on math-aware search (my project is hosted on https://github.com/t-k-/the-day-after-tomorrow).
As for search engines for math, it is a pity that MathML has become the standard "input" for mainstream research.
The most famous conference on math search, NTCIR, actually publishes its main dataset/corpus in MathML.
Converting MathML back into LaTeX is possible but error-prone for most moderately complex expressions (I tried it using pandoc, which is written in Haskell).
This forces math-aware search engines to include a MathML parser. And since the most popular digital math documents are still mostly written in LaTeX, a math search engine also needs another tool (e.g. LaTeXML) to convert LaTeX into much more lengthy MathML.
As a researcher in this field, all I see is that MathML brings a lot of overhead to our lives.
I think LaTeX is still the ideal way to "input" math expressions; it is human-friendly and the most commonly used math input.
Meanwhile, a Web standard should focus on "rendering" LaTeX.
I have to point out that I am pretty comfortable with what MathJax provides, but if there has to be a Web standard for math, I wish that some day the standard way to write a math expression in HTML would be something like this: <math> x = \frac{-b \pm \sqrt{b^2 - 4ac}} {2a} </math>
Instead of:
<math display="block">
  <mrow>
    <mi>x</mi>
    <mo>=</mo>
    <mfrac>
      <mrow>
        <mo>−</mo>
        <mi>b</mi>
        <mo>±</mo>
        <msqrt>
          <mrow>
            <msup>
              <mi>b</mi>
              <mn>2</mn>
            </msup>
            <mo>−</mo>
            <mn>4</mn>
            <mi>a</mi>
            <mi>c</mi>
          </mrow>
        </msqrt>
      </mrow>
      <mrow>
        <mn>2</mn>
        <mi>a</mi>
      </mrow>
    </mfrac>
  </mrow>
</math>
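To see why converting MathML back into LaTeX is straightforward only for the simplest cases, here is a hypothetical minimal converter built on Python's standard `xml.etree.ElementTree`. It handles just the presentation elements used in the example above (`math`, `mrow`, `mi`, `mn`, `mo`, `mfrac`, `msqrt`, `msup`) and a tiny operator table (an assumption); anything else, such as `mtable`, `mstyle`, stretchy fences, or multi-letter `mi` like function names, raises an error, and a real converter has to handle all of those.

```python
import xml.etree.ElementTree as ET

# Operators that need a TeX command; any other <mo> text is copied as-is.
MO_TO_TEX = {"\u2212": "-", "\u00b1": r"\pm", "\u00d7": r"\times"}


def to_tex(el):
    """Convert a small subset of presentation MathML into a TeX string."""
    tag = el.tag.split("}")[-1]  # drop a namespace, if any
    kids = [to_tex(child) for child in el]
    text = (el.text or "").strip()
    if tag in ("math", "mrow"):
        # Join with spaces so that commands like \pm never run into a following letter.
        return " ".join(kids)
    if tag in ("mi", "mn"):
        return text
    if tag == "mo":
        return MO_TO_TEX.get(text, text)
    if tag == "mfrac":
        return r"\frac{%s}{%s}" % (kids[0], kids[1])
    if tag == "msqrt":
        return r"\sqrt{%s}" % " ".join(kids)
    if tag == "msup":
        # Brace the base so that a compound base keeps its grouping.
        return "{%s}^{%s}" % (kids[0], kids[1])
    raise ValueError(f"unsupported element <{tag}>")
```

Applied to the quadratic-formula MathML above, `to_tex(ET.fromstring(...))` produces `x = \frac{- b \pm \sqrt{{b}^{2} - 4 a c}}{2 a}`: valid TeX, but already noisier than what a person would type, and this is the easy case.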