tcwc's comments

tcwc | 9 years ago | on: Always Free Usage Limits

Usually a seller doesn't add VAT to B2B invoices into the EU; the purchaser accounts for VAT under the 'reverse charge' system. B2C is different: the seller would need to charge the local VAT rate in each member state. I suspect they simply haven't gotten around to supporting it yet.

tcwc | 12 years ago | on: Fast request routing using regular expressions

It depends on your regex engine. The author here is concatenating all of the paths into a single pattern. An automata-based engine would ideally compile away the disjunction and offer performance linear in the length of the input path.
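
A minimal sketch of the single-pattern approach in Python, in case it helps. The route names and patterns are made up for illustration and are not the article's. Note that Python's `re` is a backtracking engine, so this shows the concatenation technique only, not the linear-time behaviour an automata-based engine could give.

```python
import re

# Hypothetical route table, invented for this example.
ROUTES = [
    ("user", r"/user/(\d+)"),
    ("article", r"/article/([a-z0-9-]+)"),
    ("home", r"/"),
]

# Wrap every route in a named group and join them with "|" into one pattern.
COMBINED = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in ROUTES))

# Number of capture groups inside each individual route pattern.
ARG_COUNTS = {name: re.compile(pattern).groups for name, pattern in ROUTES}


def dispatch(path):
    """Return (route_name, captured_args), or None if no route matches."""
    match = COMBINED.fullmatch(path)
    if match is None:
        return None
    for name, _ in ROUTES:
        if match.group(name) is not None:
            # The route's own groups directly follow its named wrapper group.
            start = COMBINED.groupindex[name]
            return name, match.groups()[start:start + ARG_COUNTS[name]]
    return None
```

For example, `dispatch("/user/42")` picks the `user` route and returns its captured id, while an unknown path returns `None`.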

tcwc | 12 years ago | on: Rhyming with NLP and Shakespeare

Neat idea! It looks like the NLTK POS tagger is having trouble here, which might limit your recall when it's used as a filter.

Instead, I wonder if it would be better to use the context of each token to mine significant n-grams from the rest of Shakespeare's work, then filter for rhymes with a phonetic hash like Metaphone.

tcwc | 13 years ago | on: Job hunting experience (and few advices)

They don't do this to help you get a job. Recruiters insist on an editable version so they can remove your contact details and insert their letterhead. This makes it harder for their clients to contact you directly and cut out their fee.

tcwc | 13 years ago | on: Three Months to Scale NewsBlur

Feed readers can send If-Modified-Since or If-None-Match as part of the request, so the server only sends back the full feed if there's something new (otherwise it returns a 304 Not Modified).
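
Roughly, the server-side decision could look like the sketch below. This is an illustration only: the `respond` function and its arguments are hypothetical, it assumes `last_modified` is a timezone-aware UTC datetime with whole seconds, and it compares ETags as exact strings rather than implementing the full weak-comparison rules.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(request_headers, etag, last_modified, body):
    """Return (status, headers, body) for a feed request (hypothetical helper)."""
    validators = {
        "ETag": etag,
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")

    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since when both are sent.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        unchanged = "*" in candidates or etag in candidates
    elif if_modified_since is not None:
        try:
            unchanged = last_modified <= parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            # Unparseable date: ignore the header and send the full feed.
            unchanged = False
    else:
        unchanged = False

    if unchanged:
        return 304, validators, b""  # Not Modified, no body
    return 200, validators, body
```

A reader that stores the `ETag` and `Last-Modified` values from its last 200 response and echoes them back on the next poll gets the empty 304 until the feed actually changes.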

tcwc | 13 years ago | on: AWS Case Study: Parse (YC S11)

I can't speak for Parse, but I've come up with something similar in the past. Nginx/HAProxy as a combo is far more flexible than the ELB alone. You might want to use it for rate limiting, better load balancing algorithms, better logging, tweaking headers, handling errors, or controlling buffer sizes, for example.

tcwc | 13 years ago | on: Show HN: TextRazor, a scriptable text mining API

The Stanford parser is great, but it isn't really the same. The Stanford entity recogniser is limited to the standard types of people, places, and companies, but we identify and disambiguate into a far richer ontology from Wikipedia, and can recognise topic abstractions that aren't explicitly mentioned.

Also we found the Stanford tools (and the other open source NLP tools) were difficult to integrate into "production" apps for various reasons. One big one was performance - we aim to run the full parsing and extraction pipeline on an average news story in a few hundred milliseconds, which can be an order of magnitude faster than the others.

tcwc | 13 years ago | on: Show HN: TextRazor, a scriptable text mining API

Hey steeve, we thought there were a few things missing in the competition. We've built a bunch of extra functionality, such as more extensive relation and dependency parsing and contextual entailment generation, and we use all of that to build much more accurate entity and topic recognition, an area where we think the others could be greatly improved.

We also expose all these results to a Prolog interpreter on our backend and allow you to add custom logic to mash up and extend all of our results, as well as provide a much easier integration experience.

Totally agree with you on the pricing front, we're still finalising the details there. We're aiming to be fully transparent with both the technical and business side of things.

tcwc | 13 years ago | on: Why Is Google Fiber the Country’s Only Super-Speed Internet?

In the UK our situation is much better: although BT owns most of the physical cable, "local loop unbundling" forces them to share their lines with competitors.

It seems that in the US the only people you can buy from own the physical infrastructure. Once one company has built its network in an area, it forms a natural monopoly: it's not economical for competitors to come in and rebuild the network when they know they'll only be able to get a certain % of households to switch.
