top | item 19164985


mlucy | 7 years ago

I think if you have a small-to-medium-sized dataset of images or text, deep feature extraction would be the first thing I'd try.
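To make the idea concrete, here is a minimal, self-contained sketch in numpy. The fixed random projection is only a stand-in for a real pretrained network (in practice you'd take the penultimate-layer activations of, e.g., a pretrained ResNet or BERT), and the nearest-centroid classifier is just about the cheapest model you can train on top of frozen features; all the data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pretrained network: a fixed projection plus a
# nonlinearity. In practice this would be the penultimate layer of a
# real pretrained model, with its weights left untouched.
W = rng.standard_normal((64, 16))

def extract_features(x):
    # "Deep" features: frozen weights, ReLU activation
    return np.maximum(x @ W, 0.0)

# Tiny labeled dataset: two classes drawn from shifted distributions
X0 = rng.standard_normal((20, 64)) - 0.5
X1 = rng.standard_normal((20, 64)) + 0.5
F0, F1 = extract_features(X0), extract_features(X1)

# Cheap classifier on top of the frozen features: nearest class centroid
c0, c1 = F0.mean(axis=0), F1.mean(axis=0)

def predict(x):
    f = extract_features(x)
    d0 = np.linalg.norm(f - c0, axis=1)
    d1 = np.linalg.norm(f - c1, axis=1)
    return (d1 < d0).astype(int)

train_acc = (np.concatenate([predict(X0), predict(X1)])
             == np.array([0] * 20 + [1] * 20)).mean()
print(train_acc)
```

The appeal is that only the centroid step touches your labels, so a few dozen labeled examples can be enough when the frozen features are good.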

I'm not sure what the most interesting problems with that property are. Maybe building specialized classifiers for individual people based on their personal labeling? I've always wanted, e.g., a Twitter filter that excludes specifically the tweets I don't want to read from my stream.

fouc | 7 years ago

One problem that intrigues me is Chinese-to-English machine translation, specifically for a subset of Chinese martial arts novels (especially given that there are plenty of human-translated versions to work with).

So Google/Bing/etc. have their own pre-trained models for translation.

How would I access those in order to develop my own refinement with the domain-specific dataset I've put together?

mlucy | 7 years ago

I don't think you could get access to the actual models that are being used to run e.g. Google Translate, but if you just want a big pretrained model as a starting point, their research departments release things pretty frequently.

For example, https://github.com/google-research/bert (the multilingual model) might be a pretty good starting point for a translator. It will probably still be a lot of work to get it hooked up to a decoder and trained, though.
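As a rough sketch of what "hooking it up to a decoder" could look like, the Hugging Face `transformers` library can warm-start both halves of a seq2seq model from a BERT checkpoint. The model name and the specific example sentences are assumptions for illustration; the cross-attention weights are freshly initialized, so the whole thing still needs fine-tuning on your parallel corpus.

```python
# Sketch: warm-start a seq2seq translator from multilingual BERT.
# Assumes the Hugging Face `transformers` library (with PyTorch installed);
# the first run downloads the pretrained weights.
from transformers import AutoTokenizer, EncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

# Encoder and decoder are both initialized from the BERT checkpoint; the
# decoder's cross-attention weights are new and must be learned.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-multilingual-cased", "bert-base-multilingual-cased"
)

# Required config for this BERT2BERT setup
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# One hypothetical (source, target) pair from a parallel corpus:
src = tokenizer("你好，世界", return_tensors="pt")
tgt = tokenizer("Hello, world", return_tensors="pt")
outputs = model(input_ids=src.input_ids,
                attention_mask=src.attention_mask,
                labels=tgt.input_ids)
print(float(outputs.loss))  # cross-entropy to minimize during fine-tuning
```

From there it's a standard fine-tuning loop over the domain-specific sentence pairs, which is where most of the work mentioned above actually goes.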

There's probably a better pretrained model out there specifically for translation, but I'm not sure where you'd find it.