top | item 21559568

Machine Learning Can Help Unlock the World of Ancient Japan

105 points| hughzhang | 6 years ago |thegradient.pub

8 comments

order

czr|6 years ago

for those interested in the different ml methods attempted for the challenge, many of the top kagglers in this contest posted rundowns of their solution [0].

for those interested in kuronet [1] and the competition alex gave a presentation here (en, [2]) as did tarin (ja, [3]).

for those interested in classical japanese literature, you can see the tools they've built here [4]. tarin is also very active on twitter [5].

[0] https://www.kaggle.com/c/kuzushiji-recognition/discussion

[1] https://arxiv.org/pdf/1910.09433.pdf

[2] https://youtu.be/e2-mVBvygm8?t=1428

[3] https://youtu.be/e2-mVBvygm8?t=537

[4] https://mp.ex.nii.ac.jp/kuronet/

[5] https://twitter.com/tkasasagi

gumby|6 years ago

> Chirashigaki was a method of writing popular in premodern Japanese due to the aesthetic appeal of the text. ... When humans read these documents, they decided where to start reading based on the size of characters and the darkness of the ink.

Extraordinary.

garmaine|6 years ago

Sounds more extraordinary than it is. It basically means “you read it the obvious way.”

9nGQluzmnq3M|6 years ago

Describing kuzushiji ("broken characters") as a "writing system" is a bit much, I'd liken it to a cursive script. Not that this diminishes the importance of ML here, since anybody who's tried to read historical English documents will know that these can also be pretty incomprehensible to the untrained eye:

https://www.nationalarchives.gov.uk/palaeography/doc5_popup/...

But as ever, Japanese ramps up the difficulty because there are thousands of characters instead of 26 letters to deal with.

sova|6 years ago

It would be really cool if they released these generated images en masse because that could help a whole generation of linguists catch up on ancient Japanese literature swiftly. These symbols are hard to decipher, but with enough Machine Generated examples it could advance the field considerably!