
Ask HN: How to read a terabyte flat file

2 points| calgaryo | 16 years ago | reply

For some reason, an application spews its results into a single file. What do you think would be the best way of reading (multiple) from the file?

5 comments

[+] gdp|16 years ago|reply
I don't think I understand the question. Are you actually asking how to read from a terabyte flat file, or are you asking how to process a terabyte of sequential data? Those are two related but distinct questions.
[+] tamersalama|16 years ago|reply
Sorry for not being clear the first time. Proper processing is what I'm after. The reading has to be accompanied by parsing. The file is divided into sections, and each section will have its own parsing / user-actions / processing / output.
[+] nshah|16 years ago|reply
Depending on language restrictions, you may be able to implement read streams that make a pass through the file and invoke the appropriate callbacks when hitting each section...
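A rough sketch of that callback idea in Python, in case it helps. The thread never says how sections are delimited, so the `=== name` header line, the `SECTION_PREFIX` constant, and the function names `stream_sections` and `process_file` are all made up for illustration. The point is only that the file is read lazily, line by line, so memory use does not grow with file size.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical marker: the original post does not say how sections are delimited.
SECTION_PREFIX = "=== "

# A handler receives (section_name, line) for every line inside its section.
Handler = Callable[[str, str], None]


def stream_sections(lines: Iterable[str], handlers: Dict[str, Handler]) -> None:
    """Make one pass over `lines`, sending each line to its section's handler."""
    current: Optional[str] = None
    handler: Optional[Handler] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(SECTION_PREFIX):
            # New section header: switch handlers. Sections without a
            # registered handler are skipped.
            current = line[len(SECTION_PREFIX):].strip()
            handler = handlers.get(current)
        elif handler is not None:
            handler(current, line)


def process_file(path: str, handlers: Dict[str, Handler]) -> None:
    # Iterating a file object reads it incrementally, so memory stays bounded
    # as long as individual lines are not enormous.
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        stream_sections(f, handlers)
```

Usage would be something like `process_file("results.dat", {"errors": handle_error, "totals": handle_total})`, where the file name and both handlers are placeholders. If the sections are not line-oriented, the same shape works with fixed-size `f.read(n)` chunks and a small buffer for records that straddle a chunk boundary.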
[+] wmf|16 years ago|reply
First, get the file into Hadoop. From there the parsing and processing should be easier.