yuanchuan | 8 years ago | on: Ask HN: Why TensorFlow instead of Theano for deep learning?
yuanchuan's comments
yuanchuan | 9 years ago | on: Amazon RDS now supports PostgreSQL 9.6.1
What is more exciting is that you can leverage Redshift's MPP architecture with this method.
yuanchuan | 9 years ago | on: Airflow: a workflow management platform – Airbnb Engineering
yuanchuan | 10 years ago | on: Ask HN: How to handle 50GB of transaction data each day? (200GB during peak)
If your data are event data (e.g. user activity, clicks, etc.), they are non-volatile data that should be preserved as-is, since you will want to enrich them later for analysis.
You can store these flat files in S3, use EMR (Hive, Spark) to process them, and store the results in Redshift. If your files are character-delimited, you can easily create a table definition with Hive/Spark and query them as if they were in an RDBMS. You can process your files in EMR using spot instances, and it can cost less than a dollar per hour.
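To give a rough feel for the "define a table over delimited files and query it" idea, here is a minimal local sketch. It is not Hive or Spark: it swaps in Python's built-in `csv` and `sqlite3` modules as a stand-in, and the sample records, the `events` table, and its column names are all made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical pipe-delimited event records, standing in for a flat file in S3.
RAW_EVENTS = """user_id|event_type|event_time
u1|click|2016-01-01T10:00:00
u2|view|2016-01-01T10:00:05
u1|click|2016-01-01T10:01:00
"""


def load_events(raw_text, delimiter="|"):
    """Parse delimited text into an in-memory SQLite table named events."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (user_id TEXT, event_type TEXT, event_time TEXT)"
    )
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=delimiter)
    rows = [(r["user_id"], r["event_type"], r["event_time"]) for r in reader]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return conn


def clicks_per_user(conn):
    """Return (user_id, click_count) pairs, ordered by user_id."""
    cur = conn.execute(
        "SELECT user_id, COUNT(*) FROM events "
        "WHERE event_type = 'click' GROUP BY user_id ORDER BY user_id"
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = load_events(RAW_EVENTS)
    print(clicks_per_user(conn))
```

The raw records stay untouched and the schema is applied only when you read them, which is roughly what a Hive or Spark table definition over files in S3 gives you at a much larger scale.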
yuanchuan | 10 years ago | on: Crystal, iOS ad blocker, to accept money to let ads through
yuanchuan | 10 years ago | on: Launching a product in just 3652 days
Great advice and now I need to get things started again.
yuanchuan | 11 years ago | on: Command-line tools can be faster than your Hadoop cluster
Cloud solutions are totally out due to the nature of the data. Not everything can be done in the cloud.
If you have such a huge amount of data, the total time it takes to transfer it there and compute on it is not competitive with an on-premise solution, unless all your data already live in the cloud.
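A back-of-the-envelope sketch of that transfer-plus-compute comparison, assuming decimal units (1 GB = 8,000 megabits) and a link that sustains its full rated speed. The function names and any figures you plug in are hypothetical, not measurements.

```python
def transfer_hours(data_gb, link_mbps):
    """Hours needed to move data_gb gigabytes over a link of link_mbps megabits/s."""
    megabits = data_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    return megabits / link_mbps / 3600


def total_hours(data_gb, link_mbps, compute_hours):
    """Total turnaround: upload time plus compute time once the data has arrived."""
    return transfer_hours(data_gb, link_mbps) + compute_hours


if __name__ == "__main__":
    # Hypothetical inputs: 450 GB over a 1 Gbps link, then 2 hours of compute.
    print(total_hours(450, 1000, 2.0))
```

If the on-premise job finishes in less time than `total_hours` returns for the cloud path, the transfer alone has already lost you the race.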
yuanchuan | 11 years ago | on: Command-line tools can be faster than your Hadoop cluster
I always offer this analogy to people who misunderstand Hadoop: would you crack an egg with a stone or a spoon?
Hadoop and RDBMS only have a thin overlapping region in the Venn diagram that describes their capabilities and use cases.
Ultimately, it is cost vs efficiency. Hadoop can solve all data problems. Likewise for RDBMS. This is an engineering tradeoff that people have to make.
As such, you won't need to reimplement your model or convert it to another format in order to use it.