tharidlynn | 7 years ago

If you are interested in big data or data engineering, I would recommend learning Python and Scala. These two languages are used everywhere in the data engineering world. For instance, Kafka is written in Scala and Java, and Spark was also developed in Scala.

As for Python, it will make you more productive at many tasks, such as ETL jobs and one-off scripts, and it gives you access to a wide range of libraries for working with data, such as NumPy and pandas.
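To give a feel for the kind of ETL script I mean, here is a minimal sketch using only the standard library. The sample data and the function names (`extract`, `transform`, `load`) are made up for illustration; with pandas the same job would probably be a `read_csv` followed by a `groupby`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data, made up for illustration.
RAW_CSV = """city,temp_c
Bangkok,33.5
Bangkok,31.0
Oslo,4.0
Oslo,
"""


def extract(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows with a missing temperature and average the rest per city."""
    temps = defaultdict(list)
    for row in rows:
        if row["temp_c"]:
            temps[row["city"]].append(float(row["temp_c"]))
    return {city: sum(values) / len(values) for city, values in temps.items()}


def load(averages):
    """Render the result as CSV text; a real script would write a file or table."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["city", "avg_temp_c"])
    for city in sorted(averages):
        writer.writerow([city, averages[city]])
    return out.getvalue()


if __name__ == "__main__":
    print(load(transform(extract(RAW_CSV))))
```

Splitting the script into three small functions is just one way to lay it out, but it makes each step easy to test on its own.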

In my opinion, the best way to learn these languages is to tackle some real problems. You can start by scraping some data, feeding it into a distributed messaging/log system like Kafka or RabbitMQ, transforming it with a tool like Spark, and then storing the results wherever you want, such as HDFS, PostgreSQL, Cassandra, or S3.
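The shape of such a practice pipeline can be sketched in a few lines of Python before you bring in the real tools. Everything here is a stand-in and an assumption on my part: `queue.Queue` plays the role of Kafka/RabbitMQ, a plain function plays the role of a Spark job, an in-memory SQLite database plays the role of PostgreSQL, and the records are hypothetical.

```python
import json
import queue
import sqlite3


def scrape():
    """Stand-in for a scraper: yields hypothetical raw records."""
    yield {"user": "alice", "clicks": "3"}
    yield {"user": "bob", "clicks": "5"}
    yield {"user": "alice", "clicks": "2"}


def produce(records, log):
    """Publish each record as a JSON message (a Kafka producer would go here)."""
    for record in records:
        log.put(json.dumps(record))


def consume_and_transform(log):
    """Drain the log and sum clicks per user (the step Spark would handle)."""
    totals = {}
    while not log.empty():
        msg = json.loads(log.get())
        totals[msg["user"]] = totals.get(msg["user"], 0) + int(msg["clicks"])
    return totals


def store(totals, conn):
    """Write the aggregated totals to a table (SQLite standing in for PostgreSQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clicks (user TEXT PRIMARY KEY, total INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO clicks VALUES (?, ?)", totals.items())
    conn.commit()


def run_pipeline():
    """Run scrape -> log -> transform -> store and return the stored rows."""
    log = queue.Queue()
    conn = sqlite3.connect(":memory:")
    produce(scrape(), log)
    store(consume_and_transform(log), conn)
    return conn.execute("SELECT user, total FROM clicks ORDER BY user").fetchall()


if __name__ == "__main__":
    print(run_pipeline())
```

Once this runs end to end, you can swap one stage at a time for the real thing, which is a good excuse to learn each tool separately.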
