Amazon2csv: Amazon products scraper to CSV (no API token required)

yoaviram|7 years ago

Why not use the API? Disclaimer: I'm the author of python-amazon-simple-product-api [1]

[1] https://github.com/yoavaviram/python-amazon-simple-product-a...

k__|7 years ago

Sometimes this isn't possible.

I wrote an app that is basically a new UI for the Amazon products. It runs entirely on the client. The Amazon API simply didn't work in that setup.

AznHisoka|7 years ago

Are you referring to the Product Advertising API?

Doesn’t that require you to maintain a quota of affiliate sales to keep using it? I can’t find where they state this requirement, but I remember they were very sneaky about disclosing it. If you don’t have any affiliate sales after X months, your API key will stop working.

raitucarp|7 years ago

Man, looks great. I also built something similar in Node.js. I implemented everything the documentation describes (a complete implementation). ICYMI:

https://github.com/Ribhnux/piranhax

wdr1|7 years ago

The API comes with a TOS that severely restricts what you can do with the data.

tducret|7 years ago

[deleted]

amingilani|7 years ago

Scraping Amazon is fun and all, but when you start overdoing it they rate-limit your IP and show you my worst nightmare: the Dogs of Amazon (a 500 page with pictures)

Why do I know this? Because I'm the CTO at Nazdeeq.com where we let users buy Amazon products from countries where they don't ship easily, like Pakistan.

Edit: totally open to partnerships in more countries
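On the rate-limiting point, here is a minimal sketch of backing off when the server starts refusing requests. It is only an illustration: `fetch` is a hypothetical callable that returns `(status_code, body)`, and treating 429 and 5xx responses as a "slow down" signal is an assumption, not documented Amazon behaviour.

```python
import time
from typing import Callable, Tuple


def fetch_with_backoff(
    fetch: Callable[[str], Tuple[int, str]],
    url: str,
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url) until it returns a status that is not throttled.

    `fetch` is a hypothetical callable returning (status_code, body).
    Treating 429 and 5xx as throttling is an assumption for this sketch.
    """
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if status != 429 and status < 500:
            return body
        # Double the wait after each failed attempt: 1s, 2s, 4s, ...
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"still throttled after {max_attempts} attempts: {url}")
```

Passing `sleep` in as a parameter keeps the sketch testable without real waiting. The delays and attempt count are placeholders, not known safe values.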

jeanlucas|7 years ago

I'm from Brazil and what you said made me curious. I'm not sure why, but Amazon didn't catch on here. How did you solve problems like logistics and interest from the public?

yasoob|7 years ago

Hi Amin, your platform seems nice. Just wanted to give you a heads-up that your website is being classified as ["phishing" by Avast](https://i.imgur.com/SmuuRfD.png). I think if you replace "Amazon" in the url with something else it should work fine. Best of luck!

jploh|7 years ago

In the Philippines there's something quite similar called Galleon. They were recently acquired, but I think they might be open to partnering. They've expanded to Thailand, if I'm not mistaken.

dewey|7 years ago

Are you using the API or web scraping? We never really had problems with IP banning if the traffic looks like a real user.

Jdam|7 years ago

The issue with those tools is that Amazon changes the product layout very often and heavily conducts A/B tests. I’ve once even heard that computer vision is the most stable way to scrape Amazon. I guess this library will stop working rather soon.

RhodesianHunter|7 years ago

>I’ve once even heard that computer vision is the most stable way to scrape Amazon

At a former employer we scraped Amazon many millions of times per day with very simple old tools that rarely needed updating.

mygo|7 years ago

> I guess this library will stop working rather soon.

Don’t really see that as a dealbreaker. So the library will need maintenance. Normal for libraries to need updates in order to keep up with changes. It works today, and it will work whenever it’s updated. Better than nothing and for many use cases that’s good enough.

hobofan|7 years ago

Search results scraping on Amazon is fairly stable.

What's more difficult is product page scraping, because there you have hundreds of different variations. Some from A/B testing and a lot just being specific things that show up for certain product categories (e.g. video games).

bufferoverflow|7 years ago

I remember trying to build a scraper for Amazon. I quickly discovered that there are many types of item pages, and they change over time too. A/B testing, probably. Just getting the price of the product out of their HTML markup reliably was a nightmare; I had to build a huge tree of if-this-then-maybe-that logic.
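To illustrate that kind of if-this-then-maybe-that logic, here is a minimal sketch of a fallback chain that tries several page layouts in order. The element ids and class names are illustrative assumptions, not a verified list of Amazon's current markup, and regexes are used only to keep the sketch dependency-free; a real scraper would more likely use an HTML parser.

```python
import re
from typing import Optional

# Hypothetical patterns: these ids and class names are assumptions made for
# illustration, one per imagined page layout, tried from first to last.
PRICE_PATTERNS = [
    re.compile(r'id="priceblock_ourprice"[^>]*>\s*([^<]+)<'),
    re.compile(r'id="priceblock_dealprice"[^>]*>\s*([^<]+)<'),
    re.compile(r'class="a-offscreen"[^>]*>\s*([^<]+)<'),
]


def extract_price(html: str) -> Optional[str]:
    """Return the first price-like string found, or None if no layout matches."""
    for pattern in PRICE_PATTERNS:
        match = pattern.search(html)
        if match:
            text = match.group(1).strip()
            # Skip empty or placeholder nodes that contain no digits.
            if re.search(r"\d", text):
                return text
    return None
```

Each new layout becomes one more entry in the list, which is roughly how the tree of special cases grows over time.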

AdamRoberts|7 years ago

The company I work for (zinc.io) has this: https://zincapi.com/

We brand it as an ordering API, but we also offer retrieving the product data (item details/pricing.) We put a LOT of engineering resources into data quality and maintenance, as the API is core to our flagship product, PriceYak. If you have questions or want a token, email adam@zinc.io and mention this post.

ikeboy|7 years ago

If you're using this for anything serious, it's probably better to sign up for the keepa API at about $50/month and they scrape Amazon for you. Worth it to not need to deal with the complexities.

unknown|7 years ago

[deleted]

AdamM12|7 years ago

Nice. From my experience I've found Parsel [1] (used by scrapy) to be an easier to use HTML parsing library than Beautiful Soup. That's just imo.

[1] https://github.com/scrapy/parsel

microdrum|7 years ago

Hm, another no-API option (at least if you are on WordPress) is: https://wpcommission.com

alex_sp|7 years ago

So how many calls is one allowed before getting banned? Any guidelines on how to use this without breaching T&Cs?

staticautomatic|7 years ago

Am I the only one who thinks this is rather weird, or at least unconventional code for a scraper in Python?

dec0dedab0de|7 years ago

I just took a glance, but nothing seemed too off. Do you care to elaborate?

RobLach|7 years ago

If it works...

kull|7 years ago

It is also illegal to scrape AZ, since if you scrape it, it means you don’t own this content and you are just stealing product data added to the site by the products’ proper owners.

zeusk|7 years ago

why aren't Larry and Sergey behind bars, then? Scraping publicly available information is far from illegal.

Also, interestingly, only Alibaba's bots are completely blocked from crawling: https://www.amazon.com/robots.txt
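For anyone who wants to check such rules programmatically, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules below are hypothetical and written only for illustration; they are not a copy of Amazon's real robots.txt, and `ExampleBlockedBot` is a made-up user agent.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only: one bot blocked entirely,
# everyone else blocked from a single path.
SAMPLE_ROBOTS = """\
User-agent: ExampleBlockedBot
Disallow: /

User-agent: *
Disallow: /gp/cart
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

To check a live site instead, `RobotFileParser` also offers `set_url()` and `read()`, which download the file before `can_fetch()` is called.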

smt88|7 years ago

Why would the owner of a product want to keep their product info a secret?

56 comments