
Show HN: PageDash, Your Personal Web Archive

116 points| ernsheong | 8 years ago |pagedash.com | reply

66 comments

[+] vitovito|8 years ago|reply
Please consider producing archives in WARC format, and either donating captures of public pages to the Internet Archive (and other interested archives), or supporting ways for users to download their own archives in that format so they can donate them themselves and use them in systems like Webrecorder.

(Note that a download of just page content and assets isn't enough; WARC stores headers, etc., also.)
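
To illustrate the point about headers, here is a minimal sketch of a single WARC/1.0 `response` record built by hand. The function name `warc_response_record` and the captured response are hypothetical; a real archiver would more likely use an established WARC library and would also write request and metadata records.

```python
import uuid
from datetime import datetime, timezone


def warc_response_record(url: str, http_response: bytes) -> bytes:
    """Wrap a raw HTTP response (status line + headers + body) in one WARC record."""
    warc_headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Target-URI: {url}",
        f"WARC-Date: {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_response)}",
    ]
    header_block = "\r\n".join(warc_headers).encode("utf-8") + b"\r\n\r\n"
    # Record layout: WARC headers, blank line, content block, then two CRLFs.
    return header_block + http_response + b"\r\n\r\n"


# Hypothetical capture: the HTTP status line and headers are stored with the body.
captured = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b"<html><body>Hello</body></html>"
)
record = warc_response_record("http://example.com/", captured)
```

The content block is the full HTTP message, which is why saving only the page content and assets loses information that a WARC capture keeps.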

[+] ernsheong|8 years ago|reply
Thanks for the comment. Admittedly, I bypassed WARC completely in favor of how I knew the web worked, as I felt overwhelmed by its technicalities. If I get a better understanding of WARC, maybe that can be done, but I make no promises.
[+] tomc1985|8 years ago|reply
Wait, you can't download archived pages?

ANOTHER ----id ----ing cloud service trying to replace files and programs with some BS pricing scheme?

Seriously are the "entrepreneurs" of HN even trying? How pathetic that seemingly everything on this site is someone's jobs program?

[+] ernsheong|8 years ago|reply
Founder here, happy to field questions and feedback!

Right now PageDash is quite a simple product, but hopefully with sufficient traction we can continue to implement things like full-text search, tagging support, link sharing, as well as mobile support. Your support is absolutely crucial to making PageDash come alive even more in the future.

This is my first product, thank you for being nice :)

[+] CJKinni|8 years ago|reply
Would it be possible to add auto-archiving to the extension?

On the $9/mo plan, I'd probably still not hit 100GB/mo of uploads.

My main reason for wanting this feature is that I could use the full text search (when it's available) to search every webpage I've visited. I find myself more and more frequently unable to find things I know existed at one point. I've been thinking of building my own solution where I just archive every page I visit on the fly then build a personal search for pages I've previously visited.

[+] gooseus|8 years ago|reply
Dig it, got some questions...

Is the data stored exclusively on your Google Cloud or can I see my archived pages while offline and backup my web archive locally?

Essentially, what guarantees do I have regarding access to my data should your startup, my local infrastructure, or civilization collapse?

[+] teddyh|8 years ago|reply
Why must everything be a cloud service? I use ScrapBook (http://www.xuldev.org/scrapbook/) to save web pages locally.
[+] wongarsu|8 years ago|reply
Because without a cloud service you can't get people to pay $3-$9/month for something like this.
[+] CabSauce|8 years ago|reply
I really like the idea of this and other, similar products/services. I haven't used any of them since they don't seem to be exactly what I want.

What I really want is auto tagging and classification + semantic search. I don't even really want to have to save the page. I want this functionality on my browsing history.

Maybe some increased functionality for saving specific types of pages. If I save a recipe, I want the service to recognize that it's a recipe and put it in my 'cookbook'. With a consistent format, if possible. If I save a blog post, tag the topic, technology and language used.

[+] ernsheong|8 years ago|reply
I really want auto-tagging via ML classification as well; it's one of the things I wanted, other than one-click save, when I started the project. For now that's a nice-to-have that can only be achieved once PageDash matures more. Right now the closest workaround I can offer is to configure a keyboard shortcut for the extension's save action via chrome://extensions > Keyboard shortcuts (bottom) for quick saving.
[+] Accacin|8 years ago|reply
What are the advantages of this over something like pinboard.in?
[+] ernsheong|8 years ago|reply
Unless you are on Pinboard's archiving plan, Pinboard mostly manages just your bookmarks. PageDash doesn't claim to be a bookmark manager, but it really can be one. Bookmark the page, along with the content.
[+] adityar|8 years ago|reply
Signed up - saved my first page - and viewed my dash within 5 mins. Good stuff. Now, all you need is not to go out of business (or open source before you do). Seriously though, good luck on the business side.
[+] ernsheong|8 years ago|reply
Thank you. You're right, hopefully business side holds up. I'll keep it up as long as someone is paying me :)
[+] ernsheong|8 years ago|reply
I should point out that PageDash also tries to handle saving nested pages and iframes, I'm not sure it's something that other archivers try to do.

Also Web Components (custom elements, shadow DOM) support is definitely do-able and something for the pipeline. It's not something even the Internet Archive is capable of right now. Wayback Machine's youtube.com archive is blank.

[+] michaelmior|8 years ago|reply
Looks interesting. Why would I use PageDash over something like Evernote or Pocket?
[+] ernsheong|8 years ago|reply
Good question. PageDash aims to preserve the page in the original format and render it just as you saw it. Right now, Evernote does quite a bad job at rendering; I've used it a lot. Pocket, on the other hand, specializes in stripping out the HTML and leaving just the content in a reader-mode fashion, though I've not tried their premium offering that also archives.

PageDash archives from the front-end, while many archivers tend to archive by sending a link to the backend which then queries the website remotely, so you might not be archiving what you saw exactly, which admittedly in many cases doesn't matter. The upside of this technicality is that you can save content that you see only when you are logged in!

[+] nels|8 years ago|reply
Have you considered saving files (such as fonts and JS libs) loaded through major CDNs centrally just once instead of storing it again each time a page is saved?

Maybe you already have plans for this, but it would be smart to implement a system that checks whether files are already present on your server so you don't waste any of your user's quota and the server's disk space.
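
One way such a check could work is content-addressed storage: hash each asset's bytes and keep one copy per hash. The sketch below is only an illustration under that assumption; `AssetStore`, `save_page`, and the example URLs are hypothetical and do not describe how PageDash stores anything.

```python
import hashlib
from typing import Dict


class AssetStore:
    """Toy content-addressed store: identical bytes are kept only once."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Only store the bytes if this hash has not been seen before.
        self.blobs.setdefault(digest, data)
        return digest


def save_page(store: AssetStore, assets: Dict[str, bytes]) -> Dict[str, str]:
    """Return a per-page manifest mapping asset URL -> content hash."""
    return {url: store.put(data) for url, data in assets.items()}


store = AssetStore()
shared_lib = b"/* pretend this is a CDN-hosted library */"
page_a = save_page(store, {
    "https://cdn.example/lib.js": shared_lib,
    "https://a.example/": b"<html>A</html>",
})
page_b = save_page(store, {
    "https://cdn.example/lib.js": shared_lib,
    "https://b.example/": b"<html>B</html>",
})
```

Keying on the content hash rather than the URL avoids assuming that the same CDN URL always returns the same bytes; each page keeps its own manifest while the shared library is stored once.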

[+] ernsheong|8 years ago|reply
Thanks for the comment! That would be ideal and it has crossed my mind, but I have given little thought to how to do de-duplication right (premature optimization from a maker's perspective). Right now each page and its assets sit within its own "bucket". But yes, page assets and all these dependencies can really add up fast.
[+] abainbridge|8 years ago|reply
Excellent work. I can now close all those browser tabs I've had open in the background for weeks, just so I don't lose the page.
[+] ernsheong|8 years ago|reply
Thank you! Would really love to hear your feedback on the product, warts and all. jonathan[[at]]pagedash.com
[+] vpvp|8 years ago|reply
Wouldn't the OneTab extension be a better solution? I see PageDash as a personal Internet Archive/Wayback Machine.
[+] pwenzel|8 years ago|reply
After signing in, my initial reaction is that I wish I didn't have to use a browser extension to save a page.

It would be handy if I could just enter a URL and have it saved, a la Pinboard or Instapaper.

That said, this worked very well on my first try.

[+] ernsheong|8 years ago|reply
Thanks for the comment! Maybe I will make that possible in the future, but for now the advantage of this is that you can save logged-in content, i.e. content that you see when you're logged in. Passing the URL to the backend prevents that, as the backend is not authenticated or, even worse, is blocked.
[+] ernsheong|8 years ago|reply
Alright folks, it's 3am where I am at the moment, gonna hit the sack. I'll address more questions and concerns tomorrow. Thank you for all your feedback!
[+] ff7c11|8 years ago|reply
So this works until your cloud site dies. No thanks.
[+] ernsheong|8 years ago|reply
There are a few ways I can go about mitigating this.

1) Provide PageDash with API access to your s3/gcp bucket so that it syncs your pages out to your bucket.

2) Provide an open-source viewer to view files saved within your bucket. It's just like serving a website, really, no more processing needed.
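
As a rough sketch of the "just like serving a website" idea, assuming the bucket contents have already been synced to a local directory of plain HTML and asset files: the standard library's static file server is enough to browse them. The helper names `make_viewer` and `start_in_background` are hypothetical and are not part of any PageDash tooling.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_viewer(directory: str, port: int = 0) -> ThreadingHTTPServer:
    """Serve the files under `directory` on localhost; port 0 picks a free port."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


def start_in_background(server: ThreadingHTTPServer) -> threading.Thread:
    """Run the server on a daemon thread so the caller keeps control."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread
```

Calling `make_viewer("/path/to/synced/pages")` and then `start_in_background(...)` would expose the saved pages at `http://127.0.0.1:<port>/`, where the port is available as `server.server_address[1]`.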

[+] tmlee|8 years ago|reply
This will be great for archiving the past :)
[+] Maarius|8 years ago|reply
Which media files are stored? I assume images yes, videos no?