I think many of us wish for and have considered developing an open source, local archive of the websites we visit. We want full text search, full page screenshots, and good privacy (don't archive sensitive sites). We might also wish for p2p encrypted syncing between devices.
I saw someone recommend Memex as a Firefox extension for full-text search in history and bookmarks the other day. I've started using it, but can't yet comment on its usefulness.
I would recommend Bookmark OS. It's not open source, but it offers full text search, full page screenshots, and other neat features: https://bookmarkos.com
Since I found Wallabag, I did not search for anything else. It fundamentally changed my reading habits. Highly recommended! https://github.com/wallabag/wallabag
Is this an alternative to wallabag with fewer features? Archiving links is fine, but what's of most interest is archiving the content, isn't it? So this is an online bookmark manager? Sorry, it's not clear to me. I only know it saves links in plaintext files, you can add them with a bookmark, and you can password-protect them.
These HTML snippets are what's saved, one per line, in the mentioned plain text file ("links.txt"). The webpage is a dump of this file plus HTML/CSS boilerplate.
https://getmemex.com/ might be what you're looking for. I've tried to use it, but it somehow managed to destroy its database 3 or 4 times. After that I gave up and uninstalled the extension again.
I'd prefer a Go or C-based bookmark manager that lets you store bookmarks in markup or YAML-based documents, either one per bookmark or one for many. That way they can be synced using Google Drive or any cloud sync solution. Then add a web interface on top of that, and browser extensions for additional features. There really is no "good" bookmarking solution at this point when compared with tools like pass/gopass, which handle passwords on Linux.
codewithcheese|6 years ago
Since we wish for this, I presume it's been attempted many times.
My question to you, humble Hacker News readers: can anyone provide a rundown of the existing open source solutions and their features/pros/cons, or simply say which you prefer?
linguaz|6 years ago
Every URL I visit is automatically recorded along with the time & date visited, title and other metadata.
If the URL is visited via a link off another webpage, have that relationship recorded. Provide some sort of navigable tree / searchable database. Should be able to easily scale to tens / hundreds of millions of URLs across decades.
If the URL is visited by manually inputting that URL, provide option to type into a field something like "heard this on the radio in a show about xyz..." or "so-and-so told me about this on 2-28-2020 at lunch".
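A minimal sketch of how the visit log described above might be stored, assuming SQLite as the database. The table name, column names, and the helper functions `open_db`, `record_visit`, and `subtree` are all hypothetical, chosen only for illustration. Each visit keeps a pointer to the visit it was reached from, and typed-in URLs carry a free-form note instead.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per visit, linked to the visit it came from.
SCHEMA = """
CREATE TABLE IF NOT EXISTS visits (
    id          INTEGER PRIMARY KEY,
    url         TEXT NOT NULL,
    title       TEXT,
    visited_at  TEXT NOT NULL,                  -- ISO 8601 timestamp, UTC
    referrer_id INTEGER REFERENCES visits(id),  -- NULL when the URL was typed in
    note        TEXT                            -- context for typed-in URLs
);
CREATE INDEX IF NOT EXISTS visits_referrer ON visits(referrer_id);
"""


def open_db(path=":memory:"):
    """Open (or create) the history database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_visit(conn, url, title=None, referrer_id=None, note=None):
    """Insert one visit and return its row id."""
    visited_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO visits (url, title, visited_at, referrer_id, note) "
        "VALUES (?, ?, ?, ?, ?)",
        (url, title, visited_at, referrer_id, note),
    )
    conn.commit()
    return cur.lastrowid


def subtree(conn, root_id):
    """Return (depth, url) for a visit and every visit reached from it."""
    rows = conn.execute(
        """
        WITH RECURSIVE tree(id, url, depth) AS (
            SELECT id, url, 0 FROM visits WHERE id = ?
            UNION ALL
            SELECT v.id, v.url, tree.depth + 1
            FROM visits v JOIN tree ON v.referrer_id = tree.id
        )
        SELECT depth, url FROM tree ORDER BY depth, id
        """,
        (root_id,),
    )
    return rows.fetchall()
```

The recursive query is what makes the "navigable tree" cheap to walk from any starting page. Whether a single SQLite file stays comfortable at hundreds of millions of rows is not something this sketch tests.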
Provide option (when viewing page or drilling into history tree) to:
- paste in a paragraph or two of text from the page to associate for context
- save the entire page in WARC or similar
- rate / star / tag that page
Provide option to delete pages from history -- either entirely, or "scratched out" (maybe with a comment) so one can remember which branches of the tree are not worth following again.
Provide fuzzy-matching, as-you-type search across selectable metadata fields.
Search all content with regex.
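The two search modes could be sketched roughly like this, assuming each history entry is a plain dict. The field names (`title`, `url`, `content`) and both function names are made up for the example, and Python's standard-library `difflib` stands in for a real fuzzy matcher.

```python
import difflib
import re


def fuzzy_search(query, records, fields=("title", "url"), cutoff=0.6):
    """Rank records by the best similarity between the query and any chosen field."""
    q = query.lower()
    scored = []
    for rec in records:
        best = max(
            difflib.SequenceMatcher(None, q, str(rec.get(f, "")).lower()).ratio()
            for f in fields
        )
        if best >= cutoff:
            scored.append((best, rec))
    # Highest similarity first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in scored]


def regex_search(pattern, records, field="content"):
    """Return the records whose chosen field matches the regular expression."""
    rx = re.compile(pattern)
    return [rec for rec in records if rx.search(rec.get(field, ""))]
```

Both functions scan every record, which is fine for a few thousand entries but would need an index (or a dedicated search engine) at the scale mentioned earlier. Whole-string similarity also favors short fields such as titles over long ones.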
This would likely involve a browser plugin I guess, but it'd be nice to have a browser-independent way of doing this to facilitate multiple browsers on multiple machines. Also, would be good to avoid "extensions no longer supported after browser update" situations.
In the time it took me to get around to typing this up I see there are a lot of other interesting suggestions here...will have to sit down & read through them (and the Linkalot docs) more closely when I've some free time.
butterthebuddha|6 years ago
(1) Takes a URL and optional comment as input
(2) Saves the webpage it points to into a git repo (a simple curl should suffice for most websites)
(3) Inserts that URL, title of the page pointed-to by the URL and the optional comment into an org-mode file that lives in the root of the repo
The org-mode file is a highly-searchable and context-preserving database (I can add tags, create hierarchies, add links to and from other relevant (org-mode or not) files) in the most portable format ever: plain text.
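Steps (1) to (3) could look roughly like the following sketch. It assumes Python's `urllib` in place of curl, a `pages/` subdirectory, an index file called `bookmarks.org`, and an already initialized git repo. Those names, and the helpers `extract_title`, `slugify`, `org_entry`, and `archive`, are placeholders rather than anything specified above.

```python
import re
import subprocess
import urllib.request
from datetime import date
from html import unescape
from pathlib import Path


def extract_title(html):
    """Return the text of the first <title> element, or a placeholder."""
    m = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return unescape(" ".join(m.group(1).split())) if m else "(untitled)"


def slugify(url):
    """Turn a URL into a filesystem-safe file name (hypothetical naming scheme)."""
    return re.sub(r"[^A-Za-z0-9]+", "-", url.split("://", 1)[-1]).strip("-") + ".html"


def org_entry(url, title, comment="", day=None):
    """Format one org-mode headline: a link, a bracketed date, an optional comment."""
    day = day or date.today()
    lines = [f"* [[{url}][{title}]]", f"  [{day.isoformat()}]"]
    if comment:
        lines.append(f"  {comment}")
    return "\n".join(lines) + "\n"


def archive(url, comment="", repo=Path("."), index="bookmarks.org"):
    """Fetch the page, save it in the repo, append it to the index, and commit."""
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    page = repo / "pages" / slugify(url)
    page.parent.mkdir(parents=True, exist_ok=True)
    page.write_text(html, encoding="utf-8")
    with open(repo / index, "a", encoding="utf-8") as f:
        f.write(org_entry(url, extract_title(html), comment))
    subprocess.run(["git", "-C", str(repo), "add", "-A"], check=True)
    subprocess.run(
        ["git", "-C", str(repo), "commit", "-m", f"Archive {url}"], check=True
    )
```

Calling `archive("https://example.com", "so-and-so told me about this")` inside the repo would add one saved page and one headline per call. A plain fetch like this misses pages that need JavaScript, and titles containing square brackets would need escaping before they go into an org link.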
I really don't need a web interface. Actually, if I later decide that I need one, I can build one easily on top of this basic system.
I really want to be able to use this across multiple devices: mainly my two computers, and an Android phone. Using git gives me a reliable protocol for syncing between multiple devices. I want it to be a smooth experience on my phone, which would probably require some sort of git-aware app. Something similar to the Android client for the pass password manager would be ideal.
I hear that git repos can be GPG-encrypted. Ideally, I'm able to serve all this off of a repo hosted on a VPS. I don't want to rely on Dropbox (I'm trying to transition away from it) for syncing.
bloodm|6 years ago
You have folders from a-z in your data directory.
You save the website in /data/o/oldestcompanies or into a deeper directory to your liking.
Let recoll take care of the rest.
https://www.lesbonscomptes.com/recoll/
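As a small illustration of the a-z layout, here is a sketch of a helper that picks the folder for a saved page. The function name `shard_path` and the `_` bucket for names that do not start with a letter are assumptions, not part of the scheme described above.

```python
from pathlib import Path


def shard_path(name, root=Path("/data")):
    """Place `name` under a one-letter folder, e.g. /data/o/oldestcompanies."""
    first = name[:1].lower()
    # Assumption: anything not starting with a-z goes into a "_" folder.
    bucket = first if "a" <= first <= "z" else "_"
    return root / bucket / name
```

For example, `shard_path("oldestcompanies")` gives `/data/o/oldestcompanies`. Indexing is then left entirely to recoll, pointed at the data directory through its own configuration.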
MayeulC|6 years ago
It still feels a bit complex to share data between my computers (I wish for p2p, Nextcloud support, or something similar). I don't much like that it moves DDG's instant answers to the bottom of the page, nor the default sidebar and highlighter, but that may just take some getting used to.
cparsons3000|6 years ago
sails|6 years ago
kissgyorgy|6 years ago
javajosh|6 years ago
BlackLotus89|6 years ago
xearl|6 years ago
NickBusey|6 years ago
lproven|6 years ago
There is now even a Firefox add-on that works with Linkalot: https://addons.mozilla.org/en-US/firefox/addon/send-tab-url/
mosselman|6 years ago
warpech|6 years ago
I dream of a browser that would merge bookmarks + history into one, with full-text search.
systemfreund|6 years ago
calpaterson|6 years ago
https://github.com/calpaterson/quarchive
Looks like this: https://i.imgur.com/OMGlBpS.png
StavrosK|6 years ago
Grumbledour|6 years ago
Would be great if there were a demo linked (even if the functionality seems really straightforward).
Does it support organizing the links in categories?
lproven|6 years ago
He is a colleague of mine on the documentation team at SUSE.
bullen|6 years ago
It's a bit over-engineered on the db side but it works well.
techntoke|6 years ago
gazelle21|6 years ago
[deleted]