Why it took a long time to build the tiny link preview on Wikipedia

677 points| subset | 8 years ago |blog.wikimedia.org

243 comments

[+] fapjacks|8 years ago|reply
That preview window has been a lifesaver. Honestly, as someone with ADD (or ADHD or whatever it's called these days), Wikipedia is a fucking minefield. I regularly have many Wikipedia articles open in sometimes ten or twenty different windows, each with anywhere from twenty to fifty tabs (I open a new window to delineate a completely new tangent, or if opening a new tab will cause the tab icons to disappear, otherwise I open a new tab in the current window -- I like tabs and make heavy use of Session Buddy and The Great Suspender). The preview window has really cut a lot of the extraneous BS out of my evening-destroying rabbit holes. I figured it was probably a difficult thing to develop, but if anyone involved with its creation passes by this lowly post, please know that you have my sincerest gratitude for making such a meaningful, useful tool.
[+] shakna|8 years ago|reply
I am continually impressed by the markup Wikipedia generates.

They've managed to pull in pretty link previews, scientific notation and a grid layout, whilst building a highly nested markup structure?

The remarkable part? Wikipedia works great inside a text browser like elinks. It works great in a modern browser. Without sacrificing the interactivity people have grown to expect.

[+] saagarjha|8 years ago|reply
Wikipedia's markup is just terrible for trying to do any sort of scraping or analysis. I once tried to write a script that pulled the latest version of macOS from the sidebar of this article[1] and I gave up because it was difficult and brittle in a way that made it nearly impossible. I'd probably have better results parsing the HTML with a regex. Likewise, I know a friend who literally had to scrap an entire project because Wikipedia made it so difficult to get the text of an article without non-word entities in it.

[1] https://en.wikipedia.org/wiki/MacOS

[+] xab9|8 years ago|reply
Last time I checked, their markup was pretty much nightmare fuel, but it should work fine with IE5, I'm sure :)
[+] fwdpropaganda|8 years ago|reply
To me the remarkable part is that Wikipedia still works great without javascript.
[+] tombrossman|8 years ago|reply
One of the best things about ad blockers is that they are really just general purpose content blockers. Want to disable this permanently? Here's an Adblock Plus-compatible filter to block the div if you find it annoying. Tested and working on uBlock Origin:

  ! Disable link preview popups on Wikipedia
  en.wikipedia.org##.mwe-popups
It's just a div with the class ".mwe-popups", and using your ad blocker will persist the change after clearing cookies, which the preference setting (mentioned elsewhere here) does not.

For Wikipedia in different languages, just change the subdomain in the filter.
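
The subdomain swap described above can be scripted. A minimal sketch in Python, assuming the popup container keeps the class `mwe-popups` on every language edition (the comment only confirms it for en.wikipedia.org); the function name and the language list are made up for illustration:

```python
def popup_filters(languages):
    """Return one Adblock Plus-style element-hiding rule per language edition."""
    lines = ["! Disable link preview popups on Wikipedia"]
    for lang in languages:
        # Same rule as in the comment above, with only the subdomain changed.
        lines.append(f"{lang}.wikipedia.org##.mwe-popups")
    return lines


if __name__ == "__main__":
    print("\n".join(popup_filters(["en", "de", "fr"])))
```

The output can be pasted into the custom-filter box of the blocker; whether other editions really use the same class would need checking per site.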

[+] ken|8 years ago|reply
A user stylesheet can also be a content blocker, and it doesn't require a third-party extension, and it works across all domains (Wikipedia languages): https://github.com/kengruven/config/blob/master/.calm.css#L2...

It's frustrating that every popular webpage nowadays is so full of distractions that I can't use the web without blocking a lot of it.

[+] thought_alarm|8 years ago|reply
You can disable them by clicking the little gear icon on the pop up itself. I figured that out by googling around for half an hour.

It's just wretched, backwards UI all around.

[+] username223|8 years ago|reply
Thank you! I found those things incredibly annoying and useless, and was just about to dig through the source to figure out how to get rid of them.
[+] pwg|8 years ago|reply
Also, they are javascript based. They do not appear when running NoScript set to block wikipedia's javascript from executing.
[+] thelastidiot|8 years ago|reply
Not to criticize the hard work that went into doing this feature (I worked on a project using wikipedia/wiktionary data), all the things that had to be achieved to come up with a "simple" preview feature are just made hard because the data in wiki media is not machine friendly. Things like the obvious priority order of fields and bizarre templates that one needs to implement to parse the data make the job unbelievably hard in the first place.
[+] osteele|8 years ago|reply
In UCG gardens — as with data structure and algorithm design — there are trade-offs among retrieval difficulty (friction, for humans; time complexity, for machines), update difficulty, centralization and skill set of contributors, and centralization and skill set of editors, and the complexity of the structures themselves.

IMDB, CYC, Wolfram, and various RDF data sets, sample this space differently, and have different amounts of data and richness, probably as a result.

[+] squeaky-clean|8 years ago|reply
Yeah one of my first web scraping projects was using Wikipedia because I figured it would be easy to parse and have a fairly standard format, right? Well at least it was a good and sobering first lesson about cleaning data.
[+] CM30|8 years ago|reply
Have to say, I'm pleased they created an algorithm to choose the right thumbnail picture here. So many other implementations of the same idea just pick the first one on the page, and end up with someone's avatar being the featured image.

Indeed, as much as it took a fair bit of time, it seems the reasons behind it are all fairly logical; they actually thought carefully about the functionality and how it should work in various cases rather than just going with something that was 'good enough' to get it out quickly.

Not going to complain about that.
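
To illustrate the difference between "pick the first image" and a selection rule, here is a hypothetical heuristic in Python. It is not Wikipedia's actual algorithm, which the thread does not describe; the `Image` type, the size threshold, and the aspect-ratio limit are all assumptions chosen for the example:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Image:
    url: str
    width: int   # pixels
    height: int  # pixels


def pick_thumbnail(images: List[Image], min_side: int = 100,
                   max_aspect: float = 3.0) -> Optional[Image]:
    """Return the first image in page order that looks like real content.

    Hypothetical rule: skip anything with a side shorter than `min_side`
    (icons, avatars) or with an extreme aspect ratio (banners, dividers).
    """
    for img in images:
        short, long_ = sorted((img.width, img.height))
        if short <= 0 or short < min_side:
            continue  # too small: likely an icon or avatar
        if long_ / short > max_aspect:
            continue  # too stretched: likely a banner or divider
        return img
    return None  # nothing suitable; fall back to a text-only preview
```

With a 48x48 avatar, a 1200x100 banner, and a 640x480 photo in that order, this rule skips the first two and returns the photo, whereas a first-image rule would return the avatar.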

[+] cantrevealname|8 years ago|reply
One downside is that when I move the cursor along while reading an article all sorts of links now pop up at me. What I mean is using the mouse or trackpad to keep my place in the article; sometimes I drag the cursor to highlight the text, especially when skimming. Surely I'm not the only one who does this?
[+] mezzode|8 years ago|reply
I opted into the beta for this way back and have been using it for ages, it was pretty surprising finding out it only just went into wide release. Given how useful I found it I'd have thought it would have been released pretty quickly even when not perfect, but given the results can't complain. https://blog.wikimedia.org/2014/03/26/Hovercards-now-availab...
[+] _delirium|8 years ago|reply
I had no idea what this was talking about, and it appears to be because they've defaulted it to off for existing logged-in users. Maybe that's a way of reducing pushback.

In case you want to turn it on (or off), it's under Preferences->Appearance->Page previews. I think I'll probably leave it off personally. I like the previews that have already existed for a while in the mobile app, but on desktop not so sure.

[+] phkahler|8 years ago|reply
When we do finally send people to Mars I think they'll appreciate having a local copy of Wikipedia with them. This would be one of the top 10 resources beyond what's required to produce air, food, water, and heat. Otherwise, something comes up and any question you have could take almost an hour to get a response from someone on Earth.
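
The "almost an hour" figure can be sanity-checked with light-travel time. A rough calculation in Python, assuming approximate Earth-Mars distances of 54.6 million km at closest approach and 401 million km at the farthest; these round figures are assumptions, not numbers from the comment:

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # exact by definition

# Approximate Earth-Mars distances (assumed round figures).
CLOSEST_KM = 54.6e6
FARTHEST_KM = 401e6


def round_trip_minutes(distance_km: float) -> float:
    """Minutes for a signal to travel to Mars and back at light speed."""
    return 2 * distance_km / SPEED_OF_LIGHT_KM_S / 60


if __name__ == "__main__":
    print(f"closest:  {round_trip_minutes(CLOSEST_KM):.1f} min")
    print(f"farthest: {round_trip_minutes(FARTHEST_KM):.1f} min")
```

Under these assumptions the signal delay alone ranges from roughly 6 to roughly 45 minutes round trip; the time someone on Earth needs to find and write the answer comes on top of that.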
[+] contoraria|8 years ago|reply
Wikipedia is, by my estimate, 90% about animals, places, historic places, historic animals, historic people, "notable" living people, and so on. How's that going to be relevant on Mars? And who's going to vet all the articles?
[+] anotherfounder|8 years ago|reply
I feel like this is some dystopian alternate reality post. It took 4 years to release a hover popup! Take that in for a second. And the post seems extremely proud, and self-congratulatory about it.

From the post: > Our initial version wasn’t good enough. Our community asked us not to go ahead with it. We answered by listening to them and making it better.

This was 2 years ago, and read the comments on the 39 votes it took to not release Hovercards - https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28prop...

This meant that the silent majority didn't get these features for 2 years because some considered it 'intrusive'. A feature that you could objectively argue as being a useful utility.

If anything, I think this is an indictment of the complaints against Wikipedia from those observing it and ex-employees. While a benevolent dictatorship might be going too far, such a community process where only the loudest voice wins (over considering what is best for MOST of the users), is surely a broken process. I am surprised that product & design people work there at all in such an environment.

[+] ckoerner|8 years ago|reply
>And the post seems extremely proud, and self-congratulatory about it.

Well, yeah. Doing anything successfully at the scale of Wikipedia is worthy of some praise - and I say that as a US midwesterner - a culture not exactly known as the epitome of hubris. :)

You might claim I have Stockholm syndrome or something since I worked with the team that developed this feature, but the discussion you highlight did have valid feedback. The process for respecting community governance and developing consensus is more complicated than any one person could imagine. It is frustrating and imperfect. Folks at the foundation like myself are trying to do better in how we approach, work with, and deploy software changes. I agree too that it took a long time to develop, but that's not on any one single community's shoulders.

For instance, after doing due diligence we approached the English community again earlier this month and the result of that discussion was quite boring.

https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(miscel...

For a technical example of what led to the time it took, the team looked at how we were generating the previews and saw an opportunity to improve them. Previously we tried to parse a bunch of wikitext with a list as long as my arm of exclusions and edge cases. Then the team figured out a way to return HTML summaries from the source article. That is not just something useful for this feature, but also a huge improvement to how information is rendered (like math formulas). Refactoring the code and implementing a new API endpoint took time.
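
For readers who want to see what consuming such a summary endpoint might look like, here is a small Python sketch. The comment does not name the endpoint; the code assumes it is the public REST route `/api/rest_v1/page/summary/{title}` and that the response carries `title`, `extract_html`, and an optional `thumbnail.source` field. The sample payload is invented for illustration and no network request is made:

```python
import json
from urllib.parse import quote


def summary_url(title: str, lang: str = "en") -> str:
    """Build the (assumed) REST summary URL for an article title."""
    # Wikipedia titles use underscores for spaces; percent-encode the rest.
    slug = quote(title.replace(" ", "_"), safe="")
    return f"https://{lang}.wikipedia.org/api/rest_v1/page/summary/{slug}"


def preview_fields(payload: dict) -> dict:
    """Pull out the pieces a hover card would need from a summary response."""
    thumb = payload.get("thumbnail") or {}
    return {
        "title": payload.get("title", ""),
        "html": payload.get("extract_html", ""),
        "image": thumb.get("source"),  # None when the article has no image
    }


# Invented sample payload, shaped like the assumed response.
SAMPLE = json.loads("""
{"title": "MacOS",
 "extract_html": "<p><b>macOS</b> is an operating system.</p>",
 "thumbnail": {"source": "https://example.org/macos.png"}}
""")

if __name__ == "__main__":
    print(summary_url("Page Previews"))
    print(preview_fields(SAMPLE))
```

The point of the design is that the client receives ready-made HTML and never has to parse wikitext or handle the exclusions and edge cases itself.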

I hope this doesn't come across as too argumentative, I wanted to provide an alternative perspective from someone who works daily with product teams and communities within the Wikimedia movement.

[+] mistrial9|8 years ago|reply
This is a great collection of comments! because it exposes a deep bias in the readers, who are mainly coders and dev-culture.

Guess what ? perfectly machine-readable data is called a DATABASE, and it works well in its scope.. and if you think that all of human knowledge, history, arts and culture are perfectly represented in a DATABASE, then congratulations, you are already more computer than human in your preconceptions.

There are several important layers to this onion.. let's call one "participation by non-specialists" .. a second "human factors, aesthetics and publishing arts" .. another is "imperfect intermediate products enable evolution" .. yet another is "taking rules before content" ..

Each topic might be an essay in itself.. Generally, I am happy to see XML essentially proposed as the answer to all human information challenges, because it takes less time to blink than to refute it, for me.

[+] yjftsjthsd-h|8 years ago|reply
> Guess what ? perfectly machine-readable data is called a DATABASE, and it works well in its scope.. and if you think that all of human knowledge, history, arts and culture are perfectly represented in a DATABASE, then congratulations, you are already more computer than human in your preconceptions.

It's on a computer. It is a database. Are you also upset at people who smash the subtle beauty of music into unfeeling bits?

[+] d0lph|8 years ago|reply
What information could you not store in a database?
[+] rurban|8 years ago|reply
As the major other php wiki implementor, phpwiki, I can add my input to this.

I've implemented such an ajax page preview and also a page inliner (for expanding page trees via ajax) about 10 years ago. It was major work, because you essentially send a stripped down page in xml, so it needed some architectural changes to separate the various page templates properly. In the end it needed 2 months work. phpwiki has a proper template design and its plugins cannot ever destroy a page layout or harm security.

mediawiki on the other side is horribly undesigned spaghetti code, with no proper templating and plugin integration, so it needed a few years more. It's like parsing html with regexp.

[+] ckoerner|8 years ago|reply
> mediawiki on the other side is horribly undesigned spaghetti code

But like my mom's spaghetti, it's my favorite. :)

Think you can make it better? https://wikimediafoundation.org/wiki/Work_with_us

I work for the Wikimedia Foundation, but this subtle snark is my own, and may not reflect the views of the Foundation.

[+] majewsky|8 years ago|reply
> People seem to like it — we are seeing 5,000 hits to our API a minute to serve those cards that show when you hover over any link.

Uhm, no. It just means that people are hovering over links with their mouse. It does not imply any opinion about the previews.

> The original idea was conceived four years ago

When I was active at Wikipedia/Wikibooks 12 years ago, there was a user script floating around that did the same thing, except I'm not sure if it included an image. (Mediawiki allows a user to define custom CSS and JS that get embedded in every page of their user session.)

I don't mean to express dislike or downplay the hard work that went into this feature, just to add some context.

[+] panic|8 years ago|reply
> Uhm, no. It just means that people are hovering over links with their mouse. It does not imply any opinion about the previews.

When you evaluate features using engagement metrics, there are only two possibilities. If the metric is high, users love the feature. If the metric is low, users don't know the feature exists, and more alerts or "unread" badges must be added to help them learn.

[+] jdlrobson|8 years ago|reply
Hey! (author here)

I sincerely regret the use of that statement "people seem to like it" in my post. I've now removed it as I worry this confuses my message so thank you for pointing it out. This really trivializes all the work that went into A/B testing this and how we measured success. I really recommend a read of https://www.mediawiki.org/wiki/Page_Previews#Success_Metrics... (Side note: the volume of traffic was also wrong by several magnitudes... actually 0.5 million)

My main motivation when writing this post was to share how small changes require magnitudes of effort, not to express the merits of the feature. As a developer who works with product owners a lot and often gets asked how I can build things quicker (I'm sure many can relate), I wanted to provide something useful to other developers: a reminder that easy-looking things are not actually easy to build, so thanks for flagging.

With regards to the user script, yeh that's been around for some time and was the seed for this idea. It's just taken a long time to get that from such a small audience to a mainstream one. It doesn't downplay it in my opinion, just shows how far we've come.

Thanks for reading.

[+] subpixel|8 years ago|reply
I do mean to express dislike, as I'm one of the millions of users giving Wikipedia a false metric and sense of satisfaction. I've seen the pop-up hundreds of times, and have considered it a hindrance to my process every single time.

Something that just happens accidentally is a bug, no matter how useful it might be if it were not happening accidentally.

[+] omegote|8 years ago|reply
Not to mention another probable reason: Mediawiki's codebase is a mess and should be rebuilt from the ground up, if possible not in PHP. I once had to build an extension for it and it still gives me the creeps.
[+] lucideer|8 years ago|reply
> if possible not in PHP

I'm not a fan of PHP as a language, but given the community has been working with PHP for 16 years, it would be odd to switch suddenly to an entirely new language and expect support and adoption.

The codebase is an absolute nightmare though, and a ground-up rebuild would be great. I wonder though about it having a similar effect to the Wordpress codebase: people who recognise the mess stay away completely, and people who don't contribute, leaving a contributor base who isn't really equipped (or extremely willing) to do a quality, maintainable rewrite.

I suspect any rewrite attempt would be doomed to end up being an unmaintainable behemoth.

A better approach would be to focus on secure integration tools and API entry-points, to make users less entrenched and solely dependent on the MW software.

[+] publicfig|8 years ago|reply
There's a lot of negativity in these comments (which is to be expected, as it is still HN after all) but I've been using the preview boxes for a while now and have to say that I absolutely love them. I use Wikipedia a LOT for primary/secondary research and being able to even just figure out the dates someone lived, the very basic information, or even sometimes just a photo saves me from so many instances of new-tab opening of links that I have to remember the context of after I'm done reading the passage. Really happy to see this is more available now.

EDIT: This is unrelated, but after reading more of the comments, I legitimately can't believe how absolutely disrespectful and hateful so many of these comments are in here. I appreciate this site as a place where you can express your opinions, even if they aren't just placid support of whoever the OP is, but I really don't want to see this community dive further and further into the echo chamber of hate that it seems to be becoming.

[+] lucideer|8 years ago|reply
Responding to your edit on respectful commenting, I can't see any disrespectful comments in this thread (and generally find discourse on HN better than other places). Have some been deleted?

The top-level comments are mostly positive, with one or two constructively critical ones. There are one or two sub-comments with strongly worded criticism of Wikipedia's markup (the code holding the text), but none that mention people or are in any way ad hominem.

[+] abalone|8 years ago|reply
There's a rule against excessive negativity but respectful, constructive criticism is an important role that HN plays here. Here's a feature that wades into what is arguably browser vendor territory, rethinking the way that hyperlinks work.[1] Is it a good pattern we should adopt throughout the web?

Does its on-by-default nature disrupt the reading experience? Does summarizing linked content have problems? What about mobile (now approx. half of traffic and growing)?

I see a few comments using the word "hate" but for the most part the negative ones are just critical, with supporting points. I think a fundamental design pattern like this is worth some scrutiny alongside the support.

[1] Case in point: Safari implemented a feature similar to this a few years ago. It works generally across all sites, uses reserved gestures (3D Touch on mobile, 3-finger-tap on desktop) to give users full control, and sidesteps the whole summarization problem by using more screen real estate to just show a bigger preview.

[+] yarrel|8 years ago|reply
Describing the comments here as "an echo chamber of hate" is inaccurate and serves to raise the temperature in itself.
[+] ravenstine|8 years ago|reply
It's a very noticeable feature too. I was very delighted to recently discover it, and it really does help Wikipedia to continue to be the revolutionary platform it is. Kudos to those who implemented it.
[+] barkingcat|8 years ago|reply
To be honest, it's been like this for a long time. If you haven't noticed the negativity, you've been reading a different "hacker's news" than everyone else on the internet.
[+] jopsen|8 years ago|reply
> We couldn’t expect every single article to be edited to designate a thumbnail.

It wouldn't surprise me if they could. Not to say that automation isn't great, and for this purpose probably ideal.

But, selecting a thumbnail for every Wikipedia article seems like something the community could easily have done.

[+] anc84|8 years ago|reply
And what a completely senseless thing it would be. Reminds me of all the random stockphotos in articles wasting bandwidth and attention for no gain.
[+] vermontdevil|8 years ago|reply
I hate that preview feature. I always hover over links out of habit ready to click if I’m interested. But that pop up blocks me from reading the text.
[+] anilgulecha|8 years ago|reply
This is a useful feature, but do note for some class of wikipedia surfing this is a net-negative:

For the case when I came across a subject I knew not a lot about I would keep queuing them up, leading to an array of pages I'd read about a topic, leading to a deeper understanding about the topic/domain. With this feature, the probability that a page would be queued up would go down.

Sometimes going down the wiki rabbit-hole is the best form of time-sink.