So after transforming multispectral satellite data into a 128-dimensional embedding vector you can play "Where's Wally" to pinpoint blackberry bushes? I hope they tasted good! I'm guessing you can pretty much pinpoint any other kind of thing as well then?
Yes it's very good fun just exploring the embeddings! It's all wrapped by the geotessera Python library, so with uv and gdal installed just try this for your favourite region to get a false-colour map of the 128-dimensional embeddings:
# for cambridge
# https://github.com/ucam-eo/geotessera/blob/main/example/CB.geojson
curl -OL https://raw.githubusercontent.com/ucam-eo/geotessera/refs/heads/main/example/CB.geojson
# download the embeddings as geotiffs
uvx geotessera download --region-file CB.geojson -o cb2
# do a false colour PCA down to 3 dimensions from 128
uvx geotessera visualize cb2 cb2.tif
# project onto webmercator and visualise using leafletjs over openstreetmap
uvx geotessera webmap cb2.tif --output cb2-map --serve
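For anyone curious what the false-colour step amounts to, here is a minimal sketch of a PCA from 128 dimensions down to 3, using only numpy. It is not the geotessera implementation; the function name `pca_false_colour` and the random stand-in array are assumptions made purely for illustration.

```python
import numpy as np

def pca_false_colour(embeddings: np.ndarray) -> np.ndarray:
    """Project an (H, W, D) embedding grid to an (H, W, 3) image in [0, 1]."""
    h, w, d = embeddings.shape
    flat = embeddings.reshape(-1, d).astype(np.float64)
    centred = flat - flat.mean(axis=0)
    # SVD of the centred data: rows of vt are principal axes, ordered by variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    components = centred @ vt[:3].T
    # Rescale each of the three components independently to [0, 1] for display.
    lo = components.min(axis=0)
    hi = components.max(axis=0)
    scaled = (components - lo) / np.where(hi > lo, hi - lo, 1.0)
    return scaled.reshape(h, w, 3)

# Synthetic stand-in for a tile of 128-dimensional embeddings (not real data).
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(32, 32, 128))
rgb = pca_false_colour(fake_embeddings)
```

The three leading components become the red, green and blue channels, so pixels with similar embeddings end up with similar colours.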
Downstream classifiers are really fast to train (seconds for small regions). You can try out a notebook in VSCode to mess around with it graphically using https://github.com/ucam-eo/tessera-interactive-map
The berries were a bit sour, summer is sadly over here!
I haven’t done this kind of thing since undergrad, but hyperspectral data is really frickin cool this way. Not only can you use spectral signatures to identify specific things, but also figure out what those things are made out of by unmixing the spectra.
For example, figure out what crop someone’s growing and decide how healthy it is. With sufficient temporal resolution, you can understand when things are planted and how well they’re growing, how weedy or infiltrated they are by pest plants, how long the soil remains wet or if rainwater runs off and leaves the crop dry earlier than desired. Etc.
If you’re a good guy, you’d leverage this data to empower farmers. If you’re an asshole, you’re looking to see who has planted your crop illegally, or who is breaking your insurance fine print, etc.
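The unmixing idea mentioned above can be sketched as a linear model: a pixel's spectrum is treated as a weighted sum of a few pure "endmember" spectra, and you solve for the weights. The spectra and abundances below are made up for illustration, and plain least squares is a simplification; real unmixing usually adds non-negativity and sum-to-one constraints.

```python
import numpy as np

# Hypothetical endmember spectra: rows are materials, columns are bands.
endmembers = np.array([
    [0.05, 0.08, 0.04, 0.50, 0.30],  # "vegetation" (made-up values)
    [0.20, 0.25, 0.30, 0.35, 0.40],  # "soil" (made-up values)
    [0.10, 0.08, 0.06, 0.02, 0.01],  # "water" (made-up values)
])

# A noise-free mixed pixel built from known abundances, so we can check the answer.
true_abundances = np.array([0.6, 0.3, 0.1])
pixel = true_abundances @ endmembers

# Solve endmembers.T @ a = pixel for the abundance vector a (unconstrained).
estimated, *_ = np.linalg.lstsq(endmembers.T, pixel, rcond=None)
```

With noisy real data the unconstrained solution can go negative, which is where constrained solvers come in.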
> Can a model trained on satellite data really find brambles on the ground?
No, as per the researcher: "However, it is obvious that most of the generated findings aren’t brambles". So obviously no.
All the model did was think they followed roads, all roads.
If it was oil and gas, where people put in effort and their results were checked, rather than universities, where meaningless citations matter and results are never confirmed, it would be more believable.
What they are asking is impossible. Increasing the likelihood without silly hacks, like excluding rivers or the tops of buildings, is an interesting problem but out of scope for academics.
I was a lot more optimistic about Gabriel's model than he was. It is essentially a presence-only species distribution model where accuracy depends largely on assumptions around prevalence and which really needs some presence-absence data to calibrate.
As I mentioned in one of the other comments, the model is also only pixel-wise. That is, it is not using spatial information for predictions.
FarmLogs (YC 12) did exactly this. We used sat imagery in the near-infrared spectrum to determine crop health remotely. Modern farming utilizes a practice called precision ag - where your machine essentially has a map of zones on the field for where treatments are or aren't needed and controllers that can turn spray nozzles on/off depending on boundaries. We used sat imagery as the base for an automated prescription system, too. So a farmer can reduce waste by only applying fertilizer or herbicide in specific areas that need it.
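As a rough illustration of the near-infrared approach: one widely used index is NDVI, (NIR - Red) / (NIR + Red), which is higher for dense, healthy vegetation. Whether FarmLogs used NDVI specifically is an assumption here, and the reflectance values and the 0.4 threshold are hypothetical.

```python
import numpy as np

def ndvi(nir, red):
    """Normalised difference vegetation index, with 0 where NIR + Red is 0."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    total = nir + red
    safe_total = np.where(total == 0, 1.0, total)  # avoid division by zero
    return np.where(total == 0, 0.0, (nir - red) / safe_total)

# Made-up reflectances for a 2x2 patch of field.
nir = np.array([[0.50, 0.45], [0.30, 0.20]])
red = np.array([[0.05, 0.08], [0.15, 0.18]])

index = ndvi(nir, red)
# Hypothetical rule: cells below the threshold get flagged for treatment.
needs_treatment = index < 0.4
```

A boolean grid like `needs_treatment` is the sort of zone map a sprayer controller could consume, in this sketch at least.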
Well looks like they found a lot of brambles! Were there large areas without any bramble?
Cue dowsers, who successfully find water... but also who would anyway anywhere else because underground water isn't the underground river/pocket that people imagine and thus random chance by itself has high probability of finding water.
We did note several places during the trip that didn't contain bramble. The hotspot in the middle of the residential area was also entirely isolated.
For a proper evaluation you would need to be more methodical, but as a sanity check we were very happy with it.
One other thing to point out about the bramble model is that it is pixel-wise. That is, each prediction is based only on what is within the 10 metre pixel (give or take the georeferencing error).
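To make "pixel-wise" concrete, here is a small sketch of a classifier that scores every pixel from its own embedding and nothing else. It is a plain logistic regression on synthetic data, not Gabriel's model; the function names, labels and grid are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    # Clip to keep exp() well behaved for extreme scores.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def train_logistic(x, y, steps=300, lr=0.5):
    """Fit logistic regression weights with batch gradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        grad = sigmoid(x @ w + b) - y
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict_map(grid, w, b):
    """Score each pixel of an (H, W, D) grid independently of its neighbours."""
    h, wd, d = grid.shape
    return sigmoid(grid.reshape(-1, d) @ w + b).reshape(h, wd)

# Synthetic training set: 500 labelled "pixels" with 128-dimensional embeddings.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(500, 128))
y_train = (x_train @ rng.normal(size=128) > 0).astype(float)
w, b = train_logistic(x_train, y_train)

# Synthetic 20x20 tile to predict over.
grid = rng.normal(size=(20, 20, 128))
prob_map = predict_map(grid, w, b)
```

Because the grid is flattened before scoring, shuffling the pixels around would not change any individual prediction, which is exactly the limitation being described.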
Not much detail on the method? Like what data it takes from iNaturalist - for example if it's taking in GPS coordinates of observations of brambles then it's not clear what there is for the ML model to do.
What detail was in the satellite images, was it taking signals of the type of spaces brambles are in, or was it just visually identifying bramble patches?
In the UK you get brambles in pretty much every non-cultivated green space. I wonder how well the classifier did?
Hi! You can find a bit more about Gabriel's model through some of his posts over the last few weeks: https://gabrielmahler.org/posts/
When it comes to the satellite images, the model actually used TESSERA (https://arxiv.org/abs/2506.20380) which is a model we trained to produce embeddings for every point on earth that encodes the temporal-spectral properties over a year.
Think of it like a compression of potentially fifty or a hundred observations of a particular point on earth down to a single 128-dimensional vector.
We have a problem with Giant Hogweed and I was thinking about ways to identify hotspots. My guess is that standard satellite imagery, like Google Maps, probably isn’t good enough. To even check if this could work, you’d need high-res imagery (sub-meter), ideally multispectral, and some way to validate it on the ground. What steps should I take to verify whether this is possible in the way it was done here?
https://github.com/ucam-eo/geotessera has an image showing our embedding coverage at the moment. Blue areas we have complete coverage for 2024, green areas we cover 2017-2024. We're slowly trying to populate everything 2017-2024 but the constraint is GPU and storage at the moment - each year takes ~20k GPU/200k CPU hours and requires storing and serving 200 terabytes of data. The world is big!
If there is an area you would like prioritised, there's an issue template on the geotessera github repo which we can use to move regions around in the processing queue.
That's actually a great idea! I wonder what kind of feature size would be needed though - TESSERA's embeddings are at a 10 metre resolution so for larger structures you might need some kind of spatial aggregation.
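One simple form of spatial aggregation would be to average the embeddings over square blocks of pixels, so each coarse cell summarises a larger footprint. This is only a sketch under that assumption; `block_mean` is a made-up helper and the grid is random stand-in data.

```python
import numpy as np

def block_mean(grid, factor):
    """Average an (H, W, D) grid over non-overlapping factor x factor blocks."""
    h, w, d = grid.shape
    # Crop so both spatial dimensions are exact multiples of the factor.
    h2, w2 = h - h % factor, w - w % factor
    cropped = grid[:h2, :w2]
    blocks = cropped.reshape(h2 // factor, factor, w2 // factor, factor, d)
    return blocks.mean(axis=(1, 3))

# Synthetic 50x50 tile of 128-dimensional embeddings at 10 metre resolution.
rng = np.random.default_rng(0)
grid = rng.normal(size=(50, 50, 128))

# Factor 5 turns 10 metre pixels into 50 metre cells.
coarse = block_mean(grid, 5)
```

Averaging throws away within-block variation, so other summaries (per-block standard deviation, for instance) might suit some structures better.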
A model I have trained on ASTER and LANDSAT data has major difficulties identifying spots for agate hunting. Even after I've given it extra instructions such as looking only in volcanic terrain (with a USGS map provided), focusing on mixed signals of hydrous silica and iron, or checking near known fault zones in said volcanic areas, it still gave me results everywhere, and almost none matching my criteria.
Plants are a way different and more difficult ballgame (they like to mess up my satellite data) so as I read I am not surprised to see that this didn't really give proper results.
I read this and questioned the statistical methods 101. To say it works, one would also need to check for false positives. And such a check would pick up on "oh it's finding roads and there's a correlation between roads and brambles."
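A sketch of what such a check could look like: visit sites the model flagged and sites it did not, then compare precision with how common brambles are overall. The ten site visits below are invented purely to show the arithmetic.

```python
def confusion_counts(predicted, observed):
    """Return (tp, fp, fn, tn) for paired boolean predictions and observations."""
    pairs = list(zip(predicted, observed))
    tp = sum(p and o for p, o in pairs)
    fp = sum(p and not o for p, o in pairs)
    fn = sum(not p and o for p, o in pairs)
    tn = sum(not p and not o for p, o in pairs)
    return tp, fp, fn, tn

# Invented field visits: six sites the model flagged, four it did not.
predicted = [True] * 6 + [False] * 4
observed = [True] * 6 + [True, True, True, False]

tp, fp, fn, tn = confusion_counts(predicted, observed)
precision = tp / (tp + fp)            # how often a flagged site had brambles
recall = tp / (tp + fn)               # share of bramble sites that were flagged
prevalence = (tp + fn) / len(observed)  # how common brambles are overall
```

In this made-up case precision is perfect, yet brambles turn up at 90% of all sites visited, so a high hit rate on flagged sites says little by itself.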
Are you thinking of _new_ fresh water sources that emerged in recent years? If you have any candidate lat/lon where this might have happened, we can take a look at the 2024 and earlier embeddings to see if we can spot it.
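One way to look for that kind of change, assuming embeddings for the same pixel are comparable from year to year (an assumption, not something confirmed here), is to measure the cosine distance between consecutive years and look for the biggest jump. The yearly vectors below are synthetic, with a change deliberately planted after 2021.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Synthetic per-year embeddings for one location: stable up to 2021, different after.
rng = np.random.default_rng(1)
before = rng.normal(size=128)
after = rng.normal(size=128)
series = {
    year: (before if year <= 2021 else after) + 0.05 * rng.normal(size=128)
    for year in range(2017, 2025)
}

# Distance between each pair of consecutive years.
jumps = {
    (year, year + 1): cosine_distance(series[year], series[year + 1])
    for year in range(2017, 2024)
}
biggest_jump = max(jumps, key=jumps.get)
```

On real data, seasonal and weather differences between years would also show up as distance, so a threshold would need calibrating against locations known not to have changed.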
There is the issue of just how visible truffles are from space though, if they grow under cover. That said, it may still work because you can find habitats that are very likely to have truffles. We've had some promising results looking at fungal biomass.
cuno|5 months ago
avsm|5 months ago
Waterluvian|5 months ago
sadiq|5 months ago
We're hoping to try it with a few different things for our next field trip, maybe some that are much harder to find than brambles.
0_____0|5 months ago
NedF|5 months ago
dmbche|5 months ago
For the "However, it is obvious that most of the generated findings aren’t brambles"
sadiq|5 months ago
xarope|5 months ago
whalesalad|5 months ago
lloeki|5 months ago
sadiq|5 months ago
pbhjpbhj|5 months ago
Interesting project.
sadiq|5 months ago
Happy to answer any other questions.
ensocode|5 months ago
sadiq|5 months ago
cjensen|5 months ago
jcims|5 months ago
sadiq|5 months ago
folli|5 months ago
It would be interesting to overlay TESSERA data there, although the resolution is of course very different.
spogbiper|5 months ago
https://www.pnas.org/doi/10.1073/pnas.2407652121
lightedman|5 months ago
ggm|5 months ago
thinkingemote|5 months ago
Peteragain|5 months ago
themafia|5 months ago
> In every place we checked, we found pretty significant amounts of bramble.
[Shocked Pikachu face]
uwcs|5 months ago
Show us the bee!
avsm|5 months ago
daemonologist|5 months ago
avsm|5 months ago
siva7|5 months ago
sadiq|5 months ago
emorning4|5 months ago
[deleted]