(no title)
qdequelen | 3 years ago
Just a detail: if you run `du -sh` on your machine, the size on disk will stay unchanged because we do soft deletion ;). Don't worry, it will be physically deleted after a while, so you'll get the space back if you need it in the future.
If you kept the default configuration of Meilisearch, the maximum size of an HTTP payload is 100MB (for security). You can change it here -> https://docs.meilisearch.com/learn/configuration/instance_op...
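For example (a sketch based on that docs page; the value here is in bytes, double-check the accepted formats for your version):

    # raise the payload limit to ~1GB via CLI flag (value in bytes)
    meilisearch --http-payload-size-limit=1073741824

    # or via environment variable
    MEILI_HTTP_PAYLOAD_SIZE_LIMIT=1073741824 meilisearch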
addDocumentsInBatches() is just a helper to send your big JSON array in multiple parts; I'm not absolutely sure you'll need it. (Code -> https://github.com/meilisearch/meilisearch-js/blob/807a6d827...)
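It's roughly equivalent to slicing the array yourself and calling addDocuments() per slice. A minimal sketch (assuming meilisearch-js against a local instance; the index name and documents array are placeholders):

    import { MeiliSearch } from 'meilisearch'

    const client = new MeiliSearch({ host: 'http://127.0.0.1:7700' })
    const index = client.index('docs')

    // placeholder: your big JSON array
    const documents: Array<Record<string, unknown>> = []

    // the helper: one call, the client splits it into payloads of 10k docs
    await index.addDocumentsInBatches(documents, 10_000)

    // ...which is roughly the same as doing it by hand:
    for (let i = 0; i < documents.length; i += 10_000) {
      await index.addDocuments(documents.slice(i, i + 10_000))
    }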
hardwaresofton | 3 years ago
> Just a detail: if you run `du -sh` on your machine, the size on disk will stay unchanged because we do soft deletion ;). Don't worry, it will be physically deleted after a while, so you'll get the space back if you need it in the future.
Ah, I was just wildly undershooting the size I gave the PVC! I gave it much more and it's fine -- right now it's sitting around 19Gi of usage, which is actually a bit of a problem considering the data set was originally only about 4GB. That said, disk is really not an issue, so I'll just throw more at it -- maybe leave it at 32GB and call it a day. It's around 1.6MM documents out of ~2MM, so it shouldn't grow too much more.
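(For reference, bumping it was just a patch on the claim -- a sketch, assuming the claim is named `meilisearch-data` and the StorageClass allows expansion:)

    # hypothetical claim name; requires allowVolumeExpansion: true
    # on the StorageClass
    kubectl patch pvc meilisearch-data \
      -p '{"spec":{"resources":{"requests":{"storage":"32Gi"}}}}'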
> If you kept the default configuration of Meilisearch, the maximum size of an HTTP payload is 100MB (for security). You can change it here -> https://docs.meilisearch.com/learn/configuration/instance_op...
Thanks for this, I'll keep it in mind -- so I could actually pass off HUGE chunks to Meilisearch.
It seems like the larger the chunk, the more efficient? There didn't seem to be much change in how long it took to work through a chunk of documents; it was more that having lots of smaller chunks went slower overall. I started with 10k per batch, then went to 1k, then back to 5k -- maybe I should try 100k docs in a batch and see how it performs, with something like the timing loop below.
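A naive sketch of that loop (assuming meilisearch-js; waitForTask is there so the timing covers indexing and not just the HTTP round trip -- and note that re-adding the same docs measures updates, not fresh inserts):

    import { MeiliSearch } from 'meilisearch'

    const client = new MeiliSearch({ host: 'http://127.0.0.1:7700' })
    const index = client.index('docs')

    // placeholder corpus
    const documents: Array<Record<string, unknown>> = []

    for (const batchSize of [1_000, 5_000, 10_000, 100_000]) {
      const start = Date.now()
      for (let i = 0; i < documents.length; i += batchSize) {
        const task = await index.addDocuments(documents.slice(i, i + batchSize))
        // wait for the indexing task to finish, not just the enqueue
        // (taskUid/waitForTask naming varies a bit across client versions)
        await client.waitForTask(task.taskUid, { timeOutMs: 15 * 60 * 1000 })
      }
      console.log(`batch=${batchSize}: ${(Date.now() - start) / 1000}s`)
    }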
There's a blog post waiting to be written in here...
> addDocumentsInBatches() is just a helper to send your big JSON array in multiple parts; I'm not absolutely sure you'll need it. (Code -> https://github.com/meilisearch/meilisearch-js/blob/807a6d827...)
Thanks! Was this something someone requested? Is there a tangible benefit (were there customers who didn't want to split up the payloads themselves)? Otherwise it seems like unnecessary cruft in the API.