arseniibr | 1 month ago

In an ideal local environment with a properly configured git client, sure. But in real-world CI/CD pipelines, people use wget, curl, or custom caching layers, and these often pull the raw pointer file instead of the LFS blob. When that pointer file hits torch.load() in production, the service crashes. The tool was designed to catch this integrity mismatch before deployment.

embedding-shape|1 month ago

Right, but if your CI/CD pipeline is fetching repositories that use Git LFS while the pipeline you're creating/maintaining can't actually handle Git LFS, wouldn't you say that it's the pipeline that has to be fixed?

Trying to patch your CI builds by adding a tool that scans for licenses, "malware" and other metadata errors on top of all of this feels very much like "the wrong solution". Fix the issue at the root instead: the pipeline doing the wrong things.

arseniibr|1 month ago

I agree that fixing the pipeline is the correct decision, but I created this tool to provide detection in the meantime.

In a complex environment, you often don't control the upstream ingestion methods used by every team. They might use git lfs, wget, huggingface-cli, or custom caching layers.

Relying solely on the hope that every downstream consumer correctly handles Git LFS is dangerous. This tool acts as a detector to catch those inevitable human or tooling errors before they crash production.
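
To make the failure mode concrete, here is a minimal sketch of this kind of check. It is not the tool's actual implementation, and the function names are hypothetical. It relies on the Git LFS pointer format: a small text file (under 1024 bytes per the spec) whose first line is `version https://git-lfs.github.com/spec/v1`.

```python
from pathlib import Path

# First line of a Git LFS pointer file, per the Git LFS pointer spec.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
# The spec requires pointer files to be smaller than 1024 bytes.
MAX_POINTER_SIZE = 1024


def is_lfs_pointer(path):
    """Return True if the file looks like a Git LFS pointer, not a real blob."""
    p = Path(path)
    if p.stat().st_size >= MAX_POINTER_SIZE:
        return False  # too large to be a pointer file
    with p.open("rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def ensure_real_blob(path):
    """Hypothetical guard: fail early instead of crashing later at load time."""
    if is_lfs_pointer(path):
        raise ValueError(
            f"{path} is a Git LFS pointer file; the actual blob was never fetched"
        )
    return path
```

Calling something like `ensure_real_blob("model.pt")` before `torch.load()` turns an opaque deserialization crash into an explicit error naming the real cause. It only checks the pointer signature. It does not verify the `oid sha256:` hash of a downloaded blob.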