

This should also be a reminder to everyone that you shouldn't be reliant on a single point of failure for your deploys. It's something that we in the Python community have already encountered (and hopefully learned from) due to the historical unreliability of our equivalent package repo, PyPI.

Have an internal repo that's accessible by your deploy servers, which in turn locally caches anything that you might have previously needed to externally fetch.


JonnieCache|13 years ago

    bundle package

puts all your app's dependencies in vendor/cache. That can then be put into a git submodule.

The problem then becomes the Gemfile and Gemfile.lock, which should really be in that submodule as well. You need to pass flags to bundler commands because it assumes the Gemfile is in the project root.

timr|13 years ago

I don't think Heroku's deploy is smart enough to recognize that you've packaged, right? It'll still try to bundle install, which would break in the current situation.

I think a full solution requires packaging, and using a modified buildpack that skips the bundle step.

simonw|13 years ago

The way we handle that for our Python deploys is to have a separate "deploy" git repo which includes complete .tar.gz files of all of our dependencies, then have our pip requirements.txt file point to those file paths rather than using external HTTP URLs.

To avoid packages sneakily trying to download their own dependencies from the internet, we run pip install with a "--proxy http://localhost:9999" argument (where nothing is actually running on that port) so that we'll see an instant failure if something tries to pull a dependency over the network.
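A minimal sketch of the dead-proxy trick described above, written as a small Python helper that only builds the pip command. The function name and default file name are hypothetical; `--proxy` and `-r` are standard pip options, and the port is the one mentioned in the comment.

```python
import sys


def offline_pip_command(requirements="requirements.txt",
                        dead_proxy="http://localhost:9999"):
    """Build a pip command that should fail fast on any network fetch.

    Assumption: nothing listens on dead_proxy, so a package that tries
    to download a dependency over HTTP errors out instead of quietly
    succeeding.
    """
    return [
        sys.executable, "-m", "pip", "install",
        "--proxy", dead_proxy,   # deliberately unreachable proxy
        "-r", requirements,      # entries are expected to be local .tar.gz paths
    ]
```

Passing the returned list to `subprocess.run(..., check=True)` would run the install and raise if pip exits with an error, which is the "instant failure" signal.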

Pewpewarrows|13 years ago

We do something very similar, but like you said there are the occasional sneaky devils trying to download their own dependencies. Nine times out of ten it seems like it's some version of distribute that they insist on fetching.

The nonexistent proxy trick seems useful; I'll have to try that out.

mapgrep|13 years ago

Indeed. I presume this is why Perl's package repository CPAN is actually a network of repositories ("Comprehensive Perl Archive Network"); Wikipedia says CPAN "is mirrored worldwide at more than 200 locations."

Does anyone know why rubygems does not work this way? I had always just assumed it did (due to the historical intertwining of Ruby and Perl communities).

rmoriz|13 years ago

The central architecture of rubygems allows you to publish and yank gems within minutes. CPAN takes some hours (and deletion may not be controlled).

Personally I'm a big fan of the CPAN approach as it is fairly simple. Just mirror via FTP. It's a no-brainer to set up and run a mirror.

That said, CPAN's master (PAUSE.cpan.org) is a SPOF as well.

What I like is that not a single party is responsible for paying server bills + maintaining the platform. Ruby Central and the team of volunteers do a great job, but in the end, people only care when something breaks.

Instead every big company/university that profits from the Ruby ecosystem should imho run a public rubygems mirror as a contribution to the open source world. That's common practice for other projects, too. Think of all mirrors of the Linux distributions, kernel.org, cpan, python etc.

=> http://slideshare.net/rmoriz/rubygems-behind-the-gems

I also want to mention that ftp.ruby-lang.org is a single-homed box. There is no other official mirror of the MRI/C-Ruby source that can be used as failover or load balancer. This is bad, too.

phillmv|13 years ago

If I had to guess, I would wager it's because it's expensive and hard. Plus there's the fortunate coincidence that - as far as I recall - rubygems has mostly Just Worked Fine, Thank You Very Much™.

(I miss the days from when github also hosted a gem repository…)

Solving the authenticity problem alone is probably not fun – tho obviously there is much to be learned from CPAN. Given recent problems there will probably be enough political will to make this happen in the future, though.

steiza|13 years ago

I only recently realized how easy it was to run your own PyPI - it just has to handle a few HTTP GET / POSTs.

If you want to run your own PyPI internally, here's a very simple PyPI server (~150 lines of Python) that I wrote: https://github.com/steiza/simplepypi
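To illustrate how little an index has to serve, here is a rough sketch (not taken from simplepypi) of rendering one project page in the "simple" index format that pip reads, which was later written down as PEP 503. The `../../packages/` link layout and the function names are assumptions for illustration only.

```python
import html
import re


def normalize(name):
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def render_project_page(project, filenames):
    """Return HTML for /simple/<project>/ with one link per archive.

    Assumption: archives are served from a sibling /packages/ directory.
    """
    links = "\n".join(
        '    <a href="../../packages/{0}">{0}</a><br>'.format(
            html.escape(name, quote=True))
        for name in sorted(filenames)
    )
    return (
        "<!DOCTYPE html>\n<html>\n  <body>\n"
        "    <h1>Links for {}</h1>\n{}\n  </body>\n</html>\n"
    ).format(html.escape(normalize(project)), links)
```

Serving pages like this over plain HTTP GET, plus the archive files themselves, covers the read side; uploads would be the POST side and are not shown.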

po|13 years ago

Also of interest is http://crate.io/

What I've personally been looking for is an easy-to-set-up caching proxy for PyPI. Something that is pip-compatible and serves files if it has them but will also fetch and then store packages if it doesn't. That way you could build up a collection of 3rd party packages over time, without having to explicitly manage it.

It probably wouldn't be hard to roll my own with a reverse proxy but it never gets moved to the front burner.
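The "serve it if we have it, otherwise fetch and store" behaviour can be sketched in a few lines of Python. This is only the cache logic for archive files, not a working pip-compatible proxy: the upstream URL is a placeholder, the function names are hypothetical, and index pages are not handled.

```python
import os
import urllib.request

UPSTREAM = "https://pypi.example.org"  # placeholder upstream, an assumption


def fetch_upstream(path):
    """Download path from the upstream server and return the raw bytes."""
    with urllib.request.urlopen(UPSTREAM + path) as response:
        return response.read()


def get_cached(path, cache_dir, fetch=fetch_upstream):
    """Return the file at path, fetching and storing it on a cache miss."""
    root = os.path.abspath(cache_dir)
    local = os.path.abspath(os.path.join(root, path.lstrip("/")))
    # Refuse request paths that would escape the cache directory.
    if os.path.commonpath([root, local]) != root:
        raise ValueError("path escapes cache directory: %r" % path)
    if os.path.isfile(local):
        # Cache hit: serve the stored copy without touching the network.
        with open(local, "rb") as f:
            return f.read()
    # Cache miss: fetch once, then store so later requests are local.
    data = fetch(path)
    os.makedirs(os.path.dirname(local), exist_ok=True)
    tmp = local + ".part"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, local)  # move into place only after a complete write
    return data
```

The `fetch` parameter is injectable so the cache logic can be exercised without network access; a real proxy would wrap `get_cached` in an HTTP handler.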

tobych|13 years ago

We're trying out DjangoPyPI 2 to host our own PyPI. Seems very actively maintained, and works a treat, despite it being still early days.

http://djangopypi2.readthedocs.org/en/latest/

For now though we'll probably just create a new git repo with a folder full of source distros (tarballs and zips), as mentioned above.

rykov|13 years ago

Gemfury also supports private Python packages

profquail|13 years ago

That's a very good point to make.

.NET developers, you can set up a similar cache for NuGet packages to avoid downtime (and reduce bandwidth usage): http://www.hanselman.com/blog/HowToAccessNuGetWhenNuGetorgIs...

kawsper|13 years ago

Is bandwidth usage really a concern?

sumone4life|13 years ago

They still allow you to deploy; you just have to explicitly set a variable in the deploy command so they know you are aware of what's going on.

tlrobinson|13 years ago

The point was that unless you had previously cached all your gems somewhere, you'd have to deploy using potentially compromised gems from rubygems.

splatcollision|13 years ago

Is this safe if you haven't changed any gems since the last deploy? I have a bugfix that I would like to deploy...

stock_toaster|13 years ago

  >  It's something that we in the Python community have already learned due to the historical unreliability of our equivalent package repo, PyPI.

"Learned" sounds a touch condescending to me for some reason. The Python community has certainly run into it, but (anecdote time) in my experience people still often rely on PyPI for their deploys (but use the --mirrors option to pip). "Encountered" may be more appropriate.

pekk|13 years ago

I think you are being oversensitive about a tiny difference of wording, maybe due to some prior history with the Python community?

Pewpewarrows|13 years ago

True, "learned" does sort of imply that it's a best practice now used by nearly everyone in the community. I know that's far from the truth. "Encountered" is more appropriate, so I'll edit my OP.

acdha|13 years ago

I read “learned” as in “learned the hard way”, which definitely has a different feel.