Being able to search code so efficiently across the whole company’s code base is a huge productivity gain. But I’ve found that people who haven’t worked at Google or at other companies with this capability generally don’t believe it’s a big deal.
Well, you need a fairly large codebase with a fair amount of in-house infrastructure and interdependencies before you have both the need to find code you're not working on and a source tree dense enough that search makes more sense than just learning the organization and navigating to where you need to go.
Very nice. Reading that makes me miss Google’s internal dev environment. I may not use Google Search and Gmail much anymore, but Google’s paid products like GCP (and Google Play Music, Movies, etc.) are awesome. Also, amazing permanent free tier levels. I need to use GitHub for the repos for my books because they are already linked, but for my personal private repos I think I may switch.
Could you share why, as a former Google dev, you no longer use Google Search and Gmail? I'm curious why you mentioned that in response to a post about cloud source repositories. Thanks.
Nice. Is the code search the same as what Google has internally? I do miss that.
(Though honestly, the value from the internal code search was that everything was in there. So if you got an error message from a service you were a client of, you could just search for the error message in their code and see what it actually meant. "Aha, this 'optional' field that's marked 'deprecated' isn't actually optional or deprecated," most of the time ;)
The code search functionality is the same as what Google engineers use internally. Cloud Source Repositories uses the same indexing and retrieval technologies on the same type of infrastructure.
As you mention, the more code you have, the more benefit you get from having fast search tools across your entire code base that can perform complex semantic and regular expression queries. Even with smaller codebases, I find it the fastest way to find the code I need.
(Disclaimer: I work at Google and am the PM on this product)
Having glanced at it a bit, I don’t see how this can be based on internal code search / Kythe. Kythe works by building your code. That’s how it can find callers and specializations and whatnot. This doesn’t seem to.
I was going to make this same joke. It's hard to put faith into a product when Google keeps killing them off. Wave and Google Code are two top of mind products that I missed when they were gone.
Conversely, I think it's important for Google engineers to be able to try new things and publish products without having to worry about supporting them forever. Is there some sort of happy medium? Like a Google labs of products that are in various states of whimsical testing?
Cloud Source looks nice, and for GCP projects I'm sure it will have a nice tie-in, but for now I'm 100% more likely to push code to GitLab until it makes sense to do otherwise.
This looks interesting, but it's impractical for me to mirror hundreds of repos here just to get better search.
Has anyone played with this or similar tools that provide search by indexing at the source? It’s a little surprising that Google doesn’t just index everything public on GitHub, GitLab, etc.
I would like a good code search solution, especially because I have projects across lots of different repo servers but want developers to be able to find code regardless of its repo home. A colleague indexed it with Solr and that was OK, but it had limited semantic ability and definitely nothing like what Google describes, where each user's recent search history and activity is displayed.
Thanks for taking a look. I'm glad you find Cloud Source Repositories interesting. We'd love to make Cloud Source Repositories more practical to use. Can you explain what is impractical for you? Is it the initial process of setting up multiple mirrors? Or the routine of having to go to another site for search?
(Disclaimer: I work at Google and am the PM on this product)
For those that haven't used Google's code search before, I believe this will be very similar to Chromium's code search[0] (which is based on Google's internal one).
> For Java, JavaScript, Go, C++, Python, TypeScript and Proto files, you’ll see result suggestions indicating whether the match is an entity such as a class, method, enum, or field.
One nice thing is how trivial it is to mirror a repository from Bitbucket or GitHub. I wish it were as easy to mirror a repository from Bitbucket to GitHub or from GitHub to Bitbucket.
Quickly glanced through the blog post and the docs but didn’t find any mention of Git LFS. Does Cloud Source Repositories support mirroring LFS content?
Cloud Source Repositories currently supports very large repositories. The same backend scales to the needs of the Android open source project, which regularly checks in massive binary files. There are many APKs and VM images in the Git repositories we host. However, Cloud Source Repositories does not support LFS or the mirroring of LFS content.
LFS is not deeply integrated into Git, which creates usability problems. Because the content is not part of the object graph of the repository, you have to decide in advance to use Git LFS, so you don't get the benefit in existing repositories with large files. You also can't back out: once you're using Git LFS on a medium-sized file, you can't change your mind and instruct Git to send it inline in fetches without rewriting history, which breaks existing clients that have cloned the repository.
For the same reason, history mining commands like "git blame" and "git log -S" don't have access to the object.
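To make that concrete, here is a small runnable sketch (the file name and oid below are made up; the pointer format is the one documented in the Git LFS spec). It shows that history-mining commands only ever see the tiny pointer text, never the real data:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q demo && cd demo

# What Git stores for an LFS-tracked file is a tiny pointer like this
# (format per the Git LFS spec), not the file's real contents:
cat > model.bin <<'EOF'
version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345
EOF
git add model.bin
git -c user.name=demo -c user.email=demo@example.com commit -qm "add model"

# History-mining commands search the pointer text, not the data:
git log -S "sha256" --oneline                  # finds the commit (pointer matched)
git log -S "actual model contents" --oneline   # finds nothing
```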
In addition, it complicates migrating to another host. Usually in Git, you can take out your content by running "git clone --mirror one-url" and then "git -C directory.git push --mirror another-url". With Git LFS, this copies over the pointer files but not the underlying large file content, and you must remember to take extra steps to tell the new host where the blobs are stored.
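A minimal local sketch of that migration path (the "old-host" and "new-host" directories stand in for real host URLs):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"

# Stand-in "old host": a repository with one commit.
git init -q old-host
( cd old-host
  echo hello > README.md
  git add README.md
  git -c user.name=demo -c user.email=demo@example.com commit -qm "initial commit" )

# Migrate: a mirror clone grabs every ref, and a mirror push replays
# them all onto the new host.
git clone -q --mirror old-host migration.git
git init -q --bare new-host.git
git -C migration.git push -q --mirror "$tmp/new-host.git"

# The full history arrived. With Git LFS in play, only the pointer
# files would have made this trip, not the large blobs behind them.
git -C new-host.git log --oneline --all
```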
Google's Cloud Source Repositories team believes that Git itself needs to deal better with large files. The first step of this work within the Git project has been partial clone, which is supported in Cloud Source Repositories and in the public Git 2.17 release (for best results, please use Git 2.19, released in September 2018). If you run "git clone --filter=blob:limit=512M <url>", files larger than 512M will be omitted from the initial clone and fetched automatically on demand when needed (for example, during checkout operations). See https://crbug.com/git/2 for more details about this feature. We are continuing to work with the community on adding other features related to large file support into Git.
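A small local sketch of partial clone in action, assuming Git 2.19+ and a throwaway repository (the 100k threshold and file names are illustrative):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"

# Stand-in "server" repository with one large file and one small one.
git init -q server
( cd server
  head -c 1048576 /dev/zero > big.bin   # 1 MiB stand-in for a large asset
  echo small > small.txt
  git add big.bin small.txt
  git -c user.name=demo -c user.email=demo@example.com commit -qm "add files"
  git config uploadpack.allowFilter true )   # allow clients to request filters

# Partial clone: blobs over 100 KB are left out of the initial transfer.
git clone -q --no-checkout --filter=blob:limit=100k "file://$tmp/server" client

# One object (the big blob) is missing locally...
git -C client rev-list --objects --missing=print HEAD | grep -c '^?'

# ...and is fetched on demand the moment the working tree needs it.
git -C client checkout -q HEAD -- .
git -C client rev-list --objects --missing=print HEAD | grep -c '^?' || true
```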
(Disclaimer: I work at Google and am the PM on this product)
We've generally heard that developers move from Bitbucket to Cloud Source Repositories because they want to use other Google Cloud services and it's simpler for them to manage their source in the same place where they debug, build and deploy that code. Developers often mention that they appreciate the unified identity, IAM permissions, and integrations that Cloud Source Repositories has with Cloud Shell, Cloud Build, Cloud Debugger and other Google Cloud services.
However, developers don't need to move from Bitbucket to take advantage of the utility provided by Cloud Source Repositories. You can mirror any number of your repositories into Cloud Source Repositories and take advantage of the code browser, code search, and various integrations with Google Cloud Platform.
(Disclaimer: I work at Google and am the PM on this product)
GCP has been able to host your repo since before the turndown of Google Code. In fact, the original GCP Source Repo team and the Google Code team were the same team (though the current product has a different backend and team).
I'm glad you like it. Which languages would you like to see support added for?
Note that search works today across all languages, but the semantic understanding of source that enhances search is limited to Java, JavaScript, Go, C++, Python, TypeScript and Proto files.
(Disclaimer: I work at Google and am the PM on this product)
I believe the difference is that this is part of GCP, and GCP's terms for GA products are to support deprecated versions/products for at least one year rather than instantly removing them.
Well, permanent until they kill the entire service 5 years from now.
GitHub excludes a few patterns in robots.txt, which means a lot of the public data is never seen by crawlers that respect robots.txt; see https://github.com/robots.txt. They probably want people landing mainly on:
- user profiles
- repo roots
- individual issues
- individual pull requests
And I am glad they do, because when I search on Google, the above mentioned kinds of results are exactly what I am looking for, not individual source files etc.
[0] https://cs.chromium.org/
Apparently they forgot about Flutter users.