GitHub Turned into an Enterprise Under Microsoft?

10 points | aliostad | 6 years ago

I was asked to remove a training file from my deep learning language detection repo (only 64 stars, but still). The repo used deep learning to detect the programming language of a file or snippet. The training files and snippets were harvested from public content on GitHub and Stack Overflow. The repo was taken down even after I removed the file from the git history. More info and screenshots here: https://twitter.com/aliostad/status/1222440190821781506?s=20

9 comments

zegerjan | 6 years ago
You couldn't delete the blob because someone forked your repository, and GitHub uses git alternates to deduplicate objects across fork networks. The blob stays in the shared object store for as long as any fork still references it.

I think you could ask GitHub if you can recreate your repository without the offending blob, and you should be good again.
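
To make the claim about shared storage concrete, here is a toy model in Python. It is only a sketch of the idea described above, not GitHub's implementation: the `ObjectStore`, `Repo` and `gc` names are hypothetical, and a real git repository tracks reachability through refs and commits rather than a plain set.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store shared by every repo in a fork network."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        oid = hashlib.sha1(data).hexdigest()
        self.blobs[oid] = data
        return oid


class Repo:
    """A repo that borrows its objects from a shared store (hypothetical)."""

    def __init__(self, store: ObjectStore):
        self.store = store
        self.reachable = set()  # object ids reachable from this repo's refs


def gc(store: ObjectStore, repos) -> None:
    """Drop only the objects that no repo in the network can still reach."""
    live = set().union(*(r.reachable for r in repos))
    for oid in list(store.blobs):
        if oid not in live:
            del store.blobs[oid]


store = ObjectStore()
upstream = Repo(store)
fork = Repo(store)

oid = store.put(b"offending snippet")
upstream.reachable.add(oid)
fork.reachable.add(oid)          # the fork was made before the rewrite

upstream.reachable.discard(oid)  # upstream rewrites history and force-pushes
gc(store, [upstream, fork])

still_present = oid in store.blobs  # the fork keeps the blob alive
```

In this model the blob only disappears once every repo sharing the store has dropped it, which is why recreating the repository outside the fork network is the suggested way out.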

tastroder | 6 years ago
github.com was always a corporate entity and subject to DMCA takedowns.

The notice link so others don't have to hand-type it: https://github.com/github/dmca/blob/master/2020/01/2020-01-2...

I feel like that other twitter user / BSA / IBM as the originators of that takedown notice are more useful targets of animosity here.

aliostad | 6 years ago
Lack of communication and courtesy - disrespect for the public good. I am happy to remove the mention per his request.

aliostad | 6 years ago
Here is the terminal output of what I did to remove the file from the git history:

~/g/aliostad bfg --delete-files 1703 deep-learning-lang-detection.git

Using repo : /Users/alikheyrollahi/github/aliostad/deep-learning-lang-detection.git

Found 72811 objects to protect
Found 2 commit-pointing refs : HEAD, refs/heads/master

Protected commits
-----------------

These are your protected commits, and so their contents will NOT be altered:

* commit ac12aa68 (protected by 'HEAD') - contains 8 dirty files :
    - data/stackoverflow-snippets/cpp/1703 (3.0 KB)
    - data/stackoverflow-snippets/csharp/1703 (835 B)
    - ...

WARNING: The dirty content above may be removed from other commits, but as the protected commits still use it, it will STILL exist in your repository.

Details of protected dirty content have been recorded here :

/Users/alikheyrollahi/github/aliostad/deep-learning-lang-detection.git.bfg-report/2020-01-27/22-24-03/protected-dirt/

If you really want this content gone, make a manual commit that removes it, and then run the BFG on a fresh copy of your repo.

Cleaning
--------

Found 69 commits
Cleaning commits: 100% (69/69)
Cleaning commits completed in 304 ms.

Updating 1 Ref
--------------

Ref                 Before     After
---------------------------------------
refs/heads/master | ac12aa68 | c51406cc

Updating references: 100% (1/1)
...Ref update completed in 13 ms.

Commit Tree-Dirt History
------------------------

Earliest                                              Latest
|                                                          |
.................................................DDDDDDDDDDm

D = dirty commits (file tree fixed)
m = modified commits (commit message or parents changed)
. = clean commits (no changes to file tree)

                         Before     After
 -------------------------------------------
 First modified commit | a4a1bbac | cb32cfbf
 Last dirty commit     | 45322921 | 6b9e8d5d

Deleted files
-------------

Filename   Git id
---------------------------------------------------
1703     | 530293d7 (614 B), 98c9b646 (3.0 KB), ...

In total, 47 object ids were changed. Full details are logged here:

/Users/alikheyrollahi/github/aliostad/deep-learning-lang-detection.git.bfg-report/2020-01-27/22-24-03

BFG run is complete! When ready, run:
git reflog expire --expire=now --all && git gc --prune=now --aggressive

-- You can rewrite history in Git - don't let Trump do it for real! Trump's administration has lied consistently, to make people give up on ever being told the truth. Don't give up: https://www.aclu.org/ --

~/g/aliostad cd deep-learning-lang-detection.git
~/g/a/deep-learning-lang-detection.git git reflog expire --expire=now --all && git gc --prune=now --aggressive
Enumerating objects: 89539, done.
Counting objects: 100% (89539/89539), done.
Delta compression using up to 8 threads
Compressing objects: 100% (89537/89537), done.
Writing objects: 100% (89539/89539), done.
Total 89539 (delta 28336), reused 61123 (delta 0)
~/g/a/deep-learning-lang-detection.git git push
Enter passphrase for key '/Users/alikheyrollahi/.ssh/id_rsa':
Enumerating objects: 89539, done.
Counting objects: 100% (89539/89539), done.
Delta compression using up to 8 threads
Compressing objects: 100% (61201/61201), done.
Writing objects: 100% (89539/89539), 40.83 MiB | 1.01 MiB/s, done.
Total 89539 (delta 28336), reused 89539 (delta 28336)
remote: Resolving deltas: 100% (28336/28336), done.
To github.com:aliostad/deep-learning-lang-detection.git
 + ac12aa680...c51406cc8 master -> master (forced update)
~/g/a/deep-learning-lang-detection.git cd ..
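
One thing worth noting in the output above: the BFG warns that the HEAD commit is protected and still contains the dirty files, and it advises making a manual commit that removes them before running again. The Python sketch below is a toy model of that behaviour only. `clean_history` is a hypothetical helper that treats each commit as a set of paths; it does not call the BFG or git, and the three-commit history is made up for illustration.

```python
def clean_history(commits, filename, protected_index=-1):
    """Return a copy of the history with `filename` removed from every
    commit except the protected one (by default the last, i.e. HEAD)."""
    protected = protected_index % len(commits)
    cleaned = []
    for i, tree in enumerate(commits):
        if i == protected:
            cleaned.append(set(tree))  # protected: contents left untouched
        else:
            cleaned.append(
                {p for p in tree if p.rsplit("/", 1)[-1] != filename}
            )
    return cleaned


# Illustrative history, oldest first; the last entry plays the role of HEAD.
history = [
    {"README.md"},
    {"README.md", "data/stackoverflow-snippets/cpp/1703"},
    {"README.md", "data/stackoverflow-snippets/cpp/1703"},
]

# First pass: HEAD is protected, so the file survives there.
first_pass = clean_history(history, "1703")

# Follow the advice in the output: add a manual commit that removes the
# file, so the new HEAD is already clean, then run the cleaner again.
with_removal = history + [{"README.md"}]
second_pass = clean_history(with_removal, "1703")
```

In this model the file is still present in `first_pass[-1]` but absent from every commit in `second_pass`, which matches the "protected dirty content" warning in the log.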