dlkmp | 11 months ago

Can't help but think how handy PowerShell is out of the box for tasks like this.

Translating the examples from the ReadMe, having read the file with:

  $medias = Get-Content .\medias.csv | ConvertFrom-Csv
Previewing the file in the terminal

  xan view medias.csv
  $medias | Format-Table
Reading a flattened representation of the first row

  xan flatten -c medias.csv
  $medias | Format-List
Searching for rows

  xan search -s outreach internationale medias.csv | xan view
  $medias | Where-Object { $_.outreach -eq "internationale" } | Format-Table
Selecting some columns

  xan select foundation_year,name medias.csv | xan view
  $medias | Select-Object -Property foundation_year, name | Format-Table
Sorting the file

  xan sort -s foundation_year medias.csv | xan view -s name,foundation_year
  $medias | Sort-Object -Property foundation_year | Select-Object -Property name, foundation_year | Format-Table
Deduplicating the file on some column

  # Some medias of our corpus have the same ids on mediacloud.org
  xan dedup -s mediacloud_ids medias.csv | xan count && xan count medias.csv
  $medias | Select-Object -ExpandProperty mediacloud_ids -Unique | Measure-Object; $medias | Measure-Object -Property mediacloud_ids
Computing frequency tables

  xan frequency -s edito medias.csv | xan view
  $medias | Group-Object -Property edito | Sort-Object -Property Count -Descending
It's probably orders of magnitude slower, and of course, plotting graphs and so on gets tricky. But for the simple type of analysis I typically do, it's fast enough, I don't need to learn an extra tool, and the auto-completion of column/property names is very convenient.
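
One caveat on the dedup example above: `Select-Object -ExpandProperty mediacloud_ids -Unique` returns only the distinct column values, not whole rows. That is fine for counting, but if you want the deduplicated rows themselves, something like the following might work. This is a minimal sketch; the inline sample data is made up and only stands in for medias.csv.

```powershell
# Hypothetical sample data standing in for medias.csv (not the real corpus)
$medias = @'
name,mediacloud_ids
A,1
B,1
C,2
'@ | ConvertFrom-Csv

# Group rows by mediacloud_ids and keep the first full row of each group
$deduped = $medias |
    Group-Object -Property mediacloud_ids |
    ForEach-Object { $_.Group[0] }

$deduped.Count   # 2 rows remain after deduplication
$medias.Count    # 3 rows in the original sample
```

Because `$deduped` still holds full objects, it can be piped on to `Format-Table` or `Export-Csv` like the other examples.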

account-5|11 months ago

I find Nushell even better for these use cases:

    let medias = open .\medias.csv
The above does the initial read and parses the file into a table.

I'm currently on my phone so can't go through all the examples, but knowing both PS and nu, nu has the better syntax.

EDIT:

Get data and view in table:

    let $medias = http get https://github.com/medialab/corpora/raw/master/polarisation/medias.csv
    $medias
Get headers:

    $medias | columns
Get count of rows:

    $medias | length
Get flattened, slightly more convoluted (caveat: there might be a better way):

    $medias | each {print $in}
Search rows:

    $medias | where $it.outreach == 'internationale'
Select columns:

    $medias | select foundation_year name
Sort file:

    $medias | select foundation_year name | sort-by foundation_year
Dedup based on column:

    $medias | uniq-by mediacloud_ids
Computing frequency and histogram:

    $medias | histogram edito

SwamyM|11 months ago

Yes, I find PowerShell is criminally underrated for these types of tasks. Even though it's open source and cross-platform, the stigma from its Windows-centric days is hard to overcome.