mdouglas_1's comments

mdouglas_1 | 2 years ago | on: Killings in the U.S. are dropping at a historic rate. Will anyone notice?

umm...

I beg to differ on this. The state I grew up in, KY (don't laugh!), just released some information regarding Louisville, the largest city in the state. Someone, somewhere -- no one knows who or how -- reported vastly underreported homicide data to the state, which made it appear that the city/state murder rates had declined.

THE RATES HAVE INCREASED!!

The proliferation of gun violence is a bit maddening. Couple that with a lack of cops, and you have a pretty good chance of getting away with murder. I think the city is on track for 200+ murders. It would be significantly more if not for the docs getting so good at saving gunshot victims.

Perhaps there should be a subsequent article on getting the input data right!

garbage in... garbage out

mdouglas_1 | 6 years ago | on: How to Post a Freelance Job

Hi gus_massa

Thanks for the reply. I'm confused.

The URL 'https://news.ycombinator.com/submitted?id=whoishiring' lists a number of jobs.

I see in the top/dashboard the link to 'submit'.

I have no clue how, where, or in what format to "post" a freelance opportunity!

So I'm missing something subtle here... And given the questions I've seen on Google, others are missing it as well!

thanks to all for the help!

mdouglas_1 | 7 years ago | on: Ask HN: Who is hiring? (July 2018)

Personal Project - Self-funded Cali/Fl - Remote is fine as long as results are obtained

Project Duration/Compensation - Less than a week - TBD (less than $1K)

I'm posting this here in the hopes that short-term projects are ok/valid and welcome. (If not, I apologize.)

A crawling project is running into consistency issues with the returned data/content. The crawler targets a dynamic site (no curl/wget), requiring a headless browser solution. The apparent issue: the crawler hits intermittent errors and, as a result, returns inconsistent content. However, if the process iterates/loops, it will eventually get the correct content. The test URLs work with a live browser (FF/Chrome/etc.) and return the result in a few secs. The test crawler often takes minutes!

The current stack for the crawler -- Centos7/Py/Selenium/Chrome (headless)

I'm looking for someone with serious skills in headless browser crawling and a deep/thorough understanding of the issues that can come up. The goal is to have the crawler return the correct results in a minimum amount of time.

Current possible issues to investigate/solve/handle:

- Gateway timeout issues
- Page-not-found issues
- Other incorrect/weird content!
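
To make the failure mode concrete, here is a minimal sketch of the retry-until-valid loop the post describes, on the stated stack (Python/Selenium/headless Chrome). The validity checks, retry limits, and backoff are hypothetical placeholders, not the project's actual logic:

    # Rough sketch: retry until the page content looks valid.
    # Checks and limits below are assumed, not from the real project.
    import time
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    def fetch_with_retries(url, max_attempts=5, timeout=30):
        opts = Options()
        opts.add_argument("--headless")
        opts.add_argument("--disable-gpu")  # often needed on headless Linux
        for attempt in range(1, max_attempts + 1):
            driver = webdriver.Chrome(options=opts)
            driver.set_page_load_timeout(timeout)
            try:
                driver.get(url)
                html = driver.page_source
                # Assumed checks for the failure modes listed above.
                if "504 Gateway" in html or "Page Not Found" in html:
                    raise ValueError("error page on attempt %d" % attempt)
                if len(html) < 1000:  # assumed threshold for truncated content
                    raise ValueError("suspiciously short content")
                return html
            except Exception as exc:
                print("attempt %d failed: %s" % (attempt, exc))
                time.sleep(2 ** attempt)  # back off before retrying
            finally:
                driver.quit()
        raise RuntimeError("no valid content after %d attempts" % max_attempts)

Whether a simple backoff loop like this is enough, or the inconsistency is server-side (rate limiting, load-balancer flakiness), is exactly the kind of thing the freelancer would need to diagnose.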

I'm also willing to contemplate that a consistent crawl can't be achieved, but I'm fairly certain the goal can be accomplished.

If anyone wants to reply for more information, or to discuss, feel free to ping me and let's see what happens.

Thanks

-bruce [email protected]

mdouglas_1 | 8 years ago | on: Ask HN: Freelancer? Seeking freelancer? (June 2017)

SEEKING FREELANCER - I'm in the US -- Anytime is good - Remote is okay as long as we communicate

Got a project for scraping. I'm testing casperjs/selenium. However, we/I are curious whether a faster process can be implemented by using the browser (Chrome/Firefox) directly.

Here's the thought process:

- We have a file with complete URLs. (These URLs could be inserted into the browser manually, and the resulting content generated.)
- We have an extension that runs within the browser. The extension can be "triggered/fired" to read the URLs from the external file, extract the resulting content (dynamic or static), and write the content to an external file.
- Ideally, the whole process would be able to use multiple tabs within the browser to iterate through the list as fast as possible. (A rough sketch of that iteration follows below.)
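
For comparison with the selenium route already being tested, here is a rough Python/Selenium sketch of the same read-URLs/extract/write loop using multiple tabs. The extension approach itself would be browser-side JavaScript; the file names, batch size, and fixed wait below are hypothetical placeholders:

    # Hypothetical sketch: iterate a URL list through multiple browser tabs
    # and write the rendered content out. Assumes the webdriver profile
    # allows script-opened tabs.
    import time
    from selenium import webdriver

    URLS_FILE = "urls.txt"      # assumed: one complete URL per line
    OUTPUT_FILE = "content.txt"
    BATCH_SIZE = 5              # tabs to keep open at once

    driver = webdriver.Firefox()
    with open(URLS_FILE) as f:
        urls = [line.strip() for line in f if line.strip()]

    with open(OUTPUT_FILE, "w") as out:
        for i in range(0, len(urls), BATCH_SIZE):
            # Open each URL in the batch in its own tab.
            for url in urls[i:i + BATCH_SIZE]:
                driver.execute_script("window.open(arguments[0]);", url)
            time.sleep(5)  # crude wait for dynamic content to render
            # Visit each new tab, grab its content, and close it.
            for handle in driver.window_handles[1:]:
                driver.switch_to.window(handle)
                out.write(driver.page_source + "\n")
                driver.close()
            driver.switch_to.window(driver.window_handles[0])

    driver.quit()

A real extension-based version could presumably cut out the WebDriver round-trips entirely, which is the speed question the post is asking a freelancer to scope.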

This is obviously a short-term project. We're looking to talk to someone ASAP to see if this is even doable, and then to scope out the process.

This is (hopefully) the 1st of a number of projects we/I have a need for!

If you're interested, send me your contact data to [email protected]

thanks -bruce
