HN Jobs

A searchable index of Hacker News “Who is hiring?” job postings.


Personal Project - Self-funded

Company: Personal Project - Self-funded
Location: Remote
Salary: TBD, less than $1K total (project duration under a week)
Apply via: Email (badouglas@gmail.com)
Posted by: mdouglas_1
Posted: Jul 11, 2018
Source: Hacker News

Original posting

Personal Project - Self-funded
Cali/Fl - Remote is fine as long as results are obtained
Project Duration/Compensation - Less than a week - TBD (less than $1K)

I'm posting this here in the hope that short-term projects are valid and welcome. (If not, I apologize.)

A crawling project is running into consistency issues with the returned data/content. The crawler targets a dynamic site (curl/wget won't work), so it requires a headless browser solution.

The apparent issue: the crawler runs into "issues" and, as a result, returns inconsistent content. However, if the process iterates/loops, it will eventually get the correct content. The test URLs work in a live browser (Firefox, Chrome, etc.) and return the result in a few seconds. The test crawler often takes minutes!

The current stack for the crawler: CentOS 7, Python, Selenium, Chrome (headless).

I'm looking for someone with serious skills in headless browser crawling and a deep, thorough understanding of the issues that can come up when crawling. The goal is to have the crawler return the correct results in a minimum amount of time.

Current possible issues to investigate/solve/handle:
-Gateway Timeout Issues
-Page Not Found Issues
-Other Incorrect/Weird Content!

I'm also willing to contemplate that a consistent crawl can't be achieved, but I'm fairly certain the goal can be accomplished.

If anyone wants to reply for more information, or to discuss, feel free to ping me and let's see what happens.

Thanks
-bruce
badouglas@gmail.com
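
Illustrative note (not part of the original posting): the behavior the poster describes, looping until the correct content comes back, could be sketched roughly as below. This is only a minimal sketch under stated assumptions, not the poster's code. The names `looks_valid` and `fetch_with_retry`, the marker strings, and the backoff values are all hypothetical; a real check would depend on what the target site returns when it fails.

```python
import time
from typing import Callable, Optional


def looks_valid(html: str) -> bool:
    """Hypothetical check: reject empty pages and the failure modes
    named in the posting (gateway timeouts, page-not-found pages)."""
    lowered = html.lower()
    bad_markers = ("gateway timeout", "page not found")  # assumed marker text
    return bool(html.strip()) and not any(m in lowered for m in bad_markers)


def fetch_with_retry(
    fetch: Callable[[str], str],
    url: str,
    is_valid: Callable[[str], bool] = looks_valid,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch(url) until the content validates or attempts run out.

    Returns the first valid page, or None if every attempt failed.
    """
    for attempt in range(max_attempts):
        try:
            html = fetch(url)
        except Exception:
            html = ""  # treat a driver/network error as a failed attempt
        if is_valid(html):
            return html
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return None
```

With the stack mentioned in the posting, `fetch` could be a small function that calls a Selenium driver's `get(url)` and returns its `page_source`. Keeping the fetch step as a plain callable means the retry and validation logic can be exercised without a browser.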