HN Jobs

A searchable index of Hacker News “Who is hiring?” job postings.

Zyte

Senior Data Scientist

Company: Zyte
Website: zyte.com
Role: Senior Data Scientist
Type: Full-time
Role taxonomy: Data / Analytics, Senior
Specialties: Data Science
Location: Remote
Salary: not listed
Apply via: Application link, https://apply.workable.com/zyte/
Hiring notes: none
Tech: Node.js, Python, ML/AI
Posted by: lopuhin
Posted: Jan 3, 2022
Source: Hacker News

Original posting

Zyte | Senior Data Scientist | REMOTE | Full-time | https://www.zyte.com/

At Zyte (formerly Scrapinghub), our goal is to help you get data from the web, so we develop services such as smart rotating proxies, a browser rendering API, a data extraction API, a cloud for running your crawling jobs, etc.

On the data science team, the main project we work on is the data extraction API, which can extract articles, products, job postings and other data types from any website, and can also do automatic crawling and discovery. We approach this as a machine learning problem, with a deep learning model combining web page screenshots, text, node information and other features, trained on hundreds of thousands of web pages. We work on improving the quality of extraction and increasing coverage of attributes and data types.

I find this problem really fascinating to work on: on the one hand, you get to work on a neural network that takes images, text and graphs as inputs and can draw inspiration from the current ML literature; on the other hand, web extraction is not so well studied, and a great deal of experimentation is required.

Our tech stack on the ML side is Python and PyTorch. We love open source: Zyte's founders are the authors of the popular Scrapy framework, and we open-source many of the libraries we rely on heavily internally, such as dateparser and extruct.

The company has been fully remote since the start and hires from a large number of countries. Please see more details at https://wrkbl.ink/iNsRDym, and feel free to check other positions at https://apply.workable.com/zyte/