HN Jobs

A searchable index of Hacker News “Who is hiring?” job postings.


Internet Archive

Data Engineer

Company: Internet Archive
Website: archive.org
Roles:
  • Data Engineer
  • Turn researcher Jupyter notebooks into robust systems
Type: full-time
Role taxonomy: Data / Analytics
Specialties: Data Engineering
Location: Remote (US or Canada)
Salary: not listed
Apply via: Email (avdempsey@archive.org)
Hiring notes: not listed
Tech: Python, Scala, ML/AI
Regions: US, Canada
Posted by: avdempsey
Posted: Dec 1, 2022
Source: Hacker News

Original posting

Internet Archive | Data Engineer | Remote (US, CA) | Full-Time | archive.org

Internet Archive is a non-profit building a free library of all of the published works of humanity to share with the world. We're not there yet, but we've managed to accumulate some data along the way. Can you help us engineer it?

The Archiving and Data Services department provides services to mission-aligned organizations (primarily other libraries and cultural heritage institutions). These services include: web crawling SaaS, managed large-scale crawls, long-term digital preservation, and particularly relevant for this role: making use of these web archives and digital collections.

We're looking for a Data Engineer to help us with some of the following:

- Turn researcher Jupyter notebooks into robust systems (these notebooks are mostly in Scala)
- Develop data munging/wrangling/deriving workflows (we use Spark and Temporal.io)
- Help administrate a 7.5 Petabyte Hadoop cluster
- Potentially write jobs for our main, in-house long term storage cluster
- There's always APIs that need work (these are mostly in Python)
- ML experience is an interesting bonus

We're fully remote, employees can be based anywhere in US or Canada.

This is a new opening as of Dec 1, so new we're still working on getting it posted. If interested, please reach out to Alex at avdempsey [at] archive [dot] org.