HN Jobs

A searchable index of Hacker News “Who is hiring?” job postings.

Common Crawl Foundation

Company: Common Crawl Foundation
Website: commoncrawl.org
Type: Full-time and part-time
Location: Remote
Salary:
Apply via: Application link (https://github.com/commoncrawl/whirlwind-python/)
Hiring notes:
Tech: Python, Java, AWS
Posted by: ccgreg
Posted: Nov 1, 2024
Source: Hacker News

Original posting

Common Crawl Foundation | REMOTE | Full and part-time | https://commoncrawl.org/ | web datasets

I'm the CTO at the Common Crawl Foundation, which has a 17 year old, 9 petabyte crawl & archive of the web. Our open dataset has been cited in nearly 10,000 research papers, and is the most-used dataset in the AWS Open Data program. Our organization is also very active in the open source community.

We are expanding our engineering team. We're looking for people who are:

* Excited about our non-profit, open data mission
* Proficient with Python, and hopefully also some Java
* Proficient at cloud systems such as Spark/PySpark
* Willing to learn.

Our current team is composed of engineers who do some data science, and data scientists who do some engineering. We are focused on improving our crawl, making new data products, and using these new data products to improve our crawl.

If you'd like a little tour of what our data looks like, please see https://github.com/commoncrawl/whirlwind-python/

Interested? Contact us at jobs zat commoncrawl zot org. Please include a cover letter addressing the above points. Thank you for your interest!