31: Crawling the Web using Elixir with Oleg Tarasenko and Tze Yiing
Thinking Elixir Podcast - Podcast by ThinkingElixir.com - Tuesdays

We talk with Oleg Tarasenko and Tze Yiing about crawling the web using Elixir. Oleg created the Crawly project to help solve this problem, and Tze Yiing joined him as a contributor and maintainer. We cover why Elixir is well suited to orchestrating crawls, how to deal with login pages, the legal concerns, building a codeless scraper, and much more!

Show Notes online - http://podcast.thinkingelixir.com/31

Elixir Community News
https://dashbit.co/blog/ten-years-ish-of-elixir – January 9th marked 10 years since the first commit to the Elixir repository
https://github.com/elixir-lang/elixir/commit/337c3f2d569a42ebd5fcab6fef18c5e012f9be5b – First commit on the repository
https://twitter.com/josevalim/status/1349010127270129670 – Jose Valim reveals that his secret project is called 'Nx'
https://remote.com/blog/welcoming-elixir-creator-jose-valim – Jose Valim joins Remote as a Technical Advisor
https://twitter.com/josevalim/status/1347858475267854336 – ExUnit will catch the SIGQUIT signal sent by CTRL+\ and show the tests that were running
https://github.com/elixir-lang/elixir/blob/master/lib/mix/lib/mix/tasks/test.ex#L34 – ExUnit will print how much time the test suite spent on async tests vs. sync tests
https://twitter.com/fhunleth/status/1348092050487570433 – Nerves support on the M1 is looking good
https://www.youtube.com/playlist?list=PLqj39LCvnOWZl_Pb0Y7wGWijKbTvL4gJg – ElixirConf 2020 videos have all been publicly released!

Do you have some Elixir news to share?
Tell us at @ThinkingElixir or email at [email protected]

Discussion Resources
https://oltarasenko.medium.com/web-scraping-with-elixir-and-crawly-extracting-data-behind-authentication-a52584e9cf13
https://oltarasenko.medium.com/using-elixir-and-crawly-for-price-monitoring-7364d345fc64 – Using Elixir for price monitoring
https://hex.pm/packages/crawly
https://github.com/oltarasenko/crawly
https://www.erlang-solutions.com/blog/web-scraping-with-elixir.html – Oleg's older web scraping with Elixir article
https://www.erlang-solutions.com/blog/how-to-build-a-machine-learning-project-in-elixir.html – Building a machine learning project with Elixir, TensorFlow and Crawly
https://oltarasenko.medium.com/what-is-web-scraping-and-why-you-might-want-to-use-it-a0e4b621f6d0 – What is web scraping, and why might you want to use it?
https://www.pillowskin.com – Ziinc's project using scraping and aggregation
https://www.tensorflow.org/
https://oltarasenko.medium.com/the-unofficial-guide-to-extracting-google-search-results-in-2021-with-elixir-7a6ef80d0f5b
https://scrapy.org/
https://github.com/fredwu/crawler
https://www.eff.org/deeplinks/2019/09/victory-ruling-hiq-v-linkedin-protects-scraping-public-data – EFF legal interpretation of the LinkedIn vs. hiQ scraping case
https://github.com/scrapinghub/splash/
https://www.joinhoney.com/
https://hexdocs.pm/crawly/readme.html#quickstart – Crawly quickstart guide
https://hexdocs.pm/crawly/tutorial.html – Crawly tutorial
https://github.com/oltarasenko/crawly_ui – Crawly UI project
http://crawlyui.com/ – Crawly UI project page: Data is the new gold
https://t.me/elixir_crawly – Crawly Telegram group

Guest Information
https://github.com/oltarasenko – Oleg on GitHub
https://oltarasenko.medium.com/ – Oleg's Blog
https://twitter.com/tzeyiing – Lee TzeYiing on Twitter
https://github.com/Ziinc – Lee TzeYiing on GitHub
https://www.tzeyiing.com – Lee TzeYiing's Blog

Find us online
Message the show - @ThinkingElixir
Email the show - [email protected]
Mark Ericksen - @brainlid
David Bernheisel - @bernheisel
Cade Ward - @cadebward