Version 1.8 of the crwlr/crawler package is out, introducing key new functions that will replace existing ones in v2.0. Addressing previous issues with composing crawling result data, this update provides a solution that enhances performance, minimizes memory usage further, and simplifies the process, making it more intuitive and easier to understand.

Since working with generators can be a bit tricky if you're new to them, this post offers an intro on how to use them and highlights common pitfalls to avoid.

Abstract classes cannot be instantiated directly, posing a challenge when testing functionality implemented within the abstract class itself. In this article, I will share my approach to addressing this issue.

This is the first article of our "Crwlr Recipes" series, providing a collection of thoroughly explained code examples for specific crawling and scraping use-cases. This first article describes how you can crawl any website fully (all pages) and extract the data of schema.org structured data objects from all its pages, with just a few lines of code.

My friend Florian Bauer recently posted an article saying that PHP needs a rebranding and that he would rename it to HypeScript. Here's my two cents on that subject.

I'm very proud to announce that version 1.0 of the crawler package is finally released. This article gives you an overview of why you should use this library for your web crawling and scraping jobs.

crwlr.software
What's new in crwlr / crawler v0.6?
2022-10-03

Version 0.6 is probably the biggest update so far, with a lot of new features and steps, from crawling whole websites and sitemaps to extracting metadata and schema.org structured data from HTML. Here is an overview of all the new stuff.

What's new in crwlr / crawler v0.5?
2022-09-03

We're already at v0.5 of the crawler package and this version comes with a lot of new features and improvements. Here's a quick overview of what's new.

There is a new package in town called query-string. It lets you create, access, and manipulate query strings for HTTP requests in a very convenient way. Here's a quick overview of what you can do with it and how it can be used via the url package.

What's new in crwlr / crawler v0.4
2022-05-10

Last Friday, version 0.4 of the crawler package was released with some pretty useful improvements. Read what's shipped with this new minor update.

There are already two new 0.x versions of the crawler package. Here is a quick summary of what's new in versions 0.2 and 0.3.

Release of crwlr / crawler v0.1.0
2022-04-18

After months of hard work, today I'm finally releasing the first version (v0.1.0) of the crwlr / crawler package. Here is some information on what it is, its current state, and its current and planned features.

If you're just starting out in web development, one very fundamental thing to learn on your journey is HTTP. I learnt it bit by bit over the course of years, probably like many other developers. Learning the basics at the very beginning will help you to more quickly identify, understand, and solve many problems in the projects you will build. In this post I'll start with an overview.

I've been unemployed for a few weeks now and have started building my own SaaS project. One very obvious change from my job last year is that I'm on my own now and no longer responsible only for coding. Here are some thoughts on what I think you should focus on and how to organize and juggle it all.

Homograph attacks use internationalized domain names (IDNs) in malicious links, with domains that look like those of trusted organizations. You can use the crwlr Url class to detect and monitor URLs containing IDNs in your users' input.

Today I am celebrating that I have finally quit my job and decided to start my own business. I'll try to document my journey and my thoughts on that topic for anyone who is interested. I don't know if it will succeed or fail, but at least you will then know one way not to do it. Let me start by telling the personal story that led me to this point.

Why I start crwlr.software
2018-04-15

This is just a short introduction to what crwlr.software is and will become in the future and why you may like it.