Frequently Asked Questions

Find answers to common questions about our website link scanner and how it works.

Why does a scan take a while to complete?

Our crawler uses browser automation to fully render each page, including JavaScript execution and dynamic content loading. This ensures we capture all links that are generated client-side, but it means each page requires a full browser session. We also check every unique link to confirm it is still accessible, which involves making HTTP requests. While this thorough approach takes more time than simple HTML parsing, it catches issues that basic crawlers miss.
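As a rough illustration of this approach, the sketch below renders a page in a headless browser, extracts the links the browser sees, and then checks each one with a plain HTTP request. It uses Playwright and the requests library purely as examples; the function names and structure are assumptions, not our actual implementation.

```python
# Minimal sketch: render a page with a headless browser, then status-check links.
# Library choices and function names are illustrative only.
import requests
from playwright.sync_api import sync_playwright

def extract_rendered_links(url: str) -> list[str]:
    """Render the page in a headless browser so client-side links are captured."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        links = page.eval_on_selector_all("a[href]", "els => els.map(e => e.href)")
        browser.close()
    return links

def check_link(url: str) -> int | None:
    """Return the HTTP status code for a link, or None on a network error."""
    try:
        return requests.head(url, allow_redirects=True, timeout=10).status_code
    except requests.RequestException:
        return None
```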

What are the limits of the free tier?

The free tier currently allows scanning up to 50 pages and 500 links per scan. If your website exceeds these limits, the scan still finishes and shows results for the pages and links that were scanned. These limits help us keep the service free for everyone. We are working on an affordable paid tier that will offer much higher limits and additional features.
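For illustration only, the free-tier caps described above could be enforced with a simple check inside the crawl loop. The constants and function below are hypothetical, not the service's actual code.

```python
# Hypothetical sketch of enforcing the free-tier caps described above.
MAX_PAGES = 50
MAX_LINKS = 500

def within_limits(pages_scanned: int, links_checked: int) -> bool:
    """Return True while the scan may continue; partial results are still reported."""
    return pages_scanned < MAX_PAGES and links_checked < MAX_LINKS
```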

Why do some links show 403 (Forbidden) errors?

Some websites have anti-bot measures that block automated crawlers, even though we use realistic browser headers and user agents to appear legitimate. These 403 errors don't necessarily mean your links are broken for real users; they indicate that the target server is blocking our requests. We implement various techniques to minimize detection, but some sites will still block automated access as a security measure.
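The snippet below is a simplified illustration of sending browser-like headers with a request and recognizing a 403 response. The header values shown are examples only and may differ from what our crawler actually sends.

```python
# Illustrative browser-like headers; values are examples, not our real configuration.
import requests

BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

response = requests.get("https://example.com/some-page", headers=BROWSER_HEADERS, timeout=10)
if response.status_code == 403:
    # The server refused the automated request; the link may still work for
    # real visitors using a normal browser.
    print("Blocked by anti-bot measures")
```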

Does the crawler check the same link more than once?

No. Our crawler implements intelligent caching: when the same URL appears on multiple pages throughout your site, we check its status only once and cache the result. This significantly improves performance and reduces server load. The crawl report shows both the total number of links encountered and the number of unique links actually checked, so you can see the efficiency gains from our caching system.
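A minimal sketch of this kind of per-scan caching is shown below: each unique URL is checked once and the cached result is reused wherever the URL appears again. The names are illustrative, not our actual code.

```python
# Sketch of per-scan status caching; each unique URL is checked at most once.
import requests

status_cache: dict[str, int | None] = {}

def cached_status(url: str) -> int | None:
    """Return the cached status for a URL, checking it only on first sight."""
    if url not in status_cache:
        try:
            status_cache[url] = requests.head(
                url, allow_redirects=True, timeout=10
            ).status_code
        except requests.RequestException:
            status_cache[url] = None  # network failure, recorded once
    return status_cache[url]
```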

Do you check links to PDFs, images, and other documents?

Yes. We check all HTTP/HTTPS links for accessibility, including links to documents (PDFs, Word documents, images, and so on). However, we skip crawling non-HTML files as pages, so PDFs and images are validated for accessibility but not parsed for additional links. This prevents errors while ensuring all your downloadable content works properly.
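One common way to make this distinction is to look at the response's Content-Type header, as in the sketch below. The helper name and logic are illustrative assumptions, not a description of our exact implementation.

```python
# Sketch: validate every link, but only crawl HTML pages for further links.
import requests

def should_crawl(url: str) -> bool:
    """Return True only for reachable HTML pages; PDFs, images, etc. are not parsed."""
    try:
        resp = requests.head(url, allow_redirects=True, timeout=10)
    except requests.RequestException:
        return False
    content_type = resp.headers.get("Content-Type", "")
    return resp.ok and content_type.startswith("text/html")
```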

Which pages does the crawler visit?

The crawler follows all internal links recursively until it has discovered every page linked from your starting URL. It automatically stays within your domain and won't crawl external sites, though it will check whether external links are accessible. The crawler tracks visited pages to avoid infinite loops and duplicate work.
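The sketch below shows this pattern in its simplest form: a breadth-first crawl that stays on the starting domain and keeps a visited set. It is a simplified model under assumed names, not our production crawler.

```python
# Sketch of a same-domain crawl loop with visited-page tracking.
from collections import deque
from typing import Callable
from urllib.parse import urlparse

def crawl(start_url: str, extract_links: Callable[[str], list[str]]) -> set[str]:
    """Breadth-first crawl of internal pages; returns the set of pages visited."""
    domain = urlparse(start_url).netloc
    visited: set[str] = set()
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in visited:
            continue  # avoid infinite loops and duplicate work
        visited.add(url)
        for link in extract_links(url):
            if urlparse(link).netloc == domain and link not in visited:
                queue.append(link)  # internal link: crawl it
            # external links would be status-checked here, not crawled
    return visited
```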

What information is included for each broken link?

For each broken link, you get the source page where it was found, the broken URL, the anchor text of the link, the HTTP status code (404, 403, 500, etc.), and the type of error (HTTP error, network timeout, or invalid URL format). This detailed information makes it easy to locate and fix issues on your site.
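As a rough picture of what each report entry contains, the record below mirrors the fields listed above. The class and field names are illustrative, not the service's actual schema or export format.

```python
# Illustrative shape of a single broken-link report entry.
from dataclasses import dataclass

@dataclass
class BrokenLink:
    source_page: str         # page where the link was found
    url: str                 # the broken URL itself
    anchor_text: str         # visible text of the link
    status_code: int | None  # e.g. 404, 403, 500; None for network-level failures
    error_type: str          # "http_error", "timeout", or "invalid_url"
```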

What do the different error types mean?

404 errors mean the page or resource doesn't exist; these are genuinely broken links that need fixing. 403 errors indicate access is forbidden, often due to bot detection rather than an actually broken link. Timeout errors suggest the server is slow to respond or overloaded. Network errors indicate connection problems. Our crawler categorizes each error type so you can prioritize fixes appropriately.
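The sketch below shows one way such categories could be derived from a response or a failed request. The category names and mapping are illustrative assumptions, not our exact classification rules.

```python
# Sketch of mapping a link check to an error category for prioritization.
import requests

def categorize(url: str) -> str:
    try:
        status = requests.head(url, allow_redirects=True, timeout=10).status_code
    except requests.Timeout:
        return "timeout"        # server slow to respond or overloaded
    except requests.RequestException:
        return "network_error"  # DNS failure, connection refused, etc.
    if status == 404:
        return "not_found"      # resource is gone; fix or remove the link
    if status == 403:
        return "forbidden"      # often bot detection; may work for real users
    if status >= 400:
        return "http_error"
    return "ok"
```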

Can you scan pages behind a login?

The current version scans publicly accessible content only. However, since we use browser automation for full page rendering, future versions could potentially handle authentication by accepting login credentials or cookies, allowing comprehensive scanning of private areas of your website.
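Purely as an illustration of how that could work, the sketch below injects a session cookie into a headless browser context before visiting a page. This is not a current feature; the cookie name, domain, and values are hypothetical.

```python
# Hypothetical sketch of authenticated scanning via an injected session cookie.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    context = browser.new_context()
    context.add_cookies([{
        "name": "session_id",            # hypothetical cookie name
        "value": "user-supplied-token",  # supplied by the site owner
        "domain": "example.com",
        "path": "/",
    }])
    page = context.new_page()
    page.goto("https://example.com/members-only")
    browser.close()
```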

Does the scanner work with single-page applications (SPAs) and JavaScript-heavy sites?

Yes, our crawler is designed with modern websites in mind. We use headless browser automation and wait until all JavaScript has executed and dynamic content has loaded before extracting links. This makes us particularly effective at crawling SPAs and sites with heavy client-side routing that simpler crawlers often miss.
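For illustration, the sketch below waits for network activity to settle before extracting links from a client-rendered page. The URL and waiting strategy are examples; the real crawler's heuristics may differ.

```python
# Sketch: let client-side rendering finish before extracting links from an SPA.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com/#/dashboard")      # hash-based client routing
    page.wait_for_load_state("networkidle")           # wait for dynamic content
    hrefs = page.eval_on_selector_all("a[href]", "els => els.map(e => e.href)")
    browser.close()
```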

Still have questions?

Can't find the answer you're looking for? We're here to help you get the most out of Brokyn.

Email us at support@brokyn.com