This task can be performed using Thundercrawl.
Thundercrawl – Turn Your Website Into AI Fuel.
Best product for this task

What to expect from an ideal product
- Automatically crawls your website and extracts clean text content without HTML markup or formatting noise
- Converts web pages directly into LLM-optimized .txt files that machine learning models can readily process
- Removes unnecessary elements like navigation menus, ads, and footers to keep only the valuable content for training
- Processes multiple pages at once, so you don't have to copy and paste content from each webpage manually
- Delivers ready-to-use text files in the exact format needed for AI training without additional cleanup work
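
To make the expectations above concrete, here is a minimal sketch of the general technique in Python, using only the standard library. It is not Thundercrawl's implementation, which is not described here. The names `SKIP_TAGS`, `TextExtractor`, `html_to_text`, and `pages_to_text` are hypothetical, and the choice of which tags count as boilerplate is an assumption. Fetching is left out: the sketch assumes the HTML of each page has already been downloaded.

```python
from html.parser import HTMLParser

# Tags whose content is treated as boilerplate (an assumption for this sketch).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside at least one skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        # One text fragment per line, with no markup left.
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the plain text of one HTML page, without boilerplate."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def pages_to_text(pages):
    """Convert a {url: html} mapping into a {url: plain_text} mapping."""
    return {url: html_to_text(html) for url, html in pages.items()}
```

Each value returned by `pages_to_text` could then be written to its own .txt file. A real crawler would also need link discovery, rate limiting, and handling for pages rendered by JavaScript, none of which this sketch attempts.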