This task can be performed using Thundercrawl.
Thundercrawl – Turn Your Website Into AI Fuel.
Best product for this task

What to expect from an ideal product
- Thundercrawl automatically crawls your entire website and extracts the text content, so you don't have to copy and paste from each page by hand
- The tool cleans up your website content by removing HTML tags, navigation menus, and other clutter that would otherwise add noise to AI training data
- It saves everything as plain .txt files that large language model training pipelines can read and process directly
- You can batch process hundreds of web pages at once instead of converting them one by one, saving hours of manual work
- The output files are organized to preserve the logical flow of your content while staying compatible with popular AI training frameworks
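
To make the clean-up and export steps above more concrete, here is a minimal sketch of the general technique in Python, using only the standard library. It is not Thundercrawl's actual implementation or API: the names `SKIP_TAGS`, `BLOCK_TAGS`, `TextExtractor`, `html_to_text`, and `save_pages` are hypothetical, the list of tags treated as clutter is an assumption, and the crawling step is left out (the pages are assumed to be already downloaded as HTML strings).

```python
from html.parser import HTMLParser
from pathlib import Path

# Assumption for this sketch: content inside these tags is clutter.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
# Assumption: these tags mark a line break in the extracted text.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "section", "article",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collects visible text and ignores anything nested in SKIP_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the page's text, one non-empty line per block element."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.parts)
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def save_pages(pages, out_dir):
    """Write one .txt file per page; `pages` maps a file stem to HTML."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, html in pages.items():
        path = out / f"{name}.txt"
        path.write_text(html_to_text(html), encoding="utf-8")
        written.append(path)
    return written
```

Calling `save_pages({"index": "<nav>Menu</nav><h1>Title</h1><p>Hello</p>"}, "out")` would write `out/index.txt` containing the two lines `Title` and `Hello`. Menus and scripts are dropped by tag name alone, which is a deliberately simple rule; a real site may need a different tag list.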