This project involves developing an AI-driven web scraper that extracts and structures information from websites based on user inputs. The scraper's Natural Language Processing (NLP) capabilities come from a LLaMA 3.1 model fine-tuned with Supervised Fine-Tuning (SFT) using LoRA and the Unsloth library. The application features a user-friendly Streamlit interface that lets users enter website URLs, specify content extraction parameters, and receive structured data outputs.
Python: Main programming language used for the entire project.
Selenium: For automated browsing and scraping of web pages.
BeautifulSoup: For parsing HTML and extracting content.
Streamlit: To build the interactive web application interface.
LLaMA 3.1 Model: For parsing and processing scraped content using NLP.
LoRA (Low-Rank Adaptation): Fine-tuning technique applied to the LLaMA model.
Unsloth Library: Used to fine-tune the LLaMA 3.1 model efficiently.
LangChain-Ollama: For structured data extraction and NLP parsing.
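A rough sketch of how the scraping components fit together: in the running app, Selenium's `webdriver` supplies the live page source, which BeautifulSoup then cleans into plain text. The Selenium calls are shown only in comments here (they require a browser driver), and a static HTML snippet stands in for a real page; the function name `extract_body_text` is illustrative, not from the project code.

```python
from bs4 import BeautifulSoup

def extract_body_text(page_source: str) -> str:
    """Parse raw HTML and return cleaned body text, dropping scripts and styles."""
    soup = BeautifulSoup(page_source, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()  # remove non-content elements before text extraction
    # Collapse whitespace and drop empty lines
    lines = (line.strip() for line in soup.get_text(separator="\n").splitlines())
    return "\n".join(line for line in lines if line)

# In the full app, Selenium would fetch the page first, e.g.:
#   driver = webdriver.Chrome()
#   driver.get(url)
#   page_source = driver.page_source
# A static snippet stands in for the live page here:
sample_html = """
<html><head><style>body { color: red; }</style></head>
<body><h1>Products</h1><p>Widget A - $10</p><script>var x = 1;</script></body></html>
"""
print(extract_body_text(sample_html))
```

The cleaned text from this step is what gets handed to the fine-tuned model for structured extraction.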
Automated Web Scraping: Enter a URL, and the application automatically scrapes data from the website.
NLP-Driven Content Parsing: Extract and structure specific information using a fine-tuned LLaMA 3.1 model.
Streamlit UI: Simple and intuitive user interface for seamless interaction.
Customizable Data Extraction: Users can define parsing parameters to control the type of data extracted.
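Because an LLM's context window is limited, scraped text is typically split into chunks before being sent to the model along with the user's extraction parameters. A minimal chunking sketch is below; the langchain-ollama call is shown only in comments since it assumes a locally running Ollama server, and the model name and prompt wording are illustrative, not taken from the project.

```python
def chunk_text(text: str, max_chars: int = 6000) -> list[str]:
    """Split scraped text into pieces small enough for the model's context window."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# Each chunk would then be parsed with langchain-ollama, e.g. (assumes a local
# Ollama server is running; model name and prompt are illustrative):
#   from langchain_ollama import OllamaLLM
#   llm = OllamaLLM(model="llama3.1")
#   results = [
#       llm.invoke(f"Extract only: {user_query}\n\nContent:\n{chunk}")
#       for chunk in chunk_text(scraped_text)
#   ]

# 13,000 characters split at 6,000 per chunk -> chunk sizes 6000, 6000, 1000
print([len(c) for c in chunk_text("a" * 13000, max_chars=6000)])
```

The per-chunk results would then be concatenated (or merged by the model in a final pass) to produce the structured output shown in the Streamlit UI.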
Fig. 1 Streamlit Interface