Merge pull request #866 from ScrapeGraphAI/deps-cleanup
Deps cleanup
PeriniM authored Jan 6, 2025
2 parents 927c99b + 8d9c909 commit 8212340
Showing 35 changed files with 355 additions and 2,616 deletions.
26 changes: 0 additions & 26 deletions .github/update-requirements.yml

This file was deleted.

32 changes: 0 additions & 32 deletions .github/workflows/python-publish.yml

This file was deleted.

111 changes: 47 additions & 64 deletions README.md
@@ -24,21 +24,6 @@ Just say which information you want to extract and the library will do it for yo
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/sgai-hero.png" alt="ScrapeGraphAI Hero" style="width: 100%;">
</p>

## 🔗 ScrapeGraph API & SDKs
If you are looking for a quick solution to integrate ScrapeGraph into your system, check out our powerful API [here!](https://dashboard.scrapegraphai.com/login)

<p align="center">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 100%;">
</p>

We offer SDKs in both Python and Node.js, making it easy to integrate into your projects. Check them out below:

| SDK | Language | GitHub Link |
|-----------|----------|-----------------------------------------------------------------------------|
| Python SDK | Python | [scrapegraph-py](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-py) |
| Node.js SDK | Node.js | [scrapegraph-js](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-js) |

The Official API Documentation can be found [here](https://docs.scrapegraphai.com/).

## 🚀 Quick install

@@ -47,35 +32,12 @@ The reference page for Scrapegraph-ai is available on the official page of PyPI:
```bash
pip install scrapegraphai

# IMPORTANT (to fetch websites content)
playwright install
```

**Note**: It is recommended to install the library in a virtual environment to avoid conflicts with other libraries 🐱

<details>
<summary><b>Optional Dependencies</b></summary>
Additional dependencies can be added while installing the library:

- <b>More Language Models</b>: installs additional language model integrations, such as Fireworks, Groq, Anthropic, Together AI, Hugging Face, and Nvidia AI Endpoints.
```bash
pip install scrapegraphai[other-language-models]
```
- <b>Semantic Options</b>: this group includes tools for advanced semantic processing, such as Graphviz.

```bash
pip install scrapegraphai[more-semantic-options]
```

- <b>Browsers Options</b>: this group includes additional browser management tools/services, such as Browserbase.

```bash
pip install scrapegraphai[more-browser-options]
```

</details>


## 💻 Usage
There are multiple standard scraping pipelines that can be used to extract information from a website (or local file).
@@ -84,13 +46,12 @@ The most common one is the `SmartScraperGraph`, which extracts information from


```python
import json
from scrapegraphai.graphs import SmartScraperGraph

# Define the configuration for the scraping pipeline
graph_config = {
    "llm": {
        "api_key": "YOUR_OPENAI_APIKEY",
        "api_key": "YOUR_OPENAI_API_KEY",
        "model": "openai/gpt-4o-mini",
    },
    "verbose": True,
@@ -99,33 +60,45 @@ graph_config = {

# Create the SmartScraperGraph instance
smart_scraper_graph = SmartScraperGraph(
    prompt="Extract me all the news from the website",
    source="https://www.wired.com",
    prompt="Extract useful information from the webpage, including a description of what the company does, founders and social media links",
    source="https://scrapegraphai.com/",
    config=graph_config
)

# Run the pipeline
result = smart_scraper_graph.run()

import json
print(json.dumps(result, indent=4))
```

The output will be a dictionary like the following:

```python
"result": {
    "news": [
        {
            "title": "The New Jersey Drone Mystery May Not Actually Be That Mysterious",
            "link": "https://www.wired.com/story/new-jersey-drone-mystery-maybe-not-drones/",
            "author": "Lily Hay Newman"
        },
        {
            "title": "Former ByteDance Intern Accused of Sabotage Among Winners of Prestigious AI Award",
            "link": "https://www.wired.com/story/bytedance-intern-best-paper-neurips/",
            "author": "Louise Matsakis"
        },
        ...
    ]
{
    "description": "ScrapeGraphAI transforms websites into clean, organized data for AI agents and data analytics. It offers an AI-powered API for effortless and cost-effective data extraction.",
    "founders": [
        {
            "name": "Marco Perini",
            "role": "Founder & Technical Lead",
            "linkedin": "https://www.linkedin.com/in/perinim/"
        },
        {
            "name": "Marco Vinciguerra",
            "role": "Founder & Software Engineer",
            "linkedin": "https://www.linkedin.com/in/marco-vinciguerra-7ba365242/"
        },
        {
            "name": "Lorenzo Padoan",
            "role": "Founder & Product Engineer",
            "linkedin": "https://www.linkedin.com/in/lorenzo-padoan-4521a2154/"
        }
    ],
    "social_media_links": {
        "linkedin": "https://www.linkedin.com/company/101881123",
        "twitter": "https://x.com/scrapegraphai",
        "github": "https://github.com/ScrapeGraphAI/Scrapegraph-ai"
    }
}
```
There are other pipelines that can be used to extract information from multiple pages, generate Python scripts, or even generate audio files.
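As a taste of those variants, here is a minimal sketch of the multi-page pipeline, `SmartScraperMultiGraph`, which accepts a list of sources instead of a single URL; the prompt and URLs below are illustrative placeholders:

```python
from scrapegraphai.graphs import SmartScraperMultiGraph

# Same "llm" config shape as in the SmartScraperGraph example above;
# the API key is a placeholder.
graph_config = {
    "llm": {
        "api_key": "YOUR_OPENAI_API_KEY",
        "model": "openai/gpt-4o-mini",
    },
}

# SmartScraperMultiGraph takes a list of sources instead of a single URL
multi_scraper_graph = SmartScraperMultiGraph(
    prompt="Extract a short description of what each page is about",
    source=[
        "https://scrapegraphai.com/",
        "https://github.com/ScrapeGraphAI/Scrapegraph-ai",
    ],
    config=graph_config,
)

result = multi_scraper_graph.run()
print(result)
```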
@@ -145,20 +118,30 @@ It is possible to use different LLMs through APIs, such as **OpenAI**, **Groq**,

If you want to use local models, remember to have [Ollama](https://ollama.com/) installed and to download the models with the **ollama pull** command.
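For local models, the `llm` entry of the config points at an Ollama model instead of an API key. A minimal sketch, assuming the model has already been pulled (the model name and token count below are illustrative, not prescriptive):

```python
from scrapegraphai.graphs import SmartScraperGraph

# Assumes `ollama pull llama3.2` has been run and Ollama is serving locally.
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",  # illustrative local model name
        "model_tokens": 8192,        # context window; adjust to your model
    },
    "verbose": True,
}

smart_scraper_graph = SmartScraperGraph(
    prompt="Extract useful information from the webpage",
    source="https://scrapegraphai.com/",
    config=graph_config,
)

print(smart_scraper_graph.run())
```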

## 🔍 Demo
Official Streamlit demo:

[![My Skills](https://skillicons.dev/icons?i=react)](https://scrapegraph-demo-demo.streamlit.app)

Try it directly on the web using Google Colab:
## 📖 Documentation

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1sEZBonBMGP44CtO6GQTwAlL0BGJXjtfd?usp=sharing)

## 📖 Documentation

The documentation for ScrapeGraphAI can be found [here](https://scrapegraph-ai.readthedocs.io/en/latest/).
Also check out the Docusaurus documentation [here](https://docs-oss.scrapegraphai.com/).

## 🔗 ScrapeGraph API & SDKs
If you are looking for a quick solution to integrate ScrapeGraph into your system, check out our powerful API [here!](https://dashboard.scrapegraphai.com/login)

<p align="center">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 100%;">
</p>

We offer SDKs in both Python and Node.js, making it easy to integrate into your projects. Check them out below:

| SDK | Language | GitHub Link |
|-----------|----------|-----------------------------------------------------------------------------|
| Python SDK | Python | [scrapegraph-py](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-py) |
| Node.js SDK | Node.js | [scrapegraph-js](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-js) |

The Official API Documentation can be found [here](https://docs.scrapegraphai.com/).
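As an illustration, here is a minimal sketch of the Python SDK (the `Client` class and `smartscraper` method follow the scrapegraph-py README; treat the exact signature as an assumption and verify it against the official docs):

```python
from scrapegraph_py import Client

# API key from the ScrapeGraph dashboard; placeholder value here
client = Client(api_key="YOUR_SGAI_API_KEY")

# One call runs the scrape through the hosted API instead of a local pipeline
response = client.smartscraper(
    website_url="https://scrapegraphai.com/",
    user_prompt="Extract the company description and social media links",
)

print(response)
client.close()
```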

## 🏆 Sponsors
<div style="text-align: center;">
<a href="https://2ly.link/1zaXG">
25 changes: 0 additions & 25 deletions docs/turkish.md
@@ -31,31 +31,6 @@ playwright install

**Note**: It is recommended to install the library in a virtual environment to avoid conflicts with other libraries 🐱

<details>
<summary><b>Optional Dependencies</b></summary>
Additional dependencies can be added while installing the library:

- **More Language Models**: installs additional language model integrations, such as Fireworks, Groq, Anthropic, Together AI, Hugging Face, and Nvidia AI Endpoints.

```bash
pip install scrapegraphai[other-language-models]
```

- **Semantic Options**: includes tools for advanced semantic processing, such as Graphviz.

```bash
pip install scrapegraphai[more-semantic-options]
```

- **Browser Options**: includes additional browser management tools/services, such as Browserbase.

```bash
pip install scrapegraphai[more-browser-options]
```

</details>

## 💻 Usage

14 changes: 5 additions & 9 deletions examples/anthropic/csv_scraper_anthropic.py
@@ -3,9 +3,8 @@
"""
import os
from dotenv import load_dotenv
import pandas as pd
from scrapegraphai.graphs import CSVScraperGraph
from scrapegraphai.utils import convert_to_csv, convert_to_json, prettify_exec_info
from scrapegraphai.utils import prettify_exec_info

load_dotenv()

@@ -17,7 +16,8 @@
curr_dir = os.path.dirname(os.path.realpath(__file__))
file_path = os.path.join(curr_dir, FILE_NAME)

text = pd.read_csv(file_path)
with open(file_path, 'r') as file:
    text = file.read()

# ************************************************
# Define the configuration for the graph
@@ -41,7 +41,7 @@

csv_scraper_graph = CSVScraperGraph(
    prompt="List me all the last names",
    source=str(text),  # Pass the content of the file, not the file object
    source=text,  # Pass the content of the file
    config=graph_config
)

@@ -53,8 +53,4 @@
# ************************************************

graph_exec_info = csv_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))

# Save to json or csv
convert_to_csv(result, "result")
convert_to_json(result, "result")
print(prettify_exec_info(graph_exec_info))
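Putting the change together, the updated example reads the CSV as plain text and passes the string straight to `CSVScraperGraph`. A consolidated sketch; the file path and Anthropic model name are illustrative, and the config approximates the one elided from the diff:

```python
import os

from dotenv import load_dotenv
from scrapegraphai.graphs import CSVScraperGraph
from scrapegraphai.utils import prettify_exec_info

load_dotenv()

# Read the CSV as plain text; no pandas dependency needed anymore
curr_dir = os.path.dirname(os.path.realpath(__file__))
file_path = os.path.join(curr_dir, "inputs/username.csv")  # illustrative path

with open(file_path, "r") as file:
    text = file.read()

# Illustrative Anthropic config; the model name is an assumption
graph_config = {
    "llm": {
        "api_key": os.getenv("ANTHROPIC_API_KEY"),
        "model": "anthropic/claude-3-haiku-20240307",
    },
}

csv_scraper_graph = CSVScraperGraph(
    prompt="List me all the last names",
    source=text,  # pass the content of the file
    config=graph_config,
)

result = csv_scraper_graph.run()
print(result)

graph_exec_info = csv_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))
```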
10 changes: 3 additions & 7 deletions examples/anthropic/csv_scraper_graph_multi_anthropic.py
@@ -3,9 +3,8 @@
"""
import os
from dotenv import load_dotenv
import pandas as pd
from scrapegraphai.graphs import CSVScraperMultiGraph
from scrapegraphai.utils import convert_to_csv, convert_to_json, prettify_exec_info
from scrapegraphai.utils import prettify_exec_info

load_dotenv()
# ************************************************
@@ -16,7 +15,8 @@
curr_dir = os.path.dirname(os.path.realpath(__file__))
file_path = os.path.join(curr_dir, FILE_NAME)

text = pd.read_csv(file_path)
with open(file_path, 'r') as file:
    text = file.read()

# ************************************************
# Define the configuration for the graph
@@ -48,7 +48,3 @@

graph_exec_info = csv_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))

# Save to json or csv
convert_to_csv(result, "result")
convert_to_json(result, "result")
2 changes: 1 addition & 1 deletion examples/openai/depth_search_graph_openai.py
@@ -7,7 +7,7 @@

load_dotenv()

openai_key = os.getenv("OPENAI_APIKEY")
openai_key = os.getenv("OPENAI_API_KEY")

graph_config = {
    "llm": {
2 changes: 1 addition & 1 deletion examples/openai/search_graph_openai.py
@@ -11,7 +11,7 @@
# Define the configuration for the graph
# ************************************************

openai_key = os.getenv("OPENAI_APIKEY")
openai_key = os.getenv("OPENAI_API_KEY")

graph_config = {
    "llm": {
2 changes: 1 addition & 1 deletion examples/openai/smart_scraper_openai.py
@@ -28,7 +28,7 @@
# ************************************************

smart_scraper_graph = SmartScraperGraph(
    prompt="Extract me all the articles",
    prompt="Extract me the first article",
    source="https://www.wired.com",
    config=graph_config
)
2 changes: 1 addition & 1 deletion examples/openai/speech_graph_openai.py
@@ -20,7 +20,7 @@
# Define the configuration for the graph
# ************************************************

openai_key = os.getenv("OPENAI_APIKEY")
openai_key = os.getenv("OPENAI_API_KEY")

graph_config = {
    "llm": {
