Merge pull request #866 from ScrapeGraphAI/deps-cleanup
Deps cleanup
PeriniM authored Jan 6, 2025
2 parents 927c99b + 8d9c909 commit 8212340
Showing 35 changed files with 355 additions and 2,616 deletions.
26 changes: 0 additions & 26 deletions .github/update-requirements.yml

This file was deleted.

32 changes: 0 additions & 32 deletions .github/workflows/python-publish.yml

This file was deleted.

111 changes: 47 additions & 64 deletions README.md
@@ -24,21 +24,6 @@ Just say which information you want to extract and the library will do it for yo
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/sgai-hero.png" alt="ScrapeGraphAI Hero" style="width: 100%;">
</p>

## 🔗 ScrapeGraph API & SDKs
If you are looking for a quick solution to integrate ScrapeGraph into your system, check out our powerful API [here!](https://dashboard.scrapegraphai.com/login)

<p align="center">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 100%;">
</p>

We offer SDKs in both Python and Node.js, making it easy to integrate into your projects. Check them out below:

| SDK | Language | GitHub Link |
|-----------|----------|-----------------------------------------------------------------------------|
| Python SDK | Python | [scrapegraph-py](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-py) |
| Node.js SDK | Node.js | [scrapegraph-js](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-js) |

The Official API Documentation can be found [here](https://docs.scrapegraphai.com/).

## 🚀 Quick install

@@ -47,35 +32,12 @@ The reference page for Scrapegraph-ai is available on the official page of PyPI:
```bash
pip install scrapegraphai

# IMPORTANT (to fetch websites content)
playwright install
```

**Note**: It is recommended to install the library in a virtual environment to avoid conflicts with other libraries 🐱

<details>
<summary><b>Optional Dependencies</b></summary>
Additional dependencies can be added while installing the library:

- <b>More Language Models</b>: installs additional language model integrations, such as Fireworks, Groq, Anthropic, Together AI, Hugging Face, and Nvidia AI Endpoints.
```bash
pip install scrapegraphai[other-language-models]
```
- <b>Semantic Options</b>: this group includes tools for advanced semantic processing, such as Graphviz.

```bash
pip install scrapegraphai[more-semantic-options]
```

- <b>Browsers Options</b>: this group includes additional browser management tools/services, such as Browserbase.

```bash
pip install scrapegraphai[more-browser-options]
```

</details>


## 💻 Usage
There are multiple standard scraping pipelines that can be used to extract information from a website (or local file).
@@ -84,13 +46,12 @@ The most common one is the `SmartScraperGraph`, which extracts information from


```python
import json
from scrapegraphai.graphs import SmartScraperGraph

# Define the configuration for the scraping pipeline
graph_config = {
    "llm": {
        "api_key": "YOUR_OPENAI_APIKEY",
        "api_key": "YOUR_OPENAI_API_KEY",
        "model": "openai/gpt-4o-mini",
    },
    "verbose": True,
@@ -99,33 +60,45 @@ graph_config = {

# Create the SmartScraperGraph instance
smart_scraper_graph = SmartScraperGraph(
    prompt="Extract me all the news from the website",
    source="https://www.wired.com",
    prompt="Extract useful information from the webpage, including a description of what the company does, founders and social media links",
    source="https://scrapegraphai.com/",
    config=graph_config
)

# Run the pipeline
result = smart_scraper_graph.run()

import json
print(json.dumps(result, indent=4))
```

The output will be a dictionary like the following:

```python
"result": {
    "news": [
        {
            "title": "The New Jersey Drone Mystery May Not Actually Be That Mysterious",
            "link": "https://www.wired.com/story/new-jersey-drone-mystery-maybe-not-drones/",
            "author": "Lily Hay Newman"
        },
        {
            "title": "Former ByteDance Intern Accused of Sabotage Among Winners of Prestigious AI Award",
            "link": "https://www.wired.com/story/bytedance-intern-best-paper-neurips/",
            "author": "Louise Matsakis"
        },
        ...
    ]
{
    "description": "ScrapeGraphAI transforms websites into clean, organized data for AI agents and data analytics. It offers an AI-powered API for effortless and cost-effective data extraction.",
    "founders": [
        {
            "name": "Marco Perini",
            "role": "Founder & Technical Lead",
            "linkedin": "https://www.linkedin.com/in/perinim/"
        },
        {
            "name": "Marco Vinciguerra",
            "role": "Founder & Software Engineer",
            "linkedin": "https://www.linkedin.com/in/marco-vinciguerra-7ba365242/"
        },
        {
            "name": "Lorenzo Padoan",
            "role": "Founder & Product Engineer",
            "linkedin": "https://www.linkedin.com/in/lorenzo-padoan-4521a2154/"
        }
    ],
    "social_media_links": {
        "linkedin": "https://www.linkedin.com/company/101881123",
        "twitter": "https://x.com/scrapegraphai",
        "github": "https://github.com/ScrapeGraphAI/Scrapegraph-ai"
    }
}
```
There are other pipelines that can be used to extract information from multiple pages, generate Python scripts, or even generate audio files.
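As a taste of those variants, here is a minimal sketch of the multi-page pipeline, `SmartScraperMultiGraph`, which accepts a list of sources instead of a single URL; the prompt and URLs below are illustrative placeholders:

```python
from scrapegraphai.graphs import SmartScraperMultiGraph

# Same "llm" config shape as in the SmartScraperGraph example above;
# the API key is a placeholder.
graph_config = {
    "llm": {
        "api_key": "YOUR_OPENAI_API_KEY",
        "model": "openai/gpt-4o-mini",
    },
}

# SmartScraperMultiGraph takes a list of sources instead of a single URL
multi_scraper_graph = SmartScraperMultiGraph(
    prompt="Extract a short description of what each page is about",
    source=[
        "https://scrapegraphai.com/",
        "https://github.com/ScrapeGraphAI/Scrapegraph-ai",
    ],
    config=graph_config,
)

result = multi_scraper_graph.run()
print(result)
```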
@@ -145,20 +118,30 @@ It is possible to use different LLMs through APIs, such as **OpenAI**, **Groq**,

If you want to use local models, remember to have [Ollama](https://ollama.com/) installed and to download the models with the **ollama pull** command.
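For local models, the `llm` entry of the config points at an Ollama model instead of an API key. A minimal sketch, assuming the model has already been pulled (the model name and token count below are illustrative, not prescriptive):

```python
from scrapegraphai.graphs import SmartScraperGraph

# Assumes `ollama pull llama3.2` has been run and Ollama is serving locally.
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",  # illustrative local model name
        "model_tokens": 8192,        # context window; adjust to your model
    },
    "verbose": True,
}

smart_scraper_graph = SmartScraperGraph(
    prompt="Extract useful information from the webpage",
    source="https://scrapegraphai.com/",
    config=graph_config,
)

print(smart_scraper_graph.run())
```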

## 🔍 Demo
Official Streamlit demo:

[![My Skills](https://skillicons.dev/icons?i=react)](https://scrapegraph-demo-demo.streamlit.app)

Try it directly on the web using Google Colab:
## 📖 Documentation

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1sEZBonBMGP44CtO6GQTwAlL0BGJXjtfd?usp=sharing)

## 📖 Documentation

The documentation for ScrapeGraphAI can be found [here](https://scrapegraph-ai.readthedocs.io/en/latest/).
Also check out the Docusaurus documentation [here](https://docs-oss.scrapegraphai.com/).

## 🔗 ScrapeGraph API & SDKs
If you are looking for a quick solution to integrate ScrapeGraph into your system, check out our powerful API [here!](https://dashboard.scrapegraphai.com/login)

<p align="center">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 100%;">
</p>

We offer SDKs in both Python and Node.js, making it easy to integrate into your projects. Check them out below:

| SDK | Language | GitHub Link |
|-----------|----------|-----------------------------------------------------------------------------|
| Python SDK | Python | [scrapegraph-py](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-py) |
| Node.js SDK | Node.js | [scrapegraph-js](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-js) |

The Official API Documentation can be found [here](https://docs.scrapegraphai.com/).
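As an illustration, here is a minimal sketch of the Python SDK (the `Client` class and `smartscraper` method follow the scrapegraph-py README; treat the exact signature as an assumption and verify it against the official docs):

```python
from scrapegraph_py import Client

# API key from the ScrapeGraph dashboard; placeholder value here
client = Client(api_key="YOUR_SGAI_API_KEY")

# One call runs the scrape through the hosted API instead of a local pipeline
response = client.smartscraper(
    website_url="https://scrapegraphai.com/",
    user_prompt="Extract the company description and social media links",
)

print(response)
client.close()
```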

## 🏆 Sponsors
<div style="text-align: center;">
<a href="https://2ly.link/1zaXG">
25 changes: 0 additions & 25 deletions docs/turkish.md
@@ -31,31 +31,6 @@ playwright install

**Note**: It is recommended to install the library in a virtual environment to avoid conflicts with other libraries 🐱

<details>
<summary><b>Optional Dependencies</b></summary>
Additional dependencies can be added while installing the library:

- **More Language Models**: installs additional language model integrations, such as Fireworks, Groq, Anthropic, Together AI, Hugging Face, and Nvidia AI Endpoints.

```bash
pip install scrapegraphai[other-language-models]
```

- **Semantic Options**: includes tools for advanced semantic processing, such as Graphviz.

```bash
pip install scrapegraphai[more-semantic-options]
```

- **Browser Options**: includes additional browser management tools/services, such as Browserbase.

```bash
pip install scrapegraphai[more-browser-options]
```

</details>

## 💻 Usage

14 changes: 5 additions & 9 deletions examples/anthropic/csv_scraper_anthropic.py
@@ -3,9 +3,8 @@
"""
import os
from dotenv import load_dotenv
import pandas as pd
from scrapegraphai.graphs import CSVScraperGraph
from scrapegraphai.utils import convert_to_csv, convert_to_json, prettify_exec_info
from scrapegraphai.utils import prettify_exec_info

load_dotenv()

@@ -17,7 +16,8 @@
curr_dir = os.path.dirname(os.path.realpath(__file__))
file_path = os.path.join(curr_dir, FILE_NAME)

text = pd.read_csv(file_path)
with open(file_path, 'r') as file:
    text = file.read()

# ************************************************
# Define the configuration for the graph
@@ -41,7 +41,7 @@

csv_scraper_graph = CSVScraperGraph(
    prompt="List me all the last names",
    source=str(text),  # Pass the content of the file, not the file object
    source=text,  # Pass the content of the file
    config=graph_config
)

@@ -53,8 +53,4 @@
# ************************************************

graph_exec_info = csv_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))

# Save to json or csv
convert_to_csv(result, "result")
convert_to_json(result, "result")
print(prettify_exec_info(graph_exec_info))
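Putting the change together, the updated example reads the CSV as plain text and passes the string straight to `CSVScraperGraph`. A consolidated sketch; the file path and Anthropic model name are illustrative, and the config approximates the one elided from the diff:

```python
import os

from dotenv import load_dotenv
from scrapegraphai.graphs import CSVScraperGraph
from scrapegraphai.utils import prettify_exec_info

load_dotenv()

# Read the CSV as plain text; no pandas dependency needed anymore
curr_dir = os.path.dirname(os.path.realpath(__file__))
file_path = os.path.join(curr_dir, "inputs/username.csv")  # illustrative path

with open(file_path, "r") as file:
    text = file.read()

# Illustrative Anthropic config; the model name is an assumption
graph_config = {
    "llm": {
        "api_key": os.getenv("ANTHROPIC_API_KEY"),
        "model": "anthropic/claude-3-haiku-20240307",
    },
}

csv_scraper_graph = CSVScraperGraph(
    prompt="List me all the last names",
    source=text,  # pass the content of the file
    config=graph_config,
)

result = csv_scraper_graph.run()
print(result)

graph_exec_info = csv_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))
```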
10 changes: 3 additions & 7 deletions examples/anthropic/csv_scraper_graph_multi_anthropic.py
@@ -3,9 +3,8 @@
"""
import os
from dotenv import load_dotenv
import pandas as pd
from scrapegraphai.graphs import CSVScraperMultiGraph
from scrapegraphai.utils import convert_to_csv, convert_to_json, prettify_exec_info
from scrapegraphai.utils import prettify_exec_info

load_dotenv()
# ************************************************
@@ -16,7 +15,8 @@
curr_dir = os.path.dirname(os.path.realpath(__file__))
file_path = os.path.join(curr_dir, FILE_NAME)

text = pd.read_csv(file_path)
with open(file_path, 'r') as file:
    text = file.read()

# ************************************************
# Define the configuration for the graph
@@ -48,7 +48,3 @@

graph_exec_info = csv_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))

# Save to json or csv
convert_to_csv(result, "result")
convert_to_json(result, "result")
2 changes: 1 addition & 1 deletion examples/openai/depth_search_graph_openai.py
@@ -7,7 +7,7 @@

load_dotenv()

openai_key = os.getenv("OPENAI_APIKEY")
openai_key = os.getenv("OPENAI_API_KEY")

graph_config = {
    "llm": {
2 changes: 1 addition & 1 deletion examples/openai/search_graph_openai.py
@@ -11,7 +11,7 @@
# Define the configuration for the graph
# ************************************************

openai_key = os.getenv("OPENAI_APIKEY")
openai_key = os.getenv("OPENAI_API_KEY")

graph_config = {
    "llm": {
2 changes: 1 addition & 1 deletion examples/openai/smart_scraper_openai.py
@@ -28,7 +28,7 @@
# ************************************************

smart_scraper_graph = SmartScraperGraph(
    prompt="Extract me all the articles",
    prompt="Extract me the first article",
    source="https://www.wired.com",
    config=graph_config
)
2 changes: 1 addition & 1 deletion examples/openai/speech_graph_openai.py
@@ -20,7 +20,7 @@
# Define the configuration for the graph
# ************************************************

openai_key = os.getenv("OPENAI_APIKEY")
openai_key = os.getenv("OPENAI_API_KEY")

graph_config = {
    "llm": {
