In Scade, users can incorporate Python code into their workflows. This helps to:
- Enhance flexibility by introducing custom scripts and models.
- Manage data efficiently, from retrieval to preprocessing, before using it in the workflow.
- Optimize resources, potentially saving credits by handling data processing and other computationally expensive tasks in Python.
The Python Async Function node allows you to run Python scripts within your Scade flows. It comes with a pre-filled code example to help you get started.
To use this node effectively, follow these essential guidelines:
- Function Definition: You must define a function called `main(context)`, using one of these forms:
  - `def main(context):`
  - `async def main(context):`
  - `def main(context, *args, **kwargs):`
  - `async def main(context, *args, **kwargs):`
- Yield Statement: The function should end with a `yield` statement.

      async def main(context, *args, **kwargs):
          from datetime import datetime
          result = datetime.utcnow()
          yield result

- Imports: All Python imports must be placed inside the `main(context)` function, not outside it.

      async def main(context, *args, **kwargs):
          from datetime import datetime
          result = datetime.utcnow()
          yield result

These rules ensure that your Python code integrates smoothly with the Scade environment, making it a powerful tool to enhance your workflows.
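The three rules above can be combined into one minimal node body. The sketch below also shows one way to try such a function outside Scade. The `SimpleNamespace` stand-in for `context`, the `start` node name and the `Name` field are hypothetical, and how Scade itself consumes the generator is not covered here.

```python
# Only needed for the local dry run below; inside a Scade node, keep imports in main.
from types import SimpleNamespace


def main(context, *args, **kwargs):
    # Imports rule: imports live inside main, not at module level.
    from datetime import datetime, timezone

    # Hypothetical input: a "Name" field on a start node called "start".
    name = context.variables["start"]["Name"]
    timestamp = datetime.now(timezone.utc).isoformat()
    greeting = f"Hello, {name}! Generated at {timestamp}"

    # Yield rule: finish with a yield rather than a return.
    yield greeting


# Local dry run with a stand-in context object (an assumption, not the Scade API).
fake_context = SimpleNamespace(variables={"start": {"Name": "Ada"}})
result = next(main(fake_context))
print(result)
```

Because `main` contains a `yield`, calling it returns a generator, so the dry run uses `next(...)` to obtain the first yielded value.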
Using the Python node in Scade.
Chat with document template
“Chat with Document” refers to an interactive process where a system, powered by a model like ChatGPT, enables users to engage in a Q&A session based on the contents of a document.
The flow uses a Python node to extract text from a PDF or DOCX file. It relies on fitz (PyMuPDF) for PDF handling, BytesIO for in-memory file manipulation, and docx from the python-docx library to read DOCX files (the legacy .doc format is not supported by python-docx). Once the text is extracted, it is sent to ChatGPT, where the model processes the content, searches for relevant information, and provides a summary based on the text, streamlining document analysis and summary generation.
This code is a Python function designed to download a document from a given URL, extract text from either a PDF or DOCX file, and return the extracted text.

    def main(context, **kwargs):
        # Imports must live inside main (see the guidelines above).
        import requests
        import fitz  # PyMuPDF
        from io import BytesIO
        from docx import Document  # python-docx

        def download_file(url):
            # Download the file into an in-memory buffer.
            response = requests.get(url)
            response.raise_for_status()
            return BytesIO(response.content)

        def extract_text_from_pdf(file_stream):
            # Concatenate the text of every page.
            document = fitz.open(stream=file_stream, filetype="pdf")
            text = ""
            for page_num in range(document.page_count):
                page = document.load_page(page_num)
                text += page.get_text()
            return text

        def extract_text_from_docx(file_stream):
            # Join all paragraphs, one per line.
            document = Document(file_stream)
            text = ""
            for para in document.paragraphs:
                text += para.text + "\n"
            return text

        # Renamed from an inner "main" to avoid shadowing the node's entry point.
        def extract_text(url):
            file_stream = download_file(url)
            if url.lower().endswith('.pdf'):
                text = extract_text_from_pdf(file_stream)
            elif url.lower().endswith('.docx'):
                text = extract_text_from_docx(file_stream)
            else:
                raise ValueError("Unsupported file type. Only PDF and DOCX are supported.")
            return text

        # Example: read the document URL from the "Document" field of the start node.
        url = context.variables["axi1-start"]["Document"]
        try:
            result = extract_text(url)
        except Exception as e:
            # Yield the error message as text instead of the exception object.
            result = str(e)
        yield result

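One caveat in the example: `url.lower().endswith('.pdf')` inspects the whole URL, so a link that carries a query string, such as `report.pdf?token=abc`, would be rejected as unsupported. A minimal sketch of a more tolerant check, using only the standard library, might look like this. The helper name `detect_file_type` and the example URLs are hypothetical, and inside a Scade node the imports would move into `main`.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def detect_file_type(url):
    """Return 'pdf' or 'docx' based on the path part of the URL only."""
    # urlparse separates the path from the query string and fragment.
    path = urlparse(url).path
    suffix = PurePosixPath(path).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix == ".docx":
        return "docx"
    raise ValueError("Unsupported file type. Only PDF and DOCX are supported.")


print(detect_file_type("https://example.com/files/report.PDF?token=abc"))
print(detect_file_type("https://example.com/files/notes.docx#page=2"))
```

This still trusts the file extension. Links that hide the file name entirely would need a different signal, such as the response's `Content-Type` header.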
Good to know.
Here are the libraries available within the Python node environment, covering a range of functionality from data processing and machine learning to web development and API interactions:
- Data Processing and Analysis:
  - Pandas and NumPy for efficient data manipulation, analysis, and numerical computations.
  - SciPy for scientific computing, widely used in research and technical applications.
- Machine Learning & Natural Language Processing:
  - Huggingface Hub to access and work with pre-trained NLP models and datasets.
  - LangChain for leveraging language models in custom applications.
- Web Development:
  - Django and FastAPI for building full-featured web applications and high-performance APIs.
- Cloud and API Integrations:
  - Boto3 for AWS services integration and management.
  - Google API Python Client for interacting with various Google APIs.
  - Requests for making HTTP requests, commonly used for API calls.
- Data Storage and Serialization:
  - SQLAlchemy as an ORM for database management.
  - JsonSchema and lxml for parsing and validating JSON and XML data structures.
  - Redis for high-speed, in-memory data storage.
- Asynchronous Programming and Task Queuing:
  - Celery for distributed task handling, ideal for background job processing.
  - Asyncio for building asynchronous applications.
- Development and Testing Tools:
  - Black for consistent code formatting.
  - Pytest (often found in similar setups) for comprehensive testing.
- Security and Encryption:
  - Cryptography for secure data handling and encryption.
  - PyJWT for implementing JSON Web Tokens in authentication workflows.
- Utility Libraries:
  - BeautifulSoup and lxml for web scraping and HTML/XML parsing.
  - Dateutil and Pytz for date and timezone manipulation.
  - TQDM for adding progress bars to loops and long-running tasks.
These libraries equip the Python node to handle a broad array of tasks, from data-heavy computations and real-time applications to secure web development and machine learning integration.
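Since asyncio appears in the list and the node accepts an `async def main`, independent I/O-bound steps can be awaited concurrently before yielding. The following is a minimal sketch under that assumption. The `fetch_part` coroutine is a hypothetical stand-in for a real API call, and the `run_locally` driver exists only to try the function outside Scade.

```python
import asyncio  # module-level import is only for the local driver below


async def main(context, *args, **kwargs):
    # Inside a Scade node, imports belong inside main.
    import asyncio

    async def fetch_part(label, delay):
        # Hypothetical stand-in for an I/O-bound call (HTTP request, DB query).
        await asyncio.sleep(delay)
        return f"{label} done"

    # Run the three coroutines concurrently; results keep the argument order.
    results = await asyncio.gather(
        fetch_part("a", 0.03),
        fetch_part("b", 0.01),
        fetch_part("c", 0.02),
    )
    yield results


async def run_locally():
    # Collect everything the async generator yields.
    return [item async for item in main(context=None)]


output = asyncio.run(run_locally())
print(output)
```

`asyncio.gather` returns results in the order the coroutines were passed in, regardless of which one finishes first, so the yielded list stays predictable.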