Python node: add code to your flow

Admin · November 14, 2024, 7:29am

In Scade, you can incorporate Python code into your workflows. This helps you:

- Enhance flexibility by introducing custom scripts and models.
- Manage data efficiently, from retrieval to preprocessing, before using it in the workflow.
- Optimize resources, potentially saving credits by handling data processing and other computationally expensive tasks in Python.
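As an example of preprocessing in Python before data reaches a model node, here is a minimal sketch that normalizes whitespace, drops repeated lines (such as page headers), and caps the length so less text is passed downstream. The `start`/`Text` input and the 20,000-character cap are hypothetical placeholders, and how much this actually saves depends on how the downstream node is billed.

```python
def main(context, *args, **kwargs):
    # Imports go inside main(), per the node's rules
    import re

    def clean_text(raw, max_chars=20_000):
        """Collapse whitespace, drop repeated lines, and cap the total length."""
        seen = set()
        kept = []
        for line in raw.splitlines():
            line = re.sub(r"\s+", " ", line).strip()
            if line and line not in seen:  # skip blank lines and exact repeats
                seen.add(line)
                kept.append(line)
        return "\n".join(kept)[:max_chars]

    # Hypothetical input: node ID "start" and field "Text" are placeholders
    raw = context.variables["start"]["Text"]
    yield clean_text(raw)


# Local check with a stand-in context (Scade supplies the real one)
class StubContext:
    variables = {"start": {"Text": "Report   header\n\nFirst  point\nReport header\n"}}


print(next(main(StubContext())))
```

Dropping every repeated line is a blunt rule that can also remove legitimate repeats, so adjust it to your data.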

The Python Async Function node lets you run Python scripts within your Scade flows. It comes pre-filled with example code to help you get started.

To use this node, follow these essential guidelines:

Function Definition: You must define a function called `main` that takes `context` as its first argument. Any of the following signatures works:

`def main(context):`

`async def main(context):`

`def main(context, *args, **kwargs):`

`async def main(context, *args, **kwargs):`

Yield Statement: The function should end with a yield statement.

```python
async def main(context, *args, **kwargs):
    from datetime import datetime
    result = datetime.utcnow()
    yield result
```

Imports: All Python imports must be placed inside the main(context) function, not outside it.

```python
async def main(context, *args, **kwargs):
    from datetime import datetime
    result = datetime.utcnow()
    yield result
```

These rules ensure that your Python code integrates smoothly with the Scade environment, making it a powerful tool to enhance your workflows.
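To check a node body against these rules before pasting it into Scade, you can drive it locally. This sketch is not part of Scade: `make_context`, `run_node`, and the `start`/`Name` variable are hypothetical, and it assumes only that Scade calls `main(context)` and reads what it yields. The `context.variables` shape mirrors the lookup used in the Chat with document example.

```python
import asyncio
import inspect
from types import SimpleNamespace


def make_context(variables):
    """Hypothetical stand-in for Scade's context: only .variables is modeled."""
    return SimpleNamespace(variables=variables)


def run_node(node_main, context):
    """Call node_main(context) and collect every value it yields (sync or async)."""
    gen = node_main(context)
    if inspect.isasyncgen(gen):
        async def collect():
            return [item async for item in gen]
        return asyncio.run(collect())
    return list(gen)


# Async node body following the rules: imports inside main, result yielded
async def main(context, *args, **kwargs):
    from datetime import datetime, timezone
    yield datetime.now(timezone.utc).isoformat()


# Sync variant reading a hypothetical "start" node's "Name" field
def greet_main(context, *args, **kwargs):
    name = context.variables["start"]["Name"]
    yield f"Hello, {name}!"


ctx = make_context({"start": {"Name": "Scade"}})
print(run_node(main, ctx))        # one-element list holding a UTC timestamp string
print(run_node(greet_main, ctx))  # ['Hello, Scade!']
```

The harness collects every yielded value; how Scade itself treats multiple yields is not covered here.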

Example use on Scade.

Chat with document template

“Chat with Document” refers to an interactive process where a system, powered by a model like ChatGPT, enables users to engage in a Q&A session based on the contents of a document.

The flow uses a Python node to extract the text from a document. It relies on fitz (PyMuPDF) to read PDF files, BytesIO to handle the downloaded file in memory, and `Document` from the python-docx library to read DOCX files. The extracted text is then sent to ChatGPT, which processes the content, searches it for the information relevant to the user's questions, and can summarize it, streamlining document analysis and summary generation.

The following node downloads a document from a given URL, extracts the text from the PDF or DOCX file, and yields it as the node's result.

```python
def main(context, *args, **kwargs):
    # All imports go inside main(), as the node requires
    import requests
    import fitz  # PyMuPDF
    from io import BytesIO
    from urllib.parse import urlparse
    from docx import Document  # python-docx

    def download_file(url):
        # Fetch the file and keep it in memory rather than on disk
        response = requests.get(url, timeout=60)
        response.raise_for_status()
        return BytesIO(response.content)

    def extract_text_from_pdf(file_stream):
        # Concatenate the text of every page
        with fitz.open(stream=file_stream, filetype="pdf") as document:
            return "".join(page.get_text() for page in document)

    def extract_text_from_docx(file_stream):
        # One line of output per paragraph
        document = Document(file_stream)
        return "\n".join(para.text for para in document.paragraphs)

    def extract_text(url):
        # Match the extension on the URL path so ?query strings don't break it,
        # and only download files we can actually parse
        path = urlparse(url).path.lower()
        if path.endswith(".pdf"):
            return extract_text_from_pdf(download_file(url))
        if path.endswith(".docx"):
            return extract_text_from_docx(download_file(url))
        raise ValueError("Unsupported file type. Only PDF and DOCX are supported.")

    # Read the URL from the "Document" input of node "axi1-start" (the ID in this template)
    url = context.variables["axi1-start"]["Document"]
    try:
        result = extract_text(url)
    except Exception as e:
        # Yield the error as text so the next node receives a readable message
        result = f"Error: {e}"
    yield result
```
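The example above picks the parser from the URL's extension, which fails for links that have no file extension. If your flow receives such links, one option is to inspect the downloaded bytes instead: PDF files begin with `%PDF-`, and DOCX files are ZIP archives containing `word/document.xml`. A minimal standard-library sketch; `sniff_document_type` is a hypothetical helper, not a Scade function.

```python
import zipfile
from io import BytesIO


def sniff_document_type(data):
    """Guess 'pdf' or 'docx' from a file's leading bytes rather than its URL."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    # DOCX files are ZIP archives whose main part is word/document.xml
    if data.startswith(b"PK\x03\x04"):
        try:
            with zipfile.ZipFile(BytesIO(data)) as archive:
                if "word/document.xml" in archive.namelist():
                    return "docx"
        except zipfile.BadZipFile:
            pass  # looks like a ZIP but is damaged; fall through to the error
    raise ValueError("Unsupported file type. Only PDF and DOCX are supported.")


# Local check with in-memory samples (no network needed)
docx_buffer = BytesIO()
with zipfile.ZipFile(docx_buffer, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")

print(sniff_document_type(b"%PDF-1.7\n"))            # pdf
print(sniff_document_type(docx_buffer.getvalue()))  # docx
```

Inside the node you could call it on `file_stream.getvalue()` after downloading and dispatch on the returned string; per the node's rules, move the imports inside `main` when you do.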

Good to know.

Here are the key libraries available in the Python node's environment, covering functionality from data processing and machine learning to web development and API interactions:

Data Processing and Analysis:
- Pandas and NumPy for efficient data manipulation, analysis, and numerical computations.
- SciPy for scientific computing, widely used in research and technical applications.
Machine Learning & Natural Language Processing:
- Hugging Face Hub to access and work with pre-trained NLP models and datasets.
- LangChain for leveraging language models in custom applications.
Web Development:
- Django and FastAPI for building full-featured web applications and high-performance APIs.
Cloud and API Integrations:
- Boto3 for AWS services integration and management.
- Google API Python Client for interacting with various Google APIs.
- Requests for making HTTP requests, commonly used for API calls.
Data Storage and Serialization:
- SQLAlchemy as an ORM for database management.
- jsonschema for validating JSON data, and lxml for parsing XML.
- redis, a client library for connecting to Redis in-memory data stores.
Asynchronous Programming and Task Queuing:
- Celery for distributed task handling, ideal for background job processing.
- Asyncio for building asynchronous applications.
Development Tools:
- Black for consistent code formatting.
Security and Encryption:
- Cryptography for secure data handling and encryption.
- PyJWT for implementing JSON Web Tokens in authentication workflows.
Utility Libraries:
- BeautifulSoup and lxml for web scraping and HTML/XML parsing.
- Dateutil and Pytz for date and timezone manipulation.
- tqdm for adding progress bars to loops and long-running tasks.

These libraries equip the Python node to handle a broad range of tasks, from data-heavy computations and real-time applications to secure web development and machine-learning integration.

Admin · December 2, 2024, 4:24pm

List of pre-installed libraries:
ABBYY 0.3
aio-pika 9.4.3
aioboto3 11.3.1
aiobotocore 2.6.0
aiodns 3.2.0
aiofiles 23.2.1
aiohappyeyeballs 2.4.0
aiohttp 3.10.5
aiohttp-retry 2.8.3
aioitertools 0.12.0
aiokafka 0.10.0
aiormq 6.8.1
aioshutil 1.5
aiosignal 1.3.1
aiosmtplib 2.0.2
alembic 1.13.2
amqp 5.2.0
anthropic 0.29.2
anyio 3.7.1
apify 1.7.2
apify_client 1.6.4
apify_shared 1.1.2
argcomplete 3.5.0
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
asgiref 3.8.1
asteval 0.9.33
async-timeout 4.0.3
asyncache 0.3.1
asyncio 3.4.3
asyncpg 0.27.0
attrs 24.2.0
babel 2.16.0
backoff 2.2.1
bcrypt 4.1.2
beautifulsoup4 4.12.3
billiard 4.2.0
black 24.8.0
blinker 1.8.2
boto3 1.28.17
botocore 1.31.17
Brotli 1.1.0
bs4 0.0.2
build 1.2.2.post1
CacheControl 0.14.0
cachetools 5.5.0
celery 5.4.0
certifi 2024.8.30
cffi 1.17.1
charset-normalizer 3.3.2
chroma-hnswlib 0.7.1
chromadb 0.4.1
ci-info 0.3.0
cleo 2.1.0
click 8.1.7
click-didyoumean 0.3.1
click-plugins 1.1.1
click-repl 0.3.0
cmake 3.28.1
colorama 0.4.6
coloredlogs 15.0.1
configobj 5.0.8
configparser 7.1.0
containers 0.0.4
crashtest 0.4.1
cron-descriptor 1.4.5
cryptography 42.0.8
dataclasses-json 0.6.7
datamodel-code-generator 0.25.9
decorator 4.4.2
deepdiff 6.7.1
deepl 1.18.0
defusedxml 0.7.1
dependency-injector 4.41.0
Deprecated 1.2.14
diff-match-patch 20230430
dirtyjson 1.0.8
distlib 0.3.9
distro 1.9.0
dj-database-url 2.2.0
Django 5.1.1
django-admin-sortable2 2.2.2
django-fieldsignals 0.7.0
django-import-export 3.3.9
django-jsonform 2.21.2
django-storages 1.14.4
dnspython 2.6.1
docx 0.2.4
dulwich 0.21.7
elevenlabs 1.9.0
email-validator 1.3.1
environs 9.5.0
et-xmlfile 1.1.0
etelemetry 0.3.1
exceptiongroup 1.2.2
Faker 28.4.1
fastapi 0.96.1
fastapi-healthcheck 0.2.12
fastapi-injector 0.5.4
fastapi-mail 1.3.1
fastapi-users 13.0.0
fastapi-users-db-sqlalchemy 5.0.0
fastjsonschema 2.20.0
ffmpeg-asyncio 0.1.3
ffmpeg-python 0.2.0
filelock 3.15.4
fitz 0.0.1.dev2
flatbuffers 24.3.25
frozenlist 1.4.1
fsspec 2024.9.0
func_timeout 4.3.5
future 1.0.0
genson 1.3.0
google-ai-generativelanguage 0.6.6
google-api-core 2.19.2
google-api-python-client 2.123.0
google-auth 2.34.0
google-auth-httplib2 0.1.1
google-auth-oauthlib 1.2.1
google-generativeai 0.7.2
googleapis-common-protos 1.65.0
gql 3.5.0
graphql-core 3.2.4
greenlet 3.0.3
grpcio 1.63.0
grpcio-status 1.62.3
grpcio-tools 1.62.2
gunicorn 21.2.0
h11 0.14.0
h2 4.1.0
hpack 4.0.0
html2text 2024.2.26
httpcore 1.0.5
httplib2 0.22.0
httptools 0.6.1
httpx 0.27.2
huggingface-hub 0.24.6
humanfriendly 10.0
hyperframe 6.0.1
idna 3.8
imageio 2.35.1
imageio-ffmpeg 0.5.1
importlib_metadata 8.4.0
importlib_resources 6.4.4
inflect 5.6.2
injector 0.21.0
inquirerpy 0.3.4
installer 0.7.0
isodate 0.6.1
isort 5.13.2
itsdangerous 2.2.0
jaraco.classes 3.4.0
jeepney 0.8.0
Jinja2 3.1.4
jiter 0.5.0
jmespath 1.0.1
joblib 1.4.2
jsf 0.6.0
jsonfield 3.1.0
jsonpatch 1.33
jsonpickle 3.3.0
jsonpointer 3.0.0
jsonref 1.1.0
jsonschema 4.23.0
jsonschema-specifications 2023.12.1
keyring 24.3.1
kombu 5.4.0
langchain 0.1.17
langchain-community 0.0.37
langchain-core 0.1.52
langchain-openai 0.1.6
langchain-text-splitters 0.0.2
langsmith 0.1.115
llama-hub 0.0.60
llama-index 0.9.48
looseversion 1.3.0
lumaai 1.0.2
lxml 5.3.0
mailchimp-transactional 1.0.56
makefun 1.15.4
Mako 1.3.5
MarkupPy 1.14
MarkupSafe 2.1.5
marshmallow 3.22.0
martian-python 1.5.3
memory-profiler 0.61.0
monotonic 1.6
more-itertools 10.5.0
moviepy 1.0.3
mpmath 1.3.0
msgpack 1.1.0
multidict 6.0.5
mypy-extensions 1.0.0
neo4j 5.24.0
nest-asyncio 1.6.0
netaddr 1.3.0
networkx 3.3
nibabel 5.2.1
nipype 1.8.6
nltk 3.9.1
numpy 1.26.0
oauthlib 3.2.2
odfpy 1.4.1
onnxruntime 1.19.2
openai 1.51.0
opencv-python 4.8.0.74
openpyxl 3.1.5
ordered-set 4.1.0
orjson 3.10.7
ory-kratos-client 1.1.0
overrides 7.7.0
packaging 23.2
pamqp 3.3.0
pandas 2.2.2
param 2.1.1
paramiko 3.4.1
passlib 1.7.4
pathlib 1.0.1
pathspec 0.12.1
pexpect 4.9.0
pfzy 0.3.4
Pillow 9.5.0
pillow_heif 0.16.0
pip 24.2
pkginfo 1.11.2
platformdirs 4.2.2
poetry 1.8.3
poetry-core 1.9.0
poetry-plugin-export 1.8.0
portalocker 2.10.1
posthog 3.6.3
prettytable 3.11.0
proglog 0.1.10
prompt_toolkit 3.0.47
proto-plus 1.24.0
protobuf 4.25.4
prov 2.0.0
providers 0.0.2
psutil 6.0.0
psycopg2-binary 2.9.9
ptyprocess 0.7.0
pulsar-client 3.5.0
pwdlib 0.2.0
py-cpuinfo 9.0.0
pyaml 23.12.0
pyasn1 0.6.0
pyasn1_modules 0.4.0
pycares 4.4.0
pycparser 2.22
pycron 3.0.0
pydantic 1.10.14
pydantic_core 2.23.2
pydot 3.0.1
pyee 12.0.0
pyinstrument 4.7.3
PyJWT 2.8.0
PyMuPDF 1.24.10
PyMuPDFb 1.24.10
PyNaCl 1.5.0
pyOpenSSL 24.2.1
pyparsing 3.1.4
PyPDF2 3.0.1
PyPika 0.48.9
pyproject_hooks 1.2.0
python-dateutil 2.9.0.post0
python-docx 1.1.2
python-dotenv 1.0.1
python-multipart 0.0.9
pytz 2024.1
pyxnat 1.6.2
PyYAML 6.0.2
qdrant-client 1.11.1
RapidFuzz 3.10.0
rdflib 7.0.0
redis 4.6.0
referencing 0.35.1
regex 2024.7.24
replicate 0.25.2
requests 2.32.3
requests-oauthlib 1.3.1
requests-toolbelt 1.0.0
retrying 1.3.4
rpds-py 0.20.0
rsa 4.9
rstr 3.2.2
runpod 1.7.0
s3transfer 0.6.2
scipy 1.14.1
SecretStorage 3.3.3
sentry-sdk 1.45.1
setuptools 74.1.2
shellingham 1.5.4
simplejson 3.19.3
six 1.16.0
smart-open 7.0.4
sniffio 1.3.1
sortedcollections 2.1.0
sortedcontainers 2.4.0
soupsieve 2.6
sqladmin 0.13.0
SQLAlchemy 2.0.34
SQLAlchemy-Utils 0.41.2
sqlparse 0.5.1
sse-starlette 1.8.2
stability-sdk 0.8.6
starlette 0.27.0
stripe 5.6.0b2
sympy 1.13.2
tablib 3.5.0
taskiq 0.7.2
taskiq-aio-pika 0.4.1
taskiq-dependencies 1.5.3
taskiq-fastapi 0.2.0
tenacity 8.5.0
tiktoken 0.5.2
tokenizers 0.20.0
toml 0.10.2
tomli 2.0.1
tomlkit 0.13.2
tqdm 4.66.5
tqdm-loggable 0.2
traits 6.3.2
trove-classifiers 2024.10.11
typing_extensions 4.12.2
typing-inspect 0.9.0
tzdata 2024.1
ujson 5.10.0
uritemplate 4.1.1
urllib3 1.26.20
uvicorn 0.27.1
uvloop 0.20.0
vine 5.1.0
virtualenv 20.26.6
watchdog 5.0.2
watchfiles 0.24.0
wcwidth 0.2.13
webflow 1.2.0
websockets 13.0.1
wheel 0.44.0
wrapt 1.16.0
WTForms 3.1.2
xlrd 2.0.1
xlwt 1.3.0
yarl 1.9.11
yookassa 2.5.0
youtube-transcript-api 0.5.0
zipp 3.20.1
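The list reflects the environment at the time of this post, so before relying on a specific API it can be worth confirming a version from inside a node. A minimal sketch using the standard library's `importlib.metadata`; the package selection is only an example, and `not-a-real-package` shows the missing case.

```python
def main(context, *args, **kwargs):
    # Imports go inside main(), per the node's rules
    from importlib.metadata import PackageNotFoundError, version

    # Distribution names as listed above (pip names, not import names)
    wanted = ["pandas", "python-docx", "PyMuPDF", "not-a-real-package"]
    report = {}
    for name in wanted:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None  # not installed in this environment
    yield report


print(next(main(context=None)))
```

The list uses distribution names, which can differ from import names: for example, PyMuPDF is imported as `fitz` and python-docx as `docx`.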