Create a Telegram Bot to get info from PyPi Server: Part I
In this article:
BigQuery, pyTelegramBotAPI(telebot), Python 3.7, Poetry, Telegram(https://telegram.org/)
Link to the source code at the end of the article.
Intro
Hi, I love investigating new tools and taking a look at the different libraries that are created every day in the Python community.
I thought that it would be cool to create some Telegram Bot that would take care of new information for me on an everyday basis.
Also, in recent days pypistats.org has not been doing well (the stats were not updated for several days), so I also want a tool that quickly gives me package stats: downloads for the last day, week, month, and so on.
What I want in the end: for the first task, I want a bot that takes my interests as topics (for example, data science, tests, web development) and sends me 5 new packages every day that it has not sent me before. For each package I want to see the description, authors, and homepage. For the second task, I want to send the bot the command ‘/stats’ with a package name, and it must answer with the number of downloads for that package.
In this Part I we will build a simple synchronous Telegram Bot service that works with the API and obtains PyPi stats; in the next Parts I will improve it.
Let’s start from a simple version that will just publish short info about 5 random packages from PyPi.
To obtain information about packages we will use Public PyPI BigQuery dataset — https://packaging.python.org/guides/analyzing-pypi-package-downloads/#getting-set-up
Create new project
Let’s init a new project with Poetry (https://python-poetry.org/). PS: $ means “run the command in the console”:
$ poetry new pypi_observer_bot
Enter our new project:
$ cd pypi_observer_bot
Next, add a dependency. To start, I will use the synchronous https://github.com/eternnoir/pyTelegramBotAPI to create the Bot Server in Python:
$ poetry add pyTelegramBotAPI
Create Telegram bot & get authorization token
Before you start developing the code, you need to create a bot in Telegram with BotFather https://core.telegram.org/bots#6-botfather. Once you get your authorization token, we can continue.
I got the token:
And we can go ahead.
Create bot.py
Let’s first create a simple example as described in https://github.com/eternnoir/pyTelegramBotAPI#a-simple-echo-bot and test that everything works.
I created a ‘bot.py’ file with this content:
import telebot

bot = telebot.TeleBot("mytoken", parse_mode=None)


# Reply to the /start and /help commands with a greeting
@bot.message_handler(commands=['start', 'help'])
def send_welcome(message):
    bot.reply_to(message, "Howdy, how are you doing?")


# Echo any other message back to the user
@bot.message_handler(func=lambda m: True)
def echo_all(message):
    bot.reply_to(message, message.text)


bot.polling()
Now run bot.py:
$ python bot.py
And open the chat with your bot (by the username that you chose when you created the bot with BotFather). Check that everything works:
Nice. Move on.
Send message to Telegram Chat without User Actions
Now I want the Bot to send me messages without any action from my side. For this, I need to use the method:
bot.send_message(chat_id=5421727806, text="hi, I'm a message from update")
How to get the Telegram chat id
But to use it, I need the chat_id: the unique id of the chat between me (or another user) and the Bot.
To do this, we just need to read the chat id in the handler that catches the first action, ‘start’ (by the way, this action is always used to start talking with a Bot, so it is the main entry point for a dialog with your Telegram Bot).
Let’s modify our send_welcome(message) to print chat.id; also, let’s print message.chat.__dict__ and message.__dict__ to see what other information we can obtain from the Message object:
@bot.message_handler(commands=['start', 'help'])
def send_welcome(message):
    print(message.chat.id)
    print(message.chat.__dict__)
    print(message.__dict__)
    bot.reply_to(message, "Howdy, how are you doing?")
And restart bot.py:
$ python bot.py
Send a ‘/start’ message to the bot. In the console you will see the chat id, which you can use to send messages directly to your dialog with the bot. The output looks something like this:
5421727806
{'id': 5421727806, 'type': 'private', 'title': None, 'username': 'xnuinside', 'first_name': 'Iuliia', 'last_name': 'Volkova', 'all_members_are_administrators': None, 'photo': None, 'description': None, 'invite_link': None, 'pinned_message': None, 'permissions': None, 'slow_mode_delay': None, 'sticker_set_name': None, 'can_set_sticker_set': None}
{'content_type': 'text', 'message_id': 24, 'from_user': <telebot.types.User object at 0x10d327b90>, 'date': 1595680726, 'chat': <telebot.types.Chat object at 0x10d327f50>, 'forward_from': None, 'forward_from_chat': None, ... , 'json': {'message_id': 24, 'from':{'id': 5421727806, 'is_bot': False, 'first_name': 'Iuliia', 'last_name': 'Volkova', 'username': 'xnuinside', 'language_code': 'en'}, 'chat': {'id': 5421727806, 'first_name': 'Iuliia', 'last_name': 'Volkova', 'username': 'xnuinside', 'type': 'private'}, 'date': 1595680726, 'text': '/start', 'entities': [{'offset': 0, 'length': 6, 'type': 'bot_command'}]}}
That is a lot of information; you can investigate it later to check what else can be useful for you.
Okay, now we want to send messages to the chat without any action from the user, but we use bot.polling() in our bot.py, which means this script is busy waiting for actions from users.
Let’s rename ‘bot.py’ to ‘listener.py’; we will use it for all logic that must react to user requests.
Create the informer.py
And let’s create ‘informer.py’, which will contain the logic for sending messages with our PyPi packages to users every day.
In informer.py let’s add a test action that sends a message to our chat. Use the chat id that you extracted from your message in the previous step.
import telebot

bot = telebot.TeleBot("api_token", parse_mode=None)
bot.send_message(chat_id=547123227806, text="hi, I'm a message from informer")
Now run script:
$ python informer.py
And check your chat with bot:
Great, all works.
Now we need to implement logic that will query the PyPi BigQuery Dataset and send:
1) 5 random packages from it
2) information about how many packages were downloaded from PyPi on the last day (distinct packages that have at least one download).
Let’s start from the simplest task — number 2.
The PyPi Dataset is partitioned by date, and the query looks like this:
SELECT count(distinct(file.project)) as packages_number FROM `the-psf.pypi.downloads20200724`;
Where 20200724 is a date.
You can enter BigQuery https://console.cloud.google.com/bigquery?p=the-psf&d=pypi&page=dataset and test it.
To query BigQuery from Python we will use google-cloud-bigquery, let’s add it to the project:
$ poetry add google-cloud-bigquery
To work with BigQuery we also need to set the path to the credentials used for authentication. Let’s set the GOOGLE_APPLICATION_CREDENTIALS variable to that path; for me it will be:
import os
from datetime import datetime, timedelta  # used below to build the table name

import telebot
from google.cloud import bigquery

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "../pypi_observer_gcp_key.json"
If you don’t have a service account file, check https://cloud.google.com/docs/authentication/getting-started to get one.
Now let’s add a function to call our query:
client = bigquery.Client()


def bq_get_unique_packages_downloaded_for_yesterday():
    query_job = client.query(
        f"SELECT count(distinct(file.project)) as packages_number FROM "
        f"`the-psf.pypi.downloads{(datetime.now().date() - timedelta(days=1)).isoformat().replace('-', '')}`;")
    results = query_job.result()
    results = [row for row in results]
    return results[-1].packages_number
(datetime.now().date() - timedelta(days=1)).isoformat().replace('-', '') returns yesterday’s date in the format 20200724.
And change bot.send_message to send the information to the chat:
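The same suffix can also be produced with strftime, which may read a bit more directly. This is only a minimal sketch: the helper name yesterday_table_suffix and its today parameter are illustrative and not part of the original code.

```python
from datetime import date, timedelta
from typing import Optional


def yesterday_table_suffix(today: Optional[date] = None) -> str:
    """Return yesterday's date as YYYYMMDD, the suffix used in the table name."""
    if today is None:
        today = date.today()
    return (today - timedelta(days=1)).strftime("%Y%m%d")


# With a fixed "today" the result is deterministic:
print(yesterday_table_suffix(date(2020, 7, 25)))  # 20200724
```

Passing the date in as a parameter also makes the helper easy to check without waiting for a particular day.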
bot.send_message(chat_id=547123227806,
text=f"Total unique packages from PyPi, that was downloaded yesterday: "
f"{bq_get_unique_packages_downloaded_for_yesterday()}")
Run informer:
$ python informer.py
And check chat with bot:
Cool.
Now, let’s create a query that gets random packages from the packages downloaded that day.
This query will return random names of 7–20 packages:
SELECT distinct(file.project) as package_name FROM `the-psf.pypi.downloads20200724` WHERE RAND() < 10/164656895;
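Why does this return only a handful of names? RAND() < p keeps each row independently with probability p, so the number of kept rows is roughly rows × p. The sketch below works through that arithmetic under the assumption (my reading of the query, not something stated above) that 164656895 is about the number of rows in the daily table. DISTINCT can then only reduce the count further, because a popular package may be sampled more than once.

```python
from math import sqrt


def sample_size_estimate(total_rows: int, probability: float):
    """Mean and standard deviation of the number of rows kept by RAND() < probability."""
    mean = total_rows * probability
    std = sqrt(total_rows * probability * (1 - probability))
    return mean, std


# Assumption: the daily table holds about 164656895 rows.
rows = 164_656_895
mean, std = sample_size_estimate(rows, 10 / rows)
print(f"expected rows: {mean:.1f} +/- {std:.1f}")  # expected rows: 10.0 +/- 3.2
```

So the sample size varies noticeably from run to run, which is worth remembering if the bot must always have at least 5 names to choose from.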
We will use this query only to get the package id, and afterwards we will call the PyPi API endpoint:
https://pypi.org/pypi/{package_id}/json
This gives us full information about the package.
Use, for example, GET https://pypi.org/pypi/pyyaml/json to get a sample of the PyPi API answer:
Great, next create the logic:
- Call the query that returns a list of random packages
- Choose one package from the list and send it to the chat with some time delta (I don’t want to get all 5 packages one after another; I want them spread over the day: in the morning, at lunch time, in the evening)
- Call the PyPi API to get more information about the package
Call a query with a list of random packages
Let’s add to the imports:
from random import choice
And create a function to call the query and return random package names:
def bq_get_random_packages_downloaded_for_yesterday():
    query_job = client.query(
        f"SELECT distinct(file.project) as package_name FROM "
        f"`the-psf.pypi.downloads{(datetime.now().date() - timedelta(days=1)).isoformat().replace('-', '')}` "
        f"WHERE RAND() < 10/164656895;")
    results = query_job.result()
    results = [row.package_name for row in results]
    return results
And let’s test it by adding one more send_message call with a random package name:
bot.send_message(chat_id=547123227806,
text=f"Random Package from PyPi: \n"
f"{choice(bq_get_random_packages_downloaded_for_yesterday())}")
Run script again and check it:
Great. Let’s modify the message a little to also attach the url of the package page on pypi.org: https://pypi.org/project/{package_id}/
bot.send_message(chat_id=54727806,
                 text=f"Random Package from PyPi: \n"
                      f"https://pypi.org/project/{choice(bq_get_random_packages_downloaded_for_yesterday())}/")
And now I got:
Choose one package from the list and send it to chat with some timedelta
In a primitive way we can do something like this:
from time import sleep  # add this line to imports

date_ = datetime.now().date()
while True:
    if date_ <= datetime.now().date():
        for i in range(5):
            bot.send_message(chat_id=54727806,
                             text=f"Random Package from PyPi: \n"
                                  f"https://pypi.org/project/{choice(bq_get_random_packages_downloaded_for_yesterday())}/")
            sleep(3)
            if i == 4:
                date_ = datetime.now().date() + timedelta(days=1)
What am I doing here? At the start I set the variable date_ to the current date, and with a pause of 3 seconds I send a new message with a new package to Telegram.
Once 5 messages are sent, we change date_ to tomorrow, so new packages will be sent only tomorrow.
“while True” means the loop runs forever, until our script is killed from outside or stopped manually.
We will leave this part of the code as it is and add a proper implementation with scheduled tasks later.
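Until then, one possible way to spread the messages over the day is to compute the next send time and sleep until it arrives. This is a sketch under my own assumptions: the slot times (9:00, 13:00, 19:00) and the helper names next_send_time and seconds_until are hypothetical, and this is not the scheduling that later Parts will use.

```python
from datetime import datetime, time, timedelta

# Hypothetical send slots: morning, lunch time, evening.
SEND_SLOTS = (time(9, 0), time(13, 0), time(19, 0))


def next_send_time(now: datetime, slots=SEND_SLOTS) -> datetime:
    """Return the first slot strictly after `now`, rolling over to tomorrow."""
    for slot in sorted(slots):
        candidate = datetime.combine(now.date(), slot)
        if candidate > now:
            return candidate
    # All of today's slots have passed: use the first slot of tomorrow.
    return datetime.combine(now.date() + timedelta(days=1), min(slots))


def seconds_until(now: datetime, target: datetime) -> float:
    """Number of seconds to sleep before `target` (never negative)."""
    return max(0.0, (target - now).total_seconds())


now = datetime(2020, 7, 25, 8, 0)
print(next_send_time(now))  # 2020-07-25 09:00:00
```

Inside the loop, the script could call sleep(seconds_until(now, next_send_time(now))) and then send one package, instead of sending all five with 3-second pauses.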
Populate data about package from PyPI
We will use requests to make the API call:
$ poetry add requests
Doc about text formatting in Telegram: https://core.telegram.org/api/entities
And let’s change our bot send_message call; we also need to add parse_mode='html' to make the text formatting work in the message:
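A minimal sketch of what this step could look like. The helper names fetch_package_info and format_package_message are my own, and the layout of the message is just one option. The fields name, summary, author and home_page come from the "info" section of the PyPi JSON answer; any of them may be empty, so the sketch falls back to defaults.

```python
from html import escape


def fetch_package_info(package_name: str) -> dict:
    """Return the "info" section of the PyPi JSON API answer for a package."""
    import requests  # third-party; imported here so the formatter works without it

    response = requests.get(f"https://pypi.org/pypi/{package_name}/json", timeout=10)
    response.raise_for_status()
    return response.json()["info"]


def format_package_message(info: dict) -> str:
    """Build an HTML-formatted Telegram message from the "info" dictionary."""
    # Telegram's HTML mode needs <, > and & escaped in plain text.
    name = escape(info.get("name") or "unknown")
    summary = escape(info.get("summary") or "no description")
    author = escape(info.get("author") or "unknown")
    # Fall back to the pypi.org project page when no homepage is set.
    home_page = info.get("home_page") or f"https://pypi.org/project/{info.get('name', '')}/"
    return (
        f"<b>{name}</b>\n"
        f"<i>{summary}</i>\n"
        f"Author: {author}\n"
        f'<a href="{escape(home_page, quote=True)}">Homepage</a>'
    )
```

In informer.py the result could then be sent with bot.send_message(chat_id=..., text=format_package_message(fetch_package_info(name)), parse_mode='html').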
Rerun informer.py and check the result:
Not ideal, but I hope you got the idea.
Add ‘/stats’ command to listener.py
Now let’s return to our ‘listener.py’ file and add a handler for the ‘/stats’ command: we will extract the package name from the text message and request stats for the last 3 days from PyPi.
First, create a function that queries info from BigQuery (in listener.py):
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "../path/to/your/key"
client = bigquery.Client()


def bq_get_downloads_stats_for_package(package_name, date_):
    query_job = client.query(
        f"SELECT count(timestamp) as downloads FROM `the-psf.pypi.downloads{date_}` "
        f"WHERE file.project='{package_name}'")
    results = query_job.result()
    result = [row for row in results][-1]
    return result.downloads
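Note that package_name arrives straight from a chat message and is pasted into the SQL text, so it is worth checking it before running the query. The sketch below validates the name against the package naming rule from PEP 508 (letters, digits, ".", "_" and "-", starting and ending with a letter or digit). The helper name is_valid_package_name is my own. google-cloud-bigquery also supports query parameters (bigquery.ScalarQueryParameter), which would be another way to avoid building the WHERE clause by hand.

```python
import re

# Naming rule from PEP 508: letters, digits, ".", "_" and "-",
# starting and ending with a letter or a digit.
_PACKAGE_NAME_RE = re.compile(r"[A-Z0-9]([A-Z0-9._-]*[A-Z0-9])?", re.IGNORECASE)


def is_valid_package_name(name: str) -> bool:
    """Return True if `name` looks like a valid PyPi package name."""
    return _PACKAGE_NAME_RE.fullmatch(name) is not None


print(is_valid_package_name("google-cloud-bigquery"))  # True
print(is_valid_package_name("x' OR '1'='1"))  # False
```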
Second, add a handler for the bot ‘/stats’ command:
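A sketch of how such a handler could be put together. The helpers parse_stats_command, last_days_suffixes and format_stats_reply are hypothetical names of mine, and the reply layout is only one option. The wiring into telebot is shown as comments because it needs a running bot and BigQuery credentials.

```python
from datetime import date, timedelta
from typing import List, Optional


def parse_stats_command(text: str) -> Optional[str]:
    """Return the package name from '/stats <package>' or None if it is missing."""
    parts = text.split()
    if len(parts) < 2:
        return None
    return parts[1]


def last_days_suffixes(days: int, today: Optional[date] = None) -> List[str]:
    """Return table suffixes (YYYYMMDD) for the previous `days` days, newest first."""
    if today is None:
        today = date.today()
    return [(today - timedelta(days=n)).strftime("%Y%m%d") for n in range(1, days + 1)]


def format_stats_reply(package_name: str, downloads_by_day: dict) -> str:
    """Build a plain-text reply with one line per day."""
    lines = [f"Downloads for {package_name}:"]
    lines += [f"{day}: {count}" for day, count in downloads_by_day.items()]
    return "\n".join(lines)


# Possible wiring in listener.py (not executed here):
# @bot.message_handler(commands=['stats'])
# def send_stats(message):
#     package_name = parse_stats_command(message.text)
#     if package_name is None:
#         bot.reply_to(message, "Usage: /stats <package name>")
#         return
#     stats = {day: bq_get_downloads_stats_for_package(package_name, day)
#              for day in last_days_suffixes(3)}
#     bot.reply_to(message, format_stats_reply(package_name, stats))
```

The ‘/stats’ handler has to be registered before the catch-all echo handler, because the first matching handler processes the message.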
Re-run listener:
$ python listener.py
Check results:
Cool, now you can get info about the downloads of your Python package from PyPi stats.
You can find the source code here: https://github.com/xnuinside/pypi_observer_bot/tree/v0.0.1
In the next Part: moving ‘informer’ and ‘listener’ to asynchronous rails, saving information about Users in a DB (to avoid hardcoding chat_id), and other improvements.