Limit the number of allowed entities/mentions in a message #222

Open
BoberMod opened this issue Jan 10, 2025 · 3 comments

Comments

@BoberMod

BoberMod commented Jan 10, 2025

I ran into a new type of spam: the first message mass-mentions chat members and is edited into an ad immediately afterwards. Telegram keeps the mention notifications even after the message is edited, so users open the chat from the notification and see the ad.

image

The classifier doesn't detect such messages as spam because they contain a lot of random text (usernames), even when the message also contains the spam text itself:

Note: the same message without mentions has been added to the spam samples and is detected at 99%.

Screenshot of how it looks

image

Each mention in the message is counted as a separate entity of type mention. I'd like to request a feature that limits the number of entities by type, or at least restricts mention entities specifically.

I think it would be useful to be able to block or limit any entity type, because spam also contains Telegram cashtags ($USD) and hashtags.
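A minimal sketch of what such a check could look like. The Request, Response and MetaCheck types here are simplified stand-ins modelled on the shape of the existing LinksCheck, and both the Entities field and the EntitiesCheck name are hypothetical, not part of the current library:

```go
package main

import "fmt"

// Request is a simplified stand-in for spamcheck.Request.
// The Entities field is an assumption made for this sketch.
type Request struct {
	Msg      string
	Entities map[string]int // entity type -> count, e.g. "mention": 15
}

// Response is a simplified stand-in for spamcheck.Response.
type Response struct {
	Name    string
	Spam    bool
	Details string
}

// MetaCheck mirrors the function shape returned by LinksCheck.
type MetaCheck func(req Request) Response

// EntitiesCheck (hypothetical) flags a message as spam when the number of
// entities of the given type exceeds limit.
func EntitiesCheck(entityType string, limit int) MetaCheck {
	return func(req Request) Response {
		count := req.Entities[entityType] // a missing type counts as 0
		if count > limit {
			return Response{
				Name:    "entities",
				Spam:    true,
				Details: fmt.Sprintf("too many %s entities %d/%d", entityType, count, limit),
			}
		}
		return Response{
			Name:    "entities",
			Spam:    false,
			Details: fmt.Sprintf("%s entities %d/%d", entityType, count, limit),
		}
	}
}
```

With a limit of 5 mentions, a request carrying 15 mention entities would be reported as spam, while hashtags or cashtags could be limited independently by creating a separate check per type.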

Message JSON from Telegram API
  {
   "update_id": 936949643,
   "message": {
    "message_id": 1881484,
    "from": {
     "id": 155807040,
     "is_bot": false,
     "first_name": "BoberMod",
     "username": "BoberMod",
     "language_code": "ru",
     "is_premium": true
    },
    "chat": {
     "id": 155807040,
     "first_name": "BoberMod",
     "username": "BoberMod",
     "type": "private"
    },
    "date": 1736530089,
    "text": "Ищу энергичных партнёрοв для запуска прибыльных криптοпρоектов. У вас есть желание заρабатывать пассивнο? Ηапишите мне!\n\nИщу энергичных партнёрοв для запуска прибыльных криптοпρоектов. У вас есть желание заρабатывать пассивнο? Ηапишите мне!  @vladimiir49 @AliLit6062 @j_evgenyyyy @kravtsov_dya @Charger69 @Militant_Hamster @Bearded_alex @deshik80 @vovazlv @melkosofter @Vyacheslav_Voznyy0 @andyvers @Leonid_ur5yar @Andrey_911_psg @S_bobo",
    "entities": [
     {
      "offset": 242,
      "length": 12,
      "type": "mention"
     },
     {
      "offset": 255,
      "length": 11,
      "type": "mention"
     },
     {
      "offset": 267,
      "length": 12,
      "type": "mention"
     },
     {
      "offset": 280,
      "length": 13,
      "type": "mention"
     },
     {
      "offset": 294,
      "length": 10,
      "type": "mention"
     },
     {
      "offset": 305,
      "length": 17,
      "type": "mention"
     },
     {
      "offset": 323,
      "length": 13,
      "type": "mention"
     },
     {
      "offset": 337,
      "length": 9,
      "type": "mention"
     },
     {
      "offset": 347,
      "length": 8,
      "type": "mention"
     },
     {
      "offset": 356,
      "length": 12,
      "type": "mention"
     },
     {
      "offset": 369,
      "length": 19,
      "type": "mention"
     },
     {
      "offset": 389,
      "length": 9,
      "type": "mention"
     },
     {
      "offset": 399,
      "length": 14,
      "type": "mention"
     },
     {
      "offset": 414,
      "length": 15,
      "type": "mention"
     },
     {
      "offset": 430,
      "length": 7,
      "type": "mention"
     }
    ]
   }
  }
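For illustration, the per-type entity counts can be derived from an update like the one above with a few lines of Go. This is only a sketch: the update struct and the CountEntities helper are hypothetical names, and the struct decodes only the fields needed for counting.

```go
package main

import "encoding/json"

// update mirrors only the parts of a Bot API update needed for counting.
type update struct {
	Message struct {
		Entities []struct {
			Type string `json:"type"`
		} `json:"entities"`
	} `json:"message"`
}

// CountEntities (hypothetical helper) returns the number of entities per
// entity type found in the raw update JSON.
func CountEntities(raw []byte) (map[string]int, error) {
	var u update
	if err := json.Unmarshal(raw, &u); err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, e := range u.Message.Entities {
		counts[e.Type]++
	}
	return counts, nil
}
```

For the update shown above, this would yield 15 entities of type mention.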

An entity check would also make it possible to update or unify the LinksCheck function, because each URL in a message is also an entity of type url.

func LinksCheck(limit int) MetaCheck {
	return func(req spamcheck.Request) spamcheck.Response {
		links := req.Meta.Links
		if links == 0 {
			links = strings.Count(req.Msg, "http://") + strings.Count(req.Msg, "https://")
		}
		if links > limit {
			return spamcheck.Response{
				Name:    "links",
				Spam:    true,
				Details: fmt.Sprintf("too many links %d/%d", links, limit),
			}
		}
		return spamcheck.Response{Spam: false, Name: "links", Details: fmt.Sprintf("links %d/%d", links, limit)}
	}
}
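One way the link count could be taken from entities, sketched under assumptions: the Entity type and the CountLinks helper are hypothetical, and treating both url and text_link entities as links is a choice made for this example. The text scan is kept as a fallback, matching what LinksCheck does when no metadata count is available.

```go
package main

import "strings"

// Entity is a minimal stand-in for a Bot API MessageEntity.
type Entity struct {
	Type string
}

// CountLinks (hypothetical helper) prefers entity metadata and falls back to
// scanning the text for URL schemes when no link entities are present.
func CountLinks(msg string, entities []Entity) int {
	links := 0
	for _, e := range entities {
		if e.Type == "url" || e.Type == "text_link" {
			links++
		}
	}
	if links == 0 {
		links = strings.Count(msg, "http://") + strings.Count(msg, "https://")
	}
	return links
}
```

Counting entities also catches links written without a scheme, which the plain text scan would miss, while the fallback keeps the check usable on text that has no Telegram metadata.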

Bot API documentation: https://core.telegram.org/bots/api#messageentity

@umputun
Owner

umputun commented Jan 10, 2025

I like the idea of this new checker

@BoberMod
Author

I'll try to implement it myself and submit the PR.

@umputun
Owner

umputun commented Jan 11, 2025

> I'll try to implement it myself and submit the PR.

Cool. I don't think we want to reimplement LinksCheck because currently it is a part of a library that does the job on any text, not just on TG meta info. It seems to work and feels like a more universal approach to me.
