Limit the number of allowed entities/mentions in a message #222

BoberMod · 2025-01-10T17:41:35Z

I ran into a new type of spam: the first message contains a mass mention of chat members and is edited into an ad immediately afterwards. Telegram keeps the mention notifications even after the message is edited, so users open the chat from the notification and see the ad.

The classifier doesn't detect such messages as spam because they contain a lot of random text (usernames), even when the original message also contains spam text.

Note: the same message without mentions has been added to the spam samples and is detected at 99%.

Screenshot of how it looks

Each mention in the message is counted as a separate entity of type mention. I'd like to request a feature that limits the number of entities by type, or at least restricts mention entities specifically.

I think it would be useful to be able to block or limit any entity type, because spam also contains Telegram cashtags ($USD) and hashtags.
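
A minimal sketch of what a per-type limit could look like. The names Entity, EntityLimits and CheckEntities are hypothetical and not part of tg-spam; the entity type strings ("mention", "hashtag", "cashtag") are the ones the Bot API uses.

```go
package main

import "fmt"

// Entity holds the fields of a Telegram MessageEntity that matter here.
// Hypothetical type for illustration, not taken from tg-spam.
type Entity struct {
	Type   string
	Offset int
	Length int
}

// EntityLimits maps an entity type to the maximum number allowed in one
// message. Types missing from the map are not limited. Hypothetical type.
type EntityLimits map[string]int

// CheckEntities walks the entities in message order and reports spam as soon
// as any limited type exceeds its limit.
func CheckEntities(entities []Entity, limits EntityLimits) (bool, string) {
	counts := make(map[string]int)
	for _, e := range entities {
		counts[e.Type]++
		limit, limited := limits[e.Type]
		if limited && counts[e.Type] > limit {
			return true, fmt.Sprintf("more than %d %s entities", limit, e.Type)
		}
	}
	return false, ""
}
```

With limits such as EntityLimits{"mention": 5, "cashtag": 0}, a message with six mentions or a single cashtag would be flagged, while URLs and other types would pass untouched.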

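Message JSON from Telegram API (the text is in Russian: "Looking for energetic partners to launch profitable crypto projects. Do you want to earn passively? Message me!", repeated twice and followed by 15 mentions)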

  {
   "update_id": 936949643,
   "message": {
    "message_id": 1881484,
    "from": {
     "id": 155807040,
     "is_bot": false,
     "first_name": "BoberMod",
     "username": "BoberMod",
     "language_code": "ru",
     "is_premium": true
    },
    "chat": {
     "id": 155807040,
     "first_name": "BoberMod",
     "username": "BoberMod",
     "type": "private"
    },
    "date": 1736530089,
    "text": "Ищу энергичных партнёрοв для запуска прибыльных криптοпρоектов. У вас есть желание заρабатывать пассивнο? Ηапишите мне!\n\nИщу энергичных партнёрοв для запуска прибыльных криптοпρоектов. У вас есть желание заρабатывать пассивнο? Ηапишите мне!  @vladimiir49 @AliLit6062 @j_evgenyyyy @kravtsov_dya @Charger69 @Militant_Hamster @Bearded_alex @deshik80 @vovazlv @melkosofter @Vyacheslav_Voznyy0 @andyvers @Leonid_ur5yar @Andrey_911_psg @S_bobo",
    "entities": [
     {
      "offset": 242,
      "length": 12,
      "type": "mention"
     },
     {
      "offset": 255,
      "length": 11,
      "type": "mention"
     },
     {
      "offset": 267,
      "length": 12,
      "type": "mention"
     },
     {
      "offset": 280,
      "length": 13,
      "type": "mention"
     },
     {
      "offset": 294,
      "length": 10,
      "type": "mention"
     },
     {
      "offset": 305,
      "length": 17,
      "type": "mention"
     },
     {
      "offset": 323,
      "length": 13,
      "type": "mention"
     },
     {
      "offset": 337,
      "length": 9,
      "type": "mention"
     },
     {
      "offset": 347,
      "length": 8,
      "type": "mention"
     },
     {
      "offset": 356,
      "length": 12,
      "type": "mention"
     },
     {
      "offset": 369,
      "length": 19,
      "type": "mention"
     },
     {
      "offset": 389,
      "length": 9,
      "type": "mention"
     },
     {
      "offset": 399,
      "length": 14,
      "type": "mention"
     },
     {
      "offset": 414,
      "length": 15,
      "type": "mention"
     },
     {
      "offset": 430,
      "length": 7,
      "type": "mention"
     }
    ]
   }
  }
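
To illustrate how the entities in such an update could be counted, here is a rough sketch that decodes only the fields it needs with encoding/json and tallies entities by type. The struct and the function name CountEntityTypes are made up for this example.

```go
package main

import "encoding/json"

// update mirrors only the parts of a Telegram update used below.
// Hypothetical type for illustration.
type update struct {
	Message struct {
		Text     string `json:"text"`
		Entities []struct {
			Type   string `json:"type"`
			Offset int    `json:"offset"`
			Length int    `json:"length"`
		} `json:"entities"`
	} `json:"message"`
}

// CountEntityTypes decodes a raw update and returns the number of entities
// per type, e.g. {"mention": 2, "url": 1}.
func CountEntityTypes(raw []byte) (map[string]int, error) {
	var u update
	if err := json.Unmarshal(raw, &u); err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, e := range u.Message.Entities {
		counts[e.Type]++
	}
	return counts, nil
}
```

For the update above, the only key in the result would be "mention", with a count of 15.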

Checking entities would also make it possible to update or unify the LinksCheck function, because each URL in a message is also an entity of type url.

tg-spam/lib/tgspam/metachecks.go

Lines 17 to 32 in 60b5c3b

    func LinksCheck(limit int) MetaCheck {
    	return func(req spamcheck.Request) spamcheck.Response {
    		links := req.Meta.Links
    		if links == 0 {
    			links = strings.Count(req.Msg, "http://") + strings.Count(req.Msg, "https://")
    		}
    		if links > limit {
    			return spamcheck.Response{
    				Name:    "links",
    				Spam:    true,
    				Details: fmt.Sprintf("too many links %d/%d", links, limit),
    			}
    		}
    		return spamcheck.Response{Spam: false, Name: "links", Details: fmt.Sprintf("links %d/%d", links, limit)}
    	}
    }

Bot API documentation: https://core.telegram.org/bots/api#messageentity
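
A mentions check could follow the same shape as LinksCheck. The sketch below is self-contained, so Request, Response, MetaData and MetaCheck are local stand-ins for the spamcheck types, and the Mentions field and the MentionsCheck name are assumptions; nothing like them is confirmed to exist in tg-spam.

```go
package main

import "fmt"

// MetaData, Request, Response and MetaCheck are local stand-ins for the
// types LinksCheck uses. The Mentions field is hypothetical.
type MetaData struct {
	Links    int
	Mentions int // number of "mention" entities reported by Telegram
}

type Request struct {
	Msg  string
	Meta MetaData
}

type Response struct {
	Name    string
	Spam    bool
	Details string
}

type MetaCheck func(req Request) Response

// MentionsCheck flags a message as spam when it carries more mentions than
// the limit, mirroring the structure of LinksCheck.
func MentionsCheck(limit int) MetaCheck {
	return func(req Request) Response {
		mentions := req.Meta.Mentions
		if mentions > limit {
			return Response{
				Name:    "mentions",
				Spam:    true,
				Details: fmt.Sprintf("too many mentions %d/%d", mentions, limit),
			}
		}
		return Response{Spam: false, Name: "mentions", Details: fmt.Sprintf("mentions %d/%d", mentions, limit)}
	}
}
```

With a limit of 5, the 15-mention message above would come back with Spam set to true and the details "too many mentions 15/5".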


umputun · 2025-01-10T20:57:02Z

I like the idea of this new checker

BoberMod · 2025-01-11T01:18:44Z

I'll try to implement it myself and submit the PR.

umputun · 2025-01-11T07:41:07Z

> I'll try to implement it myself and submit the PR.

Cool. I don't think we want to reimplement LinksCheck because currently it is a part of a library that does the job on any text, not just on TG meta info. It seems to work and feels like a more universal approach to me.
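
In the same text-only spirit, mentions could also be approximated without Telegram meta info. This is only a rough sketch: CountMentions is a hypothetical helper, and the pattern assumes usernames are 5 to 32 characters of Latin letters, digits and underscores, which may not cover every case.

```go
package main

import "regexp"

// mentionRe matches an "@username" that starts the text or follows a
// character that cannot be part of a username, so "user@example.com" is not
// counted. The 5..32 length range is an assumption about Telegram usernames.
var mentionRe = regexp.MustCompile(`(?:^|[^A-Za-z0-9_])@[A-Za-z0-9_]{5,32}`)

// CountMentions returns the number of username-like mentions in plain text.
func CountMentions(text string) int {
	return len(mentionRe.FindAllString(text, -1))
}
```

Unlike entity counts, this works on any text, but it cannot see mentions that Telegram encodes without an "@" (for example text mentions of users without a username).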
