Good evening everyone, I hope you are doing well.
This post is to notify you of the major changes I made to the code so that copy-pasted and very similar comments are identified.
Before going into the code, I would like to mention the ways I am trying to raise income for the Engagement Project.
All the tokens earned go to the Engagement Project itself.
SPORTS
Currently @amr008.sports earns through -
- Curation
- Authoring the "SPORTS TODAY" discussion thread daily
From today -
- A daily actifit post
CTP
Currently @amr008.ctp earns through -
- Curation
- Authoring the "Curation Report of Engagement Project" daily
STEM and LEO
- Curation alone
In the near future -
- All the Hive earned from @amr008.sports and @amr008.ctp will be used to buy LEO and STEM to give a boost to the Engagement Project.
If you have any suggestions about what more can be added, please let me know in the comments.
Python code update - using FuzzyWuzzy - to prevent spammers from being in the top 25.
The main intention of the Engagement Project is to reward those who are engaging with others and putting effort into engagement. It would be unfair if I let people who copy-paste a number of comments surpass the efforts of genuine users.
FuzzyWuzzy
FuzzyWuzzy is a library that lets us find the similarity between two strings (in layman's terms, two sentences). I have used it in my code to find similar comments -
How does it work? Example
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

s1 = "Thanks"
s2 = "Thanks buddy"
s3 = "Thanks a lot buddy"

# Compare s1 against s2 and s3 using the token_set_ratio scorer.
compare = process.extract(s1, [s2, s3], scorer=fuzz.token_set_ratio)
print(compare)
Now I am comparing the first string s1 with s2 and s3. The output is -
O/P = [('Thanks buddy', 100), ('Thanks a lot buddy', 100)]
This means s1 is actually contained in s2 and s3, so there is a 100% similarity between s1 and each of s2 and s3.
Example 2 -
s1 = "Thanks a lot"
s2 = "Thanks buddy"
s3 = "Thanks a lot buddy"
compare = process.extract(s1, [s2, s3], scorer=fuzz.token_set_ratio)
print(compare)
O/P = [('Thanks a lot buddy', 100), ('Thanks buddy', 67)]
This means s1, which is "Thanks a lot", is 100% similar to "Thanks a lot buddy" and 67% similar to "Thanks buddy".
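For comparison, the scorer can also be called directly without process.extract. This small snippet (just an extra illustration, not part of my ranking code) shows why token_set_ratio treats a shorter comment that is fully contained in a longer one as a 100% match, while the plain ratio scorer gives a lower score:

from fuzzywuzzy import fuzz

s1 = "Thanks a lot"
s3 = "Thanks a lot buddy"

# token_set_ratio ignores the extra tokens, so the contained string scores 100.
print(fuzz.token_set_ratio(s1, s3))
# The plain ratio compares the whole strings character by character,
# so the extra word "buddy" lowers the score.
print(fuzz.ratio(s1, s3))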
Let's take some real examples now -
@thatgermandude came forward and told me that he runs a lottery and replies to various authors with very similar comments, which gives him an unfair advantage over others.
His latest comments -
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Are you sure you have the right LEO Token? I am talking about the LeoFinance Token on hive.
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
So when I applied the string comparison and ran the test -
Observe the quality points, which determine the rank - here it is 0.0239.
Without applying string comparison
He was actually in the top 25 today with a comment quality of 2.2218 (because his comment length, number of comments, and number of people talked to are all high).
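To give a rough idea of how such a penalty can bring a score down, here is a purely hypothetical illustration - the actual scoring formula and weights of the Engagement Project are not shown in this post, so the function name and the scaling rule below are assumptions for illustration only:

# Hypothetical illustration only - NOT the actual Engagement Project formula.
def penalized_quality(base_quality, total_comments, similar_comments):
    # base_quality is assumed to already combine factors such as comment length,
    # number of comments and number of people talked to.
    if total_comments == 0:
        return 0.0
    unique_fraction = 1 - (similar_comments / total_comments)
    # Scale the base score by the share of comments that are NOT flagged as similar.
    return base_quality * unique_fraction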
I would like to apologize to @thatgermandude for using his example here; I am only using it because he was honest enough to voluntarily come forward and tell me about this.
Prevents copy-pasted and highly similar comments from getting upvotes.
If you look at the above example, @thatgermandude was at 18th rank, but after I implemented the comment comparison he is now at 163rd rank.
So what if someone decides to take advantage in the future by using a bot? It will be very difficult to rank higher without manually answering comments.
What do I consider as similar comments?
- I compare your 1st comment with all the other comments - if any of them returns a 60% or higher match, the 1st comment is counted as a similar comment and the similar-comment count goes up.
- Then I move to the 2nd comment and compare it with the 3rd comment onwards, repeating the same process until all the comments are done.
I arrived at the 60% threshold by manually checking a lot of different strings and taking samples from real users' comments.
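To make that logic concrete, here is a minimal sketch of the pairwise comparison. It is a simplified stand-in, not my full script (the function and variable names are just for this example), but it uses the same token_set_ratio scorer and the same 60% threshold described above:

from fuzzywuzzy import fuzz

def count_similar_comments(comments, threshold=60):
    # comments is the list of comment bodies for one user.
    similar_count = 0
    for i in range(len(comments)):
        for j in range(i + 1, len(comments)):
            # Compare comment i with every later comment.
            if fuzz.token_set_ratio(comments[i], comments[j]) >= threshold:
                # Comment i has at least one later comment that is 60% or more similar.
                similar_count += 1
                break
    return similar_count

comments = ["Thanks a lot", "Thanks buddy", "Great post, very informative"]
print(count_similar_comments(comments))  # prints 1 - the first two comments score 67, above the threshold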
Spam Alerts
I have also set up an alert in the code now, using this logic -
- If a user has made 50% or more comments that are very similar to each other, the code will show me their name.
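As a sketch of that alert rule (again building on the hypothetical count_similar_comments function from above, with names of my own choosing):

def spam_alert(username, comments, threshold=60):
    # Show the name to me if 50% or more of the user's comments are very similar.
    if not comments:
        return False
    similar = count_similar_comments(comments, threshold)
    if similar / len(comments) >= 0.5:
        print(f"Check {username}: {similar} of {len(comments)} comments look similar")
        return True
    return False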
This is how I found out -
The code told me that @erarium has 100% similar comments, and it was actually true -
You can check the latest comments here and see -
https://leofinance.io/@erarium/comments
Is this intentional? Absolutely not - it is a curation project just like mine. It is not their job to tell me they will post the same comments; it is my job to figure it out.
Ranking of @erarium before this code implementation - 11
Ranking of @erarium after this code implementation - 160
I wanted to do this for a very long time and I am very happy I got it working to some extent. This code will be used from tomorrow to rank and curate. This doesn't mean all the other factors don't matter anymore - of course they do. Everything has its own weightage. It has just become harder for non-quality comments to be at the top.
@abh12345 and @crokkon, what do you think about this?
Regards,
MR.
Posted with STEMGeeks