Good evening everyone, I hope you are doing well.
This post is to notify you of the major changes I made to the code so that copy-pasted and very similar comments are identified.
Before going into the code, I would like to mention the ways I am trying to raise income for the Engagement Project.
All the tokens earned go to the Engagement Project itself.
SPORTS
Currently @amr008.sports earns through -
- Curation
- Authoring the "SPORTS TODAY" discussion thread daily
From today -
- A daily actifit post
CTP
Currently @amr008.ctp earns through -
- Curation
- Authoring the "Curation Report of Engagement Project" daily
STEM and LEO
- Curation alone
In the near future -
- All the Hive earned from @amr008.sports and @amr008.ctp will be used to buy LEO and STEM to give a boost to the Engagement Project.
If you have any suggestions about what more can be added, please let me know in the comments.
Python code update - using FuzzyWuzzy - to prevent spammers from being in the top 25.
The main intention of the Engagement Project is to reward those who are engaging with others and putting effort into engagement. It would be unfair if I let people who copy-paste a number of comments surpass the efforts of genuine users.
FuzzyWuzzy
FuzzyWuzzy is a library that lets us find the similarity between two strings (in layman's terms, two sentences). I have used it in my code to find similar comments -
How does it work? Example
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

s1 = "Thanks"
s2 = "Thanks buddy"
s3 = "Thanks a lot buddy"

# Compare s1 against s2 and s3 using the token_set_ratio scorer.
compare = process.extract(s1, [s2, s3], scorer=fuzz.token_set_ratio)
print(compare)
Now I am comparing the first string s1 with s2 and s3. The output is -
O/P = [('Thanks buddy', 100), ('Thanks a lot buddy', 100)]
This means s1 is actually contained in s2 and s3, so there is a 100% similarity between s1 and each of s2 and s3.
Example 2 -
s1 = "Thanks a lot"
s2 = "Thanks buddy"
s3 = "Thanks a lot buddy"
compare = process.extract(s1, [s2, s3], scorer=fuzz.token_set_ratio)
print(compare)
O/P = [('Thanks a lot buddy', 100), ('Thanks buddy', 67)]
This means s1, which is "Thanks a lot", is 100% similar to "Thanks a lot buddy" and 67% similar to "Thanks buddy".
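For comparison, the scorer can also be called directly without process.extract. This small snippet (just an extra illustration, not part of my ranking code) shows why token_set_ratio treats a shorter comment that is fully contained in a longer one as a 100% match, while the plain ratio scorer gives a lower score:

from fuzzywuzzy import fuzz

s1 = "Thanks a lot"
s3 = "Thanks a lot buddy"

# token_set_ratio ignores the extra tokens, so the contained string scores 100.
print(fuzz.token_set_ratio(s1, s3))
# The plain ratio compares the whole strings character by character,
# so the extra word "buddy" lowers the score.
print(fuzz.ratio(s1, s3))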
Let's take some real examples now -
@thatgermandude came forward and told me that he runs a lottery and replies to various authors with very similar comments, which gives him an unfair advantage over others.
His latest comments -
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Are you sure you have the right LEO Token? I am talking about the LeoFinance Token on hive.
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
Thank you for participating! Today you had no luck...
Maybe you will do better in my next Not-a-Lottery
So when I applied the string comparison and ran the test -
Observe the quality points, which determine the rank - here it is 0.0239.
Without applying string comparison
He was actually in the top 25 today with a comment quality of 2.2218 (because his comment length, number of comments, and number of people talked to are all high).
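To give a rough idea of how such a penalty can bring a score down, here is a purely hypothetical illustration - the actual scoring formula and weights of the Engagement Project are not shown in this post, so the function name and the scaling rule below are assumptions for illustration only:

# Hypothetical illustration only - NOT the actual Engagement Project formula.
def penalized_quality(base_quality, total_comments, similar_comments):
    # base_quality is assumed to already combine factors such as comment length,
    # number of comments and number of people talked to.
    if total_comments == 0:
        return 0.0
    unique_fraction = 1 - (similar_comments / total_comments)
    # Scale the base score by the share of comments that are NOT flagged as similar.
    return base_quality * unique_fraction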
I would like to apologize to @thatgermandude for using his example here; I am only using it because he was honest enough to voluntarily come forward and tell me about this.
Prevents copy-pasted and highly similar comments from getting upvotes.
If you look at the above example, @thatgermandude was at 18th rank, but after I implemented the comment comparison he is now at 163rd rank.
So what if someone decides to take advantage in the future by using a bot? It will be very difficult to rank higher without manually answering comments.
What do I consider as similar comments?
- I compare your 1st comment with all the other comments - if any of them returns a 60% or higher match, the 1st comment is counted as a similar comment and the similar-comment count goes up.
- Then I move to the 2nd comment and compare it with the 3rd comment onwards, repeating the same process until all the comments are done.
I arrived at the 60% threshold by manually checking a lot of different strings and taking samples from real users' comments.
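To make that logic concrete, here is a minimal sketch of the pairwise comparison. It is a simplified stand-in, not my full script (the function and variable names are just for this example), but it uses the same token_set_ratio scorer and the same 60% threshold described above:

from fuzzywuzzy import fuzz

def count_similar_comments(comments, threshold=60):
    # comments is the list of comment bodies for one user.
    similar_count = 0
    for i in range(len(comments)):
        for j in range(i + 1, len(comments)):
            # Compare comment i with every later comment.
            if fuzz.token_set_ratio(comments[i], comments[j]) >= threshold:
                # Comment i has at least one later comment that is 60% or more similar.
                similar_count += 1
                break
    return similar_count

comments = ["Thanks a lot", "Thanks buddy", "Great post, very informative"]
print(count_similar_comments(comments))  # prints 1 - the first two comments score 67, above the threshold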
Spam Alerts
I have also set up an alert in the code now, using this logic -
- If a user has made 50% or more comments that are very similar to each other, the code will show me their name.
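As a sketch of that alert rule (again building on the hypothetical count_similar_comments function from above, with names of my own choosing):

def spam_alert(username, comments, threshold=60):
    # Show the name to me if 50% or more of the user's comments are very similar.
    if not comments:
        return False
    similar = count_similar_comments(comments, threshold)
    if similar / len(comments) >= 0.5:
        print(f"Check {username}: {similar} of {len(comments)} comments look similar")
        return True
    return False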
This is how I found out -
The code told me that @erarium has 100% similar comments, and it was actually true -
You can check the latest comments here and see -
https://leofinance.io/@erarium/comments
Is this intentional? Absolutely not - it is a curation project just like mine. It is not their job to tell me they will post the same comments; it is my job to figure it out.
Ranking of @erarium before this code implementation - 11
Ranking of @erarium after this code implementation - 160
I wanted to do this for a very long time and I am very happy I got it working to some extent. This code will be used from tomorrow to rank and curate. This doesn't mean all the other factors don't matter anymore - of course they do. Everything has its own weightage. It has just become harder for non-quality comments to be at the top.
@abh12345 and @crokkon, what do you think about this?
Regards,
MR.
Posted with STEMGeeks