Autospam and Naive Bayes: The Grandfather of Spam Filters Still Making Waves

In the ever-evolving landscape of digital communication, where spam seems to adapt and find new ways to infiltrate our inboxes and social media feeds, it's fascinating to discover that one of the most enduring and effective spam filtering techniques traces its roots all the way back to the 1990s.

Meet the Naive Bayes classifier – a true legend in the realm of spam detection. While technology has advanced by leaps and bounds since its inception, this venerable algorithm continues to prove its worth as a stalwart guardian against the relentless tide of unwanted messages.

Join us on a journey as we delve into the timeless efficacy of Naive Bayes, unravel its inner workings, and explore how it still stands strong in the modern fight against spam on Pixelfed.

Picture of an information modal explaining why a post was unlisted

In a world where the fight against spam has grown increasingly complex, it's almost poetic that one of the oldest players in the game, the Naive Bayes classifier, remains an essential tool in the arsenal of spam detection. Built on Bayes' theorem, an 18th-century result in probability theory that was later adapted for machine learning, Naive Bayes gained prominence in the early days of the internet as a solution to the rising tide of unwanted emails flooding inboxes.

The concept behind Naive Bayes is elegantly simple: it calculates the probability that a given message is spam or not spam based on the words it contains. What makes it "naive" is its independence assumption: once the class (spam or not spam) is known, it treats each word in a message as if its occurrence were unrelated to the others. That is an oversimplification, since words like "free" and "offer" often travel together, but it works surprisingly well in practice. During a training phase, Naive Bayes counts how often each word appears in known spam and known non-spam messages, and it uses those frequencies to build a model that can then classify new messages.
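Written out, the idea above is Bayes' theorem plus the naive independence assumption. For a caption or message with words w_1 through w_n, the classifier computes a score for each class and picks the higher one. The denominator is the same for both classes, so it can be dropped, and working with logarithms avoids multiplying many tiny probabilities together. The add-one (Laplace) smoothing shown for P(w | c), where V is the vocabulary seen during training, is a standard textbook choice assumed here for illustration; the post does not say which variant Pixelfed uses.

```latex
\begin{aligned}
P(\text{spam} \mid w_1,\dots,w_n)
  &= \frac{P(\text{spam})\, P(w_1,\dots,w_n \mid \text{spam})}{P(w_1,\dots,w_n)}
   \approx \frac{P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam})}{P(w_1,\dots,w_n)} \\[4pt]
\hat{c} &= \arg\max_{c \,\in\, \{\text{spam},\ \text{not spam}\}}
  \Big[ \log P(c) + \sum_{i=1}^{n} \log P(w_i \mid c) \Big] \\[4pt]
P(w \mid c) &= \frac{\operatorname{count}(w, c) + 1}{\sum_{v \in V} \operatorname{count}(v, c) + |V|}
\end{aligned}
```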

Picture of a Pixelfed notification for a post that was detected by the Autospam feature

While it might seem like a throwback to a simpler time, Naive Bayes has remarkable staying power thanks to its reliability and efficiency. In an era where machine learning models can become astonishingly intricate, the straightforward nature of Naive Bayes can be a breath of fresh air. It requires far fewer computational resources than its more complex counterparts, making it an attractive choice for applications where speed and simplicity are key.
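To make the efficiency point concrete, here is a minimal multinomial Naive Bayes sketch in Python. Training is nothing more than counting words per class, and classifying a message is a single pass over its words, summing log-probabilities. It assumes add-one smoothing and a plain whitespace tokenizer; the class, method, and label names are hypothetical, and this is not Pixelfed's implementation (Pixelfed itself is written in PHP).

```python
import math
from collections import Counter

LABELS = ("spam", "not_spam")


class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = Counter()  # training messages seen per class
        self.vocab = set()  # every word seen in training, across both classes

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Deliberately simple: lowercase and split on whitespace.
        return text.lower().split()

    def train(self, text: str, label: str) -> None:
        # Training is only counting, so its cost grows linearly with message length.
        words = self.tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_score(self, text: str, label: str) -> float:
        # Assumes both classes have at least one training message.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)  # log prior
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for word in self.tokenize(text):
            # Add-one smoothing keeps unseen words from zeroing out the score.
            score += math.log((self.word_counts[label][word] + 1) / denom)
        return score

    def classify(self, text: str) -> str:
        return max(LABELS, key=lambda label: self.log_score(text, label))


nb = NaiveBayes()
nb.train("buy cheap followers now", "spam")
nb.train("cheap likes buy now", "spam")
nb.train("sunset over the lake", "not_spam")
nb.train("my cat asleep on the couch", "not_spam")
print(nb.classify("buy followers cheap"))  # spam
print(nb.classify("the cat on the couch"))  # not_spam
```

Because the model is just a table of counts, adding a newly labeled message later only increments a few counters; nothing has to be retrained from scratch.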

Even as the digital world has transformed over the years, with social media platforms like Pixelfed becoming hubs for visual sharing and communication, the challenge of spam remains as relevant as ever. Pixelfed's ingenious implementation of the Naive Bayes classifier to combat spam is a testament to the algorithm's versatility. By analyzing the captions accompanying images, Pixelfed's spam filter can swiftly determine whether a post contains genuine content or is simply trying to clutter your feed with unwanted promotions or irrelevant information.
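Before a caption can be scored, it has to be split into tokens. The sketch below shows one plausible preprocessing step, assuming the filter works on lowercase words, keeps hashtags and mentions as their own tokens, and collapses every link into a single shared token. The function name and regular expression are illustrative choices, not Pixelfed's actual tokenizer.

```python
import re

# Hypothetical token pattern: a URL, or a word with an optional # or @ prefix.
TOKEN_RE = re.compile(r"https?://\S+|[#@]?\w+")


def tokenize_caption(caption: str) -> list[str]:
    """Split an image caption into lowercase tokens for a spam classifier."""
    tokens = []
    for match in TOKEN_RE.findall(caption.lower()):
        if match.startswith(("http://", "https://")):
            tokens.append("__url__")  # every link becomes the same token
        else:
            tokens.append(match)
    return tokens


print(tokenize_caption("Get 1000 FOLLOWERS now! https://spam.example #followers @you"))
# ['get', '1000', 'followers', 'now', '__url__', '#followers', '@you']
```

Collapsing links into one token means the classifier can learn that "contains a link" is a spam signal even when every spam post uses a different URL.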

In a landscape where cutting-edge algorithms and artificial intelligence solutions often grab the spotlight, it's important to remember the foundational techniques that laid the groundwork for today's sophisticated technologies. The Naive Bayes classifier is a true pioneer in the field of spam detection, proving that sometimes, the simplest solutions can be the most effective.

In conclusion, as we marvel at the rapid progress of technology, it's refreshing to acknowledge the lasting impact of the Naive Bayes classifier in the realm of spam filtering. Its ability to adapt and stay relevant over the decades is a testament to its intrinsic value. Whether it's filtering out unwanted emails from the 90s or tackling modern challenges like image captions on social media platforms, Naive Bayes continues to remind us that the classics never truly go out of style. So, the next time you hit the 'mark as spam' button on Pixelfed, take a moment to appreciate the enduring legacy of an algorithm that's been defending our digital spaces for generations.

How to enable Autospam + Advanced Autospam

We made it super easy to get started and use.

Make sure you are running v0.11.8 or later
Navigate to the Admin Dashboard
Navigate to the Settings page
Check the Spam detection box, then press Save (stop here if you only want classic detection, though you probably want Advanced)
Navigate to the Autospam page
Press the Enable Advanced Detection button
Press the Train Autospam tab on the Autospam page
Press the Train Spam button, then press the Train Non-Spam button

Congrats, you've successfully enabled Advanced Autospam detection!

How to configure Autospam email notifications

You can easily configure an email address to receive Autospam detection notifications, provided mail delivery is already configured correctly for your instance.

Make sure you are running v0.11.8 or later
Open your .env file in an editor and add the following lines:

INSTANCE_REPORTS_EMAIL_ADDRESSES=''
INSTANCE_REPORTS_EMAIL_ENABLED=true
INSTANCE_REPORTS_EMAIL_AUTOSPAM=true

Replace the empty value of INSTANCE_REPORTS_EMAIL_ADDRESSES with your email address, as in the example below:

INSTANCE_REPORTS_EMAIL_ADDRESSES='admin@example.org'

Save the .env file
Then run the following CLI command to update your config:

php artisan config:cache && php artisan cache:clear

Congrats, you successfully set up email notifications for Autospam!
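If notifications don't arrive, a quick sanity check of the .env values can help. The Python sketch below reads the file and confirms that the three keys from the steps above are present, that the two flags are set to true, and that an address is filled in. It assumes simple KEY=value lines with optional quotes; it is a hypothetical helper, not part of Pixelfed, and it does not test whether mail delivery itself works.

```python
from pathlib import Path

REQUIRED_FLAGS = ("INSTANCE_REPORTS_EMAIL_ENABLED", "INSTANCE_REPORTS_EMAIL_AUTOSPAM")
ADDRESS_KEY = "INSTANCE_REPORTS_EMAIL_ADDRESSES"


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")  # drop surrounding quotes
    return values


def check_autospam_email(values: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the settings look right."""
    problems = []
    for key in REQUIRED_FLAGS:
        if values.get(key, "").lower() != "true":
            problems.append(f"{key} should be true")
    if not values.get(ADDRESS_KEY):
        problems.append(f"{ADDRESS_KEY} is empty or missing")
    return problems


if __name__ == "__main__":
    env_file = Path(".env")  # run this from your Pixelfed install directory
    if env_file.is_file():
        issues = check_autospam_email(parse_env(env_file.read_text()))
        print("\n".join(issues) or "Autospam email settings look OK")
    else:
        print("No .env file found in the current directory")
```

Remember that Laravel reads the cached config, so after fixing anything the script reports you still need to rerun php artisan config:cache && php artisan cache:clear.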

How to add custom tokens

The "custom token" feature allows admins to tailor spam detection on their Pixelfed instance.

Admins can define specific words or phrases as "spam" or "not spam" tokens. These tokens act as explicit indicators that help the system classify content according to the instance's own standards.

This feature lets admins take an active role in fine-tuning their spam filter, tailoring it to their community's needs and improving the accuracy of content classification on the platform.

Make sure you are running v0.11.8 or later
Navigate to the Admin Dashboard
Navigate to the Autospam page
Press the Manage Tokens tab on the Autospam page
Press the Create New Token button
Define the token in the token input
Set an optional weight (defines precedence; safe to leave at the default value)
Set the category, either spam or not spam
Add an optional note for future reference (never shown to users)
Make sure the active checkbox is checked
Press Save

Congrats, you successfully trained Autospam with a custom token!
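The post doesn't spell out how weights and categories are combined, so the Python sketch below only illustrates one way weighted tokens could take precedence over a trained classifier: active tokens are checked from highest to lowest weight, and the first one found in a caption decides the category; if none match, the caption falls through to the regular classifier. The CustomToken fields mirror the form above (token, weight, category, note, active), but the matching rule, function names, and stub classifier are assumptions, not Pixelfed's code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CustomToken:
    # Fields mirror the Create New Token form; how they are applied below is hypothetical.
    token: str
    category: str  # "spam" or "not_spam"
    weight: int = 1  # assumed: higher weight takes precedence
    note: str = ""  # admin-only reference text, not used for matching
    active: bool = True  # inactive tokens are skipped


def classify_with_tokens(
    caption: str,
    tokens: list[CustomToken],
    fallback: Callable[[str], str],
) -> str:
    """Return the category of the highest-weight active token found in the caption,
    or defer to the fallback classifier when no custom token matches."""
    text = caption.lower()
    active = [t for t in tokens if t.active]
    for t in sorted(active, key=lambda t: t.weight, reverse=True):
        # Simple case-insensitive substring match, so phrases work too
        # (note that "free" would also match inside "freedom").
        if t.token.lower() in text:
            return t.category
    return fallback(caption)


def classifier_stub(caption: str) -> str:
    # Stand-in for the trained Naive Bayes model.
    return "not_spam"


tokens = [
    CustomToken("crypto giveaway", "spam", weight=10),
    CustomToken("photography club", "not_spam", weight=5),
    CustomToken("giveaway", "spam", weight=1, active=False),
]
print(classify_with_tokens("Photography club crypto giveaway!", tokens, classifier_stub))  # spam
print(classify_with_tokens("Weekly photography club meetup", tokens, classifier_stub))  # not_spam
print(classify_with_tokens("Big giveaway today", tokens, classifier_stub))  # not_spam
```

In the first caption both tokens match, and the weight-10 spam token wins; in the last one the only matching token is inactive, so the decision falls back to the classifier.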

How to import/export training data

The Autospam Import/Export feature in Pixelfed enables admins to transfer their training data, which helps improve the accuracy of the spam detection system. Admins can export their training data to back it up or share it with others.

However, be careful about who you share this data with. Because the training data reflects which words the filter associates with spam, spammers who obtain it could use it to craft posts that slip past the filter.

By safeguarding the training data and being mindful of who it's shared with, admins can help preserve the spam filter's ability to accurately distinguish genuine content from spam.

Make sure you are running v0.11.8 or later
Navigate to the Admin Dashboard
Navigate to the Autospam page
Press the Import/Export tab on the Autospam page
Press either the Upload Import or Download Export button
If you are importing training data, follow the on-screen instructions
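The export format isn't described here, so the Python sketch below assumes a hypothetical JSON layout to show what "training data" boils down to for a Naive Bayes filter: per-class word counts and message counts. Exporting and re-importing them reproduces the trained model exactly, which is also why the file is sensitive: anyone holding it can read off which words push a caption toward spam. The function names, the "version" field, and the layout are assumptions, not Pixelfed's export format.

```python
import json
from collections import Counter


def export_training_data(word_counts: dict[str, Counter], doc_counts: Counter) -> str:
    """Serialize per-class word counts and message counts to a JSON string."""
    payload = {
        "version": 1,  # hypothetical format version
        "doc_counts": dict(doc_counts),
        "word_counts": {label: dict(c) for label, c in word_counts.items()},
    }
    return json.dumps(payload, sort_keys=True)


def import_training_data(raw: str) -> tuple[dict[str, Counter], Counter]:
    """Parse an exported JSON string back into Counter objects."""
    payload = json.loads(raw)
    if payload.get("version") != 1:
        raise ValueError("unsupported training data version")
    word_counts = {label: Counter(c) for label, c in payload["word_counts"].items()}
    return word_counts, Counter(payload["doc_counts"])


def top_spam_words(word_counts: dict[str, Counter], n: int = 3) -> list[str]:
    """Why exports are sensitive: the most frequent spam words are trivial to list."""
    return [word for word, _ in word_counts["spam"].most_common(n)]


words = {
    "spam": Counter({"cheap": 5, "followers": 4, "buy": 2}),
    "not_spam": Counter({"sunset": 3, "cat": 2}),
}
docs = Counter({"spam": 3, "not_spam": 4})
restored_words, restored_docs = import_training_data(export_training_data(words, docs))
print(restored_words == words and restored_docs == docs)  # True
print(top_spam_words(restored_words))  # ['cheap', 'followers', 'buy']
```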