
December 31, 2002

A Different Approach Against SPAM

Present-day solutions to stop spam work by analyzing headers and message text, or by classifying mail based on history. We propose a solution that exploits the weakness of computers at natural language processing in order to increase the algorithmic cost of automated mailing. The system can be introduced in stages while maintaining backward compatibility with existing mail protocols, and the implementation and modification requirements are minimal.





1. INTRODUCTION
Spam isn’t just ‘a problem’; it may well become the email killer if we cannot stop it. People spend valuable time sifting through large volumes of junk mail, and some have even stopped taking electronic mail seriously. Network congestion has increased because mail servers constantly shuttle unwanted email across the Internet. The root of the problem is that email is cheap. Spammers rely on the cheapness of harvesting email addresses by spidering web sites and mail archives. Present-day email filters impose processing costs on the receiver (and even on the filter designer) in analyzing and automatically discarding junk mail. Spammers, for a negligible price and effort, can circumvent them by shifting to a variation that evades the detection mechanism. This asymmetric cost means that the general public always loses.

The solution we present can be implemented by upgrading existing SMTP servers or by adding a filter plug-in at the mail user agent (MUA). The amount of modification required is minimal, and existing SMTP servers can continue to be used for relaying. Only the receiver and sender need to install the feature. Section 2 introduces the idea used to filter spam and illustrates it with an example. Section 3 proposes the new commands and operations that a filtering server must implement. Section 4 discusses some complications that may arise with the new system.

2. CONCEPT
The filtering mechanism works on the principle that a keyword shared between the sender and the receiver of a mail must be present in the communication. If the shared keyword is absent, the mail is considered spam and discarded. The main problem lies in disseminating the shared keyword to the public without making it trivial for a computer program to scan the user profile and extract the word. The dissemination method should not, however, inconvenience legitimate email senders by asking them to perform a word hunt inside some abstruse mail filled with noise.

To increase the cost of spamming, we need an operation that is algorithmically expensive for a computer but trivial for a human. Natural language is ideally suited to the purpose. Since viable natural language processors have yet to be created and the cost of modifying the shared keyword is trivial, the investment in natural language as a barrier is safe.

Let a receiver email ID be associated with two fields,
AccessString – a string message
AccessCode – the shared keyword, another string message

* The AccessCode is the shared keyword. The receiver will discard emails which do not contain the shared keyword. However, the shared keyword cannot be directly obtained and is kept securely.

* The AccessString is a string specially constructed by the owner of the mail ID; it describes a method to obtain the AccessCode from the AccessString itself. The AccessString is public and can be queried by anyone in order to work out the AccessCode.

For example:
AccessString : The first word is the access code.
AccessCode : The

AccessString : Reverse the third word to get my code.
AccessCode : driht
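
The two fields can be pictured as a small per-address record. The sketch below is only an illustration in Python: the names Profile and accepts are made up for this example, and the exact, case-sensitive comparison is an assumption that the proposal does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Per-address record holding the two fields described above."""
    access_string: str  # public hint, readable by anyone
    access_code: str    # shared keyword, never handed out by the server


def accepts(profile: Profile, presented_code: str) -> bool:
    """True if the sender presented the shared keyword (assumed exact match)."""
    return presented_code == profile.access_code


bob = Profile("Reverse the third word to get my code.", "driht")

# What a human sender does by hand after reading the hint:
third_word = bob.access_string.split()[2]
guess = third_word[::-1]
print(accepts(bob, guess))
```

The step a person performs by reading the sentence is hard-coded here; the whole point of the scheme is that a program cannot derive that step from an arbitrary AccessString.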

We will now illustrate how Alice can send an email to Bob using this system. We shall call our filtering server SAGE henceforth.

- Alice queries Bob’s AccessString from Bob’s SAGE server and obtains the string, “It is the word in ALL capitals”

- Alice enters the AccessCode as “ALL” in her mail user agent.

- If Alice does not have a SAGE server for sending mail, her MUA can construct a special header containing a flag identifying that SAGE is being used, together with the AccessCode and a checksum, encrypted using the AccessCode, and pass the mail on to the SMTP server just like normal email. Alternatively, the MUA can notify the SAGE server that a SAGE message has to be constructed, which the server will generate automatically given the AccessCode.

- The message is now relayed using existing SMTP servers or SAGE servers until it reaches the destination.

- At Bob’s SAGE server, the message is received and the server performs a lookup to find out whether Bob has set an AccessCode. If one is set, the header is decrypted using the AccessCode, the checksum is verified, and the header is removed. The mail is then passed on to the LDA.

NOTE: If Alice is using a SAGE server, that server will immediately verify that the AccessCode is correct and discard the mail if it is not. This relieves network congestion.
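
The stamping and checking steps above can be sketched in Python. This is not the header format of the proposal: the header name X-SAGE-Check is hypothetical, and instead of a checksum encrypted with the AccessCode the sketch swaps in an HMAC-SHA256 of the message body keyed by the AccessCode, which plays a similar role (only someone who knows the code can produce a tag that verifies).

```python
import hashlib
import hmac
from email.message import EmailMessage

SAGE_HEADER = "X-SAGE-Check"  # hypothetical header name, not from the proposal


def tag(body: str, access_code: str) -> str:
    """Keyed checksum of the body (HMAC-SHA256 is this sketch's stand-in)."""
    return hmac.new(access_code.encode(), body.encode(), hashlib.sha256).hexdigest()


def stamp(msg: EmailMessage, access_code: str) -> None:
    """Sender side: add the SAGE header before handing the mail to SMTP."""
    msg[SAGE_HEADER] = tag(msg.get_content(), access_code)


def admit(msg: EmailMessage, access_code: str) -> bool:
    """Receiver side: verify the header, strip it, and report the verdict."""
    presented = msg.get(SAGE_HEADER)
    if presented is None:
        return False  # no SAGE header at all
    del msg[SAGE_HEADER]
    expected = tag(msg.get_content(), access_code)
    return hmac.compare_digest(str(presented).encode(), expected.encode())


# Alice read "It is the word in ALL capitals" and entered "ALL".
mail = EmailMessage()
mail["To"] = "bob@example.org"
mail.set_content("Lunch on Friday?")
stamp(mail, "ALL")
```

Because relays only see an extra header, the message can pass through ordinary SMTP servers untouched, and Bob's server calls admit(mail, "ALL") before handing the mail to the LDA.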



3. COMMANDS/OPERATIONS (SAGE server)
a) QUER <userid>
Queries the system for the AccessString of <userid>. Indicates if userid does not exist or has not set an AccessCode.

b) MTCH <accessCode>,<userid>
Reports whether userid has accessCode as his/her accessCode string.

c) USER <userid>,<user pass>
Logs in a user of that SAGE server using the user password (different from the accessCode).

d) MODI <AccessString>, <AccessCode>
Modifies the AccessString/AccessCode of the logged in user.

e) RACS <accessCode>
Indicates to the server that the message that follows should have the additional SAGE header, with the accessCode as provided, included in the mail message.

f) On receipt of a message destined for the current system, the userid is checked to find out whether an accessCode is set. If so, the header is verified and removed, and the message is passed on to the LDA. If verification fails, the message is discarded and an appropriate error message is sent back.
The rest of the functionality is as provided by an SMTP server. The message can be passed on inside the DATA segment of a mail.
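
As a rough illustration of commands a) to d), here is a minimal in-memory sketch in Python. The class name SageDirectory, the reply strings ("OK", "YES", "NO", "ERR ...") and the comma parsing are all assumptions made for this example; the proposal does not define a wire format. RACS and the message handling in f) are left out.

```python
class SageDirectory:
    """In-memory stand-in for a SAGE server's user table (illustrative only)."""

    def __init__(self):
        self.passwords = {}    # userid -> login password
        self.profiles = {}     # userid -> (access_string, access_code)
        self.logged_in = None  # userid of the current session, if any

    def handle(self, line: str) -> str:
        verb, _, rest = line.strip().partition(" ")
        if verb == "QUER":
            userid = rest.strip()
            if userid not in self.passwords:
                return "ERR no such user"
            if userid not in self.profiles:
                return "ERR no access code set"
            return self.profiles[userid][0]  # only the public AccessString
        if verb == "MTCH":
            code, _, userid = rest.rpartition(",")
            profile = self.profiles.get(userid.strip())
            matched = profile is not None and profile[1] == code.strip()
            return "YES" if matched else "NO"
        if verb == "USER":
            userid, _, password = rest.partition(",")
            ok = self.passwords.get(userid.strip()) == password.strip()
            self.logged_in = userid.strip() if ok else None
            return "OK" if ok else "ERR login failed"
        if verb == "MODI":
            # Split on the last comma so the AccessString may contain commas.
            access_string, sep, access_code = rest.rpartition(",")
            if self.logged_in is None:
                return "ERR not logged in"
            if not sep:
                return "ERR syntax"
            self.profiles[self.logged_in] = (access_string.strip(), access_code.strip())
            return "OK"
        return "ERR unknown command"


server = SageDirectory()
server.passwords["bob"] = "hunter2"  # account created out of band
server.handle("USER bob,hunter2")
server.handle("MODI It is the word in ALL capitals, ALL")
print(server.handle("QUER bob"))
print(server.handle("MTCH ALL,bob"))
```

Note that QUER never returns the accessCode itself, and that a later MODI immediately makes the old accessCode fail MTCH.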

4. NOTES

* User wants to subscribe to an automated mailing list.
The accessCode is provided to the list owner so that the mailer program can send mail. The accessCode should be kept securely at the list owner's end.

* Spammers find out the accessCode.
The mail ID owner changes his/her accessString/accessCode. All mail sent using the old accessCode is discarded. Legitimate senders can easily query the new accessString and obtain the accessCode within a few seconds. All automated lists will lose the accessCode, and hence a client-side feature may be required to automatically update the lists with the new accessCode.

* SAGE server needs to send back error message.
The server queries the sender's server to find out whether it is a SAGE server. If it is a plain SMTP server, a normal mail can be constructed and sent back notifying the sender of the error.
Otherwise, a standard error code is returned to the sender's server, which expands it into a standard message and delivers it to the sender, without the sender's server having to provide an accessCode.

* Servers have to be upgraded as well as mail programs.
Not all servers need to be updated. Only the receiver's server has to be upgraded in order to carry out the filtering. Relaying can still take place through SMTP. The sender will have to upgrade the mail program and use a plug-in to generate the new header required.

* An algorithm is devised to find out accessCodes from the accessString.
The implementer should be given an award for his/her outstanding contribution to natural language processing!

5. ADDITIONAL THOUGHTS
Although we have proposed a server-based system, the entire idea can be translated into plug-ins at the mail user agent. The accessString can be queried using a finger service, which runs on most Unix systems, or can be mailed back to a sender who has an SMTP server. The simplicity of the system and its backward compatibility with existing SMTP servers ensure that deployment can occur in stages. A server-based system reduces the transmission of unwanted emails and thus reduces congestion. Minimal bandwidth is spent on authentication, and the system gives the mail ID owner the power to cut off spammers simply by changing the accessCode instead of having to change email ID.

It would now be costly for a spammer to manually harvest accessCodes for users but trivial for an email user to cut off spam. The internet community will have tilted the balance of the asymmetric cost of spamming!

PROBLEMS
The following was pointed out by Paul Graham.

The problem is, what happens if I send mail to someone who has
one of these access codes? Do I get a reply saying please
resend with the capital of France in the subject line? Bad
idea because:


1. I often don't bother, meaning this filtering method has
just generated a false positive. No one in business would
insist on access codes, because they want to put as few
obstacles in the way of potential customers as possible.


2. It's unethical, because spammers often use people's actual
email addrs (not theirs of course) as the reply-to. If
such a system were widely used, this innocent victim would
receive millions of mails.


--pg

Posted by amitc at December 31, 2002 10:10 AM
