Using Usenet Effectively

(Originally A ClariNet Internal Workshop)
by Patrick Salsbury
First presentation: 01-21-98
Document URL: http://reality.sculptors.com/~salsbury/Articles/using-USENET.html

Note: This is the first of what I envision as a three article series. This installment covers an introduction to Usenet, and outlines some of its capabilities for the end user.

The second article details ways that a Usenet conferencing system can be used in a corporate environment to reduce the amount of info that people wade through on a daily basis.

The third article (not yet written) I intend to be a technical-implementation document, describing the steps necessary to implement a working Usenet conferencing system in your environment, and the things to consider in tying the new system into your existing email system so as to extend your organization's capabilities, while reducing "info-glut" in the office, and allowing people to focus more clearly on their actual tasks.

Usenet Philosophy, or the theory behind it all

What the heck is Usenet? Old-timers mention it, Newbies might not have even heard of it, and it sounds like it's different than the web...but HOW?

First off, Usenet is the largest distributed database project in the history of Mankind. It's been under development, and growing, since about 1980. It's a loose collection of machines across the planet that share data with each other. There are hundreds of thousands, perhaps millions of "nodes" in this database, and they try to keep each other up to date with all the latest news. They communicate with each other using NNTP, the Network News Transfer Protocol.

From an end-user perspective, Usenet appears like a bunch of "bulletin boards" or "chat areas" (unlike the current usage of the term "chat", though, discussions take place over the course of weeks or months, rather than in real-time like the "chat rooms" on the Web.) These "boards", or newsgroups, are where people post messages on certain topics. Someone posts a question, or opinion, or interesting factoid, and others are able to respond, either via private email to the poster, or as another public post that goes into the group as a "followup message".

Each time one of these postings is made, it is branded with a unique "Message ID", which is a long string of characters generated by the news server software. They usually look something like this:

<Rrussia-minersVRlAO_8JJ.P@lnk.clari.net>
These are displayed in the headers on the Message-ID: line, and they are always unique for each and every Usenet message posted. Ever. This is a very important point to note, and one which we'll return to shortly.

Estimates of Usenet usage vary greatly, and there's no definitive way to tell for sure. Besides, the number is growing every day, so it doesn't really matter. The last number I heard was something around 30-33 million people, and that was a while ago. It may or may not be accurate, but it either was, or will be, at some point. :-)

At the time of this writing, Usenet is generating about 6-8 gigabytes of data per day. This data gets dutifully shuttled around to all of the nodes participating in the Usenet network, and machines try to keep each other up to date. They will get a note from one machine, and immediately turn around and ask all of their "peers" if they've "heard the news". They pass on the new Message ID, and say "Do you have this one, yet?" If the answer is "Yes, I've got that one", then nothing is done, but if the answer is "No, I haven't seen that one, yet", then the message is passed on, and the next machine begins asking its peers. In this way, news is passed quickly and efficiently around the globe, often in a matter of minutes.
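Here's a minimal sketch, in Python, of that "Do you have this one, yet?" exchange. It's only a toy: the Node class, the flood and link functions, and the example Message ID are all made up for illustration, and real news servers speak NNTP and are a good deal more involved.

```python
from collections import deque


class Node:
    """A hypothetical news server that remembers which Message IDs it has seen."""

    def __init__(self, name):
        self.name = name
        self.peers = []      # neighbouring servers we exchange news with
        self.articles = {}   # Message ID -> article body

    def has(self, message_id):
        """Answer a peer's question: 'Do you have this one, yet?'"""
        return message_id in self.articles

    def accept(self, message_id, body):
        self.articles[message_id] = body


def link(a, b):
    """Make two nodes peers of each other."""
    a.peers.append(b)
    b.peers.append(a)


def flood(origin, message_id, body):
    """Offer one article from peer to peer until every reachable node has it.

    Returns the number of actual transfers (offers that were accepted).
    """
    origin.accept(message_id, body)
    transfers = 0
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for peer in node.peers:
            if peer.has(message_id):
                continue                   # "Yes, I've got that one": nothing is done
            peer.accept(message_id, body)  # "No, I haven't seen that one, yet"
            transfers += 1
            queue.append(peer)             # this peer now asks its own peers
    return transfers


# Four servers in a ring, plus one cross-link between a and c.
a, b, c, d = (Node(name) for name in "abcd")
link(a, b)
link(b, c)
link(c, d)
link(d, a)
link(a, c)

MESSAGE_ID = "<example-1@news.example.invalid>"
transfers = flood(a, MESSAGE_ID, "hello, world")
```

Because every node checks the Message ID before accepting anything, each server takes the article exactly once, no matter how many peers offer it.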

Usenet isn't really any "place", just as the World Wide Web isn't any "place", but it does form a very integral part of the virtual space of the Net. Conceptually, it's the place where people go to talk. It's like the coffee shops and meeting halls, the town squares and workrooms, of our physical space.

Each newsgroup represents a different area of interest. Sometimes they are on totally different topics (such as sci.chem and alt.swedish.chef.bork.bork.bork) and sometimes they are splinter groups that differ in some very specific way (such as rec.aquaria.freshwater.cichlids and rec.aquaria.freshwater.goldfish). The last count I saw was that there were more than 100,000 different groups around the planet. Some of them are global, some are extremely regional, dealing with one specific region, or city, or school, or even one class in a school.

What about all that spam?

Yes, there's spam on Usenet. In fact, Usenet is where the concept of spam originated, as well as the term. In April 1994, two misguided lawyers hit upon the brilliant idea of sending their message out to the (then) 6,000 Usenet newsgroups in existence, to tell everyone about their hot new service, the "Green Card Lottery". With that, a new phenomenon was born, which everyone is still battling to this day.

Luckily, many Usenet news reading programs have tools built in that let you avoid most of that spam. Learning how to use those tools is pretty easy, and gives you surprising levels of control over what pops up on your screen.

An introduction to the concept of "threads"

The conversations in Usenet are many, and varied. They weave a very interesting tapestry, and the individual pieces that make up this tapestry are called "threads". A thread is a topic of discussion. Someone posts a message, someone posts a reply, then another, and another... Over time, the thread will grow, sometimes split into other threads, and eventually, it dies out.

A thread is not limited to one subject line, or a specific group of people, or even a particular length of time. (Some go on for months, some are only a message or two long, and then die out due to lack of responses.) Think of it like a gathering in a coffee shop: Some people are sitting around, someone starts up a discussion, and conversation ensues. Over the next hour or two, some people get up and leave, some come in and join the conversation, and eventually, everyone finishes up, has their say, and heads home. The coffee shop closes, and the discussion is complete. (Or, if it's a 24-hour shop, like Usenet, then perhaps people keep on talking 'round the clock.) At the end, the conversation may have wandered all over the place, and may not resemble the initial question/statement at all. This is very common.

If you were in the coffee shop, you might join in with the conversation at one table, or it might bore you, so you go to a different table, to see what people are talking about there. You can do the same thing in Usenet newsgroups, with the added benefit of being able to filter out boring threads, and auto-select the ones that interest you.

At its simplest level, you can say "filter out anything about topic X" and "select anything about topic Y", where you specify specific search criteria. In many newsreaders, the default is to look on the Subject: line. However, you can also say things like "select anything by person A", and "filter out things by person B", or even "select things by person A, except when they're responding to person B, because I'm tired of listening to them argue." (!) These involve scanning the From: line in postings, and in some more advanced newsreaders, you can also look in other (or any) header lines, such as Keywords:, Summary:, Distribution:, Newsgroups:, check the number of lines in the post, etc. You can also use some newsreaders to scan through the body of the article. Thus, you can create complex sorting rules like the following:

"Choose any message in the talk.politics group that mentions the phrase 'tax break', but filter out any posts from Persons X, Y, or Z, (and anything from person A responding to X, Y, or Z), but select anything else from person A, because I like what they have to say. Make sure the post is less than 200 lines, otherwise, get rid of it, because it's too much blather. Make sure that it's not cross-posted to more than 3 other groups (checking the Newsgroups: line), otherwise, chances are that it's spam. Also, make sure it doesn't have certain key phrases in it like 'make money fast', '$50,000', or 'fast cash'...I know those are spam."

As you can see, there's a surprising amount of flexibility in the filter rules, if the newsreader software is sufficiently advanced. The net result is that once you've constructed some rules, the computer will cut out much of the stuff you're not interested in before you even see it.
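To make the talk.politics rule quoted above a bit more concrete, here's a rough sketch of it as a Python function. This isn't the syntax of any real newsreader's kill file; the Article class, the decide function, and the placeholder author names are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Article:
    """Just the fields the example rule looks at (a made-up structure)."""
    newsgroups: List[str]
    author: str
    subject: str
    body: str
    lines: int
    replying_to: Optional[str] = None   # author of the post being answered


BLOCKED = {"X", "Y", "Z"}
FAVOURITE = "A"
SPAM_PHRASES = ("make money fast", "$50,000", "fast cash")


def decide(article):
    """Return 'kill', 'select', or 'neutral' for one article."""
    text = (article.subject + "\n" + article.body).lower()

    # Kill rules run first, so a selection can never rescue spam.
    if any(phrase in text for phrase in SPAM_PHRASES):
        return "kill"
    if article.lines >= 200:                 # too much blather
        return "kill"
    if len(article.newsgroups) - 1 > 3:      # more than 3 *other* groups
        return "kill"
    if article.author in BLOCKED:
        return "kill"
    if article.author == FAVOURITE and article.replying_to in BLOCKED:
        return "kill"

    # Select rules: anything else from person A, or the phrase we care about.
    if article.author == FAVOURITE:
        return "select"
    if "talk.politics" in article.newsgroups and "tax break" in text:
        return "select"
    return "neutral"


tax_post = Article(["talk.politics"], "Q", "New tax break proposal", "details", 40)
argument = Article(["talk.politics"], "A", "Re: taxes", "you're wrong", 12, replying_to="X")
spam = Article(["talk.politics"], "Q", "MAKE MONEY FAST", "tax break inside!", 10)
```

The order of the checks is a design choice: putting the kill rules ahead of the select rules means a spammer can't sneak through just by mentioning 'tax break'.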

What if someone changes the subject?

Scanning on a specific Subject: line is all fine and good, until someone changes the subject. This is all too common on Usenet, and can quickly make your hand-tailored rulesets useless, if they're only looking at the Subject: line. An age-old example would be the subject line progression of something like this (which you can find at almost any time in the alt.religion.* and talk.religion.* groups):

Subject: Jesus Loves You!
Subject: Jesus Loves You! (And wants you to change your sinning ways!)
Subject: Satan Loves You!
Subject: Bob Dobbs Loves You! (And wants you to send $1...)
Subject: Buddha is indifferent!
Subject: My girlfriends and I would love for you to call 1-900-.....
As you can see, while all of these may be responses to one original post, the topic-drift is pretty rapid, and would be difficult to filter. (Except perhaps for the word "Subject:" ;^) ) This is where "threads" come to the rescue, and where that unique Message ID that we mentioned earlier comes into play.

Since the Subject: line can vary so widely, and since it's not even required, it's not the best thing to filter with. It's useful as a rule-of-thumb, but not dependable. Also, filtering by author is only good if someone uses the same account all the time, and that's not guaranteed. What is guaranteed (if the software is written correctly) is that each and every message will have a unique Message ID.

But if it's unique, you can't know in advance what it will be, so how can you filter on it? Usenet posting software (if it's written correctly) will keep track of those Message IDs, and when a new message is posted, it gets a unique ID, as well as a References: line that lists the ID or IDs that it's replying to. Thus, you can trace back the thread, and find earlier posts, if you want to check what someone wrote earlier.

Thus, every response refers back to an earlier post, and every fork in the conversation, or every change in topic, is traceable by its Message ID, even if the subject line is changed, or dropped.
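As a small illustration, here's a sketch that follows References: lines back to the start of a thread, using Python's standard email parser to read the headers. The three sample posts and their Message IDs are invented, and the sketch assumes the usual convention that the last ID on the References: line is the post being answered.

```python
from email import message_from_string

# Three made-up posts: the subject changes, then disappears entirely.
RAW_POSTS = [
    "Message-ID: <1@a.example>\n"
    "Subject: Jesus Loves You!\n"
    "\n"
    "first post\n",
    "Message-ID: <2@b.example>\n"
    "References: <1@a.example>\n"
    "Subject: Satan Loves You!\n"
    "\n"
    "a reply with a new subject\n",
    "Message-ID: <3@c.example>\n"
    "References: <1@a.example> <2@b.example>\n"
    "\n"
    "a reply with no subject at all\n",
]


def parent_of(raw_post):
    """Return (message_id, parent_id); parent_id is None for a thread starter."""
    msg = message_from_string(raw_post)
    references = (msg.get("References") or "").split()
    # Assumption: the most recent ancestor is listed last.
    return msg["Message-ID"], (references[-1] if references else None)


def thread_root(message_id, parents):
    """Walk back through the parents until we reach the original post."""
    while parents.get(message_id) is not None:
        message_id = parents[message_id]
    return message_id


parents = dict(parent_of(raw) for raw in RAW_POSTS)
root = thread_root("<3@c.example>", parents)
```

Notice that the Subject: line never enters into it. The third post has no subject at all, and it still traces back to the first one.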

A real-life example of a discussion thread

This is (most of) a thread tree taken from the talk.politics newsgroup at the time of this writing. It gives a good example of how a topic can grow, change course, change subject, die out, and be reborn. It was constructed by the trn newsreader, written by Wayne Davison.

You can see that this part of the thread starts in the upper left with the first post, and within a few replies, begins to fork and fork again, as people reply to replies, and reply to replies of replies. Somewhere down the line, someone changes the subject, (each subject is noted by a number like [1], [2], [3], etc...) but they are all still pertaining to the original post, as determined by the Message-ID:, and the References: lines.

Subjects:
[1] Note to Jol fans (if there are any):
[2] Why the Lawyer-Like-Dishonesty Posts?
[3] Defamation Suit
[4] Jol's accusations 
[5] Lawyer-Like Dishonesty
 
  (1)--[1]+-[1]
          \-[1]+-[1]--[1]--[1]+-[1]--[1]--[1]--[1]--[1]--[1]--[1]
               |              |-[1]--[1]+-[1]--[1]+-[1]
               |              |         |         |-[1]--[1]+-[1]
               |              |         |         |         \-[1]+-[1]--( )--[1]--[1]--[1]
               |              |         |         |              \-[1]
               |              |         |         \-[1]--[1]+-[1]
               |              |         |                   |-[1]
               |              |         |                   \-[1]--( )--[1]
               |              |         \-[1]--[1]--[1]--[1]--[1]--[1]--[1]--[1]--[1]--( )+-[1]
               |              |                                                           \-[1]--( )+
               |              |-[2]+-[2]
               |              |    |-[2]--[2]
               |              |    \-[2]
               |              \-[1]--[1]--[1]--[1]--( )--[1]--( )--[1]
               |-[1]+-[1]--[1]--[1]+-[1]--[1]+-[1]--[1]
               |    |              |         \-[1]--[1]+-[1]--[1]
               |    |              |                   \-[1]+-[1]+-[1]--[1]+-[1]--[1]--[1]--[1]--[1]
               |    |              |                        |    |         \-[1]
               |    |              |                        |    \-[1]--[1]--[1]--[1]--[1]--[1]--[1]-
               |    |              |                        |-[1]
               |    |              |                        \-[1]
               |    |              |-[1]--[1]
               |    |              \-[1]--[1]
               |    \-[1]+-[1]+-[1]--[1]--[1]
               |         |    \-[1]--[1]+-( )--[1]--[1]+-[1]--[1]+-[1]--[1]
               |         |              |              |         \-[1]--[1]+-[1]
               |         |              |              |                   \-[1]
               |         |              |              \-[1]
               |         |              \-[1]
               |         \-[1]
               |-[1]--[1]--[1]+-[1]
               |              \-[1]+-[1]+-[1]--[1]
               |                   |    \-[1]
               |                   \-[1]
               \-[1]
  [3]+-[3]
     |-[3]
     |-[3]
     |-[3]+-[3]
     |    |-[3]
     |    \-[3]
     |-[3]--[3]
     |-[3]
     \-( )--[5]--[5]
 -[1]
 -[4]+-[4]--[4]--[4]--[4]--[4]--[4]--[4]--[4]+-[4]--( )--[4]--( )--[4]
     |                                       \-[5]
     |-[4]
     \-[4]

The lower sections, with the [3], [4], and [5] entries, are forks from an earlier part of the discussion. (Apparently, this thread has been going on for quite a while.) As you can see, the paths that the threads weave can get quite complex, and they have the potential to grow for a lot longer than you're likely to be interested in them.

Thus, if you want to keep from drowning in useless information, it becomes necessary to proactively "prune" that tree, before it grows out of control. In the above example, if you weren't interested in the sub-thread that starts out section [3] (Defamation Suit), then adjusting your newsreader to prune that message, and all things matching the subject of "Defamation Suit", would save you from seeing a further 10 messages. However, just filtering on the subject wouldn't stop you from seeing the two messages marked [5] (Lawyer-Like Dishonesty) which stemmed off from the [3] branch of the thread-tree. If you were really, truly not-interested in anything to do with the [3] tree, then those two [5] responses would be a waste of your time. By setting your newsreader to kill the entire sub-thread, starting with the original [3] posting, and any message posted in response to it, regardless of subject change, you'd get rid of all of the [3] posts, the two [5] posts, and any further responses in the future. In a very real sense, you've removed the non-interesting stuff from bothering you, so you can get on with things that interest you.

It's important to note that the original [1] tree, and the [4] tree are still preserved by a sub-thread kill. You can follow those parts, and kill off other sub-threads as you go. Or, you could say "this entire thread is crap, junk it" and the newsreader can trace forward and backward in the tree, taking note of each message ID, so that you don't see them or any response to any message in the entire thread ever again.
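The difference between a subject kill, a sub-thread kill, and a whole-thread kill can be sketched in a few lines of Python. The tiny tree below is a made-up, scaled-down stand-in for the one shown earlier (the Message IDs are invented), and the function names are hypothetical, not taken from trn or any other newsreader.

```python
# child Message ID -> parent Message ID (None marks the thread starter)
PARENTS = {
    "<root@x.example>": None,
    "<r1@x.example>": "<root@x.example>",
    "<d1@x.example>": "<root@x.example>",   # subject changes to [3]
    "<d2@x.example>": "<d1@x.example>",
    "<d3@x.example>": "<d1@x.example>",
    "<l1@x.example>": "<d3@x.example>",     # subject changes again, to [5]
    "<j1@x.example>": "<root@x.example>",   # a separate fork, subject [4]
}

SUBJECTS = {
    "<root@x.example>": "Note to Jol fans (if there are any):",
    "<r1@x.example>": "Note to Jol fans (if there are any):",
    "<d1@x.example>": "Defamation Suit",
    "<d2@x.example>": "Defamation Suit",
    "<d3@x.example>": "Defamation Suit",
    "<l1@x.example>": "Lawyer-Like Dishonesty",
    "<j1@x.example>": "Jol's accusations",
}


def build_children(parents):
    """Invert the child -> parent map into parent -> list of children."""
    children = {}
    for child, parent in parents.items():
        children.setdefault(parent, []).append(child)
    return children


def kill_by_subject(subject, subjects):
    """The simple approach: kill whatever carries this exact subject."""
    return {mid for mid, s in subjects.items() if s == subject}


def kill_subthread(start, children):
    """Kill one article and everything posted in response to it."""
    doomed, stack = set(), [start]
    while stack:
        mid = stack.pop()
        if mid not in doomed:
            doomed.add(mid)
            stack.extend(children.get(mid, []))
    return doomed


def kill_whole_thread(any_id, parents, children):
    """Trace back to the thread starter, then kill everything below it."""
    root = any_id
    while parents.get(root) is not None:
        root = parents[root]
    return kill_subthread(root, children)


CHILDREN = build_children(PARENTS)
by_subject = kill_by_subject("Defamation Suit", SUBJECTS)
by_subthread = kill_subthread("<d1@x.example>", CHILDREN)
everything = kill_whole_thread("<l1@x.example>", PARENTS, CHILDREN)
```

Here the subject kill misses the post whose subject drifted to "Lawyer-Like Dishonesty", while the sub-thread kill catches it and still leaves the other forks alone.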

Think about the long-term effect of this. If one message can lead to more than 100 messages, as we've seen above, then what could those 100 messages lead to? By cutting the threads as soon as you decide you're not interested, you signal the computer to go to work making sure not to bother you with boring things. (And of course, you can undo your selections and filter rules, if you want to start reading a thread again later.) Thus, if you approach each message you read with the question of "Is this interesting to me? Do I want to read any more about this?" and set your rules right then, you can prune the news as you go. Killing this thread, selecting that one, following responses to one specific article, but removing responses to another, etc.

Scoring - an alternative to simple kill/select modes

But wait, there's more. :-)

Suppose you've got a fairly decent ruleset, and you've got your newsreader weeding out lots of stuff for you. There's another technique called "scoring" that gives you even finer-grained control over what hits your eyeballs.

This capability is not in every newsreader, but it exists in a few of the more advanced ones, such as trn, strn, and gnus.

Suppose you normally like the posts of person A, but occasionally they post something really bone-headed. So you don't really want to kill all their posts, but sometimes you wonder why you're bothering to read their stuff. With scoring, you are allowed to "rank" articles, and assign a point value to them. A "normal" Usenet article will start off with a value of 0, and can be adjusted either upwards or downwards. Bumping it up a point makes it "more interesting" than an average post, and likewise, bumping it down makes it "less interesting". The newsreader keeps track of your assigned scores, and will calculate article scores when you enter a group. Often, people will score over a wide range, such as +/- 1000 points. That way, any one specific article won't have a huge effect on the overall score, unless you specify so.

Thus, if you normally enjoy reading person A's posts, and you've given them a score of +50, their posts will show up in the newsreader as "more interesting" than Joe Average's posts. However, every time you see a boneheaded post, you can say "subtract 5 points", and after 11 boneheaded posts, person A's posts are LESS interesting than Joe Average's. (Of course, if they say something that really ticks you off, you can give them a score of -500 and write them off.)

You can also set visibility thresholds, so that things below a certain score just vanish from sight, and things above a certain score get auto-selected for you to read. You can even set sub-thresholds that say things like "If the score gets to -400, warn me that it's about to vanish, and if it gets to -500, just remove it. Or if it's above +200, auto-select it for me."

This allows you to get complex with your scoring rules, just as you did with your thread-filtering rules. You can say "Bump up the score of person A by 50 points, unless the subject is about 'Blah', then drop the score by 100 points. If person A is responding to person B (whom I like), then it gets 50 points for A, 50 for B, and another 25 because it's a subject I'm interested in. (It's at 125.) If this is in response to a thread that I started, give it an extra 200 points." Suddenly it's at 325, and it's auto-selected for you to read and marked as "VERY interesting."
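Here's a rough Python sketch of that scoring rule, together with the visibility thresholds mentioned earlier. The function names and arguments are made up for illustration; real newsreaders such as trn or gnus keep their rules in score files with their own syntax.

```python
def score(author, subject, replying_to=None,
          interesting_subject=False, in_my_thread=False):
    """Add up the points for one article, following the rule described above."""
    total = 0                                  # every article starts at 0
    if author == "A":
        # +50 for person A, unless the subject is about 'Blah'
        total += -100 if "blah" in subject.lower() else 50
    if replying_to == "B":
        total += 50                            # responding to person B
    if interesting_subject:
        total += 25
    if in_my_thread:
        total += 200                           # a reply in a thread I started
    return total


def classify(total, remove_at=-500, warn_at=-400, select_above=200):
    """Turn a score into what the reader actually sees."""
    if total <= remove_at:
        return "removed"
    if total <= warn_at:
        return "about to vanish"
    if total > select_above:
        return "auto-selected"
    return "normal"


reply = score("A", "Re: tax policy", replying_to="B", interesting_subject=True)
in_thread = score("A", "Re: tax policy", replying_to="B",
                  interesting_subject=True, in_my_thread=True)
```

With these numbers, reply comes out at 125 and in_thread at 325, which classify reports as "auto-selected".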

Some newsreaders will let you rank articles by score, so you can say "show me the most interesting article" and it will pull it up for you. When you get to the point where the most interesting thing the newsreader has to offer you isn't really all that interesting, you can say "catch up the rest, and move on to the next group".

If all this scoring and ranking seems like a hassle, there is another solution: Auto-scoring. I've only seen this amazing little feature in the gnus newsreader, but hope to see it in others as time goes on. Essentially, while reading news, you are leaving little marks as you go through the articles. Some are getting read, some are getting killed, some are just plain ignored. The Auto-scoring facility takes a look at what you did while in a group, and when you leave, it updates some internal score files automagically. You can (of course) adjust the point values for everything, and choose what does or doesn't get watched for scoring, but the defaults are fairly good to start with. (The gnus docs basically say "Just turn this on, then go read news for a week. You'll start to notice more interesting news popping up, and the boring stuff will just start to go away.")

Here's a rough idea of how it works: (These numbers are rough, but adjustable.)

All articles start at 0.
If you read an article, the Subject: gets bumped up by +1, and the author gets bumped up by +3.
If you ignore an article, Subject: gets -1, author gets -1.
If you delete the article, Subject: gets -10, author gets -3.
If you kill the thread, each following article's Subject: gets -10, author gets -3.
When you leave the group and catchup articles, they all get something like Subject: -3, author -1.

The net effect of this is that, if you went into a group, and killed a thread with a 30-post argument between two people, then the Subject: of that argument gets a net score of -300 (30 * -10 for killing), and each author gets a -45 (15 each * -3 for killing). That subject won't come up again if your threshold cuts off articles at -300, and both of those people are pretty low on your list. Also, other things in the group all get bumped up or down appropriately, based on whether you read them or not. So over a very short period of time, a lot of the noise of Usenet just goes away.
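The arithmetic is easy to check with a short Python sketch. The point values are the rough ones listed above, not the actual gnus defaults, and the action names, the subject, and the authors P and Q are invented for the example.

```python
from collections import Counter

# Rough per-action adjustments from the list above: (Subject: delta, author delta)
ADJUST = {
    "read": (1, 3),
    "ignore": (-1, -1),
    "delete": (-10, -3),
    "kill_thread": (-10, -3),
    "catchup": (-3, -1),
}


def apply_actions(actions):
    """Tally score changes from a list of (action, subject, author) tuples."""
    subject_scores, author_scores = Counter(), Counter()
    for action, subject, author in actions:
        subject_delta, author_delta = ADJUST[action]
        subject_scores[subject] += subject_delta
        author_scores[author] += author_delta
    return subject_scores, author_scores


# A 30-post argument between two people, P and Q, posting alternately.
argument = [
    ("kill_thread", "Endless argument", "P" if i % 2 == 0 else "Q")
    for i in range(30)
]
subject_scores, author_scores = apply_actions(argument)
```

Killing the thread leaves the subject at -300 and each of the two authors at -45, matching the figures worked out above.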

Diligence pays off

All this filtering, killing and scoring takes a bit of diligence, but it pays off in the long run. As you can hopefully see, by taking the time to consider while you're reading whether you want to see topics in the future, you can tailor the info-tide to bring you the things you want, and remove the things you don't want. And since "Joe Average" posts all start off as "presumed innocent" (with a score of 0), it's only by comparing their content, headers, author, keywords, etc., against your rules that the newsreader determines whether you should see them or not. If a post is actually "news to you", and doesn't match any of your filters, then it will come through, and you'll see it. So you get the news, but not the "olds" (unless you've flagged it for reading).

This is how the Usenet pros manage to follow along on relevant discussions, skip over all the flame-bait (and subsequent flames), carry on long-term conversations with friends & cohorts, answer questions by newbies (that get through the filters as something unique and new, not just a FAQ or response to something non-interesting), and still manage to hold down a regular job. :-)

It's this capability of the newsreader software to manage the huge volumes of data, and get the interesting stuff to you, that makes Usenet into the treasure-trove that it is. Sure, there's spam. Sure, there's lots of off-topic cruft. But there's also a wealth of info out there. There are millions of minds, experts in every field and interest area, all carrying on conversations. Quite often, they're willing to answer honest questions, share some of their knowledge, impart some of their wisdom, and help you find references so you can learn new things.

That's why Usenet is the largest distributed database project in the history of Mankind.

Dive in...and have fun. :-)


Bio: Patrick Salsbury lives in a dome in the mountains near Silicon Valley. He has held various "Newsmaster" and "Postmaster" positions at various companies, and is sorely acquainted with the problems of info-overload. His resume can be found here.

Last modified: Saturday, 11-Aug-2007 20:09:11 PDT