Online Banking Information

TCP Treason Uncloaked





DISCLAIMER:
I'm saying this because I've done it to myself so many times: the comments and information presented here are not meant as advice. Your situation is likely to differ greatly from mine, so using any information found here may very well break your system. In that event, we cannot be held responsible for your actions. AGAIN: THIS INFORMATION COMES WITH NO WARRANTY. USE AT YOUR OWN RISK!

I've researched this extensively, and unfortunately the majority of information I've found is useless. Most of it reads like this:

Post to a newsgroup or mailing list: "What is this log message: 'Treason Uncloaked'?"
Response: "You didn't search on Google did you? Try searching on Google and you'll find out that it is no big deal."

Hmm, well, I've searched and searched and found no conclusive evidence that it is no big deal. I've found a number of conflicting, inconclusive explanations, including tarpit (aka tar-pit) attacks, buggy TCP stacks, buggy NIC drivers, spam bots, denial of service (DoS) attacks, and even bandwidth shaper effects.

What follows is the information I've found regarding this slippery log message, and how I've come up empty-handed.

What is most interesting to me is that the source IPs almost always belong to spam bots, blog spam bots or otherwise:

Jun 17 04:07:59 xxxxx TCP: Treason uncloaked! Peer 168.209.98.35:43301/80 shrinks window 2244026871:2244027701. Repaired.

Jun 17 19:53:28 xxxxx TCP: Treason uncloaked! Peer 24.67.253.203:40640/80 shrinks window 2021223504:2021230600. Repaired.

http://en.wikipedia.org/wiki/User:168.209.98.35

and

http://en.wikipedia.org/wiki/User_talk:24.67.253.203
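To see which peers show up most often, the log lines can be tallied by source IP. The following is a minimal sketch, assuming the messages have the same layout as the two samples above; the function name `treason_peers` is made up for illustration.

```shell
# Count "Treason uncloaked" messages per peer IP.
# Reads syslog lines on stdin; prints "<count> <ip>", highest count first.
treason_peers() {
    grep 'Treason uncloaked' |
        sed -n 's/.*Peer \([0-9.]*\):[0-9]*\/[0-9]*.*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}
```

It would be fed the kernel log, for example `treason_peers < /var/log/messages`, where the log path is an assumption that depends on the syslog setup.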


There are a lot of posts claiming that this is nothing. However, a high traffic server I run is having issues from time to time, where the link goes down for 1 to 5 minutes. I've searched high and low, and the only thing I've come up with is that the treason uncloaked message usually appears sometime before the link goes down.

The messages often come from similar IPs, which, when I researched them, turned up Wikipedia pages reporting that those IPs were blocked because they were zombie sock puppets.

http://www.ussg.iu.edu/hypermail/linux/kernel/0402.2/0740.html

I'm thinking it could be a problem with an iproute2 traffic control (tc) script I use to throttle bandwidth usage.

Either that, or the ethernet driver I'm using for my ethernet link.

Everyone says that the problem is caused by the other end, which could be true as the source IPs are listed as "bad" netizens. Even if they are, my server should still be able to handle it without a problem.

The strangest part of this is that even though the link to the internet is dropped, I can log in to another machine connected to the same switch and, from there, log in to the server. That makes me think that it's not my computer which bugs out, but the router right before my machine, which is the responsibility of my colocation space provider.

http://forums.sw-soft.com/printthread.php?s=8ff4c28fa92467673b222c92b0b558e4&threadid=7318&perpage=19

Also thinking it might be

ip_conntrack_max

Increased it by a factor of 100.
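Before raising the limit further, it may help to check how full the connection-tracking table actually gets. This is only a sketch: the helper name `conntrack_usage` is hypothetical, and where the two numbers come from depends on the kernel version.

```shell
# Print how full the connection-tracking table is, as a whole-number percentage.
# Usage: conntrack_usage <current_count> <max>
conntrack_usage() {
    count=$1
    max=$2
    echo $(( $count * 100 / $max ))
}
```

On 2.6 kernels of this era the inputs can usually be read from /proc/sys/net/ipv4/netfilter/ip_conntrack_count and ip_conntrack_max in the same directory; newer kernels expose nf_conntrack_count and nf_conntrack_max under /proc/sys/net/netfilter/ instead. If the percentage never gets close to 100, the table size is probably not the cause.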

Looks again like it may be a bad e1000 driver:

http://www.vbulletin.com/forum/archive/index.php/t-173091.html

http://www.mail-archive.com/netdev@vger.kernel.org/msg10830.html

Switched the switch port, switched the Intel driver, and then switched to a Broadcom card instead of the Intel card. Still getting treason uncloaked messages. Blocking each new source address, but wondering how to make a global fix.



People also say it is a bug in tcp_timer.c:

http://www.derkeiler.com/Newsgroups/comp.security.unix/2002-05/0063.html

However, since these messages coincide with malicious IP addresses, I believe it is something more significant than a bug in the tcp_timer.c code. Nevertheless, I'm going to try to track down which package contains this code and update it.

Tracking this down I found this nice page about kernel security:

http://www.gentoo.org/doc/en/security/security-handbook.xml?part=1&chap=9

I enabled all of them except the ping ignore, and am now constraining pings to the same source IPs as my ssh daemon accepts.

Also checking all Gentoo Linux Security Advisories while I'm at it.

Thus upgrading a bunch of packages:

  • flex
  • tar
  • rsync
  • sasl_authd
  • spamassassin and a bunch of perl libs

No package upgrades had any impact on the performance, reliability, or robustness of the system.


How do I defend against these log messages? First, not by suppressing the message. The first thing I'm trying is to block the IP using iptables.

iptables -A INPUT -s 24.67.253.203 -j DROP

This is obviously not a good solution as I could be blocking legitimate traffic! However, it is a last resort as I search for the real cause and a global solution.
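Since every blocked address risks cutting off legitimate traffic, it seems safer to generate the rules from a reviewed list and look at them before applying anything. A minimal sketch, with the hypothetical helper name `block_rules`:

```shell
# Turn a list of IP addresses (one per line on stdin) into iptables DROP rules.
# Only prints the commands; nothing is applied until the output is run as root.
block_rules() {
    while read -r ip; do
        # Skip blank lines and comment lines in the list.
        case "$ip" in
            ''|'#'*) continue ;;
        esac
        echo "iptables -A INPUT -s $ip -j DROP"
    done
}
```

For example, `block_rules < blocked-ips.txt` prints one rule per address (the file name is an assumption). Piping the output to `sh` as root would apply the rules once they have been checked.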


Does entropy have anything to do with it? rng-tools, which interfaces with the hardware random number generator, was already installed.

Yeah, fiddling with the rngd settings finally got the entropy where it needs to be!

So, at least entropy_avail is staying in the 3000s. Amazing that rngd hovers around 4% of the CPU's resources. Craziness. Simply emerging rngd was not enough; something (I believe my email setup) rapidly depletes the server's entropy. After starting the rngd daemon, the entropy would be back down to less than one hundred within moments. I changed the settings in /etc/conf.d/rngd to:


# /etc/conf.d/rngd

# Please see "/usr/sbin/rngd --help" and "man rngd" for more information

# Random step (Number of bytes written to random-device at a time):
# default
#STEP=64
STEP=2048

# Timeout (Interval written to random-device when the entropy pool is full):
TIMEOUT=5
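To confirm that a settings change like this actually keeps the pool topped up, the value of entropy_avail can be compared against a threshold. The sketch below uses a made-up helper name, `entropy_status`, and an arbitrary default threshold of 1000 chosen only for illustration.

```shell
# Report whether the kernel entropy pool is above a chosen threshold.
# Usage: entropy_status <entropy_avail> [threshold]
# The default threshold of 1000 is an arbitrary illustration, not a recommendation.
entropy_status() {
    avail=$1
    threshold=${2:-1000}
    if [ "$avail" -lt "$threshold" ]; then
        echo "low ($avail < $threshold)"
    else
        echo "ok ($avail)"
    fi
}
```

It can be run as `entropy_status "$(cat /proc/sys/kernel/random/entropy_avail)"`, repeated every few seconds while the mail and web daemons are busy, to see whether the pool still drains.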

Also had to reduce the number of Apache servers, children, and the like. Turned off throttle_cat, as well as bwshare.

Later re-emerged Apache as prefork, to compile in mod_perl. I think that worker may be problematic for high volume servers at this time, especially when trying to debug possibly related issues.


Just found this helpful page at WHT:

http://webhostingtalk.com/showthread.php?t=340239&highlight=apache+settings

It talks about tuning a web server. Even more thorough than the Gentoo page.

The server does seem nice and zippy now.



Still getting: "Broadcast, ethertype Unknown" in tcpdump... I again think this is from my switch.

Conclusion:
I believe the "TCP: Treason Uncloaked" messages were somehow involved with SYN flood attacks. I think that they were also responsible for causing the rapid depletion of entropy.

To review what was done:

Hardware:

  • Ethernet card replacement from Intel e1000 to Broadcom 1000.
  • Switch port change from 3 to 5.

OS:

  • Changed rngd settings to increase entropy
  • Made changes to sysctl kernel ipv4 stack as prescribed in Gentoo and WHT pages.
  • I think the network outages may be caused by a DoS worm or some such thing, if restarting the network is fixing the outage. So this calls for a smarter iptables script! Added a SYN flood limit, an ICMP limit, and fixed the eth0->eth2 directives. The SYN flood limit was a terrible idea. It DoS'ed the server right away. Duh. You can't reliably limit SYN floods via iptables.
  • Now trying to recompile the kernel with syncookies enabled. It looks like syn cookies may be the answer to all this garbage. Seems to be working pretty well! I THINK THIS FIXED THE PROBLEM!

http://cr.yp.to/syncookies.html

  • Also dropping pings which occur at a rate greater than one per second. The rule logs them, and there were indeed a bunch of pings coming in with large sizes.
  • While I'm at it, increasing the backlog:

sysctl -w net.ipv4.tcp_max_syn_backlog="2048"
from:
http://www.securityfocus.com/infocus/1729

Also:
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_abort_on_overflow = 1
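It is easy to lose track of which of these settings are actually live after a reboot or a kernel rebuild. The sketch below compares the values listed above, plus net.ipv4.tcp_syncookies set to 1 (my assumption for "syncookies enabled"), against what the running kernel reports. The function name `check_tcp_settings` is hypothetical, and the optional argument only exists so the check can be pointed at a scratch directory.

```shell
# Compare the desired TCP settings against what the running kernel reports.
# Usage: check_tcp_settings [proc_sys_root]   (defaults to /proc/sys)
check_tcp_settings() {
    proc_sys="${1:-/proc/sys}"
    while read -r key want; do
        # net.ipv4.tcp_syncookies -> <root>/net/ipv4/tcp_syncookies
        path="$proc_sys/$(printf '%s' "$key" | tr '.' '/')"
        if [ -r "$path" ]; then
            have=$(cat "$path")
        else
            have="missing"
        fi
        if [ "$have" = "$want" ]; then
            echo "ok       $key = $have"
        else
            echo "differs  $key = $have (want $want)"
        fi
    done <<EOF
net.ipv4.tcp_syncookies 1
net.ipv4.tcp_max_syn_backlog 2048
net.ipv4.tcp_synack_retries 2
net.ipv4.tcp_abort_on_overflow 1
EOF
}
```

A "missing" result for tcp_syncookies would suggest the kernel was built without syncookie support. To make the values survive a reboot, the same keys can go into /etc/sysctl.conf in `key = value` form.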

Entropy:

Apache:

  • Decreased Apache limits so that it wouldn't run into swap
  • Turned off gzip for everything except html
  • In fixing apache2splitlogfile, I accidentally added two characters which were causing it to bail. Removed them and now it's fine again.
  • Switched back from worker to prefork MPM - this finally fixed my restarting issues, which are often attributed to digest creation for mod_auth_digest, or seed generation for mod_ssl. I think that the worker children had difficulty responding to the restart / reload init scripts, in that the children would be stuck on php processes, apache2splitlogfile processes, or throttle_cat processes. Although the restart is supposed to be graceful and replenish the children as they finish their processes, the init script couldn't handle it.
  • Turned off throttle_cat!

Monitor:

  • Set up monitrc to restart the network interface when the website becomes unavailable. Set it to check simply "/" in case a filename on the site changes.

Email:

  • Enabled rbl checks
  • Fixed spamc setup
  • Increased imapd, imapd-ssl connections, daemons, and such.

What is interesting too is that after the tcp_syncookies were enabled and pings were dropped, the entropy is not depleting nearly as fast as it was.

Things to-do:

  • In my research, I came across this:

http://huizen.dto.tudelft.nl/devries/security/iptables_example.html#installation

This is by far the best iptables script I've ever seen. I plan to implement it on a test box which I have physical access to, so I don't lock myself out (something which I've been known to do from time to time).

Update June 20, 2006
I have implemented some of those iptables rules now, mostly the bogon list, as well as the SYN, ACK, and FIN rules. I plan to also add the malformed packet rules today. I also blocked ICMP for all IPs except my Nagios monitor.
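For reference, rules of this kind usually match TCP flag combinations that never occur in legitimate traffic. The sketch below is my own illustration and not necessarily the exact set from the linked script; the function name `bogus_flag_rules` is made up, and the rules are only printed so they can be reviewed on a test box first.

```shell
# Print iptables rules that drop TCP packets with flag combinations
# that should never appear in legitimate traffic. Printed, not applied.
bogus_flag_rules() {
    # Each line below: "<flags to examine> <flags that must be set>"
    while read -r mask flags; do
        echo "iptables -A INPUT -p tcp --tcp-flags $mask $flags -j DROP"
    done <<EOF
SYN,FIN SYN,FIN
SYN,RST SYN,RST
FIN,RST FIN,RST
ALL NONE
EOF
}
```

Running `bogus_flag_rules` prints four commands. The last one (`ALL NONE`) matches packets with no flags set at all, which is a common port-scan signature.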

N.B. - Check the Netgear switch for utilities for / vulnerabilities to DoS / ping / SYN floods. Read about some issues with auto-mdix (auto negotiation) causing temporary disconnects, so I manually set all my ports on the GS716T to 100Mbit full duplex.

Reading more into the speed settings of the switch has led me down another interesting path. It turns out that there really are auto-negotiation problems of which I was previously unaware. This paper sums it up extremely well:

http://www.cites.uiuc.edu/network/autosense.html

Thank you to Jay Kreibich for writing and sharing that with us.

As I said earlier, I manually set the switch to turn off auto-negotiation and stay at 100Mbit full duplex. On the server side, the link went down, and came back up. The auto-negotiation on the server side correctly deduced that there was no auto-negotiation on the switch side, so it reverted to half duplex. In the current situation there is a duplex mismatch. Therefore I looked into manually setting the duplex and speed of the server link. There are two tools to do this:

  • mii-tool
  • ethtool

I was unaware of mii-tool until recently, when my co-worker used it in front of me to find out which links were active. Quick overviews here:
mii-tool:
http://community.smoothwall.org/forum/viewtopic.php?t=7626
Both mii-tool and ethtool:
http://www.linuxhomenetworking.com/wiki/index.php/Quick_HOWTO_:_Ch03_:_Linux_Networking#Changing_NIC_Speed_and_Duplex

A source of confusion: I set full duplex via mii-tool and it wasn't reflected in ethtool. When I set it via ethtool, it took a moment and almost gave me a heart attack, but it resolved and now reflects the appropriate settings.
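Forcing the link with ethtool is typically done with something like `ethtool -s eth2 speed 100 duplex full autoneg off` (eth2 is assumed here to be the interface facing the switch). To confirm afterwards that the setting took, the relevant lines can be pulled out of the ethtool report. A minimal sketch, with the hypothetical helper name `link_summary`:

```shell
# Summarize speed, duplex and auto-negotiation from `ethtool <iface>` output on stdin.
link_summary() {
    awk -F': ' '
        /^[ \t]*Speed:/            { speed = $2 }
        /^[ \t]*Duplex:/           { duplex = $2 }
        /^[ \t]*Auto-negotiation:/ { autoneg = $2 }
        END { print "speed=" speed, "duplex=" duplex, "autoneg=" autoneg }
    '
}
```

Used as `ethtool eth2 | link_summary`. A result showing half duplex while the switch port is fixed at full duplex would indicate the mismatch described above.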

So the server in question just went offline again. The part which really confused me in the past was that when an outage occurred, no other servers connected to that switch were affected. Even more confusing, I was able to ssh to the server through another server connected to the same switch! In these events, if I restarted the network connection, the link would be available again, presumably because the NIC and the switch renegotiated. This server is very high traffic compared with the other servers connected to the switch. I read up on flow control and backpressure, and I believe that this may have been the cause all along. The descriptions here were very helpful:

http://www.rhyshaden.com/ethernet.htm

http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html

Now on my switch, all ports are configured to 100Mbit full duplex, flow control off. Backpressure is also turned off for the switch in general.


Still happening. Happened this morning at 6:00AM on eth2, when there was no traffic control running, ruling out the possibility that it could be caused by that.



Last edited: 2006-06-21 08:42:30
