[  Phrack Magazine   Volume 8, Issue 53 July 8, 1998, article 13 of 15

[Designing and Attacking Port Scan Detection Tools
--------[  solar designer 
                 Korean translation: 조윤종

1. Introduction

The purpose of this article is to show potential problems with intrusion
detection systems (IDS), concentrating on one simple attack: port scans.

This lets me cover all components of such a simplified IDS.  Also, unlike
the great SNI paper (http://www.secnet.com/papers/IDS.PS), this article is
not limited to network-based tools.  In fact, the simple and hopefully
reliable example port scan detection tool that you'll find at the end
("scanlogd") is host-based.

2. What Can We Detect?

A port scan involves an attacker trying many destination ports, usually
including some that turn out not to be listening.  One "signature" that
could be used for detecting port scans is "several packets to different
destination ports from the same source address within a short period of
time".  Another such signature could be "SYN to a non-listening port".
Obviously, there are many other ways to detect port scans, up to dumping
all the packet headers to a file and analyzing them manually (ouch).

All of these different methods have their own advantages and
disadvantages, resulting in different numbers of "false positives" and
"false negatives".  Now, let me show that, for this particular attack
type, it is always possible for an attacker to make her attack either very
unlikely to be noticed, or very unlikely to be traced to its real origin,
while still being able to obtain the port number information.

To obscure the attack, an attacker could do the scan very slowly.  Unless
the target system is normally idle (in which case one packet to a
non-listening port is enough for the admin to notice, not a likely real
world situation), it is possible to make the delay between ports large
enough that the activity is unlikely to be recognized as a scan.

A way to hide the origin of a scan, while still receiving the information,
is to send a large number (say, 999) of spoofed "port scans", and only one
scan from the real source address.  Even if all the scans (1000 of them)
are detected and logged, there's no way to tell which of the source
addresses is real.  All we can tell is that we've been port scanned.

Note that, while these attacks are possible, they obviously require more
resources from the attacker to perform.  Some attackers will likely choose
not to use such complicated and/or slow attacks, and others will have to
pay with their time.  This alone is enough reason to still detect at least
some port scans (the ones that are detectable).

The possibility of such attacks means that our goal is not to detect all
port scans (which is impossible), but instead, in my opinion, to detect as
many port scan kinds as possible while still being reliable enough.


3. What Information Can We Trust?

Obviously, the source address can be spoofed, so we can't trust it unless
other evidence is available.  However, port scanners sometimes leak extra
information that can be used to tell something about the real origin of a
spoofed port scan.

For example, if the packets we receive have an IP TTL of 255 at our end,
we know for sure that they're being sent from our local network regardless
of what the source address field says.  However, if TTL is 250, we can
only tell that the attacker was no more than 5 hops away; we can't tell
for sure exactly how far away she was.
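
The TTL reasoning above can be sketched in C.  This is only an
illustration: the function names are hypothetical, and the list of common
initial TTL values (32, 64, 128, 255) is an assumption of this sketch, not
something the article states.  The hop bound itself follows from the fact
that no initial TTL can exceed 255.

```c
/* Hypothetical helper: upper bound on the number of hops a packet has
 * travelled, given the TTL observed at our end. */
int max_hops_from_ttl(unsigned char ttl)
{
	return 255 - ttl;
}

/* Hypothetical helper: guess the sender's initial TTL by rounding up to
 * the nearest commonly used default (assumed list). This is only a hint;
 * the sender is free to pick any starting value. */
int guess_initial_ttl(unsigned char ttl)
{
	static const int defaults[] = { 32, 64, 128, 255 };
	int i;

	for (i = 0; i < 4; i++)
		if (ttl <= defaults[i]) return defaults[i];
	return 255;
}
```

With these helpers, an observed TTL of 255 gives a bound of 0 hops (local
network), and 250 gives a bound of 5 hops, matching the example above.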

Starting TTL and source port number(s) can also give us a hint of what
port scanner type (for "stealth" scans) or operating system (for full TCP
connection scans) is used by the attacker.  We can never be sure though.
For example, nmap sets TTL to 255 and source port to 49724, while the
Linux kernel sets TTL to 64.


4. Information Source (E-box) Choice

For detecting TCP port scans, including "stealth" ones, we need access to
raw IP and TCP packet headers.
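
As a rough sketch of what "access to the headers" means in practice, the
following C function pulls the few fields a port scan detector cares about
out of a raw IPv4 packet held in a byte buffer.  The struct and function
names are hypothetical; the byte offsets are the standard IPv4 and TCP
header layouts.  The example code at the end of this article uses the
system's struct ip and struct tcphdr instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields a port scan detector needs from one packet (hypothetical). */
struct scan_fields {
	uint32_t saddr;		/* Source address, host byte order */
	uint16_t dport;		/* Destination port, host byte order */
	uint8_t flags;		/* TCP flags byte */
	uint8_t ttl;		/* IP TTL as received */
};

/* Returns 0 on success, -1 if the buffer is too short or not IPv4/TCP. */
int parse_ipv4_tcp(const uint8_t *buf, size_t len, struct scan_fields *out)
{
	size_t ihl;
	const uint8_t *tcp;

	if (len < 20 || (buf[0] >> 4) != 4) return -1;
	ihl = (size_t)(buf[0] & 0x0f) << 2;	/* IP header length, bytes */
	if (ihl < 20 || len < ihl + 20) return -1;
	if (buf[9] != 6) return -1;		/* Protocol 6 is TCP */

	tcp = buf + ihl;			/* TCP header follows IP options */
	out->ttl = buf[8];
	out->saddr = ((uint32_t)buf[12] << 24) | ((uint32_t)buf[13] << 16) |
		((uint32_t)buf[14] << 8) | buf[15];
	out->dport = (uint16_t)((tcp[2] << 8) | tcp[3]);
	out->flags = tcp[13];
	return 0;
}
```

Note the length checks before every access: the packet contents are
attacker-controlled, so the parser must never trust the header length
field on its own.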

In a network-based IDS, we would use promiscuous mode for obtaining the
raw packets.  This has all the problems described in the SNI paper: both
false positives and false negatives are possible.  However, sometimes this
might be acceptable for this attack type since it is impossible to detect
all port scans anyway.

For a host-based IDS, there are two major ways of obtaining the packets:
reading from a raw TCP or IP socket, or getting the data directly inside
the kernel (via a loadable module or a kernel patch).

When using a raw TCP socket, most of the problems pointed out by SNI do
not apply: we are only getting the packets recognized by our own kernel.
However, this is still passive analysis (we might miss packets) and a
fail-open system (if the IDS fails, traffic still gets through unexamined).
While probably acceptable for port scans only, this is not a good design
if we later choose to detect other attacks.  If we used a raw IP socket
instead (some systems don't have raw TCP sockets), we would have more of
the "SNI problems" again.  Anyway, in my example code, I'm using a raw TCP
socket.

The most reliable IDS is one with some support from the target system's
kernel.  This has access to all the required information, and can even be
fail-close.  The obvious disadvantage is that kernel modules and patches
aren't very portable.


5. Attack Signature (A-box) Choice

It has already been mentioned above that different signatures can be used
to detect port scans; they differ by numbers of false positives and false
negatives.  The attack signature that we choose should keep false
positives as low as possible while still keeping false negatives
reasonably low.  It is however not obvious what to consider reasonable.
In my opinion, this should depend on the severity of the attack we're
detecting (the cost of a false negative), and on the actions taken for a
detected attack (the cost of a false positive).  Both of these costs can
differ from site to site, so an IDS should be user-tunable.

For scanlogd, I'm using the following attack signature: "at least COUNT
ports need to be scanned from the same source address, with no longer than
DELAY ticks between ports".  Both COUNT and DELAY are configurable.  A TCP
port is considered to be scanned when receiving a packet without the ACK
bit set.
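
A stripped-down sketch of this signature for a single source address
follows.  The struct and function names and the tick values are
assumptions made for illustration; scanlogd's real implementation is in
the example code at the end of this article and additionally handles many
sources, TCP flag masks, and entry reuse.

```c
#define COUNT 10	/* Ports needed to call it a scan (assumed value) */
#define DELAY 5		/* Max ticks between new ports (assumed value) */

/* Hypothetical per-source tracker; zero-initialize before first use. */
struct tracker {
	long last;			/* Time of the last new port */
	int count;			/* Distinct ports seen so far */
	unsigned short ports[COUNT];	/* The ports themselves */
};

/* Feed one packet; returns 1 exactly when the signature first matches. */
int track_port(struct tracker *t, unsigned short port, int ack, long now)
{
	int i;

	if (t->count && now - t->last > DELAY)
		t->count = 0;			/* Too slow: start over */

	for (i = 0; i < t->count; i++)
		if (t->ports[i] == port) return 0;	/* Seen already */

	if (ack) return 0;			/* ACK set: not a probe */

	t->last = now;
	if (t->count >= COUNT) return 0;	/* Reported already */
	t->ports[t->count++] = port;
	return t->count == COUNT;
}
```

Ten different ports one tick apart trigger the signature on the tenth
packet, while the same ports spaced six ticks apart never do.  This is
exactly the slow scan evasion described in section 2.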


6. Logging the Results (D-box)

Regardless of where we write our logs (a disk file, a remote system, or
maybe even a printer), our space is limited.  When storage is full,
results will get lost.  Most likely, either the logging stops, or old
entries get replaced with newer ones.

An obvious attack is to fill up the logs with unimportant information, and
then do the real attack with the IDS effectively disabled.  For the port
scans example, spoofed "port scans" could be used to fill up the logs, and
the real attack could be a real port scan, possibly followed by a breakin.
This example shows how a badly coded port scan detection tool could be
used to avoid logging of the breakin attempt, which would get logged if
the tool wasn't running.

One solution for this problem would be to put rate limits (say, no more
than 5 messages per 20 seconds) on every attack type separately, and, when
the limit is reached, log this fact, and temporarily stop logging of
attacks of this type.  For attack types that can't be spoofed, such limits
could be put per source address instead.  Since port scans can be spoofed,
this still lets an attacker not reveal her real address, but this doesn't
let her hide another attack type this way, like she could do if we didn't
implement the rate limits... that's life.  This is what I implemented in
scanlogd.
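
The rate limit can be sketched as a tiny state machine.  The names and
return codes below are hypothetical, and the sketch takes one simple
reading of the limit: a burst ends once no message has been accepted for
more than LIMIT_DELAY seconds.  See safe_log() in the example code for
what scanlogd actually does.

```c
#define LIMIT_COUNT 5	/* Messages allowed per burst */
#define LIMIT_DELAY 20	/* Seconds of quiet that end a burst */

/* Hypothetical limiter state; keep one per attack type. */
struct limiter {
	long last;	/* Time of the last accepted message */
	int count;	/* Messages accepted in the current burst */
};

enum { RL_DROP = 0, RL_WRITE = 1, RL_NOTE = 2 };

/* Decide what to do with a message arriving at time "now" (seconds). */
int limiter_check(struct limiter *l, long now)
{
	if (now - l->last > LIMIT_DELAY || now < l->last)
		l->count = 0;		/* Quiet long enough: reset */
	if (l->count > LIMIT_COUNT)
		return RL_DROP;		/* Limit was noted already */
	l->last = now;
	if (++l->count <= LIMIT_COUNT)
		return RL_WRITE;	/* Log the message itself */
	return RL_NOTE;			/* Log "limit reached", once */
}
```

The caller writes the message on RL_WRITE, writes a single "more follow"
notice on RL_NOTE, and stays silent on RL_DROP, so a flood costs at most
LIMIT_COUNT + 1 log entries per burst.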

Another solution, which has similar advantages and disadvantages, is to
allocate space for messages from every attack type separately.  Both of
these solutions can be implemented simultaneously.


7.  What To Do About Port Scans? (R-box)

Some IDS are capable of responding to attacks they detect.  The actions
are usually directed to prevent further attacks and/or to obtain extra
information about the attacker.  Unfortunately, these features can often
be abused by a smart attacker.

A typical action is to block the attacking host (re-configuring access
lists of the firewall, or similar).  This leads to an obvious Denial of
Service (DoS) vulnerability if the attack we're detecting is spoofable
(like a port scan is).  It is probably less obvious that this leads to DoS
vulnerabilities for non-spoofable attack types, too.  That's because IP
addresses are sometimes shared between many people; this is the case for
ISP shell servers and dynamic dialup pools.

There are also a few implementation problems with this approach: firewall
access lists, routing tables, etc. are all of a limited size.  Also, even
before the limit is reached, there are CPU usage issues.  If an IDS is not
aware of these issues, this can lead to DoS of the entire network (say, if
the firewall goes down).

In my opinion, there are only very few cases in which such an action might
be justified.  Port scans are definitely not among those.

Another common action is to connect back to the attacking host to obtain
extra information.  For spoofable attacks, we might end up being used in
attacking a third party.  We'd better not do anything for such attacks,
including port scans.

However, for non-spoofable attacks, this might be worth implementing in
some cases, with a lot of precautions.  Mainly, we should be careful not
to consume too many resources, including bandwidth (should limit request
rate regardless of the attack rate, and limit the data size), CPU time,
and memory (should have a timeout, and limit the number of requests that
we do at a time).  Obviously, this means that an attacker can still make
some of the requests fail, but there's nothing we can do here.

See ftp://ftp.win.tue.nl/pub/security/murphy.ps.gz for an example of the
issues involved.  This paper by Wietse Venema details similar
vulnerabilities in older versions of his famous TCP wrapper package.

For these reasons, scanlogd doesn't do anything but log port scans.  You
should probably take action yourself.  What exactly you do is a matter of
taste; I personally only check my larger logs (that I'm not checking
normally) for activity near the port scan time.


8. Data Structures and Algorithm Choice

When choosing a sorting or data lookup algorithm to be used for a normal
application, people are usually optimizing the typical case.  However, for
IDS the worst case scenario should always be considered: an attacker can
supply our IDS with whatever data she likes.  If the IDS is fail-open, she
would then be able to bypass it, and if it's fail-close, she could cause a
DoS for the entire protected system.

Let me illustrate this by an example.  In scanlogd, I'm using a hash table
to look up source addresses.  This works very well for the typical case as
long as the hash table is large enough (since the number of addresses we
keep is limited anyway).  The average lookup time is better than that of a
binary search.  However, an attacker can choose her addresses (most likely
spoofed) to cause hash collisions, effectively replacing the hash table
lookup with a linear search.  Depending on how many entries we keep, this
might make scanlogd not be able to pick new packets up in time.  This will
also always take more CPU time from other processes in a host-based IDS
like scanlogd.
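
To make the collision problem concrete, here is a sketch that repeats the
XOR folding used by hashfunc() in the example code (on a plain unsigned
value, assumed to be 32 bits wide) and builds one family of values that
all fall into the same bucket.  The helper colliding_addr() is
hypothetical and exists only for this illustration.

```c
#define HASH_LOG 11
#define HASH_SIZE (1 << HASH_LOG)

/* Same folding as hashfunc() in the example code: XOR together successive
 * HASH_LOG-bit groups of the value. */
int fold_hash(unsigned int value)
{
	unsigned int hash = 0;

	do {
		hash ^= value;
	} while ((value >>= HASH_LOG));

	return (int)(hash & (HASH_SIZE - 1));
}

/* Hypothetical generator: for n = 1 .. HASH_SIZE - 1, build a value whose
 * two low HASH_LOG-bit groups are equal, so that they cancel under XOR. */
unsigned int colliding_addr(unsigned int n)
{
	return (n << HASH_LOG) | n;
}
```

Every value produced by colliding_addr() hashes to bucket 0, so 2047
distinct "addresses" would end up on one chain.  Any hash that does not
depend on a secret has such families, which is why the chain length has to
be limited separately.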

I've solved this problem by limiting the number of hash collisions, and
discarding the oldest entry with the same hash value when the limit is
reached.  This is acceptable for port scans (remember, we can't detect all
scans anyway), but might not be acceptable for detecting other attacks.
If we were going to support some other attack type also, we would have to
switch to a different algorithm instead, like a binary search.
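
The idea of a bounded collision chain can be sketched as follows.  This is
a simplified illustration with hypothetical names and an assumed limit of
4 entries per bucket; scanlogd itself applies its HASH_MAX limit during
the lookup in process_packet() rather than in a separate insert function.

```c
#include <stddef.h>

#define CHAIN_MAX 4	/* Assumed cap on entries per bucket */

struct entry {
	struct entry *next;	/* Next entry with the same hash */
	unsigned int key;	/* For example, a source address */
};

/* Insert e at the head of the chain. If the chain then holds more than
 * CHAIN_MAX entries, unlink the tail, which is the oldest entry because
 * new entries always go to the head. Returns the evicted entry or NULL. */
struct entry *chain_insert(struct entry **head, struct entry *e)
{
	struct entry **link;
	struct entry *victim;
	int n;

	e->next = *head;
	*head = e;

	/* Skip over the first CHAIN_MAX entries */
	link = head;
	for (n = 0; *link && n < CHAIN_MAX; n++)
		link = &(*link)->next;

	victim = *link;		/* The extra entry, if there is one */
	*link = NULL;
	return victim;
}
```

With this in place a lookup never walks more than CHAIN_MAX entries, no
matter how the keys were chosen.  The price is that a flood of colliding
keys pushes out older entries, which is the trade-off described above.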

If we're using a memory manager (such as malloc(3) and free(3) from our
libc), an attacker might be able to exploit its weaknesses in a similar
way.  This might include CPU usage issues and memory leaks because of not
being able to do garbage collection efficiently enough.  A reliable IDS
should have its very own memory manager (the one in libc can differ from
system to system), and be extremely careful with its memory allocations.
For a tool as simple as scanlogd is, I simply decided not to allocate any
memory dynamically at all.

It is probably worth mentioning that similar issues also apply to things
like operating system kernels.  For example, hash tables are widely used
there for looking up active connections, listening ports, etc.  There are
usually other limits which make these not really dangerous though, but
more research might be needed.


9.  IDS and Other Processes

For network-based IDS that are installed on a general-purpose operating
system, and for all host-based IDS, there's some interaction of the IDS
with the rest of the system, including other processes and the kernel.

Some DoS vulnerabilities in the operating system might allow an attacker
to disable the IDS (of course, only if it is fail-open) without it ever
noticing.  This can be done via vulnerabilities in both the kernel (like
"teardrop") and in other processes (like having a UDP service enabled in
inetd without a connection count limit and any resource limits).

Similarly, a poorly coded host-based IDS can be used for DoS attacks on
other processes of the "protected" system.


10.  Example Code

Finally, here you get scanlogd for Linux.  It may compile on other systems
too, but will likely not work because of the lack of raw TCP sockets.  For
future versions see http://www.false.com/security/scanlogd/.

NOTE THAT SOURCE ADDRESSES REPORTED CAN BE SPOOFED, DON'T TAKE ANY ACTION
AGAINST THE ATTACKER UNLESS OTHER EVIDENCE IS AVAILABLE.

<++> Scanlogd/scanlogd.c
/*
 * Linux scanlogd v1.0 by Solar Designer.  You're allowed to do whatever you
 * like with this software (including re-distribution in any form, with or
 * without modification), provided that credit is given where it is due, and
 * any modified versions are marked as such.  There's absolutely no warranty.
 */

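#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <string.h>
#include <ctype.h>
#include <time.h>
#include <syslog.h>
#include <sys/times.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in_systm.h>
#include <netinet/in.h>
#if (linux)
#define __BSD_SOURCE
#endif
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>

/* CLK_TCK is obsolete and missing from some newer C libraries; fall back
 * to sysconf(3), which <unistd.h> declares. */
#ifndef CLK_TCK
#define CLK_TCK				sysconf(_SC_CLK_TCK)
#endif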

/*
 * Port scan detection thresholds: at least COUNT ports need to be scanned
 * from the same source, with no longer than DELAY ticks between ports.
 */
#define SCAN_COUNT_THRESHOLD		10
#define SCAN_DELAY_THRESHOLD		(CLK_TCK * 5)

/*
 * Log flood detection thresholds: temporarily stop logging if more than
 * COUNT port scans are detected with no longer than DELAY between them.
 */
#define LOG_COUNT_THRESHOLD		5
#define LOG_DELAY_THRESHOLD		(CLK_TCK * 20)

/*
 * You might want to adjust these for using your tiny append-only log file.
 */
#define SYSLOG_IDENT			"scanlogd"
#define SYSLOG_FACILITY			LOG_DAEMON
#define SYSLOG_LEVEL			LOG_ALERT

/*
 * Keep track of up to LIST_SIZE source addresses, using a hash table of
 * HASH_SIZE entries for faster lookups, but limiting hash collisions to
 * HASH_MAX source addresses per the same hash value.
 */
#define LIST_SIZE			0x400
#define HASH_LOG			11
#define HASH_SIZE			(1 << HASH_LOG)
#define HASH_MAX			0x10

/*
 * Packet header as read from a raw TCP socket. In reality, the TCP header
 * can be at a different offset; this is just to get the total size right.
 */
struct header {
	struct ip ip;
	struct tcphdr tcp;
	char space[60 - sizeof(struct ip)];
};

/*
 * Information we keep per each source address.
 */
struct host {
	struct host *next;		/* Next entry with the same hash */
	clock_t timestamp;		/* Last update time */
	time_t start;			/* Entry creation time */
	struct in_addr saddr, daddr;	/* Source and destination addresses */
	unsigned short sport;		/* Source port, if fixed */
	int count;			/* Number of ports in the list */
	unsigned short ports[SCAN_COUNT_THRESHOLD - 1];	/* List of ports */
	unsigned char flags_or;		/* TCP flags OR mask */
	unsigned char flags_and;	/* TCP flags AND mask */
	unsigned char ttl;		/* TTL, if fixed */
};

/*
 * State information.
 */
struct {
	struct host list[LIST_SIZE];	/* List of source addresses */
	struct host *hash[HASH_SIZE];	/* Hash: pointers into the list */
	int index;			/* Oldest entry to be replaced */
} state;

/*
 * Convert an IP address into a hash table index.
 */
int hashfunc(struct in_addr addr)
{
	unsigned int value;
	int hash;

	value = addr.s_addr;
	hash = 0;
	do {
		hash ^= value;
	} while ((value >>= HASH_LOG));

	return hash & (HASH_SIZE - 1);
}

/*
 * Log this port scan.
 */
void do_log(struct host *info)
{
	char s_saddr[32];
	char s_daddr[32 + 8 * SCAN_COUNT_THRESHOLD];
	char s_flags[8];
	char s_ttl[16];
	char s_time[32];
	int index, size;
	unsigned char mask;

/* Source address and port number, if fixed */
	snprintf(s_saddr, sizeof(s_saddr),
		info->sport ? "%s:%u" : "%s",
		inet_ntoa(info->saddr),
		(unsigned int)ntohs(info->sport));

/* Destination address, if fixed */
	size = snprintf(s_daddr, sizeof(s_daddr),
		info->daddr.s_addr ? "%s ports " : "ports ",
		inet_ntoa(info->daddr));

/* Scanned port numbers */
	for (index = 0; index < info->count; index++)
		size += snprintf(s_daddr + size, sizeof(s_daddr) - size,
			"%u, ", (unsigned int)ntohs(info->ports[index]));

/* TCP flags: lowercase letters for "always clear", uppercase for "always
 * set", and question marks for "sometimes set". */
	for (index = 0; index < 6; index++) {
		mask = 1 << index;
		if ((info->flags_or & mask) == (info->flags_and & mask)) {
			s_flags[index] = "fsrpau"[index];
			if (info->flags_or & mask)
				s_flags[index] = toupper(s_flags[index]);
		} else
			s_flags[index] = '?';
	}
	s_flags[index] = 0;

/* TTL, if fixed */
	snprintf(s_ttl, sizeof(s_ttl), info->ttl ? ", TTL %u" : "",
		(unsigned int)info->ttl);

/* Scan start time */
	strftime(s_time, sizeof(s_time), "%X", localtime(&info->start));

/* Log it all */
	syslog(SYSLOG_LEVEL,
		"From %s to %s..., flags %s%s, started at %s",
		s_saddr, s_daddr, s_flags, s_ttl, s_time);
}

/*
 * Log this port scan unless we're being flooded.
 */
void safe_log(struct host *info)
{
	static clock_t last = 0;
	static int count = 0;
	clock_t now;

	now = info->timestamp;
	if (now - last > LOG_DELAY_THRESHOLD || now < last) count = 0;
	if (++count <= LOG_COUNT_THRESHOLD + 1) last = now;

	if (count <= LOG_COUNT_THRESHOLD) {
		do_log(info);
	} else if (count == LOG_COUNT_THRESHOLD + 1) {
		syslog(SYSLOG_LEVEL, "More possible port scans follow.\n");
	}
}

/*
 * Process a TCP packet.
 */
void process_packet(struct header *packet, int size)
{
	struct ip *ip;
	struct tcphdr *tcp;
	struct in_addr addr;
	unsigned short port;
	unsigned char flags;
	struct tms buf;
	clock_t now;
	struct host *current, *last, **head;
	int hash, index, count;

/* Get the IP and TCP headers */
	ip = &packet->ip;
	tcp = (struct tcphdr *)((char *)packet + ((int)ip->ip_hl << 2));

/* Sanity check */
	if ((char *)tcp + sizeof(struct tcphdr) > (char *)packet + size)
		return;

/* Get the source address, destination port, and TCP flags */
	addr = ip->ip_src;
	port = tcp->th_dport;
	flags = tcp->th_flags;

/* We're using IP address 0.0.0.0 for a special purpose here, so don't let
 * them spoof us. */
	if (!addr.s_addr) return;

/* Use times(2) here not to depend on someone setting the time while we're
 * running; we need to be careful with possible return value overflows. */
	now = times(&buf);

/* Do we know this source address already? */
	count = 0;
	last = NULL;
	if ((current = *(head = &state.hash[hash = hashfunc(addr)])))
	do {
		if (current->saddr.s_addr == addr.s_addr) break;
		count++;
		if (current->next) last = current;
	} while ((current = current->next));

/* We know this address, and the entry isn't too old. Update it. */
	if (current)
	if (now - current->timestamp <= SCAN_DELAY_THRESHOLD &&
	    now >= current->timestamp) {
/* Just update the TCP flags if we've seen this port already */
		for (index = 0; index < current->count; index++)
		if (current->ports[index] == port) {
			current->flags_or |= flags;
			current->flags_and &= flags;
			return;
		}

/* ACK to a new port? This could be an outgoing connection. */
		if (flags & TH_ACK) return;

/* Packet to a new port, and not ACK: update the timestamp */
		current->timestamp = now;

/* Logged this scan already? Then leave. */
		if (current->count == SCAN_COUNT_THRESHOLD) return;

/* Update the TCP flags */
		current->flags_or |= flags;
		current->flags_and &= flags;

/* Zero out the destination address, source port and TTL if not fixed. */
		if (current->daddr.s_addr != ip->ip_dst.s_addr)
			current->daddr.s_addr = 0;
		if (current->sport != tcp->th_sport)
			current->sport = 0;
		if (current->ttl != ip->ip_ttl)
			current->ttl = 0;

/* Got enough destination ports to decide that this is a scan? Then log it. */
		if (current->count == SCAN_COUNT_THRESHOLD - 1) {
			safe_log(current);
			current->count++;
			return;
		}

/* Remember the new port */
		current->ports[current->count++] = port;

		return;
	}

/* We know this address, but the entry is outdated. Mark it unused, and
 * remove from the hash table. We'll allocate a new entry instead since
 * this one might get re-used too soon. */
	if (current) {
		current->saddr.s_addr = 0;

		if (last)
			last->next = last->next->next;
		else if (*head)
			*head = (*head)->next;
		last = NULL;
	}

/* We don't need an ACK from a new source address */
	if (flags & TH_ACK) return;

/* Got too many source addresses with the same hash value? Then remove the
 * oldest one from the hash table, so that they can't take too much of our
 * CPU time even with carefully chosen spoofed IP addresses. */
	if (count >= HASH_MAX && last) last->next = NULL;

/* We're going to re-use the oldest list entry, so remove it from the hash
 * table first (if it is really already in use, and isn't removed from the
 * hash table already because of the HASH_MAX check above). */

/* First, find it */
	if (state.list[state.index].saddr.s_addr)
		head = &state.hash[hashfunc(state.list[state.index].saddr)];
	else
		head = &last;
	last = NULL;
	if ((current = *head))
	do {
		if (current == &state.list[state.index]) break;
		last = current;
	} while ((current = current->next));

/* Then, remove it */
	if (current) {
		if (last)
			last->next = last->next->next;
		else if (*head)
			*head = (*head)->next;
	}

/* Get our list entry */
	current = &state.list[state.index++];
	if (state.index >= LIST_SIZE) state.index = 0;

/* Link it into the hash table */
	head = &state.hash[hash];
	current->next = *head;
	*head = current;

/* And fill in the fields */
	current->timestamp = now;
	current->start = time(NULL);
	current->saddr = addr;
	current->daddr = ip->ip_dst;
	current->sport = tcp->th_sport;
	current->count = 1;
	current->ports[0] = port;
	current->flags_or = current->flags_and = flags;
	current->ttl = ip->ip_ttl;
}

/*
 * Hmm, what could this be?
 */
int main()
{
	int raw, size;
	struct header packet;

/* Get a raw socket. We could drop root right after that. */
	if ((raw = socket(AF_INET, SOCK_RAW, IPPROTO_TCP)) < 0) {
		perror("socket");
		return 1;
	}

/* Become a daemon */
	switch (fork()) {
	case -1:
		perror("fork");
		return 1;

	case 0:
		break;

	default:
		return 0;
	}

	signal(SIGHUP, SIG_IGN);

/* Initialize the state. All source IP addresses are set to 0.0.0.0, which
 * means the list entries aren't in use yet. */
	memset(&state, 0, sizeof(state));

/* Huh? */
	openlog(SYSLOG_IDENT, 0, SYSLOG_FACILITY);

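/* Let's start. The cast keeps the comparison signed, so that a failed
 * read(2) returning -1 isn't mistaken for a huge packet. */
	while (1)
	if ((size = read(raw, &packet, sizeof(packet))) >= (int)sizeof(packet.ip))
		process_packet(&packet, size);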
}
<-->