Monday, September 7, 2009

Great Dataset to parse through by ITOC

ITOC has a great set of data to parse through for those that are interested:
http://www.itoc.usma.edu/research/dataset/index.html

Just over 8 GB of data between inside/outside captures.

They also have a blog set up:
http://datasetsfortheresearchcommunity.blogspot.com/

I'm hoping more information on the exact OS versions gets released, so that I can take the data Satori spit out and use it to extend the fingerprints for FreeBSD and possibly some of the other OSes seen on the network. I'd hate to just file it under generic FreeBSD if we can tell for sure it was 7.0 or whatever.

Satori has already ID'd the systems quite well, judging from their initial diagram, but it would be nice to know for sure that the results are correct before extending some of the fingerprints!

One problem I'm having is that it takes forever to go through the 1 GB files with Satori. Some of that has to do with the amount of "stuff" I've added to it, but that is also just a lot of data to parse! Oh well: 1-2 hours per file, come back, see if it blew up, etc. (Update: Make that 1-2 hours on the 100 MB files; I'm not sure how many days it will take to get through the 1 GB files!) This data set at least gave me some packets I hadn't seen before that caused some problems, so I updated a few of the DLLs to handle VLAN traffic. I was feeding it in, just not parsing it correctly!
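
For anyone curious what "handling VLAN traffic" means in practice, here is a rough sketch in Python. This is not Satori's actual code (that lives in the DLLs); the function name parse_ethernet is made up for illustration, and I'm assuming plain Ethernet II frames with optional 802.1Q (0x8100) or 802.1ad (0x88A8) tags. The idea is that each tag shoves the real EtherType 4 bytes further into the frame, so a parser that always looks at offset 12 sees 0x8100 instead of IP and gives up.

```python
import struct

# Tag Protocol Identifiers that mark a VLAN tag:
# 0x8100 = 802.1Q, 0x88A8 = 802.1ad (stacked "QinQ" tags).
VLAN_TPIDS = {0x8100, 0x88A8}


def parse_ethernet(frame: bytes):
    """Return (ethertype, vlan_ids, payload_offset) for an Ethernet II frame.

    Hypothetical helper for illustration only. vlan_ids lists the VLAN IDs
    of any tags found, outermost first.
    """
    if len(frame) < 14:
        raise ValueError("frame too short for an Ethernet header")

    # Skip destination MAC (6 bytes) and source MAC (6 bytes).
    offset = 12
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    offset += 2

    vlan_ids = []
    # Each VLAN tag is 2 bytes of TCI followed by the next EtherType.
    while ethertype in VLAN_TPIDS:
        if len(frame) < offset + 4:
            raise ValueError("truncated VLAN tag")
        tci, ethertype = struct.unpack_from("!HH", frame, offset)
        vlan_ids.append(tci & 0x0FFF)  # low 12 bits of the TCI are the VLAN ID
        offset += 4

    return ethertype, vlan_ids, offset
```

With something like this, the rest of the parser can start reading at payload_offset and treat tagged and untagged frames the same way.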
