Archive for the ‘Tor’ Category

Tor is released!

Wednesday, August 3rd, 2016

Tor is released!

From the webpage:

Tor has been released! You can download the source from the Tor website. Packages should be available over the next week or so.

Tor is the first stable version of the Tor 0.2.8 series.

The Tor 0.2.8 series improves client bootstrapping performance, completes the authority-side implementation of improved identity keys for relays, and includes numerous bugfixes and performance improvements throughout the program. This release continues to improve the coverage of Tor’s test suite.

Below is a list of the changes since Tor 0.2.7. For a list of only the changes that are new since, please see the ChangeLog file.

Government agencies are upgrading and so should you.

Breaking Honeypots For Fun And Profit – Detecting Deception

Monday, July 4th, 2016

by Dean Sysman & Gadi Evron & Itamar Sher

The description:

We will detect, bypass, and abuse honeypot technologies and solutions, turning them against the defender. We will also release a global map of honeypot deployments, honeypot detection vulnerabilities, and supporting code.

The concept of a honeypot is strong, but the way honeypots are implemented is inherently weak, enabling an attacker to easily detect and bypass them, as well as make use of them for his own purposes. Our methods are analyzing the network protocol completeness and operating system software implementation completeness, and vulnerable code.

As a case study, we will concentrate on platforms deployed in real organizational networks, mapping them globally, and demonstrating how it is possible to both bypass and use these honeypots to the attacker’s advantage.

The slides for the presentation.

This presentation addresses the question of detecting (identifying) a deception.

Detection of the following honeypots discussed:

Artillery: (Updated URL)



Dionaea: (timed out on July 4, 2016)





The presenters argued that identifying an attack may lead to that attack being blocked in all anti-attack code, whereas identifying an attacker could have consequences for the attack as an operation.

Combining an IP address along with other dimensions of identification, say with a topic map, could prove to be a means of sharpening the consequences for attackers.

Of course, I am assuming that at least within an agency, agents share data/insights towards a common objective. That may not be the case in your agency.

While looking for other resources on honeypots, I did find Collection of Awesome Honeypots, dating from December of 2015.

Thomas Jefferson (Too Early For Tor – TEFT)

Monday, July 4th, 2016

Official Presidential portrait of Thomas Jefferson (by Rembrandt Peale, 1800)

Thomas Jefferson lived centuries before the internet and the rise of Tor but he is easy to see as a Tor user.

He was the author of the Declaration of Independence, which if you read the details, is a highly offensive document:

He has affected to render the Military independent of and superior to the Civil Power.

He has combined with others to subject us to a jurisdiction foreign to our constitution, and unacknowledged by our laws; giving his Assent to their Acts of pretended Legislation:

For quartering large bodies of armed troops among us:

For protecting them, by a mock Trial from punishment for any Murders which they should commit on the Inhabitants of these States:

For cutting off our Trade with all parts of the world:

For imposing Taxes on us without our Consent:

For depriving us in many cases, of the benefit of Trial by Jury:

For transporting us beyond Seas to be tried for pretended offences:

He is at this time transporting large Armies of foreign Mercenaries to compleat the works of death, desolation, and tyranny, already begun with circumstances of Cruelty & Perfidy scarcely paralleled in the most barbarous ages, and totally unworthy the Head of a civilized nation.

Update the language of “For transporting us beyond Seas to be tried for pretended offences” to “Transporting people to Guantanamo Bay prison for unlawful detention” and you have a good example of what the FBI wants discussed in clear text.

Make no mistake, the FBI of today, working for George III, would have arrested Thomas Jefferson if it caught wind of the Declaration of Independence. At that time, Jefferson was not the towering figure of liberty that he is today. Then he was the opponent of a nation-state.

Jefferson was too early for Tor but he is the type of person that Tor protects.

Do you want to be on the side of George III or Jefferson in history?

Support Tor!

Outing Dark Web Spies (Donate to Tor)

Sunday, July 3rd, 2016

Two security experts have conducted a study that allowed them to spot over 100 snooping Tor Nodes spying on Dark Web Sites by Pierluigi Paganini.

From the post:

…Joseph Cox from Motherboard reported a study conducted by Guevara Noubir, a professor from the College of Computer and Information Science at Northeastern University, and Amirali Sanatinia, a PhD candidate also from Northeastern, who revealed the existence of a number of Tor hidden service directories that are spying on Tor websites. Such kind of attacks could allow law enforcement to discover IP addresses of black markets and child pornography sites.

A similar technique could be very useful also for security firms that offer dark web intelligence services.

Threat actors using this technique could reveal the IP address of Tor hidden services, Noubir will present the results of the research at the Def Con hacking conference in August.

“We create what we call ‘honey onions’ or ‘honions.’ These are onion addresses that we don’t share with anyone,” Noubir said.

The security researchers ran 4,500 honey onions over 72 days, they identified that at least 110 HSDirs have been configured to spy on hidden services.

The experts highlighted that some of the threat actors operating the bogus HSDirs were active observers involved in many activities, including penetration testing.
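The quoted summary does not say how a visit to a honion is traced back to a particular HSDir. One plausible bookkeeping approach, sketched here with hypothetical names and made-up data, records which HSDirs hosted each honion's descriptor and then counts how often each HSDir is implicated by an unexpected visit. This is an illustration of the idea, not the researchers' actual method.

```python
from collections import Counter


def implicated_hsdirs(descriptor_hosts, visited_honions):
    """Count, per HSDir, the visited honions whose descriptor it hosted.

    descriptor_hosts: dict mapping a honion address to the set of HSDirs
        that held its descriptor (hypothetical input format).
    visited_honions: iterable of honion addresses that received a visit
        even though the address was never shared with anyone.
    """
    counts = Counter()
    for honion in visited_honions:
        # Only the HSDirs that stored this descriptor could have learned
        # the address, so each of them is a candidate snooper.
        for hsdir in descriptor_hosts.get(honion, ()):
            counts[hsdir] += 1
    return counts


# Made-up example data, for illustration only.
hosts = {
    "honionA": {"hsdir1", "hsdir2"},
    "honionB": {"hsdir2", "hsdir3"},
    "honionC": {"hsdir4"},
}
suspects = implicated_hsdirs(hosts, ["honionA", "honionB"])
for hsdir, hits in suspects.most_common():
    print(hsdir, hits)
```

An HSDir that shows up for many independent honions is a stronger candidate than one that shows up once, since a single visit cannot distinguish among the several HSDirs holding the same descriptor.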

While Next Generation Onion Services (issue 224), (Montreal 2016 update), is under development, outing dark web spies may be your next best defense.

Your best defense is supporting the Tor project. Your support will help it gain and keep the advantage over dark web spies.

By helping Tor, you will be helping all of us, yourself included.

PS: Def Con 24 is August 4-7, 2016, at Paris + Bally’s in Las Vegas. No pre-registration, $240 USD cash at the door.

Hardening the Onion [Other Apps As Well?]

Friday, June 24th, 2016

Tor coders harden the onion against surveillance by Paul Ducklin.

From the post:

A nonet of security researchers are on the warpath to protect the Tor Browser from interfering busybodies.

Tor, short for The Onion Router, is a system that aims to help you be anonymous online by disguising where you are, and where you are heading.

That way, nation-state content blockers, law enforcement agencies, oppressive regimes, intelligence services, cybercrooks, Lizard Squadders or even just overly-inquisitive neighbours can’t easily figure out where you are going when you browse online.

Similarly, sites you browse to can’t easily tell where you came from, so you can avoid being traced back or tracked over time by unscrupulous marketers, social engineers, law enforcement agencies, oppressive regimes, intelligence services, cybercrooks, Lizard Squadders, and so on.

Paul provides a high-level view of Selfrando: Securing the Tor Browser against De-anonymization Exploits by Mauro Conti, et al.

The technique generalizes beyond Tor to GNU Bash 4.3, GNU less 4.58, Nginx 1.8.0, Socat, Thttpd 2.26, and Google’s Chromium browser.

Given the speed at which defenders play “catch up,” there is much to learn here that will be useful for years to come.


Who Is Special Agent Mark W. Burnett? (FBI)

Monday, May 9th, 2016

In FBI Harassment, Tor developer isis agora lovecruft describes a tale of FBI harassment that begins with this business card:


The card was left while no one was at home. At best the business card is a weak indicator of a visitor’s identity. Various conversations between counsel and the FBI later confirmed that Mark W. Burnett had visited. See the original post for the harassment story.

What can we find out about Special Agent Mark W. Burnett? My reasoning: if the FBI is watching us, we damned sure better be watching them.

The easiest thing to find is that Mark W. Burnett isn’t a “special agent in charge,” as per the FBI webpage for the Los Angeles office. A “special agent in charge” is a higher “rank” than a “special agent.”

Turning to Google, here’s a screenshot of my results:


The first two “hits” are the same Special Agent Mark W. Burnett (the second one requires a password) but the first one says in relevant part:

Special Luncheon Speaker – Mr. Mark W. Burnett, FBI Cyber Special Agent, who will discuss the Bureau’s efforts regarding cyber security measures

The event was:

3rd Annual West Coast Cyber Security Summit
Special Report on Cyber Technology and Its Impact on the Banking Community
The California Club
538 South Flower Street, Los Angeles, CA 90071
Tuesday, May 13, 2014

If you don’t know the California Club, as the song says “…you aren’t supposed to be here.”

So we know that Mark W. Burnett was working for the FBI in May of 2014.

The third “hit” is someone who says they know a Mark W. Burnett but it doesn’t go any further than that.

The last two “hits” are interesting because they both point to the Congressional Record on February 1, 2010, wherein the Senate confirms the promotion of a “Mark. W. Burnett” to the rank of colonel in the United States Army.

I searched U.S. District Court decisions at Justia but could not find any cases where Mark W. Burnett appeared.

The handwritten “desk phone” detracts from the professionalism of the business card. It also suggests that Mark hasn’t been in the Los Angeles office long enough to get better cards.

What do you know about Special Agent Mark W. Burnett?

PS: There are hundreds of FBI agents from Los Angeles on LinkedIn but Mark W. Burnett isn’t one of them. At least not by that name.

Anonymous Chat Service

Tuesday, April 12th, 2016

From the description:

The continued effort of governments around the globe to censor our seven sovereign seas has not gone unnoticed. This is why we, once again, raise our Anonymous battle flags to expose their corruption and disrupt their surveillance operations. We are proud to present our new chat service residing within the remote island coves of the deep dark web. The OnionIRC network is designed to allow for full anonymity and we welcome any and all to use it as a hub for anonymous operations, general free speech use, or any project or group concerned about privacy and security looking to build a strong community. We also intend to strengthen our ranks and arm the current and coming generations of internet activists with education. Our plan is to provide virtual classrooms where, on a scheduled basis, ‘teachers’ can give lessons on any number of subjects. This includes, but is not limited to: security culture, various hacking/technical tutorials, history lessons, and promoting how to properly utilize encryption and anonymity software. As always, we do not wish for anyone to rely on our signal alone. As such, we will also be generating comprehensible documentation and instructions on how to create your own Tor hidden-service chat network in order to keep the movement decentralized. Hackers, activists, artists and internet citizens, join us in a collective effort to defend the internet and our privacy.

Come aboard or walk the plank.

We are Anonymous,
we’ve been expecting you.

Protip: This is not a website, it’s an IRC chat server. You must use an IRC chat client to connect. You cannot connect simply through a browser.

Some popular IRC clients are: irssi, weechat, hexchat, mIRC, & many more…

Here is an example guide for connecting with Hexchat:

To access our IRC network you must be connecting through the Tor network!

Either download the Tor browser or install the Tor daemon, then configure your IRC client’s proxy settings to pass through Tor or ‘torify’ your client depending on your setup.

If you are connecting to Tor with the Tor browser, keep in mind that the Tor browser must be open & running for you to pass your IRC client through Tor.

How you configure your client to pass through Tor will vary depending on the client.
Hostname: onionirchubx5363.onion

Port: 6667 No SSL, but don’t worry! Tor connections to hidden-services are end-to-end encrypted already! Thank you based hidden-service gods!

In the near future we will be releasing some more extensive client-specific guides and how-to properly setup Tor for transparent proxying (…) & best use cases.
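Most IRC clients hide the proxy step behind a settings dialog. As a rough sketch of what "passing through Tor" means on the wire, the Python below speaks the SOCKS5 protocol to a local Tor instance and asks it to open the connection by hostname. The proxy address 127.0.0.1:9050 (the Tor daemon's usual default; the Tor Browser normally listens on 9150) and all function names are assumptions for illustration, not part of the announcement.

```python
import socket
import struct

# Assumption: a local Tor daemon on its usual SOCKS port. Use 9150 for the
# Tor Browser bundle.
TOR_SOCKS_ADDR = ("127.0.0.1", 9050)


def socks5_connect_request(hostname: str, port: int) -> bytes:
    """Build a SOCKS5 CONNECT request that passes the hostname unresolved.

    Address type 0x03 (domain name) leaves name resolution to the proxy,
    which is what a .onion address requires.
    """
    name = hostname.encode("ascii")
    if not 0 < len(name) <= 255:
        raise ValueError("hostname must be 1 to 255 bytes")
    return b"\x05\x01\x00\x03" + bytes([len(name)]) + name + struct.pack(">H", port)


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly count bytes, or raise if the peer closes early."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("proxy closed the connection")
        data += chunk
    return data


def open_via_tor(hostname: str, port: int, timeout: float = 60.0) -> socket.socket:
    """Return a socket tunnelled through the local Tor SOCKS5 proxy."""
    sock = socket.create_connection(TOR_SOCKS_ADDR, timeout=timeout)
    try:
        # Greeting: version 5, one auth method offered, "no authentication".
        sock.sendall(b"\x05\x01\x00")
        if recv_exact(sock, 2) != b"\x05\x00":
            raise ConnectionError("proxy refused the no-auth method")
        sock.sendall(socks5_connect_request(hostname, port))
        reply = recv_exact(sock, 4)
        if reply[1] != 0x00:
            raise ConnectionError(f"SOCKS5 connect failed, code {reply[1]}")
        # Discard the bound address that follows the reply header.
        atyp = reply[3]
        if atyp == 0x01:
            recv_exact(sock, 4 + 2)
        elif atyp == 0x04:
            recv_exact(sock, 16 + 2)
        elif atyp == 0x03:
            recv_exact(sock, recv_exact(sock, 1)[0] + 2)
        else:
            raise ConnectionError("unexpected address type in reply")
        return sock
    except Exception:
        sock.close()
        raise
```

With Tor running, `open_via_tor("onionirchubx5363.onion", 6667)` would return a plain socket over which a client could send the usual IRC `NICK` and `USER` lines. Passing the hostname to the proxy, rather than resolving it locally, matters because .onion names do not exist in ordinary DNS.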

This is excellent news!

With more good news promised in the near future (watch the video).

Go dark, go very dark!

Truthful Paedophiles On The Darknet?

Thursday, February 4th, 2016

There is a credibility flaw in Cryptopolitik and the Darknet by Daniel Moore & Thomas Rid that I overlooked yesterday (The Dark Web, “Kissing Cousins,” and Pornography). Perhaps it was just too obvious to attract attention.

Moore and Rid write:

The pornographic content was perhaps the most distressing. Websites dedicated to providing links to videos purporting to depict rape, bestiality and paedophilia were abundant. One such post at a supposedly nonaffiliated content-sharing website offered a link to a video of ‘a 12 year old girl … getting raped at school by 4 boys’.52 Other examples include a service that sold online video access to the vendor’s own family members:

My two stepsisters … will be pleased to show you their little secrets. Well, they are rather forced to show them, but at least that’s what they are used to.53

Several communities geared towards discussing and sharing illegitimate fetishes were readily available, and appeared to be active. Under the shroud of anonymity, various users appeared to seek vindication of their desires, providing words of support and comfort for one another in solidarity against what was seen as society’s unjust discrimination against non-mainstream sexual practices. Users exchanged experiences and preferences, and even traded content. One notable example from a website called Pedo List included a commenter freely stating that he would ‘Trade child porn. Have pics of my daughter.’54 There appears to be no fear of retribution or prosecution in these illicit communities, and as such users apparently feel comfortable enough to share personal stories about their otherwise stifled tendencies. (page 23)

Despite their description of hidden services as dens of iniquity and crime, those who use them are suddenly paragons of truthfulness, at least when it suits the authors’ purpose?

Doesn’t crediting the content of the Darknet as truthful, as opposed to being wishful, fantasy, or even police officers posing to investigate (some would say entrap) others, strain the imagination?

Some of the content is no doubt truthful, but policy arguments need to be based on facts, not a collection of self-justifying opinions from like-minded individuals.

A quick search on the string (without quotes):

police officers posing as children sex rings

Returns 9.7 million “hits.”

It isn’t possible to say how many of those police officers appeared in the postings collected by Moore & Rid.

But in science, there is this thing called the burden of proof. That is, simply asserting a conclusion, even citing equally non-evidence-based conclusions, isn’t sufficient to prove a claim.

Moore & Rid had the burden to prove that the Darknet is a wicked place that poses all sorts of dangers and hazards.

As I pointed out yesterday in The Dark Web, “Kissing Cousins,” and Pornography, their “proof” consists of non-replicable conclusions about a small part of the Darkweb.

Earlier today I realized their conclusions depend upon a truthful criminal element using the Darkweb.

What do you think about the presumption that criminals are truthful?

Sounds doubtful to me!

The Dark Web, “Kissing Cousins,” and Pornography

Wednesday, February 3rd, 2016

Dark web is mostly illegal, say researchers by Lisa Vaas.

You can tell where Lisa comes out on the privacy versus law enforcement issue by the slant of her conclusion:

Users, what’s your take: are hidden services worth the political firestorm they generate? Are they worth criminals escaping justice?

Illegal is a slippery concept.

Marriage of first “kissing” cousins is “illegal” in:

Arkansas, Delaware, Idaho, Iowa, Kansas, Kentucky, Louisiana, Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, South Dakota, Texas, Washington, West Virginia, and Wyoming.

Marriage of first “kissing” cousins is legal in:

Alabama, Alaska, California, Colorado, Connecticut, District of Columbia, Florida, Georgia, Hawaii, Maryland, Massachusetts, New Jersey, New Mexico, New York, North Carolina (first cousins but not double first), Rhode Island, South Carolina, Tennessee, Vermont, and Virginia.

There are some other nuances I didn’t capture and for those see: State Laws Regarding Marriages Between First Cousins.

If you read Cryptopolitik and the Darknet by Daniel Moore & Thomas Rid carefully, you will spot a number of problems with their methodology and reasoning.

First and foremost, no definitions were offered for their taxonomy (at page 20):

  • Arms
  • Drugs
  • Extremism
  • Finance
  • Hacking
  • Illegitimate pornography
  • Nexus
  • Other illicit
  • Social
  • Violence
  • Other
  • None

Readers and other researchers are left to wonder what was included or excluded from each of those categories.

In science, that would be called an inability to replicate the results. As if this were science.

Moore & Rid recite anecdotal accounts of particular pornography sites, calculated to shock the average reader, but that’s not the same thing as enabling replication of their research. Or a fair characterization of all the pornography encountered.

They presumed that text was equivalent to image content, so they discarded all images (pages 19-20). Which left them unable to test that presumption. Hmmm, untested assumptions in science?

The unknown basis for classification identified 122 sites (page 21) as pornographic out of the initial set of 5,205 sites.

If you accept Tor’s estimate of 30,000 hidden services that announce themselves every day, Moore & Rid have found that illegal pornography (whatever that means) is:

122 / 30000 = 0.004066667

Moore & Rid have established that “illegal” porn is 0.4066667% of the Dark Net.
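A fraction and a percentage differ by a factor of 100, so the conversion is worth checking explicitly. A minimal sketch, using the 122 sites and the 30,000 hidden services estimate from the text (the function name is only illustrative):

```python
def share_as_percent(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


sites_flagged = 122       # sites classified as pornographic (page 21)
hidden_services = 30000   # Tor's estimate of daily announced hidden services

fraction = sites_flagged / hidden_services
percent = share_as_percent(sites_flagged, hidden_services)

print(f"fraction: {fraction:.9f}")   # fraction: 0.004066667
print(f"percent:  {percent:.7f}%")   # percent:  0.4066667%
```

Either way it is stated, the share is well under one percent.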

I should be grateful Moore & Rid have so carefully documented the tiny part of the Dark Web concerned with their notion of “illegal” pornography.

But, when you encounter “reasoning” such as:

The other quandary is how to deal with darknets. Hidden services have already damaged Tor, and trust in the internet as a whole. To save Tor – and certainly to save Tor’s reputation – it may be necessary to kill hidden services, at least in their present form. Were the Tor Project to discontinue hidden services voluntarily, perhaps to improve the reputation of Tor browsing, other darknets would become more popular. But these Tor alternatives would lack something precious: a large user base. In today’s anonymisation networks, the security of a single user is a direct function of the number of overall users. Small darknets are easier to attack, and easier to de-anonymise. The Tor founders, though exceedingly idealistic in other ways, clearly appreciate this reality: a better reputation leads to better security.85 They therefore understand that the popularity of Tor browsing is making the bundled-in, and predominantly illicit, hidden services more secure than they could be on their own. Darknets are not illegal in free countries and they probably should not be. Yet these widely abused platforms – in sharp contrast to the wider public-key infrastructure – are and should be fair game for the most aggressive intelligence and law-enforcement techniques, as well as for invasive academic research. Indeed, having such clearly cordoned-off, free-fire zones is perhaps even useful for the state, because, conversely, a bad reputation leads to bad security. Either way, Tor’s ugly example should loom large in technology debates. Refusing to confront tough, inevitable political choices is simply irresponsible. The line between utopia and dystopia can be disturbingly thin. (pages 32-33)

it’s hard to say nothing and see public discourse soiled with this sort of publication.

First, there is no evidence presented that hidden services have damaged Tor and/or trust in the Internet as a whole. Even the authors concede that Tor is the most popular option for anonymous browsing and hidden services. That doesn’t sound like damage to me. You?

Second, the authors dump all hidden services in the “bad, very bad” basket, despite their own research classifying only 0.4066667% of the Dark Net as illicit pornography. They use stock “go to” examples to shock readers in place of evidence and reasoning.

Third, the charge that Tor has “[r]efused to confront tough, inevitable political choices is simply irresponsible” is false. Demonstrably false because the authors point out that Tor developers made a conscious choice to not take political considerations into account (page 25).

Since Moore & Rid disagree with that choice, they resort to name calling, terming the decision “simply irresponsible.” Moore & Rid are entitled to their opinions but they aren’t going to persuade even a semi-literate audience with name calling.

Take Cryptopolitik and the Darknet as an example of how to not write a well researched and reasoned paper. Although, that isn’t a bar to publication as you can see.

Tor for Technologists

Tuesday, June 16th, 2015

Tor for Technologists by Martin Fowler.

From the post:

Tor is a technology that is cropping up in news articles quite often nowadays. However, there exists a lot of misunderstanding about it. Even many technologists don’t see past its use for negative purposes, but Tor is much more than that. It is an important tool for democracy and freedom of speech – but it’s also something that is very useful in the day-to-day life of a technologist. Tor is also an interesting case study in how to design a system that has very specific security requirements.

The Internet is currently a quite hostile place. There are threats of all kinds, ranging from script kiddies and drive-by phishing attacks to pervasive dragnet surveillance by many of the major intelligence services in the world. The extent of these problems have only recently become clear to us. In this context, a tool like Tor fills a very important niche. You could argue that it’s a sign of the times that even a company like Facebook encourages the use of Tor to access their services. The time is right to add Tor to your tool belt.

Martin does a great job of summarizing Tor and giving an overview of what Tor does and does not do. Both are important for security-conscious users (that should include you).

If you aren’t already using Tor and are a technologist, read Martin’s introduction first and then become an active user/supporter of Tor.

Astoria, the Tor client designed to beat the NSA surveillance

Sunday, May 24th, 2015

Astoria, the Tor client designed to beat the NSA surveillance by Pierluigi Paganini.

From the post:

A team of security researchers announced to have developed Astoria, a new Tor client designed to beat the NSA and reduce the efficiency of timing attacks.

Tor and Deep web are becoming terms even popular among Internet users, the use of anonymizing network is constantly increasing for this reason intelligence agencies are focusing their efforts in its monitoring.

Edward Snowden has revealed that intelligence agencies belonging to the Five Eyes Alliance have tried to exploit several techniques to de-anonymize Tor users.

Today I desire to introduce you the result of the work of a joint effort of security researchers from American and Israeli organizations which have developed a new advanced Tor client called Astoria.

The Astoria Tor Client was specially designed to protect Tor users from surveillance activities; it implements a series of features that make eavesdropping harder.

Time to upgrade and to help support the Tor network!

Every day that you help degrade NSA activities, you have contributed to your own safety and the safety of others.

You Can Help Keep Others Secure (Use Tor)

Sunday, May 3rd, 2015

Tor Browser 4.5 released by Mike Perry.

From the post:

The Tor Browser Team is proud to announce the first stable release in the 4.5 series. This release is available from the Tor Browser Project page and also from our distribution directory.

The 4.5 series provides significant usability, security, and privacy enhancements over the 4.0 series. Because these changes are significant, we will be delaying the automatic update of 4.0 users to the 4.5 series for one week.

Time to upgrade!

Why use Tor?

The Tor network is a group of volunteer-operated servers that allows people to improve their privacy and security on the Internet. Tor’s users employ this network by connecting through a series of virtual tunnels rather than making a direct connection, thus allowing both organizations and individuals to share information over public networks without compromising their privacy. Along the same line, Tor is an effective censorship circumvention tool, allowing its users to reach otherwise blocked destinations or content. Tor can also be used as a building block for software developers to create new communication tools with built-in privacy features.

Individuals use Tor to keep websites from tracking them and their family members, or to connect to news sites, instant messaging services, or the like when these are blocked by their local Internet providers. Tor’s hidden services let users publish web sites and other services without needing to reveal the location of the site. Individuals also use Tor for socially sensitive communication: chat rooms and web forums for rape and abuse survivors, or people with illnesses.

Journalists use Tor to communicate more safely with whistleblowers and dissidents. Non-governmental organizations (NGOs) use Tor to allow their workers to connect to their home website while they’re in a foreign country, without notifying everybody nearby that they’re working with that organization.

Groups such as Indymedia recommend Tor for safeguarding their members’ online privacy and security. Activist groups like the Electronic Frontier Foundation (EFF) recommend Tor as a mechanism for maintaining civil liberties online. Corporations use Tor as a safe way to conduct competitive analysis, and to protect sensitive procurement patterns from eavesdroppers. They also use it to replace traditional VPNs, which reveal the exact amount and timing of communication. Which locations have employees working late? Which locations have employees consulting job-hunting websites? Which research divisions are communicating with the company’s patent lawyers?

A branch of the U.S. Navy uses Tor for open source intelligence gathering, and one of its teams used Tor while deployed in the Middle East recently. Law enforcement uses Tor for visiting or surveilling web sites without leaving government IP addresses in their web logs, and for security during sting operations.

The variety of people who use Tor is actually part of what makes it so secure. Tor hides you among the other users on the network, so the more populous and diverse the user base for Tor is, the more your anonymity will be protected. (From

If you are concerned about privacy, yours and that of others, use a Tor browser by default.

DARPA: MEMEX (Domain-Specific Search) Drops!

Sunday, April 19th, 2015

The DARPA MEMEX project is now listed on its Open Catalog page!

Forty (40) separate components listed by team, project, category, link to code, description and license. Each sortable of course.

No doubt DARPA has held back some of its best work but looking over the descriptions, there are no boojums or quantum leaps beyond current discussions in search technology. How far you can push the released work beyond its current state is an exercise for the reader.

Machine learning is mentioned in the descriptions for DeepDive, Formasaurus and SourcePin. No explicit mention of deep learning, at least in the descriptions.

If you prefer to not visit the DARPA site, I have gleaned the essential information (project, link to code, description) into the following list:

  • ACHE: ACHE is a focused crawler. Users can customize the crawler to search for different topics or objects on the Web. (Java)
  • Aperture Tile-Based Visual Analytics: New tools for raw data characterization of ‘big data’ are required to suggest initial hypotheses for testing. The widespread use and adoption of web-based maps has provided a familiar set of interactions for exploring abstract large data spaces. Building on these techniques, we developed tile based visual analytics that provide browser-based interactive visualization of billions of data points. (JavaScript/Java)
  • ArrayFire: ArrayFire is a high performance software library for parallel computing with an easy-to-use API. Its array-based function set makes parallel programming simple. ArrayFire’s multiple backends (CUDA, OpenCL, and native CPU) make it platform independent and highly portable. A few lines of code in ArrayFire can replace dozens of lines of parallel computing code, saving users valuable time and lowering development costs. (C, C++, Python, Fortran, Java)
  • Autologin: AutoLogin is a utility that allows a web crawler to start from any given page of a website (for example the home page) and attempt to find the login page, where the spider can then log in with a set of valid, user-provided credentials to conduct a deep crawl of a site to which the user already has legitimate access. AutoLogin can be used as a library or as a service. (Python)
  • CubeTest: Official evaluation metric used for evaluation for TREC Dynamic Domain Track. It is a multiple-dimensional metric that measures the effectiveness of completing a complex and task-based search process. (Perl)
  • Data Microscopes: Data Microscopes is a collection of robust, validated Bayesian nonparametric models for discovering structure in data. Models for tabular, relational, text, and time-series data can accommodate multiple data types, including categorical, real-valued, binary, and spatial data. Inference and visualization of results respects the underlying uncertainty in the data, allowing domain experts to feel confident in the quality of the answers they receive. (Python, C++)
  • DataWake: The Datawake project consists of various server and database technologies that aggregate user browsing data via a plug-in using domain-specific searches. This captured, or extracted, data is organized into browse paths and elements of interest. This information can be shared or expanded amongst teams of individuals. Elements of interest which are extracted either automatically, or manually by the user, are given weighted values. (Python/Java/Scala/Clojure/JavaScript)
  • DeepDive: DeepDive is a new type of knowledge base construction system that enables developers to analyze data on a deeper level than ever before. Many applications have been built using DeepDive to extract data from millions of documents, Web pages, PDFs, tables, and figures. DeepDive is a trained system, which means that it uses machine-learning techniques to incorporate domain-specific knowledge and user feedback to improve the quality of its analysis. DeepDive can deal with noisy and imprecise data by producing calibrated probabilities for every assertion it makes. DeepDive offers a scalable, high-performance learning engine. (SQL, Python, C++)
  • DIG: DIG is a visual analysis tool based on a faceted search engine that enables rapid, interactive exploration of large data sets. Users refine their queries by entering search terms or selecting values from lists of aggregated attributes. DIG can be quickly configured for a new domain through simple configuration. (JavaScript)
  • Dossier Stack: Dossier Stack provides a framework of library components for building active search applications that learn what users want by capturing their actions as truth data. The framework’s web services and JavaScript client libraries enable applications to efficiently capture user actions such as organizing content into folders, and allow back-end algorithms to train classifiers and ranking algorithms to recommend content based on those user actions. (Python/JavaScript/Java)
  • Dumpling: Dumpling implements a novel dynamic search engine which refines search results on the fly. Dumpling utilizes the Winwin algorithm and the Query Change retrieval Model (QCM) to infer the user’s state and tailor search results accordingly. Dumpling provides a friendly user interface for users to compare the static results and dynamic results. (Java, JavaScript, HTML, CSS)
  • FacetSpace: FacetSpace allows the investigation of large data sets based on the extraction and manipulation of relevant facets. These facets may be almost any consistent piece of information that can be extracted from the dataset: names, locations, prices, etc… (JavaScript)
  • Formasaurus: Formasaurus is a Python package that tells users the type of an HTML form: is it a login, search, registration, password recovery, join mailing list, contact form or something else. Under the hood it uses machine learning. (Python)
  • Frontera: Frontera (formerly Crawl Frontier) is used as part of a web crawler; it can store URLs and prioritize what to visit next. (Python)
  • HG Profiler: HG Profiler is a tool that allows users to take a list of entities from a particular source and look for those same entities across a pre-defined list of other sources. (Python)
  • Hidden Service Forum Spider: An interactive web forum analysis tool that operates over Tor hidden services. This tool is capable of passive forum data capture and posting dialog at random or user-specifiable intervals. (Python)
  • HSProbe (The Tor Hidden Service Prober): HSProbe is a multi-threaded, STEM-based Python application designed to interrogate the status of Tor hidden services (HSs) and extract hidden service content. It is an HS-protocol-savvy crawler that uses protocol error codes to decide what to do when a hidden service is not reached. HSProbe tests whether specified Tor hidden services (.onion addresses) are listening on one of a range of pre-specified ports, and optionally, whether they are speaking over other specified protocols. As of this version, support for HTTP and HTTPS is implemented. HSProbe takes as input a list of hidden services to be probed and generates as output a similar list of the results of each hidden service probed. (Python)
  • ImageCat: ImageCat analyses images and extracts their EXIF metadata and any text contained in the image via OCR. It can handle millions of images. (Python, Java)
  • ImageSpace: ImageSpace provides the ability to analyze and search through large numbers of images. These images may be text searched based on associated metadata and OCR text or a new image may be uploaded as a foundation for a search. (Python)
  • Karma: Karma is an information integration tool that enables users to quickly and easily integrate data from a variety of data sources including databases, spreadsheets, delimited text files, XML, JSON, KML and Web APIs. Users integrate information by modelling it according to an ontology of their choice using a graphical user interface that automates much of the process. (Java, JavaScript)
  • LegisGATE: Demonstration application for running General Architecture Text Engineering over legislative resources. (Java)
  • Memex Explorer: Memex Explorer is a pluggable framework for domain specific crawls, search, and unified interface for Memex Tools. It includes the capability to add links to other web-based apps (not just Memex) and the capability to start, stop, and analyze web crawls using 2 different crawlers – ACHE and Nutch. (Python)
  • MITIE: Trainable named entity extractor (NER) and relation extractor. (C)
  • Omakase: Omakase provides a simple and flexible interface to share data, computations, and visualizations between a variety of user roles in both local and cloud environments. (Python, Clojure)
  • pykafka: pykafka is a Python driver for the Apache Kafka messaging system. It enables Python programmers to publish data to Kafka topics and subscribe to existing Kafka topics. It includes a pure-Python implementation as well as an optional C driver for increased performance. It is the only Python driver to have feature parity with the official Scala driver, supporting both high-level and low-level APIs, including balanced consumer groups for high-scale uses. (Python)
  • Scrapy Cluster: Scrapy Cluster is a scalable, distributed web crawling cluster based on Scrapy and coordinated via Kafka and Redis. It provides a framework for intelligent distributed throttling as well as the ability to conduct time-limited web crawls. (Python)
  • Scrapy-Dockerhub: Scrapy-Dockerhub is a deployment setup for Scrapy spiders that packages the spider and all dependencies into a Docker container, which is then managed by a Fabric command line utility. With this setup, users can run spiders seamlessly on any server, without the need for Scrapyd which typically handles the spider management. With Scrapy-Dockerhub, users issue one command to deploy spider with all dependencies to the server and second command to run it. There are also commands for viewing jobs, logs, etc. (Python)
  • Shadow: Shadow is an open-source network simulator/emulator hybrid that runs real applications like Tor and Bitcoin over a simulated Internet topology. It is light-weight, efficient, scalable, parallelized, controllable, deterministic, accurate, and modular. (C)
  • SMQTK: Kitware’s Social Multimedia Query Toolkit (SMQTK) is an open-source service for ingesting images and video from social media (e.g. YouTube, Twitter), computing content-based features, indexing the media based on the content descriptors, querying for similar content, and building user-defined searches via an interactive query refinement (IQR) process. (Python)
  • SourcePin: SourcePin is a tool to assist users in discovering websites that contain content they are interested in for a particular topic, or domain. Unlike a search engine, SourcePin allows a non-technical user to leverage the power of an advanced automated smart web crawling system to generate significantly more results than the manual process typically does, in significantly less time. The user interface of SourcePin allows users to quickly browse through hundreds or thousands of representative images to find the websites they are most interested in. SourcePin also has a scoring system which takes feedback from the user on which websites are interesting and, using machine learning, assigns a score to the other crawl results based on how interesting they are likely to be for the user. The roadmap for SourcePin includes integration with other tools and a capability for users to actually extract relevant information from the crawl results. (Python, JavaScript)
  • Splash: Lightweight, scriptable browser as a service with an HTTP API. (Python)
  • streamparse: streamparse runs Python code against real-time streams of data. It allows users to spin up small clusters of stream processing machines locally during development. It also allows remote management of stream processing clusters that are running Apache Storm. It includes a Python module implementing the Storm multi-lang protocol; a command-line tool for managing local development, projects, and clusters; and an API for writing data processing topologies easily. (Python, Clojure)
  • TellFinder: TellFinder provides efficient visual analytics to automatically characterize and organize publicly available Internet data. Compared to standard web search engines, TellFinder enables users to research case-related data in significantly less time. Reviewing TellFinder’s automatically characterized groups also allows users to understand temporal patterns, relationships and aggregate behavior. The techniques are applicable to various domains. (JavaScript, Java)
  • Text.jl: Text.jl provides numerous tools for text processing optimized for the Julia language. Functionality supported includes algorithms for feature extraction, text classification, and language identification. (Julia)
  • TJBatchExtractor: Regex-based information extractor for online advertisements. (Java)
  • Topic: This tool takes a set of text documents, filters by a given language, and then produces documents clustered by topic. The method used is Probabilistic Latent Semantic Analysis (PLSA). (Python)
  • Topic Space: Tool for visualization of topics in document collections. (Python)
  • Tor: The core software for using and participating in the Tor network. (C)
  • The Tor Path Simulator (TorPS): TorPS quickly simulates path selection in the Tor traffic-secure communications network. It is useful for experimental analysis of alternative route selection algorithms or changes to route selection parameters. (C++, Python, Bash)
  • TREC-DD Annotation: This annotation tool supports the annotation task of creating ground truth data for the TREC Dynamic Domain Track. It adopts a drag-and-drop approach for assessors to annotate passage-level relevance judgements. It also supports multiple ways of browsing and searching the various domains of corpora used in TREC DD. (Python, JavaScript, HTML, CSS)
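Several entries in the list above are crawler components, and the Frontera description ("store URLs and prioritize what to visit next") is the easiest to picture in code. The following is only a toy sketch of that idea using the standard library; the class name, method names, and scores are hypothetical and are not Frontera's API.

```python
import heapq
import itertools


class CrawlFrontier:
    """Toy URL frontier: remembers scheduled URLs and returns the best one next."""

    def __init__(self):
        self._heap = []                  # entries are (-score, order, url)
        self._seen = set()               # URLs that were already scheduled
        self._order = itertools.count()  # tie-breaker: earlier additions win

    def add(self, url, score=0.0):
        """Schedule a URL once; return True if it was newly added."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so negate the score to pop the highest first.
        heapq.heappush(self._heap, (-score, next(self._order), url))
        return True

    def next_url(self):
        """Pop the highest-scoring URL, or None when the frontier is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


# Hypothetical URLs and scores, for illustration only.
frontier = CrawlFrontier()
frontier.add("http://example.com/", score=1.0)
frontier.add("http://example.com/login", score=5.0)
frontier.add("http://example.com/", score=9.0)  # duplicate, ignored
first = frontier.next_url()
```

A real frontier would also persist its state and respect politeness limits per host; this sketch covers only the store-and-prioritize part.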

Beyond whatever use you find for the software, it is also important in terms of what capabilities are of interest to DARPA and by extension to those interested in militarized IT.

Onion.city – a search engine bringing the Dark Web into the light

Friday, February 27th, 2015

Onion.city – a search engine bringing the Dark Web into the light by Mark Stockley.

From the post:

The Dark Web is reflecting a little more light these days.

On Monday I wrote about Memex, DARPA’s Deep Web search engine. Memex is a sophisticated tool set that has been in the hands of a few select law enforcement agencies for a year now, but it isn’t available to regular users like you and me.

There is another search engine that is though.

Just a few days before I wrote that article, on 11 February, user Virgil Griffith went onto the Tor-talk mailing list and announced Onion City, a Dark Web search engine for the rest of us.

The search engine delves into the anonymous Tor network, finds .onion sites and makes them available to regular users on the ordinary World Wide Web.


Search and Access to Onion sites for Amusement ONLY! All of your activities are transparent to anyone capturing your web traffic.

If you need security and privacy, use a Tor client.

With that understanding: Onion City awaits your requests.
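To make "available to regular users on the ordinary World Wide Web" concrete, here is a minimal sketch of how a gateway of this kind maps addresses. It assumes the gateway host is formed by appending ".city" to the .onion host name, which is how I understand the service to have been announced; the function name and the example address are hypothetical, and the gateway may no longer be running.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the gateway serves "<name>.onion" as "<name>.onion.city".
GATEWAY_SUFFIX = ".city"


def to_gateway_url(onion_url):
    """Rewrite a .onion URL so it points at the clearnet gateway instead."""
    parts = urlsplit(onion_url)
    host = parts.hostname or ""
    if not host.endswith(".onion"):
        raise ValueError("not a .onion address: %r" % onion_url)
    netloc = host + GATEWAY_SUFFIX
    if parts.port:
        netloc += ":%d" % parts.port
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Hypothetical address, for illustration only.
example = to_gateway_url("http://exampleonionaddr.onion/index.html")
```

Note what the rewrite implies: the request goes to an ordinary web host, so the warning above applies in full.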

Is there demand for a search engine internal to the Tor network? One supported by advertising internal to Tor? Or is most Tor “marketing” by referral?

Creating Tor Hidden Services With Python

Saturday, December 20th, 2014

Creating Tor Hidden Services With Python by Jordan Wright.

From the post:

Tor is often used to protect the anonymity of someone who is trying to connect to a service. However, it is also possible to use Tor to protect the anonymity of a service provider via hidden services. These services, operating under the .onion TLD, allow publishers to anonymously create and host content viewable only by other Tor users.

The Tor project has instructions on how to create hidden services, but this can be a manual and arduous process if you want to set up multiple services. This post will show how we can use the fantastic stem Python library to automatically create and host a Tor hidden service.

If you are interested in the Tor network, this is a handy post to bookmark.
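For a sense of what the manual process involves, each hidden service needs a pair of torrc options, HiddenServiceDir and HiddenServicePort. The sketch below only generates those lines for one service; it does not use stem and is not the code from Jordan's post. The helper name, directory path, and port numbers are hypothetical.

```python
def hidden_service_torrc(service_dir, virtual_port, target_host="127.0.0.1", target_port=None):
    """Return the two torrc lines that declare one hidden service.

    virtual_port is the port visitors use on the .onion address;
    target_host:target_port is where the local service actually listens.
    """
    if target_port is None:
        target_port = virtual_port
    for port in (virtual_port, target_port):
        if not 1 <= port <= 65535:
            raise ValueError("port out of range: %r" % port)
    return [
        "HiddenServiceDir %s" % service_dir,
        "HiddenServicePort %d %s:%d" % (virtual_port, target_host, target_port),
    ]


# Hypothetical values: expose a local web app on port 5000 as port 80.
lines = hidden_service_torrc("/var/lib/tor/hidden_service/", 80, target_port=5000)
print("\n".join(lines))
```

Repeating this by hand for many services, then reloading Tor and collecting the generated hostnames, is the tedium the post proposes to automate.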

I was thinking about exploring the Tor network in the new year but you should be aware of a more recent post by Jordan:

What Happens if Tor Directory Authorities Are Seized?

From the post:

The Tor Project has announced that they have received threats about possible upcoming attempts to disable the Tor network through the seizure of Directory Authority (DA) servers. While we don’t know the legitimacy behind these threats, it’s worth looking at the role DAs play in the Tor network, showing what effects their seizure could have on the Tor network.

Nothing to panic about, yet, but if you know anyone you can urge to protect Tor, do so.

81% of Tor users can be de-anonymised by analysing router information, research indicates

Sunday, November 16th, 2014

81% of Tor users can be de-anonymised by analysing router information, research indicates by Martin Anderson.

From the post:

Research undertaken between 2008 and 2014 suggests that more than 81% of Tor clients can be ‘de-anonymised’ – their originating IP addresses revealed – by exploiting the ‘Netflow’ technology that Cisco has built into its router protocols, and similar traffic analysis software running by default in the hardware of other manufacturers.

Professor Sambuddho Chakravarty, a former researcher at Columbia University’s Network Security Lab and now researching Network Anonymity and Privacy at the Indraprastha Institute of Information Technology in Delhi, has co-published a series of papers over the last six years outlining the attack vector, and claims a 100% ‘decloaking’ success rate under laboratory conditions, and 81.4% in the actual wilds of the Tor network.

Chakravarty’s technique [PDF] involves introducing disturbances in the highly-regulated environs of Onion Router protocols using a modified public Tor server running on Linux – hosted at the time at Columbia University. His work on large-scale traffic analysis attacks in the Tor environment has convinced him that a well-resourced organisation could achieve an extremely high capacity to de-anonymise Tor traffic on an ad hoc basis – but also that one would not necessarily need the resources of a nation state to do so, stating that a single AS (Autonomous System) could monitor more than 39% of randomly-generated Tor circuits.

Before you panic, read the rest of Martin’s article. Tor wasn’t designed for highly interactive web connections, which creates conditions where traffic in and out of routers can leave patterns to trace connections.

For years we got along with email-based search systems for mailing list archives and other materials. For security reasons, perhaps your next Dark Web app should offer email-based transactions.

I first saw this in a tweet by Nik Cubrilovic.

The Deep Web you don’t know about

Wednesday, May 28th, 2014

The Deep Web you don’t know about by Jose Pagliery.

From the post:

Then there’s Tor, the darkest corner of the Internet. It’s a collection of secret websites (ending in .onion) that require special software to access them. People use Tor so that their Web activity can’t be traced — it runs on a relay system that bounces signals among different Tor-enabled computers around the world.

(video omitted)

It first debuted as The Onion Routing project in 2002, made by the U.S. Naval Research Laboratory as a method for communicating online anonymously. Some use it for sensitive communications, including political dissent. But in the last decade, it’s also become a hub for black markets that sell or distribute drugs (think Silk Road), stolen credit cards, illegal pornography, pirated media and more. You can even hire assassins.

If you take the figures of 54% of the deep web being databases, plus the 13% said to be on intranets, that leaves 33% of the deep web unaccounted for. How much of that is covered by Tor is hard to say.

But, we can intelligently guess that search doesn’t work any better in Tor than other segments of the Web, deep or not.

Given the risk of using even the Tor network (see Online privacy is dead by Jose Pagliery, on the NSA vs. Silk Road), finding what you want efficiently could be worth a premium price.

Is guarding online privacy the tipping point for paid collocation services?