NewsStand: A New View on News (+ Underwear Down Under)

March 28th, 2015

NewsStand: A New View on News by Benjamin E. Teitler, et al.

Abstract:

News articles contain a wealth of implicit geographic content that if exposed to readers improves understanding of today’s news. However, most articles are not explicitly geotagged with their geographic content, and few news aggregation systems expose this content to users. A new system named NewsStand is presented that collects, analyzes, and displays news stories in a map interface, thus leveraging on their implicit geographic content. NewsStand monitors RSS feeds from thousands of online news sources and retrieves articles within minutes of publication. It then extracts geographic content from articles using a custom-built geotagger, and groups articles into story clusters using a fast online clustering algorithm. By panning and zooming in NewsStand’s map interface, users can retrieve stories based on both topical significance and geographic region, and see substantially different stories depending on position and zoom level.

Of particular interest to topic map fans:

NewsStand’s geotagger must deal with three problematic cases in disambiguating terms that could be interpreted as locations: geo/non-geo ambiguity, where a given phrase might refer to a geographic location, or some other kind of entity; aliasing, where multiple names refer to the same geographic location, such as “Los Angeles” and “LA”; and geographic name ambiguity or polysemy, where a given name might refer to any of several geographic locations. For example, “Springfield” is the name of many cities in the USA, and thus it is a challenge for disambiguation algorithms to associate with the correct location.

Unless you want to hand disambiguate all geographic references in your sources, this paper merits a close read!
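To make two of those cases concrete, here is a toy sketch of aliasing and polysemy. It is not NewsStand’s geotagger: the tiny gazetteer, the alias table, and the `resolve` function are all invented for illustration, and the “look for a state name in the surrounding text” rule is only a stand-in for real disambiguation. Geo/non-geo ambiguity is not handled at all.

```python
# Toy gazetteer and alias table; hypothetical data, not taken from NewsStand.
ALIASES = {"la": "los angeles", "l.a.": "los angeles"}

GAZETTEER = {
    "los angeles": [("Los Angeles", "California")],
    "springfield": [
        ("Springfield", "Illinois"),
        ("Springfield", "Massachusetts"),
        ("Springfield", "Missouri"),
    ],
}


def resolve(name, context=""):
    """Return (city, state) for a place name, or None if unknown or still ambiguous."""
    key = name.strip().lower()
    key = ALIASES.get(key, key)  # aliasing: "LA" -> "los angeles"
    candidates = GAZETTEER.get(key, [])
    if len(candidates) == 1:
        return candidates[0]
    # Polysemy: keep the candidate whose state is named in the surrounding text.
    text = context.lower()
    matches = [c for c in candidates if c[1].lower() in text]
    return matches[0] if len(matches) == 1 else None


print(resolve("LA"))
print(resolve("Springfield", "The Illinois legislature met in Springfield."))
print(resolve("Springfield"))  # no context, so the name stays ambiguous
```

Even the toy version shows why hand disambiguation does not scale: every new alias and every new “Springfield” is another entry someone has to maintain.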

BTW, the paper dates from 2008 and I saw it in a tweet by Kirk Borne, where Kirk pointed to a recent version of NewsStand. Well, sort of “recent.” The latest story I could find was 490 days ago, a tweet from CBS News about the 50th anniversary of the Kennedy assassination in Dallas.

Undaunted, I checked out TwitterStand, but it seems to suffer from the same staleness of content, although it is difficult to tell because links don’t lead to the tweets.

Finally, I did try PhotoStand, which, judging from the pop-up information on the images, is quite current.

I noticed for Perth, Australia, “A special section of the exhibition has been dedicated to famous dominatrix Madame Lash.”

Sadly, this appears to be one the algorithm got wrong, so members of Congress should not click “purchase” on their travel arrangements just yet.

From the modesty of bloomers to the seductiveness of lacy corsets, a new exhibition gives us a rare glimpse into the most intimate and private parts of history.

The Powerhouse Museum in Sydney have unveiled their ‘Undressed: 350 Years of Underwear in Fashion’ collection, which features undergarments from the 17th-century to more modern garments worn by celebrities such as Emma Watson, Cindy Crawford and even Queen Victoria.

Apart from a brief stint in Bendigo and Perth, the collection has never been seen by any members of the public before and lead curator Edwina Ehrman believes people will be both shocked and intrigued by what’s on display.

So the collection was once shown in Perth, but for airline reservations you had best book for Sydney.

And no, I won’t leave you without the necessary details:

Undressed: 350 Years of Underwear in Fashion opens at the Powerhouse Museum on March 28 and runs until 12 July 2015. Tickets can be bought here.

Ticket prices do not include transportation expenses to Sydney.

Spoiler alert: The exhibition page says:

Please note that photography is not permitted in this exhibition.

Enjoy!

Who You Gonna Call?

March 28th, 2015

Confirmation you should join the NRA to protect privacy and the rights of hackers to defend themselves.

The Drug Enforcement Administration abandoned an internal proposal to use surveillance cameras for photographing vehicle license plates near gun shows in the United States to investigate gun-trafficking, the agency’s chief said Wednesday.

DEA Administrator Michelle Leonhart said in a statement that the proposal memorialized in an employee’s email was only a suggestion, never authorized by her agency and never put into action. The AP also learned that the federal Bureau of Alcohol, Tobacco, Firearms and Explosives did not authorize or approve the license plate surveillance plan.

A casual email suggestion warrants two separate high-profile denials. That’s power. (Source: DEA chief: US abandoned plan to track cars near gun shows.)

I’m not sure where else you would investigate gun-trafficking but the bare mention of guns frightens off entire federal agencies.

To protect privacy, who you gonna call?

Tracking NSA/CIA/FBI Agents Just Got Easier

March 28th, 2015

I pointed out in The DEA is Stalking You! the widespread use of automobile license reading applications by the DEA. I also suggested that citizens start using their cellphones to take photos of people coming and going from DEA, CIA, FBI offices and posting them online.

The good news is that Big Data has come to the aid of citizens to track NSA/CIA/FBI, etc. agents.

Howard Matis, a physicist who works at the Lawrence Berkeley National Laboratory in California, didn’t know that his local police department had license plate readers (LPRs).

But even if they did have LPRs (they do: they have 33 automated units), he wasn’t particularly worried about police capturing his movements.

Until, that is, he gave permission for Ars Technica’s Cyrus Farivar to get data about his own car and its movements around town.

The data is, after all, accessible via public records law.

Ars obtained the entire LPR dataset of the Oakland Police Department (OPD), including more than 4.6 million reads of over 1.1 million unique plates captured in just over 3 years.

Then, to make sense out of data originally provided in 18 Excel spreadsheets, each containing hundreds of thousands of lines, Ars hired a data visualization specialist who created a simple tool that allowed the publication to search any given plate and plot its locations on a map.

How cool is that!?
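For a rough idea of what such a search tool does under the hood, here is a minimal sketch. It is not the Ars tool: the column names (`plate`, `timestamp`, `lat`, `lon`) and the sample rows are made up, and the real data arrived as Excel spreadsheets rather than CSV. The point is only that once reads are grouped by plate, pulling one car’s movements is a dictionary lookup.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows; the column names are assumptions, not the OPD export format.
SAMPLE = """plate,timestamp,lat,lon
ABC123,2014-05-01 08:15:00,37.8044,-122.2711
XYZ789,2014-05-01 08:17:00,37.8100,-122.2650
ABC123,2014-05-03 18:40:00,37.7990,-122.2760
"""


def index_reads(csv_file):
    """Group reads by plate so one plate's sightings can be pulled out quickly."""
    reads = defaultdict(list)
    for row in csv.DictReader(csv_file):
        reads[row["plate"]].append(
            (row["timestamp"], float(row["lat"]), float(row["lon"]))
        )
    for sightings in reads.values():
        sightings.sort()  # chronological, since the timestamps sort as strings
    return reads


reads = index_reads(io.StringIO(SAMPLE))
for timestamp, lat, lon in reads["ABC123"]:
    print(timestamp, lat, lon)
```

Feed the latitude/longitude pairs to any mapping library and you have the plot-on-a-map half of the tool.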

Of course, your mileage may vary as the original Ars article reports:

In August 2014, the American Civil Liberties Union and the Electronic Frontier Foundation lost a lawsuit to compel the Los Angeles Police Department and the Los Angeles Sheriff’s Department to hand over a mere week’s worth of all LPR data. That case is now on appeal.

The trick being that the state doesn’t mind invading your privacy but is very concerned with you not invading its privacy or knowing enough about its activities to be an informed member of the voting electorate.

If you believe that the government wants to keep information like license reading secret to protect the privacy of other citizens, you need to move to North Korea. I understand they have a very egalitarian society.

Of course these are license reading records collected by the state. Since automobiles are in public view, anyone could start collecting license plate numbers with locations. Now there’s a startup idea. Blanket the more important parts of D.C. inside the Beltway with private license readers. That would be a big data set with commercial value.

To give you an idea of the possibilities, visit Police License Plate Readers at PoliceOne.com. You will find links to a wide variety of license plate reading solutions, including:

A fixed installation device from Vigilant Solutions.

You could wire something together, but if you are serious about keeping track of the government that keeps track of all of us, you should go with professional-grade equipment and adopt an activist response to government surveillance. Being concerned, frightened, “speaking truth to power,” etc. are as effective as doing nothing at all.

Think about citizen-group-based license plate data collection. Possible discoveries could include government vehicles at local motels and massage parlors (explaining the donut gaze in some police officers’ eyes) and meetings between regulators and the regulated. A whole range of governmental wrongdoing is waiting to be discovered. Think about investing in a mobile license plate reader for your car today!

If you don’t like government surveillance, invite them into the fish bowl.

They have a lot more to hide than you do.

Using Spark DataFrames for large scale data science

March 27th, 2015

From the post:

When we first open sourced Spark, we aimed to provide a simple API for distributed data processing in general-purpose programming languages (Java, Python, Scala). Spark enabled distributed data processing through functional transformations on distributed collections of data (RDDs). This was an incredibly powerful API—tasks that used to take thousands of lines of code to express could be reduced to dozens.

As Spark continues to grow, we want to enable wider audiences beyond big data engineers to leverage the power of distributed processing. The new DataFrame API was created with this goal in mind. This API is inspired by data frames in R and Python (Pandas), but designed from the ground up to support modern big data and data science applications. As an extension to the existing RDD API, DataFrames feature:

• Ability to scale from kilobytes of data on a single laptop to petabytes on a large cluster
• Support for a wide array of data formats and storage systems
• State-of-the-art optimization and code generation through the Spark SQL Catalyst optimizer
• Seamless integration with all big data tooling and infrastructure via Spark
• APIs for Python, Java, Scala, and R (in development via SparkR)

For new users familiar with data frames in other programming languages, this API should make them feel at home. For existing Spark users, this extended API will make Spark easier to program, and at the same time improve performance through intelligent optimizations and code-generation.

If you don’t know Spark DataFrames, you are missing out on important Spark capabilities! This post will have you well on the way to recovery.

Even though the reading of data from other sources is “easy” in many cases and support for more is growing, I am troubled by statements like:

DataFrames’ support for data sources enables applications to easily combine data from disparate sources (known as federated query processing in database systems). For example, the following code snippet joins a site’s textual traffic log stored in S3 with a PostgreSQL database to count the number of times each user has visited the site.

That goes well beyond reading data and introduces the concept of combining data, which isn’t the same thing.

For any two data sets that are trivially transparent to you (caveat: what is transparent to you may or may not be transparent to others), that example works.

That example fails where data scientists spend 50 to 80 percent of their time: “collecting and preparing unruly digital data.” For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights.

If your handlers are content to spend 50 to 80 percent of your time munging data, enjoy. Not that munging data will ever go away, but documenting the semantics of your data can enable you to spend less time munging and more time on enjoyable tasks.
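A small sketch of what I mean by documenting semantics, in plain Python rather than Spark. Everything here is hypothetical: the two sources, their field names (`uid` versus `user_id`), and the `USER_KEY` mapping are invented for illustration. The join only works because someone wrote down that the two fields identify the same thing and how to make their values comparable.

```python
from collections import Counter

# Two made-up sources that name (and type) the same thing differently.
traffic_log = [
    {"uid": "17", "url": "/home"},
    {"uid": "42", "url": "/about"},
    {"uid": "17", "url": "/pricing"},
]
users = [
    {"user_id": 17, "name": "Ada"},
    {"user_id": 42, "name": "Grace"},
]

# The documented semantics: which field in each source identifies a user,
# and how to normalize it so values from both sources are comparable.
USER_KEY = {
    "traffic_log": ("uid", int),
    "users": ("user_id", int),
}


def visits_per_user(log_rows, user_rows, mapping=USER_KEY):
    """Count visits per user name, joining on the documented user key."""
    log_field, log_norm = mapping["traffic_log"]
    user_field, user_norm = mapping["users"]
    counts = Counter(log_norm(row[log_field]) for row in log_rows)
    return {row["name"]: counts[user_norm(row[user_field])] for row in user_rows}


print(visits_per_user(traffic_log, users))
```

Drop the mapping and compare `"17"` with `17` directly, and the join silently returns zero visits for everyone. That is the munging you get to rediscover every time the semantics go undocumented.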

United States Code (from Office of the Law Revision Counsel)

March 27th, 2015

United States Code (from Office of the Law Revision Counsel)

Current Release Point

Public Law 113-296 Except 113-287

Each update of the United States Code is a release point. This page provides downloadable files for the current release point. All files are current through Public Law 113-296 except for 113-287. Titles in bold have been changed since the last release point.

A User Guide and the USLM Schema and stylesheet are provided for the United States Code in XML. A stylesheet is provided for the XHTML. PCC files are text files containing GPO photocomposition codes (i.e., locators).

Information about the currency of United States Code titles is available on the Currency page. Files for prior release points are available on the Prior Release Points page. Older materials are available on the Annual Historical Archives page.

You can download as much or as little of the United States Code as you like, in XML, XHTML, PCC or PDF format.
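If you grab the XML, a few lines of Python will get you a list of sections. Treat this as a sketch under assumptions: the namespace and the `section`, `num`, and `heading` element names are my reading of the USLM schema and should be checked against the User Guide, and the inline sample is a hand-made placeholder, not real statutory text.

```python
import xml.etree.ElementTree as ET

# Assumed USLM namespace; verify against the User Guide before relying on it.
USLM = "{http://xml.house.gov/schemas/uslm/1.0}"

# Tiny hand-made stand-in for a downloaded title; the headings are placeholders.
SAMPLE = """<uscDoc xmlns="http://xml.house.gov/schemas/uslm/1.0">
  <main>
    <title>
      <section><num value="1">§ 1.</num><heading>First placeholder heading</heading></section>
      <section><num value="2">§ 2.</num><heading>Second placeholder heading</heading></section>
    </title>
  </main>
</uscDoc>"""


def section_headings(root):
    """Return (section number, heading) pairs for every section element."""
    pairs = []
    for section in root.iter(USLM + "section"):
        num = section.find(USLM + "num")
        heading = section.find(USLM + "heading")
        if num is not None and heading is not None:
            pairs.append((num.get("value"), (heading.text or "").strip()))
    return pairs


root = ET.fromstring(SAMPLE)
for number, heading in section_headings(root):
    print(number, heading)
```

For a real title, swap `ET.fromstring(SAMPLE)` for `ET.parse(path).getroot()` on the file you downloaded.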

Oh, yeah, the 113-287 reference does seem rather cryptic. What? You don’t keep up with Public Law numbers?

The short story is that Congress passed a bill to move material on national parks to a new Title 54 and that hasn’t happened yet. If you need more details, see: Title 54 of the U.S. Code: Background and Guidance by the National Park Service.

You can think of this as the outcome of the sausage making process. Interesting in its own right but not terribly helpful in divining the process that produced it.

Enjoy!

PS: On Ubuntu, the site displays well on Chrome and poorly on Firefox. I don’t know about IE*.

Data and Goliath – Bruce Schneier – New Book!

March 27th, 2015

“More than just being ineffective, the NSA’s surveillance efforts have actually made us less secure,” he says. Indeed, the Privacy and Civil Liberties Oversight Board found the “Section 215” program for bulk collection of telephone metadata to be nearly useless, as well as likely illegal and problematic in other ways. But by contrast, it also reported that the “Section 702” collection program had made a valuable contribution to security. Schneier does not engage on this point.

I’m waiting on my copy of Data and Goliath to arrive but I don’t find it surprising that Bruce overlooked and/or chose to not comment on the Section 702 report.

Starting with the full text, Report on the Surveillance Program Operated Pursuant to Section 702 of the Foreign Intelligence Surveillance Act, at one hundred and ninety-six pages (196), you will be surprised at how few actual facts are recited.

In terms of the efficacy of the 702 program, this is fairly typical:

The Section 702 program has proven valuable in a number of ways to the government’s efforts to combat terrorism. It has helped the United States learn more about the membership, leadership structure, priorities, tactics, and plans of international terrorist organizations. It has enabled the discovery of previously unknown terrorist operatives as well as the locations and movements of suspects already known to the government. It has led to the discovery of previously unknown terrorist plots directed against the United States and foreign countries, enabling the disruption of those plots.

That seems rather short on facts and long on conclusions to me. Yes?

Here’s a case the report singles out as a success:

In one case, for example, the NSA was conducting surveillance under Section 702 of an email address used by an extremist based in Yemen. Through that surveillance, the agency discovered a connection between that extremist and an unknown person in Kansas City, Missouri. The NSA passed this information to the FBI, which identified the unknown person, Khalid Ouazzani, and subsequently discovered that he had connections to U.S.-based Al Qaeda associates, who had previously been part of an abandoned early stage plot to bomb the New York Stock Exchange. All of these individuals eventually pled guilty to providing and attempting to provide material support to Al Qaeda.

Recalling that “early stage plot” means a lot of hot talk with no plan for implementation, which accords with pleas to “attempting to provide material support to Al Qaeda.” That’s grotesque.

Oh, another case:

For instance, in September 2009, the NSA monitored under Section 702 the email address of an Al Qaeda courier based in Pakistan. Through that collection, the agency intercepted emails sent to that address from an unknown individual located in the United States. Despite using language designed to mask their true intent, the messages indicated that the sender was urgently seeking advice on the correct mixture of ingredients to use for making explosives. The NSA passed this information to the FBI, which used a national security letter to identify the unknown individual as Najibullah Zazi, located near Denver, Colorado. The FBI then began intense monitoring of Zazi, including physical surveillance and obtaining legal authority to monitor his Internet activity. The Bureau was able to track Zazi as he left Colorado a few days later to drive to New York City, where he and a group of confederates were planning to detonate explosives on subway lines in Manhattan within the week. Once Zazi became aware that law enforcement was tracking him, he returned to Colorado, where he was arrested soon after. Further investigative work identified Zazi’s co-conspirators and located bomb-making components related to the planned attack. Zazi and one of his confederates later pled guilty and cooperated with the government, while another confederate was convicted and sentenced to life imprisonment. Without the initial tip-off about Zazi and his plans, which came about by monitoring an overseas foreigner under Section 702, the subway-bombing plot might have succeeded.

Sorry, that went by rather fast. The unknown sender in the United States did not know how to make explosives? And despite that, the plot is described as “…planning to detonate explosives on subway lines in Manhattan within the week.” Huh? That’s quite a leap from getting advice on explosives to being ready to execute a complex operation.

What’s wrong with the “terrorists” being tracked by the NSA/FBI? Almost without exception, they lack the skills to make bombs. The FBI fills in, supplying bombs in many cases (Cleveland, 2012, and Portland, 2010, are two I remember offhand). (I don’t have a complete list of terror plots where the FBI supplies the bomb or bomb making materials. Do you? It would save me the work of putting one together. Thanks!)

A more general claim rounds out the “facts” claimed by the report:

A rough count of these cases identifies well over one hundred arrests on terrorism-related offenses. In other cases that did not lead to disruption of a plot or apprehension of conspirators, Section 702 appears to have been used to provide warnings about a continuing threat or to assist in investigations that remain ongoing. Approximately fifteen of the cases we reviewed involved some connection to the United States, such as the site of a planned attack or the location of operatives, while approximately forty cases exclusively involved operatives and plots in foreign countries.

Well, we know that “terrorism-related offense” includes “…attempting to provide material support to Al Qaeda.” And that conspiracy to commit a terrorist act can consist of talking about wanting to commit a terrorist act with no ability to put such a plan in action. Like not knowing how to make a bomb. Fairly serious impediment there, at least for a would-be terrorist.

Not to mention that detention has no real relationship to the commission of a crime, as we have stood witness to at Guantanamo Bay (directions).

In Bruce’s defense, like he needs my help!, ;-), no one has an obligation to refute every lie told in support of government surveillance or its highly fictionalized “war on terrorism.” To no small degree, repeating those lies ad nauseam gives them credibility in group think circles, such as inside the beltway in D.C. Especially among agencies whose budgets depend upon those lies and the contractors who profit from them.

Treat yourself to some truth about cybersecurity: order your copy of Data and Goliath: The Hidden Battles to Capture Your Data and Control Your World by Bruce Schneier.

Congressional Influence Model [How To Choose Allies 4 Hackers]

March 26th, 2015

From the webpage:

This is a collection of data and code for investigating influence in Congress. Specifically, it uses data generated by MapLight and the Center for Responsive Politics to identify opposing interest groups and analyze their political contributions.

Unfortunately, due to size constraints, not all of the campaign finance data can be included in this repo. But if you’re curious you can download it using this scraper (see further instructions there).

I found this following the data for:

When interest groups disagreed on legislation, who did the 113th Congress vote with?

Sorted to show groups most frequently on opposite sides of legislation

To fully appreciate the graphic, see the original at: Congress is a Game, and We Have the Data to Show Who’s Winning by Westley Hennigh.

Where Westley also notes after the graphic:

Amongst more ideologically focused groups the situation is much the same. Conservative Republican interests were very often at odds with both health and welfare and human rights advocates, but Congress stood firmly with conservatives. They were almost twice as likely to vote against the interests of human rights advocates, and more than twice as likely to vote against health & welfare policy organizations.

The force driving this correlation between support by certain groups and favorable votes in Congress isn’t incalculable or hard to guess at. It’s money. The groups above that come out on top control massive amounts of political campaign spending relative to their opponents. The conservative Republican interests in conflict with health and welfare policy groups spent an average of 26 times as much on candidates that won seats in the 113th Congress. They outspent human rights advocates by even more — 300 times as much on average. The Chambers of Commerce, meanwhile, has spent more on lobbying than any other group every year since 1999.

As Westley points out, this is something we all “knew” at some level and now the data makes the correlation between money and policy undeniable.

My first reaction was Westley’s data is a good start towards: How much is that Representative/Senator in the window? The one with the waggly tail., a website where the minimum contribution for legislative votes, taking your calls, etc., is estimated for each member of the United States House and Senate. Interest groups could avoid overpaying for junior members and embarrassing themselves with paltry contributions to more senior members. Think of it as a public price list for legislation.

A How much is that Representative/Senator in the window? The one with the waggly tail. website would be very amusing, but it wouldn’t help me because I don’t have that sort of money. And it isn’t a straight out purchase, which is how they avoid the quid pro quo issue. Many of these interest groups have been greasing the palms of, sorry, contributing to, politicians for years.

Gaining power by contributions, real power, requires a contribution/issue campaign that spans the political careers of multiple politicians, starting at the state and local level and following those careers into Congress. Which means, of course, getting upset about this or that outrage isn’t enough to sustain the required degree of organization and contributions. Contributions and reminders of contributions have to flow 7 x 365, in good years and lean years, perhaps even more so in lean (non-election) years.

Not to mention that you will need to make friends fast and enemies, permanent ones anyway, very slowly. Perhaps a member of Congress has too much local opposition to favor your side on a minor bill. They may simply be absent rather than vote. You have to learn to live with the reality that your representative/senator has other pressure points, unless you want to own one outright. Such members exist, I have no doubt, but the asking price would be very high. It is easier to get one-issue representatives elected than senators, but I don’t know how useful that would be in the long term.

After thinking about it for a while, I concluded we know three things for sure:

• Congress votes with conservatives twice as often as with human rights advocates.
• Conservatives outspend other groups and have for decades.
• Outspending conservatives would require national/state/local contributions for decades.

Based on those facts, would you choose an ally that:

• Loses twice as often on their issues as other groups?
• Doesn’t regularly contribute to campaigns at state/local/federal levels?
• Has no effective national/state/local organization that has persisted for decades?

How you frame your issues makes a difference in available allies.

Take for example the ACLU and its suit against the NSA to take back the Internet backbone: The NSA Has Taken Over the Internet Backbone. We’re Suing to Get it Back.

The ACLU complaint against the NSA has issues such as:

48. Plaintiffs are educational, legal, human rights, and media organizations. Their work requires them to engage in sensitive and sometimes privileged communications, both international and domestic, with journalists, clients, experts, attorneys, civil society organizations, foreign government officials, and victims of human rights abuses, among others.

49. By intercepting, copying, and reviewing substantially all international text-based communications—and many domestic communications as well—as they transit telecommunications networks inside the United States, the government is seizing and searching Plaintiffs’ communications in violation of the FAA and the Constitution.

Really makes you feel like girding your loins and putting on body armor doesn’t it? Almost fifty (50) pages of such riveting prose.

Don’t get me wrong, I support the ACLU and deeply appreciate their suing the NSA. The NSA needs to be opposed in every venue by everyone who cares about having any semblance of freedom in the United States.

I hope the ACLU is victorious but at best, the NSA will be forced to obey existing laws, assuming you can trust known liars when they say “…now we are obeying the law, but we can’t let you see that we are obeying the law.” Somehow that doesn’t fill me with confidence, assuming the ACLU is successful.

What happens if we re-phrase the issue of NSA surveillance? So we can choose stronger allies to have on our side? Take the mass collection of credit card data for example. Sweeping NSA Surveillance Includes Credit-Card Transactions, Top Three Phone Companies’ Records by Ryan Gallagher.

What would credit card data enable? Hmmm, can you say a de facto national gun registry? With purchase records for guns and ammunition? What reason other than ownership would I have for buying .460 Weatherby Magnum ammunition?

By framing the issue of surveillance as a gun registration issue, we find the NRA joining with the ACLU and others in ACLU vs. Clapper, No. 13-cv-03994 (WHP), saying:

For more than 50 years since its decision in Nat’l Ass’n for Advancement of Colored People v. State of Ala. ex rel. Patterson, 357 U.S. 449 (1958), the Supreme Court has recognized that involuntary disclosure of the membership of advocacy groups inhibits the exercise of First Amendment rights by those groups. For nearly as long—since the debates leading up to enactment of the Gun Control Act of 1968—the Congress has recognized that government recordkeeping on gun owners inhibits the exercise of Second Amendment rights. The mass surveillance program raises both issues, potentially providing the government not only with the means of identifying members and others who communicate with the NRA and other advocacy groups, but also with the means of identifying gun owners without their knowledge or consent, contrary to longstanding congressional policy repeatedly reaffirmed and strengthened by Congresses that enacted and reauthorized the legislation at issue in this case. The potential effect on gun owners’ privacy is illustrative of the potential effect of the government’s interpretation of the statute on other statutorily protected privacy rights. The injunction should be issued.

That particular suit was unsuccessful at the district court level but that should give you an idea of how “framing” an issue can enable you to attract allies who are more successful than most.

With the support of the ACLU, perhaps, just perhaps, the NSA will be told to obey the law. Guesses are up for grabs on how successful that “telling” will be.

With the support of the NRA and similar groups, the very existence of the NSA data archives will come into question. Not beyond possibility that the NSA will be returned to its former, much smaller footprint of legitimate cryptography work.

And what of other NRA positions? (shrugs) I’m sure that any group you look closely enough at will stand for something you don’t like. As I put it to a theologically diverse group forming to create a Bible encoding, “I’m looking for allies, not soul mates. I already have one of those.”

You?

PS: As of April, 2014, Overview of Constitutional Challenges to NSA Collection Activities and Recent Developments, is a summary of legal challenges to the NSA. Dated but I thought it might be helpful.

2nd Amendment-Summary-4-Hackers

March 25th, 2015

As promised, not a deeply technical (legal) analysis of District of Columbia v. Heller but a summary of the major themes in Scalia’s opinion for the majority.

District of Columbia v. Heller, 554 U.S. 570, 128 S. Ct. 2783, 171 L. Ed. 2d 637 (2008) [2008 BL 136680] has the following pagination markers:

* U.S. (official)
** S. Ct. (West Publishing)
*** L. Ed. 2d (Lawyers’ Edition, 2nd)
**** BL (Bloomberg Law)

In the text you will see, for example, [*577], which marks the start of page 577 in the official version of the opinion. I use the official pagination herein.
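If you want to pull those markers out of the opinion text yourself, the scheme is regular enough for a regular expression. A minimal sketch, assuming markers always look like one to four asterisks followed by digits inside square brackets; the `REPORTERS` table and `page_markers` function are my own names, keyed to the marker list above.

```python
import re

# Number of asterisks -> reporter, following the marker list above.
REPORTERS = {1: "U.S.", 2: "S. Ct.", 3: "L. Ed. 2d", 4: "BL"}

# One to four asterisks followed by a page number, all inside square brackets.
MARKER = re.compile(r"\[(\*{1,4})(\d+)\]")


def page_markers(text):
    """Return (reporter, page) for each pagination marker found in text."""
    return [(REPORTERS[len(stars)], int(page)) for stars, page in MARKER.findall(text)]


sample = "as distinguished from [****4] technical meaning. [*576]"
print(page_markers(sample))
```

Filtering the result for `"U.S."` gives you the official page breaks, which are the ones I cite.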

Facts: Heller, a police officer, applied for a handgun permit, which was denied. Without a permit, possession of a handgun was banned in the District of Columbia. Even if a permit were obtained, the handgun had to be disabled and unloaded. Heller sued the District, saying that the Second Amendment protects an individual’s right to possess firearms and that the city’s ban on handguns and its non-functioning requirement, even when the handgun is needed for self-defense, infringed on that right.

[Observation: When challenging a law on constitutional grounds, get an appropriate plaintiff to bring the suit. I haven’t done the factual background but I rather doubt that Heller was just an ordinary police officer who decided on his own to sue the District of Columbia. Taking a case to the Supreme Court is an expensive proposition. In challenging laws that infringe on hackers, use security researchers, universities, people with clean reputations. Not saying you can’t win with others but in policy debates it’s better to wear your best clothes.]

Law: Second Amendment: “A well regulated Militia, being necessary to the security of a free State, the right of the people to keep and bear Arms, shall not be infringed.”

Scalia begins by observing:

“[t]he Constitution was written to be understood by the voters; its words and phrases were used in their normal and ordinary as distinguished from [****4] technical meaning.” United States v. Sprague, 282 U. S. 716, 731 (1931); see also Gibbons v. Ogden, 9 Wheat. 1, 188 (1824). [*576]

The crux of Scalia’s argument comes early and is stated quite simply:

The Second Amendment is naturally divided into two parts: its prefatory clause and its operative clause. The former does not limit the latter grammatically, but rather announces a purpose. The Amendment could be rephrased, “Because a well regulated Militia is necessary to the security of a free State, the right of the people to keep and bear Arms shall not be infringed.” [*577]

With that “obvious” construction, Scalia sweeps to one side all arguments that attempt to limit the right to bear arms to a militia context. It’s just an observation, not binding in any way on the operative clause. He does retain it for later use to argue that the interpretation of the operative clause is consistent with that purpose.

Scalia breaks his analysis of the operative clause into the following pieces:

a. “Right of the People.”

b. “Keep and Bear Arms”

c. Meaning of the Operative Clause.

“Right of the People.” In perhaps the strongest part of the opinion, Scalia observes that “right of the people” occurs in the unamended Constitution and Bill of Rights only two other times: the First Amendment (assembly-and-petition clause) and the Fourth Amendment (search-and-seizure clause). The Fourth Amendment has fallen on hard times of late but the First Amendment is still attractive to many. He leaves little doubt that the right to “keep and bear arms” (the next question) is meant to be an individual right. [*579]

“Keep and Bear Arms” Before turning to “keep” and “bear,” Scalia makes two important points with regard to “arms:”

Before addressing the verbs “keep” and “bear,” we interpret their object: “Arms.” The 18th-century meaning is no different from the meaning today. The 1773 edition of Samuel Johnson’s dictionary defined “arms” as “[w]eapons of offence, or armour of defence.” 1 Dictionary of the English Language 106 (4th ed.) (reprinted 1978) (hereinafter Johnson). Timothy Cunningham’s [****6] important 1771 legal dictionary defined “arms” as “any thing that a man wears for his defence, or takes into his hands, or useth in wrath to cast at or strike another.” 1 A New and Complete Law Dictionary; see also N. Webster, American Dictionary of the English Language (1828) (reprinted 1989) (hereinafter Webster) (similar).

The term was applied, then as now, to weapons that were not specifically designed for military use and were not employed in a military capacity. For instance, Cunningham’s legal dictionary gave as an example of usage: “Servants and labourers shall use bows and arrows on Sundays, & c. and not bear other arms.” See also, e.g., An Act for the trial of Negroes, 1797 Del. Laws ch. XLIII, § 6, in 1 First Laws of the State of Delaware 102, 104 (J. Cushing ed. 1981 (pt. 1)); see generally State v. Duke, 42 Tex. 455, 458 (1874) (citing decisions of state courts construing “arms”). Although one founding-era thesaurus limited “arms” (as opposed to “weapons”) to “instruments of offence generally made use of in war,” even that source stated that all firearms constituted “arms.” 1 J. Trusler, The Distinction Between Words Esteemed [*582] Synonymous in the English Language 37 (3d ed. 1794) (emphasis added).

Some have made the argument, bordering on the frivolous, that only those arms in existence in the 18th century are protected by the Second Amendment. We do not interpret constitutional rights that way. Just as the First Amendment protects modern forms of communications, e. g., Reno v. American Civil Liberties Union, 521 U. S. 844, 849 (1997), and the Fourth Amendment applies to modern forms of search, e.g., Kyllo v. United States, 533 U. S. 27, 35-36 (2001), the Second Amendment extends, [**2792] prima facie, to all instruments that constitute bearable arms, even those that were not in existence at the time of the founding. [*581-*582]

Although he says “The 18th-century meaning is no different from the meaning today.” at the outset, the sources cited make it clear that it is the character of an item as a means of offense or defense, generally used in war, that makes it fall into the category “arms.” Which extends to bows and arrows as well as 18th century firearms as well as modern firearms.

“Arms” not limited to 18th Century “Arms”

The second point, particularly relevant to hackers, is that arms are not limited to those existing in the 18th century. Scalia specifically calls out both First and Fourth Amendment cases where rights have evolved along with modern technology. The adaptation to modern technology under those amendments is particularly relevant to making a hacker’s argument under the Second Amendment.

Possession/Bearing Arms

The meaning of “keep arms” requires only a paragraph or two:

Thus, the most natural reading of “keep Arms” in the Second Amendment is to “have weapons.” [*582]

Which settles the possession of arms question, but what about the right to carry such arms?

The notion of “bear arms” devolves into a lively contest of snipes between Scalia and Stevens. You can read both the majority opinion and the dissent if you are interested but the crucial text reads:

We think that JUSTICE GINSBURG accurately captured the natural meaning of “bear arms.” Although the phrase implies that the carrying of the weapon is for the purpose of “offensive or defensive action,” it in no way connotes participation in a structured military organization.

From our review of founding-era sources, we conclude that this natural meaning was also the meaning that “bear arms” had in the 18th century. In numerous instances, “bear arms” was unambiguously used to refer to the carrying of weapons outside of an organized militia. [*584]

I mention that point just in case some wag argues that cyber weapons should be limited to your local militia or that you don’t have the right to carry such weapons on your laptop, cellphone, USB drive, etc.

Meaning of the Operative Clause

c. Meaning of the Operative Clause. [4] Putting all of these textual elements together, we find that they guarantee the individual right to possess and carry weapons in case of confrontation. This meaning is strongly confirmed by the historical background of the Second Amendment. [5] We look to this because it has always been widely understood that the Second Amendment, like the First and Fourth Amendments, codified a pre-existing right. The very text of the Second Amendment implicitly recognizes the pre-existence of the right and declares only that it “shall not be infringed.” As we said in United [****11] States v. Cruikshank, 92 U. S. 542, 553 (1876), “[t]his is not a right granted by the Constitution. Neither is it in any manner dependent upon that instrument for its existence. The [**2798] second amendment declares [***658] that it shall not be infringed. . . .”[fn16] [*592]

You can’t get much better than a pre-existing right, at least not with the current Supreme Court composition. Certainly sounds like it would extend to defending your computer systems, which the government seems loath to undertake.

Motivation for the Second Amendment

Skipping over the literalist interpretation of the prefatory clause, Scalia returns to the relationship between the prefatory and operative clauses. The opinion goes on for twenty-one (21) pages at this point but an early paragraph captures the gist of the argument if not all of its details:

The debate with respect to the right to keep and bear arms, as with other guarantees in the Bill of Rights, was not over whether it was desirable (all agreed that it was) but over whether it needed to be codified in the Constitution. During the 1788 ratification debates, the fear that the Federal Government would disarm the people in order to impose rule through a standing army or select militia was pervasive in Anti-federalist rhetoric. See, e. g., Letters from The Federal Farmer III (Oct. 10, 1787), in 2 The Complete Anti-Federalist 234, 242 (H. Storing ed. 1981). John Smilie, for example, worried not only that Congress’s “command of the militia” could be used to create a “select militia,” or to have “no militia at all,” but also, as a separate concern, that “[w]hen a select militia is formed; the people in general may be disarmed.” 2 Documentary History of the Ratification of the Constitution 508-509 (M. Jensen ed. 1976) (hereinafter [*599] Documentary Hist.). Federalists responded that because Congress was given no power to abridge the ancient right of individuals to keep and bear arms, such a force could never oppress the people. See, e.g., A Pennsylvanian III (Feb. 20, 1788), in The Origin of the Second Amendment 275, [****15] 276 (D. Young ed., 2d ed. 2001) (hereinafter Young); White, To the Citizens of Virginia (Feb. 22, 1788), in id., at 280, 281; A Citizen of America (Oct. 10, 1787), in id., at 38, 40; Foreign Spectator, Remarks on the Amendments to the Federal Constitution, Nov. 7, 1788, in id., at 556. It was understood across the political spectrum that the right helped to secure the ideal of a citizen militia, which might be necessary to oppose an oppressive military force if the constitutional order broke down.[*598-*599]

Whether you choose to emphasize the disarming of the people by regulation of cyberweapons or the overreaching of the Federal government, the language here is clearly of interest in arguing for cyberweapons under the Second Amendment. The majority opinion on this point is found at pages [*598-*619].

Limitations on “Arms”

The right to possess arms, including cyberweapons, isn’t a slam dunk. The Federal and State governments can place some regulations on the possession of arms. One example that Scalia discusses is United States v. Miller, 307 U. S. 174, 179 (1939). Reading Miller:

…to say only that the Second Amendment [**2816] does not protect those weapons not typically possessed by law-abiding citizens for lawful purposes, such as short-barreled shotguns. That accords with the historical understanding of the scope of the right, see Part III, infra.[fn25] [*625]

So hackers will lose on blue boxes (if you know the reference) but quite possibly win on software, code, etc. So far as I know, no one has challenged the right of computer users to protect themselves.

Is there a balancing test for cyber weapons?

The balance of the opinion is concerned with the case at hand and sparring with Justice Breyer but it does have this jewel when it is suggested that the Second Amendment should be subject to a balancing test (a likely argument about cyber weapons):

We know of no other enumerated constitutional right whose core protection has been subjected to a freestanding “interest-balancing” approach. The very enumeration of the right takes out of the hands of government — even the Third Branch of Government — the power to decide on a case-by-case basis whether the right is really worth insisting upon. A constitutional guarantee subject to future judges’ assessments of its usefulness is no constitutional guarantee at all. [15] Constitutional rights are enshrined with the scope they were understood to have when the people adopted [*635] them, whether or not future legislatures or (yes) even future judges think that scope too broad. We would not apply an “interest-balancing” approach to the prohibition of a peaceful neo-Nazi march through Skokie. See National Socialist Party of America v. Skokie, 432 U. S. 43 (1977) (per curiam). The First Amendment contains the freedom-of-speech guarantee that the people ratified, which included exceptions for obscenity, libel, and disclosure of state secrets, but not for the expression of extremely unpopular and wrongheaded views. The Second Amendment is no different. Like the First, it is the very product of an interest balancing by the people — which JUSTICE BREYER would now conduct for them anew. And whatever else it leaves to future evaluation, it surely elevates above all other interests the right of law-abiding, responsible citizens to use arms in defense of hearth and home. [*634-*635]

I rather like the lines:

The very enumeration of the right takes out of the hands of government — even the Third Branch of Government — the power to decide on a case-by-case basis whether the right is really worth insisting upon. A constitutional guarantee subject to future judges’ assessments of its usefulness is no constitutional guarantee at all.

Is the right to privacy no right at all because the intelligence community lapdog FISA court decides in secret when our right to privacy is unnecessary?

Open Issues

Forests have been depopulated to produce the paper required for all the commentaries on District of Columbia v. Heller. What I have penned above is a highly selective summary in hopes of creating interest in a Second Amendment argument for the possession and discussion of cyber weapons.

Open issues include:

• Evolution of the notion of “arms” for the Second Amendment.
• What does it mean to possess a cyber weapon? Is code required? Binary?
• Defensive purposes of knowledge or cyber weapons.
• Analogies to disarming the public.
• Others?

As I suggested in A Well Regulated Militia, a Second Amendment argument to protect our rights to cyber weapons could prove to be more successful than other efforts to date.

Unless you like being disarmed while government funded hackers invade your privacy of course.

Let me know if you are interested in sponsoring research on Second Amendment protection for cyber weapons.

PS: Just so you know, I took my own advice and joined the NRA earlier this week. Fights like this can only be won with allies, strong allies.

Who’s Pissed Off at the United States?

March 24th, 2015

Instances of Use of United States Armed Forces Abroad, 1798-2015 by Barbara Salazar Torreon (Congressional Research Service).

From the summary:

This report lists hundreds of instances in which the United States has used its Armed Forces abroad in situations of military conflict or potential conflict or for other than normal peacetime purposes. It was compiled in part from various older lists and is intended primarily to provide a rough survey of past U.S. military ventures abroad, without reference to the magnitude of the given instance noted. The listing often contains references, especially from 1980 forward, to continuing military deployments, especially U.S. military participation in multinational operations associated with NATO or the United Nations. Most of these post-1980 instances are summaries based on presidential reports to Congress related to the War Powers Resolution. A comprehensive commentary regarding any of the instances listed is not undertaken here.

One of the first steps in security analysis is an evaluation of potential attackers. Who has a reason (in their eyes) to go to the time and trouble of attacking you?

Such a list for the United States doesn’t narrow the field by much but it may help avoid overlooking some of the less obvious candidates. To be sure the United States will keep China and North Korea as convenient whipping boys for any domestic cyber misadventures, but that’s just PR. Why would our largest creditor want to screw with our ability to pay them back? All of the antics about China are street theater, far away from where real decisions are made.

What amazes me is that, despite centuries of misbehavior by one American administration after another, places like Vietnam want to have peaceful relations with us. They aren’t carrying a grudge. Hard to say that for the engineers of one U.S. foreign policy disaster after another.

You could also think of the more recent incidents as the starting point of a list of people to hound from public office and/or public service. Either way, I think you will find it useful.

I first saw this in a tweet by the U.S. Dept. of Fear.

Bulk Collection of Signals Intelligence: Technical Options (2015)

March 24th, 2015

Bulk Collection of Signals Intelligence: Technical Options (2015)

From the webpage:

The Bulk Collection of Signals Intelligence: Technical Options study is a result of an activity called for in Presidential Policy Directive 28 (PPD-28), issued by President Obama in January 2014, to evaluate U.S. signals intelligence practices. The directive instructed the Office of the Director of National Intelligence (ODNI) to produce a report within one year “assessing the feasibility of creating software that would allow the intelligence community more easily to conduct targeted information acquisition rather than bulk collection.” ODNI asked the National Research Council (NRC) — the operating arm of the National Academy of Sciences and National Academy of Engineering — to conduct a study, which began in June 2014, to assist in preparing a response to the President. Over the ensuing months, a committee of experts appointed by the Research Council produced the report.

Useful background information for engaging on the policy side of collecting signals intelligence. Since I don’t share the starting assumption that bulk collection of signals intelligence is ever justified inside the United States, it is only of passing interest to me. I concede that in some limited cases surveillance can be authorized but only under the Fourth Amendment and then only by a constitutional court and not a FISA star chamber.

Distribution, posting, or copying of this PDF is strictly prohibited without written permission of the National Academies Press

Although I doubt I am on any list anywhere, it still isn’t smart to give anyone a free shot.

The main difficulty in challenging such reports is that fictions, invented by the intelligence agencies, are taken as facts. One example is the oft-reported fiction that bulk collection/retention helps when a new figure is identified, by enabling the agencies to review that figure’s past activities. A theoretical possibility, to be sure, but how many such cases there have been, and what that backtracking produced, are unknown. Quite possibly to the intelligence agencies themselves.

If you have identified someone as a current credible threat, perhaps even on their way to commit an illegal act, who is going to worry about their phone conversations several years ago? Of course, that’s where their “logic” for immediate action runs counter to the fact they are simply inventing work for themselves. The more data they collect, the larger their IT budget and the more people needed just in case they ever want to search it. Complete and total farce.

That’s the other reason I oppose bulk signals intelligence collection in the United States: it is an incompetent waste of funds. Funds that could be spent on non-manipulative aid to the people of the Middle East (not their governments), which would greatly reduce the odds of anyone being unhappy enough with the United States to commit a terrorist act on its soil. Despite the fact the United States has committed numerous terrorist attacks on theirs.

Sorting [Visualization]

March 24th, 2015

Carlo Zapponi created http://sorting.at/, a sorting visualization resource that steps through different sorting algorithms. You can choose from four different initial states, six (6) different sizes (5, 10, 20, 50, 75, 100), and six (6) different colors.

The page defaults to Quick Sort and Heap Sort, but under add algorithms you will find:

I added Wikipedia links for the algorithms. For a larger list see:
Sorting algorithm.
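
If you want a feel for what such a visualization is stepping through, here is a minimal sketch in Python. It is not the code behind sorting.at: the function name quicksort_steps and the choice of a last-element pivot (Lomuto partitioning) are assumptions made purely for illustration. It yields a snapshot of the list after every swap, which is roughly the sequence of frames an animation would draw.

```python
def quicksort_steps(values):
    """Yield the initial list, then a snapshot after every swap quicksort makes.

    Hypothetical helper for illustration; uses the last element as pivot.
    """
    data = list(values)
    yield tuple(data)

    def sort(lo, hi):
        if lo >= hi:
            return
        pivot = data[hi]
        i = lo  # next slot for an element smaller than the pivot
        for j in range(lo, hi):
            if data[j] < pivot:
                if i != j:
                    data[i], data[j] = data[j], data[i]
                    yield tuple(data)
                i += 1
        if i != hi:
            # Move the pivot into its final position.
            data[i], data[hi] = data[hi], data[i]
            yield tuple(data)
        yield from sort(lo, i - 1)
        yield from sort(i + 1, hi)

    yield from sort(0, len(data) - 1)


steps = list(quicksort_steps([5, 2, 4, 1, 3]))
for state in steps:
    print(state)
```

Swapping in another algorithm only means writing another generator that yields snapshots in the same way.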

I first saw this in a tweet by Eric Christensen.

Bearing Arms – 2nd Amendment and Hackers – The Constitution

March 23rd, 2015

All discussions of the right to bear arms in the United States start with the Second Amendment. But since words can’t interpret themselves for specific cases, our next stop is the United States Supreme Court.

One popular resource, The Constitution of the United States of America: Analysis and Interpretation (popularly known as the Constitution Annotated), covers the Second Amendment in a scant five (5) pages.

There is a vast sea of literature on the Second Amendment, but one case established that the right to bear arms is an individual right and not limited to state militias.

In District of Columbia vs. Heller, 554 U.S. 570 (2008), Justice Scalia, writing for the majority, found for the first time in U.S. history that the right to bear arms was an individual right.

The unofficial syllabus notes:

The prefatory clause comports with the Court’s interpretation of the operative clause. The “militia” comprised all males physically capable of acting in concert for the common defense. The Antifederalists feared that the Federal Government would disarm the people in order to disable this citizens’ militia, enabling a politicized standing army or a select militia to rule. The response was to deny Congress power to abridge the ancient right of individuals to keep and bear arms, so that the ideal of a citizens’ militia would be preserved. Pp. 22–28.

Interesting yes? Disarm the people in order to enable “…a politicized standing army (read NSA/CIA/FBI/DHS) or a select militia to rule.”

If citizens are prevented from owning hacking software and information, necessary for their own cybersecurity, have they not been disarmed?

Justice Scalia’s opinion is rich in historical detail and I will be teasing out the threads that seem most relevant to an argument that hacking tools and knowledge should fall under the right to bear arms under the Second Amendment.

In the meantime, some resources that you will find interesting/helpful:

District of Columbia v. Heller in Wikipedia is a quick read and a good way to get introduced to the case and the issues it raises. But only as an introduction. You would not perform surgery based on a newspaper report of a surgery. Yes?

A definite step up in analysis is SCOTUSblog, District of Columbia v. Heller. You will find twenty (20) blog posts on Heller, briefs and documents in the case, plus some twenty (20) briefs supporting the petitioner (District of Columbia) and forty-seven (47) briefs supporting the respondent (Heller). Noting that attorneys could be asked questions about any and all of the theories advanced in the various briefs.

Take this as an illustration of why I don’t visit SCOTUSblog as often as I should. I tend to get lost in the analysis and start chasing threads through the opinions and briefs. One of the many joys is that you rarely find anyone with a hand-waving citation “over there, somewhere” as you do in CS literature. Citations are precise or not at all.

No, I don’t propose to drag you through all of the details even of Scalia’s majority opinion but just enough to frame the questions to be answered in making the claim that cyber weapons are the legitimate heirs of arms for purposes of the Second Amendment and entitled to the same protection as firearms.

Do some background reading today and tomorrow. I am re-reading Scalia’s opinion now and will let it soak in for a day or so before posting an outline of it relevant for our purposes. Look for it late on Wednesday, 25 March 2015.

PS: District of Columbia vs. Heller, 554 U.S. 570 (2008), the full opinion plus dissents. A little over one hundred and fifty (150) pages of very precise writing. Enjoy!

Association Rule Mining – Not Your Typical Data Science Algorithm

March 23rd, 2015

From the post:

Many machine learning algorithms that are used for data mining and data science work with numeric data. And many algorithms tend to be very mathematical (such as Support Vector Machines, which we previously discussed). But, association rule mining is perfect for categorical (non-numeric) data and it involves little more than simple counting! That’s the kind of algorithm that MapReduce is really good at, and it can also lead to some really interesting discoveries.

Association rule mining is primarily focused on finding frequent co-occurring associations among a collection of items. It is sometimes referred to as “Market Basket Analysis”, since that was the original application area of association mining. The goal is to find associations of items that occur together more often than you would expect from a random sampling of all possibilities. The classic example of this is the famous Beer and Diapers association that is often mentioned in data mining books. The story goes like this: men who go to the store to buy diapers will also tend to buy beer at the same time. Let us illustrate this with a simple example. Suppose that a store’s retail transactions database includes the following information:

If you aren’t familiar with association rule mining, I think you will find Dr. Borne’s post an entertaining introduction.
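
To make the "little more than simple counting" point concrete, here is a minimal sketch in Python. The transactions are made up for illustration (they are not the data from Dr. Borne's post), and the function name pair_rules and the min_support threshold are my own assumptions. It counts single items and item pairs, then derives support, confidence, and lift for each pair rule.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, invented for this example.
transactions = [
    {"beer", "diapers", "chips"},
    {"diapers", "beer"},
    {"milk", "bread"},
    {"diapers", "milk", "beer"},
    {"bread", "chips"},
]


def pair_rules(transactions, min_support=0.4):
    """Return (lhs, rhs, support, confidence, lift) for frequent item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        item_counts.update(basket)
        pair_counts.update(combinations(sorted(basket), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]        # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)   # vs. rhs base rate
            rules.append((lhs, rhs, support, confidence, lift))
    return rules


for lhs, rhs, support, confidence, lift in pair_rules(transactions):
    print(f"{lhs} -> {rhs}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 says the pair shows up together more often than independent purchases would predict, which is the "more often than you would expect" test in the quoted passage. Real implementations (Apriori, FP-growth) extend the same counting to larger item sets.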

I would not go quite as far as Dr. Borne with “explanations” for the pop-tart purchases before hurricanes. For retail purposes, so long as we spot the pattern, they could be building dikes out of them. The same is the case for other purchases. Take advantage of the patterns and try to avoid second guessing consumers. You can read more about testing patterns in Selling Blue Elephants.

Enjoy!

Polyglot Data Management – Big Data Everywhere Recap

March 23rd, 2015

From the post:

At the Big Data Everywhere conference held in Atlanta, Senior Software Engineer Mike Davis and Senior Solution Architect Matt Anderson from Liaison Technologies gave an in-depth talk titled “Polyglot Data Management,” where they discussed how to build a polyglot data management platform that gives users the flexibility to choose the right tool for the job, instead of being forced into a solution that might not be optimal. They discussed the makeup of an enterprise data management platform and how it can be leveraged to meet a wide variety of business use cases in a scalable, supportable, and configurable way.

Matt began the talk by describing the three components that make up a data management system: structure, governance and performance. “Person data” was presented as a good example when thinking about these different components, as it includes demographic information, sensitive information such as social security numbers and credit card information, as well as public information such as Facebook posts, tweets, and YouTube videos. The data management system components include:

It’s a vendor pitch so read with care but it comes closer than any other pitch I have seen to capturing the dynamic nature of data. Data isn’t the same from every source and you treat it the same at your peril.

If I had to say the pitch has a theme it is to adapt your solutions to your data and goals, not the other way around.

The one place where I may depart from the pitch is on the meaning of “normalization.” True enough, we may want to normalize data a particular way this week, this month, but that should not preclude us from other “normalizations” should our data or requirements change.

The danger I see in “normalization” is that the cost of changing static ontologies, schemas, etc., leads to their continued use long after they have passed their discard dates. If you are as flexible with regard to your information structures as you are your data, then new data or requirements are easier to accommodate.

Or to put it differently, what is the use of being flexible with data if you intend to imprison it in a fixed labyrinth?

Using scikit-learn Pipelines and FeatureUnions

March 23rd, 2015

From the post:

Since I posted a postmortem of my entry to Kaggle's See Click Fix competition, I've meant to keep sharing things that I learn as I improve my machine learning skills. One that I've been meaning to share is scikit-learn's pipeline module. The following is a moderately detailed explanation and a few examples of how I use pipelining when I work on competitions.

The pipeline module of scikit-learn allows you to chain transformers and estimators together in such a way that you can use them as a single unit. This comes in very handy when you need to jump through a few hoops of data extraction, transformation, normalization, and finally train your model (or use it to generate predictions).

When I first started participating in Kaggle competitions, I would invariably get started with some code that looked similar to this:

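import numpy as np
from sklearn.model_selection import KFold
from sklearn.naive_bayes import MultinomialNB

# read_file, extract_targets, extract_essays, get_tokens and extract_features
# are the author's own helpers; they are not defined in this excerpt.
train = read_file('data/train.tsv')
train_y = extract_targets(train)
train_essays = extract_essays(train)
train_tokens = get_tokens(train_essays)
train_features = extract_features(train)
classifier = MultinomialNB()

scores = []
# Fit on each training fold and score on the held-out fold.
for train_idx, cv_idx in KFold().split(train_features):
    classifier.fit(train_features[train_idx], train_y[train_idx])
    scores.append(classifier.score(train_features[cv_idx], train_y[cv_idx]))

print("Score: {}".format(np.mean(scores)))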


Often, this would yield a pretty decent score for a first submission. To improve my ranking on the leaderboard, I would try extracting some more features from the data. Let's say instead of text n-gram counts, I wanted tf–idf. In addition, I wanted to include overall essay length. I might as well throw in misspelling counts while I'm at it. Well, I can just tack those into the implementation of extract_features. I'd extract three matrices of features–one for each of those ideas and then concatenate them along axis 1. Easy.
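
For comparison, here is a rough sketch of how the same idea might look with scikit-learn's Pipeline and FeatureUnion. This is not Zac's code: the essay_lengths helper, the build_pipeline function, and the toy essays and labels are all assumptions made up for illustration. The FeatureUnion does the "concatenate along axis 1" step, and the Pipeline lets the whole chain be fit and used as a single estimator.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer


def essay_lengths(texts):
    """Hypothetical feature: character length of each text, shape (n, 1)."""
    return np.array([[len(t)] for t in texts], dtype=float)


def build_pipeline():
    """Chain feature extraction and a classifier into one estimator."""
    features = FeatureUnion([
        ("tfidf", TfidfVectorizer()),
        ("length", FunctionTransformer(essay_lengths, validate=False)),
    ])
    return Pipeline([
        ("features", features),       # feature matrices stacked column-wise
        ("classifier", MultinomialNB()),
    ])


# Toy data, invented for this sketch.
essays = [
    "the pothole on main street is huge",
    "street light out near the park",
    "graffiti on the bridge again",
    "pothole damaged my tire on elm street",
    "broken street light on oak avenue",
    "more graffiti behind the school",
]
labels = [0, 1, 2, 0, 1, 2]

model = build_pipeline()
model.fit(essays, labels)
predictions = model.predict(essays)
print(predictions)
```

Adding another feature, such as a misspelling count, would mean adding one more named entry to the FeatureUnion rather than editing a monolithic extract_features function. In practice the raw length feature would probably need scaling before it sits beside tf-idf values.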

Zac has quite a bit of practical advice for how to improve your use of scikit-learn. Just what you need to start a week in the Spring!

Enjoy!

I first saw this in a tweet by Vineet Vashishta.

MapR Sandbox Fastest On-Ramp to Hadoop

March 23rd, 2015

MapR Sandbox Fastest On-Ramp to Hadoop

From the webpage:

The MapR Sandbox for Hadoop provides tutorials, demo applications, and browser-based user interfaces to let developers and administrators get started quickly with Hadoop. It is a fully functional Hadoop cluster running in a virtual machine. You can try our Sandbox now – it is completely free and available as a VMware or VirtualBox VM.

If you are a business intelligence analyst or a developer interested in self-service data exploration on Hadoop using SQL and BI Tools, the MapR Sandbox including Apache Drill will get you started quickly. You can download the Drill Sandbox here.

You of course know about the Hortonworks and Cloudera (at the very bottom of the page) sandboxes as well.

Don’t expect a detailed comparison of all three because the features and distributions change too quickly for that to be useful. And my interest is more in capturing the style or approach that may make a difference to a starting user.

Enjoy!

I first saw this in a tweet by Kirk Borne.

Classifying Plankton With Deep Neural Networks

March 23rd, 2015

Classifying Plankton With Deep Neural Networks by Sander Dieleman.

From the post:

The National Data Science Bowl, a data science competition where the goal was to classify images of plankton, has just ended. I participated with six other members of my research lab, the Reservoir lab of prof. Joni Dambre at Ghent University in Belgium. Our team finished 1st! In this post, we’ll explain our approach.

The ≋ Deep Sea ≋ team consisted of Aäron van den Oord, Ira Korshunova, Jeroen Burms, Jonas Degrave, Lionel Pigou, Pieter Buteneers and myself. We are all master students, PhD students and post-docs at Ghent University. We decided to participate together because we are all very interested in deep learning, and a collaborative effort to solve a practical problem is a great way to learn.

There were seven of us, so over the course of three months, we were able to try a plethora of different things, including a bunch of recently published techniques, and a couple of novelties. This blog post was written jointly by the team and will cover all the different ingredients that went into our solution in some detail.

Overview

This blog post is going to be pretty long! Here’s an overview of the different sections. If you want to skip ahead, just click the section title to go there.

Introduction

The problem

The goal of the competition was to classify grayscale images of plankton into one of 121 classes. They were created using an underwater camera that is towed through an area. The resulting images are then used by scientists to determine which species occur in this area, and how common they are. There are typically a lot of these images, and they need to be annotated before any conclusions can be drawn. Automating this process as much as possible should save a lot of time!

The images obtained using the camera were already processed by a segmentation algorithm to identify and isolate individual organisms, and then cropped accordingly. Interestingly, the size of an organism in the resulting images is proportional to its actual size, and does not depend on the distance to the lens of the camera. This means that size carries useful information for the task of identifying the species. In practice it also means that all the images in the dataset have different sizes.

Participants were expected to build a model that produces a probability distribution across the 121 classes for each image. These predicted distributions were scored using the log loss (which corresponds to the negative log likelihood or equivalently the cross-entropy loss).

This loss function has some interesting properties: for one, it is extremely sensitive to overconfident predictions. If your model predicts a probability of 1 for a certain class, and it happens to be wrong, the loss becomes infinite. It is also differentiable, which means that models trained with gradient-based methods (such as neural networks) can optimize it directly – it is unnecessary to use a surrogate loss function.

Interestingly, optimizing the log loss is not quite the same as optimizing classification accuracy. Although the two are obviously correlated, we paid special attention to this because it was often the case that significant improvements to the log loss would barely affect the classification accuracy of the models.
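To make the quoted points about log loss concrete, here is a minimal Python sketch. It is not the team's code: the function names, the three-class toy predictions (the competition had 121 classes) and the clipping constant `eps` are all assumptions made for illustration. Clipping is a common convention that keeps a wrong prediction of probability 1 from producing an infinite loss.

```python
import math


def log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log likelihood of the true class.

    Probabilities are clipped to [eps, 1 - eps] (an assumed convention),
    so a confident wrong answer yields a very large loss, not infinity.
    """
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        p = min(max(probs[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(true_labels)


def accuracy(true_labels, predicted_probs):
    """Fraction of examples whose most probable class is the true class."""
    hits = 0
    for label, probs in zip(true_labels, predicted_probs):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        hits += predicted == label
    return hits / len(true_labels)


# Toy data, invented for this sketch: three images, three classes.
labels = [0, 1, 2]
hesitant = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]
confident = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
# Same as `confident`, except the last image gets probability 1 for a wrong class.
overconfident_wrong = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [1.0, 0.0, 0.0]]

for name, preds in [("hesitant", hesitant),
                    ("confident", confident),
                    ("overconfident_wrong", overconfident_wrong)]:
    print(name, accuracy(labels, preds), log_loss(labels, preds))
```

The hesitant and confident models classify every image correctly, so their accuracy is identical, yet the confident one has a much lower log loss. A single overconfident mistake swamps everything else.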

This rocks!

Code is coming soon to Github!

Certainly of interest to marine scientists but also to anyone in bio-medical imaging.

The problem of too much data and too few experts is a common one.

What I don’t recall seeing are releases of pre-trained classifiers. Is the art developing too quickly for that to be a viable product? Just curious.

I first saw this in a tweet by Angela Zutavern.

ICDM ’15: The 15th IEEE International Conference on Data Mining

March 23rd, 2015

ICDM ’15: The 15th IEEE International Conference on Data Mining November 14-17, 2015, Atlantic City, NJ, USA

Important dates:

All deadlines are at 11:59PM Pacific Daylight Time
* Workshop notification:                             Mar 29, 2015
* ICDM contest proposals:                            Mar 29, 2015
* Full paper submissions:                            Jun 03, 2015
* Demo proposals:                                    Jul 13, 2015
* Workshop paper submissions:                        Jul 20, 2015
* Tutorial proposals:                                Aug 01, 2015
* Conference paper, tutorial, demo notifications:    Aug 25, 2015
* Workshop paper notifications:                      Sep 01, 2015
* Conference dates:                                  Nov 14-17, 2015


From the post:

The IEEE International Conference on Data Mining series (ICDM) has established itself as the world’s premier research conference in data mining. It provides an international forum for presentation of original research results, as well as exchange and dissemination of innovative, practical development experiences. The conference covers all aspects of data mining, including algorithms, software and systems, and applications. ICDM draws researchers and application developers from a wide range of data mining related areas such as statistics, machine learning, pattern recognition, databases and data warehousing, data visualization, knowledge-based systems, and high performance computing. By promoting novel, high quality research findings, and innovative solutions to challenging data mining problems, the conference seeks to continuously advance the state-of-the-art in data mining. Besides the technical program, the conference features workshops, tutorials, panels and, since 2007, the ICDM data mining contest.

Topics of Interest
******************

Topics of interest include, but are not limited to:

* Foundations, algorithms, models, and theory of data mining
* Machine learning and statistical methods for data mining
* Mining text, semi-structured, spatio-temporal, streaming, graph, web, multimedia data
* Data mining systems and platforms, their efficiency, scalability, and privacy
* Data mining in modeling, visualization, personalization, and recommendation
* Applications of data mining in all domains including social, web, bioinformatics, and finance

An excellent conference but unlikely to be as much fun as Balisage. The IEEE conference will be the pocket protector crowd whereas Balisage features a number of wooly-pated truants (think Hobbits), some of whom don’t even wear shoes. Some of them wear hats though. Large colorful hats. Think Mad Hatter and you are close.

If your travel schedule permits do both Balisage and this conference.

Enjoy!

Unstructured Topic Map-Like Data Powering AI

March 23rd, 2015

From the post:

Such mining of digitized information has become more effective and powerful as more info is “tagged” and as analytics engines have gotten smarter. As Dario Gil, Director of Symbiotic Cognitive Systems at IBM Research, told me:

“Data is increasingly tagged and categorized on the Web – as people upload and use data they are also contributing to annotation through their comments and digital footprints. This annotated data is greatly facilitating the training of machine learning algorithms without demanding that the machine-learning experts manually catalogue and index the world. Thanks to computers with massive parallelism, we can use the equivalent of crowdsourcing to learn which algorithms create better answers. For example, when IBM’s Watson computer played ‘Jeopardy!,’ the system used hundreds of scoring engines, and all the hypotheses were fed through the different engines and scored in parallel. It then weighted the algorithms that did a better job to provide a final answer with precision and confidence.”

Granted, the tagging and annotation is unstructured, unlike a topic map, but it is just as unconstrained by first order logic and the other crippling features of RDF and OWL. Out of that mass of annotations, algorithms can construct useful answers.

Imagine what non-experts (Stanford logic refugees need not apply) could author about your domain, to be fed into an AI algorithm. That would take more effort than relying upon users chancing upon subjects of interest but it would also give you greater precision in the results.

Perhaps, just perhaps, one of the errors in the early topic maps days was the insistence on high editorial quality at the outset, as opposed to allowing editorial quality to emerge out of data.

As an editor I’m far more in favor of the former than the latter but seeing the latter work, makes me doubt that stringent editorial control is the only path to an acceptable degree of editorial quality.

What would a rough-cut topic map authoring interface look like?

Suggestions?

Pwn2Own +1!

March 23rd, 2015

Paul details the results from Pwn2Own 2015 and gives a great run down on the background of the contest. A must read if you are interested in cybersecurity competitions. Here the targets were:

• Windows
• Microsoft IE 11
• Mozilla Firefox
• Apple Safari

Bugs were found in all and system access obtained in four cases.

I mention this in part to ask you to participate in Paul’s poll on whether Pwn2Own contests are a good idea.

As you can imagine, I think they rock!

Assuming the winners did devote a substantial amount of time prior to the contest, a $110,000 prize (won by one winner) is no small matter.

Paul cites critics as saying:

it makes security molehills into theatrical mountains.

I don’t know who the critics are but system level access sounds like more than a molehill to me.

Critics of Pwn2Own are dour faced folks who want bugs reported to vendors, with unlimited time for the vendors to fix them, whether the vendors acknowledge the report or not. And if they do acknowledge it, you should be satisfied with an “atta boy/girl” and maybe a free year’s subscription to a PC gaming zine.

Let’s see, vendors sell buggy software for a profit, accept no liability for it, abuse/neglect reporters of bugs, and then want reporters of bugs to contribute their work for free. Plus keep your knowledge secret for the “good of the community.”

Do you see a pattern there?

Screw that!

Vote in favor of Pwn2Own and organize similar events!

From Nand to Tetris / Part I [“Not for everybody.”]

March 23rd, 2015

From Nand to Tetris / Part I April 11 – June 7 2015

From the webpage:

Build a modern computer system, starting from first principles. The course consists of six weekly hands-on projects that take you from constructing elementary logic gates all the way to building a fully functioning general purpose computer. In the process, you will learn — in the most direct and intimate way — how computers work, and how they are designed.

This course is a fascinating 7-week voyage of discovery in which you will go all the way from Boolean algebra and elementary logic gates to building a central processing unit, a memory system, and a hardware platform, leading up to a general-purpose computer that can run any program that you fancy. In the process of building this computer you will become familiar with many important hardware abstractions, and you will implement them, hands on. But most of all, you will enjoy the tremendous thrill of building a complex and useful system from the ground up.

You will build all the hardware modules on your home computer, using a Hardware Description Language (HDL), learned in the course, and a hardware simulator, supplied by us. A hardware simulator is a software system that enables building and simulating gates and chips before actually committing them to silicon. This is exactly what hardware engineers do in practice: they build and test computers in simulation, using HDL and hardware simulators.
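To give a flavor of what "starting from Nand" means, here is a minimal sketch in Python. It is not the course's HDL and does not use its simulator; the function names are hypothetical and the wiring shown is just one standard way to derive the other gates from a single Nand primitive.

```python
def nand(a: int, b: int) -> int:
    """The only primitive: output is 0 only when both inputs are 1."""
    return 0 if (a and b) else 1


def not_(a: int) -> int:
    # Feeding the same signal to both Nand inputs inverts it.
    return nand(a, a)


def and_(a: int, b: int) -> int:
    # And is Nand followed by Not.
    return not_(nand(a, b))


def or_(a: int, b: int) -> int:
    # De Morgan: a OR b == NOT(NOT a AND NOT b).
    return nand(not_(a), not_(b))


def xor(a: int, b: int) -> int:
    # True when at least one input is 1, but not both.
    return and_(or_(a, b), nand(a, b))


def half_adder(a: int, b: int) -> tuple:
    """Add two bits and return (sum, carry)."""
    return xor(a, b), and_(a, b)


# Print the truth table of the half adder.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, half_adder(a, b))
```

Every function above bottoms out in calls to `nand`, which is the point: once you have one universal gate, the rest is composition.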

Do you trust locks?

Do you know how locks work?

I don’t and yet I trust locks to work. But then a lock requires physical presence to be opened and locks do have a history of defeating attempts to unlock them without the key. Not always but a high percentage of the time.

Do you trust computers?

Do you know how computers work?

I don’t, not really. Not at the level of silicon.

So why would I trust computers? We know computers are as faithful as a napkin at a party and have no history of being secure, for anyone.

Necessity seems like a weak answer doesn’t it? Trusting computers to be insecure seems like a better answer.

Not that everyone wants or needs to delve into computers at the level of silicon but exposure to the topic doesn’t hurt.

Might even help when you hear of hardware hacks like rowhammer. You don’t really think that is the last of the hardware hacks do you? Seriously?

BTW, I first read about this course in the Clojure Gazette, which is a great read, whether you are a Clojure programmer or not. Take a look and consider subscribing. Another reason to subscribe is that it lists a snail mail address of New Orleans, Louisiana.

Even the fast food places have good food in New Orleans. The non-fast food has to be experienced. Words are not enough. It would be like trying to describe sex to someone who has only read about it. Just not the same. Every conference should be in New Orleans every two or three years.

After you get through day-dreaming about New Orleans, go ahead and register for From Nand to Tetris / Part I April 11 – June 7 2015

A Well Regulated Militia

March 22nd, 2015

From the post:

The National Security Agency want to be able to hack more people, vacuum up even more of your internet records and have the keys to tech companies’ encryption – and, after 18 months of embarrassing inaction from Congress on surveillance reform, the NSA is now lobbying it for more powers, not less.

NSA director Mike Rogers testified in front of a Senate committee this week, lamenting that the poor ol’ NSA just doesn’t have the “cyber-offensive” capabilities (read: the ability to hack people) it needs to adequately defend the US. How cyber-attacking countries will help cyber-defense is anybody’s guess, but the idea that the NSA is somehow hamstrung is absurd.

Like everyone else I like reading hacking stories, particularly the more colorful ones! But for me, at least until now, hacking has been like debugging core dumps, it’s an interesting technical exercise but not much more than that.

I am incurious about the gossip the NSA is sweeping up for code word access, but I am convinced that we all need a strong arm to defend our digital privacy and the right to tools to protect ourselves.

The dangers to citizens have changed since James Madison wrote in the Bill of Rights:

“A well regulated Militia, being necessary to the security of a free State, the right of the people to keep and bear Arms, shall not be infringed.”

In 1789, oppression and warfare were conducted with muzzle loaders and swords. Guns are still a common means of oppression, but the tools of oppression have grown since 1789. Then there was no mass surveillance of phone traffic, bank accounts, camera feeds, not to mention harvesting of all network traffic. Now, all of those things exist.

Our reading of the Second Amendment needs to be updated to include computers, software developed for hacking, training for hackers and research on hacking. Knowing how to break encryption isn’t the same thing as illegally breaking encryption. It is a good way to test whether the promised encryption will exclude prying government eyes.

I’m not interested in feel good victories that come years after over reaching by the government. It’s time for someone to take up the gage that the NSA has flung down in the street. Someone who traffics in political futures and isn’t afraid to get their hands dirty.

The NRA has been a long term and successful advocate for Second Amendment rights. And they have political connections that would take years to develop. When was the last time you heard of the NRA winning symbolic victories for someone after they had been victimized? Or do you hear of victories by the NRA before their membership is harmed by legislation? Such as anti-hacking legislation.

Since the NRA is an established defender of the Second Amendment, with a lot of political clout, let’s work on expanding the definition of “arms” in the Second Amendment to include computers, knowledge of how to break encryption and security systems, etc.

The first step is to join the NRA (like everybody, they listen to paying members first).

The second step is to educate other NRA members and the public about the danger posed by unchecked government cyberpower. Current NRA members may die with their guns in hand but government snoops know what weapons they have, ammunition, known associates, and all of that is without gun registration. A machine pistol is a real mis-match against digital government surveillance. As in the losing side.

The third step is to start training yourself as a hacker. Set up a small network at home so you can educate yourself, off of public networks, about the weaknesses of hardware and software. Create or join computer clubs dedicated to learning hacking arts.

BTW, the people urging you to hack Y12 (a nuclear weapons facility), Chase and the White House are all FBI plants. Privately circulate their biometrics to other clubs. Better informants that have been identified than unknowns. Promptly report all illegal suggestions from plants. You will have the security agencies chasing their own tails.

Take this as a warm-up. I need to dust off some of my Second Amendment history. Suggestions and comments are always welcome.

Looking forward to the day when even passive government surveillance sets off alarms all over the net.

Balisage submissions are due on April 17th

March 21st, 2015

Balisage submissions are due on April 17th!

Yeah, that’s what I thought when I saw the email from Tommie Usdin earlier this week!

Tommie writes:

Just a friendly reminder: Balisage submissions are due on April 17th! That’s just under a month.

Do you want to speak at Balisage? Participate in the pre-conference symposium on Cultural Heritage Markup? Then it is time to put some work in on your paper!

See the Call for Participations at:

http://www.balisage.net/Call4Participation.html

http://www.balisage.net/CulturalHeritage/index.html

Instructions for authors: http://www.balisage.net/authorinstructions.html

Do you need help with the mechanics of your Balisage submission? If we can help please send email to info@balisage.net

It can’t be the case that the deep learning, GPU toting AI folks have had all the fun this past year. After all, without data they would not have anything to be sexy about. Or is that with? Never really sure with those folks.

What I am sure about is that the markup folks at Balisage are poised to save Big Data from becoming Big Dark Data without any semantics.

But they can’t do it without your help! Will you stand by and let darkness cover all of Big Data or will you fight to preserve markup and the semantics it carries?

Sharpen your markup! Back to back, our transparency against the legions of darkness.

Well, it may not get that radical because Tommie is such a nice person but she has to sleep sometime. After she’s asleep, then we rumble.

Be there!

FaceNet: A Unified Embedding for Face Recognition and Clustering

March 21st, 2015

Abstract:

Despite significant recent advances in the field of face recognition, implementing face verification and recognition efficiently at scale presents serious challenges to current approaches. In this paper we present a system, called FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be easily implemented using standard techniques with FaceNet embeddings as feature vectors.

Our method uses a deep convolutional network trained to directly optimize the embedding itself, rather than an intermediate bottleneck layer as in previous deep learning approaches. To train, we use triplets of roughly aligned matching / non-matching face patches generated using a novel online triplet mining method. The benefit of our approach is much greater representational efficiency: we achieve state-of-the-art face recognition performance using only 128-bytes per face.

On the widely used Labeled Faces in the Wild (LFW) dataset, our system achieves a new record accuracy of 99.63%. On YouTube Faces DB it achieves 95.12%. Our system cuts the error rate in comparison to the best published result by 30% on both datasets. (emphasis in the original)
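The abstract mentions training on triplets and embeddings where distance tracks face similarity. A minimal Python sketch of a triplet loss on unit-length embeddings follows. This is an illustration, not the FaceNet implementation: the two-dimensional toy vectors, the function names and the margin value are assumptions, and real embeddings would come from the network rather than being typed in by hand.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so embeddings lie on a hypersphere."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def squared_distance(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the anchor is closer to the positive (same identity)
    than to the negative (different identity) by at least `margin`.
    The default margin is an assumed value for this sketch.
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(gap + margin, 0.0)


# Toy 2-D embeddings, invented for illustration.
anchor = l2_normalize([1.0, 0.0])
positive = l2_normalize([0.9, 0.1])   # same person, slightly different image
negative = l2_normalize([0.0, 1.0])   # different person

print(triplet_loss(anchor, positive, negative))  # well separated triplet
print(triplet_loss(anchor, negative, positive))  # roles swapped: penalized
```

Triplets that already satisfy the margin contribute nothing to the loss, which is why the abstract's "triplet mining" (choosing triplets that still violate it) matters for training.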

With accuracy at 99.63%, the possibilities are nearly endless.

How long will it be before some start-up is buying ATM feeds from banks? Fast and accurate location information would be of interest to process servers, law enforcement, debt collectors, various government agencies, etc.

Looking a bit further ahead, ATM surrogate services will become a feature of better hotels and escort services.

GCHQ May Be Spying On You!

March 21st, 2015

GCHQ, like many similar agencies, has been given carte blanche to snoop around the world.

Dave reports that GCHQ has responded to this disclosure not with denial but by protesting that it would never ever snoop without following all of the rules, except for those against snooping of course.

What fails in almost every government scandal isn’t the safeguards against wrong doing, but rather the safeguards against anyone discovering the wrong doing. Yes? So it isn’t that the government doesn’t lie, cheat, abuse, etc., but that they are seldom caught. Safeguards against government violating its own restrictions seem particularly weak.

The UK and other governments fail to realize every retreat from the rule of law damages the legitimacy of that government. If they think governing is difficult now, imagine the issues when the average citizen obeys the law only with due regard to the proximity of a police officer. People joke about that now but watch people obey even mindless traffic rules. To say nothing of more serious rules.

The further governments retreat into convenience-of-the-moment decision making, the less call they will have on the average citizen to “do the right thing.” Why should they? Their leadership has set the example that whether it is lying to get elected (Benjamin Netanyahu), lying to start a war (George W. Bush) or lying to get funding (Michael Rogers), it’s ok.

Since GCHQ has decided it isn’t subject to the law, would you report a plot against GCHQ or the UK government? (Assume you just overheard it and weren’t involved.)

Memantic: A Medical Knowledge Discovery Engine

March 21st, 2015

Abstract:

We present a system that constructs and maintains an up-to-date co-occurrence network of medical concepts based on continuously mining the latest biomedical literature. Users can explore this network visually via a concise online interface to quickly discover important and novel relationships between medical entities. This enables users to rapidly gain contextual understanding of their medical topics of interest, and we believe this constitutes a significant user experience improvement over contemporary search engines operating in the biomedical literature domain.

Alexei takes advantage of prior work on medical literature to index and display searches of medical literature in an “economical” way that can enable researchers to discover new relationships in the literature without being overwhelmed by bibliographic detail.

You will need to check my summary against the article but here is how I would describe Memantic:

Memantic indexes medical literature and records the co-occurrences of terms in every text. Those terms are mapped into a standard medical ontology (which reduces screen clutter). When a search is performed, the results are displayed as nodes based on the medical ontology and include relationships established by the co-occurrences found during indexing. This enables users to find relationships without the necessity of searching through multiple articles or deduping their search results manually.
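As a rough illustration of the indexing step in that summary, here is a minimal Python sketch of counting term co-occurrences across documents and looking up a term's neighbors. It is not Memantic's code. The function names and the toy documents are invented, and it assumes the terms have already been mapped to ontology concepts.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for every pair of terms, how many documents mention both.

    Each document is a list of terms. Pairs are stored in sorted order,
    so (a, b) and (b, a) share one counter.
    """
    counts = Counter()
    for terms in documents:
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts


def neighbors(counts, term):
    """Terms that co-occur with `term`, most frequent first."""
    related = {}
    for (a, b), n in counts.items():
        if a == term:
            related[b] = n
        elif b == term:
            related[a] = n
    return sorted(related.items(), key=lambda item: (-item[1], item[0]))


# Toy "literature", invented for this sketch.
docs = [
    ["aspirin", "headache", "stroke"],
    ["aspirin", "stroke"],
    ["headache", "migraine"],
]

counts = cooccurrence_counts(docs)
print(neighbors(counts, "aspirin"))
```

The pair counts are the edges of the network and the terms are its nodes, so a search for one term can return its strongest neighbors without the user opening each article.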

As I understand it, Memantic is as much an effort at efficient visualization as it is an improvement in search technique.

Very much worth a slow read over the weekend!

I first saw this in a tweet by Sami Ghazali.

PS: I tried viewing the videos listed in the paper but wasn’t able to get any sound? Maybe you will have better luck.

Where’s the big data?

March 21st, 2015

Alex Woodie in Can’t Ignore the Big Data Revolution draws our attention to: Big Data Revolution by Rob Thomas and Patrick McSharry.

Not the first nor likely the last book on “big data,” but it did draw these comments from Thomas Hale:

Despite all the figures, though, the revolution is not entirely quantified after all. The material costs to businesses implied by installing data infrastructure, outsourcing data management to other companies, or storing data, are rarely enumerated. Given the variety of industries the authors tackle, this is understandable. But it seems the cost of the revolution (something big data itself might be inclined to predict) remains unknown.

The book is perhaps most interesting as a case study of the philosophical assumptions that underpin the growing obsession with data. Leaders of the revolution will have “the ability to suspend disbelief of what is possible, and to create their own definition of possible,” the authors write.

Their prose draws heavily on similar invocations of technological idealism, with the use of words such as “enlightenment”, “democratise”, “knowledge-based society” and “inspire”.

Part of their idea of progress implies a need to shift from opinion to fact. “Modern medicine is being governed by human judgment (opinion and bias), instead of data-based science,” state the authors.

Hale comes close but falls short of the mark when he excuses the lack of data to justify the revolution.

The principal irony of this book and others in the big data orthodoxy is the lack of big data to justify the claims made on behalf of big data. If the evidence is lacking because big data isn’t in wide use, then the claims for big data are not “data-based” are they?

The claims for big data take on a more religious tinge, particularly when readers are urged to “suspend disbelief,” create new definitions of possible, to seek “enlightenment,” etc.

You may remember the near religious hysteria around intelligent agents and the Semantic Web, the remnants of which are still entangling libraries and government projects who haven’t gotten the word that it failed. In part because information issues are indifferent to the religious beliefs of humans.

The same is the case with both the problems and benefits of big data. Whatever you believe them to be, those problems and benefits are deeply indifferent to your beliefs. What is more, your beliefs can’t change the nature of those problems and benefits.

Shouldn’t a “big data” book be data-driven and not the product of “human judgment (opinion and bias)”?

The question careful readers will ask, hopefully before purchasing a copy of Big Data Revolution and thereby encouraging more publications on “big data,” is:

Where’s the big data?

You can judge whether to purchase the volume on the basis of the answer to that question.

PS: Make no mistake, data can have value. But, spontaneous generation of value by piling data into ever increasing piles is just as bogus as spontaneous generation of life.

PPS: Your first tip off that there is no “big data” is the appearance of the study in book form. If there were “big data” to support their conclusions, you would need cloud storage to host it and tools to manipulate it. In that case, why do you need the print book?

Turning the MS Battleship

March 21st, 2015

Improving interoperability with DOM L3 XPath by Thomas Moore.

From the post:

As part of our ongoing focus on interoperability with the modern Web, we’ve been working on addressing an interoperability gap by writing an implementation of DOM L3 XPath in the Windows 10 Web platform. Today we’d like to share how we are closing this gap in Project Spartan’s new rendering engine with data from the modern Web.

Some History

Prior to IE’s support for DOM L3 Core and native XML documents in IE9, MSXML provided any XML handling and functionality to the Web as an ActiveX object. In addition to XMLHttpRequest, MSXML supported the XPath language through its own APIs, selectSingleNode and selectNodes. For applications based on and XML documents originating from MSXML, this works just fine. However, this doesn’t follow the W3C standards for interacting with XML documents or exposing XPath.

To accommodate a diversity of browsers, sites and libraries wrap XPath calls to switch to the right implementation. If you search for XPath examples or tutorials, you’ll immediately find results that check for IE-specific code to use MSXML for evaluating the query in a non-interoperable way:

It seems like a long time ago that a relatively senior Microsoft staffer told me that turning a battleship like MS takes time. No change, however important, is going to happen quickly. Just the way things are in a large organization.

The important thing to remember is that once change starts, that too takes on a certain momentum and so is more likely to continue, even though it was hard to get started.

Yes, I am sure the present steps towards greater interoperability could have gone further, in another direction, etc. but they didn’t. Rather than complain about the present change for the better, why not use that as a wedge to push for greater support for more recent XML standards?

For my part, I guess I need to get a copy of Windows 10 on a VM so I can volunteer as a beta tester for full XPath (XQuery?/XSLT?) support in a future web browser. MS as a full XML competitor and possible source of open source software would generate some excitement in the XML community!

NSA Chief Cries Wolf! (Again)

March 21st, 2015

Cyber Attackers Leaving Warning ‘Messages’: NSA Chief

From the post:

Admiral Michael Rogers, director of the National Security Agency and head of the Pentagon’s US Cyber Command, made the comments to a US Senate panel as he warned about the growing sophistication of cyber threats.

“Private security researchers over the last year have reported on numerous malware finds in the industrial control systems of energy sector organizations,” Rogers said in written testimony.

Of particular risk is so-called critical infrastructure networks — power grids, transportation, water and air traffic control, for example — where a computer outage could be devastating.

Rogers added that the military is about halfway toward building its new cyber defense corps of 6,200 which could help in defending the nation against cyber attacks.

Wait for it…

But he told the lawmakers on the Armed Services Committee that any budget cuts or delays in authorizing funds “will slow the build of our cyber teams” and hurt US defense efforts in cyberspace. (emphasis added)

So, the real issue is that Admiral Rogers doesn’t want to lose funding. Why didn’t he just say that and skip lying about the threat to infrastructure?

The Naval Academy Honor Concept doesn’t back Rogers on this point:

They tell the truth and ensure that the full truth is known. They do not lie.

Ted G. Lewis in Critical Infrastructure Protection in Homeland Security notes:

Digital Pearl Harbors are unlikely. Infrastructure systems, because they have to deal with failure on a routine basis, are also more flexible and responsive in restoring service than early analysts realized. Cyber attacks, unless accompanied by a simultaneous physical attack that achieves physical damage, are short-lived and ineffective.

Everyone in the United States has experienced loss of electrical power or telephone communications due to bad weather. Moreover, industrial control systems aren’t part of the Internet.

Rogers is training “cyber-warriors” for the wrong battlefield. Rogers can’t get access to the private networks where Stuxnet, etc., might be a problem so he is training “cyber-warriors” to fight where they can get access.

Huh? Isn’t that rather dumb? Training to fight on the Internet when the attack will come by invasion of private networks? That doesn’t sound like a winning strategy to me. Maybe Rogers doesn’t know the difference between the Internet and private networks. They do both use network cabling.

It’s not just me that disagrees with Admiral Rogers’ long face about critical infrastructure. James Clapper (you remember, the habitual liar to Congress?), who is also Director of National Intelligence, disagrees with Rogers:

If there is good news, he said, it is that a catastrophic destruction of infrastructure appears unlikely.

“Cyber threats to U.S. national and economic security are increasing in frequency, scale, sophistication, and severity of impact,” the written assessment says. “Rather than a ‘Cyber Armageddon’ scenario that debilitates the entire US infrastructure, we envision something different. We foresee an ongoing series of low-to-moderate level cyber attacks from a variety of sources over time, which will impose cumulative costs on U.S. economic competitiveness and national security.”

Of course, Clapper may be lying again. But he could be accidentally telling the truth. Picked up the wrong briefing paper on his way out of the office. Mistakes do happen.

Unless and until Admiral Rogers specifies the “…numerous malware finds in the industrial control systems….” and specifies how his “cyber-warriors” have the ability to stop such malware attacks, all funding for the program should cease.

Connecting the dots in procurement of cybersecurity services could provide more protection to United States infrastructure than stopping every cyber attack over the next several years.

March 20th, 2015

Hacking Your Neighbor’s Wi-Fi: Practical Attacks Against Wi-Fi Security

From the post:

While the access points in organizations are usually under the protection of organization-wide security policies, home routers are less likely to be appropriately configured by their owners in absence of such central control. This provides a window of opportunity to neighboring Wi-Fi hackers. We talk about hacking a neighbor’s Wi-Fi since proximity to the access point is a must for wireless hacking—which is not an issue for a neighbor with an external antenna. With abundance of automated Wi-Fi hacking tools such as ‘Wifite’, it no longer takes a skilled attacker to breach Wi-Fi security. Chances are high that one of your tech-savvy neighbors would eventually exploit a poorly configured access point. The purpose may or may not be malicious; sometimes it may simply be out of curiosity. However, it is best to be aware of and secure your Wi-Fi against attacks from such parties.

For all the attention that bank and insurance company hacks get, having your own Wi-Fi hacked would be personally annoying.

Take the opportunity to check and correct any Wi-Fi security issues with your network. If you aren’t an easy target, it may encourage script kiddies to go elsewhere. And it could make life more difficult for the alphabet agencies, which is always a plus.

I first saw this in a tweet by NuHarbor Security.