Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 10, 2018

Microsoft Open-Sources Patent Portfolio: OIN ~1,300 + 60,000 = ~61,300 Patents

Filed under: Open Source,Patents — Patrick Durusau @ 1:06 pm

Kudos! Microsoft Open-Sources Patent Portfolio by Steven J Vaughan-Nichols.

From the post:

Several years ago, I said the one thing Microsoft has to do — to convince everyone in open source that it’s truly an open-source supporter — is stop using its patents against Android vendors. Now, it’s joined the Open Invention Network (OIN), an open-source patent consortium. Microsoft has essentially agreed to grant a royalty-free and unrestricted license to its entire patent portfolio to all other OIN members.

Before Microsoft joined, OIN had more than 2,650 community members and owns more than 1,300 global patents and applications. OIN is the largest patent non-aggression community in history and represents a core set of open-source intellectual-property values. Its members include Google, IBM, Red Hat, and SUSE. The OIN patent license and member cross-licenses are available royalty-free to anyone who joins the OIN community.

In a conversation, Erich Andersen, Microsoft’s corporate vice president and chief intellectual property (IP) counsel — that is, Microsoft top patent person — added: We “pledge our entire patent portfolio to the Linux system. That’s not just the Linux kernel, but other packages built on it.”

This is huge

How many patents does this affect? Andersen said Microsoft is bringing 60,000 patents to OIN.
(emphasis in original)

If approximately 1,300 patents attracted members to Open Invention Network (OIN), imagine the attractive force exerted by an additional 60,000!

Suggestion: None of us are who we were yesterday, much less ten or twenty years ago. Let’s take these new facts on the patent landscape and move forward.

Discussions of “could have, should have, what if had, etc.,” are non-contributions to building a new tomorrow.

October 4, 2018

Patent Prior Art Archive – Malware Prior Art?

Filed under: Cybersecurity,Malware,Patents — Patrick Durusau @ 8:18 am

Coming together to create a prior art archive by Ian Wetherbee and Mike Lee.

From the post:

Patent quality is a two-way street. Patent applicants should submit detailed disclosures describing their inventions and actively participate in the examination process to define clear distinctions between their inventions and existing technology. Examiners reviewing patent applications should conduct thorough searches of existing technology, reject any attempts to patent existing technology, and develop a clear record of the differences between the patent claims and what came before. The more that the patent system supports and incentivizes these activities, the more reliable the rights that issue from patent offices will be, and the more those patents will promote innovation.

A healthy patent system requires that patent applicants and examiners be able to find and access the best documentation of state-of-the-art technology. This documentation is often found in sources other than patents. Non-patent literature can be particularly hard to find and access in the software field, where it may take the form of user manuals, technical specifications, or product marketing materials. Without access to this information, patent offices may issue patents covering existing technology, or not recognize trivial extensions of published research, removing the public’s right to use it and bringing the reliability of patent rights into question.

To address this problem, academia and industry have worked together to launch the Prior Art Archive, created through a collaboration between the MIT Media Lab, Cisco and the USPTO, and hosted by MIT. The Prior Art Archive is a new, open access system that allows anyone to upload those hard-to-find technical materials and make them easily searchable by everyone.

Believe it or not, Wetherbee and Lee write an entire post on Google and the Prior Art Archive, without ever giving the web address of the Prior Art Archive.

There, fixed that problem on the web. 😉 You know, it’s possible to be so self-centered as to be self-defeating.

The problems of malware prior art are orders of magnitude greater than patent prior art. The literature, posts, etc., alone are spread across ephemeral and often inaccessible forums, blogs, emails, chat groups, to say nothing of the self-defeating secrecy of security researchers themselves. (Not to mention information in languages other than English.)

A malware prior art archive would present numerous indexing, searching, machine translation, clustering and other problems. Perhaps not as lucrative as the results of the Patent Prior Art Archive but at least as interesting.

Thoughts? Suggestions?

PS: You can search the Prior Art Archive through Google Patents. Two other relevant Google resources: TDCommons (non-patented information) and Google Patents Public Datasets.

August 30, 2016

Elsevier Awarded U.S. Patent For “Online Peer Review System and Method” [Sack Pool?]

Filed under: Intellectual Property (IP),Patents — Patrick Durusau @ 7:19 pm

Elsevier Awarded U.S. Patent For “Online Peer Review System and Method” by Gary Price.

Gary quotes the abstract:

An online document management system is disclosed. In one embodiment, the online document management system comprises: one or more editorial computers operated by one or more administrators or editors, the editorial computers send invitations and manage peer review of document submissions; one or more system computers, the system computers maintain journals, records of submitted documents and user profiles, and issue notifications; and one or more user computers; the user computers submit documents or revisions to the document management system; wherein one or more of the editorial computers coordinate with one or more of the system computers to migrate one or more documents between journals maintained by the online document management system.

Is there a pool on whether the staff who recommended and pursued that patent will be given the sack within the next week?

March 8, 2016

Patent Sickness Spreads [Open Source Projects on Prior Art?]

Filed under: Intellectual Property (IP),Natural Language Processing,Patents,Searching — Patrick Durusau @ 7:31 pm

James Cook reports a new occurrence of patent sickness in Facebook has an idea for software that detects cool new slang before it goes mainstream.

The most helpful part of James’ post is the graphic outline of the “process” patented by Facebook:

[Image: facebook-patent]

I sure do hope James has not patented that presentation because it makes the Facebook patent, err, clear.

Quick show of hands on originality?

While researching this post, I ran across Open Source as Prior Art at the Linux Foundation. Are there other public projects that research and post prior art with regard to particular patents?

An armory of weapons for opposing ill-advised patents.

The Facebook patent is: 9,280,534 Hauser, et al. March 8, 2016, Generating a social glossary:

Its abstract:

Particular embodiments determine that a textual term is not associated with a known meaning. The textual term may be related to one or more users of the social-networking system. A determination is made as to whether the textual term should be added to a glossary. If so, then the textual term is added to the glossary. Information related to one or more textual terms in the glossary is provided to enhance auto-correction, provide predictive text input suggestions, or augment social graph data. Particular embodiments discover new textual terms by mining information, wherein the information was received from one or more users of the social-networking system, was generated for one or more users of the social-networking system, is marked as being associated with one or more users of the social-networking system, or includes an identifier for each of one or more users of the social-networking system. (emphasis in original)
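To help with the show of hands, here is a minimal Python sketch of one plain reading of the abstract: find terms with no known meaning and add them to a glossary once enough distinct users have used them. The `known_words` set, the `min_users` threshold, the input format and the function name are all assumptions made for illustration; none of it is taken from the patent's claims or from Facebook's implementation.

```python
import re
from collections import defaultdict

def update_glossary(posts, known_words, glossary, min_users=2):
    """Add unknown terms used by at least `min_users` distinct users.

    posts: iterable of (user_id, text) pairs (hypothetical input format).
    """
    users_by_term = defaultdict(set)
    for user_id, text in posts:
        for term in re.findall(r"[a-z']+", text.lower()):
            # A term "not associated with a known meaning" is read here as
            # simply absent from the dictionary and from the glossary.
            if term not in known_words and term not in glossary:
                users_by_term[term].add(user_id)
    for term, users in users_by_term.items():
        if len(users) >= min_users:
            glossary.add(term)
    return glossary

posts = [
    ("u1", "that concert was yeet"),
    ("u2", "yeet the old phone"),
    ("u3", "the concert was fine"),
]
known = {"that", "concert", "was", "the", "old", "phone", "fine"}
glossary = update_glossary(posts, known, set())
```

With this sample data, only "yeet" is unknown and used by two users, so it is the only term added.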

February 22, 2016

U.S. Patents Requirements: Novel/Non-Obvious or Patent Fee?

Filed under: Intellectual Property (IP),Patents,Searching — Patrick Durusau @ 8:34 am

IBM brags about its ranking in patents granted, IBM First in Patents for 23rd Consecutive Year, and is particularly proud of patent 9087304, saying:

We’ve all been served up search results we weren’t sure about, whether they were for “the best tacos in town” or “how to tell if your dog has eaten chocolate.” With IBM Patent no. 9087304, you no longer have to second-guess the answers you’re given. This new tech helps cognitive machines find the best potential answers to your questions by thinking critically about the trustworthiness and accuracy of each source. Simply put, these machines can use their own judgment to separate the right information from wrong. (From: http://ibmblr.tumblr.com/post/139624929596/weve-all-been-served-up-search-results-we-werent)

Did you notice that the 1st for 23 years post did not have a single link for any of the patents mentioned?

You would think IBM would be proud enough to link to its new patents and especially 9087304, that “…separate[s] right information from wrong.”

But if you follow the link for 9087304, you get an impression of one reason IBM didn’t include the link.

The abstract for 9087304 reads:

Method, computer program product, and system to perform an operation for a deep question answering system. The operation begins by computing a concept score for a first concept in a first case received by the deep question answering system, the concept score being based on a machine learning concept model for the first concept. The operation then excludes the first concept from consideration when analyzing a candidate answer and an item of supporting evidence to generate a response to the first case upon determining that the concept score does not exceed a predefined concept minimum weight threshold. The operation then increases a weight applied to the first concept when analyzing the candidate answer and the item of supporting evidence to generate the response to the first case when the concept score exceeds a predefined maximum weight threshold.
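Stripped of the patent language, the abstract describes two threshold tests on a score. A minimal Python sketch of that reading follows; the threshold values, the `boost` factor and the function name are my assumptions, and the score (which the abstract says comes from a machine learning concept model) is just a number handed in. This is not IBM's implementation.

```python
def adjust_concept_weight(score, weight, min_threshold=0.2,
                          max_threshold=0.8, boost=1.5):
    """Return the weight to apply to a concept, or None to exclude it.

    All numeric defaults are hypothetical values chosen for illustration.
    """
    if score <= min_threshold:
        # Score "does not exceed" the minimum: drop the concept entirely.
        return None
    if score > max_threshold:
        # Score exceeds the maximum: increase the concept's weight.
        return weight * boost
    # Otherwise leave the weight unchanged.
    return weight

excluded = adjust_concept_weight(0.1, 1.0)
boosted = adjust_concept_weight(0.9, 1.0)
unchanged = adjust_concept_weight(0.5, 2.0)
```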

I will spare you further recitations from the patent.

Show of hands, do U.S. Patents always require:

  1. novel/non-obvious ideas
  2. patent fee
  3. #2 but not #1

?

Judge rankings by # of patents granted accordingly.

February 15, 2015

The US Patent and Trademark Office should switch from documents to data

Filed under: Government Data,Patents — Patrick Durusau @ 2:00 pm

The US Patent and Trademark Office should switch from documents to data by Justin Duncan.

From the post:

The debate over patent reform — one of Silicon Valley’s top legislative priorities — is once again in focus with last week’s introduction of the Innovation Act (H.R. 9) by House Judiciary Committee Chairman Bob Goodlatte (R-Va.), Rep. Peter DeFazio (D-Ore.), Subcommittee on Courts, Intellectual Property, and the Internet Chairman Darrell Issa (R-Calif.) and Ranking Member Jerrold Nadler (D-N.Y.), and 15 other original cosponsors.

The Innovation Act largely takes aim at patent trolls (formally “non-practicing entities”), who use patent litigation as a business strategy and make money by threatening lawsuits against other companies. While cracking down on litigious patent trolls is important, that challenge is only one facet of what should be a larger context for patent reform.

The need to transform patent information into open data deserves some attention, too.

The United States Patent and Trademark Office (PTO), the agency within the Department of Commerce that grants patents and registers trademarks, plays a crucial role in empowering American innovators and entrepreneurs to create new technologies. Ironically, many of the PTO’s own systems and technologies are out of date.

Last summer, Data Transparency Coalition advisor Joel Gurin and his colleagues organized an Open Data Roundtable with the Department of Commerce, co-hosted by the Governance Lab at New York University (GovLab) and the White House Office of Science and Technology Policy (OSTP). The roundtable focused on ways to improve data management, dissemination, and use at the Department of Commerce. It shed some light on problems faced by the PTO.

According to GovLab’s report of the day’s findings and recommendations, the PTO is currently working to improve the use and availability of some patent data by putting it in a more centralized, easily searchable form.

To make patent applications easier to navigate – for inventors, investors, the public, and the agency itself – the PTO should more fully embrace the use of structured data formats, like XML, to express the information currently collected as PDFs or text documents.

Justin’s post is a brief history of efforts to improve access to patent and trademark information, mostly focusing on the need for the USPTO (US Patent and Trademark Office) to stop relying on PDF as its default format.
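To see why structured data beats PDF, consider a small Python sketch that reads a patent record expressed as XML. The element names, the record contents and the `parse_record` function are invented for illustration; this is not an actual USPTO schema.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical record; element names are assumptions, not a USPTO format.
RECORD = """
<patent>
  <number>0000000</number>
  <title>Example widget</title>
  <expiration-date>2031-05-17</expiration-date>
  <classification scheme="CPC">G06F</classification>
</patent>
"""

def parse_record(xml_text):
    """Turn one XML patent record into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "number": root.findtext("number"),
        "title": root.findtext("title"),
        "expires": date.fromisoformat(root.findtext("expiration-date")),
        "cpc": [c.text for c in root.findall("classification")
                if c.get("scheme") == "CPC"],
    }

record = parse_record(RECORD)
still_in_force = record["expires"] > date(2015, 2, 15)
```

Once fields such as the expiration date are data rather than text on a page, a question like "is this patent still in force?" becomes a one-line comparison instead of a reading exercise.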

Other potential improvements:

Additional GovLab recommendations included:

  • PTO [should] make more information available about the scope of patent rights, including expiration dates, or decisions by the agency and/or courts about patent claims.
  • PTO should add more context to its data to make it usable by non-experts – e.g. trademark transaction data and trademark assignment.
  • Provide Application Programming Interfaces (APIs) to enable third parties to build better interfaces for the existing legacy systems. Access to Patent Application Information Retrieval (PAIR) and Patent Trial and Appeal Board (PTAB) data are most important here.
  • Improve access to Cooperative Patent Classification (CPC)/U.S. Patent Classification (USPC) harmonization data; tie this data more closely to economic data to facilitate analysis.

Tying in related information, the first and last recommendations on the GovLab list, is another step in the right direction.

But only a step.

If you have ever searched the USPTO patent database, you know making the data “searchable” is only a nod and wink towards accessibility. Making the data searchable is nothing to sneeze at, but USPTO reform should have a higher target than simply being “searchable.”

Outside of patent search specialists (and not all of them), what ordinary citizen is going to be able to navigate the terms of art across domains when searching patents?

The USPTO should go beyond making patents literally “searchable” and instead make patents “reliably” searchable. By “reliable” searching I mean searching that returns all the relevant patents. A safe harbor if you will that protects inventors, investors and implementers from costly suits arising out of the murky wood filled with traps, intellectual quicksand and formulaic chants that are the USPTO patent database.
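In information retrieval terms, "returns all the relevant patents" is a demand for recall of 1.0. A minimal Python sketch of the measure, with made-up patent identifiers standing in for real search results:

```python
def recall(returned, relevant):
    """Fraction of the relevant patents that a search actually returned."""
    relevant = set(relevant)
    if not relevant:
        return 1.0  # nothing to find, so nothing was missed
    return len(relevant & set(returned)) / len(relevant)

# Hypothetical identifiers: the search found A and B but missed C and D.
search_results = {"A", "B", "X"}
relevant_patents = {"A", "B", "C", "D"}
score = recall(search_results, relevant_patents)
```

Here the score is 0.5: half of the relevant patents were never seen by the searcher, which is exactly the exposure a "reliable" search would remove. The hard part, of course, is knowing the relevant set in the first place.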

I first saw this in a tweet by Joel Gurin.

January 22, 2015

Supremes “bitch slaps” Patent Court

Filed under: Law,Patents — Patrick Durusau @ 3:20 pm

Supreme Court strips more power from controversial patent court by Jeff John Roberts.

From the post:

The Supreme Court issued a ruling Tuesday that will have a significant impact on the patent system by limiting the ability of the Federal Circuit, a specialized court that hears patent appeals, to review key findings by lower court judges.

The 7-2 patent decision, which came the same day as a high profile ruling by the Supreme Court on prisoner beards, concerns an esoteric dispute between two pharmaceutical companies, Teva and Sandoz, over the right way to describe the molecule weight of a multiple sclerosis drug.

The Justices of the Supreme Court, however, appears to have taken the case in part because it presented another opportunity to check the power of the Federal Circuit, which has been subject to a recent series of 9-0 reversals and which some regard as a “rogue court” responsible for distorting the U.S. patent system.

As for the legal decision on Tuesday, it turned on the question of whether the Federal Circuit judges can review patent claim findings as they please (“de novo”) or only in cases where they has been serious error. Writing for the majority, Justice Stephen Breyer concluded that the Federal Circuit could not second guess how lower courts interpret those claims (a process called “claim construction”) except on rare occasions.

There is no doubt the Federal Circuit has done its share of damage to the patent system but it hasn’t acted alone. Congress and the patent system itself bear a proportionate share of the blame.

Better search and retrieval technology can’t clean out the mire in the USPTO stables. That is going to require reform from Congress and a sustained effort at maintaining the system once it has been reformed.

In the meantime, knowing that another blow has been dealt the Federal Circuit on patent issues will have to sustain reform efforts.

December 15, 2014

Some tools for lifting the patent data treasure

Filed under: Deduplication,Patents,Record Linkage,Text Mining — Patrick Durusau @ 11:57 am

Some tools for lifting the patent data treasure by Michele Peruzzi and Georg Zachmann.

From the post:

…Our work can be summarized as follows:

  1. We provide an algorithm that allows researchers to find the duplicates inside Patstat in an efficient way
  2. We provide an algorithm to connect Patstat to other kinds of information (CITL, Amadeus)
  3. We publish the results of our work in the form of source code and data for Patstat Oct. 2011.

More technically, we used or developed probabilistic supervised machine-learning algorithms that minimize the need for manual checks on the data, while keeping performance at a reasonably high level.

The post has links for source code and data for these three papers:

A flexible, scaleable approach to the international patent “name game” by Mark Huberty, Amma Serwaah, and Georg Zachmann

In this paper, we address the problem of having duplicated patent applicants’ names in the data. We use an algorithm that efficiently de-duplicates the data, needs minimal manual input and works well even on consumer-grade computers. Comparisons between entries are not limited to their names, and thus this algorithm is an improvement over earlier ones that required extensive manual work or overly cautious clean-up of the names.

A scaleable approach to emissions-innovation record linkage by Mark Huberty, Amma Serwaah, and Georg Zachmann

PATSTAT has patent applications as its focus. This means it lacks important information on the applicants and/or the inventors. In order to have more information on the applicants, we link PATSTAT to the CITL database. This way the patenting behaviour can be linked to climate policy. Because of the structure of the data, we can adapt the deduplication algorithm to use it as a matching tool, retaining all of its advantages.

Remerge: regression-based record linkage with an application to PATSTAT by Michele Peruzzi, Georg Zachmann, Reinhilde Veugelers

We further extend the information content in PATSTAT by linking it to Amadeus, a large database of companies that includes financial information. Patent microdata is now linked to financial performance data of companies. This algorithm compares records using multiple variables, learning their relative weights by asking the user to find the correct links in a small subset of the data. Since it is not limited to comparisons among names, it is an improvement over earlier efforts and is not overly dependent on the name-cleaning procedure in use. It is also relatively easy to adapt the algorithm to other databases, since it uses the familiar concept of regression analysis.

Record linkage is a form of merging that originated in epidemiology in the late 1940’s. To “link” (read merge) records across different formats, records were transposed into a uniform format and “linking” characteristics chosen to gather matching records together. A very powerful technique that has been in continuous use and development ever since.
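As a rough illustration of the "uniform format plus linking characteristic" idea, here is a minimal Python sketch that links applicant names to company names. It uses plain string similarity from the standard library, not the probabilistic supervised machine learning described in the papers above. The sample names, the suffix list and the 0.85 threshold are assumptions for illustration only.

```python
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "corporation", "ltd", "gmbh", "co"}

def normalize(name):
    """Uniform format: lowercase, punctuation removed, legal suffixes dropped."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " "
                      for ch in name.lower())
    return " ".join(w for w in cleaned.split() if w not in SUFFIXES)

def link(records_a, records_b, threshold=0.85):
    """Pair records whose normalized names are similar enough."""
    links = []
    for a in records_a:
        for b in records_b:
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score >= threshold:
                links.append((a, b, round(score, 2)))
    return links

patent_applicants = ["Acme Widgets, Inc.", "Globex Corporation"]
company_register = ["ACME WIDGETS", "Initech Ltd"]
links = link(patent_applicants, company_register)
```

Note that `normalize` and `threshold` are where the merging decisions live. If they are not published alongside the linked data, a later researcher sees only the output pairs.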

One major difference from topic maps is that record linkage has undisclosed subjects, that is, the subjects that make up the common format and the association of the original data sets with that format. I assume in many cases the mapping is documented, but it doesn’t appear as part of the final work product, thereby rendering the merging process opaque and inaccessible to future researchers. All you can say is “…this is the data set that emerged from the record linkage.”

Sufficient for some purposes but if you want to reduce the 80% of your time that is spent munging data that has been munged before, it is better to have the mapping documented and to use disclosed subjects with identifying properties.

Having said all of that, these are tools you can use now on patents and/or extend to other data sets. The disambiguation problems addressed for patents are the common ones you have encountered with other names for entities.

If a topic map underlies your analysis, you will spend less time on the next analysis of the same information. Think of it as reducing your intellectual overhead in subsequent data sets.

Income – Less overhead = Greater revenue for you. 😉

PS: Don’t be confused, you are looking for EPO Worldwide Patent Statistical Database (PATSTAT). Naturally there is a US organization, http://www.patstats.org/ that is just patent litigation statistics.

PPS: Sam Hunting, the source of so many interesting resources, pointed me to this post.

June 19, 2014

Software Patent Earthquake!

Filed under: Patents — Patrick Durusau @ 7:14 pm

The details are far from settled, but in Alice v. CLS Bank, the US Supreme Court ruled 9-0 that a software patent is invalid.

From the opinion:

We hold that the claims at issue are drawn to the abstract idea of intermediated settlement, and that merely requiring generic computer implementation fails to transform that abstract idea into a patent-eligible invention.

If you want to buy a software patent portfolio, I would do it quickly, while patent holders are still in a panic. 😉

May 18, 2014

“Dear Piece of Shit…”

Filed under: Patents — Patrick Durusau @ 2:43 pm

“Dear piece of shit…” Life360 CEO sends a refreshingly direct response to a patent troll by Paul Carr.

From the post:

Tale as old as time. Now that family social network Life360 is firmly established in the big leagues — with 33m registered families, and having raised $50m last week from ADT — it was inevitable that the patent trolls would come calling.

But where most CEOs are happy to let their lawyers set the tone of how they respond, Life360′s Chris Hulls has a more, uh, refreshing approach.

When Hulls received a letter from attorney acting for Florida-based Advanced Ground Information Systems, inviting Life360 to “discuss” a “patent licensing agreement” for what AGIS claims is its pre-existing patent for displaying markers of people on a map, he decided to bypass his own attorneys and instead send an email reply straight out of David Mamet…

Dear Piece of Shit,…

Paul’s account of the demand by a patent troll, Advanced Ground Information Systems, and the response of Life360 CEO Chris Hulls is a masterpiece!

But it left me wondering, ok, so Life360 is stepping up to the plate, is there anything the rest of us can do other than cheer?

Not to discount the value of cheering but cheering is less filling and less satisfying than taking action.

The full complaint is here and the “Dear Piece of Shit” response appears in paragraph 11 of the complaint. The fact finder will be able to conclude that the “Dear Piece of Shit” response was sent; whether the court will take evidence on the plaintiff being a “piece of shit” remains unclear.

Let’s think about how to support Life360 as social network/graph people.

First, we all know about the six degrees of Kevin Bacon. Assuming that is generally true, that means someone reading this blog post is six degrees or less away from someone who is acting for or on behalf of Advanced Ground Information Systems (AGIS). Yes?

From the complaint we can identify the following people for AGIS:

  • Malcolm K. “Cap” Beyer, Jr. (paragraph 9 of the complaint)
  • Ury Fischer, Florida Bar No. 048534, E-mail: ufischer@lottfischer.com
  • Adam Diamond, Florida Bar No. 091008, E-mail: adiamond@lottfischer.com
  • Mark A. Hannemann, New York Bar No. 2770709, E-mail: mhannemann@kenyon.com
  • Thomas Makin, New York Bar No. 3953841, E-mail: tmakin@kenyon.com
  • Matthew Berkowitz, New York Bar No. 4397899, E-mail: mberkowitz@kenyon.com
  • Rose Cordero Prey, New York Bar No. 4326591, E-mail: rcordero@kenyon.com
  • Anne Elise Li, New York Bar No. 4480497, E-mail: ali@kenyon.com
  • Vincent Rubino, III, New York Bar No. 4557435, E-mail: vrubino@kenyon.com

Everyone with a “Bar No” is counsel for AGIS.

All that information appears in the public record of the pleadings filed on behalf of AGIS.

What isn’t known is who else works for AGIS?

Or, who has connections to people who work for AGIS?

Obviously no one should contact or harass anyone in connection with a pending lawsuit, civil or criminal.

On the other hand, everyone within six degrees of separation of those acting on behalf of AGIS retains their freedom of association rights.

Or should I say, their freedom of disassociation rights? Much in the same way that such rights were exercised concerning J. Bruce Ismay.

The USPTO, which recently issued a patent for taking a photograph against a white background, isn’t going to help fix the patent system.

Lawyers seeking:

C. An award to Plaintiff of the damages to which it is entitled under at least 35 U.S.C. § 284 for Defendant’s past infringement and any continuing or future infringement, including both compensatory damages and treble damages for defendants’ willful infringement;

D. A judgment and order requiring defendants to pay the costs of this action (including all disbursements), as well as attorneys’ fees;

aren’t going to fix the patent system.

Lawyers advising victims of patent troll litigation aren’t going to fix the patent system because settling is cheaper. It’s just a question of which costs more money, settlements or litigation? Understandable but that leaves trolls to harass others.

If anyone is going to fix it, it will have to be victims like Life360 along with the lawful support of all non-trolls in the IP universe.

PS: If you have legal analysis or evidence that would be relevant to invalidation of the patents in question, please don’t hesitate to speak up.

May 8, 2014

Patents Aren’t What They Used To Be

Filed under: Patents — Patrick Durusau @ 7:31 pm

US Patent Office Grants ‘Photography Against A White Background’ Patent To Amazon

I can remember when “having” a patent was a mark of real distinction.

Now, not so much.

See the post for the details, but Amazon’s patent for photographs against a white background can be violated as follows:

How does this breakthrough work in practice? Glad you asked.

1. Turn back lights on.
2. Turn front lights on.
3. Position thing on platform.
4. Take picture.

Now, we’ll note that in all fairness (HAHAHAHA), Amazon filed this application back in the early days of photography, circa 2011. Nearly three years later, that foresight has paid off, and Amazon can now corner the market on taking pictures in front of a white background.

The patent itself, US 8,676,045. Another blog post: You Can Close The Studio, Amazon Patents Photographing On Seamless White

Amazon does a lot of really cool stuff so I am hopeful that:

  1. Amazon will donate US 8,676,045 to the public domain.
  2. Amazon will fire whoever was responsible for this farce. Without consequences for abuse of the patent system by staff of companies who do have legitimate IP, the patent system will continue to deteriorate.

I knew I should have filed to patent addition last year! 😉

May 3, 2014

OpenPolicy [Patent on Paragraphs?]

Filed under: Government,Patents,Searching — Patrick Durusau @ 3:29 pm

OpenPolicy: Knowledge Makes Document Searches Smarter

From the webpage:

The government has a wealth of policy knowledge derived from specialists in myriad fields. What it lacked, until now, was a flexible method for searching the content of thousands of policies using the knowledge of those experts. LMI has developed a tool—OpenPolicy™—to provide agencies with the ability to capture the knowledge of their experts and use it to intuitively search their massive storehouse of policy at hyper speeds.

Traditional search engines produce document-level results. There’s no simple way to search document contents and pinpoint appropriate paragraphs. OpenPolicy solves this problem. The search tool, running on a semantic-web database platform, LMI SME-developed ontologies, and web-based computing power, can currently host tens of thousands of pages of electronic documents. Using domain-specific vocabularies (ontologies), the tool also suggests possible search terms and phrases to help users refine their search and obtain better results.

For agencies wanting to use OpenPolicy, LMI initially builds a powerful computing environment to host the knowledgebase. It then loads all of an agency’s documents—policies, regulations, meeting notes, trouble tickets, essentially any text-based file—into the database. The system can scale to store billions of paragraphs.

No detail on the technology behind OpenPolicy but the mention of paragraphs is enough to make me wary of possible patents on paragraphs.

I am hopeful that even the USPTO would balk at patenting paragraphs in general or as the results of a search but I would not bet money on it.

If you know of any such patents, please post them in comments below.

I first saw this at: LMI Named a Winner in Destination Innovation Competition by Angela Guess.

April 29, 2014

Question-answering system and method based on semantic labeling…

Filed under: Patents,Semantics — Patrick Durusau @ 6:57 pm

Question-answering system and method based on semantic labeling of text documents and user questions

From the patent:

A question-answering system for searching exact answers in text documents provided in the electronic or digital form to questions formulated by user in the natural language is based on automatic semantic labeling of text documents and user questions. The system performs semantic labeling with the help of markers in terms of basic knowledge types, their components and attributes, in terms of question types from the predefined classifier for target words, and in terms of components of possible answers. A matching procedure makes use of mentioned types of semantic labels to determine exact answers to questions and present them to the user in the form of fragments of sentences or a newly synthesized phrase in the natural language. Users can independently add new types of questions to the system classifier and develop required linguistic patterns for the system linguistic knowledge base.

Another reason to hope the United States Supreme Court goes nuclear on processes and algorithms.

That’s not an opinion on this patent but on the cloud that all process/algorithm patents cast on innovation.

I first saw this at: IHS Granted Key Patent for Proprietary, Next-Generation Search Technology by Angela Guess.

February 8, 2014

Patenting Emotional Analysis?

Filed under: Patents — Patrick Durusau @ 11:53 am

BehaviorMatrix Issues Groundbreaking Foundational Patent for Advertising Platforms

From the post:

Applied behavior analytics company BehaviorMatrix, LLC, announced today that it was granted a groundbreaking foundational patent by the United States Patent and Trademark Office (USPTO) that establishes a system for classifying, measuring and creating models of the elements that make up human emotions, perceptions and actions leveraged from the Internet and social media. The BehaviorMatrix patent, U.S. patent number 8,639,702, redefines the online advertising platform and social CRM industries by ushering in a new era of assessment and measurement of emotion by digital means. It covers a method for detecting and measuring emotional signals within digital content. Emotion is based on perception – and perception is based on exteroceptive stimuli – what is seen, touched, tasted, heard and smelled.

If you look up U.S. patent number 8,639,702, you will find this background of the invention:

A data element forms the premise on which an inference may be drawn and represents the lowest level of abstraction from which information and then knowledge are derived. In humans, the perception of environment or condition is comprised of data gathered by the senses, i.e., the physiological capacity to provide input for perception. These “senses” are formally referred to as the exteroceptive senses and in humans comprise quantifiable or potential sensory data including, sight, smell, hearing, touch, taste, temperature, pressure, pain, and pleasure, the admixture of which determine the spectrum of human emotion states and resultant behaviors.

Potentials in these senses work independently, or in combination, to produce unique perceptions. For instance, the sense of sight is primarily used to identify a food item, but the flavor of the food item incorporates the senses of both taste and smell.

In biological terms, behavior can generally be regarded as any action of an organism that changes its relationship to its environment. Definable and measurable behaviors are predicated on the association of stimuli within the domain of exteroceptive sensation, to perception, and ultimately, a behavioral outcome.

The ability to determine the exteroceptive association and impact on behavior from data that is not physical but exists only in digital form has profound implications for how data is viewed, both intrinsically and associatively.

An advantage exists, therefore, for a system and method for dynamically associating digital data with values that approximate exteroceptive stimuli potentials, and from those values forecasting probabilistically the likely behavioral response to that data, thereby promoting the ability to design systems and models to predict behavioral outcomes that are inherently more accurate in determining behavioral response. In turn, interfaces and computing devices may be developed that would “expect” certain behaviors, or illicit them through the manipulation of data. Additionally, models could be constructed to classify data not only for the intrinsic value of the data but for the potential behavioral influence inherent in the data as well.

Really? People’s emotions influence their “digital data?” You mean like a person’s emotions influence their speech (language, volume), their body language (tense, pacing), their communications (angry tweets, letters, emails), etc.?

Did you ever imagine such a thing? Emotions being found in digital data?

Have I ever mentioned: Emotional Content of Suicide Notes, Jacob Tuckman; Robert J. Kleiner; Martha Lavell, Am J Psychiatry 1959;116:59-63, to you?

Abstract:

An analysis was made of the emotional content of notes left by 165 suicides in Philadelphia over a 5-year period. Over half the notes showed such positive affect as gratitude, affection, and concern for the welfare of others, while only 24% expressed hostile or negative feelings directed toward themselves or the outside world, and 25% were completely neutral in affect. 2. Persons aged 45 and over showed less affect than those under 45, with a concomitant increase in neutral affect. 3. Persons who were separated or divorced showed more hostility than those single, married, or widowed. 4. It is believed that these findings have certain implications for further understanding of suicide and ultimate steps toward prevention. The recognition that positive or neutral feelings are present in the majority of cases should lead to a more promising outlook in the care and treatment of potential suicides if they can be identified.

That was written in 1959. Do you think it is a non-obvious leap to find emotions in digital data? Just curious.

Of course, that isn’t the only thing claimed for this invention:

capable of detecting one or data elements including, without limitation, temperature, pressure, light, sound, motion, distance and time.

Wow, it can also act as a thermostat.

Feel free to differ, but I think measuring emotion in all forms of communication, including digital data, has been around for a long time.

The filing-fee-motivated U.S. Patent Office is doing a very poor job of keeping the prior art commons free of patents that infringe on it. That costs everyone, except patent trolls, a lot of time and effort.

Thoughts on a project to defend the prior art commons more generally? How much does the average patent infringement case cost? Win or lose?

Of course I am thinking about integrating search across the patent database with searches in relevant domains for prior art to load up examiners with detailed reports of prior art.

Stopping a patent in its tracks avoids more expensive methods later on.

Could lead to a new yearly patent count: Patents Denied.

February 5, 2014

Patent Search and Analysis Tools

Filed under: Intellectual Property (IP),Patents,Searching — Patrick Durusau @ 2:54 pm

Free and Low Cost Patent Search and Analysis Tools: Who Needs Expensive Name Brand Products? by Jackie Hutter.

From the post:

In private conversations, some of my corporate peers inform me that they pay $1000′s per year (or even per quarter for larger companies) for access to “name brand” patent search tools that nonetheless do not contain accurate and up to date information. For example, a client tells me that one of these expensive tools fails to update USPTO records on a portfolio her company is monitoring and that the PAIR data is more than 1 year out of date. This limits the effectiveness of the expensive database by requiring her IP support staff to check each individual record on a regular basis to update the data. Of course, this limitation defeats the purpose of spending the big bucks to engage with a “name brand” search tool.

Certainly, one need not have sympathy for corporate IP professionals who manage large department budgets–if they spend needlessly on “name brand” tools and staff to manage the quality of such tools, so be it. But most companies with IP strategy needs do not have money and staff to purchase such tools, let alone to fix the errors in the datasets obtained from them. Others might wish not to waste their department budgets on worthless tools. To this end, over the last 5 years, I have used a number of free and low cost tools in my IP strategy practice. I use all of these tools on a regular basis and have personally validated the quality and validity of each one for my practice.
….

Jackie makes two cases:

First, there are free tools that perform as well as or better than commercial patent tools. A link is offered to a list of them.

Second, and more important from my perspective, the low-cost tools leave a lot to be desired in terms of UI and usability.

Certainly enough room for an “inexpensive” but better than commercial-grade patent search service to establish a market.

Or perhaps a more expensive “challenge” tool that warns subscribers about patents close to theirs.

I first saw this in a tweet by Lutz Maicher.

January 15, 2014

What’s Hiding In Your Classification System?

Filed under: Classification,Graphics,Patents,Visualization — Patrick Durusau @ 5:10 pm

Patent Overlay Mapping: Visualizing Technological Distance by Luciano Kay, Nils Newman, Jan Youtie, Alan L. Porter, Ismael Rafols.

Abstract:

This paper presents a new global patent map that represents all technological categories, and a method to locate patent data of individual organizations and technological fields on the global map. This overlay map technique may support competitive intelligence and policy decision-making. The global patent map is based on similarities in citing-to-cited relationships between categories of the International Patent Classification (IPC) of European Patent Office (EPO) patents from 2000 to 2006. This patent dataset, extracted from the PATSTAT database, includes 760,000 patent records in 466 IPC-based categories. We compare the global patent maps derived from this categorization to related efforts of other global patent maps. The paper overlays nanotechnology-related patenting activities of two companies and two different nanotechnology subfields on the global patent map. The exercise shows the potential of patent overlay maps to visualize technological areas and potentially support decision-making. Furthermore, this study shows that IPC categories that are similar to one another based on citing-to-cited patterns (and thus are close in the global patent map) are not necessarily in the same hierarchical IPC branch, thus revealing new relationships between technologies that are classified as pertaining to different (and sometimes distant) subject areas in the IPC scheme.

The most interesting discovery in the paper was summarized as follows:

One of the most interesting findings is that IPC categories that are close to one another in the patent map are not necessarily in the same hierarchical IPC branch. This finding reveals new patterns of relationships among technologies that pertain to different (and sometimes distant) subject areas in the IPC classification. The finding suggests that technological distance is not always well proxied by relying on the IPC administrative structure, for example, by assuming that a set of patents represents substantial technological distance because the set references different IPC sections. This paper shows that patents in certain technology areas tend to cite multiple and diverse IPC sections.

That being the case, what is being hidden in other classification systems?

For example, how does the ACM Computing Classification System compare when the citations used by authors are taken into account?

Perhaps this is a method to compare classifications as seen by experts versus a community of users.
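To make the idea concrete, here is a minimal sketch of comparing categories by their citation patterns rather than by their place in a hierarchy. Everything in it is an assumption for illustration: the category labels ("A-1", "A-2", "C-1") and the citation counts are invented, and cosine similarity is my choice of measure, which may differ from the one used in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical data: each row is a citing category, each column the number of
# citations it makes to three cited categories. "A-1" and "A-2" sit in the
# same branch of the (invented) hierarchy; "C-1" sits in a different branch.
citations = {
    "A-1": [90, 10, 0],
    "A-2": [0, 10, 90],
    "C-1": [80, 20, 0],
}

same_branch = cosine(citations["A-1"], citations["A-2"])
cross_branch = cosine(citations["A-1"], citations["C-1"])

if __name__ == "__main__":
    print(f"A-1 vs A-2 (same branch):      {same_branch:.3f}")
    print(f"A-1 vs C-1 (different branch): {cross_branch:.3f}")
```

In this toy data the two categories from different branches cite almost the same things, so they land close together, while the two siblings do not. The same comparison could be run on any classification system for which citation data is available.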

BTW, the authors have posted supplemental materials online:

Supplementary File 1 is an MS Excel file containing the labels of IPC categories, citation and similarity matrices, factor analysis of IPC categories. It can be found at: http://www.sussex.ac.uk/Users/ir28/patmap/KaySupplementary1.xls

Supplementary File 2 is an MS PowerPoint file with examples of overlay maps of firms and research topics. It can be found at: http://www.sussex.ac.uk/Users/ir28/patmap/KaySupplementary2.ppt

Supplementary File 3 is an interactive version of map in Figure 1 visualized with the freeware VOSviewer. It can be found at: http://www.vosviewer.com/vosviewer.php?map=http://www.sussex.ac.uk/Users/ir28/patmap/KaySupplementary3.txt

January 14, 2014

Speculative Popcount Data Creation

Filed under: Patents,Sampling — Patrick Durusau @ 4:28 pm

Cognitive systems speculate on big data by Ravi Arimilli.

From the post:

Our brains don’t need to tell our lungs to breathe or our hearts to pump blood. Unfortunately, computers require instructions for everything they do. But what if machines could analyze big data and determine what to do, based on the content of the data, without specific instructions? Patent #8,387,065 establishes a way for computer systems to analyze data in a whole new way, using “speculative” population count (popcount) operations.

Popcount technology has been around for several years. It uses algorithms to pair down the number of traditional instructions a system has to run through to solve a problem. For example, if a problem takes 10,000 instructions to be solved using standard computing, popcount techniques can reduce the number of instructions by more than half.

This is how IBM Watson played Jeopardy! It did not need to be given instructions to look for every possible bit of data to answer a question. Its Power 7-based system used popcount operations to make assumptions about the domain of data in question, to come up with a real time answer.

Reading the patent: Patent #8,387,065, you will find this statement:

An actual method or mechanism by which the popcount is calculated is not described herein because the invention applies to any one of the various popcount algorithms that may be executed by CPU to determine a popcount. (under DETAILED DESCRIPTION OF AN ILLUSTRATIVE EMBODIMENT. There are no section/paragraph numbers, etc.)

IBM patented a process to house a sampling method without ever describing the sampling method. As Ben Stein would say: “wow.”
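For readers who have not met the term, a population count is just the number of 1 bits in a word. Since the patent declines to say which algorithm it has in mind, the sketch below is not IBM's method. It shows two textbook ways to compute a popcount, with function names of my own choosing.

```python
def popcount_kernighan(x: int) -> int:
    """Count set bits by repeatedly clearing the lowest set bit."""
    if x < 0:
        raise ValueError("defined here for non-negative integers only")
    count = 0
    while x:
        x &= x - 1  # clears the lowest 1 bit
        count += 1
    return count


def popcount_string(x: int) -> int:
    """Count set bits by counting '1' characters in the binary representation."""
    if x < 0:
        raise ValueError("defined here for non-negative integers only")
    return bin(x).count("1")


if __name__ == "__main__":
    for value in (0, 1, 0b1011, 255, 2**40):
        assert popcount_kernighan(value) == popcount_string(value)
        print(value, popcount_kernighan(value))
```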

When I think of IBM patents, I think of eWeek’s IBM Patent: 100 Years of High-Tech Innovations top ten (10) list:

Sampling methods, just like naive Bayes classifiers, work if and only if certain assumptions are met. Naive Bayes classifiers assume all features are independent of one another given the class. Sampling methods, on the other hand, assume a data set is uniform, meaning that a sample is an accurate reflection of the entire data set.

Uniformity is a chancy assumption because, to confirm that it holds, you have to process the very data that sampling was supposed to let you avoid.

There are methods to reduce the risks of sampling, but it isn’t possible to tell from IBM’s “patent” which, if any, of them are being used.
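A small sketch of why the uniformity assumption matters. It extrapolates a total popcount from the first few words of a data set, which is my own stand-in for "sampling" and not anything described in the patent. The two data sets are invented so that they have the same true popcount but different distributions.

```python
def popcount(word: int) -> int:
    """Number of 1 bits in a non-negative integer."""
    return bin(word).count("1")


def true_popcount(words):
    """Exact total: touches every word."""
    return sum(popcount(w) for w in words)


def estimated_popcount(words, sample_size):
    """Extrapolate the total from the first `sample_size` words only."""
    if sample_size <= 0 or not words:
        raise ValueError("need a non-empty data set and a positive sample size")
    sample = words[:sample_size]
    per_word = sum(popcount(w) for w in sample) / len(sample)
    return per_word * len(words)


# Hypothetical data: same total number of set bits, different layout.
uniform = [0b1010] * 1000                   # every word has 2 set bits
skewed = [0b0000] * 500 + [0b1111] * 500    # all set bits are in the second half

if __name__ == "__main__":
    for name, data in (("uniform", uniform), ("skewed", skewed)):
        print(name, true_popcount(data), estimated_popcount(data, 100))
```

On the uniform data the estimate matches the exact count. On the skewed data the same procedure reports zero. Drawing the sample at random instead of from the front would remove this particular bias, but you still cannot know how far to trust the estimate without knowing something about the data you skipped.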

December 27, 2013

The Lens

Filed under: Patents,Semantics,Topic Maps — Patrick Durusau @ 5:58 pm

The Lens

From the about page:

Welcome to The Lens, an open global cyberinfrastructure built to make the innovation system more efficient, fair, transparent and inclusive. The Lens is an extension of work started by Cambia in 1999 to render the global patent system more transparent, called the Patent Lens. The Lens is a greatly expanded and updated version of the Patent Lens with vastly more data and greater analytical capabilities. Our goal is to enable more people to make better decisions, informed by evidence and inspired by imagination.

The Lens already hosts a number of powerful tools for analysis and exploration of the patent literature, from integrated graphical representation of search results to advanced bioinformatics tools. But this is only just the beginning and we have lot more planned! See what we’ve done and what we plan to do soon on our timeline below:

The Lens currently covers 80 million patents in 100 different jurisdictions.

When you create an account, the following appears in your workspace:

Welcome to the Lens! The Lens is a tool for innovation cartography, currently featuring over 78 million patent documents – many of them full-text – from nearly 100 different jurisdictions. The Lens also features hyperlinks to the scientific literature cited in patent documents – over 5 million to date.

But more than a patent search tool, the Lens has been designed to make the patent system navigable, so that non-patent professionals can access the knowledge contained in the global patent literature. Properly mapped out, the global patent system has the potential to accelerate the pace of invention, to generate new partnerships, and to make a vast wealth of scientific and technical knowledge available for free.

The Lens is currently in beta version, with future versions featuring expanded access to both patent and scientific literature collections, as well as improved search and analytic capabilities.

As you already know, patents have extremely rich semantics and mapping of those semantics could be very profitable.

If you saw the post: Secure Cloud Computing – Very Secure, you will know that patent searches on “homomorphic encryption” are about to become very popular.

Are you ready to bundle and ship patent research?

December 13, 2013

Implementing a Custom Search Syntax…

Filed under: Lucene,Patents,Solr — Patrick Durusau @ 8:33 pm

Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled by John Berryman.

Description:

In a recent project with the United States Patent and Trademark Office, Opensource Connections was asked to prototype the next generation of patent search – using Solr and Lucene. An important aspect of this project was the implementation of BRS, a specialized search syntax used by patent examiners during the examination process. In this fast paced session we will relate our experiences and describe how we used a combination of Parboiled (a Parser Expression Grammar [PEG] parser), Lucene Queries and SpanQueries, and an extension of Solr’s QParserPlugin to build BRS search functionality in Solr. First we will characterize the patent search problem and then define the BRS syntax itself. We will then introduce the Parboiled parser and discuss various considerations that one must make when designing a syntax parser. Following this we will describe the methodology used to implement the search functionality in Lucene/Solr. Finally, we will include an overview our syntactic and semantic testing strategies. The audience will leave this session with an understanding of how Solr, Lucene, and Parboiled may be used to implement their own custom search parser.

One part of the task was to re-implement a thirty (30) year old query language on modern software. (Ouch!)

Uses parboiled to parse the query syntax.

On parboiled:

parboiled is a mixed Java/Scala library providing for lightweight and easy-to-use, yet powerful and elegant parsing of arbitrary input text based on Parsing expression grammars (PEGs). PEGs are an alternative to context free grammars (CFGs) for formally specifying syntax, they make a good replacement for regular expressions and generally have quite a few advantages over the “traditional” way of building parsers via CFGs. parboiled is released under the Apache License 2.0.

Covers a plugin for the custom query language.
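To give a feel for what parsing a custom query syntax involves, here is a minimal hand-written recursive-descent parser in Python, which is the style of parser a PEG describes. It does not use parboiled, Lucene, or Solr, and the grammar is a toy of my own: the operator names (AND, OR, ADJ) are placeholders, not a claim about the actual BRS syntax covered in the talk.

```python
import re

# Toy grammar (an assumption for illustration):
#   query <- term (OPERATOR term)*
#   term  <- WORD / '(' query ')'
TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z0-9]+)")
OPERATORS = ("AND", "OR", "ADJ")  # upper case only; "and" is treated as a word


def tokenize(text):
    """Split a query string into parentheses and alphanumeric tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_term(tokens, i):
    """term <- WORD / '(' query ')'"""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of query")
    tok = tokens[i]
    if tok == "(":
        node, i = parse_query(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ")":
            raise SyntaxError("missing closing parenthesis")
        return node, i + 1
    if tok == ")" or tok in OPERATORS:
        raise SyntaxError(f"unexpected token {tok!r}")
    return tok.lower(), i + 1


def parse_query(tokens, i):
    """query <- term (OPERATOR term)*, grouped left to right."""
    left, i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i] in OPERATORS:
        op = tokens[i]
        right, i = parse_term(tokens, i + 1)
        left = (op, left, right)
    return left, i


def parse(text):
    """Parse a whole query string into a nested tuple tree."""
    tokens = tokenize(text)
    tree, end = parse_query(tokens, 0)
    if end != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[end]!r}")
    return tree


if __name__ == "__main__":
    print(parse("solar ADJ panel AND (roof OR tile)"))
```

The nested tuples are the point: once the query is a tree, each node can be translated into whatever query objects the search engine expects.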

Great presentation, although one where you will want to be following the slides (below the video).

September 30, 2013

Classifying Non-Patent Literature…

Filed under: Classification,Natural Language Processing,Patents,Solr — Patrick Durusau @ 6:29 pm

Classifying Non-Patent Literature To Aid In Prior Art Searches by John Berryman.

From the post:

Before a patent can be granted, it must be proven beyond a reasonable doubt that the innovation outlined by the patent application is indeed novel. Similarly, when defending one’s own intellectual property against a non-practicing entity (NPE – also known as a patent troll) one often attempts to prove that the patent held by the accuser is invalid by showing that relevant prior art already exists and that their patent is actual not that novel.

Finding Prior Art

So where does one get ahold of pertinent prior art? The most obvious place to look is in the text of earlier patents grants. If you can identify a set of reasonably related grants that covers the claims of the patent in question, then the patent may not be valid. In fact, if you are considering the validity of a patent application, then reviewing existing patents is certainly the first approach you should take. However, if you’re using this route to identify prior art for a patent held by an NPE, then you may be fighting an uphill battle. Consider that a very bright patent examiner has already taken this approach, and after an in-depth examination process, having found no relevant prior art, the patent office granted the very patent that you seek to invalidate.

But there is hope. For a patent to be granted, it must not only be novel among the roughly 10Million US Patents that currently exist, but it must also be novel among all published media prior to the application date – so called non-patent literature (NPL). This includes conference proceeding, academic articles, weblogs, or even YouTube videos. And if anyone – including the applicant themselves – publicly discloses information critical to their patent’s claims, then the patent may be rendered invalid. As a corollary, if you are looking to invalidate a patent, then looking for prior art in non-patent literature is a good idea! While tools are available to systematically search through patent grants, it is much more difficult to search through NPL. And if the patent in question truly is not novel, then evidence must surely exists – if only you knew where to look.

More suggestions than solutions but good suggestions, such as these, are hard to come by.

John suggests using existing patents and their classifications as a learning set to classify non-patent literature.

Interesting but patent language is highly stylized and quite unlike the descriptions you encounter in non-patent literature.

It would be an interesting experiment to take some subset of patents and their classifications, along with a set of non-patent literature known to describe the same “inventions” covered by the patents, and see how well a classifier trained on the patents classifies the literature.
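A minimal sketch of what such an experiment might look like, using a from-scratch multinomial naive Bayes classifier with add-one smoothing. The class labels, the "patent" texts, and the non-patent sentences are all invented for illustration. This is not John's implementation, and a real test would need real patents, real classifications, and far more text.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer; deliberately naive."""
    return text.lower().split()


def train(labeled_docs):
    """Build per-class document and word counts from (label, text) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in labeled_docs:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the most probable class; words unseen in training are skipped."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:
                # add-one (Laplace) smoothing
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set: patent-style text with a made-up classification.
patents = [
    ("optics", "an apparatus comprising a lens and a light source"),
    ("optics", "a method for focusing light through a lens assembly"),
    ("batteries", "an electrochemical cell comprising an anode and a cathode"),
    ("batteries", "a method for charging a lithium cell"),
]
model = train(patents)

if __name__ == "__main__":
    # Hypothetical non-patent literature, written the way people actually write.
    print(classify(model, "we focus the light with a cheap lens"))
    print(classify(model, "charging the cell of a phone"))
```

Note how few of the informal words overlap with the patent vocabulary. That is the stylized-language problem in miniature.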

Suggestions for subject areas?

September 23, 2013

Broadening Google Patents [Patent Troll Indigestion]

Filed under: Law,Patents,Searching — Patrick Durusau @ 12:42 pm

Broadening Google Patents by Jon Orwant.

From the post:

Last year, we launched two improvements to Google Patents: the Prior Art Finder and European Patent Office (EPO) patents. Today we’re happy to announce the addition of documents from four new patent agencies: China, Germany, Canada, and the World Intellectual Property Organization (WIPO). Many of these documents may provide prior art for future patent applications, and we hope their increased discoverability will improve the quality of patents in the U.S. and worldwide.

The broadening of Google Patents is welcome news!

Especially following the broadening of “prior art” under the America Invents Act (AIA).

On the expansion of prior art, such as publication before date of filing the patent (old rule was before the date of invention), a good summary can be found at: The Changing Boundaries of Prior Art under the AIA: What Your Company Needs to Know.

The information you find needs to remain found, intertwined with other information you find.

Regular search engines won’t help you there. May I suggest topic maps?

April 1, 2013

USPTO – New Big Data App [Value-Add Opportunity]

Filed under: BigData,Government,Government Data,MarkLogic,Patents,Topic Maps — Patrick Durusau @ 4:15 pm

U.S. Patent and Trademark Office Launches New Big Data Application on MarkLogic®

From the post:

Real-Time, Granular, Online Access to Complex Manuals Improves Efficiency and Transparency While Reducing Costs

MarkLogic Corporation, the provider of the MarkLogic® Enterprise NoSQL database, today announced that the U.S. Patent and Trademark Office (USPTO) has launched the Reference Document Management Service (RDMS), which uses MarkLogic for real-time searching of detailed, specific, up-to-date content within patent and trademark manuals. RDMS enables real-time search of the Manual of Patent Examining Procedure (MPEP) and the Trademark Manual of Examination Procedures (TMEP). These manuals provide a vital window into the complexities of U.S. patent and trademark laws for inventors, examiners, businesses, and patent and government attorneys.

The thousands of examiners working for USPTO need to be able to quickly locate relevant instructions and procedures to assist in their examinations. The RDMS is enabling faster, easier searches for these internal users.

Having the most current materials online also means that the government can reduce reliance on printed manuals that quickly go out of date. USPTO can also now create and publish revisions to its manuals more quickly, allowing them to be far more responsive to changes in legislation.

Additionally, for the first time ever, the tool has also been made available to the public increasing the MPEP and TMEP accessibility globally, furthering the federal government’s efforts to promote transparency and accountability to U.S. citizens. Patent creators and their trusted advisors can now search and reference the same content as the USPTO examiners, in real time — instead of having to thumb through a printed reference guide.

The date on this report was March 26, 2013.

I don’t know if the USPTO is just playing games but searching their site for “Reference Document Management Service” produces zero “hits.”

Searching for “RDMS” produces four (4) “hits,” none of which were pointers to an interface.

Maybe it was too transparent?

The value-add proposition I was going to suggest was mapping the results of searching into some coherent presentation, like TaxMap.

And/or linking the results of searches into current literature in rapidly developing fields of technology.

Guess both of those opportunities will have to wait for basic searching to be available.

If you have a status update on this announced but missing project please ping me.

March 29, 2013

Mathematics Cannot Be Patented. Case Dismissed.

Filed under: Law,Mathematics,Patents — Patrick Durusau @ 4:48 am

Mathematics Cannot Be Patented. Case Dismissed. by Alan Schoenbaum.

From the post:

Score one for the good guys. Rackspace and Red Hat just defeated Uniloc, a notorious patent troll. This case never should have been filed. The patent never should have been issued. The ruling is historic because, apparently, it was the first time that a patent suit in the Eastern District of Texas has been dismissed prior to filing an answer in the case, on the grounds that the subject matter of the patent was found to be unpatentable. And was it ever unpatentable.

Red Hat indemnified Rackspace in the case. This is something that Red Hat does well, and kudos to them. They stand up for their customers and defend these Linux suits. The lawyers who defended us deserve a ton of credit. Bill Lee and Cynthia Vreeland of Wilmer Hale were creative and persuasive, and their strategy to bring the early motion to dismiss was brilliant.

The patent at issue is a joke. Uniloc alleged that a floating point numerical calculation by the Linux operating system violated U.S. Patent 5,892,697 – an absurd assertion. This is the sort of low quality patent that never should have been granted in the first place and which patent trolls buy up by the bushel full, hoping for fast and cheap settlements. This time, with Red Hat’s strong backing, we chose to fight.

The outcome was just what we had in mind. Chief Judge Leonard Davis found that the subject matter of the software patent was unpatentable under Supreme Court case law and, ruling from the bench, granted our motion for an early dismissal. The written order, which was released yesterday, is excellent and well-reasoned. It’s refreshing to see that the judiciary recognizes that many of the fundamental operations of a computer are pure mathematics and are not patentable subject matter. We expect, and hope, that many more of these spurious software patent lawsuits are dismissed on similar grounds.

A potential use case for a public topic map on patents?

At least on software patents?

Thinking that a topic map could be constructed of all the current patents that address mathematical operations, enabling academics and researchers to focus on factual analysis of the processes claimed by those patents.

From the factual analysis, other researchers, primarily lawyers and law students, could outline legal arguments, tailored for each patent, as to its invalidity.

A community resource, not unlike a patent bank, that would strengthen the community’s hand when dealing with patent trolls.

PS: I guess this means I need to stop working on my patent for addition. 😉

March 11, 2013

The Annotation-enriched non-redundant patent sequence databases [Curation vs. Search]

Filed under: Bioinformatics,Biomedical,Marketing,Medical Informatics,Patents,Topic Maps — Patrick Durusau @ 2:01 pm

The Annotation-enriched non-redundant patent sequence databases by Weizhong Li, Bartosz Kondratowicz, Hamish McWilliam, Stephane Nauche and Rodrigo Lopez.

Not a real promising title, is it? 😉 The reason I cite it here is that by curation, the database is “non-redundant.”

Try searching for some of these sequences at the USPTO and compare the results.

The power of curation will be immediately obvious.

Abstract:

The EMBL-European Bioinformatics Institute (EMBL-EBI) offers public access to patent sequence data, providing a valuable service to the intellectual property and scientific communities. The non-redundant (NR) patent sequence databases comprise two-level nucleotide and protein sequence clusters (NRNL1, NRNL2, NRPL1 and NRPL2) based on sequence identity (level-1) and patent family (level-2). Annotation from the source entries in these databases is merged and enhanced with additional information from the patent literature and biological context. Corrections in patent publication numbers, kind-codes and patent equivalents significantly improve the data quality. Data are available through various user interfaces including web browser, downloads via FTP, SRS, Dbfetch and EBI-Search. Sequence similarity/homology searches against the databases are available using BLAST, FASTA and PSI-Search. In this article, we describe the data collection and annotation and also outline major changes and improvements introduced since 2009. Apart from data growth, these changes include additional annotation for singleton clusters, the identifier versioning for tracking entry change and the entry mappings between the two-level databases.

Database URL: http://www.ebi.ac.uk/patentdata/nr/

Topic maps are curated data. Which one do you prefer?

October 8, 2012

Patent war

Filed under: Graphics,News,Patents,Visualization — Patrick Durusau @ 7:11 pm

Patent war by Nathan Yau.

Nathan points to research and visualizations by the New York Times of the ongoing patent war between Apple and Samsung.

An ideal outcome would be for the principals and their surrogates to be broken by litigation costs and for technology patents to be reduced to penny-stock value by the litigation.

You can move the system towards that outcome by picking a patent and creating a topic map starting with that patent.

The more data the litigants have, the more they will think they need.

Let’s let them choke on it.

August 14, 2012

Prior Art Finder

Filed under: Patents — Patrick Durusau @ 2:31 pm

Improving Google Patents with European Patent Office patents and the Prior Art Finder by Jon Orwant, Engineering Manager (Google Research).

From the post:

At Google, we’re constantly trying to make important collections of information more useful to the world. Since 2006, we’ve let people discover, search, and read United States patents online. Starting this week, you can do the same for the millions of ideas that have been submitted to the European Patent Office, such as this one.

Typically, patents are granted only if an invention is new and not obvious. To explain why an invention is new, inventors will usually cite prior art such as earlier patent applications or journal articles. Determining the novelty of a patent can be difficult, requiring a laborious search through many sources, and so we’ve built a Prior Art Finder to make this process easier. With a single click, it searches multiple sources for related content that existed at the time the patent was filed.

Maybe the USPTO will add:

Have you used Google’s Prior Art Finder? as part of the patent examination form.

Vocabulary issue remains but at least this is a start in the right direction.

August 12, 2012

DATA MINING: Accelerating Drug Discovery by Text Mining of Patents

Filed under: Contest,Data Mining,Drug Discovery,Patents,Text Mining — Patrick Durusau @ 1:34 pm

DATA MINING: Accelerating Drug Discovery by Text Mining of Patents

From the contest page:

Patent documents contain important research that is valuable to the industry, business, law, and policy-making communities. Take the patent documents from the United States Patent and Trademark Office (USPTO) as examples. The structured data include: filing date, application date, assignees, UPC (US Patent Classification) codes, IPC codes, and others, while the unstructured segments include: title, abstract, claims, and description of the invention. The description of the invention can be further segmented into field of the invention, background, summary, and detailed description.

Given a set of “Source” patents or documents, we can use text mining to identify patents that are “similar” and “relevant” for the purpose of discovery of drug variants. These relevant patents could further be clustered and visualized appropriately to reveal implicit, previously unknown, and potentially useful patterns.

The eventual goal is to obtain a focused and relevant subset of patents, relationships and patterns to accelerate discovery of variations or evolutions of the drugs represented by the “source” patents.

Timeline:

  • July 19, 2012 – Start of the Contest Part 1
  • August 23, 2012 – Deadline for Submission of Ontology deliverables
  • August 24 to August 29, 2012 – Crowdsourced And Expert Evaluation for Part 1. NO SUBMISSIONS ACCEPTED for contest during this week.
  • Milestone 1: August 30, 2012 – Winner for Part 1 contest announced and Ontology release to the community for Contest Part 2
  • Aug. 31 to Sept. 21, 2012 – Contest Part 2 Begins – Data Exploration / Text Mining of Patent Data
  • Milestone 2: Sept. 21, 2012 – Deadline for Submission Contest Part 2. FULL CONTEST CLOSING.
  • Sept. 22 to Oct. 5, 2012 – Crowdsourced and Expert Evaluation for contest Part 2
  • Milestone 3: Oct. 5, 2012 – Conditional Winners Announcement 

Possibly fertile ground for demonstrating the value of topic maps.

Particularly if you think of topic maps as curating search strategies and results.

Think about that for a moment: curating search strategies and results.

We have all asked reference librarians or other power searchers for assistance and watched while they discovered resources we didn’t imagine existed.

What if for medical expert searchers, we curate the “search request” along with the “search strategy” and the “result” of that search?

Such that we can match future search requests up with likely search strategies?

What we are capturing is the expert’s understanding and recognition of subjects not apparent to the average user, captured in such a way that we can make use of it again in the future.
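To make the idea concrete, here is a minimal sketch in Python of curating the request, strategy, and result together, then matching a new request against the stored ones. Everything in it is an assumption for illustration: the `CuratedSearch` record, the sample requests and strategies, and the use of simple word overlap (Jaccard similarity) as the matching rule. A topic map would identify subjects far more carefully than a bag of words does.

```python
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class CuratedSearch:
    """One expert search: what was asked, how it was searched, what was found."""
    request: str    # the question as the user posed it
    strategy: str   # the query or steps the expert actually used
    results: tuple  # identifiers of the resources the expert found


def tokens(text):
    """Lowercase word set; a crude stand-in for real subject identification."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def likely_strategies(new_request, curated, top_n=3):
    """Rank curated searches by how closely their request matches the new one."""
    wanted = tokens(new_request)
    scored = [(jaccard(wanted, tokens(c.request)), c) for c in curated]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Hypothetical curated records; requests, strategies and results are invented.
curated = [
    CuratedSearch(
        request="patents on statin variants for lowering cholesterol",
        strategy="hypothetical: classification code filter plus synonym list for statins",
        results=("example-result-1", "example-result-2"),
    ),
    CuratedSearch(
        request="prior art for search engine index joins",
        strategy="hypothetical: full-text search on 'join mapping' restricted by filing date",
        results=("example-result-3",),
    ),
]

for score, match in likely_strategies("cholesterol lowering statin patents", curated):
    print(f"{score:.2f}  {match.strategy}")
```

Only the first record shares any words with the new request, so only its strategy is suggested. The interesting work, of course, is in replacing `tokens` with something that recognizes the subjects an expert would.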

If you aren’t interested in medical research, how about: Accelerating Discovery of Trolls by Text Mining of Patents? 😉

I first saw this at KDNuggets.


Update: 13 August 2012

Tweet by Lars Marius Garshol points to: Patent troll Intellectual Ventures is more like a HYDRA.

Even a low-end estimate – the patents actually recorded in the USPTO as being assigned to one of those shells – identifies around 10,000 patents held by the firm.

At the upper end of the researchers’ estimates, Intellectual Ventures would rank as the fifth-largest patent holder in the United States and among the top fifteen patent holders worldwide.

As sad as that sounds, remember this is one (1) troll. There are others.

February 14, 2012

Querying joined data within a search engine index U.S. Patent 8,073,840

Filed under: Patents,Search Engines — Patrick Durusau @ 5:03 pm

Querying joined data within a search engine index U.S. Patent 8,073,840

Abstract:

Techniques and systems for indexing and retrieving data and documents stored in a record-based database management system (RDBMS) utilize a search engine interface. Search-engine indices are created from tables in the RDBMS and data from the tables is used to create “documents” for each record. Queries that require data from multiple tables may be parsed into a primary query and a set of one or more secondary queries. Join mappings and documents are created for the necessary tables. Documents matching the query string are retrieved using the search-engine indices and join mappings.
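For readers who want a picture of what the abstract describes, here is a toy sketch in Python. It is not the patented method, only my reading of the abstract: each record becomes a "document" in an inverted index, a join mapping links records across tables, and a query that spans tables is split into a primary and a secondary lookup. The table names, rows, and function names are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical tables; the names and rows are invented for illustration.
authors = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
]
books = [
    {"id": 10, "author_id": 1, "title": "Notes on engines"},
    {"id": 11, "author_id": 2, "title": "Compilers and engines"},
    {"id": 12, "author_id": 2, "title": "Nanosecond wires"},
]


def build_index(rows, field):
    """Treat each record as a 'document' and build an inverted index on one field."""
    index = defaultdict(set)
    for row in rows:
        for term in str(row[field]).lower().split():
            index[term].add(row["id"])
    return index


def build_join_mapping(rows, foreign_key):
    """Map each parent id to the ids of the child records that reference it."""
    mapping = defaultdict(set)
    for row in rows:
        mapping[row[foreign_key]].add(row["id"])
    return mapping


author_index = build_index(authors, "name")
book_index = build_index(books, "title")
author_to_books = build_join_mapping(books, "author_id")


def books_by_author_with_title_term(author_term, title_term):
    """Primary lookup on book titles, secondary lookup on author names, joined by the mapping."""
    primary = book_index.get(title_term.lower(), set())
    secondary = author_index.get(author_term.lower(), set())
    joined = set()
    for author_id in secondary:
        joined |= author_to_books.get(author_id, set())
    return sorted(primary & joined)


print(books_by_author_with_title_term("Grace", "engines"))
```

The call prints the ids of books whose title contains "engines" and whose author matches "Grace", without ever issuing a join against the tables themselves.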

Is anyone maintaining an index or topic map of search engine/technique patents?

If such a resource were public, it might be of assistance to patent examiners.

I say “might” because I have yet to see a search technology patent that would survive even minimal knowledge of prior art.

Knowledge of prior art in a field isn’t a qualification, or at least not an important one, for patent examiners.

My suggestion is that we triple the estimated cost of a patent and start selling them on a same day basis. Skip the fiction of examination and make some money for the government in the process.

People can pay their lawyers to fight out overlapping patents in the courts.

January 21, 2012

December 19, 2011

Information extraction from chemical patents

Filed under: Cheminformatics,Patents — Patrick Durusau @ 8:10 pm

Information extraction from chemical patents by David M. Jessop.

Abstract:

The automated extraction of semantic chemical data from the existing literature is demonstrated. For reasons of copyright, the work is focused on the patent literature, though the methods are expected to apply equally to other areas of the chemical literature. Hearst Patterns are applied to the patent literature in order to discover hyponymic relations describing chemical species. The acquired relations are manually validated to determine the precision of the determined hypernyms (85.0%) and of the asserted hyponymic relations (94.3%). It is demonstrated that the system acquires relations that are not present in the ChEBI ontology, suggesting that it could function as a valuable aid to the ChEBI curators. The relations discovered by this process are formalised using the Web Ontology Language (OWL) to enable re-use. PatentEye – an automated system for the extraction of reactions from chemical patents and their conversion to Chemical Markup Language (CML) – is presented. Chemical patents published by the European Patent Office over a ten-week period are used to demonstrate the capability of PatentEye – 4444 reactions are extracted with a precision of 78% and recall of 64% with regards to determining the identity and amount of reactants employed and an accuracy of 92% with regards to product identification. NMR spectra are extracted from the text using OSCAR3, which is developed to greatly increase recall. The resulting system is presented as a significant advancement towards the large-scale and automated extraction of high-quality reaction information. Extended Polymer Markup Language (EPML), a CML dialect for the description of Markush structures as they are presented in the literature, is developed. Software to exemplify and to enable substructure searching of EPML documents is presented. Further work is recommended to refine the language and code to publication-quality before they are presented to the community.
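If Hearst Patterns are new to you, the following Python sketch shows the general idea with one classic pattern, "X such as A, B and C", where X is taken as the hypernym and A, B, C as its hyponyms. This is not the author's system. The regular expression, the function name, and the sample sentence are my assumptions, and noun phrases are approximated as single words, which would fail on most real chemical names.

```python
import re

# One classic Hearst pattern: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
# Noun phrases are approximated as single words here, which a real system would not do.
SUCH_AS = re.compile(
    r"(?P<hypernym>\w+)\s+such\s+as\s+"
    r"(?P<hyponyms>\w+(?:\s*,\s*(?!(?:and|or)\b)\w+)*(?:\s*,?\s+(?:and|or)\s+\w+)?)"
)

# Separators inside the hyponym list: a comma (optionally followed by and/or),
# or a bare "and"/"or" surrounded by whitespace.
SEPARATOR = re.compile(r"\s*,\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+")


def hearst_such_as(sentence):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group("hypernym")
        parts = SEPARATOR.split(match.group("hyponyms"))
        pairs.extend((part, hypernym) for part in parts if part)
    return pairs


# Invented example sentence, not taken from any patent.
sentence = "The reaction tolerates solvents such as toluene, methanol and ethanol."
for hyponym, hypernym in hearst_such_as(sentence):
    print(f"{hyponym} is a kind of {hypernym}")
```

Each pair is a candidate hyponymic relation. The manual validation step described in the abstract is what turns candidates like these into relations you can trust.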

Curious to see how the system would perform against U.S. Patent Office literature?

Perhaps more to the point, how would it compare to commercial chemical indexing services?

Always possible to duplicate what has already been done.

Curious what current systems, commercial or otherwise, are lacking that could be a value-add proposition?

How would you poll users? In what journals? What survey instruments or practices would you use?
