Using your Lucene index as input to your Mahout job

March 6, 2012

Using your Lucene index as input to your Mahout job – Part I

Filed under: Clustering,Collocation,Lucene,Mahout — Patrick Durusau @ 8:08 pm

From the post:

This blog shows you how to use an upcoming Mahout feature, the lucene2seq program or https://issues.apache.org/jira/browse/MAHOUT-944. This program reads the contents of stored fields in your Lucene index and converts them into text sequence files, to be used by a Mahout text clustering job. The tool contains both a sequential and MapReduce implementation and can be run from the command line or from Java using a bean configuration object. In this blog I demonstrate how to use the sequential version on an index of Wikipedia.

Access to original text can help with improving clustering results. See the blog post for details.
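
The conversion described in the quote, from stored fields to key/value text records, can be illustrated in miniature. The sketch below is not the lucene2seq implementation and does not call the Lucene, Hadoop, or Mahout APIs. It assumes each document's stored fields are already available as a plain dictionary, and the names `docs_to_pairs`, `id_field`, and `text_fields` are hypothetical, chosen only for this illustration.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for the stored fields of one indexed document.
Document = Dict[str, str]


def docs_to_pairs(
    docs: Iterable[Document],
    id_field: str,
    text_fields: List[str],
) -> Iterator[Tuple[str, str]]:
    """Yield (key, value) text pairs, one per document with an id and some text."""
    for doc in docs:
        key = doc.get(id_field)
        if not key:
            continue  # no identifier stored, so there is nothing to key the record on
        # Join the requested stored fields, skipping any that are absent or empty.
        value = " ".join(doc[f] for f in text_fields if doc.get(f))
        if value:
            yield key, value


sample_docs = [
    {"id": "1", "title": "Apache Lucene", "body": "A search library."},
    {"id": "2", "title": "Apache Mahout"},        # body was not stored
    {"title": "Orphan", "body": "No id field."},  # skipped: no key
]

pairs = list(docs_to_pairs(sample_docs, id_field="id", text_fields=["title", "body"]))
for key, value in pairs:
    print(f"{key}\t{value}")
```

Only stored fields can be recovered this way. A field that was indexed but not stored leaves no original text to export, which is why the second sample document contributes its title alone.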



Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity
