fusion-log-indexer

This project offers a number of tools designed to quickly and efficiently get logs into Fusion. It supports a pluggable parsing strategy (with implementations for Grok, DNS, JSON and NoOp) as well as a number of preconfigured Grok patterns similar to those available in Logstash and other engines.

Features

  1. Fast, multithreaded, lightweight client for installation on machines to be monitored
  2. Pluggable log parsing logic with support for a variety of formats, including Grok, JSON and DNS
  3. Ability to watch directories and automatically index new content
  4. Integration with Lucidworks Fusion

Getting Started

Prerequisites

  1. Maven (http://maven.apache.org)
  2. Java 1.7 or later
  3. Lucidworks Fusion

Building

After cloning the repository, do the following on the command line:

  1. mvn package (add -DskipTests if you want to skip the tests)

The output JAR file (fusion-log-indexer-1.0-exe.jar) is in the target directory.

Running

  1. To see all options: java -jar ./target/fusion-log-indexer-1.0-exe.jar

Basic Examples

  1. Watch and send logs from the old Lucidworks Search system, 500 at a time, to the my_collection collection using the default pipeline: java -jar ./target/fusion-log-indexer-1.0-exe.jar -dir ~/projects/content/lucid/lucidfind/logs/ -fusion "http://localhost:8764/api/apollo/index-pipelines/my_collection-default/collections/my_collection/index" -fusionUser USER_HERE -fusionPass PASSWORD_HERE -senderThreads 4 -fusionBatchSize 500 --verbose -lineParserConfig sample-properties/lws-grok-parser.properties

  2. Nagios example: java -jar ./target/fusion-log-indexer-1.0-exe.jar -dir ~/projects/content/nagios/ -fusion "http://localhost:8764/api/apollo/index-pipelines/nagios-default/collections/nagios/index" -fusionUser USER -fusionPass PASSWORD -lineParserConfig sample-properties/nagios-grok-parser.properties
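
In both examples the -fusion argument points at a Fusion index-pipeline endpoint. Its general shape (the upper-case placeholders are yours to fill in) is:

http://FUSION_HOST:8764/api/apollo/index-pipelines/PIPELINE_ID/collections/COLLECTION/index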

Multi-line Parsing Example

Let's see how to handle parsing of a Solr log file that contains both single-line and multi-line log messages (such as stack traces). Specifically, we'll see how to parse the following snippet from a log generated by Solr 6.5.1:

INFO  - 2017-06-01 13:58:13.153; [   ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/cores params={indexInfo=false&wt=json&_=1496325489976} status=0 QTime=0
INFO  - 2017-06-01 13:58:13.165; [   ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/info/system params={wt=json&_=1496325489977} status=0 QTime=12
INFO  - 2017-06-01 13:58:13.169; [   ] org.apache.solr.handler.admin.CollectionsHandler; Invoked Collection Action :list with params action=LIST&wt=json&_=1496325489977 and sendToOCPQueue=true
INFO  - 2017-06-01 13:58:13.170; [   ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/collections params={action=LIST&wt=json&_=1496325489977} status=0 QTime=0
ERROR - 2017-06-01 13:58:23.840; [c:gettingstarted s:shard1 r:core_node1 x:gettingstarted_shard1_replica1] org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: undefined field: "notafield"
        at org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1239)
        at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:438)
        at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:405)
        at org.apache.solr.request.SimpleFacets.lambda$getFacetFieldCounts$0(SimpleFacets.java:803)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.solr.request.SimpleFacets$3.execute(SimpleFacets.java:742)
        at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:818)
        at org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:329)
        at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:273)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2440)
        at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:298)
INFO  - 2017-06-01 13:58:23.842; [c:gettingstarted s:shard1 r:core_node1 x:gettingstarted_shard1_replica1] org.apache.solr.core.SolrCore; [gettingstarted_shard1_replica1]  webapp=/solr path=/select params={q=*:*&facet.field=notafield&indent=on&facet=on&wt=json&_=1496325493119} hits=32 status=400 QTime=61

A grok pattern from the resources/patterns/solr file for this log could be:

SOLR_651_LOG4J %{LOGLEVEL:level_s} - %{TIMESTAMP_ISO8601:logdate}; \[(?:%{DATA:mdc_s}| )\] %{DATA:category_s}; \[(?:%{DATA:core_s}| )\] %{JAVALOGMESSAGE:logmessage}

NOTE: You don't need to worry about multiple spaces as the parser collapses multiple whitespace characters down to a single space automatically.
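
Applied to the first INFO line of the snippet above, that pattern yields roughly the following fields (the field names come straight from the pattern; the exact document shape depends on the rest of the parser configuration):

level_s      INFO
logdate      2017-06-01 13:58:13.153
mdc_s        (empty, from the [   ] block)
category_s   org.apache.solr.servlet.HttpSolrCall
core_s       admin
logmessage   webapp=null path=/admin/cores params={indexInfo=false&wt=json&_=1496325489976} status=0 QTime=0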

Notice that the timestamp in the log has the format yyyy-MM-dd HH:mm:ss.SSS. Consequently, you'll need to set the following property in your log parser properties file:

dateFieldFormat=yyyy-MM-dd HH:mm:ss.SSS
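
The format string appears to follow Java's SimpleDateFormat conventions, so a quick standalone way to sanity-check it against a timestamp from the log (this snippet is not part of the indexer itself) is:

import java.text.SimpleDateFormat;
import java.util.Date;

public class DateFormatCheck {
  public static void main(String[] args) throws Exception {
    // Same pattern as the dateFieldFormat property above
    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
    Date parsed = fmt.parse("2017-06-01 13:58:13.153");
    System.out.println(parsed); // prints the parsed date in the local time zone
  }
}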

Lastly, if you want to parse more fields from search requests, you can set the following property:

solrRequestGrokPattern=%{SOLR_6_REQUEST}

The final parser properties file you'll need to parse the example log above (saved as solr_log_parser.properties, which the command below references) is:

grokPatternFile=patterns/grok-patterns
grokPattern=%{SOLR_651_LOG4J}
iso8601TimestampFieldName=timestamp_tdt
dateFieldName=logdate
dateFieldFormat=yyyy-MM-dd HH:mm:ss.SSS
logMessageFieldName=message_txt_en
solrRequestGrokPattern=%{SOLR_6_REQUEST}

To parse the example log, save the example log entries above into solr_example/solr.log and then run:

java -jar target/fusion-log-indexer-1.0-exe.jar -dir solr_example \
  -lineParserClass parsers.SolrLogParser -lineParserConfig solr_log_parser.properties \
  -parseOnly
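
Once the parsed output looks right, you could drop -parseOnly and add the Fusion connection options from the basic examples above to actually index the documents. For instance (the solrlogs pipeline and collection names here are placeholders for your own):

java -jar target/fusion-log-indexer-1.0-exe.jar -dir solr_example \
  -lineParserClass parsers.SolrLogParser -lineParserConfig solr_log_parser.properties \
  -fusion "http://localhost:8764/api/apollo/index-pipelines/solrlogs-default/collections/solrlogs/index" \
  -fusionUser USER_HERE -fusionPass PASSWORD_HERE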

Contributing

Please submit a pull request against the master branch with your changes.

Grokking Grok

For Grok, we use the implementation at https://github.com/thekrakken/java-grok, which is a little thin on documentation. However, there are some useful tools available for learning and working with Grok. Additionally, see the src/main/resources/patterns directory for examples ranging from Apache logs to MongoDB to Nagios.
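
As a rough illustration of what the library does under the hood, here is a minimal standalone sketch that uses java-grok directly. It assumes the 0.1.x line of the library (the io.thekraken package name and the Grok.create/compile/match API changed in later releases) and the bundled grok-patterns file; adjust paths and versions to match your setup.

import io.thekraken.grok.api.Grok;
import io.thekraken.grok.api.Match;
import java.util.Map;

public class GrokDemo {
  public static void main(String[] args) throws Exception {
    // Load a base pattern file, e.g. the one bundled under src/main/resources/patterns
    Grok grok = Grok.create("src/main/resources/patterns/grok-patterns");
    // Compile a named pattern, then match a single log line against it
    grok.compile("%{COMBINEDAPACHELOG}");
    String line = "127.0.0.1 - - [01/Jun/2017:13:58:13 -0700] \"GET /solr HTTP/1.1\" 200 1234 \"-\" \"curl/7.54.0\"";
    Match gm = grok.match(line);
    gm.captures();
    // Field name -> extracted value (clientip, verb, request, response, ...)
    Map<String, Object> fields = gm.toMap();
    System.out.println(fields);
  }
}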

Useful Sites:

  1. http://grokconstructor.appspot.com/do/match
  2. Syntax: http://grokconstructor.appspot.com/RegularExpressionSyntax.txt
