Big Data 259 - Griffin Configuration
Griffin Key Features
- Data Quality Assessment: Supports rule-based and model-based quality assessment, with rules for completeness, accuracy, consistency, validity, and timeliness
- Quality Rule Definition and Management: Users define custom rules, describe data quality requirements in JSON, and schedule periodic checks
- Flexible Data Source Support: Supports HDFS, Hive, Kafka, HBase, etc., handling both batch and streaming processing modes
- Multi-dimensional Data Quality Monitoring: Supports evaluation based on multiple dimensions such as time, location, and data source
- Visual Interface: A web UI for viewing data quality assessment results, reports, and warnings
- Integration and Compatibility: Runs on and integrates closely with Hadoop, Spark, and other big data platforms
- Automated Repair: Supports automatic repair of some data quality issues, such as filling missing values
- Extensibility: Provides extension interfaces and plugin mechanisms
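To make rule-based assessment concrete, here is a minimal Python sketch of an accuracy check: the fraction of source records that find a match in the target on a chosen set of key columns. The function name, record layout, and matching rule are hypothetical illustrations, not Griffin's API; Griffin itself expresses such rules in JSON and evaluates them with Spark.

```python
from typing import Iterable, Mapping, Sequence


def accuracy(source: Iterable[Mapping[str, object]],
             target: Iterable[Mapping[str, object]],
             keys: Sequence[str]) -> dict:
    """Hypothetical accuracy check: share of source rows whose key-column
    values also appear in some target row."""
    # Index the target by the tuple of key-column values.
    target_keys = {tuple(row.get(k) for k in keys) for row in target}
    total = 0
    matched = 0
    for row in source:
        total += 1
        if tuple(row.get(k) for k in keys) in target_keys:
            matched += 1
    return {
        "total": total,
        "matched": matched,
        "miss": total - matched,
        # An empty source is treated as vacuously accurate.
        "accuracy": matched / total if total else 1.0,
    }


src = [{"id": 1, "city": "A"}, {"id": 2, "city": "B"}, {"id": 3, "city": "C"}]
tgt = [{"id": 1, "city": "A"}, {"id": 2, "city": "X"}]
# Only the row (1, "A") matches on (id, city), so 1 of 3 rows is accurate.
print(accuracy(src, tgt, ["id", "city"]))
```

Choosing the key columns is the real rule definition here: matching on `id` alone would count row 2 as accurate even though its `city` differs.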
Configuration Modifications
pom.xml
Modify service/pom.xml to add the MySQL JDBC driver dependency:
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>${mysql.java.version}</version>
</dependency>
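To confirm the edit before running the build, a short Python check can parse the POM and look for the dependency. This is a sketch using only the standard library; `has_dependency` is our own hypothetical helper, and it scans every `dependency` element in the file (including any `dependencyManagement` section).

```python
import xml.etree.ElementTree as ET

# Namespace used by Maven 4.0.0 POM files.
POM_NS = "{http://maven.apache.org/POM/4.0.0}"


def has_dependency(pom_xml: str, group_id: str, artifact_id: str) -> bool:
    """Return True if the POM declares groupId:artifactId as a dependency."""
    root = ET.fromstring(pom_xml)
    # Use the POM namespace if the document declares it, otherwise none.
    ns = POM_NS if root.tag.startswith(POM_NS) else ""
    for dep in root.iter(f"{ns}dependency"):
        g = (dep.findtext(f"{ns}groupId", default="") or "").strip()
        a = (dep.findtext(f"{ns}artifactId", default="") or "").strip()
        if g == group_id and a == artifact_id:
            return True
    return False


SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>${mysql.java.version}</version>
    </dependency>
  </dependencies>
</project>"""

print(has_dependency(SAMPLE_POM, "mysql", "mysql-connector-java"))  # True
```

Against the real file, pass the contents of service/pom.xml instead of `SAMPLE_POM`.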
application.properties
Configure the database, Hive metastore, Elasticsearch, Livy, and related settings:
- Server port: 9876
- Database connection: jdbc:mysql://h123.wzk.icu:3306/quartz
- Hive metastore: thrift://h123.wzk.icu:9083
- Elasticsearch: h123.wzk.icu:9200
- Livy batch endpoint: http://0.0.0.0:8998/batches
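A quick way to double-check these values is to parse application.properties and compare it with the intended settings. The Python sketch below uses a simplified .properties reader (it handles `key=value` and `key: value` lines and comments, but not line continuations or escapes). The key names in `EXPECTED` are our assumption of the ones in Griffin 0.5.0's template; confirm them against your own copy of the file.

```python
def parse_properties(text: str) -> dict:
    """Minimal .properties reader: 'key=value' or 'key: value' lines, with
    '#' and '!' comment lines skipped. No continuations or escapes."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # The separator is whichever of '=' or ':' appears first, so values
        # such as JDBC or thrift URLs keep their own colons.
        positions = [p for p in (line.find("="), line.find(":")) if p != -1]
        if not positions:
            props[line] = ""
            continue
        sep = min(positions)
        props[line[:sep].strip()] = line[sep + 1:].strip()
    return props


# Assumed key names, mapped to the values used in this setup.
EXPECTED = {
    "server.port": "9876",
    "spring.datasource.url": "jdbc:mysql://h123.wzk.icu:3306/quartz",
    "hive.metastore.uris": "thrift://h123.wzk.icu:9083",
    "elasticsearch.host": "h123.wzk.icu",
    "elasticsearch.port": "9200",
    "livy.uri": "http://0.0.0.0:8998/batches",
}


def mismatches(props: dict, expected: dict = EXPECTED) -> dict:
    """Return {key: (expected, actual)} for keys that differ or are missing."""
    return {k: (v, props.get(k)) for k, v in expected.items()
            if props.get(k) != v}
```

For example, `mismatches(parse_properties(open("application.properties").read()))` returns an empty dict when every assumed key matches.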
quartz.properties
Because the metadata database is MySQL, change line 26 so the Quartz job store uses the standard JDBC delegate:
org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.StdJDBCDelegate
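Line numbers shift between releases, so matching on the property key is more robust than editing line 26 by position. Here is a hedged Python sketch (`set_property` is a hypothetical helper) that replaces the first uncommented entry for a key, or appends it if absent. The sample input assumes the file currently names Quartz's PostgreSQL delegate.

```python
import re

DELEGATE_KEY = "org.quartz.jobStore.driverDelegateClass"
STD_DELEGATE = "org.quartz.impl.jdbcjobstore.StdJDBCDelegate"


def set_property(text: str, key: str, value: str) -> str:
    """Set key=value in .properties text: replace the first uncommented
    entry for key, or append a new line if the key is absent."""
    pattern = re.compile(rf"^(\s*){re.escape(key)}\s*[=:].*$", re.MULTILINE)
    if pattern.search(text):
        # A function replacement avoids backslash handling in the value.
        return pattern.sub(lambda m: f"{m.group(1)}{key}={value}", text, count=1)
    sep = "" if not text or text.endswith("\n") else "\n"
    return f"{text}{sep}{key}={value}\n"


SAMPLE = (
    "org.quartz.jobStore.isClustered=true\n"
    "org.quartz.jobStore.driverDelegateClass="
    "org.quartz.impl.jdbcjobstore.PostgreSQLDelegate\n"
)
print(set_property(SAMPLE, DELEGATE_KEY, STD_DELEGATE))
```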
sparkProperties.json
Configure the Spark parameters, and ship hive-site.xml to each executor's working directory through spark.yarn.dist.files:
{
  "file": "hdfs:///griffin/griffin-measure.jar",
  "className": "org.apache.griffin.measure.Application",
  "name": "griffin",
  "queue": "default",
  "numExecutors": 2,
  "executorCores": 1,
  "driverMemory": "1g",
  "executorMemory": "1g",
  "conf": {
    "spark.yarn.dist.files": "hdfs:///spark/spark_conf/hive-site.xml"
  }
}
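Before submitting jobs it helps to know how much of the YARN queue these settings request. The Python sketch below reads the JSON above and applies Spark's default executor memory overhead, max(10% of executor memory, 384 MiB). It counts executors only; the driver container and any overhead you set explicitly are not included.

```python
import json

# Spark's default executor memory overhead: max(10% of heap, 384 MiB).
OVERHEAD_FACTOR = 0.10
MIN_OVERHEAD_MIB = 384

SPARK_PROPERTIES = """
{
  "file": "hdfs:///griffin/griffin-measure.jar",
  "className": "org.apache.griffin.measure.Application",
  "name": "griffin",
  "queue": "default",
  "numExecutors": 2,
  "executorCores": 1,
  "driverMemory": "1g",
  "executorMemory": "1g",
  "conf": {"spark.yarn.dist.files": "hdfs:///spark/spark_conf/hive-site.xml"}
}
"""


def to_mib(size: str) -> int:
    """Convert a size with a k/m/g/t suffix (e.g. '1g', '512m') to MiB."""
    factors = {"k": 1 / 1024, "m": 1, "g": 1024, "t": 1024 * 1024}
    s = size.strip().lower()
    if not s or s[-1] not in factors:
        raise ValueError(f"expected a k/m/g/t suffix, got {size!r}")
    return int(float(s[:-1]) * factors[s[-1]])


def executor_footprint(props: dict) -> dict:
    """Total cores and memory requested by the executors."""
    n = props["numExecutors"]
    heap = to_mib(props["executorMemory"])
    overhead = max(MIN_OVERHEAD_MIB, int(heap * OVERHEAD_FACTOR))
    return {
        "total_cores": n * props["executorCores"],
        "total_heap_mib": n * heap,
        "total_with_overhead_mib": n * (heap + overhead),
    }


print(executor_footprint(json.loads(SPARK_PROPERTIES)))
```

With 1g executors the 10% overhead (about 102 MiB) is below the 384 MiB floor, so each executor asks for 1024 + 384 = 1408 MiB and the pair asks for 2816 MiB. YARN may round each container up further to its allocation increment.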
env_batch.json
Configure the sinks that receive the metrics produced by measure jobs: CONSOLE, HDFS, and ELASTICSEARCH.
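The sinks are listed as an array in env_batch.json. The Python sketch below assembles such an array for the three sink types and flags any entry whose type is not one of them. The per-sink config field names (`path`, `max.persist.lines`, `method`, `api`, and so on) are our assumptions modelled on the template shipped with Griffin 0.5.0 as we recall it. The HDFS output directory and the Elasticsearch index URL are hypothetical. Compare all of them against your own env_batch.json.

```python
import json

# The three sink types used in this setup.
SINK_TYPES = {"CONSOLE", "HDFS", "ELASTICSEARCH"}


def build_sinks(hdfs_path: str, es_api: str) -> list:
    """Assemble a sinks array; per-sink config keys are assumed, not verified."""
    return [
        {"type": "CONSOLE", "config": {"max.log.lines": 10}},
        {"type": "HDFS", "config": {
            "path": hdfs_path,
            "max.persist.lines": 10000,
            "max.lines.per.file": 10000,
        }},
        {"type": "ELASTICSEARCH", "config": {
            "method": "post",
            "api": es_api,
            "connection.timeout": "1m",
            "retry": 10,
        }},
    ]


def unknown_sink_types(env: dict) -> list:
    """List sink types in env['sinks'] outside SINK_TYPES (case-insensitive)."""
    return [s.get("type") for s in env.get("sinks", [])
            if str(s.get("type", "")).upper() not in SINK_TYPES]


# Hypothetical HDFS output directory and Elasticsearch index URL.
env = {"sinks": build_sinks("hdfs:///griffin/persist",
                            "http://h123.wzk.icu:9200/griffin/accuracy")}
print(json.dumps(env, indent=2))
```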
Compilation
cd /opt/servers/griffin-0.5.0
mvn -Dmaven.test.skip=true clean install
Copy Jar Files
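Copy the JARs produced by the build (Maven writes them to service/target and measure/target), then upload the measure JAR to the HDFS path referenced in sparkProperties.json:
cp service/target/service-0.5.0.jar /opt/servers/griffin-0.5.0/
cp measure/target/measure-0.5.0.jar /opt/servers/griffin-0.5.0/griffin-measure.jar
hdfs dfs -mkdir /griffin
hdfs dfs -put /opt/servers/griffin-0.5.0/griffin-measure.jar /griffin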
Start Service
cd /opt/servers/griffin-0.5.0
nohup java -jar service-0.5.0.jar > service.out 2>&1 &
Once the service is up, open the web UI at: http://h122.wzk.icu:9876
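The service can take a while to start, so start-up scripts often wait for the port to answer before moving on. Here is a Python sketch using only `urllib` from the standard library; `wait_for_service` is a hypothetical helper, and it treats any HTTP response below 500 as "up".

```python
import time
import urllib.error
import urllib.request


def wait_for_service(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll url until it answers with an HTTP status below 500, or give up
    after roughly `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status < 500:
                    return True
        except urllib.error.HTTPError as err:
            # The server is listening but returned an error status (e.g. 404).
            if err.code < 500:
                return True
        except (urllib.error.URLError, OSError):
            pass  # Not reachable yet: connection refused, DNS failure, timeout.
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_service("http://h122.wzk.icu:9876", timeout=120)` returns True once the UI answers. If it returns False, check service.out for start-up errors.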