Avro Getting Started (Java Maven)


In this tutorial I explain how to use Avro with Java and Maven.
Maven is a convenient build tool for Java applications, and we will use it both to pull in Avro and to generate Java classes from an Avro schema.

Steps:

  1. Create Maven Project in Eclipse.
  2. Add Avro Maven Dependency
  3. Defining a schema
  4. Run the Maven Install
  5. Serializing
  6. Deserializing

2. Add Avro Maven Dependency

Add the following dependency to your POM:


<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.8.2</version>
</dependency>


As well as the Avro Maven plugin (for performing code generation):


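<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.8.2</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
      </goals>
      <configuration>
        <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
        <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <source>1.6</source>
    <target>1.6</target>
  </configuration>
</plugin>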


3. Defining a schema
Avro schemas are defined using JSON. Schemas are composed of primitive types (null, boolean, int, long, float, double, bytes, and string) and complex types (record, enum, array, map, union, and fixed). You can learn more about Avro schemas and types from the specification, but for now let’s start with a simple schema example. Save it in a file with the .avsc extension under src/main/avro/, the source directory given in the plugin configuration, so that the plugin can find it:

{"namespace": "com.theprogrammersbook.avro",
 "type": "record",
 "name": "ThePerson",
 "fields": [
     {"name": "id", "type": "int"},
     {"name": "username",  "type": ["string", "null"]},
     {"name": "email_address",  "type": ["string", "null"]},
     {"name": "phone_number",  "type": ["string", "null"]},
     {"name": "first_name",  "type": ["string", "null"]},
     {"name": "last_name",  "type": ["string", "null"]},
     {"name": "middle_name",  "type": ["string", "null"]},
     {"name": "sex",  "type": ["string", "null"]},
     {"name": "birthdate", "type": ["string", "null"]},
     {"name": "join_date", "type": ["string", "null"]},
     {"name": "previous_logins", "type": ["int", "null"]},
     {"name": "last_ip", "type": ["string", "null"]}
 ]
}
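
To check what a schema like the one above declares, it can be parsed at runtime. The following is a minimal sketch, assuming the Avro library from step 2 is on the classpath. The class name SchemaPeek and the helper isNullable are hypothetical names introduced for illustration, and the schema string is a trimmed copy of ThePerson with only three of its fields.

```java
import org.apache.avro.Schema;

public class SchemaPeek {
    // Trimmed copy of the ThePerson schema (three fields only), for illustration.
    static final String SCHEMA_JSON =
        "{\"namespace\": \"com.theprogrammersbook.avro\","
      + " \"type\": \"record\","
      + " \"name\": \"ThePerson\","
      + " \"fields\": ["
      + "   {\"name\": \"id\", \"type\": \"int\"},"
      + "   {\"name\": \"username\", \"type\": [\"string\", \"null\"]},"
      + "   {\"name\": \"previous_logins\", \"type\": [\"int\", \"null\"]}"
      + " ]}";

    // Parses the JSON text into an Avro Schema object.
    static Schema parse() {
        return new Schema.Parser().parse(SCHEMA_JSON);
    }

    // A field counts as nullable here when its type is a union with a "null" branch.
    static boolean isNullable(Schema.Field field) {
        Schema type = field.schema();
        if (type.getType() != Schema.Type.UNION) {
            return false;
        }
        for (Schema branch : type.getTypes()) {
            if (branch.getType() == Schema.Type.NULL) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Schema schema = parse();
        // Full name is namespace + "." + name
        System.out.println(schema.getFullName());
        for (Schema.Field field : schema.getFields()) {
            System.out.println(field.name() + " nullable=" + isNullable(field));
        }
    }
}
```

In this schema only id is a plain int; every other field is a union with null, which is why those fields may be left unset when a person is created.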

4. Run the Maven Install
Select the project, right-click it, choose Run As, and then select Maven install.
The build prints output similar to the following:

[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for com.theprogrammersbook:avro.example:jar:0.0.1-SNAPSHOT
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-compiler-plugin is missing. @ line 44, column 12
[WARNING] 
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING] 
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING] 
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building avro.example Maven Webapp 0.0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- avro-maven-plugin:1.7.7:schema (default) @ avro.example ---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ avro.example ---
[WARNING] Using platform encoding (Cp1252 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] skip non existing resourceDirectory E:\arvoProject\avro.example-master\src\main\resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ avro.example ---
[INFO] Changes detected - recompiling the module!
[WARNING] File encoding has not been set, using platform encoding Cp1252, i.e. build is platform dependent!
[INFO] Compiling 3 source files to E:\arvoProject\avro.example-master\target\classes
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ avro.example ---
[WARNING] Using platform encoding (Cp1252 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] skip non existing resourceDirectory E:\arvoProject\avro.example-master\src\test\resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ avro.example ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ avro.example ---
[INFO] 
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ avro.example ---
[INFO] Building jar: E:\arvoProject\avro.example-master\target\avro.example.jar
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ avro.example ---
[INFO] Installing E:\arvoProject\avro.example-master\target\avro.example.jar to C:\Users\Yind\.m2\repository\com\theprogrammersbook\avro.example\0.0.1-SNAPSHOT\avro.example-0.0.1-SNAPSHOT.jar
[INFO] Installing E:\arvoProject\avro.example-master\pom.xml to C:\Users\Yind\.m2\repository\com\theprogrammersbook\avro.example\0.0.1-SNAPSHOT\avro.example-0.0.1-SNAPSHOT.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4.927 s
[INFO] Finished at: 2018-07-14T13:17:09+05:30
[INFO] Final Memory: 18M/174M
[INFO] ------------------------------------------------------------------------

When we look at the project structure after the build, we see that a package named com.theprogrammersbook.avro and a class ThePerson.java have been created.
How? We added the avro-maven-plugin to pom.xml, and it generated the package and the class from the schema.
Where does the package name come from? In the schema we set "namespace": "com.theprogrammersbook.avro", so the generated package has that name.
Where does the class name come from? In the schema we set "name": "ThePerson", so the generated class has that name.
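
The naming described above can be sketched in plain Java. This is only an illustration of the mapping seen in this tutorial (namespace to package directory, name to class file, and snake_case field names to the setters used in the serialization step, such as setFirstName). It is not the plugin's actual implementation, and the class GeneratedNames and its methods are hypothetical.

```java
public class GeneratedNames {
    // Relative path of the generated source file: dots in the namespace become directories.
    static String sourcePath(String namespace, String name) {
        return namespace.replace('.', '/') + "/" + name + ".java";
    }

    // Setter name for a snake_case field name, e.g. first_name -> setFirstName.
    static String setterName(String fieldName) {
        StringBuilder sb = new StringBuilder("set");
        for (String part : fieldName.split("_")) {
            if (part.isEmpty()) {
                continue; // skip empty pieces from doubled or leading underscores
            }
            sb.append(Character.toUpperCase(part.charAt(0))).append(part.substring(1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints com/theprogrammersbook/avro/ThePerson.java
        System.out.println(sourcePath("com.theprogrammersbook.avro", "ThePerson"));
        // prints setFirstName
        System.out.println(setterName("first_name"));
        // prints setPreviousLogins
        System.out.println(setterName("previous_logins"));
    }
}
```

The path is relative to the output directory configured for the plugin, which in this project is src/main/java/.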

So now we have the schema and the class generated from it.
Next we serialize some data.

5. Serializing

Now that the ThePerson class has been generated, we can create some person objects and serialize them. Because we work with the generated class, we use the specific (rather than generic) readers and writers. First we’ll serialize our persons to a data file on disk.

package com.theprogrammersbook.avro.serialize;

import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;

import com.theprogrammersbook.avro.ThePerson;

public class SerializationTest {
	public static void main(String[] args) {

		ThePerson p1 = new ThePerson();
		p1.setId(1);
		p1.setUsername("mrscarter");
		p1.setFirstName("Beyonce");
		p1.setLastName("Knowles-Carter");
		p1.setBirthdate("1981-09-04");
		p1.setJoinDate("2016-01-01");
		p1.setPreviousLogins(10000);

		ThePerson p2 = new ThePerson();
		p2.setId(2);
		p2.setUsername("jayz");
		p2.setFirstName("Shawn");
		p2.setMiddleName("Corey");
		p2.setLastName("Carter");
		p2.setBirthdate("1969-12-04");
		p2.setJoinDate("2016-01-01");
		p2.setPreviousLogins(20000);
		// Serialize the two sample ThePerson records
		File avroOutput = new File("theperson-test.avro");
		try {
			DatumWriter<ThePerson> personDatumWriter = new SpecificDatumWriter<ThePerson>(ThePerson.class);
			DataFileWriter<ThePerson> dataFileWriter = new DataFileWriter<ThePerson>(personDatumWriter);
			// create() writes the file header, including the schema
			dataFileWriter.create(p1.getSchema(), avroOutput);
			dataFileWriter.append(p1);
			dataFileWriter.append(p2);
			dataFileWriter.close();
			System.out.println("Writing completed.");
		} catch (IOException e) {
			System.out.println("Error writing Avro: " + e.getMessage());
		}
	}
}

Output:
Writing completed.

We create a DatumWriter, which converts Java objects into an in-memory serialized format. Since we are using the class generated from the schema, we create a SpecificDatumWriter and pass it ThePerson.class. The writer obtains the schema from the generated class and uses it to determine how to write each record.

We also create a DataFileWriter, which writes the serialized records, as well as the schema, to the file specified in the dataFileWriter.create call. We write our persons to the file via calls to the dataFileWriter.append method. When we are done writing, we close the data file.
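
That the schema travels inside the data file can be checked with a small round trip. The sketch below is an assumption-laden illustration rather than part of the tutorial project: it uses Avro's generic API (GenericRecord, GenericDatumWriter, GenericDatumReader) so that it does not depend on the generated ThePerson class, it writes to an in-memory stream instead of a file, and its two-field schema is a cut-down, hypothetical version of ThePerson. The class name EmbeddedSchemaDemo is also hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class EmbeddedSchemaDemo {
    // Hypothetical two-field schema, cut down from ThePerson to keep the sketch short.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\": \"record\", \"name\": \"ThePerson\","
      + " \"namespace\": \"com.theprogrammersbook.avro\","
      + " \"fields\": ["
      + "   {\"name\": \"id\", \"type\": \"int\"},"
      + "   {\"name\": \"username\", \"type\": [\"string\", \"null\"]}]}");

    // Writes one record into an in-memory Avro data file and returns its bytes.
    static byte[] writeOne(int id, String username) {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("id", id);
        record.put("username", username);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
                new GenericDatumWriter<GenericRecord>(SCHEMA));
            writer.create(SCHEMA, out); // writes the header, including the schema
            writer.append(record);
            writer.close();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }

    // Reads the schema back from the data file header; the caller supplies no schema.
    static Schema readEmbeddedSchema(byte[] dataFile) {
        try {
            DataFileStream<GenericRecord> in = new DataFileStream<GenericRecord>(
                new ByteArrayInputStream(dataFile), new GenericDatumReader<GenericRecord>());
            try {
                return in.getSchema();
            } finally {
                in.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = writeOne(1, "mrscarter");
        System.out.println(readEmbeddedSchema(bytes).getFullName());
    }
}
```

Because the reader is constructed without a schema and still recovers the record's full name, the sketch suggests that a data file can be read by a program that has never seen the original .avsc file.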

6. Deserializing
Finally, we’ll deserialize the persons from the data file we just created.

package com.theprogrammersbook.avro.deserialize;

import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.io.DatumReader;
import org.apache.avro.specific.SpecificDatumReader;

import com.theprogrammersbook.avro.ThePerson;

public class DeserializeTest {
	public static void main(String[] args) {
		// Deserialize the sample Avro file
		try {
			DatumReader<ThePerson> personDatumReader = new SpecificDatumReader<ThePerson>(ThePerson.class);
			DataFileReader<ThePerson> dataFileReader = new DataFileReader<ThePerson>(
					new File("theperson-test.avro"), personDatumReader);
			// Reuse one ThePerson object across iterations to avoid extra allocations
			ThePerson p = null;
			while (dataFileReader.hasNext()) {
				p = dataFileReader.next(p);
				System.out.println(p);
			}
			dataFileReader.close();
		} catch (IOException e) {
			System.out.println("Error reading Avro: " + e.getMessage());
		}
	}
}


Output:

{"id": 1, "username": "mrscarter", "email_address": null, "phone_number": null, "first_name": "Beyonce", "last_name": "Knowles-Carter", "middle_name": null, "sex": null, "birthdate": "1981-09-04", "join_date": "2016-01-01", "previous_logins": 10000, "last_ip": null}
{"id": 2, "username": "jayz", "email_address": null, "phone_number": null, "first_name": "Shawn", "last_name": "Carter", "middle_name": "Corey", "sex": null, "birthdate": "1969-12-04", "join_date": "2016-01-01", "previous_logins": 20000, "last_ip": null}

Deserializing is very similar to serializing. We create a SpecificDatumReader, analogous to the SpecificDatumWriter we used in serialization, which converts in-memory serialized items into ThePerson objects. We pass the DatumReader and the data file to a DataFileReader, analogous to the DataFileWriter, which reads the data file on disk.

Next, we use the DataFileReader to iterate through the serialized persons and print each deserialized object to stdout. Note how we perform the iteration: we keep a single ThePerson variable in which we store the current deserialized person, and pass this object to every call of dataFileReader.next. This is a performance optimization that allows the DataFileReader to reuse the same object rather than allocating a new ThePerson for every iteration.
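
One consequence of the reuse pattern is that the object handed back by next may be overwritten on the following call, so any values needed later should be copied out inside the loop. The following sketch illustrates this under stated assumptions: it uses Avro's generic API and an in-memory data file so that it runs without the generated ThePerson class, and the class ReuseDemo, its methods, and the cut-down schema are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class ReuseDemo {
    // Hypothetical two-field schema, cut down from ThePerson.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\": \"record\", \"name\": \"ThePerson\","
      + " \"namespace\": \"com.theprogrammersbook.avro\","
      + " \"fields\": ["
      + "   {\"name\": \"id\", \"type\": \"int\"},"
      + "   {\"name\": \"username\", \"type\": [\"string\", \"null\"]}]}");

    // Writes one record per username into an in-memory Avro data file.
    static byte[] write(String... usernames) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
                new GenericDatumWriter<GenericRecord>(SCHEMA));
            writer.create(SCHEMA, out);
            for (int i = 0; i < usernames.length; i++) {
                GenericRecord record = new GenericData.Record(SCHEMA);
                record.put("id", i + 1);
                record.put("username", usernames[i]);
                writer.append(record);
            }
            writer.close();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }

    // Iterates with a reused record and copies each username out as a String.
    static List<String> readUsernames(byte[] dataFile) {
        List<String> names = new ArrayList<String>();
        try {
            DataFileStream<GenericRecord> in = new DataFileStream<GenericRecord>(
                new ByteArrayInputStream(dataFile), new GenericDatumReader<GenericRecord>());
            try {
                GenericRecord reuse = null;
                while (in.hasNext()) {
                    reuse = in.next(reuse); // may return the same object, refilled
                    // Copy the value now; keeping a reference to 'reuse' would not be safe.
                    names.add(String.valueOf(reuse.get("username")));
                }
            } finally {
                in.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(readUsernames(write("mrscarter", "jayz")));
    }
}
```

If the records themselves need to be kept, for example in a list, passing null to next on each iteration is the simpler choice, at the cost of one allocation per record.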

