user>rmr>Changelog
Antonio Piccolboni edited this page Feb 12, 2015
From now on the "Changelog" and "New in this release" documents are merged into the release page. Please update your links and bookmarks.
- `dfs.ls` and the Avro input format
- small enhancements and bug fixes
See New in this release for details.
- mapreduce returns job and application id as attributes
- multiple bug fixes related to `keyval` corner cases (check also user>rmr>Keyval-types-and-combinations), factor serialization, profiling, outer joins, hbase format build, and package load order.
See New in this release for details.
- Adds Windows compatibility
- fixed support for date columns and logical NAs
- switched to Imports to prevent namespace pollution
- silenced some warnings
See New in this release for details.
- New option `hdfs.tempfile` replaces `dfs.tempfile`, which is gone; this splits the tmp location for the two backends.
- Hbase format gains start and stop row and regex filtering capability, courtesy @khharut.
- Fixes an efficiency problem with serialization when data frames used as keys.
- Fixes a problem with factors used as keys and reduce groups.
- More bugs squashed.
See New in this release for details.
- Faster than 2.3.0 in the cases where that version was slow, 10X in some cases, and in general more predictable in performance.
- Removes the confusing `keyval.length` option, giving responsibility to each format for how much to read and write.
- Adds `dfs.exists` to check if a file exists (backend independent).
- Fixes a problem with the hbase format.
- Fixes the reduce call counter.
- Allows setting the `HDFS_CMD` environment variable to help rmr2 find the `hdfs` command and avoid annoying deprecation warnings.
See New in this release for details.
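As a rough illustration of the `HDFS_CMD` and `dfs.exists` items above, the sketch below sets the variable from within R and then checks for a file on the local backend. The path `/usr/bin/hdfs` is an assumption, not something this changelog specifies, and the rmr2 calls are guarded so the snippet does nothing if the package is not installed.

```r
# Assumed location of the hdfs executable; adjust for your installation.
Sys.setenv(HDFS_CMD = "/usr/bin/hdfs")

if (requireNamespace("rmr2", quietly = TRUE)) {
  library(rmr2)
  rmr.options(backend = "local")  # no Hadoop cluster needed for this check
  small <- to.dfs(1:3)            # write a tiny data set to a temporary file
  print(dfs.exists(small))        # backend-independent existence check
}
```

The variable can equally be exported in the shell before starting R; setting it from R is just the most self-contained way to show it.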
- Supports the upcoming plyrmr package, now in preview.
- New backend independent file operations
- New "pig.hive" format to import/export from/to those systems
- Speed improvements when using data frames.
- Better key normalization, prevents occasional grouping errors.
- Limit broadcasting of large objects for efficiency reasons, under user control.
See New in this release for details.
- Fixes two bugs, one of which can cause occasional, hard to detect data corruption. Recommended upgrade.
See New in this release for details.
- Compatible with Hortonworks Data Platform for windows.
- Speed improvements
- A number of bug fixes affecting, among others, `equijoin` and the local backend.
See New in this release for details.
- `equijoin` now accepts I/O format specs like `mapreduce`.
- `rmr.options` now gives access to a `dfs.tempdir` setting to set the HDFS tempdir to a different setting from the R tempdir.
- `rmr.str` returns its own argument, which allows less intrusive code changes when adding logging.
- Made some error messages more informative.
- Fixes bugs affecting `c.keyval`, `equijoin`, `keyval`, the CSV input and output formats, the "reduce calls" counter and the `backend.parameters` option to `mapreduce`
See New in this release for details.
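The `rmr.str` item in the release above is easiest to see with a base-R stand-in. `log.and.return` is a hypothetical helper, not part of rmr2; it only mimics the "log, then return the argument" contract described in the changelog, so that a logging call can be wrapped around an expression without restructuring the surrounding code. The rmr2 part is guarded and runs only if the package is installed.

```r
# Hypothetical stand-in for the contract described above: write the structure
# of x to stderr, then hand x back unchanged.
log.and.return <- function(x) {
  message(paste(capture.output(str(x)), collapse = "\n"))
  x
}

# The logging call sits inside the expression; no extra statements are needed.
squares <- log.and.return(1:5)^2

if (requireNamespace("rmr2", quietly = TRUE)) {
  library(rmr2)
  rmr.options(backend = "local")
  # Same idea with the real function inside a map function.
  out <- from.dfs(mapreduce(
    input = to.dfs(1:5),
    map = function(k, v) keyval(k, rmr.str(v)^2)))
}
```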
- Faster, with both behind-the-API work and some additional features focused on accelerating the reduce phase.
- Reduce functions can be vectorized w.r.t. the keys, in addition to the values, for the case of small reduce groups.
- In-memory combiners can be faster than the regular variety for some applications.
- Counters provide an additional way to monitor jobs and memory profiling helps with optimization.
- HBase input format to process HBase tables directly
- `c.keyval` function that helps create complex key-value pairs.
See New in this release for details.
- Lighter dependencies, compatibility with R 2.15.2, and numerous bug fixes, many related to `equijoin`.
See New in this release for details.
- Tested on CDH3, CDH4, Apache Hadoop 1.0.4 and MapR 2.0.1.
- Many bug fixes including `rmr.sample` and `equijoin`.
See New in this release for details.
- Simplified API with better support for vectorization and structured data. As a trade off, some porting of 1.3.1 based code is necessary.
- Modified native format now combines speed and compatibility in a transparent way; backward compatible with 1.3.x
- Completely refactored source code
- Added non-core functions for sampling, size testing, debugging and more
- True map-only jobs
See New in this release for details.
- Tested on CDH3, CDH4, and Apache Hadoop 1.0.2
- Completed transition of the code-heavy part of the documentation to Rmd
See New in this release for details.
- An optional vectorized API for efficient R programming when dealing with small records.
- Fast C implementations for serialization and deserialization from and to typedbytes.
- Other readers and writers work much better in vectorized mode, namely csv and text
- Additional steps to support structured data better, that is, you can use more data frames and fewer lists in the API
- Better whirr scripts, more forgiving behavior for package loading and bug fixes
See New in this release for details.
- Binary formats
- Simpler, more powerful I/O format API
- Native binary format with support for all R data types
- Worked around an R bug that made large reduces very slow.
- Backend specific parameters to modify things like number of reducers at the hadoop level
- Automatic library loading in mappers and reducers
- Better data frame conversions
- Adopted a uniform.naming.convention
- New package options API
- Native R serialization/deserialization, which implies that all R objects are supported as key and value, without any conversion boilerplate code. This is the new default. JSON still supported. csv reader/writer also available -- somewhat experimental.
- Multiple backends (hadoop and local); the local backend is useful for debugging at small scale; having two backends enforces modular design, opens up further possibilities (rjava, Amazon's EMR, OpenCL have been suggested), and forces us to clarify semantics.
- Multiple tests of backend equivalence.
- Simpler interface for profiler.
- Equijoins (rough equivalent of merge for mapreduce)
- dfs.empty to check if file is empty
- to.map, to.reduce, to.reduce.all higher order functions to create simple map and reduce functions from regular ones.
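The higher order functions in the last item can be pictured with a small base-R sketch. `lift.to.map` is a hypothetical name used only for illustration and is not rmr2's implementation; it shows the general idea of building a two-argument map function from ordinary one-argument functions. The guarded part assumes that the two-argument form of `to.map` applies its first function to the keys and its second to the values, and it uses the local backend so that no Hadoop installation is needed.

```r
# Hypothetical illustration: turn two regular functions into a function of
# (key, value), which is the shape a map function is expected to have.
lift.to.map <- function(key.fun, val.fun) {
  function(k, v) list(key = key.fun(k), val = val.fun(v))
}

double.values <- lift.to.map(identity, function(v) v * 2)
res <- double.values("a", 1:3)

if (requireNamespace("rmr2", quietly = TRUE)) {
  library(rmr2)
  rmr.options(backend = "local")
  # Assumed usage of the real helper: keys pass through, values are doubled.
  out <- from.dfs(mapreduce(
    input = to.dfs(keyval(1:3, 1:3)),
    map = to.map(identity, function(v) v * 2)))
}
```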