    MLlib is Apache Spark's scalable machine learning library.

    Ease of use

    Usable in Java, Scala, Python, and R.

    MLlib fits into Spark's APIs and interoperates with NumPy in Python (as of Spark 0.9) and with R libraries (as of Spark 1.5). You can use any Hadoop data source (e.g., HDFS, HBase, or local files), which makes it easy to plug MLlib into Hadoop workflows.

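    from pyspark.ml.clustering import KMeans

    # `spark` is an existing SparkSession; the data is in LIBSVM format
    data = spark.read.format("libsvm")\
      .load("hdfs://...")

    model = KMeans(k=10).fit(data)  # fit K-means with 10 clusters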
    Calling MLlib in Python

    Performance

    High-quality algorithms, up to 100x faster than MapReduce.

    Spark excels at iterative computation, enabling MLlib to run fast. At the same time, we care about algorithmic performance: MLlib contains high-quality algorithms that leverage iteration, and can yield better results than the one-pass approximations sometimes used on MapReduce.

    Logistic regression in Hadoop and Spark
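
The point about iteration can be illustrated with a small sketch. The plain-Python code below is not MLlib code. It is a toy batch gradient descent for logistic regression, with hypothetical names (`train`, `log_loss`, `points`), showing that repeated passes over the same data reach a lower loss than a single pass.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, points):
    """Mean logistic loss of `weights` over (features, label) pairs."""
    total = 0.0
    for features, label in points:
        p = sigmoid(sum(w * x for w, x in zip(weights, features)))
        p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(points)


def train(points, iterations, step=0.5):
    """Batch gradient descent: every iteration is one full pass over the data."""
    weights = [0.0] * len(points[0][0])
    for _ in range(iterations):
        gradient = [0.0] * len(weights)
        for features, label in points:
            error = sigmoid(sum(w * x for w, x in zip(weights, features))) - label
            for j, x in enumerate(features):
                gradient[j] += error * x
        weights = [w - step * g / len(points) for w, g in zip(weights, gradient)]
    return weights


# Toy data (an assumption for illustration): features are (bias, x),
# and the label is 1 when x is positive.
points = [((1.0, -2.0), 0), ((1.0, -1.0), 0), ((1.0, 1.0), 1), ((1.0, 2.0), 1)]

loss_one_pass = log_loss(train(points, 1), points)
loss_many_passes = log_loss(train(points, 100), points)
print(loss_one_pass, loss_many_passes)
```

Each iteration rereads the full dataset, which is why keeping the data in memory between passes matters for this kind of algorithm.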

    Runs everywhere

    Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud, against diverse data sources.

    You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on Kubernetes. Access data in HDFS, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources.

    Algorithms

    MLlib contains many algorithms and utilities.

    ML algorithms include:

    • Classification: logistic regression, naive Bayes,...
    • Regression: generalized linear regression, survival regression,...
    • Decision trees, random forests, and gradient-boosted trees
    • Recommendation: alternating least squares (ALS)
    • Clustering: K-means, Gaussian mixtures (GMMs),...
    • Topic modeling: latent Dirichlet allocation (LDA)
    • Frequent itemsets, association rules, and sequential pattern mining

    ML workflow utilities include:

    • Feature transformations: standardization, normalization, hashing,...
    • ML Pipeline construction
    • Model evaluation and hyper-parameter tuning
    • ML persistence: saving and loading models and Pipelines

    Other utilities include:

    • Distributed linear algebra: SVD, PCA,...
    • Statistics: summary statistics, hypothesis testing,...

    Refer to the MLlib guide for usage examples.

    Community

    MLlib is developed as part of the Apache Spark project, so it is tested and updated with each Spark release.

    If you have questions about the library, ask on the Spark mailing lists.

    MLlib is still a rapidly growing project and welcomes contributions. If you'd like to submit an algorithm to MLlib, read how to contribute to Spark and send us a patch!

    Getting started

    To get started with MLlib:

    • Download Spark. MLlib is included as a module.
    • Read the MLlib guide, which includes various usage examples.
    • Learn how to deploy Spark on a cluster if you'd like to run in distributed mode. You can also run locally on a multicore machine without any setup.