MobileVLM


Paper: MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding (arXiv:2409.14818)

News

  • 2024.11.12 - Partial training data and random walk code for Mobile3M released!
  • 2024.10.4 - Test data for Mobile3M released!
  • 2024.9.26 - Our work was accepted to EMNLP 2024 Findings!

1. Quick Start

Requirements

  • transformers==4.32.0
  • accelerate
  • tiktoken
  • einops
  • transformers_stream_generator==0.0.4
  • scipy
  • torchvision
  • pillow
  • tensorboard
  • matplotlib
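
These dependencies can be installed with pip in one step, for example (assuming a Python 3 environment):

pip install transformers==4.32.0 accelerate tiktoken einops transformers_stream_generator==0.0.4 scipy torchvision pillow tensorboard matplotlib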

2. Mobile3M Dataset


Training Data

Training data is available at the following link: data. We will gradually upload data for all apps.

Corpus Collection Script

To start collecting data, run the script main/corpus/googleCreatDataset/arm_graph_para_lock.py.

Example usage:

python googleCreatDataset/arm_graph_para_lock.py --device_name 10.53.89.79:6532 --systemPort 8112 --appid 8201 --command_executor http://127.0.0.1:4812/wd/hub --appPackage com.lucky.luckyclient --name_en lucky --diff_max 0.5 --diff_png 0.3 --waitadb 8 --prefix lucky0_3_1_2_ --recheck -1

Parameter Descriptions

  • --device_name: Name of the emulator.
  • --appid: Storage ID of the app being collected, e.g., 8201.
  • --command_executor: URL of the Appium server endpoint, e.g., http://127.0.0.1:4812/wd/hub.
  • --diff_max 0.5 --diff_png 0.3: Page similarity thresholds used to decide whether two screens count as the same page (see the sketch after this list).
  • --prefix lucky0_3_1_2_: Starting path for distributed data collection.
  • --recheck -1: Whether to recheck previously collected data; set to -1 to disable rechecking.
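
To make the role of the two thresholds concrete, below is a minimal sketch of how a collector could compare a newly visited screen against an already stored one. The helper functions, the similarity measures, and the way --diff_max and --diff_png are applied here are illustrative assumptions only; the actual logic is implemented in arm_graph_para_lock.py.

from difflib import SequenceMatcher
from PIL import Image, ImageChops

def xml_difference(xml_a: str, xml_b: str) -> float:
    # Character-level difference ratio between two UI hierarchy (XML) dumps, in [0, 1].
    return 1.0 - SequenceMatcher(None, xml_a, xml_b).ratio()

def png_difference(path_a: str, path_b: str) -> float:
    # Fraction of noticeably changed pixels between two screenshots, in [0, 1].
    img_a = Image.open(path_a).convert("L")
    img_b = Image.open(path_b).convert("L").resize(img_a.size)
    diff = ImageChops.difference(img_a, img_b)
    changed = sum(1 for px in diff.getdata() if px > 10)  # small tolerance for rendering noise
    return changed / (img_a.width * img_a.height)

def is_same_page(xml_a, xml_b, png_a, png_b, diff_max=0.5, diff_png=0.3):
    # Hypothetical rule: treat two screens as the same page when both the UI-tree
    # difference and the screenshot difference stay below their thresholds.
    return xml_difference(xml_a, xml_b) <= diff_max and png_difference(png_a, png_b) <= diff_png

Under this illustrative rule, lowering either threshold makes the collector stricter, so more screens are treated as distinct pages during the random walk.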

Data Generation Code for Each Task

The code for generating the data for each task is provided in the corresponding task directories of this repository.

Our Test Data

Our test data is available at data.

4. License

The dataset of this project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

The source code of this project is licensed under the Apache 2.0 license.

Summary of Terms

  • Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made.
  • NonCommercial: You may not use the material for commercial purposes.
  • ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.


5. Citation

If you would like to use our benchmark or cite this paper, please use the reference below:

@article{wu2024mobilevlm,
  title={Mobilevlm: A vision-language model for better intra- and inter-ui understanding},
  author={Wu, Qinzhuo and Xu, Weikai and Liu, Wei and Tan, Tao and Liu, Jianfeng and Li, Ang and Luan, Jian and Wang, Bin and Shang, Shuo},
  journal={arXiv preprint arXiv:2409.14818},
  year={2024}
}
